Lab 5 (due 2/8)
Preliminaries
These instructions might get edited a bit over the next
couple of days. I'll try to flag changes.
As usual, check the write up instructions first.
Requirements for this assignment
- 0. Make sure you have a baseline test suite corresponding
to your lab 4 grammar.
- 1. Add adjectival and adverbial modifiers.
- 2. If your language has agreement between
adjectives and head nouns, implement the appropriate lexical rules
for adjectives to model this.
- 3. Add demonstratives and markers of definiteness (if any).
- 4. Add rules allowing for optional arguments (argument drop).
- 5. Test your grammar using [incr tsdb()].
[incr tsdb()] should be part of your test-development
cycle. In addition, you'll need to run a final test suite instance
for this lab to submit along with your baseline.
- 6. Write up the phenomena you have analyzed.
Modification
Head-modifier rules
The Matrix distinguishes scopal from intersective modification.
We're going to pretend that everything is intersective and
just not worry about the scopal guys for now.
- Create an instance of head-adj-int-phrase, an
instance of adj-head-int-phrase, or both, depending on
whether you need only prehead modifiers, only posthead modifiers,
or both. (You may already have some of these, depending on
what you said about negation in the customization system.) To
do this, add the following to rules.tdl:
head-adj-int := head-adj-int-phrase.
adj-head-int := adj-head-int-phrase.
- Try parsing a sentence without a modifier, and examine
the parse chart. Did the head-adj phrase fire? If so,
constrain your non-adj/adv subtypes of head. You can do
this by adding the following to your my_language.tdl.
+nvcdmo :+ [ MOD < > ].
That adds the constraint [MOD < > ] to the
type +nvcdmo, which is the supertype of all the head
types other than adj and adv.
- Try parsing the misbehaving sentence again.
Adjectives
- Create a type adjective-lex which inherits from
basic-adjective-lex. The following type works for English
assuming that:
- We're not worried about predicative adjectives or adjectives
taking complements for now.
- English has both pre-head and post-head modifiers (adj-head and head-adj, respectively), but simple adjectives
are (almost) always prehead (hence the value of POSTHEAD).
- We're only dealing with intersective adjectives (as stipulated).
adjective-lex := basic-adjective-lex & intersective-mod-lex &
                 norm-ltop-lex-item &
  [ SYNSEM [ LOCAL [ CAT [ HEAD.MOD < [ LOCAL.CAT [ HEAD noun,
                                                    VAL.SPR cons ] ] >,
                           VAL [ SPR < >,
                                 SUBJ < >,
                                 COMPS < >,
                                 SPEC < > ],
                           POSTHEAD - ] ] ] ].
- If adjectives display agreement in your language, you'll
be adding that information to the MOD value (see Adjective Agreement below).
For now, leave it underspecified (this will cause your grammar
to overgenerate).
- Create one or more adjective instances.
- Parse sentences with your adjectives, and examine the MRSs.
Are the adjective relations being predicated of the right indices?
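For concreteness, here is a minimal sketch of what an adjective entry in lexicon.tdl might look like, assuming the adjective-lex type above. The stem and the PRED string are hypothetical placeholders. Use your own language's forms and your usual predicate-naming convention.

```tdl
; hypothetical adjective entry (lexicon.tdl); stem and PRED are placeholders
red := adjective-lex &
  [ STEM < "red" >,
    SYNSEM.LKEYS.KEYREL.PRED "_red_a_rel" ].
```

If the adjective is attaching correctly, the MRS should show the ARG1 of the adjective relation as the same variable as the ARG0 of the modified noun's relation.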
Adverbs
- Create one or more types for adverbs. The following type
definition inherits from appropriately-defined Matrix supertypes,
and constrains the modified constituent to be verbal.
adverb-lex := basic-adverb-lex & intersective-mod-lex &
  [ SYNSEM [ LOCAL [ CAT [ HEAD.MOD < [ LOCAL.CAT.HEAD verb ] >,
                           VAL [ SPR < >,
                                 SUBJ < >,
                                 COMPS < >,
                                 SPEC < > ] ] ] ] ].
- Try parsing a sentence with an adverb and then generating
to see where else the adverb can show up. If your language allows
multiple attachment sites for the adverb, admire the results.
If it doesn't, or doesn't allow *that* many, constrain them further.
- In order to constrain the possible attachment sites
for adverbs, you may need to constrain the value of POSTHEAD,
or the value of SPR inside MOD or the value of LIGHT inside MOD. ([LIGHT +] picks out lexical Vs, including both transitive and intransitive ones.)
- Parse sentences with your adverbs, and examine the MRSs.
Are the adverb relations being predicated of the right indices?
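As one illustration of narrowing the attachment sites, the following sketch defines adverbs that precede their head and attach only to lexical verbs. The type and entry names are hypothetical. The sketch assumes, as the text above suggests, that LIGHT is available on the synsem inside MOD. If the path differs in your version, check the Matrix core.

```tdl
; hypothetical: prehead adverbs that modify only lexical Vs
preverbal-adverb-lex := adverb-lex &
  [ SYNSEM.LOCAL.CAT [ POSTHEAD -,
                       HEAD.MOD < [ LIGHT + ] > ] ].

; hypothetical entry (lexicon.tdl); stem and PRED are placeholders
quickly := preverbal-adverb-lex &
  [ STEM < "quickly" >,
    SYNSEM.LKEYS.KEYREL.PRED "_quickly_a_rel" ].
```

Given the [LIGHT +] behavior described above, an adverb of this type should combine only through adj-head-int. It should also not attach to phrasal constituents such as a VP that already contains its object.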
Adjective Agreement
To model adjective agreement, you'll probably want
to write lexical rules that inflect the adjectives and
constrain the features inside the MOD value so that each
inflected adjective can only modify the right kind of nouns.
Below is some general information on writing lexical rules.
Please also refer to the lexical rules emitted by the
customization system. Adjective agreement lexical rules
should be of the "add only" type. Note that if you have an
apparently uninflected form, you'll need to make sure it
goes through a constant lexical rule (no spelling change)
which fills in the relevant feature values.
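Here is a minimal sketch of an add-only agreement rule, modeled on the English 3sg example in the next section. It assumes a hypothetical plural suffix -s on adjectives and that your nouns carry NUM under INDEX.PNG. Adapt the feature path and values to whatever your grammar actually uses.

```tdl
; my_language.tdl: hypothetical plural agreement rule for adjectives.
; Add-only: it just narrows the MOD value so that the inflected
; adjective can only modify plural nouns.
pl-adj-lex-rule := infl-ltow-rule &
  [ SYNSEM.LOCAL.CAT.HEAD.MOD < [ LOCAL.CONT.HOOK.INDEX.PNG.NUM pl ] >,
    DTR adjective-lex ].

; irules.tdl: the spelling change (regularized suffix -s, hypothetical)
pl-adj :=
%suffix (* s)
pl-adj-lex-rule.
```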
Lexical rules
- Pick a supertype for your rule:
- Determine whether your lexical rule needs to change SYNSEM
information, or just add to it. (Examples: If the input has a
non-empty SPR list and the output has an empty SPR list, that's
changing information. If the input has no value
specified for CASE and the output is [CASE nom], that's just adding
information.)
- Determine whether your lexical rule creates fully inflected
forms, or whether there's more inflection you'd like to stack
on top of it.
- Rules creating fully inflected forms and only adding
information to SYNSEM can inherit from infl-ltow-rule.
- Rules creating not-yet fully inflected forms and only adding
information should inherit
from infl-add-only-no-ccont-ltol-rule.
- If your rule needs to change the SYNSEM value, determine
which part of SYNSEM is changing (e.g., VAL only, HEAD only,
CAT only) and choose an appropriate type out of the types
called infl-***-change-only-ltol-rule. Unless you're
adding any relations, your rule should also inherit from
no-ccont-lex-rule. I expect most lexical
rules created for this lab to be of the add-only variety, rather
than changing information.
- Define a rule type in my_language.tdl which contains
all of the information about your rule except the spelling changes.
The value of DTR should be specific enough to constrain the
rule to only applying to the right type of inputs. The value of
SYNSEM should include the primary information contributed by the
rule (e.g., the constraints on MOD reflected in the agreement morphology).
If you aren't using one of the "add only" types, then you need to
be sure that the rest of the SYNSEM value is sufficiently constrained.
Here's an example from English (where
the value of SYNSEM ends up being very specific since all the
information from the daughter is also in the mother):
3sg_verb-lex-rule := infl-ltow-rule &
  [ SYNSEM.LOCAL.CAT.VAL.SUBJ < [ LOCAL.CONT.HOOK.INDEX.PNG [ PER third,
                                                              NUM sg ] ] >,
    DTR verb-lex ].
- If you have multiple rules applying to the same form, constrain
the innermost one (the rightmost prefix or leftmost suffix) to take (some
subtype of) lex-item as its DTR. The next rule out
should take the first rule as its DTR, and so on. If multiple rules can
appear in one slot, define a supertype for them which can serve as the DTR
of the next rule type out.
- Define an instance of the rule type in irules.tdl. This
instance should give the spelling change subrules on a line beginning
with %prefix or %suffix. Assuming you're working
from regularized morphophonology, these should be simple concatenation,
of the form (* pref) or (* suff).
- A slightly more complicated example from English (without
regularized morphophonology) follows. After %suffix there is
a list of pairs in which the first member matches the input form
and the second member describes the output form. * matches the empty
string, ! signals a letter set, and more specific subrules go to the
right.
3sg_verb :=
%suffix (!s !ss) (!ss !ssses) (ss sses)
3sg_verb-lex-rule.
And here's the letter set that's used:
%(letter-set (!s abcedfghijklmnopqrtuvwxyz))
- Make sure that your lexical entries give the stem
instead of the inflected word (i.e., so that your lexical rule
can do the work). Be sure that the lexical type says
[INFLECTED -] if the rule is obligatory. (And note
that lexical rules for agreement usually are.)
- Test your grammar. Does the lexical rule apply to the
words it should apply to? Does it apply to words it shouldn't apply to?
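To make the rule-stacking advice concrete, here is a sketch of two hypothetical slots on nouns: an inner number suffix and an outer case suffix. It makes three assumptions: a noun-lex type, a CASE feature on noun heads, and overt marking for both number values. All of the rule names are made up for illustration.

```tdl
; inner slot: applies to stems; the supertype groups the rules that can
; fill this slot so that the outer rule can name it as its DTR
num-lex-rule-super := infl-add-only-no-ccont-ltol-rule &
  [ DTR noun-lex ].

sg-noun-lex-rule := num-lex-rule-super &
  [ SYNSEM.LOCAL.CONT.HOOK.INDEX.PNG.NUM sg ].

pl-noun-lex-rule := num-lex-rule-super &
  [ SYNSEM.LOCAL.CONT.HOOK.INDEX.PNG.NUM pl ].

; outer slot: takes the output of any number rule and yields a
; fully inflected word
nom-noun-lex-rule := infl-ltow-rule &
  [ SYNSEM.LOCAL.CAT.HEAD.CASE nom,
    DTR num-lex-rule-super ].
```

Each of these rule types still needs its own instance, with its %suffix line, in irules.tdl.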
The basics
We are modeling the cognitive status attributed to discourse
referents by particular referring expressions through a pair of
features COG-ST and SPECI on ref-ind (the
value of INDEX for nouns). Here is our first-pass guess
at the cognitive status associated with various types of overt
expressions (for dropped arguments, see below):
Marker | COG-ST value | SPECI value |
Personal pronoun | activ-or-more | + |
Demonstrative article/adjective | activ+fam | |
Definite article/inflection | uniq+fam+act | |
Indefinite article/inflection | type-id | |
If you have any overt personal pronouns, constrain their INDEX values
to be [COG-ST activ-or-more, SPECI + ].
If you have any determiners which mark definiteness, have them
constrain the COG-ST of their SPEC appropriately.
For demonstrative determiners, see below.
If you have any nominal inflections associated with discourse
status, implement lexical rules which add them and constrain the
COG-ST value appropriately.
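A minimal sketch of these three kinds of constraint follows. The type names (pronoun-lex, noun-lex, determiner-lex and the two determiner subtypes) are assumptions standing in for whatever your customized grammar calls them. The definite suffix rule is only relevant if your language marks definiteness on the noun.

```tdl
; overt personal pronouns
pronoun-lex := noun-lex &
  [ SYNSEM.LOCAL.CONT.HOOK.INDEX [ COG-ST activ-or-more,
                                   SPECI + ] ].

; determiners constrain the index of the noun they specify
def-determiner-lex := determiner-lex &
  [ SYNSEM.LOCAL.CAT.VAL.SPEC.FIRST.LOCAL.CONT.HOOK.INDEX.COG-ST uniq+fam+act ].

indef-determiner-lex := determiner-lex &
  [ SYNSEM.LOCAL.CAT.VAL.SPEC.FIRST.LOCAL.CONT.HOOK.INDEX.COG-ST type-id ].

; definite inflection on nouns (add-only, fully inflected output)
def-noun-lex-rule := infl-ltow-rule &
  [ SYNSEM.LOCAL.CONT.HOOK.INDEX.COG-ST uniq+fam+act,
    DTR noun-lex ].
```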
Note that in some cases an unmarked form is underspecified,
where in others it stands in contrast to a marked form. You should
figure out which is the case for any unmarked forms in your language
(e.g., bare NPs in a language with determiners, unmarked nouns
in a language with definiteness markers), and constrain the unmarked
forms appropriately. For bare NPs, the place to do this is the bare
NP rule (note that you might have to create separate bare NP rules
for pronouns v. common nouns in this case). For definiteness affixes,
you'll want a constant-lex-rule that constrains COG-ST, and
that is parallel to the inflecting-lex-rule that adds the affix for
the overtly marked case.
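For the bare NP case, one option is to add the constraint to the rule instance in rules.tdl. The sketch below makes two assumptions. First, that the customization system gave you a rule type called bare-np-phrase; check your my_language.tdl for the actual name. Second, that bare common nouns in your language are indefinite; substitute the COG-ST value that matches your data.

```tdl
; rules.tdl: hypothetical bare NP rule for common nouns, constraining
; the noun's index to the indefinite (type-id) status
bare-np := bare-np-phrase &
  [ HEAD-DTR.SYNSEM.LOCAL.CONT.HOOK.INDEX.COG-ST type-id ].
```

If pronouns in your grammar also go through a bare NP rule, you'll want the separate pronoun rule mentioned above, so that this constraint doesn't apply to them. How you keep the two rules apart depends on how your grammar distinguishes pronouns from common nouns.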
Some languages have agreement for definiteness on adjectives.
In this case, you'll want to add lexical rules for adjectives that
constrain the COG-ST of the item on their MOD list.
Demonstratives
All demonstratives (determiners, adjectives and pronouns [not
on the todo list this year]) will share a set of relations
which express the proximity to hearer and speaker. We will arrange
these relations into a hierarchy so that languages with just a one- or
two-way distinction can be more easily mapped to languages with a two-
or three-way distinction. In order to do this, we're using
types for these PRED values rather than strings. Note the
absence of quotation marks. We will treat the demonstrative relations
as adjectival relations, no matter how they are introduced (via
pronouns, determiners, or adjectives).
There are (at least) two different types of three-way distinction.
Here are hierarchies for two of them. Because both define the same type names, load only the one that fits your language. Let me know if your language isn't modeled by either.
; Option 1: the non-proximal range is split by relation to the hearer
demonstrative_a_rel := predsort.
proximal+dem_a_rel := demonstrative_a_rel.  ; close to speaker
distal+dem_a_rel := demonstrative_a_rel.    ; away from speaker
remote+dem_a_rel := distal+dem_a_rel.       ; away from speaker and hearer
hearer+dem_a_rel := distal+dem_a_rel.       ; near hearer
; Option 2: the non-proximal range is split by distance
demonstrative_a_rel := predsort.
proximal+dem_a_rel := demonstrative_a_rel.  ; close to speaker
distal+dem_a_rel := demonstrative_a_rel.    ; away from speaker
mid+dem_a_rel := distal+dem_a_rel.          ; away, but not very far away
far+dem_a_rel := distal+dem_a_rel.          ; very far away
Demonstrative adjectives
Demonstrative adjectives come out as the easy case in this system.
They are just like regular adjectives, except that in addition to
introducing a relation whose PRED value is one of the subtypes of
demonstrative_a_rel defined above, they also constrain the INDEX.COG-ST of their MOD value to be activ+fam.
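Here is a minimal sketch of that, building on the adjective-lex type from the adjectives part of the lab. The type and entry names are hypothetical. The PRED value assumes that one of the two hierarchies above has been loaded; both define proximal+dem_a_rel.

```tdl
; demonstrative adjectives: ordinary adjectives whose MOD target
; must be at least activated and familiar
dem-adjective-lex := adjective-lex &
  [ SYNSEM.LOCAL.CAT.HEAD.MOD < [ LOCAL.CONT.HOOK.INDEX.COG-ST activ+fam ] > ].

; hypothetical entry: the PRED is a type, so no quotation marks
this-adj := dem-adjective-lex &
  [ STEM < "this" >,
    SYNSEM.LKEYS.KEYREL.PRED proximal+dem_a_rel ].
```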
Demonstrative determiners
Demonstrative determiners, unlike demonstrative adjectives, introduce
two relations: the quantifier relation
(let's say "exist_q_rel") and the demonstrative relation.
This analysis entails changes to the Matrix core, as
basic-determiner-lex assumes that just one relation is contributed.
Accordingly, we are going to bypass the current version of basic-determiner-lex and instead define determiner-lex-supertype as follows:
determiner-lex-supertype := norm-hook-lex-item & basic-zero-arg &
  [ SYNSEM [ LOCAL [ CAT [ HEAD det,
                           VAL [ SPEC.FIRST.LOCAL.CONT.HOOK [ INDEX #ind,
                                                              LTOP #larg ],
                                 SPR < >,
                                 SUBJ < >,
                                 COMPS < > ] ],
                     CONT.HCONS <! qeq &
                                   [ HARG #harg,
                                     LARG #larg ] !> ],
             LKEYS.KEYREL quant-relation &
                          [ ARG0 #ind,
                            RSTR #harg ] ] ].
This type should have two subtypes (assuming you have demonstrative
determiners as well as others in your language --- otherwise, just incorporate
the constraints for demonstrative determiners into the type above).
- The subtype for ordinary (non-demonstrative) determiners should add
the constraint that the RELS list has exactly one thing on it, by
adding the supertype single-rel-lex-item.
- The subtype for demonstrative determiners should specify a RELS
list with two things on it: the first should have the
"exist_q_rel" for its PRED value. (It's already
constrained to be a quant-relation because the type
norm-hook-lex-item inherited by determiner-lex-supertype
identifies the first element of the RELS list with the
LKEYS.KEYREL.) The second one should be identified with
LKEYS.ALTKEYREL and should be an arg1-ev-relation (the
type we use for the relations of intransitive adjectives).
The HOOK.INDEX.COG-ST inside the SPEC value should
be constrained to activ+fam. Finally, the LBL and ARG1 of
the arg1-ev-relation should be identified with the
SPEC..HOOK.LTOP and SPEC..HOOK.INDEX of the determiner, respectively. (This will result in the
demonstrative adjective relation sharing its handle with the N' the
determiner attaches to.)
Make sure your ordinary determiners in the lexicon inherit
from the first subtype, and that your demonstrative determiners inherit
from the second subtype. Demonstrative determiner lexical entries
should constrain their LKEYS.ALTKEYREL.PRED to be an appropriate
subtype of demonstrative_a_rel.
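Putting the last two paragraphs together, here is one way the two subtypes and a pair of lexical entries might look. This is a sketch, not a tested solution. The names ordinary-determiner-lex and demonstrative-determiner-lex, the stems, and the use of exist_q_rel for the ordinary determiner are all hypothetical. RELS is assumed to be a difference list, like the HCONS value in determiner-lex-supertype.

```tdl
; ordinary determiners: exactly one relation (the quantifier)
ordinary-determiner-lex := determiner-lex-supertype & single-rel-lex-item.

; demonstrative determiners: quantifier (= KEYREL) plus demonstrative
; relation (= ALTKEYREL), whose LBL and ARG1 are shared with the
; LTOP and INDEX of the N' the determiner attaches to
demonstrative-determiner-lex := determiner-lex-supertype &
  [ SYNSEM [ LOCAL [ CAT.VAL.SPEC.FIRST.LOCAL.CONT.HOOK
                       [ INDEX #ind & [ COG-ST activ+fam ],
                         LTOP #ltop ],
                     CONT.RELS <! [ PRED "exist_q_rel" ], #altkey !> ],
             LKEYS.ALTKEYREL #altkey & arg1-ev-relation &
                             [ LBL #ltop,
                               ARG1 #ind ] ] ].

; lexicon.tdl (hypothetical entries)
a := ordinary-determiner-lex &
  [ STEM < "a" >,
    SYNSEM.LKEYS.KEYREL.PRED "exist_q_rel" ].

that := demonstrative-determiner-lex &
  [ STEM < "that" >,
    SYNSEM.LKEYS.ALTKEYREL.PRED distal+dem_a_rel ].
```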
Background
The goal of this part of the lab is to allow for unexpressed arguments.
As many of you have noticed, there are plenty of languages that
don't use pronouns as much as English does, but rather leave
the NP out entirely if it was just going to be a pronoun. Generally,
the meaning is about as recoverable from context as it is with pronouns
(after all, pronouns only give small clues to the referent in terms
of person, number, and gender; among 3rd person referents, that
usually leaves a lot of ambiguity). In some languages (e.g., Spanish),
this kind of pronoun omission seems to be 'licensed' by the
fact that the verbal inflections carry as much information as the
pronouns would. In other languages (e.g., Japanese), the verbal
inflections don't in fact carry person/number/gender information,
but pronouns still aren't required.
Even in English (which likes pronouns so much that it has two
expletive [meaningless] ones -- it and there) there are
cases where arguments appear to be optional. Prime examples are verbs
like eat and drink. The sentence I already ate
means 'I already ate something', but the addressee is in no way
expected to know what exactly was eaten. This is called indefinite
null instantiation (see e.g., Johnson
and Fillmore 2000). This contrasts with definite null
instantiation (ibid), in which null arguments have definite
reference, that is, the utterance is only felicitous if the addressee
can determine the referent. English verbs which do this include
tell as in I already told you. (Which is a cute
example, because it's most likely to be used in a case where
the addressee can't determine what exactly s/he was already told,
but it's licensed because it means something like 'I already told
you the answer to that question'.)
Our general strategy is going to be similar to the way
we handle missing determiners. That is, we're going to write
unary phrase structure rules in which the mother and single
daughter have different valence requirements.
I believe that most languages should fall into one of
the following patterns (restricting our attention to
verbs and their arguments):
- Any argument can be left out with the definite interpretation.
Non-subject arguments of certain verbs can also receive the indefinite
interpretation when they are missing.
- Any subject can be left out with the definite interpretation.
Non-subject arguments of certain verbs can also be omitted. Their
interpretation (definite or indefinite) is lexically determined
by the verb.
- Subjects are required, but non-subject arguments of certain
verbs can be left out, with definite or indefinite reference depending
on the verb.
- In languages with optional agreement marking on the verb,
you may find that any subject (or any direct object) can be
left out only when agreement marking
is present. Without agreement marking, you may only find lexically
licensed definite/indefinite null instantiation.
I've written this assignment based on those four possibilities,
and it should be straightforward to the extent that I'm right :-).
If your language instantiates a different pattern, talk to me.
Create instances of rules
- The Matrix provides definitions of
decl-head-opt-subj-phrase and basic-head-opt-comp-phrase which should
be specific enough.
- If your language allows subject pro-drop, create an
instance of decl-head-opt-subj-phrase in rules.tdl.
- Parse a sentence without a subject to see if it works.
- If your language allows object pro-drop (in general, or
only certain arguments of certain verbs), create an instance
of basic-head-opt-comp-phrase.
- Parse a sentence with a missing object to see if it works.
- (At the moment, this might overgenerate, as it should allow
any complement to go missing, and not all languages allow that.
We'll fix it presently.)
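The rules.tdl additions follow the same pattern as the head-modifier rule instances earlier and might look like this. The instance names on the left are arbitrary, and you need only the ones your language calls for.

```tdl
; rules.tdl: subject drop (only if your language allows it)
head-opt-subj := decl-head-opt-subj-phrase.

; rules.tdl: complement drop (only if your language allows it)
head-opt-comp := basic-head-opt-comp-phrase.
```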
Add verbal subtypes for argument optionality
For expository purposes, I'm assuming that you have a subtype of
verb-lex called trans-verb-lex. If you've called
it something else, not to worry, just use your corresponding types
whenever I mention these.
For languages without general pro-drop of objects
- If you have an example of a transitive verb with
an optional argument, create two subtypes of trans-verb-lex.
- One (for verbs whose arguments are not optional) should
say [OPT -] on the complement.
- The other can
leave the value of OPT unspecified, but may need to specify
a value for OPT-CS, according to how the argument
gets interpreted when it is missing. OPT-CS takes its
value from the same range of types as COG-ST. The rules
for optional arguments fill in the COG-ST value according
to the OPT-CS value.
- (You might find that you need multiple
subtypes here, if you have examples of different verbs with
different behavior.)
- Edit lexicon.tdl so that the lexical entries
which used to inherit from trans-verb-lex now inherit
from your new subtypes.
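A sketch of what these subtypes might look like, assuming your transitive type is trans-verb-lex with a single element on COMPS. The subtype names and the stem are hypothetical. Here type-id stands for the indefinite reading, following the table of cognitive statuses earlier; the choice of OPT-CS value is up to you.

```tdl
; complement must be realized overtly
non-opt-trans-verb-lex := trans-verb-lex &
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < [ OPT - ] > ].

; complement may be dropped, with an indefinite reading (cf. English "eat")
indef-opt-trans-verb-lex := trans-verb-lex &
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < [ OPT-CS type-id ] > ].

; lexicon.tdl: a hypothetical entry moved to the new subtype
eat := indef-opt-trans-verb-lex &
  [ STEM < "eat" >,
    SYNSEM.LKEYS.KEYREL.PRED "_eat_v_rel" ].
```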
For languages with general pro-drop of objects
[NB: Languages of this type might or might not have agreement
markers for objects. If the agreement
markers are optional, however, see the next section.]
- If some, but not all, of your transitive verbs allow
indefinite as well as definite interpretations of missing
objects, (the translation of eat would be a likely
suspect), create two subtypes of trans-verb-lex.
- One specifies [OPT-CS activ-or-more] (or similar) on the complement.
- The other should leave OPT-CS unspecified
on the complement (thus allowing either interpretation).
- Edit lexicon.tdl so that the lexical entries
which used to inherit from trans-verb-lex now inherit
from your new subtypes.
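For the general object pro-drop case, the pair of subtypes might be sketched as follows. The names are hypothetical, and the sketch again assumes a single complement on trans-verb-lex.

```tdl
; missing objects must be definite
def-drop-trans-verb-lex := trans-verb-lex &
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < [ OPT-CS activ-or-more ] > ].

; OPT-CS left unspecified: missing objects may be definite or
; indefinite (e.g., the translation of "eat")
any-drop-trans-verb-lex := trans-verb-lex.
```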
For languages where argument optionality corresponds
to the presence of optional verbal inflection
- Modify your verb inflection lexical rules (if you have
them) to change the status of the relevant arguments:
If the affix (object or subject marker) precludes overt NP
realization of that same argument, then the lexical rule should
remove that argument from the appropriate valence list. If the
affix allows both overt realization and dropping of the argument,
it should change the OPT value from - (given on the lexical
entry) to underspecified. In either case, you'll be writing
a valence-change-only-lex-rule. Be sure to copy up all other
valence information that is not changed.
- Consider what the OPT-CS value of those arguments should
be, according to whether the verbal inflection is present or
absent.
- Make appropriate changes to OPT-CS in the lexical entries
and lexical rules.
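Here is a rough sketch of the second case: an object marker that makes the object optional rather than removing it. The sketch makes three assumptions:
- the VAL-only member of the infl-***-change-only-ltol-rule family is called infl-val-change-only-ltol-rule in your matrix.tdl;
- trans-verb-lex has exactly one complement;
- a marked verb's missing object is definite.

The rule name and the OPT-CS value are hypothetical.

```tdl
; my_language.tdl: hypothetical object-marker rule. The daughter's
; complement is [OPT -] (from the lexical entry); the mother's complement
; shares LOCAL and NON-LOCAL with it but leaves OPT unspecified.
; SUBJ, SPR and SPEC are copied up unchanged.
obj-marker-lex-rule := infl-val-change-only-ltol-rule & no-ccont-lex-rule &
  [ SYNSEM.LOCAL.CAT.VAL [ SUBJ #subj,
                           SPR #spr,
                           SPEC #spec,
                           COMPS < [ LOCAL #local,
                                     NON-LOCAL #nonlocal,
                                     OPT-CS activ-or-more ] > ],
    DTR trans-verb-lex &
        [ SYNSEM.LOCAL.CAT.VAL [ SUBJ #subj,
                                 SPR #spr,
                                 SPEC #spec,
                                 COMPS < [ LOCAL #local,
                                           NON-LOCAL #nonlocal ] > ] ] ].
```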
Write up your analyses
For each of the following phenomena, please include
the following in your write up:
- A descriptive statement of the facts of your language.
- Illustrative IGT examples from your testsuite.
- A statement of how you implemented the phenomenon (in terms of types you added/modified and particular tdl constraints).
- If the analysis is not (fully) working, a description of the problems
you are encountering.
- Adjectival and adverbial modifiers.
- Agreement between adjectives and head nouns. (If your language doesn't have this, then just say so.)
- Demonstratives and markers of definiteness.
- Argument optionality
In addition, your write up should include a statement of the current
coverage of your grammar over your test suite (using numbers you can
get from Analyze | Coverage and Analyze | Overgeneration in [incr tsdb()])
and a comparison between your baseline test suite run and your final
one for this lab (see Compare | Competence).
ebender at u dot washington dot edu
Last modified: Wed Feb 11 16:33:16 PST 2009