Linguistics 567: Knowledge Engineering for NLP
Lab 6 Due 2/11
Some of you may have already covered some of this material.
If you would like to work on something else this week, let me
know what it is.
Run a baseline test suite instance, and save this to
submit with your lab. As usual, consider adding to your
basic test suite if you have not sufficiently covered the
phenomena addressed here.
There is a bug in the current matrix customization
script. To fix it, add the type no-hcons-lex-item as
a supertype for noun-lex in klingon.tdl.
Without this fix, the HCONS list in all sentences with nouns
is empty. Once it's fixed, HCONS introduced by determiners or
the bare-np rule will now show up.
First, download an updated copy of matrix.tdl,
and drop it in place of your old matrix.tdl.
This version has the cognitive status hierarchy (cog-st and subtypes)
as well as the features COG-ST and SPECI defined.
If you have any overt personal pronouns, they should be constrained
to be [COG-ST activ-or-more & [SPECI + ]]. (This is a first-pass
guess at what the function of such things is crosslinguistically.
If you have information suggesting this is not appropriate for your language,
please let me know.)
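A minimal sketch of the pronoun constraint, assuming that your pronouns share a lexical type (here given the hypothetical name pronoun-lex, inheriting from your noun-lex) and that COG-ST and SPECI are both features of the index in the updated matrix.tdl. Check the feature geometry in your copy before relying on this path.

```tdl
; pronoun-lex is a hypothetical type name; substitute your own pronoun type.
; Assumption: COG-ST and SPECI are both appropriate for the HOOK.INDEX value.
pronoun-lex := noun-lex &
  [ SYNSEM.LOCAL.CONT.HOOK.INDEX [ COG-ST activ-or-more,
                                   SPECI + ] ].
```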
If you have any determiners, consider whether they constrain the
COG-ST value of the N' they attach to. An indefinite
determiner in English, for example, would probably contribute
[COG-ST type-id]. Demonstrative determiners should probably
be somewhere in the activated-familiar range. For the moment, we
won't encode the information about whether the object being pointed
to is closer to the speaker or the hearer (or away from both).
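As a rough illustration, an indefinite determiner type could constrain the COG-ST of the N' through its SPEC value. The type names indef-determiner-lex and determiner-lex are hypothetical, and the path assumes that COG-ST sits on the HOOK.INDEX of the element on SPEC.

```tdl
; Hypothetical type names; use whatever determiner type your grammar defines.
; The N' the determiner attaches to is the single element of its SPEC list.
indef-determiner-lex := determiner-lex &
  [ SYNSEM.LOCAL.CAT.VAL.SPEC.FIRST.LOCAL.CONT.HOOK.INDEX.COG-ST type-id ].
```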
If you have any nominal inflections associated with discourse
status, implement lexical rules which add them and constrain the
COG-ST value appropriately.
Updated 2/8/07 To summarize, here is our
first-pass guess at the cognitive status associated with various types
of words/markers, with the caveat that this is just a starting point
for any given language, and that language-internal evidence might
point to a different classification and/or homophony among
words/markers of the same form.
Marker | COG-ST value |
Personal pronoun | activ-or-more |
Demonstrative article/adjective | activ+fam |
Definite article/inflection | uniq+fam+act |
Indefinite article/inflection | type-id |
Note that in some cases an unmarked form is underspecified,
where in others it stands in contrast to a marked form.
By request, here are some instructions for creating demonstratives:
first demonstrative adjectives, then demonstrative pronouns, then demonstrative
determiners.
All three types of demonstratives will share a set of relations
which express the proximity to hearer and speaker. We will arrange
these relations into a hierarchy so that languages with just a one- or
two-way distinction can be more easily mapped to languages with a two-
or three-way distinction. In order to do this, we're using
types for these PRED values rather than strings. Note the
absence of quotation marks. We will treat the demonstrative relations
as adjectival relations, no matter how they are introduced (via
pronouns, determiners, or quantifiers).
demonstrative_a_rel := predsort.
proximal+dem_a_rel := demonstrative_a_rel. ; close to speaker
distal+dem_a_rel := demonstrative_a_rel. ; away from speaker
remote+dem_a_rel := distal+dem_a_rel. ; away from speaker and hearer
hearer+dem_a_rel := distal+dem_a_rel. ; near hearer
Demonstrative adjectives
Demonstrative adjectives come out as the easy case in this system.
They are just like regular adjectives, except that in addition to
introducing a relation whose PRED value is one of the subtypes of
demonstrative_a_rel defined above, they also constrain the INDEX.COG-ST of their MOD value to be activ+fam.
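One way this might look, assuming you already have an adjective-lex type for ordinary adjectives. The names dem-adjective-lex and this-adj, and the stem "this", are placeholders for illustration only.

```tdl
; Hypothetical type: an ordinary adjective that also constrains the
; cognitive status of the noun it modifies.
dem-adjective-lex := adjective-lex &
  [ SYNSEM [ LOCAL.CAT.HEAD.MOD < [ LOCAL.CONT.HOOK.INDEX.COG-ST activ+fam ] >,
             LKEYS.KEYREL.PRED demonstrative_a_rel ] ].

; Hypothetical lexical entry; the PRED is a type, so no quotation marks.
this-adj := dem-adjective-lex &
  [ STEM < "this" >,
    SYNSEM.LKEYS.KEYREL.PRED proximal+dem_a_rel ].
```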
Demonstrative pronouns
On this analysis, demonstrative pronouns differ from other pronouns
in introducing two relations: the pronoun_n_rel that all other
pronouns introduce and one of the demonstrative_a_rel subtypes
defined above. Because they introduce two relations, they can't
inherit from noun-lex as it is currently defined in your
grammars (nor even basic-noun-lex defined in the Matrix), since
both of those ultimately inherit the constraint that only one relation
is contributed. If we stick with this analysis of demonstratives
in the long run, we will probably reformulate things on the Matrix side
to make this a bit cleaner. For now, in order to capture the similarities
that do exist among nominal lexical items, I recommend doing the following.
- Define a type noun-lex-supertype as follows, and add to it
any constraints common to all nouns including demonstrative pronouns
in your language.
noun-lex-supertype := basic-one-arg & norm-hook-lex-item &
  [ SYNSEM [ LOCAL.CAT [ HEAD noun,
                         VAL [ SPR < #spr &
                                     [ LOCAL.CAT.HEAD det ] >,
                               COMPS < >,
                               SUBJ < >,
                               SPEC < > ]],
             LKEYS.KEYREL noun-relation ],
    ARG-ST < #spr > ].
This type has all of the information in noun-lex as defined
by the customization script and basic-noun-lex defined in
matrix.tdl, with the exception of the constraint that it have
exactly one thing on the RELS list.
- Define a subtype noun-lex of noun-lex-supertype which
adds in the single rel constraint. This type should fit into your
hierarchy the same way your old noun-lex did (i.e., have the
same subtypes).
noun-lex := noun-lex-supertype & single-rel-lex-item.
- Define another subtype of noun-lex-supertype for the demonstrative
pronouns.
- Like other pronouns, these should say that their specifiers are
[OPT +] (i.e., they obligatorily undergo the bare-np rule).
- They should have two things on their CONT.RELS list. The
first is identified with the LKEYS.KEYREL and is a noun-relation whose
PRED value is the string "_pronoun_n_rel". The second thing on the RELS list is
identified with LKEYS.ALTKEYREL, and is an
event-relation.
- The LBL values of the two relations are identified with
each other.
- Finally, the COG-ST value on HOOK.INDEX should
be constrained to be activ+fam. Verify that this ends up on the ARG0 of the pronoun relation by parsing a sentence and examining its MRS.
- Define lexical entries for your demonstrative pronouns which
constrain their LKEYS.ALTKEYREL.PRED (rather than
LKEYS.KEYREL.PRED) to be the appropriate subtype of
demonstrative_a_rel taken from the list above.
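Putting the steps above together, the demonstrative pronoun type might be sketched as follows. The type name dem-pronoun-lex, the entry name that-pron, and the stem "that" are hypothetical; the constraints themselves are the ones listed above.

```tdl
; Hypothetical type name. Two relations share a label; the first is the
; pronoun relation (KEYREL), the second the demonstrative one (ALTKEYREL).
dem-pronoun-lex := noun-lex-supertype &
  [ SYNSEM [ LOCAL [ CAT.VAL.SPR < [ OPT + ] >,
                     CONT [ HOOK.INDEX.COG-ST activ+fam,
                            RELS <! #key & [ LBL #lbl,
                                             PRED "_pronoun_n_rel" ],
                                    #altkey & event-relation &
                                    [ LBL #lbl ] !> ] ],
             LKEYS [ KEYREL #key,
                     ALTKEYREL #altkey ] ] ].

; Hypothetical lexical entry: only ALTKEYREL.PRED is specified here.
that-pron := dem-pronoun-lex &
  [ STEM < "that" >,
    SYNSEM.LKEYS.ALTKEYREL.PRED distal+dem_a_rel ].
```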
Demonstrative determiners
As with the demonstrative pronouns, the demonstrative determiners introduce
two relations. This time, they introduce the quantifier relation
(let's say "exist_q_rel") and the demonstrative relation. Once
again, this analysis is going to entail changes to the Matrix core, as
basic-determiner-lex assumes just one relation being contributed.
Accordingly, we are going to bypass the current version of basic-determiner-lex and define instead determiner-lex-supertype as follows:
determiner-lex-supertype := norm-hook-lex-item & basic-zero-arg &
  [ SYNSEM [ LOCAL [ CAT [ HEAD det,
                           VAL [ SPEC.FIRST.LOCAL.CONT.HOOK [ INDEX #ind,
                                                              LTOP #larg ],
                                 SPR < >,
                                 SUBJ < >,
                                 COMPS < > ]],
                     CONT.HCONS <! qeq &
                                   [ HARG #harg,
                                     LARG #larg ] !> ],
             LKEYS.KEYREL quant-relation &
                          [ ARG0 #ind,
                            RSTR #harg ] ] ].
This type should have two subtypes (assuming you have demonstrative
determiners as well as others in your language --- otherwise, just incorporate
the constraints for demonstrative determiners into the type above).
- The subtype for ordinary (non-demonstrative) determiners should add
the constraint that the RELS list has exactly one thing on it:
[ RELS <! relation !> ].
- The subtype for demonstrative determiners should specify a RELS
list with two things on it: the first should have the
"exist_q_rel" for its PRED value. (It's already
constrained to be a quant-relation because the type
norm-hook-lex-item inherited by determiner-lex-supertype
identifies the first element of the RELS list with the
LKEYS.KEYREL.) The second one should be identified with
LKEYS.ALTKEYREL and should be an adjective-relation.
The HOOK.INDEX.COG-ST inside the SPEC value should
be constrained to activ+fam. Finally, the LBL of
the adjective-relation should be identified with the
SPEC..HOOK.LTOP of the determiner. (This will result in the
demonstrative adjective relation sharing its handle with the N' the
determiner attaches to.)
Make sure your ordinary determiners in the lexicon inherit
from the first subtype, and that your demonstrative determiners inherit
from the second subtype. Demonstrative determiner lexical entries
should constrain their LKEYS.ALTKEYREL.PRED to be an appropriate
subtype of demonstrative_a_rel.
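The two subtypes described above could be sketched like this. The type names determiner-lex and demonstrative-determiner-lex, the entry name this-det, and the stem "this" are hypothetical; adapt them to your grammar.

```tdl
; Ordinary determiners: exactly one relation (the quantifier).
determiner-lex := determiner-lex-supertype &
  [ SYNSEM.LOCAL.CONT.RELS <! relation !> ].

; Demonstrative determiners: quantifier plus demonstrative relation.
; The demonstrative relation shares its label with the LTOP of the N'.
demonstrative-determiner-lex := determiner-lex-supertype &
  [ SYNSEM [ LOCAL [ CAT.VAL.SPEC.FIRST.LOCAL.CONT.HOOK [ INDEX.COG-ST activ+fam,
                                                           LTOP #ltop ],
                     CONT.RELS <! [ PRED "exist_q_rel" ],
                                  #altkey & adjective-relation &
                                  [ LBL #ltop ] !> ],
             LKEYS.ALTKEYREL #altkey ] ].

; Hypothetical lexical entry for a proximal demonstrative determiner.
this-det := demonstrative-determiner-lex &
  [ STEM < "this" >,
    SYNSEM.LKEYS.ALTKEYREL.PRED proximal+dem_a_rel ].
```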
Background
The goal of this lab is to allow for unexpressed arguments.
As many of you have noticed, there are plenty of languages that
don't use pronouns as much as English does, but rather leave
the NP out entirely if it was just going to be a pronoun. Generally,
the meaning is about as recoverable from context as it is with pronouns
(after all, pronouns only give small clues to the referent in terms
of person, number, and gender; among 3rd person referents, that
usually leaves a lot of ambiguity). In some languages (e.g., Spanish),
this kind of pronoun omission seems to be 'licensed' by the
fact that the verbal inflections carry as much information as the
pronouns would. In other languages (e.g., Japanese), the verbal
inflections don't in fact carry person/number/gender information,
but pronouns still aren't required.
Even in English (which likes pronouns so much that it has two
expletive [meaningless] ones -- it and there) there are
cases where arguments appear to be optional. Prime examples are verbs
like eat and drink. The sentence I already ate
means 'I already ate something', but the addressee is in no way
expected to know what exactly was eaten. This is called indefinite
null instantiation (see e.g., Johnson
and Fillmore 2000). This contrasts with definite null
instantiation (ibid), in which null arguments have definite
reference, that is, the utterance is only felicitous if the addressee
can determine the referent. English verbs which do this include
tell as in I already told you. (Which is a cute
example, because it's most likely to be used in a case where
the addressee can't determine what exactly s/he was already told,
but it's licensed because it means something like 'I already told
you the answer to that question'.)
Our general strategy is going to be similar to the way
we handled missing determiners. That is, we're going to write
unary phrase structure rules in which the mother and single
daughter have different valence requirements.
I believe that most languages should fall into one of
the following patterns (restricting our attention to
verbs and their arguments):
- Any argument can be left out with the definite interpretation.
Non-subject arguments of certain verbs can also receive the indefinite
interpretation when they are missing.
- Any subject can be left out with the definite interpretation.
Non-subject arguments of certain verbs can also be omitted. Their
interpretation (definite or indefinite) is lexically determined
by the verb.
- Subjects are required, but non-subject arguments of certain
verbs can be left out, with definite or indefinite reference depending
on the verb.
- In languages with optional agreement marking on the verb,
it may be the case that you see the 'any old subject' or 'any old
direct object' being left out pattern only when agreement marking
is present. Without agreement marking, you may only find lexically
licensed definite/indefinite null instantiation.
I've written this assignment based on those four possibilities,
and it should be straightforward to the extent that I'm right :-).
If your language instantiates a different pattern, talk to me.
Create instances of rules
- The Matrix provides definitions of
decl-head-opt-subj-phrase and basic-head-opt-comp-phrase which should
be specific enough.
- If your language allows subject pro-drop, create an
instance of decl-head-opt-subj-phrase in rules.tdl.
- Parse a sentence without a subject to see if it works.
- If your language allows object pro-drop (in general, or
only certain arguments of certain verbs), create an instance
of basic-head-opt-comp-phrase.
- Parse a sentence with a missing object to see if it works.
- (At the moment, this might overgenerate, as it should allow
any complement to go missing, and not all languages allow that.
We'll fix it presently.)
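For reference, the rule instances in rules.tdl can be as simple as the following. The instance names on the left are arbitrary placeholders; include only the rules your language needs.

```tdl
; Instance names (left-hand side) are arbitrary; the types come from matrix.tdl.
head-opt-subj := decl-head-opt-subj-phrase.
head-opt-comp := basic-head-opt-comp-phrase.
```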
Add ditransitive verbs
If you don't already have any verbs that take three arguments,
try putting one in:
- Add a new subtype of verb-lex which inherits
from the type ditransitive-lex-item (defined in
matrix.tdl). This new subtype should have two
elements on its COMPS list, and three things on ARG-ST.
- Create at least one instance of the new subtype
in lexicon.tdl. Places to look for ditransitive
verbs include the translations of give, sell,
and tell.
- Some languages don't allow two NP complements, and so
you might see PP complements for the first time. If this
comes up, talk to me.
- You might find that you need to add a new value of CASE
and a corresponding lexical rule.
- Add examples (positive and negative) to your main test suite
to test ditransitive verbs. (It's a good idea to create a
new [incr tsdb()] skeleton that includes the previous
testsuite plus the new items, while keeping your original
skeleton around.)
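A possible starting point for the ditransitive type, assuming that both complements are NPs and that your verb-lex already handles the subject (in particular, that it links the first ARG-ST element to SUBJ). The entry name give, its stem, and its PRED string are placeholders.

```tdl
; Sketch only: assumes NP complements and that verb-lex takes care of SUBJ.
ditrans-verb-lex := verb-lex & ditransitive-lex-item &
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < #comp1, #comp2 >,
    ARG-ST < [ LOCAL.CAT.HEAD noun ],
             #comp1 & [ LOCAL.CAT.HEAD noun ],
             #comp2 & [ LOCAL.CAT.HEAD noun ] > ].

; Hypothetical lexical entry in lexicon.tdl.
give := ditrans-verb-lex &
  [ STEM < "give" >,
    SYNSEM.LKEYS.KEYREL.PRED "_give_v_rel" ].
```

If your language needs case constraints on the two objects, they would go on #comp1 and #comp2.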
Add verbal subtypes for argument optionality
For expository purposes, I'm assuming that you have
subtypes of verb-lex called trans-verb-lex
and ditrans-verb-lex. If you've called them
something else, not to worry, just use your corresponding
types whenever I mention these.
For languages without general pro-drop of objects
- If you have an example of a transitive verb with
an optional argument, create two subtypes of trans-verb-lex.
- One (for verbs whose arguments are not optional), should
say [OPT -] on the complement.
- The other can
leave the value of OPT unspecified, but should possibly specify
a value for OPT-CS, according to how the argument
is interpreted when it is missing.
- (You might find that you need multiple
subtypes here, if you have examples of different verbs with
different behavior.)
- Edit lexicon.tdl so that the lexical entries
which used to inherit from trans-verb-lex now inherit
from your new subtypes.
- If you have an example of a ditransitive verb with an
optional argument, make analogous subtypes of ditrans-verb-lex
and change your lexicon accordingly. Note that when dealing
with ditransitive verbs, you have to pay attention to both
elements of the COMPS list.
- Otherwise, constrain ditrans-verb-lex itself
to ensure that both arguments are [OPT -].
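The subtypes described in this list might be sketched as follows. All of the new type names are hypothetical, and the choice of type-id for an indefinite interpretation is an assumption based on the cognitive status values used for indefinites elsewhere in this lab.

```tdl
; Verbs whose object must be overt.
oblig-obj-trans-verb-lex := trans-verb-lex &
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < [ OPT - ] > ].

; Verbs whose object may be dropped with an indefinite reading
; (assumption: type-id is the appropriate OPT-CS value for that reading).
indef-opt-obj-trans-verb-lex := trans-verb-lex &
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < [ OPT-CS type-id ] > ].

; Verbs whose object may be dropped with a definite reading.
def-opt-obj-trans-verb-lex := trans-verb-lex &
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < [ OPT-CS activ-or-more ] > ].
```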
For languages with general pro-drop of objects
- If some, but not all, of your transitive verbs allow
indefinite as well as definite interpretations of missing
objects, (the translation of eat would be a likely
suspect), create two subtypes of trans-verb-lex.
- One specifies [OPT-CS activ-or-more] (or similar) on the complement.
- The other should leave OPT-CS unspecified
on the complement (thus allowing either interpretation).
- Edit lexicon.tdl so that the lexical entries
which used to inherit from trans-verb-lex now inherit
from your new subtypes.
- If you have an example of a ditransitive verb which allows
indefinite null instantiation for one or more of its arguments, make
analogous subtypes of ditrans-verb-lex and change your
lexicon accordingly. Note that when dealing with ditransitive verbs,
you have to pay attention to both elements of the COMPS list.
- Otherwise, constrain ditrans-verb-lex itself
to ensure that both arguments are [OPT-CS activ-or-more].
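For a general pro-drop language, a sketch along the same lines. The subtype names are hypothetical, and the last statement uses a type addendum (:+) on the assumption that ditrans-verb-lex is already defined earlier in your grammar.

```tdl
; Dropped object is always interpreted as definite.
def-only-trans-verb-lex := trans-verb-lex &
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < [ OPT-CS activ-or-more ] > ].

; Dropped object may be definite or indefinite: OPT-CS left unspecified.
def-or-indef-trans-verb-lex := trans-verb-lex.

; No ditransitive allows an indefinite reading: constrain the type itself.
ditrans-verb-lex :+
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < [ OPT-CS activ-or-more ],
                                 [ OPT-CS activ-or-more ] > ].
```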
For languages where argument optionality corresponds
to the presence of optional verbal inflection
- Modify your verb inflection lexical rules (if you have
them) to change the OPT status of the relevant arguments.
- Consider what the OPT-CS value of those arguments should
be, according to whether the verbal inflection is present or
absent.
- Make appropriate changes to OPT-CS in the lexical entries
and lexical rules.
In this part, you will add basic functionality for intersective
adjectives and adverbs.
Head-modifier rules
The Matrix distinguishes scopal from intersective modification.
We're going to pretend that everything is intersective and
just not worry about the scopal guys for now.
- Create an instance of head-adj-int-phrase, an
instance of adj-head-int-phrase, or both, depending on
whether you need only prehead modifiers, only posthead modifiers,
or both. (You may already have some of these, depending on
what you said about negation on the configuration page.)
- Try parsing a transitive sentence. Be surprised by the extra
parse. Or, if you don't get an extra parse, try parsing some of your
ungrammatical examples from earlier labs. Look at the enlarged trees
(or try Parse > Compare) to see what's going on.
- Constrain your existing subtypes of head (e.g., +nvd:
verb, noun, det) to be [MOD < >].
- Try parsing the misbehaving sentence again.
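One way to add the [MOD < >] constraint is with type addenda in your language-specific file, assuming the head types in use are verb, noun, and det. Add a line for any other head type you use that should not act as a modifier.

```tdl
; Type addenda: these head types never modify anything.
verb :+ [ MOD < > ].
noun :+ [ MOD < > ].
det :+ [ MOD < > ].
```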
Adjectives
- Create a type adjective-lex which inherits from
basic-adjective-lex. The following type works for English
assuming that:
- We're not worried about predicative adjectives or adjectives
taking complements for now.
- English has both pre-head and post-head modifiers, (head-adj and adj-head), but simple adjectives
are (almost) always prehead (hence the value of POSTHEAD).
- We're only dealing with intersective adjectives (as stipulated).
adjective-lex := basic-adjective-lex & intersective-mod-lex &
  [ SYNSEM [ LOCAL [ CAT [ HEAD.MOD < [ LOCAL.CAT [ HEAD noun,
                                                    VAL.SPR ne-list ]]>,
                           VAL [ SPR < >,
                                 SUBJ < >,
                                 COMPS < >,
                                 SPEC < > ],
                           POSTHEAD - ]]]].
- If adjectives in your language agree with nouns in person, number,
gender, and/or case, add appropriate constraints to the MOD value
of the adjectives. Consider writing lexical rules to create the
inflected forms, or at least types representing the different possibilities.
- Create one or more adjective instances.
- (Adjectives separated from the NPs they belong to will have
to await further developments. If your language allows this possibility,
please document it in your write ups and in your test suites.)
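An adjective instance in lexicon.tdl might look like the following. The entry name, stem, and PRED string are placeholders, and an agreeing adjective would add its agreement constraints under HEAD.MOD.

```tdl
; Hypothetical lexical entry for an intersective adjective.
big := adjective-lex &
  [ STEM < "big" >,
    SYNSEM.LKEYS.KEYREL.PRED "_big_a_rel" ].
```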
Adverbs
- Create one or more types for adverbs. The following type
definition inherits from appropriately-defined Matrix supertypes,
and constrains the modified constituent to be verbal.
adverb-lex := basic-adverb-lex & intersective-mod-lex &
  [ SYNSEM [ LOCAL [ CAT [ HEAD.MOD < [ LOCAL.CAT.HEAD verb ]>,
                           VAL [ SPR < >,
                                 SUBJ < >,
                                 COMPS < >,
                                 SPEC < > ]]]]].
- Try parsing a sentence with an adverb and then generating
to see where else the adverb can show up. If your language allows
multiple attachment sites for the adverb, admire the results.
If it doesn't, or doesn't allow *that* many, constrain them further.
- In order to constrain the possible attachment sites
for adverbs, you may need to constrain the value of POSTHEAD,
or the value of SPR inside MOD or the value of LIGHT inside MOD.
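As an illustration of constraining attachment, here is a hypothetical subtype for adverbs that may only follow the constituent they modify, plus an example entry. The names posthead-adverb-lex and quickly, and the PRED string, are assumptions; the right constraint for your language may instead involve SPR or LIGHT inside MOD.

```tdl
; Hypothetical subtype: adverbs that only appear after the head they modify.
posthead-adverb-lex := adverb-lex &
  [ SYNSEM.LOCAL.CAT.POSTHEAD + ].

; Hypothetical lexical entry.
quickly := posthead-adverb-lex &
  [ STEM < "quickly" >,
    SYNSEM.LKEYS.KEYREL.PRED "_quickly_a_rel" ].
```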
Write up (Updated 2/8/07)
- Describe the elements of your grammar which contribute information
about discourse status and how you implemented this.
- Describe how your language expresses demonstratives, and how
you implemented this.
- Describe the situation with respect to
argument optionality in your language (as best you were able to
determine it), including glossed examples.
- Describe the lexical types that you made to
account for the argument optionality pattern you found and/or any
changes to lexical entries.
- Describe how you tested your argument
optionality analysis, and any ways in which the grammar is not yet
having the correct behavior. Speculate on what might need to be done
to fix it.
- Describe the modification facts (for adjectives and adverbs)
that you found in your language.
- Describe the constraints you needed to place on lexical types
to get the right behavior.
- Describe how you tested your modification analysis, and any ways
in which the grammar is not yet having the correct behavior.
Speculate on what might need to be done to fix it.
Submit via ESubmit
- Be sure your matrix folder includes your write-up and baseline/final [incr tsdb()] file.
- Consider removing the doc/ subdirectory in order to save
space on E-Submit.
- Compress the folder, and upload it to ESubmit.
- Submit it by midnight Sunday night (preferably by Friday evening :-).