Linguistics 567: Knowledge Engineering for NLP

Lab 6 Due 2/11

Navigation

Preliminaries
Part 0: Patch
Part 1: Discourse status
Part 1': Demonstratives
Part 2: Optional arguments
Part 3: Modification
Write up

Preliminaries

Some of you may have already covered some of this material. If you would like to work on something else this week, let me know what it is.

Run a baseline test suite instance, and save it to submit with your lab. As usual, consider adding to your basic test suite if you have not sufficiently covered the phenomena addressed here.

Part 0: Fix the HCONS on nouns

There is a bug in the current matrix customization script. To fix it, add the type no-hcons-lex-item as a supertype of noun-lex in your language-specific TDL file (e.g., klingon.tdl).

Without this fix, the HCONS list in all sentences with nouns is empty. Once it's fixed, HCONS introduced by determiners or the bare-np rule will now show up.
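
As a rough sketch of the patch: the only change that matters is the extra supertype. The other supertypes and the constraint body shown here are assumptions about what the customization script produced for your grammar, so keep your own definition and just add no-hcons-lex-item to it.

```tdl
; Sketch only: everything except no-hcons-lex-item is an assumption about
; your existing noun-lex definition. Keep your own supertypes and
; constraints, and add no-hcons-lex-item to the list of supertypes.
noun-lex := basic-noun-lex & basic-one-arg & no-hcons-lex-item &
  [ SYNSEM.LOCAL.CAT.VAL [ SPR < #spr & [ LOCAL.CAT.HEAD det ] >,
                           COMPS < >,
                           SUBJ < >,
                           SPEC < > ],
    ARG-ST < #spr > ].
```

After reloading, parse a sentence containing a determiner or a bare NP and check that the HCONS list in the MRS is no longer empty.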

Part 1: Discourse status

First, download an updated copy of matrix.tdl and use it to replace your old matrix.tdl. This version defines the cognitive status hierarchy (cog-st and its subtypes) as well as the features COG-ST and SPECI.

If you have any overt personal pronouns, they should be constrained to be [COG-ST activ-or-more & [SPECI + ]]. (This is a first pass guess at what the function of such things is crosslinguistically. If you have information suggesting this is not appropriate for your language, please let me know.)
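
A minimal sketch of such a pronoun type might look like the following. The type name pronoun-lex is hypothetical, and the sketch assumes that COG-ST and SPECI are both features of the index and that your pronouns inherit from noun-lex. Adjust the path if your copy of matrix.tdl places them elsewhere.

```tdl
; Hypothetical type name. Assumes COG-ST and SPECI are both features of
; the index (HOOK.INDEX) and that pronouns are a subtype of noun-lex.
pronoun-lex := noun-lex &
  [ SYNSEM.LOCAL.CONT.HOOK.INDEX [ COG-ST activ-or-more,
                                   SPECI + ] ].
```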

If you have any determiners, consider whether they constrain the COG-ST value of the N' they attach to. An indefinite determiner in English, for example, would probably contribute [COG-ST type-id]. Demonstrative determiners should probably be somewhere in the activated-familiar range. For the moment, we won't encode the information about whether the object being pointed to is closer to the speaker or the hearer (or away from both).
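
For illustration, determiner subtypes that constrain the cognitive status of the N' could be sketched as below. The type names are hypothetical, and determiner-lex stands in for whatever determiner type your grammar already has. The path into SPEC follows the one used for determiners later in this lab.

```tdl
; Hypothetical type names; determiner-lex is assumed to be your existing
; determiner type. The COG-ST values come from the summary table below.
indef-determiner-lex := determiner-lex &
  [ SYNSEM.LOCAL.CAT.VAL.SPEC.FIRST.LOCAL.CONT.HOOK.INDEX.COG-ST type-id ].

def-determiner-lex := determiner-lex &
  [ SYNSEM.LOCAL.CAT.VAL.SPEC.FIRST.LOCAL.CONT.HOOK.INDEX.COG-ST uniq+fam+act ].
```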

If you have any nominal inflections associated with discourse status, implement lexical rules which add them and constrain the COG-ST value appropriately.
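
A lexical rule type for, say, a definiteness affix might be sketched as follows. This is only a starting point: the rule name is hypothetical, and infl-ltow-rule is assumed to be the right Matrix supertype for your grammar. If your existing noun rules inherit from a different lexical rule type, use that instead.

```tdl
; Hypothetical rule type for a definite affix on nouns. The supertype
; infl-ltow-rule is an assumption; substitute the lexical rule supertype
; your other noun inflection rules use.
definite-noun-lex-rule := infl-ltow-rule &
  [ SYNSEM.LOCAL.CONT.HOOK.INDEX.COG-ST uniq+fam+act,
    DTR noun-lex ].
```

The rule instance itself, with the affix spelling, would go in your inflectional rules file as for your other lexical rules.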

(Updated 2/8/07) To summarize, here is our first-pass guess at the cognitive status associated with various types of words/markers, with the caveat that this is just a starting point for any given language, and that language-internal evidence might point to a different classification and/or homophony among words/markers of the same form.

Marker	COG-ST value
Personal pronoun	activ-or-more
Demonstrative article/adjective	activ+fam
Definite article/inflection	uniq+fam+act
Indefinite article/inflection	type-id

Note that in some cases an unmarked form is underspecified, where in others it stands in contrast to a marked form.

Part 1': Demonstratives (Added 2/8/07)

By request, here are some instructions for creating demonstratives: first demonstrative adjectives, then demonstrative pronouns, then demonstrative determiners.

All three types of demonstratives will share a set of relations which express the proximity to hearer and speaker. We will arrange these relations into a hierarchy so that languages with just a one- or two-way distinction can be more easily mapped to languages with a two- or three-way distinction. In order to do this, we're using types for these PRED values rather than strings. Note the absence of quotation marks. We will treat the demonstrative relations as adjectival relations, no matter how they are introduced (via pronouns, determiners, or quantifiers).

demonstrative_a_rel := predsort.
proximal+dem_a_rel := demonstrative_a_rel. ; close to speaker
distal+dem_a_rel := demonstrative_a_rel.   ; away from speaker
remote+dem_a_rel := distal+dem_a_rel.      ; away from speaker and hearer
hearer+dem_a_rel := distal+dem_a_rel.      ; near hearer

Demonstrative adjectives

Demonstrative adjectives come out as the easy case in this system. They are just like regular adjectives, except that in addition to introducing a relation whose PRED value is one of the subtypes of demonstrative_a_rel defined above, they also constrain the INDEX.COG-ST of their MOD value to be activ+fam.
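
One way this could look, assuming you already have an adjective-lex type along the lines of the one given in Part 3 (the names demonstrative-adj-lex and this-adj are hypothetical, and the stem is an English placeholder):

```tdl
; Hypothetical sketch: assumes adjective-lex is defined as in Part 3.
demonstrative-adj-lex := adjective-lex &
  [ SYNSEM [ LOCAL.CAT.HEAD.MOD < [ LOCAL.CONT.HOOK.INDEX.COG-ST activ+fam ] >,
             LKEYS.KEYREL.PRED demonstrative_a_rel ] ].

; Example lexical entry; note the PRED value is a type, not a string.
this-adj := demonstrative-adj-lex &
  [ STEM < "this" >,
    SYNSEM.LKEYS.KEYREL.PRED proximal+dem_a_rel ].
```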

Demonstrative pronouns

On this analysis, demonstrative pronouns differ from other pronouns in introducing two relations: the pronoun_n_rel that all other pronouns introduce and one of the demonstrative_a_rel subtypes defined above. Because they introduce two relations, they can't inherit from noun-lex as it is currently defined in your grammars (nor even from basic-noun-lex as defined in the Matrix), since both of those ultimately inherit the constraint that only one relation is contributed. If we stick with this analysis of demonstratives in the long run, we will probably reformulate things on the Matrix side to make this a bit cleaner. For now, in order to capture the similarities that do exist among nominal lexical items, I recommend doing the following.

Define a type noun-lex-supertype as follows, and add to it any constraints common to all nouns including demonstrative pronouns in your language.

noun-lex-supertype := basic-one-arg & norm-hook-lex-item &
  [ SYNSEM [ LOCAL.CAT [ HEAD noun,
                         VAL [ SPR < #spr &
                                   [ LOCAL.CAT.HEAD det ] >,
                               COMPS < >,
                               SUBJ < >,
                               SPEC < > ]],
             LKEYS.KEYREL noun-relation ],
    ARG-ST < #spr > ] .

This type has all of the information in noun-lex as defined by the customization script and basic-noun-lex defined in matrix.tdl, with the exception of the constraint that it have exactly one thing on the RELS list.

Define a subtype noun-lex of noun-lex-supertype which adds in the single rel constraint. This type should fit into your hierarchy the same way your old noun-lex did (i.e., have the same subtypes).
```
noun-lex := noun-lex-supertype & single-rel-lex-item.
```
Define another subtype of noun-lex-supertype for the demonstrative pronouns.
- Like other pronouns, these should say that their specifiers are [OPT +] (i.e., they obligatorily undergo the bare-np rule).
- They should have two things on their CONT.RELS list. The first is identified with the LKEYS.KEYREL and is a noun-relation whose PRED value is the string "_pronoun_n_rel". The second thing on the RELS list is identified with LKEYS.ALTKEYREL, and is an event-relation.
- The LBL values of the two relations are identified with each other.
- Finally, the COG-ST value on HOOK.INDEX should be constrained to be activ+fam. Verify that this ends up on the ARG0 of the pronoun relation by parsing a sentence and examining its MRS.
Define lexical entries for your demonstrative pronouns which constrain their LKEYS.ALTKEYREL.PRED (rather than LKEYS.KEYREL.PRED) to be the appropriate subtype of demonstrative_a_rel taken from the list above.
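
Putting the constraints listed above together, a first attempt at the type and a lexical entry might look like this. The names demonstrative-pronoun-lex and that-pron are hypothetical, and the stem is an English placeholder. The sketch encodes only the constraints described above; check the resulting MRS before relying on it.

```tdl
; Hypothetical sketch of the demonstrative pronoun type.
demonstrative-pronoun-lex := noun-lex-supertype &
  [ SYNSEM [ LOCAL [ CAT.VAL.SPR < [ OPT + ] >,
                     CONT [ HOOK.INDEX.COG-ST activ+fam,
                            RELS <! #key & noun-relation &
                                    [ LBL #lbl,
                                      PRED "_pronoun_n_rel" ],
                                    #altkey & event-relation &
                                    [ LBL #lbl ] !> ] ],
             LKEYS [ KEYREL #key,
                     ALTKEYREL #altkey ] ] ].

; Example entry: the demonstrative PRED goes on ALTKEYREL, not KEYREL.
that-pron := demonstrative-pronoun-lex &
  [ STEM < "that" >,
    SYNSEM.LKEYS.ALTKEYREL.PRED distal+dem_a_rel ].
```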

Demonstrative determiners

As with the demonstrative pronouns, the demonstrative determiners introduce two relations. This time, they introduce the quantifier relation (let's say "exist_q_rel") and the demonstrative relation. Once again, this analysis is going to entail changes to the Matrix core, as basic-determiner-lex assumes that just one relation is contributed. Accordingly, we are going to bypass the current version of basic-determiner-lex and instead define determiner-lex-supertype as follows:

determiner-lex-supertype := norm-hook-lex-item & basic-zero-arg &
  [ SYNSEM [ LOCAL [ CAT [ HEAD det,
                           VAL [ SPEC.FIRST.LOCAL.CONT.HOOK [ INDEX #ind,
                                                              LTOP #larg ],
                                 SPR < >,
                                 SUBJ < >,
                                 COMPS < > ] ],
                     CONT.HCONS <! qeq &
                                   [ HARG #harg,
                                     LARG #larg ] !> ],
             LKEYS.KEYREL quant-relation &
                          [ ARG0 #ind,
                            RSTR #harg ] ] ].

This type should have two subtypes (assuming you have demonstrative determiners as well as others in your language --- otherwise, just incorporate the constraints for demonstrative determiners into the type above).

The subtype for ordinary (non-demonstrative) determiners should add the constraint that the RELS list has exactly one thing on it:
```
[ RELS <! relation !> ].
```
The subtype for demonstrative determiners should specify a RELS list with two things on it. The first should have "exist_q_rel" as its PRED value. (It's already constrained to be a quant-relation because the type norm-hook-lex-item, inherited by determiner-lex-supertype, identifies the first element of the RELS list with the LKEYS.KEYREL.) The second should be identified with LKEYS.ALTKEYREL and should be an adjective-relation. The HOOK.INDEX.COG-ST inside the SPEC value should be constrained to activ+fam. Finally, the LBL of the adjective-relation should be identified with the SPEC.FIRST.LOCAL.CONT.HOOK.LTOP of the determiner. (This will result in the demonstrative adjective relation sharing its handle with the N' the determiner attaches to.)

Make sure your ordinary determiners in the lexicon inherit from the first subtype, and that your demonstrative determiners inherit from the second subtype. Demonstrative determiner lexical entries should constrain their LKEYS.ALTKEYREL.PRED to be an appropriate subtype of demonstrative_a_rel.
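
The two subtypes and a sample entry could be sketched as follows. The type and entry names are hypothetical and the stem is an English placeholder; the constraints are just the ones described above.

```tdl
; Hypothetical names. Ordinary determiners contribute exactly one relation.
ordinary-determiner-lex := determiner-lex-supertype &
  [ SYNSEM.LOCAL.CONT.RELS <! relation !> ].

; Demonstrative determiners contribute the quantifier plus a demonstrative
; relation that shares its label with the LTOP of the N'.
demonstrative-determiner-lex := determiner-lex-supertype &
  [ SYNSEM [ LOCAL [ CAT.VAL.SPEC.FIRST.LOCAL.CONT.HOOK [ INDEX.COG-ST activ+fam,
                                                         LTOP #ltop ],
                     CONT.RELS <! [ PRED "exist_q_rel" ],
                                  #altkey & adjective-relation &
                                  [ LBL #ltop ] !> ],
             LKEYS.ALTKEYREL #altkey ] ].

this-det := demonstrative-determiner-lex &
  [ STEM < "this" >,
    SYNSEM.LKEYS.ALTKEYREL.PRED proximal+dem_a_rel ].
```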

Part 2: Optional arguments

Background

The goal of this part of the lab is to allow for unexpressed arguments. As many of you have noticed, there are plenty of languages that don't use pronouns as much as English does, but rather leave the NP out entirely if it was just going to be a pronoun. Generally, the meaning is about as recoverable from context as it is with pronouns (after all, pronouns only give small clues to the referent in terms of person, number, and gender; among 3rd person referents, that usually leaves a lot of ambiguity). In some languages (e.g., Spanish), this kind of pronoun omission seems to be 'licensed' by the fact that the verbal inflections carry as much information as the pronouns would. In other languages (e.g., Japanese), the verbal inflections don't in fact carry person/number/gender information, but pronouns still aren't required.

Even in English (which likes pronouns so much that it has two expletive [meaningless] ones, it and there), there are cases where arguments appear to be optional. Prime examples are verbs like eat and drink. The sentence I already ate means 'I already ate something', but the addressee is in no way expected to know what exactly was eaten. This is called indefinite null instantiation (see e.g., Johnson and Fillmore 2000). It contrasts with definite null instantiation (ibid), in which null arguments have definite reference; that is, the utterance is only felicitous if the addressee can determine the referent. English verbs which do this include tell, as in I already told you. (Which is a cute example, because it's most likely to be used in a case where the addressee can't determine what exactly s/he was already told, but it's licensed because it means something like 'I already told you the answer to that question'.)

Our general strategy is going to be similar to the way we handled missing determiners. That is, we're going to write unary phrase structure rules in which the mother and single daughter have different valence requirements.

I believe that most languages should fall into one of the following patterns (restricting our attention to verbs and their arguments):

1. Any argument can be left out with the definite interpretation. Non-subject arguments of certain verbs can also receive the indefinite interpretation when they are missing.
2. Any subject can be left out with the definite interpretation. Non-subject arguments of certain verbs can also be omitted. Their interpretation (definite or indefinite) is lexically determined by the verb.
3. Subjects are required, but non-subject arguments of certain verbs can be left out, with definite or indefinite reference depending on the verb.
4. In languages with optional agreement marking on the verb, it may be the case that you see the 'any old subject' or 'any old direct object' omission pattern only when agreement marking is present. Without agreement marking, you may only find lexically licensed definite/indefinite null instantiation.

I've written this assignment based on those four possibilities, and it should be straight-forward to the extent that I'm right :-). If your language instantiates a different pattern, talk to me.

Create instances of rules

The Matrix provides definitions of decl-head-opt-subj-phrase and basic-head-opt-comp-phrase which should be specific enough.
If your language allows subject pro-drop, create an instance of decl-head-opt-subj-phrase in rules.tdl.
Parse a sentence without a subject to see if it works.
If your language allows object pro-drop (in general, or only certain arguments of certain verbs), create an instance of basic-head-opt-comp-phrase.
Parse a sentence with a missing object to see if it works.
(At the moment, this might overgenerate, as it should allow any complement to go missing, and not all languages allow that. We'll fix it presently.)
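
In rules.tdl, the two instances could be as simple as the following. The instance names on the left are hypothetical; only the Matrix type names on the right come from the instructions above.

```tdl
; Hypothetical instance names; add only the rule(s) your language needs.
head-opt-subj := decl-head-opt-subj-phrase.
head-opt-comp := basic-head-opt-comp-phrase.
```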

Add ditransitive verbs

If you don't already have any verbs that take three arguments, try putting one in:

Add a new subtype of verb-lex which inherits from the type ditransitive-lex-item (defined in matrix.tdl). This new subtype should have two elements on its COMPS list, and three things on ARG-ST.
Create at least one instance of the new subtype in lexicon.tdl. Places to look for ditransitive verbs include the translations of give, sell, and tell.
Some languages don't allow two NP complements, and so you might see PP complements for the first time. If this comes up, talk to me.
You might find that you need to add a new value of CASE and a corresponding lexical rule.
Add examples (positive and negative) to your main test suite to test ditransitive verbs. (It's a good idea to create a new [incr tsdb()] skeleton that includes the previous testsuite plus the new items, while keeping your original skeleton around.)
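
As a starting point, a ditransitive type and entry might be sketched like this. The sketch assumes that your verb-lex already links the subject to the first ARG-ST element and that both complements are NPs; the entry name, stem, and PRED string are English placeholders. You will likely need to add case constraints of your own.

```tdl
; Hypothetical sketch: assumes verb-lex identifies the subject with the
; first element of ARG-ST, and that both complements are nominal.
ditrans-verb-lex := verb-lex & ditransitive-lex-item &
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < #obj1, #obj2 >,
    ARG-ST < [ LOCAL.CAT.HEAD noun ],
             #obj1 & [ LOCAL.CAT.HEAD noun ],
             #obj2 & [ LOCAL.CAT.HEAD noun ] > ].

give := ditrans-verb-lex &
  [ STEM < "give" >,
    SYNSEM.LKEYS.KEYREL.PRED "_give_v_rel" ].
```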

Add verbal subtypes for argument optionality

For expository purposes, I'm assuming that you have subtypes of verb-lex called trans-verb-lex and ditrans-verb-lex. If you've called them something else, not to worry, just use your corresponding types whenever I mention these.

For languages without general pro-drop of objects

If you have an example of a transitive verb with an optional argument, create two subtypes of trans-verb-lex.
- One (for verbs whose arguments are not optional), should say [OPT -] on the complement.
- The other can leave the value of OPT unspecified, but should possibly specify a value for OPT-CS, according to how the argument gets interpreted when it is missing.
- (You might find that you need multiple subtypes here, if you have examples of different verbs with different behavior.)
Edit lexicon.tdl so that the lexical entries which used to inherit from trans-verb-lex now inherit from your new subtypes.
If you have an example of a ditransitive verb with an optional argument, make analogous subtypes of ditrans-verb-lex and change your lexicon accordingly. Note that when dealing with ditransitive verbs, you have to pay attention to both elements of the COMPS list.
Otherwise, constrain ditrans-verb-lex itself to ensure that both arguments are [OPT -].
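
For a language without general object pro-drop, the subtypes described above might be sketched as follows. All of the new type names are hypothetical. Using type-id as the OPT-CS value for the indefinite interpretation is an assumption based on the cognitive status table in Part 1.

```tdl
; Hypothetical type names.
; Object is obligatory.
trans-verb-oblig-obj-lex := trans-verb-lex &
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < [ OPT - ] > ].

; Object may be dropped, with an indefinite interpretation (cf. 'eat').
; The value type-id is an assumption based on the Part 1 table.
trans-verb-indef-obj-lex := trans-verb-lex &
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < [ OPT-CS type-id ] > ].

; Object may be dropped, with a definite interpretation (cf. 'tell').
trans-verb-def-obj-lex := trans-verb-lex &
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < [ OPT-CS activ-or-more ] > ].

; If no ditransitive verb allows a missing argument:
ditrans-verb-lex :+
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < [ OPT - ], [ OPT - ] > ].
```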

For languages with general pro-drop of objects

If some, but not all, of your transitive verbs allow indefinite as well as definite interpretations of missing objects, (the translation of eat would be a likely suspect), create two subtypes of trans-verb-lex.
- One specifies [OPT-CS activ-or-more] (or similar) on the complement.
- The other should leave OPT-CS unspecified on the complement (thus allowing either interpretation).
Edit lexicon.tdl so that the lexical entries which used to inherit from trans-verb-lex now inherit from your new subtypes.
If you have an example of a ditransitive verb which allows indefinite null instantiation for one or more of its arguments, make analogous subtypes of ditrans-verb-lex and change your lexicon accordingly. Note that when dealing with ditransitive verbs, you have to pay attention to both elements of the COMPS list.
Otherwise, constrain ditrans-verb-lex itself to ensure that both arguments are [OPT-CS activ-or-more].
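
For a language with general object pro-drop, a comparable sketch (again with hypothetical type names) would leave OPT alone and only restrict OPT-CS:

```tdl
; Hypothetical type names.
; Dropped object can only be interpreted as definite.
trans-verb-def-only-lex := trans-verb-lex &
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < [ OPT-CS activ-or-more ] > ].

; No OPT-CS constraint: a dropped object may be definite or indefinite.
trans-verb-def-or-indef-lex := trans-verb-lex.

; If no ditransitive verb allows indefinite null instantiation:
ditrans-verb-lex :+
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < [ OPT-CS activ-or-more ],
                                 [ OPT-CS activ-or-more ] > ].
```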

For languages where argument optionality corresponds to the presence of optional verbal inflection

Modify your verb inflection lexical rules (if you have them) to change the OPT status of the relevant arguments.
Consider what the OPT-CS value of those arguments should be, according to whether the verbal inflection is present or absent.
Make appropriate changes to OPT-CS in the lexical entries and lexical rules.

Part 3: Modification

In this part, you will add basic functionality for intersective adjectives and adverbs.

Head-modifier rules

The Matrix distinguishes scopal from intersective modification. We're going to pretend that everything is intersective and just not worry about the scopal guys for now.

Create an instance of head-adj-int-phrase, an instance of adj-head-int-phrase, or both, depending on whether you need only prehead modifiers, only posthead modifiers, or both. (You may already have some of these, depending on what you said about negation on the configuration page.)
Try parsing a transitive sentence. Be surprised by the extra parse. Or, if you don't get an extra parse, try parsing some of your ungrammatical examples from earlier labs. Look at the enlarged trees (or try Parse > Compare) to see what's going on.
Constrain your existing subtypes of head (e.g., +nvd: verb, noun, det) to be [MOD < >].
Try parsing the misbehaving sentence again.
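
The steps above might come out roughly as follows. The rule instance names are hypothetical. The MOD constraints are written here as type addenda on noun, verb, and det individually, which assumes those are the head types your grammar uses.

```tdl
; In rules.tdl (hypothetical instance names; add whichever you need):
adj-head-int := adj-head-int-phrase.
head-adj-int := head-adj-int-phrase.

; In your language-specific TDL file: non-modifier heads get an empty MOD.
noun :+ [ MOD < > ].
verb :+ [ MOD < > ].
det :+ [ MOD < > ].
```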

Adjectives

Create a type adjective-lex which inherits from basic-adjective-lex. The following type works for English assuming that:
1. We're not worried about predicative adjectives or adjectives taking complements for now.
2. English has both pre-head and post-head modifiers, (head-adj and adj-head), but simple adjectives are (almost) always prehead (hence the value of POSTHEAD).
3. We're only dealing with intersective adjectives (as stipulated).
```
adjective-lex := basic-adjective-lex & intersective-mod-lex &
  [ SYNSEM [ LOCAL [ CAT [ HEAD.MOD < [ LOCAL.CAT [ HEAD noun,
                                                    VAL.SPR ne-list ] ] >,
                           VAL [ SPR < >,
                                 SUBJ < >,
                                 COMPS < >,
                                 SPEC < > ],
                           POSTHEAD - ] ] ] ].
```
If adjectives in your language agree with nouns in person, number, gender, and/or case, add appropriate constraints to the MOD value of the adjectives. Consider writing lexical rules to create the inflected forms, or at least types representing the different possibilities.
Create one or more adjective instances.
(Adjectives separated from the NPs they belong to will have to await further developments. If your language allows this possibility, please document it in your write ups and in your test suites.)
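
An adjective instance in lexicon.tdl could then look something like this. The entry name, stem, and PRED string are English placeholders.

```tdl
; Hypothetical example entry; substitute a word from your language.
big := adjective-lex &
  [ STEM < "big" >,
    SYNSEM.LKEYS.KEYREL.PRED "_big_a_rel" ].
```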

Adverbs

Create one or more types for adverbs. The following type definition inherits from appropriately-defined Matrix supertypes, and constrains the modified constituent to be verbal.

adverb-lex := basic-adverb-lex & intersective-mod-lex &
  [ SYNSEM [ LOCAL [ CAT [ HEAD.MOD < [ LOCAL.CAT.HEAD verb ] >,
                           VAL [ SPR < >,
                                 SUBJ < >,
                                 COMPS < >,
                                 SPEC < > ] ] ] ] ].

Try parsing a sentence with an adverb and then generating to see where else the adverb can show up. If your language allows multiple attachment sites for the adverb, admire the results. If it doesn't, or doesn't allow *that* many, constrain them further.
In order to constrain the possible attachment sites for adverbs, you may need to constrain the value of POSTHEAD, or the value of SPR inside MOD or the value of LIGHT inside MOD.
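
For example, if adverbs in your language only follow the constituent they modify, one possible restriction is sketched below. The type and entry names are hypothetical, and the stem and PRED string are English placeholders.

```tdl
; Hypothetical subtype: adverbs that can only appear after the head.
posthead-adverb-lex := adverb-lex &
  [ SYNSEM.LOCAL.CAT.POSTHEAD + ].

quickly := posthead-adverb-lex &
  [ STEM < "quickly" >,
    SYNSEM.LKEYS.KEYREL.PRED "_quickly_a_rel" ].
```

After reloading, parse and generate again to confirm that the unwanted attachment sites are gone.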

Write up (Updated 2/8/07)

Describe the elements of your grammar which contribute information about discourse status and how you implemented this.
Describe how your language expresses demonstratives, and how you implemented this.
Describe the situation with respect to argument optionality in your language (as best you were able to determine it), including glossed examples.
Describe the lexical types that you made to account for the argument optionality pattern you found and/or any changes to lexical entries.
Describe how you tested your argument optionality analysis, and any ways in which the grammar does not yet have the correct behavior. Speculate on what might need to be done to fix it.
Describe the modification facts (for adjectives and adverbs) that you found in your language.
Describe the constraints you needed to place on lexical types to get the right behavior.
Describe how you tested your modification analysis, and any ways in which the grammar does not yet have the correct behavior. Speculate on what might need to be done to fix it.

Submit via ESubmit

Be sure your matrix folder includes your write-up and baseline/final [incr tsdb()] file.
Consider removing the doc/ subdirectory in order to save space on ESubmit.
Compress the folder, and upload it to ESubmit.
Submit it by midnight Sunday night (preferably by Friday evening :-).
