Linguistics 567: Knowledge Engineering for NLP

Lab 6 Due 2/15


Run a baseline test suite

Before making any changes to your grammar for this lab, run a baseline test suite instance. If you decide to add items to your test suite for the material covered here, consider doing so before modifying your grammar so that your baseline can include those examples. (Alternatively, if you add examples in the course of working on your grammar and want to make the snapshot later, you can do so using the grammar you turned in for Lab 5.)

Part 1: Discourse status

The basics

We are modeling the cognitive status attributed to discourse referents by particular referring expressions through a pair of features COG-ST and SPECI on ref-ind (the value of INDEX for nouns). Here is our first-pass guess at the cognitive status associated with various types of overt expressions (for dropped arguments, see below):
Marker                            COG-ST value    SPECI value
Personal pronoun                  activ-or-more   +
Demonstrative article/adjective   activ+fam
Definite article/inflection       uniq+fam+act
Indefinite article/inflection     type-id

If you have any overt personal pronouns, constrain their INDEX values to be [COG-ST activ-or-more, SPECI + ].
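
As a minimal sketch of that constraint (the type name pronoun-lex is hypothetical, and this assumes your pronouns inherit from noun-lex; adapt both to your grammar):

; Hypothetical pronoun type: all overt personal pronouns are
; at least activated and specific.
pronoun-lex := noun-lex &
  [ SYNSEM.LOCAL.CONT.HOOK.INDEX [ COG-ST activ-or-more,
                                   SPECI + ] ].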

If you have any determiners which mark definiteness, have them constrain the COG-ST of their SPEC appropriately. For demonstrative determiners, see below.
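
For example, a definite/indefinite determiner pair might look roughly like this. The two subtype names are hypothetical, and determiner-lex stands in for whatever your existing determiner type is called:

; The determiner reaches the noun's index through its SPEC value.
def-determiner-lex := determiner-lex &
  [ SYNSEM.LOCAL.CAT.VAL.SPEC < [ LOCAL.CONT.HOOK.INDEX.COG-ST uniq+fam+act ] > ].

indef-determiner-lex := determiner-lex &
  [ SYNSEM.LOCAL.CAT.VAL.SPEC < [ LOCAL.CONT.HOOK.INDEX.COG-ST type-id ] > ].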

If you have any nominal inflections associated with discourse status, implement lexical rules which add them and constrain the COG-ST value appropriately.

Note that in some cases an unmarked form is underspecified, whereas in others it stands in contrast to a marked form. You should figure out which is the case for any unmarked forms in your language (e.g., bare NPs in a language with determiners, unmarked nouns in a language with definiteness markers), and constrain the unmarked forms appropriately. For bare NPs, the place to do this is the bare NP rule (note that you might have to create separate bare NP rules for pronouns vs. common nouns in this case). For definiteness affixes, you'll want a constant-lex-rule that constrains COG-ST, and that is parallel to the inflecting-lex-rule that adds the affix for the overtly marked case.
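
Here is a sketch of such a parallel pair for a language where the unmarked noun contrasts with a definite-marked one. Everything named here is an assumption to adapt: the rule names are hypothetical, infl-ltow-rule and const-ltow-rule are assumed to be the inflecting and constant lexeme-to-word supertypes available in your copy of the Matrix, and "-foo" is a placeholder for the real affix.

; Type definitions (language-specific .tdl file)
def-noun-lex-rule := infl-ltow-rule &
  [ SYNSEM.LOCAL.CONT.HOOK.INDEX.COG-ST uniq+fam+act,
    DTR noun-lex ].

indef-noun-lex-rule := const-ltow-rule &
  [ SYNSEM.LOCAL.CONT.HOOK.INDEX.COG-ST type-id,
    DTR noun-lex ].

; irules.tdl: the instance that adds the (placeholder) affix
def-noun-suffix :=
%suffix (* -foo)
def-noun-lex-rule.

; lrules.tdl: the instance for the unmarked form, no spelling change
indef-noun := indef-noun-lex-rule.

If the unmarked form in your language turns out to be underspecified rather than contrastive, the constant rule would simply leave COG-ST unconstrained.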

Some languages have agreement for definiteness on adjectives. In this case, you'll want to add lexical rules for adjectives that constrain the COG-ST of the item on their MOD list.
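
A rough sketch of one such rule follows. The names def-adj-lex-rule and adj-lex are hypothetical, infl-ltow-rule is an assumed supertype, and the COG-ST value should be whatever the agreement marker actually signals in your language:

; Definite-agreeing adjective form: constrains the modified noun's index.
def-adj-lex-rule := infl-ltow-rule &
  [ SYNSEM.LOCAL.CAT.HEAD.MOD < [ LOCAL.CONT.HOOK.INDEX.COG-ST uniq+fam+act ] >,
    DTR adj-lex ].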


All three types of demonstratives (adjectives, pronouns, and determiners) will share a set of relations which express proximity to hearer and speaker. We will arrange these relations into a hierarchy so that languages with just a one- or two-way distinction can be more easily mapped to languages with a two- or three-way distinction. In order to do this, we're using types for these PRED values rather than strings; note the absence of quotation marks in the definitions below. We will treat the demonstrative relations as adjectival relations, no matter how they are introduced (via pronouns, determiners, or quantifiers).

demonstrative_a_rel := predsort.
proximal+dem_a_rel := demonstrative_a_rel. ; close to speaker
distal+dem_a_rel := demonstrative_a_rel.   ; away from speaker
remote+dem_a_rel := distal+dem_a_rel.      ; away from speaker and hearer
hearer+dem_a_rel := distal+dem_a_rel.      ; near hearer

Demonstrative adjectives

Demonstrative adjectives come out as the easy case in this system. They are just like regular adjectives, except that in addition to introducing a relation whose PRED value is one of the subtypes of demonstrative_a_rel defined above, they also constrain the INDEX.COG-ST of their MOD value to be activ+fam.
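
A minimal sketch, assuming your ordinary adjective type is called adj-lex (the type name dem-adj-lex, the entry name, and the stem are hypothetical):

; Demonstrative adjective type: an ordinary adjective plus the
; COG-ST constraint on the modified noun.
dem-adj-lex := adj-lex &
  [ SYNSEM [ LOCAL.CAT.HEAD.MOD < [ LOCAL.CONT.HOOK.INDEX.COG-ST activ+fam ] >,
             LKEYS.KEYREL.PRED demonstrative_a_rel ] ].

; lexicon.tdl: PRED is a type, so no quotation marks
this-adj := dem-adj-lex &
  [ STEM < "this" >,
    SYNSEM.LKEYS.KEYREL.PRED proximal+dem_a_rel ].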

Demonstrative pronouns

On this analysis, demonstrative pronouns differ from other pronouns in introducing two relations: the pronoun_n_rel that all other pronouns introduce and one of the demonstrative_a_rel subtypes defined above. Because they introduce two relations, they can't inherit from noun-lex as it is currently defined in your grammars (nor even basic-noun-lex defined in the Matrix), since both of those ultimately inherit the constraint that only one relation is contributed. If we stick with this analysis of demonstratives in the long run, we will probably reformulate things on the Matrix side to make this a bit cleaner. For now, in order to capture the similarities that do exist among nominal lexical items, I recommend doing the following.

  1. Define a type noun-lex-supertype as follows, and add to it any constraints common to all nouns including demonstrative pronouns in your language.
    noun-lex-supertype := basic-one-arg & norm-hook-lex-item &
      [ SYNSEM [ LOCAL.CAT [ HEAD noun,
                             VAL [ SPR < #spr &
                                       [ LOCAL.CAT.HEAD det ] >,
                                   COMPS < >,
                                   SUBJ < >,
                                   SPEC < > ]],
                 LKEYS.KEYREL noun-relation ],
        ARG-ST < #spr > ] .

    This type has all of the information in noun-lex as defined by the customization script and basic-noun-lex defined in matrix.tdl, with the exception of the constraint that it have exactly one thing on the RELS list.

  2. Define a subtype noun-lex of noun-lex-supertype which adds in the single rel constraint. This type should fit into your hierarchy the same way your old noun-lex did (i.e., have the same subtypes).
    noun-lex := noun-lex-supertype & single-rel-lex-item.
  3. Define another subtype of noun-lex-supertype for the demonstrative pronouns.
  4. Define lexical entries for your demonstrative pronouns which constrain their LKEYS.ALTKEYREL.PRED (rather than LKEYS.KEYREL.PRED) to be the appropriate subtype of demonstrative_a_rel taken from the list above.
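
Steps 3 and 4 might be sketched as follows. This is one possible way to fill in the details, not a fixed recipe: the type and entry names are hypothetical, the PRED of the first relation should be whatever your other pronouns already use, and the COG-ST value and the linking of ARG1 and LBL are assumptions made by analogy with demonstrative adjectives.

; Step 3: a noun-like type contributing two relations.
dem-pronoun-lex := noun-lex-supertype &
  [ SYNSEM [ LOCAL.CONT [ HOOK [ LTOP #ltop,
                                 INDEX #ind & [ COG-ST activ+fam ] ],
                          RELS <! [ PRED "pronoun_n_rel" ],
                                  #altkey !>,
                          HCONS <! !> ],
             LKEYS.ALTKEYREL #altkey & arg1-ev-relation &
                             [ LBL #ltop,
                               ARG1 #ind ] ] ].

; Step 4 (lexicon.tdl): the entry picks the demonstrative subtype.
this-pron := dem-pronoun-lex &
  [ STEM < "this" >,
    SYNSEM.LKEYS.ALTKEYREL.PRED proximal+dem_a_rel ].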

Demonstrative determiners

As with the demonstrative pronouns, the demonstrative determiners introduce two relations. This time, they are introducing the quantifier relation (let's say "exist_q_rel") and the demonstrative relation. Once again, this analysis is going to entail changes to the Matrix core, as basic-determiner-lex assumes just one relation being contributed. Accordingly, we are going to bypass the current version of basic-determiner-lex and define instead determiner-lex-supertype as follows:

determiner-lex-supertype := norm-hook-lex-item & basic-zero-arg &
  [ SYNSEM [ LOCAL [ CAT [ HEAD det,
                           VAL [ SPEC.FIRST.LOCAL.CONT.HOOK [ INDEX #ind,
                                                              LTOP #larg ],
                                 SPR < >,
                                 SUBJ < >,
                                 COMPS < > ] ],
                     CONT.HCONS <! qeq &
                                   [ HARG #harg,
                                     LARG #larg ] !> ],
             LKEYS.KEYREL quant-relation &
                          [ ARG0 #ind,
                            RSTR #harg ] ] ].

This type should have two subtypes (assuming you have demonstrative determiners as well as others in your language --- otherwise, just incorporate the constraints for demonstrative determiners into the type above).

  1. The subtype for ordinary (non-demonstrative) determiners should add the constraint that the RELS list has exactly one thing on it, by adding the supertype single-rel-lex-item.
  2. The subtype for demonstrative determiners should specify a RELS list with two things on it: the first should have the "exist_q_rel" for its PRED value. (It's already constrained to be a quant-relation because the type norm-hook-lex-item inherited by determiner-lex-supertype identifies the first element of the RELS list with the LKEYS.KEYREL.) The second one should be identified with LKEYS.ALTKEYREL and should be an arg1-ev-relation (the type we use for the relations of intransitive adjectives). The HOOK.INDEX.COG-ST inside the SPEC value should be constrained to activ+fam. Finally, the LBL of the arg1-ev-relation should be identified with the SPEC..HOOK.LTOP of the determiner. (This will result in the demonstrative adjective relation sharing its handle with the N' the determiner attaches to.)

Make sure your ordinary determiners in the lexicon inherit from the first subtype, and that your demonstrative determiners inherit from the second subtype. Demonstrative determiner lexical entries should constrain their LKEYS.ALTKEYREL.PRED to be an appropriate subtype of demonstrative_a_rel.
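
Putting the two subtypes together, a sketch might look like this. The subtype and entry names are hypothetical, and identifying ARG1 of the demonstrative relation with the noun's index is an assumption made by analogy with adjectives; the rest follows the description above.

; Subtype 1: ordinary determiners contribute exactly one relation.
determiner-lex := determiner-lex-supertype & single-rel-lex-item.

; Subtype 2: demonstrative determiners contribute the quantifier
; relation plus a demonstrative relation that shares its handle
; with the N' the determiner attaches to.
demonstrative-determiner-lex := determiner-lex-supertype &
  [ SYNSEM [ LOCAL [ CAT.VAL.SPEC < [ LOCAL.CONT.HOOK [ INDEX #ind & [ COG-ST activ+fam ],
                                                        LTOP #nltop ] ] >,
                     CONT.RELS <! [ PRED "exist_q_rel" ],
                                  #altkey !> ],
             LKEYS.ALTKEYREL #altkey & arg1-ev-relation &
                             [ LBL #nltop,
                               ARG1 #ind ] ] ].

; lexicon.tdl
that-det := demonstrative-determiner-lex &
  [ STEM < "that" >,
    SYNSEM.LKEYS.ALTKEYREL.PRED distal+dem_a_rel ].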

Part 2: Optional arguments


The goal of this part of the lab is to allow for unexpressed arguments. As many of you have noticed, there are plenty of languages that don't use pronouns as much as English does, but rather leave the NP out entirely if it was just going to be a pronoun. Generally, the meaning is about as recoverable from context as it is with pronouns (after all, pronouns only give small clues to the referent in terms of person, number, and gender; among 3rd person referents, that usually leaves a lot of ambiguity). In some languages (e.g., Spanish), this kind of pronoun omission seems to be 'licensed' by the fact that the verbal inflections carry as much information as the pronouns would. In other languages (e.g., Japanese), the verbal inflections don't in fact carry person/number/gender information, but pronouns still aren't required.

Even in English (which likes pronouns so much that it has two expletive [meaningless] ones -- it and there) there are cases where arguments appear to be optional. Prime examples are verbs like eat and drink. The sentence I already ate means 'I already ate something', but the addressee is in no way expected to know what exactly was eaten. This is called indefinite null instantiation (see e.g., Johnson and Fillmore 2000). This contrasts with definite null instantiation (ibid.), in which null arguments have definite reference, that is, the utterance is only felicitous if the addressee can determine the referent. English verbs which do this include tell as in I already told you. (Which is a cute example, because it's most likely to be used in a case where the addressee can't determine what exactly s/he was already told, but it's licensed because it means something like 'I already told you the answer to that question'.)

Our general strategy is going to be similar to the way we handled missing determiners. That is, we're going to write unary phrase structure rules in which the mother and single daughter have different valence requirements.
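
To give a feel for what this looks like in TDL, here is a sketch that assumes your copy of matrix.tdl provides the unary phrase types basic-head-opt-comp-phrase and basic-head-opt-subj-phrase. The subtype and instance names are hypothetical, and your grammar may well need further constraints on these types (e.g., on clause type).

; Language-specific .tdl file: phrase types
head-opt-comp-phrase := basic-head-opt-comp-phrase.
head-opt-subj-phrase := basic-head-opt-subj-phrase.

; rules.tdl: rule instances
head-opt-comp := head-opt-comp-phrase.
head-opt-subj := head-opt-subj-phrase.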

I believe that most languages should fall into one of the following patterns (restricting our attention to verbs and their arguments):

I've written this assignment based on those four possibilities, and it should be straight-forward to the extent that I'm right :-). If your language instantiates a different pattern, talk to me.

Create instances of rules

Add ditransitive verbs (optional)

If you don't already have any verbs that take three arguments, try putting one in:

Add verbal subtypes for argument optionality

For expository purposes, I'm assuming that you have subtypes of verb-lex called trans-verb-lex and ditrans-verb-lex. If you've called them something else, not to worry, just use your corresponding types whenever I mention these.
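
As a hedged illustration of what such subtypes can look like, the sketch below assumes that the Matrix features OPT and OPT-CS are available on the items of the COMPS list. The subtype names are hypothetical, and the particular COG-ST values chosen for the indefinite and definite interpretations are assumed here rather than taken from matrix.tdl.

; Object may never be dropped.
oblig-obj-trans-verb-lex := trans-verb-lex &
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < [ OPT - ] > ].

; Dropped object gets an indefinite reading (like English eat).
ini-obj-trans-verb-lex := trans-verb-lex &
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < [ OPT-CS type-id ] > ].

; Dropped object gets a definite reading (like English tell).
dni-obj-trans-verb-lex := trans-verb-lex &
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < [ OPT-CS activ-or-more ] > ].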

For languages without general pro-drop of objects

For languages with general pro-drop of objects

For languages where argument optionality corresponds to the presence of optional verbal inflection

Part 3: Matrix yes-no questions

The semantics for declarative and interrogative clauses will be the same except for the value of the feature SF (sentential force) on the event index of the main predicate.

The customization script may have provided the right kind of semantics for matrix yes-no questions already. Try parsing an example from your test suite. If it parses, examine the MRS. Is the value of SF on the INDEX of the clause ques? (Or in the case of intonation questions only, do you get prop-or-ques?)

If your yes-no question doesn't parse, or if it does but not with the right semantics, contact me, and we will work out what needs to be done.

Part 4: Embedded clauses

Clause embedding verbs

We will be using clausal complements as our example of embedded clauses. To do so, we need to create clause-embedding verbs. First, find examples of verbs that can embed propositions and verbs that can embed questions. If you also find verbs that are happy to embed either, we can make use of them. For inspiration, you can look here or here.
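
A possible starting point for the verb types is sketched below. It assumes that matrix.tdl provides clausal-second-arg-trans-lex-item and that verb-lex is your general verb type; the three type names are hypothetical, and [ HEAD verb ] will need to change if your embedded clauses are headed by a complementizer.

; Shared type: subject NP plus a saturated clausal complement.
clause-embed-verb-lex := verb-lex & clausal-second-arg-trans-lex-item &
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < #comps >,
    ARG-ST < [ LOCAL.CAT.HEAD noun ],
             #comps & [ LOCAL.CAT [ HEAD verb,
                                    VAL [ SUBJ < >,
                                          COMPS < > ] ] ] > ].

; Verbs that embed propositions.
prop-embed-verb-lex := clause-embed-verb-lex &
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < [ LOCAL.CONT.HOOK.INDEX.SF prop ] > ].

; Verbs that embed questions.
ques-embed-verb-lex := clause-embed-verb-lex &
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < [ LOCAL.CONT.HOOK.INDEX.SF ques ] > ].

Verbs that are happy with either kind of complement could inherit directly from the shared type, or constrain SF to prop-or-ques.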

If your matrix and embedded clauses look the same, you should be able to test this immediately. If not, you'll have to wait until you've implemented the syntax for your embedded clauses.


Some languages mark embedded clauses (declarative, interrogative or both) with complementizers (e.g., that and whether in English). To implement this, you'll need to do the following. (If your language also marks matrix questions with a question particle, you have some of the following in your grammar already.)

Test your embedded clauses. Do they parse as expected? Can you still generate?

Other strategies

Other possible syntactic differences between main and subordinate clauses include:

  1. Differences in word order (the general strategy here will be to add more head-subj and head-comp variants, but to constrain some of them to be [MC +] and/or [MC -]).
  2. Different verb forms (the general strategy here will be lexical rules which produce the forms of the embedded verbs and give them a distinctive HEAD.FORM value that the embedding verbs and/or complementizers can select for)

Consult with me to work out an analysis for whatever your language is doing in this case.

The feature MC

If your matrix and embedded clauses have different syntactic properties (e.g., presence vs. absence of complementizers), you'll need to constrain things so that the embedded clause syntax only appears in embedded clauses and vice versa for matrix clause syntax. There are three resources for doing so:

If the difference is strictly S vs. CP, you don't need the feature MC. Otherwise, you probably will need all three: the root condition will require [MC +], the embedding verb will require [MC -], and the constructions/lexical rules/etc. which create the embedded and matrix clauses themselves should set appropriate values for MC.
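
The first two of these might be sketched as follows. The root definition is deliberately minimal (the one in your roots.tdl likely carries more constraints, which you should keep), and clause-embed-verb-lex is a hypothetical name for your existing clause-embedding verb type, extended here with a type addendum.

; roots.tdl: only [MC +] clauses count as complete utterances.
root := phrase &
  [ SYNSEM.LOCAL.CAT [ HEAD verb,
                       MC +,
                       VAL [ SUBJ < >,
                             COMPS < > ] ] ].

; Language-specific .tdl file: the embedding verb insists on an
; [MC -] complement.
clause-embed-verb-lex :+
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < [ LOCAL.CAT.MC - ] > ].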

Be sure your test suite contains negative examples illustrating matrix clause syntax in embedded clauses and vice versa.

Test your grammar

Write up

Your write up should address the following phenomena. For each, be sure to include examples in IGT format from your test suite to illustrate your points. (I should be able to parse the examples as given to see for myself what your grammar is doing.)

  1. Describe the ways in which discourse status is marked in your language.
  2. Describe how you implemented the discourse status marking in your grammar, and any difficulties you encountered (especially if you haven't yet been able to solve them).
  3. Describe the extent to which your language allows dropped arguments, and what the interpretation of the arguments is when they are dropped.
  4. Describe how you implemented argument drop in your grammar, and any difficulties you encountered (especially if you haven't yet been able to solve them).
  5. Describe the expression of yes-no questions in your language.
  6. Document what happened when you tested yes-no questions, and anything you had to change in your grammar to get them working properly.
  7. Describe the form of clausal complements in your grammar (statements and questions).
  8. Describe how you implemented embedded clauses in your grammar, and any difficulties you encountered (especially if you haven't yet been able to solve them).

Submit your assignment

ebender at u dot washington dot edu
Last modified: Fri Feb 8 2008