Linguistics 567: Grammar Engineering

Lab 5 Due 4/28

Minimum requirements for this assignment

This assignment covers material which varies greatly from language to language. While you are welcome to do more, this section lays out what is actually required for this lab. The instructions given below are then a superset of what any one student needs to do.
Everyone should:
- (0.) Make sure you have a baseline test suite corresponding to your lab 4 grammar.
- (1.) Implement a number distinction for whichever class of nouns (minimum case is just pronouns) is appropriate, and a person distinction.
- (2.) Implement some additional phenomena, according to the "packages" below.
- (3.) Test your grammar using [incr tsdb()]. [incr tsdb()] should be part of your test-development cycle. In addition, you'll need to run a final test suite instance for this lab to submit along with your baseline.
- (4.) Write up the phenomena you have analyzed.
For part (2.) choose one of the following "packages" (in many cases, writing additional lexical rules will be easier than entering by hand all the lexical entries you would otherwise need):
- If your language has both case and agreement: Case, one kind of agreement (e.g., subject-verb or determiner-noun), 2 lexical rules
- If your language has agreement but no case: 2 kinds of agreement (e.g., subject-verb and determiner-noun), 2 lexical rules
- If your language has case but not agreement: Case, 2 lexical rules for case, 1 lexical rule for something else (e.g., past-tense verbs).
- If your language has numeral classifiers: numeral classifiers plus classifier-noun agreement.
- If your language has neither case nor agreement: negotiate something else with me.
Pronouns, person and number distinctions

Because person and number information are also interpreted semantically, we want to record it regardless of whether it is syntactically relevant (i.e., whether it gets used for agreement).
Some of the instructions in this section are very specific (i.e., I'm giving you lots of answers) because I want you to have time to focus your efforts on other parts of the lab. Don't be surprised, then, when all of a sudden things get less specific!
- Add the following type definitions to klingon.tdl:
```
png :+ [ PER person,
         NUM number ].

person := *top*.
first := person.
second := person.
third := person.

number := *top*.
sg := number.
non-sg := number. ; use this one if your language only has sg-pl
dual := non-sg.   ; add these two if your language has sg-du-pl
pl := non-sg.
```
  (The type non-sg is there to facilitate a mapping between languages with sg-pl and languages with sg-du-pl systems in the MT exercise. In some languages with a du-pl distinction it might also be useful language-internally. If your language makes more than a three-way distinction (some do!), talk to me.)
  (If your language does person and number agreement with an elsewhere case -- like English non-3sg -- you may want to define subtypes of png which group the values of PER and NUM in interesting ways. If you want to know more about this, talk to me.)
- If your language has gender/nounclass distinctions, you'll want to use this definition of png instead, along with appropriate definitions for subtypes of gender.
```
png :+ [ PER person,
         NUM number,
         GEND gender ].

gender := *top*.
...
```
- If your language has noun classes that aren't plausibly called gender, you might use a different name for that feature. Numeral classifiers are arguably best handled with reference to an ontology (rather than unification), but we'll treat them like other noun classes for our purposes.
- The following assumes that your language has at least some stand-alone pronouns. If not, you'll still want to do the proper noun related stuff.
- Pronouns will be nouns that obligatorily undergo a covert-det rule (unless they optionally take determiners; if so, let me know!). To facilitate this, we're going to define some relation types (rather than strings) to use with determiners. Begin by adding the following to klingon.tdl.
```
quantifier_rel := predsort.
pronoun_q_rel := quantifier_rel.
proper_q_rel := quantifier_rel.
reg_quant_rel := quantifier_rel.
```
- If your grammar allows determiners with proper nouns, you'll actually want something slightly different:
```
quantifier_rel := predsort.
pronoun_q_rel := quantifier_rel.
reg_or_proper_q_rel := quantifier_rel.
proper_q_rel := reg_or_proper_q_rel.
reg_quant_rel := reg_or_proper_q_rel.
```
- If your grammar has determiners, you'll want to define subtypes of reg_quant_rel for their PRED values. For example:
```
demonstrative_q_rel := reg_quant_rel.
non+demonstrative_q_rel := reg_quant_rel.
proximal+dem_q_rel := demonstrative_q_rel. ; close to speaker
distal+dem_q_rel := demonstrative_q_rel.   ; away from speaker
remote+dem_q_rel := distal+dem_q_rel.      ; away from speaker and hearer
hearer+dem_q_rel := distal+dem_q_rel.      ; near hearer
def_q_rel := non+demonstrative_q_rel.      ; definite
indef_q_rel := non+demonstrative_q_rel.    ; indefinite
```
- Now edit your determiner lexical entries in lexicon.tdl to take one of these types (no quotes, since they're not strings) as their PRED values.
- Your grammar probably already has a subtype of basic-bare-np-phrase. You'll want to replace it with one to three subtypes, depending on how many of the following your language has:
  1. Independent pronouns without determiners.
  2. Proper nouns without determiners.
  3. Common nouns without determiners.
- Each subtype will specify the PRED value of the quantifier relation it contributes, and require that the daughter's SPR value is compatible with it.
```
some-bare-np-phrase := basic-bare-np-phrase &
  [ HEAD-DTR.SYNSEM.LOCAL.CAT.VAL.SPR 
                < [ LOCAL.CONT.RELS < ! [ PRED #pred ] ! > ] >,
    C-CONT.RELS < ! [ PRED #pred & some_quant_rel ] ! > ].
```
- Create instances of your bare-np-phrases in rules.tdl.
- Make subtypes of noun-lex for proper nouns, common nouns, and (if appropriate) pronouns.
```
pronoun-lex := noun-lex &
  [ SYNSEM [ LOCAL.CAT.VAL.SPR 
                < [ LOCAL.CONT.RELS < ! [PRED pronoun_q_rel] ! > ] >,
	     LKEYS.KEYREL.PRED 'pronoun_n_rel ] ].

common-noun-lex := noun-lex &
  [ SYNSEM.LOCAL [ CAT.VAL.SPR 
		      < [ LOCAL.CONT.RELS < ! [PRED reg_quant_rel] ! > ] >,
             CONT.HOOK.INDEX.PNG [ PER third ] ] ].

proper-noun-lex := noun-lex &
  [ SYNSEM.LOCAL [ CAT.VAL.SPR 
		      < [ LOCAL.CONT.RELS < ! [PRED proper_q_rel] ! > ] >,
             CONT.HOOK.INDEX.PNG [ PER third ] ] ].
```
  Note that pronoun-lex specifies a PRED value, so all pronouns will have the same one. The only difference will be in the person and number values. (Something will have to be said about demonstrative pronouns, probably something about what kind of quantifier relation they should appear with.) common-noun-lex is constrained to [PER third] since only pronouns have other PER values.
- Create lexical entries for pronouns in lexicon.tdl, specifying PER, NUM and GEND values, as appropriate. Here's an example for English:
```
we := pronoun-lex &
        [ STEM < "we" >,
          SYNSEM.LOCAL.CONT.HOOK.INDEX.PNG [ PER first,
					     NUM non-sg ] ].
```
- Update your lexical entries for common nouns to inherit from common-noun-lex and to specify number and gender information. If you're going to use a lexical rule for noun number, you might consider doing only a couple lexical entries now for testing purposes. If your language has a gender system, you might consider defining subtypes of common-noun-lex for each gender (which constrain the GEND value), and inheriting from those instead. (A similar thing could be done for number, but it's redundant if you're going to write a lexical rule.)
- Test your grammar by checking whether pronouns and common nouns can appear with or without determiners, and make sure that the results are what you want!
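As a concrete instance of the some-bare-np-phrase schema above, here is a minimal sketch for a language in which both pronouns and proper nouns appear without determiners. The type and instance names (pronoun-bare-np-phrase, proper-bare-np-phrase, pronoun-bare-np, proper-bare-np) are made up for illustration, so adapt them to your grammar. If common nouns can also be bare, a third subtype using reg_quant_rel (or one of its subtypes) would follow the same pattern.

```
; klingon.tdl: one phrase type per kind of determinerless NP.
; Each type identifies the PRED the noun expects on its specifier
; with the PRED of the quantifier relation the rule contributes.

pronoun-bare-np-phrase := basic-bare-np-phrase &
  [ HEAD-DTR.SYNSEM.LOCAL.CAT.VAL.SPR
                < [ LOCAL.CONT.RELS < ! [ PRED #pred ] ! > ] >,
    C-CONT.RELS < ! [ PRED #pred & pronoun_q_rel ] ! > ].

proper-bare-np-phrase := basic-bare-np-phrase &
  [ HEAD-DTR.SYNSEM.LOCAL.CAT.VAL.SPR
                < [ LOCAL.CONT.RELS < ! [ PRED #pred ] ! > ] >,
    C-CONT.RELS < ! [ PRED #pred & proper_q_rel ] ! > ].

; rules.tdl: one instance per phrase type (instance names are hypothetical).

pronoun-bare-np := pronoun-bare-np-phrase.
proper-bare-np := proper-bare-np-phrase.
```

Because pronoun_q_rel and proper_q_rel have no common subtype in the hierarchy given above, a pronoun should not be able to go through proper-bare-np, and vice versa.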
Case --- Inflection
- Define a feature CASE appropriate for the type noun (if you think it might also be appropriate for other types, talk to me).
```
noun :+ [ CASE case ].
```
- Define a type case and subtypes as appropriate (e.g., nom, acc, dat, ...). There is no need to use consistent type names across grammars for our purposes here, as the MT exercise won't involve case (that is, case doesn't appear in semantic representations).
- Add case information to the ARG-ST or valence features of your types for transitive and intransitive verbs, e.g.:
```
trans-verb-lex := basic-verb-lex & transitive-lex-item &
  [ SYNSEM.LOCAL [ CAT [ HEAD verb,
                    	 VAL [ SPR < >,
                               SUBJ < #subj & synsem 
				      & [ LOCAL.CAT [ HEAD noun &
							   [ CASE nom ],
						      VAL.SPR <> ]] >,
                               COMPS < #comps 
				       & [ LOCAL.CAT [ HEAD noun &
							    [ CASE acc ],
						       VAL.SPR <> ]]>,
                               SPEC < > ]]],
    ARG-ST < #subj, #comps > ].
```
- (If some verbs in your language require quirky case marking, you might want to define a subtype of trans-verb-lex, say nom-acc-trans-verb-lex which encodes the regular pattern, and let most verbs inherit from it. The exceptional verbs would inherit directly from trans-verb-lex instead, and specify the cases they require in their lexical entries.)
- Modify your lexical entries for nouns to reflect their case values. If you're going to write a lexical rule for case inflection on nouns, just do one or two now to test this part of your grammar. Here's an example from English:
```
we := pronoun-lex &
        [ STEM < "we" >,
          SYNSEM.LOCAL [ CAT.HEAD.CASE nom,
			 CONT.HOOK.INDEX.PNG [ PER first,
					       NUM non-sg ] ] ].

us := pronoun-lex &
        [ STEM < "us" >,
          SYNSEM.LOCAL [ CAT.HEAD.CASE acc,
			 CONT.HOOK.INDEX.PNG [ PER first,
					       NUM non-sg ] ] ].
```
- If case is also marked on nominal dependents (perhaps without being overtly marked on the nouns themselves), have the dependents constrain the CASE value of their SPEC (for determiners) or MOD (for adjectives) value. Consider writing lexical rules to create case-inflected adjectives.
- Test your grammar: Do sentences with nouns in the right case parse? Do sentences with nouns in the wrong case parse?
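The steps above leave the case hierarchy and the intransitive verb type to you. A minimal sketch for a nominative-accusative language might look like the following; the case names are only examples (dat is included just to show where further cases would go), and intrans-verb-lex is an assumed name for your intransitive verb type, modeled directly on the trans-verb-lex example.

```
; klingon.tdl: possible values for CASE (use whatever names suit your language)
case := *top*.
nom := case.
acc := case.
dat := case.

; Intransitive verbs constrain only their subject's case.
intrans-verb-lex := basic-verb-lex & intransitive-lex-item &
  [ SYNSEM.LOCAL.CAT [ HEAD verb,
                       VAL [ SPR < >,
                             SUBJ < #subj & synsem &
                                    [ LOCAL.CAT [ HEAD noun & [ CASE nom ],
                                                  VAL.SPR < > ] ] >,
                             COMPS < >,
                             SPEC < > ] ],
    ARG-ST < #subj > ].
```

With these in place, an intransitive sentence with an accusative-marked subject (such as "us sleep" in the English example) should fail to parse.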
Case --- case marking adpositions

Some languages mark case with adpositions rather than affixes. These adpositions are analyzed as semantically empty (though they may fill the same 'slot' as semantically contentful adpositions).
- Declare a feature CASE. If there are any verbs that can take either an NP or a PP in the same argument position, you probably want to make CASE appropriate for both nouns and adpositions:
```
+np :+ [ CASE case ].
```
  Otherwise, you might be able to declare it just for adpositions:
```
adp :+ [ CASE case ].
```
- The type for case marking adpositions should look like this:
```
case-marker-p-lex := basic-one-arg & raise-sem-lex-item &
   [ SYNSEM.LOCAL.CAT [ HEAD adp & [ MOD < > ],
                        VAL [ SPR < >,
                              SUBJ < >,
                              COMPS < #comps >,
                              SPEC < > ]],
     ARG-ST < #comps & [ LOCAL.CAT [ HEAD noun,
                                      VAL.SPR < > ]] > ].
```
- Particular case-marking adposition lexical entries will instantiate this type (case-marker-p-lex) and constrain the value of SYNSEM.LOCAL.CAT.HEAD.CASE.
- Verb types will constrain the CASE value of their arguments (see the directions above).
- You may need to modify your verb types to have them select [HEAD adp] (PP only) or [HEAD +np] (PP or NP) arguments.
- If your language has free word order but the adpositions can only occur on one side of the noun (i.e., they're strictly prepositions or postpositions), you'll need to constrain one of your head-complement rules to exclude adpositions as head daughters. You can do this by constraining the HEAD value of the HEAD-DTR of the head-comp rule that does not allow adpositions to be +nvjrcdmo.
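To make these steps concrete, here is a rough sketch using Japanese-style postpositional case markers as the illustration. It assumes you declared CASE on adp and defined nom and acc as subtypes of case. The entry names ga-nom and o-acc are invented, and head-comp-phrase is only a guess at what your head-initial head-complement rule type is called.

```
; lexicon.tdl: semantically empty case markers differ only in CASE.
ga-nom := case-marker-p-lex &
  [ STEM < "ga" >,
    SYNSEM.LOCAL.CAT.HEAD.CASE nom ].

o-acc := case-marker-p-lex &
  [ STEM < "o" >,
    SYNSEM.LOCAL.CAT.HEAD.CASE acc ].

; klingon.tdl: a transitive verb type selecting case-marked PPs.
; [ VAL.COMPS < > ] requires that the adposition has found its NP.
trans-verb-lex := basic-verb-lex & transitive-lex-item &
  [ SYNSEM.LOCAL.CAT [ HEAD verb,
                       VAL [ SPR < >,
                             SUBJ < #subj &
                                    [ LOCAL.CAT [ HEAD adp & [ CASE nom ],
                                                  VAL.COMPS < > ] ] >,
                             COMPS < #comps &
                                     [ LOCAL.CAT [ HEAD adp & [ CASE acc ],
                                                   VAL.COMPS < > ] ] >,
                             SPEC < > ] ],
    ARG-ST < #subj, #comps > ].

; klingon.tdl: if adpositions are strictly postpositions, keep them from
; heading the head-initial rule (type name assumed; adjust to your grammar).
head-comp-phrase :+
  [ HEAD-DTR.SYNSEM.LOCAL.CAT.HEAD +nvjrcdmo ].
```

If your verbs accept either a bare NP or a PP in the same position, replace adp in the verb type with +np, as described above.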
Agreement
- Determine which element is doing the agreeing (e.g., in subject-verb agreement, it's the verb; in determiner-noun agreement, it's the determiner, arguably even if the noun itself doesn't overtly show the information being agreed upon).
- Determine where in the feature structure for the agreeing element, the information it is agreeing with should be available (e.g., in subject-verb agreement, the information is available inside the verb's SUBJ value; in determiner-noun agreement, the information is available inside the determiner's SPEC feature; in adjective-noun agreement, the information is available inside the adjective's MOD feature).
- Constrain the information in both places. (e.g., if you're doing determiner-noun agreement for number and gender in a Romance language, make sure your noun lexical entries specify the relevant values for number and gender. Then constrain the SPEC value of the determiner entries.)
  Example from French:
```
chat := common-noun-lex &
     	[ STEM < "chat" >,
	  SYNSEM [ LOCAL.CONT.HOOK.INDEX.PNG [ NUM sg,
					       GEND masc ],
		   LKEYS.KEYREL.PRED '_cat_n_rel ] ].

le := determiner-lex &
	[ STEM < "le" >,
	  SYNSEM [ LOCAL.CAT.VAL.SPEC < [ LOCAL.CONT.HOOK.INDEX.PNG 
                                            [ NUM sg,
					      GEND masc ] ] >,
                   LKEYS.KEYREL.PRED def_q_rel  ] ].
```
- Consider writing lexical rules to generate the appropriate lexical entries (e.g., singular and plural nouns, 2-person-plural-feminine verbs, etc).
- Test your grammar: do sentences with agreement parse and sentences without agreement fail to parse?
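Continuing the French example, the sketch below adds the gender hierarchy it presupposes, a feminine noun with its determiner, and a plural determiner that leaves gender underspecified. The entries maison, la and les are illustrative additions rather than part of the original example, and NUM non-sg is used for the plural on the assumption that the grammar has only a sg-pl distinction.

```
; klingon.tdl: gender values assumed by the French entries
gender := *top*.
masc := gender.
fem := gender.

; lexicon.tdl
maison := common-noun-lex &
  [ STEM < "maison" >,
    SYNSEM [ LOCAL.CONT.HOOK.INDEX.PNG [ NUM sg,
                                         GEND fem ],
             LKEYS.KEYREL.PRED '_house_n_rel ] ].

la := determiner-lex &
  [ STEM < "la" >,
    SYNSEM [ LOCAL.CAT.VAL.SPEC < [ LOCAL.CONT.HOOK.INDEX.PNG
                                      [ NUM sg,
                                        GEND fem ] ] >,
             LKEYS.KEYREL.PRED def_q_rel ] ].

; "les" agrees in number only, so GEND is simply not mentioned.
les := determiner-lex &
  [ STEM < "les" >,
    SYNSEM [ LOCAL.CAT.VAL.SPEC < [ LOCAL.CONT.HOOK.INDEX.PNG.NUM non-sg ] >,
             LKEYS.KEYREL.PRED def_q_rel ] ].
```

With these entries, "la maison" and "le chat" should parse while "le maison" and "la chat" should not, since masc and fem do not unify.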
Lexical rules
- Pick a supertype for your rule:
  - Determine whether your lexical rule needs to change SYNSEM information, or just add to it. (Examples: If the input has a non-empty SPR list and the output has an empty SPR list, that's changing information. If the input has no value specified for CASE and the output is [CASE nom], that's just adding information.)
  - Determine whether your lexical rule creates fully inflected forms, or whether there's more inflection you'd like to stack on top of it.
  - Rules creating fully inflected forms and only adding information to SYNSEM can inherit from infl-ltow-rule.
  - Rules creating not-yet fully inflected forms and only adding information should inherit from infl-add-only-no-ccont-ltol-rule.
  - If your rule needs to change the SYNSEM value, determine which part of SYNSEM is changing (e.g., VAL only, HEAD only, CAT only) and choose an appropriate type out of the types called infl-***-change-only-ltol-rule. Unless you're adding any relations, your rule should also inherit from no-ccont-lex-rule.
- Define a rule type in klingon.tdl which contains all of the information about your rule except the spelling changes. The value of DTR should be specific enough to constrain the rule to only applying to the right type of words. The value of SYNSEM should be at least as specific as the lexical entries you've been writing so far. Here's an example from English (where the value of SYNSEM ends up being very specific since all the information from the daughter is also in the mother):
```
3sg_verb-lex-rule := infl-ltow-rule &
  [ SYNSEM.LOCAL.CAT.VAL.SUBJ < [ LOCAL.CONT.HOOK.INDEX.PNG [ PER third,
							      NUM sg ]] >,
    DTR.SYNSEM.LOCAL.CAT.HEAD verb ].
```
- If you have multiple rules applying to the same form, constrain the innermost (rightmost prefix or leftmost suffix) to take lex-item as its DTR. The next one to apply should take the first rule as its DTR, etc. If multiple rules can appear in one slot, define a supertype for them which can be the DTR of the next rule type out.
- Define an instance of the rule type in irules.tdl. This instance should give the spelling change subrules on a line beginning with %prefix or %suffix. Assuming you're working from regularized morphophonology, these should be simple concatenation, of the form (* pref) or (* suff).
- A slightly more complicated example from English (without regularized morphophonology) follows. After %suffix there is a list of pairs in which the first member matches the input form and the second member describes the output form. * matches the empty string. ! signals a letter-set. More specific subrules go to the right.
```
3sg_verb :=
%suffix (!s !ss) (!ss !ssses) (ss sses)
3sg_verb-lex-rule.
```
  And here's the letter set that's used:
```
%(letter-set (!s abcdefghijklmnopqrtuvwxyz))
```
- Update your lexical entries so that they give the stem instead of the inflected word (i.e., so that your lexical rule can do the work). Any such stem entries should also be marked [INFLECTED -]. Consider making [INFLECTED -] a constraint on the relevant lexical types, so you don't have to keep remembering to type it.
- Test your grammar. Does the lexical rule apply to the words it should apply to? Does it apply to words it shouldn't apply to?
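For comparison with the English 3sg rule, here is a sketch of the simpler case described above: a fully inflecting, add-only rule with regularized morphophonology, shown across the three files it touches. The names pl_noun-lex-rule and pl_noun, the suffix s, and the stem entry dog are all invented for illustration. NUM non-sg assumes a sg-pl language, and the rule only counts as add-only if the stem entries leave NUM unspecified.

```
; klingon.tdl: everything about the rule except the spelling change.
; DTR is restricted to common nouns so pronouns and verbs are excluded.
pl_noun-lex-rule := infl-ltow-rule &
  [ SYNSEM.LOCAL.CONT.HOOK.INDEX.PNG.NUM non-sg,
    DTR common-noun-lex ].

; Mark all common noun stems as uninflected in one place.
common-noun-lex :+ [ INFLECTED - ].

; irules.tdl: simple concatenation; * matches the empty string.
pl_noun :=
%suffix (* s)
pl_noun-lex-rule.

; lexicon.tdl: the entry gives only the stem, with no NUM value.
dog := common-noun-lex &
  [ STEM < "dog" >,
    SYNSEM.LKEYS.KEYREL.PRED '_dog_n_rel ].
```

Note that once every common noun stem is [INFLECTED -], singular forms also need a rule of their own (one with no spelling change) before they can appear in a sentence.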
Head-modifier rules

If you're doing adjectives because your only agreement involves adjectives, you're going to need one or two head-modifier rules.
The Matrix distinguishes scopal from intersective modification. We're going to pretend that everything is intersective and just not worry about the scopal guys for now.
- Create an instance of head-adj-int-phrase, an instance of adj-head-int-phrase, or both, depending on whether you need only prehead modifiers, only posthead modifiers, or both.
- Try parsing a transitive sentence. Be surprised by the extra parse. Or, if you don't get an extra parse, try parsing some of your ungrammatical examples from earlier labs. Look at the enlarged trees (or try Parse > Compare) to see what's going on.
- Constrain your existing subtypes of head (verb, noun, det) to be [MOD < >].
- Try parsing the misbehaving sentence again.
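Spelled out, the steps above amount to something like the following sketch. The instance names head-adj-int and adj-head-int are arbitrary; keep only the rule(s) your language needs.

```
; rules.tdl: intersective head-modifier rules
head-adj-int := head-adj-int-phrase.   ; modifier follows the head
adj-head-int := adj-head-int-phrase.   ; modifier precedes the head

; klingon.tdl: verbs, nouns and determiners are not modifiers,
; so give them an empty MOD list.
verb :+ [ MOD < > ].
noun :+ [ MOD < > ].
det :+ [ MOD < > ].
```

Without the MOD constraints, an element with an unconstrained MOD value can attach as a modifier to anything, which is the likely source of the extra parse.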
Adjectives
- Create a type adjective-lex which inherits from basic-adjective-lex. The following type works for English assuming that:
  1. We're not worried about predicative adjectives or adjectives taking complements for now.
  2. We have both orders (head-adj and adj-head), but adjectives are always prehead (hence the value of POSTHEAD).
  3. We're only dealing with intersective adjectives (as stipulated).
```
adjective-lex := basic-adjective-lex &
  [ SYNSEM [ LOCAL [ CAT [ HEAD adj &
				[ MOD < [ LOCAL [ CAT.HEAD noun,
						  CONT.HOOK [ INDEX #ind,
							      LTOP #ltop ]]]>],
			   VAL [ SPR < >,
				 SUBJ < >,
				 COMPS < >,
				 SPEC < > ],
			   POSTHEAD - ],
		     CONT.HOOK.LTOP #ltop ],
	     LKEYS.KEYREL.ARG1 #ind ] ].
```
- Create one or more adjective instances.
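Adjective instances might look like the sketch below. The first is a plain English entry. The second is a hypothetical French entry showing how an agreeing adjective would constrain the PNG of its MOD value, as described in the agreement section. The entry names and PRED values are illustrative assumptions.

```
; lexicon.tdl: a non-agreeing adjective
big := adjective-lex &
  [ STEM < "big" >,
    SYNSEM.LKEYS.KEYREL.PRED '_big_a_rel ].

; An agreeing adjective: only modifies masculine singular nouns.
petit := adjective-lex &
  [ STEM < "petit" >,
    SYNSEM [ LOCAL.CAT.HEAD.MOD < [ LOCAL.CONT.HOOK.INDEX.PNG
                                      [ NUM sg,
                                        GEND masc ] ] >,
             LKEYS.KEYREL.PRED '_small_a_rel ] ].
```

If you write a lexical rule for adjective agreement instead, the MOD constraint would move from the entry to the rule type.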
Numeral classifiers

Here is the basic strategy with numeral classifiers:
- NumCls take a number name specifier to form a NumClP.
- NumClPs may modify nouns or serve as determiners, depending on the language. Consult with me about which is going on in your language.
- Nouns will specify which classifiers they take by specifying a noun class inside PNG.
- NumCls will constrain the INDEX.PNG of their MOD/SPEC value so as to only modify compatible nouns.
- NumCls that are determiners may also take demonstrative elements as their specifiers. We'll probably want to analyze this as affecting the quant_rel they introduce (in the current imperfect system).
- (Don't worry for now about the use of NumClP or Demonstrative+NumCl as NPs in and of themselves. We can probably eventually handle this via a non-branching rule.)
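As a very rough sketch of the noun-class part of this strategy, assuming the modifier analysis, the fragment below uses Japanese inu 'dog' and the classifier hiki for illustration. The feature name NCLASS, the class names, and especially the type num-cl-lex are placeholders: num-cl-lex is not defined here and stands in for whatever classifier type you end up writing.

```
; klingon.tdl: record noun class inside PNG (feature and type names assumed)
png :+ [ NCLASS nclass ].

nclass := *top*.
small-animal := nclass.
flat-object := nclass.

; lexicon.tdl: the noun states its class ...
inu := common-noun-lex &
  [ STEM < "inu" >,
    SYNSEM [ LOCAL.CONT.HOOK.INDEX.PNG.NCLASS small-animal,
             LKEYS.KEYREL.PRED '_dog_n_rel ] ].

; ... and the classifier constrains the class of the noun it modifies.
; num-cl-lex is a placeholder for a classifier type still to be defined.
hiki := num-cl-lex &
  [ STEM < "hiki" >,
    SYNSEM.LOCAL.CAT.HEAD.MOD
      < [ LOCAL.CONT.HOOK.INDEX.PNG.NCLASS small-animal ] > ].
```

For a language where NumClPs are determiners, the same constraint would go on the SPEC value instead of MOD.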
I hope to have some specific instructions here by Tuesday, but in the meantime, here are some references:
- Bender, Emily M. and Melanie Siegel. 2004. Implementing the Syntax of Japanese Numeral Classifiers. Proceedings of IJCNLP-04, Hainan Island, China.
- Bender, Emily M. 2002. Number Names in Japanese: A Head-Medial Construction in a Head-Final Language. ms. Comments welcome!
Possessives

Since several folks are going to work on possessives for this assignment, here are some details on how to handle that.
The target semantics that we want for something like "Kim's dog" is:
```
[ proper_q_rel
  LBL: h6
  ARG0: x7
  RSTR: h8
  BODY: h9 ]
[ named_rel
  LBL: h10
  ARG0: x7
  CARG: "kim" ]
[ def_explicit_q_rel
  LBL: h11
  ARG0: x12
  RSTR: h13
  BODY: h14 ]
[ poss_rel
  LBL: h15
  ARG0: e16
  ARG1: x12
  ARG2: x7 ]
[ "_dog_n_rel"
  LBL: h15
  ARG0: x12 ]

h8 qeq h10, h13 qeq h15
```
Things to note:
- There are two quantifiers (one for Kim and one for dog).
- There's a poss_rel which takes dog as its ARG1 and Kim as its ARG2.
- In addition, the poss_rel has the same LBL as _dog_n_rel. This indicates intersective modification.
The semantics we want for "Their dog" is:
```
[ pronoun_q_rel
  LBL: h6
  ARG0: x7
  RSTR: h8
  BODY: h9 ]
[ "_pronoun_n_rel"
  LBL: h10
  ARG0: x7 ]
[ def_explicit_q_rel
  LBL: h11
  ARG0: x12
  RSTR: h13
  BODY: h14 ]
[ poss_rel
  LBL: h15
  ARG0: e16
  ARG1: x12
  ARG2: x7 ]
[ "_dog_n_rel"
  LBL: h15
  ARG0: x12 ]

h8 qeq h10, h13 qeq h15
```
This is the same as the above, except that the noun rel and the quantifier rel corresponding to the possessor have changed. Note that the index x7 (ARG0 of the pronoun relation) should bear the person/number information third non-sg.
What it takes to build these semantic representations depends on what exactly is going on in your language. There are at least the following parameters of variation:
1. Possessives can be modifiers or determiners.
2. Pronoun possessors can have special forms or be analytical sequences of personal pronoun + possessive marker.
3. Possessive constructions can be marked on the possessor, the possessee, or potentially both.
If possessives are modifiers, they introduce the poss_rel and their own noun and quantifier relations. If they are determiners, they also introduce the quantifier relation for the head noun.
If possessive pronouns have a special form, you'll need lexical entries that introduce three relations (poss_rel, _pronoun_n_rel, and pronoun_q_rel). If pronominal possessors involve the regular possessive marker, then there isn't any need for a special series of pronouns.
If the possessive construction is marked on the head noun, then the nouns which are so marked must select for the possessor (if it is required that they co-occur). This analysis thus predicts that languages that mark possession on the possessed noun should treat the possessive phrase as a determiner if it's required. Languages that mark possession on the possessed noun but allow the possessor to remain unexpressed can still be handled under the modifier analysis: the possessor phrase would require a possessive-marked noun as its MOD.
Here is a sample lexical type for a possessive linker that is a modifier (such as Mandarin de). Such a linker selects for the possessor NP through its SUBJ or COMPS (whichever is convenient for purposes of order) and the possessee through its MOD. NB: I haven't tested this yet, so it may take some debugging.
```
link := head.

possessive-marker-lex := intersective-mod-lex & 
                         norm-hook-lex-item & 
                         single-rel-lex-item &
                         no-hcons-lex-item &
			 intransitive-lex-item &
 [ SYNSEM [ LOCAL [ CAT [ HEAD link,
	                  VAL [ SUBJ < #subj &
                                       [ LOCAL.CAT [ HEAD noun,
                                                     VAL [ SPR < >,
                                                           COMPS < > ]]] >,
                                COMPS < >,
                                SPR < >,
                                SPEC < > ]]],
            LKEYS.KEYREL.PRED "poss_rel" ]].
```
Notes:
- A possessive marker that builds a DP would be [HEAD det] and would introduce a quantifier relation.
- A possessive pronoun (that is, a pronoun that is possessive without requiring a linker) would introduce three relations (and so should not inherit from single-rel-lex-item or no-hcons-lex-item and should specify its RELS and HCONS values).
- A possessive pronoun which acts as a determiner should introduce four relations.
Write up your analyses
- Indicate which "package" you chose to do for this lab.
- Indicate whether you decided to write any additional lexical rules.
- Describe the phenomena that you analyzed (e.g., how the case system works in your language) indicating which aspects of the phenomenon you decided to cover (e.g., your language might have a split-ergative case system, which you can avoid for now by carefully choosing your verb tenses...).
- Describe how you analyzed the phenomena in your grammar, with reference to the particular types and features you used.
The descriptions of phenomena and analyses should be at least a page per phenomenon. If you feel that the analyses presented here don't sit well with your language, describe (as best you can) why not.
Submit via ESubmit
- Make sure your write-up is inside the matrix folder.
- Include copies of tsdb/skeletons and tsdb/home in your folder, but delete intermediate test suite instances (i.e., submit the baseline from Lab 4 and the final Lab 5 versions).
- Compress the folder, and upload it to ESubmit.
- Submit it by midnight Sunday night (preferably by Friday evening :).