Lab 5 (due 2/8)

Preliminaries

These instructions might get edited a bit over the next couple of days. I'll try to flag changes.

As usual, check the write up instructions first.

Navigation

Requirements (what should I do?)
Phenomena
Write up instructions

Requirements for this assignment

Everyone should:
- (0.) Make sure you have a baseline test suite corresponding to your lab 4 grammar.
- (1.) Implement a number distinction for whichever class of nouns (minimum case is just pronouns) is appropriate, and a person distinction.
- (2.) Add adjectival and adverbial modifiers.
- (3.) Implement all of the following kinds of agreement that are present in your language: verb-subject, verb-object, determiner-noun, adjective-noun. If your language doesn't have agreement, or has just one kind of agreement in that list, email me by Wednesday 2/6 to propose something else to add to your grammar.
- (4.) Test your grammar using [incr tsdb()]. [incr tsdb()] should be part of your test-development cycle. In addition, you'll need to run a final test suite instance for this lab to submit along with your baseline.
- (5.) Make sure that you can generate as well as parse with your grammar.
- (6.) Write up the phenomena you have analyzed.

Pronouns, person and number distinctions

Because person and number information are also interpreted semantically, we want to record them regardless of whether they are syntactically relevant (i.e., whether they get used for agreement).
Some of the instructions in this section are very specific (i.e., I'm giving you lots of answers) because I want you to have time to focus your efforts on other parts of the lab. Don't be surprised, then, when all of a sudden things get less specific!
- Add the following type definitions to klingon.tdl:
```
png :+ [ PER person,
         NUM number ].

person := *top*.
first := person.
second := person.
third := person.

number := *top*.
sg := number.
non-sg := number. ; use this one if your language only has sg-pl
dual := non-sg.   ; add these two if your language has sg-du-pl
pl := non-sg.
```
  (The type non-sg is there to facilitate a mapping between languages with sg-pl and languages with sg-du-pl systems in the MT exercise. In some languages with a du-pl distinction it might also be useful language-internally. If your language makes more than a three-way distinction (some do!), talk to me.)
  (If your language does person and number agreement with an elsewhere case -- like English non-3sg -- you may want to define subtypes of png which group the values of PER and NUM in interesting ways. If you want to know more about this, talk to me.)
- If your language has gender/nounclass distinctions, you'll want to use this definition of png instead, along with appropriate definitions for subtypes of gender.
```
png :+ [ PER person,
         NUM number,
         GEND gender ].

gender := *top*.
...
```
- If your language has noun classes that aren't plausibly called gender, you might use a different name for that feature. Numeral classifiers are arguably best handled with reference to an ontology (rather than unification), but we'll treat them like other noun classes for our purposes.
- The following assumes that your language has at least some stand-alone pronouns. If not, you'll still want to do the proper noun related stuff. For uniformity, we treat pronouns as lexical NOMs (i.e., they have an unsaturated SPR requirement), and use the bare-np-phrase to fill in the quantifier. The customization script should have given you an instance of the bare-np-phrase, which contributes a quantifier with the specific PRED value "unspec_q_rel", which we'll be changing below.
  The current plan is to treat pronouns as quantified by an existential quantifier, and as lexically definite. (We'll return to discourse status in the next lab.) We further assume that bare noun phrases always get an existential quantifier (and that generic interpretations are derivative of this, say). So the first step is to edit the type bare-np-phrase in klingon.tdl to have it introduce "_exist_q_rel" rather than "unspec_q_rel".
  The next step is to create the lexical type for pronouns:
```
pronoun-lex := noun-lex &
  [ SYNSEM [ LOCAL.CAT.VAL.SPR 
                < [ OPT + ] >,
	     LKEYS.KEYREL.PRED "pronoun_n_rel" ] ].
```
  Note that pronoun-lex specifies a PRED value, so all pronouns will have the same one. The only difference will be in the person and number values. (Something will have to be said about demonstrative pronouns, but that's for a later lab.) In creating this type, you may need to move some constraints on noun-lex down to a subtype, say common-noun-lex. common-noun-lex should also be constrained to be [PER third] since only pronouns have other PER values.
- Create lexical entries for pronouns in lexicon.tdl, specifying PER, NUM and GEND values, as appropriate. Here's an example for English:
```
we := pronoun-lex &
        [ STEM < "we" >,
          SYNSEM.LOCAL.CONT.HOOK.INDEX.PNG [ PER first,
					     NUM non-sg ] ].
```
- Update your lexical entries for common nouns to inherit from common-noun-lex and to specify number and gender information. If you're going to use a lexical rule for noun number, you might consider doing only a couple lexical entries now for testing purposes. If your language has a gender system, you might consider defining subtypes of common-noun-lex for each gender (which constrain the GEND value), and inheriting from those instead. (A similar thing could be done for number, but it's redundant if you're going to write a lexical rule.)
- Test your grammar by checking whether pronouns and common nouns can appear with or without determiners, and make sure that the results are what you want! Also, examine the semantic representations to see that the correct person/number/gender information is showing up on each index.
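  As a rough illustration of the common-noun steps above, here is a minimal sketch assuming a language with a two-way masculine/feminine gender system. The names masc, fem, masc-noun-lex and fem-noun-lex are hypothetical, and the entry reuses French "chat" purely as an example. The type definitions would go in klingon.tdl and the entry in lexicon.tdl.
```
; klingon.tdl
; Hypothetical gender values; replace with the classes your language has.
masc := gender.
fem := gender.

; Common nouns are always third person.
common-noun-lex := noun-lex &
  [ SYNSEM.LOCAL.CONT.HOOK.INDEX.PNG.PER third ].

; One subtype per gender, so entries need not repeat the GEND value.
masc-noun-lex := common-noun-lex &
  [ SYNSEM.LOCAL.CONT.HOOK.INDEX.PNG.GEND masc ].

fem-noun-lex := common-noun-lex &
  [ SYNSEM.LOCAL.CONT.HOOK.INDEX.PNG.GEND fem ].

; lexicon.tdl
; The entry only states what is idiosyncratic to it.
chat := masc-noun-lex &
  [ STEM < "chat" >,
    SYNSEM [ LOCAL.CONT.HOOK.INDEX.PNG.NUM sg,
             LKEYS.KEYREL.PRED "_cat_n_rel" ] ].
```
  If you later write a lexical rule for noun number, the NUM constraint would presumably move out of the entry and into that rule.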

Modification

Head-modifier rules

The Matrix distinguishes scopal from intersective modification. We're going to pretend that everything is intersective and just not worry about the scopal guys for now.
- Create an instance of head-adj-int-phrase, an instance of adj-head-int-phrase, or both, depending on whether you need only prehead modifiers, only posthead modifiers, or both. (You may already have some of these, depending on what you said about negation on the configuration page.)
- Try parsing a sentence without a modifier, and examine the parse chart. Did the head-adj phrase fire? If so, constrain your existing subtypes of head (e.g., +nvd: verb, noun, det) to be [MOD < >].
- Try parsing the misbehaving sentence again.
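  A minimal sketch of these steps, assuming your language needs both modifier orders and that your grammar uses the head types verb, noun and det directly. The instance names head-adj-int and adj-head-int are hypothetical; check rules.tdl first, since the customization script may already have created equivalents.
```
; rules.tdl: one instance per modifier order you need.
head-adj-int := head-adj-int-phrase.
adj-head-int := adj-head-int-phrase.

; klingon.tdl: heads that never act as modifiers get an empty MOD list,
; so the head-modifier rules cannot pick them up as modifiers.
verb :+ [ MOD < > ].
noun :+ [ MOD < > ].
det :+ [ MOD < > ].
```
  The :+ addenda only add a constraint to types that already exist, in the same way the png definition earlier in this lab does.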

Adjectives

- Create a type adjective-lex which inherits from basic-adjective-lex. The following type works for English assuming that:
  1. We're not worried about predicative adjectives or adjectives taking complements for now.
  2. English has both pre-head and post-head modifiers (head-adj and adj-head), but simple adjectives are (almost) always pre-head (hence the value of POSTHEAD).
  3. We're only dealing with intersective adjectives (as stipulated).
```
adjective-lex := basic-adjective-lex & intersective-mod-lex &
	      norm-ltop-lex-item &
  [ SYNSEM [ LOCAL [ CAT [ HEAD.MOD < [ LOCAL.CAT [ HEAD noun,
                                                    VAL.SPR cons ]]>,
			   VAL [ SPR < >,
				 SUBJ < >,
				 COMPS < >,
				 SPEC < > ],
			   POSTHEAD - ]]]].
```
- If adjectives display agreement in your language, you'll be adding that information to the MOD value in agreement below. For now, leave it underspecified (this will cause your grammar to overgenerate).
- Create one or more adjective instances.
- Parse sentences with your adjectives, and examine the MRSs. Are the adjective relations being predicated of the right indices?
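  For the instance step, an entry in lexicon.tdl might look like the following sketch. The entry name, spelling and PRED value are placeholders (the PRED simply follows the "_cat_n_rel" naming pattern used elsewhere in these instructions), so substitute a real adjective from your language.
```
; lexicon.tdl
; Everything except the spelling and the predicate name
; is inherited from adjective-lex.
big := adjective-lex &
  [ STEM < "big" >,
    SYNSEM.LKEYS.KEYREL.PRED "_big_a_rel" ].
```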

Adverbs

- Create one or more types for adverbs. The following type definition inherits from appropriately-defined Matrix supertypes, and constrains the modified constituent to be verbal.
```
adverb-lex := basic-adverb-lex & intersective-mod-lex &
  [ SYNSEM [ LOCAL [ CAT [ HEAD.MOD < [ LOCAL.CAT.HEAD verb ]>,
			   VAL [ SPR < >,
				 SUBJ < >,
				 COMPS < >,
				 SPEC < > ]]]]].
```
- Try parsing a sentence with an adverb and then generating to see where else the adverb can show up. If your language allows multiple attachment sites for the adverb, admire the results. If it doesn't, or doesn't allow *that* many, constrain them further.
- In order to constrain the possible attachment sites for adverbs, you may need to constrain the value of POSTHEAD, or the value of SPR inside MOD or the value of LIGHT inside MOD.
- Parse sentences with your adverbs, and examine the MRSs. Are the adverb relations being predicated of the right indices?
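  One way to restrict attachment sites is sketched below, assuming (as with the adjective type above) that POSTHEAD - marks pre-head modifiers, so that POSTHEAD + limits a modifier to post-head position. The type name posthead-adverb-lex, the entry and its PRED value are all hypothetical.
```
; klingon.tdl
; Adverbs of this type can only follow the constituent they modify.
posthead-adverb-lex := adverb-lex &
  [ SYNSEM.LOCAL.CAT.POSTHEAD + ].

; lexicon.tdl
quickly := posthead-adverb-lex &
  [ STEM < "quickly" >,
    SYNSEM.LKEYS.KEYREL.PRED "_quickly_a_rel" ].
```
  If position alone is not enough, the same pattern applies to the other options mentioned above: add a constraint on SPR or LIGHT inside the MOD value instead of (or in addition to) POSTHEAD.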

Agreement

- Determine which element is doing the agreeing (e.g., in subject-verb agreement, it's the verb; in determiner-noun agreement, it's the determiner, arguably even if the noun itself doesn't overtly show the information being agreed upon).
- Determine where in the feature structure for the agreeing element the information it is agreeing with should be available (e.g., in subject-verb agreement, the information is available inside the verb's SUBJ value; in determiner-noun agreement, the information is available inside the determiner's SPEC feature; in adjective-noun agreement, the information is available inside the adjective's MOD feature).
- Constrain the information in both places. (E.g., if you're doing determiner-noun agreement for number and gender in a Romance language, make sure your noun lexical entries specify the relevant values for number and gender. Then constrain the SPEC value of the determiner entries.)
  Example from French:
```
chat := common-noun-lex &
     	[ STEM < "chat" >,
	  SYNSEM [ LOCAL.CONT.HOOK.INDEX.PNG [ NUM sg,
					       GEND masc ],
		   LKEYS.KEYREL.PRED "_cat_n_rel" ] ].

le := determiner-lex &
	[ STEM < "le" >,
	  SYNSEM [ LOCAL.CAT.VAL.SPEC < [ LOCAL.CONT.HOOK.INDEX.PNG 
                                            [ NUM sg,
					      GEND masc ] ] >,
                   LKEYS.KEYREL.PRED "exist_q_rel"  ] ].
```
- After testing the basic functionality of agreement with a couple of hand-coded lexical entries, write lexical rules to generate appropriately constrained inflected forms off of stems given by lexical entries (e.g., singular and plural nouns, 2-person-plural-feminine verbs, etc).
- Test your grammar: do sentences with correct agreement parse, and do sentences that violate agreement fail to parse?
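  The same pattern carries over to the other agreement types. The sketch below shows hand-coded entries for adjective-noun agreement (through MOD) and subject-verb agreement (through SUBJ), again using French forms as examples. The type name intransitive-verb-lex and the PRED values are assumptions; use whatever your grammar actually defines.
```
; lexicon.tdl
; Adjective-noun agreement: constrain the PNG of the modified noun.
petit := adjective-lex &
  [ STEM < "petit" >,
    SYNSEM [ LOCAL.CAT.HEAD.MOD < [ LOCAL.CONT.HOOK.INDEX.PNG
                                      [ NUM sg,
                                        GEND masc ] ] >,
             LKEYS.KEYREL.PRED "_small_a_rel" ] ].

; Subject-verb agreement: constrain the PNG of the subject.
dort := intransitive-verb-lex &
  [ STEM < "dort" >,
    SYNSEM [ LOCAL.CAT.VAL.SUBJ < [ LOCAL.CONT.HOOK.INDEX.PNG
                                      [ PER third,
                                        NUM sg ] ] >,
             LKEYS.KEYREL.PRED "_sleep_v_rel" ] ].
```
  Entries like these are only meant for initial testing; once they behave as expected, the agreement constraints would move into lexical rules.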

Lexical rules

- Pick a supertype for your rule:
  - Determine whether your lexical rule needs to change SYNSEM information, or just add to it. (Examples: If the input has a non-empty SPR list and the output has an empty SPR list, that's changing information. If the input has no value specified for CASE and the output is [CASE nom], that's just adding information.)
  - Determine whether your lexical rule creates fully inflected forms, or whether there's more inflection you'd like to stack on top of it.
  - Rules creating fully inflected forms and only adding information to SYNSEM can inherit from infl-ltow-rule.
  - Rules creating not-yet fully inflected forms and only adding information should inherit from infl-add-only-no-ccont-ltol-rule.
  - If your rule needs to change the SYNSEM value, determine which part of SYNSEM is changing (e.g., VAL only, HEAD only, CAT only) and choose an appropriate type out of the types called infl-***-change-only-ltol-rule. Unless you're adding any relations, your rule should also inherit from no-ccont-lex-rule. I expect most lexical rules created for this lab to be of the add-only variety, rather than changing information.
- Define a rule type in klingon.tdl which contains all of the information about your rule except the spelling changes. The value of DTR should be specific enough to constrain the rule to only applying to the right type of words. The value of SYNSEM should be at least as specific as the lexical entries you've been writing so far. Here's an example from English (where the value of SYNSEM ends up being very specific since all the information from the daughter is also in the mother):
```
3sg_verb-lex-rule := infl-ltow-rule &
  [ SYNSEM.LOCAL.CAT.VAL.SUBJ < [ LOCAL.CONT.HOOK.INDEX.PNG [ PER third,
							      NUM sg ]] >,
    DTR.SYNSEM.LOCAL.CAT.HEAD verb ].
```
- If you have multiple rules applying to the same form, constrain the innermost (rightmost prefix or leftmost suffix) to take lex-item as its DTR. The next one to apply should take the first rule as its DTR, etc. If multiple rules can appear in one slot, define a supertype for them which can be the DTR of the next rule type out.
- Define an instance of the rule type in irules.tdl. This instance should give the spelling change subrules on a line beginning with %prefix or %suffix. Assuming you're working from regularized morphophonology, these should be simple concatenation, of the form (* pref) or (* suff).
- A slightly more complicated example from English (without regularized morphophonology) follows. After %suffix there is a list of pairs in which the first member matches the input form and the second member describes the output form. * matches the empty string. ! signals a letter-set. More specific subrules go to the right.
```
3sg_verb :=
%suffix (!s !ss) (!ss !ssses) (ss sses)
3sg_verb-lex-rule.
```
  And here's the letter set that's used:
```
%(letter-set (!s abcedfghijklmnopqrtuvwxyz))
```
- Update your lexical entries so that they give the stem instead of the inflected word (i.e., so that your lexical rule can do the work). Any such stem entries should also be marked [INFLECTED -]. Consider making [INFLECTED -] a constraint on the relevant lexical types, so you don't have to remember to add it to every lexical entry.
- Test your grammar. Does the lexical rule apply to the words it should apply to? Does it apply to words it shouldn't apply to?
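  Putting the steps above together for the simple concatenation case, here is a minimal sketch of a plural noun rule. It assumes a common-noun-lex type like the one described earlier in this lab; the rule names and the suffix string "mey" are placeholders, so substitute your own.
```
; klingon.tdl
; Add-only rule producing fully inflected plural nouns.
pl_noun-lex-rule := infl-ltow-rule &
  [ SYNSEM.LOCAL.CONT.HOOK.INDEX.PNG.NUM pl,
    DTR common-noun-lex ].

; Stems of this type must go through a lexical rule before
; they can appear in a phrase.
common-noun-lex :+ [ INFLECTED - ].

; irules.tdl
; * matches the empty string, so the suffix attaches to any stem.
pl_noun :=
%suffix (* mey)
pl_noun-lex-rule.
```
  Note that once common nouns are [INFLECTED -], singular forms will presumably need a rule of their own as well, otherwise bare stems will stop parsing.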

Write up your analyses

- Describe the phenomena that you analyzed:
  - What are the person/number/noun class distinctions, and how are they marked?
  - What kinds of agreement (if any) does your language display?
  - If you analyzed something other than agreement, what was it?
  - Include examples in IGT format from your test suite that I can use to test out your analyses and/or help diagnose what is wrong.
- Describe how you analyzed the phenomena in your grammar, with reference to the particular types and features you used, as well as the semantic representations your grammar assigns.
- Document what happened when you tested generation, and what, if anything, you needed to do to fix it.
The descriptions of phenomena and analyses should be at least a page per phenomenon. If you feel that the analyses presented here don't sit well with your language, describe (as best you can) why not.

Submit your assignment

- Create a tarball of your grammar, your tsdb directory, and your write up. The best way to do this (so that it unpacks most easily when I download from CollectIt) is to cd into the directory containing your lab and do:
  tar czf lab5.tgz *
  (When I download your submission from CollectIt, it comes in a directory named with your UWNetID. The above method avoids extra directory structure inside that directory.)
- Upload the tarball to CollectIt
ebender at u dot washington dot edu
Last modified: Sat Feb 2 2008
