Linguistics 567: Grammar Engineering

Lab 3 Due 4/15

Minimum requirements for this assignment

This assignment covers material which varies greatly from language to language. While you are welcome to do more, this section lays out what is actually required for this lab. The instructions given below are then a superset of what any one student needs to do.

Everyone should:

(0.) Download an updated matrix.tdl, and use it to replace your existing matrix.tdl.
(1.) Map out the space to be covered and create at least one [incr tsdb()] testsuite skeleton.
(2.) Implement a number distinction for whichever class of nouns (minimum case is just pronouns) is appropriate, and a person distinction.
(3.) Test your grammar using [incr tsdb()].
(4.) Write up the phenomena you have analyzed.

Then choose one of the following "packages". (In many cases, you may find it easier to write additional lexical rules than to enter by hand all the lexical entries you would otherwise need.)

If your language has both case and agreement: Case, one kind of agreement (e.g., subject-verb or determiner-noun), 2 lexical rules
If your language has agreement but no case: 2 kinds of agreement (e.g., subject-verb and determiner-noun), 2 lexical rules
If your language has case but not agreement: Case, 2 lexical rules for case, 1 lexical rule for something else (e.g., past-tense verbs).
If your language has neither case nor agreement, negotiate something else with me.

Instructions

Map out the space you intend to cover and create a [incr tsdb()] skeleton.

Choose one of the "packages" above that is appropriate for your language.
Develop a test suite (batch parsing file) which illustrates both the syntactic and the morphological ground you intend to cover.
- For example, if your language has case, your test suite should include grammatical sentences with NPs in the correct cases as well as ungrammatical sentences with NPs in incorrect cases.
- It should also include morphologically incorrect forms if there is anything at all tricky about the morphophonology (e.g., order of affixes, regularized forms for irregulars).
- It's up to you whether to include phenomena discussed in this lab that are not in your chosen "package". For example, if your language has determiner-noun agreement, but you are only implementing subject-verb agreement, it's up to you whether to include ungrammatical examples where the ungrammaticality is due to a determiner-noun mismatch.
- Likewise, you can restrict yourself to the vocabulary that's already in your grammar (with potentially some additions if there is some aspect of the phenomenon you would like to illustrate). No need to find an exhaustive list of all the irregular forms in the language!
Run [incr tsdb()] by typing M-x itsdb in emacs.
Create a subdirectory tsdb inside your matrix directory.
Create two subdirectories inside tsdb: home (where the test runs will be stored) and skeletons (where the templates for creating new test runs will be stored).
Copy the file Index.lisp to your skeletons directory.
In the [incr tsdb()] podium (window), under Options, set the value of Database Root to the path to the home directory. (For now, you'll need to do this every time you start [incr tsdb()], but there should be a way to do it automatically. I'm looking into it...) (Don't set the Skeleton Root yet, or you'll get the "Mysterious error".)
Use File > Import > Test items to import a test suite (in the same format as an LKB test file, with asterisks on the bad examples). Call the database "import1".
Create a subdirectory for your new skeleton (call it "lab3") and copy the files item and relations from home/import1 to skeletons/lab3.
Look at the file Index.lisp and observe that it has an entry for the new skeleton you made, with appropriate information about the path and the contents. Whenever you add a skeleton, you'll need to edit this file to add an entry.
Now use Options > Skeleton Root to set it to the path ending in tsdb/skeletons.
Remove the directory home/import1.
Use Options > Update > Database List. Observe that import1 disappears.
Load your grammar in the LKB.
Use File > Create > Testsuite Lab 3 to create a testsuite. Edit the pathname that appears in the dialogue window so that it contains no spaces or semicolons. (Consider editing the file Version.lsp so that it won't propose a string with spaces or semicolons.)
Use Process > All items to process the items in the test suite.
Play around a bit with the [incr tsdb()] menus to get a sense of how you can browse the results.

Pronouns, person and number distinctions

Because person and number information are also interpreted semantically, we want to record it regardless of whether it is syntactically relevant (i.e., whether it gets used for agreement).

Some of the instructions in this section are very specific (i.e., I'm giving you lots of answers) because I want you to have time to focus your efforts on other parts of the lab. Don't be surprised, then, when all of a sudden things get less specific!

Add the following type definitions to esperanto.tdl:
```
png :+ [ PER person,
         NUM number ].

person := *top*.
first := person.
second := person.
third := person.

number := *top*.
sg := number.
non-sg := number. ; use this one if your language only has sg-pl
dual := non-sg.   ; add these two if your language has sg-du-pl
pl := non-sg.
```
(The type non-sg is there to facilitate a mapping between languages with sg-pl systems and languages with sg-du-pl systems in the MT exercise. In some languages with a du-pl distinction it might also be useful language-internally. If your language makes more than a three-way distinction (some do!), talk to me.)
(If your language does person and number agreement with an elsewhere case -- like English non-3sg -- you may want to define subtypes of png which group the values of PER and NUM in interesting ways. If you want to know more about this, talk to me.)
If your language has gender/nounclass distinctions, you'll want to use this definition of png instead, along with appropriate definitions for subtypes of gender.
```
png :+ [ PER person,
         NUM number,
         GEND gender ].

gender := *top*.
...
```
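As a rough illustration of what the elided gender subtypes might look like, here is a sketch for a language with a two-way contrast. The type names masc and fem are my own placeholders (masc happens to match the French example in the Agreement section); a language with noun classes would need a different inventory.

```
; Hypothetical two-gender inventory; replace with your language's classes.
; (gender itself is already defined above, so only the subtypes are added.)
masc := gender.
fem := gender.
```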
Pronouns will be nouns that obligatorily undergo a covert-det rule (unless they optionally take determiners; if so, let me know!). To facilitate this, we're going to define some relation types (rather than strings) to use with determiners. Begin by adding the following to esperanto.tdl:
```
quantifier_rel := predsort.
pronoun_q_rel := quantifier_rel.
reg_quant_rel := quantifier_rel.
```

Now define types for the quantifier relations you already had, working from the following (and noting the hierarchical relationships):

```
demonstrative_q_rel := reg_quant_rel.
non+demonstrative_q_rel := reg_quant_rel.
proximal+dem_q_rel := demonstrative_q_rel. ; close to speaker
distal+dem_q_rel := demonstrative_q_rel.   ; away from speaker
remote+dem_q_rel := distal+dem_q_rel.      ; away from speaker and hearer
hearer+dem_q_rel := distal+dem_q_rel.      ; near hearer
def_q_rel := non+demonstrative_q_rel.      ; definite
indef_q_rel := non+demonstrative_q_rel.    ; indefinite
```

Reload your grammar and observe the effect these changes had on the type hierarchy under quantifier_rel (use View > Type Hierarchy). We'll use the contrast between reg_quant_rel and pronoun_q_rel to keep the pronouns and the other nouns from using each other's covert-det-rules.
Update your lexical entries for determiners to use the types defined above rather than strings.
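An updated determiner entry might look roughly like the following sketch. The entry name and stem are placeholders for an English-like definite article, and it assumes your determiner type is called determiner-lex (as in the French example in the Agreement section). The point is only that the PRED value is now a type (no quotes) rather than a string.

```
; Hypothetical entry: PRED is the type def_q_rel, not a string.
the := determiner-lex &
  [ STEM < "the" >,
    SYNSEM.LKEYS.KEYREL.PRED def_q_rel ].
```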
Last week, you defined one or more subtypes of basic-bare-np-phrase, with some value for the PRED of the quantifier relation that it introduced. You'll need to update those PRED values to the new type-based ones as well. In addition, you'll want to add a constraint to the head daughter's SPR value to make sure it's compatible with the quantifier relation you're introducing:
```
bare-np-phrase := basic-bare-np-phrase &
  [ HEAD-DTR.SYNSEM.LOCAL.CAT.VAL.SPR 
                < [ LOCAL.CONT.RELS < ! [ PRED #pred ] ! > ] >,
    C-CONT.RELS < ! [ PRED #pred & some_quant_rel ] ! > ].
```

Define another subtype of basic-bare-np-phrase which will work for pronouns:

```
pronoun-bare-np-phrase := basic-bare-np-phrase &
  [ HEAD-DTR.SYNSEM.LOCAL.CAT.VAL.SPR 
                < [ LOCAL.CONT.RELS < ! [ PRED #pred & pronoun_q_rel ] ! > ] >,
    C-CONT.RELS < ! [ PRED #pred ] ! > ].
```

Create an instance of pronoun-bare-np-phrase in rules.tdl.
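A rule instance is just a name paired with the rule type, so the entry in rules.tdl can be as short as the sketch below. The instance name on the left is an arbitrary choice of mine; any unused identifier should do.

```
; In rules.tdl. The instance name pronoun-bare-np is a placeholder.
pronoun-bare-np := pronoun-bare-np-phrase.
```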
Make subtypes of noun-lex for pronouns and common nouns:
```
pronoun-lex := noun-lex &
  [ SYNSEM [ LOCAL.CAT.VAL.SPR 
                < [ LOCAL.CONT.RELS < ! [PRED pronoun_q_rel] ! > ] >,
	     LKEYS.KEYREL.PRED 'pronoun_n_rel ] ].

common-noun-lex := noun-lex &
  [ SYNSEM.LOCAL [ CAT.VAL.SPR 
		      < [ LOCAL.CONT.RELS < ! [PRED reg_quant_rel] ! > ] >,
             CONT.HOOK.INDEX.PNG [ PER third ] ] ].
```
Note that pronoun-lex specifies a PRED value, so all pronouns will have the same one. The only difference will be in the person and number values. (Something will have to be said about demonstrative pronouns, probably something about what kind of quantifier relation they should appear with.) common-noun-lex is constrained to [PER third] since only pronouns have other PER values.
Create lexical entries for pronouns in lexicon.tdl, specifying PER, NUM and GEND values, as appropriate. Here's an example for English:
```
we := pronoun-lex &
        [ STEM < "we" >,
          SYNSEM.LOCAL.CONT.HOOK.INDEX.PNG [ PER first,
					     NUM non-sg ] ].
```
Update your lexical entries for common nouns to inherit from common-noun-lex and to specify number and gender information. If you're going to use a lexical rule for noun number, you might consider doing only a couple lexical entries now for testing purposes. If your language has a gender system, you might consider defining subtypes of common-noun-lex for each gender (which constrain the GEND value), and inheriting from those instead. (A similar thing could be done for number, but it's redundant if you're going to write a lexical rule.)
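If you take the gender-subtype route, the types could look something like this sketch. It assumes gender values named masc and fem and uses made-up type names (masc-noun-lex, fem-noun-lex); the sample entry leaves everything except NUM and PRED to the type it inherits from.

```
; Hypothetical gender-specific subtypes of common-noun-lex.
masc-noun-lex := common-noun-lex &
  [ SYNSEM.LOCAL.CONT.HOOK.INDEX.PNG.GEND masc ].

fem-noun-lex := common-noun-lex &
  [ SYNSEM.LOCAL.CONT.HOOK.INDEX.PNG.GEND fem ].

; An entry then only needs to state what the type doesn't already say.
chat := masc-noun-lex &
  [ STEM < "chat" >,
    SYNSEM [ LOCAL.CONT.HOOK.INDEX.PNG.NUM sg,
             LKEYS.KEYREL.PRED '_cat_n_rel ] ].
```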
Test your grammar by checking whether pronouns and common nouns can appear with or without determiners, and make sure that the results are what you want!
See whether adding in the person and number information cut down on some of the overgeneration you experienced with Lab 2.

Case

Define a feature CASE appropriate for the type noun (if you think it might also be appropriate for other types, talk to me).
```
noun :+ [ CASE case ].
```
Define a type case and subtypes as appropriate (e.g., nom, acc, dat, ...). There is no need to use consistent type names across grammars for our purposes here, as the MT exercise won't involve case (that is, case doesn't appear in semantic representations).
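Following the same pattern as the person and number types above, a small case hierarchy might be sketched like this. The three-case inventory is only an example; use whatever cases your language actually has.

```
; Example inventory only; add or remove subtypes to fit your language.
case := *top*.
nom := case.
acc := case.
dat := case.
```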

Add case information to the ARG-ST or valence features of your types for transitive and intransitive verbs, e.g.:

```
trans-verb-lex := basic-verb-lex & transitive-lex-item &
  [ SYNSEM.LOCAL [ CAT [ HEAD verb,
                         VAL [ SPR < >,
                               SUBJ < #subj & synsem
                                      & [ LOCAL.CAT [ HEAD noun &
                                                           [ CASE nom ],
                                                      VAL.SPR < > ] ] >,
                               COMPS < #comps
                                       & [ LOCAL.CAT [ HEAD noun &
                                                            [ CASE acc ],
                                                       VAL.SPR < > ] ] >,
                               SPEC < > ] ] ],
    ARG-ST < #subj, #comps > ].
```
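The intransitive counterpart could be sketched along the same lines. This assumes a type name intrans-verb-lex of my choosing and the Matrix supertype intransitive-lex-item; check both against your own Lab 2 grammar before copying anything.

```
; Sketch only: intrans-verb-lex is a hypothetical type name.
intrans-verb-lex := basic-verb-lex & intransitive-lex-item &
  [ SYNSEM.LOCAL.CAT [ HEAD verb,
                       VAL [ SPR < >,
                             SUBJ < #subj & synsem
                                    & [ LOCAL.CAT [ HEAD noun &
                                                         [ CASE nom ],
                                                    VAL.SPR < > ] ] >,
                             COMPS < >,
                             SPEC < > ] ],
    ARG-ST < #subj > ].
```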

(If some verbs in your language require quirky case marking, you might want to define a subtype of trans-verb-lex, say nom-acc-trans-verb-lex which encodes the regular pattern, and let most verbs inherit from it. The exceptional verbs would inherit directly from trans-verb-lex instead, and specify the cases they require in their lexical entries.)
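A minimal sketch of that division of labor follows. It assumes that trans-verb-lex itself leaves CASE unspecified on both arguments (unlike the version shown above), that dat is one of your case subtypes, and that the German-style verb entry is purely illustrative.

```
; Regular pattern: nominative subject, accusative object.
nom-acc-trans-verb-lex := trans-verb-lex &
  [ SYNSEM.LOCAL.CAT.VAL [ SUBJ < [ LOCAL.CAT.HEAD.CASE nom ] >,
                           COMPS < [ LOCAL.CAT.HEAD.CASE acc ] > ] ].

; Hypothetical quirky verb: inherits directly from trans-verb-lex
; and states its own (dative object) case frame in the entry.
helfen := trans-verb-lex &
  [ STEM < "helfen" >,
    SYNSEM [ LOCAL.CAT.VAL [ SUBJ < [ LOCAL.CAT.HEAD.CASE nom ] >,
                             COMPS < [ LOCAL.CAT.HEAD.CASE dat ] > ],
             LKEYS.KEYREL.PRED '_help_v_rel ] ].
```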

Modify your lexical entries for nouns to reflect their case values. If you're going to write a lexical rule for case inflection on nouns, just do one or two now to test this part of your grammar. Here's an example from English:

```
we := pronoun-lex &
        [ STEM < "we" >,
          SYNSEM.LOCAL [ CAT.HEAD.CASE nom,
                         CONT.HOOK.INDEX.PNG [ PER first,
                                               NUM non-sg ] ] ].

us := pronoun-lex &
        [ STEM < "us" >,
          SYNSEM.LOCAL [ CAT.HEAD.CASE acc,
                         CONT.HOOK.INDEX.PNG [ PER first,
                                               NUM non-sg ] ] ].
```

Test your grammar: Do sentences with nouns in the right case parse? Do sentences with nouns in the wrong case parse?

Agreement

Determine which element is doing the agreeing (e.g., in subject-verb agreement, it's the verb; in determiner-noun agreement, it's the determiner, arguably even if the noun itself doesn't overtly show the information being agreed upon).
Determine where in the feature structure for the agreeing element, the information it is agreeing with should be available (e.g., in subject-verb agreement, the information is available inside the verb's SUBJ value; in determiner-noun agreement, the information is available inside the determiner's SPEC feature; in adjective-noun agreement, the information is available inside the adjective's MOD feature).

Constrain the information in both places. (e.g., if you're doing determiner-noun agreement for number and gender in a Romance language, make sure your noun lexical entries specify the relevant values for number and gender. Then constrain the SPEC value of the determiner entries.)

Example from French:

```
chat := common-noun-lex &
        [ STEM < "chat" >,
          SYNSEM [ LOCAL.CONT.HOOK.INDEX.PNG [ NUM sg,
                                               GEND masc ],
                   LKEYS.KEYREL.PRED '_cat_n_rel ] ].

le := determiner-lex &
        [ STEM < "le" >,
          SYNSEM [ LOCAL.CAT.VAL.SPEC < [ LOCAL.CONT.HOOK.INDEX.PNG
                                            [ NUM sg,
                                              GEND masc ] ] >,
                   LKEYS.KEYREL.PRED def_q_rel ] ].
```
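To see the agreement constraint actually rule something out, it may help to add a contrasting pair. The sketch below assumes a gender value fem that is incompatible with masc; with it, "la table" should be able to unify while "la chat" and "le table" should not.

```
; Feminine counterparts, assuming fem := gender has been defined.
table := common-noun-lex &
        [ STEM < "table" >,
          SYNSEM [ LOCAL.CONT.HOOK.INDEX.PNG [ NUM sg,
                                               GEND fem ],
                   LKEYS.KEYREL.PRED '_table_n_rel ] ].

la := determiner-lex &
        [ STEM < "la" >,
          SYNSEM [ LOCAL.CAT.VAL.SPEC < [ LOCAL.CONT.HOOK.INDEX.PNG
                                            [ NUM sg,
                                              GEND fem ] ] >,
                   LKEYS.KEYREL.PRED def_q_rel ] ].
```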

Consider writing lexical rules to generate the appropriate lexical entries (e.g., singular and plural nouns, 2-person-plural-feminine verbs, etc).
Test your grammar: do sentences with agreement parse and sentences without agreement fail to parse?
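For subject-verb agreement the same idea applies, with the constraint stated inside the verb's SUBJ value instead of a determiner's SPEC value. The entry below is a sketch for English; intrans-verb-lex is an assumed name for your intransitive verb type, and a fully inflected entry like this is only a stopgap until you write a lexical rule.

```
; Hypothetical fully inflected entry: requires a third-person singular subject.
sleeps := intrans-verb-lex &
  [ STEM < "sleeps" >,
    SYNSEM [ LOCAL.CAT.VAL.SUBJ < [ LOCAL.CONT.HOOK.INDEX.PNG [ PER third,
                                                                NUM sg ] ] >,
             LKEYS.KEYREL.PRED '_sleep_v_rel ] ].
```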

Lexical rules

Pick a supertype for your rule:
- Determine whether your lexical rule needs to change SYNSEM information, or just add to it. (Examples: If the input has a non-empty SPR list and the output has an empty SPR list, that's changing information. If the output has more relations than the input, that's changing information. If the input has no value specified for CASE and the output is [CASE nom], that's just adding information.)
- Determine whether your lexical rule creates fully inflected forms, or whether there's more inflection you'd like to stack on top of it.
- Rules creating fully inflected forms and only adding information to SYNSEM can inherit from infl-ltow-rule.
- Rules creating not-yet fully inflected forms and only adding information should inherit from infl-add-only-no-ccont-ltol-rule.
- If your rule needs to change the SYNSEM value, determine which part of SYNSEM is changing (e.g., VAL only, HEAD only, CAT only) and choose an appropriate type out of the types called infl-***-change-only-ltol-rule.
Define a rule type in esperanto.tdl which contains all of the information about your rule except the spelling changes. The value of DTR should be specific enough to constrain the rule to only applying to the right type of words. The value of SYNSEM should be at least as specific as the lexical entries you've been writing so far. Here's an example from English (where the value of SYNSEM ends up being very specific since all the information from the daughter is also in the mother):
```
3sg_verb-lex-rule := infl-ltow-rule &
  [ SYNSEM.LOCAL.CAT.VAL.SUBJ < [ LOCAL.CONT.HOOK.INDEX.PNG [ PER third,
							      NUM sg ]] >,
    DTR.SYNSEM.LOCAL.CAT.HEAD verb ].
```
Define whatever letter classes you need in irules.tdl. They should all be at the beginning of the file. Here's the one I need for the English example:
```
%(letter-set (!s abcdefghijklmnopqrtuvwxyz))
```
Define an instance of the rule type in irules.tdl. This instance should give the spelling-change subrules on a line beginning with %prefix or %suffix. The rest of the line is a list of pairs in which the first member matches the input form and the second member describes the output form. * matches the empty string. ! signals a letter-set. More specific subrules go to the right.
```
3sg_verb :=
%suffix (!s !ss) (!ss !ssses) (ss sses)
3sg_verb-lex-rule.
```
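The same two-part recipe (a rule type plus an instance with spelling subrules) might be applied to noun number as sketched below. The rule names are placeholders, the rule assumes noun stems are left unspecified for NUM, and the single subrule (* s) simply appends s to every stem, so a real grammar would likely need more subrules. As written, the DTR constraint would also let pronouns through; tighten it if that matters for your language.

```
; In esperanto.tdl: hypothetical plural rule that only adds information.
pl_noun-lex-rule := infl-ltow-rule &
  [ SYNSEM.LOCAL.CONT.HOOK.INDEX.PNG.NUM pl,
    DTR.SYNSEM.LOCAL.CAT.HEAD noun ].

; In irules.tdl: * matches the empty string, so this appends "s".
pl_noun :=
%suffix (* s)
pl_noun-lex-rule.
```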
Define exceptional ("suppletive") forms in irregs.tab. The file should begin and end with a line consisting only of a double-quote character ("). In between, there is one line per suppletive form, listing the suppletive form, the rule name, and the stem form the suppletive form goes with. Here's an example from English (don't think too hard about it -- English doesn't have very good examples of suppletive forms in the present tense, given the grammar I'm laying out here).
```
"
is 3SG_VERB be
"
```
(If you're going to use irregs.tab, you'll need to edit the file lkb/globals.lsp to comment out the definition of *lex-rule-suffix*. Restart the LKB once you've done this. Sorry.)
Update your lexical entries so that they give the stem instead of the inflected word (i.e., so that your lexical rule can do the work). Any such stem entries should also be marked [INFLECTED -]. Consider making [INFLECTED -] a constraint on the relevant lexical types, so you don't have to keep remembering to type it.
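One way to state that constraint once per lexical type is a type addendum, sketched here under the assumption that your transitive verb type is called trans-verb-lex as in the Case section. Adding [ INFLECTED - ] directly inside the type's own definition should amount to the same thing.

```
; Every lexical entry inheriting from this type is now an uninflected stem.
trans-verb-lex :+ [ INFLECTED - ].
```

Keep in mind that once a type is marked this way, every form of those words you want to parse has to be produced by some lexical rule.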
Test your grammar. Does the lexical rule apply to the words it should apply to? Does it apply to words it shouldn't apply to?

Head-modifier rules

If you're doing adjectives because your only agreement involves adjectives, you're going to need one or two head-modifier rules.

The Matrix distinguishes scopal from intersective modification. We're going to pretend that everything is intersective and just not worry about the scopal guys for now.

Create an instance of head-adj-int-phrase, an instance of adj-head-int-phrase, or both, depending on whether you need only prehead modifiers, only posthead modifiers, or both.
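In rules.tdl this comes down to one line per rule, roughly as follows. The instance names on the left are my own placeholders; keep only the line or lines your language needs.

```
; In rules.tdl. Instance names are arbitrary.
head-adj-int := head-adj-int-phrase.   ; modifier follows the head
adj-head-int := adj-head-int-phrase.   ; modifier precedes the head
```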
Try parsing a transitive sentence. Be surprised by the extra parse. Or, if you don't get an extra parse, try parsing some of your ungrammatical examples from earlier labs. Look at the enlarged trees (or try Parse > Compare) to see what's going on.
Constrain your existing subtypes of head (verb, noun, det) to be [MOD < >].
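Using the same :+ addendum syntax as for CASE above, the constraint might be written as in this sketch. It assumes the three head types carry exactly these names in your grammar.

```
; Verbs, nouns, and determiners may not act as modifiers.
verb :+ [ MOD < > ].
noun :+ [ MOD < > ].
det :+ [ MOD < > ].
```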
Try parsing the misbehaving sentence again.

Adjectives

Create a type adjective-lex which inherits from basic-adjective-lex. The following type works for English assuming that:
1. We're not worried about predicative adjectives or adjectives taking complements for now.
2. We have both orders (head-adj and adj-head), but adjectives are always prehead (hence the value of POSTHEAD).
3. We're only dealing with intersective adjectives (as stipulated).
```
adjective-lex := basic-adjective-lex &
  [ SYNSEM [ LOCAL [ CAT [ HEAD adj &
				[ MOD < [ LOCAL [ CAT.HEAD noun,
						  CONT.HOOK [ INDEX #ind,
							      LTOP #ltop ]]]>],
			   VAL [ SPR < >,
				 SUBJ < >,
				 COMPS < >,
				 SPEC < > ],
			   POSTHEAD - ],
		     CONT.HOOK.LTOP #ltop ],
	     LKEYS.KEYREL.ARG1 #ind ] ].
```
Create one or more adjective instances.
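An adjective entry can be quite small, since adjective-lex already carries the modification and linking constraints. The following is a sketch for English; the entry name and the PRED value are illustrative choices.

```
; Hypothetical adjective entry in lexicon.tdl.
big := adjective-lex &
  [ STEM < "big" >,
    SYNSEM.LKEYS.KEYREL.PRED '_big_a_rel ].
```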

Test your grammar

Using the testsuite you developed under "Map out the space you intend to cover", test your grammar. Did you in fact cover everything you set out to?
Go back to the testsuites (test.items files) you made for labs 1 and 2. Does your grammar still have the right behavior over those examples? (This is called "regression testing" and it should be done on a regular basis.)
Create a master test file with all of the examples you've tested so far (i.e., the total of the test suites from all the labs) which you can keep adding to as you go.

Write up your analyses

Indicate which "package" you chose to do for this lab.
Indicate whether you decided to write any additional lexical rules.
Describe the phenomena that you analyzed (e.g., how the case system works in your language) indicating which aspects of the phenomenon you decided to cover (e.g., your language might have a split-ergative case system, which you can avoid for now by carefully choosing your verb tenses...).
Describe how you analyzed the phenomena in your grammar, with reference to the particular types and features you used.

Upload files to Dante

Submit via ESubmit

Make sure your write-up is inside the matrix folder.
Make sure your testsuite databases and skeletons are in the matrix folder.
Compress the folder, and upload it to ESubmit.
Submit it by midnight Sunday night (preferably by Friday evening :).
