Linguistics 567: Grammar Engineering

Lab 4 Due 4/29

Read all the way through the assignment once before starting it. Once again I'll be asking for write ups, and weighting them heavily in the assignment grade.

Part 0: Matrix patch/[incr tsdb()] tips

As discussed in class on Monday, there is a new version of matrix.tdl with some fixes to the lexical rule supertypes.

Move your existing matrix.tdl to a new file name (e.g., matrix.tdl-) for safekeeping.
Download the new matrix.tdl and save it in your grammar directory.
Reload the grammar, and attend to any errors the LKB reports.
Parse your testsuite to check for any changes in behavior. If you're using [incr tsdb()], this can be done as follows:
- Run [incr tsdb()] (M-x itsdb in emacs).
- Set the Database root and Skeleton root to the directories you are using.
- Create a new instance of your testsuite from the last lab. (File > Create > Testsuite Lab 3)
- Process the testsuite (Process > All items)
- Select the most recent earlier testsuite as your (g)old standard, by middle-clicking (mouse button 2) on it. You should now have one testsuite highlighted in blue and one in gold.
- Get an overall comparison between the two testsuites by doing Compare > Competence. The table that appears should let you see whether any sentences have gained or lost analyses.
- For a more detailed comparison, select Compare > Detail. This will show you any sentences that have gained or lost readings.
- You can look for sentences whose analyses have changed in other ways (e.g., different derivations) by setting the flags under Compare > Intersection.
- To look within one testsuite at a particular subset of sentences, use the flags under Options > TSQL Condition. For example, to find the sentences that are parsing but shouldn't be, turn `illformed' and `analyzed' both on. Then do, e.g., Browse > Results.
- Since the TSQL Condition will affect many [incr tsdb()] podium actions, be sure to set it back to `no condition' when you are done.

Part 1: Optional arguments

Background

The goal of this lab is to allow for unexpressed arguments. As many of you have noticed, there are plenty of languages that don't use pronouns as much as English does, but rather leave the NP out entirely if it was just going to be a pronoun. Generally, the meaning is about as recoverable from context as it is with pronouns (after all, pronouns only give small clues to the referent in terms of person, number, and gender; among 3rd person referents, that usually leaves a lot of ambiguity). In some languages (e.g., Spanish), this kind of pronoun omission seems to be 'licensed' by the fact that the verbal inflections carry as much information as the pronouns would. In other languages (e.g., Japanese), the verbal inflections don't in fact carry person/number/gender information, but pronouns still aren't required.

Even in English (which likes pronouns so much that it has two expletive [meaningless] ones -- it and there) there are cases where arguments appear to be optional. Prime examples are verbs like eat and drink. The sentence I already ate means 'I already ate something', but the addressee is in no way expected to know what exactly was eaten. This is called indefinite null instantiation (see, e.g., Johnson and Fillmore 2000). This contrasts with definite null instantiation (ibid), in which null arguments have definite reference, that is, the utterance is only felicitous if the addressee can determine the referent. English verbs which do this include tell as in I already told you. (Which is a cute example, because it's most likely to be used in a case where the addressee can't determine what exactly s/he was already told, but it's licensed because it means something like 'I already told you the answer to that question'.)

Our general strategy is going to be similar to the way we handled missing determiners. That is, we're going to write unary phrase structure rules in which the mother and single daughter have different valence requirements.

I believe that most languages should fall into one of the following patterns (restricting our attention to verbs and their arguments):

1. Any argument can be left out with the definite interpretation. Non-subject arguments of certain verbs can also receive the indefinite interpretation when they are missing.
2. Any subject can be left out with the definite interpretation. Non-subject arguments of certain verbs can also be omitted. Their interpretation (definite or indefinite) is lexically determined by the verb.
3. Subjects are required, but non-subject arguments of certain verbs can be left out, with definite or indefinite reference depending on the verb.
4. In languages with optional agreement marking on the verb, it may be the case that `any old subject' or `any old direct object' can be left out only when agreement marking is present. Without agreement marking, you may only find lexically licensed definite/indefinite null instantiation.

I've written this assignment based on those four possibilities, and it should be straightforward to the extent that I'm right :-). If your language instantiates a different pattern, talk to me.

Create instances of rules

The Matrix provides definitions of basic-head-opt-subj-phrase and basic-head-opt-comp-phrase which should be specific enough.
If your language allows subject pro-drop, create an instance of basic-head-opt-subj-phrase in rules.tdl.
Parse a sentence without a subject to see if it works.
If your language allows object pro-drop (in general, or only certain arguments of certain verbs), create an instance of basic-head-opt-comp-phrase.
Parse a sentence with a missing object to see if it works.
(At the moment, this might overgenerate, as it should allow any complement to go missing, and not all languages allow that. We'll fix it presently.)
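
As a minimal sketch of the steps above, the entries in rules.tdl might look like the following. The instance names on the left of := are arbitrary labels chosen here for illustration; only the two type names on the right come from the Matrix.

```
;;; rules.tdl
;;; Instance names (head-opt-subj, head-opt-comp) are hypothetical;
;;; use whatever naming convention your other rule instances follow.

; Include this one only if your language allows subject pro-drop.
head-opt-subj := basic-head-opt-subj-phrase.

; Include this one only if your language allows (some) objects to be dropped.
head-opt-comp := basic-head-opt-comp-phrase.
```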

Clean up your verb type hierarchy

We're going to be creating some more subtypes of verbs, so it's time to do a little house cleaning, if you haven't already.

Define a type verb-lex which inherits from basic-verb-lex.
Make all of your other types which inherited from basic-verb-lex inherit from verb-lex instead.
Reload your grammar and parse a sentence to make sure things still work. Debug as necessary.
Now look at all of your subtypes of verb-lex and find what constraints they have in common.
Remove all of those repeated constraints from the subtypes and state them once on verb-lex instead.
Reload your grammar and parse a sentence to make sure things still work. Debug as necessary.
Look in lexicon.tdl at your verb lexical entries. If there are any constraints on those lexical entries other than values for STEM and LKEYS, move them up to the types instead. You may need to create some new subtypes in the process.
Reload your grammar and parse a sentence to make sure things still work. Debug as necessary.
Now run your test suite and check that your coverage hasn't changed. Debug as necessary.
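
To make the target of this clean-up concrete, here is a rough sketch of what the result could look like. It assumes that the Matrix provides intransitive-lex-item and transitive-lex-item (by analogy with ditransitive-lex-item), that the argument-structure feature is called ARG-ST, and that verbs in your language take a nominal subject; the particular shared constraints, the subtype names, and the PRED value are illustrative only and will differ in your grammar.

```
;;; Constraints common to all verbs are stated once, on verb-lex.
;;; (The specific constraints shown here are examples, not requirements.)
verb-lex := basic-verb-lex &
  [ SYNSEM.LOCAL.CAT [ HEAD verb,
                       VAL [ SPR < >,
                             SPEC < >,
                             SUBJ < #subj > ] ],
    ARG-ST < #subj & [ LOCAL.CAT.HEAD noun ], ... > ].

;;; Subtypes add only what is specific to them.
intrans-verb-lex := verb-lex & intransitive-lex-item &
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < > ].

trans-verb-lex := verb-lex & transitive-lex-item &
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < #obj >,
    ARG-ST < [ ], #obj & [ LOCAL.CAT.HEAD noun ] > ].

;;; lexicon.tdl: entries carry nothing but STEM and LKEYS values.
sleep := intrans-verb-lex &
  [ STEM < "sleep" >,
    SYNSEM.LKEYS.KEYREL.PRED "_sleep_v_rel" ].
```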

Add ditransitive verbs

Add a new subtype of verb-lex which inherits from the type ditransitive-lex-item (defined in matrix.tdl). This new subtype should have two elements on its COMPS list, and three elements on its ARG-ST list.
Create at least one instance of the new subtype in lexicon.tdl. Places to look for ditransitive verbs include the translations of give, sell, and tell.
Some languages don't allow two NP complements, and so you might see PP complements for the first time. If this comes up, talk to me.
You might find that you need to add a new value of CASE and a corresponding lexical rule.
Add examples (positive and negative) to your main test suite to test ditransitive verbs. (If you're using [incr tsdb()], you might consider making a new skeleton that includes the previous testsuite plus the new items, while keeping your original skeleton around.)
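
A ditransitive type along these lines might be sketched as follows. This assumes that verb-lex already identifies the first ARG-ST element with the subject, that both objects are NPs, and that the argument-structure feature in your matrix.tdl is named ARG-ST. The lexical entry and its PRED value are made up for illustration.

```
;;; Two complements on COMPS, three elements on ARG-ST.
ditrans-verb-lex := verb-lex & ditransitive-lex-item &
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < #obj1, #obj2 >,
    ARG-ST < [ ],
             #obj1 & [ LOCAL.CAT.HEAD noun ],
             #obj2 & [ LOCAL.CAT.HEAD noun ] > ].

;;; lexicon.tdl (hypothetical entry)
give := ditrans-verb-lex &
  [ STEM < "give" >,
    SYNSEM.LKEYS.KEYREL.PRED "_give_v_rel" ].
```

If one of the objects has to be a PP or bear a particular case in your language, the constraints on #obj1 and #obj2 are where that would be stated.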

Add verbal subtypes for argument optionality

For expository purposes, I'm assuming that you have subtypes of verb-lex called trans-verb-lex and ditrans-verb-lex. If you've called them something else, not to worry, just use your corresponding types whenever I mention these.

For languages without general pro-drop of objects

If you have an example of a transitive verb with an optional argument, create two subtypes of trans-verb-lex.
- One (for verbs whose arguments are not optional), should say [OPT -] on the complement.
- The other can leave the value of OPT unspecified, but may need to specify a value for DEF-OPT: + if the argument gets a definite interpretation when missing, - if the argument gets an indefinite interpretation, and left underspecified if it could be either.
- (You might find that you need multiple subtypes here, if you have examples of different verbs with different behavior.)
Edit lexicon.tdl so that the lexical entries which used to inherit from trans-verb-lex now inherit from your new subtypes.
If you have an example of a ditransitive verb with an optional argument, make analogous subtypes of ditrans-verb-lex and change your lexicon accordingly. Note that when dealing with ditransitive verbs, you have to pay attention to both elements of the COMPS list.
Otherwise, constrain ditrans-verb-lex itself to ensure that both arguments are [OPT -].
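
The subtypes described in this subsection could be sketched roughly as below. All of the new type names are hypothetical, and the sketch assumes that OPT and DEF-OPT can be constrained directly on the elements of the COMPS list, as the bracket notation above suggests.

```
;;; Transitive verbs whose object can never be dropped.
oblig-obj-trans-verb-lex := trans-verb-lex &
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < [ OPT - ] > ].

;;; Object may be dropped, with an indefinite reading
;;; (like English 'eat').
indef-drop-trans-verb-lex := trans-verb-lex &
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < [ DEF-OPT - ] > ].

;;; Object may be dropped, with a definite reading
;;; (like English 'tell').
def-drop-trans-verb-lex := trans-verb-lex &
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < [ DEF-OPT + ] > ].

;;; If no ditransitive verb allows a missing argument, the two [ OPT - ]
;;; constraints below belong on ditrans-verb-lex itself:
;;;   [ SYNSEM.LOCAL.CAT.VAL.COMPS < [ OPT - ], [ OPT - ] > ]
```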

For languages with general pro-drop of objects

If some, but not all, of your transitive verbs allow indefinite as well as definite interpretations of missing objects, (the translation of eat would be a likely suspect), create two subtypes of trans-verb-lex.
- One specifies [DEF-OPT +] on the complement.
- The other should leave DEF-OPT unspecified on the complement (thus allowing either interpretation).
Edit lexicon.tdl so that the lexical entries which used to inherit from trans-verb-lex now inherit from your new subtypes.
If you have an example of a ditransitive verb which allows indefinite null instantiation for one or more of its arguments, make analogous subtypes of ditrans-verb-lex and change your lexicon accordingly. Note that when dealing with ditransitive verbs, you have to pay attention to both elements of the COMPS list.
Otherwise, constrain ditrans-verb-lex itself to ensure that both arguments are [DEF-OPT +].
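
For this case, a minimal sketch (again with hypothetical type names, and assuming DEF-OPT can be constrained directly on COMPS elements) might be:

```
;;; Dropped object can only be interpreted as definite.
def-only-trans-verb-lex := trans-verb-lex &
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < [ DEF-OPT + ] > ].

;;; Dropped object can be definite or indefinite (e.g., the translation
;;; of 'eat'): DEF-OPT is simply left unspecified.
def-or-indef-trans-verb-lex := trans-verb-lex.

;;; If no ditransitive verb allows the indefinite reading, the constraint
;;; to state on ditrans-verb-lex itself is:
;;;   [ SYNSEM.LOCAL.CAT.VAL.COMPS < [ DEF-OPT + ], [ DEF-OPT + ] > ]
```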

For languages where argument optionality corresponds to the presence of optional verbal inflection

Modify your verb inflection lexical rules (if you have them) to change the OPT status of the relevant arguments.
Consider what the DEF-OPT value of those arguments should be, according to whether the verbal inflection is present or absent.
Make appropriate changes to DEF-OPT in the lexical entries and lexical rules.

Construct a test suite

Construct a test suite illustrating argument optionality in your language. Include positive examples (arguments being left out legitimately) and negative examples, if possible (arguments being left out where they shouldn't be). Be sure to include examples of both definite and indefinite null instantiation.
In addition to making a separate test suite for this assignment, add these sentences to your general test suite.
Use batch parse to test the syntactic coverage of your grammar over this test suite. Include the resulting test.out or a tsdb testsuite run (the latter is preferable) when you turn in your homework, and describe your coverage in your write up. (NB: I'll be allocating points to this description, whether or not you have perfect coverage, so do the write up!) If you're using [incr tsdb()] please also include the source file that you imported the testsuite items from, with glosses of the sentences (in commented out lines). I'm looking into how to include glosses in the [incr tsdb()] files directly...
Parse some representative sentences one at a time and check the semantics. Is the value of DEF on the relevant arguments correct? Describe the examples you tested and the results you found in your write-up (again, points allocated just for doing the write-up, whether or not your grammar has the right behavior yet).

Part 2: Modification

In this part, you will add basic functionality for intersective adjectives and adverbs.

Head-modifier rules

The Matrix distinguishes scopal from intersective modification. We're going to pretend that everything is intersective and just not worry about the scopal guys for now.

Create an instance of adj-head-int-phrase, an instance of head-adj-int-phrase, or both, depending on whether you need only prehead modifiers, only posthead modifiers, or both.
Try parsing a transitive sentence. Be surprised by the extra parse. Or, if you don't get an extra parse, try parsing some of your ungrammatical examples from earlier labs. Look at the enlarged trees (or try Parse > Compare) to see what's going on.
Constrain your existing subtypes of head (verb, noun, det) to be [MOD < >].
Try parsing the misbehaving sentence again.
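
The steps above can be sketched as follows. The rule instance names are arbitrary, and the :+ notation (which adds a constraint to a type that is already defined) is an assumption about what your version of the LKB accepts; if it does not, add the [ MOD < > ] constraint wherever you already constrain these head types.

```
;;; rules.tdl (include only the rule or rules your language needs)
adj-head-int := adj-head-int-phrase.    ; prehead modifiers
head-adj-int := head-adj-int-phrase.    ; posthead modifiers

;;; In your language-specific type file: heads that are not modifiers.
verb :+ [ MOD < > ].
noun :+ [ MOD < > ].
det :+ [ MOD < > ].
```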

Adjectives

Create a type adjective-lex which inherits from basic-adjective-lex. The following type works for English assuming that:
1. We're not worried about predicative adjectives or adjectives taking complements for now.
2. English has both pre-head and post-head modifiers (adj-head and head-adj, respectively), but simple adjectives are (almost) always prehead (hence the value of POSTHEAD).
3. We're only dealing with intersective adjectives (as stipulated).
```
adjective-lex := basic-adjective-lex & intersective-mod-lex &
  [ SYNSEM [ LOCAL [ CAT [ HEAD.MOD < [ LOCAL.CAT [ HEAD noun,
                                                    VAL.SPR ne-list ] ] >,
                           VAL [ SPR < >,
                                 SUBJ < >,
                                 COMPS < >,
                                 SPEC < > ],
                           POSTHEAD - ] ] ] ].
```
If adjectives in your language agree with nouns in person, number, gender, and/or case, add appropriate constraints to the MOD value of the adjectives. Consider writing lexical rules to create the inflected forms, or at least types representing the different possibilities.
Create one or more adjective instances.
(Adjectives separated from the NPs they belong to will have to await further developments. If your language allows this possibility, please document it in your write ups and in your test suites.)
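
An adjective instance in lexicon.tdl might then look something like this. The entry name, the spelling, and the PRED value are all invented for illustration; follow whatever conventions your existing lexical entries use.

```
;;; lexicon.tdl (hypothetical entry)
big := adjective-lex &
  [ STEM < "big" >,
    SYNSEM.LKEYS.KEYREL.PRED "_big_a_rel" ].
```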

Adverbs

Create one or more types for adverbs. The following type definition inherits from appropriately-defined Matrix supertypes, and constrains the modified constituent to be verbal.

```
adverb-lex := basic-adverb-lex & intersective-mod-lex &
  [ SYNSEM [ LOCAL [ CAT [ HEAD.MOD < [ LOCAL.CAT.HEAD verb ] >,
                           VAL [ SPR < >,
                                 SUBJ < >,
                                 COMPS < >,
                                 SPEC < > ] ] ] ] ].
```

Try parsing a sentence with an adverb and then generating to see where else the adverb can show up. If your language allows multiple attachment sites for the adverb, admire the results. If it doesn't, or doesn't allow *that* many, constrain them further.
In order to constrain the possible attachment sites for adverbs, you may need to constrain the value of POSTHEAD, or the value of SPR inside MOD or the value of LIGHT inside MOD.
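
Two ways of narrowing the attachment sites might be sketched as below. The subtype names are hypothetical, and the second type assumes that LIGHT can be constrained directly on the element of the MOD list; check the feature geometry in your matrix.tdl before relying on it.

```
;;; Adverbs that may only follow what they modify.
posthead-adverb-lex := adverb-lex &
  [ SYNSEM.LOCAL.CAT.POSTHEAD + ].

;;; Adverbs that may only attach to a lexical (light) verb,
;;; not to a larger verbal phrase.
light-adverb-lex := adverb-lex &
  [ SYNSEM.LOCAL.CAT.HEAD.MOD < [ LIGHT + ] > ].
```

After adding a constraint like these, generating from the same sentence again is a quick way to check that the unwanted adverb positions have gone away.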

Test your analysis

Add positive and negative examples to your testsuite illustrating the possible attachment sites of adjectives and adverbs (including, if appropriate, the inability of adverbs to attach to nouns and adjectives to verbs). If your language has adj-noun agreement, include positive and negative examples for that as well.
Run your testsuite and determine whether your grammar has the right behavior. Debug as necessary.
In your write up, describe the testsuite you created, and the coverage you achieved. If you are not reaching full coverage, describe what you think might be wrong. (Once again, points will be awarded here for the write up, even if the grammar hasn't achieved perfect coverage.)

Write up

The preferred format for write ups is plain text files.

Describe the sentences that you added to your test suite for ditransitive verbs, and any other things you needed to add (PPs, a new value for case, etc).
Describe the situation with respect to argument optionality in your language (as best you were able to determine it).
Describe the lexical types that you made to account for the argument optionality pattern you found and/or any changes to lexical entries.
Describe how you tested your argument optionality analysis, and any ways in which the grammar does not yet have the correct behavior. Speculate on what might need to be done to fix it.
Describe the modification facts (for adjectives and adverbs) that you found in your language.
Describe the constraints you needed to place on lexical types to get the right behavior.
Describe how you tested your modification analysis, and any ways in which the grammar does not yet have the correct behavior. Speculate on what might need to be done to fix it.

Submit via ESubmit

Be sure your matrix folder includes your write-up and test.out/tsdb files.
Consider removing the doc/ subdirectory in order to save space on ESubmit.
Compress the folder, and upload it to ESubmit.
Submit it by midnight Sunday night (preferably by Friday evening :-).
