Lab 6 (due 4/26 11:59 pm)

Overview

This week we have left the customization system behind and will begin tdl editing! The goals for this week are:

Extend your testsuite to cover examples of non-verbal predicates
Report initial testsuite & test corpus run
Implement NP, AP, and PP (or locative NP) predicates
Implement PP (or locative NP) modifiers
Test generation with the LKB
Get parsing and generation working with ace
Report final testsuite & test corpus run

There are several places in this lab where I ask you to contact me if your grammar requires information not in these instructions. Please read through this lab by class on Tuesday, preferably earlier, so we can start that conversation in a timely fashion.

Create a small testsuite for your additional phenomena

Add examples to your testsuite, according to the general instructions for testsuites and the formatting instructions, for non-verbal predicates and PPs or locative NPs acting as verbal modifiers.

Use make_item as before to create the item file from this extended testsuite and create a testsuite skeleton:

Make a subdirectory called lab6 inside tsdb/skeletons for your test suite.

Edit tsdb/skeletons/Index.lisp to include a line for this directory, e.g.:

(
((:path . "matrix") (:content . "matrix: A test suite created automatically from the test sentences given in the Grammar Matrix questionnaire."))
((:path . "corpus") (:content . "IGT provided by the linguist"))
((:path . "lab6") (:content . "Test suite collected for Lab 6."))
)

Initial testsuite run

Create and run initial testsuite instances for both the linguist-provided data and your small testsuite, using the initial grammar.
Note If your tsdb/ directory is inside a shared folder on VirtualBox, it will not work.
For each of these, explore the results, collect the following information to provide in your write up:
- How many items parsed?
- What is the average number of parses per parsed item?
- How many parses did the most ambiguous item receive?

Non-verbal predicates

Background

The goal of this part of this lab is to extend the grammars to cover sentences where the main (semantic) predicate is not a verb, i.e., NP, PP, and AP predicates. In some languages (including English) such predicates require the "support" of a particular bleached verb (the copula, or perhaps a verb of location). In others, they can serve as predicates on their own. In still other languages, we find a mix: The copula (or other verb) is required for certain types of predicates or in certain tenses but not others. Or the copula (or other verb) is optional: possible but not required.

It's also possible that in some languages the copula is optional in matrix clauses but required in embedded clauses. I haven't found an example like this yet, but I'd be curious to know about it if you find one.

Note that in some languages, NPs inflected for locative case (or similar) function like locative PPs in other languages.

As you work on this, practice incremental development: You should be loading your grammar and checking that it compiles frequently. Similarly, as soon as you've put in enough to get one new sentences parsing, try testing that before going to the next sentence. Once the sentence parses, run your full testsuite before moving on. This practice will help you catch bugs early which makes them easier to find.

Adpositions

Some of your grammars have adpositions already, but few, if any, have semantically contentful adpositions. You'll need to define these for this lab. The matrix provides a type basic-int-mod-adposition-lex, which should have most of the information required. Define a subtype with appropriate constraints on the MOD and VAL values, and try it out to see what else you might need to add.

Copula (AP or PP or locative NP predicates)

We analyze copulas as semantically empty auxiliaries. You may already have a type very similar to this, perhaps from the adjectives library. The tdl for a copula should look something like this:

copula-verb-lex := verb-lex-supertype & trans-first-arg-raising-lex-item-2 &
  [ SYNSEM.LOCAL [ CAT.VAL [ SUBJ < #subj >,
                             COMPS < #comps >,
                             SPR < >,
                             SPEC < > ],
                   CONT.HOOK.XARG #xarg ],
    ARG-ST < #subj &
             [ LOCAL [ CONT.HOOK.INDEX #xarg,
                       CAT [ VAL [ SPR < >,
                                   COMPS < > ],
                             HEAD noun ] ] ],
             #comps &
             [ LOCAL.CAT [ VAL [ COMPS < > ],
                           HEAD +jp ] ] > ].

You may also need to create verb-lex-supertype which inherits from some of the types that your verb-lex type does, but not all of them. In particular, you want to get the types that give it access to whatever verbal morphology is relevant, as well as constraining it to be [HEAD verb].

The constraint [HEAD +jp] on the complement specifies that the complement should be (headed by) an adjective or an adposition. Depending on where copulas are required in your language, you might want to change this. If you need to give adjectives or adpositions non-empty SUBJ lists (e.g., because they can be stand-alone predicates in some cases; see below), then you'll also want to constrain the COMPS's SUBJ to be < [ ] > (aka cons) to make sure that the subject isn't realized twice.

Note that the copula verb uses the XARG to do the linking (the relevant constraint is declared on the supertype trans-first-arg-raising-lex-item in matrix.tdl). This means that the adjectives and adpositions will need to link their ARG1 to their XARG. This should already be the case, but you should double check.

Copula (non-locative NP predicates)

We will follow the ERG in positing a different copula for use with NP predicates. This is because we don't want to give every noun a semantic argument position for a potential subject. The copula verb for NP predicates will instead introduce an elementary predication linking its subject and complement.

This means that in many languages, this copula might just be an ordinary transitive verb. It's not in English, because it also has auxiliary properties. If the NP-predicate-supporting-copula in your language differs in its behavior from (other) transitive verbs, post to Canvas.

The PRED value for this verb should be "_be_v_id_rel".

Locative NPs

For languages that express meanings like in the park with locative NPs (i.e. no adposition), we will write a non-branching phrase structure rule that builds a PP out of locative case NP. You'll also need a lexical rule creating the right form of the NP and constraining it to be [CASE loc] (or whatever you called your locative case). This lexical rule should fit into the same position class as your other case lexical rules.

Here is a sample PP over NP rule, from the Marathi grammar from 2014. This rule uses C-CONT to introduce the locative relation.

Note that this rule builds PPs that can either be the complement of a copula or function as modifiers of verbal projections. Locative NPs as stand-alone predicates would need a non-empty SUBJ value, with an NP on it, whose INDEX is identified with #xarg and whose CASE value is constrained as appopriate. Similarly, if your locative NPs can't be adverbial modifiers, then the mother of this rule should have an empty MOD list.

locative-pp-phrase := unary-phrase &
  [ SYNSEM [ LOCAL.CAT [ HEAD adp & [ MOD < [ LOCAL intersective-mod &
                                                    [ CAT.HEAD verb,
                                                      CONT.HOOK.INDEX #xarg ] ] > ],
		          VAL [ COMPS < >,
			        SUBJ < >, 
			        SPR < > ]]],
    C-CONT [ HOOK [ LTOP #ltop,
		    INDEX #index,
		    XARG #xarg ],
	     RELS <! arg12-ev-relation &
		   [ PRED "_loc_p_rel",
		     LBL #ltop,
		     ARG0 #index,
		     ARG1 #xarg,
		     ARG2 #dtr ] !>,
	     HCONS <! !>  ],
    ARGS < [ SYNSEM.LOCAL [ CAT [ HEAD noun & [CASE loc],
				  VAL.SPR < > ],
			    CONT.HOOK [ INDEX #dtr ]]] > ].

Locative verbs

In some languages, PP predicates appear with a locative verb that is not quite semantically bleached, but means something like "be-located". In this case, it seems at least arguably incorrect to have the verb not introduce any predicate of its own. Instead, it will be an example of trans-first-arg-raising-lex-item-1:

locative-verb-lex := verb-lex & trans-first-arg-control-lex-item &
  [ SYNSEM.LOCAL [ CAT.VAL [ SUBJ < #subj >,
                             COMPS < #comps >,
                             SPR < >,
                             SPEC < > ],
                   CONT.HOOK.XARG #xarg ],
    ARG-ST < #subj &
             [ LOCAL [ CONT.HOOK.INDEX #xarg,
                       CAT [ VAL [ SPR < >,
                                   COMPS < > ],
                             HEAD noun ] ] ],
             #comps &
             [ LOCAL.CAT [ VAL [ COMPS < > ],
                           HEAD adp ] ] > ].

Note that there are many share constraints between this and copula-verb-lex. If you have both, please make a supertype for the shared constraints.

The lexical entry for the locative verb can introduce "_be+located_v_rel" as its LKEYS.KEYREL.PRED.

If you have a locative verb that takes NP complements, then it is best analyzed as a simple transitive verb with the PRED value "_be+located_v_rel".

APs, PPs and locative NPs as stand-alone predicates

If your language allows APs and PPs as stand-alone predicates, the basic strategy is to modify the selecting contexts for sentences (initial symbol, clause embedding verbs) to generalize the requirements on HEAD. This needs to be done slightly differently depending on how tense/aspect are marked in these clauses.

For locative NPs as stand-alone predicates, modify the PP over NP rule introduced above to have a non-empty SUBJ list, as noted.

Note that some languages don't have adjectives at all, just a class of stative intransitive verbs. For present purposes, the definitive test is what happens when these elements modify nouns. If they appear to enter the same construction as relative clauses headed by transitive verbs (and non-stative intransitives), then they're just verbs. However, for the purposes of the MT exercise, it will be helpful to have their PRED values end in _a_rel, rather than _v_rel.

Non-empty SUBJ values

The first step is to get from the attributive entries for As or Ps (or both) to predicative uses. It may be possible to use one and the same lexical entry in both uses. To enable predicative uses, your As or Ps (or both) need to have non-empty SUBJ lists. The sole element of the SUBJ list should be an NP or PP as appropriate (with appropriate constraints on its CASE value), and share its INDEX with the XARG and ARG1 of the A/P. (This index sharing is the same as with the MOD value.)

Finally, if some but not all As or Ps can serve as predicates, you can handle this by declaring a new feature, PRD, on the type head. Make the attributive-only As/Ps [PRD -], and any predicative-only ones [PRD +]. Then edit the root condition to require [PRD +]. This can also be useful if you have different inflection for predicative v.\ attributive uses of adjectives.

head :+ [ PRD bool ].

Unrestricted tense/aspect

If an AP or PP stand-alone predicate has underspecified tense and aspect (i.e., can be used in any tense/aspect context) or if it actually takes tense/aspect markers directly, then you can allow for AP or PP predicates by redefining the selecting contexts. In particular:

Change the HEAD value on the root condition to allow adjectives and adpositions (+vj, +vp, or +vjp).
Change the HEAD value on the clausal complement position of clause-embedding verbs (as above).

Note that even if it is possible to use a copula for, e.g., past tense AP/PP predicate sentences, you might still have unrestricted tense/aspect on the copulaless counterparts of these sentences. The key question is whether the copulaless sentences are necessarily interpreted as having a particular tense/aspect value. If so, see the next section.

Restricted to (e.g.) present tense sentences

If APs or PPs without a copula are interpreted as having some specific tense/aspect value (e.g., present tense) then these sentences need to have their TENSE value constrained. I see several ways of doing this. Though none jumps out yet as ideal (especially at a cross-linguistic level), the third one is probably the best of the bunch. If you need one or more elaborated, please post to Canvas:

The selecting contexts are bifurcated allowing [HEAD verb] constituents (with any tense/aspect value) and [HEAD adp] or [HEAD adj] or [HEAD +jp] constituents with only a particular tense/aspect value. This would be reasonably easy for the root condition (you can have more than one, just define them in roots.tdl and then reference them in the definition of *start-symbol* in lkb/globals.lsp). It's a bit clunkier in the case of clause-embedding verbs, which would need two entries each.
There is a non-branching rule that turns a PP/AP headed constituent into something that looks like an S ([HEAD verb, SUBJ < >, COMPS < >]), and along the way fills in the tense information.
You write lexical rules to create predicative and attributive forms of As/Ps from uninflected base forms (even if there is no overt morphology involved). One rule gives [ PRD + ] forms which have the specific TENSE value required. The other makes [ PRD - ] forms. In this case, if the copula can combine with APs/PPs, it would actually take the [ PRD - ] ones, so it can fill in different tense information.

NPs as stand-alone predicates

Finally, we come to the case of (non-locative) NPs used as predicates without any supporting verb. As with NPs used as the complement of a copula, we need to do something to get an extra predication in. Here, I think the best solution is a non-branching non-headed phrase structure rule which takes an NP daughter and produces a VP mother. It should introduce the "_be_v_id_rel" relation through the C-CONT.RELS, linking the C-CONT.INDEX to the ARG0 of this relation. If NPs as stand-alone predicates necessarily get present tense interpretation, this rule can also fill in that information.

Here is a version of the rule we worked out in class for Halkomelem in 2013. Note that in Halkomelem (hur), the nouny predicates are actually N-bars. This means the rule has to fill in the quantifier rel as well as the "_be_v_id_rel".

n-bar-predicate-rule := unary-phrase & nocoord &
  [ SYNSEM.LOCAL.CAT [ HEAD verb,
		       VAL [ COMPS < >,
			     SUBJ < [ LOCAL [ CONT.HOOK.INDEX #arg1,
					      CAT [ HEAD noun,
						  VAL.SPR < > ] ] ] > ] ],
    C-CONT [ HOOK [ LTOP #ltop,
		    INDEX #index,
		    XARG #arg1 ],
	     RELS <! arg12-ev-relation &
		   [ PRED "_be_v_id_rel",
		     LBL #ltop,
		     ARG0 #index,
		     ARG1 #arg1,
		     ARG2 #arg2 ],
		   quant-relation &
		   [ PRED "exist_q_rel",
		     ARG0 #arg2,
		     RSTR #harg ] !>,
	     HCONS <! qeq & [ HARG #harg, LARG #larg ] !> ],
    ARGS < [ SYNSEM.LOCAL [ CAT [ HEAD noun,
				  VAL.SPR cons ],
			    CONT.HOOK [ INDEX #arg2,
					LTOP #larg ]]] > ].

If you also need a non-branching rule for tense-restricted PP or AP predicates, you might consider doing those the same way (VP over PP/AP), and sharing many constraints between the two rules. Note, however, that the PP/AP rule would have an empty C-CONT.RELS list.

Check your MRSs

Here are some sample MRSs to give you a sense of what we're looking for. Note that yours might differ in detail, because of e.g., different tense values or the use of a locative verb.

The cat is hungry.

The cat is in the park.

The cat is the dog.

Locative modifiers

The next step is to modify your grammar so that you can parse the equivalent of the two following sentences, i.e. sentences where PPs or locative NPs function as modifiers of verbal projections or as modifiers of nominal projections:

The cat sleeps in the park.
The cat in the park sleeps.

Create examples illustrating the phenomenon and add them to your testsuite.
Try parsing and see if they parse already.
If not, use interactive unification to find the point of failure (and post to Canvas with questions!).
Likely culprits:
- The adpositions don't inherit from intersective-mod-lex, and/or have a MOD value incompatible with the category you'd like them to modify.
- To get PPs modifying VPs in my English grammar, I had to remove SPR < > on main-verb-lex to. This didn't cause problems, because the head-spec and bare-np rules already don't expect verby head daughters.
Rerun your full testsuite to make sure you didn't add any additional unwarrented ambiguity.
- For my little English grammar, I had to constrain the PP to be [ POSTHEAD + ] so it couldn't appear as a pre-head modifier of VPs.
Once the examples are parsing, check your semantics. Note in particular the LBL and ARG1 values on _in_p_rel.

The cat sleeps in the park.

The cat in the park sleeps.

Test your grammar

Use your test suite to check the syntactic coverage of your grammar.
Examine the semantic representations you assign to each of the clause types.
Check for overgeneration (syntactic forms associated with one clause type showing up in other clause types, multiple parses for single sentences with spurious clause type assignments or lack of clausal semantics).

Test generation

To ensure that you don't send the LKB off on an impossible errand, first limit the size of its search space, by entering the following command at the lisp propmt:

(setf *maximum-number-of-edges* 400)

Find a short and simple sentence (just an intransitive verb and its sole argument) that parses. In the pop-up menu in the parse tree, select "generate". What happens?

If it runs out of edges before giving you any strings, try increasing that number to 1000. If that still doesn't do it, post to Canvas for help.

If you succeed with the short intransitive sentence, try a transtive, and then if that works, a clausal complement or clausal modifier sentence. Post to Canvas for help :)

Get set up with ace for parsing and generation

Try parsing with ace, according to the following instructions.

[Note: If you aren't using the VM, you may need to install ace. You can find instructions here.]

Compile your grammar. In the grammar directory in the terminal, type:
```
ace -G iso.dat -g ace/config.tdl
```
Where iso is actually the iso code for your language. This will produce a compiled grammar called iso.dat.
If you get any compilation errors, post to Canvas for help sorting them out. Be sure to include the text of the error.
To parse a sentence you can use this command:
```
ace -g iso.dat -l
```
This will invoke ace in a loop where it waits for an input sentence to parse and then puts the results (if any) into the LUI display. When you are done, Ctrl-C will break out of the loop.
Note: This may not be terribly convenient if you have lots of special characters. You can also pass inputs to ace by piping them in (STDIN). It's not clear to me how to get this to play nice with the LUI display, but you can get text output like this (assuming test is a file containing your test item):
```
cat test | ace -g iso.dat 
```
Further info on ace can be found here.
To test generation with ace, try:
```
ace -g iso.dat -Tf1 | ace -g iso.dat -e
```
This will take the MRS of the first parse (if any) of the input sentence and then try generating from it.

Run both the test corpus and the testsuite

Following the same procedure as usual, do test runs over both the testsuite and the test corpus.

Again, collect the following information to provide in your write up:

How many items parsed?
What is the average number of parses per parsed item?
How many parses did the most ambiguous item receive?
What sources of ambiguity can you identify?
For 4 newly parsing or otherwise fixed items, do any of the parses look reasonable in the semantics?

Write up

NB: While the test suite and grammar development is joint work, the write up should be done by one partner (the other will get a turn next week). The writing partner should have the non-writing partner review the write up and make suggestions.

Your write up should be a plain text file (not .doc, .rtf or .pdf) which includes the following:

Documentation of your analyses of non-verbal predicates and locative modifiers:
1. A descriptive statement of the facts of your language.
2. Illustrative IGT examples from your testsuite.
3. A statement of how you implemented the phenomenon (in terms of types you added/modified and particular tdl constraints). (Yes, I want to see actual tdl snippets.)
4. If the analysis is not (fully) working, a description of the problems you are encountering.
Documentation of what happened when you tried generating with the LKB. Did it work right away? If it didn't, but you were able to get it working, what did you have to do?
Documentation of what happened when you tried parsing and generating wiht ace. Did it work right away? If it didn't, but you were able to get it working, what did you have to do?

Submit your assignment

Be sure your write up and the text-file version of your test suite are included in your grammar directory.
Likewise, make sure to include your most current tsdb profile in the grammar directory (ideally inside tsdb/home/).
If you're using svn, export the grammar so I don't get all your .svn files:
```
svn export yourgrammar iso-lab6

For git, please do the equivalent.
```

Create a tarball:

      tar czf iso-lab6.tgz iso-lab6

Upload the tarball to Canvas under the name of the partner who did the write up.

Back to course page

Last modified: