Lab 7 (due 2/21 11:59 pm)

Overview

The goals for this lab are to get non-verbal predicates working, to keep making sure that MT works (now with the non-verbal predicates, unless transfer rules are required), to refine the semantics of definiteness marking (if applicable), and to further refine the target translations (iso.txt) file, if possible/applicable.

For tdl editing, please practice incremental development: Test as frequently as you possibly can, both by compiling the grammar and by testing specific sentences.

In general, this lab involves the following steps:

Understand how non-verbal predicates work in your language.
Build out/refine your testsuite for non-verbal predicates.
Ensure that your iso.txt file has translations for lines 15-17. (If you can't find these, please contact me for alternate plans.)
Run your Lab 6 grammar over your expanded testsuite.
Extend your grammar (via tdl editing) to handle non-verbal predicates.
Test your analysis with your testsuite.
Test your analysis with the MT system (items 15-17 in eng.txt).
Add appropriate semantic constraints to any marking of (in)definiteness as applicable.
Run the testsuite with the final version of your grammar.
Write it all up.

Build out your testsuite for non-verbal predicates

Can NPs, adjectives, or adpositional phrases function as predicates in your language? If so, do they require a copula in some or all cases? Does the form of the copula vary? Examples from English include:

NP: The winner is a doctor.
Adj: This dog is small.
PP: The cat is in the park.

Note that in some languages, the copula is required only in non-present tense, or only with NP predicates, etc. In other languages, there is not really a class of adjectives distinct from stative verbs. Or there may be two classes of adjective-like predicates: those that pattern with intransitive verbs (or at least appear without the copula) and those that require a copula or are otherwise differentiated. Likewise, some languages don't really have PPs in this use, but rather use a locative verb together with an NP. Or they have PPs, but combine them with a locative verb rather than a copula.

Look over what's in your testsuite for non-verbal predicates (from Lab 3). Do you need more examples? Are the examples sufficiently simple?

Non-verbal predicates

Background

The goal of this part of this lab is to extend the grammars to cover sentences where the main (semantic) predicate is not a verb, i.e., NP, PP, and AP predicates. In some languages (including English) such predicates require the "support" of a particular bleached verb (the copula, or perhaps a verb of location). In others, they can serve as predicates on their own. In still other languages, we find a mix: The copula (or other verb) is required for certain types of predicates or in certain tenses but not others. Or the copula (or other verb) is optional: possible but not required.

It's also possible that in some languages the copula is optional in matrix clauses but required in embedded clauses. I haven't found an example like this yet, but I'd be curious to know about it if you find one.

Note that in some languages, NPs inflected for locative case (or similar) function like locative PPs in other languages.

As you work on this, practice incremental development: You should be loading your grammar and checking that it compiles frequently. Similarly, as soon as you've put in enough to get one new sentence parsing, try testing that before going to the next sentence. Once the sentence parses, run your full testsuite before moving on. This practice will help you catch bugs early, which makes them easier to find.

Adpositions

Some of your grammars have adpositions already, but few, if any, have semantically contentful adpositions. You'll need to define these for this lab. The matrix provides a type basic-int-mod-adposition-lex, which should have most of the information required. Define a subtype with appropriate constraints on the MOD and VAL values, and try it out to see what else you might need to add.
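As a rough starting point, a subtype might look like the sketch below. The type name loc-adposition-lex, the lexical entry in, and the PRED value "_in_p_rel" are placeholders rather than anything the Matrix provides, and the choice of [HEAD verb] on MOD and a saturated NP complement are assumptions to adjust for your language.

```tdl
; Hypothetical subtype; adjust the MOD and VAL constraints to fit your language.
loc-adposition-lex := basic-int-mod-adposition-lex &
  [ SYNSEM [ LOCAL.CAT [ HEAD.MOD < [ LOCAL.CAT.HEAD verb ] >,
                         VAL [ SPR < >,
                               SUBJ < >,
                               SPEC < >,
                               COMPS < [ LOCAL [ CAT [ HEAD noun,
                                                       VAL.SPR < > ],
                                                 CONT.HOOK.INDEX #ind ] ] > ] ],
             LKEYS.KEYREL arg12-ev-relation &
                          [ ARG2 #ind ] ] ].

; Hypothetical lexical entry (would go in lexicon.tdl).
in := loc-adposition-lex &
  [ STEM < "in" >,
    SYNSEM.LKEYS.KEYREL.PRED "_in_p_rel" ].
```

If the grammar fails to load or the MRS is missing a link, that points to what else the subtype needs.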

Copula (AP or PP or locative NP predicates)

We analyze copulas as semantically empty auxiliaries. You may already have a type very similar to this, perhaps from the adjectives library. The tdl for a copula should look something like this:

copula-verb-lex := verb-lex-supertype & trans-first-arg-raising-lex-item-2 &
  [ SYNSEM.LOCAL [ CAT.VAL [ SUBJ < #subj >,
                             COMPS < #comps >,
                             SPR < >,
                             SPEC < > ],
                   CONT.HOOK.XARG #xarg ],
    ARG-ST < #subj &
             [ LOCAL [ CONT.HOOK.INDEX #xarg,
                       CAT [ VAL [ SPR < >,
                                   COMPS < > ],
                             HEAD noun ] ] ],
             #comps &
             [ LOCAL.CAT [ VAL [ COMPS < > ],
                           HEAD +jp ] ] > ].

You may also need to create verb-lex-supertype which inherits from some of the types that your verb-lex type does, but not all of them. In particular, you want to get the types that give it access to whatever verbal morphology is relevant, as well as constraining it to be [HEAD verb].

The constraint [HEAD +jp] on the complement specifies that the complement should be (headed by) an adjective or an adposition. Depending on where copulas are required in your language, you might want to change this. If you need to give adjectives or adpositions non-empty SUBJ lists (e.g., because they can be stand-alone predicates in some cases; see below), then you'll also want to constrain the COMPS's SUBJ to be < [ ] > (aka cons) to make sure that the subject isn't realized twice.
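One way to state the SUBJ constraint on the complement is sketched below as an addendum to copula-verb-lex; it could equally be folded directly into the type definition. The lexical entry be-cop and its STEM value are placeholders.

```tdl
; Sketch: require the complement to still have its one subject unrealized.
copula-verb-lex :+
  [ ARG-ST < [ ], [ LOCAL.CAT.VAL.SUBJ < [ ] > ] > ].

; Hypothetical copula entry (lexicon.tdl). No PRED value is given,
; since this copula is analyzed as semantically empty.
be-cop := copula-verb-lex &
  [ STEM < "be" > ].
```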

Note that the copula verb uses the XARG to do the linking (the relevant constraint is declared on the supertype trans-first-arg-raising-lex-item in matrix.tdl). This means that the adjectives and adpositions will need to link their ARG1 to their XARG. This should already be the case, but you should double check.

Copula (non-locative NP predicates)

We will follow the ERG in positing a different copula for use with NP predicates. This is because we don't want to give every noun a semantic argument position for a potential subject. The copula verb for NP predicates will instead introduce an elementary predication linking its subject and complement.

This means that in many languages, this copula might just be an ordinary transitive verb. It's not in English, because it also has auxiliary properties. If the NP-predicate-supporting-copula in your language differs in its behavior from (other) transitive verbs, post to Canvas.

The PRED value for this verb should be "_be_v_id_rel".
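If the NP copula does behave like an ordinary transitive verb, its lexical entry might look like this minimal sketch. The type name transitive-verb-lex stands in for whatever your grammar calls its transitive verb type, and the entry name and STEM value are placeholders.

```tdl
; Hypothetical entry: substitute your own transitive verb type and stem.
be-id := transitive-verb-lex &
  [ STEM < "be" >,
    SYNSEM.LKEYS.KEYREL.PRED "_be_v_id_rel" ].
```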

Locative NPs

For languages that express meanings like in the park with locative NPs (i.e. no adposition), we will write a non-branching phrase structure rule that builds a PP out of a locative-case NP. You'll also need a lexical rule creating the right form of the NP and constraining it to be [CASE loc] (or whatever you called your locative case). This lexical rule should fit into the same position class as your other case lexical rules.
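The lexical rule could be sketched roughly as follows. This assumes your case values are subtypes of a type named case; case-lex-rule-super is a placeholder for whatever supertype your existing case rules share, and loc-lex-rule is likewise a made-up name.

```tdl
; Assumes the case hierarchy is rooted in a type named case.
loc := case.

; Hypothetical rule; case-lex-rule-super is a placeholder for the
; supertype (position class) of your existing case lexical rules.
loc-lex-rule := case-lex-rule-super & inflecting-lex-rule &
  [ SYNSEM.LOCAL.CAT.HEAD.CASE loc ].
```

The affix itself would still need to be spelled out wherever your grammar declares its other inflectional rules.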

Here is a sample PP over NP rule, from the Marathi grammar from 2014. This rule uses C-CONT to introduce the locative relation.

Note that this rule builds PPs that can either be the complement of a copula or function as modifiers of verbal projections. Locative NPs as stand-alone predicates would need a non-empty SUBJ value, with an NP on it, whose INDEX is identified with #xarg and whose CASE value is constrained as appropriate. Similarly, if your locative NPs can't be adverbial modifiers, then the mother of this rule should have an empty MOD list.

locative-pp-phrase := unary-phrase &
  [ SYNSEM [ NON-LOCAL #nl,
             LOCAL.CAT [ HEAD adp & [ MOD < [ LOCAL intersective-mod &
                                                    [ CAT.HEAD verb,
                                                      CONT.HOOK.INDEX #xarg ] ] > ],
                         VAL [ COMPS < >,
                               SUBJ < >,
                               SPR < > ] ] ],
    C-CONT [ HOOK [ LTOP #ltop,
                    INDEX #index,
                    XARG #xarg ],
             RELS.LIST < arg12-ev-relation &
                         [ PRED "_loc_p_rel",
                           LBL #ltop,
                           ARG0 #index,
                           ARG1 #xarg,
                           ARG2 #dtr ] >,
             HCONS.LIST < > ],
    ARGS < [ SYNSEM [ NON-LOCAL #nl,
                      LOCAL [ CAT [ HEAD noun & [ CASE loc ],
                                    VAL.SPR < > ],
                              CONT.HOOK [ INDEX #dtr ] ] ] ] > ].

Locative verbs

In some languages, PP predicates appear with a locative verb that is not quite semantically bleached, but means something like "be-located". In this case, it seems at least arguably incorrect to have the verb not introduce any predicate of its own. Instead, it will be an example of trans-first-arg-control-lex-item:

locative-verb-lex := verb-lex & trans-first-arg-control-lex-item &
  [ SYNSEM.LOCAL [ CAT.VAL [ SUBJ < #subj >,
                             COMPS < #comps >,
                             SPR < >,
                             SPEC < > ],
                   CONT.HOOK.XARG #xarg ],
    ARG-ST < #subj &
             [ LOCAL [ CONT.HOOK.INDEX #xarg,
                       CAT [ VAL [ SPR < >,
                                   COMPS < > ],
                             HEAD noun ] ] ],
             #comps &
             [ LOCAL.CAT [ VAL [ COMPS < > ],
                           HEAD adp ] ] > ].

Note that there are many shared constraints between this and copula-verb-lex. If you have both, please make a supertype for the shared constraints.
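One possible factoring is sketched here. The supertype name pred-support-verb-lex is made up for illustration, and using lex-item as its parent is an assumption; the two subtypes keep the supertypes and HEAD constraints from the definitions above.

```tdl
; Hypothetical supertype holding the constraints common to both verb types.
pred-support-verb-lex := lex-item &
  [ SYNSEM.LOCAL [ CAT.VAL [ SUBJ < #subj >,
                             COMPS < #comps >,
                             SPR < >,
                             SPEC < > ],
                   CONT.HOOK.XARG #xarg ],
    ARG-ST < #subj &
             [ LOCAL [ CONT.HOOK.INDEX #xarg,
                       CAT [ VAL [ SPR < >,
                                   COMPS < > ],
                             HEAD noun ] ] ],
             #comps &
             [ LOCAL.CAT.VAL.COMPS < > ] > ].

; The subtypes now differ only in their other supertypes and the complement's HEAD.
copula-verb-lex := pred-support-verb-lex & verb-lex-supertype &
                   trans-first-arg-raising-lex-item-2 &
  [ ARG-ST < [ ], [ LOCAL.CAT.HEAD +jp ] > ].

locative-verb-lex := pred-support-verb-lex & verb-lex &
                     trans-first-arg-control-lex-item &
  [ ARG-ST < [ ], [ LOCAL.CAT.HEAD adp ] > ].
```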

The lexical entry for the locative verb can introduce "_be+located_v_rel" as its LKEYS.KEYREL.PRED.
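A minimal sketch of such an entry, with a placeholder entry name and STEM value:

```tdl
; Hypothetical entry (lexicon.tdl); replace the stem with your language's verb.
be-located := locative-verb-lex &
  [ STEM < "be-located" >,
    SYNSEM.LKEYS.KEYREL.PRED "_be+located_v_rel" ].
```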

If you have a locative verb that takes NP complements, then it is best analyzed as a simple transitive verb with the PRED value "_be+located_v_rel".

APs, PPs and locative NPs as stand-alone predicates

If your language allows APs and PPs as stand-alone predicates, the basic strategy is to modify the selecting contexts for sentences (initial symbol, clause embedding verbs) to generalize the requirements on HEAD. This needs to be done slightly differently depending on how tense/aspect are marked in these clauses.

For locative NPs as stand-alone predicates, modify the PP over NP rule introduced above to have a non-empty SUBJ list, as noted.

Note that some languages don't have adjectives at all, just a class of stative intransitive verbs. For present purposes, the definitive test is what happens when these elements modify nouns. If they appear to enter the same construction as relative clauses headed by transitive verbs (and non-stative intransitives), then they're just verbs. However, for the purposes of the MT exercise, it will be helpful to have their PRED values end in _a_rel, rather than _v_rel.

Non-empty SUBJ values

The first step is to get from the attributive entries for As or Ps (or both) to predicative uses. It may be possible to use one and the same lexical entry in both uses. To enable predicative uses, your As or Ps (or both) need to have non-empty SUBJ lists. The sole element of the SUBJ list should be an NP or PP as appropriate (with appropriate constraints on its CASE value), and share its INDEX with the XARG and ARG1 of the A/P. (This index sharing is the same as with the MOD value.)
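For adjectives, the SUBJ constraint and index sharing could be sketched as below. The names pred-adj-lex and adj-lex are placeholders for your own types, and the NP subject with an empty SPR list is an assumption; add a CASE constraint on the subject if your language needs one.

```tdl
; Hypothetical adjective type with a subject; adj-lex stands in for your adjective type.
pred-adj-lex := adj-lex &
  [ SYNSEM [ LOCAL [ CAT.VAL.SUBJ < [ LOCAL [ CAT [ HEAD noun,
                                                    VAL.SPR < > ],
                                              CONT.HOOK.INDEX #xarg ] ] >,
                     CONT.HOOK.XARG #xarg ],
             LKEYS.KEYREL.ARG1 #xarg ] ].
```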

Finally, if some but not all As or Ps can serve as predicates, you can handle this by declaring a new feature, PRD, on the type head. Make the attributive-only As/Ps [PRD -], and any predicative-only ones [PRD +]. Then edit the root condition to require [PRD +]. This can also be useful if you have different inflection for predicative vs. attributive uses of adjectives.

head :+ [ PRD bool ].
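With PRD declared, the lexical split could be sketched as follows. The type names are placeholders, with adj-lex standing in for your existing adjective type; adjectives that allow both uses would simply leave PRD underspecified.

```tdl
; Hypothetical subtypes for adjectives restricted to one use.
attr-only-adj-lex := adj-lex &
  [ SYNSEM.LOCAL.CAT.HEAD.PRD - ].

pred-only-adj-lex := adj-lex &
  [ SYNSEM.LOCAL.CAT.HEAD.PRD + ].
```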

Unrestricted tense/aspect

If an AP or PP stand-alone predicate has underspecified tense and aspect (i.e., can be used in any tense/aspect context) or if it actually takes tense/aspect markers directly, then you can allow for AP or PP predicates by redefining the selecting contexts. In particular:

Change the HEAD value on the root condition to allow adjectives and adpositions (+vj, +vp, or +vjp).
Change the HEAD value on the clausal complement position of clause-embedding verbs (as above).
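For the root condition, the change might look like the sketch below. This is not your actual root definition: it assumes the root type is named root and shows only the HEAD and VAL constraints, so any other constraints already in your roots.tdl would need to be carried over.

```tdl
; Sketch of a root condition that admits verbal, adjectival, and
; adpositional predicates. Keep your existing constraints alongside these.
root := phrase &
  [ SYNSEM.LOCAL.CAT [ HEAD +vjp,
                       VAL [ SUBJ < >,
                             COMPS < > ] ] ].
```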

Note that even if it is possible to use a copula for, e.g., past tense AP/PP predicate sentences, you might still have unrestricted tense/aspect on the copulaless counterparts of these sentences. The key question is whether the copulaless sentences are necessarily interpreted as having a particular tense/aspect value. If so, see the next section.

Restricted to (e.g.) present tense sentences

If APs or PPs without a copula are interpreted as having some specific tense/aspect value (e.g., present tense) then these sentences need to have their TENSE value constrained. I see several ways of doing this. Though none jumps out yet as ideal (especially at a cross-linguistic level), the third one is probably the best of the bunch. If you need one or more elaborated, please post to Canvas:

1. The selecting contexts are bifurcated allowing [HEAD verb] constituents (with any tense/aspect value) and [HEAD adp] or [HEAD adj] or [HEAD +jp] constituents with only a particular tense/aspect value. This would be reasonably easy for the root condition (you can have more than one, just define them in roots.tdl and then reference them in the definition of *start-symbol* in lkb/globals.lsp). It's a bit clunkier in the case of clause-embedding verbs, which would need two entries each.
2. There is a non-branching rule that turns a PP/AP headed constituent into something that looks like an S ([HEAD verb, SUBJ < >, COMPS < >]), and along the way fills in the tense information.
3. You write lexical rules to create predicative and attributive forms of As/Ps from uninflected base forms (even if there is no overt morphology involved). One rule gives [ PRD + ] forms which have the specific TENSE value required. The other makes [ PRD - ] forms. In this case, if the copula can combine with APs/PPs, it would actually take the [ PRD - ] ones, so it can fill in different tense information.
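The lexical-rule option could be sketched for adjectives as below. The rule names, the daughter type adj-lex, and the tense value present are all placeholders for whatever your grammar defines, and the sketch assumes the base forms leave PRD and TENSE underspecified so that the rules can add them.

```tdl
; Hypothetical zero-marked rules; adj-lex and present are placeholders.
pred-adj-lex-rule := constant-lex-rule & add-only-no-ccont-rule &
  [ SYNSEM.LOCAL [ CAT.HEAD.PRD +,
                   CONT.HOOK.INDEX.E.TENSE present ],
    DTR adj-lex ].

attr-adj-lex-rule := constant-lex-rule & add-only-no-ccont-rule &
  [ SYNSEM.LOCAL.CAT.HEAD.PRD -,
    DTR adj-lex ].
```

If there is overt morphology for either form, the corresponding rule would be an inflecting rule instead.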

NPs as stand-alone predicates

Finally, we come to the case of (non-locative) NPs used as predicates without any supporting verb. As with NPs used as the complement of a copula, we need to do something to get an extra predication in. Here, I think the best solution is a non-branching non-headed phrase structure rule which takes an NP daughter and produces a VP mother. It should introduce the "_be_v_id_rel" relation through the C-CONT.RELS, linking the C-CONT.INDEX to the ARG0 of this relation. If NPs as stand-alone predicates necessarily get present tense interpretation, this rule can also fill in that information.

Here is a version of the rule we worked out in class for Halkomelem in 2013. Note that in Halkomelem (hur), the nouny predicates are actually N-bars. This means the rule has to fill in the quantifier rel as well as the "_be_v_id_rel".

n-bar-predicate-rule := unary-phrase & nocoord &
  [ SYNSEM [ LOCAL.CAT [ HEAD verb,
                         VAL [ COMPS < >,
                               SUBJ < [ LOCAL [ CONT.HOOK.INDEX #arg1,
                                                CAT [ HEAD noun,
                                                      VAL.SPR < > ] ] ] > ] ],
             NON-LOCAL #nl ],
    C-CONT [ HOOK [ LTOP #ltop,
                    INDEX #index,
                    XARG #arg1 ],
             RELS.LIST < arg12-ev-relation &
                         [ PRED "_be_v_id_rel",
                           LBL #ltop,
                           ARG0 #index,
                           ARG1 #arg1,
                           ARG2 #arg2 ],
                         quant-relation &
                         [ PRED "exist_q_rel",
                           ARG0 #arg2,
                           RSTR #harg ] >,
             HCONS.LIST < qeq & [ HARG #harg, LARG #larg ] > ],
    ARGS < [ SYNSEM [ LOCAL [ CAT [ HEAD noun,
                                    VAL.SPR cons ],
                              CONT.HOOK [ INDEX #arg2,
                                          LTOP #larg ] ],
                      NON-LOCAL #nl ] ] > ].

If you also need a non-branching rule for tense-restricted PP or AP predicates, you might consider doing those the same way (VP over PP/AP), and sharing many constraints between the two rules. Note, however, that the PP/AP rule would have an empty C-CONT.RELS list.

Check your MRSs

Here are some sample MRSs to give you a sense of what we're looking for. Note that yours might differ in detail, because of e.g., different tense values or the use of a locative verb.

The cat is hungry.

The cat is in the park.

The cat is the dog.

Test items 15-17 in the MMT system and observe whether they work, and if not, how the MRSes differ. Is this a difference you can fix by editing your own grammar, or does it require transfer rules?

Demonstratives and definiteness

NOTE: I am providing info on demonstratives because they help provide context for the COG-ST feature and just in case there's nothing to be done with definiteness otherwise in a grammar. If you have plenty to do between non-verbal predicates and definiteness marking, demonstratives are strictly optional.

The basics

We are modeling the cognitive status attributed to discourse referents by particular referring expressions through a pair of features COG-ST and SPECI on ref-ind (the value of INDEX for nouns). Here is our first-pass guess at the cognitive status associated with various types of overt expressions (for dropped arguments, see below):

Marker	COG-ST value	SPECI value
Personal pronoun	activ-or-more	+
Demonstrative article/adjective	activ+fam
Definite article/inflection	uniq+fam+act
Indefinite article/inflection	type-id

If you have any overt personal pronouns, constrain their INDEX values to be [COG-ST activ-or-more, SPECI + ].
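As a sketch, assuming your pronouns share a lexical type (called pronoun-lex here purely for illustration), the constraint could be added like this:

```tdl
; pronoun-lex is a placeholder for your own pronoun type.
pronoun-lex :+
  [ SYNSEM.LOCAL.CONT.HOOK.INDEX [ COG-ST activ-or-more,
                                   SPECI + ] ].
```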

If you have any determiners which mark definiteness, have them constrain the COG-ST of their SPEC appropriately. For demonstrative determiners, see below.
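For definite and indefinite articles, the constraints might look like the sketch below. The type names are placeholders, with determiner-lex standing in for your existing determiner type; the COG-ST values come from the table above.

```tdl
; Hypothetical determiner subtypes constraining the COG-ST of the noun they specify.
def-determiner-lex := determiner-lex &
  [ SYNSEM.LOCAL.CAT.VAL.SPEC < [ LOCAL.CONT.HOOK.INDEX.COG-ST uniq+fam+act ] > ].

indef-determiner-lex := determiner-lex &
  [ SYNSEM.LOCAL.CAT.VAL.SPEC < [ LOCAL.CONT.HOOK.INDEX.COG-ST type-id ] > ].
```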

If you have any nominal inflections associated with discourse status, implement lexical rules which add them and constrain the COG-ST value appropriately, or add COG-ST constraints to the lexical rules defined by the customization system.

Note that in some cases an unmarked form is underspecified, where in others it stands in contrast to a marked form. You should figure out which is the case for any unmarked forms in your language (e.g., bare NPs in a language with determiners, unmarked nouns in a language with definiteness markers), and constrain the unmarked forms appropriately. For bare NPs, the place to do this is the bare NP rule (note that you might have to create separate bare NP rules for pronouns v. common nouns in this case). For definiteness affixes, you'll want a constant-lex-rule that constrains COG-ST, and that is parallel to the inflecting-lex-rule that adds the affix for the overtly marked case.
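For a language with a definite suffix whose absence signals indefiniteness, the paired rules could be sketched as follows. The rule names and the daughter type noun-lex are placeholders, and the COG-ST value on the unmarked rule is an assumption that depends on what the unmarked form actually means in your language.

```tdl
; Hypothetical rule for the overtly marked (definite) form.
def-noun-lex-rule := inflecting-lex-rule & add-only-no-ccont-rule &
  [ SYNSEM.LOCAL.CONT.HOOK.INDEX.COG-ST uniq+fam+act,
    DTR noun-lex ].

; Hypothetical parallel rule for the unmarked form, with no change in spelling.
indef-noun-lex-rule := constant-lex-rule & add-only-no-ccont-rule &
  [ SYNSEM.LOCAL.CONT.HOOK.INDEX.COG-ST type-id,
    DTR noun-lex ].
```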

Some languages have agreement for definiteness on adjectives. In this case, you'll want to add lexical rules for adjectives that constrain the COG-ST of the item on their MOD list.
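A definiteness agreement rule for adjectives might be sketched like this, with placeholder names for the rule and for the daughter type adj-lex:

```tdl
; Hypothetical rule: the definite form of the adjective requires a definite noun.
def-adj-lex-rule := inflecting-lex-rule & add-only-no-ccont-rule &
  [ SYNSEM.LOCAL.CAT.HEAD.MOD < [ LOCAL.CONT.HOOK.INDEX.COG-ST uniq+fam+act ] >,
    DTR adj-lex ].
```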

Some languages have different case marking for definites vs. indefinites. For such situations, it's likely best to have the verb selecting for the specific case value also constrain the definiteness of the argument (though note that we should think through how this interacts with argument optionality).

Demonstratives

Note: Working on demonstratives is OPTIONAL (see note above).

All demonstratives (determiners, adjectives and pronouns [NB: demonstrative pronouns are not on the todo list this year]) will share a set of relations which express the proximity to hearer and speaker. We will arrange these relations into a hierarchy so that languages with just a one- or two-way distinction can be more easily mapped to languages with a two- or three-way distinction. In order to do this, we're using types for these PRED values rather than strings. Note the absence of quotation marks. We will treat the demonstrative relations as adjectival relations, no matter how they are introduced (via pronouns, determiners, or adjectives).

There are (at least) two different types of three-way distinctions. Here are two of them. Let me know if your language isn't modeled by either.

demonstrative_a_rel := predsort.
proximal+dem_a_rel := demonstrative_a_rel. ; close to speaker
distal+dem_a_rel := demonstrative_a_rel.   ; away from speaker
remote+dem_a_rel := distal+dem_a_rel.      ; away from speaker and hearer
hearer+dem_a_rel := distal+dem_a_rel.      ; near hearer

demonstrative_a_rel := predsort.
proximal+dem_a_rel := demonstrative_a_rel. ; close to speaker
distal+dem_a_rel := demonstrative_a_rel.   ; away from speaker
mid+dem_a_rel := distal+dem_a_rel.         ; away, but not very far away
far+dem_a_rel := distal+dem_a_rel.         ; very far away

Demonstrative adjectives

Demonstrative adjectives come out as the easy case in this system. They are just like regular adjectives, except that in addition to introducing a relation whose PRED value is one of the subtypes of demonstrative_a_rel defined above, they also constrain the INDEX.COG-ST of their MOD value to be activ+fam.
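A sketch of a demonstrative adjective type and one entry, assuming adj-lex is your existing adjective type; the names dem-adj-lex and this-adj and the STEM value are placeholders.

```tdl
; Hypothetical type: a regular adjective plus the COG-ST constraint on the modified noun.
dem-adj-lex := adj-lex &
  [ SYNSEM [ LOCAL.CAT.HEAD.MOD < [ LOCAL.CONT.HOOK.INDEX.COG-ST activ+fam ] >,
             LKEYS.KEYREL.PRED demonstrative_a_rel ] ].

; Hypothetical entry; note that the PRED value is a type, so it has no quotation marks.
this-adj := dem-adj-lex &
  [ STEM < "this" >,
    SYNSEM.LKEYS.KEYREL.PRED proximal+dem_a_rel ].
```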

Demonstrative determiners

Demonstrative determiners introduce two relations. This time, they are introducing the quantifier relation (let's say "exist_q_rel") and the demonstrative relation. This analysis entails changes to the Matrix core, as basic-determiner-lex assumes just one relation being contributed. Accordingly, we are going to bypass the current version of basic-determiner-lex and define instead determiner-lex-supertype as follows:

determiner-lex-supertype := norm-hook-lex-item & basic-zero-arg &
  [ SYNSEM [ LOCAL [ CAT [ HEAD det,
                           VAL [ SPEC.FIRST.LOCAL.CONT.HOOK [ INDEX #ind,
                                                              LTOP #larg ],
                                 SPR < >,
                                 SUBJ < >,
                                 COMPS < > ] ],
                     CONT.HCONS.LIST < qeq &
                                       [ HARG #harg,
                                         LARG #larg ] > ],
             LKEYS.KEYREL quant-relation &
                          [ ARG0 #ind,
                            RSTR #harg ] ] ].

This type should have two subtypes (assuming you have demonstrative determiners as well as others in your language --- otherwise, just incorporate the constraints for demonstrative determiners into the type above).

The subtype for ordinary (non-demonstrative) determiners should add the constraint that the RELS list has exactly one thing on it, by adding the supertype single-rel-lex-item.
The subtype for demonstrative determiners should specify a RELS list with two things on it: the first should have the "exist_q_rel" for its PRED value. (It's already constrained to be a quant-relation because the type norm-hook-lex-item inherited by determiner-lex-supertype identifies the first element of the RELS list with the LKEYS.KEYREL.) The second one should be identified with LKEYS.ALTKEYREL and should be an arg1-ev-relation (the type we use for the relations of intransitive adjectives). The HOOK.INDEX.COG-ST inside the SPEC value should be constrained to activ+fam. Finally, the LBL and ARG1 of the arg1-ev-relation should be identified with the SPEC..HOOK.LTOP and SPEC..HOOK.INDEX of the determiner, respectively. (This will result in the demonstrative adjective relation sharing its handle with the N' the determiner attaches to.)

Make sure your ordinary determiners in the lexicon inherit from the first subtype, and that your demonstrative determiners inherit from the second subtype. Demonstrative determiner lexical entries should constrain their LKEYS.ALTKEYREL.PRED to be an appropriate subtype of demonstrative_a_rel.
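Putting the description above together, the two subtypes and a demonstrative entry might be sketched as follows. The subtype and entry names and the STEM value are placeholders, and the exact way of stating a two-element RELS list is an assumption that may need adjusting to your version of the Matrix.

```tdl
; Ordinary determiners contribute exactly one relation.
determiner-lex := determiner-lex-supertype & single-rel-lex-item.

; Demonstrative determiners contribute the quantifier plus a demonstrative relation.
demonstrative-determiner-lex := determiner-lex-supertype &
  [ SYNSEM [ LOCAL [ CAT.VAL.SPEC.FIRST.LOCAL.CONT.HOOK [ INDEX #ind &
                                                                [ COG-ST activ+fam ],
                                                          LTOP #ltop ],
                     CONT.RELS.LIST < [ PRED "exist_q_rel" ],
                                      #altkeyrel > ],
             LKEYS.ALTKEYREL #altkeyrel & arg1-ev-relation &
                             [ LBL #ltop,
                               ARG1 #ind ] ] ].

; Hypothetical lexical entry for a distal demonstrative determiner.
that-det := demonstrative-determiner-lex &
  [ STEM < "that" >,
    SYNSEM.LKEYS.ALTKEYREL.PRED distal+dem_a_rel ].
```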

Check your MRSs

Here is a sample MRS showing what an NP with a demonstrative should look like. Note that whether the demonstrative_a_rel comes from a determiner or an adjective, it should end up looking the same. I have expanded the first instance of the variable x4 so that you can see the cog-st value.

That dog sleeps

Optional arguments

Again, this section (on optional arguments) is optional --- only if you don't have enough to work on otherwise in the lab.

The customization system includes an argument optionality library which we believe to be fairly thorough, regarding the syntax of optional arguments. The goal of this part of this lab therefore is to (a) fix up anything that is not quite right in the syntax and (b) try to model the semantics, and in particular, the cognitive status associated with different kinds of dropped arguments. Regarding (a), if the analysis provided by the customization system isn't quite working, post to Canvas and we'll discuss how to fix it with tdl editing.

Regarding (b), you need to do the following:

Determine the cognitive status of the different types of dropped arguments in your language. For example, dropped subjects might always be the equivalent of unstressed pronouns, i.e., [COG-ST in-foc], while objects might be [COG-ST activ-or-more] (like the dropped argument of told in I already told you!) and others might be [COG-ST type-id] (like the dropped argument of eat in Did you already eat?). Languages with object markers might forgo the object markers in the case of [COG-ST type-id] arguments. In addition, the COG-ST of the dropped argument might depend on the verb.
Edit the lexical rules and lexical entries involved in licensing dropped arguments to provide the COG-ST value. Since the same argument might be overtly realized in most cases, rather than constraining the COG-ST directly, use the feature OPT-CS instead. This feature takes the same range of values as COG-ST, and the phrase structure rules that discharge the optional arguments check it for the value to put in COG-ST.
Test examples and examine the MRS to see if the expected COG-ST values are appearing.
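For a verb like eat, whose dropped object is interpreted as [COG-ST type-id], the lexical constraint could be sketched as below. The type names are placeholders, and the sketch assumes OPT-CS sits at the top level of the complement's synsem, alongside OPT; check where it is declared in your copy of matrix.tdl.

```tdl
; Hypothetical verb type; transitive-verb-lex stands in for your transitive verb type.
indef-drop-trans-verb-lex := transitive-verb-lex &
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < [ OPT-CS type-id ] > ].
```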

Note that the Matrix currently assumes that dropped subjects are always [COG-ST in-foc]. This may not be true, especially in various impersonal constructions. If it's not true for your language, please let me know.

Continue/finish collecting target translations

Please continue working on finding appropriate translations for the sentences in eng.txt and recording them in your iso.txt.

Run the testsuite

Following the same procedure as usual, do a test run over your testsuite.

Collect the following information to provide in your write up:

How many items parsed?
What is the average number of parses per parsed item?
How many parses did the most ambiguous item receive?
What NEW sources of ambiguity can you identify? (I.e. either new in the grammar or new in the sense that you didn't write about it last week.)

Write up

Your write up should be a plain text file (not .doc, .rtf or .pdf) which includes the following:

A description of how non-verbal predicates work in your language, including IGT.
A description of your implementation of these phenomena, including:
- A prose description of the analysis you implemented
- The specific tdl you added/changed (paste it into the file)
- IGT I can use to test the analysis
- Any questions you have/things you want me to look into (with appropriate IGT for me to test).
A description of how items 15-17 are currently working in the MT set up for you. Do they go through? How much ambiguity? If they don't go through, how do the MRSes differ?
A description of how definiteness is marked in your language (if at all).
A description of your implementation of definiteness, including:
- A prose description of the analysis you implemented
- The specific tdl you added/changed (paste it into the file)
- IGT I can use to test the analysis
- Any questions you have/things you want me to look into (with appropriate IGT for me to test).
A brief description of any changes you made to your iso.txt file.
A description of the performance of your final grammar for this week on the test suite, as compared to your starting grammar (see details above).

Submit your assignment

Be sure your write up and the text-file version of your test suite are included in your grammar directory.
Likewise, make sure that tsdb/home includes two profiles:
1. Final testsuite with initial grammar for the week
2. Final testsuite with final grammar for the week

Create a tarball:

      tar czf iso-lab7.tgz iso-lab7

Upload the tarball to Canvas.
