Lab 7 (Due 5/17 11:00pm)
Preliminaries
These instructions might get edited a bit over the next
couple of days. I'll try to flag changes.
As usual, check the write up instructions first.
Especially in the test corpus section, but also in general, it will
be helpful to keep notes along the way as you are doing grammar
development.
Requirements for this assignment
- 0. Make sure you have a baseline test suite corresponding
to your lab 6 grammar.
- 1. Add "can"
- 2. Check that negation is working, and fix it if necessary.
- 3. Make sure your MRSs for I can eat glass and It doesn't hurt me are right.
- 4. Find a simple sentence from your test corpus and try to get your grammar to be able to parse it.
- 5. Make sure your grammar can still generate, and debug as necessary.
- 6. Test your grammar using [incr tsdb()].
[incr tsdb()] should be part of your test-development
cycle. In addition, you'll need to run a final test suite instance
for this lab to submit along with your baseline.
- 7. Write up the phenomena you have analyzed.
Before making any changes to your grammar for this lab,
run a baseline test suite instance. If you decide to add
items to your test suite for the material covered here, consider
doing so before modifying your grammar so that your baseline can
include those examples. (Alternatively, if you add examples
in the course of working on your grammar and want to make the
snapshot later, you can do so using the grammar you turned
in for Lab 6.)
Background
The goal of this lab is to be able to parse the two
sentences I can eat glass. It doesn't hurt me.,
assign them appropriate semantics, and generate back. You have already
done some of the work: from previous labs, your grammar
should already handle pronouns, case (if applicable), and
transitive verbs. Negation may already be working from
the customization system and/or previous work you've done.
You may need to add some vocabulary and possibly some verb forms.
In addition, depending on how the sentences translate
in your language, you might need to consider a new valence pattern
for verbs and a new type of noun (e.g., mass nouns).
Semantic representations
Your semantic representations for the two sentences should look
approximately like this, modulo the relations showing up in a different
order, the variables (e's, x's, and h's) showing up with different
numbers, and the SEMSORT information showing up in different places.
Also, if your language tends to use prodrop rather than overt pronouns,
you might end up without any representation of the pronouns in these sentences.
Finally, if you need a complex predicate in place of, say, "hurt",
then you'll also have some differences. If you're unsure if your
representations are correct, please post them to GoPost.
(NB 2/17/12: These are somewhat out of date, because the
LKB displays the GTOP not the LTOP for the whole MRS.
If you look at the feature structure rather than the
extracted MRS, you should see that the LTOP matches the LBL
of the matrix verb's relation.)
I can eat glass.
<h1,e2:SEMSORT,
{h3:pronoun_n_rel(x4:SEMSORT:+:FIRST:SG),
h5:exist_q_rel(x4, h7, h6),
h1:_can_v_rel(e2:SEMSORT:TENSE:ASPECT:MOOD, h8),
h9:_eat_v_rel(e10:SEMSORT:TENSE:ASPECT:MOOD, x4, x11:SEMSORT:BOOL:THIRD:SG),
h12:_glass_n_rel(x11),
h13:exist_q_rel(x11, h15, h14)},
{h8 qeq h9,
h6 qeq h3,
h14 qeq h12} >
Things to note about this representation: _can_v_rel
is a one-place relation (i.e., we're treating can as a raising
verb), whose ARG1 is qeq (equal modulo quantifiers) to the handle
of the _eat_v_rel. The _eat_v_rel is a
two-place relation taking x4 (the index from the first-person
pronoun) and x11 (the ARG0 of _glass_n_rel) as its
arguments.
It doesn't hurt me.
<h1,e2:SEMSORT,
{h3:pronoun_n_rel(x4:SEMSORT:+:THIRD:SG),
h5:exist_q_rel(x4, h7, h6),
h1:neg_rel(u9:SEMSORT, h8),
h10:_hurt_v_rel(e2:SEMSORT:TENSE:ASPECT:MOOD, x4, x11:SEMSORT:+:FIRST:SG),
h12:pronoun_n_rel(x11),
h13:exist_q_rel(x11, h15, h14)},
{h6 qeq h3,
h8 qeq h10,
h14 qeq h12} >
Things to note about this representation: The neg_rel
takes a handle as its argument, which is related through a qeq
to the handle of the _hurt_v_rel. The handle of
neg_rel is itself in turn the local top handle of the
clause. These qeqs allow quantifiers to scope above or below
neg_rel so that I can't eat some cheese can either
mean 'There is some cheese that I can't eat', or 'I can't eat just
some cheese (I end up eating more)'.
Modals
can as an auxiliary verb
Use this version if in your language the morpheme expressing
the same notion as can is a separate word which takes a
VP complement and a subject.
- Define a new verb type which inherits from your verb-lex
and trans-first-arg-raising-lex-item-1 (and take a look at
the definitions of these types in matrix.tdl so you
know what they're doing). If you already have auxiliaries from
the customization system, see if you have a type like this already.
(Note that this is the type for semantically contentful/elementary-predication-contributing auxiliaries. If the rest of your auxiliaries are of the semantically empty type, you'll need to create a new type.) A sketch putting the pieces from this list together appears after the list.
- In addition to inheriting
from these types, your new type should put appropriate constraints
on the values of ARG-ST and the valence features.
- If your auxiliary can be the input to any of your lexical rules, make sure that it has the right supertypes (xxx-rule-dtr) to serve as the input to the right rules.
- Make sure that it constrains the part of speech of each
argument.
- Define a lexical entry (with PRED value "_can_v_rel")
which inherits from your new type.
- Create the appropriate form of the verb meaning eat, if necessary.
This can be done either directly as a lexical entry, or via a lexical rule.
- If you needed an additional form of eat, ensure that only that form of eat can appear as the
complement of can (and add whatever items you use to test
this to your master testsuite), and that the new form of eat
can or can't appear in matrix clauses (as appropriate).
- In English, this involves defining a feature FORM on
verb (subtype of head), somewhat similar to
CASE on noun. You may have a FORM feature and some
appropriate types already from the customization system.
- Parse your translation of I can eat glass, and examine
the chart for extra edges. Are they legitimate, or spurious?
If they're spurious, try to rule them out (and then rerun your
master testsuite to see if they were, in fact, spurious :-).
- Parse your translation of I can eat glass and
see if you get the right semantics. Debug
as necessary.
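To make the steps above concrete, here is a minimal sketch of what the type, the lexical entry, and the FORM feature might look like. The names can-aux-lex and can_1, the FORM values, and the English-like valence pattern (NP subject, VP complement) are assumptions rather than anything your grammar is required to have; check the actual definitions of verb-lex and trans-first-arg-raising-lex-item-1 in matrix.tdl and adapt the constraints to your language.
; FORM on verbal heads, if the customization system didn't already
; give you this (the values finite/nonfinite are placeholders):
form := avm.
finite := form.
nonfinite := form.
verb :+ [ FORM form ].
; Hypothetical auxiliary type: constrains ARG-ST, the valence
; features, and the part of speech of each argument.
can-aux-lex := verb-lex & trans-first-arg-raising-lex-item-1 &
  [ SYNSEM.LOCAL.CAT.VAL [ SUBJ < #subj >,
                           COMPS < #comps > ],
    ARG-ST < #subj & [ LOCAL.CAT.HEAD noun ],
             #comps & [ LOCAL.CAT [ HEAD verb & [ FORM nonfinite ],
                                    VAL.COMPS < > ] ] > ].
; Hypothetical lexical entry, in lexicon.tdl:
can_1 := can-aux-lex &
  [ STEM < "can" >,
    SYNSEM.LKEYS.KEYREL.PRED "_can_v_rel" ].
If you go this route, one common follow-up is to require FORM finite on verbs in matrix clauses (e.g., via your root condition or your finite lexical rules) so that a nonfinite form of eat can only appear as the complement of can.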
can as a bound morpheme
Use this version if the morpheme expressing the same meaning as
can in your language attaches morphologically to the main verb
of the sentence.
- The first step is to decide which
lexical rule type is appropriate. Look at the section
of matrix.tdl titled "Lexical Rules" and see if
any of the xxx-only-xxx-rule types are appropriate.
If not, construct an appropriate one out of the next level
of supertypes. Unless you have concomitant changes
to the valence features (such as the CASE value required
on one of the arguments), something like the following is probably
appropriate:
infl-add-only-lex-rule := add-only-rule &
infl-lex-rule.
(The type add-only-rule copies everything up from the
daughter, but does not constrain C-CONT.RELS and C-CONT.HCONS to be empty, so the rule itself can add semantics. infl-lex-rule means that this is a rule
that adds an affix, rather than a zero (i.e., a constant rule with no spelling change).)
- You'll also need to worry about morphotactics, primarily
in choosing the DTR type so that the rule accepts the right inputs
and in adding supertypes so it can serve as input to the right further
position classes. As the position class for this rule is probably an
optional one, simply copying up the INFLECTED value will likely work.
Note that if the rule fits into a position class you already defined,
you can get most or all of these effects simply by having it inherit
from the type corresponding to that position class.
- Your subtype for this particular rule will now need to
constrain all three features within its C-CONT: RELS,
HCONS, and HOOK. (The C-CONT.HOOK.XARG can be
identified with the HOOK.XARG of the daughter.) A sketch putting these constraints together appears after this list:
- The lexical rule's C-CONT.RELS is a diff list containing a single relation
of type arg1-ev-relation. The PRED value of that relation
should be "_can_v_rel", the LBL should be identified
with the C-CONT.HOOK.LTOP, the ARG0 with the C-CONT.HOOK.INDEX.
- The lexical rule's C-CONT.HCONS is a diff list containing
one qeq. The HARG of the qeq should be identified
with the ARG1 of the arg1-ev-relation and its LARG
with the daughter's LTOP.
- Add an instance for your lexical rule to irules.tdl,
with the appropriate spelling change information.
- Test that your rule only applies to verbs (as appropriate), and
if not, add constraints to ensure that it does.
- Parse your translation of I can eat glass and
see if you get the right semantics. Debug
as necessary.
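As a point of reference, here is a minimal sketch of what the subtype and its irules.tdl instance might look like, assuming the supertype infl-add-only-lex-rule defined above. The rule names can-lex-rule and can-suffix and the suffix itself are invented placeholders, and the DTR constraints will need to be adapted to your language's morphotactics and position classes.
; Hypothetical lexical rule subtype: the C-CONT contributes the
; _can_v_rel and a qeq whose LARG is the daughter's LTOP.
can-lex-rule := infl-add-only-lex-rule &
  [ C-CONT [ RELS <! arg1-ev-relation &
                     [ PRED "_can_v_rel",
                       LBL #ltop,
                       ARG0 #index,
                       ARG1 #harg ] !>,
             HCONS <! qeq & [ HARG #harg,
                              LARG #larg ] !>,
             HOOK [ LTOP #ltop,
                    INDEX #index,
                    XARG #xarg ] ],
    DTR.SYNSEM.LOCAL [ CONT.HOOK [ LTOP #larg,
                                   XARG #xarg ],
                       CAT.HEAD verb ] ].
; Hypothetical instance, in irules.tdl; the suffix -abil is purely
; illustrative, so substitute your language's actual affix and any
; spelling-change patterns it needs.
can-suffix :=
%suffix (* -abil)
can-lex-rule.
The CAT.HEAD verb constraint on the daughter is there to keep the rule from applying to non-verbs; if your position class already guarantees this, you can leave it out.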
Negation
The negation library is more robust than in previous years,
so we expect that in most cases the customization system's output
for negation is working or close to working.
- Parse your translation of It doesn't hurt me and
see if you get the right semantics.
- If the sentence doesn't parse, post to GoPost with an
explanation of how negation works in your language.
- If the sentence parses, but the semantics is wrong,
post to GoPost with the semantics you are getting and info
on how negation works in your language.
Test corpus
The goal of this section is to parse one more sentence from your
test corpus than you could before starting this section. In most cases,
that will mean parsing one sentence total. In your write up,
you should document what you had to add to get the sentence working.
Note that it is possible to get full credit here even if the
sentence ultimately doesn't parse by documenting what you still
have to get working.
This is a very open-ended part of the lab (even more
so than usual), which means: A) you should get started early
and post to GoPost so I can assist in developing analyses of
whatever additional phenomena you run across and B) you'll
have to restrain yourselves; the goal isn't to parse the whole
test corpus this week ;-).
- Create a profile from your test corpus skeleton, and run a
baseline.
- Use Browse | Results to see if anything is parsing.
- Look for some plausible candidate sentences. These should
be relatively short and ideally have minimal additional grammatical
phenomena beyond what we have already covered.
- Examine the lexical items required for your target sentence(s).
Add any that should belong to lexical types you have already
created.
- Try parsing the test corpus again (or just your target sentence
from it).
- If your target sentence parses, check the MRS to see if it is
reasonable.
- If your target sentence doesn't parse, check to see whether
you still have lexical coverage errors. Fixing these may require
adapting existing lexical rules, adding lexical rules, and/or
adding lexical types. Post to GoPost for assistance.
- If your target sentence doesn't parse but your grammar does
find analyses for each lexical item, then examine the parse chart
to identify the smallest expected constituent that the grammar is
not finding, and debug from there. Do you have the phrase structure
rule that should be creating the constituent? If so, try interactive
unification to see why the expected daughters aren't forming
a phrase with that rule. Do you need to add a phrase structure rule?
Again, post to GoPost for assistance.
- Iterate until either the sentence parses or you at least have a clear
understanding of what you would need to add to get it parsing.
- Run your full test suite after any changes you make to your
grammar to make sure you aren't breaking previous coverage.
Write up your analyses
For each of the following phenomena, please include
the following in your write up:
- A descriptive statement of the facts of your language.
- Illustrative IGT examples from your testsuite. These should be examples that actually work in the current grammar, or would work if not for the particular problem you are talking about.
- A statement of how you implemented the phenomenon (in terms of types you added/modified and particular tdl constraints).
- If the analysis is not (fully) working, a description of the problems
you are encountering.
- A statement of whether or not you can generate from examples illustrating the phenomenon.
Phenomena:
- "Can" (modals)
- Negation
- Whatever you fixed about your grammar as you worked on the test corpus sentence. (In this section, please include the test corpus example you are targeting and a narrative of what you worked on to try to get it parsing.)
In addition, your write up should include a statement of the current
coverage of your grammar over your test suite (using numbers you can
get from Analyze | Coverage and Analyze | Overgeneration in [incr tsdb()])
and a comparison between your baseline test suite run and your final
one for this lab (see Compare | Competence).
ebender at u dot washington dot edu
Last modified: 5/10/13