Lab 6 (due 2/13 11:59 pm)

Overview

The goals for this lab are (1) to reduce the number of strings in MT output (for any given input) by working with the VPM (variable property mapping) and doing further morphological clean-up and/or adding constraints to semantically empty auxiliaries, and (2) to get one more phenomenon working (preferably adnominal possession, clausal complements, or wh questions, if you don't already have these working). We will primarily be working by editing tdl this week, but might consult the customization system for inspiration on how to implement things. You will also use [incr tsdb()] to document the final state of your grammar for the week over the testsuites.

For tdl editing, please practice incremental development: Test as frequently as you possibly can, both by compiling the grammar and by testing specific sentences.

This lab entails the following general steps, which are not (fully) ordered with respect to each other.

Refine your semi.vpm file to constrain the possible values of e.g. aspect in generated sentences.
Clean up morphology OR semantically empty words OR other sources of spurious generator output.
Get one more major phenomenon working.
Continue testing MT to see the effects of those improvements.
Process your testsuite using [incr tsdb()], the LKB, and the grammar resulting from your updated choices file.
Examine the results of the second test run for coverage, accuracy and ambiguity, including as a diff to the final lab 5 test run.
Write it all up :)

Variable property mapping

You may have noticed that you get many variants on generation if you start with a form that is underspecified for e.g., aspect or evidentiality. We can get a handle on this by using variable property mapping to supply default values in the unmarked case (either in monolingual generation or in the MT scenario). The basic strategy is to take any underspecified values in variable properties and translate them, via vpm, to something that conflicts with any more specific values your grammar can produce.

The file semi.vpm provides a mapping between grammar-external features of indices (referential indices and events) and their values, and grammar-internal ones. For background on VPM, see the DELPH-IN wiki. Note that as soon as you start using a VPM file, only the variable properties (features on indices) that are handled in the file are actually preserved.

You should already have a semi.vpm file provided by the customization system. Open it up and see which variable properties are there, and then look in your grammar to see what is missing. In general, we'd expect to see all of the features of the types event and ref-ind represented in a mature semi.vpm file.
Your script file tells the LKB to load the semi.vpm file with the following line:
```
(mt:read-vpm (lkb-pathname (parent-directory) "semi.vpm") :semi)
```
The semi.vpm files currently generated by the customization system are not fully ready for MT. In particular, the feature paths should include PNG. and E. only on the internal (left-hand) side. Thus you'll want to edit blocks like this:
```
E.TENSE : E.TENSE
  present <> present
  past <> past
  future <> future
  * <> *
```
To look like this:
```
E.TENSE : TENSE
  present <> present
  past <> past
  future <> future
  * <> *
```
If your grammar uses a PERNUM feature, you'll need to map separate PER and NUM features on the external (right-hand) side of the VPM to a single PERNUM feature on the internal (left-hand) side. See the example under "Properties: An Example" on the DELPH-IN wiki page.
For both separate PER and NUM and joined PERNUM, the spelling of the values on the right-hand side needs to match the right-hand side of the semi.vpm file in the eng grammar (for example, first, not 1st).
For both separate PER and NUM and joined PERNUM, be sure the right-hand side does not include PNG.
If you have any other features you have added on indices, you will need to provide VPM entries for them as well. (If you added them through the customization system, they may be there already.)
If your language has aspect marked in some sentences but other forms that are just underspecified for aspect, you'll want to have the default aspect be "no-aspect". Define this as a subtype of aspect in your grammar, but don't refer to it anywhere except in semi.vpm. The two pieces look like this, where the semi.vpm lines replace * <> * under aspect:
- In my_language.tdl:
```
no-aspect := aspect.
```
- In semi.vpm:
```
  * >> no-aspect
  no-aspect << [e]
```
If all forms in your language are marked for aspect, you may still want some default value in cases where the input MRS has no aspect, so that you don't get all the forms. If you wanted perfective as your default aspect, replace * <> * with:
```
perfective << [e]
```

A similar approach can be used for mood and other generation ambiguity relating to variable properties.

Test your semi.vpm file by doing self-translation (i.e. translating from your language to your language). You should see fewer strings coming out.
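
Before running self-translation, it can help to check mechanically for the rule above that PNG. and E. belong only on the internal (left-hand) side. The following Python script is a minimal sketch of such a check, not part of the LKB or the customization system. The function name external_side_problems and the sample text are made up for illustration, and the script assumes that ';' starts a comment and that property-mapping header lines are the only lines containing ':'.

```python
def external_side_problems(vpm_text, prefixes=("PNG.", "E.")):
    """Return (line_number, line) pairs whose right-hand feature names
    still carry a grammar-internal prefix such as PNG. or E."""
    problems = []
    for lineno, raw in enumerate(vpm_text.splitlines(), start=1):
        line = raw.strip()
        # Assumption: ';' starts a comment, and only property-mapping
        # headers contain ':' (value rules use <>, >> or <<).
        if not line or line.startswith(";") or ":" not in line:
            continue
        _internal, external = line.split(":", 1)
        # The external side may list several features (e.g. PER NUM).
        if any(name.startswith(prefixes) for name in external.split()):
            problems.append((lineno, line))
    return problems


# Illustrative sample only; your own semi.vpm will differ.
sample = """\
E.TENSE : E.TENSE
  present <> present
  * <> *

E.ASPECT : ASPECT
  * >> no-aspect
  no-aspect << [e]

PNG.PER : PNG.PER
  first <> first
"""

for lineno, line in external_side_problems(sample):
    print(f"line {lineno}: {line}")
```

On this sample the check flags the E.TENSE and PNG.PER headers (lines 1 and 9) and leaves E.ASPECT alone. To run it on your own file, read semi.vpm into a string and pass it in.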

Morphology or semantically empty word clean up

Here you'll be doing tdl editing to add constraints to morphology that is underconstrained (and so shows up all over the place). If the problem is with semantically empty words, we'll be adding constraints to the words or possibly refining trigger rules. Note that with semantically empty words, the problem might be that they are appearing where they shouldn't, or that they aren't appearing at all (and so causing generation to fail).

Please post to Canvas Tuesday with:

A description of the kinds of extra realizations you're seeing
A description of what the actual purpose of the forms (affixes, words) appears to be
Anything you already have planned in terms of clean-up.

I'll reply with suggestions about how to proceed.

For the write-up for this section, I'd like to see both a description of what changes you made and some quantitative observations about how the number of generator outputs is affected.
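
One simple way to get those quantitative observations is to record the generated strings for the same inputs before and after your changes and compare the counts. The Python script below is a sketch under that assumption. The input labels and output strings are hypothetical placeholders, and you would fill in the lists by hand (or from saved generator output) for your own sentences.

```python
from statistics import mean


def summarize(outputs_by_input):
    """Count generator outputs per input; return (counts, mean count)."""
    counts = {src: len(strings) for src, strings in outputs_by_input.items()}
    return counts, mean(list(counts.values()))


# Hypothetical placeholder data: input label -> generated strings.
before = {
    "input-1": ["out-a", "out-b", "out-c", "out-d"],
    "input-2": ["out-e", "out-f"],
}
after = {
    "input-1": ["out-a"],
    "input-2": ["out-e", "out-f"],
}

before_counts, before_mean = summarize(before)
after_counts, after_mean = summarize(after)

for src in before:
    print(f"{src}: {before_counts[src]} -> {after_counts[src]}")
print(f"mean outputs per input: {before_mean} -> {after_mean}")
```

Reporting both the per-input counts and the mean makes it easy to see whether a change helped across the board or only for particular sentences.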

Additional phenomena

Here, we'd like to get wh questions OR adnominal possession OR clausal complements working, i.e. the equivalents of SOME of these sentences from eng.txt:

The dog's car sleeps
My dogs sleep
I think that you know that dogs chase cars
I ask whether you know that dogs chase cars
Who sleeps
What do the dogs chase
What do you think the dogs chase
Who asked what the dogs chase
I asked what the dogs chased

Please post to Canvas Tuesday with:

An indication of which of these phenomena you are working on
A description of the phenomenon in the language
A description of what is already working (if anything) in your grammar wrt the phenomenon.

Depending on which phenomenon you are working on, we might be primarily doing tdl fixes, or I might suggest that you get an initial analysis from the customization system and merge it with your current grammar.

If the above sentences are already all working, document that. You can optionally post to Canvas with something you'd like to work on instead. OPTIONALLY.

I will advise on Canvas how to proceed in each case!

Run the testsuite

Following the same procedure as usual, do a test run over your testsuite.

Collect the following information to provide in your write up:

How many items parsed?
What is the average number of parses per parsed item?
How many parses did the most ambiguous item receive?
What NEW sources of ambiguity can you identify? (I.e. either new in the grammar or new in the sense that you didn't write about it last week.)
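
[incr tsdb()] reports these numbers for you. The Python sketch below is only meant to make the definitions concrete, in particular that the average is taken over parsed items only. The function name coverage_summary and the reading counts are hypothetical.

```python
def coverage_summary(readings_by_item):
    """readings_by_item maps item id -> number of parses (0 = no parse)."""
    parsed = {item: n for item, n in readings_by_item.items() if n > 0}
    total_items = len(readings_by_item)
    # Items with no parse are excluded from the average's denominator.
    avg = sum(parsed.values()) / len(parsed) if parsed else 0.0
    most_ambiguous = max(parsed, key=parsed.get) if parsed else None
    return {
        "items": total_items,
        "parsed": len(parsed),
        "coverage": len(parsed) / total_items if total_items else 0.0,
        "avg_parses_per_parsed_item": avg,
        "most_ambiguous_item": most_ambiguous,
        "max_parses": parsed.get(most_ambiguous, 0),
    }


# Hypothetical reading counts for a five-item testsuite.
readings = {1: 2, 2: 0, 3: 1, 4: 6, 5: 0}
summary = coverage_summary(readings)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Here three of the five items parse, so the average is 9 parses over 3 parsed items, not over all 5.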

Write up

Your write up should be a plain text file (not .doc, .rtf or .pdf) which includes the following:

A description of the changes you made to your semi.vpm file, why you made those specific changes, and how that affected generator output.
A description of the phenomena you improved in the grammar (including both the morphology etc. clean-up and adnominal possession, wh questions or clausal complements), including:
- Prose description of the phenomenon
- Prose description of your analysis
- The specific changes you made to the tdl (paste in the actual tdl)
- Specific IGT I can use to test the analysis / investigate if something isn't working and you need help.
A description of the performance of your final grammar for this week on the test suite, as compared to your starting grammar (see details above).

Submit your assignment

Be sure your write up and the text-file version of your test suite are included in your grammar directory.
Likewise, make sure that tsdb/home includes two profiles:
1. Final testsuite with initial grammar for the week
2. Final testsuite with final grammar for the week

Create a tarball:

      tar czf iso-lab6.tgz --exclude-vcs iso-lab6

Upload the tarball to Canvas.
