Lab 8 (due 2/28 11:59 pm)

Overview

Refine/extend your list of MMT translations, as appropriate.
Choose one more phenomenon illustrated in those examples but not handled by your grammar to fix. Post to Canvas by midday Tuesday for pointers on how to fix it. (I'll likely direct you to lab instructions from previous years.)
Add a few further examples to your testsuite for that phenomenon (as appropriate).
Run initial testsuite & testcorpus profile.
Fix the one phenomenon you chose.
Make sure your grammar can generate. Edit as necessary until it can. Post many questions to Canvas!
Get the variable property mapping set up working.
Translate one sentence from English to your language with the MT system.
Run final testsuite & testcorpus profile.
Write it up!

Refine/extend MMT sentences

If you don't already have a complete set of MMT translations and you don't have additional phenomena to work on in the ones you do have, try to get more of them translated. See Lab 7 for instructions.

Note that, contrary to last week's instructions, the set up we're using this year expects these files to have one sentence per line (no blank lines in between).

Choose one phenomenon to develop

By midday Tuesday: Choose one phenomenon represented by the MMT sentences not already covered by your grammar to work on this week. Post to Canvas with the IGT for the relevant examples and the phenomenon you intend to work on. I will reply with pointers to instructions for that phenomenon, or develop some if necessary.

Extend your testsuite for that phenomenon

Create additional positive/negative testsuite examples that illustrate your chosen phenomenon and add them to the testsuite in the usual fashion. This testsuite should now include the MMT sentences plus ~2-6 more examples and should be called lab8.

Initial testsuite run

Create and run initial testsuite instances for both the linguist-provided data and your testsuite, using the initial grammar.
Note: If your tsdb/ directory is inside a shared folder on VirtualBox, it will not work.
For each of these, explore the results and collect the following information to provide in your write up:
- How many items parsed?
- What is the average number of parses per parsed item?
- How many parses did the most ambiguous item receive?
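
If you'd rather double check the numbers by hand than read them off the profile summary, the three figures above can be computed from a list of per-item parse counts. This is only a sketch: the `counts` list is made-up data and `summarize` is a hypothetical helper, not part of any tool used in this lab.

```python
def summarize(parse_counts):
    """Summarize a list holding the number of parses for each item."""
    # An item counts as parsed if it received at least one parse.
    parsed = [n for n in parse_counts if n > 0]
    return {
        "items": len(parse_counts),
        "parsed": len(parsed),
        # Average over parsed items only, not over all items.
        "avg_parses": sum(parsed) / len(parsed) if parsed else 0.0,
        "max_parses": max(parsed, default=0),
    }

# Hypothetical parse counts for six items (0 means no parse).
counts = [1, 0, 2, 4, 0, 1]
stats = summarize(counts)
print(stats)
```

Note that the average is taken over parsed items only, so items with zero parses do not pull it down.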

Develop analyses and test your grammar

Based on the instructions I've pointed you to through Canvas + answers to the many questions I hope you will ask, develop analyses for your one additional phenomenon.

For the MMT sentences specifically, you can test your MRSs by looking at the output of this English grammar for the corresponding examples. We don't expect an exact match, but if things are different you should have a clear idea of why. And, of course, you are always welcome to post lots of questions!

Test generation

Test generation with both lkb & ace. Can you generate from short sentences? What about longer ones? To receive full credit on this lab, you need a grammar that can generate from simple transitive sentences and you need to have tested what happens with longer ones (e.g. sentences with clausal modifiers or clausal complements). See Lab 6 for detailed instructions on generation. And, of course, post lots of questions to Canvas.

Variable property mapping

You may have noticed that you get many variants on generation if you start with a form that is underspecified for e.g., aspect or evidentiality. We can get a handle on this by using variable property mapping to supply default values in the unmarked case (either in monolingual generation or in the MT scenario). The basic strategy is to take any underspecified values in variable properties and translate them, via vpm, to something that conflicts with any more specific values your grammar can produce.

The file semi.vpm provides a mapping between grammar-external features of indices (referential indices and events) and their values, and grammar-internal ones. For background on VPM, see the DELPH-IN wiki. As soon as you start using a VPM file, then only variable properties (features on indices) that are handled in the file are actually preserved.

You should already have a semi.vpm file provided by the customization system. Open it up and see which variable properties are there, and then look in your grammar to see what is missing. In general, we'd expect to see all of the features of the types event and ref-ind represented in a mature semi.vpm file.
Your script file tells the lkb to load the semi.vpm file with the following line:
```
(mt:read-vpm (lkb-pathname (parent-directory) "semi.vpm") :semi)
```
The semi.vpm files currently generated by the customization system are not fully ready for MT. In particular, the feature paths should include PNG. and E. only on the internal (left-hand) side. Thus you'll want to edit blocks like this:
```
E.TENSE : E.TENSE
  present <> present
  past <> past
  future <> future
  * <> *
```
To look like this:
```
E.TENSE : TENSE
  present <> present
  past <> past
  future <> future
  * <> *
```
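If your semi.vpm has many such blocks, the same edit can be scripted. The sketch below is a rough helper, not part of the customization system or the LKB: `strip_external_prefixes` is a hypothetical name, and it assumes that block header lines are unindented and contain a colon, that value lines are indented, and that comment lines start with a semicolon. Check the result by eye before using it.

```python
import re

def strip_external_prefixes(vpm_text):
    """Drop leading 'PNG.' and 'E.' from the right-hand side of block headers."""
    out = []
    for line in vpm_text.splitlines():
        # Assumed layout: headers are unindented and contain ':';
        # indented value lines and ';' comments are left alone.
        if ":" in line and not line.startswith((" ", "\t", ";")):
            left, right = line.split(":", 1)
            feats = [re.sub(r"^(PNG|E)\.", "", f) for f in right.split()]
            line = f"{left.rstrip()} : {' '.join(feats)}"
        out.append(line)
    return "\n".join(out)

before = "E.TENSE : E.TENSE\n  present <> present\n  * <> *"
after = strip_external_prefixes(before)
print(after)
```

Only the header line changes; the value lines under it are passed through untouched.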
If your grammar uses a PERNUM feature, you'll need to map separate PER and NUM features from the external (right-hand) side of the VPM to a single PERNUM feature on the internal (left-hand) side. See the example under "Properties: An Example" on the DELPH-IN wiki page.
If you have any other features you have added on indices, you will need to provide VPM entries for them as well. (If you added them through the customization system, they may be there already.)
If your language has aspect marked in some sentences but other forms that are just underspecified for aspect, you'll want to have the default aspect be "no-aspect". Define this as a subtype of aspect in your grammar, but don't have anything other than the semi.vpm mention it otherwise. In semi.vpm, replace * <> * under aspect with the following:
```
* >> no-aspect
no-aspect << [e]
```
If all forms in your language are marked for aspect, you may still want some default value in cases where the input MRS has no aspect, so that you don't get all the forms. If you wanted perfective as your default aspect, replace * <> * with:
```
perfective << [e]
```

You can do a similar trick for other kinds of generation ambiguity relating to variable properties.

Test your semi.vpm file by parsing and then generating. You should see fewer strings coming out.
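
To see why a default cuts down the number of strings, here is a toy model of the idea. It is a deliberate simplification: the real mapping and unification happen inside the LKB and ace, and the verb forms, the `compatible` check, and `apply_default` are all hypothetical names invented for this illustration.

```python
# Toy model only; not how the LKB or ace implement VPM.
TOP = "aspect"  # stands in for the underspecified supertype

def compatible(a, b):
    """Two values unify if they are equal or either one is the supertype."""
    return a == b or TOP in (a, b)

def apply_default(value, default="no-aspect"):
    """Mimic the default rule: an underspecified input becomes the default."""
    return default if value == TOP else value

# Hypothetical verb forms and the aspect value each one contributes.
forms = {
    "verb-PFV": "perfective",
    "verb-IPFV": "imperfective",
    "verb": TOP,  # unmarked form, underspecified for aspect
}

def generate(input_aspect):
    """Return every form whose aspect is compatible with the input."""
    return sorted(f for f, asp in forms.items() if compatible(asp, input_aspect))

without_default = generate(TOP)
with_default = generate(apply_default(TOP))
print(without_default)  # every form is compatible with an underspecified input
print(with_default)     # only the unmarked form survives
```

Because no-aspect conflicts with both perfective and imperfective, only the form that is itself underspecified can realize the input once the default is applied.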

First MT

Preliminaries

The goal for the next lab will be to create an "accommodation" transfer grammar for your language by using it as the target language in two translation pairs, with English and Pite Saami as the inputs. For now, we'll be attempting to get just one sentence through. This will be one that doesn't actually require any transfer rules.

In all of the instructions below, replace "iso" with the ISO 639-3 code for your language.

Download and unpack mmt.tgz.

Test eng2sje and sje2eng translation:

 cd mmt/
 ./translate-line.sh eng sje 1

Note: If you aren't working on the VM, you'll need to fix the path to ace in the file translate-line.sh

Look inside translate-line.sh; try changing which line is not commented out and see what different behaviors you get.

Copy your grammar into mmt/grammars/iso and compile it afresh with ace:

  cd mmt/grammars/iso
  ace -G iso.dat -g ace/config.tdl

Move the generic transfer grammar mmt/tm/gen to mmt/tm/iso

  cd mmt/tm
  mv gen iso

Compile that generic transfer grammar:

  ace -G iso.dat -g ace/config.tdl

Copy your MMT sentences to test_sentences/iso.txt. Note: last week I asked for a file with one blank line between each sentence. That's incorrect for this set up. Please just do one sentence per line, for 26 lines, in the same order as the eng.txt file in test_sentences. Any item you are skipping can be replaced with SKIPPED.
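
A quick sanity check on the file format can save some debugging later. The sketch below is just one way to do it: `check_sentences` is a hypothetical helper, and the sample lists are made-up data. To check your real file, read it with `open("test_sentences/iso.txt", encoding="utf-8").read().splitlines()` and pass the result in.

```python
def check_sentences(lines, expected=26):
    """Return a list of format problems: wrong line count or blank lines."""
    problems = []
    if len(lines) != expected:
        problems.append(f"expected {expected} lines, found {len(lines)}")
    for i, line in enumerate(lines, start=1):
        # Skipped items should say SKIPPED rather than being left empty.
        if not line.strip():
            problems.append(f"line {i} is blank")
    return problems

# Made-up data: a well-formed file and one with a stray blank line.
good = ["sentence"] * 25 + ["SKIPPED"]
bad = ["sentence", "", "sentence"]
print(check_sentences(good))
print(check_sentences(bad, expected=3))
```

An empty list means the file has the expected number of lines and none of them are blank; it says nothing about whether the sentences are in the same order as eng.txt.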

Try translating the first sentence from eng and sje to your language:

 ./translate-line.sh eng iso 1

This one should not require any transfer rules. If it doesn't work, there are several possible causes:

A bug in your MT set up. If you are seeing errors that suggest this might be the problem, post to Canvas.
The MRSs don't match. Compare the eng (or sje) MRSs to yours. Can you spot the difference? If you find any, modify your grammar until the MRSs match. Post to Canvas for help. A subcase here is that the PREDs and their arguments match, but the variable properties don't. We might be debugging semi.vpm files. Either way, all changes should be in your grammar, and not eng or sje.
Your grammar isn't generating. Confirm that this is the problem by trying monolingual generation (or iso2iso translation). Post to Canvas for help debugging.
If you aren't working in the provided VM, your ace version may differ from that used to compile the eng and sje grammars (and also the transfer grammars). In this case, recompiling all of those may help.

Run both the test corpus and the testsuite

Following the same procedure as usual, do test runs over both the testsuite and the test corpus.

Again, collect the following information to provide in your write up:

How many items parsed?
What is the average number of parses per parsed item?
How many parses did the most ambiguous item receive?
What sources of ambiguity can you identify?

Write up

NB: While the test suite and grammar development is joint work, the write up should be done by one partner (the other will get a turn next week). The writing partner should have the non-writing partner review the write up and make suggestions.

Your write up should be a plain text file (not .doc, .rtf or .pdf) which includes the following:

A statement of what you were able to find for translations of the MMT sentences, and why.
Documentation of your analyses of the additional phenomena:
1. A descriptive statement of the facts of your language.
2. Illustrative IGT examples from your testsuite.
3. A statement of how you implemented the phenomenon (in terms of types you added/modified and particular tdl constraints). (Yes, I want to see actual tdl snippets.)
4. If the analysis is not (fully) working, a description of the problems you are encountering.
Documentation of what happened when you tried generating with the LKB. Did it work right away? If it didn't, but you were able to get it working, what did you have to do?
Documentation of what happened when you tried the MT set up. Did it work right away? What did you change to get it working?
Documentation of your coverage over testsuite & test corpus for both the initial & final runs, including the answers to the questions given above.

Submit your assignment

Be sure your write up and the text-file version of your test suite are included in your grammar directory.
Likewise, make sure to include your most current tsdb profile in the grammar directory (ideally inside tsdb/home/).
If you're using svn, export the grammar so I don't get all your .svn files:
```
svn export yourgrammar iso-lab8
```

For git, please do the equivalent.

Create a tarball:

      tar czf iso-lab8.tgz iso-lab8

Upload the tarball to Canvas.
