Lab 8 (due 2/28 11:59 pm)

Overview

Refine/extend MMT sentences

If you don't already have a complete set of MMT translations and you don't have additional phenomena to work on in the ones you do have, try to get more of them translated. See Lab 7 for instructions.

Note that, contrary to last week's instructions, the set up we're using this year expects these files to have one sentence per line (no blank lines in between).
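
If your existing file follows last week's layout, with a blank line between sentences, a small shell function can convert it. The following is only a sketch: the function name one_per_line and the file names in the usage comment are hypothetical, and it assumes each sentence already sits on a single line.

```shell
# Hypothetical helper: drop empty and whitespace-only lines so that each
# remaining line holds exactly one sentence.
one_per_line() {
  grep -v '^[[:space:]]*$' "$1"
}

# Example usage (file names are placeholders):
#   one_per_line old-sentences.txt > iso.txt
```

Check the result by eye afterwards; the function does not know whether a line is actually a complete sentence.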

Choose one phenomenon to develop

By midday Tuesday: Choose one phenomenon represented by the MMT sentences not already covered by your grammar to work on this week. Post to Canvas with the IGT for the relevant examples and the phenomenon you intend to work on. I will reply with pointers to instructions for that phenomenon, or develop some if necessary.

Extend your testsuite for that phenomenon

Create additional positive/negative testsuite examples that illustrate your chosen phenomenon and add them to the testsuite in the usual fashion. This testsuite should now include the MMT sentences plus ~2-6 more examples and should be called lab8.

Initial testsuite run

  1. Create and run initial testsuite instances for both the linguist-provided data and your testsuite, using the initial grammar.

    Note: If your tsdb/ directory is inside a shared folder on VirtualBox, it will not work.

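  2. For each of these, explore the results and collect the following information to provide in your write up:
      1. How many items parsed?
      2. What is the average number of parses per parsed item?
      3. How many parses did the most ambiguous item receive?
      4. What sources of ambiguity can you identify?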

Develop analyses and test your grammar

Based on the instructions I've pointed you to through Canvas + answers to the many questions I hope you will ask, develop analyses for your one additional phenomenon.

For the MMT sentences specifically, you can test your MRSs by looking at the output of this English grammar for the corresponding examples. We don't expect an exact match, but if things are different you should have a clear idea of why. And, of course, you are always welcome to post lots of questions!

Test generation

Test generation with both the LKB and ace. Can you generate from short sentences? What about longer ones? To receive full credit on this lab, you need a grammar that can generate from simple transitive sentences, and you need to have tested what happens with longer ones (e.g. sentences with clausal modifiers or clausal complements). See Lab 6 for detailed instructions on generation. And, of course, post lots of questions to Canvas.

Variable property mapping

You may have noticed that you get many variants on generation if you start with a form that is underspecified for e.g., aspect or evidentiality. We can get a handle on this by using variable property mapping to supply default values in the unmarked case (either in monolingual generation or in the MT scenario). The basic strategy is to take any underspecified values in variable properties and translate them, via vpm, to something that conflicts with any more specific values your grammar can produce.

The file semi.vpm provides a mapping between grammar-external features of indices (referential indices and events) and their values, and grammar-internal ones. For background on VPM, see the DELPH-IN wiki. As soon as you start using a VPM file, only those variable properties (features on indices) that are handled in the file are actually preserved.

  1. You should already have a semi.vpm file provided by the customization system. Open it up and see which variable properties are there, and then look in your grammar to see what is missing. In general, we'd expect to see all of the features of the types event and ref-ind represented in a mature semi.vpm file.
  2. Your script file tells the lkb to load the semi.vpm file with the following line:
    (mt:read-vpm (lkb-pathname (parent-directory) "semi.vpm") :semi)
    
  3. The semi.vpm files currently generated by the customization system are not fully ready for MT. In particular, the feature paths should include PNG. and E. only on the internal (left-hand) side. Thus you'll want to edit blocks like this:
    E.TENSE : E.TENSE
      present <> present
      past <> past
      future <> future
      * <> *
    
    To look like this:
    E.TENSE : TENSE
      present <> present
      past <> past
      future <> future
      * <> *
    
  4. If your grammar uses a PERNUM feature, you'll need to map separate PER and NUM features from the external (right-hand) side of the VPM to a single PERNUM feature on the internal (left-hand) side. See the example under "Properties: An Example" on the DELPH-IN wiki page.
  5. If you have any other features you have added on indices, you will need to provide VPM entries for them as well. (If you added them through the customization system, they may be there already.)
  6. If your language has aspect marked in some sentences but other forms that are just underspecified for aspect, you'll want to have the default aspect be "no-aspect". Define this as a subtype of aspect in your grammar, but don't have anything other than the semi.vpm mention it otherwise. In semi.vpm, replace * <> * under aspect with the following:
    * >> no-aspect
    no-aspect << [e]
    
  7. If all forms in your language are marked for aspect, you may still want some default value in cases where the input MRS has no aspect, so that you don't get all the forms. If you wanted perfective as your default aspect, replace * <> * with:
    perfective << [e]
    
  • You can do a similar trick for other kinds of generation ambiguity relating to variable properties.

    Test your semi.vpm file by parsing and then generating. You should see fewer strings coming out.
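
To double-check the edit described in step 3, you can scan semi.vpm for header lines whose external (right-hand) side still carries an E. or PNG. prefix. This is a minimal sketch under stated assumptions: the function name vpm_prefixed_externals is hypothetical, header lines are assumed to have the form "INTERNAL : EXTERNAL" with spaces around the colon (as in the blocks above), and lines starting with ; are assumed to be comments.

```shell
# Hypothetical check: print header lines of a VPM file whose external
# (right-hand) side still carries an E. or PNG. prefix.
vpm_prefixed_externals() {
  awk -F' : ' '
    /^[ \t]*;/ { next }                      # skip lines assumed to be comments
    NF == 2 && $2 ~ /(^|[ \t])(E|PNG)\./ {   # prefix found on the right side
      print FILENAME ":" NR ": " $0
    }
  ' "$1"
}

# Example usage, run from your grammar directory:
#   vpm_prefixed_externals semi.vpm
```

No output means no header line matched. Value-mapping lines such as "present <> present" are ignored because they contain no " : " separator.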

    First MT

    Preliminaries

    The goal for the next lab will be to create an "accommodation" transfer grammar for your language by using it as the target language in two translation pairs, with English and Pite Saami as the inputs. For now, we'll be attempting to get just one sentence through. This will be one that doesn't actually require any transfer rules.

    In all of the instructions below, replace "iso" with the ISO 639-3 code for your language.

    1. Download and unpack mmt.tgz.
    2. Test eng2sje and sje2eng translation:
       cd mmt/
       ./translate-line.sh eng sje 1
      
      Note: If you aren't working on the VM, you'll need to fix the path to ace in the file translate-line.sh.
    3. Look inside translate-line.sh; try changing which line is not commented out and see what different behaviors you get.
    4. Copy your grammar into mmt/grammars/iso and compile it afresh with ace:
        cd mmt/grammars/iso
        ace -G iso.dat -g ace/config.tdl
      
    5. Move the generic transfer grammar mmt/tm/gen to mmt/tm/iso
        cd mmt/tm
        mv gen iso
      
    6. Compile that generic transfer grammar:
        ace -G iso.dat -g ace/config.tdl
      
    7. Copy your MMT sentences to test_sentences/iso.txt. Note: last week I asked for a file with one blank line between each sentence. That's incorrect for this set up. Please just do one sentence per line, for 26 lines, in the same order as the eng.txt file in test_sentences. Any item you are skipping can be replaced with SKIPPED.
    8. Try translating the first sentence from eng and sje to your language:
       ./translate-line.sh eng iso 1
      
    9. This one should not require any transfer rules. If it doesn't work, there are several possible causes:
      • A bug in your MT set up. If you are seeing errors that suggest this might be the problem, post to Canvas.
      • The MRSs don't match. Compare the eng (or sje) MRSs to yours. Can you spot any differences? If you find any, modify your grammar until the MRSs match. Post to Canvas for help. A subcase here is that the PREDs and their arguments match, but the variable properties don't. In that case, we may need to debug your semi.vpm file. Either way, all changes should be in your grammar, and not in eng or sje.
      • Your grammar isn't generating. Confirm that this is the problem by trying monolingual generation (or iso2iso translation). Post to Canvas for help debugging.
      • If you aren't working in the provided VM, your ace version may differ from that used to compile the eng and sje grammars (and also the transfer grammars). In this case, recompiling all of those may help.
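
Before debugging the MT set up itself, it may be worth confirming that the sentence file from step 7 has the expected shape, since a stray blank line shifts every later line number. The sketch below is an assumption-laden illustration: check_sentence_file is a hypothetical helper, not part of mmt.tgz, and it only counts lines.

```shell
# Hypothetical sanity check for a test_sentences file: succeeds only if the
# file has exactly the expected number of lines and none of them is blank.
check_sentence_file() {
  file=$1
  expected=${2:-26}   # 26 lines, matching eng.txt in this lab
  lines=$(wc -l < "$file" | tr -d ' ')
  # grep -c exits non-zero when the count is 0, hence the "|| true"
  blanks=$(grep -c '^[[:space:]]*$' "$file" || true)
  echo "$file: $lines lines, $blanks blank"
  [ "$lines" -eq "$expected" ] && [ "$blanks" -eq 0 ]
}

# Example usage, run from mmt/ (replace iso as described above):
#   check_sentence_file test_sentences/iso.txt
```

Note that wc -l counts newline characters, so a final line without a trailing newline is not counted.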

    Run both the test corpus and the testsuite

    Following the same procedure as usual, do test runs over both the testsuite and the test corpus.

    Again, collect the following information to provide in your write up:

    1. How many items parsed?
    2. What is the average number of parses per parsed item?
    3. How many parses did the most ambiguous item receive?
    4. What sources of ambiguity can you identify?
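
The tools you use for the test runs may already report the first three figures. To make the arithmetic explicit, here is a sketch that computes them from a plain list of per-item parse counts. The input format (one line per item, "item-id number-of-parses") and the function name parse_summary are assumptions made for illustration only. Note that the average is taken over parsed items only, so items with zero parses do not pull it down.

```shell
# Hypothetical helper: summarize a whitespace-separated file with one line
# per item, in the form "<item-id> <number-of-parses>".
parse_summary() {
  awk '
    BEGIN { parsed = 0; total = 0; max = 0 }
    $2 + 0 > 0   { parsed++; total += $2 }   # items with at least one parse
    $2 + 0 > max { max = $2 + 0 }            # most ambiguous item so far
    END {
      avg = (parsed > 0) ? total / parsed : 0
      printf "items: %d\n", NR
      printf "parsed: %d\n", parsed
      printf "average parses per parsed item: %.2f\n", avg
      printf "max parses: %d\n", max
    }
  ' "$1"
}

# Example usage (counts.txt is a placeholder file name):
#   parse_summary counts.txt
```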

    Write up

    NB: While the test suite and grammar development is joint work, the write up should be done by one partner (the other will get a turn next week). The writing partner should have the non-writing partner review the write up and make suggestions.

    Your write up should be a plain text file (not .doc, .rtf or .pdf) which includes the following:

    1. A statement of what you were able to find for translations of the MMT sentences, and why.
    2. Documentation of your analyses of the additional phenomena:
      1. A descriptive statement of the facts of your language.
      2. Illustrative IGT examples from your testsuite.
      3. A statement of how you implemented the phenomenon (in terms of types you added/modified and particular tdl constraints). (Yes, I want to see actual tdl snippets.)
      4. If the analysis is not (fully) working, a description of the problems you are encountering.
    3. Documentation of what happened when you tried generating with the LKB. Did it work right away? If it didn't, but you were able to get it working, what did you have to do?
    4. Documentation of what happened when you tried the MT set up. Did it work right away? What did you change to get it working?
    5. Documentation of your coverage over testsuite & test corpus for both the initial & final runs, including the answers to the questions given above.

    Submit your assignment
