Lab 8 (Due 5/19 11:59pm)


As usual, check the write up instructions first.

Requirements for this assignment

Run a baseline test suite

Before making any changes to your grammar for this lab, run a baseline test suite instance. If you decide to add items to your test suite for the material covered here, consider doing so before modifying your grammar so that your baseline can include those examples. (Alternatively, if you add examples in the course of working on your grammar and want to make the snapshot later, you can do so using the grammar you turned in for Lab 7.)

One sentence from the test corpus

The goal of this section is to parse one more sentence from your test corpus than you could before starting this section, i.e., one more than last week. In your write up, document what you had to add to get the sentence working. Note that you can get full credit here even if the sentence ultimately doesn't parse, as long as you document what you still need to get working.

This is (again) a very open-ended part of the lab (even more so than usual), which means: A) you should get started early and post to Canvas so I can assist in developing analyses of whatever additional phenomena you run across, and B) you'll have to restrain yourselves; the goal isn't to parse the whole test corpus this week ;-) and I won't be able to support more than one new sentence per group.

Variable property mapping

You may have noticed that you get many variants in generation if you start with a form that is underspecified for, e.g., aspect or evidentiality. We can get a handle on this by using variable property mapping (VPM) to supply default values in the unmarked case (either in monolingual generation or in the MT scenario). The basic strategy is to take any underspecified values in variable properties and translate them, via VPM, to something that conflicts with any more specific values your grammar can produce.
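To see why a default value cuts down the number of generated strings, here is a toy sketch in Python. It is not how the LKB or the VPM machinery is implemented; the three verb forms and the simplified notion of "unification" are assumptions made purely for illustration. An underspecified input value (`aspect`) is compatible with every form, while a default like `no-aspect` is compatible only with the form that leaves aspect unmarked.

```python
# Toy model only: hypothetical verb forms and the aspect value each one
# contributes. "aspect" stands for the underspecified supertype.
FORMS = {
    "verb-PFV": "perfective",
    "verb-IPFV": "imperfective",
    "verb": "aspect",  # form that is unmarked for aspect
}


def compatible(a, b):
    """Simplified unification: values match if they are equal or if either
    one is the underspecified supertype "aspect"."""
    return a == b or a == "aspect" or b == "aspect"


def realizations(input_aspect):
    """Return the forms whose aspect value is compatible with the input."""
    return sorted(f for f, asp in FORMS.items() if compatible(input_aspect, asp))


def apply_default(input_aspect):
    """Mimic the default mapping: an underspecified input becomes no-aspect,
    which clashes with every specific value the grammar can produce."""
    return "no-aspect" if input_aspect == "aspect" else input_aspect


without_default = realizations("aspect")
with_default = realizations(apply_default("aspect"))
print(without_default)  # every form is compatible
print(with_default)     # only the unmarked form survives
```

An input that already carries a specific value such as `perfective` passes through `apply_default` unchanged, so marked inputs still generate their marked forms.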

The file semi.vpm provides a mapping between grammar-external features of indices (referential indices and events) and their values, and grammar-internal ones. For background on VPM, see the DELPH-IN wiki. Note that as soon as you start using a VPM file, only the variable properties (features on indices) that are handled in the file are actually preserved.

  1. You should already have a semi.vpm file provided by the customization system. Open it up and see which variable properties are there, and then look in your grammar to see what is missing. In general, we'd expect to see all of the features of the types event and ref-ind represented in a mature semi.vpm file.
  2. Your script file tells the lkb to load the semi.vpm file with the following line:
    (mt:read-vpm (lkb-pathname (parent-directory) "semi.vpm") :semi)
  3. The semi.vpm files currently generated by the customization system are not fully ready for MT. In particular, the feature paths should include PNG. and E. only on the internal (left-hand) side. Thus you'll want to edit blocks like this:
      E.TENSE : E.TENSE
        present <> present
        past <> past
        future <> future
        * <> *
    To look like this:
      E.TENSE : TENSE
        present <> present
        past <> past
        future <> future
        * <> *
  4. If your grammar uses a PERNUM feature, you'll need to map separate PER and NUM features from the external (right-hand) side of the VPM to a single PERNUM feature on the internal (left-hand) side. See the example under "Properties: An Example" on the DELPH-IN wiki page.
  5. If you have any other features you have added on indices, you will need to provide VPM entries for them as well. (If you added them through the customization system, they may be there already.)
  6. If your language has aspect marked in some sentences but other forms that are just underspecified for aspect, you'll want to have the default aspect be "no-aspect". Define this as a subtype of aspect in your grammar, but don't have anything other than the semi.vpm mention it otherwise. In semi.vpm, replace * <> * under aspect with the following:
    * >> no-aspect
    no-aspect << [e]
  7. If all forms in your language are marked for aspect, you may still want some default value in cases where the input MRS has no aspect, so that you don't get all the forms. If you wanted perfective as your default aspect, replace * <> * with:
    perfective << [e]
  • You can do a similar trick for other kinds of generation ambiguity relating to variable properties.

    Test your semi.vpm file by parsing and then generating. You should see fewer strings coming out.

    First MT


    The goal for the next lab will be to create an "accommodation" transfer grammar for your language by using it as the target language in two translation pairs, with English and Chadian Arabic as the inputs. For now, we'll be attempting to get just one sentence through, possibly one that doesn't actually require any transfer rules.

    Running the translation system

    The first step is to get the translation system running for English to Chadian Arabic (eng2shu). Here are step-by-step instructions:

    1. Download the English and Chadian Arabic grammars. Unpack each of them with tar xzf eng.tgz and tar xzf shu.tgz.
    2. Start two separate emacsen. Put one on the left of your screen (this will be the "source" emacs). Put one on the right of your screen ("target" emacs).
    3. Start the LKB in each. Make sure the "source" LKB Top menu is on the left of the screen and the "target" one is on the right.
    4. Load the English grammar into the "source" LKB.
    5. Load the Chadian Arabic grammar into the "target" LKB.
    6. In the "target" LKB, select Options | Expand menu.
    7. In the "target" LKB, select Generate | Start server.
    8. In the "source" emacs/lkb parse the English sentence Dogs sleep.
    9. From the pop-up menu on the tree that comes up, select "Rephrase." You should see a transfer output window and then the Chadian Arabic grammar should output "KALIB-PL B-UNUUM-U" and "KALIB-PL Y-UNUUM-U" in a realizations window.

    Attempt to translate into your language

    1. Edit the file lkb/globals.lsp in your own grammar to add the following line, with "iso" replaced by the three-letter ISO code for your language:
      (setf *translate-grid* '(:iso :eng :shu))
    2. Edit the file lkb/globals.lsp in the English and Chadian Arabic grammars so that the line for *translate-grid* now looks like the appropriate one of the lines below (again replacing "iso" with the code for your language).
      (setf *translate-grid* '(:eng :shu :iso))
      (setf *translate-grid* '(:shu :eng :iso))
    3. Now load your grammar into the "target" lkb.
    4. Parse Dogs sleep with the English grammar in the "source" lkb and select "rephrase".
    5. Observe what happens: Do you get generation outputs? Some error in the emacs buffer in the "target" emacs?
    6. If you get an error, you'll need to compare the MRSs to see what the difference is. I expect that for Dogs sleep you won't need any transfer rules, and thus any errors should be addressed through harmonization (aka cleaning up your MRS) and/or work on your semi.vpm file. Thus, this is a good one to work on for this week.

    Comparing MRSs

    To compare the MRSs, you can look at the MRS from the English grammar directly, but this can be a bit misleading, since you really want to look at the input to the generator (i.e., the transfer output). To do this, you can select "Generate | Display Input MRS" or "Generate | Display Internal MRS" from the "target" LKB Top menu.

    1. Generate | Display Internal MRS
    2. Parse the expected output
    3. Choose Indexed MRS from the pop-up menu

    There are a number of things that could be wrong:

    1. Missing RELS or HCONS (broken diff-list append).
    2. Misspelled PRED values (look carefully at the underscores).
    3. Misspelled/differently spelled feature values (e.g. sing instead of sg).
    4. Misspelled/differently spelled feature names (e.g., PERS instead of PER).
    5. Incompatible variable properties (features and values).
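Spelling differences in feature names and values (items 3 and 4 above) are easy to miss by eye. The following Python sketch shows one way to spot them once you have copied the variable properties of a single index from each MRS display into a dictionary by hand. The function name and the sample values are hypothetical, and the comparison is plain string matching, so it will also flag pairs of values that differ in spelling but would in fact unify in your type hierarchy.

```python
def diff_properties(generator_input, native_parse):
    """Compare two {feature: value} dictionaries for one index and report
    every feature name or value that differs between them."""
    report = []
    for feat in sorted(set(generator_input) | set(native_parse)):
        if feat not in native_parse:
            report.append(f"{feat}: only in generator input")
        elif feat not in generator_input:
            report.append(f"{feat}: only in native parse")
        elif generator_input[feat] != native_parse[feat]:
            report.append(f"{feat}: {generator_input[feat]} vs {native_parse[feat]}")
    return report


# Hypothetical values typed in from the two MRS windows.
from_transfer = {"PER": "3", "NUM": "sg"}
from_parse = {"PERS": "3", "NUM": "sing"}

mismatches = diff_properties(from_transfer, from_parse)
for line in mismatches:
    print(line)
```

An empty report means the two sets of properties are spelled identically; it does not rule out the other problems in the list, such as missing RELS or HCONS.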

    MT look-ahead

    For next week, we'll be attempting to cover the sentences in the following files:

    ... and so you'll need to collect (or create) translations of those sentences into your language for Lab 9. We'll also work on transfer rules so that we can handle cases of translation divergence.

    Write up your analyses

    1. For whatever you fixed about your grammar as you worked on the test corpus sentence, provide the usual "phenomenon" write up:
      • Prose description of the phenomenon
      • IGT (that parses!) illustrating the phenomenon
      • Prose description of analysis, illustrated with tdl snippets
    2. Describe any changes you needed to make to the semi.vpm file, and the effects that including the semi.vpm file had on generation.
    3. Describe any changes you needed to make to get MT working, with "Dogs sleep" or some other simple sentence.
    4. Provide the sentence that you worked on for MT, so I can test it.
    5. Describe the current coverage of your grammar over your test suite (using numbers you can get from Analyze | Coverage and Analyze | Overgeneration in [incr tsdb()]) and a comparison between your baseline test suite run and your final one for this lab (see Compare | Competence).

    Submit your assignment
