Lab 8 (due 2/24 11:59 pm)

Overview

The goal for this lab is to get your grammars ready for success in MT once we add transfer rules next week. Specifically, you will be doing the following:

  1. Getting one more phenomenon from the MMT set working in your grammar.
  2. Cleaning up generator output some more.
  3. Harmonizing the MRSes to the extent that is reasonable.
  4. Documenting the current status of translation from eng and sje to your language, per sentence.
  5. ... as usual, running before & after testsuites.
  6. ... as usual, writing it all up!

For tdl editing, please practice incremental development: Test as frequently as you possibly can, both by compiling the grammar and by testing specific sentences.

Choose your phenomenon

From among the MT sentences, find a phenomenon that your grammar does not yet handle, or does not yet handle properly, and fix it. Post to Canvas by Tuesday what you're working on, with IGT, so I can provide guidance. Here are the sentences grouped by phenomena. The expectation is that for whichever phenomenon you pick this week, you'll have all of the associated sentences working (but this might not always be feasible; if not, please let me know over Canvas).

Build out your testsuite and iso.txt for your phenomenon

Consult your descriptive materials for the phenomenon you chose, to understand how it is expressed in your language.

Add the relevant sentences to iso.txt and to your testsuite. The testsuite should also include relevant negative examples.

(If this was already done in previous labs for your phenomenon, that is okay.)

Improve your grammar for your chosen phenomenon

I expect this to be done in collaboration with me, which is why I'm asking you to post by Tuesday indicating which phenomenon you are working on and how it is expressed in your language.

In some cases, I'll suggest that you get an analysis from the customization system and integrate it into your current grammar (or at least use it as a starting point). In others, I might directly suggest some tdl or a general approach.

You are welcome to start with a suggestion of how you'd like to handle it, but please don't dive into extensive implementation without discussing the approach with me.

As usual, please practice incremental development and test frequently (by compiling the grammar and testing individual sentences, as well as by running your full testsuite).

Further generation clean up

For each MMT sentence that goes through, look at the range of generator outputs. How much realized-string ambiguity are you getting? What are the sources of ambiguity?

For MMT sentences that don't go through for lack of transfer rules, try using the MMT system with your language as both input and output. How much realized-string ambiguity are you getting? What are the sources of ambiguity?

If the ambiguity relates to overgeneration (i.e. your grammar generating ungrammatical strings), we'll want to work on adding appropriate constraints. Please post to Canvas for assistance.

Likewise, if you have any ambiguity that relates to semantically empty things (affixes, words) that you didn't previously clean up, work on that some this week too. Please post to Canvas for assistance.
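
One way to put a number on realized-string ambiguity is to count the distinct surface strings among the generator results for an item. The sketch below assumes you have pasted the generator output for one item into a Python list; the function name and the example strings are hypothetical and are not part of the MMT setup.

```python
from collections import Counter

def realization_summary(outputs):
    """Count generator results and the distinct surface strings among them."""
    # Normalize whitespace only, so that spelling differences stay visible.
    counts = Counter(" ".join(s.split()) for s in outputs)
    return {
        "results": sum(counts.values()),
        "distinct_strings": len(counts),
        "repeated": {s: n for s, n in counts.items() if n > 1},
    }

# Hypothetical generator output for one item; the strings are made up.
outputs = ["dogs sleep", "dogs  sleep", "sleep dogs"]
summary = realization_summary(outputs)
print(summary)
```

A string listed under "repeated" was produced by more than one result, which may point to semantically empty material or spurious structural ambiguity; several distinct strings may point to word order variation or overgeneration.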

Harmonize MRSes and investigate need for transfer rules

In this part of the lab, your task is to try each of the items in the MMT sentences with both sje and eng as source languages and your language as the target language. (Note that sje doesn't have all of the items.)

  1. For each item that does go through, are you getting the expected output?
  2. For each item that doesn't go through, how do the MRSes differ? Some differences might be legitimate (e.g. predications from overt pronouns v. dropped arguments, "hunger eats me"). Others might be things that could be resolved by assimilating your MRS to those from sje & eng or by refining your semi.vpm file further. In the latter cases, please make the changes (posting to Canvas for assistance as much as you need!). If it's not clear how to classify a given case, please post to Canvas for discussion.
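
As a first pass at seeing how two MRSes differ, it can help to compare just their predicate names. The sketch below assumes you have copied the predicate names from each MRS into a list by hand; the function name and the predicate names are hypothetical, and the comparison says nothing about argument linking, scope, or variable properties, which still need to be checked by eye.

```python
from collections import Counter

def pred_diff(source_preds, target_preds):
    """Return predicates found only in the source or only in the target MRS."""
    src, tgt = Counter(source_preds), Counter(target_preds)
    # Counter subtraction keeps only positive counts, so repeated
    # predicates are compared by how many times they occur.
    return sorted((src - tgt).elements()), sorted((tgt - src).elements())

# Hypothetical predicate lists, typed in by hand for illustration.
eng = ["pron_rel", "_eat_v_rel", "_apple_n_rel", "exist_q_rel"]
mine = ["_eat_v_rel", "_apple_n_rel", "exist_q_rel"]
only_src, only_tgt = pred_diff(eng, mine)
print("only in source:", only_src)
print("only in target:", only_tgt)
```

In this made-up case the only difference is a pronoun predication in the source, which is the kind of difference that might be legitimate rather than something to harmonize.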

Run both the test corpus and the testsuite

Following the same procedure as usual, do test runs over both the testsuite and the test corpus.

Collect the following information to provide in your write up:

  1. How many items parsed?
  2. What is the average number of parses per parsed item?
  3. How many parses did the most ambiguous item receive?
  4. What NEW sources of ambiguity can you identify?
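
The first three numbers can be computed from the per-item parse counts. The sketch below assumes you have those counts as a plain list of integers, one per item, with 0 for items that did not parse; the function name and the example counts are hypothetical and are not results from any grammar.

```python
from statistics import mean

def summarize(readings_per_item):
    """Summarize a test run given the number of parses for each item."""
    # Only items with at least one parse count toward the average.
    parsed = [n for n in readings_per_item if n > 0]
    return {
        "items": len(readings_per_item),
        "parsed": len(parsed),
        "avg_parses_per_parsed_item": mean(parsed) if parsed else 0.0,
        "max_parses": max(parsed, default=0),
    }

# Hypothetical counts for a starting grammar and a final grammar.
before = summarize([1, 0, 2, 4, 0, 1])
after = summarize([1, 1, 2, 2, 0, 1])
for label, stats in (("before", before), ("after", after)):
    print(label, stats)
```

Note that the average is taken over parsed items only, so unparsed items do not pull it down.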

Write up

Your write up should be a plain text file (not .doc, .rtf or .pdf) which includes the following:

  1. A description of how the phenomenon you picked to work on is expressed in your language, including IGT.
  2. A description of your implementation of this phenomenon, including:
  3. A description of any clean up work you did to get generation down to a reasonable number of outputs, including:
  4. A description of the status of each MMT item, with sje as source and with eng as source. Possible statuses:
  5. A description of the performance of your final grammar for this week on the test suite and test corpus, as compared to your starting grammar (see details above).

Submit your assignment
