Lab 8 (due 2/28 11:59 pm)

Overview

The goals for this lab are to explore a bit of the space between these small grammars and coverage of naturally occurring data, and to get the grammars ready for success in MT once we add transfer rules next week. Specifically, you will be doing the following:

  1. Identifying and implementing one phenomenon from the small corpus you collected, if this is feasible.
  2. Getting one more phenomenon from the MMT set working in your grammar.
  3. Cleaning up generator output some more.
  4. Harmonizing the MRSes to the extent that is reasonable.
  5. Documenting the current status of translation from eng and sje to your language, per sentence.
  6. ... as usual, running before & after testsuites.
  7. ... as usual, writing it all up!

For tdl editing, please practice incremental development: Test as frequently as you possibly can, both by compiling the grammar and by testing specific sentences.

Choose your phenomena

From your test corpus, find a phenomenon that seems simple but is not yet handled by your grammar. Please post to Canvas by Tuesday about what you would like to work on, including IGT, and your understanding of how that phenomenon works in your language. I will provide guidance on how to approach this in tdl (if feasible).

If you still have a lot to do with the MMT sentences, or don't see anything particularly feasible in the corpus, you can pick two phenomena from the MMT sentences instead of one of each.

From among the MT sentences, find a phenomenon that your grammar does not yet handle, or does not yet handle properly, and fix it. Post to Canvas by Tuesday what you're working on, with IGT, so I can provide guidance. Here are the sentences grouped by phenomena. The expectation is that for whichever phenomenon you pick this week, you'll have all of the associated sentences working (but this might not always be feasible; if not, please let me know over Canvas).

Build out your testsuite and iso.txt for your phenomenon

Consult your descriptive materials for the phenomenon you chose, to understand how it is expressed in your language.

Add the relevant sentences to iso.txt and to your testsuite. The testsuite should also include relevant negative examples.

(If this was already done in previous labs for your phenomenon, that is okay.)
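
One way to keep an eye on both the positive and the negative examples is to tally coverage and overgeneration separately. The following is a minimal Python sketch, not part of the course tooling. It assumes you have already recorded, for each testsuite item, a grammaticality judgment and the number of parses your grammar produced; the function name and the tuple layout are hypothetical.

```python
from typing import List, Tuple

# Each item: (sentence, is_grammatical, parse_count).
# This layout is an assumption made for illustration only.
Item = Tuple[str, bool, int]


def coverage_and_overgeneration(items: List[Item]) -> Tuple[float, float]:
    """Return (coverage, overgeneration) as fractions between 0 and 1.

    coverage: share of grammatical items that received at least one parse.
    overgeneration: share of ungrammatical items that received at least one parse.
    """
    positive = [n for _, ok, n in items if ok]
    negative = [n for _, ok, n in items if not ok]
    coverage = sum(1 for n in positive if n > 0) / len(positive) if positive else 0.0
    overgen = sum(1 for n in negative if n > 0) / len(negative) if negative else 0.0
    return coverage, overgen


if __name__ == "__main__":
    # Placeholder items; substitute your own testsuite data.
    demo = [
        ("grammatical item 1", True, 2),
        ("grammatical item 2", True, 0),
        ("ungrammatical item 1", False, 1),
        ("ungrammatical item 2", False, 0),
    ]
    print(coverage_and_overgeneration(demo))
```

A negative example that parses is a sign of overgeneration, so the second number should ideally stay at zero as you add analyses.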

Improve your grammar for your chosen phenomena

I expect this to be done in collaboration with me, which is why I'm asking you to post by Tuesday indicating which phenomena you are working on and how they are expressed in your language.

In some cases, I'll suggest that you get an analysis from the customization system and integrate it into your current grammar (or at least use it as a starting point). In others, I might directly suggest some tdl or a general approach.

You are welcome to start with a suggestion of how you'd like to handle it, but please don't dive into extensive implementation without discussing the approach with me.

As usual, please practice incremental development and test frequently (by compiling the grammar and testing individual sentences, as well as by running your full testsuite).

Further generation clean up

For each MMT sentence that goes through, look at the range of generator outputs. How much realized-string ambiguity are you getting? What are the sources of ambiguity?

For MMT sentences that don't go through for lack of transfer rules, try using the MMT system with your language as both input and output. How much realized-string ambiguity are you getting? What are the sources of ambiguity?
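
To put a number on realized-string ambiguity, it can help to count how many distinct strings the generator produced for one input and how often each one recurs. Here is a minimal Python sketch under the assumption that you have copied the generator outputs for one item into a list of strings; the function name is hypothetical, and collapsing runs of whitespace before comparing is a choice made here, not something the MMT system does.

```python
from collections import Counter
from typing import List, Tuple


def string_ambiguity(realizations: List[str]) -> Tuple[int, Counter]:
    """Count distinct realized strings among the generator outputs for one input.

    Whitespace is normalized so that outputs differing only in spacing
    are treated as the same string.
    """
    counts = Counter(" ".join(r.split()) for r in realizations)
    return len(counts), counts


if __name__ == "__main__":
    # Placeholder outputs; substitute the strings your generator returned.
    outputs = ["the dog sleeps", "the dog  sleeps", "sleeps the dog"]
    distinct, counts = string_ambiguity(outputs)
    print(distinct)
    for string, n in counts.most_common():
        print(n, string)
```

A string that appears more than once points to several derivations with the same surface form, while several distinct strings point to word order or morphological variation.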

If the ambiguity relates to overgeneration (i.e. your grammar generating ungrammatical strings), we'll want to work on adding appropriate constraints. Please post to Canvas for assistance.

Likewise, if you have any ambiguity that relates to semantically empty elements (affixes, words) that you didn't previously clean up, work on that this week too. Please post to Canvas for assistance.

Harmonize MRSes and investigate need for transfer rules

In this part of the lab, your task is to try each of the items in the MMT sentences with both sje and eng as source languages and your language as the target language. (Note that sje doesn't have all of the items.)

  1. For each item that does go through, are you getting the expected output?
  2. For each item that doesn't go through, how do the MRSes differ? Some differences might be legitimate (e.g. predications from overt pronouns v. dropped arguments, "hunger eats me"). Others might be things that could be resolved by assimilating your MRS to those from sje & eng or by refining your semi.vpm file further. In the latter cases, please make the changes (posting to Canvas for assistance as much as you need!). If it's not clear how to classify a given case, please post to Canvas for discussion.
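
When comparing MRSes by eye gets tedious, a first pass is to compare just the predicate names on each side. The sketch below is a minimal illustration in plain Python, assuming you have typed or pasted the predicate names from the two MRSes into lists. It does not read MRS files or look at arguments, scope, or variable properties, and the predicate names in the demo are illustrative placeholders, not taken from any of the grammars.

```python
from collections import Counter
from typing import List, Tuple


def predicate_diff(source_preds: List[str], target_preds: List[str]) -> Tuple[List[str], List[str]]:
    """Return (only_in_source, only_in_target) as sorted lists.

    Predicates are compared as multisets, so a predicate that occurs
    twice on one side and once on the other is reported once.
    """
    src, tgt = Counter(source_preds), Counter(target_preds)
    only_in_source = sorted((src - tgt).elements())
    only_in_target = sorted((tgt - src).elements())
    return only_in_source, only_in_target


if __name__ == "__main__":
    # Hypothetical predicate lists: an overt pronoun in the source
    # versus a dropped argument in the target.
    source = ["_sleep_v_rel", "pron_rel", "pronoun_q_rel"]
    target = ["_sleep_v_rel"]
    only_source, only_target = predicate_diff(source, target)
    print("only in source:", only_source)
    print("only in target:", only_target)
```

Differences that show up here still need to be classified by hand as legitimate or as candidates for harmonization.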

Run both the test corpus and the testsuite

Following the same procedure as usual, do test runs over both the testsuite and the test corpus.

Collect the following information to provide in your write up:

  1. How many items parsed?
  2. What is the average number of parses per parsed item?
  3. How many parses did the most ambiguous item receive?
  4. What NEW sources of ambiguity can you identify?
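
The first three numbers can be computed mechanically once you have the parse count for each item. The following Python sketch assumes you have collected those counts into a dictionary keyed by item id; the function name and dictionary keys are hypothetical, and it does not read test run output directly.

```python
from typing import Dict, Optional, Tuple


def summarize_parse_counts(parse_counts: Dict[str, int]) -> dict:
    """Summarize a test run given a mapping from item id to number of parses."""
    # Keep only items that received at least one parse.
    parsed = {item: n for item, n in parse_counts.items() if n > 0}

    most_ambiguous: Optional[Tuple[str, int]] = None
    if parsed:
        best = max(parsed, key=parsed.get)
        most_ambiguous = (best, parsed[best])

    return {
        "total_items": len(parse_counts),
        "parsed_items": len(parsed),
        # The average is taken over parsed items only, not over all items.
        "avg_parses_per_parsed_item": sum(parsed.values()) / len(parsed) if parsed else 0.0,
        "most_ambiguous": most_ambiguous,
    }


if __name__ == "__main__":
    # Placeholder counts; substitute the numbers from your own test run.
    print(summarize_parse_counts({"1": 0, "2": 2, "3": 4}))
```

Running it once on the starting grammar and once on the final grammar gives the before and after figures for the write up. Identifying new sources of ambiguity still requires looking at the parses themselves.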

Write up

Your write up should be a plain text file (not .doc, .rtf or .pdf) which includes the following:

  1. A description of how the phenomena you picked to work on are expressed in your language, including IGT. For each:
  2. A description of your implementation of this phenomenon, including:
  3. A description of any clean up work you did to get generation down to a reasonable number of outputs, including:
  4. A description of the status of each MMT item, with sje as source and with eng as source. Possible statuses:
  5. A description of the performance of your final grammar for this week on the test suite and test corpus, as compared to your starting grammar (see details above).

Submit your assignment
