Lab 5 (due 2/4 11:59 pm)

Overview

This is our final lab with the customization system. It is also our first foray into MT. The focus will be on finishing up the choices files (though it's not expected that you will have used every part of the customization system) and on getting one sentence translating from English to your language. You will also work on collecting the MMT sentences for your language and use [incr tsdb()] to compare the initial and final state of your grammar for the week over the testsuites.

This lab entails the following general steps, which are not (fully) ordered with respect to each other.

Improve the choices file for three phenomena

For the three phenomena you chose, refine the choices file by hand. Please be sure to post lots of questions on Canvas as you work on this! I expect the write up of this portion to include the specific choices values you changed (copied and pasted), as well as relevant IGT that I can use to test the effects.

Begin collecting the MMT sentences for your language

We will be working with the sentences in eng.txt, but it is not expected that every grammar will cover every sentence. For this week, I ask you to:

  1. Find translations (or approximations) for all of the words in the small vocabulary of those sentences
  2. Translate the first three, and include them as items in your testsuite.txt file
  3. Update your testsuite skeleton to reflect the new testsuite.txt.
  4. Create an iso.txt file for your language (with iso changed to your language code) with just the line you expect your grammar to parse for each sentence, one per line, and in the same order as eng.txt. For any you haven't found a translation for yet, just write SKIPPED.
  5. Determine (and document) whether any of the other sentences will be impossible to translate given your resources and/or impossible to model (involving phenomena you don't expect to get to).

For the write up for this portion, I expect you to tell me about the process you went through and report on item 5 above.
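
Since iso.txt has to stay in the same order as eng.txt, with SKIPPED filling any gaps, it can be worth checking the alignment mechanically. The following is only a minimal sketch and not part of the course tools: count_lines and check_alignment are hypothetical helper names, and both file paths are passed as arguments because where the files live on your system is an assumption.

```shell
# Hypothetical helpers (not part of mmt.tgz): check that the language file
# has one line per English sentence and count the SKIPPED placeholders.
count_lines() {
    # grep -c '' counts every line, including a final line with no newline.
    grep -c '' "$1" || true
}

check_alignment() {
    eng_file=$1
    iso_file=$2
    eng_n=$(count_lines "$eng_file")
    iso_n=$(count_lines "$iso_file")
    # -x matches whole lines only, so a sentence that merely contains
    # the string SKIPPED is not counted as a placeholder.
    skipped=$(grep -c -x 'SKIPPED' "$iso_file" || true)
    echo "eng lines: $eng_n; iso lines: $iso_n; SKIPPED: $skipped"
    # Succeed only when the two files have the same number of lines.
    [ "$eng_n" -eq "$iso_n" ]
}

# Example call (paths are placeholders for wherever your files live):
#   check_alignment eng.txt iso.txt
```

The function prints the three counts and returns a non-zero status when the line counts differ, which usually means a sentence was dropped or a SKIPPED line was forgotten.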

Try a first translation

Preliminaries

In later labs, we will refine the variable property mapping and create small transfer grammars for each language by using it as the target language in two translation pairs, with English and Pite Saami as the inputs. For now, we'll be attempting to get just one sentence through. This will be one that doesn't actually require any transfer rules.

In all of the instructions below, replace "iso" with the ISO 639-3 code for your language.

  1. Download and unpack mmt.tgz.
  2. Test eng2sje and sje2eng translation:
     cd mmt/
     ./translate-line.sh eng sje 1
    
    Note: If you aren't working on the VM, you'll need to fix the path to ace in the file translate-line.sh.
  3. Look inside translate-line.sh; try changing which line is not commented out and see what different behaviors you get.
  4. Copy your grammar into mmt/grammars/iso and compile it afresh with ace:
      cd mmt/grammars/iso
      ace -G iso.dat -g ace/config.tdl
    
  5. Move the generic transfer grammar mmt/tm/gen to mmt/tm/iso
      cd mmt/tm
      mv gen iso
    
  6. Compile that generic transfer grammar:
      ace -G iso.dat -g ace/config.tdl
    
  7. Copy your MMT sentences to test_sentences/iso.txt.
  8. Try translating the first sentence from eng and sje to your language:
     ./translate-line.sh eng iso 1
    
  9. This one should not require any transfer rules. If it doesn't work, there are several possible causes:

For your write up for this part, please describe what happened when you tried the steps above. What difficulties did you encounter and how did you resolve them? What output did you get?
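
Step 8 mentions both eng and sje as source languages, while the command shown covers only eng. A minimal sketch of trying both in one go follows. It assumes that translate-line.sh takes its arguments in the order source, target, line number, as in the calls above; translate_from_both and the TRANSLATE override are hypothetical names introduced here, not part of mmt.tgz.

```shell
# Hypothetical wrapper: run the same test sentence from both source languages.
# Assumption: translate-line.sh takes SOURCE TARGET LINE, as in the steps above.
# TRANSLATE is a made-up override so the loop can be dry-run, e.g. with echo.
translate_from_both() {
    target=$1
    line=${2:-1}    # default to the first sentence
    for src in eng sje; do
        echo "== $src -> $target, line $line =="
        "${TRANSLATE:-./translate-line.sh}" "$src" "$target" "$line"
    done
}

# Example call from inside mmt/ (replace iso with your language code):
#   translate_from_both iso 1
```

Setting TRANSLATE=echo before the call prints the arguments that would be passed instead of running the translation, which is a cheap way to check the loop before involving ace.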

Run both the test corpus and the testsuite

Following the same procedure as usual, do test runs over both the testsuite and the test corpus.

Again, collect the following information to provide in your write up:

  1. How many items parsed?
  2. What is the average number of parses per parsed item?
  3. How many parses did the most ambiguous item receive?
  4. What sources of ambiguity can you identify?
  5. For 10 items (if you have at least that many parsing), do any of the parses look reasonable in the semantics? It is fine to recheck the ones you have been checking and report only the differences.
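
[incr tsdb()] reports the figures for items 1 to 3 itself, so the following is only a sketch of the arithmetic behind them, useful as a sanity check. It assumes a hypothetical two-column text file with an item id and that item's number of parses on each line; this is an illustrative format, not an [incr tsdb()] export, and parse_stats is a made-up name.

```shell
# Hypothetical helper: summarize a file of "ITEM_ID NUMBER_OF_PARSES" lines.
# The input format is an assumption for illustration only.
parse_stats() {
    awk '
        BEGIN { max = 0 }
        NF < 2 { next }          # skip blank or malformed lines
        { total++ }
        $2 > 0 {
            parsed++             # item received at least one parse
            sum += $2
            if ($2 > max) { max = $2; max_id = $1 }
        }
        END {
            printf "items: %d\n", total
            printf "parsed: %d\n", parsed
            if (parsed > 0) {
                # The average is taken over parsed items only, not all items.
                printf "average parses per parsed item: %.2f\n", sum / parsed
                printf "most ambiguous: item %s with %d parses\n", max_id, max
            }
        }
    ' "$1"
}

# Example call (counts.txt is a placeholder file name):
#   parse_stats counts.txt
```

Note that items with zero parses count toward the total but are left out of the average, which matches the wording "per parsed item".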

Write up

Your write up should be a plain text file (not .doc, .rtf or .pdf) which includes the following:

  1. A description of the phenomena you improved in the choices file, including the specific choices values you changed and relevant IGT that can be used to test the effects.
  2. A description of your process for translating the MMT sentences and your documentation about which sentences may be impossible.
  3. A description of what happened when you tried the MT setup. What difficulties did you encounter and how did you resolve them? What output did you get?
  4. A description of the performance of your final grammar for this week on the test suite and test corpus, as compared to your starting grammar (see details above).

Submit your assignment
