Lab 5 (due 2/7 11:59 pm)

Overview

This is our final lab with the customization system. It is also our first foray into MT. The focus will be on finishing up the choices files (though it's not expected that you will have used every part of the customization system) and on getting one sentence translating from English to your language. You will also work on collecting the MMT sentences for your language and use [incr tsdb()] to compare the initial and final state of your grammar for the week over the testsuites.

This lab entails the following general steps, which are not (fully) ordered with respect to each other.


Begin collecting the MMT sentences for your language

We will be working with the sentences in eng.txt, but it is not expected that every grammar will cover every sentence. For this week, I ask you to:

  1. Find translations (or approximations) for all of the words in the small vocabulary of those sentences.
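A quick way to list the vocabulary you need to translate is to extract the distinct word forms from eng.txt. The sketch below assumes eng.txt is plain text with one sentence per line, and it treats any run of letters or apostrophes as a word. `mmt_vocab` is a hypothetical helper name, not something shipped with the course materials.

```shell
#!/usr/bin/env bash
# mmt_vocab FILE
# Print each distinct word form in FILE once, lowercased and sorted.
# Hypothetical helper: assumes plain text, one sentence per line, and
# treats any run of letters or apostrophes as a word.
mmt_vocab() {
  LC_ALL=C tr -cs "[:alpha:]'" '\n' < "$1" \
    | LC_ALL=C tr '[:upper:]' '[:lower:]' \
    | grep -v '^$' \
    | LC_ALL=C sort -u
}

# Usage (run from the directory holding eng.txt):
#   mmt_vocab eng.txt > vocab-to-translate.txt
```

Each line of the output is one English word form still needing a translation or approximation in your language.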

For the write up for this portion, I expect you to tell me about the process you went through and report on item 5 above.


Improve the choices file for three phenomena

For the three phenomena you chose, refine the choices file by hand (through the questionnaire, via direct editing, or some combination). Please be sure to post lots of questions on Canvas as you work on this! I expect the write up of this portion to include the specific choices values you changed (copied and pasted from the choices file), as well as relevant IGT that I can use to test the effects.
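One way to collect the changed values is to keep a copy of the choices file from before you started and compare it with the current one. The sketch below assumes both files use the one `key=value` per line layout of a choices file and compares them after sorting, so moving a line without changing it is not reported. `changed_choices` and the file names in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# changed_choices OLD NEW
# Print choices lines that were added or changed in NEW (prefixed "+ ") and
# lines from OLD that no longer appear (prefixed "- "). Lines are compared
# after sorting, so reordering alone is not reported.
changed_choices() {
  local old=$1 new=$2
  LC_ALL=C comm -13 <(LC_ALL=C sort "$old") <(LC_ALL=C sort "$new") | sed 's/^/+ /'
  LC_ALL=C comm -23 <(LC_ALL=C sort "$old") <(LC_ALL=C sort "$new") | sed 's/^/- /'
}

# Usage:
#   cp choices choices.before          # before you start editing
#   changed_choices choices.before choices
```

A changed value shows up as a "+" line for the new value and a "-" line for the old one, which is usually what the write up needs.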


Tdl edits

By now, you may have collected some suggested tdl edits (from feedback on Labs 2-4 or in class). Once you are all done refining things via the customization system, patch these into your grammar.

The only tdl edits this week should be things that I have suggested as bug fixes or workarounds. You are not expected to come up with tdl edits on your own. If I haven't suggested any to you, this section is a freebie: nothing to do here!

For the write up, please include the actual tdl changes and an explanation of their purpose.
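If you keep a copy of your grammar directory from before patching, a unified diff of the .tdl files is a convenient way to capture the actual changes. This sketch assumes the two directories share file names and that your edits are in top-level .tdl files (files in subdirectories such as ace/ are not compared). `tdl_changes` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# tdl_changes OLD_DIR NEW_DIR
# Print a unified diff for each top-level .tdl file in NEW_DIR against the
# file of the same name in OLD_DIR. Returns 0 if nothing differs, 1 otherwise.
tdl_changes() {
  local old=$1 new=$2 f status=0
  for f in "$new"/*.tdl; do
    [ -e "$f" ] || continue            # no .tdl files: the glob stays literal
    diff -u "$old/${f##*/}" "$f" || status=1
  done
  return "$status"
}

# Usage:
#   cp -r mygrammar mygrammar.before   # before patching
#   tdl_changes mygrammar.before mygrammar > tdl-edits.diff
```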

Try a first translation

Preliminaries

In later labs, we will refine the variable property mapping and create a small transfer grammar for each language, using that language as the target in two translation pairs, with English and another language (probably Pite Saami) as the sources. For now, we'll be attempting to get just one sentence through, one that doesn't actually require any transfer rules.

In all of the instructions below, replace "iso" with the ISO 639-3 code for your language.

  1. Download and unpack mmt.tgz.
  2. Test eng2sje and sje2eng translation:
      cd mmt/
      ./translate-line.sh eng sje 1

  3. Look inside translate-line.sh; try changing which line is left uncommented and see what different behaviors you get.
  4. From the directory that contains mmt/, make a symlink to your grammar at mmt/grammars/iso:
      ln -s /path/to/your/grammar mmt/grammars/iso

  5. Compile it afresh with ace:
      cd mmt/grammars/iso
      ace -G iso.dat -g ace/config.tdl

  6. From the directory that contains mmt/, move the generic transfer grammar mmt/tm/gen to mmt/tm/iso:
      cd mmt/tm
      mv gen iso

  7. Compile that generic transfer grammar from inside mmt/tm/iso:
      cd iso
      ace -G iso.dat -g ace/config.tdl

  8. Copy your MMT sentences to test_sentences/iso.txt.
  9. From inside mmt/, try translating the first sentence from eng to your language:
      ./translate-line.sh eng iso 1

  10. This one should not require any transfer rules. If it doesn't work, there are several possible causes:

For your write up for this part, please describe what happened when you tried the steps above. What difficulties did you encounter and how did you resolve them? What output did you get?
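When the translation fails, it can help to rule out setup problems before looking at the grammar itself. The sketch below checks that the files created by the setup steps exist. It assumes the layout those steps are meant to produce: your grammar symlinked at mmt/grammars/iso with ace/config.tdl inside it, a compiled iso.dat in both mmt/grammars/iso and mmt/tm/iso, and the sentences in test_sentences/iso.txt directly under mmt/. `check_mmt_setup` is a hypothetical helper, and it does not know what translate-line.sh itself expects.

```shell
#!/usr/bin/env bash
# check_mmt_setup MMT_DIR ISO
# Report each missing piece of the assumed setup and return the number of
# problems found (0 means everything checked is in place).
check_mmt_setup() {
  local mmt=$1 iso=$2 problems=0 path
  if [ ! -L "$mmt/grammars/$iso" ]; then
    echo "not a symlink: $mmt/grammars/$iso"
    problems=$((problems + 1))
  fi
  for path in \
    "$mmt/grammars/$iso/ace/config.tdl" \
    "$mmt/grammars/$iso/$iso.dat" \
    "$mmt/tm/$iso/$iso.dat" \
    "$mmt/test_sentences/$iso.txt"; do
    if [ ! -f "$path" ]; then
      echo "missing: $path"
      problems=$((problems + 1))
    fi
  done
  return "$problems"
}

# Usage (from the directory that contains mmt/; replace iso with your code):
#   check_mmt_setup mmt iso
```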


Run the testsuite

Following the same procedure as usual, do a test run over your testsuite.

Collect the following information to provide in your write up:

  1. How many items parsed?
  2. What is the average number of parses per parsed item?
  3. How many parses did the most ambiguous item receive?
  4. What sources of ambiguity can you identify?
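[incr tsdb()] can summarize the first three figures for you. If you want to cross-check them, the sketch below reads a profile directory directly. It assumes the profile contains a relations file and a parse (or gzipped parse.gz) file with '@'-separated fields, looks up the position of the readings field by name in relations rather than hard-coding it, and counts an item as parsed when readings is greater than 0. `profile_stats` is a hypothetical helper; identifying the sources of ambiguity (question 4) still requires inspecting the parses.

```shell
#!/usr/bin/env bash
# profile_stats PROFILE_DIR
# Summarize coverage and ambiguity for one [incr tsdb()] profile directory.
# Assumes a `relations` file plus a `parse` or `parse.gz` file, '@'-separated.
profile_stats() {
  local profile=$1 col parse
  local -a reader
  # 1-based position of the "readings" field within the parse relation.
  col=$(awk '
    /^[^ \t#].*:[ \t]*$/ { rel = $1; sub(/:$/, "", rel); n = 0; next }
    rel == "parse" && NF > 0 { n++; if ($1 == "readings") { print n; exit } }
  ' "$profile/relations")
  if [ -z "$col" ]; then
    echo "no readings field found in $profile/relations" >&2
    return 1
  fi
  parse=$profile/parse
  if [ -f "$parse" ]; then
    reader=(cat "$parse")
  elif [ -f "$parse.gz" ]; then
    reader=(gzip -dc "$parse.gz")
  else
    echo "no parse file in $profile" >&2
    return 1
  fi
  "${reader[@]}" | awk -F@ -v c="$col" '
    { total++; r = $c + 0
      # 0 means no parse; negative values (e.g. errors) are not counted as parsed
      if (r > 0) { parsed++; sum += r; if (r > max) max = r } }
    END {
      printf "items processed: %d\n", total
      printf "items parsed: %d\n", parsed
      if (parsed > 0) printf "average readings per parsed item: %.2f\n", sum / parsed
      printf "max readings: %d\n", max
    }'
}

# Usage:
#   profile_stats /path/to/your/profile
```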



Test corpus

In order to get a sense of the coverage of our grammars over naturally occurring text, we are going to collect small test corpora. Minimally, these should consist of 5-10 sentences of naturally occurring text. If your grammar resource has a collection of stories, take 5-10 consecutive sentences from one of them. Alternatively, you might locate 5-10 interesting example sentences in your resource that appear to be collected from naturally occurring discourse (rather than looking like simple constructed sentences). As a last resort, you might look for other resources for your language online.

Creating large test corpora is discouraged, unless:

Note: 1,000 sentences is the maximum practical size for any single [incr tsdb()] skeleton. You could of course split your test corpus over multiple skeletons, but I'd be surprised if anyone got close to 1,000 sentences!

Note also that our grammars won't cover anything without lexicon. If you have access to a digitized lexical resource that you can import lexical items from, you can address this to a certain extent. Otherwise, you'll want to limit your test corpus to a size that you are willing to hand-enter vocabulary for. (If you have access to a Toolbox lexicon for your language, contact me about importing via the customization system.)

For Lab 5, your task is to locate your test corpus (5-10 sentences is what is expected, more only if you want and you have access to the resources described above) and format it for [incr tsdb()]. If you have IGT to work with in the first place, it may be convenient to use the make_item script to create the test corpus skeleton. (Note that you want this to be separate from your regular test suite skeleton.) Otherwise, you can use [incr tsdb()]'s own import tool (File | Import | Test items) which expects a plain text file with one item per line. The result of that command is a testsuite profile from which you'll need to copy the item (and relations) file to create a testsuite skeleton.
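Because the import tool expects one item per line, it can help to normalize the collected text first. The sketch below collapses and trims whitespace (including stray carriage returns from files edited on Windows), drops blank lines, and writes the result in chunks of at most 1,000 lines, matching the per-skeleton limit noted above. It assumes you have already put each sentence on its own line; `prep_corpus` and the output naming are hypothetical.

```shell
#!/usr/bin/env bash
# prep_corpus IN OUT_PREFIX
# Collapse runs of whitespace to one space, trim each line, drop empty lines,
# and write OUT_PREFIX-aa, OUT_PREFIX-ab, ... with at most 1,000 lines each.
prep_corpus() {
  local in=$1 prefix=$2
  sed -e 's/[[:space:]]\{1,\}/ /g' -e 's/^ //' -e 's/ $//' "$in" \
    | grep -v '^$' \
    | split -l 1000 - "$prefix-"
}

# Usage:
#   prep_corpus corpus-raw.txt corpus
#   # then import corpus-aa via File | Import | Test items
```

For a 5-10 sentence corpus this produces a single file, corpus-aa.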

Check list:


Write up

Your write up should be a plain text file (not .doc, .rtf or .pdf) which includes the following:

  1. A description of the phenomena you improved in the choices file, including:
  2. A description of any tdl edits you made and what they are for.
  3. A description of your process for translating the MMT sentences and your documentation about which sentences may be impossible.
  4. A description of what happened when you tried the MT set up. What difficulties did you encounter and how did you resolve them? What output did you get?
  5. A description of what you collected for your test corpus and how you collected it.
  6. A description of the performance of your final grammar for this week on the test suite, as compared to your starting grammar (see details above).


Submit your assignment



