Lab 5 (due 2/9 11:59 pm)
This is our final lab with the customization system. It is also our first foray into MT. The focus will be on finishing up the choices files (though it's not expected that you will have used every part of the customization system) and on getting one sentence translating from English to your language. You will also work on collecting the MMT sentences for your language and use [incr tsdb()] to compare the initial and final state of your grammar for the week over the testsuites.
This lab entails the following general steps, which are not (fully) ordered with respect to each other.
- Begin to collect the translations of the MMT sentences (and add the ones you are able to collect to your testsuite).
- Choose three phenomena from labs 2-4 to refine further in your choices file (these can be ones you already worked on but that need more work!)
- Only when you have finished working with the customization system, if you have simple tdl fixes I have suggested, add these in.
- Process your testsuite using [incr tsdb()], the LKB, and the grammar resulting from your updated choices file.
- Examine the results of the second test run for coverage, accuracy and ambiguity, including as a diff to the final lab 4 test run
- Try a first translation
- Write it all up :)
Begin collecting the MMT sentences for your language
We will be working with the sentences in eng.txt, but it is not expected that every grammar will cover every sentence. For this week, I ask you to:
- Find translations (or approximations) for all of the words in the small vocabulary of those sentences.
- For example, if you can't find park, you might replace it with field or beach.
- Translate the first three, and include them as items in your testsuite.txt file
- Update your testsuite skeleton to reflect the new testsuite.txt.
- Create an iso.txt file for your language (with iso changed to your language code) with just the line you expect your grammar to parse for each sentence, one per line, and in the same order as eng.txt. For any you haven't found a translation for yet, just write SKIPPED.
- Determine (and document) whether any of the other sentences will be impossible to translate given your resources and/or impossible to model (involving phenomena you don't expect to get to).
For the write up for this portion, I expect you to tell me about the process you went through and report on item 5 above.
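Since iso.txt has to stay in the same order as eng.txt, with SKIPPED as the placeholder, a quick sanity check can catch misaligned files early. The following is a minimal sketch, assuming both files are plain text with one sentence per line; check_mmt_alignment is a hypothetical helper name and is not part of the course materials.

```shell
# Sketch: check that iso.txt lines up with eng.txt, and count SKIPPED placeholders.
# check_mmt_alignment is a hypothetical helper name, not part of the MMT package.
check_mmt_alignment() {
    eng_file=$1
    iso_file=$2
    # Count lines with awk so a missing final newline does not throw off the total.
    eng_n=$(awk 'END { print NR }' "$eng_file")
    iso_n=$(awk 'END { print NR }' "$iso_file")
    # grep -c prints 0 (and exits 1) when nothing matches, so ignore its exit status.
    skipped=$(grep -c -x 'SKIPPED' "$iso_file" || true)
    echo "eng=$eng_n iso=$iso_n skipped=$skipped"
    # Succeed only if the two files have the same number of lines.
    [ "$eng_n" -eq "$iso_n" ]
}
```

After defining the function, it could be called as check_mmt_alignment eng.txt iso.txt (with iso replaced by your language code). A non-zero exit status means the two files have different numbers of lines.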
Improve the choices file for three phenomena
For the three phenomena you chose, refine the choices file by hand (through the questionnaire or via direct editing or some combination). Please be sure to post lots of questions on Canvas as you work on this! I expect the write up of this portion to include a copy-paste of the specific choices values you changed as well as relevant IGT that I can use to test the effects.
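One way to collect the changed choices values for the write up is to diff the choices file from before and after your edits. This is a minimal sketch, assuming you kept a copy of the earlier choices file; changed_choices is a hypothetical helper name, and the file names in the usage note are placeholders.

```shell
# Sketch: list choices lines that differ between two versions of a choices file.
# changed_choices is a hypothetical helper; lines starting with "<" come from the
# old file and lines starting with ">" from the new one.
changed_choices() {
    # diff exits with status 1 when the files differ, so that status is ignored here.
    diff "$1" "$2" | grep '^[<>]' || true
}
```

For example, changed_choices choices.old choices.new prints only the removed and added lines, which can then be pasted into the write up next to the prose description of each phenomenon.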
Tdl edits
By now, you have collected some suggested tdl edits (from lab 2-4 feedback or in class). Once you are all done refining things via the customization system, patch these into your grammar.
The only tdl edits this week should be things that I have suggested as bug fixes or work arounds. You are not expected to come up with tdl edits on your own. If I haven't suggested any to you, this section is a freebie --- nothing to do here!
For the write up, please include the actual tdl changes and an explanation of their purpose.
Try a first translation
Preliminaries
In later labs, we will refine the variable property mapping and create small transfer grammars for each language by using it as the target language in two translation pairs, with English and another language (probably Pite Saami) as the inputs. For now, we'll be attempting to get just one sentence through. This will be one that doesn't actually require any transfer rules.
In all of the instructions below, replace "iso" with the ISO 639-3 code for your language.
- Download and unpack mmt.tgz.
- Test eng2sje and sje2eng translation:
cd mmt/
./translate-line.sh eng sje 1
Note: If you aren't working on the VM, you'll need to fix the path to ace in the file translate-line.sh (and possibly install ace).
- Look inside translate-line.sh; try changing which line is not commented out and see what different behaviors you get.
- Make a symlink to your grammar in mmt/grammars/iso
ln -s /path/to/your/grammar mmt/grammars/iso
- and compile it afresh with ace:
cd mmt/grammars/iso
ace -G iso.dat -g ace/config.tdl
- Move the generic transfer grammar mmt/tm/gen to mmt/tm/iso
cd mmt/tm
mv gen iso
- Compile that generic transfer grammar:
ace -G iso.dat -g ace/config.tdl
- Copy your MMT sentences to test_sentences/iso.txt.
- Try translating the first sentence from eng to your language:
./translate-line.sh eng iso 1
- This one should not require any transfer rules. If it doesn't work, there are several possible causes:
- A bug in your MT set up. If you are seeing errors that suggest this might be the problem, post to Canvas.
- Your grammar isn't generating. Confirm that this is the problem by trying monolingual generation (or iso2iso translation). Post to Canvas for help debugging.
- The MRSs don't match. Compare the eng (or sje) MRSs to yours. Can you spot the difference? If you find any, modify your grammar until the MRSs match. Post to Canvas for help. A subcase here is that the PREDs and their arguments match, but the variable properties don't. We might be debugging semi.vpm files. Either way, all changes should be in your grammar, and not eng or sje.
- If you aren't working in the provided VM, your ace version may differ from that used to compile the eng and sje grammars (and also transfer grammars). In this case, recompiling all of those may help.
For your write up for this part, please describe what happened when you tried the steps above. What difficulties did you encounter and how did you resolve them? What output did you get?
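When comparing the translation output against what you expected, it can help to print the source sentence and your own target sentence for the same line number. The sketch below assumes that the third argument to translate-line.sh is a 1-based line number into the files under test_sentences/ (an inference from the commands above, not something confirmed here); nth_sentence is a hypothetical helper and is not part of mmt.tgz.

```shell
# Sketch: print line N of a sentence file, failing if it is missing or marked SKIPPED.
# nth_sentence is a hypothetical helper, not part of mmt.tgz.
nth_sentence() {
    file=$1
    n=$2
    # sed -n "Np" prints only line N (and nothing if the file is shorter than N lines).
    line=$(sed -n "${n}p" "$file")
    if [ -z "$line" ] || [ "$line" = "SKIPPED" ]; then
        echo "no usable sentence on line $n of $file" >&2
        return 1
    fi
    printf '%s\n' "$line"
}
```

From the mmt/ directory, nth_sentence test_sentences/eng.txt 1 would show the English input and nth_sentence test_sentences/iso.txt 1 the string you expect your grammar to generate, which you can then compare by eye with the output of ./translate-line.sh eng iso 1.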
Run the testsuite
Following the same procedure as usual, do a test run over your testsuite.
Collect the following information to provide in your write up:
- How many items parsed?
- What is the average number of parses per parsed item?
- How many parses did the most ambiguous item receive?
- What sources of ambiguity can you identify?
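The first three figures can be computed mechanically once you have the number of parses for each item. This is a minimal sketch, assuming a hypothetical two-column text file with an item id and its number of parses on each line; this is not the native [incr tsdb()] profile format, and summarize_parses is an illustrative name.

```shell
# Sketch: summarize coverage and ambiguity from per-item parse counts.
# Input (hypothetical format): one line per test item, "<item-id> <number-of-parses>".
summarize_parses() {
    awk '
        # Only items with at least one parse count as parsed.
        $2 > 0 { parsed++; total += $2; if ($2 > max) { max = $2; worst = $1 } }
        END {
            printf "items=%d parsed=%d\n", NR, parsed
            # The average is taken over parsed items only, not over all items.
            if (parsed > 0)
                printf "avg_parses=%.2f max_parses=%d (item %s)\n", total / parsed, max, worst
        }
    ' "$1"
}
```

The item reported alongside max_parses is a natural starting point for the last question, since inspecting its trees in the LKB is one way to identify where the ambiguity comes from.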
Write up
Your write up should be a plain text file (not .doc, .rtf or .pdf) which includes the following:
- A description of the phenomena you improved in the choices file, including:
- Prose description of the phenomenon
- Prose description of your analysis
- The specific changes you made to choices (paste in the actual choices)
- Specific IGT I can use to test the analysis / investigate if something isn't working and you need help.
- A description of any tdl edits you made and what they are for.
- A description of your process for translating the MMT sentences and your documentation about which sentences may be impossible.
- A description of what happened when you tried the MT set up. What difficulties did you encounter and how did you resolve them? What output did you get?
- A description of the performance of your final grammar for this week on the test suite, as compared to your starting grammar (see details above).