Lab 8 (due 5/24 11:59 pm)
- Refine/extend your list of MMT translations, as appropriate.
- Choose one more phenomenon illustrated in those examples but not handled by your grammar to fix. Post to Canvas by class on Tuesday for pointers on how to fix it. (I'll likely direct you to lab instructions from previous years.)
- Add a few further examples to your testsuite for that phenomenon (as appropriate).
- Run initial testsuite & testcorpus profile.
- Fix the one phenomenon you chose.
- Make sure your grammar can generate. Edit as necessary until it can. Post many questions to Canvas!
- Get the variable property mapping set up working.
- Translate one sentence from English to your language with the MT system.
- Run final testsuite & testcorpus profile.
- Write it up!
Refine/extend MMT sentences
If you don't already have a complete set of MMT translations and you don't have additional phenomena to work on in the ones you do have, try to get more of them translated. See Lab 7 for instructions.
Note that, contrary to last week's instructions, the setup we're using this year expects these files to have one sentence per line (no blank lines in between).
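If your file from last week has blank lines between sentences, it can be flattened mechanically. The following is a minimal sketch in Python; the function name is made up for illustration, and it assumes each sentence already sits on a single line of its own.

```python
def one_sentence_per_line(text):
    """Drop blank lines and stray surrounding whitespace, keeping sentence order."""
    kept = [line.strip() for line in text.splitlines() if line.strip()]
    # End with exactly one newline unless there is nothing to write.
    return "\n".join(kept) + "\n" if kept else ""


# Made-up placeholder content in the blank-line-separated layout.
old_layout = "first sentence\n\nsecond sentence\n\nthird sentence\n"
new_layout = one_sentence_per_line(old_layout)
print(new_layout, end="")
```

To use it on a real file, read the file's contents into a string, pass it to the function, and write the result back out.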
Choose one phenomenon to develop
By class on Tuesday: Choose one phenomenon represented by the MMT sentences not already covered by your grammar to work on this week. Post to Canvas with the IGT for the relevant examples and the phenomenon you intend to work on. I will reply with pointers to instructions for that phenomenon, or develop some if necessary.
Extend your testsuite for that phenomenon
Create additional positive/negative testsuite examples that illustrate your chosen phenomenon and add them to the testsuite in the usual fashion. This testsuite should now include the MMT sentences plus ~2-6 more examples and should be called lab8.
Initial testsuite run
- Create and run initial testsuite instances for both the linguist-provided data and your testsuite, using the initial grammar.
Note: If your tsdb/ directory is inside a shared folder on VirtualBox, it will not work.
- For each of these, explore the results and collect the following information to provide in your write up:
- How many items parsed?
- What is the average number of parses per parsed item?
- How many parses did the most ambiguous item receive?
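Your profiling tool should report these figures directly, but it helps to be clear about how they are defined. The sketch below is only an illustration in Python, with a made-up list of per-item parse counts and a hypothetical function name. Note that the average is taken over parsed items only, not over all items.

```python
def summarize_readings(readings):
    """readings: one integer per test item, the number of parses it received."""
    parsed = [r for r in readings if r > 0]
    return {
        "items": len(readings),
        "parsed": len(parsed),
        # Average over items that parsed at all; unparsed items are excluded.
        "avg_parses_per_parsed_item": sum(parsed) / len(parsed) if parsed else 0.0,
        "max_parses": max(parsed, default=0),
    }


counts = [1, 0, 4, 2, 0, 1]  # made-up numbers, not real results
summary = summarize_readings(counts)
print(summary)
```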
Develop analyses and test your grammar
Based on the instructions I've pointed you to through Canvas + answers to the many questions I hope you will ask, develop analyses for your one additional phenomenon.
For the MMT sentences specifically, you can test your MRSs by looking at the output of this English grammar for the corresponding examples. We don't expect an exact match, but if things are different you should have a clear idea of why. And, of course, you are always welcome to post lots of questions!
Test generation
Test generation with both lkb & ace. Can you generate from short sentences? What about longer ones? To receive full credit on this lab, you need a grammar that can generate from simple transitive sentences and you need to have tested what happens with longer ones (e.g. sentences with clausal modifiers or clausal complements). See Lab 6 for detailed instructions on generation. And, of course, post lots of questions to Canvas.
You may have noticed that you get many variants on generation if
you start with a form that is underspecified for e.g., aspect or
evidentiality. We can get a handle on this by using variable property
mapping to supply default values in the unmarked case (either in
monolingual generation or in the MT scenario). The basic strategy is
to take any underspecified values in variable properties and translate
them, via vpm, to something that conflicts with any more specific
values your grammar can produce.
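To see why this strategy cuts down the output, here is a toy model in Python. It is not how the LKB or ace implement VPM or unification; the type names, the three verb forms, and the function names are all assumptions made for illustration. An underspecified input is compatible with every form, while the default value is compatible only with the form that is itself underspecified.

```python
# Toy type hierarchy: each type maps to its parent (None at the top).
PARENT = {
    "aspect": None,
    "perfective": "aspect",
    "imperfective": "aspect",
    "no-aspect": "aspect",
}


def subsumes(general, specific):
    """True if `general` is `specific` or one of its ancestors."""
    while specific is not None:
        if specific == general:
            return True
        specific = PARENT[specific]
    return False


def compatible(a, b):
    """In this tree-shaped hierarchy, two types unify iff one subsumes the other."""
    return subsumes(a, b) or subsumes(b, a)


# Hypothetical forms a grammar could generate, with the aspect value of each.
FORMS = {"verb-PFV": "perfective", "verb-IPFV": "imperfective", "verb": "aspect"}


def generate(input_aspect):
    """Return the forms whose aspect value is compatible with the input."""
    return sorted(f for f, v in FORMS.items() if compatible(input_aspect, v))


without_default = generate("aspect")    # underspecified input: every form matches
with_default = generate("no-aspect")    # defaulted input: only the unmarked form
print(without_default)
print(with_default)
```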
The file semi.vpm provides a mapping between grammar-external
features of indices (referential indices and events) and their values,
and grammar-internal ones. For background on VPM, see the
DELPH-IN wiki.
As soon as you start using a VPM file, then only variable properties
(features on indices) that are handled in the file are actually
preserved.
- You should already have a semi.vpm file provided by
the customization system. Open it up and see which variable
properties are there, and then look in your grammar to
see what is missing. In general, we'd expect to see all
of the features of the types event and ref-ind
represented in a mature semi.vpm file.
- Your script file tells the lkb to load the semi.vpm file with
the following line:
(mt:read-vpm (lkb-pathname (parent-directory) "semi.vpm") :semi)
- The semi.vpm files currently generated by the customization
system are not fully ready for MT. In particular, the feature paths
should include PNG. and E. only on the internal (left-hand) side. Thus
you'll want to edit blocks like this:
E.TENSE : E.TENSE
present <> present
past <> past
future <> future
* <> *
To look like this:
E.TENSE : TENSE
present <> present
past <> past
future <> future
* <> *
- If your grammar uses a PERNUM feature, you'll need to map
separate PER and NUM features from the external (right-hand) side of
the VPM to a single PERNUM feature on the internal (left-hand) side.
See the example under "Properties: An Example" on the DELPH-IN wiki page.
- If you have any other features you have added on indices, you
will need to provide VPM entries for them as well. (If you added
them through the customization system, they may be there already.)
- If your language has aspect marked in some sentences but other forms that are just underspecified for aspect, you'll want to have the default aspect be "no-aspect". Define this as a subtype of aspect in your grammar, but don't have anything other than the semi.vpm mention it otherwise. In semi.vpm, replace * <> * under aspect with the following:
* >> no-aspect
no-aspect << [e]
- If all forms in your language are marked for aspect, you may still want some default value in cases where the input MRS has no aspect, so that you don't get all the forms. If you wanted perfective as your default aspect, replace * <> * with:
perfective << [e]
You can do a similar trick for other kinds of generation ambiguity
relating to variable properties.
Test your semi.vpm file by parsing and then generating. You
should see fewer strings coming out.
Preliminaries
The goal for the next lab will be to create an "accommodation"
transfer grammar for your language by using it as the target language
in two translation pairs, with English and Pite Saami as the inputs.
For now, we'll be attempting to get just one sentence through. This will be
one that doesn't actually require any transfer rules.
In all of the instructions below, replace "iso" with the ISO 639-3 code for your language.
- Download and unpack mmt.tgz.
- Test eng2sje and sje2eng translation:
cd mmt/
./translate-line.sh eng sje 1
Note: If you aren't working on the VM, you'll need to fix the path to ace in the file translate-line.sh.
- Look inside translate-line.sh; try changing which line is not commented out
and see what different behaviors you get.
- Copy your grammar into mmt/grammars/iso and compile it afresh with ace:
cd mmt/grammars/iso
ace -G iso.dat -g ace/config.tdl
- Move the generic transfer grammar mmt/tm/gen to mmt/tm/iso
cd mmt/tm
mv gen iso
- Compile that generic transfer grammar:
ace -G iso.dat -g ace/config.tdl
- Copy your MMT sentences to test_sentences/iso.txt. Note: last week I asked for a file with one blank line between each sentence. That's incorrect for this setup. Please just do one sentence per line, for 28 lines, in the same order as the eng.txt file in test_sentences. Any item you are skipping can be replaced with SKIPPED.
- Try translating the first sentence from eng and sje to your language:
./translate-line.sh eng iso 1
- This one should not require any transfer rules. If it doesn't work, there are several possible causes:
- A bug in your MT set up. If you are seeing errors that suggest this might be the problem, post to Canvas.
- The MRSs don't match. Compare the eng (or sje) MRSs to yours. Can you spot the difference? If you find any, modify your grammar until the MRSs match. Post to Canvas for help. A subcase here is that the PREDs and their arguments match, but the variable properties don't. We might be debugging semi.vpm files. Either way, all changes should be in your grammar, and not eng or sje.
- Your grammar isn't generating. Confirm that this is the problem by trying monolingual generation (or iso2iso translation). Post to Canvas for help debugging.
- If you aren't working in the provided VM, your ace version may differ from that used to compile the eng and sje grammars (and also transfer grammars). In this case, recompiling all of those may help.
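Because translate-line.sh picks sentences by line number, a misaligned sentence file is an easy mistake to make in the steps above. The following Python sketch checks the layout described there (28 lines, no blank lines, SKIPPED for omitted items). The function name and the sample strings are made up; it checks only the layout, not whether your sentences line up with eng.txt in meaning.

```python
EXPECTED_LINES = 28  # number of MMT items, per the instructions above


def check_sentence_file(text, expected=EXPECTED_LINES):
    """Return a list of problems found; an empty list means the layout looks fine."""
    problems = []
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # ignore the single trailing newline at end of file
    if len(lines) != expected:
        problems.append(f"expected {expected} lines, found {len(lines)}")
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append(f"line {number} is blank (use SKIPPED for omitted items)")
    return problems


# Made-up sample contents: one well-formed file and one in the old layout.
good = "\n".join(["SKIPPED"] * 28) + "\n"
bad = "sentence one\n\nsentence two\n"
print(check_sentence_file(good))
print(check_sentence_file(bad))
```

To check your own file, read test_sentences/iso.txt into a string and pass it to the function.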
Run both the test corpus and the testsuite
Following the same procedure as usual, do test runs over both the testsuite and the test corpus.
Again, collect the following information to provide in your write up:
- How many items parsed?
- What is the average number of parses per parsed item?
- How many parses did the most ambiguous item receive?
- What sources of ambiguity can you identify?
- For 4 newly parsing or otherwise fixed items (2 in the testsuite, 2 in the corpus), do any of the parses look reasonable in the semantics?
Write up
NB: While the test suite and grammar development
is joint work, the write up should be done by one partner (the
other will get a turn next week). The writing partner should
have the non-writing partner review the write up and make suggestions.
Your write up should be a plain text file (not .doc, .rtf or .pdf)
which includes the following:
- A statement of what you were able to find for translations of the MMT sentences, and why.
- Documentation of your analyses of the additional phenomena:
- A descriptive statement of the facts of your language.
- Illustrative IGT examples from your testsuite.
- A statement of how you implemented the phenomenon (in terms of types you added/modified and particular tdl constraints). (Yes, I want to see actual tdl snippets.)
- If the analysis is not (fully) working, a description of the problems
you are encountering.
- Documentation of what happened when you tried generating with the LKB. Did it work right away? If it didn't, but you were able to get it working, what did you have to do?
- Documentation of what happened when you tried the MT set up. Did it work right away? What did you change to get it working?
- Documentation of your coverage over testsuite & test corpus for both the initial & final runs, including the answers to the questions given above.