Linguistics 567: Grammar Engineering
Lab 9 Due 5/26 (11:59 pm)
Preliminaries
The goal for this lab is to create an "accommodation" transfer
grammar for your language by using it as the target language in two
translation pairs, with English and Chadian Arabic as the inputs. Along the
way, you will be cleaning up your grammar so that it generates for as
many of the MMT sentences as possible (ideally all of them), and
generates as few outputs as are motivated.
As usual, I'll be asking for before and after tsdb profiles
and a write up (see directions below).
Full credit for the "tasks" portion of the lab will be given if
you can translate 12 of the MMT sentences into your language from
English and 10 of the MMT sentences into your language from
Chadian Arabic (without excessive spurious realizations). You will receive at least 30/50
points if you can translate at least two MMT sentences into your language from
both English and Chadian Arabic (without excessive spurious realizations).
Running the translation system
I assume you got the basic translation system running last
week in Lab 8. If not, look at the Lab 8
instructions and/or post to Canvas for help.
Collect your expected outputs
Include a plain text file (called iso.txt, with iso replaced by your iso language code) with your 24 MMT sentences (in your language only) in the format your grammar expects (morpheme segmented or not), one per line, with a blank line between each. NB: I'm not looking for IGT here, just the actual strings your grammar expects.
It's possible that more than one of the English MMT strings translates as the same string in your language. That's fine. Just repeat it so the alignment between the English MMT file and your MMT file works.
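If you want to sanity-check the layout of your file before submitting, something like the following Python sketch may help. It is not part of the course tooling: the function names are made up for illustration, and it only assumes the layout described above (one sentence per line, a blank line between sentences, 24 sentences in total, repeats allowed).

```python
def read_mmt_sentences(text):
    """Return the sentences in an iso.txt-style string.

    Assumes sentences sit on odd-numbered lines and blank separator
    lines on even-numbered lines; raises ValueError otherwise.
    """
    lines = text.rstrip("\n").split("\n")
    sentences = []
    for i, line in enumerate(lines):
        if i % 2 == 0:
            # Sentence slot: must not be blank.
            if not line.strip():
                raise ValueError(f"line {i + 1}: expected a sentence, found a blank line")
            sentences.append(line.strip())
        elif line.strip():
            # Separator slot: must be blank.
            raise ValueError(f"line {i + 1}: expected a blank line")
    return sentences


def check_mmt_file(text, expected=24):
    """Check the layout and the sentence count (24 for the MMT set)."""
    sentences = read_mmt_sentences(text)
    if len(sentences) != expected:
        raise ValueError(f"expected {expected} sentences, found {len(sentences)}")
    return sentences


if __name__ == "__main__":
    # Placeholder strings, not real MMT translations; a repeat is allowed.
    sample = "sentence one\n\nsentence two\n\nsentence one\n"
    print(check_mmt_file(sample, expected=3))
```

To run it on your real file, read the file contents into a string and pass that string to check_mmt_file with the default count.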
Attempt to translate into your language
- You should already have the eng
and shu grammars from Lab 8.
- Make sure *translate-grid* is being set in
your lkb/globals.lsp file (per Lab 8):
(setf *translate-grid* '(:iso :eng :shu))
- Similarly, edit the file lkb/globals.lsp in the English
and Chadian Arabic grammars so that the line for *translate-grid* now looks
like the appropriate one of the lines below (again replacing "iso"
with the code for your language).
(setf *translate-grid* '(:eng :shu :iso))
(setf *translate-grid* '(:shu :eng :iso))
- Now load your grammar into the "target" lkb.
- Parse Dogs sleep with the English grammar in the "source" lkb
and select "rephrase". Ideally, this should be working from Lab 8 :-).
- Observe what happens: Do you get generation outputs? Some
error in the emacs buffer in the "target" emacs?
- If you get an error, you'll need to compare the MRSs to
see what the difference is. I expect that for Dogs sleep
you won't need any transfer rules, and thus any errors should be
addressed through harmonization (aka cleaning up your MRS) and/or
work on your semi.vpm file.
Comparing MRSs
To compare the MRSs, you can look at the MRS from the English
grammar directly, but this can be a bit misleading, since you really
want to look at the input to the generator (i.e., the transfer
output). To do this, you can select "Generate | Display Input MRS" or
"Generate | Display Internal MRS" from the "target" LKB Top menu.
- Generate | Display Internal MRS
- Parse the expected output
- Choose Indexed MRS from the pop-up menu
There are a number of things that could be wrong:
- Missing RELS or HCONS (broken diff-list append).
- Misspelled PRED values (look carefully at the underscores).
- Misspelled/differently spelled feature values (e.g. sing
instead of sg). Recall that the right hand side of the VPM rules should match the right hand sides from the eng and shu versions of this file.
- Misspelled/differently spelled feature names (e.g., PERS
instead of PER). Recall that the right hand side of the VPM rules should match the right hand sides from the eng and shu versions of this file.
- Incompatible variable properties (features and values).
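The checklist above amounts to a diff over PRED values and variable properties. The Python sketch below illustrates that idea on hand-copied data. It does not read LKB output: the list and dictionary representation, the function names, and the example PRED and feature spellings are all assumptions made for illustration. It also compares values by plain string equality, whereas the grammar's type hierarchy decides what is actually compatible.

```python
from collections import Counter


def compare_preds(input_preds, expected_preds):
    """Report PRED values found in one MRS but not the other."""
    src, tgt = Counter(input_preds), Counter(expected_preds)
    return {
        "only_in_input": sorted((src - tgt).elements()),
        "only_in_expected": sorted((tgt - src).elements()),
    }


def compare_properties(input_props, expected_props):
    """Report feature names or values that differ on one variable."""
    problems = []
    for feat in sorted(set(input_props) | set(expected_props)):
        a = input_props.get(feat)
        b = expected_props.get(feat)
        if a is None:
            problems.append(f"{feat}: missing from generator input")
        elif b is None:
            problems.append(f"{feat}: missing from expected output")
        elif a != b:
            problems.append(f"{feat}: {a} vs {b}")
    return problems


if __name__ == "__main__":
    # Illustrative values only, copied by hand from two MRS displays.
    print(compare_preds(["_dog_n_rel", "_sleep_v_rel"],
                        ["_dog_n_1_rel", "_sleep_v_rel"]))
    print(compare_properties({"PER": "3", "NUM": "sg"},
                             {"PERS": "3", "NUM": "sing"}))
```

A PRED that shows up on only one side usually points to a spelling difference, and a feature that is "missing" on each side under slightly different names usually points to a feature-name mismatch.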
Update semi.vpm, if necessary
(You may have done all there is to do here in Lab 8.)
The file semi.vpm provides a mapping between grammar-external
features of indices (referential indices and events) and their values,
and grammar-internal ones. For background on VPM, see the
DELPH-IN wiki.
- If your grammar uses a PERNUM feature, you'll need to map
separate PER and NUM features from the external (right-hand side) of
the VPM to a single PERNUM feature on the internal (left-hand side).
See the example under "Properties: An Example" on the DELPH-IN wiki page. (There is also an example in the semi.vpm file in the eng grammar.)
- If your language has aspect marked in some sentences but other forms that are just underspecified for aspect, you'll want to have the default aspect be no-aspect. Define this as a subtype of aspect in your grammar, but don't have anything other than the semi.vpm mention it otherwise.
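To make the PERNUM case concrete, here is a small Python sketch of the many-to-one mapping that such a VPM rule block expresses: a pair of external PER and NUM values goes to one internal PERNUM value. This is not VPM syntax (see the wiki page and the eng grammar's semi.vpm for that). The internal value names (1sg, 3pl, pernum) and the wildcard row are assumptions, so substitute the types your grammar actually defines.

```python
# Each row pairs external (PER, NUM) values with one internal PERNUM value.
# "*" matches anything; in this sketch the first matching row wins.
# All internal value names here are hypothetical.
PERNUM_ROWS = [
    (("1", "sg"), "1sg"),
    (("1", "pl"), "1pl"),
    (("2", "sg"), "2sg"),
    (("2", "pl"), "2pl"),
    (("3", "sg"), "3sg"),
    (("3", "pl"), "3pl"),
    (("*", "*"), "pernum"),  # fallback: leave PERNUM underspecified
]


def to_internal_pernum(per, num):
    """Map external PER and NUM values to a single internal PERNUM value."""
    for (row_per, row_num), internal in PERNUM_ROWS:
        if row_per in ("*", per) and row_num in ("*", num):
            return internal
    raise ValueError(f"no row matches PER={per}, NUM={num}")


if __name__ == "__main__":
    print(to_internal_pernum("3", "sg"))
    print(to_internal_pernum("3", "du"))
```

The fallback row plays the same role as the default discussed for aspect: input values your grammar does not distinguish end up underspecified rather than blocking generation.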
Create a transfer grammar
Once you have Dogs sleep translating, it's time to try
a broader range of the MMT sentences, as well as both English and
Chadian Arabic as input to see what kinds of transfer rules you will need.
Note that you will be modifying the English and Chadian Arabic grammars
for this part of the lab. The transfer rule types are in
mt-mrs.tdl, mtr.tdl and acm.tdl. Of
those, acm.tdl should be the most interesting. You'll
want to edit the file acm.mtr to create instances of the
transfer rules that you need for your grammar. It will be simplest to
edit this file in one grammar (say the English one) and create a
symbolic link to it in the other grammar, so that you have one
transfer grammar for your language.
- Try translating all of the MMT sentences from English to your
language and Chadian Arabic to your language.
- For each one that doesn't go through, compare the input MRS
to the MRS your expected output is giving.
- Do any harmonization that is warranted. If it's not clear whether
you should try to do MRS harmonization or try to build a transfer rule, post to Canvas.
- For the remaining differences, look to see if one of the existing
transfer rule types in acm.tdl will do the trick. If so,
create an instance of that transfer rule type in acm.mtr, e.g.:
pro-drop := pronoun-delete-mtr.
- If you need a different transfer rule, post on Canvas about what you need, and we'll work out how to formulate it.
- Reload the "source" grammar and try translating again.
- Rinse and repeat.
Write up
- Include your iso.txt file with your translations of
the MMT sentences.
- Describe any clean up you did to your grammar.
- Describe the transfer rules you instantiated, and
why.
- Describe any further transfer rules you needed to
develop, and why.
- Document your current coverage on translating the
MMT sentences from English and Chadian Arabic into your language.
If you are generating more than one output for each input, explain the sources of variation.
- If you don't have full coverage, describe why not.
- Describe the current coverage of your grammar over your test suite (using numbers you can
get from Analyze | Coverage and Analyze | Overgeneration in [incr tsdb()])
and a comparison between your baseline test suite run and your final
one for this lab (see Compare | Competence).
- Create a tarball of your grammar (including your tsdb directory), your versions of the eng and shu grammars, and your
write up.
tar czf iso-lab9.tgz *
- Upload the tarball to Canvas