Lab 8 (Due 2/26 11:00pm)
Preliminaries
As usual, check the write-up instructions first.
Requirements for this assignment
- 0. Make sure you have a baseline test suite corresponding to your lab 7 grammar.
- 1. Find one more simple sentence from your test corpus and try to get your grammar to be able to parse it.
- Post at least the sentence you are working on (as IGT) and ideally some questions about it by the end of the day on Tuesday.
- 2. Use VPM to cut back on the range of generation outputs.
- 3. Get one sentence translating from English (or Chadian Arabic) into your language. (We'll do the rest of the sentences next week.)
- 4. Test your grammar using [incr tsdb()].
[incr tsdb()] should be part of your test-development
cycle. In addition, you'll need to run a final test suite instance
for this lab to submit along with your baseline.
- 5. Write up the lab.
Before making any changes to your grammar for this lab,
run a baseline test suite instance. If you decide to add
items to your test suite for the material covered here, consider
doing so before modifying your grammar so that your baseline can
include those examples. (Alternatively, if you add examples
in the course of working on your grammar and want to make the
snapshot later, you can do so using the grammar you turned
in for Lab 7.)
The goal of this section is to parse one more sentence from your
test corpus than you could before starting this section, i.e., one more
than last week. In your write-up,
you should document what you had to add to get the sentence working.
Note that it is possible to get full credit here even if the
sentence ultimately doesn't parse by documenting what you still
have to get working.
This is (again) a very open-ended part of the lab (even more
so than usual), which means: A) you should get started early
and post to GoPost so I can assist in developing analyses of
whatever additional phenomena you run across, and B) you'll
have to restrain yourselves; the goal isn't to parse the whole
test corpus this week ;-), and I won't be able to support more
than one new sentence per group.
- Create a profile from your test corpus skeleton, and run a
baseline.
- Look for some plausible candidate sentences. These should
be relatively short and ideally have minimal additional grammatical
phenomena beyond what we have already covered.
- Examine the lexical items required for your target sentence(s).
Add any that should belong to lexical types you have already
created.
- Try parsing the test corpus again (or just your target sentence
from it).
- If your target sentence parses, check the MRS to see if it is
reasonable. If you are unsure, post it to GoPost for me to give you
feedback. If you're working with new semantic phenomena, please discuss
with me (on GoPost) what the target representation should be.
- If your target sentence doesn't parse, check to see whether
you still have lexical coverage errors. Fixing these may require
adapting existing lexical rules, adding lexical rules, and/or
adding lexical types. Post to GoPost for assistance.
- If your target sentence doesn't parse but your grammar does
find analyses for each lexical item, then examine the parse chart
to identify the smallest expected constituent that the grammar is
not finding, and debug from there. Do you have the phrase structure
rule that should be creating the constituent? If so, try interactive
unification to see why the expected daughters aren't forming
a phrase with that rule. Do you need to add a phrase structure rule?
Again, post to GoPost for assistance.
- Iterate until either the sentence parses or you at least have a clear
understanding of what you would need to add to get it parsing.
- Run your full test suite after any changes you make to your
grammar to make sure you aren't breaking previous coverage.
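If you export per-item results from two runs, a few lines of Python can list which items changed status. This is a minimal sketch assuming a hypothetical hand-exported format (item id mapped to a pair: is the item grammatical, did it parse); nothing here reads an [incr tsdb()] profile, and Compare | Competence remains the tool to use for your write-up.

```python
from typing import Dict, List, Tuple

# Hypothetical, hand-exported results: item id -> (is_grammatical, parsed?)
Run = Dict[int, Tuple[bool, bool]]


def compare_runs(baseline: Run, current: Run) -> Dict[str, List[int]]:
    """Report items whose parse status changed between two runs.

    Items that appear only in the baseline are ignored.
    """
    report: Dict[str, List[int]] = {"gained": [], "lost": [], "new_overgeneration": []}
    for item_id, (grammatical, parsed_now) in sorted(current.items()):
        # Items added to the test suite since the baseline count as "not parsed" there.
        parsed_before = baseline.get(item_id, (grammatical, False))[1]
        if parsed_now and not parsed_before:
            report["gained" if grammatical else "new_overgeneration"].append(item_id)
        elif parsed_before and not parsed_now and grammatical:
            # Lost coverage: a grammatical item that used to parse no longer does.
            report["lost"].append(item_id)
    return report


if __name__ == "__main__":
    baseline = {1: (True, True), 2: (True, True), 3: (True, False), 4: (False, False)}
    current = {1: (True, True), 2: (True, False), 3: (True, True), 4: (False, True)}
    print(compare_runs(baseline, current))
    # {'gained': [3], 'lost': [2], 'new_overgeneration': [4]}
```

Anything in "lost" or "new_overgeneration" after a change is worth investigating before you move on.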
You may have noticed that you get many generation variants if
you start with a form that is underspecified for, e.g., aspect or
evidentiality. We can get a handle on this by using variable property
mapping to supply default values in the unmarked case (either in
monolingual generation or in the MT scenario). The basic strategy is
to take any underspecified values in variable properties and translate
them, via vpm, to something that conflicts with any more specific
values your grammar can produce.
The file semi.vpm provides a mapping between grammar-external
features of indices (referential indices and events) and their values,
and grammar-internal ones. For background on VPM, see the
DELPH-IN wiki.
As soon as you start using a VPM file, only variable properties
(features on indices) that are handled in the file are actually
preserved.
- You should already have a semi.vpm file provided by
the customization system. Open it up and see which variable
properties are there, and then look in your grammar to
see what is missing. In general, we'd expect to see all
of the features of the types event and ref-ind
represented in a mature semi.vpm file.
- Your script file tells the LKB to load the semi.vpm file with
the following line:
(mt:read-vpm (lkb-pathname (parent-directory) "semi.vpm") :semi)
- The semi.vpm files currently generated by the customization
system are not fully ready for MT. In particular, the feature paths
should include PNG. and E. only on the internal (left-hand) side. Thus
you'll want to edit blocks like this:
E.TENSE : E.TENSE
present <> present
past <> past
future <> future
* <> *
To look like this:
E.TENSE : TENSE
present <> present
past <> past
future <> future
* <> *
- If your grammar uses a PERNUM feature, you'll need to map
separate PER and NUM features from the external (right-hand) side of
the VPM to a single PERNUM feature on the internal (left-hand) side.
See the example under "Properties: An Example" on the DELPH-IN wiki page.
- If you have any other features you have added on indices, you
will need to provide VPM entries for them as well. (If you added
them through the customization system, they may be there already.)
- If your language marks aspect in some sentences but has other forms that are just underspecified for aspect, you'll want the default aspect to be "no-aspect". Define this as a subtype of aspect in your grammar, but don't mention it anywhere other than semi.vpm.
- You can do a similar trick for other kinds of generation ambiguity
relating to variable properties.
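If your semi.vpm has many blocks like the E.TENSE one to fix, a short script can do the header edit. This is a sketch under two assumptions: section headers are the only lines containing " : ", and every E. or PNG. prefix on the right-hand side should be dropped. The function names are hypothetical; run it on a copy and check the result by eye, since PERNUM and any custom features still need editing by hand.

```python
import re

# A VPM section header: internal feature path(s) ":" external feature name(s),
# e.g. "E.TENSE : E.TENSE". Rule lines such as "present <> present" have no colon.
HEADER = re.compile(r"^(\s*)(\S.*?)\s+:\s+(\S.*?)\s*$")


def strip_external_prefixes(line: str) -> str:
    """Drop E. and PNG. from the right-hand (external) side of a header line."""
    if line.lstrip().startswith(";"):
        return line  # leave comment lines alone
    m = HEADER.match(line)
    if m is None:
        return line  # not a section header
    indent, internal, external = m.groups()
    names = [re.sub(r"^(?:E|PNG)\.", "", name) for name in external.split()]
    return f"{indent}{internal} : {' '.join(names)}"


def fix_vpm_text(text: str) -> str:
    """Apply the header fix to every line of a semi.vpm file's contents."""
    return "\n".join(strip_external_prefixes(line) for line in text.splitlines())


if __name__ == "__main__":
    block = "E.TENSE : E.TENSE\n  present <> present\n  * <> *"
    print(fix_vpm_text(block))
```

Applied to the E.TENSE block above, this yields "E.TENSE : TENSE" and leaves the rule lines untouched.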
Test your semi.vpm file by parsing and then generating. You
should see fewer strings coming out.
Preliminaries
The goal for the next lab will be to create an "accommodation"
transfer grammar for your language by using it as the target language
in two translation pairs, with English and Chadian Arabic as the inputs.
For now, we'll be attempting to get just one sentence through, possibly
one that doesn't actually require any transfer rules.
This week, we'll be using the LOGON MT setup, which doesn't
respect ICONS. Next week, I hope to try the ACE setup, which does.
(But you'll still need the LOGON/LKB version in order to debug
transfer, I believe.)
Running the translation system
The first step is to get the translation system running
for English to Chadian Arabic (eng2shu). Here are step-by-step
instructions:
- Download the English (updated 2/22 8am!) and
Chadian Arabic grammars. Unpack each of
them with tar xzf eng.tgz and tar xzf shu.tgz.
- Start two separate emacsen. Put one on the left of your
screen (this will be the "source" emacs). Put one on the right
of your screen ("target" emacs).
- Start the LKB in each. Make sure the "source" LKB Top menu
is on the left of the screen and the "target" one is on the right.
- Load the English grammar into the "source" LKB.
- Load the Chadian Arabic grammar into the "target" LKB.
- In the "target" LKB, select Options | Expand menu.
- In the "target" LKB, select Generate | Start server.
- In the "source" emacs/lkb parse the English sentence Dogs sleep.
- From the pop-up menu on the tree that comes up, select "Rephrase."
You should see a transfer output window and then the Chadian Arabic grammar should output "KALIB-PL B-UNUUM-U" and "KALIB-PL Y-UNUUM-U" in a realizations window.
Attempt to translate into your language
- Edit the file lkb/globals.lsp to add the following line,
with "iso" replaced by the three-letter iso code for your language:
(setf *translate-grid* '(:iso :eng :shu))
- Edit the file lkb/globals.lsp in the English and Chadian Arabic
grammars so that the line for *translate-grid* now looks like
the appropriate one of the lines below (again replacing "iso" with the
code for your language).
(setf *translate-grid* '(:eng :shu :iso))
(setf *translate-grid* '(:shu :eng :iso))
- Now load your grammar into the "target" LKB.
- Parse Dogs sleep with the English grammar in the "source" LKB
and select "Rephrase".
- Observe what happens: Do you get generation outputs? Some
error in the emacs buffer in the "target" emacs?
- If you get an error, you'll need to compare the MRSs to
see what the difference is. I expect that for Dogs sleep
you won't need any transfer rules, and thus any errors should be
addressed through harmonization (aka cleaning up your MRS) and/or
work on your semi.vpm file. Thus, this is a good one to
work on for this week.
Comparing MRSs
To compare the MRSs, you can look at the MRS from the English
grammar directly, but this can be a bit misleading, since you really
want to look at the input to the generator (i.e., the transfer
output). To do this, you can select "Generate | Display Input MRS" or
"Generate | Display Internal MRS" from the "target" LKB Top menu.
In short, in the "target" LKB:
- Generate | Display Internal MRS
- Parse the expected output
- Choose Indexed MRS from the pop-up menu
There are a number of things that could be wrong:
- Missing RELS or HCONS (broken diff-list append).
- Misspelled PRED values (look carefully at the underscores).
- Misspelled/differently spelled feature values (e.g., sing
instead of sg).
- Misspelled/differently spelled feature names (e.g., PERS
instead of PER).
- Incompatible variable properties (features and values).
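When the MRSs are long, it can help to transcribe the EPs you are comparing into a simple structure and let a script flag the kinds of mismatch listed above. The sketch assumes a hand-made, simplified representation (one entry per EP with its PRED and the variable properties of its ARG0), and the PRED names in the example are invented; it does not read LKB output and does not check HCONS.

```python
import difflib
from typing import Any, Dict, List

# Hand-transcribed, simplified EP: {"pred": str, "props": {feature: value}},
# where props are the variable properties of the EP's ARG0. Not an LKB format.
EP = Dict[str, Any]


def compare_mrs(transfer_output: List[EP], own_parse: List[EP]) -> List[str]:
    """List differences between the generator input and your grammar's MRS.

    Only features present in the transfer output are checked.
    """
    problems: List[str] = []
    own_preds = [ep["pred"] for ep in own_parse]
    for ep in transfer_output:
        pred = ep["pred"]
        if pred not in own_preds:
            # Often a misspelled PRED: suggest the closest one your grammar uses.
            close = difflib.get_close_matches(pred, own_preds, n=1)
            hint = f" (did you mean {close[0]}?)" if close else ""
            problems.append(f"PRED {pred} not produced by your grammar{hint}")
            continue
        # Compare against the first EP with the same PRED (a simplification).
        mine = own_parse[own_preds.index(pred)]["props"]
        for feat, value in ep["props"].items():
            if feat not in mine:
                problems.append(f"{pred}: feature {feat} missing in your MRS")
            elif mine[feat] != value:
                problems.append(
                    f"{pred}: {feat}={value} (transfer output) vs {feat}={mine[feat]} (your grammar)"
                )
    if len(transfer_output) != len(own_parse):
        # A mismatch here can point to RELS lost through a broken diff-list append.
        problems.append(f"EP count differs: {len(transfer_output)} vs {len(own_parse)}")
    return problems


if __name__ == "__main__":
    transfer = [
        {"pred": "_dog_n_rel", "props": {"PER": "3", "NUM": "pl"}},
        {"pred": "_sleep_v_rel", "props": {"TENSE": "present"}},
    ]
    mine = [
        {"pred": "_dog_n_rel", "props": {"PERS": "3", "NUM": "pl"}},
        {"pred": "sleep_v_rel", "props": {"TENSE": "present"}},
    ]
    for problem in compare_mrs(transfer, mine):
        print(problem)
```

In the invented example, the script reports the PERS/PER feature-name mismatch and suggests sleep_v_rel for the missing leading underscore.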
Update semi.vpm, if necessary
The file semi.vpm provides a mapping between grammar-external
features of indices (referential indices and events) and their values,
and grammar-internal ones. For background on VPM, see the
DELPH-IN wiki.
- If your grammar uses a PERNUM feature, you'll need to map
separate PER and NUM features from the external (right-hand) side of
the VPM to a single PERNUM feature on the internal (left-hand) side.
See the example under "Properties: An Example" on the DELPH-IN wiki page. (There is also an example in the semi.vpm file in the eng grammar.)
- If your grammar encodes aspectual distinctions, you'll need
to add an ASPECT section, modeled on tense. This should allow you
to specify a default value of ASPECT as well.
- If your language marks aspect in some sentences but has other forms that are just underspecified for aspect, you'll want the default aspect to be "no-aspect". Define this as a subtype of aspect in your grammar, but don't mention it anywhere other than semi.vpm. In the semi.vpm file, at the bottom of your section on aspect, add:
* >> no-aspect
no-aspect << [e]
MT look-ahead
For next week, we'll be attempting to cover the sentences
in the following files:
... and so you'll need to collect (or create) translations of
those sentences into your language for Lab 9. We'll also work
on transfer rules so that we can handle cases of translation divergence.
Write up your analyses
- For whatever you fixed about your grammar as you worked on the test corpus sentence, provide the usual "phenomenon" write-up (as in previous labs).
- Describe any changes you needed to make to the semi.vpm file,
and the effects that including the semi.vpm file had on generation.
- Describe any changes you needed to make to get MT working, with "Dogs sleep" or some other simple sentence.
- Provide the sentence that you worked on for MT, so I can test it.
- Describe the current coverage of your grammar over your test suite (using numbers you can
get from Analyze | Coverage and Analyze | Overgeneration in [incr tsdb()])
and a comparison between your baseline test suite run and your final
one for this lab (see Compare | Competence).
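As a sanity check on the numbers you report, the two headline figures reduce to simple ratios. This sketch assumes hypothetical per-item data (item id mapped to whether the item is grammatical and how many parses it got); the Analyze | Coverage and Analyze | Overgeneration tables in [incr tsdb()] break these figures down further, so quote their numbers in your write-up.

```python
from typing import Dict, Tuple

# Hypothetical per-item data: item id -> (is_grammatical, number_of_parses)
Results = Dict[int, Tuple[bool, int]]


def percent(hits: int, total: int) -> float:
    """Percentage of hits, or 0.0 when there are no items at all."""
    return 100.0 * hits / total if total else 0.0


def coverage_summary(results: Results) -> Dict[str, float]:
    """Coverage: share of grammatical items with at least one parse.
    Overgeneration: share of ungrammatical items with at least one parse."""
    positive = [n for grammatical, n in results.values() if grammatical]
    negative = [n for grammatical, n in results.values() if not grammatical]
    return {
        "coverage": percent(sum(1 for n in positive if n > 0), len(positive)),
        "overgeneration": percent(sum(1 for n in negative if n > 0), len(negative)),
    }


if __name__ == "__main__":
    results = {1: (True, 2), 2: (True, 0), 3: (True, 1), 4: (False, 0), 5: (False, 1)}
    print(coverage_summary(results))  # coverage about 66.7, overgeneration 50.0
```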
ebender at u dot washington dot edu
Last modified: 2/22/16