The goal of this lab is to take the grammar you arrived at last week and extend it to cover still more phenomena, again building out a small testsuite as you go and incorporating improvements to the auto-generated choices files provided by the AGG project. You'll also be using [incr tsdb()] to test the resulting grammar and compare it to your starting point from last week.
This lab entails the following general steps (ordered to have you working on testsuites first, so that we have a little extra time to produce better auto-generated choices files):
Choose 3 phenomena to work on, from the following list:
Add examples to your testsuite, following the general instructions for testsuites and the formatting instructions, to illustrate the phenomena you chose above. The testsuite doesn't need to be exhaustive (since we're working with test corpora this year), but it should include both positive and negative examples for each of the phenomena you work on in this section. I expect these testsuites to have about 20-30 examples total by the end of this week, though you can add more if you find that useful. All examples should be simple enough that your grammar either parses them or fails to parse them only because of the one thing that's wrong with them.
(
  ((:path . "matrix") (:content . "matrix: A test suite created automatically from the test sentences given in the Grammar Matrix questionnaire."))
  ((:path . "corpus") (:content . "IGT provided by the linguist"))
  ((:path . "lab4") (:content . "Test suite collected for Lab 4."))
)
make_item testsuite.txt
Notes on make_item:
With the invocation above, a file named testsuite.txt.item would be created in the working directory. If the testsuite contains errors, a lot of output may appear on stderr. It may be useful to redirect this into a file, so that you can go through it and correct the errors one at a time. For example:
./make_item testsuite.txt item 2>errs
The command just above attempts to create 'item' in the working directory, and stderr messages are redirected to the file 'errs'.
make_item contains a default mapping from testsuite line types into particular fields of the [incr tsdb()] item file. The default mapping puts 'orth' into 'i-input', the field that serves as the input to the grammar. If your grammar targets a different testsuite line, override the default mapping with the -m/--map option.
./make_item --map orth-seg i-input testsuite.txt item
The invocation above maps the orth-seg line into the input field.
You can run make_item with -h/--help to see a summary of the options.
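Putting these notes together, a small wrapper can run make_item with stderr saved to a file and then tell you whether anything landed there, showing the first line so you can fix errors one at a time. This is a minimal sketch: build_item and the MAKE_ITEM variable are hypothetical names, not part of make_item, and the sketch assumes only what the notes above describe (options such as --map come before the testsuite and output file names, and problems are reported on stderr). It does not rely on make_item's exit status, which isn't described here.

```shell
#!/bin/sh
# Hypothetical helper (not part of make_item): build an item file while
# saving stderr to a file, then report whether anything was written there.
# Usage: build_item SUITE OUT ERRFILE [make_item options, e.g. --map orth-seg i-input]
# MAKE_ITEM may name another copy of the script; it defaults to ./make_item.
build_item() {
  suite=$1
  out=$2
  errfile=$3
  shift 3
  # Remaining arguments are make_item options; they go before the file
  # names, as in the --map example above.
  "${MAKE_ITEM:-./make_item}" "$@" "$suite" "$out" 2>"$errfile"
  if [ -s "$errfile" ]; then
    # stderr was not empty: count its lines and show the first one to fix.
    n=$(wc -l < "$errfile" | tr -d ' ')
    echo "$n line(s) on stderr saved in $errfile; first:"
    head -n 1 "$errfile"
    return 1
  fi
  echo "nothing on stderr; wrote $out"
}
```

For example, build_item testsuite.txt item errs --map orth-seg i-input builds 'item' with the orth-seg mapping. If it reports messages, fix the first one, rerun, and repeat until stderr stays empty.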
There should again be updated auto-generated choices files to work with this week, at least adding choices for negation and possibly fixing other bugs. Incorporate these updates into your choices file.
For the three phenomena you chose above, refine the choices file by hand. Please be sure to post lots of questions on Canvas as you work on this!
Once you have created your starter grammar (or each time you create one, as you should iterate through grammar creation and testing a few times as you refine your choices), try it out on a couple of sentences interactively to see if it works:
Note that the questionnaire has a section for test sentences. If you use this, then the parse dialog will be pre-filled with your test sentences.
Following the same procedure as usual, do test runs over both the testsuite and the test corpus.
Again, collect the following information to include in your write-up:
NB: While creating the testsuite and choices file is joint work, the write-up should be done by one partner (the other will get a turn next week). The writing partner should have the non-writing partner review the write-up and make suggestions.
Your write-up should be a plain text file (not .doc, .rtf or .pdf) that includes the following:
svn export yourgrammar iso-lab4
For git, please do the equivalent.
tar czf iso-lab4.tgz iso-lab4
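For git, one possible equivalent of svn export is git archive, which writes a tar stream of a commit; unpacking it with a --prefix gives a clean copy of the grammar without the .git directory. The sketch below wraps that together with the tar step. export_and_pack is a hypothetical name, and the sketch assumes your grammar changes are committed, since git archive exports HEAD rather than uncommitted edits in the working copy.

```shell
#!/bin/sh
# Hypothetical helper: a git counterpart to "svn export" followed by "tar czf".
# Assumes $1 is a git working copy whose changes are committed.
export_and_pack() {
  repo=$1   # e.g. yourgrammar
  name=$2   # e.g. iso-lab4
  # git archive emits a tar stream of HEAD with every path under "$name/";
  # unpacking it creates ./$name without any .git metadata.
  git -C "$repo" archive --prefix="$name/" HEAD | tar -x -f -
  # Same packaging step as for svn.
  tar czf "$name.tgz" "$name"
}
```

Running export_and_pack yourgrammar iso-lab4 leaves both the exported iso-lab4 directory and iso-lab4.tgz in the current directory.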