The goal of this lab is, on the one hand, to make a start on the test suite that will be your development target and, on the other, to customize a first version of your grammar. I've selected the phenomena to cover in this lab with an eye to starting with those that are essential to creating a working grammar. You will also be getting set up with ALL the software, including [incr tsdb()] and make_item, both of which are new this week. Please read through the instructions completely before starting, and plan to work on the subtasks in parallel, even though they are described separately in the instructions below.
In order to encourage people to get started early, this assignment requires that you post a question to the discussion on the course Canvas by Tuesday night. Examples: A question about something in these instructions that's unclear, a question about something in your grammar resources that's unclear, or a question about something in the customization system that's unclear. Or all of the above! Ask away :)
(When I'm grading this part of the assignment, I'm going to check that both partners for each grammar posted a question.)
When posting to Canvas, please create separate threads for separate questions and always include the name of the language we're talking about!
Please use version control for your work in this class. You may use any system you are comfortable with. Please negotiate this with your partner. If neither of you already has experience with version control systems, please post to Canvas for guidance ASAP.
The first task is to create positive and negative example sentences illustrating the following phenomena, to the extent that they are relevant for your language:
Before you start, read the general instructions for testsuites and the formatting instructions.
The second task is to create a starter grammar by filling out the required sections of the Grammar Matrix customization questionnaire. The goal here is to get as much coverage as you can over your test suite for the phenomena indicated using only the customization system (no hand-editing of tdl files yet). In particular, you'll need to address these sections:
In the word order section, you can skip the auxiliaries by saying "no" on that question for now. When we get to auxiliaries, you may of course revise this answer.
Note that your testsuite for this week will include examples for wh-questions and tense/aspect, but your grammar is not expected to cover these phenomena. The plan is to have the testsuite construction run somewhat ahead of the grammar construction, so that by the time we're done with customization, we'll have testsuites mapping what the tdl editing should do.
In the lexicon section, you should define lexical types for transitive and intransitive verbs and nouns. If appropriate, you should define determiners and case-marking adpositions. Where the lexicon asks you for pred values, use English glosses instead of the stem itself. Thus no matter what language you are working on, if you have an entry for a noun which translates as dog, its pred value should be _dog_n_rel. The pred value for all pronouns should be pron_rel (the person/number/gender features will provide any further differentiation that is needed).
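As a rough illustration of the naming pattern described above, the following shell sketch builds a pred value from an English gloss and a part-of-speech tag. The function name pred_value is hypothetical, the pattern is generalized from the single _dog_n_rel example, and the tag "v" for verbs is an assumption to check against what the customization system expects.

```shell
#!/bin/sh
# Hypothetical helper: build a pred value from an English gloss and a
# part-of-speech tag, following the _dog_n_rel example in the text.
pred_value() {
    gloss=$1
    pos=$2
    printf '_%s_%s_rel\n' "$gloss" "$pos"
}

pred_value dog n      # noun glossed as "dog"; prints _dog_n_rel
pred_value chase v    # assumption: verbs take the tag "v"; prints _chase_v_rel
```

Pronouns do not go through this pattern: their pred value is always pron_rel.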
Please do NOT try to add any morphological rules this week! (Exception: If you have absolutely everything else working and your write up done, and you feel like doing more, then go for it.) This means that if you have any morphological complexity, you'll be creating full form lexical entries (e.g. for verbs marked for certain agreement information or nouns marked for certain case values). This means you'll want to keep your lexicon small for this week. We'll add morphological rules in Lab 3.
If you have case inflections, and you're making full form lexical entries for your nouns, then you'll need to define different noun types for different case values. When we add the lexical rules next week, your noun entries should have no case value and the rules should fill it in. (This will require replacing your multiple noun types with just one, another reason to keep the lexicon small for now.)
Once you have created your starter grammar (or each time you create one, as you should iterate through grammar creation and testing a few times as you refine your choices), try it out on a couple of sentences interactively to see if it works:
Note that the questionnaire has a section for test sentences. If you use this, then the parse dialog will be pre-filled with your test sentences.
In this portion of the lab you will use the [incr tsdb()] grammar profiling system to test the performance of your starter grammar over your test suite, and then examine the results. (You may find in doing so that you want to refine certain aspects of your starter grammar. You can do this by uploading the file "choices" which comes with your grammar into the customization system and then tweaking from there.)
(
((:path . "matrix") (:content . "matrix: A test suite created automatically from the test sentences given in the Grammar Matrix questionnaire."))
((:path . "lab2") (:content . "Test suite collected for Lab 2."))
)
make_item testsuite.txt
Notes on make_item:
With the invocation above, testsuite.txt.item would be created in the working directory. If the testsuite contains errors, a lot of output may appear on stderr. It may be useful to redirect this into a file that you can go through to correct the errors one at a time. For example:
./make_item testsuite.txt item 2>errs
The command just above attempts to create 'item' in the working directory, and stderr messages are redirected to the file 'errs'.
make_item contains a default mapping from testsuite line types into particular fields of the [incr tsdb()] item file. The default mapping puts 'orth' into 'i-input', the field which is the input to the grammar. If your grammar targets a different testsuite line, override the default mapping with the -m/--map option.
./make_item --map orth-seg i-input testsuite.txt item
The invocation above maps the orth-seg line into the input field.
You can run make_item with -h/--help to see a summary of the options.
grammar/data/testsuite
grammar/data/make_item
grammar/data/testsuite.item
grammar/tsdb/skeletons/Index.lisp (lists the testsuites)
grammar/tsdb/skeletons/Relations (master copy of the database schema)
grammar/tsdb/skeletons/lab2/item (copy of ../../data/testsuite.item)
grammar/tsdb/skeletons/lab2/relations (copy of ../Relations)
grammar/tsdb/home (directory to store test profiles)
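The two "copy of" entries in the listing above can be produced by hand, or with something like the following shell sketch. It assumes that grammar/data/testsuite.item and grammar/tsdb/skeletons/Relations already exist; the function name setup_lab2_skeleton is hypothetical and not part of any of the course tools.

```shell
#!/bin/sh
# Sketch: populate the lab2 skeleton from an already generated item file.
# setup_lab2_skeleton is a hypothetical name; its argument is the top-level
# grammar directory, laid out as in the listing above.
setup_lab2_skeleton() {
    grammar=${1:-grammar}
    skel="$grammar/tsdb/skeletons"
    # Create the skeleton directory and the directory for test profiles.
    mkdir -p "$skel/lab2" "$grammar/tsdb/home" || return 1
    # item is a copy of the make_item output.
    cp "$grammar/data/testsuite.item" "$skel/lab2/item" || return 1
    # relations is a copy of the master database schema.
    cp "$skel/Relations" "$skel/lab2/relations" || return 1
}

# Usage, from the directory that contains grammar/:
#   setup_lab2_skeleton grammar
```

The sketch does not touch Index.lisp, so the lab2 entry still has to be listed there separately.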
(tsdb:tsdb :home "path-to-tsdb/home")
(tsdb:tsdb :skeleton "path-to-tsdb/skeletons")
Note: If your tsdb/ directory is inside a shared folder on VirtualBox, it will not work.
NB: While the test suite and choices file creation is joint work, the write up should be done by one partner (the other will get a turn next week). The writing partner should have the non-writing partner review the write up and make suggestions. Please indicate in your write up which partner took which role this week.
Your write up should be a plain text file (not .doc, .rtf or .pdf) which includes the following:
svn export yourgrammar iso-lab2
(For git, please do the equivalent.)
tar czf iso-lab2.tgz iso-lab2
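For git users, one possible equivalent of the export and tar steps above is git archive, which exports and compresses in one step. The sketch below is only one way to do it: the function name package_lab is hypothetical, and "iso-lab2" is kept as the placeholder name used above. Note that git archive packages only committed files, so commit your work first.

```shell
#!/bin/sh
# Sketch: package the committed state of a git working copy as iso-lab2.tgz.
# package_lab is a hypothetical name; $1 is the path to the git repository.
package_lab() {
    repo=$1
    # Absolute output path, because git -C changes directory before running.
    out=$(pwd)/iso-lab2.tgz
    # --prefix puts every file under iso-lab2/ inside the archive,
    # mirroring the directory that svn export would create.
    git -C "$repo" archive --format=tar.gz --prefix=iso-lab2/ -o "$out" HEAD
}

# Usage, from the directory where the tarball should be written:
#   package_lab yourgrammar
#   tar tzf iso-lab2.tgz    # list the archive contents to check them
```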