Lab 2 (due 4/12 11:00 pm)
The goal of this lab is to make a start on the test suite
that will be your development target, on the one hand, and to
customize a first version of your grammar (the starter grammar), on the other.
I've selected phenomena to cover in this lab with an eye to
starting with those that are essential to creating a working
grammar. You'll probably want to work on these two subtasks
in parallel, though they are described separately in the instructions
below.
Preliminaries
GoPost
In order to encourage people to get started early, this assignment
requires that you post a question to GoPost by Tuesday night.
Examples: A question about something in these
instructions that's unclear, a question about something in your
grammar resources that's unclear, or a question about something in the
customization system that's unclear. Or all of the above! Ask away :)
(When I'm grading this part of the assignment, I'm going
to check that both partners for each grammar posted
a question.)
In order to support collaborative work on the grammars,
and for lots of other reasons, you are asked to use version
control for your work in this class. If both partners are
comfortable with some other version control system, you may
use it. Otherwise, please use svn. (NB: The repository should
be created only once --- not once by each partner --- but
checked out by both partners.)
- Read David's wiki pages on subversion (svn).
- Request an account on lemur, our svn server,
from linghelp-at-u (David Brodbeck).
- Decide on a project name (e.g., the name of your language)
and create a directory of that name on your local computer.
- You'll put your test suite and choices files in the directory for now.
(We'll put the grammars themselves under version control once we're done downloading them. Until then, it'll just be annoying, and all the information is in the choices files anyway.)
- From the directory above your project directory, invoke the following command:
svn import project-name svn://lemur.ling.washington.edu/shared/project-name
Where "project-name" should be the name of your directory (two instances).
- Move your old version of that directory out of the way:
mv project-name project-name-old
- Check out a working copy from svn:
svn checkout svn://lemur.ling.washington.edu/shared/project-name
- Check that it's the same as what you had:
diff -r project-name-old project-name
- If so, delete the version that's not under svn:
rm -r project-name-old
- As you make changes to your test suite, commit them to the svn repository:
svn commit
- Before working on the test suite or choices file, run
svn update
to grab any changes your partner made.
- Added bonus: If you are working on separate machines (e.g., if you want to work in the Treehouse, say, to get my help in debugging something during my office hours), you can use svn to keep the locations in sync.
Test Suite
The first task is to create positive and negative example sentences
illustrating the following phenomena, to the extent that they
are relevant for your language:
Before you start, read the general instructions for testsuites
and the formatting instructions.
Starter grammar
The second task is to create a starter grammar by filling out
the required sections of the Grammar Matrix customization questionnaire. The goal
here is to get as much coverage as you can over your test suite using
only the customization system (no hand-editing of tdl files yet). In
particular, you'll need to address these sections:
- General information
- Word order
- Number
- Person
- Gender (if applicable)
- Case
- Direct-inverse (if appropriate)
- Lexicon
In the word order section,
you can skip the auxiliaries by saying "no" on that question for now.
When we get to auxiliaries, you may of course revise this answer.
In the lexicon section, you should define lexical types for transitive
and intransitive verbs and nouns. If appropriate, you should define
determiners and case-marking adpositions.
If you have case and/or agreement, you'll need to define morpheme
slots and morphemes for verbs and nouns as appropriate. In many
languages, the agreement morphemes on verbs also mark, say, tense.
We'll ignore this for now, but return to it soon. If you want to define
other affixes without giving them morphosyntactic content, you can.
Make sure you can parse individual sentences
Once you have created your starter grammar (or each time you
create one, as you should iterate through grammar creation and
testing a few times as you refine your choices), try it out on a
couple of sentences interactively to see if it works:
- Load the grammar into the LKB.
- Using the parse dialog box (or 'C-c p' in emacs to get the parse
command inserted at your prompt), enter a sentence to parse.
- Examine the results. If it does parse, check out the semantics (pop-up menu on the little trees). If it doesn't, look at the parse chart to see why not.
- Problems with lexical rules and lexical entries often become apparent here, too: If the LKB can't find an analysis for one of your words, it will say so, and (obviously) fail to parse the sentence.
Note that the questionnaire has a section for test sentences. If
you use this, then the parse dialog will be pre-filled with your test sentences.
[incr tsdb()] profile
The final step for this lab is to use the [incr tsdb()] grammar
profiling system to test the performance of your starter grammar over
your test suite, and then examine the results. (You may find in doing
so that you want to refine certain aspects of your starter grammar.
You can do this by uploading the file "choices" which comes with your
grammar into the customization system and then tweaking from there.)
Create a test suite skeleton
- Create a directory called tsdb inside your grammar
directory.
- Inside tsdb, create two subdirectories: home (for
test suite instances) and skeletons (for skeletons).
- Save a copy of Index.lisp in
tsdb/skeletons
- Save a copy of Relations in
tsdb/skeletons. (If your browser doesn't like files without
extensions, here's another copy of the
same file with .txt appended. You should save it as just Relations.)
- Make a subdirectory called lab2 inside
tsdb/skeletons for your test suite. (If you choose a different
name for this subdirectory, you must edit Index.lisp accordingly.)
- Download the python script make_item, make
sure it is executable, and run it on your test suite:
./make_item testsuite.txt
Notes on make_item:
- This is a new version of the script, produced by the AGGREGATION
project, and it should be much more informative than what we worked with
in previous years. It's also going to be pretty picky about the format
of your test suite. If you have questions, please post to GoPost.
- It requires python3, which is not part of the Knoppix+LKB VB appliance (yet). To install python3 on Knoppix+LKB, do the following in a terminal:
- cd
- mkdir tmp
- cd tmp
- wget http://www.python.org/ftp/python/3.3.0/Python-3.3.0.tgz && tar xvf Python-3.3.0.tgz
- cd Python-3.3.0
- ./configure
- make
- sudo make altinstall
- The password is the same as the username
- Now you can invoke make_item as follows:
python3.3 make_item testsuite.txt
- Alternatively, you can copy your testsuite and make_item over to patas and run there, or install python3 (from http://python.org/download) on your host OS (mac or windows), and run make_item outside VirtualBox.
- If the above command is successful, testsuite.txt.item will be created in the working directory. If the testsuite contains errors, a lot of output may appear on stderr. It may be useful to redirect this into a file that you can go through to correct the errors one at a time. For example:
./make_item testsuite.txt item 2>errs
The command just above attempts to create 'item' in the working directory, and stderr messages are redirected to the file 'errs'.
make_item contains a default mapping from testsuite line types into particular fields of the [incr tsdb()] item file. The default mapping puts 'orth' into 'i-input', the field which is the input to the grammar. If your grammar targets a different testsuite line, override the default mapping with the -m/--map option:
./make_item --map orth-seg i-input testsuite.txt item
The invocation above maps the orth-seg line into the input field.
You can run make_item with -h/--help to see a summary of the options.
- Copy the .item file which is output by make_item
to tsdb/skeletons/lab2/item.
- Copy tsdb/skeletons/Relations to tsdb/skeletons/lab2/relations (notice the change from R to r).
- The final directory structure should look like this:
grammar/data/testsuite
grammar/data/make_item
grammar/data/testsuite.item
grammar/tsdb/skeletons/Index.lisp (lists the testsuites)
grammar/tsdb/skeletons/Relations (master copy of the database schema)
grammar/tsdb/skeletons/lab2/item (copy of ../../data/testsuite.item)
grammar/tsdb/skeletons/lab2/relations (copy of ../Relations)
grammar/tsdb/home (directory to store test profiles)
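The setup steps above can be sketched as a short shell script. This is only an illustration under stated assumptions: GRAMMAR_DIR is a hypothetical variable (defaulting to ./grammar), and the file locations are taken from the directory listing above. The script creates the directories and copies Relations and the item file only if they are already in place; otherwise it prints a reminder.

```shell
#!/bin/sh
# Sketch of the tsdb directory setup described above.
# GRAMMAR_DIR is a hypothetical variable; point it at your own grammar directory.
GRAMMAR_DIR="${GRAMMAR_DIR:-./grammar}"
SKEL="$GRAMMAR_DIR/tsdb/skeletons"

# home holds test suite instances; skeletons/lab2 holds this lab's skeleton.
mkdir -p "$GRAMMAR_DIR/tsdb/home" "$SKEL/lab2"

# Copy the database schema into the skeleton under its lowercase name
# (Relations -> relations).
if [ -f "$SKEL/Relations" ]; then
    cp "$SKEL/Relations" "$SKEL/lab2/relations"
else
    echo "Save Relations to $SKEL first" >&2
fi

# Copy the make_item output into the skeleton as 'item'.
if [ -f "$GRAMMAR_DIR/data/testsuite.item" ]; then
    cp "$GRAMMAR_DIR/data/testsuite.item" "$SKEL/lab2/item"
else
    echo "Run make_item first to produce $GRAMMAR_DIR/data/testsuite.item" >&2
fi
```

Index.lisp is not handled here; it still needs to be saved into tsdb/skeletons by hand, and edited if you use a name other than lab2.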
Create and run an initial test suite instance
- Start the LKB.
- Load your starter grammar. (The script file is in your-grammar-dir/lkb/script.)
- Start [incr tsdb()] (within emacs, that's M-x itsdb)
- In the [incr tsdb()] podium, select Options > Database Root
and input the path to tsdb/home.
- In the [incr tsdb()] podium, select Options > Skeleton Root
and input the path to tsdb/skeletons.
- Optional: For future use, you can set these variables
ahead of time in a file called .tsdbrc in your home directory.
It should contain these lines, with path names edited appropriately:
(tsdb:tsdb :home "path-to-tsdb/home")
(tsdb:tsdb :skeleton "path-to-tsdb/skeletons")
- In the [incr tsdb()] podium, select File > Create. You should
see your test suite in the menu there. Select it, and get a test suite
instance. Post to GoPost if this doesn't work.
- Make sure your grammar is loaded into the LKB.
- Once you have a test suite instance, select it (by clicking on it),
then do Process > All Items.
- Explore the results, with functions such as Browse > Results and Analyze > Competence.
- Be sure to save (i.e., not overwrite or delete) this test suite
instance, as you'll be asked to turn it in.
Write up
NB: While the test suite and choices file creation
is joint work, the write up should be done by one partner (the
other will get a turn next week). The writing partner should
have the non-writing partner review the write up and make suggestions.
Your write up should be a plain text file (not .doc, .rtf or .pdf)
which includes the following:
- Documentation of the choices you made in the customization
system, illustrated with examples from your test suite. Here's an example of what this should look like.
- Descriptions of any properties of your language illustrated
in your test suite but not covered by your starter grammar and/or
the customization system.
- Documentation of the coverage of your grammar over the testsuite.
This should include both summary numbers, which you can get by using the Analyze | Coverage and Analyze | Overgeneration options in [incr tsdb()], and discussion of specific examples.
If there are examples that are parsed incorrectly (unanalyzed
grammatical examples, analyzed ungrammatical examples, or grammatical
examples assigned surprising parses), reflect on why that might be.
- Finally, if there are any places where the customization system
seems unable to cope with the properties of your language (within the
phenomena addressed in this lab), describe them here.
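As a rough illustration of the summary numbers: coverage is the share of grammatical items that receive at least one parse, and overgeneration is the share of ungrammatical items that receive at least one parse. The counts in this sketch are made up for illustration and do not come from any real profile; read the real ones off your own test suite instance.

```shell
#!/bin/sh
# Hypothetical counts, for illustration only.
grammatical_total=40
grammatical_parsed=30
ungrammatical_total=20
ungrammatical_parsed=3

# Percentage of grammatical items with at least one parse.
coverage=$(LC_ALL=C awk -v p="$grammatical_parsed" -v t="$grammatical_total" \
    'BEGIN { printf "%.1f", 100 * p / t }')

# Percentage of ungrammatical items with at least one parse.
overgeneration=$(LC_ALL=C awk -v p="$ungrammatical_parsed" -v t="$ungrammatical_total" \
    'BEGIN { printf "%.1f", 100 * p / t }')

echo "coverage: ${coverage}%"
echo "overgeneration: ${overgeneration}%"
```

High coverage together with low overgeneration is the target; either number on its own says little.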
ebender at u dot washington dot edu
Last modified: 4/4/13