Lab 2 (due 1/19 11:59 pm)

Overview

The goal of this lab is, on the one hand, to make a start on the test suite that will be your development target and, on the other, to customize a first version of your starter grammar. I've selected phenomena to cover in this lab with an eye to starting with those that are essential to creating a working grammar. You'll probably want to work on these two subtasks in parallel, though they are described separately in the instructions below.

Preliminaries

Canvas Discussion

In order to encourage people to get started early, this assignment requires that you post a question to the discussion on the course Canvas by Tuesday night. Examples: A question about something in these instructions that's unclear, a question about something in your grammar resources that's unclear, or a question about something in the customization system that's unclear. Or all of the above! Ask away :) (When I'm grading this part of the assignment, I'm going to check that both partners for each grammar posted a question.)

Version control

In order to support collaborative work on the grammars, and for lots of other reasons, you are asked to use version control for your work in this class. If both partners are comfortable with some other version control system, you may use it. Otherwise, please use svn. (NB: The repository should be created only once --- not once by each partner --- but checked out by both partners.) Here are directions for doing this with svn; you can do analogous things with git or other version control systems.

Read the Treehouse wiki pages on subversion (svn).
Request an account on lemur, our svn server, from linghelp-at-u (Brandon Graves).
Decide on a project name (e.g., the name of your language) and create a directory of that name on your local computer.
You'll put your test suite and choices files in the directory for now. (We'll put the grammars themselves under version control once we're done downloading them. Until then, it'll just be annoying, and all the information is in the choices files anyway.)
From the directory above your project directory, invoke the following command:
```
svn import project-name svn://lemur.ling.washington.edu/shared/project-name
```
Here "project-name" should be the name of your directory (in both places).
Move your old version of that directory out of the way:
```
mv project-name project-name-old
```

Check out a working copy from svn:

svn checkout svn://lemur.ling.washington.edu/shared/project-name

Check that it's the same as what you had:
```
diff -r project-name-old project-name
```
If so, delete the version that's not under svn:
```
rm -r project-name-old
```
As you make changes to your test suite, commit them to the svn repository:
```
svn commit
```
Before working on the test suite or choices file, run
```
svn update
```
to grab any changes your partner made.
Added bonus: If you are working on separate machines (e.g., if you want to work in the Treehouse to get my help in debugging something during my office hours), you can use svn to keep your working copies in sync.
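The one-time svn setup above can be written down as a small Python sketch that only builds the commands, in order, without running anything. The function name svn_setup_commands and its repo_root parameter are made up for illustration; the commands themselves are the ones given in the steps.

```python
# Sketch only: lists the one-time setup commands as argv lists.
# Nothing is executed here; svn_setup_commands is a hypothetical helper name.
def svn_setup_commands(project_name,
                       repo_root="svn://lemur.ling.washington.edu/shared"):
    url = f"{repo_root}/{project_name}"
    old = f"{project_name}-old"
    return [
        ["svn", "import", project_name, url],   # done once, by one partner
        ["mv", project_name, old],              # move the unversioned copy aside
        ["svn", "checkout", url],               # get a working copy
        ["diff", "-r", old, project_name],      # check the two copies match
        ["rm", "-r", old],                      # only if the diff looked right
    ]


if __name__ == "__main__":
    for argv in svn_setup_commands("project-name"):
        print(" ".join(argv))
```

Printing the commands first and running them by hand keeps the "rm -r" step under your control: it should only happen once the diff shows what you expect.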

Test Suite

The first task is to create positive and negative example sentences illustrating the following phenomena, to the extent that they are relevant for your language:

word order
pronouns (includes person/number/gender)
case
the rest of the NP
Matrix wh questions
Tense/aspect

Before you start, read the general instructions for testsuites and the formatting instructions.

Starter grammar

The second task is to create a starter grammar by filling out the required sections of the Grammar Matrix customization questionnaire. The goal here is to get as much coverage as you can over your test suite for the phenomena indicated using only the customization system (no hand-editing of tdl files yet). In particular, you'll need to address these sections:

General information
Word order
Number
Person
Gender (if applicable)
Case
Direct-inverse (if appropriate)
Lexicon

In the word order section, you can skip the auxiliaries by saying "no" on that question for now. When we get to auxiliaries, you may of course revise this answer.

Note that your testsuite for this week will include examples for wh-questions and tense/aspect, but your grammar is not expected to cover these phenomena. In the first case (wh-questions), the customization system doesn't yet provide any analyses. In the second, I'm just asking you to save the customization system work for Lab 3!

In the lexicon section, you should define lexical types for transitive and intransitive verbs and nouns. If appropriate, you should define determiners and case-marking adpositions. Where the lexicon asks you for pred values, use English glosses instead of the stem itself. Thus no matter what language you are working on, if you have an entry for a noun which translates as dog, its pred value should be _dog_n_rel. The pred value for all pronouns should be pron_rel (the person/number/gender features will provide any further differentiation that is needed).
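As a quick illustration of the pred naming pattern, here is a minimal Python sketch. The helper name pred_value is hypothetical, and using "v" as the part-of-speech letter for verbs is an assumption by analogy with the noun example; only _dog_n_rel and pron_rel come from the instructions above.

```python
# Sketch only: builds pred values from English glosses.
# pred_value is a hypothetical helper; "v" for verbs is an assumption
# made by analogy with the "_dog_n_rel" example.
PRONOUN_PRED = "pron_rel"  # shared by all pronouns


def pred_value(gloss, pos):
    """Return a pred value such as _dog_n_rel for gloss 'dog' and pos 'n'."""
    return f"_{gloss.strip().lower()}_{pos}_rel"


if __name__ == "__main__":
    print(pred_value("dog", "n"))
    print(pred_value("sleep", "v"))
    print(PRONOUN_PRED)
```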

Please do NOT try to add any morphological rules this week! (Exception: If you have absolutely everything else working and your write up done, and you feel like doing more, then go for it.) This means that if you have any morphological complexity, you'll be creating full form lexical entries (e.g. for verbs marked for certain agreement information or nouns marked for certain case values). This means you'll want to keep your lexicon small for this week. We'll add morphological rules in Lab 3.

If you have case inflections, and you're making full form lexical entries for your nouns, then you'll need to define different noun types for different case values. When we add the lexical rules next week, your noun entries should have no case value and the rules should fill it in. (This will require replacing your multiple noun types with just one, another reason to keep the lexicon small for now.)

Make sure you can parse individual sentences

Once you have created your starter grammar (or each time you create one, as you should iterate through grammar creation and testing a few times as you refine your choices), try it out on a couple of sentences interactively to see if it works:

Load the grammar into the LKB.
Using the parse dialog box (or 'C-c p' in emacs to get the parse command inserted at your prompt), enter a sentence to parse.
Examine the results. If it does parse, check out the semantics (pop-up menu on the little trees). If it doesn't, look at the parse chart to see why not.
Problems with lexical rules and lexical entries often become apparent here, too: If the LKB can't find an analysis for one of your words, it will say so, and (obviously) fail to parse the sentence.

Note that the questionnaire has a section for test sentences. If you use this, then the parse dialog will be pre-filled with your test sentences.

[incr tsdb()] profile

The final step for this lab is to use the [incr tsdb()] grammar profiling system to test the performance of your starter grammar over your test suite, and then examine the results. (You may find in doing so that you want to refine certain aspects of your starter grammar. You can do this by uploading the file "choices" which comes with your grammar into the customization system and then tweaking from there.)

Create a test suite skeleton

Look at the contents of your grammar directory and locate the subdirectory called tsdb.
Make a subdirectory called lab2 inside tsdb/skeletons for your test suite.

Edit tsdb/skeletons/Index.lisp to include a line for this directory, e.g.:

(
((:path . "matrix") (:content . "matrix: A test suite created automatically from the test sentences given in the Grammar Matrix questionnaire."))
((:path . "lab2") (:content . "Test suite collected for Lab 2."))
)

Download the python script make_item, make sure it is executable, and run it on your test suite:
```
make_item testsuite.txt
```
Notes on make_item:
- This script is going to be pretty picky about the format of your test suite. If you have questions, please post to Canvas (10 minute rule!).
- It requires python3, which is on the current version of the Ubuntu+LKB appliance.
- Alternatively, you can copy your testsuite and make_item over to patas and run there, or install python3 (from http://python.org/download) on your host OS (Mac or Windows), and run make_item outside VirtualBox.
- If the above command is successful, testsuite.txt.item will be created in the working directory. If the testsuite contains errors, it's possible that a lot of output will appear on stderr. It may be useful to redirect this into a file that you can use to go through and correct the errors one at a time. For example:
  ./make_item testsuite.txt item 2>errs
  The command just above attempts to create 'item' in the working directory, and stderr messages are redirected to the file 'errs'.
  make_item contains a default mapping from testsuite line types into particular fields of the [incr tsdb()] item file. The default mapping puts 'orth' into 'i-input', the field which is the input to the grammar. If your grammar targets a different testsuite line, override the default mapping with the -m/--map option.
  ./make_item --map orth-seg i-input testsuite.txt item
  The invocation above maps the orth-seg line into the input field.
  You can run make_item with -h/--help to see a summary of the options.
Copy the .item file which is output by make_item to tsdb/skeletons/lab2/item.
Copy tsdb/skeletons/Relations to tsdb/skeletons/lab2/relations (notice the change from R to r).
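If you would rather drive make_item from Python than from the shell, the stderr redirection described above might look like the following sketch. The helper names make_item_argv and run_logging_stderr are hypothetical; the arguments they build are exactly the ones shown in the notes (the optional --map pair, then the test suite file and the output name).

```python
import subprocess


def make_item_argv(testsuite, out="item", map_from=None):
    """Build the make_item command line shown in the notes (hypothetical helper)."""
    argv = ["./make_item"]
    if map_from is not None:
        # e.g. map_from="orth-seg" gives: --map orth-seg i-input
        argv += ["--map", map_from, "i-input"]
    return argv + [testsuite, out]


def run_logging_stderr(argv, err_path):
    """Run argv, writing its stderr to err_path (like 2>errs); return the exit code."""
    with open(err_path, "w") as errs:
        return subprocess.run(argv, stderr=errs).returncode


if __name__ == "__main__":
    # Assumes make_item and testsuite.txt are in the current directory.
    code = run_logging_stderr(make_item_argv("testsuite.txt"), "errs")
    print("exit code:", code)
```

Afterwards, the file errs can be opened in an editor and worked through one message at a time.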

The final directory structure should look like this:

grammar/data/testsuite
grammar/data/make_item
grammar/data/testsuite.item
grammar/tsdb/skeletons/Index.lisp            (lists the testsuites)
grammar/tsdb/skeletons/Relations             (master copy of the database schema)
grammar/tsdb/skeletons/lab2/item             (copy of ../../data/testsuite.item)
grammar/tsdb/skeletons/lab2/relations        (copy of ../Relations)
grammar/tsdb/home                            (directory to store test profiles)
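The copying steps that produce this layout can be sketched in Python as follows. This is only an illustration under the assumption that your grammar directory already contains tsdb/skeletons/Relations; the function names build_skeleton and index_entry are made up here, and the script does not edit Index.lisp for you (it just formats the line to paste in).

```python
import shutil
from pathlib import Path


def index_entry(path, content):
    """Format one Index.lisp entry in the shape shown in the instructions."""
    return f'((:path . "{path}") (:content . "{content}"))'


def build_skeleton(grammar_dir, item_file, name="lab2"):
    """Create tsdb/skeletons/<name>/ holding 'item' and 'relations'."""
    skeletons = Path(grammar_dir) / "tsdb" / "skeletons"
    target = skeletons / name
    target.mkdir(parents=True, exist_ok=True)
    # The make_item output becomes 'item'.
    shutil.copyfile(str(item_file), str(target / "item"))
    # Note the change from R to r in the copy's name.
    shutil.copyfile(str(skeletons / "Relations"), str(target / "relations"))
    return target


if __name__ == "__main__":
    # Assumes it is run from the directory containing 'grammar'.
    build_skeleton("grammar", "grammar/data/testsuite.item")
    print(index_entry("lab2", "Test suite collected for Lab 2."))
```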

Create and run an initial test suite instance

Start the lkb
Load your starter grammar. (The script file is in your-grammar-dir/lkb/script.)
Start [incr tsdb()] (within emacs, that's M-x itsdb)
In the [incr tsdb()] podium, select Options > Database Root and input the path to tsdb/home.
In the [incr tsdb()] podium, select Options > Skeleton Root and input the path to tsdb/skeletons.
Optional: For future use, you can set these variables ahead of time in a file called .tsdbrc in your home directory. It should contain these lines, with path names edited appropriately:
```
(tsdb:tsdb  :home  "path-to-tsdb/home")
(tsdb:tsdb  :skeleton  "path-to-tsdb/skeletons")
```
In the [incr tsdb()] podium, select File > Create. You should see your test suite in the menu there. Select it, and get a test suite instance. Post to Canvas if this doesn't work.
Make sure your grammar is loaded into the LKB.
Once you have a test suite instance, select it (by clicking on it), then do Process > All Items.
Explore the results, with functions such as Browse > Results, Analyze > Coverage, and Analyze > Overgeneration.
Be sure to save (i.e., not overwrite or delete) this test suite instance, as you'll be asked to turn it in.
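For the optional .tsdbrc step, a small Python sketch like the one below can generate the two lines with your own paths filled in. The function name tsdbrc_text is hypothetical, and the sketch simply follows the form of the two lines shown above; it does not write to your home directory unless you add that yourself.

```python
def tsdbrc_text(tsdb_dir):
    """Return the two .tsdbrc lines for a given path to the tsdb directory."""
    base = tsdb_dir.rstrip("/")
    return (
        f'(tsdb:tsdb  :home  "{base}/home")\n'
        f'(tsdb:tsdb  :skeleton  "{base}/skeletons")\n'
    )


if __name__ == "__main__":
    # Replace the placeholder with the real path to your grammar's tsdb directory.
    print(tsdbrc_text("path-to-tsdb"), end="")
```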

Note: If your tsdb/ directory is inside a shared folder on VirtualBox, it will not work.

Write up

NB: While the test suite and choices file creation is joint work, the write up should be done by one partner (the other will get a turn next week). The writing partner should have the non-writing partner review the write up and make suggestions.

Your write up should be a plain text file (not .doc, .rtf or .pdf) which includes the following:

Documentation of the phenomena you have added to your testsuite, illustrated with examples from the testsuite.
Documentation of the choices you made in the customization system, illustrated with examples from your test suite.
- This can be interleaved with the documentation of the phenomena (so you describe each phenomenon and then the choices you used to add an analysis of it to the grammar), but the documentation of the phenomenon and choices should be logically separate. Here's an example of what this should look like.
Descriptions of any properties of your language illustrated in your test suite but not covered by your starter grammar and/or the customization system.
Documentation of the coverage of your grammar over the testsuite. This should include both summary numbers, which you can get by using the Analyze | Coverage and Analyze | Overgeneration options in [incr tsdb()], and discussion of specific examples. If there are examples that are parsed incorrectly (unanalyzed grammatical examples, analyzed ungrammatical examples, or grammatical examples assigned surprising parses), reflect on why that might be.
Finally, if there are any places where the customization system seems unable to cope with the properties of your language (within the phenomena addressed in this lab), describe them here.
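To make the two summary numbers concrete, here is a minimal Python sketch of how they can be computed by hand. It assumes that coverage is the percentage of grammatical items that receive at least one parse and that overgeneration is the percentage of ungrammatical items that receive at least one parse; the function name and the sample data are invented for illustration, and the numbers you report should still come from [incr tsdb()].

```python
def coverage_and_overgeneration(items):
    """items: iterable of (is_grammatical, number_of_parses) pairs.

    Returns (coverage, overgeneration) as percentages, under the
    assumption stated above about what the two figures mean.
    """
    items = list(items)
    good = [n for ok, n in items if ok]
    bad = [n for ok, n in items if not ok]

    def percent_parsed(counts):
        if not counts:
            return 0.0
        return 100.0 * sum(1 for n in counts if n > 0) / len(counts)

    return percent_parsed(good), percent_parsed(bad)


if __name__ == "__main__":
    # Invented sample: three grammatical items, two ungrammatical ones.
    sample = [(True, 1), (True, 0), (True, 2), (False, 0), (False, 1)]
    cov, over = coverage_and_overgeneration(sample)
    print(f"coverage {cov:.1f}%, overgeneration {over:.1f}%")
```

In the sample, the grammatical item with zero parses and the ungrammatical item with one parse are exactly the kinds of examples worth discussing in the write up.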

Submit your assignment

Be sure your write up and the text-file version of your test suite are included in your grammar directory.
Likewise, make sure to include your most current tsdb profile in the grammar directory (ideally inside tsdb/home/).
If you're using svn, export the grammar so I don't get all your .svn files:
```
svn export yourgrammar iso-lab2
```

For git, please do the equivalent.

Create a tarball:

      tar czf iso-lab2.tgz iso-lab2

Upload the tarball to Canvas under the name of the partner who did the write up.
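If you are not using svn, one possible way to get the same effect as the export plus tar steps is sketched below in Python: it builds the gzipped tarball directly while leaving out .svn and .git directories. This is an alternative sketch, not the procedure given above, and the function name make_tarball is made up for illustration.

```python
import tarfile
from pathlib import Path


def make_tarball(src_dir, out_path):
    """Write src_dir to a gzipped tarball, skipping .svn and .git directories."""
    src = Path(src_dir)

    def skip_vcs(info):
        # Returning None drops the entry (and, for directories, their contents).
        parts = Path(info.name).parts
        return None if (".svn" in parts or ".git" in parts) else info

    with tarfile.open(str(out_path), "w:gz") as tar:
        tar.add(str(src), arcname=src.name, filter=skip_vcs)
    return str(out_path)


if __name__ == "__main__":
    # Assumes a directory named iso-lab2 in the current directory.
    make_tarball("iso-lab2", "iso-lab2.tgz")
```

Listing the archive afterwards (tar tzf iso-lab2.tgz) is a quick way to confirm that the write up, test suite, and tsdb profile are all in there.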


ebender at u dot washington dot edu

Last modified: 1/8/16