Lab 4 (due 4/21)

Preliminaries

These instructions might get edited a bit over the next couple of days. I'll try to flag changes.

As usual, check the write up instructions first.

Download a customized grammar start

Be sure to try this by class on Wednesday. Bring questions about how best to answer the questionnaire for your language to class. Post to EPost or send email if you hit bugs in the CGI script itself or don't understand the error messages it gives.

Import your test suite into [incr tsdb()]

These instructions will show you how to make a test suite 'skeleton' which [incr tsdb()] can use to create a new instance of your test suite whenever you need one (handy for comparison across time).

  1. Create a directory called tsdb
  2. Inside tsdb, create two subdirectories: home (for test suite instances) and skeletons (for skeletons).
  3. Updated 4/20/06 Save a copy of Index.lisp in tsdb/skeletons
  4. Updated 4/20/06 Save a copy of Relations in tsdb/skeletons. (If your browser doesn't like files without extensions, here's another copy of the same file with .txt appended. You should save it as just Relations.)
  5. Make a subdirectory called lab4 inside tsdb/skeletons for your test suite. (If you choose a different name for this subdirectory, you must edit Index.lisp accordingly.)
  6. Download the perl script make_item.pl (updated 4/17 9pm, should have same functionality) and run it on your test suite from lab3:

    Updated 4/21/06

     perl make_item.pl lab3_suite.txt
  7. (If the perl script doesn't like the formatting of your test suite, edit the test suite appropriately and/or complain about the perl script on EPost.)
  8. Copy the .item file which is output by make_item.pl to tsdb/skeletons/lab4/item.
  9. Copy tsdb/skeletons/Relations to tsdb/skeletons/lab4/relations (notice the change from R to r).
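
If you would rather script the layout than build it by hand, the steps above can be sketched roughly as follows. This is only an illustration: the function name build_skeleton and its arguments are made up for this example, and it assumes you have already downloaded Index.lisp and Relations and already run make_item.pl, so that you can pass in the paths to those three files.

```python
import shutil
from pathlib import Path


def build_skeleton(root, index_lisp, relations, item_file, name="lab4"):
    """Lay out tsdb/home and tsdb/skeletons/<name> under root (steps 1-9).

    Hypothetical helper: the arguments are paths to files you already have.
    """
    root = Path(root)
    home = root / "tsdb" / "home"
    skeletons = root / "tsdb" / "skeletons"
    suite = skeletons / name

    # Steps 1, 2 and 5: create the directory tree.
    home.mkdir(parents=True, exist_ok=True)
    suite.mkdir(parents=True, exist_ok=True)

    # Steps 3 and 4: the shared files go directly in tsdb/skeletons.
    shutil.copy(index_lisp, skeletons / "Index.lisp")
    shutil.copy(relations, skeletons / "Relations")

    # Step 8: the output of make_item.pl becomes the file named "item".
    shutil.copy(item_file, suite / "item")

    # Step 9: the per-suite copy is named with a lower-case "r".
    shutil.copy(relations, suite / "relations")
    return suite
```

Pass whatever path make_item.pl actually wrote as item_file. If you change name from lab4, you still need to edit Index.lisp by hand, as noted in step 5.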

Create and run an initial test suite instance

  1. Start the LKB
  2. Load your starter grammar
  3. Start [incr tsdb()] (within emacs, that's M-x itsdb)
  4. In the [incr tsdb()] podium, select Options > Database Root and input the path to tsdb/home.
  5. In the [incr tsdb()] podium, select Options > Skeleton Root and input the path to tsdb/skeletons.
  6. In the [incr tsdb()] podium, select File > Create. You should see your test suite in the menu there. Select it, and get a test suite instance. Post to EPost if this doesn't work.
  7. Once you have a test suite instance, select it (by clicking on it), then do Process > All Items.
  8. Observe that with the minimal vocabulary in your grammar, very few of your sentences will parse.
  9. Be sure to save (i.e., not overwrite or delete) this test suite instance, as you'll be asked to turn it in.

Expand lexical coverage (perhaps just a little)

When you created your customized grammar start, you created lexical types for transitive and intransitive verbs as well as at least one type of noun. Depending on the options you selected, you may have also created lexical types for determiners, case marking adpositions, a negative adverb and/or a question particle.

These lexical types are somewhat underspecified (e.g., no constraints pertaining to case or agreement). We're not going to fix that this week.

  1. Use emacs to explore the file lexicon.tdl and the lexical type definitions in klingon.tdl.
  2. Use the LKB to explore the expanded types and lexical entries.
  3. Find all the words in your test suite, and determine which ones correspond to the lexical types you already have. (NB: In most cases, a transitive verb appearing without its object is still a transitive verb, as it should introduce a two-place relation.)
  4. Updated 4/20/06 Add a few words (say, two transitive verbs, two intransitives, and two nouns) to the list on the wiki of words we'll use for MT purposes this year.
  5. New 4/20/06; Update 4/21/06 Use [incr tsdb()] to analyze the vocabulary in your test suite: Process > Vocabulary. The results which appear in your emacs window will give you a sense of which word forms have the highest token frequency in your list.
  6. For isolating languages: Add entries to lexicon.tdl for all of the words that match the types you have so far.
  7. For highly inflecting languages: For each word that matches a type you have so far, pick the most frequent form and add it to lexicon.tdl in that form. (NB: Even though a noun might be in, say, nominative form, the constraints on the lexical entry won't reflect this.)

    NB: The predicate names for your words should all be English glosses of the words (for MT purposes).

  8. Consider adding words from the MT lexicon.
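
To make "pick the most frequent form" concrete, here is a small sketch of the counting involved. It is not what Process > Vocabulary runs internally; it just assumes you have your test sentences as a list of strings with word forms separated by spaces. The function names and the English example forms are invented for illustration.

```python
from collections import Counter


def form_frequencies(sentences):
    """Count how often each whitespace-separated word form occurs."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.split())
    return counts


def most_frequent_form(counts, forms):
    """Among the inflected forms of one word, return the most frequent.

    Forms that never occur count as 0; ties go to the form listed first.
    """
    return max(forms, key=lambda form: counts[form])


# Made-up example data, not from any real test suite.
counts = form_frequencies(["dogs bark", "dogs chase cats", "dog barks"])
best = most_frequent_form(counts, ["dog", "dogs"])
```

Here best is the form you would enter in lexicon.tdl for that noun. You still have to decide by hand which forms belong to the same word.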

Create a new test suite instance and parse again

  1. Reload your grammar in the LKB.
  2. In [incr tsdb()], choose File > Create and create a new test suite instance.
  3. Highlight the instance and choose Process > All Items
  4. Have [incr tsdb()] calculate your current grammar's coverage (Analyze > Coverage) and overgeneration (Analyze > Overgeneration), and document both for your write up. Also check out Browse > Errors.
  5. Play with the [incr tsdb()] menus and see what you can do.
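
As a rough guide to reading the numbers in step 4: coverage is the share of grammatical items that get at least one parse, and overgeneration is the share of ungrammatical items that get at least one parse. The sketch below shows only that arithmetic, under the assumption that each test item is represented as a pair of a grammaticality judgment and a parse count. The function name and data layout are made up for this example and are not how [incr tsdb()] stores its results.

```python
def coverage_and_overgeneration(items):
    """items: (is_grammatical, n_parses) pairs, one per test item.

    Returns (coverage, overgeneration) as fractions between 0 and 1.
    """
    items = list(items)
    good = [n for ok, n in items if ok]      # grammatical items
    bad = [n for ok, n in items if not ok]   # ungrammatical items

    # Fraction of each group that received at least one parse.
    coverage = sum(1 for n in good if n > 0) / len(good) if good else 0.0
    overgeneration = sum(1 for n in bad if n > 0) / len(bad) if bad else 0.0
    return coverage, overgeneration


# Made-up results: three grammatical items and two ungrammatical ones.
results = [(True, 1), (True, 0), (True, 2), (False, 0), (False, 1)]
cov, over = coverage_and_overgeneration(results)
```

With these made-up results, two of the three grammatical items parse and one of the two ungrammatical items parses. For this lab, expect low coverage, since the lexical types are still underspecified and the lexicon is small.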

Write up

Submit your assignment


ebender at u dot washington dot edu
Last modified: Mon Apr 17 2006