Lab 4 (due 2/1)
Preliminaries
These instructions might get edited a bit over the next
couple of days. I'll try to flag changes.
As usual, check the write up instructions first.
In this and future labs, I'll use klingon.tdl to refer
to the tdl file named after your language.
Download a customized grammar start
Be sure to try this by class
on Wednesday. Bring questions about how to best answer
the questionnaire for your language to class. GoPost/Email if
you hit bugs in the cgi script itself or don't understand the
error messages it gives.
- Fill out the questionnaire on the customization page
NB: Except where you've specified a lexical rule (e.g., for case
endings), the lexical entries it prompts you for should be full
forms. That is, your starter grammar will have at most a
handful of lexical rules, so any other inflection you need must
already be present in the lexical entries themselves.
NB2: You may find that none of the options under a given section
are appropriate. For some sections (e.g., coordination, negation) you
can just skip them. For others (e.g., word order) you have to pick
the best approximation. The customization script should indicate which
fields are required with red asterisks and/or error messages.
- Load the resulting grammar into the LKB.
- Try parsing a sentence or two.
- Edit lkb/Version.lsp in your starter grammar
so that the value of *grammar-version* reflects your grammar
rather than the Matrix.
Import your test suite into [incr tsdb()]
These instructions will show you how to make a test suite
'skeleton' which [incr tsdb()] can use to create a new instance
of your test suite whenever you need one (handy for comparison
across time).
- Create a directory called tsdb inside your grammar
directory.
- Inside tsdb, create two subdirectories: home (for
test suite instances) and skeletons (for skeletons).
- Save a copy of Index.lisp in
tsdb/skeletons
- Save a copy of Relations in
tsdb/skeletons. (If your browser doesn't like files without
extensions, here's another copy of the
same file with .txt appended. You should save it as just Relations.)
- Make a subdirectory called lab4 inside
tsdb/skeletons for your test suite. (If you choose a different
name for this subdirectory, you must edit Index.lisp accordingly.)
- Download the perl script make_item.pl
and run it on your test suite from lab3:
perl make_item.pl lab3_suite.txt
- (If the perl script doesn't like the formatting of your test suite,
edit the test suite appropriately and/or complain about the perl
script on GoPost.)
- Copy the .item file which is output by make_item.pl
to tsdb/skeletons/lab4/item.
- Copy tsdb/skeletons/Relations to tsdb/skeletons/lab4/relations (notice the change from R to r).
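If you'd like to script the layout described in the steps above, here is a minimal sketch in Python. It isn't part of the course materials: the function name build_skeleton and its arguments are hypothetical, and it assumes you have already downloaded Index.lisp and Relations and run make_item.pl to get the .item file.

```python
import shutil
from pathlib import Path


def _copy(src, dst):
    # Skip the copy when the file is already where it belongs.
    if Path(src).resolve() != Path(dst).resolve():
        shutil.copy(src, dst)


def build_skeleton(grammar_dir, index_file, relations_file, item_file, name="lab4"):
    """Lay out tsdb/home and tsdb/skeletons/<name> inside grammar_dir.

    All names here are hypothetical and used for illustration only.
    """
    tsdb = Path(grammar_dir) / "tsdb"
    skeletons = tsdb / "skeletons"
    suite = skeletons / name

    # home holds test suite instances; skeletons holds the skeletons.
    (tsdb / "home").mkdir(parents=True, exist_ok=True)
    suite.mkdir(parents=True, exist_ok=True)

    # Index.lisp and Relations sit directly in tsdb/skeletons.
    _copy(index_file, skeletons / "Index.lisp")
    _copy(relations_file, skeletons / "Relations")

    # The skeleton itself needs the item file and a lowercase-named copy
    # of Relations.
    _copy(item_file, suite / "item")
    _copy(skeletons / "Relations", suite / "relations")
    return suite
```

As noted above, if you pass a name other than lab4, Index.lisp still has to be edited by hand to match.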
Create and run an initial test suite instance
- Start the LKB
- Load your starter grammar
- Start [incr tsdb()] (within emacs, that's M-x itsdb)
- In the [incr tsdb()] podium, select Options > Database Root
and input the path to tsdb/home.
- In the [incr tsdb()] podium, select Options > Skeleton Root
and input the path to tsdb/skeletons.
- Optional: For future use, you can set these variables
ahead of time in a file called .tsdbrc in your home directory.
It should contain these lines, with path names edited appropriately:
(in-package :tsdb)
(setf *tsdb-home* "path-to-tsdb/home")
(setf *tsdb-skeleton-directory* "path-to-tsdb/skeletons")
- In the [incr tsdb()] podium, select File > Create. You should
see your test suite in the menu there. Select it, and get a test suite
instance. Post to GoPost if this doesn't work.
- Make sure your grammar is loaded into the LKB.
- Once you have a test suite instance, select it (by clicking on it),
then do Process > All Items.
- Observe that with the minimal vocabulary in your grammar,
very few of your sentences will parse.
- Be sure to save (i.e., not overwrite or delete) this test suite
instance, as you'll be asked to turn it in.
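If you'd rather generate the optional .tsdbrc than type it by hand, something like the following would do. This is just a sketch: tsdbrc_text is a hypothetical helper, and it assumes your paths contain no double quotes or backslashes.

```python
from pathlib import Path


def tsdbrc_text(tsdb_dir):
    """Return .tsdbrc contents pointing at tsdb_dir/home and tsdb_dir/skeletons."""
    tsdb = Path(tsdb_dir)
    home = (tsdb / "home").as_posix()
    skeletons = (tsdb / "skeletons").as_posix()
    lines = [
        "(in-package :tsdb)",
        f'(setf *tsdb-home* "{home}")',
        f'(setf *tsdb-skeleton-directory* "{skeletons}")',
    ]
    return "\n".join(lines) + "\n"
```

You could then save the result with (Path.home() / ".tsdbrc").write_text(tsdbrc_text("path-to-tsdb")), with the path edited appropriately.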
Expand lexical coverage (perhaps just a little)
When you created your customized grammar start, you created
lexical types for transitive and intransitive verbs as well
as at least one type of noun. Depending on the options you selected,
you may have also created lexical types for determiners, case
marking adpositions, a negative adverb and/or a question particle.
These lexical types are somewhat underspecified (e.g., no
constraints pertaining to agreement). We're not going
to fix that this week.
- Use emacs to explore the file lexicon.tdl and the
lexical type definitions in klingon.tdl.
- Use the LKB to explore the expanded types and lexical entries.
- Find all the words in your test suite, and determine which ones
correspond to the lexical types you already have. (NB: In most cases,
a transitive verb appearing without its object is still a transitive
verb, as it should introduce a two-place relation.)
- Use [incr tsdb()] to analyze
the vocabulary in your test suite: Process > Vocabulary. The results
which appear in your emacs window will give you a sense of which word forms
have the highest token frequency in your list.
- For isolating languages: Add entries to lexicon.tdl for all of the words
that match the types you have so far.
- For highly inflecting languages: For each word that matches
a type you have so far, pick the most frequent form and add it to
lexicon.tdl in that form. (NB: Even though a verb might be
in, say, the 3sg present tense form, the constraints on the lexical entry won't
reflect this.)
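To get a feel for what a token-frequency count over your test suite looks like, here is a rough sketch. It is not how Process > Vocabulary is implemented; it assumes your sentences are plain strings with whitespace-separated word forms, and form_frequencies is a hypothetical name.

```python
from collections import Counter


def form_frequencies(sentences):
    """Count how often each whitespace-separated word form occurs."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.split())
    # most_common() returns (form, count) pairs, highest count first.
    return counts.most_common()
```

For a highly inflecting language, the top of this list is a reasonable place to look for the "most frequent form" of each word.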
NB: The predicate names for your words should all be English
glosses of the words (for MT purposes, and because it helps
in dealing with a language you don't know very well). Furthermore,
they should take the form _gloss_pos_rel, where gloss is the English
gloss and pos is a coarse-grained part-of-speech tag (e.g., n, v).
This is the standard DELPH-IN/MRS format for predicate names.
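The naming pattern can be made concrete with a small sketch. The helper make_pred and the exact check it applies are assumptions for illustration: in particular, it rejects glosses containing underscores or whitespace and expects a lowercase pos tag, which is stricter than anything stated here.

```python
import re

# Assumed shape: "_" + gloss + "_" + lowercase pos tag + "_rel".
PRED_PATTERN = re.compile(r"^_[^_\s]+_[a-z]+_rel$")


def make_pred(gloss, pos):
    """Build a predicate name of the form _gloss_pos_rel."""
    pred = f"_{gloss}_{pos}_rel"
    if not PRED_PATTERN.match(pred):
        raise ValueError(f"unexpected predicate name: {pred!r}")
    return pred
```

So a noun glossed as "dog" would get make_pred("dog", "n"), i.e. _dog_n_rel, and a verb glossed as "chase" would get _chase_v_rel.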
Create a new test suite instance and parse again
- Reload your grammar in the LKB.
- In [incr tsdb()], choose File > Create and create a new
test suite instance.
- Highlight the instance and choose Process > All Items
- Have [incr tsdb()] calculate the coverage (Analyze > Coverage) and
overgeneration (Analyze > Overgeneration) of your current grammar, and
document both for your write up (also check out Browse > Errors).
- Play with the [incr tsdb()] menus and see what you can do.
Write up
- Document the choices you made on the customization page and why. If you skipped over any sections because the option you needed wasn't there, please describe how your language differs from the options provided.
- Document what happened when you tried parsing a sentence with your
starter grammar. Did it parse? Did it return the expected structures?
If it didn't parse or didn't return the expected structures, why not?
(That is, what can you find out about why not?)
- Document the current coverage/overgeneration of your grammar
on your test suite, per [incr tsdb()], before and after you expanded
the vocabulary.
- In about a page, discuss which phenomena will give you the
biggest bang for the buck in terms of improving coverage/overgeneration
over your test suite. I.e., consider things that occur in many
of your sentences. Are there any phenomena which are unnecessarily
complicating your test suite? That is, are you inspired to simplify
any of your sentences to have them illustrate fewer phenomena at a time?
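For the coverage and overgeneration figures, it may help to see the arithmetic spelled out. As these terms are generally used, coverage is the share of grammatical items that get at least one parse, and overgeneration is the share of ungrammatical items that get at least one parse. The sketch below assumes each item is a (grammatical?, number of readings) pair; that layout and the function name are hypothetical, not the [incr tsdb()] data format.

```python
def coverage_and_overgeneration(items):
    """items: iterable of (is_grammatical, n_readings) pairs.

    Returns (coverage, overgeneration) as fractions between 0 and 1.
    """
    items = list(items)
    good = [n for ok, n in items if ok]
    bad = [n for ok, n in items if not ok]

    # Share of grammatical items with at least one reading.
    coverage = sum(1 for n in good if n > 0) / len(good) if good else 0.0
    # Share of ungrammatical items with at least one reading.
    overgeneration = sum(1 for n in bad if n > 0) / len(bad) if bad else 0.0
    return coverage, overgeneration
```

Running this on your before and after instances gives the two pairs of numbers the write up asks you to compare; the figures you report should still come from [incr tsdb()] itself.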
ebender at u dot washington dot edu
Last modified: Thu Jan 24 2008