Linguistics 471: Grammar Engineering
Lab 1 Due 4/4
Note: Be sure to upload your files to Dante (see
below) before leaving the lab, or all of your work will be lost.
In general, you'll want to keep the most current version of your grammar
on Dante so that you can access it from wherever you get a chance to
work. (The exception is if you are working exclusively on your
own laptop; even then, however, you should consider uploading to Dante
as a backup.)
Download the Grammar Matrix
Start the LKB and load the grammar matrix
- Start emacs (Desktop > Linguistics471 > bin > runemacs.exe)
- In emacs, type M-x (alt-x) fi:common-lisp
(note that after you type M-x, you're in the mini-buffer)
- Hit return until it's done asking questions.
- The LKB top window should appear.
- In the LKB top window, do Load > Complete Grammar.
- Navigate to the lkb subdirectory of the matrix folder and select the file script.
Personalize your copy of the Matrix
- Rename the file my_language.tdl to reflect the language
you are working on, e.g., esperanto.tdl.
- Consider opening it in emacs and adding some comments
to the beginning. The comment character is ;.
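For example, the top of a renamed esperanto.tdl might begin with a few comment lines like these (the wording is only a suggestion):

```tdl
;;; esperanto.tdl
;;; Linguistics 471 grammar for Esperanto, built on the Grammar Matrix.
;;; Author: <your name>
```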
- Open the file lkb/script and replace my_language.tdl
with your new file name. (It occurs only twice: once in
a comment (;;;), which you don't need to change, and once
in the code.)
- Observe that the bar across the bottom of
the emacs window changes when you've made a change to
the current buffer.
- Save the changes to script (C-x C-s saves the current buffer;
C-x s offers to save every modified buffer), and observe
the bar across the bottom changing again.
- Open the file matrix.tdl.
- Make sure you know how to switch between buffers (C-x b)
so you can get to matrix.tdl and your renamed my_language.tdl.
Add lexical entries
Note: The default place to make changes is your
renamed my_language.tdl file. In principle, you should not
need to make any changes to matrix.tdl. You may need to use
some other files (e.g., for lexical entry and rule instances).
In the following instructions, if no particular file is mentioned,
use your renamed my_language.tdl file.
- Add some subtypes of the type
head for nouns, verbs, and determiners. I recommend the following:
noun := head.
verb := head.
det := head.
For some languages, the type det might not be necessary.
For others, you might want to add a type postposition or something
similar. (For example, one analysis of Japanese case particles
is that they are all in fact postpositions. Thus vanilla transitive
verbs don't take NP arguments, but rather PP arguments.)
- Now introduce a subtype of basic-noun-lex and either
basic-one-arg (if you're going to use a determiner) or norm-zero-arg
(if not), as follows:
noun-lex := basic-noun-lex & basic-one-arg &
[ SYNSEM.LOCAL [ CAT [ HEAD noun,
VAL [ SPR < #spr >,
COMPS < >,
SUBJ < >,
SPEC < > ]],
ARG-S < #spr > ]].
or
noun-lex := basic-noun-lex & norm-zero-arg &
[ SYNSEM.LOCAL [ CAT [ HEAD noun,
VAL [ SPR < >,
COMPS < >,
SUBJ < >,
SPEC < > ]]]].
These types specify that the HEAD value (part of speech) is
noun, and what to do with the valence features. In the first
case, there is a specifier requirement, but no subject, complements
or SPEC (we'll get back to this one), and furthermore, the single
item on the argument structure list (ARG-S) is the specifier.
In the second case, all of the valence lists are empty.
- Reload the grammar (LKB > Load > Reload grammar), and
fix any errors that the LKB reports.
- Look at the type definition for noun-lex within the
LKB (LKB > View > Type definition). This should show what you
just added (in a slightly different format).
- Now look at the expanded type for noun-lex
(LKB > View > Expanded type). All of that additional information
is inherited from the supertypes you specified.
- Constrain the HEAD value of the specifier requirement (if
any) to be det. Use the expanded type view to help you figure
out which path you need to use for that feature.
(Hint: The elements of valence lists are objects of type synsem.
Also, you can join an identity tag and a feature constraint with &:
#spr & [ ... ].)
(If you're stuck on this one -- ask!)
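For reference, one way the finished constraint might look is sketched here. It assumes the feature geometry of the noun-lex type above, in which a synsem on the SPR list carries its part of speech under LOCAL.CAT.HEAD; confirm that path against the expanded type in your copy of the Matrix.

```tdl
; Sketch: the noun's single specifier must be headed by det.
; Assumes synsem objects have LOCAL.CAT.HEAD (check the expanded type view).
noun-lex := basic-noun-lex & basic-one-arg &
  [ SYNSEM.LOCAL [ CAT [ HEAD noun,
                         VAL [ SPR < #spr & [ LOCAL.CAT.HEAD det ] >,
                               COMPS < >,
                               SUBJ < >,
                               SPEC < > ]],
                   ARG-S < #spr > ]].
```

Edit your existing noun-lex definition rather than adding a second one; the LKB will complain if the same type is defined twice.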
- Open the file lexicon.tdl and replace the dummy entry (foo) with
a noun instance, inheriting from noun-lex, e.g.:
penguin := noun-lex &
[ STEM < "penguin" >].
Use fully inflected forms for now. We'll add inflectional rules later.
This lexical entry is also missing semantics. We'll get to that next week.
- Open the file roots.tdl and add the following root condition:
root := lex-item.
(This says that a lexeme can serve as a stand-alone utterance.)
- Try to parse a sentence consisting of just your noun
(e.g., "penguin").
- Now add types for intransitive verbs (inheriting from
basic-verb-lex and intransitive-lex-item) and transitive
verbs (inheriting from basic-verb-lex and transitive-lex-item),
analogously to what we did for nouns above.
Note: The convention for ARG-S is that the first element is
the specifier (if any -- verbs probably won't), followed by
the subject (if any), followed by the complements. SPEC should
be empty for verbs (it's used for determiners, degree specifiers
of adjectives, etc.).
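As a sketch, the two verb types might look like the following. They reuse the feature geometry of noun-lex (ARG-S under SYNSEM.LOCAL) and assume that a verb's arguments are NPs headed by noun; the names intransitive-verb-lex and transitive-verb-lex are only suggestions. Change the HEAD constraints if your verbs take PPs or other arguments.

```tdl
; Intransitive verb: a single argument, the subject.
intransitive-verb-lex := basic-verb-lex & intransitive-lex-item &
  [ SYNSEM.LOCAL [ CAT [ HEAD verb,
                         VAL [ SPR < >,
                               SUBJ < #subj & [ LOCAL.CAT.HEAD noun ] >,
                               COMPS < >,
                               SPEC < > ]],
                   ARG-S < #subj > ]].

; Transitive verb: the subject comes first on ARG-S, then the complement.
transitive-verb-lex := basic-verb-lex & transitive-lex-item &
  [ SYNSEM.LOCAL [ CAT [ HEAD verb,
                         VAL [ SPR < >,
                               SUBJ < #subj & [ LOCAL.CAT.HEAD noun ] >,
                               COMPS < #comps & [ LOCAL.CAT.HEAD noun ] >,
                               SPEC < > ]],
                   ARG-S < #subj, #comps > ]].
```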
- Add lexical entries (in lexicon.tdl) for your transitive
and intransitive verbs, again sticking with fully inflected forms
and not worrying about agreement for now.
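Entries in lexicon.tdl might then look like this. The English words are placeholders, and the sketch assumes verb types named intransitive-verb-lex and transitive-verb-lex; substitute your own type names and fully inflected forms from your language.

```tdl
; Hypothetical entries: replace with forms from your language.
sleeps := intransitive-verb-lex &
  [ STEM < "sleeps" > ].

sees := transitive-verb-lex &
  [ STEM < "sees" > ].
```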
- Add particles/determiners as necessary. For determiners, define a
subtype of basic-determiner-lex. If your language requires something
else to complete a noun phrase, talk to me.
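A determiner type might be sketched as follows. It assumes that norm-zero-arg is appropriate (a determiner takes no arguments of its own) and that the determiner's SPEC list holds the noun it combines with. Check basic-determiner-lex in matrix.tdl to see what it already constrains. The entry for "the" is illustrative only.

```tdl
; In my_language.tdl: no arguments of its own, but SPEC
; points at the noun the determiner specifies.
determiner-lex := basic-determiner-lex & norm-zero-arg &
  [ SYNSEM.LOCAL.CAT [ HEAD det,
                       VAL [ SPR < >,
                             COMPS < >,
                             SUBJ < >,
                             SPEC < [ LOCAL.CAT.HEAD noun ] > ]]].

; In lexicon.tdl (hypothetical entry):
the := determiner-lex &
  [ STEM < "the" > ].
```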
- Reload the grammar, and make sure you can parse each of the
individual lexical items.
Add node labels
Specialize rules
- Determine whether the subject precedes or follows the verb
in your language (or both), and add one of the following types
as appropriate:
head-subj-phrase := basic-head-subj-phrase & head-final.
or
head-subj-phrase := basic-head-subj-phrase & head-initial.
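Since the verb (phrase) is the head, use head-final if the subject
precedes the verb and head-initial if it follows. If your language
allows both, add two types, and be sure to give them distinct names.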
- Now add a rule instance (or two, if necessary) to the
file rules.tdl, like this:
head-subj := head-subj-phrase.
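For a language that allows both orders, the two types and their rule instances might be sketched as follows. The names subj-head-phrase and head-subj-phrase are just one way to keep them distinct.

```tdl
; In my_language.tdl: one type per order.
; Subject before the verb: the head comes last.
subj-head-phrase := basic-head-subj-phrase & head-final.
; Subject after the verb: the head comes first.
head-subj-phrase := basic-head-subj-phrase & head-initial.

; In rules.tdl: one rule instance per type.
subj-head := subj-head-phrase.
head-subj := head-subj-phrase.
```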
- Analogously, add types and instances for head-complement phrases (to
combine verbs with their objects, here)
and head-specifier phrases (to combine nouns with determiners, here).
(No need for head-specifier phrases quite yet if you're not using
any determiners.)
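For a language in which objects follow the verb and determiners precede the noun (as in English), the additions might look like this. The sketch assumes the Matrix supertypes are named basic-head-comp-phrase and basic-head-spec-phrase, by analogy with basic-head-subj-phrase. Check matrix.tdl for the exact names in your copy.

```tdl
; In my_language.tdl
; Verb before its object: the head comes first.
head-comp-phrase := basic-head-comp-phrase & head-initial.
; Determiner before the noun: the head (the noun) comes last.
head-spec-phrase := basic-head-spec-phrase & head-final.

; In rules.tdl
head-comp := head-comp-phrase.
head-spec := head-spec-phrase.
```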
- Open the file roots.tdl, remove the root condition we added
above (or comment it out), and uncomment the other one that's there.
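Because ; starts a comment in TDL, commenting out the root condition added earlier only requires prefixing the line:

```tdl
; root := lex-item.
```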
- Try parsing a sentence with the intransitive verb. If it doesn't parse, debug.
- Now try parsing a sentence with the transitive verb. Are there
more parses than you would expect? If so, constrain the
COMPS value of the head daughter of the head-subject rule to rule some
out. (If you're working on a VSO or OSV language, let me know.)
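One way to do this is to require the head daughter to have found all its complements before it combines with the subject. The sketch below uses the head-final head-subj-phrase type given above and assumes the Matrix's HEAD-DTR feature names the head daughter. Adapt the type name and ordering to match your own rule.

```tdl
; The head daughter must have an empty COMPS list, so the subject
; attaches only after the object has been picked up.
head-subj-phrase := basic-head-subj-phrase & head-final &
  [ HEAD-DTR.SYNSEM.LOCAL.CAT.VAL.COMPS < > ].
```

As with noun-lex, edit the existing head-subj-phrase definition instead of adding a second one.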
- Try parsing a sentence with a transitive verb again, and examine
the parse chart (LKB > Parse > Show parse chart). Are there more
edges (phrases) than you would expect? If so, consider whether they
are really legit, and if not, how you might rule them out.
Test your grammar
- Make a list of sentences that your grammar should
parse, made up of the words you've added today (at least
4).
- Make a list of sentences that your grammar shouldn't
parse (i.e., ungrammatical sentences) with the words you
added today (at least 12).
- Put the lists into a file called test.items.
- Put a * before each of the ungrammatical sentences.
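A test.items file might look something like this, assuming one sentence per line. The sentences are hypothetical English examples built from placeholder words (penguin, sleeps, sees, the); use the words from your own grammar.

```
the penguin sleeps
the penguin sees the penguin
*penguin sleeps
*the penguin sees
*sleeps the penguin
*the sees penguin
```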
- Use the LKB batch test facility to check whether your
grammar behaves appropriately, and debug as necessary.
- LKB > Parse > Batch parse
- Specify matrix/test.items as the input file.
- Specify matrix/test.out as the output file.
- When the LKB is done, examine test.out. A 0 after a sentence
indicates that it didn't parse, 1 that it did.
Upload files to Dante
- Right click on the matrix folder and select "send to compressed
file" to compress it.
- Use UWICK's Secure FTP to move matrix.zip to your Dante account.
- Next time you work on this lab (or any future lab), start
by using UWICK's Secure FTP to download that matrix.zip to the computer
you're working on.
Submit via ESubmit
- Be sure your matrix folder includes your batch test files (input
and output).
- Compress the folder, and upload it to ESubmit.
- Submit it by midnight Sunday night (preferably by Friday evening :).
- (For this first assignment, I strongly encourage you to submit
before the deadline, in case there are unexpected glitches with ESubmit.)