Lab 2: Word order, small vocabulary, lexical semantics
Due 4/8/05
Download the Grammar Matrix
- Here's the file: matrix.tar.gz (v0.8.1)
- Save it in your working directory (if on the Treehouse machines,
your personal directory inside ~ling567), and unpack it:
tar xzf matrix.tar.gz
Start the LKB and load the grammar Matrix
- Start the LKB: M-x lkb (note change from last time).
- Load the grammar Matrix (the script file is inside the lkb directory).
Personalize your copy of the Matrix
- Rename the file my_language.tdl to reflect the language
you are working on, e.g., esperanto.tdl.
- Consider opening it in emacs and adding some comments
to the beginning. The comment character is ;.
- Open the file lkb/script and replace my_language.tdl
with your new file name. (It only occurs twice, once in
a comment (;;;) that you don't need to change, and once
in the code.)
- Observe that the bar across the bottom of
the emacs window changes when you've made a change to
the current buffer.
- Save the changes to script (C-x s), and observe
the bar across the bottom changing again.
- Go to the very end of the script file, uncomment
the last two lines, and fill in two (or more, if you like) test
sentences that you'll be working with. This will save you from
having to delete "the dog barks" from the parse dialogue every time
you load the LKB.
- Save your changes again.
- Open the file matrix.tdl.
- Make sure you know how to switch between buffers
so you can get to matrix.tdl and your renamed my_language.tdl.
(C-x b)
Make a small testsuite
Make a small testsuite (lab1.items) illustrating intransitive
and transitive sentences in your language. Ungrammatical examples should
illustrate ungrammatical word orders, elements of the wrong part of speech
showing up as verbal arguments, etc. Your testsuite should also illustrate the
determiner optionality facts you discovered. Be sure to include all of your
lexical items in the grammatical examples. Use comments in your testsuite
file to provide glosses for your items (word by word and `free' translation).
Add lexical types and lexical entries
Note: The default place to make changes is your
renamed my_language.tdl file. In principle, you should not
need to make any changes to matrix.tdl. You may need to use
some other files (e.g., for lexical entry and rule instances).
In the following instructions, if no particular file is mentioned,
use your renamed my_language.tdl file.
Add phrase structure rules
- Determine the basic word order of your language.
- Look in the subdirectory matrix/modules. This directory
contains, among other things, type and instance files for various
word orders. Locate the types file that matches your basic
word order (SOV.tdl, OSV.tdl, V-final.tdl,
free-order.tdl, etc.). In the comments at the top of the
file, you will find a reference to the "rules" (instance) file you want
to use.
- Incorporate the type definitions from the types file you selected
into matrix/esperanto.tdl.
- Incorporate the instance definitions from the rules file you selected
into matrix/rules.tdl.
- Create a type for head-specifier phrases in your language (with
overt specifiers). It should inherit from basic-head-spec-phrase
and either head-initial or head-final, as appropriate.
- Add an instance head-spec (or spec-head) in rules.tdl.
- Reload the grammar, and correct any errors detected by the LKB.
- Try parsing a sentence.
- Try parsing your testsuite.
- Debug as necessary.
- Write up #1: Indicate
what the basic word order is in your language, as well as the order of
nouns and determiners. Provide glossed examples illustrating the
generalizations. If you feel that you had to simplify anything about
the data in order to do this part of the lab, describe the
simplifications.
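As a minimal sketch of the head-specifier steps above, assuming a language in which determiners precede the noun (so the head daughter is final). The type name head-spec-phrase and the instance name spec-head are illustrative choices, not names required by the Matrix:

```tdl
;;; In your renamed my_language.tdl (e.g., esperanto.tdl).
;;; The determiner precedes the noun, so the head comes last.
head-spec-phrase := basic-head-spec-phrase & head-final.

;;; In rules.tdl: an instance of the type defined above.
spec-head := head-spec-phrase.
```

If determiners follow the noun in your language, inherit from head-initial instead and name the instance head-spec.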
Semantics: Background
- Semantic information is handled inside the feature SYNSEM.LOCAL.CONT.
The value of CONT is a feature structure of type mrs, defined
as follows:
mrs := mrs-min &
[ HOOK hook,
RELS diff-list,
HCONS diff-list,
MSG basic_message ].
- The value of RELS is a list of elementary predications.
- The value of HOOK is a feature structure of type hook encoding the information that is available for further
semantic composition.
- The value of HCONS is a list of handle constraints (to represent
scope -- don't worry about the details for now!)
- The value of MSG is a representation of illocutionary force,
but we won't be addressing that in this lab.
- In addition, the feature SYNSEM.LKEYS.KEYREL (on words only, not phrases)
provides a pointer to the main relation contributed by the word.
This feature serves as a shortcut for defining lexical entries. The
type norm-hook-lex-item (defined in matrix.tdl) provides
the link between LKEYS.KEYREL and SYNSEM.LOCAL.CONT.RELS.
- Relations (the things on the RELS list) come in different
types, all defined in matrix.tdl.
- Two example relations (NB: the following are not type definitions):
[ PRED '_cat_n_rel
ARG0 x
LBL h1 ]
[ PRED '_chase_v_rel
ARG0 e
ARG1 x
ARG2 y
LBL h2 ]
- The value of PRED is a unique predicate name. It can be a string
as in these examples (indicated by the ') or a subtype of
predsort (which will be useful for underspecification in some
cases). The value of LBL is a handle (for handle constraints, i.e.,
scope -- again, don't worry about this for now). The values of ARG0
through ARGn (in practice, up to ARG4) are the arguments to a
relation. For nouns, ARG0 is the index of the thing the noun denotes
(here, the cat). For verbs, ARG0 is an index for the event (here, the
chasing), and ARG1 and others are the participants in that event.
- The types basic-verb-lex et al. constrain the KEYREL to be a
relation of the right type, and furthermore relate the HOOK values of
things on the ARG-ST list to the ARGn roles in the relation. All you
need to provide is the PRED value.
Add relation names to your noun and verb lexical entries
- In order to make the machine translation exercise work
at the end of the quarter, we want to use identical PRED values
across all the different grammars. So, we will adopt the following
convention:
- For determiners, please select from the following inventory:
'def_q_rel ; definite
'indef_q_rel ; indefinite
'demonstrative_q_rel ; demonstrative, if your language only has one
'proximal+dem_q_rel ; demonstrative, near speaker (cf English 'this')
'distal+dem_q_rel ; demonstrative, away from speaker (languages with
; two-way contrast, cf English 'that')
'hearer+dem_q_rel ; demonstrative, near hearer (languages with
; three-way contrast)
'remote+dem_q_rel ; demonstrative, away from speaker and hearer
; (languages with three-way contrast)
If this inventory is inadequate to the distinctions expressed by
determiners in your language, please talk to me.
- Your lexical entries presently look something like this:
gato := noun-lex &
[ STEM < "gato" > ].
- For each one, you can add a relation name as follows:
gato := noun-lex &
[ STEM < "gato" >,
SYNSEM.LKEYS.KEYREL.PRED '_cat_n_rel ].
- NB: Don't forget the ' at the beginning of the pred name.
That tells the LKB it's a string and not a type.
- Now parse a sentence again and look at its MRS and admire
the relation names that have appeared there.
- The various views on the MRS available through the menu
on the small tree are generated by the LKB on the basis of the feature
structure. To see the information in the feature structure itself:
- Click on the small tree, and select "Show enlarged tree".
- Click on the S at the top of the tree, and select "Feature structure".
- Scroll the window until you can see the value of the path
SYNSEM.LOCAL.CONT.
- Appreciate the easier-to-read views provided by the LKB!
- Write-up #2: Write up
(in a couple of paragraphs) what you've done for the determiners, i.e.,
which determiners your grammar has, which PRED values you gave them,
and why.
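Following the pattern of the gato entry above, a determiner and a verb entry might look like the sketch below. The type names determiner-lex and trans-verb-lex are hypothetical stand-ins for whatever lexical types you have defined, and '_see_v_rel is an assumed PRED value modeled on '_cat_n_rel and '_chase_v_rel:

```tdl
;;; Hypothetical entries: substitute your own lexical types.
;;; Definite determiner, using a PRED from the inventory above.
la := determiner-lex &
  [ STEM < "la" >,
    SYNSEM.LKEYS.KEYREL.PRED 'def_q_rel ].

;;; Transitive verb; the PRED value is an assumption for illustration.
vidas := trans-verb-lex &
  [ STEM < "vidas" >,
    SYNSEM.LKEYS.KEYREL.PRED '_see_v_rel ].
```

As with nouns, the leading ' marks the PRED value as a string rather than a type.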
Add a rule for determinerless NPs
One of the requirements on well-formed MRSs is that
each ARG0 of a noun-relation be bound by a quantifier
(i.e., also be the ARG0 of a quant-relation). If your noun
phrases contain overt determiners (and you're using the
basic-determiner-lex type provided by the Matrix) this
is already the case. For noun phrases that don't contain
overt determiners, we'll need to add a non-branching rule
which fills in the appropriate semantics.
- Create a subtype of basic-bare-np-phrase in esperanto.tdl.
- In the subtype, constrain the value of C-CONT.RELS.LIST.FIRST.PRED to
be something appropriate from the list of possible determiner relations
given above. (In other words, what is the semantic effect of not
having a determiner in your language?) Consider whether you might need
multiple bare-np-phrases to account for different semantic effects.
- Create an instance of your subtype in rules.tdl.
- Verify that you can now parse sentences with bare NPs. Check the
MRS to make sure the PRED value showed up where you expected it.
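A minimal sketch of these steps, assuming that determinerless NPs in your language are interpreted as indefinite. The names bare-np-phrase and bare-np are illustrative, and the PRED value should be whatever matches your language:

```tdl
;;; In esperanto.tdl: a non-branching rule type that supplies
;;; the quantifier relation for a noun with no overt determiner.
bare-np-phrase := basic-bare-np-phrase &
  [ C-CONT.RELS.LIST.FIRST.PRED 'indef_q_rel ].

;;; In rules.tdl: an instance of the type defined above.
bare-np := bare-np-phrase.
```

If bare NPs can have more than one interpretation, you would define one such subtype (and one instance) per PRED value.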
In many languages, determiners are optional only with some kinds of
nouns, and non-optional with others. If determiners are always
optional in your language, that is, if any given noun can appear
without a determiner, then you don't need to do this part. Read it
anyway though :-). The general strategy is going to be to define
two types of nouns, one with optional determiners and one with
obligatory determiners. We will indicate optionality with a
feature OPT (appropriate for objects of type synsem and therefore
found at the path SYNSEM.OPT). Nouns which require determiners will
say that the element of their SPR list is [OPT -]. Nouns which
can optionally appear without determiners won't say anything about
OPT. The basic-bare-np-phrase says that the SPR requirement of its
head daughter is [OPT +]. This will be incompatible with those nouns
that say [OPT -] but compatible (of course) with those that don't
mention OPT at all.
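The two noun types might be sketched as follows. The type names are hypothetical, noun-lex stands for your existing noun type, and the path SYNSEM.LOCAL.CAT.VAL.SPR is an assumption about where the SPR list lives in your version of the Matrix, so check it against matrix.tdl:

```tdl
;;; Nouns that require a determiner: the SPR element is [ OPT - ],
;;; which clashes with the [ OPT + ] that basic-bare-np-phrase
;;; imposes on its head daughter's SPR requirement.
obl-det-noun-lex := noun-lex &
  [ SYNSEM.LOCAL.CAT.VAL.SPR < [ OPT - ] > ].

;;; Nouns with optional determiners: nothing is said about OPT,
;;; so they are compatible with the bare-np rule.
opt-det-noun-lex := noun-lex.
```

Each noun's lexical entry would then inherit from one of these two types instead of directly from noun-lex.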
(Nouns like proper names and pronouns which generally can't
take determiners, i.e., must undergo the bare-np rule, need to
have a SPR requirement which is incompatible with any overt determiner,
but still compatible with the bare-np rule, or perhaps a special
bare-np rule just for proper names and pronouns. If your only
case of determinerless NPs is this one, talk to me.)
Try generating from the semantic representation
Submit via ESubmit
- Be sure your matrix folder includes four write-ups (1 2 3 4). Since I'm curious how much time this process takes, if you're willing,
please estimate the amount of time it took you to complete this lab, and indicate
it in your write-up.
- Compress the folder (.tgz is preferred), and upload it to ESubmit.
- Submit it by midnight Sunday night (preferably by Friday evening :-).
ebender at u dot washington dot edu
Last modified: Mon Apr 4 21:26:44 PDT 2005