Linguistics 471: Grammar Engineering
Lab 3 Due 4/25
Minimum requirements for this assignment
This assignment covers material which varies greatly
from language to language. While you are welcome to do more,
this section lays out what is actually required for this
lab. The instructions given below are then a superset of
what any one student needs to do.
Everyone should:
- (1.) Map out the space to be covered.
- (2.) Implement a number distinction for whichever class
of nouns (minimum case is just pronouns) is appropriate,
and a person distinction.
- (3.) Test your grammar.
- (4.) Write up the phenomena you have analyzed.
Then choose one of the following "packages" (in many
cases, you may find it easier to write more lexical rules
rather than put in all the lexical entries you'll need if
you don't):
- If your language has both case and agreement:
Case, one kind of agreement (e.g., subject-verb or
determiner-noun), 2 lexical rules
- If your language has agreement but no case:
2 kinds of agreement (e.g., subject-verb and
determiner-noun), 2 lexical rules
- If your language has case but not agreement:
Case, 2 lexical rules for case, 1 lexical rule
for something else (e.g., past-tense verbs).
- If your language has neither case nor agreement,
but it does have some other inflectional morphology:
Adjectives, 2 lexical rules
- If your language has zip zilch no inflectional morphology:
Adjectives, adverbs (or something else you negotiate
with me).
Map out the space you intend to cover
- Choose one of the "packages" above that is appropriate
for your language.
- Develop a test suite (batch parsing file) which
illustrates both the syntactic and the morphological
ground you intend to cover.
- For example, if your language
has case, your test suite should include grammatical sentences
with NPs in the correct cases as well as ungrammatical sentences
with NPs in incorrect cases.
- It should also include
morphologically incorrect forms if there is anything at
all tricky about the morphophonology.
- Your test suite does
not need to cover aspects of your language that you are
not addressing in this lab. For example, if your language
has determiner-noun agreement, but you are handling
subject-verb agreement instead, your test suite should
only have agreeing determiner-noun combinations.
- Likewise, you can restrict yourself to the vocabulary
that's already in your grammar (with potentially some
additions if there is some aspect of the phenomenon you
would like to illustrate). No need to find an exhaustive
list of all the irregular forms in the language!
Pronouns, person and number distinctions
Because person and number information are also interpreted
semantically, we want to record it regardless of whether
it is syntactically relevant (i.e., whether it gets used
for agreement).
Some of the instructions in this section are very specific
(i.e., I'm giving you lots of answers) because I want you
to have time to focus your efforts on other parts of the lab.
Don't be surprised, then, when all of a sudden things get
less specific!
- Add the following type definitions to esperanto.tdl:
(NB: Adding this definition for png will cause the LKB
to print a warning when you load the grammar, since we're overriding
the existing definition of png in matrix.tdl.)
png := avm &
[ PER person,
NUM number ].
person := *top*.
first := person.
second := person.
third := person.
number := *top*.
sg := number.
non-sg := number.
dual := non-sg.
pl := non-sg.
(The purpose of non-sg here is to allow
a mapping between a language with a dual/plural distinction and
one without. So languages like English would in fact only use
sg and non-sg, but should define all of the above.)
(If your language does person and number agreement
with an elsewhere case -- like English non-3sg -- you may want
to define subtypes of png which group the values of
PER and NUM in interesting ways. If you want to know more about
this, talk to me.)
- If your language has gender distinctions, you'll want to
use this definition of png instead, along with appropriate
definitions for subtypes of gender.
png := avm &
[ PER person,
NUM number,
GEND gender ].
gender := *top*.
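As a rough sketch of the "appropriate definitions for subtypes of gender", a language with a two-way contrast might add something like the following. The type names masc and fem are assumptions for illustration; use whatever classes your language actually distinguishes.

```tdl
; Hypothetical two-gender system; add or rename subtypes as needed.
masc := gender.
fem := gender.
```

A language with more classes would just add more subtypes of gender in the same way.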
- Pronouns will be nouns that obligatorily undergo a
covert-det rule (unless they do optionally take determiners,
if so, let me know!). To facilitate this, add the following
types (esperanto.tdl):
pronoun_q_rel := quantifier_rel.
reg_quant_rel := quantifier_rel.
(Some of you have already done something similar
for proper nouns, so you'll have to integrate that.)
- Now change demonstrative_q_rel and non+demonstrative_q_rel
so that they inherit from reg_quant_rel (instead of their current supertype).
- Reload your grammar and observe the effect these changes had on
the type hierarchy under quantifier_rel (use View > Type
Hierarchy). We'll use the contrast between reg_quant_rel
and pronoun_q_rel to keep the pronouns and the other nouns
from using each others' covert-det-rules.
- Remove the PRED value from inside the quant-relation on
the C-CONT of covert-det-phrase.
- Define two subtypes of covert-det-phrase, just like
these with **** in other-covert-det-phrase filled in
with the PRED value you removed in the preceding step:
pronoun-covert-det-phrase := covert-det-phrase &
  [ HEAD-DTR.SYNSEM.LOCAL.CAT.VAL.SPR
      < [ LOCAL.CONT.RELS < ! [ PRED pronoun_q_rel ] ! > ] >,
    C-CONT.RELS < ! [ PRED pronoun_q_rel ] ! > ].
other-covert-det-phrase := covert-det-phrase &
  [ HEAD-DTR.SYNSEM.LOCAL.CAT.VAL.SPR
      < [ LOCAL.CONT.RELS < ! [ PRED reg_quant_rel ] ! > ] >,
    C-CONT.RELS < ! [ PRED ****** ] ! > ].
- In rules.tdl, remove the existing instance
you had of covert-det-phrase and create two instances,
one for each of the subtypes defined in the preceding step.
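The two instances in rules.tdl might look like this. The instance names on the left are placeholders, not names the Matrix requires; only the type names on the right come from the preceding step.

```tdl
; Hypothetical instance names; the types are the two subtypes defined above.
pronoun-covert-det := pronoun-covert-det-phrase.
other-covert-det := other-covert-det-phrase.
```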
- Make subtypes of noun-lex for pronouns and
common nouns:
pronoun-lex := noun-lex &
  [ SYNSEM [ LOCAL.CAT.VAL.SPR < [ LOCAL.CONT.RELS < ! [PRED pronoun_q_rel] ! > ] >,
             LKEYS.KEYREL.PRED 'pronoun_n_rel ] ].
common-noun-lex := noun-lex &
  [ SYNSEM.LOCAL [ CAT.VAL.SPR < [ LOCAL.CONT.RELS < ! [PRED reg_quant_rel] ! > ] >,
                   CONT.HOOK.INDEX.PNG [ PER third ] ] ].
Note that pronoun-lex specifies a PRED value, so all
pronouns will have the same one. The only difference will be in the
person and number values. (Something will have to be said about
demonstrative pronouns, probably something about what kind of
quantifier relation they should appear with.) common-noun-lex
is constrained to [PER third] since only pronouns have other
PER values.
- Create lexical entries for pronouns in lexicon.tdl,
specifying PER, NUM and GEND values, as appropriate. Here's
an example for English:
we := pronoun-lex &
  [ STEM < "we" >,
    SYNSEM.LOCAL.CONT.HOOK.INDEX.PNG [ PER first,
                                       NUM non-sg ] ].
- Update your lexical entries for common nouns to inherit
from common-noun-lex and to specify number and gender
information. If you're going to use a lexical rule for
noun number, you might consider doing only a couple lexical
entries now for testing purposes. If your language has a gender
system, you might consider defining subtypes of common-noun-lex
for each gender (which constrain the GEND value), and inheriting
from those instead. (A similar thing could be done for number, but
it's redundant if you're going to write a lexical rule.)
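A minimal sketch of the gender-specific subtypes of common-noun-lex suggested above, assuming gender values named masc and fem (hypothetical; substitute your own) and assuming PNG sits at SYNSEM.LOCAL.CONT.HOOK.INDEX as in the common-noun-lex definition:

```tdl
; Hypothetical subtypes: each fixes GEND once, so lexical entries
; that inherit from them don't have to repeat it.
masc-common-noun-lex := common-noun-lex &
  [ SYNSEM.LOCAL.CONT.HOOK.INDEX.PNG.GEND masc ].

fem-common-noun-lex := common-noun-lex &
  [ SYNSEM.LOCAL.CONT.HOOK.INDEX.PNG.GEND fem ].
```

Individual nouns would then inherit from one of these types instead of from common-noun-lex directly.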
- Test your grammar by checking whether pronouns and
common nouns can appear with or without determiners, and
make sure that the results are what you want!
- See whether adding in the person and number information
cut down on some of the overgeneration you experienced with
Lab 2.
- Define a feature CASE appropriate for the type noun
(if you think it might also be appropriate for other types,
talk to me).
noun := head &
[ CASE case ].
- Define a type case and subtypes as appropriate
(e.g., nom, acc, dat, ...). There is
no need to use consistent type names across grammars for
our purposes here, as the MT exercise won't involve case (that
is, case doesn't appear in semantic representations).
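For instance, a three-case system could be sketched as follows. The particular subtypes are only an illustration; define the ones your language needs.

```tdl
; Illustrative case hierarchy; pick names that suit your language.
case := *top*.
nom := case.
acc := case.
dat := case.
```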
- Add case information to the ARG-S or valence features
of trans-verb-lex and intrans-verb-lex, e.g.,:
trans-verb-lex := basic-verb-lex & transitive-lex-item &
VAL [ SPR < >,
SUBJ < #subj & synsem
& [ LOCAL.CAT [ HEAD noun &
[ CASE nom ],
VAL.SPR <> ]] >,
COMPS < #comps
& [ LOCAL.CAT [ HEAD noun &
[ CASE acc ],
VAL.SPR <> ]]>,
SPEC < > ]],
ARG-S < #subj, #comps > ]].
- (If you have exceptional verbs in your language, you might
want to define a subtype of trans-verb-lex, say nom-acc-trans-verb-lex which encodes the regular pattern, and let most verbs inherit from
it. The exceptional verbs would inherit directly from trans-verb-lex
instead, and specify the cases they require in their lexical entries.)
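One way the regular/exceptional split could look, as a sketch only. It assumes that trans-verb-lex itself is left unspecified for case, that dat is one of your case subtypes, and that the verb shown is a made-up example of a dative-object verb:

```tdl
; Regular pattern: nominative subject, accusative object.
nom-acc-trans-verb-lex := trans-verb-lex &
  [ SYNSEM.LOCAL.CAT.VAL [ SUBJ < [ LOCAL.CAT.HEAD.CASE nom ] >,
                           COMPS < [ LOCAL.CAT.HEAD.CASE acc ] > ] ].

; Hypothetical exceptional verb: inherits from trans-verb-lex directly
; and states its own case requirements in the lexical entry.
help := trans-verb-lex &
  [ STEM < "help" >,
    SYNSEM [ LOCAL.CAT.VAL [ SUBJ < [ LOCAL.CAT.HEAD.CASE nom ] >,
                             COMPS < [ LOCAL.CAT.HEAD.CASE dat ] > ],
             LKEYS.KEYREL.PRED '_help_v_rel ] ].
```

Most verb entries would inherit from nom-acc-trans-verb-lex and say nothing about case at all.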
- Modify your lexical entries for nouns to reflect their case values.
If you're going to write a lexical rule for case inflection on nouns,
just do one or two now to test this part of your grammar. Here's an
example from English:
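we := pronoun-lex &
  [ STEM < "we" >,
    SYNSEM.LOCAL [ CAT.HEAD.CASE nom,
                   CONT.HOOK.INDEX.PNG [ PER first,
                                         NUM non-sg ] ] ].
us := pronoun-lex &
  [ STEM < "us" >,
    SYNSEM.LOCAL [ CAT.HEAD.CASE acc,
                   CONT.HOOK.INDEX.PNG [ PER first,
                                         NUM non-sg ] ] ].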
- Test your grammar: Do sentences with nouns in the right
case parse? Do sentences with nouns in the wrong case parse?
- Determine which element is doing the agreeing (e.g.,
in subject-verb agreement, it's the verb; in determiner-noun
agreement, it's the determiner, arguably even if the noun itself
doesn't overtly show the information being agreed upon).
- Determine where in the feature structure for the agreeing
element, the information it is agreeing with should be available
(e.g., in subject-verb agreement, the information is available
inside the verb's SUBJ value; in determiner-noun agreement, the
information is available inside the determiner's SPEC feature;
in adjective-noun agreement, the information is available inside
the adjective's MOD feature).
- Constrain the information in both places. (e.g., if you're
doing determiner-noun agreement for number and gender in a Romance
language, make sure your noun lexical entries specify the
relevant values for number and gender. Then constrain the SPEC
value of the determiner entries.)
Example from French:
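chat := common-noun-lex &
  [ STEM < "chat" >,
    SYNSEM [ LOCAL.CONT.HOOK.INDEX.PNG [ NUM sg,
                                         GEND masc ],
             LKEYS.KEYREL.PRED '_cat_n_rel ] ].
le := determiner-lex &
  [ STEM < "le" >,
    SYNSEM [ LOCAL.CAT.VAL.SPEC < [ LOCAL.CONT.HOOK.INDEX.PNG
                                    [ NUM sg,
                                      GEND masc ] ] >,
             LKEYS.KEYREL.PRED def_q_rel ] ].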
- Consider writing lexical rules to generate the appropriate lexical
entries (e.g., singular and plural nouns, 2-person-plural-feminine
verbs, etc).
- Test your grammar: do sentences with agreement parse and
sentences without agreement fail to parse?
Lexical rules
- Pick a supertype for your rule:
- Determine whether your lexical rule needs to change SYNSEM
information, or just add to it. (Examples: If the input has a
non-empty SPR list and the output has an empty SPR list, that's
changing information. If the output has more relations than the
input, that's changing information. If the input has no value
specified for CASE and the output is [CASE nom], that's just adding information.)
- Determine whether your lexical rule creates fully inflected
forms, or whether there's more inflection you'd like to stack
on top of it.
- Rules creating fully inflected forms and only adding
information to SYNSEM can inherit from infl-ltow-rule.
- Rules creating not-yet fully inflected forms should inherit
from infl-ltol-rule, but then add the constraint that
the SYNSEM and the DTR.SYNSEM are the same.
- Rules that change SYNSEM are more work (you need to
use infl-ltol-rule or perhaps a new type that you
define and then make sure that you are copying up all the SYNSEM
information that you aren't changing). If you're still
interested in writing this kind of a rule, be sure to talk to me!
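For the middle case above (a rule whose output still needs more inflection and which only adds information), the SYNSEM identity constraint could be sketched like this. The rule type name is a placeholder, and a real rule would also constrain DTR to the right class of stems:

```tdl
; Hypothetical rule type: output is not yet fully inflected,
; and SYNSEM is shared unchanged between mother and daughter.
noun-stem-lex-rule := infl-ltol-rule &
  [ SYNSEM #synsem,
    DTR.SYNSEM #synsem ].
```

Any information the rule adds (for example a NUM value) is stated on SYNSEM and, because of the reentrancy, has to be compatible with the daughter.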
- Define a rule type in esperanto.tdl which contains
all of the information about your rule except the spelling changes.
The value of DTR should be specific enough to constrain the
rule to only applying to the right type of words. The value of
SYNSEM should be at least as specific as the lexical entries
you've been writing so far. Here's an example from English (where
the value of SYNSEM ends up being very specific since all the
information from the daughter is also in the mother):
3sg_verb-lex-rule := infl-ltow-rule &
NUM sg ]] >,
- Define whatever letter classes you need in irules.tdl.
They should all be at the beginning of the file. Here's the one I need
for the English example:
%(letter-set (!s abcedfghijklmnopqrtuvwxyz))
- Define an instance of the rule type in irules.tdl. This
instance should give the spelling change subrules on a line beginning
with %prefix or %suffix. The rest of the line is
a list of pairs in which the first member matches the input form
and the second member describes the output form. * matches the empty
string. ! signals a letter-set. More specific subrules to the
3sg_verb :=
%suffix (!s !ss) (!ss !ssses) (ss sses)
3sg_verb-lex-rule.
- Define exceptional ('suppletive') forms in
The file should begin and end with a line consisting only of ".
In between, there is one line per suppletive form, listing the
suppletive form, the rule name, and the stem form the suppletive form
goes with. Here's an example from English (don't think too hard
about it -- English doesn't have very good examples of suppletive
forms in the present tense, given the grammar I'm laying out here).
is 3SG_VERB be
- (If you're going to use, you'll need to
edit the file lkb/globals.lsp to comment out the definition of
*lex-rule-suffix*. Restart the LKB once you've done
this. Sorry.)
- Update your lexical entries so that they give the stem
instead of the inflected word (i.e., so that your lexical rule
can do the work). Any such stem entries should also be marked
[INFLECTED -]. Consider making [INFLECTED -] a constraint on
the relevant lexical types, so you don't have to keep remembering
to type it.
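A sketch of both options follows. The entry and its predicate name are invented for illustration, and the second statement assumes your version of the LKB accepts the :+ type addendum syntax; if it doesn't, add [ INFLECTED - ] to the type's own definition instead.

```tdl
; Option 1: mark an individual stem entry (hypothetical entry).
walk := intrans-verb-lex &
  [ STEM < "walk" >,
    INFLECTED -,
    SYNSEM.LKEYS.KEYREL.PRED '_walk_v_rel ].

; Option 2: state the constraint once on the lexical type,
; so every entry of that type is [INFLECTED -] automatically.
intrans-verb-lex :+ [ INFLECTED - ].
```

With option 2 in place, the INFLECTED line in individual entries becomes redundant and can be dropped.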
- Test your grammar. Does the lexical rule apply to the
words it should apply to? Does it apply to words it shouldn't apply to?
Head-modifier rules
If you're doing adjectives and adverbs because your language
lacks inflection, you'll probably need a few head-modifier rules.
If you're doing adjectives because that's the easiest case of
agreement to be working on, you'll probably need just one.
The Matrix distinguishes scopal from intersective modification.
We're going to pretend that everything is intersective and
just not worry about the scopal guys for now.
- Create an instance of head-adj-int-phrase, an
instance of adj-head-int-phrase, or both, depending on
whether you need only prehead modifiers, only posthead modifiers,
or both.
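In rules.tdl, the instances might look like the following. The instance names are placeholders; keep only the one(s) your language needs.

```tdl
; Hypothetical instance names for the two intersective modifier rules.
head-adj-int := head-adj-int-phrase.
adj-head-int := adj-head-int-phrase.
```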
- Try parsing a transitive sentence. Be surprised by the extra
parse. Or, if you don't get an extra parse, try parsing some of your
ungrammatical examples from earlier labs. Look at the enlarged trees
(or try Parse > Compare) to see what's going on.
- Constrain your existing subtypes of head (verb, noun,
det) to be [MOD < >].
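Assuming your head subtypes are defined directly under head, as in the noun definition from the case section, the constraint could be added roughly like this:

```tdl
; Non-modifier heads get an empty MOD list, so they can't be
; picked up as modifiers by the head-modifier rules.
verb := head & [ MOD < > ].
det := head & [ MOD < > ].
noun := head & [ MOD < >,
                 CASE case ].   ; keep CASE only if you defined it earlier
```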
- Try parsing the misbehaving sentence again.
- Create a type adjective-lex which inherits from
basic-adjective-lex. The following type works for English
assuming that:
- We're not worried about predicative adjectives or adjectives
taking complements for now.
- We have both orders (head-adj and adj-head), but adjectives
are always prehead (hence the value of POSTHEAD).
- We're only dealing with intersective adjectives (as stipulated).
adjective-lex := basic-adjective-lex &
[ MOD < [ LOCAL [ CAT.HEAD noun,
LTOP #ltop ]]]>],
VAL [ SPR < >,
SUBJ < >,
COMPS < >,
SPEC < > ],
- Create the necessary subtype of head.
- Create one or more adjective instances.
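As a sketch of these two steps: the head subtype name adj, the entry, and its predicate name are all assumptions for illustration, and the example presumes that your adjective-lex type is the one that constrains HEAD to this new subtype.

```tdl
; Hypothetical head subtype for adjectives (left unconstrained for MOD,
; since adjectives are exactly the things that do modify).
adj := head.

; Hypothetical adjective entry in lexicon.tdl.
big := adjective-lex &
  [ STEM < "big" >,
    SYNSEM.LKEYS.KEYREL.PRED '_big_a_rel ].
```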
- Create a type adverb-lex, as follows:
adverb-lex := basic-adverb-lex &
[ MOD < [ LOCAL [ CAT.HEAD verb,
LTOP #ltop ]]]>],
VAL [ SPR < >,
SUBJ < >,
COMPS < >,
SPEC < > ]],
- Define the necessary subtype of head.
- Define instances of adverbs in lexicon.tdl.
- Try parsing a few sentences to see where this allows
your adverbs to appear in a string.
- Consider making subclasses of adverbs to restrict
that distribution somewhat. (Use the feature POSTHEAD as well
as the values of VAL inside MOD to do this.)
Test your grammar
- Using the testsuite you developed under "Map out the
space you intend to cover", test your grammar. Did you
in fact cover everything you set out to?
- Go back to the testsuites (test.items files) you made for
labs 1 and 2. Does your grammar still have the right behavior
over those examples? (This is called "regression testing" and
it should be done on a regular basis.)
- Create a master test file with all of the examples you've
tested so far (i.e., the total of the test suites from all
the labs) which you can keep adding to as you go.
- The following commands can be used to test whether you're
parsing ungrammatical items or failing to parse grammatical items,
respectively, provided that you have access to grep and
that you're not using numbers in your orthography (if you are, and
you'd like to use this, I can help you come up with a suitable
regular expression):
- (The topic of the compling lab meeting next week is [incr tsdb()],
which allows more sophisticated regression testing of grammars.
Unfortunately, it is not available for Windows, or you all would
be using it in this class ;-)
Write up your analyses
- Indicate which "package" you chose
to do for this lab.
- Indicate whether you decided to write any additional
lexical rules.
- Describe the phenomena that you analyzed (e.g., how the
case system works in your language) indicating which aspects
of the phenomenon you decided to cover (e.g., your language
might have a split-ergative case system, which you can avoid
for now by carefully choosing your verb tenses...).
- Describe how you analyzed the phenomena in your grammar,
with reference to the particular types and features you used.
Upload files to Dante
Submit via ESubmit
- Make sure your write-up is inside the matrix folder.
- Make sure your batch test files (both the new one for
this assignment, and the general one containing all of your
sentences so far) are in the matrix folder.
- Don't rename your folder (it's easier for me if you leave it
as "matrix").
- Compress the folder, and upload it to ESubmit.
- Submit it by midnight Sunday night (preferably by Friday evening :).