Linguistics 567: Grammar Engineering
Lab 9 Due 5/26
Preliminaries
As usual, you'll need to turn in before and after snapshots
of your test suite. If you end up modifying your test suite in
the course of working on this lab, it can be edifying to run
your previous grammar (as submitted for Lab 8) over the new test suite
when you're all done and then compare that to the final Lab 9 grammar.
I'll also be collecting your test suites for ODIN this week,
so if you have any further cleaning up you'd like to do, please
do so :-).
Background
The goal of this lab is to be able to parse the two
sentences I can eat glass. It doesn't hurt me.,
assign them appropriate semantics, and generate back. You have already
done some of the work: from previous labs, your grammar
should already handle pronouns, case (if applicable), and
transitive verbs. You should already have most of the
vocabulary required (except can and possibly not).
You may need to add the appropriate verb forms, and may
get inspired to add some lexical rules for verbal agreement
in the process (if applicable, and if you haven't already).
You will need to add a treatment of however your language expresses
the modal meaning can. The instructions below outline several
possibilities. If none of them fit what's going on in your language,
talk to me :).
Your grammar probably has a treatment of sentential negation from
the customization script. You should verify that it works and
produces the correct semantics. The instructions below cover several
possibilities for sentential negation. You can refer to them if your
negation is broken. If you don't see what you need there, be sure
to contact me.
Finally, this is the last lab of the class. This means that
you'll also need to:
- Make sure that your grammar generates.
- Include as much of the MT vocab as you can. Consider adding some basic adjectives to that site if you like.
- Suggest some interesting sentences using only the MT vocab
(plus not and can, if you like) for me to use in
the MT demo.
Semantic representations
Your semantic representations for the two sentences should look
approximately like this, modulo the relations showing up in a different
order, the variables (e's, x's, and h's) showing up with different
numbers, and the SEMSORT information showing up in different places.
Also, if your language tends to use prodrop rather than overt pronouns,
you might end up without any representation of the pronouns in these sentences.
I can eat glass.
< h1,u2:SEMSORT,
{h1:proposition_m_rel(h2),
h3:pronoun_n_rel(x4:SEMSORT:FIRST:SG),
h5:pronoun_q_rel(x4, h7, h6),
h8:_can_v_rel(e9:SEMSORT:TENSE:ASPECT:MOOD, h10),
h10:_eat_v_rel(e11:SEMSORT:TENSE:ASPECT:MOOD, x4, x12:SEMSORT:THIRD:SG),
h13:_glass_n_rel(x12),
h14:indef_q_rel(x12,h16,h15)},
{h2 qeq h8,
h6 qeq h3,
h15 qeq h13} >
Things to note about this representation: _can_v_rel
is a one-place relation (i.e., we're treating can as a raising
verb, for now), taking the handle of the _eat_v_rel
as its argument. The _eat_v_rel is a two-place relation
taking x4 (the index from the first-person pronoun) and
x12 (the ARG0 of _glass_n_rel) as its arguments.
It doesn't hurt me.
< h1,u2:SEMSORT,
{h1:proposition_m_rel(h2),
h3:pronoun_n_rel(x4:SEMSORT:THIRD:SG),
h5:pronoun_q_rel(x4, h7, h6),
h8:_neg_r_rel(u10:SEMSORT, h9),
h11:_hurt_v_rel(e12:SEMSORT:TENSE:ASPECT:MOOD, x4, x13:SEMSORT:FIRST:SG),
h14:pronoun_n_rel(x13),
h15:pronoun_q_rel(x13, h17, h16)},
{h2 qeq h8,
h6 qeq h3,
h9 qeq h11,
h16 qeq h14} >
Things to note about this representation: The _neg_r_rel
takes a handle as its argument, which is related through a qeq
to the handle of the _hurt_v_rel. The handle of
_neg_r_rel is itself in turn related via qeq to the MARG of the
message. These qeqs allow quantifiers to scope above or below
_neg_r_rel so that I can't eat some cheese can either
mean 'There is some cheese that I can't eat', or 'I can't eat just
some cheese (I end up eating more)'. (The relationship
between the _can_v_rel and the _eat_v_rel above should
similarly be mediated by a qeq. However, the way the Matrix is
written, this will only happen if we fix things so that there's a
message relation for the embedded clause. We'll just live with the
inaccuracy for now.)
can as an auxiliary verb
Use this version if in your language the morpheme expressing
the same notion as can is a separate word which takes a
VP complement and a subject. Updated 5/24/06
- Define a new verb type which inherits from your verb-lex
and trans-first-arg-raising-lex-item (and take a look at
the definition of this type in matrix.tdl so you
know what it's doing).
- In addition to inheriting
from these types, your new type should put appropriate constraints
on the values of ARG-ST and the valence features.
- Make sure that it constrains the part of speech of each
argument.
- In addition, the KEYREL.ARG1 of the auxiliary should be identified
with the CONT.HOOK.LTOP of the embedded verb.
- Define a lexical entry (with PRED value '_can_v_rel)
which inherits from your new type.
- Create the appropriate form of the verb meaning eat, if necessary.
This can be done either directly as a lexical entry, or via a lexical rule.
- If you needed an additional form of eat, ensure that only that form of eat can appear as the
complement of can (and add whatever items you use to test
this to your master testsuite), and that the new form of eat
can or can't appear in matrix clauses (as appropriate).
- In English, this involves defining a feature FORM on
verb (subtype of head), somewhat similar to
CASE on noun.
- Parse your translation of I can eat glass, and examine
the chart for extra edges. Are they legitimate, or spurious?
If they're spurious, try to rule them out (and then rerun your
master testsuite to see if they were, in fact, spurious :-).
- Parse your translation of I can eat glass and
see if you get the right semantics. Debug
as necessary.
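The steps above might come out roughly as follows. This is only a sketch, not something lifted from the Matrix: the names aux-can-verb-lex and can, the stem string, and the exact feature paths (LKEYS.KEYREL, the VAL lists, the HEAD values) are assumptions that you should check against matrix.tdl and your own grammar.

```tdl
; Hypothetical sketch: type name, entry name and feature paths are
; assumptions; compare with trans-first-arg-raising-lex-item in matrix.tdl.
aux-can-verb-lex := verb-lex & trans-first-arg-raising-lex-item &
  [ SYNSEM [ LOCAL.CAT.VAL [ SUBJ < #subj >,
                             COMPS < #comps >,
                             SPR < >,
                             SPEC < > ],
             ; ARG1 of the auxiliary is the LTOP of the embedded verb.
             LKEYS.KEYREL.ARG1 #ltop ],
    ; Constrain the part of speech of each argument.
    ARG-ST < #subj &
             [ LOCAL.CAT.HEAD noun ],
             #comps &
             [ LOCAL [ CAT [ HEAD verb,
                             VAL.COMPS < > ],
                       CONT.HOOK.LTOP #ltop ] ] > ].

; Replace "can" with the stem your language actually uses.
can := aux-can-verb-lex &
  [ STEM < "can" >,
    SYNSEM.LKEYS.KEYREL.PRED '_can_v_rel ].
```

If you end up defining a FORM feature, the constraint on the complement's HEAD value is the natural place to require the right form of eat.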
can as a bound morpheme
Use this version if the morpheme expressing the same meaning as
can in your language attaches morphologically to the main verb
of the sentence.
- The first step is to decide which
lexical rule type is appropriate. Look at the section
of matrix.tdl titled "Lexical Rules" and see if
any of the xxx-only-xxx-rule types are appropriate.
If not, construct an appropriate one out of the next level
of supertypes. I suspect that infl-cont-change-only-ltol-rule
will be a likely candidate, unless you have concomitant changes
to the valence features (such as the CASE value required
on one of the arguments).
- infl-ltow-rule won't be appropriate, because it assumes that
no relations are added via C-CONT. If this rule is a true
lexeme-to-word rule (takes a stem from [INFLECTED -] to
[INFLECTED +]), you'll want to add a type like the following:
infl-add-ccont-ltow-rule := same-non-local-lex-rule &
same-cat-lex-rule &
same-ctxt-lex-rule &
same-agr-lex-rule &
[ INFLECTED +,
DTR.INFLECTED - ].
- Your subtype for this particular rule will now need to constrain the MSG and HOOK of
the mother (MSG should be the same as the daughter, probably
also the HOOK.XARG), as well as the C-CONT (see below).
- Regardless of the type you choose for your lexical rule, the subtype
will need to add the following constraints:
- The lexical rule's C-CONT.HCONS is empty (<! !>).
- The lexical rule's C-CONT.RELS contains a single relation
of type arg1-ev-relation. The PRED value of that relation
should be '_can_v_rel, the LBL should be identified
with the C-CONT.HOOK.LTOP, the ARG0 with the C-CONT.HOOK.INDEX and the ARG1 should be identified with
the daughter's LTOP.
- Add an instance for your lexical rule to irules.tdl,
with the appropriate spelling change information.
- Constrain your lexical rule to apply only to verbs, and
test this.
- If you have other verbal inflection in your language, determine
which lexical rule has to go first, and constrain the rules to only
apply in that order. Make up nonsense forms with the affixes attached
the other way around, and make sure that they don't parse. If your
other affixes attach to the other end of the verb, you'll want to
constrain the order anyway, or you will end up getting double parses.
- Parse your translation of I can eat glass and
see if you get the right semantics. Debug
as necessary.
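Putting the C-CONT constraints above together, a rule subtype might look something like the following. Treat it as a sketch under assumptions: the names can-lex-rule and can-suffix and the suffix "kan" are made up for illustration, the supertype is just the candidate suggested above, and the DTR path should be checked against the lexical rule types in matrix.tdl.

```tdl
; Hypothetical sketch: rule name and DTR path are assumptions.
can-lex-rule := infl-cont-change-only-ltol-rule &
  [ C-CONT [ HOOK [ LTOP #ltop,
                    INDEX #index ],
             ; A single relation, labeled by the C-CONT's own LTOP.
             RELS <! arg1-ev-relation &
                     [ PRED '_can_v_rel,
                       LBL #ltop,
                       ARG0 #index,
                       ARG1 #dtr-ltop ] !>,
             HCONS <! !> ],
    ; Apply only to verbs; ARG1 above is the daughter's LTOP.
    DTR.SYNSEM.LOCAL [ CAT.HEAD verb,
                       CONT.HOOK.LTOP #dtr-ltop ] ].
```

The corresponding instance in irules.tdl would then carry the spelling change (again with an invented affix):

```tdl
; Hypothetical instance: substitute your language's actual affix.
can-suffix :=
%suffix (* kan)
can-lex-rule.
```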
Negation as an adverb modifier
Use this version if your language expresses sentential negation
via an adverb which modifies the V, VP or S.
(Note:
English has two forms of sentential negation: the "contracted" form, which
is actually an affix on the verb (cf. Zwicky and Pullum 1983),
and the full-form adverb. This adverb is not actually treated
syntactically as a modifier in sentential negation, but rather
selected by auxiliary verbs, including the do of do-support.
For the details of this analysis, see Sag, Wasow and Bender 2003
chapter 13 and Kim and Sag 1995. I would be surprised if another
language being treated in this class had a system very similar to
the English one, as it seems like a pretty quirky part of English
grammar. Further, it's a subtle matter to establish what is actually
going on in English, and I don't think anyone would have time in
one week to show the same about another language.)
- Determine where your negative adverb attaches: to V, VP or
S and whether it attaches to the left or to the right of the node
it attaches to.
- For testing purposes, develop a set of sentences contrasting
the correct attachment with the incorrect attachments.
- Negative adverbs are scopal modifiers, so even though we did
adverbs in the previous lab, you'll need to add some machinery:
- Define an instance of adj-head-scop-phrase and/or
head-adj-scop-phrase in rules.tdl. (These types
are fairly fully specified, which means you'll most likely not need to
put anything in esperanto.tdl.) Which one
you pick depends on whether your negative adverb is prehead
or posthead (or either). Look at the type definitions in matrix.tdl
to decide which is appropriate.
- Create a subtype of basic-scopal-adverb-lex. That type
(through its supertypes) does much of the work for you. You will
need to constrain its VAL and MOD..CAT values. The
matrix type also leaves ARG-ST underconstrained. For
consistency, if your adverb takes no arguments (and I'd be a bit
surprised if the negative element did), ARG-ST should be
constrained to be empty as well.
- Create a lexical entry which inherits from your new subtype (called
scopal-adverb-lex here) and introduces a relation with the PRED value
'_neg_r_rel.
- Test your grammar. Does the adverb show up only where it's
supposed to? Do you get the right semantics for
It doesn't hurt me.? Debug as necessary.
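For concreteness, the pieces described above might look roughly like this for a prehead negative adverb. It is a sketch only: the instance name adj-head-scop, the entry name neg-adv, the stem string, and the particular VAL and MOD constraints are assumptions, and the PRED value follows the semantic representation shown earlier. Adjust the MOD value to match where the adverb attaches in your language.

```tdl
; In rules.tdl (hypothetical instance name):
adj-head-scop := adj-head-scop-phrase.

; In your language-specific file. Feature paths are assumptions;
; check them against basic-scopal-adverb-lex in matrix.tdl.
scopal-adverb-lex := basic-scopal-adverb-lex &
  [ SYNSEM.LOCAL.CAT [ VAL [ SUBJ < >,
                             COMPS < >,
                             SPR < >,
                             SPEC < > ],
                       ; Modifies verbal projections; add VAL constraints
                       ; here to pick out V, VP or S.
                       HEAD.MOD < [ LOCAL.CAT.HEAD verb ] > ],
    ARG-ST < > ].

; In your lexicon file (replace "not" with the actual stem):
neg-adv := scopal-adverb-lex &
  [ STEM < "not" >,
    SYNSEM.LKEYS.KEYREL.PRED '_neg_r_rel ].
```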
Negation as a verbal affix
Use this version if your language expresses sentential negation
by adding a morpheme to the main verb.
Two-part negation
Use this version if your language expresses negation with
both an affix on the verb and an adverb (e.g., French ne ... pas).
If both elements are arguably affixes, you probably just want
to write a pair of lexical rules, i.e., take the "Negation as a
verbal affix" route, but write two rules and make sure you
can require that they both apply or neither apply.
- The strategy here is going to be to add the affix with a lexical rule
similar to the one above, but to have it change the COMPS value and
not add any semantics. The COMPS list will be the same as the input's
COMPS list, with the addition of a negative adverb.
- We're assuming that the second part of the negation has
an independent life as a negative adverb for constituent (i.e.,
not sentential) negation. This seems to work for French at first
glance; I'd be curious about other languages. That adverb will
have to have constraints on its MOD value that keep it from
modifying finite verbs (which would give sentential negation).
However, if you don't have any head-modifier rules in your grammar,
you don't need to worry about that yet.
- The negative adverb needs some kind of distinguished value (a
feature inside HEAD, and in particular,
HEAD.KEYS.KEY might be a good candidate) so that the rule
won't license verbs just picking up any adverb as a complement.
- The inflected verb will "hand" its LTOP value to the negative
particle, and adopt the negative particle's LTOP value as its
own. Regular semantic composition should take care of the rest.
- Again, you'll need to look at the lexrule types and pick an
appropriate supertype. Since we're changing both VAL.COMPS
and CONT.HOOK, you might need to create your own supertype.
Talk to me :).
- Test your grammar: Try parsing your translation
of It doesn't hurt me. and see if you're getting
the right semantics. Test sentences
with each of the two parts of the negation independently,
and verify that they don't parse. Try putting the two-part
negation on a non-verb, and verify that it doesn't
parse.
Debug as necessary.
Test your grammar and try generating!
- Use your master testsuite file to make sure you haven't
lost any coverage from previous labs. Debug as necessary.
- Try parsing I can eat glass. and It doesn't
hurt me. and then generating from the semantic representations
your grammar produced.
- (If you see the error "probable circular lexical rule", it
means your lexical rule is morphosyntactically capable of applying
to its own output. Since the generator isn't constrained by
any surface forms, it can keep adding inflectional endings...)
- To receive full credit on this lab, your grammar will
need to be able to generate back the input string for these
two sentences, and do so without generating too much other
garbage.
Write up
- Describe the facts of sentential negation in your language.
Use glossed examples.
- Describe how you implemented or attempted to implement sentential negation.
If the tdl provided by the customization script works as is, describe
what it is doing.
- Describe the current coverage of your grammar with respect to the sentential
negation facts of the language (syntactic and semantic: are you getting the
right strings and only the right strings? Do they get the right
meaning?). If you don't have complete coverage,
speculate as to what you need to do to get there. If there are particular
problematic strings, please include them in the write up so I can try parsing
them.
- Describe the facts of how the equivalent of can is expressed
in your language. Use glossed examples.
- Describe how you implemented or attempted to implement can.
If you were able to use one of the scenarios described above, specify which
one. If you had to add something, or if your language is completely
different from any of the choices above, describe in detail what
types, constraints, and rules you had to add.
- Describe the current coverage of your grammar with respect to the
facts of how can is expressed in your language (syntactic
and semantic: are you getting the right strings and only the right
strings? Do they get the right meaning?). If you don't have complete coverage,
speculate as to what you need to do to get there. If there are particular
problematic strings, please include them in the write up so I can try parsing
them.
- Provide examples for the MT demo in the format your grammar expects them,
including the translations of I can eat glass. It doesn't hurt me.
Submit via ESubmit
- Be sure your matrix folder includes your before and after test suite runs and your write-up.
- Include your plain text test suite file for ODIN.
- Consider removing the doc/ subdirectory in order to save
space on ESubmit.
- Compress the folder, and upload it to ESubmit.
- Submit it by midnight Monday night (preferably by Friday evening :-).