Linguistics 567: Knowledge Engineering for NLP
Lab 6 Due 5/5
Read all the way through the assignment once before
starting it. Once again I'll be asking for write ups, and basing
a significant portion of the grade on the write up. This means that
even if you don't get something working, you can get a lot of partial
credit for describing the problem and how you attempted to handle it,
and your best guess as to why it's not working. Conversely, you could
have everything working properly, but if you don't describe the phenomena
(with glossed examples) and how you handle them in your write up, you
won't get full credit.
Background
This lab has two goals:
- Introduce messages into the semantics of clauses.
- Handle the syntactic differences (if any) between matrix and subordinate
declarative clauses, with an eye to handling the difference
between declarative and interrogative clauses next week.
Semantic representations
This section gives example semantic representations
(of the form produced by the "Indexed MRS" option) to
compare your results to. The focus of this week's lab
is the message relations: getting the proper number of
them, and relating them in the right way to the other
relations.
Matrix declarative
- Cats sleep.
< h1, e2:SEMSORT:TENSE:ASPECT:MOOD,
{ h3:_cat_n_rel(x4:SEMSORT:BOOL:THIRD:PL),
h5:indef_q_rel(x4,h7,h6),
h8:_sleep_v_rel(e2,x4),
h1:proposition_m_rel(h9)},
{h6 qeq h3,
h9 qeq h8 }>
Things to note: The proposition_m_rel has the same
handle as the (local) top, and the single argument of
the proposition_m_rel qeqs the handle of the verb.
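If you want to check these two properties mechanically rather than by eye, something like the following Python sketch may help. It is only an illustration: the dictionary layout and the helper names (find_rel, check_matrix_declarative) are assumptions made up for this example, and the MRS is transcribed by hand from the display above rather than read from the LKB.

```python
# Hand-transcribed from the "Cats sleep." example above (hypothetical layout,
# not an LKB export). Variable properties such as SEMSORT are left out.
mrs = {
    "top": "h1",
    "rels": [
        {"label": "h3", "pred": "_cat_n_rel", "args": ["x4"]},
        {"label": "h5", "pred": "indef_q_rel", "args": ["x4", "h7", "h6"]},
        {"label": "h8", "pred": "_sleep_v_rel", "args": ["e2", "x4"]},
        {"label": "h1", "pred": "proposition_m_rel", "args": ["h9"]},
    ],
    # Each entry reads "key qeq value".
    "hcons": {"h6": "h3", "h9": "h8"},
}


def find_rel(mrs, pred):
    """Return the single relation with the given predicate name."""
    matches = [r for r in mrs["rels"] if r["pred"] == pred]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one {pred}, found {len(matches)}")
    return matches[0]


def check_matrix_declarative(mrs, verb_pred):
    """Return (top_ok, qeq_ok) for the two properties noted above."""
    prop = find_rel(mrs, "proposition_m_rel")
    verb = find_rel(mrs, verb_pred)
    # Property 1: the message shares its handle with the (local) top.
    top_ok = prop["label"] == mrs["top"]
    # Property 2: the message's single argument qeqs the verb's handle.
    qeq_ok = mrs["hcons"].get(prop["args"][0]) == verb["label"]
    return top_ok, qeq_ok


print(check_matrix_declarative(mrs, "_sleep_v_rel"))
```

Both values should come out true for the representation shown above. If the first is false, the message is probably not being identified with the local top; if the second is false, look at how the qeq is introduced.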
Matrix interrogative
- Do cats sleep?
< h1, e2:SEMSORT:TENSE:ASPECT:MOOD,
{ h3:_cat_n_rel(x4:SEMSORT:BOOL:THIRD:PL),
h5:indef_q_rel(x4,h7,h6),
h8:_sleep_v_rel(e2,x4),
h1:question_m_rel(h9),
h9:proposition_m_rel(h10)},
{h6 qeq h3,
h10 qeq h8 }>
The main difference between this and the previous
MRS is the addition of the question_m_rel. The local top
handle is now the label of the question_m_rel, which
takes the proposition_m_rel's handle directly as its
sole argument. The argument of the proposition_m_rel
is still related via qeq to the label of the _sleep_v_rel.
Embedded declarative (with matrix declarative)
- Cats know that dogs sleep.
< h1, e2:SEMSORT:TENSE:ASPECT:MOOD,
{ h3:_cat_n_rel(x4:SEMSORT:BOOL:THIRD:PL),
h5:indef_q_rel(x4,h7,h6),
h8:_know_v_rel(e2,x4,h9),
h10:_dog_n_rel(x11:SEMSORT:BOOL:THIRD:PL),
h12:indef_q_rel(x11,h14,h13),
h15:_sleep_v_rel(e16:SEMSORT:TENSE:ASPECT:MOOD,x11),
h9:proposition_m_rel(h17),
h1:proposition_m_rel(h18)},
{h6 qeq h3,
h13 qeq h10,
h17 qeq h15,
h18 qeq h8 }>
Now there are two proposition_m_rels, one for the
matrix clause and one for the embedded clause. The
arguments of the proposition_m_rels qeq the handles of
the verbs, but the argument of _know_v_rel takes the
handle of the lower proposition_m_rel directly.
Embedded interrogative (with matrix declarative)
- Cats know whether dogs sleep.
< h1, e2:SEMSORT:TENSE:ASPECT:MOOD,
{ h3:_cat_n_rel(x4:SEMSORT:BOOL:THIRD:PL),
h5:indef_q_rel(x4,h7,h6),
h8:_know_v_rel(e2,x4,h9),
h11:_dog_n_rel(x12:SEMSORT:BOOL:THIRD:PL),
h13:indef_q_rel(x12,h15,h14),
h16:_sleep_v_rel(e17:SEMSORT:TENSE:ASPECT:MOOD,x12),
h9:question_m_rel(h10),
h10:proposition_m_rel(h18),
h1:proposition_m_rel(h19)},
{h6 qeq h3,
h14 qeq h11,
h18 qeq h16,
h19 qeq h8 }>
This one is just like the preceding one, except there
is a question_m_rel in addition to the proposition_m_rel
for the embedded clause. Note that _know_v_rel takes
the handle of the embedded question_m_rel as its argument
directly (no qeq) and the embedded question_m_rel takes
the embedded proposition_m_rel as its argument directly
(again, no qeq). The next link in the chain (between
the argument of the embedded proposition_m_rel and the
_sleep_v_rel) does have a qeq.
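The chain of links described above can be traced step by step. The Python sketch below is an illustration only: the data layout and the function name trace_chain are assumptions for this example, and the relations are copied by hand from the "Cats know whether dogs sleep." display, leaving out the noun and quantifier relations since they are not on the chain.

```python
# label -> (predicate, arguments), transcribed from the example above.
# Noun and quantifier relations are omitted; they are not on the chain.
rels = {
    "h8": ("_know_v_rel", ["e2", "x4", "h9"]),
    "h16": ("_sleep_v_rel", ["e17", "x12"]),
    "h9": ("question_m_rel", ["h10"]),
    "h10": ("proposition_m_rel", ["h18"]),
    "h1": ("proposition_m_rel", ["h19"]),
}
# Each entry reads "key qeq value".
qeqs = {"h18": "h16", "h19": "h8"}


def trace_chain(start_label, rels, qeqs):
    """Follow handle arguments downward from start_label.

    Returns a list of (upper_pred, link_kind, lower_pred) triples, where
    link_kind is "direct" (the argument is the lower label itself) or
    "qeq" (the argument is related to the lower label by a qeq).
    """
    links = []
    label = start_label
    seen = set()
    while label not in seen:
        seen.add(label)
        pred, args = rels[label]
        handle_args = [a for a in args if a.startswith("h")]
        if not handle_args:
            break  # no handle-valued argument: bottom of the chain
        arg = handle_args[-1]
        if arg in rels:
            kind, target = "direct", arg
        elif qeqs.get(arg) in rels:
            kind, target = "qeq", qeqs[arg]
        else:
            break  # the argument leads to nothing we know about
        links.append((pred, kind, rels[target][0]))
        label = target
    return links


for link in trace_chain("h8", rels, qeqs):
    print(link)
```

Starting from the label of _know_v_rel, the first two links should be reported as direct and the last one as qeq, matching the description above. Starting from h1 instead adds the matrix proposition_m_rel's qeq link to _know_v_rel at the front.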
Once you've got all of these, matrix interrogatives
with embedded declaratives or interrogatives should follow!
Syntactic differences between clauses
Your first step in this lab should be to cover
the syntax of your clause types. Once that is working,
worry about the semantics.
Everyone will need to implement a clause-embedding
verb type (another subtype of verb-lex).
In addition, you may need to implement one or more
of the following:
- Complementizers (treat as heads taking sentential
complements)
- Word order variations
- ...
Think before you code! There's too much
variety across languages in this domain (especially when we get to
interrogatives next week, but potentially with the declaratives this
week) for me to sketch out all the relevant possibilities in this lab,
so you'll need to plan out what you're going to try. I'm happy to
answer questions as you do.
Semantic differences between clauses
There are two parts to the problem of getting
the semantics right for clauses:
- Making sure each clause type gets the right
message(s) inserted.
- Making sure that the syntax and semantics correlate
as they are supposed to (e.g., if there is a word order
that is particular to interrogative clauses, it shouldn't
get a parse with propositional semantics).
The matrix has done most of the work for the first of these;
it's just a matter of hooking it into your grammar
in the right way. Hopefully, the English examples above
will be useful in this regard.
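One quick sanity check on the first point is to count the message relations in each representation. The Python sketch below is an illustration under stated assumptions: the predicate lists are copied by hand from the four English examples in this assignment, the helper name message_preds is made up, and messages are recognized simply by the _m_rel suffix those examples happen to use.

```python
# Predicate names copied from the four English example MRSs in this lab.
EXAMPLES = {
    "Cats sleep.": [
        "_cat_n_rel", "indef_q_rel", "_sleep_v_rel", "proposition_m_rel",
    ],
    "Do cats sleep?": [
        "_cat_n_rel", "indef_q_rel", "_sleep_v_rel",
        "question_m_rel", "proposition_m_rel",
    ],
    "Cats know that dogs sleep.": [
        "_cat_n_rel", "indef_q_rel", "_know_v_rel",
        "_dog_n_rel", "indef_q_rel", "_sleep_v_rel",
        "proposition_m_rel", "proposition_m_rel",
    ],
    "Cats know whether dogs sleep.": [
        "_cat_n_rel", "indef_q_rel", "_know_v_rel",
        "_dog_n_rel", "indef_q_rel", "_sleep_v_rel",
        "question_m_rel", "proposition_m_rel", "proposition_m_rel",
    ],
}


def message_preds(preds):
    """Return the message relations, identified by the _m_rel suffix."""
    return [p for p in preds if p.endswith("_m_rel")]


for sentence, preds in EXAMPLES.items():
    messages = message_preds(preds)
    print(f"{sentence:32} {len(messages)} message(s): {', '.join(messages)}")
```

Each declarative clause contributes one proposition_m_rel and each interrogative clause contributes a question_m_rel on top of its proposition_m_rel, so a parse with fewer messages than clauses is likely missing its clausal semantics.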
A sketch of what you need to do
Create a baseline
Create an instance of your test suite, and process it with
your 'before' grammar, to have a baseline for comparison. Save
this test suite to submit when you're done.
Declarative matrix clauses
Chances are, you've mostly been working with these so
far, so the only adjustment you'll need to make is to get
the clausal semantics added. We are trying out the strategy
of always introducing the message through a non-branching rule.
- Create a subtype of declarative-clause in your
klingon.tdl file. This is a non-branching construction.
As you can see in the matrix.tdl definition of the
type, it has just one daughter. The daughter is constrained
to be [MSG no-msg] whereas the mother has a contentful
value of MSG. (NB: Some of the constraints below might get
promoted to the matrix eventually, but they're not there yet.)
In your type, you should:
- Constrain the VAL features on the mother.
- Constrain the VAL features on the daughter.
- Constrain any other CAT features of the daughter that
you need to in order to keep the embedded clause patterns
from showing up as matrix clauses. (For example, if you
have complementizer-introduced embedded clauses, you may
wish to say [HEAD verb] on the daughter. You may
also find the boolean feature MC to be useful in
distinguishing main from subordinate clauses.)
- Create an instance of your non-branching rule in rules.tdl.
- Reload your grammar, and try parsing a sentence.
- Be surprised by the extra parse(s).
- Edit your root condition in roots.tdl to rule out the
parse without the clausal semantics.
- Test again. Debug as necessary.
Clause embedding verb(s)
- Create a new verb type, inheriting from verb-lex
(defined in klingon.tdl)
and clausal-second-arg-trans-lex-item (defined in
matrix.tdl). (If your clause-embedding verb actually takes
more than two arguments, including the embedded clause, look
in the vicinity of clausal-second-arg-trans-lex-item in matrix.tdl
for an appropriate type instead, and confirm your choice with me.)
- Link the ARG-ST elements to the VAL elements in this new type.
- Constrain the CAT value of its complement appropriately (keeping
in mind that you might need to revise this constraint and/or create
subtypes when you allow embedded interrogatives next week).
- Constrain the CONT.MSG value of its complement appropriately.
Embedded clauses: complementizers
If your language marks embedded clauses (optionally or
obligatorily) with a complementizer, I recommend the following
strategy.
- Create a type in klingon.tdl called
complementizer-lex-item.
- It should inherit from no-hcons-lex-item and
basic-one-arg.
- It should link its one argument to the sole item on
its COMPS list, and constrain the other valence
features to be empty.
- It should constrain its own CONT.RELS to be empty
(< ! ! >).
- It should copy its complement's CONT.HOOK to its
own CONT.HOOK.
- It should place appropriate constraints on its complement's
CAT features.
- It should identify its own CONT.MSG with that of
its complement.
- It should not be able to serve as a modifier.
- Create an instance of complementizer-lex-item in
lexicon.tdl. Since the complementizer is not introducing
any relations, you don't need to say anything about the
KEYREL, just the STEM.
- Try parsing a sentence with an embedded clause marked by
a complementizer, and then one without the complementizer. Are you
getting the right results? Debug as necessary.
Embedded declaratives: Just like matrix declaratives
If your embedded declaratives have the same form as matrix
declaratives, you got lucky this week :).
- Try parsing a sentence with an embedded clause, and make sure
it has the expected number of parses with the expected structure
and expected semantics.
- Debug as necessary.
Embedded declaratives: Other strategies
If your language does something with embedded (finite)
declaratives other than the possibilities discussed above,
talk with me.
Test your grammar
- Use your test suite to check the syntactic coverage of your grammar.
- Examine the semantic representations you assign to each of
the clause types, and compare them to the examples given in the
lab instructions.
- Check for overgeneration (syntactic forms associated with
one clause type showing up in other clause types, multiple parses
for single sentences with spurious clause type assignments or
lack of clausal semantics). Are there any sentences (grammatical
or ungrammatical) that you want to add to your test suite now?
Write up
- Describe the syntactic properties of the two clause
types (matrix and embedded declaratives) in your language. Illustrate your points with glossed examples.
- Describe the current coverage of your grammar with
respect to those properties.
- Describe how you handled these syntactic properties
(or attempted to handle them).
- Describe how you handled the semantic properties (or
attempted to handle them).
- Indicate which clause types are getting correct semantic
representations (according to the examples given at the
beginning of this assignment), and how those that aren't differ.
Submit via ESubmit
- Be sure your matrix folder includes your write-up.
- Consider removing the doc/ subdirectory in order to save
space on ESubmit.
- Create a tsdb directory with your baseline test suite and
a final run of the test suite.
- Compress the folder, and upload it to ESubmit.
- Submit it by midnight Sunday night (preferably by Friday evening :-).