Linguistics 567: Knowledge Engineering for NLP

Lab 6 Due 5/5

Read all the way through the assignment once before starting it. Once again I'll be asking for write ups, and basing a significant portion of the grade on the write up. This means that even if you don't get something working, you can get a lot of partial credit for describing the problem, how you attempted to handle it, and your best guess as to why it's not working. Conversely, you could have everything working properly, but if you don't describe the phenomena (with glossed examples) and how you handle them in your write up, you won't get full credit.

Background

This lab has two goals:

Introduce messages into the semantics of clauses.
Handle the syntactic differences (if any) between matrix and subordinate declarative clauses, with an eye to handling the difference between declarative and interrogative clauses next week.

Semantic representations

This section gives example semantic representations (of the form produced by the "Indexed MRS" option) to compare your results to. The focus of this week's lab is the message relations: getting the proper number of them, and relating them in the right way to the other relations.

Matrix declarative

Cats sleep.

< h1, e2:SEMSORT:TENSE:ASPECT:MOOD,
{ h3:_cat_n_rel(x4:SEMSORT:BOOL:THIRD:PL),
  h5:indef_q_rel(x4,h7,h6),
  h8:_sleep_v_rel(e2,x4),
  h1:proposition_m_rel(h9)},
{h6 qeq h3,
 h9 qeq h8 }>

Things to note: The proposition_m_rel has the same handle as the (local) top, and the single argument of the proposition_m_rel qeqs the handle of the verb.
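
The two notes above can be checked mechanically once you have transcribed an MRS by hand. What follows is only a minimal sketch in Python, not part of the LKB or the matrix: the `Rel` class, the `check_matrix_declarative` function, and the idea of picking out message relations by the `_m_rel` suffix are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rel:
    label: str   # handle that labels the relation, e.g. "h8"
    pred: str    # predicate name, e.g. "_sleep_v_rel"
    args: tuple  # arguments in order, e.g. ("e2", "x4")


def check_matrix_declarative(top, rels, qeqs, verb_pred):
    """Return a list of problems; an empty list means both notes hold.

    Hypothetical helper: `qeqs` is a set of (hole, label) pairs.
    """
    # Assumption: message relations are exactly those ending in "_m_rel".
    messages = [r for r in rels if r.pred.endswith("_m_rel")]
    if len(messages) != 1:
        return [f"expected 1 message relation, found {len(messages)}"]
    msg = messages[0]
    verb = next(r for r in rels if r.pred == verb_pred)
    problems = []
    if msg.pred != "proposition_m_rel":
        problems.append(f"message is {msg.pred}, not proposition_m_rel")
    if msg.label != top:
        problems.append("message label is not the top handle")
    if (msg.args[0], verb.label) not in qeqs:
        problems.append("message argument does not qeq the verb's handle")
    return problems


# "Cats sleep." transcribed from the representation above
# (variable properties such as SEMSORT are left out).
CATS_SLEEP = [
    Rel("h3", "_cat_n_rel", ("x4",)),
    Rel("h5", "indef_q_rel", ("x4", "h7", "h6")),
    Rel("h8", "_sleep_v_rel", ("e2", "x4")),
    Rel("h1", "proposition_m_rel", ("h9",)),
]
CATS_SLEEP_QEQS = {("h6", "h3"), ("h9", "h8")}

print(check_matrix_declarative("h1", CATS_SLEEP, CATS_SLEEP_QEQS, "_sleep_v_rel"))
# prints []
```

If you drop the proposition_m_rel from the list, the same call reports that no message relation was found, which is the situation your grammar is in before this lab.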

Matrix interrogative

Do cats sleep?

< h1, e2:SEMSORT:TENSE:ASPECT:MOOD,
{ h3:_cat_n_rel(x4:SEMSORT:BOOL:THIRD:PL),
  h5:indef_q_rel(x4,h7,h6),
  h8:_sleep_v_rel(e2,x4),
  h1:question_m_rel(h9),
  h9:proposition_m_rel(h10)},
{h6 qeq h3,
 h10 qeq h8 }>

The main difference between this and the previous MRS is the addition of the question_m_rel. The local top handle is now the label of the question_m_rel, which takes the proposition_m_rel's handle directly as its sole argument. The argument of the proposition_m_rel is still related via qeq to the label of the _sleep_v_rel.

Embedded declarative (with matrix declarative)

Cats know that dogs sleep.

< h1, e2:SEMSORT:TENSE:ASPECT:MOOD,
{ h3:_cat_n_rel(x4:SEMSORT:BOOL:THIRD:PL),
  h5:indef_q_rel(x4,h7,h6),
  h8:_know_v_rel(e2,x4,h9),
  h10:_dog_n_rel(x11:SEMSORT:BOOL:THIRD:PL),
  h12:indef_q_rel(x11,h14,h13),
  h15:_sleep_v_rel(e16:SEMSORT:TENSE:ASPECT:MOOD,x11),
  h9:proposition_m_rel(h17),
  h1:proposition_m_rel(h18)},
{h6 qeq h3,
 h13 qeq h10,
 h17 qeq h15,
 h18 qeq h8  }>

Now there are two proposition_m_rels, one for the matrix clause and one for the embedded clause. The argument of each proposition_m_rel qeqs the handle of its clause's verb, but the clausal argument of _know_v_rel is the handle of the lower proposition_m_rel directly (no qeq).

Embedded interrogative (with matrix declarative)

Cats know whether dogs sleep.

< h1, e2:SEMSORT:TENSE:ASPECT:MOOD,
{ h3:_cat_n_rel(x4:SEMSORT:BOOL:THIRD:PL),
  h5:indef_q_rel(x4,h7,h6),
  h8:_know_v_rel(e2,x4,h9),
  h11:_dog_n_rel(x12:SEMSORT:BOOL:THIRD:PL),
  h13:indef_q_rel(x12,h15,h14),
  h16:_sleep_v_rel(e17:SEMSORT:TENSE:ASPECT:MOOD,x12),
  h9:question_m_rel(h10),
  h10:proposition_m_rel(h18),
  h1:proposition_m_rel(h19)},
{h6 qeq h3,
 h14 qeq h11,
 h18 qeq h16,
 h19 qeq h8  }>

This one is just like the preceding one, except there is a question_m_rel in addition to the proposition_m_rel for the embedded clause. Note that _know_v_rel takes the handle of the embedded question_m_rel as its argument directly (no qeq) and the embedded question_m_rel takes the embedded proposition_m_rel as its argument directly (again, no qeq). The next link in the chain (between the argument of the embedded proposition_m_rel and the _sleep_v_rel) does have a qeq.
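
To see which links in such a chain are direct and which go through a qeq, it can help to trace them. The following is a rough Python sketch under stated assumptions: `trace_chain` is a hypothetical helper, the MRS is transcribed by hand into a dictionary keyed by label (which only works when no two relations share a label, as in these examples), and message relations are recognized by the `_m_rel` suffix.

```python
def trace_chain(start, rels, qeqs):
    """Follow handles from `start` down to the first non-message relation.

    rels maps label -> (pred, args); qeqs maps a hole handle -> the label
    it qeqs. Returns (pred, link) pairs, where link records how that
    relation was reached: "direct" (same handle) or "qeq".
    """
    chain = []
    handle, link = start, "direct"
    while True:
        if handle not in rels and handle in qeqs:
            # No relation carries this handle, so go through the qeq.
            handle, link = qeqs[handle], "qeq"
        if handle not in rels:
            raise ValueError(f"no relation labelled {handle}")
        pred, args = rels[handle]
        chain.append((pred, link))
        if not pred.endswith("_m_rel"):
            return chain
        # A message relation: continue with its single argument.
        handle, link = args[0], "direct"


# "Cats know whether dogs sleep." transcribed from the representation above
# (variable properties omitted).
RELS = {
    "h3": ("_cat_n_rel", ("x4",)),
    "h5": ("indef_q_rel", ("x4", "h7", "h6")),
    "h8": ("_know_v_rel", ("e2", "x4", "h9")),
    "h11": ("_dog_n_rel", ("x12",)),
    "h13": ("indef_q_rel", ("x12", "h15", "h14")),
    "h16": ("_sleep_v_rel", ("e17", "x12")),
    "h9": ("question_m_rel", ("h10",)),
    "h10": ("proposition_m_rel", ("h18",)),
    "h1": ("proposition_m_rel", ("h19",)),
}
QEQS = {"h6": "h3", "h14": "h11", "h18": "h16", "h19": "h8"}

# Start from the clausal argument of _know_v_rel (its third argument, h9).
for pred, link in trace_chain(RELS["h8"][1][2], RELS, QEQS):
    print(pred, link)
# prints:
# question_m_rel direct
# proposition_m_rel direct
# _sleep_v_rel qeq
```

Starting from the top handle h1 instead gives proposition_m_rel reached directly and _know_v_rel reached through a qeq, matching the matrix clause pattern.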

Once you've got all of these, matrix interrogatives with embedded declaratives or interrogatives should follow!
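
As a quick sanity check on "the proper number" of message relations, the pattern in the four examples can be written down as a small table: each declarative clause contributes one proposition_m_rel, and each interrogative clause contributes a question_m_rel plus a proposition_m_rel. The Python below is only a sketch of that bookkeeping; `MESSAGES_PER_CLAUSE` and `expected_messages` are names made up here, and the counts for matrix interrogatives with embedded clauses are extrapolated from the examples rather than shown in them.

```python
from collections import Counter

# Message relations contributed by one clause of each type,
# read off the four example representations above.
MESSAGES_PER_CLAUSE = {
    "declarative": ["proposition_m_rel"],
    "interrogative": ["question_m_rel", "proposition_m_rel"],
}


def expected_messages(clause_types):
    """Count the message relations expected for a sentence.

    clause_types lists the type of every clause in the sentence,
    matrix clause first, e.g. ["declarative", "interrogative"].
    """
    counts = Counter()
    for clause_type in clause_types:
        counts.update(MESSAGES_PER_CLAUSE[clause_type])
    return counts


# Matrix interrogative with an embedded declarative
# (an extrapolated case, e.g. "Do cats know that dogs sleep?").
for pred, n in sorted(expected_messages(["interrogative", "declarative"]).items()):
    print(pred, n)
# prints:
# proposition_m_rel 2
# question_m_rel 1
```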

Syntactic differences between clauses

Your first step in this lab should be to cover the syntax of your clause types. Once that is working, worry about the semantics.

Everyone will need to implement a clause-embedding verb type (another subtype of verb-lex).

In addition, you may need to implement one or more of the following:

Complementizers (treat as heads taking sentential complements)
Word order variations
...

Think before you code! There's too much variety across languages in this domain (especially when we get to interrogatives next week, but potentially with the declaratives this week) for me to sketch out all the relevant possibilities in this lab, so you'll need to plan out what you're going to try. I'm happy to answer questions as you do.

Semantic differences between clauses

There are two parts to the problem of getting the semantics right for clauses:

Making sure each clause type gets the right message(s) inserted.
Making sure that the syntax and semantics correlate as they are supposed to (e.g., if there is a word order that is particular to interrogative clauses, it shouldn't get a parse with propositional semantics).

The matrix has done most of the work for the first of these; it's just a matter of hooking it in to your grammar in the right way. Hopefully, the English examples above will be useful in this regard.

A sketch of what you need to do

Create a baseline

Create an instance of your test suite, and process it with your 'before' grammar, to have a baseline for comparison. Save this test suite instance to submit when you're done.

Declarative matrix clauses

Chances are, you've mostly been working with these so far, so the only adjustment you'll need to make is to get the clausal semantics added. We are trying out the strategy of always introducing the message through a non-branching rule.

Create a subtype of declarative-clause in your klingon.tdl file. This is a non-branching construction. As you can see in the matrix.tdl definition of the type, it has just one daughter. The daughter is constrained to be [MSG no-msg] whereas the mother has a contentful value of MSG. (NB: Some of the constraints below might get promoted to the matrix eventually, but they're not there yet.) In your type, you should:
- Constrain the VAL features on the mother.
- Constrain the VAL features on the daughter.
- Constrain any other CAT features of the daughter that you need to in order to keep the embedded clause patterns from showing up as matrix clauses. (For example, if you have complementizer-introduced embedded clauses, you may wish to say [HEAD verb] on the daughter. You may also find the boolean feature MC to be useful in distinguishing main from subordinate clauses.)
Create an instance of your non-branching rule in rules.tdl.
Reload your grammar, and try parsing a sentence.
Be surprised by the extra parse(s).
Edit your root condition in roots.tdl to rule out the parse without the clausal semantics.
Test again. Debug as necessary.

Clause embedding verb(s)

Create a new verb type, inheriting from verb-lex (defined in klingon.tdl) and clausal-second-arg-trans-lex-item (defined in matrix.tdl). (The exception is if your clause embedding verb actually takes more than two arguments, including the embedded clause. In that case, look in the vicinity of clausal-second-arg-trans-lex-item in matrix.tdl for an appropriate type, and confirm your choice with me.)
Link the ARG-ST elements to the VAL elements in this new type.
Constrain the CAT value of its complement appropriately (keeping in mind that you might need to revise this constraint and/or create subtypes when you allow embedded interrogatives next week).
Constrain the CONT.MSG value of its complement appropriately.

Embedded clauses: complementizers

If your language marks embedded clauses (optionally or obligatorily) with a complementizer, I recommend the following strategy.

Create a type in klingon.tdl called complementizer-lex-item.
- It should inherit from no-hcons-lex-item and basic-one-arg.
- It should link its one argument to the sole item on its COMPS list, and constrain the other valence features to be empty.
- It should constrain its own CONT.RELS to be empty (< ! ! >).
- It should copy its complement's CONT.HOOK to its own CONT.HOOK.
- It should place appropriate constraints on its complement's CAT features.
- It should identify its own CONT.MSG with that of its complement.
- It should not be able to serve as a modifier.
Create an instance of complementizer-lex-item in lexicon.tdl. Since the complementizer is not introducing any relations, you don't need to say anything about the KEYREL, just the STEM.
Try parsing a sentence with an embedded clause marked by a complementizer, and then one without the complementizer. Are you getting the right results? Debug as necessary.

Embedded declaratives: Just like matrix declaratives

If your embedded declaratives have the same form as matrix declaratives, you got lucky this week :).

Try parsing a sentence with an embedded clause, and make sure it has the expected number of parses with the expected structure and expected semantics.
Debug as necessary.

Embedded declaratives: Other strategies

If your language does something with embedded (finite) declaratives other than the possibilities discussed above, talk with me.

Test your grammar

Use your test suite to check the syntactic coverage of your grammar.
Examine the semantic representations you assign to each of the clause types, and compare them to the examples given in the lab instructions.
Check for overgeneration (syntactic forms associated with one clause type showing up in other clause types, multiple parses for single sentences with spurious clause type assignments or lack of clausal semantics). Are there any sentences (grammatical or ungrammatical) that you want to add to your test suite now?
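
If you keep notes on which message relations each parse contains, the overgeneration check in the last item can be partly automated. This is only a sketch: the `results` dictionary is a hand-built, hypothetical summary (it is not a format produced by the LKB or by your test suite software), the parses listed in it are invented for illustration, and `flag_overgeneration` is a made-up helper.

```python
def flag_overgeneration(results):
    """Return {sentence: [reasons]} for sentences worth a closer look.

    results maps each sentence to a list of parses, where each parse is
    summarized as the list of message predicates found in its MRS.
    """
    flagged = {}
    for sentence, parses in results.items():
        reasons = []
        if any(len(messages) == 0 for messages in parses):
            reasons.append("parse without clausal semantics")
        # Compare parses by the multiset of messages they contain.
        if len({tuple(sorted(messages)) for messages in parses}) > 1:
            reasons.append("parses disagree on messages")
        if reasons:
            flagged[sentence] = reasons
    return flagged


# Hypothetical notes on three test items (invented, not real grammar output).
results = {
    "Do cats sleep?": [["question_m_rel", "proposition_m_rel"]],
    "Cats sleep.": [["proposition_m_rel"], []],
    "Cats know that dogs sleep.": [
        ["proposition_m_rel", "proposition_m_rel"],
        ["proposition_m_rel", "question_m_rel", "proposition_m_rel"],
    ],
}

for sentence, reasons in flag_overgeneration(results).items():
    print(sentence, reasons)
# prints:
# Cats sleep. ['parse without clausal semantics', 'parses disagree on messages']
# Cats know that dogs sleep. ['parses disagree on messages']
```

A sentence that is flagged is not necessarily wrong (some strings are genuinely ambiguous), so treat the output as a list of items to inspect by hand.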

Write up

Describe the syntactic properties of the two clause types (matrix and embedded declaratives) in your language. Illustrate your points with glossed examples.
Describe the current coverage of your grammar with respect to those properties.
Describe how you handled these syntactic properties (or attempted to handle them).
Describe how you handled the semantic properties (or attempted to handle them).
Indicate which clause types are getting correct semantic representations (according to the examples given at the beginning of this assignment), and how those that aren't differ.

Submit via ESubmit

Be sure your matrix folder includes your write-up.
Consider removing the doc/ subdirectory in order to save space on ESubmit.
Create a tsdb directory with your baseline test suite and a final run of the test suite.
Compress the folder, and upload it to ESubmit.
Submit it by midnight Sunday night (preferably by Friday evening :-).
