Linguistics 567: Knowledge Engineering for NLP

Lab 7 Due 5/12

Read all the way through the assignment once before starting it. Once again I'll be asking for write ups, and basing a significant portion of the grade on the write up. This means that even if you don't get something working, you can get a lot of partial credit for describing the problem, how you attempted to handle it, and your best guess as to why it's not working. Conversely, you could have everything working properly, but if you don't describe the phenomena (with glossed examples) and how you handle them in your write up, you won't get full credit.

You might notice that these instructions are vaguer than in previous weeks. This is only partially because I expect you to have more of a sense of how to implement the details as the quarter goes along. It's more because I can't predict all of the details for this material. The upshot: ask questions! Ask early and ask often :-).

Background

This lab has four goals:

Handle the syntactic differences between matrix and embedded polar questions and between polar questions and declaratives.
Implement the syntactic pattern of imperatives.
Map the right clausal semantics to each syntactic pattern.
Implement semantic selection of complement clause types by verbs that take clausal complements.

Semantic representations

This section gives example semantic representations (of the form produced by the "Indexed MRS" option) to compare your results to.

Matrix interrogative

Do cats sleep?

< h1, e2:SEMSORT:TENSE:ASPECT:MOOD,
{ h3:_cat_n_rel(x4:SEMSORT:BOOL:THIRD:PL),
  h5:indef_q_rel(x4,h7,h6),
  h8:_sleep_v_rel(e2,x4),
  h1:question_m_rel(h9),
  h9:proposition_m_rel(h10)},
{h6 qeq h3,
 h10 qeq h8 }>

The main difference between this and the matrix declarative given last week is the addition of the question_m_rel. The local top handle is now the label of the question_m_rel, which takes the proposition_m_rel's handle directly as its sole argument. The argument of the proposition_m_rel is still related via qeq to the label of the _sleep_v_rel.

Embedded interrogative (with matrix declarative)

Cats know whether dogs sleep.

< h1, e2:SEMSORT:TENSE:ASPECT:MOOD,
{ h3:_cat_n_rel(x4:SEMSORT:BOOL:THIRD:PL),
  h5:indef_q_rel(x4,h7,h6),
  h8:_know_v_rel(e2,x4,h9),
  h11:_dog_n_rel(x12:SEMSORT:BOOL:THIRD:PL),
  h13:indef_q_rel(x12,h15,h14),
  h16:_sleep_v_rel(e17:SEMSORT:TENSE:ASPECT:MOOD,x12),
  h9:question_m_rel(h10),
  h10:proposition_m_rel(h18),
  h1:proposition_m_rel(h19)},
{h6 qeq h3,
 h14 qeq h11,
 h18 qeq h16,
 h19 qeq h8  }>

This one is just like the embedded declarative from last week, except there is a question_m_rel in addition to the proposition_m_rel for the embedded clause. Note that _know_v_rel takes the handle of the embedded question_m_rel as its argument directly (no qeq) and the embedded question_m_rel takes the embedded proposition_m_rel as its argument directly (again, no qeq). The next link in the chain (between the argument of the embedded proposition_m_rel and the _sleep_v_rel) does have a qeq.

Once you've got all of these, matrix interrogatives with embedded declaratives or interrogatives should follow!

Imperatives

< h1, e2:SEMSORT:TENSE:ASPECT:MOOD,
{ h1:command_m_rel(h5),
  h6:_sleep_v_rel(e2,x8:second)},
{h5 qeq h6}>

Note that the MARG of the command_m_rel is qeq the handle of the verb, and the ARG1 of the verb is constrained to be second person. (This latter may or may not be appropriate to mimic in other languages.)

Run a baseline test suite

Before making any changes to your grammar for this lab, run a baseline test suite instance. If you decide to add items to your test suite for the material covered here, consider doing so before modifying your grammar so that your baseline can include those examples. (Alternatively, if you add examples in the course of working on your grammar and want to make the snapshot later, you can do so using the grammar you turned in for Lab 6.)

Syntactic differences between clauses

Your first step in this lab should be to cover the syntax of your clause types. Once that is working, worry about the semantics.

Everyone will need to implement a clause-embedding verb type (another subtype of verb-lex). This one should inherit from clausal-second-arg-trans-lex-item (defined in matrix.tdl) as well as verb-lex (defined in klingon.tdl), and constrain the CAT value of its complement appropriately. If your clause-embedding verb from last week doesn't take interrogative complements, you'll need to add one. All clause-taking verbs should place appropriate constraints on the CONT.MSG of their complements.
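
As a rough illustration of the constraints just described, a verb type for embedded polar questions might look something like the following. This is only a sketch: the type name ques-embedding-verb-lex is made up, and the HEAD and VAL values on the complement are placeholders that depend on what embedded clauses look like in your language.

; Hypothetical type name. HEAD verb and COMPS < > are placeholders.
; Use HEAD comp (or similar) if your embedded clauses are CPs.
ques-embedding-verb-lex := verb-lex & clausal-second-arg-trans-lex-item &
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < [ LOCAL [ CAT [ HEAD verb,
                                                 VAL.COMPS < > ],
                                           CONT.MSG.PRED question_m_rel ] ] > ].

A verb that takes only declarative complements would presumably be the same except for requiring proposition_m_rel as the PRED of its complement's MSG.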

In addition, you may need to implement one or more of the following:

Complementizers (treat as heads taking sentential complements)
Word order variations
Sentence final particles (question markers) (treat as heads taking sentential complements)
Sentence initial particles (question markers) (treat as heads taking sentential complements)
...

Think before you code! There's too much variety across languages in this domain for me to sketch out all the relevant possibilities in this lab, so you'll need to plan out what you're going to try. I'm happy to answer questions as you do.

To give you some guidance, I describe below what I did for English while testing out the matrix for this lab.

Semantic differences between clauses

There are two parts to the problem of getting the semantics right for clauses:

Making sure each clause type gets the right message(s) inserted.
Making sure that the syntax and semantics correlate as they are supposed to (e.g., if there is a word order that is particular to interrogative clauses, it shouldn't get a parse with propositional semantics).

The matrix has done most of the work for the first part; it's just a matter of hooking it into your grammar in the right way. Hopefully, the English examples below will be useful in this regard.

A sketch of what you need to do

Matrix polar questions just like declaratives

If there is no syntactic difference between matrix (polar) interrogatives and matrix declaratives, you've lucked out. You just need to add a rule that builds question semantics instead of proposition semantics.

This rule will look just like your declarative-clause construction, except that you need to do a bit more work with the semantics, since the type interrogative-clause in matrix.tdl is somewhat underspecified compared to declarative-clause.

There should be two message relations on C-CONT.RELS. The first one is identified (by a constraint on the supertype basic-non-rel-clause) with the CONT.MSG of the mother, which interrogative-clause says is [PRED question_m_rel].
The second message relation introduced by your interrogative construction should be the proposition_m_rel that is `wrapped' inside the question_m_rel.
The MARG of the question_m_rel and the LBL of the proposition_m_rel should be identified.
The MARG of the proposition_m_rel and the LTOP of the head daughter should be in a qeq relation (with the MARG being the HARG).
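
Put together, the constraints above might come out roughly as follows. Treat this as a sketch under assumptions rather than something to paste in: int-cl is a placeholder name, the feature names HEAD-DTR, HCONS and LARG are assumptions that should be checked against your copy of matrix.tdl, and whatever syntactic constraints your declarative construction places on its head daughter are left out.

; Sketch only. int-cl is a placeholder name.
; First relation: the question message (identified with the mother's
; MSG by the supertype). Second relation: the wrapped proposition.
int-cl := interrogative-clause &
  [ C-CONT [ RELS <! [ PRED question_m_rel,
                       MARG #prop ],
                     [ PRED proposition_m_rel,
                       LBL #prop,
                       MARG #marg ] !>,
             HCONS <! qeq & [ HARG #marg,
                              LARG #ltop ] !> ],
    HEAD-DTR.SYNSEM.LOCAL.CONT.HOOK.LTOP #ltop ].

Note that the link between the two message relations is a direct identification (#prop), while the link down to the head daughter goes through a qeq, matching the example representation for Do cats sleep? given earlier.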

Once you have both non-branching constructions in, you should find that every sentence has double the parses. Verify that this is so.

An alternative is to change your matrix declarative clause construction so that it introduces an underspecified message relation (prop-or-ques_m_rel). In this case, you'll only have one parse of matrix clauses that can be either declarative or interrogative, with only one message relation in it.

Interrogative matrix clauses: Verbal inflection

If your language marks matrix interrogatives with inflection on the verb, you'll want to do the following:

Add a feature to the type verb, say QUES, with possible values + and - (bool).
Make verb-lex constrain the value of HEAD.QUES to -.
Write a lexical rule which adds the inflection and changes the value of HEAD.QUES to +. (Alternatively, consider leaving verb-lex unspecified, and writing a pair of rules, one which fills in + and one which fills in -.)
Make sure your new lexical rule interacts properly with existing lexical rules.
Make an interrogative clause construction just like that described in Matrix polar questions just like declaratives above, with the added constraint that the head-daughter must be [QUES +].
Make your declarative clause construction require [QUES -] on its head daughter.
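
The feature declaration and the constraints on the two clause constructions could be sketched as below. The names int-cl and decl-cl stand in for whatever your constructions are called, HEAD-DTR is an assumption about the feature geometry in matrix.tdl, and the lexical rule itself is omitted because its supertype depends on how your morphology is set up.

; Sketch only. int-cl and decl-cl are placeholder names.
; Add QUES to the head type verb.
verb :+ [ QUES bool ].

; Uninflected verbs start out as non-interrogative.
verb-lex :+ [ SYNSEM.LOCAL.CAT.HEAD.QUES - ].

; Interrogative clauses need the question inflection...
int-cl :+ [ HEAD-DTR.SYNSEM.LOCAL.CAT.HEAD.QUES + ].

; ...and declarative clauses reject it.
decl-cl :+ [ HEAD-DTR.SYNSEM.LOCAL.CAT.HEAD.QUES - ].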

Interrogative matrix clauses: Question particle

Some languages mark (matrix) interrogative clauses with a question particle, either at the beginning or the end of a sentence. One way to handle this is like a complementizer, with one of the following two options:

If the order follows what you see in your other head-complement structures, you can build the CP with the ordinary head-complement rule.
If the order of the question particle is different from the usual possibilities for head-complement structures in your language, you might need to define a new head-complement rule or add a constraint to one of your existing head-complement rules. The rule that disallows the question particle should constrain the HEAD value of the head daughter to make it incompatible with the question particle. (If your question particle uses a different ordering from other complementizers, talk to me.)

You'll then need a non-branching interrogative clause construction that takes the CP as its daughter (similar to the construction described above). Since the type interrogative-clause inherits from basic-head-only, the mother will also be a CP. That means you'll need to change your root condition to allow these CPs (perhaps disallowing those that only turn up in embedded contexts).

Alternatively, you can have the question particle introduce the question message relation. The lexical type for such an element would have the following constraints:

It should inherit from no-hcons-lex-item and basic-one-arg.
It should link its one argument to the sole item on its COMPS list, and constrain the other valence features to be empty.
It should constrain its own CONT.RELS to contain just one item, a message relation with the PRED value question_m_rel.
The sole element of its CONT.RELS list should be identified with its CONT.MSG.
The MARG of its message relation should be identified with its complement's HOOK.LTOP.
Its own HOOK.LTOP should be identified with the LBL of its message relation.
It should place appropriate constraints on its complement's CAT features.
It should constrain the CONT.MSG of its complement to be a proposition_m_rel.
It should not be able to serve as a modifier.
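
One way the constraints in this list might be written down is sketched here. The type name ques-particle-lex is made up, and the choices of HEAD comp for the particle and HEAD verb for its complement, as well as the feature names ARG-ST, SPR and MOD, are assumptions to check against matrix.tdl and your own grammar.

; Sketch only. ques-particle-lex is a hypothetical name.
ques-particle-lex := no-hcons-lex-item & basic-one-arg &
  [ SYNSEM.LOCAL [ CAT [ HEAD comp & [ MOD < > ],
                         VAL [ SUBJ < >,
                               SPR < >,
                               COMPS < #comp > ] ],
                   CONT [ HOOK.LTOP #lbl,
                          MSG #msg,
                          RELS <! #msg & [ PRED question_m_rel,
                                           LBL #lbl,
                                           MARG #marg ] !> ] ],
    ARG-ST < #comp & [ LOCAL [ CAT [ HEAD verb,
                                     VAL.COMPS < > ],
                               CONT [ HOOK.LTOP #marg,
                                      MSG.PRED proposition_m_rel ] ] ] > ].

Because the MARG of the question message is identified directly with the complement's LTOP, no qeq is introduced here, which is consistent with inheriting from no-hcons-lex-item.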

Interrogative matrix clauses: Subject-verb inversion

Here's a lexical rule which moves the SUBJ to the head of the COMPS list. That will create subj-verb inversion in an SVO language.

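; Subject-verb inversion: the daughter's SUBJ becomes the first element
; of the mother's COMPS list, and the mother's SUBJ is marked as
; unrealizable (anti-synsem). MC, HC-LIGHT and POSTHEAD are copied up
; unchanged from the daughter.
inv-lex-rule := const-cat-change-only-ltol-rule &
  [ INFLECTED +,
    SYNSEM.LOCAL.CAT [ HEAD verb &
                            [ INV + ],
                       VAL [ COMPS < #subj . #comps >,
                             SUBJ < anti-synsem > ],
                       MC #mc,
                       HC-LIGHT #hcl,
                       POSTHEAD #posthead ],
    DTR.SYNSEM.LOCAL.CAT [ HEAD verb,
                           VAL [ COMPS #comps,
                                 SUBJ < #subj & synsem > ],
                           MC #mc,
                           HC-LIGHT #hcl,
                           POSTHEAD #posthead ]].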

This rule requires a feature INV (inverted), with default value bool, added to the type verb. This feature keeps track of which phrases are headed by inverted verbs, so we can check for that in constructions like the matrix yes-no question rule.
You can then make the non-branching construction for interrogative clauses require [INV +] on the head daughter. Verbal lexemes should be [INV -] to start with.
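
The supporting declarations for the inversion rule could be sketched like this. As before, int-cl is a placeholder for the name of your interrogative construction, and HEAD-DTR is an assumption about the feature geometry in matrix.tdl.

; Sketch only. int-cl is a placeholder name.
; Declare INV on the head type verb.
verb :+ [ INV bool ].

; Verbal lexemes are uninverted until inv-lex-rule applies.
verb-lex :+ [ SYNSEM.LOCAL.CAT.HEAD.INV - ].

; The matrix yes-no question construction takes only inverted clauses.
int-cl :+ [ HEAD-DTR.SYNSEM.LOCAL.CAT.HEAD.INV + ].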

Embedded interrogatives: Just like matrix interrogatives

If your embedded interrogatives look just like your matrix ones, you're in luck. You only need to make sure that those clauses can appear as the complement of appropriate verbs.

Embedded interrogatives: Marked by question particles

If your embedded interrogatives involve a question particle or complementizer, but your matrix interrogatives don't, you'll want to develop a complementizer analysis for the embedded ones. It probably makes more sense to follow the instructions for matrix interrogatives with question particles rather than the ones for embedded declaratives from last week, as the semantics is different.

If your matrix declaratives and interrogatives look the same and you went with a single clause rule for matrix clauses (which introduces an underspecified message), but your embedded clauses are distinguished either by the complementizer involved or perhaps only by the selecting verb, you'll need to have the complementizer or the selecting verb do some semantic work to get the message on an embedded clause right.

If you have interrogative complementizers: They'll look like the message-introducing question particle described above. By constraining their complement's MSG to be proposition_m_rel, they will in fact be resolving the underspecification. The constraints described above also produce a complementizer that introduces the additional question_m_rel and hooks up the handles appropriately.

If you do not have interrogative complementizers but the embedding verb constrains the message of its complement: For verbs that can take only embedded questions, you'll want to have them introduce the question_m_rel, resolve any underspecification on the complement's message, identify the MARG of the question message with the LTOP of the embedded clause, and identify the LBL of the question message with an ARGn in their own key relation. Note that the lower-level lexical types in the matrix are constrained to introduce only one relation on the RELS list. This means that if you define this type of verb, you'll need to work with higher-level supertypes in the matrix, avoiding single-rel-lex-item.

Embedded interrogatives: Other

Talk to me :-)

Imperatives

English expresses imperatives by leaving out the subject and requiring a special form of the verb. One way to handle this is to make a construction which inherits from basic-head-opt-subj-phrase, marks the clause as imperative (say with a new CAT feature), requires a particular FORM value on the verb, and constrains the index of the daughter's SUBJ to be second person. Then a second non-branching rule, inheriting from imperative-clause, would take the mother of the first and produce a constituent with clausal semantics. We would also need an appropriate lexical rule to produce verbs in the right form. Furthermore, the other clausal constructions need to be made sensitive to the `imperative' feature so as to reject the phrases built by this head-opt-subj rule as daughters.
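
A very rough sketch of the two-rule approach is given here. Nearly every name in it is an assumption: the feature IMP, the FORM value imp, both type names, and the path PNG.PER to the person value are all hypothetical and would need to be adapted to your grammar and checked against matrix.tdl. How IMP gets its value on phrases not built by the first rule is left open.

; Sketch only. IMP, imp, and both type names are hypothetical.
; New CAT feature recording that a phrase is imperative.
cat :+ [ IMP bool ].

; Rule 1: drop the subject, require the imperative verb form,
; and constrain the unexpressed subject to second person.
imp-head-opt-subj-phrase := basic-head-opt-subj-phrase &
  [ SYNSEM.LOCAL.CAT.IMP +,
    HEAD-DTR.SYNSEM.LOCAL.CAT
      [ HEAD.FORM imp,
        VAL.SUBJ < [ LOCAL.CONT.HOOK.INDEX.PNG.PER second ] > ] ].

; Rule 2: non-branching clause that supplies the clausal semantics.
imp-cl := imperative-clause &
  [ HEAD-DTR.SYNSEM.LOCAL.CAT.IMP + ].

The other clausal constructions would then require [IMP -] on their head daughters.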

It might seem like a better idea to have just one rule which does the SUBJ cancellation and the clausal semantics. This is not possible with the current version of the matrix (basic-head-opt-subj-phrase inherits from a supertype which constrains it to be [MSG no-msg]). This might get revised :-).

If your language allows subject prodrop generally, and requires it on imperatives, you could handle this by having the head-opt-subj rule record its application with a CAT feature. The imperative clause rule would be sensitive to this feature (as well as the HEAD.FORM, if appropriate), but the other clausal constructions wouldn't be.

If your language marks imperatives with some sort of sentence-final or sentence-initial particle, see the instructions above for doing the analogous thing with interrogatives.

If your language does something else, talk to me :)

Test your grammar

Use your test suite to check the syntactic coverage of your grammar.
Examine the semantic representations you assign to each of the clause types, and compare them to the examples given in the lab instructions.
Check for overgeneration (syntactic forms associated with one clause type showing up in other clause types, multiple parses for single sentences with spurious clause type assignments or lack of clausal semantics).

Write up

Describe the syntactic properties of embedded and matrix interrogative clauses and matrix imperatives in your language. Illustrate your points with glossed examples.
Describe the current coverage of your grammar with respect to those properties.
Describe how you handled these syntactic properties (or attempted to handle them).
Describe how you handled the semantic properties (or attempted to handle them).
Indicate which clause types are getting correct semantic representations (according to the examples given at the beginning of this assignment), and how those that aren't differ.

Submit via ESubmit

Be sure your matrix folder includes your write-up.
Be sure your matrix folder includes a tsdb/home directory with your initial and final test suite runs for this lab (and preferably nothing else, so I can easily find these).
Consider removing the doc/ subdirectory in order to save space on ESubmit.
Compress the folder, and upload it to ESubmit.
Submit it by midnight Sunday night (preferably by Friday evening :-).
