Linguistics 567: Grammar Engineering

This week: Coordination, with special guest Scott Drellishak (sfd@u)

Lab 8 Due 5/27

Background

The goal of this lab is to add noun phrase, verb phrase, adjective phrase, and sentence coordination to your grammar, so that it can parse and generate sentences containing coordinated structures. You will be provided with a set of general coordination rules from which you will derive rules specific to your language. The coordination scaffolding I'll be providing is a work in progress, so you may find that something doesn't work right at first. If this happens, please let me know what went wrong (and how you fixed it, if you did) so that what finally goes into the Matrix is as solid as possible.

Coordination in Your Language

There are many different ways your language might mark coordination: a conjunction like English "and", a suffix or prefix, or possibly no marking at all (juxtaposition). In addition, your language might mark coordination differently on different phrase types; for example, it may use a special verb inflection to mark VP coordination but juxtaposition for noun phrases. You'll need to collect the coordination facts about your language before you come to the lab. Note that you only need the facts about coordination strategies that mean something like "and"; we won't be handling "or", "but", "then", etc.

Caveat: Some languages don't seem to have a coordination structure that forms a single constituent, instead using an adjunct marked by an adposition or affix meaning "with". If you have such a language, for the purposes of this lab you'll pretend it does have a balanced coordination strategy so that you have something to work on.

Top, Mid, and Bottom

A coordinate structure may have two or more coordinands. The Matrix handles this with two or three rules: a bottom-coord rule that marks the appropriate type of phrase as coordinated, an optional mid-coord rule that iterates as many times as necessary above the bottom rule, and a top-coord rule that is the root of the coordinated structure. For example, the English coordinated NP "a fish, a barrel, and a smoking gun" would have the structure:

[NP-top [NP a fish] [NP-mid [NP a barrel] [NP-bottom and [NP a smoking gun]]]]

Semantic Representation

The following are some sample semantic representations for each phrase type you'll be working on. They've been slightly abbreviated, and there's some wiggle room: your semantics may have different quantifier relations, for example.

Noun phrase coordination: "dogs and cats leave"

<h1,e2,
{h3:_dog_n(x4),
h5:indef_q(x4,h6,h7),
h8:_and_coord(x9,h11,x4,h12,x10),
h13:_cat_n(x10),
h14:indef_q(x10,h15,h16),
h17:indef_q(x9,h8,h18),
h1:_leave_v(e2,x9)},
{h6 qeq h3,
h15 qeq h13}>

Adjective phrase coordination: "red and blue dogs leave"

<h1,e2,
{h3:_red_adj(e4,x5),
h6:_and_coord(e8,h3,e4,h9,e7),
h9:_blue_adj(e7,x5),
h6:_dog_n(x5),
h10:indef_q(x5,h11,h12),
h1:_leave_v(e2,x5)},
{h11 qeq h6}>

Verb phrase coordination: "cats eat and leave"

<h1,e2,
{h3:_cat_n(x4),
h5:indef_q(x4,h6,h7),
h8:_eat_v(e9,x4),
h1:_and_coord(e2,h8,e9,h11,e10),
h11:_leave_v(e10,x4)},
{h6 qeq h3}>

The most important thing to notice about these representations is _and_coord_rel (shown as _and_coord above), which semantically coordinates two other relations. It has five arguments, in this order: its own INDEX (an individual or an event), L-HNDL and L-INDEX for its left coordinand, and R-HNDL and R-INDEX for its right coordinand. Note that the usual argument relationships between subjects and verbs, and between adjectives and the nouns they modify, remain. Also note that L-HNDL and R-HNDL are identified with the handles of the coordinands for adjective and verb phrases (h3 and h9 in "red and blue dogs leave"), but not for noun phrases. In coordinations of more than two coordinands, you'll likely see implicit_coord, a binary coordination relation that is inserted to hook the coordinations together, in much the same way that the binary mid rule hooks n-way coordinated phrases together. Here's an example:

Noun phrase coordination: "dogs cats and fish eat"

<h1,e2,
{h3:_dog_n(x4),
h5:indef_q(x4,h6,h7),
h8:_cat_n(x9),
h10:indef_q(x9,h11,h12),
h13:_and_coord(x14,h16,x9,h17,x15),
h18:_fish_n(x15),
h19:indef_q(x15,h20,h21),
h22:indef_q(x14,h13,h23),
h24:implicit_coord(x25,h26,x4,h27,x14),
h28:indef_q(x25,h24,h29),
h1:_eat_v(e2,x25)},
{h6 qeq h3,
h11 qeq h8,
h20 qeq h18}>

Implementing Coordination

All the rules and definitions necessary to implement coordination can be found in the file coord.tdl. Download it and put it in your Matrix directory, or wherever you like; you'll be copying the rules it contains into your language's .tdl file (e.g., esperanto.tdl). After downloading the file, follow these steps:

coord.tdl contains a definition of a lexical type conj-lex, which you should use for any lexical coordinators (a.k.a. conjunctions) in your language. Your lexical item should go into lexicon.tdl and should look something like this:

and := conj-lex &
  [ STEM < "and" >,
    SYNSEM [ LOCAL.CAT [ HEAD.MOD null,
                         VAL [ SPR < >,
                               SUBJ < >,
                               COMPS < > ]],
             LKEYS.KEYREL.PRED '_and_coord_rel ]].

Whether your language has different coordination strategies for different parts of speech or not, you're going to need a set of rules for each phrase type because the semantics of each will differ. For each phrase type, therefore, you'll need a top-coord rule, a bottom-coord rule, and in some cases a mid-coord rule (see the comments in coord.tdl to determine which you need). Your top and mid rules will inherit from two rules in coord.tdl: a basic phrase-type specific top or mid rule and a rule that specifies the coordination marking pattern (asyndeton, monosyndeton, polysyndeton...) for that strategy. Your bottom rule will derive from either the unary or binary bottom rule in coord.tdl.
For example, here are the rules that handle NP coordination in English:
```
np-top-coord-rule := basic-np-top-coord-rule & monopoly-top-coord-rule.

np-mid-coord-rule := basic-np-mid-coord-rule & monopoly-mid-coord-rule.

np-bottom-coord-rule := binary-bottom-coord-rule &
  [ SYNSEM.LOCAL.CAT.HEAD noun,
    ARGS.REST.FIRST.SYNSEM.LOCAL.CAT.HEAD noun ].
```
(These rules will probably work for some of your languages, so if you're feeling very lazy you might be able to get away with just cutting and pasting them.)
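Each phrase type needs its own set of rules, so VP coordination would get a parallel set. The sketch below is modeled directly on the English NP rules above. It assumes that coord.tdl defines basic-vp-top-coord-rule and basic-vp-mid-coord-rule alongside the NP versions (these names are guesses by analogy; check the file for the real ones) and that your language marks VP coordination with a single conjunction, the way English does.

```
; Sketch only: basic-vp-top-coord-rule and basic-vp-mid-coord-rule are
; assumed names, by analogy with the NP rules; check coord.tdl.
vp-top-coord-rule := basic-vp-top-coord-rule & monopoly-top-coord-rule.

vp-mid-coord-rule := basic-vp-mid-coord-rule & monopoly-mid-coord-rule.

; Binary bottom rule: ARGS is < conjunction, coordinand >, so
; ARGS.REST.FIRST is the verbal coordinand that follows the conjunction.
vp-bottom-coord-rule := binary-bottom-coord-rule &
  [ SYNSEM.LOCAL.CAT.HEAD verb,
    ARGS.REST.FIRST.SYNSEM.LOCAL.CAT.HEAD verb ].
```

HEAD verb by itself doesn't distinguish V-coordination from VP-coordination; adding VAL constraints (for instance COMPS < > on the coordinand for a VP rule) is presumably one way to keep the two apart so you can test each separately.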
One of the things these rules do is manipulate the COORD feature, a boolean flag that is set to + only inside coordinated structures. coord.tdl contains some type addendum statements that prevent ordinary headed rules from interacting with COORD + phrases; you'll need to copy all of these into your language's .tdl file. You'll also need to go into roots.tdl and make sure any roots you've specified are COORD -.
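A root in roots.tdl with the COORD - requirement added might look like the following. This is a minimal sketch: it assumes COORD is a feature of SYNSEM.LOCAL and that your root is named root. Keep whatever constraints your existing root already has; COORD - is the only addition.

```
; Sketch only: assumes COORD lives on SYNSEM.LOCAL (check coord.tdl).
; The HEAD and VAL constraints stand in for whatever your root already
; says; the COORD - line is the new requirement.
root := phrase &
  [ SYNSEM.LOCAL [ CAT [ HEAD verb,
                         VAL [ SUBJ < >,
                               COMPS < > ] ],
                   COORD - ] ].
```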
After you've created your language-specific rules, add them to rules.tdl as usual, then load your grammar and try parsing some sentences with coordination in them. Do the semantics look like the representations above? Next, try generating. Do you get any spurious sentences? You may find you need to constrain the coordination rules further (in particular, the general rules don't take most agreement phenomena into account); do this by modifying your language-specific rules rather than the rules in coord.tdl.
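For the English NP rules above, the corresponding rules.tdl entries would be instances along these lines. The instance names are arbitrary; these simply drop the -rule suffix from the type names.

```
; One instance per language-specific coordination rule type.
np-top-coord := np-top-coord-rule.
np-mid-coord := np-mid-coord-rule.
np-bottom-coord := np-bottom-coord-rule.
```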
It's nice to be able to see coordination structures in your trees, so you should make labels for them in labels.tdl, perhaps XP-T for top phrases, XP-M for mid phrases, and XP-B for bottom phrases. How to distinguish the different phrases is left as an exercise for the reader (which means I don't have a completely working solution yet; mid is somewhat tricky for some kinds of strategies).
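As a partial starting point for the exercise above, the sketch below gives noun phrases inside a coordinate structure (bottom and mid, both COORD +) a label of their own. It assumes your labels.tdl uses the Matrix label type with a LABEL-NAME string and that COORD sits on SYNSEM.LOCAL; the label name NP-C is a hypothetical choice. It does not separate bottom from mid.

```
; Hypothetical label: matches any noun-headed phrase inside a coordinate
; structure (bottom or mid, both COORD +). The top phrase is presumably
; not COORD +, so it keeps your ordinary NP label.
np-coord := label &
  [ SYNSEM.LOCAL [ CAT.HEAD noun,
                   COORD + ],
    LABEL-NAME "NP-C" ].
```

If your ordinary NP label still shows up on these nodes, try placing this entry before it in labels.tdl.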

Testing

Run your master test suite through the improved grammar and make sure you haven't broken anything.
Add additional items to the test suite to fully exercise the coordination rules. Include examples of all the different phrase types, including both V- and VP-coordination. Be sure to include plausible but ungrammatical strings, and make sure they don't parse.
If you haven't yet, try generating. Satisfy yourself that any additional generated sentences really should be there.

Write up

Describe the facts of coordination in your language. Use glossed examples.
Describe how you implemented (or attempted to implement) coordination. Detail which information you incorporated from coord.tdl and which language-specific subtypes you defined. If you had to add additional constraints, describe in detail the types, constraints, and rules you added.
Describe the current coverage of your grammar with respect to the coordination facts of the language, both syntactic and semantic: are you getting the right strings and only the right strings? Do they get the right meaning? Make sure these facts are represented in your test suite. If you don't have complete coverage, speculate as to what you need to do to get there. If there are particular problematic strings, include them in the write up so I can try parsing them and give you feedback.

Submit via ESubmit

Be sure your matrix folder includes your test suite and your write-up.
Consider removing the doc/ subdirectory in order to save space on E-Submit.
Compress the folder, and upload it to ESubmit.
Submit it by midnight Sunday night (preferably by Friday evening :-).
