Lab 8 (Due 2/24 11:45pm)

Preliminaries

These instructions will most likely get edited over the next couple of days. I'll try to flag changes.

As usual, check the write up instructions first.

Requirements for this assignment


Run a baseline test suite

Before making any changes to your grammar for this lab, run a baseline test suite instance. If you decide to add items to your test suite for the material covered here, consider doing so before modifying your grammar so that your baseline can include those examples. (Alternatively, if you add examples in the course of working on your grammar and want to make the snapshot later, you can do so using the grammar you turned in for Lab 7.)


Background

The goal of this lab is to model morphosyntactic (and, if you prefer, pseudo-model prosodic) marking of information structural concepts. Information structure is a pragmatic phenomenon relating to how the speaker/author presents the information contained in an utterance. Only in rare cases do we find grammaticality affected by information structure marking. Rather, marking of information structure constrains the possible interpretations of an utterance.

There is no consensus yet among linguists as to the range of semantic/pragmatic distinctions that should be made in information structure, nor on how to represent these distinctions. Taking the approach of incremental development, we will start with a simple three-way distinction between topic, focus, and unmarked. I take these to be properties not of referents, but of the linguistic expressions that refer to referents, and in particular, of semantic indices. Working loosely from Lambrecht 1996, topic and focus are defined as follows:

A few things to note:


Representations

We will be representing information structure via a new feature within CONT, called ICONS (for "individual constraints"). ICONS will have a diff-list as its value, like HCONS or RELS. The items on the ICONS list will be feature structures of type info-str. Each of these has the features CLAUSE and TARGET, indicating which index has the topic/focus property (the TARGET) and with respect to which clause (CLAUSE). The subtype of the info-str feature structure will indicate which relation (topic or focus) is involved.

Each relation-bearing lexical entry will introduce an underspecified ICONS element into the ICONS list. Because we don't want to go digging around in diff-lists, the lexical entries also each maintain a pointer to the ICONS they introduced via the feature HOOK.--ICONS.
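
As a rough picture of that bookkeeping, the sketch below shows a lexical type putting one underspecified info-str element on its ICONS diff-list and identifying that element with HOOK.--ICONS. The type name icons-sketch-lex-item and the choice of lex-item as its supertype are made up for illustration. The real constraints come from the matrix.tdl infrastructure, so this is not something to add to your grammar.

```tdl
; Illustrative sketch only (hypothetical type name); the actual
; constraints are supplied by matrix.tdl.
icons-sketch-lex-item := lex-item &
  [ SYNSEM.LOCAL.CONT [ HOOK.--ICONS #icons,
                        ICONS <! #icons & info-str !> ] ].
```

The reentrancy (#icons) is what lets a rule or a marker constrain the element later without reaching into the diff-list.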

Sanghoun has prepared a new version of matrix.tdl which has the infrastructure you'll need for this new feature. Please download it, and place it in your grammar directory (overwriting the old matrix.tdl). Then try loading your grammar and parsing your test suite. As there have been a couple of other changes to matrix.tdl since you last used the customization system, you may find that you need to make adjustments to your my_language.tdl file unrelated to information structure.


Add information-structure marking constructions

NB: What we're targeting here is constructions that specifically mark information structure, rather than being strongly correlated with it. For example, English subjects tend to be topics, but aren't necessarily so. Therefore, we wouldn't mark subject position in English as [INFO-STR topic].

This section lists a few kinds of topic/focus marking that I'm aware of, with some sketches of how to implement them. It is expected that you will post the details of what's happening in your language to GoPost so I can make more specific suggestions. Please do this as early in the week as possible.

Position in the sentence

In some languages, distinguished positions (e.g., right before the verb, sentence-initial, etc.) are associated with topic or focus. The strategy here is to identify the rules that license elements in the relevant position, and then have the rules constrain the HOOK.--ICONS of the appropriate daughter. In some cases, you may need to create new rules: If there's a sentence-initial "topicalized" position, you may need a head-filler construction. If there's a preverbal "focus" position, you may need to bifurcate the head-final rules to create one series that insists on a lexical verb ([HEAD verb, LIGHT +]) as the head and another that allows larger constituents as the head. Only the former will constrain information structure.
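
For a sentence-initial topic position, the constraint might look roughly like the following. This is a sketch under assumptions: topic-head-filler-phrase is a hypothetical name, the phrase is assumed to be built on the Matrix types basic-head-filler-phrase and head-final, and the pointer is assumed to sit at HOOK.--ICONS as described under Representations. Adjust it to whatever rule actually licenses the position in your grammar.

```tdl
; Hypothetical rule type: the filler (non-head daughter) is marked as topic.
topic-head-filler-phrase := basic-head-filler-phrase & head-final &
  [ NON-HEAD-DTR.SYNSEM.LOCAL.CONT.HOOK.--ICONS topic ].
```

A preverbal focus position would follow the same pattern, with the constraint placed on the non-head daughter of the head-final rule that requires a [HEAD verb, LIGHT +] head.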

Focus/topic clitics or adpositions

These ones are relatively straightforward. They are either heads combining with complements or modifiers combining with heads. The first step is to get the syntax right. Post the details to GoPost if it's not (immediately) clear how to do it (10 minute rule and all that).

Semantically, they constrain the HOOK.--ICONS value of the element they combine with (through either the COMPS list or the MOD list, depending).
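
A minimal sketch of the two options, assuming standard Matrix feature geometry for COMPS and MOD. All four type names are hypothetical: my-marker-head-lex and my-marker-mod-lex stand in for whatever types your grammar already uses for the marker once the syntax is working.

```tdl
; Hypothetical types: replace the supertypes with the ones your grammar
; already uses for the marker.

; Marker as head: constrain its complement.
focus-marker-head-lex := my-marker-head-lex &
  [ SYNSEM.LOCAL.CAT.VAL.COMPS.FIRST.LOCAL.CONT.HOOK.--ICONS focus ].

; Marker as modifier: constrain the element it modifies.
focus-marker-mod-lex := my-marker-mod-lex &
  [ SYNSEM.LOCAL.CAT.HEAD.MOD.FIRST.LOCAL.CONT.HOOK.--ICONS focus ].
```

For a topic marker, swap focus for topic.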

Cleft construction

Some languages mark focus with a construction that involves the copula and a relative clause, like English "It was Kim who left." where "Kim" is focused. Since we're not otherwise handling relative clauses, these are outside the scope of this lab.

Focus prosody

In many languages, the primary means for unambiguously marking focus is prosody (intonation). This isn't typically represented in the orthography, so we can only pseudo-model it. The plan here is to make up an affix (-FP, for "Focus Prosody") that attaches to the word bearing the focus marking. This affix should go last in the chain of lexical rules (so make its DTR value be the type of the last existing lexical rule, or a -dtr supertype inherited by the set of last existing lexical rules in case some of those are optional). It should also be optional, which can be achieved by making it lexeme-to-lexeme.

More specifically, this rule should be an infl-add-only-no-ccont-ltol-rule, and its only effect besides adding the -FP affix should be to constrain the INDEX.--ICONS to focus.
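
Put together, the rule might look something like this. Treat it as a sketch: focus-prosody-lex-rule, focus-prosody-suffix, and last-lex-rule-dtr are hypothetical names, and the path is written as HOOK.--ICONS following the Representations section, so check it against your matrix.tdl.

```tdl
; Lexical rule type (e.g., in my_language.tdl). The DTR type name is
; hypothetical; use the type of your last existing lexical rule, or
; the -dtr supertype shared by the last rules.
focus-prosody-lex-rule := infl-add-only-no-ccont-ltol-rule &
  [ SYNSEM.LOCAL.CONT.HOOK.--ICONS focus,
    DTR last-lex-rule-dtr ].

; Rule instance with the made-up affix (e.g., in irules.tdl).
focus-prosody-suffix :=
%suffix (* -FP)
focus-prosody-lex-rule.
```

Spell the affix in the instance the same way you spell it in your test items.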


Check your semantics

Once you're satisfied with the syntax of your topic and/or focus marking, take a look at the semantics. If you just look at the MRS the way we have been, you won't see any changes. This is because the ICONS feature is new, and not yet incorporated into the code that does the MRS display. (It's also not yet incorporated into the generator, and so generation will ignore the ICONS information, unfortunately.)

To see the ICONS information, you'll need to look at the feature structure for the top-most node in the tree, and then navigate to the CONT.ICONS. You should see one item on the ICONS list for every (non-semantically empty) lexical item. The ones that are marked as topic or focus should have the correct types, while the others should just be unmarked (i.e., just info-str).


Variable property mapping

Adding topic and/or focus marking is probably increasing your range of generation outputs. Unfortunately, the generator isn't paying attention to ICONS (yet), so we can't constrain this. Instead, this week we'll work on using vpm to constrain other kinds of generator output variation, e.g., multiple different tense/aspects from one input. The basic strategy is to take any underspecified values in variable properties and translate them, via vpm, to something that conflicts with any more specific values your grammar can produce.

The file semi.vpm provides a mapping between grammar-external features of indices (referential indices and events) and their values, and grammar-internal ones. For background on VPM, see the DELPH-IN wiki. As soon as you start using a VPM file, then only variable properties (features on indices) that are handled in the file are actually preserved.

  1. Save the file semi.vpm to your grammar directory. (This starter file should already handle the INFO-STR marking appropriately.)
  2. Edit the file lkb/script to add the following line, right before the comment that starts "Next, the lexicon itself":
    (mt:read-vpm (lkb-pathname (parent-directory) "semi.vpm") :semi)
    
  3. If your grammar uses a PERNUM feature, you'll need to map separate PER and NUM features from the external (right-hand side) of the VPM to a single PERNUM feature on the internal (left-hand side). See the example under "Properties: An Example" on the DELPH-IN wiki page.
  4. If your grammar encodes aspectual distinctions, you'll need to add an ASPECT section, modeled on tense. This should allow you to create and use a specific default value of ASPECT.
  5. If you have any other features you have added on indices, you will need to provide VPM entries for them as well.
  6. If your language has aspect marked in some sentences but other forms that are just underspecified for aspect, you'll want to have the default aspect be "no-aspect". Define this as a subtype of aspect in your grammar, but don't mention it anywhere other than semi.vpm.
  7. You can do a similar trick for other kinds of generation ambiguity relating to variable properties.
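
The grammar side of step 6 is a one-line type definition along these lines, assuming your grammar uses the Matrix type aspect as the supertype of its aspect values. The matching ASPECT section of semi.vpm is not shown here; model it on the tense section of the starter file.

```tdl
; Default aspect value. Nothing else in the grammar should refer to it;
; only semi.vpm maps underspecified aspect to this type.
no-aspect := aspect.
```

Because no lexical entry or rule in the grammar produces no-aspect, an input carrying it conflicts with every more specific aspect value, which is what cuts down the generator output.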

Test your semi.vpm file by parsing and then generating. You should see fewer strings coming out.


Write up your analyses

  1. Describe the topic and focus marking that you found in your language, including IGT examples that I can test.
  2. Describe how you implemented the topic and focus marking.
  3. If your implementation is incomplete, state how, and provide IGT examples illustrating problems, if you would like me to take a look.
  4. Describe what steps you had to take to make your grammar generate, or, if it's not generating, any ideas you have on where the problem might be. If some examples generate but not others, provide an example of each for me to test & provide feedback.
  5. Describe any changes you needed to make to the semi.vpm file, and the effects that including the semi.vpm had on generation.
  6. Describe the current coverage of your grammar over your test suite (using numbers you can get from Analyze | Coverage and Analyze | Overgeneration in [incr tsdb()]) and a comparison between your baseline test suite run and your final one for this lab (see Compare | Competence).

Submit your assignment


ebender at u dot washington dot edu
Last modified: 2/18/12