These instructions will most likely get edited over the next couple of days. I'll try to flag changes.
As usual, check the write up instructions first.
Requirements for this assignment
Before making any changes to your grammar for this lab, run a baseline test suite instance. If you decide to add items to your test suite for the material covered here, consider doing so before modifying your grammar so that your baseline can include those examples. (Alternatively, if you add examples in the course of working on your grammar and want to make the snapshot later, you can do so using the grammar you turned in for Lab 7.)
The goal of this lab is to model morphosyntactic (and, if you prefer, pseudo-model prosodic) marking of information structural concepts. Information structure is a pragmatic phenomenon relating to how the speaker/author presents the information contained in an utterance. Only in rare cases do we find grammaticality affected by information structure marking. Rather, marking of information structure constrains the possible interpretations of an utterance.
There is no consensus yet among linguists as to the range of semantic/pragmatic distinctions that should be made in information structure, nor on how to represent these distinctions. Taking the approach of incremental development, we will start with a simple three-way distinction between topic, focus, and unmarked. I take these to be properties not of referents, but of the linguistic expressions that refer to referents, and in particular, of semantic indices. Working loosely from Lambrecht 1996, topic and focus are defined as follows:
A few things to note:
We will be representing information structure via a new feature within CONT, called ICONS (for "individual constraints"). ICONS will have a diff-list as its value, like HCONS or RELS. The items on the ICONS list will be feature structures of type info-str. Each of these has the features CLAUSE and TARGET, indicating which index has the topic/focus property (the TARGET) and with respect to which clause (CLAUSE). The subtype of the info-str feature structure will indicate which relation (topic or focus) is involved.
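As a rough illustration of the types just described, the hierarchy might look something like the sketch below. This is not copied from the new matrix.tdl: the value type individual for CLAUSE and TARGET is an assumption, so consult the actual definitions in matrix.tdl rather than adding these to your grammar.

```tdl
; Sketch only: the real definitions live in the new matrix.tdl.
; The value type `individual` for both features is an assumption.
info-str := avm &
  [ CLAUSE individual,
    TARGET individual ].

; The subtype names the relation; plain info-str means unmarked.
topic := info-str.
focus := info-str.
```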
Each relation-bearing lexical entry will introduce an underspecified ICONS element into the ICONS list. Because we don't want to go digging around in diff-lists, the lexical entries also each maintain a pointer to the ICONS they introduced via the feature HOOK.--ICONS.
Sanghoun has prepared a new version of matrix.tdl which has the infrastructure you'll need for this new feature. Please download it, and place it in your grammar directory (overwriting the old matrix.tdl). Then try loading your grammar and parsing your test suite. As there have been a couple of other changes to matrix.tdl since you last used the customization system, you may find that you need to make adjustments to your my_language.tdl file unrelated to information structure.
NB: What we're targeting here is constructions that specifically mark information structure, rather than being strongly correlated with it. For example, English subjects tend to be topics, but aren't necessarily so. Therefore, we wouldn't mark subject position in English as [INFO-STR topic].
This section lists a few kinds of topic/focus marking that I'm aware of, with some sketches of how to implement them. It is expected that you will post the details of what's happening in your language to GoPost so I can make more specific suggestions. Please do this as early in the week as possible.
In some languages, distinguished positions (e.g., right before the verb, sentence-initial, etc.) are associated with topic or focus. The strategy here is to identify the rules that license elements in the relevant position, and then have the rules constrain the HOOK.--ICONS of the appropriate daughter. In some cases, you may need to create new rules: If there's a sentence-initial "topicalized" position, you may need a head-filler construction. If there's a preverbal "focus" position, you may need to bifurcate the head-final rules to create one series that insists on a lexical verb ([HEAD verb, LIGHT +]) as the head and another that allows larger constituents as the head. Only the former will constrain information structure.
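For the preverbal focus case, one half of the bifurcated rule might look roughly like the sketch below. The type name focus-comp-head-phrase is hypothetical, and the supertypes assume a head-final grammar from the customization system; your grammar's rule types may be organized differently.

```tdl
; Hypothetical rule type: complement immediately before a lexical verb.
; Supertypes are assumed; substitute the ones your grammar actually uses.
focus-comp-head-phrase := basic-head-1st-comp-phrase & head-final &
  [ HEAD-DTR.SYNSEM [ LOCAL.CAT.HEAD verb,
                      LIGHT + ],
    NON-HEAD-DTR.SYNSEM.LOCAL.CONT.HOOK.--ICONS focus ].
```

The sibling rule for larger (non-light) heads would leave HOOK.--ICONS unconstrained. A new rule type like this also needs an instance in rules.tdl before it will be used.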
These ones are relatively straightforward. They are either heads combining with complements or modifiers combining with heads. The first step is to get the syntax right. Post the details to GoPost if it's not (immediately) clear how to do it (10 minute rule and all that).
Semantically, they constrain the HOOK.--ICONS value of the element they combine with (through either the COMPS list or the MOD list, depending).
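The two options might be sketched as follows. All type names here are hypothetical (topic-marker-lex, focus-particle-lex, and the my-...-lex supertypes stand in for whatever your grammar already defines), and the rest of each entry's syntax is assumed to be handled by the supertype.

```tdl
; Head + complement: a topic-marking adposition constrains its complement.
topic-marker-lex := my-adposition-lex &
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < [ LOCAL.CONT.HOOK.--ICONS topic ] > ].

; Modifier + head: a focus particle constrains the element it modifies.
focus-particle-lex := my-particle-lex &
  [ SYNSEM.LOCAL.CAT.HEAD.MOD < [ LOCAL.CONT.HOOK.--ICONS focus ] > ].
```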
Some languages mark focus with a construction that involves the copula and a relative clause, like English "It was Kim who left." where "Kim" is focused. Since we're not otherwise handling relative clauses, these are outside the scope of this lab.
In many languages, the primary means for unambiguously marking focus is prosody (intonation). This isn't typically represented in the orthography, so we can only pseudo-model it. The plan here is to make up an affix (-FP, for "Focus Prosody") that attaches to the word bearing the focus marking. This affix should go last in the chain of lexical rules (so make its DTR value be the type of the last existing lexical rule, or a -dtr supertype inherited by the set of last existing lexical rules in case some of those are optional). It should also be optional, which can be achieved by making it lexeme-to-lexeme.
More specifically, this rule should be an infl-add-only-no-ccont-ltol-rule, and its only effect besides adding the -FP affix should be to constrain the INDEX.--ICONS to focus.
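A minimal sketch of such a rule is given below. The names focus-prosody-lex-rule, focus-prosody-suffix, and last-rule-dtr are hypothetical. The path to --ICONS here follows the HOOK.--ICONS pointer described earlier; that choice is an assumption, so check it against the new matrix.tdl and adjust if your version places the pointer elsewhere.

```tdl
; In my_language.tdl. `last-rule-dtr` is a hypothetical stand-in for
; the type of your last existing lexical rule (or its -dtr supertype).
focus-prosody-lex-rule := infl-add-only-no-ccont-ltol-rule &
  [ DTR last-rule-dtr,
    SYNSEM.LOCAL.CONT.HOOK.--ICONS focus ].

; In irules.tdl: spell the rule out as the made-up suffix -FP.
focus-prosody-suffix :=
%suffix (* -FP)
focus-prosody-lex-rule.
```

With this in place, a test item with -FP on the focused word should parse with that word's ICONS element typed as focus, while the same string without -FP should still parse with the element left unmarked.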
Once you're satisfied with the syntax of your topic and/or focus marking, take a look at the semantics. If you just look at the MRS the way we have been, you won't see any changes. This is because the ICONS feature is new, and not yet incorporated into the code that does the MRS display. (It's also not yet incorporated into the generator, and so generation will ignore the ICONS information, unfortunately.)
To see the ICONS information, you'll need to look at the feature structure for the top-most node in the tree, and then navigate to the CONT.ICONS. You should see one item on the ICONS list for every (non-semantically empty) lexical item. The ones that are marked as topic or focus should have the correct types, while the others should just be unmarked (i.e., just info-str).
Adding topic and/or focus marking is probably increasing your range of generation outputs. Unfortunately, the generator isn't paying attention to ICONS (yet), so we can't constrain this. Instead, this week we'll work on using vpm to constrain other kinds of generator output variation, e.g., multiple different tense/aspects from one input. The basic strategy is to take any underspecified values in variable properties and translate them, via vpm, to something that conflicts with any more specific values your grammar can produce.
The file semi.vpm provides a mapping between grammar-external features of indices (referential indices and events) and their values, and grammar-internal ones. For background on VPM, see the DELPH-IN wiki. As soon as you start using a VPM file, then only variable properties (features on indices) that are handled in the file are actually preserved.
(mt:read-vpm (lkb-pathname (parent-directory) "semi.vpm") :semi)
Test your semi.vpm file by parsing and then generating. You should see fewer strings coming out.
tar czf lab8.tgz *
(When I download your submission from CollectIt, it comes in a directory named with your UWNetID. The above method avoids extra directory structure inside that directory.)