Lab 8 (due 2/28 11:59 pm)

Overview

The goals for this lab are to explore a bit of the space between these small grammars and coverage of naturally occurring data, and to get the grammars ready for success in MT once we add transfer rules next week! Specifically, you will be doing the following:

Identifying and implementing one phenomenon from the small corpus you collected, if this is feasible.
Getting one more phenomenon from the MMT set working in your grammar.
- If necessary, build out your testsuite to (further) document this phenomenon.
Cleaning up generator output some more.
Harmonizing the MRSes to the extent that is reasonable.
Documenting the current status of translation from eng and sje to your language, per sentence.
... as usual, running before & after testsuites.
... as usual, writing it all up!

For tdl editing, please practice incremental development: Test as frequently as you possibly can, both by compiling the grammar and by testing specific sentences.

Choose your phenomena

From among your test corpus, find a phenomenon that seems simple but is not yet handled by your grammar. Please post to Canvas by Tuesday about what you would like to work on, including IGT, and your understanding of how that phenomenon works in your grammar. I will provide guidance on how to approach this in tdl (if feasible).

If you have a lot to do still with the MMT sentences, or don't see anything particularly feasible from the corpus, you can pick two phenomena from the MMT sentences instead of one of each.

From among the MT sentences, find a phenomenon that your grammar does not yet handle, or does not yet handle properly, and fix it. Post to Canvas by Tuesday what you're working on, with IGT, so I can provide guidance. Here are the sentences grouped by phenomena. The expectation is that for whichever phenomenon you pick this week, you'll have all of the associated sentences working (but this might not always be feasible; if not, please let me know over Canvas).

Simple transitive & intransitive clauses:
```
	Dogs sleep
	Dogs chase cars
```
Pronouns, PNG
```
	I chase you
```
Argument optionality
```
	Dogs eat
```
Sentential negation
```
	The dogs dont chase cars
```

Clausal complements

	I think that you know that dogs chase cars
	I ask whether you know that dogs chase cars

Coordination

	Cats and dogs chase cars
	Dogs chase cars and cats chase dogs
	Cats chase dogs and sleep

Polar (yes/no) questions
```
	Do cats chase dogs
```

Non-scopal modifiers (adj & PP)

	Hungry dogs eat
	Dogs in the park eat
	Dogs eat in the park

Non-verbal predicates (AP, PP, NP)

	The dogs are hungry
	The dogs are in the park
	The dogs are the cats

Possessives

	The dog s car sleeps
	My dogs sleep

wh questions

	Who sleeps
	What do the dogs chase
	What do you think the dogs chase
	Who asked what the dogs chase
	I asked what the dogs chased

Clausal modifiers

	The dog sleeps because the cat sleeps
	The dog sleeps after the cat sleeps

Build out your testsuite and iso.txt for your phenomenon

Consult your descriptive materials for the phenomenon you chose, to understand how it is expressed in your language.

Add the relevant sentences to iso.txt and to your testsuite. The testsuite should also include relevant negative examples.

(If this was already done in previous labs for your phenomenon, that is okay.)

Improve your grammar for your chosen phenomena

I expect this to be done in collaboration with me, which is why I'm asking you to post by Tuesday indicating which phenomena you are working on and how they are expressed in your language.

In some cases, I'll suggest that you get an analysis from the customization system and integrate it into your current grammar (or at least use it as a starting point). In others, I might directly suggest some tdl or a general approach.

You are welcome to start with a suggestion of how you'd like to handle it, but please don't dive into extensive implementation without discussing the approach with me.

As usual, please practice incremental development and test frequently (by compiling the grammar and testing individual sentences, as well as by running your full testsuite).

Further generation clean up

For each MMT sentence that goes through, look at the range of generator outputs. How much realized-string ambiguity are you getting? What are the sources of ambiguity?

For MMT sentences that don't go through for lack of transfer rules, try using the MMT system with your language as both input and output. How much realized-string ambiguity are you getting? What are the sources of ambiguity?

If the ambiguity relates to overgeneration (i.e. your grammar generating ungrammatical strings), we'll want to work on adding appropriate constraints. Please post to Canvas for assistance.

Likewise, if you have any ambiguity that relates to semantically empty things (affixes, words) that you didn't previously clean up, work on that some this week too. Please post to Canvas for assistance.

Harmonize MRSes and investigate need for transfer rules

In this part of the lab, your task is to try each of the items in the MMT sentences for both sje and eng as source languages and your language as the target language. (Note that sje doesn't have all of the items).

For each item that does go through, are you getting the expected output?
For each item that doesn't go through, how do the MRSes differ? Some differences might be legitimate (e.g. predications from overt pronouns v. dropped arguments, "hunger eats me"). Others might be things that could be resolved by assimilating your MRS to those from sje & eng or by refining your semi.vpm file further. In the latter cases, please make the changes (posting to Canvas for assistance as much as you need!). If it's not clear how to classify a given case, please post to Canvas for discussion.

Run both the test corpus and the testsuite

Following the same procedure as usual, do test runs over both the testsuite and the test corpus.

Collect the following information to provide in your write up:

How many items parsed?
What is the average number of parses per parsed item?
How many parses did the most ambiguous item receive?
What NEW sources of ambiguity can you identify?
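
The first three numbers can be computed mechanically once you have the per-item parse counts in a text file. Here is a minimal sketch, assuming a hypothetical tab-separated file with one line per item (item id, then number of parses). The file format and the function name `parse_stats` are assumptions for illustration, so you would need to produce such a file from your test run yourself:

```shell
# parse_stats FILE: summarize a hypothetical "item-id<TAB>parses" file.
parse_stats() {
  awk -F'\t' '
    # Count only items that received at least one parse.
    $2 > 0 { parsed++; total += $2; if ($2 > max) { max = $2; maxid = $1 } }
    END {
      printf "items parsed: %d of %d\n", parsed, NR
      if (parsed > 0) {
        printf "average parses per parsed item: %.2f\n", total / parsed
        printf "most ambiguous item: %s (%d parses)\n", maxid, max
      }
    }
  ' "$1"
}
```

Note that the average is taken over parsed items only, matching the question above. Identifying NEW sources of ambiguity still requires looking at the parses themselves.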

Write up

Your write up should be a plain text file (not .doc, .rtf or .pdf) which includes the following:

A description of how the phenomena you picked to work on are expressed in your language, including IGT. For each:
A description of your implementation of this phenomenon, including:
- A prose description of the analysis you implemented
- The specific tdl you added/changed (paste it into the file)
- IGT I can use to test the analysis
- Any questions you have/things you want me to look into.
A description of any clean up work you did to get generation down to a reasonable number of outputs, including:
- Which MMT items you worked on in this way
- What the sources of extra generation output were
- What changes you made to the grammar (described in prose and illustrated with tdl)
- Before/after numbers on how many outputs you're getting
A description of the status of each MMT item, with sje as source and with eng as source. Possible statuses:
- Works! (But document how many outputs you get)
- Doesn't work, because your language string doesn't parse and/or isn't available.
- Doesn't work, because MRSes are different. Indicate how they differ.
- Doesn't work, MRSes look the same, not sure what's going on.
A description of the performance of your final grammar for this week on the test suite and test corpus, as compared to your starting grammar (see details above).

Submit your assignment

Be sure your write up and the text-file version of your test suite are included in your grammar directory.
Likewise, make sure that tsdb/home includes four profiles:
1. Final testsuite with initial grammar for the week
2. Final testsuite with final grammar for the week
3. Test corpus with initial grammar for the week
4. Test corpus with final grammar for the week
If you're using svn, export the grammar so I don't get all your .svn files:
```
svn export yourgrammar iso-lab8
```

For git, please do the equivalent.
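
For git, one common equivalent is `git archive`, which writes out the tracked files without the `.git` directory. Here is a sketch, assuming your grammar sits at the top level of a git repository and that you want the latest commit (`HEAD`). The function name `export_git_grammar` is made up for illustration:

```shell
# export_git_grammar REPO DEST: copy the files tracked at HEAD into DEST/
export_git_grammar() {
  mkdir -p "$2" &&
  git -C "$1" archive --format=tar HEAD | tar xf - -C "$2"
}
# Usage: export_git_grammar yourgrammar iso-lab8
```

Only committed files are exported this way, so if your write up, test suite, or tsdb/home profiles are not committed, commit them first or copy them into the exported directory afterwards.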

Create a tarball:

      tar czf iso-lab8.tgz iso-lab8
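
Before uploading, it can be worth listing the tarball to confirm that it carries no version-control metadata. Here is a small sketch. The function name `check_tarball` is made up for illustration:

```shell
# check_tarball FILE: fail if the archive contains .svn or .git directories
check_tarball() {
  if tar tzf "$1" | grep -Eq '(^|/)\.(svn|git)(/|$)'; then
    echo "version-control files found in $1" >&2
    return 1
  fi
  echo "ok: no .svn or .git directories in $1"
}
# Usage: check_tarball iso-lab8.tgz
```

Running `tar tzf iso-lab8.tgz` on its own also lets you check by eye that the write up, test suite, and tsdb/home profiles are all in there.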

Upload the tarball to Canvas.
