Due: Tuesday, Mar 16th, 2010 at 11:59PM
1. Objectives and Overview
For this assignment you are asked to develop a portion of a 3-step generation system: the microplanner and the realizer.
2. Reference files
The reference files for this assignment are:
- textplan.xml: an XML file containing one text plan
- textplan.dtd: a DTD for the input file (not required)
3. Detailed instructions
Task 0:
Review the lectures on NLG. Get familiar with the SimpleNLG package.
Task 1: (5 pts)
Write a microplanner that performs the following tasks:
- read in the input file
- create an appropriate internal representation
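As a minimal sketch of Task 1, the Java code below reads an XML document with the JDK's built-in DOM parser and copies it into a simple tree of objects. The class names PlanNode and TextPlanReader, and the sample XML in main, are hypothetical; the actual element and attribute names must come from textplan.xml. The reader makes no assumption about the schema, so it only shows one possible starting point for an internal representation.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical internal representation: one PlanNode per XML element. */
class PlanNode {
    final String name;
    final Map<String, String> attributes = new LinkedHashMap<>();
    final List<PlanNode> children = new ArrayList<>();
    String text = "";

    PlanNode(String name) {
        this.name = name;
    }
}

class TextPlanReader {
    /** Parses an XML stream into a PlanNode tree without assuming a schema. */
    static PlanNode parse(InputStream in) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(in);
            return convert(doc.getDocumentElement());
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read text plan", e);
        }
    }

    /** Recursively copies an element's attributes, text, and child elements. */
    private static PlanNode convert(Element element) {
        PlanNode node = new PlanNode(element.getTagName());
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            node.attributes.put(attr.getNodeName(), attr.getNodeValue());
        }
        StringBuilder text = new StringBuilder();
        NodeList kids = element.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node kid = kids.item(i);
            if (kid instanceof Element) {
                node.children.add(convert((Element) kid));
            } else if (kid.getNodeType() == Node.TEXT_NODE) {
                text.append(kid.getNodeValue());
            }
        }
        node.text = text.toString().trim();
        return node;
    }

    public static void main(String[] args) {
        // Hypothetical sample; for the assignment, open textplan.xml with a
        // java.io.FileInputStream instead.
        String sample =
                "<plan><fact type=\"cause\"><agent>tuberculosis</agent></fact></plan>";
        PlanNode root = parse(
                new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8)));
        System.out.println(root.name + " has " + root.children.size()
                + " child element(s)");
    }
}
```

If textplan.xml declares a DOCTYPE that points at textplan.dtd, the JDK parser may try to load that file even without validation, so keeping the DTD next to the XML file is probably the simplest way to avoid a lookup failure.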
Task 2: (10 pts)
For that internal representation, create code to do some sort of:
- aggregation
- referring expression generation (e.g., pronouns)
As long as you demonstrate one example for each of these, you will get full credit. The result of this task will be a text specification. You do not necessarily need to output anything for this task. If you do, then please name that file textspec. If you do not output a file, then you’ll want to create objects of type TextSpec. Please use this name so that we can better understand your code.
NOTE: you may hard-code the mapping between plan elements and the lexicon, e.g., the concept tuberculosis and the lexical entry tuberculosis.
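One way to picture Task 2 is the rough Java sketch below, which shows one example of aggregation and one of pronoun generation. Only the class name TextSpec comes from the assignment; its fields, the Message and SentenceSpec classes, the Microplanner methods, and the sample facts are all assumptions made for illustration. The aggregation rule merges adjacent messages that share a subject and verb, and the referring expression rule uses the pronoun "it" when a sentence repeats the previous sentence's subject. Both rules are deliberate simplifications.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical message taken from the text plan: subject, verb, object. */
class Message {
    final String subject;
    final String verb;
    final String object;

    Message(String subject, String verb, String object) {
        this.subject = subject;
        this.verb = verb;
        this.object = object;
    }
}

/** Hypothetical sentence-level specification; objects may be coordinated. */
class SentenceSpec {
    final String subject;      // the entity the sentence is about
    String subjectForm;        // how that entity will be referred to
    final String verb;
    final List<String> objects = new ArrayList<>();

    SentenceSpec(String subject, String verb) {
        this.subject = subject;
        this.subjectForm = subject;
        this.verb = verb;
    }
}

/** Class name requested by the assignment; the field is an assumption. */
class TextSpec {
    final List<SentenceSpec> sentences = new ArrayList<>();
}

class Microplanner {
    /** Aggregation: merge adjacent messages with the same subject and verb. */
    static TextSpec aggregate(List<Message> messages) {
        TextSpec spec = new TextSpec();
        SentenceSpec last = null;
        for (Message m : messages) {
            boolean sameFrame = last != null
                    && last.subject.equals(m.subject)
                    && last.verb.equals(m.verb);
            if (!sameFrame) {
                last = new SentenceSpec(m.subject, m.verb);
                spec.sentences.add(last);
            }
            last.objects.add(m.object);
        }
        return spec;
    }

    /** Referring expressions: use "it" when the subject repeats. */
    static void pronominalize(TextSpec spec) {
        String previous = null;
        for (SentenceSpec s : spec.sentences) {
            if (s.subject.equals(previous)) {
                s.subjectForm = "it"; // simplification: assumes a neuter entity
            }
            previous = s.subject;
        }
    }

    public static void main(String[] args) {
        // Hypothetical facts, not taken from textplan.xml.
        List<Message> messages = Arrays.asList(
                new Message("tuberculosis", "cause", "cough"),
                new Message("tuberculosis", "cause", "fever"),
                new Message("tuberculosis", "affect", "lung"));
        TextSpec spec = aggregate(messages);
        pronominalize(spec);
        for (SentenceSpec s : spec.sentences) {
            System.out.println(s.subjectForm + " | " + s.verb + " | " + s.objects);
        }
    }
}
```

Keeping the entity (subject) separate from its surface form (subjectForm) means the realizer in Task 3 can still tell which entity a pronoun stands for.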
Task 3: (10 pts)
Use the SimpleNLG package to realize your text specification. The output of Task 3 should be a text document resembling this target text. Call your output file output. It does NOT have to match exactly, but all information from the text plan (XML file) should be expressed.
NOTE: you may use any programming language you like to accomplish Tasks 1 and 2, but Java will be the easiest since your code needs to interact with the SimpleNLG realizer.
4. Running your code
Your code should run on Patas without error. So that we can run your assignment in a semi-automated fashion, please include a single shell script file called, e.g., hw7.cmd. We will run your homework on Patas using the following command:
$ condor_submit hw7.cmd
Once we untar your assignment (see below), this shell script should be in the top level of whatever directory structure you’re using.
Within your hw7.cmd file, write your .out, .log, .error, etc. files to the top-level directory where the hw7.cmd file is. The script should call all necessary code. This way, you can use whatever language you like and whatever directory structure makes sense to you. Please refer to the detailed explanation of each assignment for what kinds of output files to produce, and what kinds of supplementary files are required. See the CLMA wiki pages for help on this.
5. How to turn in your work
Turn in your assignment using CollectIt. Please TAR your files and name the tar’d file with the extension .tar. Please don’t use ZIP, tar.gz, gzip, rar, etc.
Use the filename of whatever homework we’re on, e.g., for homework 7, name your file hw7.tar. Yes, you will all have the same filename for your homeworks, but this doesn’t matter because of the way that CollectIt handles things.
To tar (available on Patas) from the directory that your work is in:
$ tar -cvf hw7.tar *
6. Assessment
This homework is worth 25% of your total grade. Assessment criteria are explained here.