Due: Feb 9th, 2010 at 11:59PM
1. Objectives and Overview
Imagine you are writing a grammar-checker application that rejects ill-formed English sentences but accepts well-formed ones. Your task is to create a feature grammar that parses the sentences, using morphosyntactic and semantic features to distinguish ill-formed sentences from well-formed ones.
You are strongly encouraged to use NLTK, but if you wish to use other code, e.g., Java code, you may do so.
2. Inputs
The input file for this assignment is:
- sentences: a text file with 30 (mostly inane) English sentences
There is also a reference file:
- sentences_key: a text file indicating which sentences are ill-formed (just so there’s no confusion)
3. Detailed instructions
Task 0:
Prepare for the assignment by reading Chapter 9 of the NLTK book. Review the lecture notes and the relevant material from J&M.
Task 1:
Write a feature grammar that will accept all well-formed sentences, but reject all ill-formed ones. Call your grammar grammar.fcfg.
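As an illustration of the idea behind Task 1, here is a minimal sketch of a feature grammar that enforces number agreement. It is not the expected solution: the toy vocabulary, the NUM feature, and the helper is_well_formed are all made up for this example, and the API names (FeatureGrammar.fromstring, FeatureEarleyChartParser) are those of recent NLTK 3 releases, which may differ from the version installed for this course.

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureEarleyChartParser

# Toy grammar (hypothetical, not the assignment's grammar.fcfg).
# The shared variable ?n forces determiner, noun, and verb to agree in number.
GRAMMAR_TEXT = """
% start S
S -> NP[NUM=?n] VP[NUM=?n]
NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]
VP[NUM=?n] -> V[NUM=?n]
Det[NUM=sg] -> 'this'
Det[NUM=pl] -> 'these'
N[NUM=sg] -> 'dog'
N[NUM=pl] -> 'dogs'
V[NUM=sg] -> 'barks'
V[NUM=pl] -> 'bark'
"""

grammar = FeatureGrammar.fromstring(GRAMMAR_TEXT)
parser = FeatureEarleyChartParser(grammar)


def is_well_formed(sentence):
    """Return True if the toy grammar yields at least one parse."""
    tokens = sentence.split()
    try:
        # parse() returns an iterator over trees; one tree is enough here.
        return next(parser.parse(tokens), None) is not None
    except ValueError:
        # Assumption: NLTK raises ValueError for words the grammar lacks;
        # this sketch treats such sentences as ill-formed.
        return False


print(is_well_formed("these dogs bark"))
print(is_well_formed("this dogs bark"))
```

In a real submission the rules would live in grammar.fcfg rather than in a Python string; the inline string just keeps the sketch self-contained.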
Task 2:
Using nltk.parse.FeatureEarleyChartParse (or your own parser), parse each item in sentences. Print your results to a file called results, which should be in the same format as sample_results: print each parse on a single line, and output a blank line for each ill-formed sentence. Your results file should be exactly 30 lines long. NOTE: if a sentence is ambiguous, print just one parse.
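The output format described above could be produced along these lines. This is only a sketch: format_results is a hypothetical helper, and it assumes each parse has already been converted to a bracketed string (for example with str() on an NLTK tree), with None standing in for a sentence that received no parse. Check sample_results for the exact format expected.

```python
def format_results(parses):
    """Build the text of a results file from one entry per sentence.

    `parses` holds a bracketed parse string for each well-formed sentence
    and None for each ill-formed one (an assumption of this sketch).
    """
    lines = []
    for parse in parses:
        if parse is None:
            # Ill-formed sentence: emit a blank line.
            lines.append("")
        else:
            # Collapse newlines and indentation so the parse fits on one line.
            lines.append(" ".join(str(parse).split()))
    return "\n".join(lines) + "\n"


example = format_results(["(S\n  (NP (Det these) (N dogs))\n  (VP (V bark)))", None])
print(example, end="")
# The number of output lines equals the number of input sentences.
print(len(example.splitlines()))
```

Counting the lines with splitlines() before submitting is a quick way to confirm that the file has one line per input sentence.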
Task 3:
Please comment your code; include your names AND NetIDs somewhere in the main file and/or cmd script.
4. Running your code
Your code should run on Patas without error. So that we can run your assignment in a semi-automated fashion, please include a single shell script file called, e.g., hw4.cmd. We will run your homework on Patas using the following command:
$ condor_submit hw4.cmd
Once we untar your assignment (see below), this shell script should be in the top level of whatever directory structure you’re using.
In your hw4.cmd file, write your .out, .log, .error, etc. files to the top-level directory, i.e., the directory containing hw4.cmd. The script should call all necessary code. This way, you can use whatever language you like and whatever directory structure makes sense to you. Please refer to the detailed explanation of each assignment for what kinds of output files to produce and what kinds of supplementary files are required. See the CLMA wiki pages for help on this.
5. How to turn in your work
Turn in your assignment using CollectIt. Please TAR your files and name the tar’d file with the extension .tar. Please don’t use ZIP, tar.gz, gzip, rar, etc.
Name the file after the current homework, e.g., for homework 6 name your file hw6.tar. Yes, you will all have the same filename for your homework, but this doesn’t matter because of the way that CollectIt handles things.
To tar (available on Patas) from the directory that your work is in:
$ tar -cvf hw6.tar *
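Before uploading, it may be worth checking that the archive is a plain, uncompressed tar file with the command script at its top level. The sketch below is one way to do that with Python's standard tarfile module; check_submission and its default script name are hypothetical and not part of the course tooling.

```python
import tarfile


def check_submission(tar_path, script_name="hw4.cmd"):
    """Return True if tar_path is an uncompressed tar with script_name at top level."""
    # Mode "r:" reads only uncompressed archives; a compressed file
    # (e.g. tar.gz) makes tarfile.open raise tarfile.ReadError.
    with tarfile.open(tar_path, mode="r:") as archive:
        names = archive.getnames()
    # Tolerate a leading "./" in member names.
    names = [name[2:] if name.startswith("./") else name for name in names]
    return script_name in names
```

A call such as check_submission("hw4.tar") returns False if the script is missing or sits in a subdirectory.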
6. Assessment
This homework is worth 10% of your total grade. Assessment criteria are explained here.