Ling 571 - Deep Processing Techniques for NLP
Winter 2011
Homework #4: Due February 8, 2011
Goals
Through this assignment you will:
- Explore the role of features in implementing linguistic constraints.
- Identify some of the challenges in building compact constraints to define a precise grammar.
- Gain some further familiarity with NLTK.
- Apply feature-based grammars to perform grammar checking.
Background
Please review the class slides and readings in the textbook on feature-based grammars and parsing. Also, review Chapter 9 of the NLTK book for additional detail
on feature structures and feature-based parsing in NLTK.
Building a Feature-based Grammar
Based on the materials above, create a set of context-free grammar rules augmented with features that are adequate to analyze a small set of English natural language sentences.
Your grammar should produce parses for all well-formed sentences in the data file and reject all ill-formed sentences in it.
Data
The basic sentences to analyze are found in
this file. The same sentences, marked for acceptability, are found here.
Grammar Format
The grammar should be written in a format that can be read in by
nltk.data.load() and stored in a file named grammar.fcfg.
Sample grammars may be found in the NLTK Book Chapter 9 text.
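As a rough illustration of the format only, a toy grammar that enforces number agreement might look like the sketch below. The rules, the NUM feature, the vocabulary, the file name toy.fcfg, and the helper names write_grammar and load_grammar are all invented for this example and are not the grammar expected for the assignment. load_grammar assumes a recent NLTK 3 installation.

```python
import os

# Toy feature grammar in NLTK's .fcfg notation (hypothetical example).
# ?n is a variable: unification forces every NUM=?n within one rule
# to take the same value, which is how agreement is enforced.
TOY_FCFG = """\
% start S
S -> NP[NUM=?n] VP[NUM=?n]
NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]
VP[NUM=?n] -> V[NUM=?n]
Det[NUM=sg] -> 'this'
Det[NUM=pl] -> 'these'
N[NUM=sg] -> 'dog'
N[NUM=pl] -> 'dogs'
V[NUM=sg] -> 'barks'
V[NUM=pl] -> 'bark'
"""


def write_grammar(path="toy.fcfg", text=TOY_FCFG):
    """Save the grammar text to disk and return the path that was written."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path


def load_grammar(path):
    """Load a .fcfg file; NLTK chooses the reader from the file extension."""
    import nltk  # imported here so write_grammar() works without NLTK

    return nltk.data.load("file:" + os.path.abspath(path))
```

With NLTK installed, load_grammar(write_grammar()) should return a grammar object that can be handed to a feature-based parser. Under this toy grammar, "this dog barks" is intended to parse while "these dog barks" is not, because the NUM values fail to unify.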
Parsing
Create a program to parse the example sentences based on your grammar
and analyze the results. Specifically, your program should:
- Load your grammar.
- Use nltk.parse.FeatureEarleyChartParser (or your own, or a similar
available, feature-based parser) to parse the sentences.
- Write the results to a file called results.
- For each example sentence, output to the results file
- the parse of the sentence on a single line, if the sentence is grammatically well-formed, or
- a blank line if the sentence is ill-formed.
Note: You only need to print a single parse if the sentence is ambiguous.
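The steps above can be sketched roughly as follows. This is one possible outline, not a required design: the function names one_line, result_line, and parse_file are hypothetical, the input is assumed to hold one whitespace-tokenized sentence per line, and the parser calls assume NLTK 3, where parse() returns an iterator over trees and raises ValueError for words the grammar does not cover.

```python
import os
import re


def one_line(parse_text):
    """Collapse a multi-line bracketed parse onto a single line."""
    return re.sub(r"\s+", " ", parse_text).strip()


def result_line(parses):
    """Return the first parse as one line, or '' if there are no parses."""
    for tree in parses:
        return one_line(str(tree))  # only one parse is needed per sentence
    return ""


def parse_file(grammar_path, sentence_path, result_path="results"):
    """Write one output line per input sentence (assumes NLTK 3)."""
    import nltk  # imported here so the helpers above run without NLTK

    grammar = nltk.data.load("file:" + os.path.abspath(grammar_path))
    parser = nltk.parse.FeatureEarleyChartParser(grammar)
    with open(sentence_path, encoding="utf-8") as src, \
            open(result_path, "w", encoding="utf-8") as out:
        for sentence in src:
            tokens = sentence.split()  # assumption: whitespace-separated tokens
            line = ""
            if tokens:
                try:
                    line = result_line(parser.parse(tokens))
                except ValueError:
                    # Raised by NLTK when a word is missing from the grammar;
                    # such a sentence is treated as ill-formed here.
                    line = ""
            out.write(line + "\n")
```

A call such as parse_file("grammar.fcfg", "sentences.txt") would then produce the results file, where sentences.txt stands in for whatever the sentence file is actually called. Treating uncovered words as ill-formed is a design choice of this sketch. You may prefer to report them separately while debugging your lexicon.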
Files
Please name your program hw4.cmd and your output file results.
Please comment all code and remember to include your name in a comment at the
top of each file.
Testing
Your program must run on patas using:
$ condor_submit hw4.cmd
Please see the CLMA wiki pages on the basics of using the condor
cluster.
All files created by the condor run should appear in the top level of
the directory.
Handing in your work
All homework should be handed in using the class CollectIt.
Use the tar command to build a single hand-in file, named
hw#.tar where # is the number of the homework assignment and
containing all the material necessary to test your assignment. Your
hw4.cmd should be at the top level of whatever directory structure
you are using.
For example, in your top-level directory, run:
$ tar cvf hw4.tar *