Ling 571 - Deep Processing Techniques for NLP
Winter 2011
Homework #2: Due January 25, 2011
Goals
Through this assignment you will:
- Explore issues in parser design for natural language processing.
- Employ key programming concepts such as dynamic programming to create (relatively) effecient parsing algorithms.
- Improve your understanding of both Chomsky Normal Form and the CKY algorithm through implementation.
Background
Please review the class slides and readings in the textbook on Chomsky Normal Form conversion and the Cocke-Kasami-Younger algorithm.
Converting a Grammar to Chomsky Normal Form
As noted in the text, the CKY algorithm requires a grammar in Chomsky Normal Form (CNF). While it is not always intuitively clear how to write a grammar from
scratch in CNF, it is fairly straightforward to convert a context-free grammar
into a weakly equivalent grammar in CNF.
Following the approach outlined in class, implement a procedure to
convert an input grammar of the form used for the first assignment to
a new weakly equivalent grammar in CNF.
You will want to create data structures corresponding to RULE, RHS, LHS, etc.
You may use whatever programming language you like, provided that it can
be run on the CLMA cluster using condor. You may use existing implementations
of these data structures in NLTK or other NLP toolkits (e.g. the Stanford
parser), but you must implement the conversion algorithm yourself.
Implementing a CKY Parser
Based on the material in the lectures and text, develop an implementation of
the CKY algorithm that will parse input sentences using the CNF grammar
produced by the conversion procedure that you developed in the first part of
the assignment.
Your algorithm must return all parses derived for the input sentences given
the grammar.
Note: You do not need to convert output trees back out
of CNF.
Parsing with your CKY parser
The program you submit should do the following:
- Read in the original grammar.
- Convert the original grammar to CNF.
- Load the CNF grammar.
- Print out the rules of the converted grammar to a file.
- Read in the example sentences.
- For each example sentence, output to a file
- the simple bracketed structure parse(s), and
- the number of parses for that sentence.
Data
You will use the same sentences.txt file of
test inputs as used in Homework #1.
You will use an example grammar as your
initial grammar.
Files
Please name your program hw2.cmd and your output file hw2.out
Please comment all code and remember to include your name in a comment at the
top of each file.
Testing
Your program must run on patas using:
$ condor_submit hw2.cmd
Please see the CLMA wiki pages on the basics of using the condor
cluster.
All files created by the condor run should appear in the top level of
the directory.
Handing in your work
All homework should be handed in using the class CollectIt.
Use the tar command to build a single hand-in file, named
hw#.tar where # is the number of the homework assignment and
containing all the material necessary to test your assignment. Your
hw1.cmd should be at the top level of whatever directory structure
you are using.
For example, in your top-level directory, run:
$ tar cvf hw2.tar *