Ling 571 - Deep Processing Techniques for NLP
Winter 2011
Homework #2: Due January 25, 2011


Goals

Through this assignment you will:

Background

Please review the class slides and readings in the textbook on Chomsky Normal Form conversion and the Cocke-Kasami-Younger algorithm.

Converting a Grammar to Chomsky Normal Form

As noted in the text, the CKY algorithm requires a grammar in Chomsky Normal Form (CNF). While it is not always intuitively clear how to write a grammar from scratch in CNF, it is fairly straightforward to convert a context-free grammar into a weakly equivalent grammar in CNF.

Following the approach outlined in class, implement a procedure to convert an input grammar of the form used for the first assignment to a new weakly equivalent grammar in CNF.

You will want to create data structures corresponding to RULE, RHS, LHS, etc. You may use whatever programming language you like, provided that it can be run on the CLMA cluster using condor. You may use existing implementations of these data structures in NLTK or other NLP toolkits (e.g. the Stanford parser), but you must implement the conversion algorithm yourself.

Implementing a CKY Parser

Based on the material in the lectures and text, develop an implementation of the CKY algorithm that will parse input sentences using the CNF grammar produced by the conversion procedure that you developed in the first part of the assignment.

Your algorithm must return all parses derived for the input sentences given the grammar.
Note: You do not need to convert output trees back out of CNF.

Parsing with your CKY parser

The program you submit should do the following:

Data

You will use the same sentences.txt file of test inputs as used in Homework #1.

You will use an example grammar as your initial grammar.

Files

Please name your program hw2.cmd and your output file hw2.out
Please comment all code and remember to include your name in a comment at the top of each file.

Testing

Your program must run on patas using:
$ condor_submit hw2.cmd

Please see the CLMA wiki pages on the basics of using the condor cluster. All files created by the condor run should appear in the top level of the directory.

Handing in your work

All homework should be handed in using the class CollectIt. Use the tar command to build a single hand-in file, named hw#.tar where # is the number of the homework assignment and containing all the material necessary to test your assignment. Your hw1.cmd should be at the top level of whatever directory structure you are using. For example, in your top-level directory, run:
$ tar cvf hw2.tar *