Ling 571 - Deep Processing Techniques for NLP
Winter 2015
Homework #3: Due January 28, 2015
Goals
Through this assignment you will:
- Explore issues in parser design for natural language processing.
- Employ key programming concepts such as dynamic programming to create (relatively) efficient parsing algorithms.
- Improve your understanding of the CKY algorithm through implementation.
Background
Please review the class slides and readings in the textbook on the Cocke-Kasami-Younger algorithm.
Implementing a CKY Parser
Based on the material in the lectures and text, develop an implementation of
the CKY algorithm that will parse input sentences using a CNF grammar.
You may use existing implementations
of the data structures to represent the grammar in NLTK or other NLP toolkits (e.g. the Stanford
parser), but you must implement the parsing algorithm yourself.
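If you would rather not depend on a toolkit for the grammar representation, the rules can be indexed by their right-hand sides, which is the lookup CKY needs. The following is only a minimal sketch: the function name load_cnf_rules and the returned dictionaries are hypothetical, and it assumes one rule per line, quoted terminals, no '|' inside a terminal, and that the start symbol is the left-hand side of the first rule.

```python
from collections import defaultdict


def load_cnf_rules(lines):
    """Index CNF rules by right-hand side (hypothetical helper, not NLTK's API).

    Returns (start, binary_rules, lexical_rules) where
      binary_rules[(B, C)] is the set of A with a rule A -> B C
      lexical_rules[word]  is the set of A with a rule A -> 'word'
    """
    binary_rules = defaultdict(set)
    lexical_rules = defaultdict(set)
    start = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        lhs, rhs_text = (part.strip() for part in line.split("->", 1))
        if start is None:
            start = lhs  # assumption: first rule's LHS is the start symbol
        for alternative in rhs_text.split("|"):
            symbols = alternative.split()
            if len(symbols) == 2:
                binary_rules[(symbols[0], symbols[1])].add(lhs)
            elif len(symbols) == 1 and symbols[0][0] in "'\"":
                # drop the surrounding quotes from the terminal
                lexical_rules[symbols[0][1:-1]].add(lhs)
            else:
                raise ValueError(f"rule is not in CNF: {line}")
    return start, binary_rules, lexical_rules


# Toy grammar used only for illustration.
sample = """
# toy grammar
S -> NP VP
NP -> Det N | 'I'
VP -> V NP
Det -> 'the'
N -> "man"
V -> 'saw'
""".splitlines()

start, binary_rules, lexical_rules = load_cnf_rules(sample)
```

Indexing by right-hand side means that, for any pair of child labels found in the chart, the possible parent labels are a single dictionary lookup away.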
Your algorithm must return all parses derived for the input sentences given
the grammar.
Note: You do not need to convert output trees back out
of CNF.
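To make the requirements above concrete, here is a rough sketch of CKY with backpointers on a toy grammar. It is one possible organization, not a reference solution: the function name cky_parse, the dictionary-based rule tables, and the toy grammar are all assumptions made for illustration. Each chart cell stores, for every nonterminal, every way that nonterminal can be built, so that all parses can be enumerated afterwards.

```python
from collections import defaultdict


def cky_parse(words, binary_rules, lexical_rules, start="S"):
    """Return every bracketed parse of `words` under a CNF grammar.

    binary_rules[(B, C)] -> set of A for rules A -> B C
    lexical_rules[word]  -> set of A for rules A -> word
    """
    n = len(words)
    # table[(i, j)][A] lists every backpointer deriving A over words[i:j].
    table = defaultdict(lambda: defaultdict(list))

    # Width-1 spans come from the lexical rules.
    for i, word in enumerate(words):
        for lhs in lexical_rules.get(word, ()):
            table[(i, i + 1)][lhs].append(word)

    # Wider spans combine two adjacent, already-filled spans.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for b in list(table[(i, k)]):
                    for c in list(table[(k, j)]):
                        for lhs in binary_rules.get((b, c), ()):
                            table[(i, j)][lhs].append((k, b, c))

    def build(i, j, label):
        """Expand the backpointers for `label` over words[i:j] into trees."""
        trees = []
        for pointer in table[(i, j)][label]:
            if isinstance(pointer, str):  # lexical entry
                trees.append(f"({label} {pointer})")
            else:
                k, b, c = pointer
                for left in build(i, k, b):
                    for right in build(k, j, c):
                        trees.append(f"({label} {left} {right})")
        return trees

    if n == 0 or start not in table[(0, n)]:
        return []
    return build(0, n, start)


# Toy CNF grammar with a PP-attachment ambiguity (illustration only).
binary_rules = {
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
    ("VP", "PP"): {"VP"},
    ("NP", "PP"): {"NP"},
    ("Det", "N"): {"NP"},
    ("P", "NP"): {"PP"},
}
lexical_rules = {
    "I": {"NP"}, "saw": {"V"}, "the": {"Det"},
    "man": {"N"}, "telescope": {"N"}, "with": {"P"},
}

parses = cky_parse("I saw the man with the telescope".split(),
                   binary_rules, lexical_rules)
for tree in parses:
    print(tree)
print(len(parses))
```

Storing backpointers rather than finished subtrees keeps the chart itself polynomial in size; the number of trees can still grow exponentially once they are enumerated.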
Parsing with your CKY parser
The program you submit should do the following:
- Load the CNF grammar.
- Read in the example sentences.
- For each example sentence, output to a file:
  - the simple bracketed structure parse(s), and
  - the number of parses for that sentence.
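The steps above might be organized along the following lines. This is a sketch under assumptions: the assignment does not fix an exact output layout, so the "Number of parses" line, the whitespace tokenization, and the names write_results and fake_parser are all hypothetical. The parser is passed in as a function so the output logic can be tried without a real grammar.

```python
import io


def write_results(sentences, parse_fn, out):
    """Write each sentence, its bracketed parses, and the parse count.

    parse_fn takes a list of tokens and returns a list of bracketed strings.
    """
    for sentence in sentences:
        tokens = sentence.split()  # assumption: whitespace tokenization
        parses = parse_fn(tokens)
        out.write(sentence + "\n")
        for tree in parses:
            out.write(tree + "\n")
        out.write(f"Number of parses: {len(parses)}\n\n")


def fake_parser(tokens):
    """Stand-in for a real CKY parser, used only to exercise the writer."""
    if tokens == ["I", "sleep"]:
        return ["(S (NP I) (VP sleep))"]
    return []


buffer = io.StringIO()
write_results(["I sleep", "sleep I"], fake_parser, buffer)
print(buffer.getvalue(), end="")
```

In a real run, out would be the opened output file rather than an in-memory buffer.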
Files
Please adhere to the naming conventions.
Programming
Create a program named hw3_parser.{py|pl|etc} that performs CKY parsing and takes the following parameters, in this order:
- /dropbox/14-15/571/hw3/grammar_cnf.cfg: a file holding grammar rules in the NLTK .cfg format in Chomsky Normal Form.
- /dropbox/14-15/571/hw3/sentences.txt: a file containing test sentences.
- - hw3_parser.out: the output parse file from your system showing the parses and their counts over the test sentences.
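For a Python submission, the three positional parameters could be read as sketched below. The function name read_arguments and the usage message are hypothetical; the only thing taken from the assignment is the order of the parameters (grammar file, sentence file, output file).

```python
def read_arguments(argv):
    """Return (grammar_path, sentence_path, output_path) from an argv-style list.

    argv[0] is the program name, as in sys.argv.
    """
    if len(argv) != 4:
        raise SystemExit(
            "usage: hw3_parser.py <grammar_file> <sentence_file> <output_file>"
        )
    grammar_path, sentence_path, output_path = argv[1:4]
    return grammar_path, sentence_path, output_path


# Example with an explicit list; a real program would pass sys.argv.
example = read_arguments([
    "hw3_parser.py",
    "/dropbox/14-15/571/hw3/grammar_cnf.cfg",
    "/dropbox/14-15/571/hw3/sentences.txt",
    "hw3_parser.out",
])
print(example)
```

Taking the paths as arguments instead of hard-coding them lets the same program be run from the condor file and on other grammars or sentence files.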
Condor file
Please name your condor file hw3.cmd.
Write-up file
Please name your write-up readme.{txt|pdf} as appropriate.
Describe and discuss your work in a write-up file. Include problems you came across and how (or if) you were able to solve them, any insights, special features, and what you learned. Give examples if possible. If you were not able to complete parts of the project, discuss what you tried and/or what did not work.
Compare the results from your parser to those from the EarleyChartParser
used in HW#1.
Testing
Your program must run on patas using:
$ condor_submit hw3.cmd
Please see the CLMS wiki pages on the basics of using the condor
cluster.
All files created by the condor run should appear in the top level of
the directory.
Handing in your work
All homework should be handed in using the class CollectIt.
Use the tar command to build a single hand-in file, named
hw#.tar where # is the number of the homework assignment and
containing all the material necessary to test your assignment. Your
hw3.cmd should be at the top level of whatever directory structure
you are using.
For example, in your top-level directory, run:
$ tar cvf hw3.tar *