Ling 571 - Deep Processing Techniques for NLP
Winter 2017
Homework #3: Due January 24, 2017, 23:45
Goals
Through this assignment you will:
- Explore issues in parser design for natural language processing.
- Employ key programming concepts such as dynamic programming to create (relatively) efficient parsing algorithms.
- Improve your understanding of the CKY algorithm through implementation.
NOTE: You may work in teams of two (2) on this assignment. If
you do so:
- Submit the hand-in file (hw3.tar) to one teammate's CollectIt.
- Please put a note in the other teammate's CollectIt indicating where the
joint assignment should be found.
- Please include a brief discussion of each teammate's contribution in the readme.{txt|pdf} file.
Background
Please review the class slides and readings in the textbook on the Cocke-Kasami-Younger algorithm.
Implementing a CKY Parser
Based on the material in the lectures and text, develop an implementation of
the CKY algorithm that will parse input sentences using a CNF grammar.
You may use existing implementations
of the data structures to represent the grammar in NLTK or other NLP toolkits (e.g. the Stanford
parser), but you must implement the parsing algorithm yourself.
Your algorithm must return all parses derived for the input sentences given
the grammar.
Note: You do not need to convert output trees back out
of CNF.
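The chart-filling idea described above can be sketched as follows. This is a minimal illustration, not a reference solution. The function name cky_parse, the two rule dictionaries, and the toy grammar are all hypothetical, and the bracketing it prints is only one plausible reading of "simple bracketed structure". Each cell stores backpointers rather than finished trees, so every parse can be enumerated once the chart is complete.

```python
from collections import defaultdict
from itertools import product


def cky_parse(words, lexical, binary, start="S"):
    """Return every bracketed parse of `words` rooted in `start`.

    lexical: dict mapping a word to the set of A such that A -> word
    binary:  dict mapping (B, C) to the set of A such that A -> B C
    (Both layouts are assumptions made for this sketch.)
    """
    n = len(words)
    # chart[i][j] maps a nonterminal to its backpointers over words[i:j].
    chart = [[defaultdict(list) for _ in range(n + 1)] for _ in range(n + 1)]

    for j in range(1, n + 1):
        # Width-1 spans come from lexical rules; the backpointer is the word.
        for a in lexical.get(words[j - 1], ()):
            chart[j - 1][j][a].append(words[j - 1])
        # Wider spans ending at j: try every split point k.
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for b, c in product(list(chart[i][k]), list(chart[k][j])):
                    for a in binary.get((b, c), ()):
                        chart[i][j][a].append((k, b, c))

    def build(i, j, a):
        # Expand each backpointer into every tree it stands for.
        for bp in chart[i][j][a]:
            if isinstance(bp, str):
                yield f"({a} {bp})"
            else:
                k, b, c = bp
                for left in build(i, k, b):
                    for right in build(k, j, c):
                        yield f"({a} {left} {right})"

    if n == 0 or start not in chart[0][n]:
        return []
    return list(build(0, n, start))


# Hypothetical toy grammar with a PP-attachment ambiguity.
TOY_LEXICAL = {"I": {"NP"}, "saw": {"V"}, "the": {"Det"},
               "man": {"N"}, "telescope": {"N"}, "with": {"P"}}
TOY_BINARY = {("NP", "VP"): {"S"}, ("Det", "N"): {"NP"},
              ("V", "NP"): {"VP"}, ("VP", "PP"): {"VP"},
              ("P", "NP"): {"PP"}, ("NP", "PP"): {"NP"}}

for tree in cky_parse("I saw the man with the telescope".split(),
                      TOY_LEXICAL, TOY_BINARY):
    print(tree)
```

Storing backpointers keeps the chart polynomial in size even when the number of parses grows exponentially; the trees are only expanded at the end.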
Parsing with your CKY parser
The program you submit should do the following:
- Load the CNF grammar.
- Read in the example sentences.
- For each example sentence, output to a file:
  - the sentence itself,
  - the simple bracketed structure parse(s) based on your implementation of the CKY algorithm, and
  - the number of parses for that sentence.
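As a rough sketch of the per-sentence output step, the helper below writes the three items in the order listed above. The function name write_result, the "Number of parses:" label, and the blank separator line are assumptions made for illustration; toy_output.txt remains the authority on the expected layout.

```python
import io


def write_result(out, sentence, parses):
    """Write one sentence, its parses, and the parse count to `out`.

    `out` is any writable text stream. The label and the blank
    separator line are assumed, not taken from the assignment.
    """
    out.write(sentence + "\n")
    for tree in parses:
        out.write(tree + "\n")
    out.write(f"Number of parses: {len(parses)}\n\n")


# Demonstration on an in-memory stream with a made-up parse string.
demo = io.StringIO()
write_result(demo, "I saw", ["(S (NP I) (VP saw))"])
print(demo.getvalue(), end="")
```

A sentence with no parse still produces its own line and a count of 0, which keeps the output aligned with the input file.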
Programming
Create a program named hw3_parser.{py|pl|etc} that performs CKY parsing as described above and is invoked as:
hw3_parser.{py|pl|etc} <grammar_filename> <test_sentence_filename> <output_filename>
where:
- <grammar_filename> is the name of the file holding grammar rules in the NLTK .cfg format in Chomsky Normal Form.
- <test_sentence_filename> is the name of the file containing test sentences to parse with your algorithm.
- <output_filename> is the name of the file where your system will write the parses and their counts over the test sentences.
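One possible way to handle the three positional arguments in Python is sketched below using the standard argparse module. The helper names parse_args and read_sentences are hypothetical, and skipping blank lines in the sentence file is an assumption rather than a stated requirement.

```python
import argparse


def parse_args(argv=None):
    """Parse the three positional arguments from the invocation above."""
    parser = argparse.ArgumentParser(description="CKY parser driver (sketch)")
    parser.add_argument("grammar_filename",
                        help="CNF grammar in NLTK .cfg format")
    parser.add_argument("test_sentence_filename",
                        help="file of test sentences, one per line")
    parser.add_argument("output_filename",
                        help="file to receive parses and parse counts")
    return parser.parse_args(argv)


def read_sentences(lines):
    """Return the non-blank lines of an iterable, stripped of whitespace.

    Accepts an open file or any iterable of strings. Dropping blank
    lines is an assumption made for this sketch.
    """
    return [line.strip() for line in lines if line.strip()]


args = parse_args(["grammar_cnf.cfg", "sentences.txt", "hw3_output.txt"])
print(args.grammar_filename, args.test_sentence_filename, args.output_filename)
```

Calling parse_args() with no argument reads sys.argv, so the same function serves both the command line and tests.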
Files
Please adhere to the file naming conventions listed below.
Test and Example Files
All test and example files are located in /dropbox/16-17/571/hw3/ on the CL cluster (patas).
- grammar_cnf.cfg: Grammar in NLTK format, already in Chomsky Normal Form, to be used by your algorithm to parse the sentences.
- sentences.txt: Test sentences to be parsed using your
parsing program (hw3_parser.{py|pl|etc}).
- toy.cfg: Simple grammar in CNF for development.
- toy_sentences.txt: Simple set of practice sentences.
- toy_output.txt: Example output file.
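For CKY it helps to index the grammar by right-hand side, so that a pair of child categories can be looked up directly. NLTK's CFG.fromstring can read the .cfg format for you; the standard-library sketch below only illustrates the indexing idea on a simplified subset of that format (one rule per line, "|" for alternatives, quoted single-token terminals, whole-line "#" comments). The function name index_cnf_rules is hypothetical, and treating the first left-hand side as the start symbol is an assumption of this sketch.

```python
from collections import defaultdict


def index_cnf_rules(text):
    """Index CNF rules by their right-hand side.

    Returns (start, lexical, binary), where lexical maps a word to the
    set of A with A -> 'word', and binary maps (B, C) to the set of A
    with A -> B C. Handles only a simplified subset of the rule syntax.
    """
    lexical = defaultdict(set)
    binary = defaultdict(set)
    start = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and whole-line comments
        lhs, rhs_all = (part.strip() for part in line.split("->", 1))
        if start is None:
            start = lhs  # assumption: first rule names the start symbol
        for alt in rhs_all.split("|"):
            symbols = alt.split()
            if len(symbols) == 1 and symbols[0][0] in "'\"":
                lexical[symbols[0][1:-1]].add(lhs)  # strip the quotes
            elif len(symbols) == 2:
                binary[(symbols[0], symbols[1])].add(lhs)
            else:
                raise ValueError(f"not in CNF: {lhs} -> {alt.strip()}")
    return start, lexical, binary


# Made-up rules for demonstration; not the contents of toy.cfg.
SAMPLE = """
# sample rules
S -> NP VP
NP -> Det N | 'I'
VP -> V NP
Det -> 'the'
N -> 'man'
V -> 'saw'
"""
start, lexical, binary = index_cnf_rules(SAMPLE)
print(start, sorted(lexical), sorted(binary))
```

Keying the binary rules on the pair of children means each chart combination costs a single dictionary lookup instead of a scan over all productions.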
Submission Files
- hw3_parser.{py|pl|etc}: Primary program file with
language-appropriate extension.
- hw3_output.txt: Results of running your parser on
the test sentences (sentences.txt) with the corresponding grammar, grammar_cnf.cfg.
- hw3.cmd: Condor file which drives your parsing program (hw3_parser.{py|pl|etc}) with the relevant grammar, test sentences, and output file.
- readme.{txt|pdf}: Write-up file. This file should describe and discuss your work on this assignment. Include problems you came across and how (or if) you were able to solve them, any insights, special features, and what you learned. Give examples if possible. If you were not able to complete parts of the project, discuss what you tried and/or what did not work.
- hw3.tar: Your hand-in file
Handing in your work
All homework should be handed in using the class CollectIt.