University of Washington: Linguistics: Ling 571: Winter 2017: Homework #3

Ling 571 - Deep Processing Techniques for NLP
Winter 2017
Homework #3: Due January 24, 2017, 23:45

Goals

Through this assignment you will:

Explore issues in parser design for natural language processing.
Employ key programming concepts such as dynamic programming to create (relatively) efficient parsing algorithms.
Improve your understanding of the CKY algorithm through implementation.

NOTE: You may work in teams of two (2) on this assignment. If you do so:

Submit the hand-in file (hw3.tar) to one teammate's CollectIt.
Please put a note in the other teammate's CollectIt indicating where the joint assignment should be found.
Please include a brief discussion of each teammate's contribution in the readme.{txt|pdf} file.

Background

Please review the class slides and readings in the textbook on the Cocke-Kasami-Younger algorithm.

Implementing a CKY Parser

Based on the material in the lectures and text, develop an implementation of the CKY algorithm that will parse input sentences using a CNF grammar. You may use existing implementations of the data structures to represent the grammar in NLTK or other NLP toolkits (e.g. the Stanford parser), but you must implement the parsing algorithm yourself.

Your algorithm must return all parses derived for the input sentences given the grammar.
Note: You do not need to convert output trees back out of CNF.
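
As a rough illustration of the dynamic-programming idea, the chart-filling loop can be sketched as below. This is a minimal sketch under stated assumptions, not a reference solution: the function name cky_parse, the two dictionary arguments (lexical and binary), and the default start symbol "S" are all hypothetical choices made for this example, and the grammar is assumed to be in CNF already.

```python
from collections import defaultdict


def cky_parse(words, lexical, binary, start="S"):
    """Return every parse of `words` as a bracketed string (hypothetical helper).

    lexical: dict mapping a word to the set of nonterminals A with A -> word
    binary:  dict mapping a pair (B, C) to the set of nonterminals A with A -> B C
    """
    n = len(words)
    # chart[(i, j)][A] holds the bracketed subtrees of category A spanning words[i:j].
    chart = defaultdict(lambda: defaultdict(list))

    for j in range(1, n + 1):
        # Fill the diagonal cell from the lexical rules.
        word = words[j - 1]
        for a in lexical.get(word, ()):
            chart[(j - 1, j)][a].append(f"({a} {word})")

        # Fill longer spans ending at j, trying every split point k.
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for b, left_trees in chart[(i, k)].items():
                    for c, right_trees in chart[(k, j)].items():
                        for a in binary.get((b, c), ()):
                            for left in left_trees:
                                for right in right_trees:
                                    chart[(i, j)][a].append(f"({a} {left} {right})")

    # Only constituents labelled with the start symbol over the whole input count.
    return chart[(0, n)][start]
```

Storing full subtree strings in every cell keeps the sketch short, but it copies shared subtrees many times. Storing backpointers (the split point and the two child categories) and building trees only at the end is one common alternative.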

Parsing with your CKY parser

The program you submit should do the following:

Load the CNF grammar.
Read in the example sentences.
For each example sentence, output to a file:

the sentence itself
the simple bracketed structure parse(s) based on your implementation of the CKY algorithm, and
the number of parses for that sentence.
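
The three output items above could be assembled per sentence along these lines. This is only a sketch: the helper name format_result and the "Number of parses:" label are assumptions made for illustration, so check toy_output.txt for the layout actually expected.

```python
def format_result(sentence, parses):
    """Build the output text for one sentence (hypothetical helper).

    sentence: the original sentence string
    parses:   list of bracketed parse strings (possibly empty)
    """
    lines = [sentence]            # the sentence itself
    lines.extend(parses)          # one bracketed parse per line
    # The label below is an assumed format, not taken from the assignment.
    lines.append(f"Number of parses: {len(parses)}")
    return "\n".join(lines) + "\n"
```

A sentence with no parse still produces its own line followed by a count of 0, which keeps the output aligned with the input file.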

Programming

Create a program named hw3_parser.{py|pl|etc} that performs CKY parsing as described above, invoked as:
hw3_parser.{py|pl|etc} <grammar_filename> <test_sentence_filename> <output_filename>
where:

<grammar_filename> is the name of the file holding grammar rules in the NLTK .cfg format in Chomsky Normal Form.
<test_sentence_filename> is the name of the file containing test sentences to parse with your algorithm.
<output_filename> is the name of the file where your system will write the parses and their counts over the test sentences.
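
For a Python implementation, the three positional arguments could be read with the standard argparse module, as in the sketch below. The function name parse_args and the help strings are illustrative choices; only the argument names and their order come from the invocation above.

```python
import argparse


def parse_args(argv=None):
    """Parse the three positional arguments of hw3_parser (illustrative sketch)."""
    parser = argparse.ArgumentParser(
        description="Parse sentences with the CKY algorithm and a CNF grammar."
    )
    parser.add_argument("grammar_filename",
                        help="grammar rules in NLTK .cfg format, in CNF")
    parser.add_argument("test_sentence_filename",
                        help="file of test sentences to parse")
    parser.add_argument("output_filename",
                        help="file to which parses and counts are written")
    # argv=None makes argparse fall back to sys.argv[1:].
    return parser.parse_args(argv)
```

Passing an explicit list, for example parse_args(["toy.cfg", "toy_sentences.txt", "out.txt"]), makes the function easy to exercise without touching the real command line.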

Files

Please adhere to the naming conventions.

Test and Example Files

All test and example files are located in /dropbox/16-17/571/hw3/ on the CL cluster (patas).

grammar_cnf.cfg: Grammar in NLTK format, already in Chomsky Normal Form, to be used by your algorithm to parse the sentences.
sentences.txt: Test sentences to be parsed using your parsing program (hw3_parser.{py|pl|etc}).
toy.cfg: Simple grammar in CNF for development.
toy_sentences.txt: Simple set of practice sentences.
toy_output.txt: Example output file.
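
Since CKY looks rules up by their right-hand side, it can help to index the grammar that way once it is loaded. The sketch below reads the rule text directly rather than through NLTK, purely so that it is self-contained. It assumes one rule per line in the form A -> B C or A -> 'word', with alternatives separated by |, and it assumes that quoted words contain no spaces, | characters, or arrows. The name index_cnf_rules is hypothetical.

```python
from collections import defaultdict


def index_cnf_rules(grammar_text):
    """Index CNF rules by right-hand side (hypothetical helper).

    Returns (lexical, binary) where
      lexical[word]  is the set of A such that A -> 'word'
      binary[(B, C)] is the set of A such that A -> B C
    """
    lexical = defaultdict(set)
    binary = defaultdict(set)
    for raw in grammar_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and directives such as %start.
        if not line or line.startswith("#") or line.startswith("%"):
            continue
        lhs, rhs = line.split("->", 1)
        lhs = lhs.strip()
        for alternative in rhs.split("|"):
            symbols = alternative.split()
            if len(symbols) == 1 and symbols[0][0] in "'\"":
                # Terminal rule: strip the surrounding quotes.
                lexical[symbols[0][1:-1]].add(lhs)
            elif len(symbols) == 2:
                binary[(symbols[0], symbols[1])].add(lhs)
            else:
                raise ValueError(f"not a CNF rule: {lhs} -> {alternative.strip()}")
    return lexical, binary
```

If you load the grammar with NLTK instead, the same two indexes can be built by iterating over the grammar's productions.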

Submission Files

hw3_parser.{py|pl|etc}: Primary program file with language-appropriate extension.
hw3_output.txt: Results of running your parser on the test sentences (sentences.txt) with the corresponding grammar, grammar_cnf.cfg.
hw3.cmd: Condor file which drives your parsing program (hw3_parser.{py|pl|etc}) with the relevant grammar, test sentences, and output file.
- Your CKY parsing program must run on patas using:
  $ condor_submit hw3.cmd
  Please see the CLMS wiki pages on the basics of using the condor cluster.
  All files created by the condor run should appear in the top level of the directory.
readme.{txt|pdf}: Write-up file
- This file should describe and discuss your work on this assignment. Include problems you came across and how (or if) you were able to solve them, any insights, special features, and what you learned. Give examples if possible. If you were not able to complete parts of the project, discuss what you tried and/or what did not work.
hw3.tar: Your hand-in file
- Use the tar command to build a single hand-in file named hw#.tar, where # is the number of the homework assignment, containing all the material necessary to test your assignment. Your hw3.cmd should be at the top level of whatever directory structure you are using.
  For example, in your top-level directory, run:
  $ tar cvf hw3.tar *

Handing in your work

All homework should be handed in using the class CollectIt.
