University of Washington: Linguistics: Ling 571: Winter 2015: Homework #2

Ling 571 - Deep Processing Techniques for NLP
Winter 2015
Homework #2: Due January 21, 2015

Goals

Through this assignment you will:

Begin development of an automatic parser. Homework #3 will require the implementation of the CKY algorithm.
Develop and manipulate a representation for context-free grammars.
Improve your understanding of Chomsky Normal Form and weak grammatical equivalence through implementation.

Background

Please review the class slides and readings in the textbook on Chomsky Normal Form conversion.

Converting a Grammar to Chomsky Normal Form

As noted in the text, the CKY algorithm requires a grammar in Chomsky Normal Form (CNF). While it is not always intuitively clear how to write a grammar from scratch in CNF, it is fairly straightforward to convert a context-free grammar into a weakly equivalent grammar in CNF.
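
As a quick illustration of the target form, the sketch below checks whether a single rule is in CNF, that is, whether it looks like A -> B C or A -> 'a'. It assumes that rules are represented as an LHS string plus a tuple of RHS symbols and that terminals are written in quotes, as in NLTK .cfg files; the function names are hypothetical and epsilon rules are not considered.

```python
def is_terminal(symbol):
    """Assumption: terminals are quoted, e.g. 'flight' or "flight"."""
    return len(symbol) >= 2 and symbol[0] == symbol[-1] and symbol[0] in "'\""


def is_cnf_rule(lhs, rhs):
    """Return True if lhs -> rhs has the form A -> B C or A -> 'a'."""
    if is_terminal(lhs):
        return False
    if len(rhs) == 1:
        # A single RHS symbol must be a terminal.
        return is_terminal(rhs[0])
    if len(rhs) == 2:
        # Two RHS symbols must both be nonterminals.
        return not is_terminal(rhs[0]) and not is_terminal(rhs[1])
    return False


examples = [
    ("S", ("NP", "VP")),        # two nonterminals: CNF
    ("N", ("'flight'",)),       # single terminal: CNF
    ("NP", ("N",)),             # unit production: not CNF
    ("VP", ("V", "NP", "PP")),  # more than two symbols: not CNF
    ("VP", ("'to'", "VP")),     # terminal mixed with nonterminal: not CNF
]
results = [is_cnf_rule(lhs, rhs) for lhs, rhs in examples]
print(results)  # [True, True, False, False, False]
```

A check like this can also be run over your converted grammar as a sanity test before parsing with it.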

Following the approach outlined in class, implement a procedure to convert an input grammar of the form used for the first assignment to a new weakly equivalent grammar in CNF.
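
One step that CNF conversions commonly include is splitting rules whose right-hand side has more than two symbols. The sketch below shows only that step and is not necessarily the procedure presented in class; unit productions and rules that mix terminals with nonterminals need separate handling. The (lhs, rhs) tuple representation and the X1, X2, ... names for new nonterminals are assumptions made for illustration.

```python
import itertools


def binarize(rules):
    """Split every rule with more than two RHS symbols into binary rules.

    `rules` is a list of (lhs, rhs) pairs, where rhs is a tuple of symbols.
    New nonterminals are named X1, X2, ... (a hypothetical scheme; real code
    must make sure these names do not clash with symbols in the grammar).
    """
    counter = itertools.count(1)
    out = []
    for lhs, rhs in rules:
        while len(rhs) > 2:
            new_nt = "X%d" % next(counter)
            # Fold the two leftmost symbols into a fresh nonterminal.
            out.append((new_nt, rhs[:2]))
            rhs = (new_nt,) + rhs[2:]
        out.append((lhs, rhs))
    return out


converted = binarize([("VP", ("V", "NP", "PP")), ("S", ("NP", "VP"))])
for lhs, rhs in converted:
    print(lhs, "->", " ".join(rhs))
# X1 -> V NP
# VP -> X1 PP
# S -> NP VP
```

The new grammar generates the same strings as the original but assigns them different trees, which is what weak equivalence means here.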

You will want to create data structures corresponding to RULE, RHS, LHS, etc. You may use whatever programming language you like, provided that it can be run on the CLMS cluster using condor. You may use existing implementations of these data structures in NLTK or other NLP toolkits (e.g. the Stanford parser), but you must implement the conversion algorithm yourself.
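
For example, a rule could be stored as a small record holding its LHS and RHS. The following is a minimal sketch of one possible representation, not a required design. The names Rule, format_rule, and parse_line are hypothetical, and the parser assumes one LHS per line, '->' as the arrow, '|' between alternatives, '#' for comment lines, and no whitespace or '|' inside quoted terminals.

```python
from collections import namedtuple

# A rule is an LHS nonterminal plus a tuple of RHS symbols.
Rule = namedtuple("Rule", ["lhs", "rhs"])


def format_rule(rule):
    """Render a Rule back into 'LHS -> sym1 sym2 ...' form."""
    return "%s -> %s" % (rule.lhs, " ".join(rule.rhs))


def parse_line(line):
    """Parse one grammar line into a list of Rule objects.

    Blank lines and comment lines yield an empty list; a line with
    several '|'-separated alternatives yields one Rule per alternative.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return []
    lhs, _, rhs_part = line.partition("->")
    return [Rule(lhs.strip(), tuple(alt.split())) for alt in rhs_part.split("|")]


rules = parse_line("NP -> Det N | 'flights'")
print([format_rule(r) for r in rules])
# ['NP -> Det N', "NP -> 'flights'"]
```

Because namedtuples are hashable, rules stored this way can be placed in sets, which makes it easy to avoid emitting duplicate rules after conversion.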

Converting a general context-free grammar to Chomsky Normal Form

The program you submit should do the following:

Read in an original context-free grammar.
Convert this grammar to Chomsky Normal Form.
Print out the rules of the converted grammar to a file.

Files

Please adhere to the naming conventions.

Programming

Create a program named hw2_tocnf.py that converts a grammar to Chomsky Normal Form. It should take the following arguments, in this order:

- /corpora/nltk/nltk-data/grammars/large_grammars/atis.cfg: a file holding grammar rules in the NLTK .cfg format.
- cnf_grammar.cfg: the output grammar file from your system with all rules in Chomsky Normal Form.
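
The argument handling could be sketched as follows. This is only a skeleton under stated assumptions: convert_to_cnf is a hypothetical placeholder that drops blank and comment lines and otherwise passes the rules through unchanged, and the usage message wording is illustrative.

```python
import sys


def convert_to_cnf(lines):
    """Placeholder for the conversion: returns the rule lines unchanged."""
    stripped = (ln.strip() for ln in lines)
    return [ln for ln in stripped if ln and not ln.startswith("#")]


def main(argv):
    """Read the grammar named by argv[1] and write the result to argv[2]."""
    if len(argv) != 3:
        sys.stderr.write("usage: hw2_tocnf.py <input_grammar> <output_grammar>\n")
        return 1
    input_path, output_path = argv[1], argv[2]
    with open(input_path) as infile:
        rules = convert_to_cnf(infile)
    with open(output_path, "w") as outfile:
        for rule in rules:
            outfile.write(rule + "\n")
    return 0
```

In the actual script, main would be invoked as sys.exit(main(sys.argv)) under the usual if __name__ == "__main__": guard, so that the two file names given on the command line (or in hw2.cmd) are passed through in order.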

NOTE: The ATIS grammar is fairly large (193K), so consider developing your algorithm on a subset of that grammar or another small grammar like the NLTK "toy.cfg" or your HW#1 grammar.

Verification

Using your system from HW#1:

Use the original ATIS grammar to parse the sentences in /dropbox/14-15/571/hw2/test_sentences.txt. The results should be stored in original_parses.out.
Use your new cnf_grammar.cfg to parse the sentences in /dropbox/14-15/571/hw2/test_sentences.txt. The results should be stored in cnf_parses.out.

Condor file

Please name your condor file hw2.cmd.

Write-up file

Please name your write-up readme.{txt|pdf} as appropriate. Describe and discuss your work in a write-up file. Include problems you came across and how (or if) you were able to solve them, any insights, special features, and what you learned. Give examples if possible. If you were not able to complete parts of the project, discuss what you tried and/or what did not work. Also, please review the parses generated by the original grammar and those from the converted CNF grammar. Provide a brief discussion of similarities and differences.

Testing

Your program must run on patas using:
$ condor_submit hw2.cmd

Please see the CLMS wiki pages on the basics of using the condor cluster. All files created by the condor run should appear in the top level of the directory.

Handing in your work

All homework should be handed in using the class CollectIt. Use the tar command to build a single hand-in file, named hw#.tar where # is the number of the homework assignment, containing all the material necessary to test your assignment. Your hw2.cmd should be at the top level of whatever directory structure you are using. For example, in your top-level directory, run:
$ tar cvf hw2.tar *
