Ling 571 - Deep Processing Techniques for NLP
Winter 2017
Homework #2: Due January 17, 2017, 23:45
Goals
Through this assignment you will:
- Begin development of an automatic parser. Homework #3 will require
the implementation of the CKY algorithm.
- Develop and manipulate a representation for context-free grammars.
- Improve your understanding of Chomsky Normal Form and weak grammatical
equivalence through implementation.
Background
Please review the class slides and readings in the textbook on Chomsky Normal Form conversion.
Converting a Grammar to Chomsky Normal Form
As noted in the text, the CKY algorithm requires a grammar in Chomsky Normal Form (CNF). While it is not always intuitively clear how to write a grammar from
scratch in CNF, it is fairly straightforward to convert a context-free grammar
into a weakly equivalent grammar in CNF.
Following the approach outlined in class, implement a procedure to
convert an input grammar of the form used for the first assignment to
a new weakly equivalent grammar in CNF.
You will want to create data structures corresponding to RULE, RHS, LHS, etc.
You may use whatever programming language you like, provided that it can
be run on the CLMS cluster using condor. You may use existing implementations
of these data structures in NLTK or other NLP toolkits (e.g. the Stanford
parser), but you must implement the conversion algorithm yourself.
Converting a general context-free grammar to Chomsky Normal Form
The program you submit should do the following:
- Read in an original context-free grammar.
- Convert this grammar to Chomsky Normal Form.
- Print out the rules of the converted grammar to a file.
Programming
Create a program named hw2_tocnf.{py|pl|java|....etc} to
perform the conversion described above invoked as:
hw2_tocnf.<ext> <input_grammar_file> <output_grammar_file>
where
- <input_grammar_file> is the name of the file holding the grammar to convert to Chomsky Normal Form. The file will be in NLTK context-free grammar format.
- <output_grammar_file> is the name of the output file for the
CNF conversion. The file should be in the NLTK grammar format, but now
in Chomsky Normal Form.
Verification & Parse Comparison
Using your system from HW#1, you will parse a set of sentences with
- a general NLTK context-free grammar, and
- a weakly equivalent Chomsky Normal Form context-free grammar, created by
your CNF conversion program.
Note: You will need to run your code yourself in both these conditions
with the files specified below. You will include the output parse files
in your submission tar file. You do *not* need to run this code as
part of your condor file.
Files
Please adhere to the naming conventions.
Test, Validation, and Example Files
- atis.cfg: CFG grammar to test your CNF conversion program. The file is located on patas in /corpora/nltk/nltk-data/grammars/large_grammars/.
-
NOTE: The ATIS grammar is fairly large (193K), so consider
developing your algorithm on a subset of that grammar or another small grammar
like the NLTK "toy.cfg" or the HW#1 grammar.
- sentences.txt: Set of sentences for validation of your CNF conversion. These sentences will be parsed
by the ATIS grammar using your HW#1 system, both with the original atis.cfg and with your CNF-converted version of the ATIS grammar.
- toy.cfg: Toy example NLTK-format grammar file. Other examples can be found on patas under /corpora/nltk/nltk-data/grammars/
Submission Files
- hw2_tocnf.py: Primary program file
- hw2_grammar_cnf.cfg: Results of running your CNF conversion program on the atis.cfg grammar file.
- hw2_orig_output.txt: Results of parsing the test sentences using the original atis.cfg grammar file. This is validation file #1.
- hw2_cnf_output.txt: Results of parsing the test sentences using your CNF-converted grammar file (hw2_grammar_cnf.cfg above). This is validation file #2.
- hw2.cmd: Condor file which drives your parser.
- readme.{txt|pdf}: Write-up file
-
This file should describe and discuss your work on this assignment. Include problems you came across and how (or if) you were able to solve them, any insights, special features, and what you learned. Give examples if possible. If you were not able to complete parts of the project, discuss what you tried and/or what did not work.
NOTE:Also, please review the parses generated by the original grammar and
those from the converted CNF grammar. Provide a brief discussion of
similarities and differences.
- hw2.tar: Your hand-in file
Handing in your work
All homework should be handed in using the class CollectIt.