Homework 2 ========== Due: ???, 2010 at 11:59PM **1. Objectives and Overview** The goal is to implement the CKY algorithm for a non-probabilistic context free grammar and to test the parser on a set of input sentences. A subgoal will be to transform an arbitrary CFG into Chomsky Normal Form. To prepare for this assignment, read chapter 13 of Jurafsky and Martin (2nd Edition) and be sure you understand the CKY algorithm in Section 13.4. Note in particular why a CNF grammar is needed for the CKY algorithm. **2. Inputs** The input files for this assignment are: - `sentences `_: a text file with 20 English sentences (same as from hw1) - `grammar.cfg `_: a text file with a non-CNF context-free grammar used to parse the 20 sentences **3. Detailed instructions** **Task 0:** Review the lecture and relevant reading material on CKY parsing. **Task 1:** Create a function or class that transforms ``grammar.cfg`` to Chomsky Normal Form. Its only required argument should be a grammar object. This function should return a new instance of a grammar object, but in Chomsky Normal Form. Please print the new CNF grammar to a file called ``grammar.cnf``. Hint: you will need classes such as Production, LHS, RHS, etc. **Task 2:** Create a ``CKYParser`` class that will take a context-free grammar in CNF and return all possible parses for a give sentence file. Technical requirements: Please include at least these methods for the CKYParser class: 1. a constructor which takes a grammar and initializes an instance of CKYParser 2. ``nbest_parse`` which takes a list of word tokens as input and returns a list of parse trees for that input. You may create other supplementary classes/functions. **Task 3:** Please comment your code; include your names somewhere in the main script file. **4. Running your code** Your code should run on Patas without error. And in order for us to run your assignment in a semi-automated fashion, please include a single shell script file called, e.g., ``hw2.sh``. We will run your homework on Patas using the following command: ``$ sh hw2.sh`` This shell script should be in the top level of whatever directory structure you're using. The script should call all necessary code. This way, you can use whatever language you like and whatever directory structure makes sense to you. Please refer to the detailed explanation of each assignment for what kinds of output files to produce, and what kinds of supplementary files are required. **5. How to turn in your work** Turn in your assignment using `CollectIt `_. Please TAR your files and name the tar'd file with the extension ``.tar``. Please don't use ZIP, tar.gz, gzip, rar, etc. Use the filename of whatever homework we're on, e.g. for homework 6 name your file ``hw6.tar``. Yes you will all have the same filename for your homeworks, but this doesn't matter because of the way that CollectIt handles things. To tar (available on Patas) from the directory that your work is in: ``$ tar -cvf hw6.tar *`` **6. Assessment** This homework is worth 15% of your total grade. Assessment criteria are explained `here `_.