Previous topic

Homework 1

Next topic

Readings

This Page

Homework 2

Due: Jan 19th, 2010 at 11:59PM

1. Objectives and Overview

The goal is to implement the CKY algorithm for a non-probabilistic context free grammar and to test the parser on a set of input sentences. A subgoal will be to transform an arbitrary CFG into Chomsky Normal Form. To prepare for this assignment, read chapter 13 of Jurafsky and Martin (2nd Edition) and be sure you understand the CKY algorithm in Section 13.4. Note in particular why a CNF grammar is needed for the CKY algorithm.

2. Inputs

The input files for this assignment are:

  • sentences: a text file with 20 English sentences (same as from hw1)
  • grammar.cfg: a text file with a non-CNF context-free grammar used to parse the 20 sentences

3. Detailed instructions

Task 0:

Review the lecture and relevant reading material on CKY parsing.

Task 1:

Using any programming language you like–you don’t have to use Python–create a function or class that transforms grammar.cfg to Chomsky Normal Form. Its only required argument should be a grammar object. This function should return a new instance of a grammar object, but in Chomsky Normal Form. Please print the new CNF grammar to a file called grammar.cnf. Please use the naming scheme discussed in the course slides.

Hint: you will need classes such as Production, LHS, RHS, etc. You may use the NLTK’s implementation as a guide and/or use those classes directly if you’re using Python. If you find code (e.g., in Java) that does the same thing, feel free to use it. The only requirement is that the core of the 2CNF conversion code be your own.

Task 2:

Create a CKYParser class that will take a context-free grammar in CNF and return all possible parses for a give sentence file. Again, you may use any utility code you find, but please use your own code for the core of the CKY algorithm, as given in the course slides or textbook. NOTE: you do not need to unCNF the final parse trees.

Task 3: Create a script called hw2.cmd. The script should call code that performs the following tasks:

  • process grammar.cfg and sentences to output grammar.cnf
  • Load the grammar.cnf
  • Read sentences line by line
  • Parse each sentence using the CKY algorithm
  • Print the simple bracketed structure(s) for each parsed sentence, followed by the number of parses for that sentence, to a file called hw2.out.

(See sample.out for the appropriate output format.)

Task 4: Please comment your code; include your names somewhere in the main script file.

4. Running your code

Your code should run on Patas without error. And in order for us to run your assignment in a semi-automated fashion, please include a single shell script file called, e.g., hw2.cmd. We will run your homework on Patas using the following command:

$ condor_submit hw2.cmd

Once we untar your assignment (see below), this shell script should be in the top level of whatever directory structure you’re using.

Within your hw2.cmd file write your .out, .log, .error, etc, files to the top-level directory where the hw2.cmd file is. The script should call all necessary code. This way, you can use whatever language you like and whatever directory structure makes sense to you. Please refer to the detailed explanation of each assignment for what kinds of output files to produce, and what kinds of supplementary files are required. See the CLMA wiki pages for help on this.

5. How to turn in your work

Turn in your assignment using CollectIt. Please TAR your files and name the tar’d file with the extension .tar. Please don’t use ZIP, tar.gz, gzip, rar, etc.

Use the filename of whatever homework we’re on, e.g. for homework 6 name your file hw6.tar. Yes you will all have the same filename for your homeworks, but this doesn’t matter because of the way that CollectIt handles things.

To tar (available on Patas) from the directory that your work is in:

$ tar -cvf hw6.tar *

6. Assessment

This homework is worth 15% of your total grade. Assessment criteria are explained here.