Ling 571 - Deep Processing Techniques for NLP
Winter 2017
Homework #4: Due January 31, 2017, 23:45


Goals

Through this assignment you will:

NOTE: You may work in teams of two (2) on this assignment. If you do so:

Background

Please review the class slides and readings in the textbook on the probabilistic Cocke-Kasami-Younger algorithm, optimization, and evaluation. Additional slides on the homework itself may be found here.

1: Inducing a Probabilistic Context-free Grammar

Based on the material in the lectures and text, implement a procedure that takes a set of context-free grammar parses of sentences (a small treebank) and induces a probabilistic context-free grammar from them.

Your algorithm must create a grammar of the form:
A -> B C [0.38725]
All productions must have an associated probability.
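As a rough illustration of the induction step, the sketch below counts each production in a set of bracketed parses and divides by the total count for its left-hand side (the usual maximum likelihood estimate). It is only a sketch under stated assumptions: the treebank is taken to hold one Penn-style bracketed parse per line, terminals are written unquoted, and the names tokenize, read_tree, count_rules, induce_pcfg and format_rule are hypothetical helpers, not part of the assignment.

```python
from collections import Counter


def tokenize(line):
    """Split a bracketed parse into '(' , ')' and symbol tokens."""
    return line.replace("(", " ( ").replace(")", " ) ").split()


def read_tree(tokens, pos=0):
    """Read one tree starting at tokens[pos] == '('.

    Returns (tree, next_pos), where a tree is (label, children) and each
    child is either another tree or a terminal string.
    """
    assert tokens[pos] == "("
    label = tokens[pos + 1]
    pos += 2
    children = []
    while tokens[pos] != ")":
        if tokens[pos] == "(":
            child, pos = read_tree(tokens, pos)
        else:
            child, pos = tokens[pos], pos + 1
        children.append(child)
    return (label, children), pos + 1


def count_rules(tree, counts):
    """Add one count for every production used in the tree."""
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        if isinstance(child, tuple):
            count_rules(child, counts)


def induce_pcfg(tree_lines):
    """Return {(lhs, rhs): probability} with P = count(rule) / count(lhs)."""
    counts = Counter()
    for line in tree_lines:
        if line.strip():
            tree, _ = read_tree(tokenize(line))
            count_rules(tree, counts)
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}


def format_rule(rule, prob):
    """Render a rule in the 'A -> B C [0.38725]' form (terminals unquoted: an assumption)."""
    lhs, rhs = rule
    return f"{lhs} -> {' '.join(rhs)} [{prob:.5f}]"
```

With this estimate the probabilities of all productions sharing a left-hand side sum to one, which is a quick sanity check to run on the output grammar.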

Specifically, the program should:

Programming 1

Create a program named hw4_topcfg.{py|pl|etc} to perform PCFG induction invoked as:
hw4_topcfg.{py|pl|etc} <treebank_filename> <output_PCFG_file>, where:

2: Converting from CKY to Probabilistic CKY

Implement a probabilistic version of the CKY parsing algorithm. Given a probabilistic context-free grammar and an input string, the algorithm should return the highest probability parse tree for that input string.

You should follow the approach outlined in the textbook and course notes. You may adapt the CKY implementation that you created for HW#3. You may use any language that you like, provided that it can be run on the CL cluster.
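The following minimal sketch shows one way the probabilistic chart could be filled. It is not the required implementation and rests on several assumptions: the grammar is in Chomsky Normal Form and written in the "A -> B C [p]" format above, terminals are unquoted, the start symbol is S, and scores are kept as log probabilities to avoid underflow. The names load_pcfg, pcky and build are hypothetical.

```python
import math
from collections import defaultdict


def load_pcfg(lines):
    """Read 'A -> B C [p]' / 'A -> word [p]' rules into lookup tables."""
    lexical = defaultdict(list)  # word   -> [(A, log p)]
    binary = defaultdict(list)   # (B, C) -> [(A, log p)]
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rule, prob = line.rsplit("[", 1)
        logp = math.log(float(prob.rstrip("]")))
        lhs, rhs = rule.split("->")
        lhs, rhs = lhs.strip(), rhs.split()
        if len(rhs) == 1:
            lexical[rhs[0]].append((lhs, logp))
        elif len(rhs) == 2:
            binary[(rhs[0], rhs[1])].append((lhs, logp))
        else:
            raise ValueError(f"rule is not in CNF: {line}")
    return lexical, binary


def build(best, i, j, label):
    """Follow backpointers to produce a bracketed parse string."""
    _, back = best[(i, j)][label]
    if isinstance(back, str):  # lexical entry: backpointer is the word
        return f"({label} {back})"
    k, left, right = back
    return f"({label} {build(best, i, k, left)} {build(best, k, j, right)})"


def pcky(words, lexical, binary, start="S"):
    """Return (best parse, log probability), or None if no parse exists."""
    n = len(words)
    # best[(i, j)][A] = (log prob, backpointer) for the best A over words[i:j]
    best = defaultdict(dict)
    for i, word in enumerate(words):
        for a, logp in lexical.get(word, []):
            cell = best[(i, i + 1)]
            if a not in cell or logp > cell[a][0]:
                cell[a] = (logp, word)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell = best[(i, j)]
            for k in range(i + 1, j):
                for b, (lb, _) in best[(i, k)].items():
                    for c, (lc, _) in best[(k, j)].items():
                        for a, logp in binary.get((b, c), []):
                            score = logp + lb + lc
                            # keep only the highest-scoring way to build A here
                            if a not in cell or score > cell[a][0]:
                                cell[a] = (score, (k, b, c))
    if start not in best[(0, n)]:
        return None
    return build(best, 0, n, start), best[(0, n)][start][0]
```

The only change from a plain CKY recognizer is that each cell stores, for every nonterminal, the best score and a backpointer rather than a simple membership flag.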

Specifically, your program should:

Programming 2

Create a program named hw4_parser.{py|pl|etc} to perform PCKY parsing invoked as:
hw4_parser.{py|pl|etc} <input_PCFG_file> <test_sentence_filename> <output_parse_filename>, where:

3: Evaluating the PCKY parser

Use the evalb program to evaluate your parser. The executable may be found in ~/dropbox/16-17/571/hw4/tools/ along with the required parameter file. It should be run as:
$dir/evalb -p $dir/COLLINS.prm <gold_standard_parse_file> <hypothesis_parse_file>
where

4, 5: Improving the parser

You will also need to improve your baseline parser. You can improve the parser either by:

Re-run the evaluation script on your new parses to demonstrate your improvement.

Files

Training, Test, Evaluation, Example Data

You will use the following files, derived from the ATIS subset of the Penn Treebank as described in class. All files can be found on patas in /dropbox/16-17/571/hw4/data/, unless otherwise mentioned:

Submission Files

Handing in your work

All homework should be handed in using the class CollectIt.