University of Washington: Linguistics: Ling 571: Winter 2015: Homework #1

Ling 571 - Deep Processing Techniques for NLP
Winter 2015
Homework #1: Due January 14, 2015: 11:59pm

Goals

Through this assignment you will:

Explore the basics of context-free grammar design.
Identify some of the challenges in building grammars for natural languages.
Begin to gain some familiarity with the Natural Language Toolkit (NLTK).
Gain some experience with the cluster and condor.

Background

Please review the class slides and readings in the textbook on context-free grammars. Also, see Section 8.3 of the NLTK Book for examples of how to write grammars and configure the included parsers. We'll get to the later parts of that chapter soon.

Building a Grammar

Based on the text and class notes, create a set of context-free grammar rules that are adequate to analyze a small set of English natural language sentences.

Your grammar should be able to produce parses for all of the example sentences (as well as other similar English sentences). The grammar should capture the major clause types (S, FRAG, etc.), the major phrase types (NP, VP, PP, etc.), the parts of speech (POS) (NN, VBZ, etc.), and any punctuation or special symbols. The phrase and POS types specified in the Jurafsky and Martin text (Ch. 12 and inside front cover) provide a good basis for your grammar.

You may hard-code capitalization.
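
As a rough illustration of the rule format, the sketch below defines a toy grammar and loads it with nltk.CFG.fromstring. The rules, the vocabulary, and the PUNC symbol are invented for this example and are not part of the assignment; a grammar for the homework sentences will need many more rules. Listing both 'the' and 'The' as determiners is one way to hard-code capitalization.

```python
import nltk

# Toy grammar for illustration only; every rule and word here is an assumption.
# Terminals are quoted; alternatives are separated by "|".
GRAMMAR_TEXT = """
S -> NP VP PUNC
NP -> DT NN | NP PP
VP -> VBZ NP | VP PP
PP -> IN NP
DT -> 'the' | 'The' | 'a'
NN -> 'dog' | 'cat' | 'park'
VBZ -> 'sees'
IN -> 'in'
PUNC -> '.'
"""

# The left-hand side of the first rule (S) becomes the start symbol.
grammar = nltk.CFG.fromstring(GRAMMAR_TEXT)

for production in grammar.productions():
    print(production)
```

The same rule text, without the surrounding Python, could be saved in a .cfg file.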

Parsing

Create a program to parse the example sentences based on your grammar and analyze the results. Specifically, your program should:

Load your grammar.
Build a parser for your grammar using nltk.parse.EarleyChartParser.
Read in the example sentences.
For each example sentence, output to a file:

the simple bracketed structure parse(s), and
the number of parses for that sentence.

Finally, print the average number of parses per sentence obtained by your grammar.
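
The steps above might be sketched as follows. This is one possible outline under stated assumptions, not a reference solution: the function name parse_file, the whitespace tokenization, and the exact layout of the output file are all choices made for this example.

```python
import nltk


def parse_file(grammar_path, sentences_path, output_path):
    """Write each sentence's parses and parse count; return the average count."""
    # Load the grammar rules from a text file in NLTK's CFG format.
    with open(grammar_path) as grammar_file:
        grammar = nltk.CFG.fromstring(grammar_file.read())
    parser = nltk.parse.EarleyChartParser(grammar)

    total_parses = 0
    sentence_count = 0
    with open(sentences_path) as sentences, open(output_path, "w") as out:
        for line in sentences:
            # Assumption: tokens (including punctuation) are whitespace-separated.
            tokens = line.split()
            if not tokens:
                continue  # skip blank lines
            try:
                trees = list(parser.parse(tokens))
            except ValueError:
                # NLTK raises ValueError when a word is not covered by the grammar.
                trees = []
            out.write(line.strip() + "\n")
            for tree in trees:
                # Collapse the pretty-printed tree onto one bracketed line.
                out.write(" ".join(str(tree).split()) + "\n")
            out.write("Number of parses: %d\n\n" % len(trees))
            total_parses += len(trees)
            sentence_count += 1

    return total_parses / sentence_count if sentence_count else 0.0
```

Calling print(parse_file("hw1_grammar.cfg", "sentences.txt", "hw1.out")) would then report the average number of parses per sentence. Sentences with no parse count as zero in this sketch, which lowers the average.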

Files

Please adhere to the naming conventions.

Programming

Create a program named hw1_parse.py to perform the parsing as described above, taking the following parameters in the order listed:

- hw1_grammar.cfg: the file holding your grammar rules in the NLTK .cfg format.
- sentences.txt: the file holding the input sentences to parse, one per line.
- hw1.out: the output file for your system.
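
One way to read these three positional parameters is with argparse from the Python standard library. The sketch below is only a suggestion; the helper name parse_args and the argument names grammar_file, sentence_file, and output_file are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Read the grammar, sentence, and output paths in the required order."""
    parser = argparse.ArgumentParser(description="Parse sentences with a CFG.")
    parser.add_argument("grammar_file", help="grammar rules in NLTK .cfg format")
    parser.add_argument("sentence_file", help="input sentences, one per line")
    parser.add_argument("output_file", help="file to write the parses to")
    # With argv=None, argparse falls back to the real command line (sys.argv).
    return parser.parse_args(argv)
```

With this in place, a run such as `python hw1_parse.py hw1_grammar.cfg sentences.txt hw1.out` would make the three paths available as args.grammar_file, args.sentence_file, and args.output_file.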

Condor file

Please name your condor file hw1.cmd.

Write-up file

Please name your write-up readme.{txt|pdf} as appropriate. Describe and discuss your work in a write-up file. Include problems you came across and how (or if) you were able to solve them, any insights, special features, and what you learned. Give examples if possible. If you were not able to complete parts of the project, discuss what you tried and/or what did not work.

Testing

Your program must run on patas using:
$ condor_submit hw1.cmd

Please see the CLMS wiki pages on the basics of using the condor cluster. All files created by the condor run should appear in the top level of the directory.

Handing in your work

All homework should be handed in using the class CollectIt. Use the tar command to build a single hand-in file, named hw#.tar where # is the number of the homework assignment and containing all the material necessary to test your assignment. Your hw1.cmd should be at the top level of whatever directory structure you are using. For example, in your top-level directory, run:
$ tar cvf hw1.tar *
