Ling 571 - Deep Processing Techniques for NLP
Winter 2015
Homework #5: Due February 11, 2015
Goals
Through this assignment you will:
- Explore the role of features in implementing linguistic constraints.
- Identify some of the challenges in building compact constraints to define a precise grammar.
- Gain some further familiarity with NLTK.
- Apply feature-based grammars to perform grammar checking.
Background
Please review the class slides and readings in the textbook on feature-based grammars and parsing. Also, review Chapter 9 of the NLTK book for additional detail
on feature structures and feature-based parsing in NLTK.
Building a Feature-based Grammar
Based on the materials above, create a set of context-free grammar
rules augmented with features that is adequate to analyze a small
set of English sentences.
Your grammar should produce parses for all well-formed sentences in the sentence file and reject all ill-formed sentences in that file.
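As a rough illustration of how features can enforce a linguistic constraint, the sketch below builds a toy grammar in which a NUM feature must unify across determiner, noun, and verb. The rules, the vocabulary, and the helper name is_grammatical are invented for this example; they are not the assignment grammar, and your own grammar will need more features than this.

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureEarleyChartParser

# Toy grammar (illustrative only): NUM must agree within the NP
# and between the subject NP and the VP.
TOY_GRAMMAR = """
% start S
S -> NP[NUM=?n] VP[NUM=?n]
NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]
VP[NUM=?n] -> V[NUM=?n]
Det[NUM=sg] -> 'this'
Det[NUM=pl] -> 'these'
N[NUM=sg] -> 'dog'
N[NUM=pl] -> 'dogs'
V[NUM=sg] -> 'barks'
V[NUM=pl] -> 'bark'
"""

grammar = FeatureGrammar.fromstring(TOY_GRAMMAR)
parser = FeatureEarleyChartParser(grammar)


def is_grammatical(sentence):
    """Return True if the toy grammar yields at least one parse.

    Assumes whitespace tokenization and that every word is in the toy lexicon.
    """
    tokens = sentence.split()
    return next(iter(parser.parse(tokens)), None) is not None
```

With this grammar, "this dog barks" and "these dogs bark" receive a parse, while "these dog barks" is rejected because the NUM values fail to unify.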
Parsing
Create a program to parse the example sentences based on your grammar
and analyze the results. Specifically, your program should:
- Load your grammar.
- Use nltk.parse.FeatureEarleyChartParser (or your own parser, or
a similar available feature-based parser) to parse the sentences.
- Write the results to a file.
- For each example sentence, output to the file:
  - the parse of the sentence on a single line, if the sentence is grammatically well-formed, or
  - a blank line if the sentence is ill-formed.
Note: If the sentence is ambiguous, you only need to print a single parse.
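The output rules above might be sketched as follows. This is one possible approach, not a required design: the function names are hypothetical, sentences are tokenized by whitespace, and a sentence containing a word missing from the grammar is treated as ill-formed (recent NLTK chart parsers raise ValueError in that case). The parser argument is expected to be something like nltk.parse.FeatureEarleyChartParser(grammar), whose parse() method returns an iterator over trees in NLTK 3.

```python
def first_parse_line(parser, tokens):
    """Return the first parse as a single line, or '' if there is no parse."""
    try:
        # Only the first parse is needed, even for ambiguous sentences.
        tree = next(iter(parser.parse(tokens)), None)
    except ValueError:
        # Assumption: words not covered by the grammar make the sentence ill-formed.
        tree = None
    if tree is None:
        return ""
    # Collapse the multi-line tree printout onto one line.
    return " ".join(str(tree).split())


def write_results(parser, sentences, out):
    """Write exactly one line per input sentence to the open file object out."""
    for sentence in sentences:
        out.write(first_parse_line(parser, sentence.split()) + "\n")
```

Writing one line per sentence, blank or not, keeps the output file aligned line by line with the input sentence file.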
Files
Please adhere to the naming conventions below:
Example and Test Data Files
All data and example files may be found in /dropbox/14-15/571/hw5/.
- feature_sentences.txt: Basic set of sentences to analyze.
- feature_sentences_key.txt: Same set of sentences, but marked for acceptability.
- example_grammar.fcfg: Toy grammar file in NLTK format with features.
- example_results: Example formatted output file.
- example_sents: Sentence file corresponding to the output file above.
Your grammar file
hw5_features.fcfg: This file should contain the
grammar rules with feature augmentations required to parse the acceptable
sentences in the test set and reject the ungrammatical ones.
Grammar Format
The grammar should be written in a format that can be read in by
nltk.data.load(). The .fcfg extension
tells NLTK how to read in your grammar. Sample grammars may be found in the NLTK Book Chapter 9 text, and in the mini example file referenced above.
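A minimal sketch of loading a grammar file from an arbitrary location is shown below. The helper name load_feature_grammar is hypothetical. The "file:" prefix is used so that NLTK reads the given path directly instead of searching its data directories, and the format is passed explicitly rather than relying on the extension.

```python
import os

import nltk


def load_feature_grammar(path):
    """Load a feature grammar (.fcfg) from a local file path."""
    # "file:" makes nltk.data.load read the local file at this path.
    url = "file:" + os.path.abspath(path)
    # cache=False so that edits to the grammar file are picked up on reload.
    return nltk.data.load(url, format="fcfg", cache=False)
```

The returned object can then be passed to a feature-based parser such as nltk.parse.FeatureEarleyChartParser.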
Feature-based Parsing
Create a program hw5_parser.py that reads in your
grammar, parses test sentences, and produces results as outlined above. It
should take the parameters as specified below:
- grammar.fcfg: The grammar file you created above
covering the example sentences in .fcfg format.
- Test sentence file: The sentences to parse, one sentence per line.
- hw5_results.out: The output file with the results of
parsing each of the input sentences.
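One way to read these three parameters is sketched below with argparse. The argument names grammar_file, sentence_file, and output_file are made up for this example, and the sketch assumes the parameters are given positionally in the order listed above.

```python
import argparse


def build_arg_parser():
    """Build a command-line parser for the three positional parameters."""
    arg_parser = argparse.ArgumentParser(
        description="Parse sentences with a feature-based grammar."
    )
    # Hypothetical argument names; the order follows the list above.
    arg_parser.add_argument("grammar_file", help="grammar in .fcfg format")
    arg_parser.add_argument("sentence_file", help="sentences, one per line")
    arg_parser.add_argument("output_file", help="file to write the parses to")
    return arg_parser
```

For example, build_arg_parser().parse_args(["hw5_features.fcfg", "feature_sentences.txt", "hw5_results.out"]) yields an object whose grammar_file, sentence_file, and output_file attributes hold those three strings.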
Write-up
Include a file readme.{txt|pdf} to describe and discuss your work. Include problems you came across and how (or if) you
were able to solve them, any insights, special features, and what you learned. Give examples if possible.
If you were not able to complete parts of the project, discuss what you tried and/or what did not work.
This will allow you to receive maximum credit for partial work.
Testing
Your program must run on patas using:
$ condor_submit hw5.cmd
Please see the CLMS wiki pages on the basics of using the condor
cluster.
All files created by the condor run should appear in the top level of
the directory.
Handing in your work
All homework should be handed in using the class CollectIt.
Use the tar command to build a single hand-in file named
hw#.tar, where # is the number of the homework assignment. The file
should contain all the material necessary to test your assignment. Your
hw5.cmd should be at the top level of whatever directory structure
you are using.
For example, in your top-level directory, run:
$ tar cvf hw5.tar *