Ling 571 - Deep Processing Techniques for NLP
Winter 2015
Homework #9: Due 11:59 March 18, 2015
Goals
Through this assignment you will:
- Explore issues in pronominal anaphora resolution.
- Gain familiarity with syntax-based resolution techniques.
- Analyze the effectiveness of the Hobbs algorithm by applying it
to pairs of parsed sentences.
- Optionally: Implement the Hobbs algorithm for anaphora resolution on a set of
sentences.
Background
Please review the class slides (esp. Class 16, #54) and readings in the textbook on pronominal
anaphora resolution and especially the Hobbs algorithm.
Analyzing Coreference Resolution with the Hobbs Algorithm
The Hobbs algorithm takes as input a pronoun and a sequence of
sentence parse trees in the context, and returns the proposed
antecedent. The data file contains a list of pairs of sentences
separated by blank lines. In each pair of sentences, the
second sentence has one or more pronouns to be resolved. Parse the
sentences, almost all of which are drawn from the first homework assignment,
using the same techniques as in HW#1 (or HW#5 if you want to handle number
agreement).
For each pronoun, in each sentence pair, trace the Hobbs algorithm to
identify its antecedent.
Specifically, you should:
- i) Print out the pronoun and the corresponding parses.
- A) identify each parse tree node corresponding to 'X' in the
algorithm, writing out the corresponding NP or S (or SBAR) constituent.
- B) identify each node proposed as an antecedent
- C) reject any proposed node ruled out by agreement
- D) identify the accepted antecedent.
- E) indicate whether the accepted antecedent is correct
- F1) If the accepted antecedent is correct, do nothing more
- F2) If the accepted antecedent is NOT correct, explain why and identify which of the syntactic and semantic preferences listed in the text (Slides: class 16: 41, 47) would be required to correct this error.
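As a rough illustration of step B, the sketch below covers only the part of the search that looks at a previous sentence: a left-to-right, breadth-first walk that proposes every NP it meets. The tuple-based tree encoding, the example sentence, and the names words and propose_nps are assumptions made for this example; they come from neither the assignment data nor NLTK. The search within the pronoun's own sentence is not shown.

```python
from collections import deque

# Hypothetical tree encoding: (label, child, child, ...); leaves are plain strings.
# The sentence is made up for illustration and is not from coref_sentences.txt.
PREV_SENTENCE = (
    "S",
    ("NP", ("NNP", "John")),
    ("VP", ("VBD", "saw"), ("NP", ("DT", "a"), ("NN", "dog"))),
)


def words(node):
    """Return the words covered by a node, left to right."""
    if isinstance(node, str):
        return [node]
    covered = []
    for child in node[1:]:
        covered.extend(words(child))
    return covered


def propose_nps(tree):
    """Yield NP nodes in left-to-right, breadth-first order."""
    queue = deque([tree])
    while queue:
        node = queue.popleft()
        if isinstance(node, str):
            continue  # a leaf has no label and no children
        if node[0] == "NP":
            yield node
        queue.extend(node[1:])  # children are queued in left-to-right order


candidates = [" ".join(words(np)) for np in propose_nps(PREV_SENTENCE)]
print(candidates)  # ['John', 'a dog']
```

The subject NP is proposed before the object NP because it sits at a shallower depth, which is the ordering that a breadth-first walk produces.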
"Implementation"
You should implement step i) using NLTK and a suitable parser.
You may do steps A-D either:
- by manually stepping through the algorithm, or
- (for additional credit) by implementing this simplified
portion of the algorithm. If you take this coding route, you may use
a feature grammar or a simple look-up table to filter for agreement.
You may use any supporting software, such
as NLTK's components for manipulating parse trees, that you wish, provided
it does not implement the full Hobbs algorithm for you.
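For the look-up table option, one possible shape for the agreement filter in step C is sketched below. The feature names, the table entries, and the function agrees are hypothetical; a real table would need entries for the vocabulary of the assignment's sentences, and treating unknown words as compatible is just one design choice.

```python
# Hypothetical feature tables; None means the feature is unspecified.
PRONOUNS = {
    "he": {"number": "sg", "gender": "masc"},
    "she": {"number": "sg", "gender": "fem"},
    "it": {"number": "sg", "gender": "neut"},
    "they": {"number": "pl", "gender": None},
}
NOUNS = {
    "John": {"number": "sg", "gender": "masc"},
    "Mary": {"number": "sg", "gender": "fem"},
    "book": {"number": "sg", "gender": "neut"},
    "books": {"number": "pl", "gender": "neut"},
}


def agrees(pronoun, head_noun):
    """Return False only when a known feature of the two words clashes."""
    pronoun_feats = PRONOUNS[pronoun.lower()]
    noun_feats = NOUNS.get(head_noun)
    if noun_feats is None:
        return True  # assumption: do not reject words missing from the table
    for feature in ("number", "gender"):
        p_val, n_val = pronoun_feats[feature], noun_feats[feature]
        if p_val is not None and n_val is not None and p_val != n_val:
            return False
    return True


print(agrees("she", "John"))    # False: gender clash
print(agrees("they", "books"))  # True: number matches, gender unspecified
```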
Files
Test Data, Example, and Resource Files
The files for this assignment may be found on patas in
/dropbox/14-15/571/hw9/:
- coref_sentences.txt: Contains the contexts to analyze. You should resolve the pronoun(s) in the second sentence in each pair
based on the context provided by the pair of sentences.
- simple_example.txt: Contains an application to
a simplified parse of a textbook example. This is intended to provide an
example of the process and output format.
- grammar.cfg: Contains a simple grammar that
covers the sentences and is fairly compatible with the Hobbs algorithm in the
text (minor changes may be made). You may also use your own grammar from
HW #1 (with adaptations to the algorithm as needed).
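A minimal sketch of reading the sentence pairs follows. It assumes that each sentence sits on its own line and that pairs are separated by one or more blank lines; check coref_sentences.txt before relying on that layout. The sample text and the function name read_sentence_pairs are made up for illustration.

```python
import re


def read_sentence_pairs(text):
    """Split file contents into (first_sentence, second_sentence) tuples."""
    if not text.strip():
        return []
    pairs = []
    # One or more blank (or whitespace-only) lines separate the pairs.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if len(lines) != 2:
            raise ValueError(f"expected 2 sentences, got {len(lines)}: {lines!r}")
        pairs.append((lines[0], lines[1]))
    return pairs


# Made-up sample in the assumed layout, not taken from the data file.
SAMPLE = "John saw a dog .\nHe fed it .\n\nMary bought books .\nShe read them .\n"
pairs = read_sentence_pairs(SAMPLE)
for context, target in pairs:
    print(context, "|", target)
```

Raising an error on a block that does not contain exactly two sentences makes a layout mismatch visible early instead of silently misaligning the pairs.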
Parsing (and optional anaphora resolution)
Create a program hw9.py that takes the following parameters:
- grammar_file: This parameter should specify the grammar to use for
analysis of your sentences, e.g. grammar.cfg
- sentence_pair_file: This parameter should specify the file containing
the sentence pairs to analyze and perform pronoun resolution on, e.g. coref_sentences.txt
- hw9_results.out: This file should contain the results
of all automatic processing that you do, either:
- Parsing and pronoun identification only, or
- Parsing through candidate antecedent identification
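One way to accept these parameters is sketched below with Python's standard argparse module. Treating all three as positional arguments in this order, and calling the third one output_file, are assumptions; the assignment does not fix how the parameters are passed.

```python
import argparse


def build_parser():
    """Build a command-line parser for the three hw9.py parameters."""
    parser = argparse.ArgumentParser(
        prog="hw9.py",
        description="Parse sentence pairs and report pronouns to resolve.",
    )
    parser.add_argument("grammar_file", help="grammar to parse with, e.g. grammar.cfg")
    parser.add_argument(
        "sentence_pair_file",
        help="file of sentence pairs, e.g. coref_sentences.txt",
    )
    # 'output_file' is a hypothetical name for the hw9_results.out parameter.
    parser.add_argument("output_file", help="where to write results, e.g. hw9_results.out")
    return parser


# Demonstration with an explicit argument list instead of sys.argv.
args = build_parser().parse_args(
    ["grammar.cfg", "coref_sentences.txt", "hw9_results.out"]
)
print(args.grammar_file, args.sentence_pair_file, args.output_file)
```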
Condor file
Please name your condor file hw9.cmd.
Your program must run on patas using:
$ condor_submit hw9.cmd
Please see the CLMS wiki pages on the basics of using the condor
cluster.
All files created by the condor run should appear in the top level of
the directory.
Output Files
- hw9_results.out: This file, as above, should contain the
results of the automatic processing stages.
- hw9_results.final: This file should contain the augmented
analysis based on the contents of hw9_results.out.
- For the manual case, this is steps A-F(1,2)
- For the coding case, this is steps E-F(1,2)
Write-up
Describe and discuss your work in a write-up file. Include problems you came across and how (or if) you
were able to solve them, any insights, special features, and what you learned. Give examples if possible.
If you were not able to complete parts of the project, discuss what you tried and/or what did not work.
This will allow you to receive maximum credit for partial work.
Please name the file readme.{txt|pdf} with a suitable extension.
Handing in your work
All homework should be handed in using the class CollectIt.
Use the tar command to build a single hand-in file, named
hw#.tar where # is the number of the homework assignment and
containing all the material necessary to test your assignment.