Ling 571 - Deep Processing Techniques for NLP
Winter 2017
Homework #8: Due February 28, 2017: 23:45


Goals

Through this assignment you will:

Background

Please review the class slides and the textbook readings on lexical semantics, including WordNet and word sense disambiguation. Please also read Section 5.1 of the article, which describes Resnik's approach to word sense disambiguation in word groupings in detail.

Please also see the HW8 notes slide deck for a detailed discussion of useful implementation hints.
Note: You will be implementing a somewhat simplified version of Resnik's approach as detailed in the notes.

For additional information on NLTK's WordNet API and information content measures, see:

Implementing Word Sense Disambiguation and Similarity using Resnik's Similarity Measure

Based on the examples in the text, class slides, and other resources, implement a program to perform Word Sense Disambiguation based on noun groups, using Resnik's method and his WordNet-based similarity measure. Then compute similarity scores for a set of word pairs and compare them with human judgments. Specifically, your program should:

NOTE: You do not need to select senses for all words, only for the probe word; this is a simplification of the word group disambiguation model in the paper.

NOTE: You may treat all the words in context groups as nouns. You are not responsible for cross-POS similarity.
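
As a rough illustration of the idea, here is a minimal sketch of probe-word disambiguation with Resnik similarity. It assumes a tiny hand-built taxonomy and made-up information content values in place of WordNet and the NLTK IC files, and every name in it (HYPERNYMS, IC, SENSES, resnik_wsd, and so on) is hypothetical. It reflects one reading of the simplified approach: for each context word, find the most informative subsumer over all sense pairs, then credit its information content to every probe sense that it subsumes. The HW8 notes remain the authoritative description.

```python
# Toy hypernym taxonomy (concept -> direct hypernyms). Hypothetical, not WordNet.
HYPERNYMS = {
    "entity": [],
    "organism": ["entity"],
    "artifact": ["entity"],
    "animal": ["organism"],
    "device": ["artifact"],
    "mouse.animal": ["animal"],
    "mouse.device": ["device"],
    "cat.animal": ["animal"],
    "rat.animal": ["animal"],
    "keyboard.device": ["device"],
}

# Made-up information content values, standing in for IC(c) = -log P(c).
IC = {
    "entity": 0.0, "organism": 1.0, "artifact": 1.2,
    "animal": 2.5, "device": 3.0,
    "mouse.animal": 6.0, "mouse.device": 6.5,
    "cat.animal": 5.5, "rat.animal": 5.8, "keyboard.device": 6.2,
}

# Word -> candidate senses (all treated as nouns).
SENSES = {
    "mouse": ["mouse.animal", "mouse.device"],
    "cat": ["cat.animal"],
    "rat": ["rat.animal"],
    "keyboard": ["keyboard.device"],
}


def ancestors(concept):
    """Return the concept itself plus all of its transitive hypernyms."""
    seen = {concept}
    stack = [concept]
    while stack:
        for parent in HYPERNYMS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def most_informative_subsumer(c1, c2):
    """Common subsumer of c1 and c2 with the highest information content."""
    common = ancestors(c1) & ancestors(c2)  # the root is always shared here
    return max(common, key=lambda c: IC[c])


def resnik_wsd(probe, context):
    """Pick the probe sense with the most support from the context words."""
    support = {sense: 0.0 for sense in SENSES[probe]}
    for word in context:
        # Word-level similarity: best subsumer over all sense pairs.
        best_ic, best_mis = -1.0, None
        for probe_sense in SENSES[probe]:
            for context_sense in SENSES[word]:
                mis = most_informative_subsumer(probe_sense, context_sense)
                if IC[mis] > best_ic:
                    best_ic, best_mis = IC[mis], mis
        # Credit every probe sense that the best subsumer covers.
        for probe_sense in SENSES[probe]:
            if best_mis in ancestors(probe_sense):
                support[probe_sense] += best_ic
    return max(support, key=support.get), support


sense, support = resnik_wsd("mouse", ["cat", "rat"])
print(sense)  # mouse.animal
```

With the context group changed to ["keyboard"], the same function selects mouse.device, since the only informative shared subsumer is then the toy "device" concept.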

Programming

Create a program hw8_resnik_wsd.{py|pl|etc} that implements the disambiguation as specified above, invoked as:
hw8_resnik_wsd.{py|pl|etc} <information_content_file_type> <wsd_test_filename> <judgment_file> <output_filename>
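
For a Python implementation, the four positional arguments above could be read along these lines. This is only a sketch: the argument names mirror the invocation line, while the helper names (build_parser, parse_args) and the help strings are assumptions, and nothing here says which values or file formats are accepted.

```python
import argparse


def build_parser():
    """Build a parser for the four positional arguments of the invocation."""
    parser = argparse.ArgumentParser(
        prog="hw8_resnik_wsd.py",
        description="Resnik-style word sense disambiguation (sketch).",
    )
    parser.add_argument("information_content_file_type",
                        help="which information content source to use")
    parser.add_argument("wsd_test_filename",
                        help="file with probe words and context groups")
    parser.add_argument("judgment_file",
                        help="file with human similarity judgments")
    parser.add_argument("output_filename",
                        help="where to write the results")
    return parser


def parse_args(argv=None):
    """Parse argv (defaults to sys.argv[1:] when argv is None)."""
    return build_parser().parse_args(argv)
```

In the script's entry point, calling parse_args() with no argument reads the command line; passing an explicit list of four strings is convenient for testing.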

Implementation Resources

Resnik's similarity measure relies on two components:

Extra credit notes:

If you wish, for extra credit, you may implement a procedure to calculate the information content measure yourself using one of the POS-tagged corpus excerpts provided with NLTK (such as the Brown corpus or the Penn Treebank) or elsewhere on the patas cluster. It should produce an output file in a format similar to that of the NLTK WordNet IC files.
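
The core of such a procedure might look like the sketch below, which estimates IC(c) = -log P(c) from noun counts. It rests on several assumptions that you should check against the notes: a toy taxonomy stands in for WordNet, tags beginning with "NN" are treated as nouns, no lemmatization is done, and each noun occurrence is counted in full toward every concept that subsumes any of its senses. The names (HYPERNYMS, SENSES, information_content) are hypothetical, and writing the result in an IC-file-like format is left out.

```python
import math
from collections import Counter

# Toy hypernym taxonomy (concept -> direct hypernyms). Hypothetical, not WordNet.
HYPERNYMS = {
    "entity": [],
    "organism": ["entity"],
    "artifact": ["entity"],
    "animal": ["organism"],
    "device": ["artifact"],
    "mouse.animal": ["animal"],
    "mouse.device": ["device"],
    "cat.animal": ["animal"],
    "keyboard.device": ["device"],
}

# Word -> candidate noun senses.
SENSES = {
    "mouse": ["mouse.animal", "mouse.device"],
    "cat": ["cat.animal"],
    "keyboard": ["keyboard.device"],
}


def ancestors(concept):
    """Return the concept itself plus all of its transitive hypernyms."""
    seen = {concept}
    stack = [concept]
    while stack:
        for parent in HYPERNYMS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(tagged_tokens):
    """Estimate IC for every concept reached by a noun in the tagged corpus."""
    # Count only noun tokens that have at least one known sense.
    noun_counts = Counter(
        word.lower() for word, tag in tagged_tokens
        if tag.startswith("NN") and word.lower() in SENSES
    )
    total = sum(noun_counts.values())

    # Propagate each noun's count to all concepts subsuming any of its senses.
    freq = Counter()
    for noun, count in noun_counts.items():
        subsumers = set()
        for sense in SENSES[noun]:
            subsumers |= ancestors(sense)
        for concept in subsumers:
            freq[concept] += count

    # IC(c) = -log(freq(c) / total), written as log(total / freq(c)).
    return {concept: math.log(total / f) for concept, f in freq.items()}


corpus = [("cat", "NN")] * 3 + [("mouse", "NN")] + [("keyboard", "NN")] * 4
ic = information_content(corpus + [("runs", "VBZ")])
```

In this toy corpus every counted noun falls under "entity", so the root gets an information content of zero, and more specific concepts receive higher values.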

Files

All files are found in /dropbox/16-17/571/hw8/ on patas:

Test, Gold Standard, and Example

Submission Files

Handing in your work

All homework should be handed in using the class CollectIt.