Ling 571 - Deep Processing Techniques for NLP
Winter 2016
Homework #7: Due March 1, 2016, 23:45


Goals

Through this assignment you will:

Background

Please review the class slides and the textbook readings on distributional semantics and models. You may implement the assignment in any language you choose, provided that it runs on the CLMS cluster. Some steps below reference Python functions, but you may use equivalent implementations in other languages.

Creating and Evaluating Models of Distributional Semantic Similarity

Implement a program to create and evaluate a distributional model of word similarity based on local context term cooccurrence. Your program should:

Programming

Create a program hw7_dist_similarity.{py|pl|etc} that creates and evaluates the distributional similarity model as described above. It should be invoked as:
hw7_dist_similarity.{py|pl|etc} <window> <weighting> <judgment_filename> <output_filename>, where:
In this assignment, you should use the Brown corpus provided with NLTK in /corpora/nltk/nltk-data/corpora/brown/ as the source of cooccurrence information. The corpus files are white-space tokenized, but all tokens are of the form "word/POS". If you choose to use NLTK, you may use the Brown corpus reader as in:
brown_words = list(nltk.corpus.brown.words())
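The NLTK reader above returns words with the POS tags already removed. If you read the raw files yourself, you need to strip the "/POS" suffix before counting. The following is a minimal sketch, not a reference solution: the helper names strip_pos and cooccurrence_counts are hypothetical, lowercasing is an assumption, and the window is assumed to be symmetric (that many tokens on each side of the target). Check these choices against the assignment requirements.

```python
from collections import Counter, defaultdict


def strip_pos(token):
    """Return the word part of a 'word/POS' token (hypothetical helper).

    Splits on the LAST slash so that words containing '/' survive intact.
    Lowercasing is an assumption, not something the handout specifies.
    """
    word, sep, _tag = token.rpartition("/")
    return (word if sep else token).lower()


def cooccurrence_counts(words, window):
    """Count context words within `window` positions on either side.

    Assumes a symmetric window; returns {target: Counter({context: count})}.
    """
    counts = defaultdict(Counter)
    for i, target in enumerate(words):
        lo = max(0, i - window)
        hi = min(len(words), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is not its own context
                counts[target][words[j]] += 1
    return counts


# Toy input in the raw Brown "word/POS" format.
raw_tokens = "The/at dog/nn chased/vbd the/at cat/nn ./.".split()
words = [strip_pos(t) for t in raw_tokens]
counts = cooccurrence_counts(words, window=2)
```

With the toy sentence, counts["dog"] holds the raw frequency of each word seen within two positions of "dog". Any weighting selected by the <weighting> argument would be applied on top of these raw counts.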

Files

Test and Example Data Files

Aside from the Brown corpus, all files related to this assignment may be found on patas in /dropbox/15-16/571/hw7/, as below:

Submission Files

Handing in your work

All homework should be handed in using the class CollectIt.