Ling 571 - Deep Processing Techniques for NLP
Winter 2015
Homework #7: Due 11:59 March 4, 2015


Goals

Through this assignment you will:

Background

Please review the class slides and readings in the textbook on distributional semantics and models. You may implement the assignment in whatever language you choose, provided that it runs on the CLMS cluster. In some cases below, Python functions are referenced, but you can use alternate implementations in other languages if you so choose.

Creating Local Context Bag-of-Words Representations

Create a program named hw7_bow.{py|java|*} to compute distributional similarity models using a local context term cooccurrence model. Your program should:
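As a rough illustration of what a local context term cooccurrence model computes, the sketch below counts, for each target word, the words that appear within a fixed window on either side. The function name `cooccurrence_counts`, the `window` parameter, and the toy corpus are assumptions made for this example only; they are not part of the assignment specification, and the required corpus, preprocessing, and weighting are not shown here.

```python
from collections import Counter, defaultdict


def cooccurrence_counts(sentences, window=2):
    """Count context words within `window` tokens on either side of each target.

    `sentences` is an iterable of token lists. The name and signature are
    hypothetical, chosen only for this sketch.
    """
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a token is not its own context
                    counts[target][tokens[j]] += 1
    return counts


# Toy data, for illustration only.
toy_corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]
vectors = cooccurrence_counts(toy_corpus, window=2)
print(vectors["sat"].most_common(3))
```

Storing each word's vector as a sparse `Counter` keeps memory proportional to the number of observed cooccurrences rather than to the vocabulary size squared.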

Creating Local Relation-based Models

Create a program named hw7_relation.{py|java|*} to compute distributional similarity models using a local dependency relation-based model, similar to Lin's. The basic structure should be similar to that in the local cooccurrence model above, except:
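One way to picture the difference from the window-based model is that each context feature becomes a (relation, word) pair instead of a bare word. The sketch below assumes the dependency parses are already available as (head, relation, dependent) triples; that input format, the `relation_features` name, the "-of" suffix for inverse relations, and the toy triples are all assumptions for illustration, not the format of the assignment data.

```python
from collections import Counter, defaultdict


def relation_features(triples):
    """Build word -> Counter of (relation, other_word) features.

    Each triple is (head, relation, dependent), a hypothetical input format.
    The head receives the feature (relation, dependent); the dependent
    receives the inverse feature (relation + "-of", head).
    """
    features = defaultdict(Counter)
    for head, rel, dep in triples:
        features[head][(rel, dep)] += 1
        features[dep][(rel + "-of", head)] += 1
    return features


# Toy triples, for illustration only.
toy_triples = [
    ("drink", "dobj", "wine"),
    ("drink", "dobj", "beer"),
    ("drink", "nsubj", "student"),
    ("pour", "dobj", "wine"),
]
rel_vectors = relation_features(toy_triples)
print(rel_vectors["wine"])
```

Here "wine" and "beer" share the feature ("dobj-of", "drink"), which is the kind of overlap a relation-based similarity measure would reward.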

Files

Test Data Files

All files related to this assignment may be found on patas in /dropbox/14-15/571/hw7/, as below:

Distributional Semantic Analysis

hw7_bow.*, which creates and evaluates your local context cooccurrence model with respect to human judgments, should take parameters as specified below:

hw7_relation.*, which creates and evaluates your local dependency relation-based model with respect to human judgments, should take parameters as specified below:
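Evaluating a model against human judgments generally means scoring each word pair with the model and checking how well those scores track the human scores. The sketch below uses cosine similarity and a rank correlation (Spearman) as one plausible combination; the choice of measures, the function names, and the toy vectors and scores are assumptions for illustration and are not taken from the assignment specification.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(feat, 0.0) for feat, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the two rank lists."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0.0 or sy == 0.0:
        return 0.0  # correlation is undefined when one list is constant
    return cov / (sx * sy)


# Toy vectors and made-up "human" scores, for illustration only.
toy_vectors = {
    "car": {"drive": 3.0, "road": 2.0},
    "automobile": {"drive": 2.0, "road": 1.0},
    "banana": {"eat": 4.0},
}
toy_judgments = [("car", "automobile", 3.9), ("car", "banana", 0.4)]
model_scores = [cosine(toy_vectors[a], toy_vectors[b]) for a, b, _ in toy_judgments]
human_scores = [score for _, _, score in toy_judgments]
print(spearman(model_scores, human_scores))
```

A rank correlation only asks whether the model orders the pairs the way the human raters do, so the model's similarity values do not need to be on the same scale as the human scores.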

Testing

You should run your programs and store the results for the following configurations (not identical to invocations):

Write-up

Describe and discuss your work in a write-up file. Include problems you came across and how (or if) you were able to solve them, any insights, special features, and what you learned. Give examples if possible. If you were not able to complete parts of the project, discuss what you tried and/or what did not work. This will allow you to receive maximum credit for partial work.

NOTE: You should discuss your results in terms of the different impacts of context type, context window, and weighting scheme. You may reference both qualitative observations and your quantitative scores.

Please name the file readme.{txt|pdf} with a suitable extension.

Testing

Your program must run on patas using:
$ condor_submit hw7.cmd

Note: Your condor script should run (at least) one configuration of the hw7_bow.* and hw7_relation.* models. You can simply run the other configurations yourself and store the results files.

Please see the CLMS wiki pages on the basics of using the condor cluster. All files created by the condor run should appear in the top level of the directory.

Handing in your work

All homework should be handed in using the class CollectIt. Use the tar command to build a single hand-in file named hw#.tar, where # is the number of the homework assignment, containing all the material necessary to test your assignment. Your hw7.cmd should be at the top level of whatever directory structure you are using. For example, in your top-level directory, run:
$ tar cvf hw7.tar *