Computing Resources

Condor

An account on the Condor cluster (patas.ling.washington.edu) is also required. Feel free to develop on whatever OS you prefer, but keep in mind that all homework assignments must ultimately run on the cluster. Please familiarize yourself with this system, in particular with how to run code using condor_submit; see the CLMA wiki pages for help. Finished assignments must be run through condor_submit, and you should use it during debugging as well if you expect your code to take up a lot of processing resources. The shortcut command condor_exec is also useful.
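
As a rough illustration of what a condor_submit job involves, the sketch below builds a minimal submit description file in Python and, if condor_submit is on the PATH, hands the file to it. The file names (run.sh, hw1.cmd, hw1.out and so on) and the helper functions are hypothetical, and the cluster may expect additional settings, so treat the CLMA wiki as the authority on what a submit file on patas should contain.

```python
import shutil
import subprocess
from pathlib import Path


def build_submit_description(executable, arguments="", job_name="job"):
    """Return the text of a minimal Condor submit description file.

    All file names are derived from job_name and are illustrative only.
    """
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
    ]
    if arguments:
        lines.append(f"arguments = {arguments}")
    lines += [
        f"output = {job_name}.out",   # captured standard output
        f"error = {job_name}.err",    # captured standard error
        f"log = {job_name}.log",      # Condor's own event log
        "queue",
    ]
    return "\n".join(lines) + "\n"


def submit(description, path="job.cmd"):
    """Write the description to disk and pass it to condor_submit.

    Returns None when condor_submit is not installed (e.g. on a laptop).
    """
    Path(path).write_text(description)
    if shutil.which("condor_submit") is None:
        return None
    return subprocess.run(
        ["condor_submit", path], capture_output=True, text=True, check=False
    )


if __name__ == "__main__":
    # Hypothetical job: run.sh is assumed to be an executable wrapper script.
    print(build_submit_description("run.sh", "input.txt", job_name="hw1"))
```

Calling submit(build_submit_description("run.sh", "input.txt", job_name="hw1"), "hw1.cmd") on the cluster would write hw1.cmd and queue the job; elsewhere it only writes the file.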

Languages

For this course, you can use any programming language you like. Some good choices are Python, Perl, Ruby, C#, C++, and Java. I would avoid Lisp or C unless you are already very familiar with them. For background, have a look at this article comparing scripting and compiled languages, and note this Wikipedia table comparing NLP toolkits and languages.

Python

  • For useful info on Python, see the main project page.
  • I would use Python 2.6 if you need any outside modules (e.g., the NLTK). Otherwise, Python 3.x is preferable for its superior Unicode handling.
  • The Natural Language Toolkit (NLTK) is one of the most comprehensive libraries available for NLP.
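
To make the Unicode point concrete, here is a small Python 3 sketch. In Python 3, str holds Unicode code points and bytes holds encoded data, so character counts and byte counts stay clearly separate. The helper name and sample string are made up for illustration.

```python
def char_and_byte_lengths(text, encoding="utf-8"):
    """Return (number of code points, number of encoded bytes) for text."""
    return len(text), len(text.encode(encoding))


# Sample string written with escapes so the source file stays pure ASCII:
# "naive cafe" with a diaeresis on the i and an acute accent on the e.
sample = "na\u00efve caf\u00e9"

chars, nbytes = char_and_byte_lengths(sample)
print(chars, nbytes)  # 10 code points, 12 bytes in UTF-8

# Decoding the bytes with the same encoding recovers the original string.
round_trip = sample.encode("utf-8").decode("utf-8")
print(round_trip == sample)  # True
```

The two accented letters each take two bytes in UTF-8, which is why the byte count exceeds the character count. Tokenizers and character-level counts should operate on the decoded str, not on the raw bytes.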

Perl

  • For Perl, see CPAN, the main Perl code archive.
  • NLP libraries in Perl tend to be scattered, but there’s a lot of code out there.

C#

  • For a thorough HOWTO on using C# on Patas and for many other links, see this write-up. See the sections on using Condor and how to call a Python script from within C#.
  • You might also have a look at this warning about using C#/Mono, as it relates to free software and patent issues.

Java

  • See the Stanford NLP group’s resources, which include a lot of Java code.
  • The LingPipe project focuses on shallower processing, but it may be of use.
  • There’s also the OpenNLP project with a good bit of Java NLP code.

Ruby