LING 572 - Advanced Statistical Methods in NLP
Winter 2017
Course Description and Policy
Course description
This course covers several important machine learning algorithms for natural language processing, including decision trees, k-Nearest Neighbors, Naive Bayes, transformation-based learning, Support Vector Machines, Maximum Entropy and Conditional Random Fields. Students implement many of these algorithms and apply them to NLP tasks.
Textbook
There is no required textbook. Instead, the course readings will be drawn from contemporary articles and tutorials available online.
Helpful background material can also be found in:
- (M): Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, 2008. Introduction to Information Retrieval, Cambridge University Press. (PDF available online)
- (J&M): Daniel Jurafsky and James Martin, 2009. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edition.
- (M&S): Christopher D. Manning and Hinrich Schütze, 1999. Foundations of Statistical Natural Language Processing, MIT Press, Cambridge, MA.
Prerequisites
- CS 326 (Data Structures) or equivalent
- Stat 391 (Probability and Statistics for CS) or equivalent
- Programming in one or more of Java, Python, C/C++, or Perl
- Linux/Unix commands
- Ling 570
Programming
- Students may use any programming language, but an uncommon language will make it harder to get help from the TA and classmates. We therefore recommend one of the following: Java, C/C++, Perl, or Python.
- Students must strictly follow the instructions in the assignments (e.g., naming conventions, file formats, command-line formats).
- To submit, package all the required files into a single tar file and upload it via CollectIt. Please include the shell scripts and the note files as explained in class.
- The code must run on Patas. If your code does not work for any reason, explain your work in the note file to receive partial credit.
- If you need software that is not already installed on Patas, please email me the details of the software (download URL, version number, etc.) and a short explanation of why it is needed. If necessary, I will ask the system administrator to install it. Note that this process can take time.
Grading
- 90%: Homework assignments, due Wednesdays at 11pm unless specified otherwise.
- 10%: Reading assignments, due at 11am before class.
- The lowest of the 10 assignment grades (the 9 homework assignments plus the reading assignments counted together as one) will be dropped when computing the final grade.
Course Policies
- If any of the following is untrue, you should consider taking the course later:
  - You have met all the prerequisites.
  - You can spend at least 15-20 hours per week on the course, including lecture time.
  - You can attend class live (in person or remotely) for most of the sessions.
  - At least one of the four office hours works for you.
- Assignments:
  - Reading assignments: Reading assignments are posted on the teaching slides, not on a separate handout. They are due before class and should be submitted via CollectIt. If you miss the deadline, you will not receive any credit for the assignment.
  - Late penalty (homework assignments only): There is a 1% penalty for every hour after the deadline. For instance, if the assignment is due at 11pm and you turn it in at 1am the next morning, your grade will be x * 0.98, where x is the grade you would have received had you turned it in before the deadline. No assignments will be accepted more than two (2) days after the due date.
- GoPost and emails:
  - GoPost: The URLs of the recordings will be posted to GoPost. Other than that, GoPost is for student discussion only. We (Fei and Leanne) will NOT check or answer questions on GoPost. For any questions not answered by your peers on GoPost, please raise them in class or during office hours.
  - Emails: Please email us from your UW account and start the subject line with "ling572:". If your questions are not answered promptly (say, within one business day), please ask us in or after class or during office hours.
  - Class mailing list: All course-related announcements will be made in class or sent to the class mailing list. Please check your UW email at least once a day.
- Lectures:
  - Online section: Students who are not registered for the online section may attend no more than 10% of the sessions online.
  - Laptops in class: Laptops are allowed in class only for LING 572-related purposes.
  - Attending live (either in class or remotely) is crucial. If you cannot attend class live most of the time, I strongly recommend that you not take the course this quarter.
- Collaboration: Students are encouraged to collaborate with their classmates in and outside the classroom. For instance, you can post questions about assignments to GoPost, and others are encouraged to reply to your post.
- Incomplete: According to UW policy, "incomplete grades may only be awarded if you are doing satisfactory work up until the last two weeks of the quarter." Therefore, it is crucial for you to hand in your homework on time. An "incomplete" grade is given only under extremely unusual circumstances (e.g., health issues, family emergency).
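To make the homework late-penalty arithmetic concrete (1% per hour after the deadline, no submissions accepted after two days), here is a minimal Python sketch. The function name `late_grade` is hypothetical, and two details are assumptions not stated in the policy: a partial hour is counted as a full hour, and a submission exactly 48 hours late is still accepted.

```python
import math
from typing import Optional


def late_grade(raw_grade: float, hours_late: float) -> Optional[float]:
    """Return the grade after the 1%-per-hour homework late penalty.

    Assumptions (not stated in the policy): a partial hour counts as a
    full hour, and anything more than 48 hours late is not accepted.
    """
    if hours_late <= 0:
        # Submitted on time: no penalty.
        return raw_grade
    if hours_late > 48:
        # More than two days late: not accepted (modeled here as None).
        return None
    hours = math.ceil(hours_late)  # assumed: round partial hours up
    return raw_grade * (1 - 0.01 * hours)


# Due at 11pm, submitted at 1am the next morning: 2 hours late, so x * 0.98.
print(late_grade(90, 2))
```

Under these assumptions, a submission 1.5 hours late is penalized the same as one 2 hours late.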