Linguistics 580G: Statistical Methods in NLP

Course Info

Instructor Info



The goals of this course are (1) to learn about statistical methods commonly used in NLP and (2) to explore how those methods have been applied to linguistic problems.

Note: To request academic accommodations due to a disability, please contact Disabled Student Services, 448 Schmitz, 206-543-8924 (V/TTY). If you have a letter from Disabled Student Services indicating that you have a disability which requires academic accommodations, please present the letter to the instructor so we can discuss the accommodations you might need in this class.


Students will be expected to present a number of papers over the course of the quarter (the exact number is TBD, based on the number of students in the course), to do all of the assigned readings and participate in the discussion, and to complete a final paper or project. Final papers (15-20 pages) could compare various approaches in the literature to a given linguistic problem (e.g., POS tagging). Final projects involve applying a statistical method to some linguistic problem and should be accompanied by a short (5-page) write-up. Students are encouraged to select a topic for either type of final early and to discuss it with the instructor.

List of readings

Schedule of Topics and Assignments (may be updated)

Probability theory
Manning & Schutze 2.1 
4/8 Information theory
Preview: n-grams
Manning & Schutze 2.2 
4/15 N-grams in spell checkers,
dialogue sequence modeling,
augmentative communication
and speech recognition
Preview: smoothing
Mays et al 1991 (Poulson)
Lesher et al 1999 (Poulson)
Samuelsson & Reichl 1999 (Oxford)
Preview: HMMs
Chen & Goodman 1996 (McNeill)
Manning & Schutze 9.1-9.3
Choice of final type & topic
4/29 HMMs in POS tagging
and named-entity recognition
Preview: PCFGs
DeRose 1988 (Unsworth)
Kempe 1997 (Unsworth)
Elworthy 1995 (Blanchard)
Zhou and Su 2002 (Blanchard)
5/6 PCFGs, A* search
Preview: Maximum entropy
Jurafsky 1996 (Tur)
Abney 1997 (Goss-Grubbs)
Klein & Manning 2003 (Goss-Grubbs)
5/13 Maximum entropy models for
stochastic tagging and parsing
Ratnaparkhi 1996 (Dormer)
Johnson et al 1999
Toutanova et al 2002 (Dormer)
(Background: Jelinek 1997:Chs 13-14)
Final paper outlines/
Final project specs
5/20 Chunk parsing
Term paper/project presentations
Preview: kNN
Briscoe & Carroll 2002 (Darnell)
5/27 k Nearest Neighbors
POS tagging, word-sense disambiguation,
lexical acquisition
Preview: Decision trees

Daelemans et al 1996 (Waltmunson)
Dagan et al 1997 (Waltmunson)
Baldwin & Bond 2003a
Baldwin & Bond 2003b

6/3 Decision trees in
sentence-boundary detection,
generation and disambiguation
Palmer & Hearst 1997 (Kahn)
Gamon et al 2002 (Weight)
Wu 2003 (Weight)
6/9 Final papers/projects due, 5pm