Syllabus for LING 570

Shallow Processing Techniques for NLP

Fall 2009

 

Class time & location: MW 3:30-4:50pm, MGH 287

Instructor: Fei Xia
Office: Padelford A-210G
Office hour: F 11am-noon
Phone: (206) 543-9764
Email: fxia@u.washington.edu

TA: David Goss-Grubbs
Office: ART 337
Office hour: M 12:30-1:30
Email: davidgg@u.washington.edu

Textbooks:

  • (M&S) Manning & Schütze (1999).  Foundations of Statistical Natural Language Processing.  Cambridge: MIT Press
  • (J&M) Jurafsky and Martin (2008).  Speech and Language Processing (2nd edition).  Prentice-Hall.  http://www.cs.colorado.edu/~martin/slp2.html

Prerequisites:

  • CS 326 (Data Structures) or equivalent
  • Stat 391 (Probability and Statistics for CS) or equivalent
  • Formal grammars, formal languages, finite state automata
  • Programming in Perl, C, C++, Java, or Python
  • Basic Unix/Linux commands (e.g., ls, cd, ln, sort, head); see the tutorials on Unix.

Grading:

  • Assignments (80-90%): Assignments are handed out every Wednesday and are due at 11:45pm the following Wednesday. All code must run on Patas.
  • Quizzes (0-10%)
  • Class participation (10%)

Course policies:

  • Course website: All course information on this web page is tentative and can change at any time. Confirm crucial dates or information with me in person during class.

  • Attendance: Students are expected to attend all classes (in person or remotely via Meeting Room). Announcements about assigned readings and assignments will be made at the start of each class. These announcements may not be posted on this web page, so do not rely on this page as a substitute for attending class.

  • Late assignments: There is a 1% penalty for every hour after the deadline. For instance, if the assignment is due at 11:45pm and you turn it in at noon the next day (just over 12 hours late), your grade would be x * 0.88, where x is the grade you would have received had you turned it in before the deadline. No assignments will be accepted more than four (4) days after the due date.

  • “Incomplete”:  According to UW policy, "incomplete grades may only be awarded if you are doing satisfactory work up until the last two weeks of the quarter." Therefore, it is crucial for you to hand in your homework on time. An “incomplete” grade is given only under extremely unusual circumstances (e.g., health issues, family emergency).

  • Emails: Use the prefix "ling570: " in the subject line of your messages; messages without the prefix might go unanswered. If you don’t receive a reply from me within 48 hours, please send me a reminder.
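The late-assignment penalty above is simple arithmetic. The Python sketch below is a hypothetical helper, not part of the course materials. It assumes that only complete hours past the deadline count toward the 1%-per-hour penalty (this matches the 11:45pm-to-noon example yielding x * 0.88) and that anything more than four days late is rejected. The name `late_grade` is illustrative only.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical helper (not from the course): assumes only whole hours past
# the deadline are penalized, at 1% each, and that submissions more than
# four days late are not accepted.
MAX_LATE = timedelta(days=4)


def late_grade(raw_grade: float, deadline: datetime, submitted: datetime) -> Optional[float]:
    """Return the penalized grade, or None if the submission is not accepted."""
    late = submitted - deadline
    if late <= timedelta(0):
        return raw_grade  # on time: no penalty
    if late > MAX_LATE:
        return None  # more than four days late: rejected
    full_hours = int(late.total_seconds() // 3600)  # partial hours are not counted
    return raw_grade * (100 - full_hours) / 100


# The syllabus example: due 11:45pm, turned in at noon the next day.
due = datetime(2009, 10, 7, 23, 45)
turned_in = datetime(2009, 10, 8, 12, 0)
print(late_grade(100, due, turned_in))  # 88.0
```

Whether partial hours are truncated, as assumed here, is a policy detail. The syllabus example does rule out rounding partial hours up, which would give x * 0.87 instead of x * 0.88.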
 
   

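Tentative Schedule:

Week 1 (9/30)
  • Reading: M&S 4.2.2; M&S 2.1
  • Hw assigned: Hw1: build a tokenizer

Week 2 (10/5, 10/7)
  • Topics: Formal grammar, language, and regex
  • Reading: J&M 2
  • Hw assigned: Hw2: FSA and Carmel
  • Hw due: Hw1

Week 3 (10/12, 10/14)
  • Reading: J&M 3
  • Hw assigned: Hw3: build a unigram POS tagger
  • Hw due: Hw2

Week 4 (10/19, 10/21)
  • Topics: Finish morph analysis
  • Reading: J&M 3; J&M 4
  • Hw assigned: Hw4: morph acceptor
  • Hw due: Hw3

Week 5 (10/26, 10/28)
  • Topics: Smoothing; POS tagging (1): overview
  • Reading: J&M 4; J&M 5; M&S 6
  • Hw assigned: Hw5: n-gram LM (slides for Hw5)
  • Hw due: Hw4

Week 6 (11/2, 11/4)
  • Topics: HMM (1): definition; POS tagging (2): n-gram tagger
  • Reading: M&S 9
  • Hw assigned: Hw6: HMM (slides for Hw6)
  • Hw due: Hw5

Week 7 (11/9; no class on 11/11)
  • Reading: M&S 9
  • Hw assigned: Hw7: Viterbi algorithm
  • Hw due: Hw6

Week 8 (11/16, 11/18)
  • Topics: Introduction to classification; Mallet
  • Hw assigned: Hw8: use Mallet
  • Hw due: Hw7

Week 9 (11/23, 11/25)
  • Reading: Ratnaparkhi (1996)
  • Hw assigned: Hw9: MaxEnt tagger
  • Hw due: Hw8

Week 10 (11/30, 12/2)
  • Topics: Chunking; NE tagging; Clustering
  • Hw assigned: Hw10: clustering
  • Hw due: Hw9

Week 11 (12/7, 12/9)
  • Topics: IE; Summary
  • Hw due: Hw10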