Syllabus for Linguistics 570
Shallow Processing Techniques for Natural Language Processing
Autumn 2008
Professor: William Lewis
Time & Location: MW 4:30-5:50, LOW 202
Office: LOW 202 (for now)
Hours: M 6-7
e-mail: wlewis2 at u (please include "570" in the subject line)
TA: Bill McNeill
e-mail: billmcn at u
Hours: F 2-3, Art 337
Course Description:
Techniques and algorithms for associating relatively surface-level structures and information with natural language corpora, including: POS tagging, morphological analysis, preprocessing/segmentation, named entity recognition, chunk parsing, and word-sense disambiguation. Linguistic resources that can be leveraged for these tasks (e.g., WordNet).
Course Texts:
Manning & Schütze (1999). Foundations of Statistical Natural Language Processing.
Jurafsky and Martin (2006). Speech and Language Processing, 2nd Edition. Prentice-Hall.
Other Materials:
Miscellaneous readings as required.
Prerequisites:
Ling 200 or equivalent introductory linguistics course
Ling 473 (Basics for Computational Linguistics) or placement exam
CS 326 (Data Structures) or equivalent
Stat 391 (Prob. and Stats for CS) or equivalent
Programming in Perl, C, C++, Java, or Python
Grading:
Homework assignments: 50%
Projects: 40%
Class participation: 10%
Tentative Course Schedule:
Day | Date | Topic | Reading/Homework Assignments
1 | Sep 24 (Slides) | Introduction. Overview: Shallow Approaches to Natural Language Processing; Basics. Corpora: Utility in Lang. Processing; Types of Corpora. Brief overview of FSA/FSTs | Review: Read M&S; Overview: Read J&M Ch. 1
2 | Sep 29 (Slides) | Review of HW#1. Overview, Corpora, Evaluation (cont’d): Methods for evaluation. Morphological Processing: Tokenization; Stemming; Evaluating Stemmers. Tools for Stemming | M&S Ch. 1, 4; M&S Ch. 8: § 8.1; J&M Ch. 3: § 3.1, § 3.2, § 3.4, § 3.5
3 | Oct 1 (Slides) | Markov Models: Adding weights and probabilities; Morphological processing; POS Tagging | Charniak 97 Ch. 3: § 3.2, 3.3; M&S Ch. 9: § 9.2-9.3
4 | Oct 6 (Slides) | HMMs: POS Tagging | M&S Ch. 9: § 9.4; M&S Ch. 10: § 10.1-10.3
5 | Oct 8 (Slides) | HMMs: POS Tagging; DP algorithms (review); Viterbi (review and application) | J&M Ch. 5: § 5.5; J&M Ch. 6: § 6.4; Review: M&S Ch. 10: § 10.2.2
6 | Oct 13 (Slides) | Finish review of Project 1, Viterbi. POS Tagging: Other methods for POS Tagging; Evaluating Taggers | Remainder of M&S Ch. 10; Roche & Schabes 1995, through section 7 (inclusive)
7 | Oct 15 (Slides) | Smoothing | J&M Ch. 6: § 6.5; M&S Ch. 6: § 6.2.5
8 | Oct 20 (Slides) | Class cancelled due to illness | M&S § 1.4, 2.2; J&M Ch. 4: § 4.1 & 4.2
9 | Oct 22 (Slides) | N-gram models: N-grams and HMMs; N-gram models of language. Language Identification: N-gram models for Language ID; Hybrid models for Language ID. N-gram models: Estimators; Entropy/perplexity (intro); Evaluation of language models | J&M Ch. 4: § 4.10
10 | Oct 27 (Slides) | Project 1 & HW#3 review. Continuation on entropy: Cross-entropy; Perplexity | M&S Ch. 2: § 2.2.5-2.2.8
11 | Oct 29 (Slides) | Shallow Parsing: Text Chunking; Phrasal Identification | Bird & Loper 2005 (see dropbox on patas); concentrate on general chunking issues (ignore NLTK specifics). Optional: Molina & Pla 2002
12 | Nov 3 (Slides) | Word Sense Disambiguation; Information Retrieval | M&S Ch. 7: § -7.2.1, 7.3.2, 7.5
13 | Nov 5 (Slides) | WSD (cont’d); IR (cont’d); VS Model discussion, clustering | J&M Ch. 23: § 23.1
14 | Nov 10 (Slides) | The bigger context: Question Answering; TREC Competition. Tools for IR: Lucene | Optional: Wang et al. 2008
15 | Nov 12 (Slides) | Clustering discussion; IR issues, weighting | M&S Ch. 14: through § 14.2.1 (inclusive)
16 | Nov 17 (Slides) | Classification discussion; Basic Machine Learning issues | M&S Ch. 16
17 | Nov 19 (Slides: Day 16 cont’d) | Classification discussion (cont’d) |
18 | Nov 24 (Slides) | WSD (cont’d): WordNet and Disambiguation; Semantic Distance and Disambiguation | J&M Ch. 20: through § 20.2
19 | Nov 26 (Slides) | Word Sense Disambiguation: Naïve Bayes Classifiers | M&S Ch. 2: § 2.2-2.3; M&S Ch. 7: § 7.2.2-7.3
20 | Dec 1 (Slides) | Named Entity Recognition: Named entity tagging; Evaluating and comparing contexts. Named Entity Recognition: Clustering of NE pairs; Evaluation of NE Systems. Tools for NER: LingPipe |
21 | Dec 3 (Slides) | The Big Challenge |
Bibliography (for those documents that can’t be found online):
Charniak, E. (1997). Statistical Language Learning.