Linguistic research across all subfields involves the testing of hypotheses against data. Computational methods allow linguists to greatly extend the size of datasets and the range of data brought to bear on their hypotheses. The purpose of this course is to introduce students to existing computational resources, including text and speech corpora, annotations over those corpora, and software for manipulating them. The course is project-driven: the target audience is graduate students who are engaged in research projects which can be enhanced with the computational methods we study. No background in computational linguistics or computer programming is assumed.
By the end of this course students will:
We will get "hands-on" experience working with large corpora of interest to students. Students (in pairs or small groups) will form practicum groups, and will work together to accomplish specific database-related tasks across the quarter. Most of this work will be "behind the scenes", and groups will independently arrange how to conduct their work. There are two planned points in the quarter when practicum work will be presented in class: each practicum group will be responsible for one syllabus day (a Wednesday) in the first half of the quarter, when it will give an in-class presentation to introduce us to its chosen database, show us how it is annotated, demonstrate how it is navigated, and show us how it might be of use to linguists. Note: a list of database resources of interest to linguists, CorpusList.rtf, is available online in the Sociolinguistics Wiki.
Ideally, each student will have a laptop available to use in class on Wednesdays when we do hands-on exercises.
Practicum group members will:
To request academic accommodations due to a disability, please contact Disabled Student Services, 448 Schmitz, 206-543-8924 (V/TTY). If you have a letter from Disabled Student Services indicating that you have a disability which requires academic accommodations, please present the letter to the instructor so we can discuss the accommodations you might need in this class.
Each week has a technical topic and a theoretical topic. We seek to build skills in linguistic analysis, drawing on some issue from linguistic theory (Mondays) as well as technical computing skills (Wednesdays). You will, therefore, see the days of each week in the schedule labelled either "linguistics" or "technical" depending on their focus. This schedule may change in light of the projects that students are working on.
Date | Topic | Reading(s) to be discussed this week | Due |
---|---|---|---|
Wk 1: 3/26, 3/28 | Linguistics: Intro; Projects of interest; What database resources exist. Technical: Getting around in Unix | Corpus scavenger hunt | |
Wk 2: 4/2, 4/4 | Linguistics: Processing resources: POS taggers, parsers, forced aligners. Technical: Running software | Bender & Langendoen 2010 | Running a parser/POS tagger/forced aligner; Presentation by Practicum Groups; Term project: Define research questions |
Wk 3: 4/9, 4/11 | Linguistics: Discussion of project questions and how corpora can be brought to bear on them. Technical: Python basics, SVN | Bird et al. 2009, Chapters 0 and 1 [R: Gries 2009, ch. 3] | Hello World; Term project: Identify relevant resources |
Wk 4: 4/16, 4/18 | Linguistics: Lexical frequency. Technical: Python (cont.) | Jurafsky (skip or skim pp. 63-88) [R: Gries 2009, ch. 4] | Simple word frequency counter in Python (run over multiple corpora); Term project: Identify relevant resources |
Wk 5: 4/23, 4/25 | Linguistics: Metadata (DCMI, OLAC); Publishing data with papers. Technical: Python (cont.) | Merging demographic data with transcripts (Python exercise) | |
Wk 6: 4/30, 5/2 | Linguistics: Annotation: Inter-annotator agreement; Annotation guidelines. Technical: Annotation software (xtdf; Excel; ELAN) | Bird & Liberman 1999; Morgan et al. (skip or skim Sec. 5, pp. 22-31) | Annotation (in pairs to produce dual-annotation) |
Wk 7: 5/7, 5/9 | Interim project reports; open discussion of normalized frequencies for student databases | | |
Wk 8: 5/14, 5/16 | Linguistics: Inter-annotator agreement. Technical: Computing Cohen's Kappa and Krippendorff's Alpha | Artstein & Poesio 2008; Clopper 2011 | Practicum groups compute Kappa/Alpha (Python) |
Wk 9: 5/21, 5/23 | Linguistics: Sampling; Choosing statistical tests. Technical: Basic exploratory statistics in R | Hay 2011 | R assignment; Term project: Choose statistical tests |
Wk 10: 5/30 | No class Monday; Project presentations | | |
6/6 | Final projects due. No late projects accepted. | | |
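
As orientation for the Week 8 technical topic, the computation of Cohen's Kappa for two annotators can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the course's own code or the expected form of the practicum work: the function name `cohen_kappa` and the toy part-of-speech labels are invented for this example, and it assumes both annotators labelled exactly the same items in the same order.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators who labelled the same items.

    Hypothetical helper for illustration; labels_a[i] and labels_b[i]
    are assumed to be the two annotators' labels for item i.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")

    n = len(labels_a)

    # Observed agreement: proportion of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected (chance) agreement, from each annotator's own label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one and the same label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy data (invented): ten tokens tagged N(oun) or V(erb) by two annotators.
annotator_1 = ["N", "N", "N", "N", "V", "V", "V", "V", "N", "V"]
annotator_2 = ["N", "N", "N", "N", "V", "V", "V", "V", "V", "N"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 2))  # 0.6
```

In the toy data the annotators agree on 8 of 10 tokens (observed agreement 0.8), while their label distributions predict 0.5 agreement by chance, which gives a Kappa of about 0.6. Krippendorff's Alpha, the other measure named for Week 8, handles missing labels and more than two annotators and is not covered by this sketch.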