CSS 581 - Introduction to Machine Learning
Winter 2014
TuTh 8:00-10:00 PM
UW1-040

Instructor
email: peaklist@u.washington.edu
phone: (206) 669-6629 (cell)
office hours: Tuesday, 5:00-6:30 PM, UW1-302
Course description
Machine learning is the science of building predictive models from available data, in order to predict the behavior of new, previously unseen data. It lies at the intersection of modern statistics and computer science, and is widely and successfully used in medicine, image recognition, finance, e-commerce, textual analysis, and many areas of scientific research, especially computational biology. This course is an introduction to the theory and practical use of the most commonly used machine learning techniques, including decision trees, logistic regression, discriminant analysis, neural networks, naïve Bayes, k-nearest neighbor, support vector machines, collaborative filtering, clustering, and ensembles. The coursework will emphasize hands-on experience applying specific techniques to real-world datasets, combined with several programming projects.
Prerequisites
CSS 342, calculus, and statistics. Some prior exposure to probability and linear algebra is recommended.
Textbook
Introduction to Data Mining, Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Addison-Wesley, 2006.
Supplemental reading
See Schedule.
Programming language
MATLAB will be used for both exercises and projects.
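For students new to MATLAB, most exercise work follows this style of vectorized matrix manipulation. The snippet below is only a rough sketch of a typical workflow (the file name 'mydata.mat' and the variable X are placeholders, not actual course files): load a dataset, then standardize each feature to zero mean and unit variance.

% Sketch of a typical exercise-style MATLAB workflow.
% NOTE: 'mydata.mat' and the variable X are placeholders, not course files.
S = load('mydata.mat');            % struct with one field per saved variable
X = S.X;                           % rows = observations, columns = features
[n, d] = size(X);
mu = mean(X, 1);                   % 1 x d vector of per-feature means
sigma = std(X, 0, 1);              % 1 x d vector of per-feature std. deviations
Xz = (X - repmat(mu, n, 1)) ./ repmat(sigma, n, 1);   % standardized features
fprintf('%d observations, %d features\n', n, d);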
Exercises
Most lectures will be accompanied by a set of exercises involving the machine learning method(s) introduced in that lecture. Exercises will be a mix of problem sets, hands-on tutorials, and minor coding. The purpose of the exercises is to impart some familiarity with and intuition about the method. Exercises will have some simple deliverables to ensure they are being completed and understood.
Exercise answers should be turned in as a Word document or PDF. If you want to turn in a different type of document, please discuss it with me first.
Exercise answers will be collected using a Catalyst Collect It dropbox. See the course homepage sidebar for a link to the course Collect It.
Programming projects
There will be three programming projects. Each will require implementing a particular machine learning method from scratch (i.e. not using the built-in MATLAB modules). Likely methods for the projects include feature selection, collaborative filtering, and ensemble classification.
Aside from code files, project deliverables should be turned in as a Word document or PDF. If you want to turn in a different type of document, please discuss it with me first.
Project deliverables will be collected using a Catalyst Collect It dropbox. See the course homepage sidebar for a link to the course Collect It.
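As a rough illustration of what implementing a method "from scratch" might look like in MATLAB (i.e. without the built-in classification functions), here is a minimal k-nearest-neighbor sketch. The function and variable names are illustrative only; the actual methods and requirements will be specified in the project handouts.

function ypred = knn_classify(Xtrain, ytrain, Xtest, k)
% Minimal k-nearest-neighbor classifier written without built-in
% classification functions. Illustrative sketch only, not a project solution.
%   Xtrain : ntrain x d matrix of training features
%   ytrain : ntrain x 1 vector of numeric class labels
%   Xtest  : ntest x d matrix of test features
%   k      : number of neighbors that vote on each prediction
    ntest = size(Xtest, 1);
    ypred = zeros(ntest, 1);
    for i = 1:ntest
        % squared Euclidean distances from test point i to all training points
        diffs = Xtrain - repmat(Xtest(i, :), size(Xtrain, 1), 1);
        d2 = sum(diffs .^ 2, 2);
        % predicted label = majority label among the k nearest training points
        [~, idx] = sort(d2, 'ascend');
        ypred(i) = mode(ytrain(idx(1:k)));
    end
end

For example, ypred = knn_classify(Xtrain, ytrain, Xtest, 5) would label each row of Xtest by a majority vote of its 5 nearest training points.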
Grading
Exercises will be weighted equally, although the amount of work may vary. Exercises will account for 25% of the overall grade. Each of the three programming projects will account for 25% of the overall grade. I will grade on a curve.
Late policy for Exercises: Exercises will be accepted up to two days past the due date, but for each day late there will be a loss of 25% of the grade. Exercises will not under any circumstances be accepted more than two days past the due date. Submission times as determined by the Catalyst Collect It dropbox will be the final arbiter on lateness.
Late policy for Projects: Projects will be accepted up to three days past the due date, but for each day late there will be a loss of 15% of the grade. Projects will not under any circumstances be accepted more than three days past the due date. Submission times as determined by the Catalyst Collect It dropbox will be the final arbiter on lateness.
If at any point during the quarter you have concerns about your performance in the class, please talk with me, and I will tell you where you stand relative to your classmates.
Schedule (revised Mar. 4, 2014)
For reading:
IDM = Introduction to Data Mining, by Tan, Steinbach, and Kumar
ESL = Elements of Statistical Learning, 2nd Ed., by Hastie, Tibshirani, and Friedman
ISLR = An Introduction to Statistical Learning with Applications in R, by James, Witten, Hastie, and Tibshirani
Date | Lecture | Topics | Reading | Exercises, Projects
Tues. Jan. 7 | 1: slides (ppt, pdf) | Course logistics; Overview and examples of applications | IDM Chap. 1 | Exercises 1 - due Sun. Jan. 12
Thurs. Jan. 9 | 2: slides (ppt, pdf) | Math essentials (1) | | Exercises 2 - due Wed. Jan. 15
Tues. Jan. 14 | 3: slides a (ppt, pdf), slides b (ppt, pdf), script a, script b, script c | MATLAB essentials; Data | IDM Chap. 2.1-2.3; IDM Chap. 3.1-3.3 | Exercises 3 - due Mon. Jan. 20
Thurs. Jan. 16 | 4: slides (ppt, pdf), script, dataset | Classification (1) | IDM Chap. 4 | Exercises 4 - due Wed. Jan. 22
Tues. Jan. 21 | 5: slides a (ppt, pdf), slides b (ppt, pdf), script, dataset, doc | Feature generation; Feature selection; Classification (2) | |
Thurs. Jan. 23 | 6: slides a (ppt, pdf), slides b (ppt, pdf), script | Math essentials (2); Classification (3) | | Project 1 - due Sat. Feb. 8
Tues. Jan. 28 | 7: slides a (ppt, pdf), slides b (ppt, pdf), script | Classification (4) | IDM Chap. 5.2, 5.3; ESL Chap. 1, 2.1-2.3; Machine Learning: A Probabilistic Perspective, by Murphy, Chap. 1.1-1.4.6 (see Reference texts for download) |
Thurs. Jan. 30 | 8: slides (ppt, pdf), script, dataset, doc | Regression (1) | IDM Appendix D; ESL Chap. 3.1, 3.2 (through p. 46), 3.2.1, 3.4 (through p. 72); ESL Chap. 9.2.1-9.2.2 | Exercises 5 - due Wed. Feb. 5 (script: knn.m)
Tues. Feb. 4 | 9: slides a (ppt, pdf), slides b (ppt, pdf) | Collaborative filtering (1) | R. Bell, Y. Koren, and C. Volinsky, "All Together Now: A Perspective on the Netflix Prize", Chance, Vol. 23, 24-29, 2010; Y. Koren, R. Bell, and C. Volinsky, "Matrix Factorization Techniques for Recommender Systems", Computer, 42-49, August 2009; J. Breese, D. Heckerman, and C. Kadie, "Empirical Analysis of Predictive Algorithms for Collaborative Filtering", Proc. 14th Conf. Uncertainty Artif. Intell., 1998; optional: A. Narayanan and V. Shmatikov, "Robust De-anonymization of Large Sparse Datasets", IEEE Symp. Security Privacy, 111, 2008 |
Thurs. Feb. 6 | 10: slides (ppt, pdf) | Collaborative filtering (2) | | Project 2 - due Mon. Feb. 24
Tues. Feb. 11 | 11: slides (ppt, pdf), script a, script b, dataset | Clustering (1) | IDM Chap. 2.4; IDM Chap. 8.1-8.2 |
Thurs. Feb. 13 | 12: slides (ppt, pdf), script | Clustering (2) | IDM Chap. 8.3-8.5 | Exercises 6 - due Wed. Feb. 19 (script: clust.m, dataset: synthGaussMix.mat)
Tues. Feb. 18 | 13: slides (ppt, pdf) | Ensembles (1) | IDM Chap. 5.6; optional but highly recommended: P. Domingos, "A Few Useful Things to Know about Machine Learning", Comm. ACM, Vol. 55, 78-87, 2012 |
Thurs. Feb. 20 | 14: slides a (ppt, pdf), slides b (ppt, pdf) | Ensembles (2); Classification (5) | R. Polikar, "Ensemble Learning", Scholarpedia, Vol. 4, 2776, 2008; R. Berk, L. Sherman, G. Barnes, E. Kurtz, and L. Ahlman, "Forecasting Murder Within a Population of Probationers and Parolees: A High Stakes Application of Statistical Learning", J. Royal Statist. Soc. A, Vol. 172, Part 1, 191-211, 2009 |
Tues. Feb. 25 | 15: slides (ppt, pdf), script | Classification (5) | IDM Chap. 5.4 |
Thurs. Feb. 27 | 16: slides (ppt, pdf), script, dataset, doc a, doc b, doc c | Classification (6) | IDM Chap. 5.5 | Project 3 - due Wed. Mar. 19
Tues. Mar. 4 | 17: slides (ppt, pdf) | Dimensionality reduction | IDM Appendix B.1-B.3; ESL Chap. 14.5.1; ISLR Chap. 6.3, 10.2 |
Thurs. Mar. 6 | 18: slides (ppt, pdf) | Anomaly detection | IDM Chap. 10 | Exercises 7 - due Fri. Mar. 14
Tues. Mar. 11 | 19: slides a (ppt, pdf), slides b (ppt, pdf), abstract | Special topic lecture: "Seizure prediction and machine learning" (OPEN TO PUBLIC); Natural language processing | J. Howbert, E. Patterson, S. Stead, B. Brinkmann, V. Vasoli, et al., "Forecasting Seizures in Dogs with Naturally Occurring Epilepsy", PLoS ONE, Vol. 9, e81920, 2014 |
Thurs. Mar. 13 | 20: abstract | Special topic lecture: "Using NLP and machine learning in medical information systems" (OPEN TO PUBLIC) | B. Tinsley, A. Thomas, J. McCarthy, and M. Lazarus, "Atigeo at TREC 2012 Medical Records Track: ICD-9 Code Description Injection to Enhance Electronic Medical Record Search Accuracy", 2012 |
Mon.-Fri. Mar. 17-21 | -- | FINALS WEEK | |