# Introduction to Statistics and Data Analysis

*Geoffrey M. Boynton*

Department of Psychology

University of Washington

*Latest build: August 14, 2023*

# 1 Introduction

This book contains materials for Psychology 522/524, the first-quarter graduate statistics course in the Department of Psychology at the University of Washington. It’s very much a work in progress.

A PDF version of this book can be found at http://courses.washington.edu/psy524a/_book/_main.pdf. PDF format is finicky, so there may be some formatting issues with the PDF version. I’m still working on it.

I’ve been teaching statistics at the undergraduate and graduate level for a couple of decades now. To be honest, I took on the undergraduate stats course, Psychology 315, because it was easy. I TA’d undergraduate statistics in psychology as a graduate student in the 1990s, and when I took on Psych 315 at UW in 2010, and then 522/524 in 2013, the course material hadn’t changed in 20 years. All textbooks were pretty much the same, covering t-tests, binomial distributions, correlations, and simple ANOVA, all using tables in the back of the book to get p-values from known standard distributions. I used to joke that teaching 315 year after year was easy because it didn’t exactly require me to keep up with the literature.

But then things started to change! Although it was created back in 1993, the statistical programming language R started to gain popularity in the 2010s, probably due to the availability of cheap laptops and the push toward open-source languages and free data sets. When I took over the grad stats course in 2013, everything was done in SPSS. Back then, when I surveyed the faculty about whether they thought it’d be useful for me to teach using R, there was a resounding vote of ‘no’. So for the first few years I taught a hybrid class, each year emphasizing R more and SPSS less. By 2019 I had dropped SPSS entirely. The sound of the students in the lab course (522) has gradually switched from mouse clicking to keyboard typing.

Switching to R led to two major changes in the way my students learn and use statistics. The first is the elimination of tables for looking up probabilities. A significant portion of my notes and lectures used to involve explaining which table to use and where to find the answer in the table. With R, this has usually been replaced with a single command, like ‘pnorm’, ‘pt’, or ‘pf’. The second major change is the transition from teaching ANOVA using sums-of-squares to using regression. R’s ‘lm’ and ‘lmer’ functions provide a natural way to conduct ANOVA tests (along with a bunch of others) using regression and the linear model. My lectures now have far fewer SS’s all over the board.
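As a rough illustration of what replacing a table lookup with a single command looks like, here is a minimal sketch using base R’s distribution functions. The test statistics and degrees of freedom below are made-up values chosen only for the example, not results from any data set in this book.

```r
# Hypothetical test statistics, chosen only for illustration.

# Upper-tail probability for z = 1.96 under the standard normal
p_z <- pnorm(1.96, lower.tail = FALSE)

# Two-tailed p-value for t = 2.5 with 20 degrees of freedom
p_t <- 2 * pt(2.5, df = 20, lower.tail = FALSE)

# p-value for F = 4.2 with 2 and 27 degrees of freedom
p_f <- pf(4.2, df1 = 2, df2 = 27, lower.tail = FALSE)

print(c(z = p_z, t = p_t, F = p_f))
```

Each call returns the tail probability directly, which is the number one would otherwise approximate by finding the nearest row and column in a printed table.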

By the way, statisticians have always emphasized (often snidely) that ‘ANOVA is just regression’, but I’ve never found a good resource that really explains why. Hopefully my chapters here on how ANOVA is just regression will help make this link clearer.
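To give a quick taste of that link, here is a small sketch, assuming simulated data with three hypothetical groups (the group labels, means, and sample sizes are invented for the example). It fits the same one-way design with ‘aov’ and with ‘lm’ and compares the resulting F statistics.

```r
set.seed(1)  # simulated data, for illustration only

# Three hypothetical groups of 10 observations each
group <- factor(rep(c("a", "b", "c"), each = 10))
score <- rnorm(30, mean = rep(c(10, 12, 11), each = 10), sd = 2)

# Classic one-way ANOVA
fit_aov <- aov(score ~ group)
f_aov <- summary(fit_aov)[[1]][["F value"]][1]

# The same model fit as a regression on the dummy-coded group factor
fit_lm <- lm(score ~ group)
f_lm <- summary(fit_lm)$fstatistic[["value"]]

# The two F statistics agree up to floating-point error
print(all.equal(f_aov, f_lm))
```

The omnibus F test reported by ‘lm’ tests whether the group coefficients are all zero, which is the same null hypothesis as the one-way ANOVA.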