---
title: "SurveyHypothesisTestExample"
author: "Anthony S. Fauci M.D."
date: "March 10, 2021"
output:
  word_document: default
  pdf_document: default
  html_document: default
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Hypothesis: Does the choice of computer vary with choice of superpower?
I want to test the hypothesis that whether students use Apple computers or PCs varies with their choice of superpower. Both variables are measured on a nominal scale, so the analysis calls for a bar plot of frequencies and a chi-squared test for independence. I will use an alpha level of .05.
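As a rough sketch of what the chi-squared test for independence computes, the chunk below works through the arithmetic by hand. The counts in `toy` are made up for illustration and are not the survey data; the `toy_*` names are hypothetical and appear nowhere else in this document.

```{r}
# Hypothetical 2x2 counts for illustration only (NOT the survey data)
toy <- matrix(c(30, 20, 25, 35), nrow = 2,
              dimnames = list(superpower = c("Flight", "Invisibility"),
                              computer = c("Apple", "PC")))
# Expected frequencies under independence: (row total * column total) / N
toy_fe <- outer(rowSums(toy), colSums(toy)) / sum(toy)
# Chi-squared statistic: sum over cells of (observed - expected)^2 / expected
toy_chi2 <- sum((toy - toy_fe)^2 / toy_fe)
# Degrees of freedom: (rows - 1) * (columns - 1)
toy_df <- (nrow(toy) - 1) * (ncol(toy) - 1)
# p-value: upper-tail area of the chi-squared distribution
toy_p <- pchisq(toy_chi2, df = toy_df, lower.tail = FALSE)
# Cross-check against chisq.test without the continuity correction
toy_out <- chisq.test(toy, correct = FALSE)
all.equal(toy_chi2, unname(toy_out$statistic))
```

The hand calculation and `chisq.test(..., correct = FALSE)` should agree, which is why the analysis below turns the continuity correction off.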
## Analysis
The following R code loads the survey data and creates our 2x2 table of frequencies:
```{r}
# First we'll clear the workspace and load in the survey data:
rm(list = ls())
survey <- read.csv("http://www.courses.washington.edu/psy315/datasets/Psych315W21survey.csv")
# Then create the table
fo <- table(survey$superpower, survey$computer)
# The result is a table with both rows and columns, with labels:
fo
# The labels can be pulled out using 'row.names' and 'colnames' (note
# the inconsistency using '.' in the function names)
row.names(fo)
colnames(fo)
# The 2nd and 3rd rows correspond to 'Flight' and 'Invisibility', and the
# 1st and 3rd columns correspond to Apple and PC. This pulls out the
# relevant subset of rows and columns:
fo <- fo[c(2, 3), c(1, 3)]
```
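Selecting rows and columns by position works, but it depends on the order of the labels. One possible alternative, sketched below, is to index the table by label instead. The data frame `toy_survey` and its labels are hypothetical; the exact spelling of the labels in the real survey should be checked with `row.names` and `colnames` first.

```{r}
# Hypothetical responses for illustration only; real survey labels may differ
toy_survey <- data.frame(
  superpower = c("Flight", "Invisibility", "", "Flight", "Invisibility", "Flight"),
  computer   = c("Apple", "PC", "Apple", "PC", "Apple", "Other"))
toy_fo <- table(toy_survey$superpower, toy_survey$computer)
# Index by label rather than by position
toy_by_name <- toy_fo[c("Flight", "Invisibility"), c("Apple", "PC")]
# The same subset selected by position, for comparison
toy_by_pos <- toy_fo[c(2, 3), c(1, 3)]
toy_by_name
```

Indexing by label keeps working if a new response category is added and shifts the positions, at the cost of failing with an error if a label is misspelled.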
## Results
```{r}
# Here's the table of the results:
fo
# And the bar graph (optional):
barplot(fo,
        beside = TRUE,
        legend = row.names(fo),
        col = c("Blue", "Red"))
# Here is the chi-squared test on the data
out <- chisq.test(fo, correct = FALSE)
# The chi-squared statistic is:
out$statistic
# The degrees of freedom is:
out$parameter
# And the p-value is:
out$p.value
```
## Summary
Our p-value of 0.2122 is larger than our alpha level of .05, so our results are not statistically significant. We therefore cannot conclude that the choice of computer varies with choice of superpower.
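The decision rule can also be written out explicitly. The sketch below assumes df = 1, which follows from the 2x2 table ((2 - 1) * (2 - 1) = 1), and copies the reported p-value by hand; the names `alpha_level`, `p_reported`, and `chi2_crit` are introduced here for illustration only.

```{r}
alpha_level <- 0.05
p_reported  <- 0.2122   # p-value reported above, copied by hand
# Critical value of chi-squared for df = 1 at this alpha (about 3.84)
chi2_crit <- qchisq(1 - alpha_level, df = 1)
chi2_crit
# Reject the null hypothesis only if the p-value falls below alpha,
# which is equivalent to the statistic exceeding the critical value
if (p_reported < alpha_level) "Reject H0" else "Fail to reject H0"
```

Comparing the p-value to alpha and comparing the statistic to the critical value always lead to the same decision, so either form can be reported.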
```{r}
# Writing in APA format can be done like this:
sprintf('Chi-Squared(%d, N = %d) = %.2f, p = %.4f',
        out$parameter, sum(fo), out$statistic, out$p.value)
```