# SurveyHypothesisTest.R
# This script shows you how to run hypothesis tests on the results of our survey. Below I've
# supplied R code for 5 examples. A complete write-up of each test has four parts:
# Hypothesis: A specific null and alternative hypothesis about the data (see examples below)
# Analysis: A copy of the R code used to test the hypothesis
# Results: The p-value from the test
# Summary: An interpretation of the results, in your own words.
##
# First we'll clear the workspace and load in the survey data:
rm(list = ls())
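# (stringsAsFactors = TRUE reads text answers in as factors; since R 4.0 the default is character strings)
survey <- read.csv("http://www.courses.washington.edu/psy315/datasets/Psych315W19survey.csv",
                   stringsAsFactors = TRUE)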
# Our new variable 'survey' has a bunch of fields associated with it that
# correspond to your answers to each of the questions. A good way to see the
# list of fields is, if you're using RStudio, to go to the 'Data' window, find
# the 'survey' variable and click on the blue triangle. You'll see
# things like:
#
# gender : Factor w/ 2 levels "Female","Male": 1 2 2 ...
#
# This means that there is a field 'gender' which you can access with the
# dollar sign (survey$gender)
#
# 'Factor' means that this field is nominal data, and you can see that the 2 levels
# are 'Female' and 'Male'.
#
# Other fields are either 'int' (integers) or 'num' (decimals), which are both ratio
# scale data for our survey.
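#
# A minimal sketch of checking fields from the console instead of the 'Data' window.
# 'toy' is a made-up data frame (not the class survey) so that this runs on its own;
# the same calls should work on 'survey'.
toy <- data.frame(gender = factor(c("Female", "Male", "Male", "Female")),
                  height = c(64.5, 70, 68, 63))
str(toy)                  # lists each field with its type
levels(toy$gender)        # the levels of a nominal ('Factor') field
is.numeric(toy$height)    # TRUE for both 'int' and 'num' fields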
## Example 1: Is the mean height of women in our class different from 64 inches?
# This will be a single sample t-test on the mean of 'height', compared to 64.
# For more details, see the 'OneSampleTTest.R' script.
# We can pull out the heights for female students like this:
ratio.data <- survey$height[survey$gender == "Female"]
# We can summarize a ratio-scale variable with its mean and standard deviation
# (na.rm = TRUE skips any missing answers, as 't.test' does).
mean(ratio.data, na.rm = TRUE)
sd(ratio.data, na.rm = TRUE)
# To compare this mean to 64, we use R's 't.test' function
H0 <- 64
out <- t.test(ratio.data,
              mu = H0,
              alternative = "two.sided")
# We can report the results of our t-test in APA format:
sprintf('t(%g) = %4.2f, p = %5.4f',out$parameter,out$statistic,out$p.value)
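# As a rough check on what 't.test' computes, the same statistic can be worked out by
# hand: t = (mean - H0) / (sd / sqrt(n)), with n - 1 degrees of freedom. The heights in
# 'toy.heights' are made up for illustration; they are not from the survey.
toy.heights <- c(62, 65, 66, 63.5, 67, 64, 61.5, 66.5)
toy.n <- length(toy.heights)
toy.t <- (mean(toy.heights) - 64) / (sd(toy.heights) / sqrt(toy.n))
toy.p <- 2 * pt(-abs(toy.t), df = toy.n - 1)    # two-tailed p-value
# These should agree with the output of 't.test':
toy.out <- t.test(toy.heights, mu = 64, alternative = "two.sided")
c(by.hand = toy.t, t.test = unname(toy.out$statistic))
c(by.hand = toy.p, t.test = toy.out$p.value)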
## Example 2: Is there an equal number of men and women in our class?
# This will be a chi-squared test for frequencies. For more details,
# see the 'Chi2TestFrequencies.R' script.
#
# Gender is a nominal scale measure. We'll make a table of frequencies
# using the 'table' function:
nominal.data <- survey$gender
freqs <- table(nominal.data)
freqs
# To run a chi-squared test for frequencies we need to define the expected
# proportions under the null hypothesis. For this example, we expect a 50/50
# split of males and females:
fe <- c(.5,.5) # expected proportions (passed to chisq.test's 'p' argument)
# run the chi-squared test:
out <- chisq.test(freqs,p=fe)
# The chi-squared statistic is:
out$statistic
# Here is our result in APA format:
sprintf('Chi-Squared(%d,N=%d) = %5.2f, p = %5.8f',out$parameter,sum(freqs),out$statistic,out$p.value)
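# The chi-squared statistic is the sum of (observed - expected)^2 / expected, with
# (number of categories - 1) degrees of freedom. A minimal sketch using made-up counts
# (60 and 40, not the survey's), checked against 'chisq.test':
toy.fo <- c(Female = 60, Male = 40)          # hypothetical observed frequencies
toy.fe <- sum(toy.fo) * c(.5, .5)            # expected frequencies under a 50/50 split
toy.chi2 <- sum((toy.fo - toy.fe)^2 / toy.fe)
toy.p <- pchisq(toy.chi2, df = length(toy.fo) - 1, lower.tail = FALSE)
toy.out <- chisq.test(toy.fo, p = c(.5, .5))
c(by.hand = toy.chi2, chisq.test = unname(toy.out$statistic))
c(by.hand = toy.p, chisq.test = toy.out$p.value)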
## Example 3: Does where you sit in class depend on gender?
#
# This will be a chi-squared test for independence on the two nominal scale variables,
# 'sit' and 'gender'. For more details see the 'Chi2TestIndependence.R' script.
#
# R's 'table' function conveniently tabulates the observed frequencies for more than
# one nominal variable:
fo <- table(survey$gender,survey$sit)
# run the chi-squared test.
out <- chisq.test(fo, correct = FALSE)
# The chi-squared statistic is:
out$statistic
# The degrees of freedom are:
out$parameter
# And the p-value is:
out$p.value
# Writing in APA format can be done like this:
sprintf('Chi-Squared(%d,N=%d) = %5.2f, p = %5.4f',out$parameter,sum(fo),out$statistic,out$p.value)
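# For a test of independence, the expected frequency in each cell is
# (row total * column total) / N, and df = (rows - 1) * (columns - 1). A minimal sketch
# with a made-up 2x3 table; the counts and the seat labels 'Front', 'Middle' and 'Back'
# are assumptions for illustration, not the survey's actual values.
toy.fo <- matrix(c(20, 15, 30, 25, 10, 20), nrow = 2,
                 dimnames = list(gender = c("Female", "Male"),
                                 sit = c("Front", "Middle", "Back")))
toy.fe <- outer(rowSums(toy.fo), colSums(toy.fo)) / sum(toy.fo)
toy.chi2 <- sum((toy.fo - toy.fe)^2 / toy.fe)
toy.df <- (nrow(toy.fo) - 1) * (ncol(toy.fo) - 1)
# These should agree with 'chisq.test' when the continuity correction is off:
toy.out <- chisq.test(toy.fo, correct = FALSE)
toy.fe
c(by.hand = toy.chi2, chisq.test = unname(toy.out$statistic))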
## Example 4: Is there a statistically significant correlation between mother's and father's heights?
# This will be a hypothesis test on a single correlation between two ratio scale variables,
# 'pheight' and 'mheight'. For more details see the 'ComparingOneCorrelation.R' script.
#
# Comparing ratio scale data to ratio scale data is best done with
# a scatterplot (e.g. plot(x,y)), and summarized with a correlation. For example, to compare
# your fathers' heights to your mothers' heights, use:
x <- survey$mheight
y <- survey$pheight
# get rid of the NAs
goodvals <- !is.na(x) & !is.na(y)
x <- x[goodvals]
y <- y[goodvals]
# cor.test runs the t-test for you. Note that alternative = "greater" makes this a
# one-tailed test for a positive correlation:
out <- cor.test(x,y,alternative = "greater")
# 'estimate' is the correlation
out$estimate
# 'p.value' is the p-value
out$p.value
# 'statistic' is the t-statistic used in the test:
out$statistic
# with degrees of freedom:
out$parameter
# Here's how to display your results in APA format:
sprintf('r(%g) = %4.2f, p = %5.8f',out$parameter,out$estimate,out$p.value)
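# 'cor.test' turns the correlation into a t-statistic: t = r * sqrt(df / (1 - r^2)),
# with df = n - 2. A minimal sketch of that conversion, using made-up heights in
# 'toy.x' and 'toy.y' (not survey data):
toy.x <- c(62, 64, 65, 63, 66, 61, 67, 64)
toy.y <- c(68, 70, 72, 69, 71, 67, 73, 71)
toy.r <- cor(toy.x, toy.y)
toy.df <- length(toy.x) - 2
toy.t <- toy.r * sqrt(toy.df / (1 - toy.r^2))
toy.p <- pt(toy.t, df = toy.df, lower.tail = FALSE)    # one-tailed ("greater") p-value
# These should agree with 'cor.test':
toy.out <- cor.test(toy.x, toy.y, alternative = "greater")
c(by.hand = toy.t, cor.test = unname(toy.out$statistic))
c(by.hand = toy.p, cor.test = toy.out$p.value)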
## Example 5: Does your expected score on Exam 1 depend on where you like to sit in class?
# This example will be a one-factor ANOVA on mean scores for a ratio scale
# measure ('Exam1') across the levels of a nominal scale measure ('sit')
#
# For more details, see the 'OneFactorANOVA.R' script.
# We'll need the library 'broom' so we can use the function 'tidy' to
# clean up the results of the 'aov' function.
library("broom")
out <- aov(Exam1 ~ sit,data = survey,na.action = na.omit)
# The p-value and other statistics are hard to pick out of the raw 'aov' output,
# so we pass it through 'tidy':
tidy.out <- tidy(out)
# The output of 'tidy' gives you that familiar table of results. Hopefully it matches the tutorial.
tidy.out
# Useful fields in tidy.out are 'df', 'statistic' (F value), and 'p.value'
# We can use this output and 'sprintf' to present the results in APA format:
sprintf('F(%g,%g) = %0.2f, p = %0.4f',tidy.out$df[1],tidy.out$df[2],
        tidy.out$statistic[1],tidy.out$p.value[1])
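# If the 'broom' package isn't installed, the same numbers can be pulled from base R's
# 'summary' of the 'aov' output. This is a sketch of an alternative, not the approach used
# above. The scores and seat labels in 'toy' are made up for illustration.
toy <- data.frame(score = c(85, 90, 88, 78, 82, 80, 70, 75, 72),
                  seat = factor(rep(c("Front", "Middle", "Back"), each = 3)))
toy.aov <- aov(score ~ seat, data = toy)
toy.tab <- summary(toy.aov)[[1]]    # the ANOVA table; row 1 is the factor, row 2 the residuals
sprintf('F(%g,%g) = %0.2f, p = %0.4f',toy.tab[["Df"]][1],toy.tab[["Df"]][2],
        toy.tab[["F value"]][1],toy.tab[["Pr(>F)"]][1])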
# In summary, we've gone through 5 different hypothesis tests based on our survey data:
# 1) a one-sample t-test on a single ratio scale measure
# 2) a chi-squared test for frequencies on a single nominal scale measure
# 3) a chi-squared test for independence on two nominal scale measures
# 4) a test on a single correlation between two ratio scale measures
# 5) a one-factor ANOVA on the means of a ratio scale measure across nominal scale levels