# SurveyHypothesisTest.R
# This script shows you how to run hypothesis tests on the results of our survey. Below I've
# supplied R code for the 5 examples done in 'SurveyAnalysis.R'.
# For the extra credit assignment, your report should have the following sections:
# Hypothesis: A specific null and alternative hypothesis about the data (see examples below)
# Analysis: A copy of the R code used to test the hypothesis
# Results: A copy of the p-value of the results
# Summary: An interpretation of the results, in your own words.
# You can also include a plot of the data in the results (bar graphs, scatter plots, frequency distributions)
# but it is not required.
# An example 'Markdown' script can be found here:
# http://courses.washington.edu/psy315/homework/SurveyHypothesisTestExample.Rmd
##
# First we'll clear the workspace and load in the survey data:
rm(list = ls())
survey <- read.csv("http://www.courses.washington.edu/psy315/datasets/Psych315W21survey.csv")
# From the script 'SurveyAnalysis.R' we learned that our new variable 'survey'
# has a bunch of fields associated with it that correspond to your answers to
# each of the questions.
## Example 1: Is the mean height of women in our class different from 64 inches?
# This will be a single sample t-test on the mean of 'height', compared to 64.
# For more details, see the 'OneSampleTTest.R' script.
# We can pull out the heights for female students like this:
ratio.data <- survey$height[survey$gender == "Female"]
ratio.data <- na.omit(ratio.data)
# We can summarize our ratio-scale value with means and standard deviations.
mean(ratio.data)
sd(ratio.data)
# To compare this mean to 64, we use R's 't.test' function
H0 <- 64
out <- t.test(ratio.data,
              mu = H0,
              alternative = "two.sided")
# We can report the results of our t-test in APA format:
sprintf('t(%g) = %4.2f, p = %5.4f',out$parameter,out$statistic,out$p.value)
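#
# A minimal sketch, not part of the original examples: the one-sample
# t-statistic can also be computed by hand as (mean - H0) / (sd / sqrt(n)),
# with n - 1 degrees of freedom. The helper name 't.by.hand' is hypothetical
# and only used for illustration.
t.by.hand <- function(x, mu) {
  x <- x[!is.na(x)]                         # drop missing values
  n <- length(x)
  t.stat <- (mean(x) - mu) / (sd(x) / sqrt(n))
  df <- n - 1
  p <- 2 * pt(-abs(t.stat), df)             # two-tailed p-value
  list(statistic = t.stat, df = df, p.value = p)
}
# Calling t.by.hand(ratio.data, H0) should reproduce the statistic, degrees of
# freedom and two-sided p-value reported by 't.test' above.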
## Example 2: Is there an equal number of men and women in our class?
# This will be a chi-squared test for frequencies. For more details,
# see the 'Chi2TestFrequencies.R' script.
#
# Gender is a nominal scale measure. First we'll set the nominal scale variable:
nominal.data <- survey$gender
# Then we'll make a table of frequencies using the 'table' function:
freqs <- table(nominal.data,exclude = "") # 'exclude = ""' skips the empty responses
# To run a chi-squared test for frequencies we need to define the expected
# proportions for each category. For this example, we expect a 50/50
# distribution of males and females:
fe <- c(.5,.5) # expected proportions ('chisq.test' turns these into expected frequencies)
# run the chi-squared test:
out <- chisq.test(freqs,p=fe)
# The chi-squared statistic is:
out$statistic
# Here is our result in APA format:
sprintf('Chi-Squared(%d,N=%d) = %5.2f, p = %5.16f',out$parameter,sum(freqs),out$statistic,out$p.value)
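#
# A minimal sketch, not part of the original examples: the chi-squared
# statistic for frequencies is the sum of (observed - expected)^2 / expected,
# where the expected frequencies are N times the expected proportions.
# The helper name 'chi2.freq.by.hand' is hypothetical.
chi2.freq.by.hand <- function(observed, p) {
  observed <- as.numeric(observed)          # works for tables and plain vectors
  expected <- sum(observed) * p             # expected frequencies
  chi2 <- sum((observed - expected)^2 / expected)
  df <- length(observed) - 1
  list(statistic = chi2, df = df,
       p.value = pchisq(chi2, df, lower.tail = FALSE))
}
# Calling chi2.freq.by.hand(freqs, c(.5,.5)) should reproduce the values that
# 'chisq.test' returned above.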
## Example 3: Does where you sit in class depend on gender?
#
# This will be a chi-squared test for independence on the two nominal scale variables,
# 'sit' and 'gender'. For more details see the 'Chi2TestIndependence.R' script.
#
# R's 'table' function conveniently tabulates the observed frequencies for more than
# one nominal variable:
fo <- table(survey$gender,survey$sit,exclude = "")
fo
# run the chi-squared test.
out <- chisq.test(fo, correct = FALSE)
# The chi-squared statistic is:
out$statistic
# The degrees of freedom are:
out$parameter
# And the p-value is:
out$p.value
# Writing in APA format can be done like this:
sprintf('Chi-Squared(%d,N=%d) = %5.2f, p = %5.4f',out$parameter,sum(fo),out$statistic,out$p.value)
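#
# A minimal sketch, not part of the original examples: for a test of
# independence, the expected frequency in each cell is
# (row total * column total) / N, and the degrees of freedom are
# (rows - 1) * (columns - 1). No continuity correction is applied, which
# corresponds to 'correct = FALSE'. The helper name 'chi2.indep.by.hand'
# is hypothetical.
chi2.indep.by.hand <- function(observed) {
  observed <- as.matrix(observed)
  N <- sum(observed)
  expected <- outer(rowSums(observed), colSums(observed)) / N
  chi2 <- sum((observed - expected)^2 / expected)
  df <- (nrow(observed) - 1) * (ncol(observed) - 1)
  list(statistic = chi2, df = df, expected = expected,
       p.value = pchisq(chi2, df, lower.tail = FALSE))
}
# Calling chi2.indep.by.hand(fo) should reproduce the values that
# 'chisq.test' returned above.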
## Example 4: Is there a statistically significant correlation between mother's and father's heights?
# This will be a hypothesis test on a single correlation between two ratio scale variables,
# 'pheight' and 'mheight'. For more details see the 'ComparingOneCorrelation.R' script.
#
# Comparing ratio scale data to ratio scale data is best done with
# a scatterplot, and summarized with a correlation. For example, to compare
# your father's heights to your mother's heights, use:
x <- survey$mheight
y <- survey$pheight
# get rid of the NAs
goodvals <- !is.na(x) & !is.na(y)
x <- x[goodvals]
y <- y[goodvals]
# cor.test runs the t-test for you:
out <- cor.test(x,y,alternative = "greater") # one-tailed test for a positive correlation; use "two.sided" for a two-tailed test
# 'estimate' is the correlation
out$estimate
# 'p.value' is the p-value
out$p.value
# 'statistic' is the t-statistic used in the test:
out$statistic
# with degrees of freedom:
out$parameter
# Here's how to display your results in APA format:
sprintf('r(%g) = %4.2f, p = %5.8f',out$parameter,out$estimate,out$p.value)
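#
# A minimal sketch, not part of the original examples: the t-statistic for a
# Pearson correlation is t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2
# degrees of freedom. The helper name 'cor.t.by.hand' is hypothetical, and
# the p-value it returns is one-tailed (testing for a positive correlation).
cor.t.by.hand <- function(x, y) {
  keep <- !is.na(x) & !is.na(y)             # use complete pairs only
  x <- x[keep]
  y <- y[keep]
  n <- length(x)
  r <- cor(x, y)
  df <- n - 2
  t.stat <- r * sqrt(df) / sqrt(1 - r^2)
  list(estimate = r, statistic = t.stat, df = df,
       p.value = pt(t.stat, df, lower.tail = FALSE))
}
# Calling cor.t.by.hand(survey$mheight, survey$pheight) should reproduce the
# results of 'cor.test' with alternative = "greater".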
## Example 5: Does your expected score on Exam 1 depend on where you like to sit
# in class?
# This example will be a one-factor ANOVA on mean scores for a ratio scale
# measure ('Exam1') across the levels of a nominal scale measure ('sit')
#
# For more details, see the 'OneFactorANOVA.R' script
out <- lm(Exam1 ~ sit,data = survey,na.action = na.omit)
# It's hard to find the p-value and other statistics in this output, but there's
# a function 'anova' that cleans up the output:
anova.out <- anova(out)
# The output of 'anova' gives you that familiar table of results. Hopefully it matches the tutorial.
anova.out
# Useful fields in 'anova.out' are 'Df', 'F value', and 'Pr(>F)' which is the p-value.
# We can use this output and 'sprintf' to present the results in APA format:
sprintf('F(%g,%g) = %0.2f, p = %0.4f',anova.out$Df[1],anova.out$Df[2],
        anova.out$`F value`[1],anova.out$`Pr(>F)`[1])
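#
# A minimal sketch, not part of the original examples: the F-statistic for a
# one-factor ANOVA is the ratio of the between-group mean square to the
# within-group mean square. The helper name 'anova.F.by.hand' is hypothetical.
anova.F.by.hand <- function(y, group) {
  keep <- !is.na(y) & !is.na(group)         # drop cases with missing values
  y <- y[keep]
  group <- factor(group[keep])              # also drops unused levels
  k <- nlevels(group)
  fitted.means <- ave(y, group)             # each score's own group mean
  ss.between <- sum((fitted.means - mean(y))^2)
  ss.within <- sum((y - fitted.means)^2)
  df.between <- k - 1
  df.within <- length(y) - k
  F.value <- (ss.between / df.between) / (ss.within / df.within)
  list(F.value = F.value, df = c(df.between, df.within),
       p.value = pf(F.value, df.between, df.within, lower.tail = FALSE))
}
# Calling anova.F.by.hand(survey$Exam1, survey$sit) should reproduce the
# 'Df', 'F value' and 'Pr(>F)' entries of the 'anova' table above.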
# In summary, we've gone through 5 different hypothesis tests based on our survey data:
# 1) a one-sample t-test on a single ratio scale measure
# 2) a chi-squared test for frequencies on a single nominal scale measure
# 3) a chi-squared test for independence on two nominal scale measures
# 4) a test on a single correlation comparing two ratio scale measures
# 5) a one-factor ANOVA on the means of a ratio scale measure across nominal scale levels