# SurveyHypothesisTest.R
#
# This script shows you how to run hypothesis tests from the results of our survey.
# Below I've supplied R code for the 5 examples:
#
# Hypothesis: A specific null and alternative hypothesis about the data (see examples below)
# Analysis:   A copy of the R code used to test the hypothesis
# Results:    A copy of the p-value of the results
# Summary:    An interpretation of the results, in your own words.
##

# First we'll clear the workspace and load in the survey data.
# stringsAsFactors = TRUE keeps text columns as factors (the default became FALSE in R 4.0).
rm(list = ls())
survey <- read.csv("http://www.courses.washington.edu/psy315/datasets/Psych315W19survey.csv",
                   stringsAsFactors = TRUE)

# Our new variable 'survey' has a bunch of fields associated with it that
# correspond to your answers to each of the questions. A good way to see the
# list of fields, if you're using RStudio, is to go to the 'Data' section of the
# 'Environment' pane, find the 'survey' variable and click on the blue triangle.
# (Typing str(survey) in the console prints the same summary.) You'll see
# things like:
#
# gender : Factor w/ 2 levels "Female","Male": 1 2 2 ...
#
# This means that there is a field 'gender', which you can access with the
# dollar sign (survey$gender).
#
# 'Factor' means that this field is nominal data, and you can see that the 2 levels
# are 'Female' and 'Male'.
#
# Other fields are either 'int' (integers) or 'num' (decimals), which are both ratio
# scale data for our survey.

## Example 1: Is the mean height of women in our class different from 64 inches?

# This will be a single sample t-test on the mean of 'height', compared to 64.
# For more details, see the 'OneSampleTTest.R' script.

# We can pull out the heights for female students like this:
ratio.data <- survey$height[survey$gender == "Female"]

# We can summarize our ratio-scale value with means and standard deviations.
# (na.rm = TRUE skips any missing values.)
mean(ratio.data, na.rm = TRUE)
sd(ratio.data, na.rm = TRUE)

# To compare this mean to 64, we use R's 't.test' function:
H0 <- 64
out <- t.test(ratio.data, mu = H0, alternative = "two.sided")

# We can report the results of our last t-test in APA format:
sprintf('t(%g) = %4.2f, p = %5.4f', out$parameter, out$statistic, out$p.value)

## Example 2: Is there an equal number of men and women in our class?

# This will be a chi-squared test for frequencies. For more details,
# see the 'Chi2TestFrequencies.R' script.
#
# Gender is a nominal scale measure. We'll make a table of frequencies
# using the 'table' function:
nominal.data <- survey$gender
freqs <- table(nominal.data)
freqs

# To run a chi-squared test for frequencies we need to define the expected
# frequencies. For this example, we expect a 50/50 distribution of males
# and females, so we give chisq.test the expected proportions:
fe <- c(.5, .5) # expected proportions ('p' must sum to 1)

# Run the chi-squared test:
out <- chisq.test(freqs, p = fe)

# The chi-squared statistic is:
out$statistic

# Here is our result in APA format:
sprintf('Chi-Squared(%d,N=%d) = %5.2f, p = %5.8f', out$parameter, sum(freqs), out$statistic, out$p.value)

## Example 3: Does where you sit in class depend on gender?
#
# This will be a chi-squared test for independence on the two nominal scale variables,
# 'sit' and 'gender'. For more details see the 'Chi2TestIndependence.R' script.
#
# R's 'table' function conveniently tabulates the observed frequencies for more than
# one nominal variable:
fo <- table(survey$gender, survey$sit)

# Run the chi-squared test. correct = FALSE turns off the Yates continuity
# correction that R would otherwise apply to a 2 x 2 table:
out <- chisq.test(fo, correct = FALSE)

# The chi-squared statistic is:
out$statistic

# The degrees of freedom are:
out$parameter

# And the p-value is:
out$p.value

# Writing in APA format can be done like this:
sprintf('Chi-Squared(%d,N=%d) = %5.2f, p = %5.4f', out$parameter, sum(fo), out$statistic, out$p.value)

## Example 4: Is there a statistically significant correlation between mothers' and fathers' heights?
# This will be a hypothesis test on a single correlation between two ratio scale variables,
# 'pheight' and 'mheight'. For more details see the 'ComparingOneCorrelation.R' script.
#
# Comparing ratio scale data to ratio scale data is best done with
# a scatterplot, and summarized with a correlation. For example, to compare
# your fathers' heights to your mothers' heights, use:
x <- survey$mheight
y <- survey$pheight

# Get rid of the NAs (keep only students who reported both heights):
goodvals <- !is.na(x) & !is.na(y)
x <- x[goodvals]
y <- y[goodvals]

# cor.test runs the t-test for you. alternative = "greater" makes this a
# one-tailed test of whether the correlation is positive:
out <- cor.test(x, y, alternative = "greater")

# 'estimate' is the correlation:
out$estimate

# 'p.value' is the p-value:
out$p.value

# 'statistic' is the t-statistic used in the test:
out$statistic

# with degrees of freedom:
out$parameter

# Here's how to display your results in APA format:
sprintf('r(%g) = %4.2f, p = %5.8f', out$parameter, out$estimate, out$p.value)

## Example 5: Does your expected score on Exam 1 depend on where you like to sit in class?

# This example will be a one-factor ANOVA on mean scores for a ratio scale
# measure ('Exam1') across the levels of a nominal scale measure ('sit').
#
# For more details, see the 'OneFactorANOVA.R' script.

# We'll need the library 'broom' so we can use the function 'tidy' to
# clean up the results of the 'aov' function (install it once with
# install.packages("broom") if you don't have it yet).
library("broom")
out <- aov(Exam1 ~ sit, data = survey, na.action = na.omit)

# It's hard to find the p-value and other statistics in this output, but there's
# a function 'tidy' that cleans up the output:
tidy.out <- tidy(out)

# The output of 'tidy' gives you that familiar table of results. Hopefully it matches the tutorial.
tidy.out

# Useful fields in tidy.out are 'df', 'statistic' (F value), and 'p.value'.
# Row 1 is the 'sit' effect and row 2 is the residuals.
# We can use this output and 'sprintf' to present the results in APA format:
sprintf('F(%g,%g) = %0.2f, p = %0.4f', tidy.out$df[1], tidy.out$df[2],
        tidy.out$statistic[1], tidy.out$p.value[1])

# In summary, we've gone through 5 different hypothesis tests based on our survey data:
# 1) a one-sample t-test on a single ratio scale measure
# 2) a chi-squared test for frequencies on a single nominal scale measure
# 3) a chi-squared test for independence on two nominal scale measures
# 4) a test on a single correlation between two ratio scale measures
# 5) a one-factor ANOVA on the means of a ratio scale measure across nominal scale levels
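
## Extra: checking the test statistics by hand
#
# A minimal sketch (not one of the 5 survey examples) that recomputes three of the
# statistics used above from their textbook formulas and compares them with what
# t.test, chisq.test and cor.test report. It uses small made-up numbers, so it runs
# without the survey file: 'toy.heights', 'toy.counts', 'toy.x' and 'toy.y' are
# hypothetical illustration values, NOT survey data.

# One-sample t-test: t = (mean - mu) / (sd / sqrt(n)), with df = n - 1
toy.heights <- c(62, 65, 63, 66, 64, 61, 67, 65)   # hypothetical heights (inches)
toy.mu <- 64
toy.n <- length(toy.heights)
t.hand <- (mean(toy.heights) - toy.mu) / (sd(toy.heights) / sqrt(toy.n))
t.r <- unname(t.test(toy.heights, mu = toy.mu)$statistic)
all.equal(t.hand, t.r)     # should print TRUE

# Chi-squared test for frequencies: sum((fo - fe)^2 / fe), with df = k - 1
toy.counts <- c(Female = 30, Male = 20)             # hypothetical observed counts
toy.fe <- sum(toy.counts) * c(.5, .5)               # expected frequencies under 50/50
chi2.hand <- sum((toy.counts - toy.fe)^2 / toy.fe)
chi2.r <- unname(chisq.test(toy.counts, p = c(.5, .5))$statistic)
all.equal(chi2.hand, chi2.r)   # should print TRUE

# Test on a single correlation: t = r * sqrt(n - 2) / sqrt(1 - r^2), with df = n - 2
toy.x <- c(63, 65, 62, 66, 64, 61)   # hypothetical mothers' heights
toy.y <- c(69, 71, 68, 73, 70, 70)   # hypothetical fathers' heights
toy.r <- cor(toy.x, toy.y)
tr.hand <- toy.r * sqrt(length(toy.x) - 2) / sqrt(1 - toy.r^2)
tr.r <- unname(cor.test(toy.x, toy.y)$statistic)
all.equal(tr.hand, tr.r)   # should print TRUE

# Matching values confirm that each R function is computing the familiar formula;
# the p-value then comes from the t or chi-squared distribution with the stated df.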