Introduction

In this lab we will use R to simulate drawing marbles from a bag (i.e. what you just did by hand). Simulation will be one of the foundations of this course, so it is important that you get comfortable with simulating data in the first week or two. You will be given some simple code that implements the simulations, and you will then modify that code to answer some questions.

We start by defining a vector of strings (40 "red" and 40 "clear") using the rep function.

bag <- rep(c("red","clear"),40) 
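
Before sampling, it can help to confirm that the bag holds what we expect. The check below is a minimal sketch that uses only the base R functions length, table and head.

```r
bag <- rep(c("red", "clear"), 40)

length(bag)   # total number of marbles in the bag: 80
table(bag)    # counts by color: 40 clear and 40 red
head(bag)     # the first six elements alternate "red", "clear", ...
```

Note that rep repeats the whole two-element vector 40 times, so the colors alternate. The order does not matter here because sample draws at random.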

We can draw a random sample from this object using the sample function.

four.marbles <- sample(bag,4,replace=FALSE)

Because the parameter replace is FALSE, sampling occurs without replacement (i.e. a marble is not put back in the bag before the next marble is drawn).
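
The difference between the two settings of replace is easiest to see with a tiny bag. The sketch below uses illustrative names (small.bag, no.replace, with.replace) and an arbitrary seed; none of these come from the lab code.

```r
set.seed(1)  # arbitrary seed, used only so the example is repeatable

small.bag <- c("red", "red", "clear")  # illustrative three-marble bag

# Without replacement: drawing all three marbles returns the same
# three marbles, only in a random order.
no.replace <- sample(small.bag, 3, replace = FALSE)
sort(no.replace)

# With replacement: each marble goes back before the next draw, so we
# can draw more marbles than the bag contains.
with.replace <- sample(small.bag, 10, replace = TRUE)
length(with.replace)  # 10

# Asking for 10 marbles with replace = FALSE would stop with an error,
# because the bag would run out after three draws.
```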

To check for an extreme event (i.e. all red or all clear) we can use the all function and the or operator (|). The combined expression returns either TRUE or FALSE.

all(four.marbles=="red")|all(four.marbles=="clear")
## [1] FALSE
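
To see how the pieces of this check fit together, it may help to try them on samples whose contents are known in advance. The vectors all.red and mixed below are made up for illustration.

```r
all.red <- c("red", "red", "red", "red")    # a known extreme sample
mixed   <- c("red", "clear", "red", "red")  # a known non-extreme sample

mixed == "red"        # element-wise comparison: TRUE FALSE TRUE TRUE
all(mixed == "red")   # FALSE, because one marble is not red

# The full check is TRUE if either color accounts for every marble.
all(all.red == "red") | all(all.red == "clear")   # TRUE
all(mixed == "red")   | all(mixed == "clear")     # FALSE
```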

If we want to simulate drawing multiple samples of size four from a bag then we need some way of repeating the sample command. We accomplish this using the for(...){} control construct. Here, everything inside the braces, {}, is repeated 10 times. Notice that 1:10 is shorthand for the integers from 1 to 10.

for(i in 1:10){
  four.marbles <- sample(bag,4,replace=FALSE)
}
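
Note that each pass through this loop overwrites four.marbles, so only the last sample survives once the loop finishes. To keep information from every pass, we need a variable that is created before the loop and updated inside it. A small sketch of that pattern, using a made-up running total rather than marbles:

```r
1:10   # the integers 1, 2, ..., 10

total <- 0            # created before the loop
for (i in 1:10) {
  total <- total + i  # updated on every pass; i takes the values 1 to 10
}
total                 # 55, the sum of 1 through 10
```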

Finally, we add code that checks for an extreme event on every iteration and tallies the total number of extreme events. The if(...){...} control construct executes the commands within the braces, {}, if the logical statement within the parentheses, (), is TRUE.

extremeEvents <- 0
for(i in 1:10){
  four.marbles <- sample(bag,4,replace=FALSE)
  if(all(four.marbles=="red")|all(four.marbles=="clear")){
    extremeEvents <- extremeEvents + 1
  }
}
extremeEvents/10
## [1] 0.3
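
Because the samples are random, the proportion you get will usually differ from the one printed here, and it will change each time you rerun the code. If you want a run to be repeatable, one option is to call set.seed first. The sketch below uses an arbitrary seed value and illustrative variable names.

```r
bag <- rep(c("red", "clear"), 40)

set.seed(2024)  # arbitrary seed; any integer works
first.run <- sample(bag, 4, replace = FALSE)

set.seed(2024)  # resetting the same seed restarts the random sequence
second.run <- sample(bag, 4, replace = FALSE)

identical(first.run, second.run)  # TRUE: same seed, same sample
```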

We can generalize this code by defining the variables draws and sampleSize.

draws <- 10
sampleSize <- 7
extremeEvents <- 0
for(i in 1:draws){
  marbles <- sample(bag,sampleSize,replace=FALSE)
  if(all(marbles=="red")|all(marbles=="clear")){
    extremeEvents <- extremeEvents + 1
  }
}
extremeEvents/draws
## [1] 0.1
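
If you find yourself rerunning this block with different settings, one possible next step is to wrap it in a function. The sketch below is only a suggestion: the function name extreme.proportion and its argument list are hypothetical, and the body is the same loop as above.

```r
# Hypothetical helper (not part of the lab code): returns the proportion
# of draws in which every sampled marble has the same color.
extreme.proportion <- function(bag, draws, sampleSize) {
  extremeEvents <- 0
  for (i in 1:draws) {
    marbles <- sample(bag, sampleSize, replace = FALSE)
    if (all(marbles == "red") | all(marbles == "clear")) {
      extremeEvents <- extremeEvents + 1
    }
  }
  extremeEvents / draws
}

bag <- rep(c("red", "clear"), 40)
extreme.proportion(bag, draws = 10, sampleSize = 7)   # varies from run to run
extreme.proportion(bag, draws = 100, sampleSize = 1)  # always 1: a single marble is trivially "all one color"
```

The sampleSize = 1 case is a handy sanity check, since its answer is known without any simulation.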

Questions

  1. Increase the draws to 10000 and repeat the exercise above for sample sizes from 1 to 10. Then plot, by hand, the relationship between the sample size and the proportion of draws that are extreme.
  2. Set the sample size to 7 and draws to 10000, redefine an extreme event as 6 or 7 marbles of the same color (red or clear), and estimate the proportion of draws that are extreme. Hint: see what happens when you type sum(marbles=="red").
  3. Now, instead of simulating the results for a single group, simulate the results for multiple groups and look at the distribution of the number of extreme events (i.e. extremeEvents). This can be accomplished by nesting the for loop above in another for loop and saving the results from each simulation.

    simulations <- 10000
    draws <- 10
    sampleSize <- 4
    extremeEventsVect <- rep(NA,simulations)
    for(j in 1:simulations){
      for(i in 1:draws){
    
      }
    }

    To assign a value to location j in a vector you can use: extremeEventsVect[j] <- value.
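
The pattern of creating a vector of NA values first and then filling it one location at a time can be tried on a toy example before applying it to the simulation. The sketch below stores made-up values (squares of the loop index) in an illustrative vector called results; it is not a solution to the question.

```r
results <- rep(NA, 5)   # one empty slot per pass through the loop
for (j in 1:5) {
  results[j] <- j^2     # fill slot j with a value computed on pass j
}
results                 # 1 4 9 16 25
```

Any slot that is still NA after the loop was never assigned, which makes mistakes in the indexing easy to spot.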

Extra for experts

  1. What statistical distribution is defined by the process above?
  2. Use the R function that generates random variables from this distribution (i.e. the one whose name starts with r) to repeat the simulation in question 3.
  3. Use the corresponding density function (i.e. the one whose name starts with d) to calculate the answer.
  4. Calculate the probability by hand.
  5. What would the answer be if you sampled WITH replacement?