# 19 Two Factor ANOVA

The two-factor ANOVA is a hypothesis test on means for a ‘crossed design’, which has two independent variables. Observations are made for all combinations of the levels of the two variables.

You can find this test in the flow chart here:

We’ll build up to the two-factor ANOVA by starting with what we already know - a 1-factor ANOVA experiment.

## 19.1 1-factor ANOVA Beer and Caffeine

Suppose you want to study the effects of beer and caffeine on response times for a simple reaction time task. One way to do this is to divide subjects into four groups: a control group, a group with caffeine (and no beer), a group with beer (and without caffeine), and a lucky group with both beer and caffeine.

We’ll load in this existing data from the course website:

Here are the summary statistics from this data set:

The four conditions, or ‘levels’, are “no beer, no caffeine”, “no beer, caffeine”, “beer, no caffeine”, and “beer, caffeine”.

 | mean | n | sd | sem |
---|---|---|---|---|
no beer, no caffeine | 1.650833 | 12 | 0.6725455 | 0.1941472 |
no beer, caffeine | 1.351667 | 12 | 0.4829235 | 0.1394080 |
beer, no caffeine | 2.210000 | 12 | 0.3476153 | 0.1003479 |
beer, caffeine | 1.858333 | 12 | 0.4696194 | 0.1355675 |

Here’s a plot of the means with error bars as the standard error of the mean:

It looks like the means do differ from one another. Here’s the ‘omnibus’ ANOVA result:

 | df | SS | MS | F | p |
---|---|---|---|---|---|
Between | 3 | 4.687 | 1.5623 | 6.0856 | p = 0.0015 |
Within | 44 | 11.296 | 0.2567 | | |
Total | 47 | 15.983 | | | |

Yup, some combination of beer and caffeine has a significant effect on response times.

In this design we actually manipulated two factors - beer, and caffeine. It’d be nice to be able to look at these two ‘factors’ separately.

## 19.2 Effect of Beer

Consider the effect of Beer on reaction times. We could just run a t-test on the ‘no beer, no caffeine’ condition vs. the ‘beer, no caffeine’ condition. But we can also compare the ‘no beer, caffeine’ condition to the ‘beer, caffeine’ condition. Or, perhaps even better, we can combine these two comparisons. This combined analysis can be done with a contrast with the following weights:

 | no beer, no caffeine | no beer, caffeine | beer, no caffeine | beer, caffeine |
---|---|---|---|---|
Effect of Beer | 1 | 1 | -1 | -1 |

This measures the effect of beer averaging across the two caffeine conditions.

The calculations for this contrast yield:

\[\psi = (1)( 1.65) + (1)( 1.35) + (-1)( 2.21) + (-1)( 1.86) = -1.0658\]

\[MS_{contrast} = \frac{(-1.0658)^{2}}{\frac{(1)^{2}}{12} + \frac{(1)^{2}}{12} + \frac{(-1)^{2}}{12} + \frac{(-1)^{2}}{12}} = 3.4080\]

\[F(1,44) = \frac{3.4080}{0.2567} = 13.2748\]

\[p = 0.0007\]
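These hand calculations are easy to check in R, using the cell means from the summary table and \(MS_{within}\) = 0.2567 from the omnibus ANOVA:

```
means <- c(1.650833, 1.351667, 2.210000, 1.858333)  # cell means from the summary table
w <- c(1, 1, -1, -1)   # contrast weights for the effect of beer
n <- 12                # sample size per condition
psi <- sum(w * means)
MS_contrast <- psi^2 / sum(w^2 / n)
F_stat <- MS_contrast / 0.2567   # MS_within from the omnibus ANOVA
p <- 1 - pf(F_stat, 1, 44)
round(c(psi = psi, MS = MS_contrast, F = F_stat, p = p), 4)
```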

It looks like there’s a significant effect of beer on response times. Since we’re subtracting the beer conditions from the no-beer conditions, the negative value of \(\psi\) indicates that response times with beer are greater than without beer. Beer increases response times.

## 19.3 Effect of Caffeine

To study the effect of caffeine, averaging across the two beer conditions, we use this contrast, which is independent of the first one:

 | no beer, no caffeine | no beer, caffeine | beer, no caffeine | beer, caffeine |
---|---|---|---|---|
Effect of Caffeine | 1 | -1 | 1 | -1 |

The calculations for this contrast yield:

\[\psi = (1)( 1.65) + (-1)( 1.35) + (1)( 2.21) + (-1)( 1.86) = 0.6508\]

\[MS_{contrast} = \frac{(0.6508)^{2}}{\frac{(1)^{2}}{12} + \frac{(-1)^{2}}{12} + \frac{(1)^{2}}{12} + \frac{(-1)^{2}}{12}} = 1.2708\]

\[F(1,44) = \frac{1.2708}{0.2567} = 4.9498\]

\[p = 0.0313\]

Caffeine has a significant effect on response times - this time \(\psi\) is positive, so response times for without caffeine are greater than for with caffeine. Caffeine reduces response times.

## 19.4 The Third Contrast: Interaction

For four levels or groups, there should be three independent contrasts. Here’s the third contrast:

 | no beer, no caffeine | no beer, caffeine | beer, no caffeine | beer, caffeine |
---|---|---|---|---|
Beer X Caffeine | 1 | -1 | -1 | 1 |

What does that third contrast measure? Symbolically, the contrast combines the conditions as:

[without beer without caffeine] - [without beer with caffeine] - [with beer without caffeine] + [with beer with caffeine]

Rearranging the terms as a difference of differences:

([with beer without caffeine] - [without beer without caffeine]) - ([with beer with caffeine]-[without beer with caffeine])

The first difference is the effect of beer without caffeine. The second difference is the effect of beer with caffeine. The difference of the differences is a measure of how the effect of beer changes by adding caffeine. In statistical terms, we call this the *interaction* between the effects of beer and caffeine on response times. Interactions are labeled with an ‘X’, so this contrast is labeled as ‘Beer X Caffeine’.

You might have noticed the parallel between this and the \(\chi^{2}\) test of independence. This is the same concept, but for means rather than frequencies.

The result of the F-test for this third contrast is:

\[\psi = (1)( 1.65) + (-1)( 1.35) + (-1)( 2.21) + (1)( 1.86) = -0.0525\]

\[MS_{contrast} = \frac{(-0.0525)^{2}}{\frac{(1)^{2}}{12} + \frac{(-1)^{2}}{12} + \frac{(-1)^{2}}{12} + \frac{(1)^{2}}{12}} = 0.0083\]

\[F(1,44) = \frac{0.0083}{0.2567} = 0.0322\]

\[p = 0.8584\]

We fail to reject \(H_{0}\), so there is no significant interaction between the effects of beer and caffeine on response times. This means that beer effectively increases response times the same amount, regardless of caffeine. Conversely, caffeine reduces response times effectively the same amount with or without beer. Notice the use of the word ‘effectively’ here. We should be careful about saying that ‘beer increases response times the same amount, regardless of caffeine’ because this isn’t true. There is a slight numerical difference, but it is not statistically significant.
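As a quick numeric check of this contrast (cell means from the summary table, \(MS_{within}\) from the omnibus ANOVA):

```
means <- c(1.650833, 1.351667, 2.210000, 1.858333)  # cell means from the summary table
w <- c(1, -1, -1, 1)   # Beer X Caffeine interaction weights
n <- 12                # sample size per condition
psi <- sum(w * means)
MS_contrast <- psi^2 / sum(w^2 / n)
F_stat <- MS_contrast / 0.2567   # MS_within from the omnibus ANOVA
p <- 1 - pf(F_stat, 1, 44)
```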

## 19.5 Partitioning \(SS_{between}\)

Recall that for a 1-factor ANOVA, \(SS_{total}\) is broken down into two parts, \(SS_{within}\) and \(SS_{between}\):

In the chapter section 18.3.1 on a priori and post-hoc tests we discussed how the sums of squares for independent contrasts are a way of breaking down the total variability between the means, \(SS_{between}\). The same is true here for our three orthogonal contrasts. Summing the three \(SS_{contrast}\) values gives us:

\[3.408+1.2708+0.0083 = 4.6871 = SS_{between}\]
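You can confirm this partition numerically: \(SS_{between}\) computed directly from the four cell means matches the sum of the three contrast SS values:

```
means <- c(1.650833, 1.351667, 2.210000, 1.858333)  # cell means
n <- 12
SS_between <- n * sum((means - mean(means))^2)
SS_contrasts <- 3.4080 + 1.2708 + 0.0083   # beer + caffeine + interaction
c(SS_between, SS_contrasts)
```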

So the three contrasts have partitioned the total variability between the means into three separate tests - each telling us something different about what is driving the significance of the ‘omnibus’ F-test. If we call the sums of squares for each of the three contrasts \(SS_{beer}\), \(SS_{caffeine}\), and \(SS_{beerXcaffeine}\) (where the ‘\(X\)’ means ‘interaction’), we can expand the above diagram to this:

This experiment has what is called a ‘factorial design’, where there are conditions for each combination of levels for the two factors of beer and caffeine. This example is a ‘balanced design’, which means that the sample sizes are the same for all conditions.

A standard way to analyze a factorial design is to break the overall variability between the means into separate hypothesis tests - a *main effect* for each factor, and their interaction. In this section we’ll show that treating the data we just discussed as a 2-factor ANOVA gives exactly the same results as treating it as a 1-factor ANOVA with three contrasts.

## 19.6 2-Factor ANOVA

I’ve saved the same data set but in a way that’s ready to be analyzed as a factorial design experiment. We’ll load it in here:

```
data.2 <- read.csv('http://courses.washington.edu/psy524a/datasets/BeerCaffeineANOVA2.csv')
head(data.2)
```

```
## Responsetime caffeine beer
## 1 2.24 no caffeine no beer
## 2 1.62 no caffeine no beer
## 3 1.48 no caffeine no beer
## 4 1.70 no caffeine no beer
## 5 1.06 no caffeine no beer
## 6 1.39 no caffeine no beer
```

The data has the same response-time column (‘Responsetime’), but now two columns instead of one define the condition for each measurement. The ‘caffeine’ column has two levels: ‘caffeine’ and ‘no caffeine’. Similarly, the ‘beer’ column has two levels: ‘beer’ and ‘no beer’. This way of storing the data is called ‘long format’, where each row corresponds to a single observation.

This experiment is called a *2x2 factorial design* because each of the two factors has two levels. We can summarize the results in the form of matrices with rows and columns corresponding to the two factors. We’ll set the ‘row factor’ as ‘caffeine’ and the ‘column factor’ as ‘beer’. That is, ‘caffeine’ varies across the rows and ‘beer’ varies across the columns. Here’s the 2x2 table for the means:

 | no beer | beer |
---|---|---|
no caffeine | 1.6508 | 2.2100 |
caffeine | 1.3517 | 1.8583 |

Instead of bar graphs, it’s common to plot results of factorial designs as data points with lines connecting them. By default, I plot the column factor along the x-axis and define the row factor in the legend. There are various ways of doing this in R. Here’s an example for our data. It requires both ‘ggplot2’ and the ‘dplyr’ libraries. Both are part of the ‘tidyverse’ package.

```
library(ggplot2)
library(dplyr)
# Do this to avoid a stupid useless error message
options(dplyr.summarise.inform = FALSE)
# order the levels for the two factors (alphabetical by default)
data.2$caffeine <- factor(data.2$caffeine, levels = c('no caffeine','caffeine'))
data.2$beer <- factor(data.2$beer, levels = c('no beer','beer'))
# Make a table (tibble) with generic names
summary.table <- data.2 %>%
  dplyr::group_by(caffeine, beer) %>%
  dplyr::summarise(
    m = mean(Responsetime),
    sem = sd(Responsetime)/sqrt(length(Responsetime))
  )
# plot with error bars, replacing generic names with specific names
ggplot(summary.table, aes(beer, m)) +
  geom_errorbar(
    aes(ymin = m-sem, ymax = m+sem, color = caffeine),
    position = position_dodge(0), width = 0.5) +
  geom_line(aes(group = caffeine, color = caffeine)) +
  geom_point(aes(group = caffeine, color = caffeine), size = 5) +
  scale_color_manual(values = rainbow(2)) +
  xlab('beer') +
  ylab('Response Time (s)') +
  theme_bw()
```

## 19.7 Within-Cell Variance (\(MS_{wc}\))

The three F-tests for a 2-factor ANOVA will use the same within-cell mean-squared error as the denominator. This is calculated the same way as for the 1-way ANOVA. We first add up the sums of squares for each condition.

The sums of squares within each group is called ‘\(SS_{wc}\)’ where ‘wc’ means ‘within cell’ since we’re now talking about cells in a matrix. Here’s the table for \(SS_{wc}\):

 | no beer | beer |
---|---|---|
no caffeine | 4.9755 | 1.3292 |
caffeine | 2.5654 | 2.4260 |

\(SS_{wc}\) is the sum of these individual within-cell sums of squares:

\[SS_{wc} = 4.9755+2.5654+1.3292+2.426 = 11.2961\]

Each cell contributes n-1 degrees of freedom to \(SS_{wc}\), so the degrees of freedom for all cells is N-k, where k is the total number of cells and N is the total sample size (n \(\times\) k):

\[df_{wc} = 48 - 4 = 44\]

Mean-squared error is, as always, \(\frac{SS}{df}\):

\[MS_{wc} = \frac{SS_{wc}}{df_{wc}} = \frac{11.296}{44} = 0.2567\]

This is the same value and *df* as \(MS_{w}\) from above when we treated the same data as a 1-factor ANOVA design.
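In R, assuming the four within-cell SS values from the table above:

```
SS_cells <- c(4.9755, 2.5654, 1.3292, 2.4260)  # within-cell SS for each of the 4 cells
SS_wc <- sum(SS_cells)
k <- 4; n <- 12; N <- n * k
df_wc <- N - k
MS_wc <- SS_wc / df_wc
c(SS_wc, df_wc, MS_wc)
```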

The three contrasts that we used for the 1-factor ANOVA example correspond to what we call ‘main effects’ of the factors and the ‘interaction’ between the factors. To calculate the main effects by hand we need the means across the rows and columns of our factors. Here’s a table with the row and column means in the ‘margins’:

 | no beer | beer | means |
---|---|---|---|
no caffeine | 1.6508 | 2.2100 | 1.9304 |
caffeine | 1.3517 | 1.8583 | 1.6050 |
means | 1.5012 | 2.0342 | 1.7677 |

The bottom-right number is the mean of the means, which is the grand mean (\(\overline{\overline{X}}\) = 1.7677).

### 19.7.1 Main Effect for Columns (Beer)

Calculating main effects is a lot like calculating \(SS_{between}\) for the 1-factor ANOVA. For the main effect for columns, we calculate the sum of squared deviations of the column means from the grand mean and scale it by the number of samples that contributed to each column mean. For our example, the sum of squared deviations is:

\[(1.5012-1.7677)^2+(2.0342-1.7677)^2=0.071+0.071 = 0.142\]

There are \(2 \times 12 = 24\) samples for each column mean, so the sum of squares for the columns, called \(SS_{C}\), is

\[SS_{C} = (24)(0.142) = 3.408\]

There are 2 means contributing to \(SS_{C}\), so the degrees of freedom is \(df_{C} = 2 - 1 = 1\).

\(MS_{C}\) is therefore

\[\frac{SS_{C}}{df_{C}} = \frac{3.4080}{1} = 3.4080\]

The F-statistic for this main effect is \(MS_{C}\) divided by our common denominator, \(MS_{wc}\)

\[F = \frac{MS_{C}}{MS_{wc}} = \frac{3.4080}{0.2567} = 13.2748\]

We can calculate the p-value for this main effect using `pf`:

`## [1] 0.0007060154`

Notice that the F and p-values are the same as for the first contrast in the 1-way ANOVA above. If you work out the algebra you’ll find that the math is the same. The main effect in a multi-factorial ANOVA is exactly the same as the appropriate contrast in a 1-factor ANOVA.
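Here is the column main effect computed from the rounded marginal means in the table above, with \(MS_{wc}\) in the denominator:

```
col_means <- c(1.5012, 2.0342)   # marginal means for no beer, beer
grand <- 1.7677                  # grand mean
SS_C <- 24 * sum((col_means - grand)^2)   # 24 scores per column mean
MS_C <- SS_C / 1                 # df_C = 2 - 1 = 1
F_stat <- MS_C / 0.2567          # MS_wc in the denominator
p <- 1 - pf(F_stat, 1, 44)
```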

### 19.7.2 Main Effect for Rows (Caffeine)

The calculations for finding the main effect of rows (Caffeine) on response times are completely analogous to finding the main effect for columns. We use the row means in the table above, which are the averages across the two beer conditions.

The sum of squared deviations for the row means is:

\[(1.9304-1.7677)^2+(1.605-1.7677)^2=0.0265+0.0265 = 0.053\]

There are \(2 \times 12 = 24\) samples for each row mean, so the sum of squares for the rows, called \(SS_{R}\), is

\[SS_{R} = (24)(0.0529) = 1.2708\]

There are 2 means contributing to \(SS_{R}\), so the degrees of freedom is \(df_{R} = 2 - 1 = 1\).

\(MS_{R}\) is therefore

\[\frac{SS_{R}}{df_{R}} = \frac{1.2708}{1} = 1.2708\]

The F-statistic for this main effect is \(MS_{R}\) divided by our common denominator, \(MS_{wc}\)

\[F = \frac{MS_{R}}{MS_{wc}} = \frac{1.2708}{0.2567} = 4.9498\]

The p-value for the main effect of Caffeine is:

`## [1] 0.03126914`

### 19.7.3 Interaction Between Beer and Caffeine

The third contrast in the 1-factor ANOVA measured the differential effect of caffeine on response times across the two beer conditions (or vice versa). Recall that for a 1-factor ANOVA the sums of squares associated with the three orthogonal contrasts add up to \(SS_{between}\) for the four groups. Also, recall that \(SS_{between} + SS_{within} = SS_{total}\).

The easiest way to calculate the sum of squares for the interaction is to appreciate that

\[SS_{total} = SS_{caffeine} + SS_{beer} + SS_{caffeineXbeer} + SS_{wc}\]

The total sums of squares is \(SS_{total}\) = 15.983.

Therefore,

\[SS_{caffeineXbeer} = SS_{total} - (SS_{wc} + SS_{caffeine} + SS_{beer})\]

so

\[SS_{RXC} = SS_{total}-SS_{R}-SS_{C}-SS_{wc} = 15.9830 - 1.2708 - 3.4080 - 11.2960 = 0.0083\]

The degrees of freedom for this interaction term is \((n_{rows}-1)(n_{cols}-1)\):

\[df_{RXC} = (n_{rows}-1)(n_{cols}-1) = (2-1)(2-1) = 1\]

So the mean-squared error for the interaction is

\[\frac{SS_{RXC}}{df_{RXC}} = \frac{0.0083}{1} = 0.0083\]

Using \(MS_{wc}\) for the denominator again, the F-statistic is:

\[F = \frac{MS_{RXC}}{MS_{wc}} = \frac{0.0083}{0.2567} = 0.0322\]

The p-value for the interaction is:

`## [1] 0.8584132`

There is not a significant interaction between caffeine and beer on response times. Compare these numbers to the results of the third contrast in the 1-factor ANOVA above.
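The subtraction approach takes just a few lines, using the SS values computed above:

```
SS_total <- 15.9830
SS_R <- 1.2708; SS_C <- 3.4080; SS_wc <- 11.2960
SS_RXC <- SS_total - SS_R - SS_C - SS_wc
F_stat <- (SS_RXC / 1) / 0.2567   # df_RXC = 1, MS_wc = 0.2567
p <- 1 - pf(F_stat, 1, 44)
```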

We typically summarize our calculations and results in a table like this:

 | df | SS | MS | F | p |
---|---|---|---|---|---|
caffeine | 1 | 1.2708 | 1.2708 | 4.9498 | p = 0.0313 |
beer | 1 | 3.4080 | 3.4080 | 13.2748 | p = 0.0007 |
interaction | 1 | 0.0083 | 0.0083 | 0.0322 | p = 0.8584 |
wc | 44 | 11.2960 | 0.2567 | | |
Total | 47 | 15.9830 | | | |

Using APA format we state, for our three tests:

There is a main effect of caffeine. F(1,44) = 4.9498, p = 0.0313.

There is a main effect of beer. F(1,44) = 13.2748, p = 0.0007.

There is not a significant interaction between caffeine and beer. F(1,44) = 0.0322, p = 0.8584.

You might have noticed that we didn’t use any correction for familywise error for these three tests. There is a general consensus that the main effects and the interaction do not require familywise error correction. But if we treat the same data as a 1-factor design with three planned contrasts, we should apply error correction (like Bonferroni) even though the math and p-values are the same. If you find discrepancies like this baffling, you are not alone.

## 19.8 The two-factor ANOVA in R

Conducting a two-factor ANOVA in R is a lot like conducting a 1-factor ANOVA. We’ll use the `lm` function and pass the result through the `anova` function to get our table and statistics. The difference is the definition of the formula. Here we’ll use `Responsetime ~ caffeine*beer`. The ‘*’ is the way to ask R to test not only the main effects of *caffeine* and *beer* but also their interaction:
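With the data loaded above, this is `anova(lm(Responsetime ~ caffeine*beer, data = data.2))`, which produces the table below. The same table can be reproduced without the download: for a balanced design the ANOVA depends only on the per-cell means and SDs, so we can rebuild equivalent data from the summary table (the `make_cell` helper is our own, not from the course code):

```
# build a cell of n scores with exactly the given mean and SD
make_cell <- function(m, s, n = 12) {
  z <- scale(seq_len(n))        # any numbers, standardized to mean 0, sd 1
  as.numeric(m + s * z)
}
cells <- list(
  c(1.650833, 0.6725455),  # no beer, no caffeine
  c(1.351667, 0.4829235),  # no beer, caffeine
  c(2.210000, 0.3476153),  # beer, no caffeine
  c(1.858333, 0.4696194))  # beer, caffeine
data.sim <- data.frame(
  Responsetime = unlist(lapply(cells, function(p) make_cell(p[1], p[2]))),
  beer = rep(c('no beer','no beer','beer','beer'), each = 12),
  caffeine = rep(c('no caffeine','caffeine','no caffeine','caffeine'), each = 12))
anova2.sim <- anova(lm(Responsetime ~ caffeine*beer, data = data.sim))
anova2.sim
```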

```
## Analysis of Variance Table
##
## Response: Responsetime
## Df Sum Sq Mean Sq F value Pr(>F)
## caffeine 1 1.2708 1.2708 4.9498 0.031269 *
## beer 1 3.4080 3.4080 13.2748 0.000706 ***
## caffeine:beer 1 0.0083 0.0083 0.0322 0.858395
## Residuals 44 11.2960 0.2567
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

All of these numbers should look familiar.

## 19.9 A 2x3 Factorial Example

Factorial designs let you study the effects of one factor across multiple levels of another factor (or factors). In this made-up example, we’ll study the effect of two kinds of diets, “Atkins” and “pea soup”, on the body mass index (BMI) for three exercise levels: “none”, “a little”, and “a lot”. Twelve subjects participated in each of the 2x3 = 6 groups, for a total of 72 subjects. Here’s how to load in the data from the course website and order the levels:

```
data.3 <- read.csv('http://courses.washington.edu/psy524a/datasets/DietExercise.csv')
data.3$Diet <- factor(data.3$Diet,levels = c('Atkins','pea soup'))
data.3$Exercise <- factor(data.3$Exercise,levels = c('none','a little','a lot'))
```

The data is stored in ‘long format’ like this:

BMI | Diet | Exercise |
---|---|---|
28.43 | Atkins | none |
24.07 | Atkins | none |
23.08 | Atkins | none |
24.66 | Atkins | none |
20.17 | Atkins | none |
22.45 | Atkins | none |
31.60 | Atkins | none |
14.71 | Atkins | none |
28.43 | Atkins | none |
20.83 | Atkins | none |
23.47 | Atkins | none |
29.78 | Atkins | none |
21.95 | Atkins | a little |
25.31 | Atkins | a little |
25.87 | Atkins | a little |
26.42 | Atkins | a little |
27.27 | Atkins | a little |
24.42 | Atkins | a little |
27.84 | Atkins | a little |
26.43 | Atkins | a little |
20.48 | Atkins | a little |
28.37 | Atkins | a little |
27.28 | Atkins | a little |
23.56 | Atkins | a little |
18.81 | Atkins | a lot |
26.54 | Atkins | a lot |
26.73 | Atkins | a lot |
27.40 | Atkins | a lot |
23.61 | Atkins | a lot |
25.17 | Atkins | a lot |
23.70 | Atkins | a lot |
25.90 | Atkins | a lot |
27.84 | Atkins | a lot |
23.81 | Atkins | a lot |
20.38 | Atkins | a lot |
17.62 | Atkins | a lot |
26.07 | pea soup | none |
22.24 | pea soup | none |
23.24 | pea soup | none |
30.44 | pea soup | none |
26.12 | pea soup | none |
24.33 | pea soup | none |
30.45 | pea soup | none |
25.20 | pea soup | none |
20.71 | pea soup | none |
25.23 | pea soup | none |
21.17 | pea soup | none |
21.26 | pea soup | none |
22.76 | pea soup | a little |
19.30 | pea soup | a little |
19.09 | pea soup | a little |
25.37 | pea soup | a little |
18.69 | pea soup | a little |
25.22 | pea soup | a little |
18.01 | pea soup | a little |
24.08 | pea soup | a little |
19.78 | pea soup | a little |
18.76 | pea soup | a little |
26.30 | pea soup | a little |
23.59 | pea soup | a little |
16.79 | pea soup | a lot |
14.10 | pea soup | a lot |
17.66 | pea soup | a lot |
25.11 | pea soup | a lot |
18.57 | pea soup | a lot |
19.57 | pea soup | a lot |
21.62 | pea soup | a lot |
18.82 | pea soup | a lot |
19.80 | pea soup | a lot |
20.66 | pea soup | a lot |
17.66 | pea soup | a lot |
11.17 | pea soup | a lot |

Here’s a plot of the means with error bars:

Here we’ve defined the row factor to be *Diet* and the column factor to be *Exercise*.

From the graph it looks like the Atkins diet has little effect on BMI across exercise levels, but the pea soup diet does seem to lead to lower BMI for higher levels of exercise.

The math behind running a 2-factor ANOVA on this design is the same as for the 2x2 example above.

### 19.9.1 Calculating \(MS_{wc}\) for the 2x3 example

\(SS_{wc}\) and \(MS_{wc}\) are calculated the same way as for the 2x2 example. We sum the squared deviations of each score from the mean of the cell that the score came from. Here’s the table of the SS for each of the cells:

 | none | a little | a lot |
---|---|---|---|
Atkins | 244.2819 | 64.5777 | 126.1921 |
pea soup | 118.9687 | 104.9325 | 138.4401 |

\(SS_{wc}\) is therefore

\[244.282+118.969+64.5777+104.933+126.192+138.44 = 797.393\]

Again, each cell contributes n - 1 degrees of freedom to \(SS_{wc}\), so the degrees of freedom for all cells is N-k, where k is the total number of cells and N is the total sample size (n \(\times\) k):

\[df_{wc} = 72 - 6 = 66\]

Mean-squared within-cell is:

\[MS_{wc} = \frac{SS_{wc}}{df_{wc}} = \frac{797.3929}{66} = 12.0817\]

Remember this number: \(MS_{wc}\) = 12.0817. It will be the common denominator for all of the F-tests for this data set.
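In R, using the six within-cell SS values from the table above:

```
SS_cells <- c(244.2819, 64.5777, 126.1921,   # Atkins: none, a little, a lot
              118.9687, 104.9325, 138.4401)  # pea soup: none, a little, a lot
SS_wc <- sum(SS_cells)
df_wc <- 72 - 6        # N - k
MS_wc <- SS_wc / df_wc
c(SS_wc, df_wc, MS_wc)
```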

As in the 2x2 example, the main effects are computed from the sums of squared deviations of the row and column means from the grand mean, weighted by the total number of subjects contributing to each row or column mean.

Here’s a table of the means, along with the row and column means:

 | none | a little | a lot | means |
---|---|---|---|---|
Atkins | 24.3067 | 25.4333 | 23.9592 | 24.5664 |
pea soup | 24.7050 | 21.7458 | 18.4608 | 21.6372 |
means | 24.5058 | 23.5896 | 21.2100 | 23.1018 |

### 19.9.2 Main effect for columns (Exercise)

For the main effect of columns (Exercise), the sum of squared deviations of the column means from the grand mean is:

\[\small (24.5058-23.1018)^2+(23.5896-23.1018)^2+(21.21-23.1018)^2=1.9712+0.2379+3.5789 = 5.788\]

Since there are 2 levels for the row factor, there are 12 \(\times\) 2 = 24 subjects for each column mean. So \(SS_{C}\) is:

\[SS_{C} = (24)(5.7881) = 138.9156\]

There are 3 means contributing to \(SS_{C}\), so the degrees of freedom is \(df_{C} = 3 - 1 = 2\).

\(MS_{C}\) is therefore

\[\frac{SS_{C}}{df_{C}} = \frac{138.9156}{2} = 69.4578\]

The F-statistic for this main effect is \(MS_{C}\) divided by our common denominator, \(MS_{wc}\)

\[F = \frac{MS_{C}}{MS_{wc}} = \frac{69.4578}{12.0817} = 5.7490\]

The p-value for the main effect of Exercise is:

`## [1] 0.004993023`

### 19.9.3 Main effect for rows (Diet)

For the main effect of rows (Diet), the sum of squared deviations of the row means from the grand mean is:

\[(24.5664-23.1018)^2+(21.6372-23.1018)^2=2.1451+2.1451 = 4.2902\]

This time, since there are 3 levels for the column factor, there are 12 \(\times\) 3 = 36 subjects for each row mean. So \(SS_{R}\) is:

\[SS_{R} = (36)(4.2901) = 154.4403\]

There are 2 means contributing to \(SS_{R}\), so the degrees of freedom is \(df_{R} = 2 - 1 = 1\).

\(MS_{R}\) is therefore

\[\frac{SS_{R}}{df_{R}} = \frac{154.4403}{1} = 154.4403\]

The F-statistic for this main effect is \(MS_{R}\) divided by our common denominator, \(MS_{wc}\)

\[F = \frac{MS_{R}}{MS_{wc}} = \frac{154.4403}{12.0817} = 12.7830\]

The p-value for the main effect of Diet is:

`## [1] 0.0006602539`
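Both main effects can be checked from the marginal means in the table above, with \(MS_{wc}\) = 12.0817 in the denominator:

```
grand <- 23.1018
col_means <- c(24.5058, 23.5896, 21.2100)   # Exercise: none, a little, a lot
row_means <- c(24.5664, 21.6372)            # Diet: Atkins, pea soup
MS_wc <- 12.0817
SS_C <- 24 * sum((col_means - grand)^2); df_C <- 3 - 1
SS_R <- 36 * sum((row_means - grand)^2); df_R <- 2 - 1
F_C <- (SS_C / df_C) / MS_wc   # main effect of Exercise
F_R <- (SS_R / df_R) / MS_wc   # main effect of Diet
p_C <- 1 - pf(F_C, df_C, 66)
p_R <- 1 - pf(F_R, df_R, 66)
```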

### 19.9.4 Interaction Between Diet and Exercise

The total sums of squares is \(SS_{total}\) = 1200.2365, so we can calculate \(SS_{RXC}\) by:

\[SS_{RXC} = SS_{total}-SS_{R}-SS_{C}-SS_{wc} = 1200.2365 - 154.4403 - 138.9156 - 797.3929 = 109.4877\]

The degrees of freedom for this interaction term is:

\[df_{RXC} = (n_{rows}-1)(n_{cols}-1) = (2-1)(3-1) = 2\]

The mean-squared error for the interaction is

\[\frac{SS_{RXC}}{df_{RXC}} = \frac{109.4877}{2} = 54.7438\]

Using \(MS_{wc}\) for the denominator again, the F-statistic is:

\[F = \frac{MS_{RXC}}{MS_{wc}} = \frac{54.7438}{12.0817} = 4.5311\]

The p-value for the interaction is:

`## [1] 0.01432344`

Here’s how to run the 2-factor ANOVA in R:
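As before, we pass `lm` with the interaction formula through `anova`, and we save the result as `anova3.out` because the simple-effects code below reuses it. Since all 72 BMI values are listed in the table above, the analysis can also be made self-contained by rebuilding the data frame directly (a stand-in for loading the CSV):

```
# Rebuild data.3 from the values in the long-format table above
bmi <- c(28.43,24.07,23.08,24.66,20.17,22.45,31.60,14.71,28.43,20.83,23.47,29.78,
         21.95,25.31,25.87,26.42,27.27,24.42,27.84,26.43,20.48,28.37,27.28,23.56,
         18.81,26.54,26.73,27.40,23.61,25.17,23.70,25.90,27.84,23.81,20.38,17.62,
         26.07,22.24,23.24,30.44,26.12,24.33,30.45,25.20,20.71,25.23,21.17,21.26,
         22.76,19.30,19.09,25.37,18.69,25.22,18.01,24.08,19.78,18.76,26.30,23.59,
         16.79,14.10,17.66,25.11,18.57,19.57,21.62,18.82,19.80,20.66,17.66,11.17)
data.3 <- data.frame(
  BMI = bmi,
  Diet = factor(rep(c('Atkins','pea soup'), each = 36),
                levels = c('Atkins','pea soup')),
  Exercise = factor(rep(rep(c('none','a little','a lot'), each = 12), times = 2),
                    levels = c('none','a little','a lot')))
anova3.out <- anova(lm(BMI ~ Diet*Exercise, data = data.3))
anova3.out
```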

```
## Analysis of Variance Table
##
## Response: BMI
## Df Sum Sq Mean Sq F value Pr(>F)
## Diet 1 154.44 154.440 12.7830 0.0006603 ***
## Exercise 2 138.92 69.458 5.7490 0.0049930 **
## Diet:Exercise 2 109.49 54.744 4.5311 0.0143230 *
## Residuals 66 797.39 12.082
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Using APA format we’d say:

There is a main effect of Diet. F(1,66) = 12.7830, p = 0.0007.

There is a main effect of Exercise. F(2,66) = 5.7490, p = 0.005.

There is a significant interaction between Diet and Exercise. F(2,66) = 4.5311, p = 0.0143.

All three hypothesis tests are statistically significant, but this doesn’t really tell us much about what’s driving the effects of Diet and Exercise on BMI. As discussed above when we plotted the results, what seems to be happening is that only the subjects on the pea soup diet are influenced by Exercise.

It might make sense, instead, to run two ANOVAs on the data: one for the Atkins diet and one for the pea soup diet. We expect to find that most of the variability across the means is driven by the effect of Exercise for the pea soup dieters.

## 19.10 Simple Effects

Running ANOVAs on subsets of the data like this is called a *simple effects* analysis. Running separate ANOVAs on each level of Diet is studying the simple effects of *Exercise by Diet*. I remember this by replacing the word ‘by’ with ‘for every level of’. That is, this simple effect analysis is studying the effect of Exercise on BMI *for every level of* Diet.

Running simple effects is *almost* as simple as running separate ANOVAs for each level of Diet. In fact, let’s start there. We can use the `subset` function to pull out the data for each of the two diets:

```
# Atkins diet:
anova3.out.Atkins <- anova(lm(BMI ~ Exercise,data = subset(data.3,Diet == 'Atkins') ))
anova3.out.Atkins
```

```
## Analysis of Variance Table
##
## Response: BMI
## Df Sum Sq Mean Sq F value Pr(>F)
## Exercise 2 14.25 7.1266 0.5406 0.5875
## Residuals 33 435.05 13.1834
```

```
# pea soup diet:
anova3.out.peasoup <- anova(lm(BMI ~ Exercise,data = subset(data.3,Diet == 'pea soup') ))
anova3.out.peasoup
```

```
## Analysis of Variance Table
##
## Response: BMI
## Df Sum Sq Mean Sq F value Pr(>F)
## Exercise 2 234.15 117.08 10.662 0.0002679 ***
## Residuals 33 362.34 10.98
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

There’s one more thing we can do to increase the power of these tests. If we assume homogeneity of variance, then it makes sense to use the denominator from the two-factor ANOVA, \(MS_{wc}\) = 12.0817, for both of these F-tests, since it should be a better estimate of the population variance - and it has a larger *df*, which helps with power.

R doesn’t have a function for simple effects, but it’s not hard to do it by hand. All we need to do is pull out \(MS_{wc}\) from the output from the original two factor ANOVA and recalculate our F-statistics and p-values:

```
# anova3.out is the full two-factor result from above:
# anova3.out <- anova(lm(BMI ~ Diet*Exercise, data = data.3))
# from the 2-factor ANOVA, MS_wc is the fourth mean squared in the list
MS_wc <- anova3.out$`Mean Sq`[4]
df_wc <- anova3.out$Df[4]
# Atkins
MS_Atkins <- anova3.out.Atkins$`Mean Sq`[1]
df_Atkins <- anova3.out.Atkins$Df[1]
F_Atkins <- MS_Atkins/MS_wc
p_Atkins <- 1-pf(F_Atkins,df_Atkins,df_wc)
# pea soup
MS_peasoup <- anova3.out.peasoup$`Mean Sq`[1]
df_peasoup <- anova3.out.peasoup$Df[1]
F_peasoup <- MS_peasoup/MS_wc
p_peasoup <- 1-pf(F_peasoup,df_peasoup,df_wc)
sprintf('Atkins: F(%d,%d)= %5.4f,p = %5.6f',df_Atkins,df_wc,F_Atkins,p_Atkins)
sprintf('pea soup: F(%d,%d)= %5.4f,p = %5.6f',df_peasoup,df_wc,F_peasoup,p_peasoup)
```

`## [1] "Atkins: F(2,66)= 0.5899,p = 0.557297"`

`## [1] "pea soup: F(2,66)= 9.6903,p = 0.000204"`

The p-values didn’t change much when substituting \(MS_{wc}\), but every little bit of power helps.

## 19.11 Additivity of Simple Effects

These two simple effects have an interesting relationship with the three tests from the original 2-factor ANOVA. It turns out that the SS associated with these two simple effects add up to the SS associated with the main effect of Exercise plus the SS for the interaction between Diet and Exercise. In math terms:

\[SS_{\text{exercise by Atkins diet}} + SS_{\text{exercise by pea soup diet}} = SS_{exercise} + SS_{exerciseXdiet}\]

You can see that here:

```
# Adding SS's for the two simple effects of Exercise by Diet:
anova3.out.Atkins$`Sum Sq`[1] + anova3.out.peasoup$`Sum Sq`[1]
```

`## [1] 248.4032`

```
# Adding SS's for the main effect of Exercise and the interaction
# (second and third in the list of SS's)
sum(anova3.out$`Sum Sq`[c(2,3)])
```

`## [1] 248.4032`

Note also that the degrees of freedom for both sets add up to 4.

The pie charts below show how \(SS_{total}\) is divided up into the different sums of squares for the standard analysis (main effects and interaction) and for the simple effects analysis.

Our simple effects analysis is just another way of breaking down the SS associated with the main effect for columns and the interaction, since the main effect of Diet has no influence on this set of simple effects. You can visualize this by thinking about what would happen if the shapes of the two curves were the same, but one curve were shifted above or below the other. For example, if our results had come out like this, with an overall larger BMI for the Atkins diet:

Our simple effects analysis of columns by row would have come out the same. This is because shifting the Atkins group up only increases the main effect of Diet; it leaves the simple effects of Exercise by Diet, the main effect of Exercise, and the interaction between Exercise and Diet unchanged.