Chapter 18 Two Factor ANOVA
A two-factor ANOVA is used to test mean differences in a design with two independent variables (factors), called a ‘factorial design’. In a factorial design, observations are collected for every combination of the levels of the two factors. Each combination defines a ‘cell’.
You can find this test in the flow chart here:

We’ll build up to the two-factor ANOVA by starting with a 1-factor ANOVA experiment with contrasts.
18.1 ANOVA Beer and Caffeine Example
Suppose we study the effects of beer and caffeine on response time in a simple reaction-time task. One way to analyze this experiment is to treat it as a one-way ANOVA with four levels:
“no beer, no caffeine” The sober ‘control’ group
“no beer, caffeine” Caffeine only
“beer, no caffeine” Beer only
“beer, caffeine” A fourth group that had both beer and caffeine.
The new aspect of this design is this fourth group. You might just want to compare the effects of either beer or caffeine to the control group, but this fourth group allows us to test for something interesting - how the effects of beer and caffeine might interact.
We’ll first treat the data as a one-way ANOVA with three orthogonal contrasts.
To load this existing data from the course website:
data1 <- read.csv('http://courses.washington.edu/psy524a/datasets/BeerCaffeineANOVA1.csv')
# reorder the levels
data1$Condition <- factor(
  data1$Condition,
  levels = c("no beer, no caffeine", "no beer, caffeine",
             "beer, no caffeine", "beer, caffeine"))
The data is stored in ‘long format’ like this:
| Responsetime | Condition |
|---|---|
| 2.24 | no beer, no caffeine |
| 2.63 | beer, no caffeine |
| 2.27 | beer, no caffeine |
| 1.51 | beer, caffeine |
| 1.84 | no beer, caffeine |
| 2.06 | beer, caffeine |
| 1.39 | no beer, no caffeine |
| 1.94 | beer, no caffeine |
| 2.43 | no beer, no caffeine |
| 1.93 | beer, caffeine |
| 2.68 | beer, caffeine |
| 1.70 | no beer, no caffeine |
| 1.93 | beer, caffeine |
| 0.62 | no beer, caffeine |
| 2.56 | beer, no caffeine |
| 1.65 | beer, caffeine |
| 0.28 | no beer, no caffeine |
| 1.06 | no beer, no caffeine |
| 0.84 | no beer, caffeine |
| 2.07 | beer, no caffeine |
| 1.15 | no beer, no caffeine |
| 2.24 | no beer, no caffeine |
| 2.35 | beer, no caffeine |
| 2.48 | beer, no caffeine |
| 1.72 | no beer, caffeine |
| 1.48 | no beer, no caffeine |
| 1.31 | no beer, caffeine |
| 1.71 | beer, no caffeine |
| 1.63 | no beer, caffeine |
| 1.62 | no beer, no caffeine |
| 1.80 | beer, caffeine |
| 2.19 | beer, no caffeine |
| 1.50 | beer, no caffeine |
| 0.45 | no beer, caffeine |
| 1.53 | no beer, no caffeine |
| 1.37 | beer, caffeine |
| 1.52 | no beer, caffeine |
| 1.35 | beer, caffeine |
| 1.30 | no beer, caffeine |
| 1.75 | no beer, caffeine |
| 2.35 | beer, no caffeine |
| 2.05 | beer, caffeine |
| 2.69 | no beer, no caffeine |
| 1.91 | no beer, caffeine |
| 2.68 | beer, caffeine |
| 1.33 | no beer, caffeine |
| 2.47 | beer, no caffeine |
| 1.29 | beer, caffeine |
Here are the summary statistics from this data set:
The four conditions, or ‘levels’, are “no beer, no caffeine”, “no beer, caffeine”, “beer, no caffeine”, and “beer, caffeine”:
| mean | n | sd | sem | |
|---|---|---|---|---|
| no beer, no caffeine | 1.650833 | 12 | 0.6725455 | 0.1941472 |
| no beer, caffeine | 1.351667 | 12 | 0.4829235 | 0.1394080 |
| beer, no caffeine | 2.210000 | 12 | 0.3476153 | 0.1003479 |
| beer, caffeine | 1.858333 | 12 | 0.4696194 | 0.1355675 |
Here’s a plot of the means with error bars as the standard error of the mean:

It looks like the means do differ from one another. To check, here’s the result of the omnibus ANOVA:
## Analysis of Variance Table
##
## Response: Responsetime
## Df Sum Sq Mean Sq F value Pr(>F)
## Condition 3 4.687 1.56234 6.0856 0.001478 **
## Residuals 44 11.296 0.25673
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This shows that some combination of beer and caffeine has a significant effect on response times.
Let’s look at the effects of beer and caffeine on response times separately.
18.2 Effect of Beer
We could run a t-test comparing the “beer, no caffeine” group to the “no beer, no caffeine” group, but in a factorial design we can use the full set of data and calculate the effect of beer after averaging across the two levels of caffeine. Averaging across the other factor produces what’s called the ‘main effect’ of beer, which can be done with a contrast with these weights:
| no beer, no caffeine | no beer, caffeine | beer, no caffeine | beer, caffeine | |
|---|---|---|---|---|
| Effect of Beer | 1 | 1 | -1 | -1 |
Just as we did in the last chapter, we first run emmeans on the output of ‘lm’, which will be used for all of our contrasts:
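Here is a minimal sketch of that setup; lm1.out and emm1 are the names used later in this chapter:
library(emmeans)
# fit the one-way model and compute the estimated marginal means
lm1.out <- lm(Responsetime ~ Condition, data = data1)
emm1 <- emmeans(lm1.out, specs = 'Condition')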
Then for the main effect of beer:
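Assuming the same unnamed-list form of contrast as in the last chapter (which is what produces the c(1, 1, -1, -1) label in the output below), the call is a sketch like:
# main effect of beer: (no beer conditions) - (beer conditions)
summary(contrast(emm1, method = list(c(1, 1, -1, -1))))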
## contrast estimate SE df t.ratio p.value
## c(1, 1, -1, -1) -1.07 0.293 44 -3.643 0.0007
There is a significant effect of beer on response times. Since we’re subtracting the beer conditions from the no-beer conditions, our negative value of ‘estimate’ (\(\psi\)) means that the response times with beer are greater than without beer. Beer increases response times.
18.3 Effect of Caffeine
This contrast averages across beer conditions and tests for the main effect of caffeine on response times.
| no beer, no caffeine | no beer, caffeine | beer, no caffeine | beer, caffeine | |
|---|---|---|---|---|
| Effect of Caffeine | 1 | -1 | 1 | -1 |
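A sketch of the corresponding call, under the same assumptions as above:
# main effect of caffeine: (no caffeine conditions) - (caffeine conditions)
summary(contrast(emm1, method = list(c(1, -1, 1, -1))))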
## contrast estimate SE df t.ratio p.value
## c(1, -1, 1, -1) 0.651 0.293 44 2.225 0.0313
Caffeine has a significant effect on response times - this time \(\psi\) is positive, so response times with caffeine are faster than without caffeine. Caffeine reduces response times.
18.4 The Third Contrast: Interaction
For four levels or groups, there are three independent contrasts. Here’s the third contrast:
| no beer, no caffeine | no beer, caffeine | beer, no caffeine | beer, caffeine | |
|---|---|---|---|---|
| Beer X Caffeine | -1 | 1 | 1 | -1 |
What does that third contrast measure? Symbolically, the contrast combines the conditions as:
-[no beer, no caffeine] + [no beer, caffeine] + [beer, no caffeine] - [beer, caffeine]
Rearranging the terms as a difference of differences:
([beer, no caffeine] - [no beer, no caffeine]) - ([beer, caffeine]-[no beer, caffeine])
The first difference is the effect of beer without caffeine. The second difference is the effect of beer with caffeine. The difference of the differences is a measure of how the effect of beer changes by adding caffeine. In statistical terms, we call this the interaction between the effects of beer and caffeine on response times. Interactions are labeled with an ‘X’, so this contrast is labeled as ‘Beer X Caffeine’.
You might have noticed the parallel between this and the \(\chi^{2}\) test of independence. This is the same concept, but for means rather than frequencies.
The result of the test for this third contrast is:
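A sketch of the call, again assuming the unnamed-list form:
# interaction: the difference of differences
summary(contrast(emm1, method = list(c(-1, 1, 1, -1))))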
## contrast estimate SE df t.ratio p.value
## c(-1, 1, 1, -1) 0.0525 0.293 44 0.179 0.8584
We fail to reject \(H_{0}\), so there is no significant interaction between the effects of beer and caffeine on response times. This means that adding beer increases response times effectively the same amount regardless of caffeine. Conversely, caffeine reduces response times effectively the same amount with or without beer. Notice the use of the word ‘effectively’ here. We should be careful about saying that ‘beer increases response times the same amount, regardless of caffeine’ because this isn’t true. There is a slight numerical difference, but it is not statistically significant.
18.5 Partitioning \(SS_{between}\)
Recall that for a 1-factor ANOVA, \(SS_{total}\) is broken down into two parts: \(SS_{within}\) and \(SS_{between}\).

In the chapter section on a priori and post-hoc tests we discussed how the sums of squares for independent contrasts are a way of breaking down the total variability between the means, \(SS_{between}\). The same is true here for our three orthogonal contrasts. In the last chapter we showed that if you consider the contrasts as F-tests:
\[SS_{contrast} = t^2MS_{within}\]
We can use R to calculate \(SS_{contrast}\) by taking \(MS_{within}\) from the ANOVA:
# run the omnibus ANOVA and pull out MS_within for later
anova1.out <- anova(lm1.out)
MS_within <- anova1.out$`Mean Sq`[2]
anova1.out
## Analysis of Variance Table
##
## Response: Responsetime
## Df Sum Sq Mean Sq F value Pr(>F)
## Condition 3 4.687 1.56234 6.0856 0.001478 **
## Residuals 44 11.296 0.25673
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
And then run all three contrasts together and calculate \(SS_{contrast}\):
contrasts <- list(beer = c(1, 1, -1, -1),
                  caffeine = c(1, -1, 1, -1),
                  interaction = c(1, -1, -1, 1))
contrast.result <- summary(contrast(emm1, method = contrasts))
# Calculate SS_contrast = (t^2)(MS_within) and append it to contrast.result
contrast.result$SS_contrast <- round(contrast.result$t.ratio^2*MS_within, 2)
contrast.result
## contrast estimate SE df t.ratio p.value SS_contrast
## beer -1.0658 0.293 44 -3.643 0.0007 3.41
## caffeine 0.6508 0.293 44 2.225 0.0313 1.27
## interaction -0.0525 0.293 44 -0.179 0.8584 0.01
As in the last chapter, you can see how \(SS_{between}\) is divided up among the three contrasts. Summing the three \(SS_{contrast}\) values gives us \(SS_{between}\):
\[\sum{SS_{contrast}} = 3.41+1.27+0.01 = 4.69 = SS_{between}\]
The three contrasts have partitioned the total variability between the four means into three separate independent tests - each telling us something different about what is driving the significance of the ‘omnibus’ F-test. If we call the sums of squares for each of the three contrasts \(SS_{beer}\), \(SS_{caffeine}\), and \(SS_{beerXcaffeine}\) (where the ‘\(X\)’ means ‘interaction’), we can expand the above diagram to this:

This example is a ‘balanced’ factorial design, which means that the sample sizes are the same for all conditions.
A standard way to analyze a factorial design is to break the overall variability between the means into separate hypothesis tests - a main effect for each factor, and their interaction. In the next section we’ll show that treating the same data as a 2-factor ANOVA gives exactly the same results as treating it as a 1-factor ANOVA with three contrasts.
18.6 2-Factor ANOVA
I’ve saved the same data set but in a way that’s ready to be analyzed as a factorial design experiment. We’ll load it in here:
data2 <- read.csv('http://courses.washington.edu/psy524a/datasets/BeerCaffeineANOVA2.csv')
# order the levels for the two factors (alphabetical by default)
data2$caffeine <- factor(data2$caffeine, levels = c('no caffeine','caffeine'))
data2$beer <- factor(data2$beer, levels = c('no beer','beer'))
The data is stored in ‘long format’ like this:
| Responsetime | caffeine | beer |
|---|---|---|
| 2.24 | no caffeine | no beer |
| 1.62 | no caffeine | no beer |
| 1.48 | no caffeine | no beer |
| 1.70 | no caffeine | no beer |
| 1.06 | no caffeine | no beer |
| 1.39 | no caffeine | no beer |
| 2.69 | no caffeine | no beer |
| 0.28 | no caffeine | no beer |
| 2.24 | no caffeine | no beer |
| 1.15 | no caffeine | no beer |
| 1.53 | no caffeine | no beer |
| 2.43 | no caffeine | no beer |
| 1.71 | no caffeine | beer |
| 2.19 | no caffeine | beer |
| 2.27 | no caffeine | beer |
| 2.35 | no caffeine | beer |
| 2.47 | no caffeine | beer |
| 2.07 | no caffeine | beer |
| 2.56 | no caffeine | beer |
| 2.35 | no caffeine | beer |
| 1.50 | no caffeine | beer |
| 2.63 | no caffeine | beer |
| 2.48 | no caffeine | beer |
| 1.94 | no caffeine | beer |
| 0.62 | caffeine | no beer |
| 1.72 | caffeine | no beer |
| 1.75 | caffeine | no beer |
| 1.84 | caffeine | no beer |
| 1.30 | caffeine | no beer |
| 1.52 | caffeine | no beer |
| 1.31 | caffeine | no beer |
| 1.63 | caffeine | no beer |
| 1.91 | caffeine | no beer |
| 1.33 | caffeine | no beer |
| 0.84 | caffeine | no beer |
| 0.45 | caffeine | no beer |
| 2.05 | caffeine | beer |
| 1.51 | caffeine | beer |
| 1.65 | caffeine | beer |
| 2.68 | caffeine | beer |
| 2.06 | caffeine | beer |
| 1.80 | caffeine | beer |
| 2.68 | caffeine | beer |
| 1.93 | caffeine | beer |
| 1.29 | caffeine | beer |
| 1.93 | caffeine | beer |
| 1.35 | caffeine | beer |
| 1.37 | caffeine | beer |
The data format has the same ‘Responsetime’ column, but now there are two columns instead of one to define the two factors. The ‘caffeine’ column has two levels: ‘caffeine’ and ‘no caffeine’. Similarly, the ‘beer’ column has two levels: ‘beer’ and ‘no beer’. This way of storing the data is called ‘long format’, in which each row corresponds to a single observation.
This experiment is called a 2x2 factorial design because each of the two factors has two levels. We can summarize the results in the form of matrices with rows and columns corresponding to the two factors. We’ll set the ‘row factor’ as ‘caffeine’ and the ‘column factor’ as ‘beer’. That is, ‘caffeine’ varies across the rows and ‘beer’ varies across the columns. Here’s the 2x2 table for the means:
| no beer | beer | |
|---|---|---|
| no caffeine | 1.6508 | 2.2100 |
| caffeine | 1.3517 | 1.8583 |
Instead of bar graphs, it’s common to plot results of factorial designs as data points with lines connecting them. By default, I plot the column factor along the x-axis and define the row factor in the legend. There are various ways of doing this with R. Here’s an example for our data. It requires both ‘ggplot2’ and the ‘dplyr’ libraries. Both are part of the ‘tidyverse’ package.
# Do this to avoid an annoying error message
options(dplyr.summarise.inform = FALSE)
# Make a table (tibble) with generic names
summary.table <- data2 %>%
  dplyr::group_by(caffeine, beer) %>%
  dplyr::summarise(
    m = mean(Responsetime),
    sem = sd(Responsetime)/sqrt(length(Responsetime))
  )
# plot with error bars, replacing generic names with specific names
ggplot(summary.table, aes(beer, m)) +
  geom_errorbar(
    aes(ymin = m-sem, ymax = m+sem, color = caffeine),
    position = position_dodge(0), width = 0.5) +
  geom_line(aes(group = caffeine, color = caffeine)) +
  geom_point(aes(group = caffeine, color = caffeine), size = 5) +
  scale_color_manual(values = rainbow(2)) +
  xlab('beer') +
  ylab('Response Time (s)') +
  theme_bw()
18.7 Within-Cell Variance (\(MS_{wc}\))
All three F-tests for a 2-factor ANOVA use the same value in the denominator, \(MS_{wc}\), the within-cell mean squared error. We start by calculating \(SS_{wc}\), which is, like in the 1-factor ANOVA, the sum of squared deviations of each score from the mean of the cell it came from:
\[SS_{wc} = \sum_i{\sum_j{\sum_k{(X_{ijk}-\overline{X}_{ij})^2}}}\] where \(X_{ijk}\) is the k’th score in the cell with level i of the first factor and level j of the second factor, and \(\overline{X}_{ij}\) is the mean of the scores in that cell.
The sums of squares for each cell are:
| no beer | beer | |
|---|---|---|
| no caffeine | 4.9755 | 1.3292 |
| caffeine | 2.5654 | 2.4260 |
\(SS_{wc}\) is the sum of these individual within-cell sums of squares:
\[SS_{wc} = 4.9755+2.5654+1.3292+2.426 = 11.2961\]
Each cell contributes n-1 degrees of freedom to \(SS_{wc}\), so the degrees of freedom for all cells is N-k, where k is the total number of cells and N is the total sample size (n \(\times\) k):
\[df_{wc} = 48 - 4 = 44\]
Mean-squared error is, as always, \(\frac{SS}{df}\):
\[MS_{wc} = \frac{SS_{wc}}{df_{wc}} = \frac{11.296}{44} = 0.2567\]
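These within-cell calculations are easy to check in R; here’s a sketch using dplyr, as in the plotting code above:
# sum of squared deviations from the mean, within each of the four cells
cell.ss <- data2 %>%
  dplyr::group_by(caffeine, beer) %>%
  dplyr::summarise(SS = sum((Responsetime - mean(Responsetime))^2))
SS_wc <- sum(cell.ss$SS)              # 11.296
df_wc <- nrow(data2) - nrow(cell.ss)  # 48 - 4 = 44
MS_wc <- SS_wc/df_wc                  # 0.2567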
This is the same \(MS\) value and df as \(MS_{w}\) from above when we treated the same data as a 1-factor ANOVA design.
The three contrasts that we used for the 1-factor ANOVA example correspond to what we call ‘main effects’ of the factors and the ‘interaction’ between the factors. To calculate the main effects by hand we need to calculate the means across the rows and columns of our factors. Here’s a table with the row and column means in the ‘margins’:
| no beer | beer | means | |
|---|---|---|---|
| no caffeine | 1.6508 | 2.2100 | 1.9304 |
| caffeine | 1.3517 | 1.8583 | 1.6050 |
| means | 1.5012 | 2.0342 | 1.7677 |
The bottom-right number is the mean of the means, which is the grand mean (\(\overline{\overline{X}}\) = 1.7677)
18.7.1 Main Effect for Columns (Beer)
Calculating main effects is a lot like calculating \(SS_{between}\) for the 1-factor ANOVA. For the main effect for columns, we calculate the sum of squared deviations of the column means from the grand mean, and scale it by the number of samples that contributed to each column mean. For our example, the sum of squared deviations is:
\[(1.5012-1.7677)^2+(2.0342-1.7677)^2=0.071+0.071 = 0.142\]
There are \(2 \times 12 = 24\) samples for each column mean, so the sum of squares for columns, called \(SS_{C}\), is
\[SS_{C} = (24)(0.142) = 3.408\]
There are 2 means contributing to \(SS_{C}\), so the degrees of freedom is \(df_{C} = 2 - 1 = 1\).
\(MS_{C}\) is therefore
\[\frac{SS_{C}}{df_{C}} = \frac{3.4080}{1} = 3.4080\]
The F-statistic for this main effect is \(MS_{C}\) divided by our common denominator, \(MS_{wc}\)
\[F = \frac{MS_{C}}{MS_{wc}} = \frac{3.4080}{0.2567} = 13.2748\]
We can calculate the p-value for this main effect using pf:
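A sketch of the call, using the rounded F and df values from above (so the last digits may differ slightly):
1 - pf(13.2748, 1, 44)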
## [1] 0.0007060154
Notice that the F and p-values are the same as for the first contrast in the 1-way ANOVA above. If you work out the algebra you’ll find that the math is the same. The main effect in a multi-factorial ANOVA is exactly the same as the appropriate contrast in a 1-factor ANOVA.
18.7.2 Main Effect for Rows (Caffeine)
The calculations for finding the main effect of rows (Caffeine) on response times are completely analogous to those for the main effect for columns. We use the row means in the table above, which are the averages across the two beer conditions.
The sum of squared deviations of the row means is:
\[(1.9304-1.7677)^2+(1.605-1.7677)^2=0.0265+0.0265 = 0.053\]
There are \(2 \times 12 = 24\) samples for each row mean, so the sum of squares for rows, called \(SS_{R}\), is
\[SS_{R} = (24)(0.0529) = 1.2708\]
There are 2 means contributing to \(SS_{R}\), so the degrees of freedom is \(df_{R}\) = 2 -1 = 1
\(MS_{R}\) is therefore
\[\frac{SS_{R}}{df_{R}} = \frac{1.2708}{1} = 1.2708\]
The F-statistic for this main effect is \(MS_{R}\) divided by our common denominator, \(MS_{wc}\)
\[F = \frac{MS_{R}}{MS_{wc}} = \frac{1.2708}{0.2567} = 4.9498\]
The p-value for the main effect of Caffeine is:
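Again a sketch with pf, using the rounded values from above:
1 - pf(4.9498, 1, 44)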
## [1] 0.03126914
18.7.3 Interaction Between Beer and Caffeine
The third contrast in the 1-factor ANOVA measured the differential effect of caffeine on response times across the two beer conditions (or vice versa). Recall that for a 1-factor ANOVA the sums of squares associated with the three orthogonal contrasts add up to \(SS_{between}\) for the four groups. Also, recall that \(SS_{between} + SS_{within} = SS_{total}\).
The easiest way to calculate the sums of squares for the interaction is to appreciate that
\[SS_{total} = SS_{caffeine} + SS_{beer} + SS_{caffeineXbeer} + SS_{wc}\]
The total sums of squares is \(SS_{total}\) = 15.983.
Therefore,
\[SS_{caffeineXbeer} = SS_{total} - (SS_{wc} + SS_{caffeine} + SS_{beer})\]
so
\[SS_{RXC} = SS_{total}-SS_{R}-SS_{C}-SS_{wc} = 15.9830 - 1.2708 - 3.4080 - 11.2960 = 0.0083\]
The degrees of freedom for this interaction term is (\(n_{rows}\)-1)*(\(n_{cols}\)-1):
\[df_{RXC} = (n_{rows}-1)(n_{cols}-1) = (2-1)(2-1) = 1\]
So the mean-squared error for the interaction is
\[\frac{SS_{RXC}}{df_{RXC}} = \frac{0.0083}{1} = 0.0083\]
Using \(MS_{wc}\) for the denominator again, the F-statistic is:
\[F = \frac{MS_{RXC}}{MS_{wc}} = \frac{0.0083}{0.2567} = 0.0322\]
The p-value for the interaction is:
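A sketch with pf once more, using the rounded values from above:
1 - pf(0.0322, 1, 44)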
## [1] 0.8584132
There is not a significant interaction between caffeine and beer on response times. Compare these numbers to the results of the third contrast in the 1-factor ANOVA above.
We typically summarize our calculations and results in a table like this:
| df | SS | MS | F | p | |
|---|---|---|---|---|---|
| caffeine | 1 | 1.2708 | 1.2708 | 4.9498 | p = 0.0313 |
| beer | 1 | 3.4080 | 3.408 | 13.2748 | p = 0.0007 |
| interaction | 1 | 0.0083 | 0.0083 | 0.0322 | p = 0.8584 |
| wc | 44 | 11.2960 | 0.2567 | ||
| Total | 47 | 15.9830 |
Using APA format we state, for our three tests:
There is a main effect of caffeine. F(1,44) = 4.9498, p = 0.0313.
There is a main effect of beer. F(1,44) = 13.2748, p = 0.0007.
There is not a significant interaction between caffeine and beer. F(1,44) = 0.0322, p = 0.8584.
You might have noticed that we didn’t use any correction for familywise error for these three tests. There is a general consensus that the main effects and the interaction do not require familywise error correction. But if we treat the same data as a 1-factor design with three planned contrasts, we should apply error correction (like Bonferroni) even though the math and p-values are the same. If you find discrepancies like this baffling you are not alone.
18.8 The two-factor ANOVA with R
Conducting a two-factor ANOVA with R is a lot like conducting a 1-factor ANOVA. We’ll use the lm function and pass its output through the anova function to get our table and statistics. The difference is the definition of the formula. Here we’ll use: Responsetime ~ caffeine*beer. The ‘*’ is the way to ask lm to predict response times with not only the main effects of caffeine and beer but also their interaction:
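A sketch of those two steps (the name anova2.out is an assumption, used only here):
anova2.out <- anova(lm(Responsetime ~ caffeine*beer, data = data2))
anova2.out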
## Analysis of Variance Table
##
## Response: Responsetime
## Df Sum Sq Mean Sq F value Pr(>F)
## caffeine 1 1.2708 1.2708 4.9498 0.031269 *
## beer 1 3.4080 3.4080 13.2748 0.000706 ***
## caffeine:beer 1 0.0083 0.0083 0.0322 0.858395
## Residuals 44 11.2960 0.2567
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
All of these numbers should look familiar.
18.9 A 2x3 Factorial Example
Factorial designs let you study the effects of one factor across multiple levels of another factor (or factors). In this made-up example, we’ll study the effect of two kinds of diets: “Atkins” and “pea soup” on the systolic blood pressure (BP, in mm Hg) for three exercise levels: “none”, “a little”, and “a lot”. A systolic blood pressure less than 120 mm Hg is considered normal. 20 subjects participated in each of the 2x3 = 6 groups for a total of 120 subjects. Here’s how to load in the data from the course website and order the levels:
data3 <- read.csv('http://courses.washington.edu/psy524a/datasets/DietExerciseBP.csv')
data3$Diet <- factor(data3$Diet, levels = c('Atkins','pea soup'))
data3$Exercise <- factor(data3$Exercise, levels = c('none','a little','a lot'))
The data is stored in ‘long format’ like this:
| BP | Diet | Exercise |
|---|---|---|
| 125.6032 | Atkins | none |
| 137.7546 | Atkins | none |
| 122.4656 | Atkins | none |
| 158.9292 | Atkins | none |
| 139.9426 | Atkins | none |
| 122.6930 | Atkins | none |
| 142.3114 | Atkins | none |
| 146.0749 | Atkins | none |
| 143.6367 | Atkins | none |
| 130.4192 | Atkins | none |
| 157.6767 | Atkins | none |
| 140.8476 | Atkins | none |
| 125.6814 | Atkins | none |
| 101.7795 | Atkins | none |
| 151.8740 | Atkins | none |
| 134.3260 | Atkins | none |
| 134.7571 | Atkins | none |
| 149.1575 | Atkins | none |
| 147.3183 | Atkins | none |
| 143.9085 | Atkins | none |
| 148.7847 | Atkins | a little |
| 146.7320 | Atkins | a little |
| 136.1185 | Atkins | a little |
| 105.1597 | Atkins | a little |
| 144.2974 | Atkins | a little |
| 134.1581 | Atkins | a little |
| 132.6631 | Atkins | a little |
| 112.9387 | Atkins | a little |
| 127.8277 | Atkins | a little |
| 141.2691 | Atkins | a little |
| 155.3802 | Atkins | a little |
| 133.4582 | Atkins | a little |
| 140.8151 | Atkins | a little |
| 134.1929 | Atkins | a little |
| 114.3441 | Atkins | a little |
| 128.7751 | Atkins | a little |
| 129.0857 | Atkins | a little |
| 134.1103 | Atkins | a little |
| 151.5004 | Atkins | a little |
| 146.4476 | Atkins | a little |
| 132.5321 | Atkins | a lot |
| 131.1996 | Atkins | a lot |
| 145.4545 | Atkins | a lot |
| 143.3499 | Atkins | a lot |
| 124.6687 | Atkins | a lot |
| 124.3876 | Atkins | a lot |
| 140.4687 | Atkins | a lot |
| 146.5280 | Atkins | a lot |
| 133.3148 | Atkins | a lot |
| 148.2166 | Atkins | a lot |
| 140.9716 | Atkins | a lot |
| 125.8196 | Atkins | a lot |
| 140.1168 | Atkins | a lot |
| 118.0596 | Atkins | a lot |
| 156.4954 | Atkins | a lot |
| 164.7060 | Atkins | a lot |
| 129.4917 | Atkins | a lot |
| 119.3380 | Atkins | a lot |
| 143.5458 | Atkins | a lot |
| 132.9742 | Atkins | a lot |
| 171.0243 | pea soup | none |
| 134.4114 | pea soup | none |
| 145.3461 | pea soup | none |
| 135.4200 | pea soup | none |
| 123.8509 | pea soup | none |
| 137.8319 | pea soup | none |
| 107.9256 | pea soup | none |
| 156.9833 | pea soup | none |
| 137.2988 | pea soup | none |
| 167.5892 | pea soup | none |
| 142.1326 | pea soup | none |
| 124.3508 | pea soup | none |
| 144.1609 | pea soup | none |
| 120.9885 | pea soup | none |
| 116.1955 | pea soup | none |
| 139.3717 | pea soup | none |
| 128.3506 | pea soup | none |
| 135.0166 | pea soup | none |
| 136.1151 | pea soup | none |
| 126.1572 | pea soup | none |
| 121.4700 | pea soup | a little |
| 127.9723 | pea soup | a little |
| 147.6713 | pea soup | a little |
| 107.1465 | pea soup | a little |
| 138.9092 | pea soup | a little |
| 134.9943 | pea soup | a little |
| 145.9465 | pea soup | a little |
| 125.4372 | pea soup | a little |
| 135.5503 | pea soup | a little |
| 134.0065 | pea soup | a little |
| 121.8622 | pea soup | a little |
| 148.1180 | pea soup | a little |
| 147.4060 | pea soup | a little |
| 140.5032 | pea soup | a little |
| 153.8025 | pea soup | a little |
| 138.3773 | pea soup | a little |
| 110.8511 | pea soup | a little |
| 121.4010 | pea soup | a little |
| 111.6308 | pea soup | a little |
| 122.8990 | pea soup | a little |
| 110.6945 | pea soup | a lot |
| 120.6317 | pea soup | a lot |
| 106.3362 | pea soup | a lot |
| 122.3704 | pea soup | a lot |
| 110.1812 | pea soup | a lot |
| 146.5093 | pea soup | a lot |
| 130.7506 | pea soup | a lot |
| 133.6526 | pea soup | a lot |
| 125.7628 | pea soup | a lot |
| 145.2326 | pea soup | a lot |
| 110.4640 | pea soup | a lot |
| 113.0753 | pea soup | a lot |
| 141.4842 | pea soup | a lot |
| 110.2396 | pea soup | a lot |
| 116.8893 | pea soup | a lot |
| 114.1079 | pea soup | a lot |
| 115.2001 | pea soup | a lot |
| 115.8133 | pea soup | a lot |
| 127.4128 | pea soup | a lot |
| 117.3400 | pea soup | a lot |
Here’s a plot of the means with error bars:

Here we’ve defined the row factor to be Diet and the column factor to be Exercise.
From the graph it looks like the Atkins diet has little effect on systolic blood pressure across exercise levels, but the pea soup diet does seem to lead to lower BP for higher levels of exercise.
The math behind running a 2-factor ANOVA on this design is the same as for the 2x2 example above. We’ll skip all of the details and jump straight to running the ANOVA with R.
The syntax for running the ANOVA is to define the model as BP ~ Diet*Exercise, which is asking lm to predict BP by a combination of Diet, Exercise, and the interaction between Diet and Exercise.
This will be done in two steps, saving the output of lm to be used later.
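A sketch of those two steps (anova3.out is the name used later in this chapter; lm3.out is an assumed name for the saved model):
lm3.out <- lm(BP ~ Diet*Exercise, data = data3)
anova3.out <- anova(lm3.out)
anova3.out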
## Analysis of Variance Table
##
## Response: BP
## Df Sum Sq Mean Sq F value Pr(>F)
## Diet 1 1308.3 1308.32 7.2015 0.008369 **
## Exercise 2 1216.0 608.02 3.3468 0.038689 *
## Diet:Exercise 2 1169.6 584.80 3.2190 0.043658 *
## Residuals 114 20710.7 181.67
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Using APA format we’d say:
There is a main effect of Diet. F(1,114) = 7.2015, p = 0.0084.
There is a main effect of Exercise. F(2,114) = 3.3468, p = 0.0387.
There is a significant interaction between Diet and Exercise. F(2,114) = 3.2190, p = 0.0437.
All three hypothesis tests are statistically significant, but this doesn’t really tell us much about what’s driving the effects of Diet and Exercise on BP. As discussed above when we plotted the results, what seems to be happening is that only the subjects on the pea soup diet are influenced by Exercise.
It might make sense, instead, to run two ANOVAs on the data: one for the Atkins diet and one for the pea soup diet. We expect to find that most of the variability across the means is driven by the effect of Exercise for the pea soup dieters.
18.10 Simple Effects
Running ANOVAs on subsets of the data like this is called a simple effects analysis. Running separate ANOVAs on each level of Diet is studying the simple effects of Exercise by Diet. I remember this by replacing the word ‘by’ with ‘for every level of’. That is, this simple effect analysis is studying the effect of Exercise on BP for every level of Diet.
Running simple effects is the same as running separate ANOVAs for each level of Diet, except that we borrow the \(MS_{wc}\) value from the original 2-factor ANOVA. We’ll do this by hand first, and then with the joint_tests function from the emmeans package. We can use the subset function to pull out the data for each of the two diets:
# Atkins diet:
anova3.out.Atkins <- anova(lm(BP ~ Exercise, data = subset(data3, Diet == 'Atkins')))
anova3.out.Atkins
## Analysis of Variance Table
##
## Response: BP
## Df Sum Sq Mean Sq F value Pr(>F)
## Exercise 2 93.9 46.939 0.2783 0.7581
## Residuals 57 9614.3 168.672
# pea soup diet:
anova3.out.peasoup <- anova(lm(BP ~ Exercise, data = subset(data3, Diet == 'pea soup')))
anova3.out.peasoup
## Analysis of Variance Table
##
## Response: BP
## Df Sum Sq Mean Sq F value Pr(>F)
## Exercise 2 2291.8 1145.88 5.8862 0.004744 **
## Residuals 57 11096.4 194.67
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
If we assume homogeneity of variance, then we can replace the denominator of these F-tests with the denominator of the two-factor ANOVA, \(MS_{wc}\) = 181.6727. Since \(MS_{wc}\) is based on the entire data set, it will be a better estimate of the population variance. It also has a larger df, which helps with power.
To do this by hand we need to pull out \(MS_{wc}\) from the output from the original two factor ANOVA and recalculate our F-statistics and p-values using pf:
# from the 2-factor ANOVA, MS_wc is the fourth mean squared in the list
MS_wc <- anova3.out$`Mean Sq`[4]
df_wc <- anova3.out$Df[4]
# Atkins
MS_Atkins <- anova3.out.Atkins$`Mean Sq`[1]
df_Atkins <- anova3.out.Atkins$Df[1]
F_Atkins <- MS_Atkins/MS_wc
p_Atkins <- 1-pf(F_Atkins,df_Atkins,df_wc)
# pea soup
MS_peasoup <- anova3.out.peasoup$`Mean Sq`[1]
df_peasoup <- anova3.out.peasoup$Df[1]
F_peasoup <- MS_peasoup/MS_wc
p_peasoup <- 1-pf(F_peasoup,df_peasoup,df_wc)
sprintf('Atkins: F(%d,%d)= %5.4f,p = %5.6f',df_Atkins,df_wc,F_Atkins,p_Atkins)
## [1] "Atkins: F(2,114)= 0.2584,p = 0.772759"
sprintf('pea soup: F(%d,%d)= %5.4f,p = %5.6f',df_peasoup,df_wc,F_peasoup,p_peasoup)
## [1] "pea soup: F(2,114)= 6.3074,p = 0.002523"
The p-values didn’t change much when substituting \(MS_{wc}\), but every little bit of power helps.
18.10.1 Simple effects with R
The package emmeans provides the function joint_tests, which is an easy way to run simple effects. It uses the output of ‘lm’ from the two-factor ANOVA above, and produces the same results as what we did by hand:
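A sketch of the call, using joint_tests’ ‘by’ argument to split the tests by Diet (lm3.out is the model saved above):
joint_tests(lm3.out, by = 'Diet')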
## Diet = Atkins:
## model term df1 df2 F.ratio p.value
## Exercise 2 114 0.258 0.7728
##
## Diet = pea soup:
## model term df1 df2 F.ratio p.value
## Exercise 2 114 6.307 0.0025
18.11 Additivity of Simple Effects
These two simple effects have an interesting relation with the three tests from the original 2-factor ANOVA. It turns out that the SS associated with these two simple effects add up to the SS for the main effect of Exercise plus the SS for the interaction between Diet and Exercise. In math terms:
\[SS_{\text{exercise by Atkins diet}} + SS_{\text{exercise by pea soup diet}} = SS_{exercise} + SS_{exerciseXdiet}\]
You can see that here:
# Adding SS's for the two simple effects of Exercise by Diet:
anova3.out.Atkins$`Sum Sq`[1] + anova3.out.peasoup$`Sum Sq`[1]
## [1] 2385.635
# Adding SS's for the main effect of Exercise and the interaction
# (second and third in the list of SS's)
sum(anova3.out$`Sum Sq`[c(2,3)])
## [1] 2385.635
Note also that the degrees of freedom for both sets add up to 4 (2 + 2 on each side).
The pie charts below show how \(SS_{total}\) is divided up into the different sums of squares for the standard analysis (main effects and interaction) and for the simple effects analysis.

Our simple effects analysis is just another way of breaking down the SS associated with the main effect for columns and the interaction, since the main effect for Diet has no influence on this set of simple effects. You can visualize this by thinking about what would happen if the shapes of the two curves were the same, but one curve were shifted above or below the other. For example, if our results had come out like this, with an overall higher systolic blood pressure for the Atkins diet:

Our simple effects analysis of columns by rows would have come out the same. This is because shifting the Atkins group up only increases the main effect of Diet; it does not change the simple effects of Exercise by Diet, the main effect of Exercise, or the interaction between Exercise and Diet.