Psych 218 – Third SPSS Tutorial

Multiple Comparisons


You are researching treatments for substance abuse, and your main outcome of interest is the number of days in which participants use substances (per month).  You randomly assign 40 participants to 4 different treatment conditions:


1)      a wait-list control (they later receive a treatment, but you measure their use before they do)

2)      12-step group, either Alcoholics Anonymous or Narcotics Anonymous

3)      harm reduction group therapy, geared toward reducing the harms associated with use (abstinence from drugs is not necessarily the goal)

4)      individual psychodynamic psychotherapy, focusing on the unconscious drives to use substances



The data above (download by clicking here, then save to a folder on your computer) are the number of days that each participant has used substances in the most recent month (assessed at 3 months into the study).  The possible range of days is 0 to 31 days; this dependent variable is “daysuse.”  The “group” variable represents the therapy that each participant receives (1-4 represent the conditions 1-4 listed above).



You can look at the data in boxplot form by selecting


            GRAPHS / BOXPLOT / SIMPLE (summaries for groups of cases)


and defining your variables (“daysuse” is your variable of interest, and “group” is the category axis).


It looks like the wait-list control and the 12-step group have the highest median number of days using substances.  Harm reduction seems to have the lowest.

You can do an overall F test to see if there is a difference somewhere among the means.  The output will also tell you if Levene’s statistic is significant, which would indicate that you have violated the homogeneity of variance assumption.


Run the analysis of variance (the overall F test) in SPSS by selecting


            ANALYZE / COMPARE MEANS / ONE-WAY ANOVA




The one-way ANOVA command (which you used in the last tutorial) requires you to specify a dependent variable (daysuse) and a factor (group).  Under Options, also request both the descriptive and the homogeneity of variance statistics.  Your output follows:


You have an effect!!  Somewhere, that is…


This is fine! No HOV violations.






OK, so you know that there is some effect of the independent variable.  Everything that you’ve done in SPSS up to this point is review.  However, let’s assume several different scenarios in which you want to do different types of tests…


SCENARIO 1:  You have two questions in mind before you begin the study:

1)      Is harm reduction superior to all other methods of therapy?


2)      Does group therapy differ from the other conditions (both 12-step and harm reduction are in groups in this study)?  That is, does the average of the wait-list control and the individual psychodynamic therapy differ from the average of 12-step group and harm reduction group?


What is the appropriate type of testing to do here?  You have two comparisons that you plan to make before the study begins.  The null hypotheses are:


First comparison: μ harm r = (μ wait list + μ 12-step + μ psychodyn) / 3, or μ harm r - (μ wait list + μ 12-step + μ psychodyn) / 3 = 0


Second comparison: (μ harm r + μ 12-step) / 2 = (μ wait list + μ psychodyn) / 2 (can also set the difference = 0)


The contrast coefficients for these two comparisons are (with the means in order of wait list control, 12-step group, harm reduction group, and individual psychodynamic therapy):


First comparison:  (-1/3, -1/3, 1, -1/3)


Second comparison:  (-1/2, 1/2, 1/2, -1/2)


Note that the sum of the contrast coefficients, in both cases, is 0.  This satisfies the rule for constructing contrasts.  Also note that the sum of the absolute values of the coefficients is 2; this satisfies the suggested way of constructing contrasts.
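As a quick check of the two rules just described, here is a minimal Python sketch (not part of SPSS). The function name check_contrast is made up for this illustration; the coefficients are the ones listed above, kept as exact fractions so that 1/3 is not rounded.

```python
from fractions import Fraction

# Coefficients in the order used above:
# wait-list control, 12-step group, harm reduction, individual psychodynamic.
contrast_1 = [Fraction(-1, 3), Fraction(-1, 3), Fraction(1), Fraction(-1, 3)]
contrast_2 = [Fraction(-1, 2), Fraction(1, 2), Fraction(1, 2), Fraction(-1, 2)]


def check_contrast(coefficients):
    """Return (sum of coefficients, sum of absolute values of coefficients)."""
    total = sum(coefficients)
    abs_total = sum(abs(c) for c in coefficients)
    return total, abs_total


for name, coefficients in [("First", contrast_1), ("Second", contrast_2)]:
    total, abs_total = check_contrast(coefficients)
    print(f"{name} comparison: sum = {total}, sum of absolute values = {abs_total}")
```

Both comparisons should report a sum of 0 and a sum of absolute values of 2.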


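What type of comparisons are these?  Should you use Bonferroni, Tukey’s HSD, or Scheffé?


(ANSWER:  These are planned comparisons, and they are complex rather than pairwise, so the Bonferroni adjustment is appropriate.)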


SO, how do you do these in SPSS?  Well, SPSS allows you to specify which contrasts you would like to make in the ANOVA command.  Let’s re-run the ANOVA, but this time indicate which contrasts we’re testing:





Enter your dependent and independent variables as before.  Also select “Contrasts.”

To enter your contrasts, enter each coefficient into the “coefficients” box and click “ADD.”  When you have finished entering one contrast, click on “Next” to go to the second contrast.  Here is how your dialog box will look after you have entered each contrast:


Contrast 1:  -.33, -.33, 1, -.33

Contrast 2:  -.5, .5, .5, -.5



Click on “Continue,” then “OK” to run the tests.  (The output is below; the descriptives and HOV tests are omitted because they are the same as in the F test run previously.)

The value of F and the overall significance is still the same.




Your contrast coefficients are given to you in a table here so that you can make certain that you entered them correctly.




Notice that the tests of the contrasts give you the value of each contrast (this is computed the same way that you would compute ψ-hat), the standard error of the contrast, and the t-statistic that is the test of the contrast.  Because Levene’s statistic was not significant, we will fail to reject the null hypothesis that the population variances are the same, and will use the values under the “assume equal variances” row.
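To make the three quantities in that table concrete, here is a minimal Python sketch of the usual equal-variance contrast test: the contrast value is the sum of each coefficient times its group mean, the standard error is the square root of MS error times the sum of (coefficient squared / group size), and t is their ratio. The function name contrast_t and every number in the example call are hypothetical; they are not the values from this data set, and equal groups of 10 are assumed only for illustration.

```python
import math


def contrast_t(coefficients, means, ns, ms_error):
    """Return (contrast value, standard error, t) assuming equal variances.

    coefficients: contrast coefficients, one per group
    means:        group means, in the same order
    ns:           group sizes, in the same order
    ms_error:     within-groups mean square from the ANOVA table
    """
    psi_hat = sum(c * m for c, m in zip(coefficients, means))
    se = math.sqrt(ms_error * sum(c ** 2 / n for c, n in zip(coefficients, ns)))
    return psi_hat, se, psi_hat / se


# Hypothetical means and MS error, for illustration only.
# Order: wait-list control, 12-step group, harm reduction, psychodynamic.
made_up_means = [20.0, 18.0, 10.0, 15.0]
made_up_ns = [10, 10, 10, 10]
made_up_ms_error = 25.0

psi_hat, se, t = contrast_t([-0.5, 0.5, 0.5, -0.5], made_up_means, made_up_ns, made_up_ms_error)
print(f"contrast value = {psi_hat:.3f}, SE = {se:.3f}, t = {t:.3f}")
```

With your own group means, group sizes, and MS error from the ANOVA table, the result should agree with the “assume equal variances” row, apart from rounding.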


Also note that for the first contrast, it states that the “sum of the contrast coefficients is not zero.”  This is because SPSS only allows us to enter decimals for each coefficient (and only two decimal places), so .33 is not interpreted as 1/3.  Don’t worry about the fact that the contrast value will be slightly off because of this.
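The size of that rounding problem is easy to see with a minimal Python sketch (an illustration only, assuming the first contrast was typed in as -.33, -.33, 1, -.33):

```python
from decimal import Decimal
from fractions import Fraction

# The first contrast as exact fractions, and as two-decimal values typed into SPSS.
exact = [Fraction(-1, 3), Fraction(-1, 3), Fraction(1), Fraction(-1, 3)]
entered = [Decimal("-0.33"), Decimal("-0.33"), Decimal("1"), Decimal("-0.33")]

exact_sum = sum(exact)      # the intended coefficients sum to exactly zero
entered_sum = sum(entered)  # the rounded coefficients leave a small remainder

print("Sum of exact coefficients:  ", exact_sum)
print("Sum of entered coefficients:", entered_sum)
```

The entered coefficients sum to .01 rather than 0, which is what triggers the SPSS warning.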


NOW, what about the p-values (significance levels of each contrast)?  SPSS does not take into consideration the fact that you may have more than one planned comparison, so it does not do the Bonferroni adjustment.  This adjustment is accomplished in one of two ways:


1)      Compare your obtained p-values to the desired alpha level DIVIDED BY the number of tests you planned in advance.  For example, if you have planned two tests (as you did here), you compare your p-value to alpha (.05) divided by 2, or .025, to determine whether or not the tests are significant.


2)      Multiply your p-value by the number of tests (two here) and compare to the alpha that you have set.  For these contrasts, your p-values would be (.001 * 2) or .002 and (.020 * 2) or .040, respectively.  These are both less than .05, of course.


Regardless of the way you choose to do the adjustment (either to the alpha level or to the p-value), you will obtain the same result.
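A minimal Python sketch of the two versions of the adjustment, using the two p-values reported above (.001 and .020). The function name bonferroni and the dictionary keys are made up for this illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Apply the Bonferroni adjustment in both ways described above."""
    n_tests = len(p_values)
    results = []
    for p in p_values:
        adjusted_p = min(p * n_tests, 1.0)  # a p-value cannot exceed 1
        results.append({
            "p": p,
            "adjusted_p": adjusted_p,
            # Way 1: compare the raw p-value to alpha divided by the number of tests.
            "significant_by_alpha": p < alpha / n_tests,
            # Way 2: compare the multiplied p-value to the original alpha.
            "significant_by_p": adjusted_p < alpha,
        })
    return results


for row in bonferroni([0.001, 0.020]):
    print(row)
```

Both ways flag both contrasts as significant, matching the conclusion above.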


Now, let’s think of a different scenario:


SCENARIO 2:  Your primary question of interest is whether any of the four conditions differ from each other in the experiment.

This results in p(p-1)/2 comparisons, where p is the number of groups or conditions.  For a 4-group experiment like the one here, there are 6 pairwise comparisons.
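To see where the 6 comes from, this minimal Python sketch lists every pair of conditions and checks the count against the formula. The condition labels are shorthand for the four groups described at the start of the tutorial.

```python
from itertools import combinations

conditions = [
    "wait-list control",
    "12-step group",
    "harm reduction",
    "individual psychodynamic",
]

# Every unordered pair of conditions is one pairwise comparison.
pairs = list(combinations(conditions, 2))

p = len(conditions)
n_from_formula = p * (p - 1) // 2

for first, second in pairs:
    print(f"{first} vs. {second}")
print("Number of pairs:", len(pairs), "| p(p-1)/2 =", n_from_formula)
```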


What test should you use to account for the fact that you are doing all pairwise comparisons?  You planned these comparisons in advance, remember…


(ANSWER:  Tukey’s HSD test is appropriate for all pairwise comparisons, whether these comparisons are planned beforehand or made afterward.)


You can compute Tukey’s HSD test in SPSS by running the usual one-way ANOVA command (ANALYZE / COMPARE MEANS / ONE-WAY ANOVA):




(SPSS will remember the a priori contrasts that you just entered; you can clear them by selecting “Contrasts” again and removing the coefficients you entered.)


Select “POST HOC.”  Note that this is a misnomer; you planned these pairwise comparisons in advance, but SPSS only offers Tukey’s HSD test under the “Post Hoc” option.


Select that you want the Tukey test.  Run the test, and you will get the following as part of your output:



First, note that SPSS yields 12 pairwise comparisons (that is because each pair is duplicated, e.g., ‘12-step group and harm reduction’ is the same comparison as ‘harm reduction and 12-step group’).  The value of the difference between the group means is given, along with the significance of each comparison.  SPSS has already adjusted for the multiple pairwise comparisons using Tukey’s HSD, so you can take these significance levels as they are listed.


Notice that there are significant differences between wait list control and harm reduction, and between 12-step group and harm reduction.  In both cases, harm reduction is superior.



SCENARIO 3:  You had no planned comparisons before you looked at the data.  After you ran the overall ANOVA, you wondered if the wait list control was less effective than the average of the other three conditions (any therapy at all) in reducing the number of days of substance use.

In doing this, you had no planned comparisons, and only one post-hoc comparison. 


What type of test is appropriate to use? 


ANSWER:  Because you conducted this test post-hoc, your choices are Tukey’s HSD and Scheffé.  Tukey’s HSD covers only pairwise comparisons, and this comparison is complex (one mean against the average of three others), so you need to use Scheffé, the only test that is appropriate for complex comparisons made post-hoc.





Your null hypothesis for this test is: μ wait list = (μ 12-step + μ harm r + μ psychodyn) / 3



Your contrast coefficients should be (1, -1/3, -1/3, -1/3).


You can run this contrast as you did earlier in SPSS, using the following values:  1, -.33, -.33, -.33



If you do that, you will get the following:

Unfortunately, this is not Scheffé’s test; it is the t-test corresponding to a planned comparison.  If you check Scheffé in the Post Hoc box of the ANOVA command, you get the following output:


Note here that you are encountering a limitation of SPSS.  When you perform a Scheffé test in SPSS, it gives you the pairwise tests, just as it did for Tukey.  It does not give you the complex comparisons (of course, this would be difficult, as there are an infinite number of complex comparisons!).  What you should note from this is that the p-values (the “Sig.” column) for the Scheffé tests are larger than the p-values for the Tukey’s HSD tests (e.g., the difference between harm reduction and 12-step group is no longer significant with Scheffé).


Why is this?  Because Scheffé accounts for the possibility of doing all possible comparisons (pairwise and complex), it protects against many more tests than Tukey does, so the test of each comparison is more conservative.  That is, it is more difficult to reject the null with each of your tests using Scheffé than using Tukey.


Unfortunately, if you want the Scheffé test statistic as it is given in your text, you must compute it by hand; SPSS will not give it to you for complex comparisons.
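One common way to do that hand computation is sketched below in Python; your text may present it in a slightly different but equivalent form, so treat this as an assumption rather than the text’s own formula. The idea is to square the contrast’s t statistic (from the “assume equal variances” row) to get an F, and compare it to (number of groups - 1) times the critical F for the overall ANOVA.  The function name scheffe_test and the t value in the example are hypothetical.

```python
def scheffe_test(t_contrast, n_groups, f_critical):
    """Scheffé test for a single contrast.

    t_contrast: t statistic for the contrast (equal-variances row)
    n_groups:   number of groups in the ANOVA
    f_critical: critical F for (n_groups - 1, N - n_groups) df at your alpha,
                looked up in an F table
    Returns (F for the contrast, Scheffé critical value, significant?).
    """
    f_contrast = t_contrast ** 2
    critical_value = (n_groups - 1) * f_critical
    return f_contrast, critical_value, f_contrast > critical_value


# Hypothetical t value, for illustration only (not taken from the SPSS output).
# With 4 groups and 40 participants the df are (3, 36); 2.87 is roughly the
# .05 critical F for those df, but check the F table in your text.
f_contrast, critical_value, significant = scheffe_test(3.0, 4, 2.87)
print(f"F for contrast = {f_contrast:.2f}, Scheffé critical value = {critical_value:.2f}")
print("Significant by Scheffé:", significant)
```

Because the critical value is multiplied by the number of groups minus 1, a contrast that passes the planned-comparison t-test can easily fail here.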



SCENARIO 4:  You had no planned comparisons before you looked at the data.  After looking at the data, you decide to test all pairwise comparisons, but stop there.

You will be conducting all pairwise comparisons post-hoc. 


What test do you use?  This isn’t meant to be a trick; you still use Tukey’s HSD.  That is because this test is appropriate for all pairwise comparisons, whether planned or post-hoc.




SCENARIO 5:  You had no planned comparisons before you looked at the data.  After looking at the data, you wonder if harm reduction is significantly different from individual psychodynamic therapy, but don’t want to answer any other questions.

You had no planned comparisons, and you were interested in only one pairwise comparison post-hoc.


What test do you use?  Again, it’s Tukey’s HSD!  That’s because when you conduct post-hoc tests, you need to adjust your overall alpha to account for the number of tests that you would need to have conducted in order to do the test of interest (here, you would have needed to do ALL pairwise tests in order to do this pairwise test).  Tukey’s test is what you would use, since you did not do a complex comparison and you did not plan this comparison in advance.