6.4 What makes a good set of contrasts?

For a factor with \(I\) levels, one can make only \(I-1\) comparisons within a single model. For example, in a design with one factor with two levels, only one comparison is possible (between the two factor levels). More generally, if we have one factor with \(I_1\) levels and another factor with \(I_2\) levels, then the total number of conditions is \(I_1 \times I_2 = \nu\) (not \(I_1 + I_2\)!), which implies a maximum of \(\nu - 1\) contrasts.

For example, in a design with one factor with three levels, A, B, and C, one could in principle make three comparisons (A vs. B, A vs. C, B vs. C). However, after defining an intercept, only two means can be compared. Therefore, for a factor with three levels, we define two comparisons within one statistical model. F tests are nothing but combinations, or bundles, of contrasts. F tests are less specific and lack focus, but they are useful when the hypothesis in question is vague. A significant F test, however, leaves unclear which effects the data actually show. Contrasts are very useful for testing specific effects in the data.

One critical precondition for contrasts is that they implement different hypotheses that are not collinear; that is, none of the contrasts can be generated from the other contrasts by linear combination. For example, the contrast c1 = c(1,2,3) can be generated from the contrast c2 = c(3,4,5) simply by computing c2 - 2. Therefore, contrasts c1 and c2 cannot be used simultaneously: each contrast needs to encode some independent information about the data.
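
This is easy to verify directly in R: subtracting \(2\) from c2 reproduces c1 exactly, and the two vectors are perfectly correlated.

c1 <- c(1, 2, 3)
c2 <- c(3, 4, 5)
all(c1 == c2 - 2) # c1 is a linear transformation of c2
## [1] TRUE
cor(c1, c2) # perfect correlation signals collinearity
## [1] 1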

There are (at least) two criteria for deciding what makes a good contrast. First, orthogonal contrasts have advantages because they test mutually independent hypotheses about the data (see Dobson and Barnett 2011, sec. 6.2.5, p. 91 for a detailed explanation of orthogonality). Second, it is crucial that contrasts are defined in a way that answers the research questions. One way to accomplish this is to use the hypothesis matrix to generate contrasts (e.g., via the hypr package), as this ensures that one uses contrasts that estimate exactly the comparisons of interest in a given study.

6.4.1 Centered contrasts

Contrasts are often constrained to be centered, such that the individual contrast coefficients \(c_i\) for different factor levels \(i\) sum to \(0\): \(\sum_{i=1}^I c_i = 0\). This has advantages when testing interactions with other factors or covariates (we discuss interactions between factors in a separate chapter below). All contrasts discussed here are centered except for the treatment contrast, in which the contrast coefficients for each contrast do not sum to zero:

colSums(contr.treatment(4))
## 2 3 4 
## 1 1 1

Other contrasts, such as repeated contrasts, are centered: their contrast coefficients sum to \(0\) for each contrast. The function contr.sdif(), which generates repeated contrasts, is provided by the MASS package:

library(MASS) # provides contr.sdif() for repeated (successive differences) contrasts
colSums(contr.sdif(4))
## 2-1 3-2 4-3 
##   0   0   0

The contrast coefficients mentioned above appear in the contrast matrix. The weights in the hypothesis matrix, by contrast, are always centered, even for the treatment contrast. The reason is that they code hypotheses, which always relate to comparisons between conditions or bundles of conditions. The only exception is the weights for the intercept, which are all the same and together always sum to \(1\) in the hypothesis matrix. This ensures that, when applying the generalized matrix inverse, the intercept results in a constant term with values of \(1\) in the contrast matrix. An important question is whether (or when) the intercept needs to be considered in the generalized matrix inversion, and whether (or when) it can be ignored. This question is closely related to the concept of orthogonal contrasts, to which we turn next.

6.4.2 Orthogonal contrasts

Two centered contrasts \(c_1\) and \(c_2\) are orthogonal to each other if the following condition applies; here, \(i\) indexes the elements of the vectors representing the contrasts.

\[\begin{equation} \sum_{i=1}^I c_{1,i} \cdot c_{2,i} = 0 \end{equation}\]

Orthogonality can be determined easily in R by computing the correlation between two contrasts: orthogonal (centered) contrasts have a correlation of \(0\). Contrasts are thus just a special case of predictors in regression models in general, where two numeric predictor variables are orthogonal if they are uncorrelated.
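
As a minimal sketch, the orthogonality condition can also be checked directly by summing the element-wise products of two contrast vectors; here for two centered contrasts from the \(2 \times 2\) design shown next:

c1 <- c(1, 1, -1, -1)
c2 <- c(1, -1, 1, -1)
sum(c1 * c2) # dot product: 0 indicates orthogonality
## [1] 0
cor(c1, c2) # equivalently, centered orthogonal contrasts are uncorrelated
## [1] 0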

For example, when the two factors in a \(2 \times 2\) design are coded using sum contrasts (we return to this case in the section on designs with two factors below), the two sum contrasts and their interaction are orthogonal to each other:

(Xsum <- cbind(
  F1 = c(1, 1, -1, -1), F2 = c(1, -1, 1, -1),
  F1xF2 = c(1, -1, -1, 1)
))
##      F1 F2 F1xF2
## [1,]  1  1     1
## [2,]  1 -1    -1
## [3,] -1  1    -1
## [4,] -1 -1     1
cor(Xsum)
##       F1 F2 F1xF2
## F1     1  0     0
## F2     0  1     0
## F1xF2  0  0     1

Notice that the correlations between the different contrasts (i.e., the off-diagonals) are exactly \(0\). Sum contrasts coding one multi-level factor, however, are not orthogonal to each other:

cor(contr.sum(4))
##      [,1] [,2] [,3]
## [1,]  1.0  0.5  0.5
## [2,]  0.5  1.0  0.5
## [3,]  0.5  0.5  1.0

Here, the correlations between individual contrasts, which appear in the off-diagonals, deviate from \(0\), indicating non-orthogonality. The same is also true for repeated and treatment contrasts:

cor(contr.sdif(4))
##        2-1    3-2    4-3
## 2-1 1.0000 0.5774 0.3333
## 3-2 0.5774 1.0000 0.5774
## 4-3 0.3333 0.5774 1.0000
cor(contr.treatment(4))
##         2       3       4
## 2  1.0000 -0.3333 -0.3333
## 3 -0.3333  1.0000 -0.3333
## 4 -0.3333 -0.3333  1.0000

Orthogonality of contrasts plays a critical role when computing the generalized inverse. In the inversion operation, orthogonal contrasts are converted independently of each other; that is, the presence or absence of another orthogonal contrast does not change the resulting weights. In fact, for orthogonal contrasts, applying the generalized matrix inverse to the hypothesis matrix simply yields a scaled version of the hypothesis matrix as the contrast matrix (for mathematical details, see Schad et al. 2020b).
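
To illustrate (a minimal sketch, constructed here for this example): the hypothesis matrix Hmat below corresponds to the sum contrasts Xsum from above, with the hypotheses in rows and the intercept weights summing to \(1\). Applying the generalized inverse returns a column of \(1\)s for the intercept together with a (here: four times) scaled version of the remaining hypothesis weights, i.e., exactly the columns of Xsum:

Hmat <- rbind(
  int   = c(1, 1, 1, 1) / 4,
  F1    = c(1, 1, -1, -1) / 4,
  F2    = c(1, -1, 1, -1) / 4,
  F1xF2 = c(1, -1, -1, 1) / 4
)
round(ginv(Hmat)) # ginv() is from MASS; round() cleans up floating-point noise
##      [,1] [,2] [,3] [,4]
## [1,]    1    1    1    1
## [2,]    1    1   -1   -1
## [3,]    1   -1    1   -1
## [4,]    1   -1   -1    1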

6.4.3 The role of the intercept in non-centered contrasts

A related question concerns whether the intercept needs to be considered when computing the generalized inverse for a contrast. It turns out that considering the intercept is necessary for contrasts that are not centered. This is the case for treatment contrasts; e.g., for the treatment contrast for two factor levels, c1vs0 = c(0,1): \(\sum_i c_i = 0 + 1 = 1\). One can actually show that the condition for a contrast to be centered (i.e., \(\sum_i c_i = 0\)) is the same as the condition for it to be “orthogonal to the intercept.” Remember that for the intercept, all contrast coefficients are equal to one: \(c_{1,i} = 1\) (here, \(c_1\) denotes the vector of contrast coefficients associated with the intercept). We enter these values into the orthogonality condition, where \(c_2\) denotes the vector of contrast coefficients of some contrast for which we want to test whether it is “orthogonal to the intercept”: \(\sum_i c_{1,i} \cdot c_{2,i} = \sum_i 1 \cdot c_{2,i} = \sum_i c_{2,i} = 0\). The resulting condition, \(\sum_i c_{2,i} = 0\), is exactly the condition for the contrast to be centered. Because of this equivalence, treatment contrasts can be viewed as not orthogonal to the intercept. This means that the intercept needs to be considered when computing the generalized inverse for treatment contrasts. As we have discussed above, when the intercept is included in the hypothesis matrix, its weights should sum to one, as this yields a column of ones for the intercept term in the contrast matrix.
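
A quick numeric check of this condition for the two-level treatment contrast:

c1vs0 <- c(0, 1) # treatment contrast for two factor levels
int <- c(1, 1)   # intercept: all coefficients equal to one
sum(int * c1vs0) # 1, not 0: not centered, hence not orthogonal to the intercept
## [1] 1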

We can see that considering the intercept makes a difference for the treatment contrast. First, we define the comparisons involved in a treatment contrast, where two experimental conditions b and c are each compared to a baseline condition a (b~a and c~a). In addition, we explicitly code the intercept term, which involves a comparison of the baseline to 0 (a~0). We take a look at the resulting contrast matrix:

library(hypr) # for specifying hypotheses and deriving contrast matrices
hypr(int = a ~ 0, b1 = b ~ a, b2 = c ~ a)
## hypr object containing 3 null hypotheses:
## H0.int: 0 = a      (Intercept)
##  H0.b1: 0 = b - a
##  H0.b2: 0 = c - a
## 
## Call:
## hypr(int = ~a, b1 = ~b - a, b2 = ~c - a, levels = c("a", "b", 
## "c"))
## 
## Hypothesis matrix (transposed):
##   int b1 b2
## a  1  -1 -1
## b  0   1  0
## c  0   0  1
## 
## Contrast matrix:
##   int b1 b2
## a 1   0  0 
## b 1   1  0 
## c 1   0  1
contr.treatment(c("a", "b", "c"))
##   b c
## a 0 0
## b 1 0
## c 0 1

This is the contrast matrix that we know from the treatment contrast: the intercept is coded as a column of \(1\)s, and each comparison is coded as a \(1\) in the condition that is compared to the baseline and a \(0\) in the other conditions. The point here is that this procedure recovers the contrast matrix that is expected and known for the treatment contrast.

However, we can also ignore the intercept in the specification of the hypotheses:

hypr(b1 = b ~ a, b2 = c ~ a)
## hypr object containing 2 null hypotheses:
## H0.b1: 0 = b - a
## H0.b2: 0 = c - a
## 
## Call:
## hypr(b1 = ~b - a, b2 = ~c - a, levels = c("a", "b", "c"))
## 
## Hypothesis matrix (transposed):
##   b1 b2
## a -1 -1
## b  1  0
## c  0  1
## 
## Contrast matrix:
##   b1   b2  
## a -1/3 -1/3
## b  2/3 -1/3
## c -1/3  2/3

Interestingly, the resulting contrast matrix now looks very different from the contrast matrix that we know from the treatment contrast. Nevertheless, this contrast also estimates a reasonable set of quantities: it again estimates how strongly the mean of condition b differs from the baseline a, and how strongly the mean of condition c differs from the baseline. The intercept, however, now estimates the average of the dependent variable across all three conditions, that is, the grand mean (GM). This can be seen by explicitly adding a comparison of the average of all three conditions to \(0\):

hypr(int = (a + b + c) / 3 ~ 0, b1 = b ~ a, b2 = c ~ a)
## hypr object containing 3 null hypotheses:
## H0.int: 0 = (a + b + c)/3  (Intercept)
##  H0.b1: 0 = b - a
##  H0.b2: 0 = c - a
## 
## Call:
## hypr(int = ~1/3 * a + 1/3 * b + 1/3 * c, b1 = ~b - a, b2 = ~c - 
##     a, levels = c("a", "b", "c"))
## 
## Hypothesis matrix (transposed):
##   int b1  b2 
## a 1/3  -1  -1
## b 1/3   1   0
## c 1/3   0   1
## 
## Contrast matrix:
##   int  b1   b2  
## a    1 -1/3 -1/3
## b    1  2/3 -1/3
## c    1 -1/3  2/3

The b1 and b2 columns of the resulting contrast matrix are now the same as when the intercept was ignored, which confirms that both specifications test the same hypotheses.
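
As a sanity check, one can verify what these contrasts estimate by fitting a linear model to simulated data (a minimal sketch with hypothetical true condition means of \(10\), \(12\), and \(15\) for conditions a, b, and c):

set.seed(42)
dat <- data.frame(F = factor(rep(c("a", "b", "c"), each = 10)))
dat$y <- rnorm(nrow(dat), mean = rep(c(10, 12, 15), each = 10))
# contrast matrix from above, without the intercept column:
contrasts(dat$F) <- cbind(b1 = c(-1, 2, -1) / 3, b2 = c(-1, -1, 2) / 3)
coef(lm(y ~ F, dat))
# expected: intercept near the grand mean (about 12.33),
# b1 near 2 (b - a), and b2 near 5 (c - a)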

References

Dobson, Annette J., and Adrian Barnett. 2011. An Introduction to Generalized Linear Models. CRC Press.
Schad, Daniel J., Shravan Vasishth, Sven Hohenstein, and Reinhold Kliegl. 2020b. “How to Capitalize on a Priori Contrasts in Linear (Mixed) Models: A Tutorial.” Journal of Memory and Language 110: 104038.