7.1 Contrast coding in a factorial 2 x 2 design

In section 6.3 of chapter 6, we used a data set with one four-level factor. Here, we assume that the exact same four condition means come from an \(A(2) \times B(2)\) between-subjects factorial design rather than from a one-way \(F(4)\) design. We load the artificial, simulated data and show summary statistics in Table 7.1 and in Figure 7.1. The means and standard deviations are exactly the same as in Figure 6.2 and in Table 6.3.

FIGURE 7.1: Means and error bars (showing standard errors) for a simulated data-set with a two-by-two between-subjects factorial design.

TABLE 7.1: Summary statistics per condition for the simulated data.
Factor A  Factor B  N  Mean  Std. dev.  Std. error
A1        B1        5  10    10         4.5
A1        B2        5  20    10         4.5
A2        B1        5  10    10         4.5
A2        B2        5  40    10         4.5

7.1.1 The difference between an ANOVA and a multiple regression

Let’s compare the traditional ANOVA with multiple regression for analyzing these data.

The results from the two analyses, shown in the R output and in Table ??, are very different. How can we see this? Note that F-values can be computed from t-values using the relation \(F(1,df) = t(df)^2\) (Snedecor and Cochran 1967), where \(df\) indicates the degrees of freedom. Applying this to the multiple regression model above, the F-value for factor \(A\) (i.e., \(AA2\)) is \(0.00^2 = 0\). This is obviously not the same as in the ANOVA, where the F-value for factor \(A\) is \(5\). Likewise, in the multiple regression, factor \(B\) (i.e., \(BB2\)) has an F-value of \(1.58^2 = 2.5\), which also does not correspond to the ANOVA F-value for factor \(B\) of \(20\). Interestingly, however, the F-value for the interaction is identical in both models, as \(2.24^2 = 5\).
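
This relation between t- and F-statistics can be checked numerically. A minimal sketch (the values \(t = 2.236\) and \(df = 16\) are just example values; \(16\) matches the residual degrees of freedom here, \(20 - 4\)):

```r
# Check F(1, df) = t(df)^2: the two-sided p-value of a t-statistic
# equals the upper-tail p-value of its square under an F(1, df)
# distribution.
t_val <- 2.236
df <- 16
p_t <- 2 * pt(-abs(t_val), df)                 # two-sided t-test p-value
p_F <- pf(t_val^2, 1, df, lower.tail = FALSE)  # corresponding F-test p-value
all.equal(p_t, p_F)  # TRUE
```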

The reason the ANOVA and the multiple regression give different results is that one needs sum contrasts in the linear model to obtain the conventional ANOVA tests. (This equivalence holds for factors with two levels, but does not generalize to factors with more levels.)
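
As a sketch of how this looks in R: because the means and standard deviations in Table 7.1 are exact, we can reconstruct data with exactly those cell moments (the helper `cell()` and the data frame name `dat` are ours, not the book's code) and fit the model with sum contrasts:

```r
# Build four cells with exactly the means and SDs from Table 7.1.
cell <- function(m, s = 10, n = 5) {
  y <- rnorm(n)
  as.numeric(scale(y)) * s + m  # force sample mean m and sample SD s
}
set.seed(1)
dat <- data.frame(
  A  = factor(rep(c("A1", "A1", "A2", "A2"), each = 5)),
  B  = factor(rep(c("B1", "B2", "B1", "B2"), each = 5)),
  DV = c(cell(10), cell(20), cell(10), cell(40))
)
# Sum contrasts for both factors recover the conventional ANOVA tests.
contrasts(dat$A) <- contr.sum(2)
contrasts(dat$B) <- contr.sum(2)
m_sum <- lm(DV ~ A * B, data = dat)
round(coef(m_sum), 2)  # 20, -5, -10, 5, as in Table 7.2
```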

TABLE 7.2: Regression analysis with sum contrasts.
Predictor Estimate Std. Error t p
(Intercept) 20 2.236 8.944 0.00
A1 -5 2.236 -2.236 0.04
B1 -10 2.236 -4.472 0.00
A1:B1 5 2.236 2.236 0.04

When using sum contrasts, the results from the multiple regression model (see Table 7.2) are identical to the results from the ANOVA (see R output). Factor \(A\) now has \(t^2=(-2.24)^2 = 5\), factor \(B\) has \(t^2=(-4.47)^2 = 20\), and the interaction has \(t^2=2.24^2 = 5\). All F-values are now the same as in the ANOVA model.

Next, we reproduce the \(A(2) \times B(2)\) ANOVA with contrasts specified for the corresponding one-way \(F(4)\) ANOVA, that is, by treating the \(2 \times 2 = 4\) condition means as four levels of a single factor F. In other words, we go back to the data frame simulated for the analysis of repeated contrasts (see chapter 6, section 6.3). We first define weights for the condition means according to our hypotheses, invert this matrix, and use the result as the contrast matrix for factor F in a linear model. We define weights of \(1/4\) and \(-1/4\), for two reasons: (a) we want to compare the mean of two conditions to the mean of two other conditions (e.g., factor A compares \(\frac{F1 + F2}{2}\) to \(\frac{F3 + F4}{2}\)), and (b) we want to use sum contrasts, where the regression coefficients assess half the difference between means. Together, (a) and (b) yield weights of \(1/2 \cdot 1/2 = 1/4\). The resulting contrast matrix contains contrast coefficients of \(+1\) or \(-1\), showing that we have successfully implemented sum contrasts. The results are identical to those of the previous models.

##    A    B    AxB 
## F1  1/4  1/4  1/4
## F2  1/4 -1/4 -1/4
## F3 -1/4  1/4 -1/4
## F4 -1/4 -1/4  1/4
##    A  B  AxB
## F1  1  1  1 
## F2  1 -1 -1 
## F3 -1  1 -1 
## F4 -1 -1  1
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)       20          2       9        0
## FA                -5          2      -2        0
## FB               -10          2      -4        0
## FAxB               5          2       2        0
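
The construction just described can be sketched in a few lines of R (object names are ours):

```r
# Hypothesis matrix: rows = effects, columns = the four condition means.
Hc <- rbind(Int = rep(1/4, 4),
            A   = c(1,  1, -1, -1) / 4,
            B   = c(1, -1,  1, -1) / 4,
            AxB = c(1, -1, -1,  1) / 4)
colnames(Hc) <- paste0("F", 1:4)
# Inverting yields the contrast matrix; the matrix is square and of
# full rank, so solve() coincides with the generalized inverse.
Xc <- solve(Hc)
Xc[, c("A", "B", "AxB")]  # coefficients of +1 / -1, i.e., sum contrasts
```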

This shows that one does not have to specify the contrasts separately for each factor (e.g., here for the two factors in the \(2 \times 2\) design). Alternatively, one can pool all experimental conditions (design cells) into one large factor (here, factor F with \(4\) levels) and specify the contrasts for the main effects and the interaction simultaneously in one large contrast matrix.

In this approach, it can again be very useful to apply the hypr package to construct contrasts for a \(2 \times 2\) design. The first hypothesis estimates the main effect A, i.e., it compares the average of F1 and F2 to the average of F3 and F4. The second hypothesis estimates the main effect B, i.e., it compares the average of F1 and F3 to the average of F2 and F4. Note that we code direct differences between the averages, i.e., we implement scaled sum contrasts instead of sum contrasts. This becomes clear below, as the contrast matrix contains coefficients of \(+1/2\) and \(-1/2\) instead of \(+1\) and \(-1\). The interaction term estimates half the difference between the two differences F1 - F2 and F3 - F4.
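
A hypr call producing this specification could look as follows (a sketch; the effect names `A`, `B`, and `AxB` and the object name `HcF` are assumptions, not necessarily the book's code):

```r
library(hypr)
# Each equation lhs ~ rhs encodes the null hypothesis lhs - rhs = 0.
HcF <- hypr(A   = (F1 + F2) / 2 ~ (F3 + F4) / 2,  # main effect A
            B   = (F1 + F3) / 2 ~ (F2 + F4) / 2,  # main effect B
            AxB = (F1 - F2) / 2 ~ (F3 - F4) / 2)  # interaction
HcF  # prints the null hypotheses, hypothesis matrix, and contrast matrix
```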

## hypr object containing 3 null hypotheses:
##   H0.A: 0 = 1/2*F1 + 1/2*F2 - 1/2*F3 - 1/2*F4
##   H0.B: 0 = 1/2*F1 + 1/2*F3 - 1/2*F2 - 1/2*F4
## H0.AxB: 0 = 1/2*F1 - 1/2*F2 - 1/2*F3 + 1/2*F4
## 
## Hypothesis matrix (transposed):
##    A    B    AxB 
## F1  1/2  1/2  1/2
## F2  1/2 -1/2 -1/2
## F3 -1/2  1/2 -1/2
## F4 -1/2 -1/2  1/2
## 
## Contrast matrix:
##    A    B    AxB 
## F1  1/2  1/2  1/2
## F2  1/2 -1/2 -1/2
## F3 -1/2  1/2 -1/2
## F4 -1/2 -1/2  1/2
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)       20          2       9        0
## FA               -10          4      -2        0
## FB               -20          4      -4        0
## FAxB              10          4       2        0

The results show that the estimates are now twice as large as with the sum contrasts; this is the result of the scaling that we applied: the main effects now directly estimate the difference between the averages of conditions. Both codings, however, yield exactly the same hypothesis tests. Thus, the hypr package can also be used to code hypotheses in a \(2 \times 2\) design.

7.1.2 Nested effects

One can also specify hypotheses that do not correspond directly to the main effects and interaction of the traditional ANOVA. For example, in a \(2 \times 2\) experimental design where factor \(A\) codes word frequency (low/high) and factor \(B\) codes part of speech (noun/verb), one can estimate the effect of word frequency within nouns and the effect of word frequency within verbs. Formally, \(A_{B1}\) and \(A_{B2}\) are nested within the levels of \(B\); put differently, simple effects of factor \(A\) are estimated for each level of factor \(B\). In this version, we still estimate the main effect of part of speech (\(B\); as in the traditional ANOVA). However, instead of also estimating the second main effect (word frequency, \(A\)) and the interaction, we estimate (1) whether the two levels of word frequency differ for the first level of \(B\) (i.e., nouns), and (2) whether they differ for the second level of \(B\) (i.e., verbs). In other words, we estimate whether there are differences for \(A\) within each level of \(B\). Researchers often have hypotheses about these simple effects rather than about the interaction.

##    B    B1xA B2xA
## F1  1/2   -1    0
## F2 -1/2    0   -1
## F3  1/2    1    0
## F4 -1/2    0    1
##    B    B1xA B2xA
## F1  1/2 -1/2    0
## F2 -1/2    0 -1/2
## F3  1/2  1/2    0
## F4 -1/2    0  1/2
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)       20          2       9        0
## FB               -20          4      -4        0
## FB1xA              0          6       0        1
## FB2xA             20          6       3        0

Regression coefficients estimate the GM (intercept), the main effect of part of speech (\(B\)), and the two simple effects of word frequency (\(A\)) nested within the levels of part of speech (\(B\)).

The columns of these custom nested contrasts are scaled versions of the corresponding columns of the hypothesis matrix. This is the case because the columns are orthogonal. This illustrates an advantage of orthogonal contrasts for the interpretation of regression coefficients: the hypotheses being tested can be read directly off the contrast matrix.

There is also a built-in R formula syntax for nested designs. The order of factors in the formula, from left to right, specifies a top-down order of nesting within levels: here, factor \(A\) (word frequency) is nested within the levels of factor \(B\) (part of speech). This yields exactly the same result as the custom nested contrasts above:
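
The following sketch reconstructs data with the exact cell moments from Table 7.1 and uses scaled (\(\pm 1/2\)) contrasts for both factors; the helper `cell()`, the object names, and the particular contrast codings are our assumptions chosen to reproduce the estimates in the output that follows:

```r
# Reconstruct data with exact cell means (10, 20, 10, 40) and SD 10.
cell <- function(m) { y <- rnorm(5); as.numeric(scale(y)) * 10 + m }
set.seed(1)
dat <- data.frame(
  A  = factor(rep(c("A1", "A1", "A2", "A2"), each = 5)),
  B  = factor(rep(c("B1", "B2", "B1", "B2"), each = 5)),
  DV = c(cell(10), cell(20), cell(10), cell(40))
)
# Scaled (+-1/2) contrasts, so coefficients estimate differences of means.
contrasts(dat$B) <- cbind(c(+0.5, -0.5))
contrasts(dat$A) <- cbind(c(-0.5, +0.5))
# B/A: factor A is nested within the levels of factor B.
m_nest <- lm(DV ~ B / A, data = dat)
round(coef(m_nest), 2)  # 20, -20, 0, 20
```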

##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)       20          2       9        0
## B1               -20          4      -4        0
## BB1:A1             0          6       0        1
## BB2:A1            20          6       3        0


Note that in cases such as these, where \(A_{B1}\) vs. \(A_{B2}\) are nested within levels of \(B\), it is necessary to include the effect of \(B\) (part of speech) in the model, even if one is only interested in the effect of \(A\) (word frequency) within the levels of \(B\). Leaving out factor \(B\) can bias the parameter estimates if the data are not fully balanced.

Again, we show how nested contrasts can be easily implemented using hypr:
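
A call producing this specification could look as follows (a sketch; effect and object names are our assumptions):

```r
library(hypr)
# Main effect of part of speech (B) plus the frequency effect (A)
# nested within each level of B; recall F1 = A1_B1, F2 = A1_B2,
# F3 = A2_B1, F4 = A2_B2.
HcNest <- hypr(B    = (F1 + F3) / 2 ~ (F2 + F4) / 2,  # nouns vs. verbs
               B1xA = F3 ~ F1,  # frequency effect within nouns (B1)
               B2xA = F4 ~ F2)  # frequency effect within verbs (B2)
HcNest
```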

## hypr object containing 3 null hypotheses:
##    H0.B: 0 = 1/2*F1 + 1/2*F3 - 1/2*F2 - 1/2*F4
## H0.B1xA: 0 = F3 - F1
## H0.B2xA: 0 = F4 - F2
## 
## Hypothesis matrix (transposed):
##    B    B1xA B2xA
## F1  1/2   -1    0
## F2 -1/2    0   -1
## F3  1/2    1    0
## F4 -1/2    0    1
## 
## Contrast matrix:
##    B    B1xA B2xA
## F1  1/2 -1/2    0
## F2 -1/2    0 -1/2
## F3  1/2  1/2    0
## F4 -1/2    0  1/2
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)       20          2       9        0
## FB               -20          4      -4        0
## FB1xA              0          6       0        1
## FB2xA             20          6       3        0

Of course, we can also ask the reverse question: Are there differences for part of speech (\(B\)) in the levels of word frequency (\(A\); in addition to estimating the main effect of word frequency, \(A\))? That is, do nouns differ from verbs for low-frequency words (\(B_{A1}\)) and do nouns differ from verbs for high-frequency words (\(B_{A2}\))?
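
The corresponding hypr specification could be sketched as follows (names are our assumptions):

```r
library(hypr)
# Main effect of word frequency (A) plus the part-of-speech effect (B)
# nested within each level of A.
HcNest2 <- hypr(A    = (F1 + F2) / 2 ~ (F3 + F4) / 2,  # low vs. high frequency
                A1xB = F2 ~ F1,  # part-of-speech effect within low frequency (A1)
                A2xB = F4 ~ F3)  # part-of-speech effect within high frequency (A2)
HcNest2
```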

## hypr object containing 3 null hypotheses:
##    H0.A: 0 = 1/2*F1 + 1/2*F2 - 1/2*F3 - 1/2*F4
## H0.A1xB: 0 = F2 - F1
## H0.A2xB: 0 = F4 - F3
## 
## Hypothesis matrix (transposed):
##    A    A1xB A2xB
## F1  1/2   -1    0
## F2  1/2    1    0
## F3 -1/2    0   -1
## F4 -1/2    0    1
## 
## Contrast matrix:
##    A    A1xB A2xB
## F1  1/2 -1/2    0
## F2  1/2  1/2    0
## F3 -1/2    0 -1/2
## F4 -1/2    0  1/2
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)       20          2       9        0
## FA               -10          4      -2        0
## FA1xB             10          6       2        0
## FA2xB             30          6       5        0


Regression coefficients estimate the GM (intercept), the main effect of word frequency (\(A\)), and the two part-of-speech effects (\(B\); i.e., simple main effects) nested within the levels of word frequency (\(A\)).

7.1.3 Interactions between contrasts

We have discussed above that in a \(2 \times 2\) experimental design, the results from sum contrasts are equivalent to typical ANOVA results. In addition, we had also run the analysis with treatment contrasts. It was clear that the results for treatment contrasts (see Table ??) did not correspond to the results from the ANOVA. However, if the results for treatment contrasts do not correspond to the typical ANOVA results, what do they then test? That is, is it still possible to meaningfully interpret the results from the treatment contrasts in a simple \(2 \times 2\) design?

This leads us to a very important principle in interpreting results from contrasts: When interactions between contrasts are included in a model, then the results of one contrast actually depend on the specification of the other contrast(s) in the analysis! This may be counter-intuitive at first. However, it is very important and essential to keep in mind when interpreting results from contrasts. How does this work in detail?

The general rule to remember is that the main effect of one contrast estimates its effect at the location \(0\) of the other contrast(s) in the analysis. What does that mean? Consider the example where we use two treatment contrasts in a \(2 \times 2\) design (see results in Table ??), and take a look at the main effect of factor A. How can we interpret what this estimates? This main effect actually tests the effect of factor A at the “location” where factor B is coded as \(0\). Factor B is coded as a treatment contrast, that is, it codes its baseline condition B1 as \(0\). Thus, the main effect of factor A tests the effect of A nested within the baseline condition of B. Let's look at the data in Figure 7.1 to see what this nested effect should be: the effect of factor A nested within B1 is \(0\). If we compare this to the results from the linear model, the main effect of factor A (see Table ??) is indeed estimated as exactly \(0\). As expected, when factor B is coded as a treatment contrast, the main effect of factor A estimates the effect of A nested within the baseline level of factor B.

Next, consider the main effect of factor B. According to the same logic, this main effect estimates the effect of factor B at the “location” where factor A is \(0\). Factor A is also coded as a treatment contrast, that is, it codes its baseline condition A1 as \(0\). The main effect of factor B estimates the effect of B nested within the baseline condition of A. Figure 7.1 shows that this effect should be \(10\); this indeed corresponds to the main effect of B as estimated in the regression model for treatment contrasts (see Table ??, the Estimate for BB2). As we had seen before, the interaction term, however, does not differ between the treatment contrast and ANOVA (\(t^2 = 2.24^2 = F = 5.00\)).

How do we know what the “location” is, where a contrast applies? For the treatment contrasts discussed here, it is possible to reason this through because all contrasts are coded as \(0\) or \(1\). However, how is it possible to derive the “location” in general? What we can do is to look at the hypotheses tested by the treatment contrasts (or the comparisons that are estimated) in the presence of an interaction between them by using the generalized matrix inverse. We go back to the default treatment contrasts. Then we extract the contrast matrix from the design matrix:
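
A sketch of this extraction (only the four design cells are needed; the `expand.grid` construction and the row labels are ours):

```r
# Four design cells, with R's default treatment contrasts.
dat <- expand.grid(B = factor(c("B1", "B2")), A = factor(c("A1", "A2")))
Xc <- model.matrix(~ A * B, data = dat)
rownames(Xc) <- paste(dat$A, dat$B, sep = "_")
Xc  # contrast (design) matrix for A, B, and their interaction
```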

##       (Intercept) A2 B2 A2:B2
## A1_B1           1  0  0     0
## A1_B2           1  0  1     0
## A2_B1           1  1  0     0
## A2_B2           1  1  1     1

This shows the treatment contrasts for factors A and B and their interaction. We can now assign this contrast matrix to a hypr object. hypr automatically converts the contrast matrix into a hypothesis matrix, so that we can read from the hypothesis matrix which comparisons are estimated by the different contrasts.

## hypr object containing 4 null hypotheses:
## H0.(Intercept): 0 = A1_B1
##          H0.A2: 0 = -A1_B1 + A2_B1
##          H0.B2: 0 = -A1_B1 + A1_B2
##       H0.A2:B2: 0 = A1_B1 - A1_B2 - A2_B1 + A2_B2
## 
## Hypothesis matrix (transposed):
##       (Intercept) A2 B2 A2:B2
## A1_B1  1          -1 -1  1   
## A1_B2  0           0  1 -1   
## A2_B1  0           1  0 -1   
## A2_B2  0           0  0  1   
## 
## Contrast matrix:
##       (Intercept) A2 B2 A2:B2
## A1_B1 1           0  0  0    
## A1_B2 1           0  1  0    
## A2_B1 1           1  0  0    
## A2_B2 1           1  1  1

Note that the same result is obtained by applying the generalized inverse to the contrast matrix (this is what hypr does internally): the generalized inverse of the contrast matrix is the corresponding hypothesis matrix (for details, see Schad et al. 2020b).
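
A sketch of this computation (object names are ours; for this square, full-rank contrast matrix, `solve()` coincides with the generalized inverse `MASS::ginv()`):

```r
# Rebuild the treatment-contrast design matrix for the four cells.
dat <- expand.grid(B = factor(c("B1", "B2")), A = factor(c("A1", "A2")))
Xc <- model.matrix(~ A * B, data = dat)
rownames(Xc) <- paste(dat$A, dat$B, sep = "_")
# Invert the contrast matrix to obtain the (transposed) hypothesis matrix.
Hmat <- t(solve(Xc))
Hmat
```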

##       (Intercept) A2 B2 A2:B2
## A1_B1  1          -1 -1  1   
## A1_B2  0           0  1 -1   
## A2_B1  0           1  0 -1   
## A2_B2  0           0  0  1

As discussed above, the main effect of factor A estimates its effect nested within the baseline level of factor B. Likewise, the main effect of factor B estimates its effect nested within the baseline level of factor A.

How does this work for sum contrasts? They do not have a baseline condition that is coded as \(0\). In sum contrasts, however, the average of the contrast coefficients is \(0\). Therefore, main effects estimate the average effect across factor levels. This is what is typically also tested in standard ANOVA. Let’s look at the example shown in Table 7.2: given that factor B has a sum contrast, the main effect of factor A is tested as the average across levels of factor B. Figure 7.1 shows that the effect of factor A in level B1 is \(10 - 10 = 0\), and in level B2 it is \(20 - 40 = -20\). The average effect across both levels is \((0 - 20)/2 = -10\). Due to the sum contrast coding, we have to divide this by 2, yielding an expected effect of \(-10 / 2 = -5\). This is exactly what the main effect of factor A measures (see Table 7.2, Estimate for A1).

Similarly, factor B tests its effect at the location \(0\) of factor A. Again, \(0\) is exactly the mean of the contrast coefficients from factor A, which is coded as a sum contrast. Therefore, factor B tests the effect of B averaged across factor levels of A. For factor level A1, factor B has an effect of \(10 - 20 = -10\). For factor level A2, factor B has an effect of \(10 - 40 = -30\). The average effect is \((-10 - 30)/2 = -20\), which again needs to be divided by \(2\) due to the sum contrast. This yields exactly the estimate of \(-10\) that is also reported in Table 7.2 (Estimate for B1).

Again, we look at the hypothesis matrix for the main effects and the interaction:
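
This hypothesis matrix can be reproduced by inverting the sum-contrast design matrix (a sketch; object names are ours):

```r
# Four design cells, now with sum contrasts for both factors.
dat <- expand.grid(B = factor(c("B1", "B2")), A = factor(c("A1", "A2")))
contrasts(dat$A) <- contr.sum(2)
contrasts(dat$B) <- contr.sum(2)
Xc_sum <- model.matrix(~ A * B, data = dat)
rownames(Xc_sum) <- paste(dat$A, dat$B, sep = "_")
Hmat_sum <- t(solve(Xc_sum))
Hmat_sum  # all weights are +-1/4
```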

## hypr object containing 4 null hypotheses:
## H0.(Intercept): 0 = 1/4*A1_B1 + 1/4*A1_B2 + 1/4*A2_B1 + 1/4*A2_B2
##          H0.A1: 0 = 1/4*A1_B1 + 1/4*A1_B2 - 1/4*A2_B1 - 1/4*A2_B2
##          H0.B1: 0 = 1/4*A1_B1 - 1/4*A1_B2 + 1/4*A2_B1 - 1/4*A2_B2
##       H0.A1:B1: 0 = 1/4*A1_B1 - 1/4*A1_B2 - 1/4*A2_B1 + 1/4*A2_B2
## 
## Hypothesis matrix (transposed):
##       (Intercept) A1   B1   A1:B1
## A1_B1  1/4         1/4  1/4  1/4 
## A1_B2  1/4         1/4 -1/4 -1/4 
## A2_B1  1/4        -1/4  1/4 -1/4 
## A2_B2  1/4        -1/4 -1/4  1/4 
## 
## Contrast matrix:
##       (Intercept) A1 B1 A1:B1
## A1_B1  1           1  1  1   
## A1_B2  1           1 -1 -1   
## A2_B1  1          -1  1 -1   
## A2_B2  1          -1 -1  1

This shows that the main effects no longer compute nested comparisons; instead, each estimates its effect averaged across the conditions of the other factor. The averaging involves weights of \(1/2\). Moreover, the regression coefficients in the sum contrast measure half the distance between conditions, leading to weights of \(1/2 \cdot 1/2 = 1/4\).

The general rule to remember from these examples is that when interactions between contrasts are estimated, what a main effect of a factor estimates depends on the contrast coding of the other factors in the design! The main effect of a factor estimates the effect nested within the location zero of the other contrast(s) in the analysis. If another contrast is centered, so that zero is the average of that contrast's coefficients, then the contrast of interest estimates the average effect across the levels of the other factor. Importantly, this property holds only when the interaction between the two contrasts is included in the model. If the interaction is omitted and only main effects are estimated, then there is no such “action at a distance”.

This may be a surprising result for interactions of contrasts, but it is essential for interpreting contrast coefficients involved in interactions. It is particularly relevant for the default treatment contrasts, where the main effects estimate nested effects rather than average effects.

References

Schad, Daniel J., Shravan Vasishth, Sven Hohenstein, and Reinhold Kliegl. 2020a. “How to Capitalize on a Priori Contrasts in Linear (Mixed) Models: A Tutorial.” Journal of Memory and Language 110.

Schad, Daniel J., Shravan Vasishth, Sven Hohenstein, and Reinhold Kliegl. 2020b. “How to Capitalize on a Priori Contrasts in Linear (Mixed) Models: A Tutorial.” Journal of Memory and Language 110: 104038.

Snedecor, George W., and William G. Cochran. 1967. Statistical Methods. Ames, Iowa: Iowa State University Press.