7.2 One factor and one covariate

7.2.1 Estimating a group-difference and controlling for a covariate

In this section we treat the case where there are again two predictor variables for one dependent variable, but where one predictor variable is a discrete factor, and the other is a continuous covariate. Let’s assume we have measured some response time (RT), e.g. in a lexical decision task. We want to predict the response time based on each subject’s IQ, and we expect that higher IQ leads to shorter response times. Moreover, we have two groups of each 30 subjects. These are coded as factor F, with factor levels F1 and F2. We assume that these two groups have obtained different training programs to optimize their response times on the task. Group F1 obtained a control training, whereas group F2 obtained a training to improve lexical decisions. We want to test whether the training for better lexical decisions in group F2 actually leads to shorter response times compared to the control group F1. This is our main question of interest here, i.e., whether the training program in F2 leads to faster response times compared to the control group F1. We load the data, which is an artificially simulated data set.

data("df_contrasts5")
str(df_contrasts5)

## tibble[,4] [60 × 4] (S3: tbl_df/tbl/data.frame)
##  $ F : Factor w/ 2 levels "F1","F2": 1 1 1 1 1 1 1 1 1 1 ...
##  $ RT: num [1:60] 247 226 173 229 226 ...
##  $ IQ: num [1:60] 80.6 93.5 72 72.4 73.4 ...
##  $ id: Factor w/ 60 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...

Our main effect of interest is the factor F. We want to test its effect on response times and code it using scaled sum contrasts, such that negative parameter estimates would yield support for our hypothesis that response times are faster in the training group F2:

(contrasts(df_contrasts5$F) <- c(-0.5, +0.5))

## [1] -0.5  0.5

We run a linear model to estimate the effect of factor F, i.e., how strongly the response times in the two groups differ from each other.

fit_RT_F <- lm(RT ~ 1 + F,
                 data = df_contrasts5)

round(summary(fit_RT_F)$coefficients)

##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)      212          5      41        0
## F1               -25         10      -2        0

Means and error bars (showing standard errors) for a simulated data-set of response times for two different groups of subjects, who have obtained a training in lexical decisions (F2) versus have obtained a control training (F1).

FIGURE 7.2: Means and error bars (showing standard errors) for a simulated data-set of response times for two different groups of subjects, who have obtained a training in lexical decisions (F2) versus have obtained a control training (F1).

We find (see model estimates and data shown in Fig. 7.2) that response times in group F2 are roughly 25 ms faster than in group F1 (Estimate of $-24$). The 95% confidence intervals do not overlap with zero. This suggests that as expected, the training program that group F2 obtained seems to be successful in speeding up response times. We could now run a Bayes factor analysis on this data set to directly test this hypothesis, and maybe this would provide evidence for a difference in response times between groups.

However, let’s assume we have allocated subjects to the two groups randomly. Let’s say that we also measured the IQ of each person using an IQ test. We did so, because we expected that IQ could have a strong influence on response times, and we wanted to control for this influence. We now can check whether the two groups had the same average IQ.

df_contrasts5 %>% group_by(F) %>% summarize(M.IQ = mean(IQ))

## # A tibble: 2 x 2
##   F      M.IQ
##   <fct> <dbl>
## 1 F1      85.
## 2 F2     115

Interestingly, group F2 did not only obtain an additional training and had faster response times, but group F2 also had a higher IQ (mean of 115) on average than group F1 (mean IQ = 85). Thus, the random allocation of subjects to the two groups seems to have created - by chance - a difference in IQs. Now we can ask the question: why may response times in group F2 be faster than in group F1? Is this because of the training program in F2? Or is this simply because the average IQ in group F2 was higher than in group F1? To investigate this question, we add both predictor variables simultaneously in a linear model. Before we enter the continuous IQ variable, we center it, by subtracting its mean. Centering covariates is generally good practice. Moreover, it is often important to z-transform the covariate, i.e., to not only subtract the mean, but also to divide by its standard deviation (this can be done as follows: df_contrasts5$IQ.s <- scale(df_contrasts5$IQ)). The reason why this is often important is that the estimation doesn’t work well if predictors have different scales. For the simple models we use here, the estimation works fine without z-transformation. However, for more realistic more complex models, z-transformation of covariates is often very important.

df_contrasts5$IQ.c <- df_contrasts5$IQ - mean(df_contrasts5$IQ)
fit_RT_F_IQ <- lm(RT ~ 1 + F + IQ.c,
                 data = df_contrasts5)

round(summary(fit_RT_F_IQ)$coefficients,2)

##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   212.50       4.77   44.51     0.00
## F1              7.00      13.62    0.51     0.61
## IQ.c           -1.07       0.32   -3.30     0.00

The results from the brms model now show that the difference in response times between groups (i.e., factor F) is not estimated to be $-25$ ms any more, but instead, the estimate is about $+7$ ms, and the 95% confidence intervals strongly overlap with zero ($-20$ to $33$). Thus, it looks as if the groups would not differ from each other any more. At the same time, we see that the predictor variable IQ shows a negative effect (Estimate = $-1$ with 95% confidence interval: $-1.7$ to $-0.4$), suggesting that - as expected - response times seem to be faster in subjects with higher IQ.

Response times as a function of individual IQ for two groups with a lexical decision training (F2) versus a control training (F1). Points indicate individual subjects, and lines with error bands indicate linear regression lines.

FIGURE 7.3: Response times as a function of individual IQ for two groups with a lexical decision training (F2) versus a control training (F1). Points indicate individual subjects, and lines with error bands indicate linear regression lines.

This result can also be seen in Figure 7.3, which shows that response times decrease with increasing IQ, as suggested by the linear model. However, the heights of the two regression lines do not differ from each other, consistent with the observation in the brms model that the effect of factor F did not seem to differ from zero. That is, factor F in the linear model estimates the difference in height of the regression line between both groups. That the height does not differ and the effect of F is estimated close to zero suggests that in fact group F2 showed faster response times not because of their additional training program. Instead, they had faster response times simply because their IQ was by chance higher on average compared to the control group F1. This analysis is called “analysis of covariance” (ANCOVA), where it’s possible to test a group-difference after “controlling for” the influence of a covariate.

Importantly, we can see in Figure 7.3 that the two regression lines for the two groups are exactly parallel to each other. That is, the influence of IQ on response times seems to be exactly the same in both groups. This is actually a prerequiste for the ANCOVA analysis that needs to be checked in the data. That is, if we want to test the difference between groups after controlling for a covariate (here IQ), we have to test whether the influence of the covariate is the same in both groups. We can investigate this by including an interaction term between the factor and the covariate in the brms model:

fit_RT_FxIQ <- lm(RT ~ 1 + F * IQ.c,
                 data = df_contrasts5)

round(summary(fit_RT_FxIQ)$coefficients,2)

##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   212.50       6.87   30.93     0.00
## F1              7.00      13.74    0.51     0.61
## IQ.c           -1.07       0.33   -3.27     0.00
## F1:IQ.c         0.00       0.65    0.00     1.00

The estimate for the interaction (the term “F1:IQ.c”) is very small here (close to 0) and the 95% confidence intervals clearly overlap with zero, showing that the two regression lines are estimated to be very similar, or parallel, to each other. If this is the case, then it is possible to correct for IQ when testing the group difference.

7.2.2 Estimating differences in slopes

We now take a look at a different data set.

data("df_contrasts6")
levels(df_contrasts6$F) <- c("simple","complex")
str(df_contrasts6)

## tibble[,4] [60 × 4] (S3: tbl_df/tbl/data.frame)
##  $ F : Factor w/ 2 levels "simple","complex": 1 1 1 1 1 1 1 1 1 1 ...
##  $ RT: num [1:60] 223 200 152 206 203 ...
##  $ IQ: num [1:60] 99.3 109.5 76.3 87.1 87.6 ...
##  $ id: Factor w/ 60 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...

This again contains data from response times (RT) in two groups. Let’s assume the two groups have performed two different response time tasks, where one simple RT task doesn’t rely on much cognitive processing (group “simple”), whereas the other task is more complex and depends on complex cognitive operations (group “complex”). We therefore expect that RTs in the simple task should be independent of IQ, whereas in the complex task, individuals with a high IQ should be faster in responding compared to individuals with low IQ. Thus, our primary hypothesis of interest states that the influence of IQ on RT differs between conditions. This means that we are interested in the difference between slopes. A slope in a linear regression assesses how strongly the dependent variable (here RT) changes with an increase of one unit on the covariate (here IQ), it thus assesses how “steep” the regression line is. Our research hypothesis is that the regression lines differ between groups.

Response times as a function of individual IQ for two groups performing a simple versus a complex task. Points indicate individual subjects, and lines with error bands indicate linear regression lines.

FIGURE 7.4: Response times as a function of individual IQ for two groups performing a simple versus a complex task. Points indicate individual subjects, and lines with error bands indicate linear regression lines.

The results, displayed in Figure 7.4, suggest that the data are consistent with our research hypothesis. For the subjects performing the complex task, response times seem to decrease with increasing IQ, whereas for subjects performing the simple task, response times seem to be independent of IQ. As stated before, our primary hypothesis relates to the difference in slopes. Statistically speaking, this is assessed in the interaction between the factor and the covariate. Thus, we run a linear model where the interaction is included. Importantly, we first use scaled sum contrasts for the group effect, and again center the covariate IQ.

contrasts(df_contrasts6$F) <- c(-0.5, +0.5)
df_contrasts6$IQ.c <- df_contrasts6$IQ - mean(df_contrasts6$IQ)
fit_RT_FxIQ2 <- lm(RT ~ 1 + F * IQ.c,
                 data = df_contrasts6)

round(summary(fit_RT_FxIQ2)$coefficients,2)

##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    210.0       4.76   44.13     0.00
## F1              20.0       9.52    2.10     0.04
## IQ.c            -0.8       0.32   -2.48     0.02
## F1:IQ.c         -1.6       0.65   -2.48     0.02

We can see that the main effect of IQ (term “IQ.c”) is negative ($-0.8$) with 95% confidence intervals $-1.5$ to $-0.2$, suggesting that overall response times decrease with increasing IQ. However, this is qualified by the interaction term, which is estimated to be negative ($-1.6$), with 95% confidence intervals $-2.9$ to $-0.3$. This suggests that the slope in the complex group (which was coded as $+0.5$ in the scaled sum contrast) is more negative than the slope in the simple group (which was coded as $-0.5$ in the scaled sum contrast). Thus, the interaction assesses the difference between slopes.

We can also run a model where the simple slopes are estimated, i.e., the slope of IQ in the simple group and the slope of IQ in the complex group. This can be implemented by using the nested coding that we learned about in the previous section:

fit_RT_FnIQ2 <- lm(RT ~ 1 + F / IQ.c,
                 data = df_contrasts6)

round(summary(fit_RT_FnIQ2)$coefficients,2)

##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)      210.0       4.76   44.13     0.00
## F1                20.0       9.52    2.10     0.04
## Fsimple:IQ.c       0.0       0.46    0.00     1.00
## Fcomplex:IQ.c     -1.6       0.46   -3.51     0.00

Now we see that the slope of IQ in the simple group (“Fsimple:IQ.c”) is estimated to be $0$, with confidence intervals clearly including zero. By contrast, the slope in the complex group (“Fcomplex:IQ.c”) is estimated as $-1.6$ (95% confidence interval $-2.5$ to $-0.7$). This is consistent with our hypothesis that high IQ speeds up response times for the complex but not for the simple task. We can also see from the nested analysis that the difference in slopes between conditions is $-1.6 - 0.0 = -1.6$. This is exactly the value for the interaction term that we estimated in the previous model, demonstrating that interaction terms assess the difference between slopes; i.e., they estimate the extent to which the regression lines in the two conditions are parallel, with an estimate of 0 indicating perfectly parallel lines.

A note: It is very important to always center covariates before including them into a model. If covariates are not centered, then the main effects (here the main effect for the factor) cannot be interpreted as main effects any more.

Interestingly, one can also do analyses with interactions between a covariate and a factor, but by using different contrast codings. For example, if we use treatment contrasts for the factor, then the main effect of IQ.c assess not the average slope of IQ.c across conditions, but instead the nested slope of IQ.c within the baseline group of the treatment contrast. The interaction still assesses the difference in slopes between groups. In a situation where there are more than two groups, when one estimates the interaction of contrasts with a covariate, then the contrasts define which slopes are compared with each other in the interaction terms. For example, when using sum contrasts in an example where the influence of IQ is measured on response times for nouns, verbs, and adjectives, then there are two interaction terms: these assess (1) whether the slope of IQ for nouns is different from the average slope across conditions, and (2) whether the slope of IQ for verbs is different from the average slope across conditions. If one uses repeated contrasts in a situation where the influence of IQ on response times is estimated for word frequency conditions “low”, “medium-low”, “medium-high”, and “high”, then there are three interaction terms (one for each contrast). The first interaction term estimates the difference in slopes between “low” and “medium-low” word frequencies, the second interaction term estimates the difference in slopes between “medium-low” and “medium-high” word frequencies, and the third interaction term estimates the difference in slopes between “medium-high” and “high” word frequency conditions. Thus, the logic of how contrasts specify certain comparisons between conditions extends directly to the situation where differences in slopes are estimated.