3.5 Linear mixed models
We return to our subject and object relative clause data from English (Experiment 1 of Grodner and Gibson 2005). First load the data as usual, define relative clause type as a sum-coded predictor, and create a new column called so
that represents the contrast coding (\(\pm 1\) sum contrasts). From this point on, we will fit log-transformed reading times. To this end, create a column that holds log-transformed reading time.
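For readers who want a self-contained starting point, here is a minimal sketch of the loading step; the file path below is only a placeholder, and the data should be read from whichever source was used in the earlier chapters.
## sketch of the loading step; the file path is a placeholder:
gg05e1 <- read.table("data/grodnergibsonE1crit.txt", header = TRUE)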
gg05e1$so <- ifelse(gg05e1$condition == "objgap", 1, -1)
gg05e1$logrt <- log(gg05e1$rawRT)
Recall that these data have multiple measurements from each subject for each condition:
t(xtabs(~ subject + condition, gg05e1))
## subject
## condition 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## objgap 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
## subjgap 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
## subject
## condition 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
## objgap 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
## subjgap 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
## subject
## condition 34 35 36 37 38 39 40 41 42
## objgap 8 8 8 8 8 8 8 8 8
## subjgap 8 8 8 8 8 8 8 8 8
We can visualize the different reading times of each of the 42 subjects as in Figure 3.4.
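The plotting code for Figure 3.4 is not shown here; a minimal sketch using lattice (one panel per subject, with a regression line added in each panel) would look something like this:
library(lattice)
## one panel per subject; "r" adds a per-panel regression line:
xyplot(logrt ~ so | factor(subject), gg05e1,
  type = c("p", "r"),
  xlab = "relative clause type (sum-coded)",
  ylab = "log reading time"
)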
It’s clear from this figure that different subjects have different effects of the relative clause manipulation: some slopes are positive, some are flat, and some are negative. There is between-subject variability in the relative clause effect.
Given these differences between subjects, you could fit a separate linear model for each subject, collect together the slopes for each subject, and then check if the slopes are significantly different from zero. There is a function in the package lme4 that computes separate linear models for each subject: lmList.
library(lme4)
lmlist_m1 <- lmList(logrt ~ so | subject, gg05e1)
One can extract the intercept and slope estimates for each subject. For example, for subject 1:
lmlist_m1$`1`$coefficients
## (Intercept) so
## 5.76962 0.04352
Figure 3.5 shows the individual fitted linear models for each subject, as well as the fit of a simple linear model m1log for all the data taken together (the gray line); this figure shows how each subject deviates in intercept and slope from m1log’s intercept and slope.
To find out if there is an effect of relative clause type, we simply need to check whether the slopes of the individual subjects’ fitted lines taken together are significantly different from zero. A one-sample t-test will achieve this:
summary_ttest(t.test(coef(lmlist_m1)[2]))
## [1] "t(41)=2.81 p=0.008"
## [1] "est.: 0.06 [0.02,0.11] ms"
The above test is exactly the same as the paired t-test and the varying intercepts linear mixed model that we fit earlier using the by-subject aggregated data. The only difference is that we are now using log reading times.
bysubj <- aggregate(log(rawRT) ~ subject + condition,
  data = gg05e1, FUN = mean
)
colnames(bysubj)[3] <- "logrt"
Compare the paired t-test and linear mixed model results (the t-values):
summary_ttest(t.test(logrt ~ condition, bysubj,
paired = TRUE))
## [1] "t(41)=2.81 p=0.008"
## [1] "est.: 0.12 [0.03,0.21] ms"
## compare with linear mixed model:
summary(lmer(logrt ~ condition + (1 | subject),
  bysubj
))$coefficients[2, ]
## Estimate Std. Error t value
## -0.12403 0.04414 -2.81021
The t-test we carried out using the slopes from the lmList
function is called repeated measures regression. This repeated measures regression model is now largely of historical interest, and useful only for understanding the linear mixed model, which will be our standard approach.
We turn next to three main types of linear mixed model; other variants will be introduced in later chapters.
3.5.1 Model type 1: Varying intercepts
We have already encountered this model. In the model shown below, the statement
\[\begin{equation} (1 \mid \hbox{subject}) \end{equation}\]
adjusts the grand mean estimates of the intercept by a term (a number) for each subject. That was the term called \(u_{0i}\) earlier in the chapter.
m0_lmer <- lmer(logrt ~ so + (1 | subject), gg05e1)
Notice that we did not aggregate the data this time.
Here is the abbreviated output from model m0_lmer:
Random effects:
Groups Name Variance Std.Dev.
subject (Intercept) 0.09983 0.3160
Residual 0.14618 0.3823
Number of obs: 672, groups: subject, 42
Fixed effects:
Estimate Std. Error t value
(Intercept) 5.88306 0.05094 115.497
so 0.06202 0.01475 4.205
One thing to notice in the present example is that the coefficients (intercept and slope) of the fixed effects of the above model are identical to those in the linear model:
m0_lm <- lm(logrt ~ so, gg05e1)
summary(m0_lm)
##
## Call:
## lm(formula = logrt ~ so, data = gg05e1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.183 -0.308 -0.109 0.260 2.575
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.8831 0.0191 308.78 <2e-16 ***
## so 0.0620 0.0191 3.26 0.0012 **
## ---
## Signif. codes:
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.494 on 670 degrees of freedom
## Multiple R-squared: 0.0156, Adjusted R-squared: 0.0141
## F-statistic: 10.6 on 1 and 670 DF, p-value: 0.00119
What is different between the linear model and the linear mixed model is the standard error. In the latter, the standard error is determined by more than one source of variance, as we explain below.
The intercept adjustments for each subject can be viewed by typing:
## first 10 subjects' intercept adjustments:
ranef(m0_lmer)$subject[, 1][1:10]
## [1] -0.103928 0.077195 -0.230621 0.234198 0.008828
## [6] -0.095363 -0.205571 -0.155371 0.075944 -0.364367
Figure 3.6 shows another way to summarize the adjustments to the grand mean intercept by subject. The error bars represent 95% confidence intervals.
library(lattice)
print(dotplot(ranef(m0_lmer, condVar = TRUE)))
## $subject
Figure 3.6 only shows the distribution of the adjustments \(u_{0i}\) around zero. One can also work out the estimates \(\hat\beta_0 + u_{0i}\), i.e., each subject’s predicted intercept from the model. However, some random variable theory is needed.
If we have two random variables \(X\) and \(Y\) that are independent, the variance of the sum of these two random variables is \(Var(X) + Var(Y)\). The sampling distribution of the estimate \(\hat\beta_0\) has some standard error \(SE_{\beta_0}\), and each subject’s intercept adjustment \(u_{0i}\) has an estimated standard deviation; both are available from the fitted linear mixed model. Treating the two as independent, the standard deviation of the sum \(\hat\beta_0 + u_{0i}\) is \(\sqrt{SE_{\beta_0}^2 + SD(u_{0i})^2}\).
The code below uses the above fact about random variable theory. The steps are:
- extract the random effect adjustments to the intercept for each subject; also extract the standard deviation of each estimate
- extract the intercept from the linear mixed model (\(\hat\beta_0\)), and its standard error
- work out the estimates \(\hat\beta_0 + u_{0i}\) and the standard deviation of the sum of these two random variables (keep in mind that for \(\hat\beta_0\) we are looking at the sampling distribution of the mean; the parameter \(\beta_0\) itself is not a random variable but a fixed value).
- use these estimates to work out the confidence intervals, and plot.
Here is the code that implements the above steps:
## extract the by-subject intercept adjustments u_0i and their SDs:
ranefu0 <- as.data.frame(ranef(m0_lmer, condVar = TRUE))
ranefu0SE <- transform(ranefu0,
  lwr = condval - 1.96 * condsd,
  upr = condval + 1.96 * condsd
)
u0 <- ranefu0SE$condval
u0sd <- ranefu0SE$condsd
## extract the estimated intercept and its standard error:
b0 <- summary(m0_lmer)$coefficients[1, 1]
b0SE <- summary(m0_lmer)$coefficients[1, 2]
## each subject's predicted intercept and its standard deviation:
b0u0 <- b0 + u0
b0u0SD <- sqrt(b0SE^2 + u0sd^2)
b0u0lower <- b0u0 - 2 * b0u0SD
b0u0upper <- b0u0 + 2 * b0u0SD
## assemble a data frame, ordered by the intercept estimates:
bysubjintercept <- data.frame(
  b = b0u0,
  lower = b0u0lower, upper = b0u0upper
)
bysubjintercept$subj <- factor(1:42)
bysubjintercept <- bysubjintercept[order(bysubjintercept$b), ]
bysubjintercept$subj <- factor(bysubjintercept$subj,
  levels = as.numeric(bysubjintercept$subj)
)
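The data frame bysubjintercept can now be used to draw each subject’s intercept estimate with its interval. The exact plotting code is not shown in the text; a minimal base-R sketch is:
## plot each subject's estimated intercept with its interval:
n <- nrow(bysubjintercept)
with(bysubjintercept, {
  plot(1:n, b,
    ylim = range(lower, upper), pch = 16,
    xlab = "subject (ordered by estimated intercept)",
    ylab = "estimated intercept (log rt)"
  )
  arrows(1:n, lower, 1:n, upper, angle = 90, code = 3, length = 0.02)
})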
3.5.2 The formal statement of the varying intercepts model
The model m0_lmer above implements the following type of linear model, where \(i\) indexes subjects and \(j\) indexes items.
Because this is a Latin square design, once we know the subject id and the item id, we know which subject saw which condition:
subset(gg05e1, subject == 1 & item == 1)
## subject item condition rawRT so logrt
## 6 1 1 objgap 320 1 5.768
The mathematical form of the linear mixed model is:
\[\begin{equation} y_{ij} = \beta_0 + u_{0i}+\beta_1\times so_{ij} + \varepsilon_{ij} \end{equation}\]
The only new thing here beyond the linear model we saw earlier is the by-subject adjustment to the intercept. These by-subject adjustments to the intercept \(u_{0i}\) are assumed by lmer
to come from a normal distribution centered around 0:
\[\begin{equation} u_{0i} \sim Normal(0,\sigma_{u0}) \end{equation}\]
The ordinary linear model m0_lm
has one intercept \(\beta_0\) for all subjects, whereas this linear mixed model with varying intercepts m0_lmer
has a different intercept (\(\beta_0 + u_{0i}\)) for each subject \(i\).
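One can read off each subject’s intercept \(\hat\beta_0 + u_{0i}\) directly: the coef() extractor in lme4 returns the fixed-effects estimates combined with the by-subject adjustments.
## subject-specific coefficients (intercept = beta_0 + u_0i); first few subjects:
head(coef(m0_lmer)$subject)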
Figure 3.8 visualizes the adjustments for each subject to the intercepts.
An important point is that in this model there are two variance components or sources of variance (cf. the linear model, which had only one):
- Between-subject variability in the intercept:
- \(u_0 \sim Normal(0,\sigma_{u0})\)
- Within-subject variability:
- \(\varepsilon \sim Normal(0,\sigma)\)
These two standard deviations determine the standard error of the \(\beta_1\) slope parameter.
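Both variance components can be read off the fitted model with the VarCorr() extractor:
## the two sources of variance in the varying intercepts model:
VarCorr(m0_lmer)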
3.5.3 Model type 2: Varying intercepts and varying slopes, without a correlation
Unlike the lmlist_m1 model above, which involved fitting a separate model for each subject, the model m0_lmer assumes different intercepts for each subject but the same slope for all subjects.
We can choose to fit different intercepts as well as different slopes for each subject. To achieve this, assume now that each subject’s slope is also adjusted:
\[\begin{equation} y_{ij} = \beta_0 + u_{0i}+(\beta_1+u_{1i})\times so_{ij} + \varepsilon_{ij} \end{equation}\]
That is, we additionally assume that \(u_{1i} \sim Normal(0,\sigma_{u1})\). The lmer notation for fitting separate intercepts and slopes is (1 + so || subject). The double vertical bar syntax will be explained soon.
m1_lmer <- lmer(logrt ~ so + (1 + so || subject), gg05e1)
The output of this model shows that now there are not two but three sources of variability. These are:
- Between-subject variability in the intercept:
- \(u_0 \sim Normal(0,\sigma_{u0})\)
- Between-subject variability in the slope:
- \(u_1 \sim Normal(0,\sigma_{u1})\)
- Within-subject variability:
- \(\varepsilon \sim Normal(0,\sigma)\)
The model output reports the estimates of the three standard deviations:
- \(\hat\sigma_{u0}=0.317\)
- \(\hat\sigma_{u1}=0.110\)
- \(\hat\sigma = 0.365\).
Random effects:
Groups Name Variance Std.Dev.
subject (Intercept) 0.1006 0.317
subject.1 so 0.0121 0.110
Residual 0.1336 0.365
Number of obs: 672, groups: subject, 42
Fixed effects:
Estimate Std. Error t value
(Intercept) 5.8831 0.0509 115.50
so 0.0620 0.0221 2.81
Figure 3.9 visualizes these fits for each subject. As before, the gray line shows the model with a single intercept and slope, i.e., the simple linear model m0_lm.
3.5.3.1 Comparing the lmList model with the varying intercepts and varying slopes model
Compare this m1_lmer model with the lmlist_m1 model we fitted earlier. Figure 3.10 visualizes the difference between the two models.
What is striking in Figure 3.10 is that in the linear mixed model plot, each subject’s estimated best-fit line is “gravitating” towards the gray line, which represents the simple linear model fit. Compare these lines to those from the lmList fits; there, each subject’s fitted line is based only on that subject’s data, and the lines do not gravitate towards the gray line. This gravitation of the individual subjects’ intercepts and slopes towards the grand mean intercept and slope in the linear mixed model is called shrinkage. What is happening here is that the estimates of the intercept and slope for each subject are informed by two sources of information: the average behavior of all the subjects taken together, and the individual subject’s own measurements. By contrast, in the lmList model, the individual subjects’ fits are informed only by the individual subjects’ data, ignoring the estimates of the average intercept and slope over all subjects.
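Shrinkage can also be seen numerically by placing the two sets of estimates side by side; the coef() extractor again gives each subject’s partially pooled intercept and slope (a sketch):
## no-pooling estimates (separate model per subject):
head(coef(lmlist_m1))
## partial-pooling estimates from the linear mixed model:
head(coef(m1_lmer)$subject)
## the grand-mean (fixed-effects) estimates they shrink towards:
fixef(m1_lmer)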
3.5.3.2 Visualizing random effects
As before, it is instructive to visualize the by-subject adjustments to the intercept and slope; see Figure 3.11. Figure 3.11 shows wide variability in mean reading times between subjects, but relatively small variation in the slope between subjects.
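A sketch of how such a plot can be produced with lattice, analogously to the earlier varying intercepts plot:
## by-subject intercept and slope adjustments with 95% intervals:
print(dotplot(ranef(m1_lmer, condVar = TRUE)))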
3.5.3.3 The formal statement of the varying intercepts and varying slopes linear mixed model
Here is the formal statement of the varying intercepts and varying slopes model. Again, \(i\) indexes subjects and \(j\) indexes items.
\[\begin{equation} y_{ij} = \beta_0 + u_{0i}+(\beta_1+u_{1i})\times so_{ij} + \varepsilon_{ij} \end{equation}\]
There are three variance components, which partition the variation in the data into between-subjects and within-subjects variability:
- Between-subjects variability in the intercept:
- \(u_0 \sim Normal(0,\sigma_{u0})\)
- Between-subjects variability in the slope:
- \(u_1 \sim Normal(0,\sigma_{u1})\)
- Within-subject variability:
- \(\varepsilon \sim Normal(0,\sigma)\)
3.5.3.4 Crossed random effects for subjects and for items
The varying intercepts and varying slopes model doesn’t capture the full story yet. The items also contribute sources of variance: just like subjects, items may vary in their reading times, or in the extent to which their reading times are affected by condition. In other words, items might also have different intercepts and slopes.
Notice that in the Latin square design (which is standard in psycholinguistic and linguistics research), each subject sees all the items, but only one condition of each item. In such cases, we say that subjects and items are crossed.
head(xtabs(~ subject + item, gg05e1))
## item
## subject 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 5 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 6 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
A design in which each subject sees only some of the items would be called partially crossed. A linear mixed model with (partially) crossed random effects for subjects and items can be defined with the following syntax:
m2_lmer <- lmer(logrt ~ so + (1 + so || subject) +
  (1 + so || item), gg05e1)
Analogously to the preceding example, now there are five variance components:
Random effects:
Groups Name Variance Std.Dev.
subject (Intercept) 0.10090 0.3177
subject.1 so 0.01224 0.1106
item (Intercept) 0.00127 0.0356
item.1 so 0.00162 0.0402
Residual 0.13063 0.3614
Number of obs: 672, groups: subject, 42; item, 16
Fixed effects:
Estimate Std. Error t value
(Intercept) 5.8831 0.0517 113.72
so 0.0620 0.0242 2.56
The item intercept and slope adjustments can be visualized as well; see Figure 3.12. Notice that in the present data, there is less item-level variation than subject-level variation; this is often the case in planned experiments in sentence processing, where the experimental items are carefully constructed to vary as little as possible. However, there could still be important variation among items, and for that reason one should always take item-level variability into account.
In the above models, there is an explicit assumption that there is no correlation between the varying intercept and slope adjustments by subject, and no correlation between the varying intercept and slope adjustments by item. It is possible that the intercept and slope adjustments are in fact correlated. We turn to this model next.
3.5.4 Model type 3: Varying intercepts and varying slopes, with correlation
A correlation can be introduced between the intercept and slope adjustments by using a single vertical bar instead of two vertical bars in the random effects structure:
m3_lmer <- lmer(
  logrt ~ so + (1 + so | subject) + (1 + so | item),
  gg05e1
)
## boundary (singular) fit: see help('isSingular')
To understand what this model is doing, we have to recall what a bivariate/multivariate distribution is.
Random effects:
Groups Name Variance Std.Dev. Corr
subject (Intercept) 0.10103 0.3178
so 0.01228 0.1108 0.58
item (Intercept) 0.00172 0.0415
so 0.00196 0.0443 1.00 <= degeneracy
Residual 0.12984 0.3603
Number of obs: 672, groups: subject, 42; item, 16
Fixed effects:
Estimate Std. Error t value
(Intercept) 5.8831 0.0520 113.09
so 0.0620 0.0247 2.51
The correlations (0.58 and 1.00) shown in the model output above are the estimated correlations between the varying intercepts and slopes for subjects and for items, respectively. The variance-covariance matrix for items is called “degenerate”: because its correlation is 1, the matrix cannot be inverted, and variance-covariance matrices must be invertible. The model output also issues a “singularity” warning, which is another way of saying that the matrix is not invertible.
When the correlation is \(\pm 1\) or very close to \(\pm 1\), this means that the optimizer in lme4
is unable to estimate the correlation parameter. This failure to estimate the correlation is almost always due to there not being enough data. If you are in such a situation, you are better off not trying to estimate this parameter with the data you have, and instead fitting one of the simpler models. We will return to this point when discussing model selection. For further discussion, see Barr et al. (2013), Bates et al. (2015), and Matuschek et al. (2017).
3.5.4.1 Formal statement of varying intercepts and varying slopes linear mixed model with correlation
As usual, \(i\) indexes subjects and \(j\) indexes items. The vector so contains the sum-coded condition: \(+1\) for object relatives and \(-1\) for subject relatives. The new elements in this model are the item-level effects, and the specification of the variance-covariance matrices for subjects and items, which include the correlation parameters.
\[\begin{equation} y_{ij} = \beta_0 + u_{0i} + w_{0j} + (\beta_1 + u_{1i} + w_{1j}) \times so_{ij} + \varepsilon_{ij} \end{equation}\]
where \(\varepsilon_{ij} \sim Normal(0,\sigma)\) and
\[\begin{equation}\label{eq:covmatLM} \Sigma_u = \begin{pmatrix} \sigma _{u0}^2 & \rho _{u}\sigma _{u0}\sigma _{u1}\\ \rho _{u}\sigma _{u0}\sigma _{u1} & \sigma _{u1}^2\\ \end{pmatrix} \quad \Sigma _w = \begin{pmatrix} \sigma _{w0}^2 & \rho _{w}\sigma _{w0}\sigma _{w1}\\ \rho _{w}\sigma _{w0}\sigma _{w1} & \sigma _{w1}^2\\ \end{pmatrix} \end{equation}\]
\[\begin{equation}\label{eq:jointpriordistLM} \begin{pmatrix} u_0 \\ u_1 \\ \end{pmatrix} \sim \mathcal{N} \left( \begin{pmatrix} 0 \\ 0 \\ \end{pmatrix}, \Sigma_{u} \right), \quad \begin{pmatrix} w_0 \\ w_1 \\ \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} 0 \\ 0 \\ \end{pmatrix}, \Sigma_{w} \right) \end{equation}\]
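The estimated matrices \(\hat\Sigma_u\) and \(\hat\Sigma_w\) can be inspected directly in the fitted model (a sketch):
## estimated variance-covariance matrices for subjects and items:
VarCorr(m3_lmer)$subject
VarCorr(m3_lmer)$item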
3.5.4.2 Visualizing the random effects
One can visualize the correlation between intercepts and slopes by subjects; see Figure 3.13. The positive correlation of 0.58 between subject intercept and slope adjustments implies that slower subjects show larger effects. The correlation pattern isn’t easy to see in Figure 3.13.
The correlation pattern becomes more apparent if we plot the slope adjustments against the intercept adjustments. See Figure 3.14.
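A minimal sketch of such a plot, using the conditional modes extracted with ranef():
## by-subject slope adjustments plotted against intercept adjustments:
u <- ranef(m3_lmer)$subject
plot(u[, "(Intercept)"], u[, "so"],
  xlab = "intercept adjustment (u_0)",
  ylab = "slope adjustment (u_1)"
)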
When we talk about hypothesis testing, we will look at what inferences we can draw from this correlation.
Figure 3.15 is a dotplot showing the item-level effects. The plot suggests a perfect correlation between intercept and slope adjustments, but as mentioned above, this correlation is a result of a failure to estimate the correlation. It is not meaningful.
For these data, the appropriate model would be one without correlations for items:
m4_lmer <- lmer(
  logrt ~ so + (1 + so | subject) + (1 + so || item),
  gg05e1
)
summary(m4_lmer)
## Linear mixed model fit by REML ['lmerMod']
## Formula:
## logrt ~ so + (1 + so | subject) + ((1 | item) + (0 + so | item))
## Data: gg05e1
##
## REML criterion at convergence: 696.8
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.000 -0.509 -0.129 0.292 6.180
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## subject (Intercept) 0.10088 0.3176
## so 0.01220 0.1105 0.58
## item (Intercept) 0.00132 0.0363
## item.1 so 0.00163 0.0403
## Residual 0.13060 0.3614
## Number of obs: 672, groups: subject, 42; item, 16
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 5.8831 0.0518 113.67
## so 0.0620 0.0242 2.56
##
## Correlation of Fixed Effects:
## (Intr)
## so 0.388
The double vertical bar for the item random effects is the syntax in lme4
for removing the correlation term that the model was unable to estimate. For the Grodner and Gibson (2005) data, model m4_lmer
is the one we would report in a paper. Specifically, the wording that would be appropriate is the following: “The estimate of the relative clause effect was 43 ms, with a t-value of 2.56. We can therefore reject the null hypothesis of no difference between the two relative clause types.”
In a balanced data set like this one (i.e., with no missing data), the t-test on the slope \(\beta_1\) is sufficient for drawing inferences about the null hypothesis. However, when there are missing data (as happens, for example, in eyetracking studies), the likelihood ratio test discussed in the next chapter is the more general method. In practice, researchers often interpret the t-value as we have done here, but in this book we will always use the likelihood ratio test (chapter 4) for hypothesis testing, as that is the more general approach.