4.3 A real-life example: The English relative clause data

The likelihood ratio test is also the way that hypothesis testing is done with the linear mixed model. Here is how it works. Let’s look again at the Grodner and Gibson (2005) English relative clause data. The null hypothesis here refers to the slope parameter. When we have the sum contrast coding, the intercept \(\beta_0\) refers to the grand mean, and the slope \(\beta_1\) is the amount by which subject and object relative clause mean reading times deviate from the grand mean. Testing the null hypothesis that \(\beta_1\) is 0 amounts to testing whether there is any difference in means between the two relative clause types. This becomes clear if we consider the following.

Let object relatives be coded as \(+1\) and subject relatives as \(-1\). Then, the mean reading time \(\mu_{or}\) for object relatives in the linear mixed model is:

\[\begin{equation} \mu_{or}=\beta_0 + \beta_1 \end{equation}\]

Similarly, the mean reading time \(\mu_{sr}\) for subject relatives is:

\[\begin{equation} \mu_{sr}=\beta_0 - \beta_1 \end{equation}\]

If the null hypothesis is that \(\mu_{or}-\mu_{sr}=0\), then this amounts to saying that:

\[\begin{equation} (\beta_0 + \beta_1)-(\beta_0 - \beta_1)=0 \end{equation}\]

Removing the brackets gives us:

\[\begin{equation} \beta_0 + \beta_1-\beta_0 + \beta_1 = 0 \end{equation}\]

This yields the equation:

\[\begin{equation} 2\beta_1= 0 \tag{4.5} \end{equation}\]

Dividing both sides of the equation by 2, we get the null hypothesis that \(\beta_1=0\).

Incidentally, if we had rescaled the contrast coding to be not \(\pm 1\) but \(\pm 1/2\), the parameter \(\beta_1\) would represent exactly the difference between the two means, and null hypothesis in equation (4.5) would have come out to be \(\beta_1= 0\). This is why it is sometimes better to recode the contrasts as \(\pm 1/2\) rather than \(\pm 1\). See Schad et al. (2020a) for details; we will discuss this in the contrast coding chapter as well.

Let’s load the data, set up the contrast coding, and fit the null versus the alternative models. We will fit varying intercept and varying slopes for subject and item, without correlations for items. We don’t attempt to fit a model with by-items varying intercepts and slopes with a correlation because we would get a singularity in the variance covariance matrix.

library(lingpsych)
data("df_gg05e1")
gg05e1 <- df_gg05e1

gg05e1$so <- ifelse(gg05e1$condition == "objgap", 1, -1)
gg05e1$logrt <- log(gg05e1$rawRT)

library(lme4)
m0 <- lmer(logrt ~ 1 + (1 + so | subject) + (1 + so || item), gg05e1)
m1 <- lmer(logrt ~ 1 + so + (1 + so | subject) + (1 + so || item), gg05e1)

Notice that we keep all random effects in the null model. We say that the null model is nested inside the full model.

Next, we compare the two models’ log likelihoods. There is a function in the lme4 package that achieves that: the anova function:

anova(m0, m1)

## refitting model(s) with ML (instead of REML)

## Data: gg05e1
## Models:
## m0: logrt ~ 1 + (1 + so | subject) + ((1 | item) + (0 + so | item))
## m1: logrt ~ 1 + so + (1 + so | subject) + ((1 | item) + (0 + so | item))
##    npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)  
## m0    7 707 739   -347      693                      
## m1    8 703 739   -343      687  6.15  1      0.013 *
## ---
## Signif. codes:  
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

You can confirm from the output that the Chisq value shown is minus two times the difference in log likelihood of the two models. The p-value is computed using the chi-squared distribution with one degree of freedom because in the two models the difference in the number of parameters is one:

round(pchisq(6.15, df = 1, lower.tail = FALSE), 3)

## [1] 0.013

It is common in the psycholinguistics literature to use the t-value from the linear mixed model output to conduct the hypothesis test on the slope:

summary(m1)$coefficients

##             Estimate Std. Error t value
## (Intercept)  5.88306    0.05176 113.669
## so           0.06202    0.02422   2.561

The most general method for hypothesis testing is the likelihood ratio test shown above. One can use the t-test output from the linear mixed model for hypothesis testing, but this should be done only when the data are balanced. If there is lack of balance (e.g., missing data for whatever reason), the likelihood ratio test is the best way to proceed. In any case, when we talk about the evidence against the null hypothesis, the likelihood ratio test is the only reasonable way to talk about what evidence we have. See Royall (1997) for more discussion of this point. The essence of Royall’s point is that the most reasonable way to talk about the evidence in favor of a particular model is with reference to, i.e., relative to, a baseline model.

One can also use the likelihood ratio test to evaluate whether a variance component should be included or not. For example, is the correlation parameter justified for the subjects random effects? Recall that we had a correlation of 0.58. Is this statistically significant? One can test this in the following way:

m1 <- lmer(logrt ~ 1 + so + (1 + so | subject) + (1 + so || item), gg05e1)
m1NoCorr <- lmer(logrt ~ 1 + so + (1 + so || subject) + (1 + so || item), gg05e1)
anova(m1, m1NoCorr)

## refitting model(s) with ML (instead of REML)

## Data: gg05e1
## Models:
## m1NoCorr: logrt ~ 1 + so + ((1 | subject) + (0 + so | subject)) + ((1 | item) + (0 + so | item))
## m1: logrt ~ 1 + so + (1 + so | subject) + ((1 | item) + (0 + so | item))
##          npar AIC BIC logLik deviance Chisq Df
## m1NoCorr    7 710 741   -348      696         
## m1          8 703 739   -343      687   8.7  1
##          Pr(>Chisq)   
## m1NoCorr              
## m1           0.0032 **
## ---
## Signif. codes:  
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The test indicates that we can reject the null hypothesis that the correlation parameter is 0. We will return to this parameter in the chapter on simulation.

A final point: the likelihood ratio has essentially the same logic as the t-test, and that includes the fact that the focus is on the rejection of the null. The null cannot be accepted in the face of a null result, unless there is a good case to be made that power is sufficiently high (we will investigate this point later, in the simulation chapter). Also, since the likelihood ratio depends on defining the alternative model using the maximum likelihood estimate (the sample mean), it suffers from exactly the same problem as the t-test (Type M errors in the face of low power). We will also investigate this point in the simulation chapter.

References

Grodner, Daniel, and Edward Gibson. 2005. “Consequences of the Serial Nature of Linguistic Input.” Cognitive Science 29: 261–90.

Royall, Richard. 1997. Statistical Evidence: A Likelihood Paradigm. New York: Chapman; Hall, CRC Press.

———. 2020a. “How to Capitalize on a Priori Contrasts in Linear (Mixed) Models: A Tutorial.” Journal of Memory and Language 110.