9.1 Overly simple random effects structure in LMMs inflate Type I error

We return here to the data-set we saw earlier, the Grodner and Gibson (2005) self-paced reading study (their Experiment 1) on English relative clauses. First, load the data, fit a model with a full variance-covariance matrix for subjects and items, and then extract the parameters from the fitted model, just like we did in the simulation chapter.

library(lingpsych)
## Grodner and Gibson 2005 Expt 1 data:
data("df_gg05e1")
gg05e1<-df_gg05e1
## look at the data frame:
head(gg05e1)

##    subject item condition rawRT
## 6        1    1    objgap   320
## 19       1    2   subjgap   424
## 34       1    3    objgap   309
## 49       1    4   subjgap   274
## 68       1    5    objgap   333
## 80       1    6   subjgap   266

## convert raw RTs to log RTs:
gg05e1$logrt<-log(gg05e1$rawRT)
## sum-coding of predictor:
gg05e1$so<-ifelse(gg05e1$condition=="objgap",1/2,-1/2)
## fit model to log RTs:
m<-lmer(logrt ~ so + 
          (1+so|subject) + 
          (1+so|item), 
        data=gg05e1,
        ## "switch off" warnings:
        control=lmerControl(calc.derivs=FALSE))

## boundary (singular) fit: see help('isSingular')

## extract parameter estimates:
beta<-round(summary(m)$coefficients[,1],4)
sigma_e<-round(attr(VarCorr(m),"sc"),2)
subj_ranefsd<-round(attr(VarCorr(m)$subject,"stddev"),4)
subj_ranefcorr<-round(attr(VarCorr(m)$subject,"corr"),1)

## assemble variance-covariance matrix for subjects:
Sigma_u<-SIN::sdcor2cov(stddev=subj_ranefsd,
                        corr=subj_ranefcorr)

item_ranefsd<-round(attr(VarCorr(m)$item,"stddev"),4)
item_ranefcorr<-round(attr(VarCorr(m)$item,"corr"),1)

## assemble variance matrix for items:

## choose some intermediate values for correlations:
corr_matrix<-(diag(2) + matrix(rep(1,4),ncol=2))/2

Sigma_w<-SIN::sdcor2cov(stddev=item_ranefsd,
                        corr=corr_matrix)

We are now ready to compute Type I error using simulation.

9.1.1 Type I error with a varying intercepts-only model

Compute Type I error for a varying intercepts model using log reading times:

## null is true:
beta[2]<-0
nsim<-100
significant<-rep(NA,nsim)

for(i in 1:nsim){
#generate sim data:
simdat<-gen_sim_lnorm2(nitem=16,
                         nsubj=42,
                       beta=beta,
                       Sigma_u=Sigma_u,
                       Sigma_w=Sigma_w,
                      sigma_e=sigma_e)

## fit model to sim data:
m<-lmer(log(rt)~so+(1|subj)+(1|item),simdat,
        control=lmerControl(calc.derivs=FALSE))
## record whether t-value is significant:
significant[i]<-ifelse(abs(summary(m)$coefficients[2,3])>2,1,0)
}

## boundary (singular) fit: see help('isSingular')
## boundary (singular) fit: see help('isSingular')
## boundary (singular) fit: see help('isSingular')
## boundary (singular) fit: see help('isSingular')
## boundary (singular) fit: see help('isSingular')
## boundary (singular) fit: see help('isSingular')
## boundary (singular) fit: see help('isSingular')
## boundary (singular) fit: see help('isSingular')
## boundary (singular) fit: see help('isSingular')
## boundary (singular) fit: see help('isSingular')
## boundary (singular) fit: see help('isSingular')
## boundary (singular) fit: see help('isSingular')
## boundary (singular) fit: see help('isSingular')
## boundary (singular) fit: see help('isSingular')
## boundary (singular) fit: see help('isSingular')
## boundary (singular) fit: see help('isSingular')
## boundary (singular) fit: see help('isSingular')
## boundary (singular) fit: see help('isSingular')
## boundary (singular) fit: see help('isSingular')
## boundary (singular) fit: see help('isSingular')
## boundary (singular) fit: see help('isSingular')
## boundary (singular) fit: see help('isSingular')
## boundary (singular) fit: see help('isSingular')

## Type I error is quite high:
mean(significant)

## [1] 0.31

We find that Type I error is quite high. This is the problem with fitting varying intercepts-only models that Barr et al. (2013) point out: if the data-generative process is such that varying slopes should have been included in the model, fitting a varying intercepts-only model generally will lead to Type I error inflation. Of course, the extent of the inflation will vary from data-set to data-set; no simulation can give us a general answer to the question of how much inflation will happen.

9.1.2 Type I error with a varying intercepts and varying slopes model

Next, let’s compute Type I error for a model with varying intercepts and varying slopes, but without correlations for the random effects (because these would be difficult to estimate with the number of subjects (42) and items (16) used in Grodner and Gibson (2005)). The dependent measure is still log rt.

significant<-rep(NA,nsim)

for(i in 1:nsim){
#generate sim data:
simdat<-gen_sim_lnorm2(nitem=16,
                         nsubj=42,
                       beta=beta,
                       Sigma_u=Sigma_u,
                       Sigma_w=Sigma_w,
                      sigma_e=sigma_e)

## fit model to sim data:
m<-lmer(log(rt)~so+(1+so|subj)+(1+so|item),simdat,
        control=lmerControl(calc.derivs=FALSE))
## is the t-value significant?
significant[i]<-ifelse(abs(summary(m)$coefficients[2,3])
                       >2,1,0)
}

## Type I error is the expected value:
mean(significant)

[1] 0.06

Now we find that Type I error is near the expected 5%. As Barr et al. (2013) pointed out in their paper, when varying slopes are included, Type I error will have the expected value.

9.1.3 Type I error inflation due to model mis-specification

Although this point is not discussed in Barr et al. (2013), an inappropriate likelihood function, like the Normal likelihood in the case where the true generative process involves a log-normal distribution, can also lead to inflated Type I error. This can happen even if the model has a full variance-covariance matrix specified for it. This is because of extreme values generated by the log-normal distribution being excessively influential in generating Type I errors.

Type I error is inflated even with a maximal model if the likelihood is mis-specified:

mean(significantlog)

## [1] 0.04

mean(significantraw)

## [1] 0.12

One can also compare the estimates (along with their 95% confidence intervals) that were significant in the raw and log ms scales:

pd <- position_dodge(.3)
ggplot(ests, aes(x=n, y=mn, colour=transform, group=transform)) + geom_errorbar(aes(ymin=lower, ymax=upper),
width=.2, size=0.25, colour="black", position=pd) + geom_line(position=pd) +
geom_point(position=pd, size=2.5)+theme_bw()+theme(axis.title.x = element_blank())

with(ests,tapply(width,transform,mean))

##   log   raw 
## 71.59 94.42

We see that the effect estimates that were significant were based on over-/mis-estimates on the raw scale (the true effect is 0 ms).

References

Barr, Dale J, Roger Levy, Christoph Scheepers, and Harry J Tily. 2013. “Random Effects Structure for Confirmatory Hypothesis Testing: Keep It Maximal.” Journal of Memory and Language 68 (3): 255–78.

Grodner, Daniel, and Edward Gibson. 2005. “Consequences of the Serial Nature of Linguistic Input.” Cognitive Science 29: 261–90.