Textbook

Introduction to Bayesian Data Analysis for Cognitive Science

Nicenboim, Schad, Vasishth

This lecture presupposes that you have read chapter 5 of the textbook.

A simple linear model (EEG data): “Fixed-effects” model

\[\begin{equation} signal_n \sim \mathit{Normal}( \alpha + c\_cloze_n \cdot \beta,\sigma) \end{equation}\]

where \(n =1, \ldots, N\), and \(signal\) is the dependent variable (average signal in the N400 spatiotemporal window in microvolts). The variable \(N\) represents the total number of data points.

We are going to use the following priors:

\[\begin{equation} \begin{aligned} \alpha &\sim \mathit{Normal}(0,10)\\ \beta &\sim \mathit{Normal}(0,10)\\ \sigma &\sim \mathit{Normal}_{+}(0,50) \end{aligned} \end{equation}\]
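
The code below assumes the following setup (a minimal sketch): the data sets used in these notes ship with the bcogsci package that accompanies the textbook, and short_summary() is a small helper used in the book to print an abbreviated brms summary (if it is not available, summary() shows the same information and more).

library(bcogsci)    # provides df_eeg and the other data sets used below
library(brms)
library(tidyverse)  # dplyr, ggplot2, tibble, ...
library(bayesplot)  # mcmc_dens()
data("df_eeg")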

df_eeg$c_cloze<-df_eeg$cloze-mean(df_eeg$cloze)

fit_N400_cp <-
  brm(n400 ~ c_cloze,
      prior = c(prior(normal(0, 10), class = Intercept),
                prior(normal(0, 10), class = b, coef = c_cloze),
                prior(normal(0, 50), class = sigma)),
      data = df_eeg)

Check the summary:

short_summary(fit_N400_cp)
## ...
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     3.66      0.22     3.23     4.08 1.00     4220     3041
## c_cloze       2.26      0.54     1.19     3.34 1.00     4122     2938
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma    11.82      0.16    11.51    12.14 1.00     4224     3057
## 
## ...

The autocorrelation of the residuals (not terrible here):

m<-lm(n400 ~ c_cloze,df_eeg)
acf(residuals(m))

In repeated measures data, if you fit a fixed-effects model, the residuals can be heavily correlated, because there are multiple measurements from each subject and each item.
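
To see this clustering directly, we can group the residuals of the fixed-effects fit by subject (a minimal sketch; resid_fe is just a temporary column added for this check):

## per-subject mean residuals; under independence these should all hover near 0:
df_eeg$resid_fe <- residuals(m)
round(sort(with(df_eeg, tapply(resid_fe, subj, mean))), 2)
## or visually:
boxplot(resid_fe ~ subj, data = df_eeg)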

No-pooling model (separate linear models for each subject)

\[\begin{equation} signal_n \sim \mathit{Normal}( \alpha_{subj[n]} + c\_cloze_n \cdot \beta_{subj[n]},\sigma) \end{equation}\]

\[\begin{equation} \begin{aligned} \alpha_{i} &\sim \mathit{Normal}(0,10)\\ \beta_{i} &\sim \mathit{Normal}(0,10)\\ \sigma &\sim \mathit{Normal}_+(0,50) \end{aligned} \end{equation}\]

fit_N400_np <- brm(n400 ~ 0 + factor(subj) + c_cloze:factor(subj),
                   prior = c(prior(normal(0, 10), class = b),
                             prior(normal(0, 50), class = sigma)),
                   data = df_eeg)
# parameter names of the by-subject slopes:
ind_effects_np <- paste0("b_factorsubj",
                         unique(df_eeg$subj), ":c_cloze")
# as.data.frame() removes the draws metadata from the brms object:
beta_across_subj <- as.data.frame(fit_N400_np) %>%
  select(all_of(ind_effects_np)) %>%
  rowMeans()

# Calculate the average of these estimates
(grand_av_beta <- tibble(mean = mean(beta_across_subj),
                         lq = quantile(beta_across_subj, c(0.025)),
                         hq = quantile(beta_across_subj, c(0.975))))
## # A tibble: 1 × 3
##    mean    lq    hq
##   <dbl> <dbl> <dbl>
## 1  2.18  1.20  3.16
# make a table of betas by subject
beta_by_subj <- posterior_summary(fit_N400_np,
                                  variable = ind_effects_np) %>%
  as.data.frame() %>%
  mutate(subject = 1:n()) %>%
  ## reorder plot by magnitude of mean:
  arrange(Estimate) %>%
  mutate(subject = factor(subject, levels = subject))
ggplot(beta_by_subj,
       aes(x = Estimate, xmin = Q2.5, xmax = Q97.5, y = subject)) +
  geom_point() +
  geom_errorbarh() +
  geom_vline(xintercept = grand_av_beta$mean) +
  geom_vline(xintercept = grand_av_beta$lq, linetype = "dashed") +
  geom_vline(xintercept = grand_av_beta$hq, linetype = "dashed") +
  xlab("By-subject effect of cloze probability in microvolts")
Figure: No pooling model.

Varying intercepts and varying slopes model (without correlation)

\[\begin{equation} signal_n \sim \mathit{Normal}(\alpha + u_{subj[n],1} + c\_cloze_n \cdot (\beta+ u_{subj[n],2}),\sigma) \end{equation}\]

\[\begin{equation} \begin{aligned} \alpha &\sim \mathit{Normal}(0,10)\\ \beta &\sim \mathit{Normal}(0,10)\\ u_1 &\sim \mathit{Normal}(0,\tau_{u_1})\\ u_2 &\sim \mathit{Normal}(0,\tau_{u_2})\\ \tau_{u_1} &\sim \mathit{Normal}_+(0,20) \\ \tau_{u_2} &\sim \mathit{Normal}_+(0,20) \\ \sigma &\sim \mathit{Normal}_+(0,50) \end{aligned} \end{equation}\]

prior_v <-
  c(prior(normal(0, 10), class = Intercept),
    prior(normal(0, 10), class = b, coef = c_cloze),
    prior(normal(0, 50), class = sigma),
    prior(normal(0, 20), class = sd, coef = Intercept, group = subj),
    prior(normal(0, 20), class = sd, coef = c_cloze, group = subj))
fit_N400_v <- brm(n400 ~ c_cloze + (c_cloze || subj),
                  prior = prior_v,
                  data = df_eeg)

When we print a brms fit, we first see summaries of the posteriors of the standard deviations of the by-group intercept and slope adjustments, \(\tau_{u_1}\) and \(\tau_{u_2}\), shown as sd(Intercept) and sd(c_cloze). Then, as with the previous models, we see the population-level effects \(\alpha\) and \(\beta\) (Intercept and c_cloze) and the scale of the likelihood, \(\sigma\) (sigma). The full summary can be printed out by typing:

short_summary(fit_N400_v)
## ...
## Group-Level Effects: 
## ~subj (Number of levels: 37) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     2.17      0.36     1.53     2.95 1.00     1715     2443
## sd(c_cloze)       1.72      0.90     0.13     3.50 1.00     1159     1608
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     3.64      0.41     2.81     4.44 1.00     1611     2001
## c_cloze       2.32      0.62     1.08     3.52 1.00     4008     2612
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma    11.61      0.15    11.32    11.92 1.00     5480     2898
## 
## ...
mcmc_dens(fit_N400_v, pars = variables(fit_N400_v)[1:5])
Figure: The posteriors of the parameters.

# make a table of u_2s
ind_effects_v <- paste0("r_subj[", unique(df_eeg$subj),
                        ",c_cloze]")
u_2_v <- posterior_summary(fit_N400_v, variable = ind_effects_v) %>%
  as_tibble() %>%
  mutate(subj = 1:n()) %>%
  ## reorder plot by magnitude of mean:
  arrange(Estimate) %>%
  mutate(subj = factor(subj, levels = subj))
# We plot:
ggplot(u_2_v,
       aes(x = Estimate, xmin = Q2.5, xmax = Q97.5, y = subj)) +
  geom_point() +
  geom_errorbarh() +
  xlab("By-subject adjustment to the slope in microvolts")
Figure: Partial pooling.

There is an important difference between the no-pooling model and the varying intercepts and varying slopes model we just fit. The no-pooling model estimates each subject’s intercept and slope independently of all the other subjects. By contrast, the varying intercepts and slopes model takes all the subjects’ data into account in order to estimate the fixed effects \(\alpha\) and \(\beta\), and it “shrinks” the by-subject intercept and slope adjustments towards the fixed-effects estimates. In the figure below, we can see this shrinkage by comparing the estimates of the varying intercepts and slopes model with those of the no-pooling model (\(M_{np}\)).

# Extract parameter estimates from the no pooling model:
par_np <- posterior_summary(fit_N400_np, variable = ind_effects_np) %>%
  as_tibble() %>%
  mutate(model = "No pooling",
         subj = unique(df_eeg$subj))
# For the hierarchical model, the code is more complicated
# because we want the effect (beta) + adjustment.
# Extract the overall group level effect:
beta <- c(as_draws_df(fit_N400_v)$b_c_cloze)
# Extract the individual adjustments:
ind_effects_v <- paste0("r_subj[", unique(df_eeg$subj), ",c_cloze]")
adjustment <- as_draws_matrix(fit_N400_v, variable = ind_effects_v)
# Get the by subject effects in a data frame where each adjustment
# is in each column.
# Remove all the draws meta data by using as.data.frame
by_subj_effect <- as.data.frame(beta + adjustment)
# Summarize them by getting a table with the mean and the
# quantiles for each column and then binding them.
par_h <- lapply(by_subj_effect, function(x) {
  tibble(Estimate = mean(x),
         Q2.5 = quantile(x, .025),
         Q97.5 = quantile(x, .975))}) %>%
  bind_rows() %>%
  # Add a column identifying the model,
  # and one with the subject labels:
  mutate(model = "Hierarchical",
         subj = unique(df_eeg$subj))
# The mean and 95% CI of both models in one data frame:
by_subj_df <- bind_rows(par_h, par_np) %>%
  arrange(Estimate) %>%
  mutate(subj = factor(subj, levels = unique(.data$subj)))
b_c_cloze <- posterior_summary(fit_N400_v, variable = "b_c_cloze")
ggplot(by_subj_df,
       aes(ymin = Q2.5, ymax = Q97.5, x = subj,
           y = Estimate, color = model, shape = model)) +
  geom_errorbar(position = position_dodge(1)) +
  geom_point(position = position_dodge(1)) +
  # We'll also add the mean and 95% CrI of the overall difference
  # to the plot:
  geom_hline(yintercept = b_c_cloze[, "Estimate"]) +
  geom_hline(yintercept = b_c_cloze[, "Q2.5"],
             linetype = "dotted", linewidth = .5) +
  geom_hline(yintercept = b_c_cloze[, "Q97.5"],
             linetype = "dotted", linewidth = .5) +
  xlab("N400 effect of predictability") +
  coord_flip()
Figure: Comparison of the estimates of the effect of cloze probability for each subject under (i) the no-pooling model and (ii) the hierarchical model with varying intercepts and varying slopes.

Varying intercepts and varying slopes (with correlation)

\[\begin{equation} signal_n \sim \mathit{Normal}(\alpha + u_{subj[n],1} + c\_cloze_n \cdot (\beta + u_{subj[n],2}),\sigma) \end{equation}\]

The correlation comes in through the prior on the by-subject adjustments: the intercept adjustments \(u_{1}\) and the slope adjustments \(u_{2}\) are now assumed to come from a bivariate normal distribution.
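Written out (this is the same assumption stated explicitly for the Stroop model later in these notes), for each subject \(i\):

\[\begin{equation} \begin{aligned} {\begin{pmatrix} u_{i,1} \\ u_{i,2} \end{pmatrix}} &\sim {\mathcal {N}} \left( {\begin{pmatrix} 0\\ 0 \end{pmatrix}} ,\boldsymbol{\Sigma_u} \right) \end{aligned} \end{equation}\]

The covariance matrix \(\boldsymbol{\Sigma_u}\) encodes the correlation: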

\[\begin{equation} \boldsymbol{\Sigma_u} = {\begin{pmatrix} \tau_{u_1}^2 & \rho_u \tau_{u_1} \tau_{u_2} \\ \rho_u \tau_{u_1} \tau_{u_2} & \tau_{u_2}^2 \end{pmatrix}} \end{equation}\]

Figure: Visualization of the LKJ correlation prior with four different values of the eta parameter.
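
A figure like the one above can be reproduced roughly as follows (a sketch assuming the ggdist package, whose dlkjcorr_marginal() gives the marginal prior density of a single correlation under an LKJ prior; the eta values are just illustrative):

library(ggdist)
rho <- seq(-0.99, 0.99, by = 0.01)
## higher eta concentrates the prior more strongly around zero:
plot(rho, dlkjcorr_marginal(rho, K = 2, eta = 10), type = "l",
     xlab = "correlation", ylab = "density")
lines(rho, dlkjcorr_marginal(rho, K = 2, eta = 4), lty = 2)
lines(rho, dlkjcorr_marginal(rho, K = 2, eta = 2), lty = 3)
lines(rho, dlkjcorr_marginal(rho, K = 2, eta = 1), lty = 4)
legend("topright", legend = paste("eta =", c(10, 4, 2, 1)), lty = 1:4)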

\[\begin{equation} \begin{aligned} \tau_{u_1} &\sim \mathit{Normal}_+(0,20)\\ \tau_{u_2} &\sim \mathit{Normal}_+(0,20)\\ \rho_u &\sim \mathit{LKJcorr}(2) \end{aligned} \end{equation}\]

prior_h <- c(prior(normal(0, 10), class = Intercept),
             prior(normal(0, 10), class = b, coef = c_cloze),
             prior(normal(0, 50), class = sigma),
             prior(normal(0, 20),
                   class = sd, coef = Intercept,
                   group = subj
                   ),
             prior(normal(0, 20),
                   class = sd, coef = c_cloze,
                   group = subj),
             prior(lkj(2), class = cor, group = subj))
fit_N400_h <- brm(n400 ~ c_cloze + (c_cloze | subj),
                  prior = prior_h,
                  data = df_eeg)
plot(fit_N400_h, nvariables = 6)
Figure: Posteriors.

Varying intercept and varying slopes for subject and item (“Maximal model”)

The term “maximal model” became mainstream in psycholinguistics and psychology after Barr et al. (2013) came out.

\[\begin{multline} signal_n \sim \mathit{Normal}(\alpha + u_{subj[n],1} + w_{item[n],1} + c\_cloze_n \cdot (\beta + u_{subj[n],2}+ w_{item[n],2}), \sigma) \end{multline}\]

We have added by-item adjustments \(w\), analogous to the by-subject adjustments \(u\); \(item[n]\) indicates the item that corresponds to the observation in the \(n\)-th row of the data frame.

We have hyperparameters and hyperpriors as before:

\[\begin{equation} \begin{aligned} \boldsymbol{\Sigma_u} & = {\begin{pmatrix} \tau_{u_1}^2 & \rho_u \tau_{u_1} \tau_{u_2} \\ \rho_u \tau_{u_1} \tau_{u_2} & \tau_{u_2}^2 \end{pmatrix}}\\ \boldsymbol{\Sigma_w} & = {\begin{pmatrix} \tau_{w_1}^2 & \rho_w \tau_{w_1} \tau_{w_2} \\ \rho_w \tau_{w_1} \tau_{w_2} & \tau_{w_2}^2 \end{pmatrix}} \end{aligned} \end{equation}\]

\[\begin{equation} \begin{aligned} \tau_{u_1} &\sim \mathit{Normal}_+(0,20)\\ \tau_{u_2} &\sim \mathit{Normal}_+(0,20)\\ \rho_u &\sim \mathit{LKJcorr}(2) \\ \tau_{w_1} &\sim \mathit{Normal}_+(0,20)\\ \tau_{w_2} &\sim \mathit{Normal}_+(0,20)\\ \rho_w &\sim \mathit{LKJcorr}(2) \\ \end{aligned} \end{equation}\]

prior_sih_full <-
  c(prior(normal(0, 10), class = Intercept),
    prior(normal(0, 10), class = b, coef = c_cloze),
    prior(normal(0, 50), class = sigma),
    prior(normal(0, 20),
          class = sd, coef = Intercept,
          group = subj),
    prior(normal(0, 20),
          class = sd, coef = c_cloze,
          group = subj),
    prior(lkj(2), class = cor, group = subj),
    prior(normal(0, 20),
          class = sd, coef = Intercept,
          group = item),
    prior(normal(0, 20),
          class = sd, coef = c_cloze,
          group = item),
    prior(lkj(2), class = cor, group = item))
fit_N400_sih <- brm(n400 ~ c_cloze + (c_cloze | subj) +
                      (c_cloze | item),
                    prior = prior_sih_full,
                    data = df_eeg)

We can also simplify the call to brm() by assigning the same priors to the by-subject and by-item parameters:

prior_sih <-
  c(prior(normal(0, 10), class = Intercept),
    prior(normal(0, 10), class = b),
    prior(normal(0, 50), class = sigma),
    prior(normal(0, 20), class = sd),
    prior(lkj(2), class = cor))
fit_N400_sih <- brm(n400 ~ c_cloze + (c_cloze | subj) +
                      (c_cloze | item),
                    prior = prior_sih,
                    data = df_eeg)
short_summary(fit_N400_sih)
## ...
## Group-Level Effects: 
## ~item (Number of levels: 80) 
##                        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## sd(Intercept)              1.48      0.37     0.72     2.16 1.01      965
## sd(c_cloze)                2.30      1.01     0.24     4.24 1.00     1050
## cor(Intercept,c_cloze)    -0.41      0.32    -0.89     0.34 1.00     2715
##                        Tail_ESS
## sd(Intercept)               441
## sd(c_cloze)                1189
## cor(Intercept,c_cloze)     2496
## 
## ~subj (Number of levels: 37) 
##                        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## sd(Intercept)              2.20      0.36     1.55     2.97 1.00     1872
## sd(c_cloze)                1.47      0.89     0.08     3.35 1.00     1349
## cor(Intercept,c_cloze)     0.13      0.37    -0.63     0.79 1.00     6291
##                        Tail_ESS
## sd(Intercept)              2817
## sd(c_cloze)                2355
## cor(Intercept,c_cloze)     2738
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     3.66      0.46     2.75     4.55 1.00     3145     3335
## c_cloze       2.31      0.69     0.94     3.66 1.00     5990     3215
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma    11.49      0.16    11.18    11.80 1.00     6812     2362
## 
## ...
plot(fit_N400_sih, nvariables = 9)
Figure: The posterior distributions of the parameters in the model with varying intercepts and varying slopes for subjects and items.

Beyond the maximal model: Distributional regression

\[\begin{equation} \begin{aligned} signal_n &\sim \mathit{Normal}(\alpha + u_{subj[n],1} + w_{item[n],1} + \\ & \hspace{2cm} c\_cloze_n \cdot (\beta + u_{subj[n],2}+ w_{item[n],2}), \sigma_n)\\ \sigma_n &= \exp(\sigma_\alpha + \sigma_{u_{subj[n]}}) \end{aligned} \end{equation}\]

\[\begin{equation} \begin{aligned} \sigma_\alpha &\sim \mathit{Normal}(0,\log(50))\\ \sigma_u &\sim \mathit{Normal}(0, \tau_{\sigma_u}) \\ \tau_{\sigma_u} &\sim \mathit{Normal}_+(0, 5) \end{aligned} \end{equation}\]

prior_s <-
  c(prior(normal(0, 10), class = Intercept),
    prior(normal(0, 10), class = b),
    prior(normal(0, 20), class = sd),
    prior(lkj(2), class = cor),
    prior(normal(0, log(50)), class = Intercept, dpar = sigma),
    prior(normal(0, 5),
          class = sd, group = subj,
          dpar = sigma))
fit_N400_s <- brm(brmsformula(
  n400 ~ c_cloze + (c_cloze | subj) + (c_cloze | item),
  sigma ~ 1 + (1 | subj)),
  prior = prior_s, data = df_eeg)

Inspect the output below; notice that our estimate for the effect of cloze remains very similar to that of the model fit_N400_sih.

Compare the two models’ estimates:

posterior_summary(fit_N400_sih, variable = "b_c_cloze")
##           Estimate Est.Error      Q2.5    Q97.5
## b_c_cloze 2.313692 0.6926312 0.9412067 3.657684
posterior_summary(fit_N400_s, variable = "b_c_cloze")
##           Estimate Est.Error     Q2.5    Q97.5
## b_c_cloze 2.301003 0.6649654 1.023091 3.569686
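
We can also inspect the parameters of the distributional part of the model (the names below follow brms's usual naming scheme; they can be checked with variables(fit_N400_s)):

posterior_summary(fit_N400_s,
                  variable = c("b_sigma_Intercept",
                               "sd_subj__sigma_Intercept"))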

A log-normal model of the Stroop effect

\[\begin{equation} rt_n \sim \mathit{LogNormal}(\alpha + u_{subj[n],1} + c\_cond_n \cdot (\beta + u_{subj[n],2}), \sigma) \end{equation}\]

\[\begin{equation} \begin{aligned} \mu_{incongruent} &= \alpha + 1 \cdot \beta \\ \mu_{congruent} &= \alpha + -1 \cdot \beta \end{aligned} \end{equation}\]
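
On the log scale, the predicted difference between the incongruent and congruent conditions is therefore:

\[\begin{equation} \mu_{incongruent} - \mu_{congruent} = 2 \cdot \beta \end{equation}\]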

\[\begin{equation} \begin{aligned} \alpha & \sim \mathit{Normal}(6, 1.5) \\ \beta & \sim \mathit{Normal}(0, 0.01) \\ \sigma &\sim \mathit{Normal}_+(0, 1) \end{aligned} \end{equation}\]

\[\begin{equation} \begin{aligned} {\begin{pmatrix} u_{i,1} \\ u_{i,2} \end{pmatrix}} &\sim {\mathcal {N}} \left( {\begin{pmatrix} 0\\ 0 \end{pmatrix}} ,\boldsymbol{\Sigma_u} \right) \end{aligned} \end{equation}\]

\[\begin{equation} \begin{aligned} \boldsymbol{\Sigma_u} & = {\begin{pmatrix} \tau_{u_1}^2 & \rho_u \tau_{u_1} \tau_{u_2} \\ \rho_u \tau_{u_1} \tau_{u_2} & \tau_{u_2}^2 \end{pmatrix}} \end{aligned} \end{equation}\]

\[\begin{equation} \begin{aligned} \tau_{u_1} &\sim \mathit{Normal}_+(0,1)\\ \tau_{u_2} &\sim \mathit{Normal}_+(0,1)\\ \rho_u &\sim \mathit{LKJcorr}(2) \end{aligned} \end{equation}\]

We restrict ourselves to the correct trials only, and add a sum-coded predictor c_cond (incongruent = +1, congruent = -1).

data("df_stroop")
(df_stroop <- df_stroop %>%
  mutate(c_cond = if_else(condition == "Incongruent", 1, -1)))
## # A tibble: 3,058 × 5
##     subj trial condition      RT c_cond
##    <dbl> <int> <chr>       <int>  <dbl>
##  1     1     0 Congruent    1484     -1
##  2     1     1 Incongruent  1316      1
##  3     1     2 Incongruent   628      1
##  4     1     3 Congruent     511     -1
##  5     1     4 Congruent     509     -1
##  6     1     5 Incongruent   903      1
##  7     1     6 Incongruent   759      1
##  8     1     7 Incongruent  1133      1
##  9     1     8 Incongruent   678      1
## 10     1     9 Incongruent   785      1
## # ℹ 3,048 more rows

Fit the model.

fit_stroop <- brm(RT ~ c_cond + (c_cond | subj),
                  family = lognormal(),
                  prior =
                    c(prior(normal(6, 1.5), class = Intercept),
                      prior(normal(0, .01), class = b),
                      prior(normal(0, 1), class = sigma),
                      prior(normal(0, 1), class = sd),
                      prior(lkj(2), class = cor)),
                  data = df_stroop)
## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#bulk-ess

We will focus on \(\beta\) (but you can verify that there is nothing surprising in the other parameters of the model fit_stroop).

posterior_summary(fit_stroop, variable = "b_c_cond")
##          Estimate  Est.Error       Q2.5      Q97.5
## b_c_cond 0.026946 0.00561219 0.01575175 0.03744102
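
Because the model is on the log scale, \(\beta\) is not directly interpretable in milliseconds. A minimal sketch of how to back-transform the effect (ignoring the group-level adjustments) using the population-level posterior draws:

draws <- as_draws_df(fit_stroop)
## difference between the incongruent and congruent means in ms:
effect_ms <- exp(draws$b_Intercept + draws$b_c_cond) -
  exp(draws$b_Intercept - draws$b_c_cond)
round(c(mean = mean(effect_ms), quantile(effect_ms, c(0.025, 0.975))))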

Class exercise 1

This exercise tests your ability to map the lme4 syntax for linear mixed models to the underlying mathematical/statistical model. This skill is important because in later chapters we will go from the mathematical statement of the model to Stan syntax.

The Dillon et al. (2013) data set investigates interference effects with reflexives in English. We look at a simplified version of these data.

library(bcogsci)
data("df_dillonE1")
head(df_dillonE1)
##         subj       item   rt int     expt
## 49 dillonE11 dillonE119 2918 low dillonE1
## 56 dillonE11 dillonE119 1338 low dillonE1
## 63 dillonE11 dillonE119  424 low dillonE1
## 70 dillonE11 dillonE119  186 low dillonE1
## 77 dillonE11 dillonE119  195 low dillonE1
## 84 dillonE11 dillonE119 1218 low dillonE1

I have removed six other conditions, and focus only on two conditions (so this design is not completely correct; we will return to the full design after the contrast coding lectures).

The two conditions are as follows (both are ungrammatical sentences):

Theory predicts a difference in reading time between the high and low interference conditions at the word “themselves”. The reason is that there is an illusion of grammaticality in the high interference condition, due to local agreement between “trainers” and “themselves”. For details, see the original paper.

If we compute grand averages, we see that high interference conditions are indeed read faster than low interference conditions:

round(with(df_dillonE1,tapply(rt,int,mean)))
## high  low 
##  801  856

Your tasks

  • First, set up an appropriate contrast coding for this design: +1/2 for high interference and -1/2 for low interference:
df_dillonE1$c_int<-ifelse(df_dillonE1$int=="low",-1/2,1/2)

Fit the following models in brms. You will have to decide on priors. For each model, write down the underlying statistical model in mathematical notation.

  • A no pooling model analogous to this frequentist one:
library(lme4)
## Loading required package: Matrix
## 
## Attaching package: 'lme4'
## The following object is masked from 'package:brms':
## 
##     ngrps
m1<-lmList(log(rt)~c_int|subj,df_dillonE1)
## separate estimates
m1
## Call: lmList(formula = log(rt) ~ c_int | subj, data = df_dillonE1) 
## Coefficients:
##            (Intercept)        c_int
## dillonE11     6.964055  0.162119114
## dillonE111    6.697648  0.099414490
## dillonE112    6.601500  0.188842100
## dillonE114    6.409764 -0.374700251
## dillonE116    6.296188 -0.402684377
## dillonE117    6.003641  0.108183364
## dillonE12     6.390891 -0.006007199
## dillonE123    6.517746  0.024104734
## dillonE124    6.680429 -0.049381637
## dillonE126    6.601036  0.169541977
## dillonE127    6.310201  0.093376109
## dillonE128    6.209580 -0.356135796
## dillonE129    6.590704 -0.046769730
## dillonE13     6.611758 -0.066017984
## dillonE130    6.166103 -0.139666584
## dillonE131    6.317039 -0.423852858
## dillonE132    6.814438 -0.124328800
## dillonE133    6.473360 -0.342476166
## dillonE134    6.627406  0.154700915
## dillonE135    6.539288  0.475863979
## dillonE136    6.340895 -0.241241058
## dillonE137    6.078482 -0.111717409
## dillonE138    6.907188 -0.152755829
## dillonE139    6.811871  0.056525504
## dillonE14     6.723666 -0.217727776
## dillonE140    6.908356 -0.047033060
## dillonE141    6.302499 -0.157205186
## dillonE142    6.300375 -0.078902006
## dillonE143    6.477288 -0.034400977
## dillonE144    6.543147 -0.180240290
## dillonE145    6.985113 -0.515629374
## dillonE146    6.545404 -0.103850529
## dillonE147    6.579029  0.163569924
## dillonE148    6.954835 -0.196034490
## dillonE149    6.577471 -0.043719083
## dillonE15     6.679635  0.165247144
## dillonE150    6.382706 -0.161423042
## dillonE16     6.594270 -0.043360075
## dillonE17     5.623231  0.171543369
## dillonE19     6.330327  0.137291840
## 
## Degrees of freedom: 2855 total; 2775 residual
## Residual standard error: 0.5835455
  • A complete pooling model
m2<-lm(log(rt)~c_int,df_dillonE1)
summary(m2)
## 
## Call:
## lm(formula = log(rt) ~ c_int, data = df_dillonE1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.96031 -0.42850  0.02082  0.45014  2.68898 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  6.51257    0.01210 538.130   <2e-16 ***
## c_int       -0.06093    0.02420  -2.517   0.0119 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6466 on 2853 degrees of freedom
## Multiple R-squared:  0.002216,   Adjusted R-squared:  0.001866 
## F-statistic: 6.337 on 1 and 2853 DF,  p-value: 0.01188
## Here is one reason why this model is wrong:
acf(residuals(m2))

  • A linear mixed model with varying intercepts for subject and for item
m3<-lmer(log(rt)~c_int+(1|subj)+(1|item),df_dillonE1)
summary(m3)
## Linear mixed model fit by REML ['lmerMod']
## Formula: log(rt) ~ c_int + (1 | subj) + (1 | item)
##    Data: df_dillonE1
## 
## REML criterion at convergence: 5127.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.6855 -0.6669  0.0257  0.6750  3.7855 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  item     (Intercept) 0.01729  0.1315  
##  subj     (Intercept) 0.07378  0.2716  
##  Residual             0.33070  0.5751  
## Number of obs: 2855, groups:  item, 48; subj, 40
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  6.51608    0.04819 135.221
## c_int       -0.05783    0.02183  -2.649
## 
## Correlation of Fixed Effects:
##       (Intr)
## c_int -0.001
## look at how the acf changes compared to m2:
acf(residuals(m3))

  • A linear mixed model with varying intercepts and varying slopes for subject and for item, with no correlation
m4<-lmer(log(rt)~c_int+(1+c_int||subj)+(1+c_int||item),df_dillonE1)
summary(m4)
## Linear mixed model fit by REML ['lmerMod']
## Formula: log(rt) ~ c_int + ((1 | subj) + (0 + c_int | subj)) + ((1 | item) +  
##     (0 + c_int | item))
##    Data: df_dillonE1
## 
## REML criterion at convergence: 5116.5
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.7619 -0.6763  0.0356  0.6722  3.7978 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  item     c_int       0.01247  0.1117  
##  item.1   (Intercept) 0.01585  0.1259  
##  subj     c_int       0.01399  0.1183  
##  subj.1   (Intercept) 0.07285  0.2699  
##  Residual             0.32523  0.5703  
## Number of obs: 2855, groups:  item, 48; subj, 40
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  6.51483    0.04764 136.753
## c_int       -0.05872    0.03292  -1.783
## 
## Correlation of Fixed Effects:
##       (Intr)
## c_int -0.001
  • A linear mixed model with varying intercepts and varying slopes for subject and for item, with correlation
m5<-lmer(log(rt)~c_int+(1+c_int|subj)+(1+c_int|item),df_dillonE1)
summary(m5)
## Linear mixed model fit by REML ['lmerMod']
## Formula: log(rt) ~ c_int + (1 + c_int | subj) + (1 + c_int | item)
##    Data: df_dillonE1
## 
## REML criterion at convergence: 5114.7
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.8132 -0.6732  0.0359  0.6777  3.8199 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  item     (Intercept) 0.01611  0.1269        
##           c_int       0.01233  0.1110   -0.34
##  subj     (Intercept) 0.07387  0.2718        
##           c_int       0.01380  0.1175   -0.20
##  Residual             0.32517  0.5702        
## Number of obs: 2855, groups:  item, 48; subj, 40
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  6.51471    0.04796 135.827
## c_int       -0.05832    0.03280  -1.778
## 
## Correlation of Fixed Effects:
##       (Intr)
## c_int -0.165
## look at how the acf changes compared to m3:
acf(residuals(m5))

Notice that the effect disappears in the models above once the varying slopes are added (the absolute t-value drops below 2). Why is that? Hint: Barr et al. (2013).
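
As a starting point for the brms versions, here is a minimal sketch of a Bayesian analog of m5 (fit_dillon_h is just an illustrative name, and the priors, all on the log scale of rt, are only one reasonable choice that you should justify, e.g., with prior predictive checks):

fit_dillon_h <- brm(log(rt) ~ c_int + (c_int | subj) + (c_int | item),
                    prior = c(prior(normal(6, 1.5), class = Intercept),
                              prior(normal(0, 0.5), class = b),
                              prior(normal(0, 1), class = sigma),
                              prior(normal(0, 1), class = sd),
                              prior(lkj(2), class = cor)),
                    data = df_dillonE1)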

Class exercise 2

Here is a \(2\times 2\) repeated measures factorial design:

data("df_smithE1")
head(df_smithE1)
##     Participant StimSet  RT N2Factor SemFactor
## 203         101      32 976     N2pl    SemSim
## 245         101      20 640     N2pl    SemSim
## 277         101       7 448     N2sg    SemSim
## 304         101      10 640     N2pl SemDissim
## 344         101      14 448     N2pl SemDissim
## 383         101      18 432     N2pl SemDissim

There are two factors: Semantic similarity (levels: similar and dissimilar), and the number feature on the second noun (singular vs plural):

  1. Dissimilar, N2 singular: The canoe by the cabin likely was damaged in the heavy storm.

  2. Dissimilar, N2 plural: The canoe by the cabins likely was damaged in the heavy storm.

  3. Similar, N2 singular: The canoe by the kayak likely was damaged in the heavy storm.

  4. Similar, N2 plural: The canoe by the kayaks likely was damaged in the heavy storm.

The predictions (from a sentence processing theory) are:

The original paper (their experiment 1) is:

Smith, G., Franck, J., & Tabor, W. (2021). Encoding interference effects support self-organized sentence processing. Cognitive Psychology, 124, 101356.

The following model would be appropriate (I will discuss the warning in class):

df_smithE1$N2<-ifelse(df_smithE1$N2Factor=="N2sg",1/2,-1/2)
df_smithE1$Sem<-ifelse(df_smithE1$SemFactor=="SemSim",1/2,-1/2)
df_smithE1$N2xSem<-df_smithE1$N2*df_smithE1$Sem
m<-lmer(log(RT)~N2+Sem+N2xSem+
          (1+N2+Sem+N2xSem|Participant)+
          (1+N2+Sem+N2xSem|StimSet),
        df_smithE1)
## boundary (singular) fit: see help('isSingular')
summary(m)
## Linear mixed model fit by REML ['lmerMod']
## Formula: log(RT) ~ N2 + Sem + N2xSem + (1 + N2 + Sem + N2xSem | Participant) +  
##     (1 + N2 + Sem + N2xSem | StimSet)
##    Data: df_smithE1
## 
## REML criterion at convergence: 2989.1
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -5.3578 -0.5803 -0.1202  0.4073  8.3840 
## 
## Random effects:
##  Groups      Name        Variance  Std.Dev. Corr             
##  Participant (Intercept) 0.0563127 0.23730                   
##              N2          0.0025518 0.05052  -0.13            
##              Sem         0.0010239 0.03200   0.15 -0.66      
##              N2xSem      0.0312132 0.17667  -0.23  0.55 -0.99
##  StimSet     (Intercept) 0.0049071 0.07005                   
##              N2          0.0005055 0.02248   0.21            
##              Sem         0.0006671 0.02583   0.22  1.00      
##              N2xSem      0.0049601 0.07043  -0.99 -0.31 -0.32
##  Residual                0.1226560 0.35022                   
## Number of obs: 3441, groups:  Participant, 110; StimSet, 36
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  6.03753    0.02617 230.702
## N2          -0.01765    0.01348  -1.310
## Sem          0.02927    0.01312   2.230
## N2xSem      -0.02596    0.03163  -0.821
## 
## Correlation of Fixed Effects:
##        (Intr) N2     Sem   
## N2     -0.015              
## Sem     0.069  0.046       
## N2xSem -0.266  0.091 -0.167
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

The brms output may be more transparent. Here is the model in brms (using default priors!). Note that the formula below keeps log(RT) as the outcome while also specifying family = lognormal(), so the estimates are on a doubly log-transformed scale; this is why the population-level intercept in the summary below is about 1.80 (roughly log(6.04)) rather than about 6.04. With family = lognormal(), the raw RT is usually modeled directly.

m_brms<-brm(log(RT)~N2+Sem+N2xSem+
              (1+N2+Sem+N2xSem|Participant)+
              (1+N2+Sem+N2xSem|StimSet),
            df_smithE1,
            family=lognormal())
## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#bulk-ess
short_summary(m_brms)
## ...
## Group-Level Effects: 
## ~Participant (Number of levels: 110) 
##                       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## sd(Intercept)             0.04      0.00     0.04     0.05 1.01      770
## sd(N2)                    0.01      0.00     0.00     0.01 1.00      868
## sd(Sem)                   0.00      0.00     0.00     0.01 1.00     1201
## sd(N2xSem)                0.03      0.01     0.01     0.04 1.00      824
## cor(Intercept,N2)        -0.05      0.33    -0.68     0.64 1.00     4979
## cor(Intercept,Sem)        0.04      0.38    -0.71     0.75 1.00     6084
## cor(N2,Sem)              -0.09      0.44    -0.82     0.77 1.00     2616
## cor(Intercept,N2xSem)    -0.22      0.18    -0.58     0.14 1.00     3345
## cor(N2,N2xSem)            0.22      0.38    -0.60     0.85 1.00      539
## cor(Sem,N2xSem)          -0.28      0.42    -0.90     0.67 1.00      566
##                       Tail_ESS
## sd(Intercept)             1413
## sd(N2)                    1796
## sd(Sem)                   1726
## sd(N2xSem)                 978
## cor(Intercept,N2)         2558
## cor(Intercept,Sem)        2791
## cor(N2,Sem)               3159
## cor(Intercept,N2xSem)     2602
## cor(N2,N2xSem)            1101
## cor(Sem,N2xSem)           1244
## 
## ~StimSet (Number of levels: 36) 
##                       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## sd(Intercept)             0.01      0.00     0.01     0.02 1.00     1359
## sd(N2)                    0.00      0.00     0.00     0.01 1.00     1266
## sd(Sem)                   0.00      0.00     0.00     0.01 1.00     1424
## sd(N2xSem)                0.01      0.01     0.00     0.02 1.00     1672
## cor(Intercept,N2)         0.04      0.39    -0.73     0.77 1.00     5339
## cor(Intercept,Sem)        0.07      0.38    -0.71     0.76 1.00     5086
## cor(N2,Sem)               0.07      0.45    -0.79     0.84 1.00     2964
## cor(Intercept,N2xSem)    -0.54      0.30    -0.94     0.22 1.00     3961
## cor(N2,N2xSem)           -0.03      0.43    -0.80     0.78 1.00     3287
## cor(Sem,N2xSem)          -0.09      0.42    -0.82     0.73 1.00     3301
##                       Tail_ESS
## sd(Intercept)             2110
## sd(N2)                    1642
## sd(Sem)                   1625
## sd(N2xSem)                1575
## cor(Intercept,N2)         2446
## cor(Intercept,Sem)        2964
## cor(N2,Sem)               2951
## cor(Intercept,N2xSem)     3117
## cor(N2,N2xSem)            3300
## cor(Sem,N2xSem)           3374
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     1.80      0.00     1.79     1.80 1.01      346      785
## N2           -0.00      0.00    -0.01     0.00 1.00     5124     2885
## Sem           0.00      0.00     0.00     0.01 1.00     4909     3493
## N2xSem       -0.00      0.01    -0.01     0.01 1.00     4042     3228
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.06      0.00     0.05     0.06 1.00     4706     2985
## 
## ...

The variance-covariance matrices

Write down the full model, along with the variance-covariance matrices for the Participant and StimSet random effects.

Historical methods of data analysis and their connection to linear mixed models: t-tests and repeated measures ANOVA

Function for outputting t-test results in a more human-readable and more compact manner:

summary_ttest <- function(res, paired = TRUE, units = "ms") {
  obs_t <- round(res$statistic, 2)
  dfs <- round(res$parameter)
  pval <- round(res$p.value, 3)
  ci <- round(res$conf.int, 2)
  est <- round(res$estimate, 2)
  if (paired == TRUE) {
    print(paste(paste0("t(", dfs, ")=", obs_t),
                paste0("p=", pval)))
    print(paste0("est.: ", est, " [",
                 ci[1], ",", ci[2], "] ", units))
  } else {
    print(paste(paste0("t(", dfs, ")=", obs_t),
                paste0("p=", pval)))
    print(paste(paste0("est. 1: ", est[1]),
                paste0("est. 2: ", est[2]),
                paste0("CI of diff. in means: [",
                       ci[1], ",", ci[2], "]")))
  }
}

Paired t-tests are the traditional way to analyze such data. Here is how this is done, and here is the connection to linear mixed models. (Note that the t-tests below are computed on log RTs, so the “ms” label printed by the helper’s default units argument does not really apply; the estimates are on the log scale.)

## all four cells: needed for the interaction below:
bysubj <- aggregate(log(RT) ~ Participant + N2Factor + SemFactor,
                    FUN = mean, data = df_smithE1)
colnames(bysubj)[4] <- "LogRT"

## aggregate over the other factor for each main effect:
bysubjN2 <- aggregate(log(RT) ~ Participant + N2Factor,
                      FUN = mean, data = df_smithE1)
bysubjSem <- aggregate(log(RT) ~ Participant + SemFactor,
                       FUN = mean, data = df_smithE1)

colnames(bysubjN2)[3] <- "LogRT"
colnames(bysubjSem)[3] <- "LogRT"

## main effects:

summary_ttest(
  t.test(subset(bysubjN2,N2Factor=="N2pl")$LogRT,
         subset(bysubjN2,N2Factor=="N2sg")$LogRT,
         paired=TRUE))
## [1] "t(109)=1.24 p=0.218"
## [1] "est.: 0.02 [-0.01,0.04] ms"
summary_ttest(
  t.test(subset(bysubjSem,SemFactor=="SemSim")$LogRT,
         subset(bysubjSem,SemFactor=="SemDissim")$LogRT,
         paired=TRUE))
## [1] "t(109)=2.32 p=0.022"
## [1] "est.: 0.03 [0,0.05] ms"
## interaction:
## difference between N2pl and N2sg in SemSim level:
d1<- subset(bysubj,N2Factor=="N2pl" & SemFactor=="SemSim")$LogRT-
  subset(bysubj,N2Factor=="N2sg" & SemFactor=="SemSim")$LogRT

## difference between N2pl and N2sg in SemDissim level:
d2<- subset(bysubj,N2Factor=="N2pl" & SemFactor=="SemDissim")$LogRT-
  subset(bysubj,N2Factor=="N2sg" & SemFactor=="SemDissim")$LogRT

## interaction: difference of differences:
summary_ttest(t.test(d2-d1))
## [1] "t(109)=-0.65 p=0.515"
## [1] "est.: -0.02 [-0.08,0.04] ms"

These are essentially equivalent to a single linear mixed model on the aggregated data:

bysubj$N2<-ifelse(bysubj$N2Factor=="N2sg",1/2,-1/2)
bysubj$Sem<-ifelse(bysubj$SemFactor=="SemSim",1/2,-1/2)
bysubj$N2xSem<-bysubj$N2*bysubj$Sem
m_aggregated<-lmer(LogRT~N2+Sem+N2xSem+(1|Participant),bysubj)
summary(m_aggregated)
## Linear mixed model fit by REML ['lmerMod']
## Formula: LogRT ~ N2 + Sem + N2xSem + (1 | Participant)
##    Data: bysubj
## 
## REML criterion at convergence: -149
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -5.0635 -0.5117 -0.0312  0.4772  4.3139 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev.
##  Participant (Intercept) 0.05403  0.2324  
##  Residual                0.02196  0.1482  
## Number of obs: 440, groups:  Participant, 110
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  6.03697    0.02326 259.539
## N2          -0.01369    0.01413  -0.969
## Sem          0.02788    0.01413   1.974
## N2xSem      -0.02068    0.02826  -0.732
## 
## Correlation of Fixed Effects:
##        (Intr) N2    Sem  
## N2     0.000             
## Sem    0.000  0.000      
## N2xSem 0.000  0.000 0.000

Some important points to note

  • The linear mixed model above on the aggregated data is carrying out all three t-tests (the two main effects and interaction) simultaneously.

  • We can only have by-participant intercepts, no slopes, as we have only one data point per condition per participant.

## This will not work:
#m_aggregatedFULL<-lmer(LogRT~N2+Sem+N2xSem+(1+N2+Sem+N2xSem|Participant),bysubj)
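## (lmer stops with an error here: with one aggregated observation per
## participant and condition, the number of by-participant random effects
## is as large as the number of observations, so the slope variances
## cannot be separated from the residual error.)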
  • Compare with the model with the unaggregated data (this is a more realistic model):
summary(m)
## Linear mixed model fit by REML ['lmerMod']
## Formula: log(RT) ~ N2 + Sem + N2xSem + (1 + N2 + Sem + N2xSem | Participant) +  
##     (1 + N2 + Sem + N2xSem | StimSet)
##    Data: df_smithE1
## 
## REML criterion at convergence: 2989.1
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -5.3578 -0.5803 -0.1202  0.4073  8.3840 
## 
## Random effects:
##  Groups      Name        Variance  Std.Dev. Corr             
##  Participant (Intercept) 0.0563127 0.23730                   
##              N2          0.0025518 0.05052  -0.13            
##              Sem         0.0010239 0.03200   0.15 -0.66      
##              N2xSem      0.0312132 0.17667  -0.23  0.55 -0.99
##  StimSet     (Intercept) 0.0049071 0.07005                   
##              N2          0.0005055 0.02248   0.21            
##              Sem         0.0006671 0.02583   0.22  1.00      
##              N2xSem      0.0049601 0.07043  -0.99 -0.31 -0.32
##  Residual                0.1226560 0.35022                   
## Number of obs: 3441, groups:  Participant, 110; StimSet, 36
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  6.03753    0.02617 230.702
## N2          -0.01765    0.01348  -1.310
## Sem          0.02927    0.01312   2.230
## N2xSem      -0.02596    0.03163  -0.821
## 
## Correlation of Fixed Effects:
##        (Intr) N2     Sem   
## N2     -0.015              
## Sem     0.069  0.046       
## N2xSem -0.266  0.091 -0.167
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
  • Notice that the sign of the main effect or interaction depends on how one sets up the contrast coding!

  • Repeated measures ANOVA is equivalent to the three t-tests above:

library(rstatix)
## 
## Attaching package: 'rstatix'
## The following object is masked from 'package:stats':
## 
##     filter
subj_anova <- anova_test(data = bysubj,
                         dv = LogRT,
                         wid = Participant,
                         within = c(N2Factor, SemFactor))

get_anova_table(subj_anova)
## ANOVA Table (type III tests)
## 
##               Effect DFn DFd     F     p p<.05      ges
## 1           N2Factor   1 109 1.035 0.311       0.000622
## 2          SemFactor   1 109 4.644 0.033     * 0.003000
## 3 N2Factor:SemFactor   1 109 0.427 0.515       0.000355

Notice that the square roots of the F-scores are essentially the same as the absolute values of the t-statistics from the paired t-tests above (modulo some discrepancies for the main effects, which arise because the main-effect t-tests aggregate over raw trials whereas the ANOVA works with the four per-condition cell means, and the number of trials per cell is not perfectly balanced):

## main effect N2Factor, F-score to t-value of 1.24:
sqrt(1.035)
## [1] 1.017349
## main effect SemFactor, F-score to t-value of 2.32:
sqrt(4.644)
## [1] 2.154994
## interaction, F-score to absolute t-value of 0.65:
sqrt(0.427)
## [1] 0.6534524