Chapter 13 Meta-analysis and measurement error models

In this chapter, we introduce two relatively underutilized models that are potentially very important for cognitive science: meta-analysis and measurement-error models.

Meta-analysis can be very informative when carrying out systematic reviews, and measurement-error models are able to take into account uncertainty in one’s dependent or independent variable (or both). What’s common to these two classes of model is that they both assume that the \(n\)-th measured data point \(y_n\) has an unknown true value of a parameter, say \(\zeta_n\) (pronounced zeta en), that is measured with some uncertainty that can be represented by the standard error \(SE_n\) of the measurement \(y_n\):

\(y_n \sim \mathit{Normal}(\zeta_n,SE_n)\)

In both classes of model, the goal is to obtain a posterior distribution of a latent parameter \(\zeta\) which is assumed to generate the \(\zeta_n\), with some standard deviation \(\tau\). The parameter \(\tau\) quantifies the noise in the measurement process or the between-study variability in a meta-analysis.

\(\zeta_n \sim \mathit{Normal}(\zeta,\tau)\)

The main parameter of interest is usually \(\zeta\), but the posterior distributions of \(\tau\) and \(\zeta_n\) can also be informative. The above model specification should remind you of the hierarchical models we saw in earlier chapters.
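To make this shared structure concrete, below is a small simulation sketch of the generative process in R; all the values (number of studies, true effect, between-study standard deviation, and standard errors) are invented for illustration.

set.seed(123)
N <- 10                            # number of studies (invented)
zeta <- 30                         # true overall effect
tau <- 15                          # between-study standard deviation
SE <- runif(N, min = 5, max = 25)  # known measurement uncertainty
zeta_n <- rnorm(N, zeta, tau)      # latent true value behind each study
y <- rnorm(N, zeta_n, SE)          # observed, noisy measurements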

13.1 Meta-analysis

Once a number of studies have accumulated on a particular topic, it can be very informative to synthesize the data. Here we present a commonly used approach: a random-effects meta-analysis.

13.1.1 A meta-analysis of similarity-based interference in sentence comprehension

The model is set up as follows. For each study \(n\), let effect\(_n\) be the effect of interest, and let \(SE_n\) be the standard error of the effect. A concrete example of a recent meta-analysis is the effect of similarity-based interference in sentence comprehension (Jäger, Engelmann, and Vasishth 2017); when two nouns are more similar to each other, there is greater processing difficulty (i.e., longer reading times in milliseconds) when an attempt is made to retrieve one of the nouns to complete a linguistic dependency (such as a subject-verb dependency). The estimate of the effect and its standard error are the information we have from each study \(n\).

First, load the data, and add an id variable that identifies each experiment.

data("df_sbi")
(df_sbi <- df_sbi %>%
  mutate(study_id = 1:n()))
## # A tibble: 12 × 4
##   publication      effect    SE study_id
##   <chr>             <int> <int>    <int>
## 1 VanDyke07E1LoSem     13    30        1
## 2 VanDyke07E2LoSem     37    21        2
## 3 VanDyke07E3LoSem     20    11        3
## # ℹ 9 more rows

The effect size and standard errors were estimated from published summary statistics in the respective article. In some cases, this involved a certain amount of guesswork; the details are documented in the online material accompanying Jäger, Engelmann, and Vasishth (2017).

We begin with the assumption that there is a true (unknown) effect \(\zeta_n\) that lies behind each of these studies. Each of the observed effects has an uncertainty associated with it, \(SE_n\). We can therefore assume that each observed effect, effect\(_n\), is generated as follows:

\[\begin{equation} \text{effect}_n \sim \mathit{Normal}(\zeta_n,SE_n) \end{equation}\]

Each study is assumed to have a different true effect \(\zeta_n\) because each study will have been carried out under different conditions: in a different lab with different protocols and workflows, with different subjects, with different languages, with slightly different experimental designs, etc.

Further, each of the true underlying effects \(\zeta_n\) has behind it some true unknown value \(\zeta\). The parameter \(\zeta\) represents the underlying effect of similarity-based interference across experiments. Our goal is to obtain the posterior distribution of this overall effect.

We can write the above statement as follows:

\[\begin{equation} \zeta_n \sim\mathit{Normal}(\zeta,\tau) \end{equation}\]

\(\tau\) is the between-study standard deviation; this expresses the assumption that there will be some variability between the true effects \(\zeta_n\).

To summarize the model:

  • effect\(_n\) is the observed effect (in this example, in milliseconds) in the \(n\)-th study.
  • \(\zeta_n\) is the true (unknown) effect in each study.
  • \(\zeta\) is the true (unknown) effect of the experimental manipulation, namely, the similarity-based interference effect.
  • Each \(SE_n\) is estimated from the standard error available from study \(n\).
  • The parameter \(\tau\) represents the between-study standard deviation.

We can construct a hierarchical model as follows:

\[\begin{equation} \begin{aligned} \text{effect}_n \sim & \mathit{Normal}(\zeta_n, SE_n) \quad n=1,\dots, N_{studies}\\ \zeta_n \sim & \mathit{Normal}(\zeta, \tau) \\ \zeta \sim & \mathit{Normal}(0,100)\\ \tau \sim & \mathit{Normal}_+(0,100) \end{aligned} \tag{13.1} \end{equation}\]

The priors are based on domain knowledge; it seems reasonable to allow the effect to range a priori from \(-200\) to \(+200\) ms with probability \(95\)%. Of course, a sensitivity analysis is necessary (but skipped here).
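We can verify this range by computing the 95% quantiles of the \(\mathit{Normal}(0,100)\) prior:

qnorm(c(.025, .975), mean = 0, sd = 100)
## [1] -195.9964  195.9964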

This model can be implemented in brms in a relatively straightforward way as shown below. We show the Stan version later in the chapter (section 13.1.1.2); the Stan version presents some interesting challenges that can be useful for the reader interested in deepening their Stan modeling knowledge.

13.1.1.1 brms version of the meta-analysis model

First, define the priors:

priors <- c(prior(normal(0, 100), class = Intercept),
            prior(normal(0, 100), class = sd))

Fit the model as follows. Because of our relatively uninformative priors and the few data points, the models of this chapter require us to tune the control parameter, increasing adapt_delta and max_treedepth.

fit_sbi <- brm(effect | resp_se(`SE`, sigma = FALSE) ~ 1
               + (1 | study_id),
               data = df_sbi,
               prior = priors,
               control = list(
                 adapt_delta = .99,
                 max_treedepth = 10))

The posteriors of \(\zeta\) and \(\tau\) are summarized below as Intercept and sd(Intercept), respectively.

fit_sbi
## ...
## Group-Level Effects: 
## ~study_id (Number of levels: 12) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## sd(Intercept)    11.91      7.81     0.52    29.71 1.00      912
##               Tail_ESS
## sd(Intercept)     1122
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept    13.32      6.19     2.76    27.32 1.00     1394     1677
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.00      0.00     0.00     0.00   NA       NA       NA
## 
## ...

The sigma parameter does not play any role in this model, but appears in the brms output anyway. In the model specification, sigma was explicitly removed by writing sigma = FALSE. For this reason, we can ignore that parameter in the model summary output above. Box 13.1 explains what happens if we set sigma = TRUE.

As theory predicts, the overall effect from these studies has a positive sign.

One advantage of such a meta-analysis is that the posterior distribution of \(\zeta\) can now be used as an informative prior for a future study; this allows the researcher to build on what is known so far from the published literature. Such informative priors are especially important when carrying out analyses using Bayes factors (see chapter 15).
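As a sketch of how this could look in brms: suppose a hypothetical future study yields a data frame df_new with reading times rt, an effect-coded predictor interference, and subject identifiers subj; the meta-analytic posterior of \(\zeta\), approximated here as \(\mathit{Normal}(13, 6)\) based on the summary of fit_sbi above, could then serve as the prior for the effect.

# Hypothetical sketch: df_new, rt, interference, and subj are
# invented; Normal(13, 6) approximates the posterior of zeta.
prior_future <- prior(normal(13, 6), class = b, coef = interference)
fit_future <- brm(rt ~ interference + (1 | subj),
                  data = df_new,
                  prior = prior_future)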

Box 13.1 What happens if we set sigma = TRUE?

If we set sigma = TRUE, we won’t be able to get estimates for \(\zeta_{n}\), since they are handled implicitly. The model presented formally in equation (13.1) is equivalent to the one in (13.2) below. A critical difference is that \(\zeta_n\) no longer appears.

\[\begin{equation} \begin{aligned} \text{effect}_n \sim &\mathit{Normal}(\zeta, \sqrt{\tau^2 + SE_n^2} )\\ \zeta \sim &\mathit{Normal}(0,100)\\ \tau \sim &\mathit{Normal}_+(0,100) \end{aligned} \tag{13.2} \end{equation}\]

This works because of the following property of normally distributed random variables:

If \(X\) and \(Y\) are two independent random variables, and

\[\begin{equation} \begin{aligned} X &\sim \mathit{Normal}(\mu_X, \sigma_X)\\ Y &\sim \mathit{Normal}(\mu_Y, \sigma_Y) \end{aligned} \tag{13.3} \end{equation}\]

then, \(Z\), the sum of these two random variables is:

\[\begin{equation} Z = X + Y \tag{13.4} \end{equation}\]

The distribution of \(Z\) has the following form:

\[\begin{equation} Z \sim\mathit{Normal}\left(\mu_X + \mu_Y, \sqrt{\sigma_X^2 + \sigma_Y^2}\right) \tag{13.5} \end{equation}\]
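A quick simulation check of this property, with arbitrary values for the means and standard deviations:

x <- rnorm(100000, mean = 1, sd = 2)
y <- rnorm(100000, mean = 3, sd = 4)
z <- x + y
c(mean(z), sd(z))  # close to 1 + 3 = 4 and sqrt(2^2 + 4^2) = 4.47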

In our case, let

\[\begin{equation} \begin{aligned} U_{n} &\sim \mathit{Normal}(0 , SE_n)\\ \zeta_n &\sim \mathit{Normal}(\zeta, \tau) \end{aligned} \tag{13.6} \end{equation}\]

Analogous to equations (13.4) and (13.5), effect\(_n\) can be expressed as a sum of two independent random variables:

\[\begin{equation} \text{effect}_n = U_n + \zeta_n \end{equation}\]

The distribution of effect\(_n\) will be

\[\begin{equation} \text{effect}_n \sim\mathit{Normal}\left(\zeta, \sqrt{SE_n^2 + \tau^2}\right) \tag{13.7} \end{equation}\]

We can fit this in brms as follows. In this model specification, one should not include the + (1 | study_id), and the prior for \(\tau\) should now be specified for sigma.

priors2 <- c(prior(normal(0, 100), class = Intercept),
             prior(normal(0, 100), class = sigma))
fit_sbi_sigma <- brm(effect | resp_se(`SE`, sigma = TRUE) ~ 1,
                     data = df_sbi,
                     prior = priors2,
                     control = list(
                       adapt_delta = .99,
                       max_treedepth = 10))

There are slight differences from fit_sbi due to the different parameterization and the stochastic nature of the sampling process, but the results are very similar:

posterior_summary(fit_sbi_sigma,
  variable = c("b_Intercept", "sigma"))
##             Estimate Est.Error  Q2.5 Q97.5
## b_Intercept     13.3      6.24 3.093  27.2
## sigma           11.7      7.92 0.679  30.3

If we are not interested in the underlying effects in each study, this parameterization of the meta-analysis can be faster and more robust (i.e., it has fewer potential convergence issues). A major drawback is that, because the \(\zeta_n\) are no longer estimated, we cannot display their shrunken estimates in a forest plot as we do in Figure 13.1.

Another interesting by-product of a random-effects meta-analysis is the possibility of displaying a forest plot (Figure 13.1). A forest plot shows the meta-analytic estimate (the parameter b_Intercept in brms) alongside the original estimates effect\(_n\) (and their SE\(_n\)) and the posterior distributions of the \(\zeta_n\) for each study (we reconstruct these estimates by adding b_Intercept to the parameters starting with r_ in brms). The original estimates are the ones fed to the model as data and the posterior distributions of the \(\zeta_n\) are calculated, as in previous hierarchical models, after the information from all studies is pooled together. The \(\zeta_n\) estimates are shrunken estimates of each study’s (unknown) true effect, shrunken towards the grand mean \(\zeta\), and weighted by the standard error observed in each study \(n\). The \(\zeta_n\) for a particular study is shrunk more towards the grand mean \(\zeta\) when the study’s standard error is large (i.e., when the estimate is very imprecise). The code below shows how to build a forest plot step by step.

First, change the format of the data so that it looks like the output of brms:

df_sbi <- df_sbi %>%
  mutate(Q2.5 = effect - 2 * SE,
         Q97.5 = effect + 2 * SE,
         Estimate = effect,
         type = "original")

Extract the meta-analytical estimate:

df_Intercept <- posterior_summary(fit_sbi,
  variable = c("b_Intercept")
) %>%
  as.data.frame() %>%
  mutate(publication = "M.A. estimate", type = "")

For the pooled estimated effect (or fitted value) of the individual studies, we need the sum of the meta-analytical estimate (the intercept) and each of the by-study adjustments. Obtain this with the fitted() function:

df_model <- fitted(fit_sbi) %>%
  # Convert matrix to data frame:
  as.data.frame() %>%
  # Add a column to identify the estimates,
  # and another column to identify the publication:
  mutate(type = "adjusted",
         publication = df_sbi$publication)
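
Equivalently, the fitted value for any single study can be reconstructed by hand from the posterior draws, as mentioned above. A minimal sketch for the first study (the variable names follow the naming scheme of brms):

draws <- as_draws_df(fit_sbi)
zeta_1 <- draws$b_Intercept + draws$`r_study_id[1,Intercept]`
c(Estimate = mean(zeta_1), quantile(zeta_1, c(.025, .975)))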

Bind the observed effects, the meta-analytical estimate, and the fitted values of the studies together, and plot the data:

# the adjusted estimates and the meta-analysis estimate:
bind_rows(df_sbi, df_model, df_Intercept) %>%
  # Plot:
  ggplot(aes(x = Estimate,
             y = publication,
             xmin = Q2.5,
             xmax = Q97.5,
             color = type)) +
  geom_point(position = position_dodge(.5)) +
  geom_errorbarh(position = position_dodge(.5)) +
  # Add the meta-analytic estimate and Credible Interval:
  geom_vline(xintercept = df_Intercept$Q2.5,
             linetype = "dashed",
             alpha = .3) +
  geom_vline(xintercept = df_Intercept$Q97.5,
             linetype = "dashed",
             alpha = .3) +
  geom_vline(xintercept = df_Intercept$Estimate,
             linetype = "dashed",
             alpha = .5) +
  scale_color_discrete(breaks = c("adjusted", "original"))

FIGURE 13.1: Forest plot showing the original and the adjusted estimates computed from each study from the random-effects meta-analysis. The error bars on the original estimates show 95% confidence intervals, and those on the adjusted estimates show 95% credible intervals.

It is important to keep in mind that a meta-analysis is always going to yield biased estimates as long as we have publication bias: if a field has a tendency to allow only “big news” studies to be published, then the literature that will appear in the public domain will be biased, and any meta-analysis based on such information will be biased. Despite this limitation, a meta-analysis is still a useful way to synthesize the known evidence; one just has to remember that the estimate from the meta-analysis is almost certain to be biased.

13.1.1.2 Stan version of the meta-analysis model

Even though brms can handle meta-analyses, fitting them in Stan gives us more flexibility, which might be necessary in some cases. As a first attempt, we could build a model that closely follows the formal specification given in equation (13.1).

data {
  int<lower=1> N;
  vector[N] effect;
  vector[N] SE;
  vector[N] study_id;
}
parameters {
  real zeta;
  real<lower = 0> tau;
  vector[N] zeta_n;
}
model {
  target += normal_lpdf(effect| zeta_n, SE);
  target += normal_lpdf(zeta_n | zeta, tau);
  target += normal_lpdf(zeta | 0, 100);
  target += normal_lpdf(tau | 0, 100)
    - normal_lccdf(0 | 0, 100);
}

Fit the model as follows:

ma0 <- system.file("stan_models",
  "meta-analysis0.stan",
  package = "bcogsci"
)
ls_sbi <- list(N = nrow(df_sbi),
               effect = df_sbi$effect,
               SE = df_sbi$SE,
               study_id = df_sbi$study_id)
fit_sbi0 <- stan(ma0,
                 data = ls_sbi,
                 control = list(
                   adapt_delta = .999,
                   max_treedepth = 12))
 ## Warning: There were 2 divergent transitions after
 ## warmup. See
 ## https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
 ## to find out why this is a problem and how to eliminate
 ## them. 
 ## Warning: There were 1 chains where the estimated
 ## Bayesian Fraction of Missing Information was low. See
 ## https://mc-stan.org/misc/warnings.html#bfmi-low 
 ## Warning: Examine the pairs() plot to diagnose sampling
 ## problems 
 ## Warning: Bulk Effective Samples Size (ESS) is too low,
 ## indicating posterior means and medians may be unreliable.
 ## Running the chains for more iterations may help. See
 ## https://mc-stan.org/misc/warnings.html#bulk-ess 
 ## Warning: Tail Effective Samples Size (ESS) is too low,
 ## indicating posterior variances and tail quantiles may be
 ## unreliable. Running the chains for more iterations may
 ## help. See
 ## https://mc-stan.org/misc/warnings.html#tail-ess 

We see that there are warnings. As discussed in section 11.1.2, we can use pairs plots to uncover pathologies in the sampling. Here we see that the samples of zeta and tau are highly correlated:

pairs(fit_sbi0, pars = c("zeta", "tau"))

We face a problem similar to the one we encountered in section 11.1.2: the sampler cannot properly explore the neck of the funnel-shaped parameter space because of the strong correlation between the parameters. The solution is, as in section 11.1.2, a non-centered parameterization. Rewrite (13.1) as follows:

\[\begin{equation} \begin{aligned} z_n & \sim \mathit{Normal}(0, 1)\\ \zeta_n &= z_n \cdot \tau + \zeta \\ \text{effect}_n & \sim \mathit{Normal}(\zeta_n, SE_n)\\ \zeta &\sim \mathit{Normal}(0,100)\\ \tau &\sim \mathit{Normal}_+(0,100) \end{aligned} \tag{13.8} \end{equation}\]

This works because if \(Y \sim \mathit{Normal}(0, 1)\), then \(a + Y \cdot b \sim \mathit{Normal}(a, b)\); that is, \(X \sim \mathit{Normal}(a, b)\) and \(a + Y \cdot b\) have the same distribution. You can revisit section 11.1.2 for more details.
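A quick simulation illustrates this equivalence (with arbitrary values for \(a\) and \(b\)):

a <- 13
b <- 11
x_centered <- rnorm(100000, mean = a, sd = b)
x_noncentered <- a + rnorm(100000, mean = 0, sd = 1) * b
c(mean(x_centered), mean(x_noncentered))  # both close to 13
c(sd(x_centered), sd(x_noncentered))      # both close to 11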

Translate equation (13.8) into Stan code as follows in meta-analysis1.stan:

data {
  int<lower=1> N;
  vector[N] effect;
  vector[N] SE;
  vector[N] study_id;
}
parameters {
  real zeta;
  real<lower = 0> tau;
  vector[N] z;
}
transformed parameters {
  vector[N] zeta_n = z * tau + zeta;
}
model {
  target += normal_lpdf(effect| zeta_n, SE);
  target += std_normal_lpdf(z);
  target += normal_lpdf(zeta | 0, 100);
  target += normal_lpdf(tau | 0, 100)
    - normal_lccdf(0 | 0, 100);
}

The model converges with values virtually identical to the ones of the brms model.

ma1 <- system.file("stan_models",
  "meta-analysis1.stan",
  package = "bcogsci")
fit_sbi1 <- stan(ma1,
                 data = ls_sbi,
                 control = list(adapt_delta = .999,
                                max_treedepth = 12))
print(fit_sbi1, pars = c("zeta", "tau"))
##      mean 2.5% 97.5% n_eff Rhat
## zeta 13.3 2.54  28.4  1125    1
## tau  11.4 0.52  29.4   614    1

We can also reparameterize the model in a slightly different way: if we set \(U_{n} \sim\mathit{Normal}(0 , SE_n)\), then

\[\begin{equation} \text{effect}_n = U_n + \zeta_n \end{equation}\]

Then, given that \(\zeta_n \sim \mathit{Normal}(\zeta, \tau)\),

\[\begin{equation} \text{effect}_n \sim \mathit{Normal}(\zeta, \sqrt{SE_n^2 + \tau^2}) \tag{13.9} \end{equation}\]

See Box 13.1 if it’s not clear why this reparameterization works.

This is equivalent to the brms model where sigma = TRUE. As with brms, we lose the possibility of estimating the posterior of the true effect of the individual studies.

Write this in Stan as follows; this code is available in the file meta-analysis2.stan within the bcogsci package:

data {
  int<lower=1> N;
  vector[N] effect;
  vector[N] SE;
  vector[N] study_id;
}
parameters {
  real zeta;
  real<lower = 0> tau;
}
model {
  target += normal_lpdf(effect| zeta,
                        sqrt(square(SE) + square(tau)));
  target += normal_lpdf(zeta | 0, 100);
  target += normal_lpdf(tau | 0, 100)
    - normal_lccdf(0 | 0, 100);
}

Fit the model:

ma2 <- system.file("stan_models",
  "meta-analysis2.stan",
  package = "bcogsci"
)
fit_sbi2 <- stan(ma2,
                 data = ls_sbi,
                 control = list(adapt_delta = .9))
print(fit_sbi2, pars = c("zeta", "tau"))
##      mean 2.5% 97.5% n_eff Rhat
## zeta 13.2 2.90  28.7   767 1.00
## tau  11.3 0.63  28.9   914 1.01

This summary could be reported in an article by displaying the posterior means and 95% credible intervals of the parameters.

13.2 Measurement-error models

Measurement-error models deal with the situation where a predictor, the dependent variable, or both are observed with measurement error. This measurement error could arise because a variable is an average (i.e., its standard error can also be estimated), or because we know that our measurement is noisy due to limitations of our equipment (e.g., delays in the signal from the keyboard to the motherboard, impedance in the electrodes of an EEG system, etc.).

13.2.1 Accounting for measurement error in individual differences in working memory capacity and reading fluency

As a motivating example, consider the following data from Nicenboim, Vasishth, et al. (2018). For each subject, we have the partial-credit unit (PCU) scores of an operation span task as a measure of their working memory capacity (Conway et al. 2005) along with their standard error. In addition, the reading fluency of each subject is calculated from a separate set of data based on the mean reading speeds (characters/second) in a rapid automatized naming task (RAN; Denckla and Rudel 1976); the standard error of the reading speed is also available.

Of interest here is the extent of the association between working memory capacity (measured as PCU) and reading fluency (measured as reading speed in 50 characters per second). We avoid making any causal claims: It could be that our measure of working memory capacity really affects reading fluency or it could be the other way around. A third possibility is that there is a third variable (or several) that affects both reading fluency and working memory capacity. A treatment of causality in Bayesian models can be found in chapters 5 and 6 of McElreath (2020).

data("df_indiv")
df_indiv
## # A tibble: 100 × 5
##    subj mean_rspeed se_rspeed mean_pcu se_pcu
##   <dbl>       <dbl>     <dbl>    <dbl>  <dbl>
## 1     1      0.0521   0.00113    0.738 0.0648
## 2     2      0.0479   0.00121    0.292 0.0315
## 3     3      0.0601   0.00117    0.408 0.0900
## # ℹ 97 more rows

At first glance, we see a relationship between mean PCU scores and mean reading speed; see Figure 13.2. However, this relationship seems to be driven by two extreme data points in the top left corner of the plot.


df_indiv <- df_indiv %>%
  mutate(c_mean_pcu = mean_pcu - mean(mean_pcu))
ggplot(df_indiv, aes(x = c_mean_pcu, y = mean_rspeed)) +
  geom_point() +
  geom_smooth(method = "lm")

FIGURE 13.2: The relationship between (centered) mean PCU scores and mean reading speed.

A simple linear model shows a somewhat weak association between mean reading speed and centered mean PCU. The priors are relatively arbitrary, but they are in the right order of magnitude given that the reading speed values are small, well below 1.

priors <- c(
  prior(normal(0, 0.5), class = Intercept),
  prior(normal(0, 0.5), class = b),
  prior(normal(0, 0.5), class = sigma)
)
fit_indiv <- brm(mean_rspeed ~ c_mean_pcu,
  data = df_indiv,
  family = gaussian(),
  prior = priors
)
fit_indiv
## ...
## Population-Level Effects: 
##            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept      0.06      0.00     0.05     0.06 1.00     4968
## c_mean_pcu    -0.01      0.01    -0.03     0.00 1.00     2191
##            Tail_ESS
## Intercept      2851
## c_mean_pcu     2197
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.01      0.00     0.01     0.01 1.00     1989     2206
## 
## ...
# Proportion of samples below zero
(Pb <- mean(as_draws_df(fit_indiv)$b_c_mean_pcu < 0))
## [1] 0.946

Figure 13.3 shows the posterior distribution of the slope in this model. Most of the probability mass is negative (94.6%), suggesting that a better PCU score is associated with slower rather than faster reading speed; that is, that a larger working memory capacity is associated with less reading fluency. This is not a very intuitive result, and it could be driven by the two extreme data points. Rather than removing these data points, we’ll examine what happens when the uncertainty of the measurements is taken into account.


mcmc_plot(fit_indiv,
          variable = "^b_c",
          regex = TRUE,
          type = "hist")

FIGURE 13.3: The posterior distribution of the slope in the linear model, modeling the effect of centered mean PCU on mean reading speed (the unit is 50 characters per second).

Taking this measurement uncertainty into account is important; in many practical research problems, researchers will often take average measurements like these and examine the correlation between them. However, each of those data points is measured with some error (uncertainty), and this error is ignored when we take the averaged values. Ignoring this uncertainty leads to over-enthusiastic inferences. A measurement-error model solves this issue by taking the uncertainty into account.
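A related, well-known consequence concerns measurement error in the predictor: ignoring it biases the slope estimate toward zero (attenuation). The following small simulation sketch, with invented values, illustrates this point:

set.seed(42)
N <- 100
beta <- 0.5                   # true slope (invented)
x_true <- rnorm(N)
x_obs <- rnorm(N, x_true, 1)  # predictor observed with error
y_obs <- rnorm(N, 1 + beta * x_true, 0.5)
# The naive fit on the noisy predictor underestimates the slope;
# the expected attenuation factor here is 1 / (1 + 1) = 0.5:
coef(lm(y_obs ~ x_obs))["x_obs"]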

The measurement error model is stated as follows. There is assumed to be a true unobserved value \(y_{n,TRUE}\) for the dependent variable, and a true unobserved value \(x_{n,TRUE}\) for the predictor, where \(n\) is indexing the observation number. The observed values \(y_n\) and the predictor \(x_n\) are assumed to be generated with some error:

\[\begin{equation} \begin{aligned} y_n &\sim\mathit{Normal}(y_{n,TRUE},SE_y) \\ x_n &\sim\mathit{Normal}(x_{n,TRUE},SE_x) \end{aligned} \end{equation}\]

The regression is fit to the (unknown) true values of the dependent and independent variables:

\[\begin{equation} y_{n,TRUE} \sim\mathit{Normal}(\alpha + \beta x_{n,TRUE},\sigma) \tag{13.10} \end{equation}\]

In addition, there is an unknown standard deviation (standard error) of the latent means that generate the underlying PCU scores. That is, we assume that each of the true centered PCU scores is normally distributed with an underlying mean \(\chi\) and a standard deviation \(\tau\). This is very similar to the meta-analysis situation we saw earlier: \(\zeta_n \sim\mathit{Normal}(\zeta,\tau)\), where \(\zeta_n\) was the true latent mean of each study, \(\zeta\) was the (unknown) true value of the parameter, and \(\tau\) was the between-study variability.

\[\begin{equation} x_{n,TRUE} \sim\mathit{Normal}(\chi,\tau) \end{equation}\]

The goal of the modeling is to obtain posterior distributions for the intercept and slope \(\alpha\) and \(\beta\) (and the residual error standard deviation \(\sigma\)).

We need to decide on priors for all the parameters now. We use relatively vague priors, which can still be considered regularizing priors based on our knowledge of the order of magnitude of the measurements. In situations where not much is known about a research question, one could use such vague priors.

\[\begin{equation} \begin{aligned} \alpha &\sim\mathit{Normal}(0, 0.5)\\ \beta &\sim\mathit{Normal}(0, 0.5)\\ \chi &\sim\mathit{Normal}(0, 0.5)\\ \sigma &\sim\mathit{Normal}_+(0, 0.5)\\ \tau &\sim\mathit{Normal}_+(0, 0.5) \end{aligned} \tag{13.11} \end{equation}\]

13.2.1.1 The brms version of the measurement error model

In brms, the model specification would be as follows:

priors_me <- c(
  prior(normal(0, 0.5), class = Intercept),
  prior(normal(0, 0.5), class = b),
  prior(normal(0, 0.5), class = meanme),
  prior(normal(0, 0.5), class = sdme),
  prior(normal(0, 0.5), class = sigma)
)

Here the parameters with classes meanme and sdme refer to the unknown mean and standard deviation (standard error) of the latent means that generate the underlying PCU scores, that is, \(\chi\) and \(\tau\) in (13.11). Once we decide on the priors, we use resp_se(.) with sigma = TRUE (i.e., we don’t estimate \(y_{n,TRUE}\) explicitly), and we use me(c_mean_pcu, se_pcu) to indicate that the predictor c_mean_pcu is measured with error and that se_pcu is its SE.

fit_indiv_me <- brm(mean_rspeed | resp_se(se_rspeed, sigma = TRUE) ~
                      me(c_mean_pcu, se_pcu),
                    data = df_indiv,
                    family = gaussian(),
                    prior = priors_me) 
fit_indiv_me
## ...
## Population-Level Effects: 
##                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept              0.05      0.00     0.05     0.06 1.00     4173
## mec_mean_pcuse_pcu    -0.00      0.01    -0.01     0.01 1.00     5363
##                    Tail_ESS
## Intercept              2815
## mec_mean_pcuse_pcu     3255
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.01      0.00     0.01     0.01 1.00     6975     2874
## 
## ...
# Proportion of samples below zero
# Parameter names can be found out with `variables(fit_indiv_me)`
(Pb_me <- mean(as_draws_df(fit_indiv_me)$bsp_mec_mean_pcuse_pcu < 0))
## [1] 0.607

The posterior for the slope is plotted in Figure 13.4; this figure shows that the association between PCU scores and reading speed is much weaker once measurement error is taken into account: the posterior is much more uncertain (much more widely distributed) than in the simple linear model we fit above (compare Figure 13.4 with Figure 13.3), and the direction of the association is now unclear, with 61% of the probability mass below zero, rather than 95%.


FIGURE 13.4: The posterior distribution of the slope in the measurement error model, modeling the effect of centered mean PCU on mean reading speed.

Figure 13.5 visualizes the main reason why we find no clear association in the measurement error analysis: the two points at the top left of the plot that were driving the effect have very large SEs for the measurement of reading speed. The code below produces the figure by overlaying 250 regression lines, each corresponding to a different sample from the posterior distribution, on the measurements of reading speed and PCU.

df_reg <- as_draws_df(fit_indiv_me) %>%
  select(alpha = b_Intercept, beta = bsp_mec_mean_pcuse_pcu) %>%
  slice(1:250)
ggplot(df_indiv, aes(x = c_mean_pcu, y = mean_rspeed)) +
  geom_point() +
  geom_errorbarh(aes(xmin = c_mean_pcu - 2 * se_pcu,
                     xmax = c_mean_pcu + 2 * se_pcu),
                 alpha = .5, linetype = "dotted") +
  geom_errorbar(aes(ymin = mean_rspeed - 2 * se_rspeed,
                    ymax = mean_rspeed + 2 * se_rspeed),
                alpha = .5, linetype = "dotted") +
  geom_abline(aes(intercept = alpha, slope = beta),
              data = df_reg,
              alpha = .05)

FIGURE 13.5: The relationship between centered mean PCU scores and mean reading speed accounting for measurement error. The error bars represent two standard errors. The regression lines are produced with 250 samples of the intercept and slope from the posterior distribution.

Of course, the conclusion here cannot be that there is no association between PCU scores and reading speed. In order to claim an absence of an effect, we would need to use Bayes factors (see chapter 15) or cross-validation (see chapter 16).

13.2.1.2 The Stan version of the measurement error model

As with the meta-analysis, the main difficulty in fitting measurement-error models directly in Stan is that we need to reparameterize them to avoid strong correlations between the samples of different parameters. The two changes that we need to make to the parameterization of our model presented in equation (13.11) are the following.

  1. Sample from an auxiliary parameter \(z_n\) rather than directly from \(x_{n,TRUE}\), as we did in (13.8):

\[\begin{equation} \begin{aligned} z_n & \sim\mathit{Normal}(0, 1)\\ x_{n,TRUE} &= z_n \cdot \tau + \chi \\ x_n & \sim \mathit{Normal}(x_{n,TRUE}, SE_x) \end{aligned} \end{equation}\]

  2. Don’t model \(y_{n,TRUE}\) explicitly as in (13.10); rather, take into account the SE and the variation in \(y_{n,TRUE}\) in the following way:

\[\begin{equation} y_{n} \sim\mathit{Normal}\left(\alpha + \beta x_{n,TRUE},\sqrt{SE_y^2 + \sigma^2}\right) \end{equation}\]

We are now ready to write this in Stan; the code is in the model called me.stan:

data {
  int<lower=1> N;
  vector[N] x;
  vector[N] SE_x;
  vector[N] y;
  vector[N] SE_y;
}
parameters {
  real alpha;
  real beta;
  real chi;
  real<lower = 0> sigma;
  real<lower = 0> tau;
  vector[N] z;
}
transformed parameters {
  vector[N] x_true = z * tau + chi;
}
model {
  target += normal_lpdf(x | x_true, SE_x);
  target += normal_lpdf(y | alpha + beta * x_true,
                        sqrt(square(SE_y) + square(sigma)));
  target += std_normal_lpdf(z);
  target += normal_lpdf(alpha | 0, 0.5);
  target += normal_lpdf(beta | 0, 0.5);
  target += normal_lpdf(chi | 0, 0.5);
  target += normal_lpdf(sigma| 0, 0.5)
    - normal_lccdf(0 | 0, 0.5);
  target += normal_lpdf(tau | 0, 0.5)
    - normal_lccdf(0 | 0, 0.5);
}

Fit the model:

me <- system.file("stan_models",
  "me.stan",
  package = "bcogsci")  
ls_me <- list(N = nrow(df_indiv),
              y = df_indiv$mean_rspeed,
              SE_y = df_indiv$se_rspeed,
              x = df_indiv$c_mean_pcu,
              SE_x = df_indiv$se_pcu)
fit_indiv_me_stan <- stan(me, data = ls_me)
print(fit_indiv_me_stan, pars = c("alpha", "beta", "sigma")) 
##       mean  2.5% 97.5% n_eff Rhat
## alpha 0.05  0.05  0.06  3775    1
## beta  0.00 -0.01  0.01  5457    1
## sigma 0.01  0.01  0.01  6126    1

The posterior distributions are similar to those that we obtained with brms.
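As a quick check, the slope estimates of the two fits can be compared directly (the parameter names are as printed in the summaries above):

posterior_summary(fit_indiv_me, variable = "bsp_mec_mean_pcuse_pcu")
print(fit_indiv_me_stan, pars = "beta")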

13.3 Summary

This chapter introduced two statistical tools that are potentially of great relevance to cognitive science: random-effects meta-analysis and measurement-error models. Despite their inherent limitations, meta-analyses should be used routinely to accumulate knowledge through systematic evidence synthesis. Measurement-error models can also prevent over-enthusiastic conclusions that are often drawn from noisy data.

13.4 Further reading

For some examples of Bayesian meta-analyses in psycholinguistics, see Vasishth et al. (2013), Jäger, Engelmann, and Vasishth (2017), Nicenboim, Roettger, and Vasishth (2018), Nicenboim, Vasishth, and Rösler (2020), Bürki et al. (2020), Cox et al. (2022), and Bürki, Alario, and Vasishth (2023). A frequentist meta-analysis of priming effects in psycholinguistics appears in Mahowald et al. (2016). Sutton et al. (2012) and Higgins and Green (2008) are two useful general introductions that discuss systematic reviews, meta-analysis, and evidence synthesis; these two references are from medicine, where meta-analysis is more widely used than in cognitive science. A potentially important article for meta-analysis introduces a methodology for modeling bias, to adjust for different kinds of bias in the data (Turner et al. 2008). Meta-analyses have important limitations and should not be taken at face value; this point is discussed in, e.g., Spector and Thompson (1991).

13.5 Exercises

Exercise 13.1 A meta-analysis of picture-word interference data

Load the following data set:

data("df_buerki")
head(df_buerki)
##                  study   d    se study_id
## 1 Collina 2013 Exp.1 a  24 13.09        1
## 2 Collina 2013 Exp.1 b -25 17.00        2
## 3   Collina 2013 Exp.2  46 22.79        3
## 4     Mahon 2007 Exp.1  17 12.24        4
## 5     Mahon 2007 Exp.2  57 13.96        5
## 6    Mahon 2007 Exp. 4  17  8.01        6
df_buerki <- subset(df_buerki, se > 0.60)

The data are from Bürki et al. (2020). We have a summary of the effect estimates (d) and standard errors (se) of the estimates from 162 published experiments on a phenomenon called semantic picture-word interference. We removed an implausibly low SE in the code above, but the results don’t change regardless of whether we keep it or not, because we have data from a lot of studies.

In this experimental paradigm, subjects are asked to name a picture while ignoring a distractor word (which is either related or unrelated to the picture). The word can be printed on the picture itself, or presented auditorily. The dependent measure is the response latency, or the time interval between the presentation of the picture and the onset of the vocal response. Theory says that distractors from the same semantic category as the picture to be named lead to a slower response than distractors from a different semantic category.

Carry out a random-effects meta-analysis using brms and display the posterior distribution of the effect, along with the posterior of the between-study standard deviation.

Choose \(\mathit{Normal}(0,100)\) priors for the intercept and between-study sd parameters. You can also try vaguer priors (a sensitivity analysis). Examples would be:

  • \(\mathit{Normal}(0,200)\)
  • \(\mathit{Normal}(0,400)\)

Exercise 13.2 Measurement error model for English VOT data

Load the following data:

data("df_VOTenglish")
head(df_VOTenglish)
##   subject meanVOT seVOT meanvdur sevdur
## 1     F01   108.1  4.56      171   11.7
## 2     F02    92.5  4.62      189   12.7
## 3     F03    82.6  3.13      171   10.0
## 4     F04    88.3  3.21      168   11.8
## 5     F05    94.6  3.67      166   15.0
## 6     F06    75.9  3.70      176   12.9

You are given mean voice onset time (VOT) data (with SEs) in milliseconds for English, along with mean vowel durations (with SEs) in milliseconds. Fit a measurement-error model investigating the effect of mean vowel duration on mean VOT duration. First plot the relationship between the two variables; does it look like there is an association between the two?

Then use brms with measurement error included in both the dependent and independent variables. Do a sensitivity analysis to check the influence of the priors on the posteriors of the relevant parameters.

References

Bürki, Audrey, Francois-Xavier Alario, and Shravan Vasishth. 2023. “When Words Collide: Bayesian Meta-Analyses of Distractor and Target Properties in the Picture-Word Interference Paradigm.” Quarterly Journal of Experimental Psychology 76 (6): 1410–30. https://doi.org/10.1177/17470218221114644.

Bürki, Audrey, Shereen Elbuy, Sylvain Madec, and Shravan Vasishth. 2020. “What Did We Learn from Forty Years of Research on Semantic Interference? A Bayesian Meta-Analysis.” Journal of Memory and Language 114. https://doi.org/10.1016/j.jml.2020.104125.

Conway, Andrew R. A., Michael J. Kane, Michael F. Bunting, D. Zach Hambrick, Oliver Wilhelm, and Randall W. Engle. 2005. “Working Memory Span Tasks: A Methodological Review and User’s Guide.” Psychonomic Bulletin & Review 12 (5): 769–86.

Cox, Christopher Martin Mikkelsen, Tamar Keren-Portnoy, Andreas Roepstorff, and Riccardo Fusaroli. 2022. “A Bayesian Meta-Analysis of Infants’ Ability to Perceive Audio–Visual Congruence for Speech.” Infancy 27 (1): 67–96.

Denckla, Martha Bridge, and Rita G. Rudel. 1976. “Rapid ‘Automatized’ Naming (R.A.N.): Dyslexia Differentiated from Other Learning Disabilities.” Neuropsychologia 14 (4): 471–79. https://doi.org/10.1016/0028-3932(76)90075-0.

Higgins, Julian, and Sally Green. 2008. Cochrane Handbook for Systematic Reviews of Interventions. New York: Wiley-Blackwell.

Jäger, Lena A., Felix Engelmann, and Shravan Vasishth. 2017. “Similarity-Based Interference in Sentence Comprehension: Literature review and Bayesian meta-analysis.” Journal of Memory and Language 94: 316–39. https://doi.org/10.1016/j.jml.2017.01.004.

Mahowald, Kyle, Ariel James, Richard Futrell, and Edward Gibson. 2016. “A Meta-Analysis of Syntactic Priming in Language Production.” Journal of Memory and Language 91: 5–27. https://doi.org/10.1016/j.jml.2016.03.009.

McElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Boca Raton, Florida: Chapman; Hall/CRC.

Nicenboim, Bruno, Timo B. Roettger, and Shravan Vasishth. 2018. “Using Meta-Analysis for Evidence Synthesis: The case of incomplete neutralization in German.” Journal of Phonetics 70: 39–55. https://doi.org/10.1016/j.wocn.2018.06.001.

Nicenboim, Bruno, Shravan Vasishth, Felix Engelmann, and Katja Suckow. 2018. “Exploratory and Confirmatory Analyses in Sentence Processing: A case study of number interference in German.” Cognitive Science 42 (S4). https://doi.org/10.1111/cogs.12589.

Nicenboim, Bruno, Shravan Vasishth, and Frank Rösler. 2020. “Are Words Pre-Activated Probabilistically During Sentence Comprehension? Evidence from New Data and a Bayesian Random-Effects Meta-Analysis Using Publicly Available Data.” Neuropsychologia 142. https://doi.org/10.1016/j.neuropsychologia.2020.107427.

Spector, Tim D., and Simon G. Thompson. 1991. “The Potential and Limitations of Meta-Analysis.” Journal of Epidemiology and Community Health 45 (2): 89.

Sutton, Alexander J., Nicky J. Welton, Nicola Cooper, Keith R. Abrams, and A. E. Ades. 2012. Evidence Synthesis for Decision Making in Healthcare. Vol. 132. John Wiley & Sons.

Turner, R. M., David J. Spiegelhalter, G. Smith, and Simon G. Thompson. 2008. “Bias Modelling in Evidence Synthesis.” Journal of the Royal Statistical Society: Series A (Statistics in Society) 172 (1): 21–47.

Vasishth, Shravan, Zhong Chen, Qiang Li, and Gueilan Guo. 2013. “Processing Chinese Relative Clauses: Evidence for the Subject-Relative Advantage.” PLoS ONE 8 (10): 1–14.