# Chapter 13 Meta-analysis and measurement error models

In this chapter, we introduce two relatively underutilized models that are potentially very important for cognitive science: meta-analysis and measurement-error models.

Meta-analysis can be very informative when carrying out systematic reviews, and measurement-error models are able to take into account uncertainty in one’s dependent or independent variable (or both). What’s common to these two classes of model is that they both assume that the \(n\)-th measured data point \(y_n\) has an unknown true value of a parameter, say \(\zeta_n\) (pronounced *zeta en*), that is measured with with some uncertainty that can be represented by the standard error \(SE_n\) of the measurement \(y_n\):

\(y_n \sim \mathit{Normal}(\zeta_n,SE_n)\)

In both classes of model, the goal is to obtain a posterior distribution of a latent parameter \(\zeta\) which is assumed to generate the \(\zeta_n\), with some standard deviation \(\tau\). The parameter \(\tau\) quantifies the noise in the measurement process or the between-study variability in a meta-analysis.

\(\zeta_n \sim \mathit{Normal}(\zeta,\tau)\)

The main parameter of interest is usually \(\zeta\), but the posterior distributions of \(\tau\) and \(\zeta_n\) can also be informative. The above model specification should remind you of the hierarchical models we saw in earlier chapters.

## 13.1 Meta-analysis

Once a number of studies have accumulated on a particular topic, it can be very informative to synthesize the data. Here is a commonly used approach- a random-effects meta-analysis.

### 13.1.1 A meta-analysis of similarity-based interference in sentence comprehension

The model is set up as follows. For each study \(n\), let effect\(_n\) be the effect of interest, and let \(SE_n\) be the standard error of the effect. A concrete example of a recent meta-analysis is the effect of similarity-based interference in sentence comprehension (Jäger, Engelmann, and Vasishth 2017); when two nouns are more similar to each other, there is greater processing difficulty (i.e., longer reading times in milliseconds) when an attempt is made to retrieve one of the nouns to complete a linguistic dependency (such as a subject-verb dependency). The estimate of the effect and its standard error is the information we have from each study \(n\).

First, load the data, and add an id variable that identifies each experiment.

```
## # A tibble: 12 × 4
## publication effect SE study_id
## <chr> <int> <int> <int>
## 1 VanDyke07E1LoSem 13 30 1
## 2 VanDyke07E2LoSem 37 21 2
## 3 VanDyke07E3LoSem 20 11 3
## # … with 9 more rows
```

The effect size and standard errors were estimated from published summary statistics in the respective article. In some cases, this involved a certain amount of guesswork; the details are documente d in the online material accompanying Jäger, Engelmann, and Vasishth (2017).

We begin with the assumption that there is a true (unknown) effect \(\zeta_n\) that lies behind each of these studies. Each of the observed effects has an uncertainty associated with it, \(SE_n\). We can therefore assume that each observed effect, effect\(_n\), is generated as follows:

\[\begin{equation} \text{effect}_n \sim \mathit{Normal}(\zeta_n,SE_n) \end{equation}\]

Each study is assumed to have a different true effect \(\zeta_n\) because each study will have been carried out under different conditions: in a different lab with different protocols and workflows, with different subjects, with different languages, with slightly different experimental designs, etc.

Further, each of the true underlying effects \(\zeta_n\) has behind it some true unknown value \(\zeta\). The parameter \(\zeta\) represents the underlying effect of similarity-based interference across experiments. Our goal is to obtain the posterior distribution of this overall effect.

We can write the above statement as follows:

\[\begin{equation} \zeta_n \sim\mathit{Normal}(\zeta,\tau) \end{equation}\]

\(\tau\) is the between-study standard deviation; this expresses the assumption that there will be some variability between the true effects \(\zeta_n\).

To summarize the model:

- effect\(_n\) is the observed effect (in this example, in milliseconds) in the \(n\)-th study.
- \(\zeta_n\) is the true (unknown) effect in each study.
- \(\zeta\) is the true (unknown) effect of the experimental manipulation, namely, the similarity-based interference effect.
- Each \(SE_n\) is estimated from the standard error available from study \(n\).
- The parameter \(\tau\) represents between-study standard deviation.

We can construct a hierarchical model as follows:

\[\begin{equation} \begin{aligned} \text{effect}_n \sim & \mathit{Normal}(\zeta_n, SE_n) \quad n=1,\dots, N_{studies}\\ \zeta_n \sim & \mathit{Normal}(\zeta, \tau) \\ \zeta \sim & \mathit{Normal}(0,100)\\ \tau \sim & \mathit{Normal}_+(0,100) \end{aligned} \tag{13.1} \end{equation}\]

The priors are based on domain knowledge; it seems reasonable to allow the effect to range a priori from \(-200\) to \(+200\) ms with probability \(95\)%. Of course, a sensitivity analysis is necessary (but skipped here).

This model can be implemented in `brms`

in a relatively straightforward way as shown below. We show the Stan version later in the chapter (section 13.1.1.2); the Stan version presents some interesting challenges that can be useful for the reader interested in deepening their Stan modeling knowledge.

#### 13.1.1.1 `brms`

version of the meta-analysis model

First, define the priors:

Fit the model as follows. Because of our relatively uninformative priors and the few data points, the models of this chapter require us to tune the `control`

parameter, increasing `adapt_delta`

and `max_treedepth`

.

```
fit_sbi <- brm(effect | resp_se(`SE`, sigma = FALSE) ~ 1
+ (1 | study_id),
data = df_sbi,
prior = priors,
control = list(
adapt_delta = .99,
max_treedepth = 10))
```

The posterior of \(\zeta\) and \(\tau\) are summarized below as `Intercept`

and `sd(Intercept)`

.

```
## ...
## Group-Level Effects:
## ~study_id (Number of levels: 12)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## sd(Intercept) 11.82 7.59 0.81 29.17 1.00 962
## Tail_ESS
## sd(Intercept) 1543
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 13.37 6.19 2.91 27.78 1.00 1460 1633
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 0.00 0.00 0.00 0.00 NA NA NA
##
## ...
```

The `sigma`

parameter does not play any role in this model, but appears in the `brms`

output anyway. In the model specification, `sigma`

was explicitly removed by writing `sigma = FALSE`

. For this reason, we can ignore that parameter in the model summary output above. Box 13.1 explains what happens if we set `sigma = TRUE`

.

As theory predicts, the overall effect from these studies has a positive sign.

One advantage of such a meta-analysis is that the posterior can now be used as an informative prior for a future study. This is especially important when doing an analysis using Bayes factors. But this meta-analysis posterior could also be used as an informative prior in a future experiment; that would allow the researcher to build on what is known so far from published studies.

**Box 13.1 ****What happens if we set sigma = TRUE?**

If we set `sigma = TRUE`

, we won’t be able to get estimates for \(\zeta_{n}\), since they are handled implicitly. The model presented formally in equation (13.1) is equivalent to the following one in (13.2). A critical difference is that \(\zeta_n\) does not appear any more.

\[\begin{equation} \begin{aligned} \text{effect}_n \sim &\mathit{Normal}(\zeta, \sqrt{\tau^2 + SE_n^2} )\\ \zeta \sim &\mathit{Normal}(0,100)\\ \tau \sim &\mathit{Normal}_+(0,100) \end{aligned} \tag{13.2} \end{equation}\]

This works because of the following property of normally distributed random variables:

If \(X\) and \(Y\) are two independent random variables, and

\[\begin{equation} \begin{aligned} X &\sim \mathit{Normal}(\mu_X, \sigma_X)\\ Y &\sim \mathit{Normal}(\mu_Y, \sigma_Y) \end{aligned} \tag{13.3} \end{equation}\]

then, \(Z\), the sum of these two random variables is:

\[\begin{equation} Z = X + Y \tag{13.4} \end{equation}\]

The distribution of \(Z\) has the following form:

\[\begin{equation} Z \sim\mathit{Normal}\left(\mu_X + \mu_Y, \sqrt{\sigma_X^2 + \sigma_Y^2}\right) \tag{13.5} \end{equation}\]

In our case, let

\[\begin{equation} \begin{aligned} U_{n} &\sim \mathit{Normal}(0 , SE_n)\\ \zeta_n &\sim \mathit{Normal}(\zeta, \tau) \end{aligned} \tag{13.6} \end{equation}\]

Analogous to equations (13.4) and (13.5), effect\(_n\) can be expressed as a sum of two independent random variables:

\[\begin{equation} \text{effect}_n = U_n + \zeta_n \end{equation}\]

The distribution of effect\(_n\) will be

\[\begin{equation} \text{effect}_n \sim\mathit{Normal}\left(\zeta, \sqrt{SE^2 + \tau^2}\right) \tag{13.7} \end{equation}\]

We can fit this in `brms`

as follows. In this model specification, one should not include the + `(1 | study_id)`

, and the prior for \(\tau\) should now be specified for `sigma`

.

```
priors2 <- c(prior(normal(0, 100), class = Intercept),
prior(normal(0, 100), class = sigma))
fit_sbi_sigma <- brm(effect | resp_se(`SE`, sigma = TRUE) ~ 1,
data = df_sbi,
prior = priors2,
control = list(
adapt_delta = .99,
max_treedepth = 10))
```

There are slight differences with `fit_sbi`

due to the different parameterization and the sampling process, but the results are very similar:

```
## Estimate Est.Error Q2.5 Q97.5
## b_Intercept 13.4 6.07 3.028 27.6
## sigma 11.6 7.63 0.626 29.5
```

If we are not interested in the underlying effects in each study, this parameterization of the meta-analysis can be faster and more robust (i.e., it has less potential convergence issues). A major drawback is that we can no longer display a forest plot as we do in Figure 13.1.

Another interesting by-product of a random-effects meta-analysis is the possibility of displaying a forest plot (Figure 13.1). A forest plot shows the meta-analytic estimate (the parameter `b_Intercept`

in `brms`

) alongside the original estimates effect\(_n\) (and their SE\(_n\)) and the posterior distributions of the \(\zeta_n\) for each study (we reconstruct these estimates by adding `b_Intercept`

to the parameters starting with `r_`

in `brms`

). The original estimates are the ones fed to the model as data and the posterior distributions of the \(\zeta_n\) are calculated, as in previous hierarchical models, after the information from all studies is pooled together. The \(\zeta_n\) estimates are shrunken estimates of each study’s (unknown) true effect, shrunken towards the grand mean \(\zeta\), and weighted by the standard error observed in each study \(n\). The \(\zeta_n\) for a particular study is shrunk more towards the grand mean \(\zeta\) when the study’s standard error is large (i.e., when the estimate is very imprecise). The code below shows how to build a forest plot step by step.

First, change the format of the data so that it looks like the output of `brms`

:

```
df_sbi <- df_sbi %>%
mutate(Q2.5 = effect - 2 * SE,
Q97.5 = effect + 2 * SE,
Estimate = effect,
type = "original")
```

Extract the meta-analytical estimate:

```
df_Intercept <- posterior_summary(fit_sbi,
variable = c("b_Intercept")
) %>%
as.data.frame() %>%
mutate(publication = "M.A. estimate", type = "")
```

For the pooled estimated effect (or fitted value) of the individual studies, we need the sum of the meta-analytical estimate (intercept) and each of the by-study adjustment. Obtain this with the `fitted()`

function:

```
df_model <- fitted(fit_sbi) %>%
# Convert matrix to data frame:
as.data.frame() %>%
# Add a column to identify the estimates,
# and another column to identify the publication:
mutate(type = "adjusted",
publication = df_sbi$publication)
```

Bind the observed effects, the meta-analytical estimate, and the fitted values of the studies together, and plot the data:

```
# the adjusted estimates and the meta-analysis estimate:
bind_rows(df_sbi, df_model, df_Intercept) %>%
# Plot:
ggplot(aes(x = Estimate,
y = publication,
xmin = Q2.5,
xmax = Q97.5,
color = type)) +
geom_point(position = position_dodge(.5)) +
geom_errorbarh(position = position_dodge(.5)) +
# Add the meta-analytic estimate and Credible Interval:
geom_vline(xintercept = df_Intercept$Q2.5,
linetype = "dashed",
alpha = .3) +
geom_vline(xintercept = df_Intercept$Q97.5,
linetype = "dashed",
alpha = .3) +
geom_vline(xintercept = df_Intercept$Estimate,
linetype = "dashed",
alpha = .5) +
scale_color_discrete(breaks = c("adjusted", "original"))
```

It is important to keep in mind that a meta-analysis is always going to yield biased estimates as long as we have publication bias: if a field has a tendency to allow only “big news” studies to be published, then the literature that will appear in the public domain will be biased, and any meta-analysis based on such information will be biased. Despite this limitation, a meta-analysis is still a useful way to synthesize the known evidence; one just has to remember that the estimate from the meta-analysis is almost certain to be biased.

#### 13.1.1.2 Stan version of the meta-analysis model

Even though `brms`

can handle meta-analyses, fitting them in Stan allows us for more flexibility, which might be necessary in some cases. As a first attempt we could build a model that closely follows the formal specification given in equation (13.1).

```
data {
int<lower=1> N;
vector[N] effect;
vector[N] SE;
vector[N] study_id;
}
parameters {
real zeta;
real<lower = 0> tau;
vector[N] zeta_n;
}
model {
target += normal_lpdf(effect| zeta_n, SE);
target += normal_lpdf(zeta_n | zeta, tau);
target += normal_lpdf(zeta | 0, 100);
target += normal_lpdf(tau | 0, 100)
- normal_lccdf(0 | 0, 100);
}
```

Fit the model as follows:

```
ma0 <- system.file("stan_models",
"meta-analysis0.stan",
package = "bcogsci"
)
ls_sbi <- list(N = nrow(df_sbi),
effect = df_sbi$effect,
SE = df_sbi$SE,
study_id = df_sbi$study_id)
fit_sbi0 <- stan(ma0,
data = ls_sbi,
control = list(
adapt_delta = .999,
max_treedepth = 12))
```

```
## Warning: There were 1 divergent transitions after
## warmup. See
## https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## to find out why this is a problem and how to eliminate
## them.
```

```
## Warning: There were 1 chains where the estimated
## Bayesian Fraction of Missing Information was low. See
## https://mc-stan.org/misc/warnings.html#bfmi-low
```

```
## Warning: Examine the pairs() plot to diagnose sampling
## problems
```

```
## Warning: Bulk Effective Samples Size (ESS) is too low,
## indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## https://mc-stan.org/misc/warnings.html#bulk-ess
```

```
## Warning: Tail Effective Samples Size (ESS) is too low,
## indicating posterior variances and tail quantiles may be
## unreliable. Running the chains for more iterations may
## help. See
## https://mc-stan.org/misc/warnings.html#tail-ess
```

We see that there are warnings. As discussed in section 11.1.2, we can use pairs plots to uncover pathologies in the sampling. Here we see the samples of `zeta`

and `tau`

are highly correlated:

We face a similar problem as we faced in section 11.1.2, namely, the sampler cannot properly explore the neck of the funnel-shaped space because, because of the strong correlation between the parameters. The solution is, as in section 11.1.2, a non-centered parameterization. Re-write (13.1) as follows:

\[\begin{equation} \begin{aligned} z_n & \sim \mathit{Normal}(0, 1)\\ \zeta_n &= z_n \cdot \tau + \zeta \\ \text{effect}_n & \sim \mathit{Normal}(\zeta_n, SE_n)\\ \zeta &\sim \mathit{Normal}(0,100)\\ \tau &\sim \mathit{Normal}_+(0,100) \end{aligned} \tag{13.8} \end{equation}\]

This works because if \(X \sim\mathit{Normal}(a, b)\) and \(Y \sim \mathit{Normal}(0, 1)\), then \(X = a + Y \cdot b\). You can re-visit chapter 11.1.2 for more details.

Translate equation (13.8) into Stan code as follows in `meta-analysis1.stan`

:

```
data {
int<lower=1> N;
vector[N] effect;
vector[N] SE;
vector[N] study_id;
}
parameters {
real zeta;
real<lower = 0> tau;
vector[N] z;
}
transformed parameters {
vector[N] zeta_n = z * tau + zeta;
}
model {
target += normal_lpdf(effect| zeta_n, SE);
target += std_normal_lpdf(z);
target += normal_lpdf(zeta | 0, 100);
target += normal_lpdf(tau | 0, 100)
- normal_lccdf(0 | 0, 100);
}
```

The model converges with values virtually identical to the ones of the `brms`

model.

```
ma1 <- system.file("stan_models",
"meta-analysis1.stan",
package = "bcogsci")
fit_sbi1 <- stan(ma1,
data = ls_sbi,
control = list(adapt_delta = .999,
max_treedepth = 12))
```

```
## mean 2.5% 97.5% n_eff Rhat
## zeta 13.3 2.76 27.9 1132 1
## tau 11.5 0.65 29.5 829 1
```

We can also reparameterize the model slightly differently, if we set \(U_{n} \sim\mathit{Normal}(0 , SE_n)\) then,

\[\begin{equation} \text{effect}_n = U_n + \zeta_n \end{equation}\]

Then, given that \(\zeta_n \sim \mathit{Normal}(\zeta, \tau)\),

\[\begin{equation} \text{effect}_n \sim \mathit{Normal}(\zeta, \sqrt{SE^2 + \tau^2}) \tag{13.9} \end{equation}\]

See Box 13.1 if it’s not clear why this reparameterization works.

This is equivalent to the `brms`

model where `sigma = TRUE`

. As with `brms`

, we lose the possibility of estimating the posterior of the true effect of the individual studies.

Write this in Stan as follows; this code is available in the file `meta-analysis2.stan`

within the `bcogsci`

package:

```
data {
int<lower=1> N;
vector[N] effect;
vector[N] SE;
vector[N] study_id;
}
parameters {
real zeta;
real<lower = 0> tau;
}
model {
target += normal_lpdf(effect| zeta, sqrt(square(SE) + square(tau)));
target += normal_lpdf(zeta | 0, 100);
target += normal_lpdf(tau | 0, 100)
- normal_lccdf(0 | 0, 100);
}
```

Fit the model:

```
ma2 <- system.file("stan_models",
"meta-analysis2.stan",
package = "bcogsci"
)
fit_sbi2 <- stan(ma2,
data = ls_sbi,
control = list(adapt_delta = .9))
```

```
## mean 2.5% 97.5% n_eff Rhat
## zeta 13.1 1.82 27.5 844 1
## tau 11.8 0.58 30.1 1098 1
```

This summary could be reported in an article by displaying the posterior means and 95% credible intervals of the parameters.

## 13.2 Measurement-error models

Measurement error models deal with the situation where some predictor or the dependent variable, or both, are observed with measurement error. This measurement error could arise because a variable is an average (i.e., its standard error can also be estimated), or because we know that our measurement is noisy due to limitations of our equipment (e.g., delays in the signal from the keyboard to the motherboard, impedance in the electrodes in an EEG system, etc.).

### 13.2.1 Accounting for measurement error in individual differences in working memory capacity and reading fluency

As a motivating example, consider the following data from Nicenboim, Vasishth, et al. (2018). For each subject, we have the partial-credit unit (PCU) scores of an operation span task as a measure of their working memory capacity (Conway et al. 2005) along with their standard error. In addition, the reading fluency of each subject is calculated from a separate set of data based on the mean reading speeds (character/second) in a rapid automatized naming task (RAN, Denckla and Rudel 1976); the standard error of the reading speed is also available.

Of interest here is the extent of the association between working memory capacity (measured as PCU) and reading fluency (measured as reading speed in 50 characters per second). We avoid making any causal claims: It could be that our measure of working memory capacity really affects reading fluency or it could be the other way around, a third possibility is that there is a third variable (or several) that affects both reading fluency and working memory capacity. A treatment of causality in Bayesian models can be found in chapters 5 and 6 of McElreath (2020).

```
## # A tibble: 100 × 5
## subj mean_rspeed se_rspeed mean_pcu se_pcu
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 0.0521 0.00113 0.738 0.0648
## 2 2 0.0479 0.00121 0.292 0.0315
## 3 3 0.0601 0.00117 0.408 0.0900
## # … with 97 more rows
```

At first glance, we see a relationship between mean PCU scores and mean reading speed; see Figure 13.2. However, this relationship seems to be driven by two extremes data points on the top left corner of the plot.

```
df_indiv <- df_indiv %>%
mutate(c_mean_pcu = mean_pcu - mean(mean_pcu))
ggplot(df_indiv, aes(x = c_mean_pcu, y = mean_rspeed)) +
geom_point() +
geom_smooth(method = "lm")
```

A simple linear model shows a somewhat weak association between mean reading speed and centered mean PCU. The priors are relatively arbitrary but they are in the right order of magnitude given that reading speeds are quite short and well below 1.

```
df_indiv <- df_indiv %>%
mutate(c_mean_pcu = mean_pcu - mean(mean_pcu))
priors <- c(
prior(normal(0, 0.5), class = Intercept),
prior(normal(0, 0.5), class = b),
prior(normal(0, 0.5), class = sigma)
)
fit_indiv <- brm(mean_rspeed ~ c_mean_pcu,
data = df_indiv,
family = gaussian(),
prior = priors
)
```

```
## ...
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept 0.06 0.00 0.05 0.06 1.00 5429
## c_mean_pcu -0.01 0.01 -0.03 0.00 1.00 2376
## Tail_ESS
## Intercept 3209
## c_mean_pcu 2288
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 0.01 0.00 0.01 0.01 1.00 2177 2084
##
## ...
```

`## [1] 0.942`

Figure 13.3 shows the posterior distribution of the slope in this model. Most of the probability mass is negative (94.2%), suggesting that a better PCU score is associated with slower reading speed rather than faster; that is, that a larger working memory capacity is associated with less reading fluency. This is not a very intuitive result and it could be the case that is driven by the two extreme data points. Rather than removing these data points, we’ll examine what happens when the uncertainty of the measurements is taken into account.

Taking this uncertainty of the measurement is important;

in many practical research problems, researchers will often take average measurements like these and examine the correlation between them. However, each of those data points is being measured with some error (uncertainty), but this error is being ignored when we take the averaged values. Ignoring this uncertainty leads to over-enthusiastic inferences. A measurement-error model solves this issue by taking uncertainty into account.

The measurement error model is stated as follows. There is assumed to be a true unobserved value \(y_{n,TRUE}\) for the dependent variable, and a true unobserved value \(x_{n,TRUE}\) for the predictor, where \(n\) is indexing the observation number. The observed values \(y_n\) and the predictor \(x_n\) are assumed to be generated with some error:

\[\begin{equation} \begin{aligned} y_n &\sim\mathit{Normal}(y_{n,TRUE},SE_y) \\ x_n &\sim\mathit{Normal}(x_{n,TRUE},SE_x) \end{aligned} \end{equation}\]

The regression is fit to the (unknown) *true* values of the dependent and independent variables:

\[\begin{equation} y_{n,TRUE} \sim\mathit{Normal}(\alpha + \beta x_{n,TRUE},\sigma) \tag{13.10} \end{equation}\]

In addition, there is also an unknown standard deviation (standard error) of the latent unknown means that generate the underlying PCU means. I.e., we assume that each of the observed centered PCU scores are normally distributed with an underlying mean, \(\chi\), and a standard deviation \(\tau\). This is very similar to the meta-analysis situation we saw earlier: \(\zeta_n \sim\mathit{Normal}(\zeta,\tau)\), where \(\zeta_n\) was the true latent mean of each study, and \(\zeta\) was the (unknown) true value of the parameter, and \(\tau\) was the between-study variability.

\[\begin{equation} x_{n,TRUE} \sim\mathit{Normal}(\chi,\tau) \end{equation}\]

The goal of the modeling is to obtain posterior distributions for the intercept and slope \(\alpha\) and \(\beta\) (and the residual error standard deviation \(\sigma\)).

We need to decide on priors for all the parameters now. We use relatively vague priors, which can still be considered regularizing priors based on our knowledge of the order of magnitude of the measurements. In situations where not much is known about a research question, one could use such vague priors.

\[\begin{equation} \begin{aligned} \alpha &\sim\mathit{Normal}(0, 0.5)\\ \beta &\sim\mathit{Normal}(0, 0.5)\\ \chi &\sim\mathit{Normal}(0, 0.5)\\ \sigma &\sim\mathit{Normal}_+(0, 0.5)\\ \tau &\sim\mathit{Normal}_+(0, 0.5) \end{aligned} \tag{13.11} \end{equation}\]

#### 13.2.1.1 The `brms`

version of the measurement error model

In `brms`

, the model specification would be as follows:

```
priors_me <- c(
prior(normal(0, 0.5), class = Intercept),
prior(normal(0, 0.5), class = b),
prior(normal(0, 0.5), class = meanme),
prior(normal(0, 0.5), class = sdme),
prior(normal(0, 0.5), class = sigma)
)
```

Here the parameter with class `meanme`

and `sdme`

refer to the unknown mean and standard deviation (standard error) of the latent unknown means that generate the underlying PCU means, \(\chi\) and \(\tau\) in (13.11). Once we decide on the priors, we use `resp_se(.)`

with `sigma = TRUE`

(i.e, we don’t estimate \(y_{n,TRUE}\) explicitly) and we use `me(c_meanpcu, se_pcu)`

to indicate that the dependent variable `c_mean_pcu`

is measured with error and `se_pcu`

is its SE.

```
fit_indiv_me <- brm(mean_rspeed | resp_se(se_rspeed, sigma = TRUE) ~
me(c_mean_pcu, se_pcu),
data = df_indiv,
family = gaussian(),
prior = priors_me)
```

```
## ...
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept 0.05 0.00 0.05 0.06 1.00 4089
## mec_mean_pcuse_pcu -0.00 0.01 -0.01 0.01 1.00 6359
## Tail_ESS
## Intercept 2684
## mec_mean_pcuse_pcu 3315
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 0.01 0.00 0.01 0.01 1.00 6813 3025
##
## ...
```

```
# Proportion of samples below zero
# Parameter names can be found out with `variables(fit_indiv_me)`
(Pb_me <- mean(as_draws_df(fit_indiv_me)$bsp_mec_mean_pcuse_pcu < 0))
```

`## [1] 0.622`

The posterior for the slope is plotted in Figure 13.4; this figure shows that the association between PCU scores and reading speed is much weaker once measurement error is taken into account: The posterior is much more uncertain (much more widely distributed) than in the simple linear model we fit above (compare Figures 13.4 with 13.3), and the direction of the association is now unclear, with 62 % of the probability mass below zero, rather than 94%.

Figure 13.5 visualizes the main reason why we have no clear association in the measurement error analysis, the two points at the top left part of the plot that were driving the effect have very large SE for the measurement of reading speed. The code to produce it appears below and overlays several (250) regression lines that correspond to different samples of the posterior distribution with the measurements of reading speed and PCU.

```
df_reg <- as_draws_df(fit_indiv_me) %>%
select(alpha = b_Intercept, beta = bsp_mec_mean_pcuse_pcu) %>%
slice(1:250)
ggplot(df_indiv, aes(x = c_mean_pcu, y = mean_rspeed)) +
geom_point() +
geom_errorbarh(aes(xmin = c_mean_pcu - 2 * se_pcu,
xmax = c_mean_pcu + 2 * se_pcu),
alpha = .5, linetype = "dotted") +
geom_errorbar(aes(ymin = mean_rspeed - 2 * se_rspeed,
ymax = mean_rspeed + 2 * se_rspeed),
alpha = .5, linetype = "dotted") +
geom_abline(aes(intercept = alpha, slope = beta),
data = df_reg,
alpha = .05)
```

Of course, the conclusion here cannot be that there is no association between PCU scores and reading speed. In order to claim for an absence of an effect we would need to use Bayes factor (see chapter 15) or cross validation (see chapter 16).

#### 13.2.1.2 The Stan version of the measurement error model

As it happened when we modeled the meta-analysis, the main difficulty for modeling measurement error models directly in Stan is that we need to reparameterize the models to avoid correlations between samples of different parameters. The two changes that we need to do to the parameterization of our model presented in equation (13.11) are the following.

- Sample from an auxiliary parameter \(z_n\) rather than directly from \(x_{n,TRUE}\), as we did in (13.8):

\[\begin{equation} \begin{aligned} z_n & \sim\mathit{Normal}(0, 1)\\ x_{n,TRUE} &= z_n \cdot \tau + \chi \\ x_n & \sim \mathit{Normal}(x_{n,TRUE}, SE_x) \end{aligned} \end{equation}\]

- Don’t model \(y_{n,TRUE}\) explicitly as in (13.10); rather take into account the SE and the variation on \(y_{n,TRUE}\) in the following way:

\[\begin{equation} y_{n} \sim\mathit{Normal}\left(\alpha + \beta x_{n,TRUE},\sqrt{SE_y^2 + \sigma^2}\right) \end{equation}\]

We are now ready to write this in Stan; the code is in the model called `me.stan`

:

```
data {
int<lower=1> N;
vector[N] x;
vector[N] SE_x;
vector[N] y;
vector[N] SE_y;
}
parameters {
real alpha;
real beta;
real chi;
real<lower = 0> sigma;
real<lower = 0> tau;
vector[N] z;
}
transformed parameters {
vector[N] x_true = z * tau + chi;
}
model {
target += normal_lpdf(x | x_true, SE_x);
target += normal_lpdf(y | alpha + beta * x_true,
sqrt(square(SE_y) + square(sigma)));
target += std_normal_lpdf(z);
target += normal_lpdf(alpha | 0, 0.5);
target += normal_lpdf(beta | 0, 0.5);
target += normal_lpdf(chi | 0, 0.5);
target += normal_lpdf(sigma| 0, 0.5)
- normal_lccdf(0 | 0, 0.5);
target += normal_lpdf(tau | 0, 0.5)
- normal_lccdf(0 | 0, 0.5);
}
```

Fit the model:

```
me <- system.file("stan_models",
"me.stan",
package = "bcogsci")
ls_me <- list(N = nrow(df_indiv),
y = df_indiv$mean_rspeed,
SE_y = df_indiv$se_rspeed,
x = df_indiv$c_mean_pcu,
SE_x = df_indiv$se_pcu)
fit_indiv_me_stan <- stan(me, data = ls_me)
```

```
## mean 2.5% 97.5% n_eff Rhat
## alpha 0.05 0.05 0.06 3719 1
## beta 0.00 -0.01 0.01 6488 1
## sigma 0.01 0.01 0.01 6378 1
```

The posterior distributions are similar to those that we obtained with `brms`

.

## 13.3 Summary

This chapter introduced two statistical tools that are potentially of great relevance to cognitive science: random-effects meta-analysis and measurement error models. Despite the inherent limitations of meta-analysis, these should be used routinely to accumulate knowledge through systematic evidence synthesis. Measurement errors can also prevent over-enthusiastic conclusions that are often made based on noisy data.

## 13.4 Further reading

For some examples of Bayesian meta-analyses in psycholinguistics, see Vasishth et al. (2013), Jäger, Engelmann, and Vasishth (2017), Nicenboim, Roettger, and Vasishth (2018), Nicenboim, Vasishth, and Rösler (2020a), Bürki et al. (2020), Cox et al. (2022), and Bürki, Alario, and Vasishth (2022). A frequentist meta-analysis of priming effects in psycholinguistics appears in Mahowald et al. (2016). Sutton et al. (2012) and Higgins and Green (2008) are two useful general introductions that discuss systematic reviews, meta-analysis, and evidence synthesis; these two references are from medicine, where meta-analysis is more widely used than in cognitive science. A potentially important article for meta-analysis introduces a methodology for modeling bias, to adjust for different kinds of bias in the data (Turner et al. 2008).

## 13.5 Exercises

**Exercise 13.1 **A meta-analysis data of picture-word interference data

Load the following data set:

```
## study d se study_id
## 1 Collina 2013 Exp.1 a 24 13.09 1
## 2 Collina 2013 Exp.1 b -25 17.00 2
## 3 Collina 2013 Exp.2 46 22.79 3
## 4 Mahon 2007 Exp.1 17 12.24 4
## 5 Mahon 2007 Exp.2 57 13.96 5
## 6 Mahon 2007 Exp. 4 17 8.01 6
```

The data are from Bürki et al. (2020). We have a summary of the effect estimates (d) and standard errors (se) of the estimates from 162 published experiments on a phenomenon called *semantic picture-word interference*. We removed an implausibly low SE in the code above, but the results don’t change regardless of whether we keep them or not, because we have data from a lot of studies.

In this experimental paradigm, subjects are asked to name a picture while ignoring a distractor word (which is either related or unrelated to the picture). The word can be printed on the picture itself, or presented auditorily. The dependent measure is the response latency, or time interval between the presentation of the picture and the onset of the vocal response. Theory says that distractors that come from the same semantic category as the picture to be named lead to a slower response then when the distractor comes from a different semantic category.

Carry out a random effects meta-analysis using `brms`

and display the posterior distribution of the effect, along with the posterior of the between study standard deviation.

Choose \(\mathit{Normal}(0,100)\) priors for the intercept and between study sd parameters. You can also try vague priors (sensitivity analysis). Examples would be:

- \(\mathit{Normal}(0,200)\)
- \(\mathit{Normal}(0,400)\)

**Exercise 13.2 **Measurement error model for English VOT data

Load the following data:

```
## subject meanVOT seVOT meanvdur sevdur
## 1 F01 108.1 4.56 171 11.7
## 2 F02 92.5 4.62 189 12.7
## 3 F03 82.6 3.13 171 10.0
## 4 F04 88.3 3.21 168 11.8
## 5 F05 94.6 3.67 166 15.0
## 6 F06 75.9 3.70 176 12.9
```

You are given mean voice onset time (VOT) data (with SEs) in milliseconds for English, along with mean vowel durations (with SEs) in milliseconds. Fit a measurement-error model investigating the effect of mean vowel duration on mean VOT duration. First plot the relationship between the two variables; does it look like there is an association between the two?

Then use `brms`

with measurement error included in both the dependent and independent variables. Do a sensitivity analysis to check the influence of the priors on the posteriors of the relevant parameters.

### References

Bürki, Audrey, Francois-Xavier Alario, and Shravan Vasishth. 2022. “When Words Collide: Bayesian Meta-Analyses of Distractor and Target Properties in the Picture-Word Interference Paradigm.” *Quarterly Journal of Experimental Psychology*.

Bürki, Audrey, Shereen Elbuy, Sylvain Madec, and Shravan Vasishth. 2020. “What Did We Learn from Forty Years of Research on Semantic Interference? A Bayesian Meta-Analysis.” *Journal of Memory and Language*. https://doi.org/10.1016/j.jml.2020.104125.

Conway, Andrew RA, Michael J Kane, Michael F Bunting, D Zach Hambrick, Oliver Wilhelm, and Randall W Engle. 2005. “Working Memory Span Tasks: A Methodological Review and User’s Guide.” *Psychonomic Bulletin & Review* 12 (5). Springer: 769–86.

Cox, Christopher Martin Mikkelsen, Tamar Keren-Portnoy, Andreas Roepstorff, and Riccardo Fusaroli. 2022. “A Bayesian Meta-Analysis of Infants’ Ability to Perceive Audio–Visual Congruence for Speech.” *Infancy* 27 (1). Wiley Online Library: 67–96.

Denckla, Martha Bridge, and Rita G Rudel. 1976. “Rapid ‘Automatized’naming (Ran): Dyslexia Differentiated from Other Learning Disabilities.” *Neuropsychologia* 14 (4). Elsevier: 471–79.

Higgins, Julian, and Sally Green. 2008. *Cochrane Handbook for Systematics Reviews of Interventions*. New York: Wiley-Blackwell.

Jäger, Lena A., Felix Engelmann, and Shravan Vasishth. 2017. “Similarity-Based Interference in Sentence Comprehension: Literature review and Bayesian meta-analysis.” *Journal of Memory and Language* 94: 316–39. https://doi.org/https://doi.org/10.1016/j.jml.2017.01.004.

Mahowald, Kyle, Ariel James, Richard Futrell, and Edward Gibson. 2016. “A Meta-Analysis of Syntactic Priming in Language Production.” *Journal of Memory and Language* 91. Elsevier: 5–27.

McElreath, Richard. 2020. *Statistical Rethinking: A Bayesian Course with Examples in R and Stan*. Boca Raton, Florida: Chapman; Hall/CRC.

Nicenboim, Bruno, Timo B. Roettger, and Shravan Vasishth. 2018. “Using Meta-Analysis for Evidence Synthesis: The case of incomplete neutralization in German.” *Journal of Phonetics* 70: 39–55. https://doi.org/https://doi.org/10.1016/j.wocn.2018.06.001.

Nicenboim, Bruno, Shravan Vasishth, Felix Engelmann, and Katja Suckow. 2018. “Exploratory and Confirmatory Analyses in Sentence Processing: A case study of number interference in German.” *Cognitive Science* 42 (S4). https://doi.org/10.1111/cogs.12589.

Nicenboim, Bruno, Shravan Vasishth, and Frank Rösler. 2020a. “Are Words Pre-Activated Probabilistically During Sentence Comprehension? Evidence from New Data and a Bayesian Random-Effects Meta-Analysis Using Publicly Available Data.” *Neuropsychologia* 142. https://doi.org/10.1016/j.neuropsychologia.2020.107427.

Sutton, Alexander J, Nicky J Welton, Nicola Cooper, Keith R Abrams, and AE Ades. 2012. *Evidence Synthesis for Decision Making in Healthcare*. Vol. 132. John Wiley & Sons.

Turner, R.M., D.J. Spiegelhalter, G. Smith, and S.G. Thompson. 2008. “Bias Modelling in Evidence Synthesis.” *Journal of the Royal Statistical Society: Series A (Statistics in Society)* 172 (1). Wiley Online Library: 21–47.

Vasishth, Shravan, Zhong Chen, Qiang Li, and Gueilan Guo. 2013. “Processing Chinese Relative Clauses: Evidence for the Subject-Relative Advantage.” *PLoS ONE* 8 (10). Public Library of Science: 1–14.