Introduction to Bayesian Data Analysis for Cognitive Science
Nicenboim, Schad, Vasishth
Be sure to read the textbook’s chapters 1-3 in addition to watching this lecture.
Suppose \(y\) is a vector of continuous responses; assume for now that the \(y\) are independent and identically distributed:
\(y \stackrel{iid}{\sim} Normal(\mu,\sigma)\)
This is the simple linear model:
\(y = \mu + \varepsilon \hbox{ where } \varepsilon \stackrel{iid}{\sim} Normal(0,\sigma)\)
There are two parameters, \(\mu,\sigma\), so we need priors on these. We expand on this simple model next.
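Before expanding the model, it helps to make the generative process concrete. Here is a minimal R sketch that simulates data from this model; the values for \(\mu\) and \(\sigma\) are hypothetical, chosen only for illustration:
## Hypothetical parameter values (for illustration only):
mu <- 200    # mean response time in ms
sigma <- 50  # standard deviation in ms
N <- 1000    # number of simulated observations
## Simulate N iid responses from Normal(mu, sigma):
y <- rnorm(N, mean = mu, sd = sigma)
c(mean(y), sd(y))  # should be close to mu and sigma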
We will illustrate the way we conduct data analysis by working through some specific examples.
More realistically, we might have run the small experiment to find out whether the participant tended to speed up (a practice effect) or slow down (a fatigue effect) while repeatedly pressing the space bar.
We need to add the trial id of each observation to our data.
It’s a good idea to center the trial id (a covariate) to have a clearer interpretation of the intercept.
Centering predictors is generally a good idea, both for interpretability and for computational reasons.
See the contrast coding chapters in the book for details on this point.
## Center the trial id:
df_spacebar <- df_spacebar %>%
  mutate(c_trial = trial - mean(trial))
Our model changes because we now have a new parameter, the slope.
\[\begin{equation} t_n \sim \mathit{Normal}(\alpha + c\_trial_n \cdot \beta,\sigma) \end{equation}\]
where \(n =1 \ldots N\).
We could use the following priors.
\[\begin{equation} \begin{aligned} \alpha &\sim Normal(0, 2000) \\ \beta &\sim Normal(0, 500) \\ \sigma &\sim Normal(0, 500) \text{ truncated so that } \sigma > 0 \\ \end{aligned} \end{equation}\]
We are fitting a linear model: \(\alpha\) represents the intercept (here, the grand mean of the button-tapping times, because the covariate is centered), and \(\beta\) represents the slope.
What information are the priors encoding?
Do the priors make sense?
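One way to answer these questions is to simulate data using only the priors (a prior predictive check). Below is a minimal base-R sketch; the span of 100 trials used at the end is a hypothetical choice made only for illustration:
set.seed(123)
n_sims <- 2000
## Draw parameter values from the priors:
alpha_prior <- rnorm(n_sims, 0, 2000)
beta_prior  <- rnorm(n_sims, 0, 500)
sigma_prior <- abs(rnorm(n_sims, 0, 500))  # Normal(0, 500) truncated at 0
## Prior predictive tapping times at the mean trial (c_trial = 0):
t_pred <- rnorm(n_sims, mean = alpha_prior, sd = sigma_prior)
quantile(t_pred, probs = c(0.025, 0.5, 0.975))
## Implied change in tapping time over 100 trials (hypothetical span):
quantile(100 * beta_prior, probs = c(0.025, 0.5, 0.975))
These priors allow wildly implausible data, including large negative tapping times, so they mainly encode ignorance; with enough data the likelihood will dominate the posterior.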
We’ll write this in brms as follows.
priors <- c(set_prior("normal(0, 2000)",
                      class = "Intercept"),
            set_prior("normal(0, 500)",
                      class = "b",
                      coef = "c_trial"),
            set_prior("normal(0, 500)",
                      class = "sigma"))
m2 <- brm(t ~ 1 + c_trial, df_spacebar,
          prior = priors,
          iter = 2000, chains = 4, family = gaussian(),
          control = list(adapt_delta = 0.99))
library(bayesplot)
#bayesplot::theme_default()
## stanplot() is deprecated in recent versions of brms; mcmc_plot() is the replacement:
mcmc_plot(m2, type = "hist")
The posterior distribution of \(\beta\) is of primary interest here. We can extract the posterior samples and summarize them ourselves:
m2_post_samp_b <- posterior_samples(m2, "^b")
str(m2_post_samp_b)
## 'data.frame': 4000 obs. of 2 variables:
## $ b_Intercept: num 170 168 168 168 169 ...
## $ b_c_trial : num 0.0874 0.0723 0.0764 0.0938 0.0811 ...
beta_samples <- m2_post_samp_b$b_c_trial
beta_mean <- mean(beta_samples)
quantiles_beta <- quantile(beta_samples,
                           prob = c(0.025, 0.975))
beta_low <- quantiles_beta[1]
beta_high <- quantiles_beta[2]
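We can also compute the posterior probability that the slope is positive (i.e., a slowdown across trials) directly from the samples; this is the quantity \(P(\beta > 0)\) reported later. A minimal sketch:
## Proportion of posterior samples with a positive slope:
(prob_slowdown <- mean(beta_samples > 0))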
Alternatively, posterior_summary() gives us the relevant information about \(\beta\) directly:
m2_post_samp_b2 <- posterior_summary(m2, "^b")
round(m2_post_samp_b2,2)
## Estimate Est.Error Q2.5 Q97.5
## b_Intercept 168.65 1.24 166.21 170.98
## b_c_trial 0.09 0.01 0.07 0.11
To do posterior predictive checks for our example above using brms, we need to type:
pp_check(m2, nsamples = 100) +
  theme(text = element_text(size = 16),
        legend.text = element_text(size = 16))
Posterior predictive check of model m2.
mu <- 6
sigma <- 0.5
N <- 100000
# Generate N random samples
## from a log-normal distribution
sl <- rlnorm(N, mu, sigma)
lognormal_plot <- ggplot(data.frame(samples = sl),
                         aes(samples)) +
  geom_histogram() +
  ggtitle("Log-normal distribution\n") +
  ylim(0, 25000) + xlim(0, 2000) + theme_bw()
# Generate N random samples
# from a normal distribution,
# and then exponentiate them
sn <- exp(rnorm(N, mu, sigma))
normalplot <- ggplot(data.frame(samples = sn),
                     aes(samples)) +
  geom_histogram() +
  ggtitle("Exponentiated samples of\na normal distribution") +
  ylim(0, 25000) + xlim(0, 2000) + theme_bw()
plot(lognormal_plot)
The log-normal distribution.
plot(normalplot)
Exponentiated samples from a normal distribution.
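The two histograms look essentially the same because, by definition, a log-normal random variable is the exponential of a normal random variable. A quick numerical check on the two simulated samples:
## sl and sn come from the same distribution;
## their quantiles should be very close:
round(quantile(sl, probs = c(0.025, 0.5, 0.975)))
round(quantile(sn, probs = c(0.025, 0.5, 0.975)))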
If we assume that button-tapping times are log-normally distributed, we'll need to change our model:
\[\begin{equation} t_n \sim LogNormal(\alpha + c\_trial_n \cdot \beta,\sigma) \end{equation}\]
where \(n = 1 \ldots N\).
But now the scale of our priors needs to change: they are no longer on the millisecond scale but on the log scale.
\[\begin{equation} \begin{aligned} \alpha &\sim Normal(0, 10) \\ \beta &\sim Normal(0, 1) \\ \sigma &\sim Normal(0, 2) \text{ truncated so that } \sigma > 0 \\ \end{aligned} \end{equation}\]
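To get a feel for what the new intercept prior implies, we can exponentiate draws from it: for a log-normal, \(\exp(\alpha)\) is the implied median tapping time in milliseconds (at the mean trial). A minimal sketch:
set.seed(123)
## The intercept prior is on the log-ms scale;
## exp(alpha) is the implied median tapping time in ms:
alpha_prior_log <- rnorm(1000, 0, 10)
quantile(exp(alpha_prior_log), probs = c(0.025, 0.5, 0.975))
## This prior is extremely vague: it allows median tapping times ranging
## from tiny fractions of a millisecond to astronomically large values.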
priors_log <- c(set_prior("normal(0, 10)",
                          class = "Intercept"),
                set_prior("normal(0, 1)",
                          class = "b",
                          coef = "c_trial"),
                set_prior("normal(0, 2)",
                          class = "sigma"))
m2_logn <- brm(t ~ 1 + c_trial, df_spacebar,
               prior = priors_log,
               iter = 2000, chains = 4, family = lognormal(),
               control = list(adapt_delta = 0.99,
                              max_treedepth = 15))
## Compiling Stan program...
## Start sampling
m2_logn_post_samp_b2 <- posterior_summary(m2_logn, "^b")
round(m2_logn_post_samp_b2,4)
## Estimate Est.Error Q2.5 Q97.5
## b_Intercept 5.1181 0.0067 5.1052 5.1310
## b_c_trial 0.0005 0.0001 0.0004 0.0007
Next, we turn to the question of what we can report as our results, and what we can conclude from the data.
We can back-transform the posterior distributions from the log scale to the millisecond scale.
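On the log scale the intercept and slope are additive, so the change in milliseconds due to a one-trial increase, evaluated at the mean trial (\(c\_trial = 0\)), is
\[\begin{equation} \exp(\alpha + \beta) - \exp(\alpha) \end{equation}\]
We compute this for each posterior sample: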
#options(scipen=999, digits=6)
alpha_samples <- posterior_samples(m2_logn, variable = "b_Intercept")$b_Intercept
beta_samples <- posterior_samples(m2_logn, variable = "b_c_trial")$b_c_trial
## back-transformation to the millisecond scale:
beta_ms <- exp(alpha_samples + beta_samples) - exp(alpha_samples)
(beta_msmean <- mean(beta_ms))
(beta_mslow <- quantile(beta_ms, prob = 0.025))
(beta_mshigh <- quantile(beta_ms, prob = 0.975))
beta_mean <- mean(beta_samples)
beta_low <- quantile(beta_samples, prob = 0.025)
beta_high <- quantile(beta_samples, prob = 0.975)
We can summarize the posterior and do inference as discussed in Example 1.
If we want to talk about the effect estimated by the model, we summarize the posterior of \(\beta\) in the following way:
\(\hat\beta = 0.09\) ms, 95% CrI = \([ 0.07 , 0.11 ]\), \(P(\beta >0) \approx 1\)
You probably do not want to report your posterior estimates to too many significant digits; just use common sense.
We can now verify whether our predicted datasets look more similar to the real dataset. See the figure below.
pp_check(m2_logn, nsamples = 100) +
  theme(text = element_text(size = 16),
        legend.text = element_text(size = 16))
Posterior predictive check of the log-normal model m2_logn.
To recap, here is the final log-normal model:
m2_logn <- brm(t ~ 1 + c_trial, df_spacebar,
               prior = priors_log,
               iter = 2000, chains = 4,
               family = lognormal(),
               control = list(adapt_delta = 0.99,
                              max_treedepth = 15))
This is the general workflow that we suggest for a Bayesian brms or Stan model.