## 10.2 A first simple example with Stan: Normal likelihood

Let’s fit a Stan model to estimate the parameters of the simple example given in the introduction to this chapter, where we simulate data from a normal distribution with a true mean of 3 and a true standard deviation of 10:

Y <- rnorm(n = 100, mean = 3, sd = 10)

As we said before, Stan code is organized in blocks. The first block indicates what constitutes “data” for the model:

data {
  int<lower = 1> N;  // Total number of trials
  vector[N] y;       // Score in each trial
}

The variable of type int (integer) represents the number of trials. In addition to the type, constraints can be indicated with lower and upper; in this case, N can’t be smaller than 1. These constraints serve as a sanity check: if they are not satisfied, we get an error and the model won’t run. The data are stored in a vector of length N; unlike in R, vectors (and matrices and arrays) in Stan need to be declared together with their dimensions. Comments are indicated with // rather than #.

The next block indicates the parameters of the model:

parameters {
  real mu;
  real<lower = 0> sigma;
}

The two parameters are real numbers, and sigma is constrained to be positive.

Finally, we indicate the prior distributions and likelihood functions in the model block:

model {
  // Priors:
  target += normal_lpdf(mu | 0, 20);
  target += lognormal_lpdf(sigma | 3, 1);
  // Likelihood:
  for(i in 1:N)
    target += normal_lpdf(y[i] | mu, sigma);
}

The variable target is a reserved word in Stan; every statement with target += adds a term to the unnormalized log posterior probability. Adding a term to the log posterior corresponds to multiplying a factor in the numerator of the unnormalized posterior. As we explained before, Stan uses the shape of the unnormalized posterior to sample from the actual posterior distribution. See Box 10.1 for a more detailed explanation.
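To build intuition, the quantity that the model block defines can be sketched in R as a function returning the unnormalized log posterior: the sum of the log priors and the log likelihood. (This sketch omits the Jacobian adjustment that Stan adds automatically for the constraint on sigma; see Box 10.1.)

```r
# A sketch (in R) of the unnormalized log posterior that the
# model block defines: sum of log priors and log likelihood.
# (Stan additionally adds a Jacobian adjustment for the
# constraint on sigma; see Box 10.1.)
log_post <- function(mu, sigma, y) {
  dnorm(mu, mean = 0, sd = 20, log = TRUE) +
    dlnorm(sigma, meanlog = 3, sdlog = 1, log = TRUE) +
    sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
}
```

The sampler only ever needs this function up to an additive constant; the normalizing constant of the posterior never has to be computed.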

Box 10.1 What does target do?

We can exemplify how target works with one hypothetical iteration of the sampler.

In every iteration in which the sampler explores the posterior space, mu and sigma acquire different values (this is where Stan’s algorithm stops the movement of the particle in Hamiltonian space). Say that in one iteration, mu = 2.617 and sigma = 8.004. Then the following happens in the model block:

1. At the beginning of the iteration, target is zero.
2. The transformations that the sampler automatically applies are taken into account. In our case, although sigma is constrained to be positive in our model, inside Stan’s sampler it is transformed to an “unconstrained” space amenable to Hamiltonian Monte Carlo. That is, Stan samples from an auxiliary parameter that ranges from minus infinity to plus infinity and is equivalent to log(sigma). This auxiliary parameter is exponentiated when it is incorporated into our model. Because of the mismatch between the constrained parameter space that we defined and the unconstrained space that it is converted to by Stan, an adjustment to the unnormalized posterior is required and added automatically. The reasons for this requirement are somewhat complex and will be discussed in section 12. In this particular case, the adjustment (the log absolute value of the Jacobian determinant) is equivalent to adding log(sigma) = 2.08 to target.
3. After target += normal_lpdf(mu | 0, 20);, the log of the density of $$\mathit{Normal}(0,20)$$ is evaluated at the current sample of mu (namely, 2.617) and added to target. In R, this would be dnorm(x = 2.617, mean = 0, sd = 20, log = TRUE), which is equal to -3.923. Thus, target should be -3.923 + 2.08 = -1.843.
4. After target += lognormal_lpdf(sigma | 3, 1);, we add the log of the density of $$\mathit{LogNormal}(3, 1)$$ evaluated at 8.004 to the previous value of target. In R, this would be dlnorm(x = 8.004, meanlog = 3, sdlog = 1, log = TRUE), which is equal to -3.422. Thus, target should be updated to -1.843 - 3.422 = -5.265.
5. In each iteration of the for-loop in the model block, we add to target the log density of $$\mathit{Normal}( 2.617, 8.004)$$ evaluated at one of the values of y. In R, this amounts to adding sum(dnorm(Y, 2.617, 8.004, log = TRUE)) (which is equal to -371.322) to the current value of target: -5.265 - 371.322 = -376.587.

This means that for the coordinates <mu = 2.617, sigma = 8.004>, the height of the unnormalized posterior would be exp(target) = $$\exp( -376.587 ) = 2.82\times 10^{-164}$$. Incidentally, the value of target is returned as lp__ (log probability) in the object storing a model fit with Stan.
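The prior and Jacobian parts of this computation can be verified directly in R (the likelihood term depends on the particular simulated vector Y, so it is omitted here):

```r
mu <- 2.617
sigma <- 8.004
target <- 0
target <- target + log(sigma)                    # Jacobian adjustment (step 2)
target <- target + dnorm(mu, mean = 0, sd = 20,
                         log = TRUE)             # prior on mu (step 3)
target <- target + dlnorm(sigma, meanlog = 3,
                          sdlog = 1, log = TRUE) # prior on sigma (step 4)
round(target, 3)  # -5.265, as in step 4 above
# Adding sum(dnorm(Y, mu, sigma, log = TRUE)) for the simulated
# data Y would complete the computation (step 5).
```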

It is possible to expose the value of target by printing target() inside a Stan model; this can be useful for troubleshooting a problematic model. As mentioned above, the value of target after each iteration is stored as lp__ in the Stan object.

We didn’t use curly brackets with the for-loop; this is common practice when the body of the for-loop consists of only one line, but brackets can always be added and are obligatory if the body spans several lines.

It’s also possible to avoid the for-loop since many functions are vectorized in Stan:

model {
  // Priors:
  target += normal_lpdf(mu | 0, 20);
  target += lognormal_lpdf(sigma | 3, 1);
  // Likelihood:
  target += normal_lpdf(y | mu, sigma);
}

The for-loop and the vectorized version give exactly the same result: the for-loop version evaluates the log likelihood at each value of y and adds it to target, one term at a time. The vectorized version does not create a vector of log likelihoods; instead, it directly computes the sum of the log likelihoods evaluated at the elements of y and then adds that sum to target.
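This equivalence can be illustrated in R, where dnorm() is also vectorized (the values of y and of the parameters here are arbitrary):

```r
y <- c(1.2, -0.5, 3.3)  # arbitrary example data
# For-loop version: accumulate one log density per observation:
target_loop <- 0
for (i in seq_along(y))
  target_loop <- target_loop + dnorm(y[i], mean = 2, sd = 8, log = TRUE)
# Vectorized version: evaluate all log densities and sum them:
target_vec <- sum(dnorm(y, mean = 2, sd = 8, log = TRUE))
all.equal(target_loop, target_vec)  # TRUE
```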

The complete model that we will fit looks like this:

data {
  int<lower = 1> N;  // Total number of trials
  vector[N] y;       // Score in each trial
}
parameters {
  real mu;
  real<lower = 0> sigma;
}
model {
  // Priors:
  target += normal_lpdf(mu | 0, 20);
  target += lognormal_lpdf(sigma | 3, 1);
  // Likelihood:
  target += normal_lpdf(y | mu, sigma);
}

You can save the above code as normal.stan. Alternatively, you can use the version stored in the package bcogsci (?stan-normal provides some documentation for the model). You can access the code of the models in this book with system.file("stan_models", "name_of_the_model.stan", package = "bcogsci").

normal <- system.file("stan_models",
                      "normal.stan",
                      package = "bcogsci")

This command just points to a text file that the package bcogsci stored in your system. You can open it to read the code (with any text editor, or readLines() in R). You’ll need to compile this code and run it with stan().

Stan requires the data to be in a list object in R. Below we fit the model with the default number of chains and iterations.

lst_score_data <- list(y = Y, N = length(Y))
# Fit the model with the default number of
# chains and iterations: chains = 4, iter = 2000
fit_score <- stan(
  file = normal,
  data = lst_score_data
)
# Alternatively:
# stan(file = "normal.stan", data = lst_score_data)

We can inspect how well the chains mixed in Figure 10.2. The chains for each parameter should look like a “fat hairy caterpillar” (Lunn et al. 2012); see section 3.1.1.1.2 for a brief discussion of convergence.

traceplot(fit_score, pars = c("mu", "sigma"))

We can see a summary of the posterior by either printing out the model fit or by plotting it. The summary displayed by the function print includes means, standard deviations (sd), quantiles, Monte Carlo standard errors for the mean of the posterior (se_mean), split Rhats, and effective sample sizes (n_eff). The summaries are computed after removing the warmup samples and merging together all chains. The se_mean is unrelated to the standard error of an estimate in the corresponding frequentist model. Like a large effective sample size, a small Monte Carlo standard error indicates an “efficient” sampling procedure: with a large value of n_eff and a small value of se_mean, we can be relatively sure of the reliability of the mean of the posterior. However, what constitutes a large or small se_mean is harder to define (see Vehtari, Gelman, et al. 2019 for a more extensive discussion).[^1]

print(fit_score, pars = c("mu", "sigma"))
## Inference for Stan model: normal.
## 4 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=4000.
##
##       mean se_mean   sd 2.5%  25%  50%   75% 97.5% n_eff Rhat
## mu    3.02    0.02 0.97 1.13 2.36 3.02  3.65  4.97  3400    1
## sigma 9.73    0.01 0.71 8.45 9.23 9.70 10.19 11.22  3213    1
##
## Samples were drawn using NUTS(diag_e) at Sat Sep 18 11:21:08 2021.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
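As a rough check on the relationship between these columns: the se_mean reported by print is approximately the posterior standard deviation divided by the square root of the effective sample size. Plugging in the summary values shown above for mu:

```r
# Approximate relationship between the summary columns for mu
# (values taken from the print output above):
sd_mu <- 0.97
n_eff_mu <- 3400
round(sd_mu / sqrt(n_eff_mu), 2)  # 0.02, matching se_mean
```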

After transforming the stanfit object into a data frame, it’s possible to produce summary plots such as the one shown in Figure 10.3. The package bayesplot (Gabry and Mahr 2019) is a wrapper around ggplot2 (Wickham, Chang, et al. 2019) and has several convenient functions to plot the samples. The bayesplot functions for posterior summaries start with mcmc_:

df_fit_score <- as.data.frame(fit_score)
mcmc_hist(df_fit_score, pars = c("mu", "sigma"))

There are also several ways to get the samples for other summaries or customized plots, depending on whether we want a list, a data frame, or an array.

# extract() from rstan is sometimes masked by a tidyverse
# function of the same name; the rstan:: prefix ensures
# that we use the right one:
rstan::extract(fit_score) %>%
  str()
## List of 3
##  $ mu   : num [1:4000(1d)] 2.617 3.146 -0.264 2.094 2.304 ...
##   ..- attr(*, "dimnames")=List of 1
##   .. ..$ iterations: NULL
##  $ sigma: num [1:4000(1d)] 8 9.25 9.99 9.03 9.59 ...
##   ..- attr(*, "dimnames")=List of 1
##   .. ..$ iterations: NULL
##  $ lp__ : num [1:4000(1d)] -377 -373 -378 -374 -373 ...
##   ..- attr(*, "dimnames")=List of 1
##   .. ..$ iterations: NULL
as.data.frame(fit_score) %>%
  str(list.len = 5)
## 'data.frame':    4000 obs. of  3 variables:
##  $ mu   : num  1.2 1.27 1.78 1.93 2.04 ...
##  $ sigma: num  9.37 9.48 9.76 9.61 9.83 ...
##  $ lp__ : num  -375 -374 -374 -373 -373 ...
as.array(fit_score) %>% str()
##  num [1:1000, 1:4, 1:3] 1.2 1.27 1.78 1.93 2.04 ...
##  - attr(*, "dimnames")=List of 3
##   ..$ iterations: NULL
##   ..$ chains    : chr [1:4] "chain:1" "chain:2" "chain:3" "chain:4"
##   ..$ parameters: chr [1:3] "mu" "sigma" "lp__"

### References

Gabry, Jonah, and Tristan Mahr. 2019. Bayesplot: Plotting for Bayesian Models. https://CRAN.R-project.org/package=bayesplot.

Lunn, David, Chris Jackson, David J Spiegelhalter, Nicky Best, and Andrew Thomas. 2012. The BUGS Book: A Practical Introduction to Bayesian Analysis. Vol. 98. CRC Press.

Vehtari, Aki, Andrew Gelman, Daniel Simpson, Bob Carpenter, and Paul-Christian Bürkner. 2019. “Rank-Normalization, Folding, and Localization: An Improved $$\widehat{R}$$ for Assessing Convergence of MCMC.”

Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, and Hiroaki Yutani. 2019. Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. https://CRAN.R-project.org/package=ggplot2.

[^1]: We simplify the output of print in the text after this call by actually calling summary(fit, pars = pars, probs = c(0.025, 0.975))$summary.