## 10.2 A first simple example with Stan: Normal likelihood

Let’s fit a Stan model to estimate the parameters of the simple example given in the introduction of this chapter, where we simulated data from a normal distribution with a true mean of 3 and a true standard deviation of 10.

As we said before, Stan code is organized in blocks. The first block indicates what constitutes “data” for the model:

```
data {
  int<lower = 1> N; // Total number of trials
  vector[N] y;      // Score in each trial
}
```

The variable of type `int` (integer) represents the number of trials. In addition to the type, constraints can be indicated with `lower` and `upper`. In this case, `N` can’t be smaller than `1`. These constraints serve as a sanity check; if they are not satisfied, we get an error and the model won’t run. The data are stored in a vector of length `N`; unlike in R, vectors (and matrices and arrays) need to be declared with their dimensions. Comments are indicated with `//` rather than `#`.

The next block indicates the parameters of the model:

```
parameters {
  real mu;
  real<lower = 0> sigma;
}
```

The two parameters are real numbers, and `sigma` is constrained to be positive.

Finally, we indicate the prior distributions and likelihood functions in the model block:

```
model {
  // Priors:
  target += normal_lpdf(mu | 0, 20);
  target += lognormal_lpdf(sigma | 3, 1);
  // Likelihood:
  for(i in 1:N)
    target += normal_lpdf(y[i] | mu, sigma);
}
```

The variable `target` is a reserved word in Stan; every statement with `target +=` adds terms to the unnormalized *log* posterior probability. We work on the log scale because adding a term to the unnormalized log posterior corresponds to multiplying a term in the numerator of the unnormalized posterior. As we explained before, Stan uses the shape of the unnormalized posterior to sample from the actual posterior distribution. See Box 10.1 for a more detailed explanation.

**Box 10.1 What does `target` do?**

We can exemplify how `target` works with one hypothetical iteration of the sampler.

In every iteration where the sampler explores the posterior space, `mu` and `sigma` acquire different values (these are the points where Stan’s algorithm stops the movement of the particle in Hamiltonian space). Say that in one iteration, `mu = 2.617` and `sigma = 8.004`. Then the following happens in the model block:

- At the beginning of the iteration, `target` is zero.
- The transformations that the sampler *automatically* performs are taken into account. Although `sigma` is constrained to be positive in our model, inside Stan’s sampler it is transformed to an “unconstrained” space amenable to Hamiltonian Monte Carlo. That is, Stan samples from an auxiliary parameter that ranges from minus infinity to plus infinity, which is equivalent to `log(sigma)`. This auxiliary parameter is exponentiated when it is incorporated into our model. Because of the mismatch between the constrained parameter space that we defined and the unconstrained space that it’s converted to by Stan, an adjustment to the unnormalized posterior is required and is added *automatically*. The reasons for this requirement are somewhat complex and will be discussed in section 12. In this particular case, this adjustment (the log absolute value of the Jacobian determinant) is equivalent to adding `log(sigma) = 2.08` to `target`.
- After `target += normal_lpdf(mu | 0, 20);`, the log of the density of \(\mathit{Normal}(0, 20)\) is evaluated at the current sample of `mu` (namely, 2.617) and this is added to `target`. In R, this would be `dnorm(x = 2.617, mean = 0, sd = 20, log = TRUE)`, which is equal to `-3.923`. Thus, `target` should be `-3.923 + 2.08 = -1.843`.
- After `target += lognormal_lpdf(sigma | 3, 1);`, we add the log of the density of \(\mathit{LogNormal}(3, 1)\) evaluated at `8.004` to the previous value of `target`. In R, this would be `dlnorm(x = 8.004, meanlog = 3, sdlog = 1, log = TRUE)`, which is equal to `-3.422`. Thus, `target` should be updated to `-1.843 + -3.422 = -5.265`.
- In each iteration of the for-loop in the model block, we add to `target` the log density of \(\mathit{Normal}(2.617, 8.004)\) evaluated at one of the values of `Y`. Over the whole loop, in R this amounts to adding `sum(dnorm(Y, 2.617, 8.004, log = TRUE))` (which is equal to `-371.322`) to the current value of `target`: `-5.265 + -371.322 = -376.587`.

This means that for the coordinates <mu = 2.617, sigma = 8.004>, the height of the unnormalized posterior would be the value `exp(target)` = \(\exp( -376.587 ) = 2.82\times 10^{-164}\). Incidentally, the value of `target` is returned as `lp__` (log probability) in an object storing a model fitted with Stan.

It is possible to expose the value of `target` by printing `target()` inside a Stan model. The value of `target` after each iteration is named `lp__` in the Stan object. This can be useful for troubleshooting a problematic model.
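The arithmetic in Box 10.1 can be cross-checked outside of Stan. The sketch below uses Python with `scipy.stats` (an illustrative assumption; any language with normal and log-normal densities would do, including R’s `dnorm`/`dlnorm`) to reproduce the Jacobian adjustment and the two prior terms for `mu = 2.617` and `sigma = 8.004`:

```python
import numpy as np
from scipy.stats import norm, lognorm

mu, sigma = 2.617, 8.004

# Jacobian adjustment for the positivity constraint on sigma:
jacobian = np.log(sigma)                                # ~ 2.08

# Prior term for mu, as in dnorm(2.617, 0, 20, log = TRUE):
lp_mu = norm.logpdf(mu, loc=0, scale=20)                # ~ -3.923

# Prior term for sigma; scipy parameterizes the log-normal
# with s = sdlog and scale = exp(meanlog):
lp_sigma = lognorm.logpdf(sigma, s=1, scale=np.exp(3))  # ~ -3.422

target = jacobian + lp_mu + lp_sigma
print(round(target, 3))                                 # ~ -5.265
```

Adding the summed log-likelihood of the data (`-371.322` in Box 10.1) to this running value yields the `-376.587` reported there.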

We didn’t use curly brackets with the for-loop; this is common practice if the for-loop body has only one line, but brackets can be added and are obligatory if the body spans several lines.

It’s also possible to avoid the for-loop since many functions are vectorized in Stan:

```
model {
  // Priors:
  target += normal_lpdf(mu | 0, 20);
  target += lognormal_lpdf(sigma | 3, 1);
  // Likelihood:
  target += normal_lpdf(y | mu, sigma);
}
```

The for-loop and the vectorized version yield exactly the same output: the for-loop version evaluates the log-likelihood at each value of `y` and adds it to `target`. The vectorized version does not create a vector of log-likelihoods; instead, it sums the log-likelihood evaluated at each element of `y` and then adds that sum to `target`.

The complete model that we will fit looks like this:

```
data {
  int<lower = 1> N; // Total number of trials
  vector[N] y;      // Score in each trial
}
parameters {
  real mu;
  real<lower = 0> sigma;
}
model {
  // Priors:
  target += normal_lpdf(mu | 0, 20);
  target += lognormal_lpdf(sigma | 3, 1);
  // Likelihood:
  target += normal_lpdf(y | mu, sigma);
}
```

You can save the above code as `normal.stan`. Alternatively, you can use the version stored in the package `bcogsci`. (`?stan-normal` provides some documentation for the model.) You can access the code of the models of this book by using `system.file("stan_models", "name_of_the_model.stan", package = "bcogsci")`.

This command just points to a text file that the package `bcogsci` stored in your system. You can open it to read the code (with any text editor, or with `readLines()` in R). You’ll need to compile this code and run it with `stan()`.

Stan requires the data to be in a list object in R. Below we fit the model with the default number of chains and iterations.

```
lst_score_data <- list(y = Y, N = length(Y))
# Path to the model stored in the package bcogsci:
normal <- system.file("stan_models", "normal.stan",
                      package = "bcogsci")
# Fit the model with the default values of number of
# chains and iterations: chains = 4, iter = 2000
fit_score <- stan(
  file = normal,
  data = lst_score_data
)
# Alternatively:
# stan(file = "normal.stan", data = lst_score_data)
```

We can inspect how well the chains mixed in Figure 10.2. The chains for each parameter should look like a “fat hairy caterpillar” (Lunn et al. 2012); see section 3.1.1.1.2 for a brief discussion of convergence.

We can see a summary of the posterior by either printing out the model fit or by plotting it. The summary displayed by the function `print` includes means, standard deviations (`sd`), quantiles, Monte Carlo standard errors for the mean of the posterior (`se_mean`), split Rhats, and effective sample sizes (`n_eff`). The summaries are computed after removing the warmup and merging together all chains. The `se_mean` is unrelated to the `se` of an estimate in the parallel frequentist model. Similarly to a large effective sample size, a small Monte Carlo standard error indicates an “efficient” sampling procedure: with a large value of `n_eff` and a small value of `se_mean`, we can be relatively sure of the reliability of the mean of the posterior. However, what constitutes a large or small `se_mean` is harder to define (see Vehtari, Gelman, et al. 2019 for a more extensive discussion).^{32}

```
## Inference for Stan model: normal.
## 4 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=4000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## mu 3.02 0.02 0.97 1.13 2.36 3.02 3.65 4.97 3400 1
## sigma 9.73 0.01 0.71 8.45 9.23 9.70 10.19 11.22 3213 1
##
## Samples were drawn using NUTS(diag_e) at Sat Sep 18 11:21:08 2021.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
```

After transforming the `stanfit` object into a data frame, it’s possible to produce summary plots like the one shown in 10.3. The package `bayesplot` (Gabry and Mahr 2019) is a wrapper around `ggplot2` (Wickham, Chang, et al. 2019) and has several convenient functions to plot the samples. The `bayesplot` functions for posterior summaries start with `mcmc_`:

There are also several ways to get the samples for other summaries or customized plots, depending on whether we want a list, a data frame, or an array.

```
# extract() from rstan is sometimes overwritten by a
# tidyverse function of the same name; make sure we use
# the rstan version:
rstan::extract(fit_score) %>%
  str()
```

```
## List of 3
## $ mu : num [1:4000(1d)] 2.617 3.146 -0.264 2.094 2.304 ...
## ..- attr(*, "dimnames")=List of 1
## .. ..$ iterations: NULL
## $ sigma: num [1:4000(1d)] 8 9.25 9.99 9.03 9.59 ...
## ..- attr(*, "dimnames")=List of 1
## .. ..$ iterations: NULL
## $ lp__ : num [1:4000(1d)] -377 -373 -378 -374 -373 ...
## ..- attr(*, "dimnames")=List of 1
## .. ..$ iterations: NULL
```

```
## 'data.frame': 4000 obs. of 3 variables:
## $ mu : num 1.2 1.27 1.78 1.93 2.04 ...
## $ sigma: num 9.37 9.48 9.76 9.61 9.83 ...
## $ lp__ : num -375 -374 -374 -373 -373 ...
```

```
## num [1:1000, 1:4, 1:3] 1.2 1.27 1.78 1.93 2.04 ...
## - attr(*, "dimnames")=List of 3
## ..$ iterations: NULL
## ..$ chains : chr [1:4] "chain:1" "chain:2" "chain:3" "chain:4"
## ..$ parameters: chr [1:3] "mu" "sigma" "lp__"
```

### References

Gabry, Jonah, and Tristan Mahr. 2019. *Bayesplot: Plotting for Bayesian Models*. https://CRAN.R-project.org/package=bayesplot.

Lunn, David, Chris Jackson, David J Spiegelhalter, Nicky Best, and Andrew Thomas. 2012. *The BUGS Book: A Practical Introduction to Bayesian Analysis*. Vol. 98. CRC Press.

Vehtari, Aki, Andrew Gelman, Daniel Simpson, Bob Carpenter, and Paul-Christian Bürkner. 2019. “Rank-Normalization, Folding, and Localization: An Improved \(\widehat{R}\) for Assessing Convergence of MCMC.”

Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, and Hiroaki Yutani. 2019. *Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics*. https://CRAN.R-project.org/package=ggplot2.

We simplify the output of `print` in the text after this call, by actually calling `summary(fit, pars = pars, probs = c(0.025, 0.975))$summary`.↩