Introduction to Bayesian data analysis (SMLP 2019)


Instructor

Shravan Vasishth

Dates and location

Taught at SMLP every year in September, in Haus 6, Griebnitzsee campus, University of Potsdam.

Overview

In recent years, Bayesian methods have come to be widely adopted in all areas of science. This is in large part due to the development of sophisticated software for probabilistic programming; a recent example is the astonishing computing capability afforded by the language Stan (mc-stan.org). However, the underlying theory needed to use this software sensibly is often inaccessible because end-users don't necessarily have the statistical and mathematical background to read the primary textbooks (such as Gelman et al.'s classic Bayesian Data Analysis, 3rd edition). In this course, we seek to close this gap by providing a relatively accessible and technically non-demanding introduction to the basic workflow for fitting different kinds of linear models using Stan. To illustrate the capability of Bayesian modeling, we will use the R package RStan and a powerful front-end R package for Stan called brms.

Prerequisites

We assume familiarity with R. Participants will benefit most if they have previously fit linear models and linear mixed models (using lme4) in R, in any scientific domain within linguistics and psychology. No knowledge of calculus or linear algebra is assumed (although it would be helpful). Basic school-level mathematics is assumed; this will be quickly revisited in class.

Please install the following software before coming to the course

We will be using R and RStudio, so make sure you install these on your computer. You should also install the R packages rstan and brms.

Outcomes

After completing this course, the participant will have become familiar with the foundations of Bayesian inference using Stan (RStan and brms), and will be able to fit a range of multiple regression models and hierarchical models for normally, lognormally, and binomially distributed data. They will know how to calibrate their models using prior and posterior predictive checks, and they will be able to establish true and false discovery rates to validate discovery claims. If there is time, we will discuss how to carry out model comparison using Bayes factors and k-fold cross-validation.

Course materials

Click here to download everything. If you use github, you can clone this repository: https://github.com/vasishth/IntroductionBayes
Solutions to exercises are not publicly available; they will only be provided to participants.
Draft textbook: See here. PDF version available on request.
Slides and exercises:
Part 1
  1. 00 Frequentist Foundations (optional review)
  2. 01 Foundations
  3. 02 Introduction to Bayesian methods
  4. 02 Sampling
Part 2
  1. 03 Linear Modeling
  2. 04 Hierarchical Linear Models
  3. 05 Model Comparison using Bayes Factors
Case studies: Three case studies (zip archive): meta-analysis, measurement error models, and an example of pre-registration.

Tentative schedule

Depending on the class, I may go faster or slower, so I may not adhere to this exact schedule.
  1. Monday: Foundations of Bayesian inference
    Probability theory and Bayes' rule, probability distributions, understanding and eliciting priors, analytical Bayes: Beta-Binomial, Poisson-Gamma, Normal-Normal.
  2. Tuesday: Linear models
    Basic theory of linear modeling, generating prior predictive distributions using RStan and R, fake-data simulation for model evaluation. Sampling methods will be skipped in class, but please read the lecture notes later; they cover inverse sampling, Gibbs sampling, random-walk Metropolis, and Hamiltonian Monte Carlo.
  3. Wednesday: Hierarchical linear models
    HLMs using RStan and brms, fake-data generation, true and false discovery rates, logistic mixed effects models, individual differences, shrinkage.
  4. Thursday: HLMs continued, exercises
    Here we will get some hands-on experience with real-life problems.
  5. Friday: keynote lectures
    Please see the SMLP schedule.
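
As a small illustration of the analytical Bayes material listed for Monday, the Beta-Binomial case can be worked out in closed form. The notation in this sketch is chosen here for illustration and may differ from the course materials: theta is the success probability, a and b are the prior parameters, and k is the number of successes in n trials.

```latex
\begin{align*}
\theta &\sim \mathrm{Beta}(a, b) \\
k \mid \theta &\sim \mathrm{Binomial}(n, \theta) \\
p(\theta \mid k) &\propto \theta^{k}(1-\theta)^{n-k} \cdot \theta^{a-1}(1-\theta)^{b-1}
                  = \theta^{a+k-1}(1-\theta)^{b+n-k-1} \\
\theta \mid k &\sim \mathrm{Beta}(a + k,\; b + n - k)
\end{align*}
```

For example, with a uniform Beta(1, 1) prior and k = 7 successes in n = 10 trials, the posterior is Beta(8, 4), whose mean is 8/12, roughly 0.67.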


Additional readings

R programming
  1. Getting started with R
  2. R for data science
  3. Efficient R programming
Books
  1. A Student's Guide to Bayesian Statistics, by Ben Lambert: A good, non-technical introduction to Stan and Bayesian modeling.
  2. Statistical Rethinking, by Richard McElreath: A classic introduction.
  3. Doing Bayesian Data Analysis, Second Edition: A Tutorial with R, JAGS, and Stan, by John Kruschke: A good introduction specifically for psychologists.
Tutorial articles
  1. brms tutorial by the author of the package, Paul Buerkner.
  2. Ordinal regression models in psychological research: A tutorial, by Buerkner and Vuorre.
  3. Contrast coding tutorial, by Schad, Hohenstein, Vasishth, Kliegl.
  4. Bayesian workflow tutorial, by Schad, Betancourt, Vasishth.
  5. Linear mixed models tutorial, by Sorensen, Hohenstein, Vasishth.
  6. brms tutorial for phonetics/phonology, by Vasishth, Nicenboim, Beckman, Li, Kong.
  7. Michael Betancourt's resources: These are a must if you want to get deeper into Stan and Bayesian modeling.
  8. MCMC animations/visualizations, McElreath's blog post on MCMC.
Some example articles from our lab and other groups that use Bayesian methods
  1. Example random-effects meta-analysis.
  2. Example of finite mixture models using Stan.
  3. Replication attempt of a published study.
  4. Bayesian analysis of relatively large-sample psycholinguistic experiment.
  5. Examples of regression analyses by Vehtari and colleagues