Introduction to Bayesian data analysis (home page)


Instructors

Bruno Nicenboim and Shravan Vasishth

Dates and location

March 2020, taught online.

Overview

In recent years, Bayesian methods have come to be widely adopted in all areas of science. This is in large part due to the development of sophisticated software for probabilistic programming; a recent example is the astonishing computing capability afforded by the language Stan (mc-stan.org). However, the underlying theory needed to use this software sensibly is often inaccessible, because end-users don't necessarily have the statistical and mathematical background to read the primary textbooks (such as Gelman et al.'s classic Bayesian Data Analysis, 3rd edition). In this course, we seek to close this gap by providing a relatively accessible and technically non-demanding introduction to the basic workflow for fitting different kinds of linear models using Stan. To illustrate the capability of Bayesian modeling, we will use the R package RStan and a powerful front-end R package for Stan called brms.

Prerequisites

We assume familiarity with R. Participants will benefit most if they have previously fit linear models and linear mixed models (using lme4) in R, in any scientific domain within linguistics and psychology. No knowledge of calculus or linear algebra is assumed (although it will be helpful), but basic school-level mathematics is assumed (and will be quickly revisited in class).

Please install the following software before coming to the course

We will be using the software R and RStudio, so make sure you install these on your computer. You should also install the R packages rstan and brms.
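The setup above amounts to a couple of lines of R; a minimal sketch (run once before the course):

```r
## Install the two required packages from CRAN:
install.packages(c("rstan", "brms"))

## Verify that both load without errors:
library(rstan)
library(brms)
```

Note that rstan compiles Stan models with a C++ toolchain, so the installation can take a few minutes; the rstan website documents platform-specific setup if compilation fails.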

Outcomes

After completing this course, participants will be familiar with the foundations of Bayesian inference using Stan (via RStan and brms), and will be able to fit a range of multiple regression models and hierarchical models, for normally, lognormally, and binomially distributed data. They will know how to calibrate their models using prior and posterior predictive checks, and will be able to establish true and false discovery rates to validate discovery claims. If there is time, we will discuss how to carry out model comparison using Bayes factors and k-fold cross-validation.
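As a taste of the workflow described above, a hierarchical lognormal model can be fit in brms in a few lines. This is only a sketch: the data frame df and its columns rt, condition, and subj are hypothetical placeholders, not course materials.

```r
library(brms)

## Hypothetical data: reaction times (rt) per subject (subj) in two conditions.
fit <- brm(
  rt ~ condition + (1 + condition | subj),  # hierarchical (by-subject) structure
  family = lognormal(),                     # for lognormally distributed data
  prior = prior(normal(0, 1), class = b),   # weakly informative prior on slopes
  data = df
)

## Calibrate the model with a posterior predictive check:
pp_check(fit)
```

The formula syntax deliberately mirrors lme4, which is why prior experience with lme4 (see Prerequisites) transfers directly.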

Online interaction

We will use Google Groups and Zoom. A link to the private group will be sent to participants.

Course materials

Click here to download everything. If you use GitHub, you can clone this repository: https://github.com/vasishth/IntroductionBDA

Textbook (in progress): See here. PDF version available on request.

Part 1 (Monday-Tuesday): Shravan Vasishth
The lectures correspond roughly to chapters 1 and 2 of our textbook in preparation
  1. Monday:
    1. Introductory video
    2. PDF: 00 Frequentist Foundations (review of some basic ideas)
      Exercises: 00 Frequentist Foundations Exercises
    3. PDF: 01 Foundations
      Exercises Part 1: 01 Foundations Exercises Part 1
      Exercises Part 2: 01 Foundations Exercises Part 2
  2. Tuesday:
    1. PDF: 02 Introduction to Bayesian methods
      Exercises: 02 Introduction to Bayesian methods Exercises
    2. PDF: 02 Sampling
      02 Sampling, Additional Notes
Part 2 (Wednesday-Friday): Bruno Nicenboim
For this part of the workshop, besides rstan and brms, be sure to have the following packages installed (and loaded in your session): MASS, dplyr, tidyr, purrr, readr, extraDistr, ggplot2, bayesplot, tictoc, and gridExtra.
The lectures correspond roughly to chapters 3, 4, and 5 of our textbook in preparation.
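The package list above (plus brms) can be loaded in a single call; a sketch:

```r
## Load all workshop packages in the order listed above:
pkgs <- c("MASS", "dplyr", "tidyr", "purrr", "readr", "extraDistr",
          "ggplot2", "brms", "bayesplot", "tictoc", "gridExtra")
invisible(lapply(pkgs, library, character.only = TRUE))
```

Keeping MASS before dplyr in the load order matters: loading MASS after dplyr would mask dplyr's select() with MASS::select().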

  1. Wednesday - 03 Computational Bayesian data analysis
    Slides and exercises
    Stan slides
    Part 1

    Part 2

    Part 3

    A brief intro to Stan
  2. Thursday - 04 - Bayesian regression models
    Slides and exercises
    More exercises
    Part 1 (Linear model)

    Part 2 (Log-normal regression)

    Part 3 (Logistic regression)
  3. Friday
    05 - Bayesian hierarchical models
    Slides
    Exercises

    06 - Model comparison with Bayes factors
    Slides
    Exercises
Case studies:
Three case studies (zip archive): meta-analysis, measurement error models, and an example of pre-registration.

Tentative schedule

Depending on the class, we may go faster or slower, so we may not adhere to this exact schedule.

Additional readings

R programming
  1. Getting started with R
  2. R for data science
  3. Efficient R programming.
Books
  1. A Student's Guide to Bayesian Statistics, by Ben Lambert: A good, non-technical introduction to Stan and Bayesian modeling.
  2. Statistical Rethinking, by Richard McElreath: A classic introduction.
  3. Doing Bayesian Data Analysis, Second Edition: A Tutorial with R, JAGS, and Stan, by John Kruschke: A good introduction specifically for psychologists.
Tutorial articles and materials
  1. brms tutorial by the author of the package, Paul Buerkner.
  2. Ordinal regression models in psychological research: A tutorial, by Buerkner and Vuorre.
  3. Contrast coding tutorial, by Schad, Hohenstein, Vasishth, Kliegl.
  4. Bayesian workflow tutorial, by Schad, Betancourt, Vasishth.
  5. Linear mixed models tutorial, Sorensen, Hohenstein, Vasishth.
  6. brms tutorial for phonetics/phonology, Vasishth, Nicenboim, Beckman, Li, Kong.
  7. Reproducible workflows tutorial
  8. Michael Betancourt's resources: These are a must if you want to get deeper into Stan and Bayesian modeling.
  9. MCMC animations/visualizations; McElreath's blog post on MCMC
Some example articles from our lab and other groups that use Bayesian methods
  1. Example random-effects meta-analysis (phonetics data on neutralization).
  2. A second example of a large-scale study and a random-effects meta-analysis (EEG data)
  3. A third example of a large-scale study and a random-effects meta-analysis (reading data)
  4. Example of a hierarchical finite mixture model using Stan.
  5. Replication attempt of a published study.
  6. Another (large-sample) replication attempt of a published study.
  7. Bayesian analysis of relatively large-sample psycholinguistic experiment.
  8. Examples of regression analyses by Vehtari and colleagues