Introduction to Bayesian data analysis for cognitive science (textbook)





Video lectures by

Shravan Vasishth

Online version and source code for book

The free online version of the book is available here. You can access the source code for the book from here. You can buy a physical copy of the book too.

Overview and motivation for this course

In recent years, Bayesian methods have come to be widely adopted in all areas of science. This is in large part due to the development of sophisticated software for probabilistic programming; a recent example is the astonishing computing capability afforded by the language Stan (mc-stan.org).
However, the underlying theory needed to use this software sensibly is often inaccessible because end-users don't necessarily have the statistical and mathematical background to read the primary textbooks (such as Gelman et al.'s classic Bayesian Data Analysis, 3rd edition).
In this book and the accompanying course, we seek to fill this gap by providing a relatively accessible and technically undemanding introduction to the basic workflow for fitting different kinds of linear (mixed) models using Stan. To illustrate the capability of Bayesian modeling, we will use the R package RStan and a powerful front-end R package for Stan called brms.

MOOC on OpenHPI

You can do a self-paced version of this course (chapters 1-4), with auto-graded exercises, here. Some 5000 people have signed up for this course.

Is this course taught in person?

We regularly teach parts of this course in person at:
  1. The annual Statistical Methods for Linguistics and Psychology (SMLP) summer school, Potsdam, Germany. This week-long summer school is usually held just before the AMLaP conference. Chapters 1-5 and 13 are covered in the Intro Bayes track, and the remaining chapters in the Advanced Bayes track.
  2. The annual Methods in Linguistic Sciences (MILS) summer school, Gent, Belgium (usually in July). In this course, I cover chapters 1-5 and 13.
  3. The University of Potsdam, Germany. This is a two-semester course that covers the entire book (except the last chapter).
  4. The Indian Institute of Technology, Kanpur. Himanshu Yadav also teaches this material in India.

Prerequisites

You must have a functioning computer to do this course.
We also assume familiarity with R. Participants will benefit most if they have previously fit linear models and linear mixed models (using lme4) in R, in any scientific domain within linguistics and psychology. No knowledge of calculus or linear algebra is assumed (although it is helpful), but basic school-level mathematics is assumed.
Finally, to follow this course, it is important to be familiar with the basics of R Markdown.

Please install the following software before watching the videos

We will be using the software R, and RStudio or Visual Studio Code, so make sure you install these on your computer. You should also install the R packages rstan and brms, and all the packages mentioned in the introduction to the book. Please follow the installation instructions carefully.
Install the library bcogsci from here.
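
One way to see which of the packages named above are still missing is a short check in the R console. This is only a sketch: the vector `required` lists just the two packages named on this page (the introduction to the book lists more), and the `install.packages()` call is left commented out because rstan also needs a working C++ toolchain, which the installation instructions cover. The bcogsci package is not included here; install it from the link given above.

```r
# Packages named on this page; extend this vector with the packages
# listed in the introduction to the book (assumption: names as on CRAN).
required <- c("rstan", "brms")

# TRUE for each package that can already be loaded on this machine.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

missing <- required[!installed]

if (length(missing) > 0) {
  message("Still to install: ", paste(missing, collapse = ", "))
  # install.packages(missing)  # uncomment once the toolchain is set up
} else {
  message("All listed packages are installed.")
}
```

If the message lists any packages, work through the installation instructions first and then rerun the check.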

Outcomes

After completing this course, the participant will have become familiar with the foundations of Bayesian inference using brms, and will be able to fit a range of multiple regression models and hierarchical models, for normally distributed data, and for lognormal and binomially distributed data. They will know how to calibrate their models using prior and posterior predictive checks.

  1. Lecture on chapter 1:
    1. Video: Part 1/2, Part 2/2
    2. Lecture notes: PDF, html, Source code (Rmd).
  2. Lecture on chapter 2:
    1. Video: coming soon
    2. Lecture notes: PDF, html, Source code (Rmd).

Solutions to exercises

Solutions to exercises are not publicly available; they will only be provided to participants.

Tutorial articles on specific topics

Here are some notes and articles that provide background reading, and cover some additional topics that we will skip:
  1. Why teaching statistics with a cookbook approach doesn't work (video lecture coming soon).
  2. Mathematical foundations: Video lectures, enroll here for online self-paced course.
  3. Linear modeling: Lecture notes.
  4. Deciding on sample sizes from a Bayesian perspective: Sample size determination for Bayesian hierarchical models commonly used in psycholinguistics. Shravan Vasishth, Himanshu Yadav, Daniel Schad, and Bruno Nicenboim. Computational Brain and Behavior, 2022.
  5. brms tutorial: brms tutorial by the author of the package, Paul Buerkner.
  6. Ordinal regression models in psychological research: A tutorial, by Buerkner and Vuorre.
  7. Bayesian workflow tutorial: Bayesian workflow tutorial, by Schad, Betancourt, Vasishth.
  8. Intro to Bayes for phonetics: brms tutorial for phonetics/phonology, Vasishth, Nicenboim, Beckman, Li, Kong.
  9. Developing a reproducible workflow: Lecture notes
  10. Michael Betancourt's resources: These are a must if you want to get deeper into Stan and Bayesian modeling.
  11. MCMC animations/visualizations, McElreath's blog post on MCMC

Some example articles that use Bayesian methods from our lab

A frequently asked question is how to summarize the results of a Bayesian analysis. Here are some examples of articles we have published using Bayesian data analysis. Our presentation of results is continuously evolving; there is no fixed answer to the question of how to display your results, so use your judgement. The most important thing you can do to facilitate understanding of your work is to be open and transparent about your analyses. This means releasing all data and code with the published paper (see the reproducible workflows lecture above). For other papers that use Bayesian methods, see my post-2012 publications.
  1. Example random-effects meta-analysis.
  2. Example of finite mixture models using Stan.
  3. Example of a multinomial processing tree modeling approach.
  4. Replication attempt of a published study.
  5. Example from psycholinguistics of model comparison using k-fold cross-validation.