This book is intended to be a relatively gentle introduction to carrying out Bayesian data analysis and cognitive modeling using the probabilistic programming language Stan (Carpenter et al. 2017), and the front-end to Stan called brms (Bürkner 2019). Our target audience is cognitive scientists (e.g., linguists and psychologists) who carry out planned behavioral experiments, and who are interested in learning the Bayesian data analysis methodology from the ground up and in a principled manner. Our aim is to make Bayesian statistics a standard part of the data analysis toolkit for experimental linguistics, psycholinguistics, psychology, and related disciplines.

Many excellent introductory textbooks already exist for Bayesian data analysis. Why write yet another book? Our text differs from others in two respects. First, our main focus is on showing how to analyze data from planned experiments involving repeated measures; this type of experimental data involves unique complexities. We provide many examples of data sets involving time measurements (e.g., self-paced reading, eye-tracking-while-reading, voice onset time), event-related potentials, pupil sizes, accuracies (e.g., recall tasks, yes-no questions), categorical answers (e.g., picture naming), choice reaction times (e.g., the Stroop task, the motion detection task), etc. Second, from the very outset, we stress a particular workflow that has simulating data as its centerpiece; we aim to teach a philosophy that involves thinking hard about the assumed underlying generative process, even before the data are collected. The data analysis approach that we hope to teach through this book involves a cycle of prior predictive and posterior predictive checks, and model validation using simulated data. We try to inculcate a sense of how inferences can be drawn from the posterior distribution of theoretically interesting parameters without resorting to binary decisions like “significant” or “not significant”. We hope that this will set a new standard for reporting and interpreting the results of data analyses in a more nuanced manner, and lead to more measured claims in the published literature.

Please report typos, errors, or suggestions for improvement at

Why read this book, and what is its target audience?

A commonly held belief in psychology, psycholinguistics, and other areas is that statistical data analysis is secondary to the science, and should be quick and easy. For example, a senior mathematical psychologist once told the last author of this book: “if you need to run anything more complicated than a paired t-test, you are asking the wrong question.” The most colorful version of this sentiment was expressed in a tweet by a former editor-in-chief of the Journal of Memory and Language. The gist of the tweet was that statistical analysis should be like going to the toilet, and that, as a scientist, one should not be expected to invest much time in studying statistics. If one really believes that statistics should be like going to the toilet (quick and dirty), then one should not be surprised if the end result turns out to be crap.

The target audience for this book is students and researchers who want to treat statistics as an equal partner in their scientific work. We expect that the reader is willing to take the time to both understand and run the computational analyses.

Any rigorous introduction to Bayesian data analysis requires at least a passive knowledge of probability theory, calculus, and linear algebra. However, we do not require that the reader have this background when starting the book. Instead, the relevant ideas are introduced informally and just in time, as soon as they are needed. The reader is never required to have an active ability to solve probability problems, to solve integrals or compute derivatives, or to carry out relatively complex matrix computations (such as inverting matrices) by hand.

What we do expect is familiarity with arithmetic, basic set theory and elementary probability theory (e.g., sum and product rules, conditional probability), simple matrix operations like addition and multiplication, and simple algebraic operations. A quick look through chapter 1 of Gill (2006) before starting this book is highly recommended. We also presuppose that, when the need arises, the reader is willing to look up concepts that they might have forgotten (e.g., logarithms).

We also expect that the reader already knows, or is willing to learn, enough of the programming language R (R Core Team 2019) to reproduce the examples presented and to carry out the exercises. If the reader is completely unfamiliar with R, they should first consult books like R for Data Science and Efficient R Programming before starting this book.

We also assume that the reader has encountered simple linear modeling, and linear mixed models (Bates, Mächler, et al. 2015a; Baayen, Davidson, and Bates 2008). What this means in practice is that the reader should have used the lm() and lmer() functions in R. A passing acquaintance with basic statistical concepts, like the correlation between two variables, is also taken for granted.

This book is not appropriate for complete beginners to data analysis. Newcomers to data analysis should start with a freely available textbook like Kerns (2014), and then read our introduction to frequentist data analysis, which is also available freely online (Vasishth et al. 2021). This latter book will prepare the reader well for the material presented here.

Developing the right mindset for this book

One very important characteristic that the reader should bring to this book is a can-do spirit. There will be many places where the going will get tough, and the reader will have to play around with the material, or refresh their understanding of arithmetic or middle-school algebra. The basic principles of such a can-do spirit are nicely summarized in the book by Burger and Starbird (2012); also see Levy (2021). Although we cannot summarize the insights from these books in a few words, inspired by the Burger and Starbird (2012) book, here is a short enumeration of the kind of mindset the reader will need to cultivate:

  • Spend time on the basic, apparently easy material; make sure you understand it deeply. Look for gaps in your understanding. Reading different presentations of the same material (in different books or articles) can yield new insights.
  • Let mistakes and errors be your teacher. We instinctively recoil from our mistakes, but errors are ultimately our friends; they have the potential to teach us more than our correct answers can. In this sense, a correct solution can be less interesting than an incorrect one.
  • When you are intimidated by some exercise or problem, give up and admit defeat immediately. This relaxes the mind; you’ve already given up, there’s nothing more to do. Then, after a while, try to solve a simpler version of the problem. Sometimes, it is useful to break the problem down to smaller parts, each of which may be easier to solve.
  • Create your own questions. Don’t wait to be asked questions; develop your own problems and then try to solve them.
  • Don’t expect to understand everything in the first pass. Just mentally note the gaps in your understanding, and return to them later and work on these gaps.
  • Step back periodically to try to sketch out a broader picture of what you are learning. Writing down what you know, without looking up anything, is one helpful way to achieve this. Don’t wait for the teacher to give you bullet-point summaries of what you should have learned; develop such summaries yourself.
  • Develop the art of finding information. When confronted with something you don’t know, or with some obscure error message, use Google to find some answers.

As instructors, we have noticed over the years that students with such a mindset generally do very well. Some students already have that spirit, but others need to explicitly develop it. We firmly believe that everyone can develop such a mindset; but one may have to work on acquiring it.

In any case, such an attitude is absolutely necessary for a book of this sort.

How to read this book

The chapters in this book are intended to be read in sequence, but during the first pass through the book, the reader should feel free to completely skip the boxes. These boxes provide a more formal development (useful to transition to more advanced textbooks like Gelman et al. 2014), or deal with tangential aspects of the topics presented in the chapter.

Here are some suggested paths through this book, depending on the reader’s goals:

  • For a short course for complete beginners, read chapters 1 to 5. We usually cover these five chapters in a five-day summer school course that we teach annually. Most of the material in these chapters is also covered in a free four-week course available online:
  • For a course that focuses on regression models with the R package brms, read chapters 1 to 9 and, optionally, 15.
  • For an advanced course that focuses on complex models involving Stan, read chapters 10 to 20.

Some conventions used in this book

We adopt the following conventions:

  • All distribution names are lower-case unless they are also a proper name (e.g., Poisson, Bernoulli).
  • The univariate normal distribution is parameterized by the mean and standard deviation (not variance).
  • The code for figures is provided only in some cases, where we consider it to be pedagogically useful. In other cases, the code remains hidden, but it can be found in the web version of the book. Notice that all the R code from the book can be extracted from the Rmd source files for each chapter, which are released with the book.
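As a concrete illustration of the parameterization convention, base R’s dnorm() takes the standard deviation (not the variance) as its third argument, which matches the convention used in this book; the following sketch shows how to handle a source that reports a variance instead:

```r
# The normal density in R is parameterized by mean and standard deviation,
# matching the convention used in this book.
dnorm(0, mean = 0, sd = 2)

# A source that reports Normal(mean = 0, variance = 4) corresponds to
# sd = sqrt(4) = 2, so the following call gives the same density:
dnorm(0, mean = 0, sd = sqrt(4))
```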

Online materials

The entire book, including all data and source code, is available online for free. The solutions to the exercises will be made available on request.

Software needed

Before you start, please install

  • R and RStudio, or any other Integrated Development Environment that you prefer, such as Visual Studio Code and Emacs Speaks Statistics.
  • The R package rstan. At the time of writing this book, the CRAN version of rstan lags behind the latest developments in Stan, so it is recommended to install rstan from as indicated in
  • The R packages dplyr, purrr, tidyr, extraDistr, brms, hypr, and lme4 are used in many chapters of the book and can be installed the usual way: install.packages(c("dplyr", "purrr", "tidyr", "extraDistr", "brms", "hypr", "lme4")).
  • The following R packages are optional: tictoc, rootSolve, SHELF, cmdstanr, and SBC.
  • Some packages, such as intoo, barsurf, bivariate, SIN, and rethinking, may require manual installation from archived or GitHub versions.
  • The data and Stan models used in this book can be installed with remotes::install_github("bnicenboim/bcogsci"). This command uses the function install_github from the package remotes, so that package must be installed as well.

In every R session, load these packages, and set the options shown below for Stan.
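A typical session preamble, assuming the packages listed under Software needed (a sketch; the exact package list and option values used in the book may differ), looks like this:

```r
# Load the packages used throughout the book.
library(dplyr)
library(purrr)
library(tidyr)
library(extraDistr)
library(brms)
library(rstan)

# Run one Markov chain per available core, in parallel.
options(mc.cores = parallel::detectCores())
# Avoid recompiling Stan programs that have not changed.
rstan_options(auto_write = TRUE)
```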


Acknowledgments

We are grateful to the many generations of students who attended our courses at the University of Potsdam, various summer schools at ESSLLI, the LOT winter school, other short courses we have taught at various institutions, and the summer school on Statistical Methods for Linguistics and Psychology (SMLP) held annually in Potsdam, Germany. The participants in these courses helped us considerably in improving the material presented here. A special thanks to Anna Laurinavichyute, Paula Lissón, and Himanshu Yadav for co-teaching the Bayesian courses at SMLP. We are also grateful to members of the Vasishth lab, especially Dorothea Pregla, for comments on earlier drafts of this book. We would also like to thank Christian Robert (otherwise known as Xi’an), Robin Ryder, Nicolas Chopin, Michael Betancourt, Andrew Gelman, and the Stan developers (especially Bob Carpenter and Paul-Christian Bürkner) for their advice; Pavel Logačev for his feedback; and Athanassios Protopapas, Patricia Mirabile, Masataka Ogawa, Alex Swiderski, Andrew Ellis, Jakub Szewczyk, Chi Hou Pau, Alec Shaw, Patrick Wen, Riccardo Fusaroli, Abdulrahman Dallak, Elizabeth Pankratz, Jean-Pierre Haeberly, Chris Hammill, Florian Wickelmaier, Ole Seeth, Jules Bouton, Siqi Zheng, Michael Gaunt, Benjamin Senst, Chris Moreh, Richard Hatcher, and Noelia Stetie for catching typos, unclear passages, and errors in the book. Thanks also go to Jeremy Oakley and other statisticians at the School of Mathematics and Statistics, University of Sheffield, UK, for helpful discussions and for ideas for exercises that were inspired by the MSc program taught online at Sheffield.

This book would have been impossible to write without the following software: R (Version 4.2.2; R Core Team 2019) and the R packages afex (Singmann et al. 2020), barsurf (Version 0.7.0; Spurdle 2020a), bayesplot (Version 1.9.0; Gabry and Mahr 2019), bcogsci (Nicenboim, Schad, and Vasishth 2020), bibtex (Version 0.5.0; Francois 2017), bivariate (Version 0.7.0; Spurdle 2020b), bookdown (Version 0.28; Xie 2019a), bridgesampling (Version 1.1.2; Gronau, Singmann, and Wagenmakers 2020), brms (Version 2.17.0; Bürkner 2019), citr (Aust 2019), cmdstanr (Version 0.5.3; Gabry and Češnovar 2021), cowplot (Version 1.1.1; Wilke 2020), digest (Version 0.6.31; Antoine Lucas et al. 2021), dplyr (Version 1.1.0; Wickham, François, et al. 2019), DT (Version 0.24; Xie, Cheng, and Tan 2019), extraDistr (Version 1.9.1; Wolodzko 2019), forcats (Version 1.0.0; Wickham 2019a), gdtools (Gohel et al. 2019), ggplot2 (Version 3.4.0; Wickham, Chang, et al. 2019), gridExtra (Version 2.3; Auguie 2017), htmlwidgets (Version 1.5.4; Vaidyanathan et al. 2018), hypr (Version 0.2.3; Schad et al. 2019; Rabe, Vasishth, Hohenstein, Kliegl, and Schad 2020a), intoo (Version 0.4.0; Spurdle and Bode 2020), kableExtra (Version 1.3.4; Zhu 2019), knitr (Version 1.42; Xie 2019b), lme4 (Version 1.1.31; Bates, Mächler, et al. 2015b), loo (Version 2.5.1; Vehtari, Gelman, and Gabry 2017a; Yao et al. 2017), MASS (Ripley 2019), Matrix (Version 1.5.3; Bates and Maechler 2019), miniUI (Cheng 2018), papaja (Version 0.1.1; Aust and Barth 2020), pdftools (Version 3.3.2; Ooms 2021), purrr (Version 1.0.1; Henry and Wickham 2019), Rcpp (Version 1.0.10; Eddelbuettel et al. 2019), readr (Version 2.1.4; Wickham, Hester, and Francois 2018), RefManageR (Version 1.4.0; McLean 2017), remotes (Version 2.4.2; Hester et al. 2021), rethinking (Version 2.21; McElreath 2021), rmarkdown (Version 2.20; Allaire et al. 2019), rootSolve (Soetaert and Herman 2009), rstan (Version 2.26.13; Guo, Gabry, and Goodrich 2019), SBC (Kim et al. 2022), servr (Version 0.24; Xie 2019c), SHELF (Version 1.8.0; Oakley 2021), SIN (Version 0.6; Drton 2013), StanHeaders (Version 2.26.13; Goodrich et al. 2019), stringr (Version 1.5.0; Wickham 2019b), texPreview (Sidi and Polhamus 2020), tibble (Version 3.1.8; Müller and Wickham 2020), tictoc (Version 1.0.1; Izrailev 2014), tidyr (Version 1.2.1; Wickham and Henry 2019), tidyverse (Version 1.3.2; Wickham, Averick, et al. 2019), tinylabels (Version 0.2.3; Barth 2022), and webshot (Version 0.5.3; Chang 2018).

Bruno Nicenboim (Tilburg, The Netherlands), Daniel Schad (Potsdam, Germany), Shravan Vasishth (Potsdam, Germany)


Allaire, JJ, Yihui Xie, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, Hadley Wickham, Joe Cheng, Winston Chang, and Richard Iannone. 2019. rmarkdown: Dynamic Documents for R.

Antoine Lucas, Dirk Eddelbuettel, with contributions by Jarek Tuszynski, Henrik Bengtsson, Simon Urbanek, Mario Frasca, Bryan Lewis, Murray Stokely, et al. 2021. digest: Create Compact Hash Digests of R Objects.

Auguie, Baptiste. 2017. gridExtra: Miscellaneous Functions for "Grid" Graphics.

Aust, Frederik. 2019. citr: RStudio Add-in to Insert Markdown Citations.

Aust, Frederik, and Marius Barth. 2020. papaja: Create APA Manuscripts with R Markdown.

Baayen, R Harald, Douglas J Davidson, and Douglas M Bates. 2008. “Mixed-Effects Modeling with Crossed Random Effects for Subjects and Items.” Journal of Memory and Language 59 (4): 390–412.

Barth, Marius. 2022. tinylabels: Lightweight Variable Labels.

Bates, Douglas M, and Martin Maechler. 2019. Matrix: Sparse and Dense Matrix Classes and Methods.

Bates, Douglas M, Martin Mächler, Ben Bolker, and Steve Walker. 2015a. “Fitting Linear Mixed-Effects Models Using lme4.” Journal of Statistical Software 67 (1): 1–48.

Bates, Douglas M, Martin Mächler, Ben Bolker, and Steve Walker. 2015b. “Fitting Linear Mixed-Effects Models Using lme4.” Journal of Statistical Software 67 (1): 1–48.

Burger, Edward B, and Michael Starbird. 2012. The 5 Elements of Effective Thinking. Princeton University Press.

Bürkner, Paul-Christian. 2019. brms: Bayesian Regression Models Using “Stan”.

Carpenter, Bob, Andrew Gelman, Matthew D Hoffman, Daniel Lee, Ben Goodrich, Michael J. Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software 76 (1).

Chang, Winston. 2018. webshot: Take Screenshots of Web Pages.

Cheng, Joe. 2018. miniUI: Shiny Ui Widgets for Small Screens.

Drton, Mathias. 2013. SIN: A Sinful Approach to Selection of Gaussian Graphical Markov Models.

Eddelbuettel, Dirk, Romain Francois, JJ Allaire, Kevin Ushey, Qiang Kou, Nathan Russell, Douglas M Bates, and John Chambers. 2019. Rcpp: Seamless R and C++ Integration.

Francois, Romain. 2017. bibtex: Bibtex Parser.

Gabry, Jonah, and Rok Češnovar. 2021. cmdstanr: R Interface to “CmdStan”.

Gabry, Jonah, and Tristan Mahr. 2019. bayesplot: Plotting for Bayesian Models.

Gelman, Andrew, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. 2014. Bayesian Data Analysis. Third Edition. Boca Raton, FL: Chapman; Hall/CRC Press.

Gill, Jeff. 2006. Essential Mathematics for Political and Social Research. Cambridge: Cambridge University Press.

Gohel, David, Hadley Wickham, Lionel Henry, and Jeroen Ooms. 2019. gdtools: Utilities for Graphical Rendering.

Goodrich, Ben, Andrew Gelman, Bob Carpenter, Matt Hoffman, Daniel Lee, Michael Betancourt, Marcus Brubaker, et al. 2019. StanHeaders: C++ Header Files for Stan.

Gronau, Quentin F., Henrik Singmann, and Eric-Jan Wagenmakers. 2020. “bridgesampling: An R Package for Estimating Normalizing Constants.” Journal of Statistical Software 92 (10): 1–29.

Guo, Jiqiang, Jonah Gabry, and Ben Goodrich. 2019. rstan: R Interface to Stan.

Henry, Lionel, and Hadley Wickham. 2019. purrr: Functional Programming Tools.

Hester, Jim, Gábor Csárdi, Hadley Wickham, Winston Chang, Martin Morgan, and Dan Tenenbaum. 2021. remotes: R Package Installation from Remote Repositories, Including ’GitHub’.

Izrailev, Sergei. 2014. tictoc: Functions for Timing R Scripts, as Well as Implementations of Stack and List Structures.

Kerns, G.J. 2014. Introduction to Probability and Statistics Using R. Second Edition.

Kim, Shinyoung, Hyunji Moon, Martin Modrák, and Teemu Säilynoja. 2022. SBC: Simulation Based Calibration for Rstan/Cmdstanr Models.

Levy, Dan. 2021. Maxims for Thinking Analytically: The Wisdom of Legendary Harvard Professor Richard Zeckhauser. Dan Levy.

McElreath, Richard. 2021. rethinking: Statistical Rethinking Book Package.

McLean, Mathew William. 2017. “RefManageR: Import and Manage BibTeX and BibLaTeX References in R.” The Journal of Open Source Software.

Müller, Kirill, and Hadley Wickham. 2020. tibble: Simple Data Frames.

Nicenboim, Bruno, Daniel J. Schad, and Shravan Vasishth. 2020. bcogsci: Data and Models for the Book “an Introduction to Bayesian Data Analysis for Cognitive Science”.

Oakley, Jeremy. 2021. SHELF: Tools to Support the Sheffield Elicitation Framework.

Ooms, Jeroen. 2021. pdftools: Text Extraction, Rendering and Converting of PDF Documents.

Rabe, Maximilian M., Shravan Vasishth, Sven Hohenstein, Reinhold Kliegl, and Daniel J. Schad. 2020a. “hypr: An R Package for Hypothesis-Driven Contrast Coding.” The Journal of Open Source Software.

R Core Team. 2019. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.

Ripley, Brian. 2019. MASS: Support Functions and Datasets for Venables and Ripley’s MASS.

Schad, Daniel J., Shravan Vasishth, Sven Hohenstein, and Reinhold Kliegl. 2019. “How to Capitalize on a Priori Contrasts in Linear (Mixed) Models: A Tutorial.” Journal of Memory and Language 110.

Sidi, Jonathan, and Daniel Polhamus. 2020. texPreview: Compile and Preview Snippets of “LaTeX”.

Singmann, Henrik, Ben Bolker, Jake Westfall, Frederik Aust, and Mattan S. Ben-Shachar. 2020. Afex: Analysis of Factorial Experiments.

Soetaert, Karline, and Peter M.J. Herman. 2009. A Practical Guide to Ecological Modelling. Using R as a Simulation Platform. Springer.

Spurdle, Abby. 2020a. barsurf: Heatmap-Related Plots and Smooth Multiband Color Interpolation.

Spurdle, Abby. 2020b. bivariate: Bivariate Probability Distributions.

Spurdle, Abby, and Emil Bode. 2020. intoo: Minimal Language-Like Extensions.

Vaidyanathan, Ramnath, Yihui Xie, JJ Allaire, Joe Cheng, and Kenton Russell. 2018. htmlwidgets: HTML Widgets for R.

Vasishth, Shravan, Daniel J. Schad, Audrey Bürki, and Reinhold Kliegl. 2021. Linear Mixed Models for Linguistics and Psychology: A Comprehensive Introduction. CRC Press.

Vehtari, Aki, Andrew Gelman, and Jonah Gabry. 2017a. “Practical Bayesian Model Evaluation Using Leave-One-Out Cross-Validation and WAIC.” Statistics and Computing 27 (5): 1413–32.

Wickham, Hadley. 2019a. forcats: Tools for Working with Categorical Variables (Factors).

Wickham, Hadley. 2019b. stringr: Simple, Consistent Wrappers for Common String Operations.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686.

Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, and Hiroaki Yutani. 2019. ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics.

Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2019. dplyr: A Grammar of Data Manipulation.

Wickham, Hadley, and Lionel Henry. 2019. tidyr: Tidy Messy Data.

Wickham, Hadley, Jim Hester, and Romain Francois. 2018. readr: Read Rectangular Text Data.

Wilke, Claus O. 2020. cowplot: Streamlined Plot Theme and Plot Annotations for ’Ggplot2’.

Wolodzko, Tymoteusz. 2019. extraDistr: Additional Univariate and Multivariate Distributions.

Xie, Yihui. 2019a. bookdown: Authoring Books and Technical Documents with R Markdown.

Xie, Yihui. 2019b. knitr: A General-Purpose Package for Dynamic Report Generation in R.

Xie, Yihui. 2019c. servr: A Simple HTTP Server to Serve Static Files or Dynamic Documents.

Xie, Yihui, Joe Cheng, and Xianying Tan. 2019. DT: A Wrapper of the JavaScript Library ’DataTables’.

Yao, Yuling, Aki Vehtari, Daniel Simpson, and Andrew Gelman. 2017. “Using Stacking to Average Bayesian Predictive Distributions.” Bayesian Analysis.

Zhu, Hao. 2019. kableExtra: Construct Complex Table with ’kable’ and Pipe Syntax.