Chapter 6 The Art and Science of Prior Elicitation
Nothing strikes fear into the heart of the newcomer to Bayesian methods more than the idea of specifying priors for the parameters in a model. On the face of it, this concern seems like a valid one; how can one know what the plausible parameter values are in a model before one has even seen the data?
In reality, this worry is purely a consequence of the way we are normally taught to carry out data analysis, especially in areas like psychology and linguistics. Model fitting is considered to be a black-box activity, with the primary concern being whether the effect of interest is “significant” or “non-significant.” As a consequence of the training that we receive, we learn to focus on one thing (the \(p\)-value) and we learn to ignore the estimates that we obtain from the model; it becomes irrelevant whether the effect of interest has a mean value of 500 ms (in a reading study, say) or 10 ms; all that matters is whether it is a significant effect or not. In fact, the way many scientists summarize the literature in their field is by classifying studies into two bins: significant and non-significant. There are obvious problems with this classification method; for example, \(p=0.051\) might be counted as “marginally” significant, but \(p=0.049\) is never counted as marginally non-significant. Real-life examples of such a binary classification approach are Phillips, Wagers, and Lau (2011) and Hammerly, Staub, and Dillon (2019). Because the focus is on significance, we never develop a sense of what the estimates of an effect are likely to be in a future study. This is why, when faced with a prior-distribution specification problem, we are misled into feeling like we know nothing about the quantitative estimates relating to a problem we are studying.
Prior specification has a lot in common with something that physicists call a Fermi problem. As Von Baeyer (1988) describes it: “A Fermi problem has a characteristic profile: Upon first hearing it, one doesn’t have even the remotest notion what the answer might be. And one feels certain that too little information exists to find a solution. Yet, when the problem is broken down into subproblems, each one answerable without the help of experts or reference books, an estimate can be made.” Fermi problems in the physics context are situations where one needs ballpark (approximate) estimates of physical quantities in order to proceed with a calculation. The name comes from the physicist Enrico Fermi, who developed the ability to carry out fairly accurate back-of-the-envelope calculations when working out approximate numerical values needed for a particular computation. Von Baeyer (1988) puts it well: “Prudent physicists—those who want to avoid false leads and dead ends—operate according to a long-standing principle: Never start a lengthy calculation until you know the range of values within which the answer is likely to fall (and, equally important, the range within which the answer is unlikely to fall).” As in physics, so in data analysis: as Bayesians, we need to acquire the ability to work out plausible ranges of values for parameters. This is a learnable skill that improves with practice; over time, we can learn to emulate prudent physicists.
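One way to make the idea of a "plausible range" concrete is to translate it into the parameters of a prior distribution. The following is a minimal sketch, not a prescribed procedure: it assumes that a Normal prior is appropriate, and that the analyst is willing to treat the ballpark range as the central 95% interval of that prior. The function name `normal_prior_from_range` and the range of -50 to 150 ms are hypothetical and chosen purely for illustration.

```python
from statistics import NormalDist


def normal_prior_from_range(lower, upper, coverage=0.95):
    """Return a Normal prior whose central `coverage` interval is [lower, upper].

    Hypothetical helper for illustration; assumes a symmetric Normal prior.
    """
    if not lower < upper:
        raise ValueError("lower must be smaller than upper")
    if not 0 < coverage < 1:
        raise ValueError("coverage must lie strictly between 0 and 1")
    # Standard-normal quantile that leaves (1 - coverage) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    mean = (lower + upper) / 2          # midpoint of the plausible range
    sd = (upper - lower) / (2 * z)      # half-width divided by the quantile
    return NormalDist(mean, sd)


# Hypothetical ballpark judgment: the effect is very likely between -50 and 150 ms.
prior = normal_prior_from_range(-50, 150)
print(f"prior mean = {prior.mean:.1f} ms, prior sd = {prior.stdev:.1f} ms")

# Sanity check: the probability mass inside the stated range.
mass_inside = prior.cdf(150) - prior.cdf(-50)
print(f"mass inside the range = {mass_inside:.3f}")
```

Choosing a smaller `coverage` for the same range (say 0.8) yields a wider prior, which is one simple way of guarding against specifying too tight an interval.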
As Spiegelhalter, Abrams, and Myles (2004) point out, there is no one “correct” prior distribution. One consequence of this fact is that a good Bayesian analysis always takes a range of prior specifications into account; this is called a sensitivity analysis. We have already seen examples of this, but more examples will be provided in this and later chapters.
Prior specification requires the estimation of probabilities. Human beings are not good at estimating probabilities, because they are susceptible to several kinds of biases (Kadane and Wolfson 1998; Spiegelhalter, Abrams, and Myles 2004). We list the most important ones that are relevant to cognitive science applications:
- Availability bias: Events that are more salient to the researcher are given higher probability, and events that are less salient are given lower probability.
- Adjustment and anchoring bias: One’s initial assessment of the probability of an event can influence one’s subsequent judgements; for example, one’s estimate of an uncertainty interval will tend to be pulled toward one’s initial assessment.
- Overconfidence: When eliciting uncertainty intervals from oneself, there is a tendency to specify too tight an interval.
- Hindsight bias: If one relies on the data to come up with a prior for the analysis of that very same data set, one’s assessment is likely to be biased.
Although training can reduce the natural tendency to be biased in these different ways, one must recognize that bias is inevitable when eliciting priors, either from oneself or from other experts; it follows that one should always define “a community of priors” (Kass and Greenhouse 1989): one should consider the effect of informed as well as skeptical or agnostic (uninformative) priors on the posterior distribution of interest. Incidentally, bias is not unique to Bayesian statistics; the same problems arise in frequentist data analysis. Even in frequentist analyses, the researcher always interprets the data in the light of their prior beliefs; the data never really “speak for themselves.” The great advantage that Bayesian methods have is that they allow us to formally take a range of (competing) prior beliefs into account in interpreting the data. We illustrate this point in the present chapter with some examples.
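To give a rough sense of what comparing a community of priors can look like, here is a minimal sketch using the conjugate Normal-Normal update for a mean with a known standard deviation. Everything numerical in it is an assumption made for illustration: the three priors (labeled informed, skeptical, and agnostic), the sample mean of 30 ms, the standard deviation of 100 ms, and the sample size of 40 are hypothetical and do not come from any study. The function name `normal_posterior` is likewise hypothetical.

```python
from math import sqrt


def normal_posterior(prior_mean, prior_sd, sample_mean, sigma, n):
    """Conjugate update for a Normal mean with known data sd `sigma`.

    Returns the posterior mean and posterior sd of the effect.
    """
    prior_precision = 1 / prior_sd**2
    data_precision = n / sigma**2
    post_precision = prior_precision + data_precision
    # The posterior mean is a precision-weighted average of prior mean and data mean.
    post_mean = (
        prior_precision * prior_mean + data_precision * sample_mean
    ) / post_precision
    return post_mean, sqrt(1 / post_precision)


# Hypothetical community of priors on an effect in ms: (mean, sd).
community = {
    "informed": (50, 20),    # centered on an effect suggested by earlier work
    "skeptical": (0, 20),    # centered on zero, doubtful of large effects
    "agnostic": (0, 1000),   # very wide, close to uninformative
}

# Hypothetical data summary: sample mean, known sd, and sample size.
sample_mean, sigma, n = 30.0, 100.0, 40

posteriors = {
    name: normal_posterior(m, s, sample_mean, sigma, n)
    for name, (m, s) in community.items()
}

for name, (post_mean, post_sd) in posteriors.items():
    print(f"{name:>9}: posterior mean = {post_mean:6.1f} ms, sd = {post_sd:5.1f} ms")
```

With these made-up numbers, the agnostic prior leaves the posterior mean close to the sample mean, the skeptical prior pulls it toward zero, and the informed prior pulls it toward 50 ms. If the substantive conclusion is the same under all three, it can be regarded as robust to the choice of prior; if not, the disagreement itself is informative and should be reported.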
Hammerly, Christopher, Adrian Staub, and Brian Dillon. 2019. “The Grammaticality Asymmetry in Agreement Attraction Reflects Response Bias: Experimental and Modeling Evidence.” Cognitive Psychology 110: 70–104.
Kadane, Joseph, and Lara J Wolfson. 1998. “Experiences in Elicitation: [Read Before the Royal Statistical Society at a Meeting on ‘Elicitation’ on Wednesday, April 16th, 1997, the President, Professor A. F. M. Smith in the Chair].” Journal of the Royal Statistical Society: Series D (The Statistician) 47 (1). Wiley Online Library: 3–19.
Kass, Robert E, and Joel B Greenhouse. 1989. “[Investigating Therapies of Potentially Great Benefit: ECMO]: Comment: A Bayesian Perspective.” Statistical Science 4 (4). JSTOR: 310–17.
Phillips, Colin, Matthew W. Wagers, and Ellen F. Lau. 2011. “Grammatical Illusions and Selective Fallibility in Real-Time Language Comprehension.” In Experiments at the Interfaces, 37:147–80. Bingley, UK: Emerald.
Spiegelhalter, David J, Keith R Abrams, and Jonathan P Myles. 2004. Bayesian Approaches to Clinical Trials and Health-Care Evaluation. Vol. 13. John Wiley & Sons.
Von Baeyer, Hans Christian. 1988. “How Fermi Would Have Fixed It.” The Sciences 28 (5). Blackwell Publishing Ltd Oxford, UK: 2–4.