2.4 The confidence interval, and what it’s good for
Once we have a sample of data \(y\), and once the sample mean \(\bar{y}\) and the standard error \(SE = s/\sqrt{n}\) have been computed, it is common to define what is called a 95% confidence interval:
\[\begin{equation} \bar{y} \pm 2 SE \end{equation}\]
Because the sampling distribution of the mean is normally distributed, and because approximately 95% of the area under a normal curve lies within two standard deviations of its mean,
the interval \(\bar{y} \pm 2 SE\) covers approximately 95% of the area under the curve of the sampling distribution of the mean.
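As a concrete illustration (a small sketch using simulated data, not data from the book), this interval can be computed for a single sample in R as follows:

```r
## simulate one sample and compute the 95% CI as ybar +/- 2 SE
set.seed(42)                       # for reproducibility (arbitrary seed)
y <- rnorm(100, mean = 500, sd = 100)  # hypothetical sample
ybar <- mean(y)
SE <- sd(y) / sqrt(length(y))
c(lower = ybar - 2 * SE, upper = ybar + 2 * SE)
```

The width of the interval is \(4 \times SE\); a larger sample size shrinks the SE and therefore the interval.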
This interval is usually computed after estimating the sample mean and standard error from a data-set, and is called the confidence interval (CI). It has the following meaning: If you take samples repeatedly and compute the CI each time, 95% of those CIs will contain the true population mean \(\mu\). To understand this point, one can simulate this situation. This time we will do 1000 repeated experiments instead of 100.
mu <- 500
sigma <- 100
n <- 1000
nsim <- 1000
lower <- rep(NA, nsim)
upper <- rep(NA, nsim)
for (i in 1:nsim) {
  y <- rnorm(n, mean = mu, sd = sigma)
  lower[i] <- mean(y) - 2 * sd(y) / sqrt(n)
  upper[i] <- mean(y) + 2 * sd(y) / sqrt(n)
}
## check how many CIs contain mu:
CIs <- ifelse(lower < mu & upper > mu, 1, 0)
## approx. 95% of the CIs contain true mean:
round(mean(CIs), 2)
## [1] 0.95
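Incidentally, the multiplier 2 is itself an approximation: the exact normal-based multiplier is `qnorm(0.975)`, approximately 1.96, and for small samples the t distribution is used instead (this is what R's built-in `t.test()` does). A quick check on one simulated sample (an illustration, not the book's code) shows that the two versions nearly coincide for large n:

```r
set.seed(1)
y <- rnorm(1000, mean = 500, sd = 100)  # hypothetical sample
SE <- sd(y) / sqrt(length(y))
## normal-approximation CI with the exact 97.5% quantile:
mean(y) + c(-1, 1) * qnorm(0.975) * SE
## t-based CI computed by t.test():
t.test(y)$conf.int
```

For n = 1000 the t quantile is 1.962 versus the normal 1.960, so the two intervals differ only in the third decimal place.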
Figure 2.6 visualizes the coverage properties of the confidence interval in 100 simulations; by coverage we mean here the proportion of cases where the true \(\mu\) is contained in the CI.
2.4.1 Confidence intervals are often misinterpreted
The confidence interval is widely misinterpreted, typically as representing the range of plausible values of the parameter \(\mu\). This interpretation is wrong because \(\mu\) is, by assumption, a point value; it does not have a probability density function associated with it. The frequentist CI is defined with reference to the sampling distribution of the mean under repeated sampling, not with reference to a probability distribution of \(\mu\). By contrast, the Bayesian credible interval does have this interpretation. In most modeling settings that the authors have encountered in their work, the frequentist confidence interval and the Bayesian credible interval have very similar widths, with the Bayesian interval being slightly wider depending on the prior specification. But these similarities in the intervals do not change the fact that they have different meanings.
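To see why the two intervals often have similar widths, consider the simplest textbook special case (a sketch, not a general recipe): a normal likelihood with known standard deviation \(\sigma\) and a flat prior on \(\mu\). The posterior of \(\mu\) is then Normal\((\bar{y}, \sigma/\sqrt{n})\), so the 95% credible interval coincides numerically with the normal-based CI, even though its interpretation differs:

```r
set.seed(2)
sigma <- 100
n <- 1000
y <- rnorm(n, mean = 500, sd = sigma)  # hypothetical sample
se <- sigma / sqrt(n)                  # known-sigma standard error
## frequentist 95% CI (known sigma):
ci <- mean(y) + c(-1, 1) * qnorm(0.975) * se
## Bayesian 95% credible interval under a flat prior on mu:
## the posterior is Normal(mean(y), se), so take its central 95%:
cri <- qnorm(c(0.025, 0.975), mean = mean(y), sd = se)
all.equal(ci, cri)
```

With informative priors, or with \(\sigma\) unknown and given a prior, the credible interval will generally differ from the CI, but often only slightly.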
Given the convoluted meaning of the CI, and the impossibility of interpreting a single CI in isolation, it is reasonable to ask: what good is a CI? One can treat the CI as a summary that tells us the width of the sampling distribution of the mean—the wider the sampling distribution, the more the implied variability under repeated sampling. The confidence interval can therefore be used to assess how uncertain we should be about the estimate of the sample mean under hypothetical repeated sampling. See Cumming (2014) for a useful perspective on using confidence intervals for inference. As discussed later in the book, we will use the CI to informally assess uncertainty.
We turn next to the central ideas behind the hypothesis test. We begin with the humble one-sample t-test, which contains many subtleties and is well worth close study before we move on to the main topic of this book: linear mixed models.