48 Confidence Intervals
In this book so far, we have focused mostly on point estimation, where the goal is to produce a single estimate of a parameter of interest, ignoring the uncertainty in this estimate. In this chapter, we will discuss how to report a range of plausible values of a parameter, capturing the uncertainty in our estimate. This range of values is called a confidence interval and has a fundamental connection with the hypothesis tests from Chapter 47.
48.1 The \(z\)-interval
Recall Example 47.1, where the quality engineers take daily samples of \(n=5\) ball bearings from a production line. One day, the diameters of the ball bearings (in mm) are measured to be \[ X_1 = 10.06, X_2 = 10.07, X_3 = 9.98, X_4 = 10.02, X_5 = 10.09. \] It is assumed that \(X_1, \dots, X_5\) are i.i.d. \(\text{Normal}(\mu, \sigma^2 = 0.03^2)\).
In Example 47.1, we tested whether the production line was producing ball bearings that met the specification of \(\mu = 10\) mm. Alternatively, we can estimate \(\mu\), the mean diameter of ball bearings currently being produced by the line.
We know from Example 30.3 that the maximum likelihood estimate of \(\mu\) is \[ \bar X = 10.044. \]
This is a point estimate of \(\mu\). We can quantify the uncertainty by reporting a 95% confidence interval. To do so, first observe that \[ Z = \frac{\bar X - \mu}{\sqrt{\sigma^2 / n}} \tag{48.1}\] is standard normal, so by definition, \[ P(\Phi^{-1}(0.025) < Z < \Phi^{-1}(0.975)) = 0.95. \] This is illustrated in Figure 48.1. (Note that \(\Phi^{-1}(0.975) = -\Phi^{-1}(0.025) \approx 1.96 \approx 2\).)
Substituting Equation 48.1 for \(Z\), we can rearrange the inequalities so that \(\mu\) is in the middle: \[ \begin{align} &P\left(\Phi^{-1}(0.025) \leq \frac{\bar X - \mu}{\sqrt{\sigma^2 / n}} \leq \Phi^{-1}(0.975)\right) \\ &= P\left(\Phi^{-1}(0.025) \sqrt{\frac{\sigma^2}{n}} \leq \bar X - \mu \leq \Phi^{-1}(0.975)\sqrt{\frac{\sigma^2}{n}} \right) \\ &= P\left(\bar X - \Phi^{-1}(0.025)\sqrt{\frac{\sigma^2}{n}} \geq \mu \geq \bar X - \Phi^{-1}(0.975)\sqrt{\frac{\sigma^2}{n}}\right). \end{align} \]
This leads to the conclusion that the random interval \[ \left[\bar X - \Phi^{-1}(0.975) \sqrt{\frac{\sigma^2}{n}}, \bar X - \Phi^{-1}(0.025) \sqrt{\frac{\sigma^2}{n}}\right] \] has a 95% probability of containing \(\mu\). This is called a 95% confidence interval for \(\mu\).
For the ball bearings data, a 95% confidence interval for \(\mu\) is \[ \left[10.044 - 1.96 \sqrt{\frac{0.03^2}{5}}, 10.044 + 1.96 \sqrt{\frac{0.03^2}{5}}\right] = [10.0177, 10.0703]. \tag{48.2}\] We say that we are 95% confident that \(\mu\) is between \(10.0177\) mm and \(10.0703\) mm.
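The interval in Equation 48.2 can be reproduced numerically. The sketch below is ours (the function name `z_interval` is not from the text); it assumes NumPy and SciPy are available.

```python
import numpy as np
from scipy.stats import norm

def z_interval(data, sigma, alpha=0.05):
    """(1 - alpha) confidence interval for the mean when sigma is known."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    xbar = data.mean()
    z = norm.ppf(1 - alpha / 2)           # ~1.96 for alpha = 0.05
    margin = z * np.sqrt(sigma**2 / n)
    return xbar - margin, xbar + margin

# Ball bearing diameters (mm) from Example 47.1
diameters = [10.06, 10.07, 9.98, 10.02, 10.09]
lo, hi = z_interval(diameters, sigma=0.03)
print(lo, hi)  # approximately 10.0177, 10.0703
```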
We cannot say for certain whether \(\mu\) is between these two numbers or not, but we know that an interval constructed in this way will contain \(\mu\) 95% of the time. We can illustrate this via simulation. For the simulation, we have to assume a particular value of \(\mu\).
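A minimal sketch of such a simulation, assuming the true mean is \(\mu = 10.02\) (the value used below) with \(\sigma = 0.03\) and \(n = 5\) as in the example:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu, sigma, n = 10.02, 0.03, 5     # assumed true values for the simulation
margin = norm.ppf(0.975) * sigma / np.sqrt(n)

trials = 10_000
covered = 0
for _ in range(trials):
    xbar = rng.normal(mu, sigma, size=n).mean()
    # Does the interval [xbar - margin, xbar + margin] contain the true mu?
    covered += (xbar - margin <= mu <= xbar + margin)

print(covered / trials)  # close to 0.95
```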
About 95% of these intervals contain \(\mu = 10.02\) in this simulation. In practice, we only ever observe one of these intervals, and we have no way of knowing whether \(\mu\) is in that interval or not. However, we hope that our interval is one of the 95% of intervals that do contain \(\mu\), as opposed to the 5% that do not.
The next proposition summarizes the results of this section, generalizing them to confidence levels other than 95%.
In the rest of this chapter, we will focus on the case where \(\alpha = 0.05\) (95% confidence intervals), although it is straightforward to generalize the results to other values of \(\alpha\).
48.2 Duality of Confidence Intervals and Hypothesis Tests
The 95% confidence interval that we constructed in Equation 48.2 did not contain \(\mu = 10.00\), which agrees with our decision in Example 47.1 to reject the null hypothesis that \(\mu = 10.00\). This is no coincidence. A 95% confidence interval will contain exactly the values of \(\mu\) that are not rejected by the corresponding hypothesis test.
Proposition 48.2 says that we can “invert” a hypothesis test to obtain a confidence interval and vice versa. If we invert the \(z\)-test from Section 47.1, then we obtain the \(z\)-interval above.
48.3 The \(t\)-interval
In the examples so far, we assumed the variance \(\sigma^2\) was known. What if it is not known?
In Section 47.2, we saw that for hypothesis testing, we can perform a \(t\)-test instead of a \(z\)-test. In the \(t\)-test, we replace \(\sigma^2\) by the sample variance \(S^2\). This introduces additional uncertainty, so instead of comparing to a standard normal distribution, we compare to a \(t\)-distribution.
We can use Proposition 48.2 to invert the \(t\)-test to obtain a confidence interval when \(\sigma^2\) is unknown. The result is called a \(t\)-interval.
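As a sketch, the \(t\)-interval replaces \(\sigma\) with the sample standard deviation \(S\) and the normal quantile with a \(t\) quantile on \(n - 1\) degrees of freedom. The function name below is ours, and the sample passed to it is hypothetical, for illustration only.

```python
import numpy as np
from scipy.stats import t

def t_interval(data, alpha=0.05):
    """(1 - alpha) confidence interval for the mean when sigma is unknown."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    xbar = data.mean()
    s = data.std(ddof=1)                    # sample standard deviation
    crit = t.ppf(1 - alpha / 2, df=n - 1)   # t quantile, n - 1 degrees of freedom
    margin = crit * s / np.sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical measurements, for illustration only
lo, hi = t_interval([10.1, 9.9, 10.3, 10.0, 9.8])
```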
Armed with the \(t\)-interval, we can calculate a 95% confidence interval for the average human body temperature.
In the examples that we have encountered so far, we started with a function of the data and the parameter, \[ g(\vec X; \theta), \tag{48.3}\] whose distribution is known and does not depend on \(\theta\). This is called a pivot (or pivotal quantity). In the examples above, the pivots were
- \(Z = \frac{\bar X - \mu}{\sqrt{\frac{\sigma^2}{n}}}\) follows a standard normal distribution.
- \(T = \frac{\bar X - \mu}{\sqrt{\frac{S^2}{n}}}\) follows a \(t\)-distribution with \(n-1\) degrees of freedom.
To obtain a confidence interval for \(\theta\), we used the quantiles of the pivot’s known distribution and rearranged the inequality.
48.4 Asymptotic Confidence Intervals
If \(X_1, \dots, X_n\) are not i.i.d. normal, then it is not easy to obtain a pivot. In these situations, it may not be possible to obtain a confidence interval with exactly 95% probability of covering \(\mu\). However, we can still obtain an interval with approximate 95% coverage when \(n\) is large, thanks to the Central Limit Theorem.
These asymptotic confidence intervals are known as Wald intervals. We now return to the skew die example that we encountered at the beginning of our exploration of statistical inference in Chapter 29 and construct a Wald interval for the probability \(p\) of rolling a six.
The Wald interval is only guaranteed to have 95% coverage as \(n\to\infty\). How good is the interval for \(n=25\)? We can answer this question by simulation. To do this, we have to assume a particular value of \(p\).
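The simulation can be sketched as follows, assuming a true value of \(p = 0.12\) and \(n = 25\) rolls per sample:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
p, n = 0.12, 25            # assumed true proportion and sample size
z = norm.ppf(0.975)

trials = 10_000
covered = 0
for _ in range(trials):
    p_hat = rng.binomial(n, p) / n
    # Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)
    margin = z * np.sqrt(p_hat * (1 - p_hat) / n)
    covered += (p_hat - margin <= p <= p_hat + margin)

print(covered / trials)  # roughly 0.81, well short of the nominal 0.95
```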
Shockingly, a 95% Wald interval only covers \(p = 0.12\) about 81% of the time. Notice in particular that when \(\hat p = 0\), the Wald interval is \([0, 0]\), a zero-width interval that claims complete certainty. This example illustrates the dangers of relying on asymptotic results.
We can obtain an interval with better coverage by returning to first principles (Proposition 48.2). We start by deriving a test of \(H_0: p = p_0\). Then, we invert this test to obtain a confidence interval.
This is called the Wilson interval for the binomial proportion. Comparing it with the Wald interval from Example 48.3, we see that:
- the Wilson interval is scaled by \(\frac{1}{1 + \frac{\Phi^{-1}(0.975)^2}{n}}\),
- the Wilson interval is centered around \(\hat p + \frac{\Phi^{-1}(0.975)^2}{2n}\) instead of \(\hat p\), and
- the Wilson interval estimates \(\text{Var}\!\left[ \hat p \right]\) by \(\frac{\hat p (1 - \hat p)}{n} + \frac{\Phi^{-1}(0.975)^2}{4n^2}\) instead of \(\frac{\hat p (1 - \hat p)}{n}\).
All of these adjustments become negligible as \(n\to\infty\), which makes sense because the Wald interval has asymptotic 95% coverage. Nevertheless, these small adjustments improve the coverage dramatically for finite \(n\), as the simulation below demonstrates.
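A sketch of the Wilson coverage simulation, under the same assumed \(p = 0.12\) and \(n = 25\) as before (the helper function name is ours):

```python
import numpy as np
from scipy.stats import norm

def wilson_interval(p_hat, n, alpha=0.05):
    """(1 - alpha) Wilson interval for a binomial proportion."""
    z = norm.ppf(1 - alpha / 2)
    center = p_hat + z**2 / (2 * n)                                 # shifted center
    margin = z * np.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    scale = 1 / (1 + z**2 / n)                                      # shrinkage factor
    return scale * (center - margin), scale * (center + margin)

rng = np.random.default_rng(0)
p, n = 0.12, 25

trials = 10_000
covered = 0
for _ in range(trials):
    lo, hi = wilson_interval(rng.binomial(n, p) / n, n)
    covered += (lo <= p <= hi)

print(covered / trials)  # much closer to 0.95 than the Wald interval's ~0.81
```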
Whereas the coverage of the Wald interval was 81%, the coverage of the Wilson interval is close to 95%! Furthermore, unlike the Wald interval, the Wilson interval gives a sensible answer when \(\hat p = 0\): \[ \left[0, \frac{\Phi^{-1}(0.975)^2}{n + \Phi^{-1}(0.975)^2}\right], \] which results in the interval \([0, 0.133]\) for the example above.
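The \(\hat p = 0\) endpoint can be checked directly:

```python
from scipy.stats import norm

z = norm.ppf(0.975)            # ~1.96
n = 25
upper = z**2 / (n + z**2)      # Wilson upper endpoint when p_hat = 0
print(round(upper, 3))  # 0.133
```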
The Wilson interval is still asymptotic; it relies on the Central Limit Theorem and is only guaranteed to have 95% coverage as \(n \to \infty\). However, because it was derived by inverting a hypothesis test that used the exact variance \(\sigma^2 = p_0 (1 - p_0)\) instead of an estimate, it tends to perform much better than the Wald interval for smaller values of \(n\).
48.5 Exercises
Exercise 48.1 (Confidence interval for an exponential mean) Let \(X_1, \dots, X_n\) be i.i.d. \(\text{Exponential}(\lambda)\). Consider forming a confidence interval for \(\mu \overset{\text{def}}{=}1/\lambda\).
- Form a 95% Wald interval for \(\mu\).
- Form a 95% confidence interval for \(\mu\) by deriving a test of \(H_0: \mu = \mu_0\) and inverting the test.
- Compare the coverage of the two intervals using simulation.