36  Central Limit Theorem

In this section, we will be looking at \(X_1, \dots, X_n\) i.i.d. with \(\text{E}\!\left[ X_i \right] = \mu\) and \(\text{Var}\!\left[ X_i \right] = \sigma^2\).

In Example 32.1 and Example 32.2, we saw that as \(n\) (the sample size) became very large, \[ \bar{X} = \frac{X_1 + \cdots + X_n}{n} \] becomes concentrated at the mean. This phenomenon holds for any i.i.d. \(X_1, \dots, X_n\), no matter the distribution of \(X_i\); this was proved in Theorem 32.1.

For values of \(n\) between 10 and 50, \(\bar{X}\) takes on a normal shape before “collapsing” onto the mean. What happens when we look at the distribution of \(X_1 + \cdots + X_n\)?

36.1 Sample sum and normalization

Consider the following sample sums from exponential and Poisson distributions.

Example 36.1 (Sample sum of exponential) Let \(X_1, \dots, X_n\) be i.i.d. \(\text{Exponential}(\lambda)\). We run \(N\) simulations where we take \(n\) samples \(X_1, \dots, X_n\); we then plot the results.
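The interactive simulation itself is not reproduced here; below is a minimal sketch of what it might look like in Python, assuming numpy and matplotlib are available. The values \(\lambda = 1\), \(n = 10\), and \(N = 10{,}000\) are placeholders, not values from the original applet.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

lam = 1.0    # rate of the exponential (placeholder)
n = 10       # sample size; try 10, 100, 1000, and 10000
N = 10_000   # number of simulations (placeholder)

# Each row is one simulation of n i.i.d. Exponential(lam) draws;
# summing across columns gives N realizations of X_1 + ... + X_n.
sums = rng.exponential(scale=1 / lam, size=(N, n)).sum(axis=1)

plt.hist(sums, bins=60, density=True)
plt.xlabel("sample sum")
plt.title(f"Sample sum of Exponential({lam}), n = {n}")
plt.show()
```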

Try changing the value of \(n\) to 10, 100, 1000, and 10000.

Example 36.2 (Sample sum of Poisson) Let \(X_1, \dots, X_n\) be i.i.d. \(\text{Poisson}(\lambda)\). We run \(N\) simulations where we take \(n\) samples \(X_1, \dots, X_n\); we then plot the results.
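A sketch under the same assumptions as the exponential example, with the sampler swapped for numpy's Poisson generator (\(\lambda = 1\) as a placeholder):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n, N = 10, 10_000   # try n = 10, 100, 1000, and 10000

# N realizations of X_1 + ... + X_n with X_i ~ Poisson(1)
sums = rng.poisson(lam=1.0, size=(N, n)).sum(axis=1)

plt.hist(sums, bins=60, density=True)
plt.title(f"Sample sum of Poisson(1), n = {n}")
plt.show()
```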

Try changing the value of \(n\) to 10, 100, 1000, and 10000.

In both Example 36.1 and Example 36.2, as we increase \(n\), we observe that the sample sum seems to take on a normal shape (despite the two different starting shapes). We also note that the bell curves shift to the right and their spread grows.

However, this is expected as \[ \text{E}\!\left[ X_1 + \cdots + X_n \right] = n \text{E}\!\left[ X_1 \right] \qquad \text{and} \qquad \text{Var}\!\left[ X_1 + \cdots + X_n \right] = n \text{Var}\!\left[ X_1 \right]. \] As \(n\) grows larger, both the expected value and the variance grow as well.

In fact, this gives an insight into why the Weak Law of Large Numbers (Theorem 32.1) holds: since \[ \text{Var}\!\left[ \bar{X} \right] = \frac{\text{Var}\!\left[ X_1 \right]}{n}, \] \(\text{Var}\!\left[ \bar{X} \right]\) diminishes to \(0\) as \(n \to \infty\), which is exactly the concentration phenomenon!

So, how can we “preserve” the normal shape? We want the mean to stay put and the variance to stay constant as \(n\) grows, rather than increase or decrease. We can tackle each issue separately.

We can center the mean at \(0\) by subtracting the expected value; i.e., \(X_i - \mu\) has mean 0, so \[ \sum_{i=1}^n (X_i - \mu) = \sum_{i=1}^n X_i - n \mu \] will also have mean 0. Now, we do not have to worry about the mean moving to the right as \(n\) gets larger. A nice bonus is that the variance is unchanged: \[ \text{Var}\!\left[ \sum_{i=1}^n (X_i - \mu) \right] = \text{Var}\!\left[ \sum_{i=1}^n X_i - n \mu \right] = \text{Var}\!\left[ \sum_{i=1}^n X_i \right] = n \sigma^2. \] Now, if we divide by \(\sqrt{n \sigma^2} = \sqrt{n} \sigma\), then the resulting random variable will have variance \(1\). We are now considering \[ \frac{\displaystyle \sum_{i=1}^n (X_i - \mu)}{\sqrt{n} \sigma} \] which has mean \(0\) and variance \(1\). This is called the normalized sum of \(X_i\)s. Let us revisit the exponential and Poisson distributions with the normalized sums.

Example 36.3 (Normalized sum of exponential) Let \(X_1, \dots, X_n\) be i.i.d. \(\text{Exponential}(\lambda)\). We run \(N\) simulations where we take \(n\) samples \(X_1, \dots, X_n\); we then plot the normalized sum.
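A sketch of the normalized-sum version, assuming numpy, matplotlib, and scipy, and using the fact that \(\text{Exponential}(\lambda)\) has \(\mu = \sigma = 1/\lambda\); the standard normal density is overlaid for comparison.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
lam, n, N = 1.0, 10, 10_000    # try n = 10, 100, 1000, and 10000
mu, sigma = 1 / lam, 1 / lam   # mean and sd of Exponential(lam)

sums = rng.exponential(scale=1 / lam, size=(N, n)).sum(axis=1)
z = (sums - n * mu) / (np.sqrt(n) * sigma)   # normalized sums

plt.hist(z, bins=60, density=True, label="normalized sum")
grid = np.linspace(-4, 4, 200)
plt.plot(grid, stats.norm.pdf(grid), label="standard normal")
plt.legend()
plt.show()
```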

Try changing the value of \(n\) to 10, 100, 1000, and 10000.

Example 36.4 (Normalized sum of Poisson) Let \(X_1, \dots, X_n\) be i.i.d. \(\text{Poisson}(\lambda)\). We run \(N\) simulations where we take \(n\) samples \(X_1, \dots, X_n\); we then plot the normalized sum.
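A sketch under the same assumptions, using \(\mu = \lambda\) and \(\sigma = \sqrt{\lambda}\) for \(\text{Poisson}(\lambda)\):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
lam, n, N = 1.0, 10, 10_000     # try n = 10, 100, 1000, and 10000
mu, sigma = lam, np.sqrt(lam)   # Poisson(lam) has mean lam, sd sqrt(lam)

sums = rng.poisson(lam=lam, size=(N, n)).sum(axis=1)
z = (sums - n * mu) / (np.sqrt(n) * sigma)   # normalized sums

plt.hist(z, bins=60, density=True)
plt.title(f"Normalized sum of Poisson({lam}), n = {n}")
plt.show()
```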

Try changing the value of \(n\) to 10, 100, 1000, and 10000.

The plots in Example 36.3 and Example 36.4 look like the standard normal as \(n\) gets larger! Why? We will prove it in the next section.

36.2 Central Limit Theorem

We can show that the above phenomena are not coincidences. In fact, no matter what the distribution is, if \(X_1, \dots, X_n\) are i.i.d., then the normalized sum is guaranteed to converge in distribution to the standard normal! To prove this, we need the following definition and theorem.

Definition 36.1 Let \(X_1, X_2, \dots\) be a sequence of random variables, and let \(X\) be another random variable. We say that \(X_n\) converges in distribution to \(X\), denoted \(X_n \stackrel{d}{\to} X\), if \[ F_{X_n}(t) \to F_X(t) \] as \(n \to \infty\), for all \(t\) at which \(F_X\) is continuous.

Theorem 36.1 (Curtiss’s theorem) Let \(X_1, X_2, \dots\) be a sequence of random variables with respective CDFs \(F_{X_n}\) and MGFs \(M_{X_n}\). Let \(X\) be a random variable with CDF \(F_X\) and MGF \(M_X\). If \[ M_{X_n}(t) \to M_X(t) \] for all \(t\), then \[ F_{X_n}(t) \to F_X(t) \] for all \(t\) at which \(F_X\) is continuous. In other words, \(X_n \stackrel{d}{\to} X\).

Now we are ready to prove the main result.

Theorem 36.2 (Central Limit Theorem) Let \(X_1, \dots, X_n\) be i.i.d. with mean \(\mu\) and variance \(\sigma^2\). Then, \[ \frac{X_1 + \cdots + X_n - \mu n}{\sigma \sqrt{n}} \] converges to the standard normal in distribution.

Proof

Let us first prove the result for the case \(\mu = 0\) and \(\sigma = 1\). The MGF of \(X_i/\sqrt{n}\) is \[ M_{X_i/\sqrt{n}}(t) = \text{E}\!\left[ \exp\left\{ t \frac{X_i}{\sqrt{n}} \right\} \right] = M_{X_i} \left( \frac{t}{\sqrt{n}} \right). \] Then, since the \(X_i\) are independent, the MGF of the normalized sum \((X_1 + \cdots + X_n)/\sqrt{n}\) is \[ M(t) = \prod_{i=1}^n M_{X_i/\sqrt{n}}(t) = \left( M_{X_1}\left( \frac{t}{\sqrt{n}} \right) \right)^n. \] Define \(L(t) = \log M_{X_1}(t)\). Our goal is to show that \[ M(t) = \left( M_{X_1} \left( \frac{t}{\sqrt{n}} \right) \right)^n \to e^{t^2/2} \] for all \(t\), since \(e^{t^2/2}\) is the MGF of the standard normal. This is equivalent to showing that \[ \log M(t) = n \log M_{X_1}\left( \frac{t}{\sqrt{n}} \right) = n L\left( \frac{t}{\sqrt{n}} \right) \to \frac{t^2}{2} \] for all \(t\). We now note that \[\begin{align*} L(0) &= \log M_{X_1}(0) = \log 1 = 0. \\ L'(0) &= \frac{M_{X_1}'(0)}{M_{X_1}(0)} = \frac{\mu}{1} = \mu = 0. \\ L''(0) &= \frac{M_{X_1}(0)M_{X_1}''(0) - (M_{X_1}'(0))^2}{(M_{X_1}(0))^2} = \frac{\text{Var}\!\left[ X_1 \right]}{1} = \sigma^2 = 1. \end{align*}\] Then, \[\begin{align*} \lim_{n \to \infty} n L\left( \frac{t}{\sqrt{n}} \right) &= \lim_{n \to \infty} \frac{L(t/\sqrt{n})}{1/n} \\ &= \lim_{n \to \infty} \frac{-L'(t/\sqrt{n}) n^{-3/2} t}{-2n^{-2}} \qquad \qquad \text{(L'Hôpital)} \\ &= \lim_{n \to \infty} \frac{L'(t/\sqrt{n}) t}{2/\sqrt{n}} \\ &= \lim_{n \to \infty} \frac{-L''(t/\sqrt{n}) n^{-3/2} t^2}{-2n^{-3/2}} \qquad \qquad \text{(L'Hôpital again)} \\ &= \lim_{n \to \infty} L''\left( \frac{t}{\sqrt{n}} \right) \frac{t^2}{2} \\ &= \frac{t^2}{2}, \end{align*}\] where the last step uses the continuity of \(L''\) at \(0\). Since \(e^{t^2/2}\) is the MGF of the standard normal, Curtiss’s theorem (Theorem 36.1) shows that \((X_1 + \cdots + X_n)/\sqrt{n}\) converges to the standard normal in distribution. Thus, the central limit theorem is proved for the \(\mu = 0\) and \(\sigma^2 = 1\) case.

For the general case, let \(X_1, \dots, X_n\) be i.i.d. with \(\text{E}\!\left[ X_i \right] = \mu\) and \(\text{Var}\!\left[ X_i \right] = \sigma^2\), and consider \[ Y_i = \frac{X_i - \mu}{\sigma}. \] Since \(Y_1, \dots, Y_n\) are i.i.d. with mean \(0\) and variance \(1\), the above result shows that \[ \frac{Y_1 + \cdots + Y_n}{\sqrt{n}} = \frac{1}{\sqrt{n}} \sum_{i=1}^n \frac{X_i - \mu}{\sigma} = \frac{X_1 + \cdots + X_n - n\mu}{\sqrt{n} \sigma} \] converges to the standard normal in distribution.
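As a sanity check on the key limit in the proof, we can verify \(n L(t/\sqrt{n}) \to t^2/2\) symbolically for one concrete mean-\(0\), variance-\(1\) distribution. Here is a minimal sketch assuming sympy is available, taking \(X_1 = \pm 1\) with probability \(1/2\) each, so that \(M_{X_1}(t) = \cosh t\):

```python
import sympy as sp

# The limit is even in t, so assuming t > 0 loses nothing.
t = sp.symbols("t", positive=True)
n = sp.symbols("n", positive=True)

# X_1 = +/-1 with probability 1/2 each has mean 0, variance 1,
# and MGF M(t) = cosh(t), so L(t) = log(cosh(t)).
L = lambda u: sp.log(sp.cosh(u))

# The proof claims n * L(t / sqrt(n)) -> t^2 / 2 as n -> oo.
print(sp.limit(n * L(t / sp.sqrt(n)), n, sp.oo))   # expect: t**2/2
```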

36.3 Normal Approximation

Now that we have the central limit theorem, we can use it to approximate certain probabilities.

Example 36.5 (Poisson approximation) The number of students who enroll in Stats 118 is a Poisson random variable with mean 90. The department has decided that if the enrollment is 80 or more, there will be two lectures, whereas if fewer than 80 students enroll, there will be one lecture. What is the probability that there will be two lectures?

Let \(X \sim \text{Poisson}(90)\). We actually know the exact probability \[ P(X \geq 80) = \sum_{k=80}^\infty e^{-90} \frac{90^k}{k!} \approx 0.86682. \] However, this sum is not easy to compute, even with a calculator. Let us use the central limit theorem instead.

Recall that we can view \[ X = Y_1 + \cdots + Y_{90}, \] where \(Y_i \sim \text{Poisson}(1)\) are independent. Then, by Theorem 36.2, \[\begin{align*} P(X \geq 80) &\approx P(X \geq 79.5) \qquad \qquad \text{(continuity correction)} \\ &= P\left( \frac{X - 90}{\sqrt{90}} \geq \frac{79.5 - 90}{\sqrt{90}} \right) \\ &\approx P(Z \geq -1.1068) \qquad \qquad \text{(CLT)} \\ &= 1 - \Phi(-1.1068) \\ &\approx 0.86581. \end{align*}\]
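Both the exact tail probability and the normal approximation are short computations with scipy; a sketch, assuming scipy is available:

```python
import numpy as np
from scipy import stats

mu = 90
# Exact: P(X >= 80) = 1 - P(X <= 79) for X ~ Poisson(90)
exact = 1 - stats.poisson.cdf(79, mu)

# Normal approximation with continuity correction
approx = 1 - stats.norm.cdf((79.5 - mu) / np.sqrt(mu))

print(exact, approx)   # roughly 0.86682 and 0.86581
```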

Example 36.6 (Coin flipping) Suppose we flip a fair coin 100 times. What is the probability of getting between 50 and 60 heads, inclusive?

Let \(X\) be the number of heads; we can view \[ X = Y_1 + \cdots + Y_{100}, \] where \(Y_i \sim \text{Bernoulli}(0.5)\) are independent, so that \(\text{E}\!\left[ X \right] = 50\) and \(\text{SD}\!\left[ X \right] = \sqrt{100 \cdot 0.5 \cdot 0.5} = 5\). Again, we actually know the exact probability \[ P(50 \leq X \leq 60) = \sum_{k=50}^{60} \binom{100}{k} \left( \frac{1}{2} \right)^{100} \approx 0.52219. \]

By the central limit theorem, \[\begin{align*} P(50 \leq X \leq 60) &\approx P(49.5 \leq X \leq 60.5) \qquad \qquad \text{(continuity correction)} \\ &= P\left( \frac{49.5 - 50}{5} \leq \frac{X - 50}{5} \leq \frac{60.5 - 50}{5} \right) \\ &\approx P(-0.1 \leq Z \leq 2.1) \qquad \qquad \text{(CLT)} \\ &\approx 0.52196. \end{align*}\]
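The same check for the binomial case, again assuming scipy is available:

```python
from scipy import stats

n, p = 100, 0.5
# Exact: P(50 <= X <= 60) for X ~ Binomial(100, 0.5)
exact = stats.binom.cdf(60, n, p) - stats.binom.cdf(49, n, p)

# Normal approximation with continuity correction (mean 50, sd 5)
approx = stats.norm.cdf((60.5 - 50) / 5) - stats.norm.cdf((49.5 - 50) / 5)

print(exact, approx)   # roughly 0.52219 and 0.52196
```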

Example 36.7 (Rolling dice) Suppose we roll ten fair dice. What is the probability the sum is between 30 and 40, inclusive?

Let \(X_i\) be the roll of the \(i\)th die. Then, the sum is \[ X = X_1 + \cdots + X_{10}. \] We can show that \(\displaystyle \text{E}\!\left[ X_i \right] = \frac{7}{2}\) and \(\displaystyle \text{Var}\!\left[ X_i \right] = \frac{35}{12}\). Hence, \[ \text{E}\!\left[ X \right] = 10 \cdot \frac{7}{2} = 35, \] and \[ \text{Var}\!\left[ X \right] = 10 \cdot \frac{35}{12} = \frac{175}{6}. \] Thus, \[\begin{align*} P(30 \leq X \leq 40) &\approx P(29.5 \leq X \leq 40.5) \qquad \qquad \text{(continuity correction)} \\ &= P\left( \frac{29.5 - 35}{\sqrt{175/6}} \leq \frac{X - 35}{\sqrt{175/6}} \leq \frac{40.5 - 35}{\sqrt{175/6}} \right) \\ &\approx P(-1.0184 \leq Z \leq 1.0184) \\ &\approx 0.69151 \end{align*}\] by the central limit theorem.
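Here the exact distribution of the sum is also easy to obtain, by convolving the single-die pmf with itself ten times; a sketch assuming numpy and scipy:

```python
import numpy as np
from scipy import stats

# Exact pmf of the sum of 10 fair dice via repeated convolution.
die = np.full(6, 1 / 6)          # P(X_i = 1), ..., P(X_i = 6)
pmf = np.array([1.0])            # pmf of the empty sum
for _ in range(10):
    pmf = np.convolve(pmf, die)  # after 10 rounds, pmf[k] = P(X = k + 10)
exact = pmf[30 - 10 : 40 - 10 + 1].sum()

# Normal approximation with continuity correction (mean 35, sd sqrt(175/6))
sd = np.sqrt(175 / 6)
approx = stats.norm.cdf((40.5 - 35) / sd) - stats.norm.cdf((29.5 - 35) / sd)

print(exact, approx)   # approx is roughly 0.69151
```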