35 Asymptotic Distributions
In Chapter 33 and Chapter 34, we learned some strategies for determining the exact distribution of a sum or a mean of i.i.d. random variables, but we also saw their limitations. In many situations, it is not feasible to determine the exact distribution, and the best we can do is to approximate the distribution. In general, these approximations will be valid when the sample size \(n\) is large.
In probability and statistics, the study of how random variables behave as \(n \to \infty\) is known as asymptotics. Asymptotics allow us to answer many questions that otherwise would be intractable. Because \(n\) is large in many modern applications, the asymptotic approximation is often very close to the exact answer.
This chapter lays the groundwork for asymptotics, defining precisely what it means for one distribution to approximate another when \(n\) is large. We will apply this theory to sums and means of i.i.d. random variables in Chapter 36, when we discuss the Central Limit Theorem.
35.1 Convergence in Distribution
Suppose we have random variables \(Y_1, Y_2, \dots\) with CDFs \(F_1(y), F_2(y), \dots\), respectively. (We work with the CDF instead of the PMF or PDF because CDFs are defined for both discrete and continuous random variables.) What does it mean to say that the distribution of \(Y_n\) can be approximated by \(F(y)\) as \(n\) gets large? The next definition provides an answer.
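To fix ideas, here is the standard formulation, which Definition 35.1 makes precise: we say that \(Y_n\) converges in distribution to a random variable \(Y\) with CDF \(F(y)\), written \(Y_n \stackrel{d}{\to} Y\), if \[ \lim_{n\to\infty} F_n(y) = F(y) \] at every point \(y\) where \(F\) is continuous. The restriction to continuity points of \(F\) matters when the limiting distribution is degenerate, as in the result below.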
Let’s apply Definition 35.1 to the case where \(Y_n = \bar X_n\), the sample mean of \(n\) i.i.d. random variables \(X_1, \dots, X_n\). In Theorem 28.2, we showed that \(\bar X_n\) converges in probability to \(\mu\), where \(\mu \overset{\text{def}}{=}\text{E}\!\left[ X_i \right]\). Now, we will show that \(\bar X_n\) also converges in distribution to \(\mu\).
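Here is a sketch of the argument (Theorem 35.1 states this formally). The constant \(\mu\) has CDF \(F(y) = 0\) for \(y < \mu\) and \(F(y) = 1\) for \(y \geq \mu\). For any \(y < \mu\), convergence in probability gives \[ F_n(y) = P(\bar X_n \leq y) \leq P(|\bar X_n - \mu| \geq \mu - y) \to 0 = F(y), \] and for any \(y > \mu\), \[ F_n(y) = P(\bar X_n \leq y) \geq 1 - P(|\bar X_n - \mu| > y - \mu) \to 1 = F(y). \] Therefore, \(F_n(y) \to F(y)\) at every \(y \neq \mu\), which is precisely the set of points where \(F\) is continuous, so \(\bar X_n \stackrel{d}{\to} \mu\).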
We now have two statements of the Law of Large Numbers: Theorem 28.2 says that \(\bar{X}_n\) converges in probability to \(\mu\), while Theorem 35.1 says that \(\bar{X}_n\) converges in distribution to \(\mu\). What is the difference? In general, the two modes of convergence are distinct. However, in the case where the limit is a constant, convergence in probability and convergence in distribution are equivalent.
Because convergence in probability and convergence in distribution are equivalent when the limit is a constant, Theorem 28.2 and Theorem 35.1 are one and the same.
The Law of Large Numbers provides assurance that the sample mean \(\bar X_n\) is a reasonable estimator of \(\mu\). Although we saw in Example 32.4 that \(\bar X_n\) may not necessarily be the estimator with the lowest MSE, \(\bar X_n\) will approach \(\mu\) as we collect more data. This property is known as consistency.
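In general, we say that an estimator \(\hat\theta_n\) is consistent for a parameter \(\theta\) if \(\hat\theta_n\) converges in probability to \(\theta\); that is, for every \(\epsilon > 0\), \[ P(|\hat\theta_n - \theta| > \epsilon) \to 0 \quad \text{as } n \to \infty. \] The Law of Large Numbers says exactly that \(\bar X_n\) is a consistent estimator of \(\mu\).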
35.2 Convergence in Distribution with MGFs
In the examples so far, the sequence of random variables \(Y_1, Y_2, \dots\) has converged in distribution to a constant. Convergence in distribution is more interesting when the limiting distribution is not degenerate.
For example, the code below shows the PMF of a \(\textrm{Poisson}(\mu=n)\) random variable \(X_n\). Try increasing \(n\)—what appears to be the limiting distribution?
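Here is a minimal R sketch of such code; the plotting range and the starting value of `n` are just illustrative choices.

```r
# PMF of a Poisson(mu = n) random variable.
# Try increasing n and re-running: the shape looks more and more bell-shaped,
# but the center (mean = n) and spread (variance = n) keep growing.
n <- 50
x <- 0:(2 * n)  # values of x that carry essentially all of the probability
plot(x, dpois(x, lambda = n),
     type = "h",
     xlab = "x", ylab = "P(X_n = x)",
     main = paste0("Poisson PMF, mu = ", n))
```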
Although \(X_n\) is discrete for all \(n\), this sequence of random variables appears to “converge” to a normal distribution, which is continuous! However, more work is needed to make this statement precise. Notice that the center of the distribution is drifting towards \(\infty\) as \(n\) increases. This is because the mean \(\text{E}\!\left[ X_n \right] = n\) is increasing. Notice also that the spread of the distribution increases as \(n\) increases. This is because the variance \(\text{Var}\!\left[ X_n \right] = n\) is also increasing. Clearly, \(X_n\) diverges as \(n\to\infty\).
In order to make the convergence statement precise, we standardize the random variables, \[ Y_n \overset{\text{def}}{=}\frac{X_n - \text{E}\!\left[ X_n \right]}{\sqrt{\text{Var}\!\left[ X_n \right]}} = \frac{X_n - n}{\sqrt{n}}, \] so that each \(Y_n\) has mean \(0\) and variance \(1\). Now, it is plausible that the sequence \(Y_n\) converges in distribution. We will show that \(Y_n \stackrel{d}{\to} \text{Normal}(0, 1)\).
It is virtually impossible to show this directly using Definition 35.1. This is because there is no simple expression for the CDF of the Poisson distribution: \[ F_n(y) \overset{\text{def}}{=}P(Y_n \leq y) = P(X_n \leq y \sqrt{n} + n) = \sum_{x=0}^{\lfloor y \sqrt{n} + n \rfloor} e^{-n} \frac{n^x}{x!}, \] so it is hopeless to find the limit of this expression as \(n\to\infty\).
However, recall from Chapter 34 that distributions can also be uniquely specified by their MGFs. The limit of the MGF is usually easier to find. The following result guarantees that if the MGF has a limit, then this limit is the MGF of the limiting distribution.
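In its standard form, the result is the following: suppose \(Y_n\) has MGF \(M_{Y_n}(t)\) and \(Y\) has MGF \(M_Y(t)\). If \[ \lim_{n\to\infty} M_{Y_n}(t) = M_Y(t) \] for all \(t\) in an open interval containing \(0\), then \(Y_n \stackrel{d}{\to} Y\). This is the content of Theorem 35.3.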
Theorem 35.3 was first proved by Paul Lévy for characteristic functions (Equation 34.4) and extended by John H. Curtiss (1942) to moment generating functions. Curtiss (1909-1977) was an American mathematician and an early advocate for the adoption of computers. He was one of the founders of the Association for Computing Machinery (ACM), which is the largest professional society for computer science today.
We now apply Theorem 35.3 to show that the \(\textrm{Poisson}(\mu=n)\) distribution converges to a normal distribution as \(n\to\infty\).
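The key calculation, sketched: since the MGF of \(X_n \sim \textrm{Poisson}(\mu=n)\) is \(M_{X_n}(t) = e^{n(e^t - 1)}\), the standardized variable \(Y_n = (X_n - n)/\sqrt{n}\) has MGF \[ M_{Y_n}(t) = e^{-t\sqrt{n}}\, M_{X_n}\!\left(\frac{t}{\sqrt{n}}\right) = \exp\!\left( n\left(e^{t/\sqrt{n}} - 1\right) - t\sqrt{n} \right). \] Expanding \(e^{t/\sqrt{n}} = 1 + \frac{t}{\sqrt{n}} + \frac{t^2}{2n} + O(n^{-3/2})\), the exponent becomes \[ n\left(e^{t/\sqrt{n}} - 1\right) - t\sqrt{n} = \frac{t^2}{2} + O(n^{-1/2}) \longrightarrow \frac{t^2}{2}, \] so \(M_{Y_n}(t) \to e^{t^2/2}\), which is the MGF of the standard normal distribution. By Theorem 35.3, \(Y_n \stackrel{d}{\to} \textrm{Normal}(0, 1)\).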
The upshot of Example 35.1 is that we can use the normal distribution to approximate probabilities for a Poisson distribution. We say that the Poisson distribution is asymptotically normal.
In the above example, it was not difficult to use R to obtain an exact probability (or at least one that is very close). Why settle for an approximation? Certainly the normal approximation was necessary in the age before computers, and it is still useful today for proving theoretical results. We will soon encounter problems where an approximation is the only feasible answer.
Far better an approximate answer to the right question, which is often vague, than the exact answer to the wrong question, which can always be made precise. — John Tukey
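To see the comparison in R (the numbers here are illustrative, not tied to any particular example in the text): for \(X_{100} \sim \textrm{Poisson}(\mu=100)\), we can compare the exact value of \(P(X_{100} \leq 110)\) with its normal approximation.

```r
# Exact probability P(X <= 110) when X ~ Poisson(mu = 100)
n <- 100
exact <- ppois(110, lambda = n)

# Normal approximation based on Y_n = (X_n - n) / sqrt(n) being
# approximately Normal(0, 1), with a continuity correction because
# the Poisson distribution is discrete.
approx <- pnorm((110 + 0.5 - n) / sqrt(n))

c(exact = exact, normal_approx = approx)
```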
35.3 Exercises
Exercise 35.1 (Consistency of the normal variance MLE when mean is known) Let \(X_1, \dots, X_n\) be i.i.d. \(\text{Normal}(\mu, \sigma^2)\), where \(\mu\) is known (but \(\sigma^2\) is not). Is the MLE that you derived in Exercise 31.4 consistent for \(\sigma^2\)?
Exercise 35.2 (Consistency of the uniform MLE) Is the MLE that you derived in Exercise 30.4 consistent for \(\theta\)?
Hint: You can obtain an explicit expression for \(P(|\hat\theta_n - \theta| > \epsilon)\).
Exercise 35.3 (Poisson approximation to the binomial via MGFs) In Theorem 12.1, we showed that the Poisson distribution was an approximation to the binomial distribution when \(n\) is large and \(p\) is small.
Let \(X_n \sim \text{Binomial}(n, p=\frac{\mu}{n})\). Use MGFs to find the limiting distribution as \(n \to\infty\).
Exercise 35.4 (Asymptotics for the geometric distribution) Let \(X \sim \text{Geometric}(p=\frac{1}{n})\). Find the limiting distribution of \(\frac{1}{n} X\) as \(n \to \infty\).