32 Law of Large Numbers
In the previous part, we learned about estimation theory; in particular, we learned how to find the MLE (maximum likelihood estimator) of an unknown parameter. Additionally, we learned how to compute bias and variance of an estimator; we concluded that the MSE (mean squared error) of an estimator tells us how good it is.
When computing MLEs, we ran across \(\bar{X}\) quite often; in this part, we will be inspecting the various properties of the sample sum and the sample mean.
32.1 Recap
In Proposition 30.1, we saw that, if \(X_1, \dots, X_n\) are i.i.d. with \(\text{E}\!\left[ X_1 \right] = \mu\), then \(\bar{X}\) is an unbiased estimator of \(\mu\); i.e., \[ \text{E}\!\left[ \bar{X} \right] = \mu. \] We can similarly compute the variance of \(\bar{X}\) quite easily.
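The variance computation uses independence: the variance of a sum of independent random variables is the sum of the variances. Assuming additionally that \(\text{Var}\!\left[ X_1 \right] = \sigma^2\),

```latex
\begin{align*}
\text{Var}\!\left[ \bar{X} \right]
  &= \text{Var}\!\left[ \frac{1}{n} \sum_{i=1}^{n} X_i \right]
   = \frac{1}{n^2} \, \text{Var}\!\left[ \sum_{i=1}^{n} X_i \right] \\
  &= \frac{1}{n^2} \sum_{i=1}^{n} \text{Var}\!\left[ X_i \right]
   \qquad \text{(independence)} \\
  &= \frac{1}{n^2} \cdot n \sigma^2
   = \frac{\sigma^2}{n}.
\end{align*}
```

Note that the variance of \(\bar{X}\) shrinks as \(n\) grows, which drives the concentration results in the rest of this chapter.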
32.2 Distribution/consistency of \(\bar{X}\)
We know that, for \(X_1, \dots, X_n\) i.i.d. with \(\text{E}\!\left[ X_1 \right] = \mu\) and \(\text{Var}\!\left[ X_1 \right] = \sigma^2\), \[ \text{E}\!\left[ \bar{X} \right] = \mu \qquad \text{and} \qquad \text{Var}\!\left[ \bar{X} \right] = \frac{\sigma^2}{n}. \] What does the distribution of \(\bar{X}\) look like? We can consider the following two examples.
These examples use a small sample size from two different distributions, but as we increase \(n\), we observe that
- the distribution of the sample mean seems to take on the bell curve; and
- the distribution of the sample mean tends to concentrate around 5, the mean for both distributions.
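The concentration of the sample mean can be checked by simulation. A minimal sketch, assuming (hypothetically) that the two distributions are \(\text{Uniform}(0,10)\) and Exponential with mean \(5\), both of which have population mean \(5\):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_means(draw, n, reps=10_000):
    """Simulate `reps` sample means, each computed from a sample of size n."""
    return draw((reps, n)).mean(axis=1)

# Both distributions have population mean 5; the spread of the
# simulated sample means shrinks as n increases.
for n in [5, 50, 500]:
    unif = sample_means(lambda size: rng.uniform(0, 10, size), n)
    expo = sample_means(lambda size: rng.exponential(5, size), n)
    print(f"n={n:4d}  Uniform(0,10) sd={unif.std():.3f}  "
          f"Exponential(mean 5) sd={expo.std():.3f}")
```

The standard deviation of the simulated sample means matches \(\sigma/\sqrt{n}\), so the histograms of \(\bar{X}\) tighten around \(5\) as \(n\) grows, regardless of which of the two distributions generated the data.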
We will focus on the second point in this section; we will come back to the first point in Chapter 36.
How can we mathematically express “the sample mean tends to concentrate around the population mean”? One way would be to say that it is extremely unlikely for the distance between \(\bar{X}\) and \(\mu\) to be large; i.e., \[ P( \lvert \bar{X} - \mu \rvert > \varepsilon ) \approx 0 \] as \(n\) gets larger, for any \(\varepsilon > 0\). In fact, we say that \(\bar{X}\) converges to \(\mu\) in probability if \[ P( \lvert \bar{X} - \mu \rvert > \varepsilon ) \to 0 \] as \(n \to \infty\), for any \(\varepsilon > 0\). We denote this as \(\bar{X} \stackrel{p}{\to} \mu\).
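The probability \(P( \lvert \bar{X} - \mu \rvert > \varepsilon )\) can be estimated by Monte Carlo simulation. A sketch, assuming (as a hypothetical example) Exponential data with mean \(\mu = 5\) and \(\varepsilon = 0.5\):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, eps, reps = 5.0, 0.5, 20_000

# Estimate P(|Xbar - mu| > eps) for increasing n: the fraction of
# simulated sample means that land farther than eps from mu.
for n in [10, 100, 1000]:
    xbar = rng.exponential(mu, size=(reps, n)).mean(axis=1)
    prob = np.mean(np.abs(xbar - mu) > eps)
    print(f"n={n:5d}  P(|Xbar - mu| > {eps}) is approx {prob:.4f}")
```

The estimated probability shrinks toward \(0\) as \(n\) grows, which is exactly the convergence-in-probability statement above for this particular distribution and \(\varepsilon\).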
It turns out that the Weak Law of Large Numbers states that this holds for any underlying distribution of \(X_1, \dots, X_n\), as long as they are i.i.d. with a finite mean! In statistics, we say that \(\bar{X}\) is a consistent estimator of \(\mu\).
32.3 Markov’s and Chebyshev’s inequalities
Before proving the Weak Law of Large Numbers, we need two inequalities.

Markov’s inequality states that, for any non-negative random variable \(X\) and any \(a > 0\), \[ P(X \geq a) \leq \frac{\text{E}\!\left[ X \right]}{a}. \]

Chebyshev’s inequality states that, for any random variable \(X\) with mean \(\mu\) and variance \(\sigma^2\), and any \(a > 0\), \[ P( \lvert X - \mu \rvert \geq a ) \leq \frac{\sigma^2}{a^2}. \]

Chebyshev’s inequality follows immediately from Markov’s inequality, applied to the non-negative random variable \((X - \mu)^2\): \[ P( \lvert X - \mu \rvert \geq a ) = P\!\left( (X - \mu)^2 \geq a^2 \right) \leq \frac{\text{E}\!\left[ (X - \mu)^2 \right]}{a^2} = \frac{\sigma^2}{a^2}. \]
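Both inequalities can be sanity-checked numerically. A sketch, using an Exponential variable with rate \(1\) (so \(\text{E}\!\left[ X \right] = \text{Var}\!\left[ X \right] = 1\)) as a hypothetical example:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(1.0, size=1_000_000)  # mean = 1, variance = 1

a = 3.0
p_markov = np.mean(x >= a)              # P(X >= a)
p_cheb = np.mean(np.abs(x - 1.0) >= a)  # P(|X - mu| >= a)

# Each empirical tail probability should sit below its bound.
print(f"P(X >= {a}) = {p_markov:.4f}  <=  E[X]/a = {1/a:.4f}")
print(f"P(|X - mu| >= {a}) = {p_cheb:.4f}  <=  Var/a^2 = {1/a**2:.4f}")
```

Both bounds hold, though with plenty of slack: Markov’s and Chebyshev’s inequalities are worst-case bounds over all distributions with the given mean and variance, so for any specific distribution they are usually far from tight.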
32.4 Weak Law of Large Numbers
We are finally ready to prove the main result of this chapter!
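Under the additional assumption that \(\text{Var}\!\left[ X_1 \right] = \sigma^2 < \infty\), the proof is a direct application of Chebyshev’s inequality to \(\bar{X}\), which has mean \(\mu\) and variance \(\sigma^2 / n\):

```latex
% Weak Law of Large Numbers, assuming finite variance:
% Chebyshev's inequality applied to the sample mean.
\[
  P\left( \lvert \bar{X} - \mu \rvert > \varepsilon \right)
    \leq \frac{\text{Var}\!\left[ \bar{X} \right]}{\varepsilon^2}
    = \frac{\sigma^2}{n \varepsilon^2}
    \longrightarrow 0
  \qquad \text{as } n \to \infty,
\]
```

for any \(\varepsilon > 0\); i.e., \(\bar{X} \stackrel{p}{\to} \mu\). (The result remains true assuming only a finite mean, but that proof requires more delicate tools.)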
One consequence of the Weak Law of Large Numbers is the consistency of \(\bar{X}\): the larger the sample size, the more likely the sample mean is to be close to the population mean.
32.5 Exercises
Exercise 32.1 Explain why the non-negativity is necessary in Markov’s inequality. In other words, come up with a random variable \(X\) and \(a > 0\) such that \[ P(X \geq a) > \frac{\text{E}\!\left[ X \right]}{a}. \]
Exercise 32.2 We want to estimate the proportion \(p\) of the Stanford population who are left-handed. We sample \(n\) people and find that \(k\) of them are left-handed. Show that \[ P\left( \left\lvert \frac{k}{n} - p \right\rvert > a \right) \leq \frac{1}{4na^2} \] for any \(a > 0\).
Exercise 32.3 Let \(X_1, \dots, X_n\) be i.i.d. \(\text{Normal}(\mu,\sigma^2)\). For what values of \(n\) are we at least \(99\%\) certain that \(\bar{X}\) is within \(2\) standard deviations of the population mean \(\mu\)? How about \(99.9\%\) certain?
Exercise 32.4 Let \(X \sim \text{Uniform}(0,8)\) and consider \(P(X \geq 4)\). Give upper bounds on this probability via Markov’s inequality and Chebyshev’s inequality; compare the results with the actual probability.
Exercise 32.5
- Let \(X\) be a discrete random variable taking values \(1, 2, \dots\). If \(P(X = k)\) is non-increasing in \(k\), show that \[ P(X = k) \leq \frac{2 \text{E}\!\left[ X \right]}{k^2}. \]
- Let \(X \sim \text{Geometric}(p)\). Use the previous part to give an upper bound for \(\displaystyle \frac{k^2}{2^k}\) for all positive integers \(k\). What is the actual maximum?
Exercise 32.6
- Let \(X\) be a non-negative continuous random variable with a non-increasing density function \(f_X(x)\). Show that \[ f_X(x) \leq \frac{2 \text{E}\!\left[ X \right]}{x^2} \] for all \(x > 0\).
- Let \(X \sim \text{Exponential}(\lambda)\). Use the previous part to give an upper bound for \(\displaystyle \frac{\lambda^2}{e^\lambda}\) for all positive \(\lambda\). What is the actual maximum?