32  Law of Large Numbers

In the previous part, we learned about estimation theory; in particular, we learned how to find the MLE (maximum likelihood estimator) of an unknown parameter. Additionally, we learned how to compute bias and variance of an estimator; we concluded that the MSE (mean squared error) of an estimator tells us how good it is.

When computing MLEs, we ran across \(\bar{X}\) quite often; in this part, we will be inspecting the various properties of the sample sum and the sample mean.

32.1 Recap

In Proposition 30.1, we saw that, if \(X_1, \dots, X_n\) are i.i.d. with \(\text{E}\!\left[ X_1 \right] = \mu\), then \(\bar{X}\) is an unbiased estimator of \(\mu\); i.e., \[ \text{E}\!\left[ \bar{X} \right] = \mu. \] We can similarly compute the variance of \(\bar{X}\) quite easily.

Proposition 32.1 (Variance of \(\bar{X}\)) Let \(X_1, \dots, X_n\) be i.i.d. with \(\text{Var}\!\left[ X_1 \right] = \sigma^2\). Then, \[ \text{Var}\!\left[ \bar{X} \right] = \frac{\sigma^2}{n}. \]

Proof

Since \(X_1, \dots, X_n\) are independent, we see that \[\begin{align*} \text{Var}\!\left[ \bar{X} \right] &= \text{Var}\!\left[ \frac{1}{n} \sum_{i=1}^n X_i \right] \\ &= \frac{1}{n^2} \text{Var}\!\left[ \sum_{i=1}^n X_i \right] \\ &\stackrel{\text{ind}}{=} \frac{1}{n^2} \sum_{i=1}^n \text{Var}\!\left[ X_i \right] \\ &= \frac{1}{n^2} \sum_{i=1}^n \sigma^2 \\ &= \frac{1}{n^2} \cdot n \sigma^2 \\ &= \frac{\sigma^2}{n}. \end{align*}\]
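This identity can also be checked numerically. Below is a minimal sketch (assuming NumPy; the normal distribution, the variance \(\sigma^2 = 4\), and the replication counts are arbitrary choices) that draws many samples of size \(n\), computes the sample mean of each, and compares the empirical variance of those sample means to \(\sigma^2 / n\).

```python
import numpy as np

rng = np.random.default_rng(0)

n, N = 50, 100_000          # sample size and number of replications (arbitrary)
mu, sigma2 = 0.0, 4.0       # population mean and variance of each X_i

# Draw N independent samples of size n and compute the sample mean of each.
samples = rng.normal(loc=mu, scale=np.sqrt(sigma2), size=(N, n))
xbars = samples.mean(axis=1)

print("empirical E[X-bar]:", xbars.mean())       # close to mu = 0
print("empirical Var[X-bar]:", xbars.var())      # close to sigma^2 / n = 0.08
print("theoretical sigma^2 / n:", sigma2 / n)
```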

32.2 Distribution/consistency of \(\bar{X}\)

We know that, for \(X_1, \dots, X_n\) i.i.d. with \(\text{E}\!\left[ X_1 \right] = \mu\) and \(\text{Var}\!\left[ X_1 \right] = \sigma^2\), \[ \text{E}\!\left[ \bar{X} \right] = \mu \qquad \text{and} \qquad \text{Var}\!\left[ \bar{X} \right] = \frac{\sigma^2}{n}. \] What does the distribution of \(\bar{X}\) look like? We can consider the following two examples.

Example 32.1 (Sample mean of exponential) Let \(X_1, \dots, X_n\) be i.i.d. \(\text{Exponential}(\lambda)\). Then, the MLE for the mean \(\displaystyle \mu = \frac{1}{\lambda}\) is \(\hat{\mu}_{\text{MLE}} = \bar{X}\). We run \(N\) simulations where we take \(n\) samples \(X_1, \dots, X_n\); we then plot the results.

Try changing the value of \(n\) to 10, 100, 1000, and 10000.
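The interactive simulation itself is not reproduced here, but the sketch below performs the same experiment (assuming NumPy and Matplotlib, and taking \(\lambda = 0.2\) so that the mean \(1/\lambda = 5\) matches the value the sample means concentrate around below).

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

def simulate_sample_means(draw, n, N):
    """Run N simulations of n samples each and return the N sample means."""
    return draw(size=(N, n)).mean(axis=1)

lam = 0.2                    # rate, so the mean is 1/lambda = 5
n, N = 100, 10_000           # try n = 10, 100, 1000, 10000

# NumPy parameterizes the exponential by its mean, i.e., scale = 1/lambda.
xbars = simulate_sample_means(
    lambda size: rng.exponential(scale=1/lam, size=size), n, N
)

plt.hist(xbars, bins=50)
plt.xlabel(r"$\bar{X}$")
plt.title(f"Sample means of Exponential({lam}), n = {n}")
plt.show()
```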

Example 32.2 (Sample mean of Poisson) Let \(X_1, \dots, X_n\) be i.i.d. \(\text{Poisson}(\lambda)\). Then, the MLE for the mean \(\lambda\) is \(\hat{\lambda}_{\text{MLE}} = \bar{X}\). We run \(N\) simulations where we take \(n\) samples \(X_1, \dots, X_n\); we then plot the results.

Try changing the value of \(n\) to 10, 100, 1000, and 10000.
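Reusing the `simulate_sample_means` helper (and the `rng`, `n`, `N`, and plotting setup) from the exponential sketch above, only the sampling call changes; here \(\lambda = 5\), so the mean is again 5.

```python
lam = 5                      # Poisson mean (and variance)
xbars = simulate_sample_means(
    lambda size: rng.poisson(lam=lam, size=size), n, N
)

plt.hist(xbars, bins=50)
plt.xlabel(r"$\bar{X}$")
plt.title(f"Sample means of Poisson({lam}), n = {n}")
plt.show()
```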

We have only looked at two distributions, but as we increase \(n\), we observe that

  • the distribution of the sample mean seems to take on the shape of a bell curve; and
  • the distribution of the sample mean tends to concentrate around 5, the mean for both distributions.

We will focus on the second point in this section; we will come back to the first point in Chapter 36.

How can we mathematically write “the sample mean tends to concentrate around the population mean”? One way would be to say that it is extremely unlikely for the distance between \(\bar{X}\) and \(\mu\) to be large; i.e., \[ P( \lvert \bar{X} - \mu \rvert > \varepsilon ) \approx 0 \] when \(n\) is large, for any \(\varepsilon > 0\). In fact, we say that \(\bar{X}\) converges to \(\mu\) in probability if \[ P( \lvert \bar{X} - \mu \rvert > \varepsilon ) \to 0 \] as \(n \to \infty\), for any \(\varepsilon > 0\). We denote this as \(\bar{X} \stackrel{p}{\to} \mu\).
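This definition can be explored empirically: for a fixed \(\varepsilon\), the proportion of simulated sample means that land farther than \(\varepsilon\) from \(\mu\) should shrink as \(n\) grows. A small sketch, reusing the \(\text{Exponential}(0.2)\) example (so \(\mu = 5\); the choices \(\varepsilon = 0.5\) and \(N = 2000\) replications are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
lam, mu, eps, N = 0.2, 5.0, 0.5, 2_000

for n in [10, 100, 1000, 10000]:
    xbars = rng.exponential(scale=1/lam, size=(N, n)).mean(axis=1)
    # Empirical estimate of P(|X-bar - mu| > eps); it decreases toward 0 as n grows.
    print(n, np.mean(np.abs(xbars - mu) > eps))
```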

It turns out that the Weak Law of Large Numbers guarantees this holds for any underlying distribution of \(X_1, \dots, X_n\), as long as they are i.i.d. with finite mean and variance! In statistics, we say that \(\bar{X}\) is a consistent estimator of \(\mu\).

32.3 Markov’s and Chebyshev’s inequalities

Before proving the Weak Law of Large Numbers, we need two inequalities.

Proposition 32.2 (Markov’s inequality) If \(X\) is a nonnegative random variable, then \[ P(X \geq a) \leq \frac{\text{E}\!\left[ X \right]}{a} \] for any \(a > 0\).

Proof

For \(a > 0\), let \[ I_{X \geq a} = \begin{cases} 1, & X \geq a \\ 0, & X < a \end{cases}. \] In other words, \(I_{X \geq a}\) is the indicator of whether \(X \geq a\) or not. Then, \(\displaystyle I_{X \geq a} \leq \frac{X}{a}\): if \(X \geq a\), then \(X/a \geq 1 = I_{X \geq a}\), and if \(X < a\), then \(X/a \geq 0 = I_{X \geq a}\) since \(X\) is nonnegative. Hence, \[ \text{E}\!\left[ I_{X \geq a} \right] \leq \text{E}\!\left[ \frac{X}{a} \right] = \frac{\text{E}\!\left[ X \right]}{a}. \] However, \(\text{E}\!\left[ I_{X \geq a} \right] = P(X \geq a)\), and the result follows.
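As a quick numerical check of the inequality, the sketch below compares the empirical tail probability of a nonnegative random variable with the Markov bound \(\text{E}\!\left[ X \right]/a\) (the \(\text{Exponential}(1)\) distribution and the thresholds are arbitrary choices).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)   # nonnegative, E[X] = 1

for a in [0.5, 1.0, 2.0, 5.0]:
    # Empirical P(X >= a) versus the Markov bound E[X]/a.
    print(a, np.mean(x >= a), x.mean() / a)
```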

Chebyshev’s inequality follows immediately from Markov’s inequality.

Proposition 32.3 (Chebyshev’s inequality) If \(X\) is a random variable with mean \(\mu\) and variance \(\sigma^2\), then \[ P(\lvert X - \mu \rvert \geq k) \leq \frac{\sigma^2}{k^2} \] for any \(k > 0\).

Proof

Let \(k > 0\) be arbitrary. Since \((X - \mu)^2\) is a nonnegative random variable, we can use Proposition 32.2 to get \[ P( (X - \mu)^2 \geq k^2 ) \leq \frac{\text{E}\!\left[ (X - \mu)^2 \right]}{k^2}. \] Note that the events \(\left\{ (X - \mu)^2 \geq k^2 \right\}\) and \(\left\{ \lvert X - \mu \rvert \geq k \right\}\) are equivalent. Also noting that \(\text{E}\!\left[ (X - \mu)^2 \right] = \sigma^2\) yields the desired result \[ P( \lvert X - \mu \rvert \geq k ) \leq \frac{\sigma^2}{k^2}. \]
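Chebyshev's inequality can be checked in the same way (a sketch with a standard normal, so \(\mu = 0\) and \(\sigma^2 = 1\); the values of \(k\) are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)               # mu = 0, sigma^2 = 1

for k in [1.0, 2.0, 3.0]:
    # Empirical P(|X - mu| >= k) versus the Chebyshev bound sigma^2 / k^2.
    print(k, np.mean(np.abs(x) >= k), 1.0 / k**2)
```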

32.4 Weak Law of Large Numbers

We are finally ready to prove the main result of this chapter!

Theorem 32.1 (Weak Law of Large Numbers) Let \(X_1, \dots, X_n\) be i.i.d. with \(\text{E}\!\left[ X_1 \right] = \mu\) and \(\text{Var}\!\left[ X_1 \right] = \sigma^2\). Then, \[ P( \lvert \bar{X} - \mu \rvert > \varepsilon) \to 0 \qquad \text{as $n \to \infty$} \] for any \(\varepsilon > 0\). In other words, \(\bar{X}\) converges to \(\mu\) in probability.

Proof

Let \(\varepsilon > 0\) be arbitrary. Since \(\text{E}\!\left[ \bar{X} \right] = \mu\) and \(\text{Var}\!\left[ \bar{X} \right] = \sigma^2/n\), Proposition 32.3 gives \[ P( \lvert \bar{X} - \mu \rvert > \varepsilon ) \leq P( \lvert \bar{X} - \mu \rvert \geq \varepsilon ) \leq \frac{\sigma^2/n}{\varepsilon^2} = \frac{1}{n} \cdot \frac{\sigma^2}{\varepsilon^2}. \]

Hence, it follows that \[ P( \lvert \bar{X} - \mu \rvert > \varepsilon ) \to 0 \] as \(n \to \infty\).
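To get a concrete sense of the rate, take the exponential example from earlier with \(\lambda = 0.2\) (so \(\mu = 5\) and \(\sigma^2 = 1/\lambda^2 = 25\)) and \(\varepsilon = 0.5\); these particular values are just illustrative choices. The bound from the proof gives \[ P( \lvert \bar{X} - 5 \rvert > 0.5 ) \leq \frac{25}{n \cdot 0.25} = \frac{100}{n}, \] which is at most \(0.01\) once \(n \geq 10000\).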

One consequence of the Weak Law of Large Numbers is the consistency of \(\bar{X}\): the larger the sample size, the more likely the sample mean is to be close to the population mean.