30  Bias of an Estimator

In this chapter, we will begin to discuss what makes a “good” estimator. We will see cases where the MLE is not good and learn strategies for improving upon the MLE.

30.1 German Tank Problem

During World War II, the Allied forces sought to estimate the production of German military equipment, particularly tanks, based on limited data. While intelligence reports provided some information, they were often incomplete or unreliable. Instead, the Allies used information from the German tanks that they captured.

[Photo from Bundesarchiv, Bild 183-H26258, distributed under a CC-BY-SA 3.0 DE license]

As it turns out, the Germans assigned sequential serial numbers to the tanks that they produced. For simplicity, we will assume that the first tank was assigned a serial number of 1, the second tank was assigned a serial number of 2, and so on. Let’s suppose that 10 tanks from one production line were captured, and they had the following serial numbers:

\[203, 194, 148, 241, 64, 142, 188, 100, 23, 153. \tag{30.1}\]

What should our estimate of \(N\), the total number of tanks, be? Let’s use the principle of maximum likelihood. To determine the likelihood \(L(N)\), we need the probability of observing the above sample.

  • First, \(L(N) = 0\) for any \(N < 241\) because if there were fewer than 241 tanks, then it would be impossible to observe a tank with serial number 241.
  • For \(N \geq 241\), the likelihood is \[ L(N) = \frac{1}{N (N-1) (N-2) \cdots (N - 9)}, \tag{30.2}\] since the tanks are sampled without replacement. (The same tank cannot be captured twice.) Note that we are assuming that every tank is equally likely to be captured.

Because Equation 30.2 only decreases as \(N\) increases, we should make \(N\) as small as possible to maximize the likelihood. However, it cannot be any smaller than \(241\) because then the likelihood would be zero. Therefore, the MLE is \(\hat N_{\textrm{MLE}} = 241\). The likelihood is graphed below.

Ns <- 230:260   # candidate values of N around the observed maximum

# Likelihood from Equation 30.2: zero for N < 241,
# otherwise 1 / (N (N-1) ... (N-9))
likelihoods <- sapply(Ns, function(N) {
  if (N >= 241) 1 / prod(N:(N - 9))
  else 0
})

plot(Ns, likelihoods, type = "h")

Is \(\hat N_{\textrm{MLE}} = 241\) a good estimate for the number of tanks \(N\) based on the data in Equation 30.1? It is very likely an underestimate, since \(241\) tanks is the minimum number of tanks there could be, based on the observed data. But of course, we cannot rule out the possibility that it is exactly correct.

It is impossible to tell whether a particular estimate, like \(241\), is good or not. We can only evaluate whether the “procedure” for coming up with this estimate, called the estimator, is good or not. That is, for a sample of \(n\) tanks with serial numbers \[ X_1, X_2, \dots, X_n, \] the maximum likelihood estimator chooses \(N\) to be as small as possible, but no smaller: \[\hat N_{\textrm{MLE}} = \max(X_1, X_2, \dots, X_n). \tag{30.3}\]
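As a quick check, we can compute this estimate in R from the data in Equation 30.1. (The variable name serial_numbers below is our own choice; any name would do.)

# Serial numbers of the 10 captured tanks (Equation 30.1)
serial_numbers <- c(203, 194, 148, 241, 64, 142, 188, 100, 23, 153)

# The MLE is the largest observed serial number
max(serial_numbers)

This returns \(241\), matching the estimate above.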

The estimator \(\hat N_{\textrm{MLE}}\) is just a random variable, since it depends on the data, which are random. To evaluate this estimator, we again turn to probability, continuing the cycle between probability and statistics introduced in Chapter 28.

Definition 30.1 (Bias of an estimator) The bias of an estimator \(\hat\theta\) for estimating a parameter \(\theta\) is \[ \text{E}\!\left[ \hat\theta \right] - \theta. \]

If the bias is zero, then the estimator is said to be unbiased.

Let’s apply Definition 30.1 to the MLE (Equation 30.3) in the German Tank Problem.

Example 30.1 (Bias of the MLE in the German Tank Problem) To calculate \(\text{E}\!\left[ \hat N_{\textrm{MLE}} \right]\), we need to know the PMF of \(\hat N_{\textrm{MLE}} = \max(X_1, \dots, X_n)\). To calculate \(P(\hat N_{\textrm{MLE}} = m)\), we note that:

  • All \(\binom{N}{n}\) (unordered) choices of \(n\) serial numbers are equally likely.
  • In order for the largest serial number to be exactly \(m\), we must choose serial number \(m\), along with \((n-1)\) serial numbers from the \((m-1)\) serial numbers less than \(m\).

Therefore, \[ P(\hat N_{\textrm{MLE}} = m) = \frac{1 \cdot \binom{m - 1}{n - 1}}{\binom{N}{n}}. \]

Now we can calculate the expected value from the definition: \[ \begin{align} \text{E}\!\left[ \hat N_{\textrm{MLE}} \right] &= \sum_{m=n}^N m \frac{\binom{m - 1}{n - 1}}{\binom{N}{n}} \\ &= \sum_{m=n}^N \frac{\binom{m}{n} n}{\binom{N}{n}} \\ &= \frac{\binom{N+1}{n+1}n}{\binom{N}{n}} \\ &= \frac{n}{n + 1} (N + 1), \end{align} \tag{30.4}\] where we used two combinatorial identities: \(m \binom{m-1}{n-1} = n \binom{m}{n}\) in the second line and the hockey stick identity \(\sum_{m=n}^N \binom{m}{n} = \binom{N+1}{n+1}\) in the third.
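As a sanity check, we can verify Equation 30.4 numerically by computing the expected value directly from the PMF. (The values of \(N\) and \(n\) below are arbitrary choices for illustration.)

# Check Equation 30.4 numerically for one choice of N and n
N <- 270
n <- 10
m <- n:N
pmf <- choose(m - 1, n - 1) / choose(N, n)   # PMF of the maximum serial number
sum(pmf)                                     # should be 1
sum(m * pmf)                                 # E[max], computed directly from the PMF
n / (n + 1) * (N + 1)                        # closed form from Equation 30.4

The last two values agree, as Equation 30.4 predicts.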

The bias is then \[ \text{E}\!\left[ \hat N_{\textrm{MLE}} \right] - N = \frac{n}{n + 1} (N + 1) - N = \frac{n - N}{n + 1}. \] Since \(n < N\), we see that the bias is negative; that is, the MLE tends to underestimate \(N\).

Although the MLE is biased, Equation 30.4 suggests a simple correction that makes the estimator unbiased.

Example 30.2 (Making the MLE unbiased) From Equation 30.4, we know that \[ \text{E}\!\left[ \hat N_{\textrm{MLE}} \right] = \frac{n}{n+1} (N + 1). \] By properties of expectation (Proposition 11.1), we know that \[ \text{E}\!\left[ \frac{n+1}{n} \hat N_{\textrm{MLE}} \right] = N + 1 \] and \[ \text{E}\!\left[ \frac{n+1}{n} \hat N_{\textrm{MLE}} - 1 \right] = N, \] so the estimator \(\hat N_{\textrm{MLE}+} = \frac{n+1}{n} \max(X_1, \dots, X_n) - 1\) is unbiased for estimating \(N\).

Let’s apply this modified estimator to the data from above. In a sample of \(n = 10\) tanks, the maximum serial number observed was \(241\). Therefore, a better estimate of the number of tanks is \[ \hat N_{\textrm{MLE}+} = \frac{11}{10} \cdot 241 - 1 = 264.1. \]

To better understand what it means for an estimator to be unbiased, let’s do a simulation. Suppose that there are \(N = 270\) tanks in the population, and we sample 10 tanks. We simulate the distributions of \(\hat N_{\textrm{MLE}}\) and \(\hat N_{\textrm{MLE}+}\) below.
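One way to carry out this simulation in R is sketched below; the number of replications (10,000) is an arbitrary choice.

# Simulate both estimators when there are N = 270 tanks and n = 10 are captured
N <- 270
n <- 10

estimates <- replicate(10000, {
  x <- sample(1:N, n)                     # capture n tanks without replacement
  c(mle = max(x),                         # maximum likelihood estimator
    mle_plus = (n + 1) / n * max(x) - 1)  # bias-corrected estimator
})

rowMeans(estimates)                       # compare the averages to N = 270
hist(estimates["mle", ])
hist(estimates["mle_plus", ])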

Notice that \(\hat N_{\textrm{MLE}}\) is never more than \(N = 270\) and severely underestimates on average (\(\text{E}\!\left[ \hat N_{\textrm{MLE}} \right] = \frac{10}{10 + 1} (270 + 1) \approx 246\)). On the other hand, \(\hat N_{\textrm{MLE}+}\) sometimes underestimates and sometimes overestimates, but the estimates average to \(270\).

30.2 Estimating the Mean

In Example 29.4, we showed that when we have i.i.d. \(\text{Normal}(\mu, \sigma^2)\) data \(X_1, \dots, X_n\), the maximum likelihood estimator of \(\mu\) (whether or not \(\sigma\) is known) is \[ \hat\mu = \frac{1}{n} \sum_{i=1}^n X_i. \] What is the bias of this estimator for estimating \(\mu\)?

Example 30.3 (Bias of the MLE for the normal mean) Let \(X_1, \dots, X_n\) be i.i.d. \(\text{Normal}(\mu, \sigma^2)\) observations. Let \(\hat\mu\) be the MLE of \(\mu\).

To calculate the bias of this estimator, we need to calculate its expectation. We can use linearity of expectation (Theorem 14.2): \[ \begin{align} \text{E}\!\left[ \hat\mu \right] &= \text{E}\big[\frac{1}{n} \sum_{i=1}^n X_i\big] \\ &= \frac{1}{n} \sum_{i=1}^n \text{E}\!\left[ X_i \right] \\ &= \frac{1}{n} \sum_{i=1}^n \mu \\ &= \frac{1}{n} n\mu \\ &= \mu. \end{align} \]

Therefore, its bias is \[ \text{E}\!\left[ \hat\mu \right] - \mu = 0. \] The MLE is unbiased for \(\mu\).
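We can also check this empirically with a short simulation. (The sketch below uses arbitrary values of \(\mu\), \(\sigma\), and \(n\).)

# Empirical check that the sample mean is unbiased for mu
mu <- 5      # arbitrary true mean
sigma <- 2   # arbitrary standard deviation
n <- 10      # sample size

sample_means <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))
mean(sample_means)   # should be close to mu = 5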

This is a special case of a more general fact, whose proof is essentially the same.

Proposition 30.1 (Sample mean is unbiased) Let \(X_1, \dots, X_n\) be identically distributed (not necessarily independent) random variables from any distribution with finite expectation \(\mu \overset{\text{def}}{=}\text{E}\!\left[ X_1 \right]\).

Then, the sample mean \[ \bar X \overset{\text{def}}{=}\frac{1}{n} \sum_{i=1}^n X_i \tag{30.5}\] is unbiased for \(\mu\).

\[ \begin{align} \text{E}\!\left[ \bar X \right] &= \text{E}\big[\frac{1}{n} \sum_{i=1}^n X_i\big] \\ &= \frac{1}{n} \sum_{i=1}^n \text{E}\!\left[ X_i \right] \\ &= \frac{1}{n} \sum_{i=1}^n \mu \\ &= \frac{1}{n} n\mu \\ &= \mu. \end{align} \]

We now apply Proposition 30.1 to the German Tank Problem.

Example 30.4 (Another estimator for the number of tanks) Let \(X_1, \dots, X_n\) be the serial numbers of the \(n\) captured tanks. These random variables are not independent, since the tanks are sampled without replacement. However, we assumed that each captured tank is equally likely to be any of the \(N\) tanks, so they are identically distributed with PMF \[ f(x) = \begin{cases} \frac{1}{N} & x = 1, \dots, N \\ 0 & \text{otherwise} \end{cases} \] and expected value \[ \text{E}\!\left[ X_1 \right] = \sum_{x=1}^N x \frac{1}{N} = \frac{N(N+1)}{2} \frac{1}{N} = \frac{N+1}{2}. \]

By Proposition 30.1, \(\bar X\) is an unbiased estimator for \(\frac{N+1}{2}\).

But this suggests another way to estimate the number of tanks \(N\). By properties of expectation (Proposition 11.1), \[ \text{E}\big[ 2\bar X - 1 \big] = 2\text{E}\!\left[ \bar X \right] - 1 = N, \] so \(2 \bar X - 1\) is an unbiased estimator of \(N\).

We now have two different estimators for the number of tanks \(N\):

  1. \(\hat N_{\textrm{MLE}+} = \frac{n+1}{n} \max(X_1, \dots, X_n) - 1\)
  2. \(2 \bar X - 1\)
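As a quick check in R, the sketch below computes both estimates from the data in Equation 30.1, reusing the serial_numbers vector defined earlier.

n <- length(serial_numbers)

(n + 1) / n * max(serial_numbers) - 1   # estimator 1: bias-corrected MLE
2 * mean(serial_numbers) - 1            # estimator 2: based on the sample mean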

For the data in Equation 30.1, the two estimators produce very different estimates: \(264.1\) and \(290.2\) tanks, respectively. Which estimate should we trust more? Both estimators are unbiased, so we will need a criterion other than bias. We take up this issue in the next chapter.