30  Bias of an Estimator

In this chapter, we will begin to discuss what makes a “good” estimator. We will see cases where the MLE is not good and learn strategies for improving upon the MLE.

30.1 German Tank Problem

During World War II, the Allied forces sought to estimate the production of German military equipment, particularly tanks, based on limited data. While intelligence reports provided some information, they were often incomplete or unreliable. Instead, the Allies used information from the German tanks that they captured.

[Photo from Bundesarchiv, Bild 183-H26258, distributed under a CC-BY-SA 3.0 DE license]

As it turns out, the Germans assigned sequential serial numbers to the tanks that they produced. For simplicity, we will assume that the first tank was assigned a serial number of 1, the second tank was assigned a serial number of 2, and so on. Let’s suppose that 10 tanks from one production line were captured, and they had the following serial numbers:

\[203, 194, 148, 241, 64, 142, 188, 100, 23, 153. \tag{30.1}\]

What should our estimate of \(N\), the total number of tanks, be? Let’s use the principle of maximum likelihood. To determine the likelihood \(L(N)\), we need the probability of observing the above sample.

  • First, \(L(N) = 0\) for any \(N < 241\) because if there were fewer than 241 tanks, then it would be impossible to observe a tank with serial number 241.
  • For \(N \geq 241\), the likelihood is \[ L(N) = \frac{1}{N (N-1) (N-2) \cdots (N - 9)}, \tag{30.2}\] since the tanks are sampled without replacement. (The same tank cannot be captured twice.) Note that we are assuming that every tank is equally likely to be captured.

Because Equation 30.2 only decreases as \(N\) increases, we should make \(N\) as small as possible to maximize the likelihood. However, it cannot be any smaller than \(241\) because then the likelihood would be zero. Therefore, the MLE is \(\hat N_{\textrm{MLE}} = 241\). The likelihood is graphed below.

Ns <- 230:260   # candidate values of N around the observed maximum

# Likelihood from Equation 30.2: zero for N < 241,
# otherwise 1 / (N (N-1) ... (N-9))
likelihoods <- sapply(Ns, function(N) {
  if (N >= 241) 1 / prod(N:(N - 9))
  else 0
})

plot(Ns, likelihoods, type = "h")

Is \(\hat N_{\textrm{MLE}} = 241\) a good estimate for the number of tanks \(N\) based on the data in Equation 30.1? It is very likely an underestimate, since \(241\) tanks is the minimum number of tanks there could be, based on the observed data. But of course, we cannot rule out the possibility that it is exactly correct.

It is impossible to tell whether a particular estimate, like \(241\), is good or not. We can only evaluate whether the “procedure” for coming up with this estimate, called the estimator, is good or not. That is, for a sample of \(n\) tanks with serial numbers \[ X_1, X_2, \dots, X_n, \] the maximum likelihood estimator chooses \(N\) to be as small as possible, but no smaller: \[\hat N_{\textrm{MLE}} = \max(X_1, X_2, \dots, X_n). \tag{30.3}\]
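As a quick check, we can compute this estimate in R from the data in Equation 30.1. (The variable name serial_numbers below is our own choice; any name would do.)

# Serial numbers of the 10 captured tanks (Equation 30.1)
serial_numbers <- c(203, 194, 148, 241, 64, 142, 188, 100, 23, 153)

# The MLE is the largest observed serial number
max(serial_numbers)

This returns \(241\), matching the estimate above.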

The estimator \(\hat N_{\textrm{MLE}}\) is just a random variable, since it depends on the data, which are random. To evaluate this estimator, we again turn to probability, continuing the cycle between probability and statistics introduced in Chapter 28.

Definition 30.1 (Bias of an estimator) The bias of an estimator \(\hat\theta\) for estimating a parameter \(\theta\) is \[ \text{E}\!\left[ \hat\theta \right] - \theta. \]

If the bias is zero, then the estimator is said to be unbiased.

Let’s apply Definition 30.1 to the MLE (Equation 30.3) in the German Tank Problem.

Example 30.1 (Bias of the MLE in the German Tank Problem) To calculate \(\text{E}\!\left[ \hat N_{\textrm{MLE}} \right]\), we need to know the PMF of \(\hat N_{\textrm{MLE}} = \max(X_1, \dots, X_n)\). To calculate \(P(\hat N_{\textrm{MLE}} = m)\), we note that:

  • All \(\binom{N}{n}\) (unordered) choices of \(n\) serial numbers are equally likely.
  • In order for the largest serial number to be exactly \(m\), we must choose serial number \(m\), along with \((n-1)\) serial numbers from the \((m-1)\) serial numbers less than \(m\).

Therefore, \[ P(\hat N_{\textrm{MLE}} = m) = \frac{1 \cdot \binom{m - 1}{n - 1}}{\binom{N}{n}}. \]

Now we can calculate the expected value from the definition: \[ \begin{align} \text{E}\!\left[ \hat N_{\textrm{MLE}} \right] &= \sum_{m=n}^N m \frac{\binom{m - 1}{n - 1}}{\binom{N}{n}} \\ &= \sum_{m=n}^N \frac{\binom{m}{n} n}{\binom{N}{n}} \\ &= \frac{\binom{N+1}{n+1}n}{\binom{N}{n}} \\ &= \frac{n}{n + 1} (N + 1), \end{align} \tag{30.4}\] where we used two combinatorial identities: \(m \binom{m-1}{n-1} = n \binom{m}{n}\) in the second line and the hockey stick identity \(\sum_{m=n}^N \binom{m}{n} = \binom{N+1}{n+1}\) in the third.
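As a sanity check, we can verify Equation 30.4 numerically by computing the expected value directly from the PMF. (The values of \(N\) and \(n\) below are arbitrary choices for illustration.)

# Check Equation 30.4 numerically for one choice of N and n
N <- 270
n <- 10
m <- n:N
pmf <- choose(m - 1, n - 1) / choose(N, n)   # PMF of the maximum serial number
sum(pmf)                                     # should be 1
sum(m * pmf)                                 # E[max], computed directly from the PMF
n / (n + 1) * (N + 1)                        # closed form from Equation 30.4

The last two values agree, as Equation 30.4 predicts.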

The bias is then \[ \text{E}\!\left[ \hat N_{\textrm{MLE}} \right] - N = \frac{n}{n + 1} (N + 1) - N = \frac{n - N}{n + 1}. \] Since \(n < N\), we see that the bias is negative; that is, the MLE tends to underestimate \(N\).

Although the MLE is biased, Equation 30.4 suggests a simple correction that makes the estimator unbiased.

Example 30.2 (Making the MLE unbiased) From Equation 30.4, we know that \[ \text{E}\!\left[ \hat N_{\textrm{MLE}} \right] = \frac{n}{n+1} (N + 1). \] By properties of expectation (Proposition 11.1), we know that \[ \text{E}\!\left[ \frac{n+1}{n} \hat N_{\textrm{MLE}} \right] = N + 1 \] and \[ \text{E}\!\left[ \frac{n+1}{n} \hat N_{\textrm{MLE}} - 1 \right] = N, \] so the estimator \(\hat N_{\textrm{MLE}+} = \frac{n+1}{n} \max(X_1, \dots, X_n) - 1\) is unbiased for estimating \(N\).

Let’s apply this modified estimator to the data from above. In a sample of \(n = 10\) tanks, the maximum serial number observed was \(241\). Therefore, a better estimate of the number of tanks is \[ \hat N_{\textrm{MLE}+} = \frac{11}{10} \cdot 241 - 1 = 264.1. \]

To better understand what it means for an estimator to be unbiased, let’s do a simulation. Suppose that there are \(N = 270\) tanks in the population, and we sample 10 tanks. We simulate the distributions of \(\hat N_{\textrm{MLE}}\) and \(\hat N_{\textrm{MLE}+}\) below.
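One way to carry out this simulation in R is sketched below; the number of replications (10,000) is an arbitrary choice.

# Simulate both estimators when there are N = 270 tanks and n = 10 are captured
N <- 270
n <- 10

estimates <- replicate(10000, {
  x <- sample(1:N, n)                     # capture n tanks without replacement
  c(mle = max(x),                         # maximum likelihood estimator
    mle_plus = (n + 1) / n * max(x) - 1)  # bias-corrected estimator
})

rowMeans(estimates)                       # compare the averages to N = 270
hist(estimates["mle", ])
hist(estimates["mle_plus", ])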

Notice that \(\hat N_{\textrm{MLE}}\) is never more than \(N = 270\) and severely underestimates on average (\(\text{E}\!\left[ \hat N_{\textrm{MLE}} \right] = \frac{10}{10 + 1} (270 + 1) \approx 246\)). On the other hand, \(\hat N_{\textrm{MLE}+}\) sometimes underestimates and sometimes overestimates, but the estimates average to \(270\).

30.2 Estimating the Mean

In Example 29.4, we showed that when we have i.i.d. \(\text{Normal}(\mu, \sigma^2)\) data \(X_1, \dots, X_n\), the maximum likelihood estimator of \(\mu\) (whether or not \(\sigma\) is known) is \[ \hat\mu = \frac{1}{n} \sum_{i=1}^n X_i. \] What is the bias of this estimator for estimating \(\mu\)?

Example 30.3 (Bias of the MLE for the normal mean) Let \(X_1, \dots, X_n\) be i.i.d. \(\text{Normal}(\mu, \sigma^2)\) observations. Let \(\hat\mu\) be the MLE of \(\mu\).

To calculate the bias of this estimator, we need to calculate its expectation. We can use linearity of expectation (Theorem 14.2): \[ \begin{align} \text{E}\!\left[ \hat\mu \right] &= \text{E}\big[\frac{1}{n} \sum_{i=1}^n X_i\big] \\ &= \frac{1}{n} \sum_{i=1}^n \text{E}\!\left[ X_i \right] \\ &= \frac{1}{n} \sum_{i=1}^n \mu \\ &= \frac{1}{n} n\mu \\ &= \mu. \end{align} \]

Therefore, its bias is \[ \text{E}\!\left[ \hat\mu \right] - \mu = 0. \] The MLE is unbiased for \(\mu\).
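We can also check this empirically with a short simulation. (The sketch below uses arbitrary values of \(\mu\), \(\sigma\), and \(n\).)

# Empirical check that the sample mean is unbiased for mu
mu <- 5      # arbitrary true mean
sigma <- 2   # arbitrary standard deviation
n <- 10      # sample size

sample_means <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))
mean(sample_means)   # should be close to mu = 5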

This is a special case of a more general fact, whose proof is essentially the same.

Proposition 30.1 (Sample mean is unbiased) Let \(X_1, \dots, X_n\) be identically distributed (not necessarily independent) random variables from any distribution with finite expectation \(\mu \overset{\text{def}}{=}\text{E}\!\left[ X_1 \right]\).

Then, the sample mean \[ \bar X \overset{\text{def}}{=}\frac{1}{n} \sum_{i=1}^n X_i \tag{30.5}\] is unbiased for \(\mu\).

\[ \begin{align} \text{E}\!\left[ \bar X \right] &= \text{E}\big[\frac{1}{n} \sum_{i=1}^n X_i\big] \\ &= \frac{1}{n} \sum_{i=1}^n \text{E}\!\left[ X_i \right] \\ &= \frac{1}{n} \sum_{i=1}^n \mu \\ &= \frac{1}{n} n\mu \\ &= \mu. \end{align} \]

We now apply Proposition 30.1 to the German Tank Problem.

Example 30.4 (Another estimator for the number of tanks) Let \(X_1, \dots, X_n\) be the serial numbers of the \(n\) captured tanks. These random variables are not independent, since the tanks are sampled without replacement. However, we assumed that each captured tank is equally likely to be any of the \(N\) tanks, so they are identically distributed with PMF \[ f(x) = \begin{cases} \frac{1}{N} & x = 1, \dots, N \\ 0 & \text{otherwise} \end{cases} \] and expected value \[ \text{E}\!\left[ X_1 \right] = \sum_{x=1}^N x \frac{1}{N} = \frac{N(N+1)}{2} \frac{1}{N} = \frac{N+1}{2}. \]

By Proposition 30.1, \(\bar X\) is an unbiased estimator for \(\frac{N+1}{2}\).

But this suggests another way to estimate the number of tanks \(N\). By properties of expectation (Proposition 11.1), \[ \text{E}\big[ 2\bar X - 1 \big] = 2\text{E}\!\left[ \bar X \right] - 1 = N, \] so \(2 \bar X - 1\) is an unbiased estimator of \(N\).

We now have two different estimators for the number of tanks \(N\):

  1. \(\hat N_{\textrm{MLE}+} = \frac{n+1}{n} \max(X_1, \dots, X_n) - 1\)
  2. \(2 \bar X - 1\)
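As a quick check in R, the sketch below computes both estimates from the data in Equation 30.1, reusing the serial_numbers vector defined earlier.

n <- length(serial_numbers)

(n + 1) / n * max(serial_numbers) - 1   # estimator 1: bias-corrected MLE
2 * mean(serial_numbers) - 1            # estimator 2: based on the sample mean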

For the data in Equation 30.1, the two estimators produce very different estimates: \(264.1\) and \(290.2\) tanks, respectively. Which estimate should we trust more? Both estimators are unbiased, so we will need a criterion other than bias. We take up this issue in the next chapter.