30 Bias of an Estimator
In this chapter, we will begin to discuss what makes a “good” estimator. We will see cases where the MLE is not good and learn strategies for improving upon the MLE.
30.1 German Tank Problem
During World War II, the Allied forces sought to estimate the production of German military equipment, particularly tanks, based on limited data. While intelligence reports provided some information, they were often incomplete or unreliable. Instead, the Allies used information from the German tanks that they captured.
(Photo from Bundesarchiv, Bild 183-H26258, distributed under a CC BY-SA 3.0 DE license.)
As it turns out, the Germans assigned sequential serial numbers to the tanks that they produced. For simplicity, we will assume that the first tank was assigned a serial number of 1, the second tank was assigned a serial number of 2, and so on. Let’s suppose that 10 tanks from one production line were captured, and they had the following serial numbers:
\[203, 194, 148, 241, 64, 142, 188, 100, 23, 153. \tag{30.1}\]
What should our estimate of \(N\), the total number of tanks, be? Let’s use the principle of maximum likelihood. To determine the likelihood \(L(N)\), we need the probability of observing the above sample.
- First, \(L(N) = 0\) for any \(N < 241\) because if there were fewer than 241 tanks, then it would be impossible to observe a tank with serial number 241.
- For \(N \geq 241\), the likelihood is \[ L(N) = \frac{1}{N (N-1) (N-2) \cdots (N - 9)}, \tag{30.2}\] since the tanks are sampled without replacement. (The same tank cannot be captured twice.) Note that we are assuming that every tank is equally likely to be captured.
Because Equation 30.2 only decreases as \(N\) increases, we should make \(N\) as small as possible to maximize the likelihood. However, it cannot be any smaller than \(241\) because then the likelihood would be zero. Therefore, the MLE is \(\hat N_{\textrm{MLE}} = 241\). The likelihood is graphed below.
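In R, the likelihood can be computed and plotted for values of \(N\) near the MLE:

```r
Ns <- 230:260
likelihoods <- sapply(Ns, function(N) {
  # The likelihood is zero unless N is at least the largest observed serial number.
  if (N >= 241) 1 / prod(N:(N - 9))
  else 0
})
plot(Ns, likelihoods, type = "h")
```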
Is \(\hat N_{\textrm{MLE}} = 241\) a good estimate for the number of tanks \(N\) based on the data in Equation 30.1? It is very likely an underestimate, since \(241\) tanks is the minimum number of tanks there could be, based on the observed data. But of course, we cannot rule out the possibility that it is exactly correct.
It is impossible to tell whether a particular estimate, like \(241\), is good or not. We can only evaluate whether the “procedure” for coming up with this estimate, called the estimator, is good or not. That is, for a sample of \(n\) tanks with serial numbers \[ X_1, X_2, \dots, X_n, \] the maximum likelihood estimator chooses \(N\) to be as small as possible, but no smaller: \[\hat N_{\textrm{MLE}} = \max(X_1, X_2, \dots, X_n). \tag{30.3}\]
The estimator \(\hat N_{\textrm{MLE}}\) is itself a random variable, since it depends on the data, which are random. To evaluate this estimator, we again turn to probability, continuing the cycle between probability and statistics introduced in Chapter 28.
Recall from Definition 30.1 that the bias of an estimator \(\hat\theta\) of a parameter \(\theta\) is \(\text{E}[\hat\theta] - \theta\), and that an estimator is unbiased if its bias is zero. Let’s apply Definition 30.1 to the MLE (Equation 30.3) in the German Tank Problem. Since \(\max(X_1, \dots, X_n)\) can never exceed \(N\), the MLE tends to underestimate \(N\); in fact, its expected value works out to \[ \text{E}\!\left[ \hat N_{\textrm{MLE}} \right] = \frac{n}{n+1} (N + 1). \tag{30.4}\]
Although the MLE is biased, Equation 30.4 suggests a simple correction that makes the estimator unbiased: \[ \hat N_{\textrm{MLE}+} = \frac{n+1}{n} \max(X_1, \dots, X_n) - 1, \] which satisfies \(\text{E}\!\left[ \hat N_{\textrm{MLE}+} \right] = \frac{n+1}{n} \cdot \frac{n}{n+1} (N + 1) - 1 = N\).
To better understand what it means for an estimator to be unbiased, let’s do a simulation. Suppose that there are \(N = 270\) tanks in the population, and we sample 10 tanks. We simulate the distributions of \(\hat N_{\textrm{MLE}}\) and \(\hat N_{\textrm{MLE}+}\) below.
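The simulated distributions themselves are not reproduced here, but a minimal R sketch of such a simulation (the seed and the 10,000 replications are arbitrary choices, not from the text) looks like this:

```r
set.seed(1)  # arbitrary seed, for reproducibility
N <- 270     # true number of tanks
n <- 10      # number of captured tanks

# Simulate capturing n tanks (without replacement) many times,
# recording the MLE and the bias-corrected estimator each time.
mle <- replicate(10000, max(sample(1:N, n)))
mle_plus <- (n + 1) / n * mle - 1

mean(mle)       # should be close to (10 / 11) * 271, about 246
mean(mle_plus)  # should be close to 270

hist(mle, breaks = 30, main = "MLE")
hist(mle_plus, breaks = 30, main = "Bias-corrected MLE")
```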
Notice that \(\hat N_{\textrm{MLE}}\) is never more than \(N = 270\) and severely underestimates on average (\(\text{E}\!\left[ \hat N_{\textrm{MLE}} \right] = \frac{10}{10 + 1} (270 + 1) \approx 246\)). On the other hand, \(\hat N_{\textrm{MLE}+}\) sometimes underestimates and sometimes overestimates, but the estimates average to \(270\).
30.2 Estimating the Mean
In Example 29.4, we showed that when we have i.i.d. \(\text{Normal}(\mu, \sigma^2)\) data \(X_1, \dots, X_n\), the maximum likelihood estimator of \(\mu\) (whether or not \(\sigma\) is known) is \[ \hat\mu = \frac{1}{n} \sum_{i=1}^n X_i. \] What is the bias of this estimator for estimating \(\mu\)?
By linearity of expectation, \[ \text{E}[\hat\mu] = \frac{1}{n} \sum_{i=1}^n \text{E}[X_i] = \frac{1}{n} \cdot n \mu = \mu, \] so the bias of \(\hat\mu\) is zero: the MLE of the mean is unbiased. This is a special case of a more general fact, whose proof is essentially the same: as long as \(X_1, \dots, X_n\) are identically distributed (they need not be independent), the sample mean \(\bar X\) is an unbiased estimator of the population mean \(\text{E}[X_1]\) (Proposition 30.1).
We now apply Proposition 30.1 to the German Tank Problem. Each captured serial number \(X_i\) is, marginally, uniformly distributed on \(\{1, 2, \dots, N\}\), so \(\text{E}[X_i] = \frac{N+1}{2}\). By Proposition 30.1, \(\text{E}[\bar X] = \frac{N+1}{2}\), and therefore \[ \text{E}\!\left[ 2 \bar X - 1 \right] = 2 \cdot \frac{N+1}{2} - 1 = N, \] so \(2 \bar X - 1\) is another unbiased estimator of \(N\).
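A quick Monte Carlo check of this unbiasedness claim (the simulation settings below are arbitrary, not from the text):

```r
set.seed(2)
N <- 270  # true number of tanks
n <- 10   # number of captured tanks

# Estimate N with 2 * (sample mean) - 1 on each simulated sample.
est <- replicate(10000, 2 * mean(sample(1:N, n)) - 1)
mean(est)  # should be close to N = 270
```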
We now have two different estimators for the number of tanks \(N\):
- \(\hat N_{\textrm{MLE}+} = \frac{n+1}{n} \max(X_1, \dots, X_n) - 1\)
- \(2 \bar X - 1\)
For the data in Equation 30.1, the two estimators produce very different estimates: \(264.1\) and \(290.2\) tanks, respectively. Which estimate should we trust more? Both estimators are unbiased, so we will need a criterion other than bias. We take up this issue in the next chapter.
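As a quick check, both estimates can be computed from the data in a few lines of R:

```r
serials <- c(203, 194, 148, 241, 64, 142, 188, 100, 23, 153)
n <- length(serials)
(n + 1) / n * max(serials) - 1  # bias-corrected MLE: 264.1
2 * mean(serials) - 1           # estimator based on the sample mean: 290.2
```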