39  Minima and Maxima

So far, we have studied the properties of the sample mean \(\bar X\) and estimators that are functions of \(\bar X\). However, we have encountered estimators that are not functions of the mean. For example, in the German Tank Problem (Section 30.1), the MLE of the number of tanks \(N\), was \(\hat N = \max(X_1, \dots, X_n)\). In Example 29.6, we saw that the MLE of the exponential location parameter was \(\hat\theta = \min(X_1, \dots, X_n)\).

In this chapter, we investigate the properties of the minimum or the maximum of a set of i.i.d. random variables.

39.1 Maximum of Random Variables

First, let \(X_1, \dots, X_n\) be i.i.d. continuous random variables with PDF \(f_{X_i}(x)\) and CDF \(F_{X_i}(x)\). What is the distribution of \(\max(X_1, \dots, X_n)\)?

Example 39.1 (Maximum of uniforms) Let \(X_1, \dots, X_n\) be i.i.d. \(\textrm{Uniform}(a= 0, b= \theta)\). Then, the MLE of \(\theta\) is \[ \hat\theta_\textrm{MLE}= \max(X_1, \dots, X_n). \] To determine the bias and variance of this estimator, we cannot use properties of expectation; this estimator is not close to linear in \(X_1, \dots, X_n\). We have to determine its PDF and then calculate its expectation directly.

The CDF of \(\hat\theta_\textrm{MLE}\) is (for \(0 < t < \theta\)) \[\begin{align} F_{\hat\theta_\textrm{MLE}}(t) &= P\left( \hat\theta_\textrm{MLE}\leq t \right) \\ &= P(\max(X_1, \dots, X_n) \leq t) \\ &= P(X_1 \leq t, \dots, X_n \leq t) \\ &= P(X_1 \leq t) \cdots P(X_n \leq t) \\ &= \left(\frac{t}{\theta}\right)^n. \end{align}\]

Therefore, the PDF of \(\hat\theta_\textrm{MLE}\) is \[ f_{\hat\theta_\textrm{MLE}}(t) = F'_{\hat\theta_\textrm{MLE}}(t) = \frac{nt^{n-1}}{\theta^n} \tag{39.1}\]

for \(0 < t < \theta\). Let’s graph this PDF for \(n = 5\) and \(\theta = 2\):

Notice that compared to the uniform distribution, the probability density of \(\hat\theta_\textrm{MLE}\) has much more mass near \(\theta\), as we would hope. However, it will be negatively biased, since it can only underestimate \(\theta\), never overestimate.

To calculate the exact bias, we need the expected value: \[ \text{E}\!\left[ \hat\theta_\textrm{MLE} \right] = \int_0^\theta t \cdot \frac{nt^{n-1}}{\theta^n}\,dt = \frac{n}{\theta^n} \frac{t^{n+1}}{n+1}\Bigg|_0^\theta = \frac{n}{n+1} \theta. \] So the bias is \[ \text{E}\!\left[ \hat\theta_\textrm{MLE} \right] - \theta = \frac{n}{n+1} \theta - \theta = -\frac{\theta}{n + 1}, \] which is negative as expected, and the variance is

\[\begin{align*} \text{Var}\!\left[ \hat{\theta}_{\textrm{MLE}} \right] &= \int_0^\theta t^2 \cdot \frac{nt^{n-1}}{\theta^n} \, dt - \left( \frac{n}{n+1} \theta \right)^2 \\ &= \frac{n}{\theta^n} \left. \frac{t^{n+2}}{n+2} \right\rvert_0^\theta - \frac{n^2}{(n+1)^2} \theta^2 \\ &= \frac{n}{n+2} \theta^2 - \frac{n^2}{(n+1)^2} \theta^2 \\ &= \frac{n}{(n+1)^2(n+2)} \theta^2. \end{align*}\]

The MSE of \(\hat\theta_\textrm{MLE}\) is therefore

\[ \text{MSE}\!\left[ \hat{\theta}_{\textrm{MLE}} \right] = \left( -\frac{\theta}{n+1} \right)^2 + \frac{n}{(n+1)^2(n+2)} \theta^2 = \frac{2}{(n+1)(n+2)} \theta^2. \]

Let’s generalize Example 39.1 to all continuous distributions.

Proposition 39.1 (Distribution of the Maximum (Continuous)) Let \(X_1, \dots, X_n\) be i.i.d. continuous random variables with PDF \(f_{X_1}(x)\) and CDF \(F_{X_1}(x)\). Then, the distribution of \(M = \max(X_1, \dots, X_n)\) is given by the PDF \[ f_M(t) = n \left( F_{X_1}(t) \right)^{n-1} f_{X_1}(t). \]

Proof

We start with the CDF of \(M\). \[\begin{align*} F_M(t) &= P(M \leq t) \\ &= P(X_1, \dots, X_n \leq t) \\ &= \left( P(X_1 \leq t) \right)^n \\ &= \left( F_{X_1}(t) \right)^n. \end{align*}\] Taking the derivative, with respect to \(t\), we get the density \[ f_M(t) = n \left( F_{X_1}(t) \right)^{n-1} \left( F_{X_1}(t) \right)' = n \left( F_{X_1}(t) \right)^{n-1} f_{X_1}(t). \]

The distribution of the maximum can be used to study the effects of selection bias.

Example 39.2 (Winner’s curse) In auctions, it is observed that the winner tends to overpay for the value of the item. How do we explain this phenomenon?

Let’s assume that the true value of the item is \(\mu\), and each of the four bidders independently estimate the value of the item and bid that amount. That is, we observe bids \(X_1, X_2, X_3, X_4 \sim \text{Normal}(\mu, 1)\). Notice that we are assuming that each bid \(X_i\) is an unbiased estimator for \(\mu\), so each bidder is right on average.

The winning bid, \(W \overset{\text{def}}{=}\max(X_1, X_2, X_3, X_4)\), is just one of the bids. Shouldn’t it be unbiased as well?

Although \(W\) is just one bid, it is always the largest one. Even though each individual bid is unbiased, the largest bid will tend to overestimate the mean. To see this, we calculate its PDF and expectation.

According to Proposition 39.1, the distribution of \(W\) is \[ \begin{align} f_W(w) &= 4 \Phi(w - \mu)^3 \varphi(w - \mu), \end{align} \tag{39.2}\] where \(\varphi(z) \overset{\text{def}}{=}\frac{1}{\sqrt{2\pi}} e^{-z^2 / 2}\) is the PDF of the standard normal.

The PDF of the winning bid (Equation 39.2) for an item that is actually worth \(\mu = \$10\) is graphed below. For comparison, the PDF of each individual bid, \(\textrm{Normal}(\mu= \$10, \sigma^2= 1)\), is also shown.

Clearly, the winning bid tends to overestimate the value. To be precise, we calculate its expectation: \[\begin{align} \text{E}\!\left[ W \right] &= \int_{-\infty}^\infty w \cdot 4 \Phi(w - \mu)^{3} \varphi(w - \mu)\,dw \\ &= \int_{-\infty}^\infty (\mu + t) \cdot 4 \Phi(t)^{3} \varphi(t)\,dt & (t = w - \mu) \\ &= \mu + \int_{-\infty}^\infty t \cdot 4 \Phi(t)^{3} \varphi(t)\,dt. \end{align}\]

We will evaluate this integral numerically in R.

Therefore, the expectation is about \(\mu + \$1.03\), so the winning bid \(W\) tends to overestimate the true value by \(\mu\) by about \(\$1.03\).

The corresponding theory for discrete random variables is not quite as elegant or as useful, but we present it here for completeness.

Proposition 39.2 (Distribution of the Maximum (Discrete)) Let \(X_1, \dots, X_n\) be i.i.d. integer-valued random variables with CDF \(F_{X_1}(x)\). Then, the distribution of \(M = \max(X_1, \dots, X_n)\) is given by the PMF \[ f_M(k) = \left( F_{X_1}(k) \right)^n - \left( F_{X_1}(k-1) \right)^n \] for integer values of \(k\).

Proof

The CDF of \(M\) is \[ F_M(k) = P(X_1, \dots, X_n \leq k) = \left( P(X_1 \leq k) \right)^n = \left( F_{X_1}(k) \right)^n. \] Thus, \[ f_M(k) = F_M(k) - F_M(k-1) = \left( F_{X_1}(k) \right)^n - \left( F_{X_1}(k-1) \right)^n. \]

Coming back to the German Tank problem (Section 30.1), our work in this section does not apply since the tanks are not i.i.d. – they were sampled without replacement. However, we can consider an i.i.d. version of the problem.

Example 39.3 (German Tanks revisited) There are \(N\) tanks with serial numbers \(1, 2, \dots, N\), where \(N\) is unknown. Suppose the scouts observe \(n\) tanks with serial numbers \(X_1, \dots, X_n\). Since they are observing and not capturing, \(X_1, \dots, X_n\) are i.i.d. with PMF \[ f_{X_1}(x) = \frac{1}{N} \] for \(x = 1, 2, \dots, N\). Thus, \[ F_{X_1}(x) = \frac{x}{N} \] for \(x = 1, 2, \dots, N\), and so, \[\begin{align*} f_M(x) &= \left( F_{X_1}(x) \right)^n - \left( F_{X_1}(x-1) \right)^n \\ &= \left( \frac{x}{N} \right)^n - \left( \frac{x-1}{N} \right)^n. \end{align*}\] for \(M = \max(X_1, \dots, X_n)\). Now, the expected value of \(M\) is \[\begin{align*} \text{E}\!\left[ M \right] &= \sum_{x=1}^N x \left( \left( \frac{x}{N} \right)^n - \left( \frac{x-1}{N} \right)^n \right) \\ &= 1 \left( \left( \frac{1}{N} \right)^n - \left( \frac{0}{N} \right)^n \right) + 2 \left( \left( \frac{2}{N} \right)^n - \left( \frac{1}{N} \right)^n \right) + 3 \left( \left( \frac{3}{N} \right)^n - \left( \frac{2}{N} \right)^n \right) \\ &\qquad + \cdots + N \left( \left( \frac{N}{N} \right)^n - \left( \frac{N-1}{N} \right)^n \right) \\ &= N - \sum_{i=1}^{N-1} \left( \frac{i}{N} \right)^n. \end{align*}\] As expected, \(M\) is an underestimate for \(N\).

39.2 Minimum of Random Variables

Let \(X_1, \dots, X_n\) be i.i.d. continuous random variables with PDF \(f_{X_1}(x)\) and CDF \(F_{X_1}(x)\). What is the distribution of \(\min(X_1, \dots, X_n)\)?

Example 39.4 (Minimum of exponentials) Let \(X_1, \dots, X_n\) be i.i.d. exponential with rate \(1\) and location parameter \(\theta\). That is, \(X_i\) can be represented as \[ X_i = \theta + Z_i, \] where \(Z_1, \dots, Z_n\) are i.i.d. \(\textrm{Exponential}(\lambda=1)\). (Note that we only observe \(X_i\), not \(Z_i\).)

We saw in Example 29.6 that the MLE of \(\theta\) is \[ \hat\theta_\textrm{MLE}= \min(X_1, \dots, X_n) = \theta + \min(Z_1, \dots, Z_n). \]

We will determine the distribution of \(L = \min(Z_1, \dots, Z_n)\); the distribution of \(\hat\theta_\textrm{MLE}\) is just a location shift of this distribution.

The CDF of \(L\) is (for \(t > 0\)) \[\begin{align} F_{L}(t) &= P(L \leq t) \\ &= P(\min(Z_1, \dots, Z_n) \leq t) \\ &= 1 - P(\min(Z_1, \dots, Z_n) > t) \\ &= 1 - P(Z_1 > t, \dots, Z_n > t) \\ &= 1 - P(Z_1 > t) \cdots P(Z_n > t) \\ &= 1 - (e^{-t})^n \\ &= 1 - e^{-nt} \end{align}\]

Therefore, the PDF of \(L\) is \[ f_{L}(t) = n e^{-nt}; t > 0, \] which is the PDF of the \(\textrm{Exponential}(\lambda=n)\) distribution. That is, the minimum also follows an exponential distribution, just with rate \(n\) instead of \(1\). These PDFs are graphed below for \(n = 3\).

To determine the bias and variance of \(\hat\theta_\textrm{MLE}= \min(X_1, \dots, X_n) = \theta + L\), we calculate the expectation \[ \text{E}\!\left[ \hat\theta_\textrm{MLE} \right] = \theta + \text{E}\!\left[ L \right] = \theta + \frac{1}{n} \] and variance \[\text{Var}\!\left[ \hat\theta_\textrm{MLE} \right] = \text{Var}\!\left[ \theta + L \right] = \text{Var}\!\left[ L \right] = \frac{1}{n^2}. \]

So the MSE is \[ \text{MSE}\!\left[ \hat\theta_\textrm{MLE} \right] = \left(\frac{1}{n}\right)^2 + \frac{1}{n^2} = \frac{2}{n^2}. \]

Let’s generalize Example 39.1 to all continuous distributions.

Proposition 39.3 (Distribution of the Minimum (Continuous)) Let \(X_1, \dots, X_n\) be i.i.d. continuous random variables with PDF \(f_{X_1}(x)\) and CDF \(F_{X_1}(x)\). Then, the distribution of \(L = \min(X_1, \dots, X_n)\) is given by the PDF \[ f_L(t) = n f_{X_1}(t) (1 - F_{X_1}(t))^{n-1}. \]

Proof

The CDF of \(L\) can be computed as follows. \[\begin{align*} F_L(t) &= P(L \leq t) \\ &= 1 - P(L > t) \\ &= 1 - P(X_1, \dots, X_n > t) \\ &= 1 - \left( P(X_1 > t) \right)^n \\ &= 1 - \left( 1 - F_{X_1}(t) \right)^n. \end{align*}\] Taking the derivative, with respect to \(t\), we get \[ f_L(t) = -n \left( 1 - F_{X_1}(t) \right)^{n-1} \left( - f_{X_1}(t) \right) = n f_{X_1}(t) \left( 1 - F_{X_1}(t) \right)^{n-1}. \]

And here is the corresponding result for discrete distributions.

Proposition 39.4 (Distribution of the Minimum (Discrete)) Let \(X_1, \dots, X_n\) be i.i.d. integer-valued random variables with CDF \(F_{X_1}(x)\). Then, the distribution of \(L = \min(X_1, \dots, X_n)\) is given by the PMF \[ f_L(k) = (1 - F_{X_1}(k - 1))^n - (1 - F_{X_1}(k))^n. \] for integer values of \(k\).

Proof

The CDF of \(L\) is \[ F_L(k) = P(L \leq k) = 1 - P(L > k) = 1 - \left( P(X_1 > k) \right)^n = 1 - \left( 1 - F_{X_1}(k) \right)^n. \] Thus, the PMF of \(L\) is \[ f_L(k) = F_L(k) - F_L(k-1) = \left( 1 - F_{X_1}(k-1) \right)^n - \left( 1 - F_{X_1}(k) \right)^n. \]

39.3 Exercises

Exercise 39.1 Suggest a simple correction to \(\hat\theta_{\textrm{MLE}}\) in Example 39.1 to make it unbiased and compare its MSE to \(\hat\theta_{\textrm{MLE}}\).

Exercise 39.2 Suggest a simple correction to \(\hat\theta_{\textrm{MLE}}\) in Example 39.4 to make it unbiased and compare its MSE to \(\hat\theta_{\textrm{MLE}}\).

Exercise 39.3 In Example 39.4, we showed that the minimum of i.i.d. exponential random variables also follows an exponential distribution. What about the maximum of i.i.d. exponential random variables? Calculate the PDF and determine if it is a named distribution?

Exercise 39.4 Let \(X_1, \dots, X_n\) be i.i.d. \(\textrm{Uniform}(a= \theta - 1, b= \theta + 1)\). Determine the distributions of \(L \overset{\text{def}}{=}\min(X_1, \dots, X_n)\) and \(M \overset{\text{def}}{=}\max(X_1, \dots, X_n)\), and use these to show that \(\frac{L + M}{2}\) is unbiased for \(\theta\).

Exercise 39.5 Let \(X_1, \dots, X_n\) be i.i.d. \(\text{Geometric}(p)\). What is the distribution of \(\min(X_1, \dots, X_n)\)? Is it a named distribution?