39  Minima and Maxima

The simplest order statistics are the extreme values: the minimum and the maximum. In this chapter, we characterize the distributions of the minimum and the maximum of a collection of i.i.d. random variables.

39.1 Maximum of Random Variables

First, let \(X_1, \dots, X_n\) be i.i.d. continuous random variables with common PDF \(f_{X_1}(x)\) and CDF \(F_{X_1}(x)\). What is the distribution of \(\max(X_1, \dots, X_n)\)?

Example 39.1 (Maximum of uniforms) Let \(X_1, \dots, X_n\) be i.i.d. \(\textrm{Uniform}(a= 0, b= \theta)\). Then, the MLE of \(\theta\) is \[ \hat\theta_\textrm{MLE}= \max(X_1, \dots, X_n). \] To determine the bias and variance of this estimator, we cannot simply appeal to linearity of expectation, because the maximum is not a linear function of \(X_1, \dots, X_n\), or even close to one. Instead, we have to determine its PDF and then calculate its expectation directly.

The CDF of \(\hat\theta_\textrm{MLE}\) is (for \(0 < t < \theta\)) \[\begin{align} F_{\hat\theta_\textrm{MLE}}(t) &= P\left( \hat\theta_\textrm{MLE}\leq t \right) \\ &= P(\max(X_1, \dots, X_n) \leq t) \\ &= P(X_1 \leq t, \dots, X_n \leq t) \\ &= P(X_1 \leq t) \cdots P(X_n \leq t) \\ &= \left(\frac{t}{\theta}\right)^n. \end{align}\]

Therefore, the PDF of \(\hat\theta_\textrm{MLE}\) is \[ f_{\hat\theta_\textrm{MLE}}(t) = F'_{\hat\theta_\textrm{MLE}}(t) = \frac{nt^{n-1}}{\theta^n} \tag{39.1}\]

for \(0 < t < \theta\). Let’s graph this PDF for \(n = 5\) and \(\theta = 2\):
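One way to produce this graph in base R (the plotting range and styling below are illustrative choices):

```r
# PDF of the maximum of n i.i.d. Uniform(0, theta) random variables (Equation 39.1)
n <- 5
theta <- 2
t <- seq(0, theta, length.out = 200)
pdf_max <- n * t^(n - 1) / theta^n

plot(t, pdf_max, type = "l", lwd = 2, col = "blue",
     xlab = "t", ylab = "density")
lines(t, dunif(t, min = 0, max = theta), lty = 2, lwd = 2, col = "gray")
legend("topleft", legend = c("maximum of 5 uniforms", "a single Uniform(0, 2)"),
       col = c("blue", "gray"), lty = c(1, 2), lwd = 2)
```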

Notice that compared to the uniform distribution, the probability density of \(\hat\theta_\textrm{MLE}\) has much more mass near \(\theta\), as we would hope. However, it will be negatively biased, since \(\hat\theta_\textrm{MLE}\) can only underestimate \(\theta\), never overestimate it.

To calculate the exact bias, we need the expected value: \[ \text{E}\!\left[ \hat\theta_\textrm{MLE} \right] = \int_0^\theta t \cdot \frac{nt^{n-1}}{\theta^n}\,dt = \frac{n}{\theta^n} \frac{t^{n+1}}{n+1}\Bigg|_0^\theta = \frac{n}{n+1} \theta, \] so the bias is \[ \text{E}\!\left[ \hat\theta_\textrm{MLE} \right] - \theta = \frac{n}{n+1} \theta - \theta = -\frac{\theta}{n + 1}, \] which is negative (as claimed).

To calculate the MSE, we also need the variance: \[\begin{align*} \text{Var}\!\left[ \hat{\theta}_{\textrm{MLE}} \right] &= \int_0^\theta t^2 \cdot \frac{nt^{n-1}}{\theta^n} \, dt - \left( \frac{n}{n+1} \theta \right)^2 \\ &= \frac{n}{\theta^n} \left. \frac{t^{n+2}}{n+2} \right\rvert_0^\theta - \frac{n^2}{(n+1)^2} \theta^2 \\ &= \frac{n}{n+2} \theta^2 - \frac{n^2}{(n+1)^2} \theta^2 \\ &= \frac{n}{(n+1)^2(n+2)} \theta^2. \end{align*}\]

The MSE of \(\hat\theta_\textrm{MLE}\) is therefore \[ \text{MSE}\!\left[ \hat{\theta}_{\textrm{MLE}} \right] = \left( -\frac{\theta}{n+1} \right)^2 + \frac{n}{(n+1)^2(n+2)} \theta^2 = \frac{2}{(n+1)(n+2)} \theta^2. \]
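These formulas can be sanity-checked with a quick Monte Carlo simulation in R; the choices \(n = 5\), \(\theta = 2\), the seed, and the number of replications below are all illustrative:

```r
# Monte Carlo check of the bias and MSE of the MLE for n = 5, theta = 2
set.seed(42)
n <- 5
theta <- 2
theta_hat <- replicate(100000, max(runif(n, min = 0, max = theta)))

mean(theta_hat) - theta               # simulated bias
-theta / (n + 1)                      # theoretical bias

mean((theta_hat - theta)^2)           # simulated MSE
2 * theta^2 / ((n + 1) * (n + 2))     # theoretical MSE
```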

Let’s generalize Example 39.1 to all continuous distributions.

Proposition 39.1 (Distribution of the Maximum (Continuous)) Let \(X_1, \dots, X_n\) be i.i.d. continuous random variables with PDF \(f_{X_1}(x)\) and CDF \(F_{X_1}(x)\). Then, the distribution of \(M = \max(X_1, \dots, X_n)\) is given by the PDF \[ f_M(t) = n \left( F_{X_1}(t) \right)^{n-1} f_{X_1}(t). \]

Proof

We start with the CDF of \(M\). \[\begin{align*} F_M(t) &= P(M \leq t) \\ &= P(X_1, \dots, X_n \leq t) \\ &= \left( P(X_1 \leq t) \right)^n \\ &= \left( F_{X_1}(t) \right)^n. \end{align*}\] Taking the derivative with respect to \(t\), we get the density \[ f_M(t) = n \left( F_{X_1}(t) \right)^{n-1} \left( F_{X_1}(t) \right)' = n \left( F_{X_1}(t) \right)^{n-1} f_{X_1}(t). \]

The distribution of the maximum can be used to study the effects of selection bias.

Example 39.2 (Winner’s curse) In auctions, it is often observed that the winner tends to pay more than the item is actually worth. How do we explain this phenomenon?

Let’s assume that the true value of the item is \(\mu\) and that each of the four bidders independently estimates the value of the item and bids that amount. That is, we observe bids \(X_1, X_2, X_3, X_4 \sim \text{Normal}(\mu, 1)\). Notice that we are assuming that each bid \(X_i\) is an unbiased estimator of \(\mu\), so each bidder is right on average.

The winning bid, \(W \overset{\text{def}}{=}\max(X_1, X_2, X_3, X_4)\), is just one of the bids. Shouldn’t it be unbiased as well?

Although \(W\) is just one bid, it is always the largest one. Even though each individual bid is unbiased, the largest bid will tend to overestimate the mean. To see this, we calculate its PDF and expectation.

According to Proposition 39.1, the distribution of \(W\) is \[ \begin{align} f_W(w) &= 4 \Phi(w - \mu)^3 \varphi(w - \mu), \end{align} \tag{39.2}\] where \(\varphi(z) \overset{\text{def}}{=}\frac{1}{\sqrt{2\pi}} e^{-z^2 / 2}\) is the PDF of the standard normal and \(\Phi(z)\) is the CDF.

The code below graphs the PDF of the winning bid (Equation 39.2) for an item that is actually worth \(\mu = \$10\). For comparison, the PDF of each individual bid, \(\textrm{Normal}(\mu= \$10, \sigma^2= 1)\), is also shown.
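A minimal version of this code in base R might look like the following (the plotting range and styling are illustrative choices):

```r
# PDF of the winning bid (Equation 39.2) versus the PDF of a single bid, for mu = 10
mu <- 10
w <- seq(6, 14, length.out = 400)
pdf_winning_bid <- 4 * pnorm(w - mu)^3 * dnorm(w - mu)

plot(w, pdf_winning_bid, type = "l", lwd = 2, col = "blue",
     xlab = "bid (dollars)", ylab = "density")
lines(w, dnorm(w, mean = mu, sd = 1), lty = 2, lwd = 2, col = "gray")
legend("topleft", legend = c("winning bid", "individual bid"),
       col = c("blue", "gray"), lty = c(1, 2), lwd = 2)
```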

Clearly, the winning bid tends to overestimate the value. To be precise, we calculate its expectation: \[\begin{align} \text{E}\!\left[ W \right] &= \int_{-\infty}^\infty w \cdot 4 \Phi(w - \mu)^{3} \varphi(w - \mu)\,dw \\ &= \int_{-\infty}^\infty (\mu + t) \cdot 4 \Phi(t)^{3} \varphi(t)\,dt & (t = w - \mu) \\ &= \mu + \int_{-\infty}^\infty t \cdot 4 \Phi(t)^{3} \varphi(t)\,dt, \end{align}\] where the last step uses the fact that \(4 \Phi(t)^3 \varphi(t)\) is itself a PDF (by Proposition 39.1, it is the PDF of the maximum of four standard normals), so it integrates to \(1\).

We will evaluate this integral numerically in R.
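One way is with R's integrate() function:

```r
# Numerically evaluate the integral of t * 4 * Phi(t)^3 * phi(t) over the real line
integrand <- function(t) t * 4 * pnorm(t)^3 * dnorm(t)
integrate(integrand, lower = -Inf, upper = Inf)
```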

Therefore, the expectation is about \(\mu + \$1.03\), so the winning bid \(W\) tends to overestimate the true value \(\mu\) by about \(\$1.03\).

The corresponding theory for discrete random variables is not quite as elegant or as commonly used, but we present it here for completeness.

Proposition 39.2 (Distribution of the Maximum (Discrete)) Let \(X_1, \dots, X_n\) be i.i.d. integer-valued random variables with CDF \(F_{X_1}(x)\). Then, the distribution of \(M = \max(X_1, \dots, X_n)\) is given by the PMF \[ f_M(k) = \left( F_{X_1}(k) \right)^n - \left( F_{X_1}(k-1) \right)^n \] for integer values of \(k\).

Proof

The CDF of \(M\) is \[ F_M(k) = P(X_1, \dots, X_n \leq k) = \left( P(X_1 \leq k) \right)^n = \left( F_{X_1}(k) \right)^n. \] Thus, \[ f_M(k) = F_M(k) - F_M(k-1) = \left( F_{X_1}(k) \right)^n - \left( F_{X_1}(k-1) \right)^n. \]
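As a concrete illustration, consider the maximum of three rolls of a fair six-sided die; the following R snippet computes its PMF from Proposition 39.2 and checks it against a simulation (the dice setting is just an example):

```r
# PMF of the maximum of 3 rolls of a fair six-sided die (Proposition 39.2)
n <- 3
k <- 1:6
F_single <- k / 6                              # CDF of a single roll at k = 1, ..., 6
pmf_max <- F_single^n - c(0, F_single[-6])^n   # F(k)^n - F(k - 1)^n
pmf_max

# Compare with a simulation
set.seed(1)
sim_max <- replicate(100000, max(sample(1:6, n, replace = TRUE)))
table(sim_max) / length(sim_max)
```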

Unfortunately, Proposition 39.2 does not apply directly to the German Tank problem (Section 31.1) because the serial numbers are not i.i.d. — the tanks were sampled without replacement. However, we can consider an i.i.d. version of the problem.

Example 39.3 (German Tanks revisited) There are \(N\) tanks with serial numbers \(1, 2, \dots, N\), where \(N\) is unknown. Suppose the scouts observe \(n\) tanks with serial numbers \(X_1, \dots, X_n\). Since they are observing and not capturing, \(X_1, \dots, X_n\) can be assumed to be i.i.d. with PMF \[ f_{X_1}(x) = \frac{1}{N}; \qquad x = 1, 2, \dots, N. \] Thus, the CDF is \[ F_{X_1}(x) = \frac{x}{N}; \qquad x = 1, 2, \dots, N, \] and by Proposition 39.2, the PMF of the maximum \(\hat N_{\text{MLE}} = \max(X_1, \dots, X_n)\) is \[\begin{align*} f_{\hat N_{\text{MLE}}}(x) &= \left( F_{X_1}(x) \right)^n - \left( F_{X_1}(x-1) \right)^n \\ &= \left( \frac{x}{N} \right)^n - \left( \frac{x-1}{N} \right)^n. \end{align*}\]

Now, the expected value of \(\hat N_{\text{MLE}}\) is \[\begin{align*} \text{E}\!\left[ \hat N_{\text{MLE}} \right] &= \sum_{x=1}^N x \left( \left( \frac{x}{N} \right)^n - \left( \frac{x-1}{N} \right)^n \right) \\ &= 1 \left( \left( \frac{1}{N} \right)^n - \left( \frac{0}{N} \right)^n \right) + 2 \left( \left( \frac{2}{N} \right)^n - \left( \frac{1}{N} \right)^n \right) + 3 \left( \left( \frac{3}{N} \right)^n - \left( \frac{2}{N} \right)^n \right) \\ &\qquad + \cdots + N \left( \left( \frac{N}{N} \right)^n - \left( \frac{N-1}{N} \right)^n \right) \\ &= N - \sum_{i=1}^{N-1} \left( \frac{i}{N} \right)^n. \end{align*}\] As expected, \(\text{E}\!\left[ \hat N_{\text{MLE}} \right] < N\), so \(\hat N_{\text{MLE}}\) is negatively biased; it underestimates \(N\) on average.
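To get a sense of the magnitude of this bias, we can evaluate the expression above in R; the choices \(N = 100\) and \(n \in \{5, 10, 20\}\) below are purely illustrative:

```r
# E[N_hat_MLE] = N - sum_{i=1}^{N-1} (i / N)^n, evaluated for N = 100 tanks
expected_max <- function(n, N) N - sum((seq_len(N - 1) / N)^n)

sapply(c(5, 10, 20), expected_max, N = 100)
# each value falls short of N = 100, and the bias shrinks as n grows
```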

39.2 Minimum of Random Variables

Let \(X_1, \dots, X_n\) be i.i.d. continuous random variables with PDF \(f_{X_1}(x)\) and CDF \(F_{X_1}(x)\). What is the distribution of \(\min(X_1, \dots, X_n)\)?

Example 39.4 (Minimum of exponentials) Let \(X_1, \dots, X_n\) be i.i.d. \(\text{Exponential}(\lambda)\) with PDF \[ f(x) = \lambda e^{-\lambda x}, \quad x > 0, \] representing the times that \(n\) events occur. What is the distribution of \(L = \min(X_1, \dots, X_n)\), the time of the first of these events to occur?

The CDF of \(L\) is (for \(t > 0\)) \[\begin{align} F_{L}(t) &= P(L \leq t) \\ &= P(\min(X_1, \dots, X_n) \leq t) \\ &= 1 - P(\min(X_1, \dots, X_n) > t) \\ &= 1 - P(X_1 > t, \dots, X_n > t) \\ &= 1 - P(X_1 > t) \cdots P(X_n > t) \\ &= 1 - (e^{-\lambda t})^n \\ &= 1 - e^{-n \lambda t} \end{align}\]

Therefore, the PDF of \(L\) is \[ f_{L}(t) = n\lambda e^{-n\lambda t}, \quad t > 0, \] which is the PDF of the \(\textrm{Exponential}(n\lambda)\) distribution. That is, the minimum also follows an exponential distribution, with a rate that is \(n\) times faster. These PDFs are graphed below for \(n = 3\).
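One way to produce this graph in base R (with \(\lambda = 1\) chosen for illustration):

```r
# PDF of a single Exponential(1) versus the minimum of 3 (an exponential with rate 3)
lambda <- 1
n <- 3
t <- seq(0, 4, length.out = 400)

plot(t, dexp(t, rate = n * lambda), type = "l", lwd = 2, col = "blue",
     xlab = "t", ylab = "density")
lines(t, dexp(t, rate = lambda), lty = 2, lwd = 2, col = "gray")
legend("topright", legend = c("minimum of 3 exponentials", "a single exponential"),
       col = c("blue", "gray"), lty = c(1, 2), lwd = 2)
```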

Notice that compared to the original exponential distribution, the probability density of the minimum has much more mass near \(0\), as we would expect.

Exercise 39.2 asks you to apply this result to derive the MSE of the estimator you found in Exercise 30.6.

Let’s generalize Example 39.4 to all continuous distributions.

Proposition 39.3 (Distribution of the Minimum (Continuous)) Let \(X_1, \dots, X_n\) be i.i.d. continuous random variables with PDF \(f_{X_1}(x)\) and CDF \(F_{X_1}(x)\). Then, the distribution of \(L = \min(X_1, \dots, X_n)\) is given by the PDF \[ f_L(t) = n f_{X_1}(t) (1 - F_{X_1}(t))^{n-1}. \]

Proof

The CDF of \(L\) can be computed as follows. \[\begin{align*} F_L(t) &= P(L \leq t) \\ &= 1 - P(L > t) \\ &= 1 - P(X_1, \dots, X_n > t) \\ &= 1 - \left( P(X_1 > t) \right)^n \\ &= 1 - \left( 1 - F_{X_1}(t) \right)^n. \end{align*}\] Taking the derivative with respect to \(t\), we get \[ f_L(t) = -n \left( 1 - F_{X_1}(t) \right)^{n-1} \left( - f_{X_1}(t) \right) = n f_{X_1}(t) \left( 1 - F_{X_1}(t) \right)^{n-1}. \]

And here is the corresponding result for discrete distributions.

Proposition 39.4 (Distribution of the Minimum (Discrete)) Let \(X_1, \dots, X_n\) be i.i.d. integer-valued random variables with CDF \(F_{X_1}(x)\). Then, the distribution of \(L = \min(X_1, \dots, X_n)\) is given by the PMF \[ f_L(k) = (1 - F_{X_1}(k - 1))^n - (1 - F_{X_1}(k))^n \] for integer values of \(k\).

Proof

The CDF of \(L\) is \[ F_L(k) = P(L \leq k) = 1 - P(L > k) = 1 - \left( P(X_1 > k) \right)^n = 1 - \left( 1 - F_{X_1}(k) \right)^n. \] Thus, the PMF of \(L\) is \[ f_L(k) = F_L(k) - F_L(k-1) = \left( 1 - F_{X_1}(k-1) \right)^n - \left( 1 - F_{X_1}(k) \right)^n. \]
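As with the maximum, this formula is easy to check numerically in R; here, again purely as an illustration, for the minimum of three rolls of a fair six-sided die:

```r
# PMF of the minimum of 3 rolls of a fair six-sided die (Proposition 39.4)
n <- 3
k <- 1:6
F_single <- k / 6
pmf_min <- (1 - c(0, F_single[-6]))^n - (1 - F_single)^n   # (1 - F(k-1))^n - (1 - F(k))^n
pmf_min
sum(pmf_min)   # the probabilities sum to 1
```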

39.3 Exercises

Exercise 39.1 (Unbiased estimator of the upper bound of the uniform distribution) Suggest a simple correction to \(\hat\theta_{\textrm{MLE}}\) in Example 39.1 to make it unbiased and compare its MSE to \(\hat\theta_{\textrm{MLE}}\).

Exercise 39.2 (Sampling distribution of the Exponential location MLE) In Exercise 30.6, you determined the MLE \(\hat\theta_{\text{MLE}}\) of the location parameter \(\theta\) of an exponential distribution from i.i.d. observations \(X_1, \dots, X_n\).

  1. Describe the sampling distribution of \(\hat\theta_{\text{MLE}}\). (Hint: Use the result in Example 39.4.)
  2. Calculate the bias of \(\hat\theta_{\text{MLE}}\) for estimating \(\theta\).
  3. Suggest a simple correction to \(\hat\theta_{\textrm{MLE}}\) in Example 39.4 to make it an unbiased estimator of \(\theta\).
  4. Compare the MSEs of the two estimators (\(\hat\theta_{\textrm{MLE}}\) and the bias-corrected estimator).

Exercise 39.3 (Maximum of i.i.d. exponentials) In Example 39.4, we stated that the minimum of i.i.d. exponential random variables also follows an exponential distribution. What about the maximum of i.i.d. exponential random variables? Calculate its PDF and determine whether it is a named distribution.

Exercise 39.4 (Midrange is unbiased) Let \(X_1, \dots, X_n\) be i.i.d. \(\textrm{Uniform}(a= \theta - 1, b= \theta + 1)\). Determine the distributions of \(L \overset{\text{def}}{=}\min(X_1, \dots, X_n)\) and \(M \overset{\text{def}}{=}\max(X_1, \dots, X_n)\), and use these to show that \(\frac{L + M}{2}\) is unbiased for \(\theta\).

Exercise 39.5 (Minimum of i.i.d. geometrics) Let \(X_1, \dots, X_n\) be i.i.d. \(\text{Geometric}(p)\). What is the distribution of \(\min(X_1, \dots, X_n)\)? Is it a named distribution?