28  Probability versus Statistics

Although probability is a fascinating subject in its own right, it is perhaps most important in the modern world because of its application to statistics. In this chapter, we discuss the intricate relationship between probability and statistics.

28.1 Mark-Recapture

How do we estimate the size of a population, such as the number of snails in a state park? There may be too many to count, and it may be difficult to catch them all. An alternative is mark-recapture. First, we capture a sample, say \(50\) snails, and “mark” them.

Then, after some time, we recapture another sample of snails, say \(40\). Some of these snails will be marked from the first capture, while others are unmarked. The number of recaptured snails that are marked can be used to estimate the number of snails in the population.

In Chapter 12, we learned to solve probability problems like the following.

Example 28.1 (Mark-Recapture as a Probability Problem) Suppose there were actually \(300\) snails in the population. What is the probability that exactly \(11\) marked snails are recaptured?

If every snail in the population has an equal chance of being sampled, and if the two samples are independent, then the number of marked snails that are recaptured is a random variable \(X\) that follows a \(\text{Hypergeometric}(M=50, N=250, n=40)\) distribution. Let \(f_{M, N, n}(x)\) be the PMF of a hypergeometric distribution. To calculate \(P(X = 11)\), we simply plug \(11\) into this PMF:

\[ P(X = 11) = f_{50, 250, 40}(11) = \frac{\binom{50}{11} \binom{250}{29}}{\binom{300}{40}} \approx .0276. \]

We can also use R to evaluate this probability:
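```r
# P(X = 11) under the Hypergeometric(M = 50, N = 250, n = 40) model.
# In dhyper(x, m, n, k), m is the number of marked snails, n is the
# number of unmarked snails, and k is the size of the second sample.
dhyper(11, m = 50, n = 250, k = 40)  # approximately 0.0276
```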

But Example 28.1 is not realistic. If we knew that there were \(300\) snails in the population, we would not bother doing mark-recapture in the first place!

Here is a more realistic scenario, where we have collected data and want to infer something about the population.

Example 28.2 (Mark-Recapture as a Statistics Problem) Suppose that \(X = 11\) of the recaptured snails are marked. What can we infer about \(s\), the number of snails in the population?

In Example 28.1, we assumed that \(s = 300\) and wanted to calculate the probabilities of various values of \(X\), whereas in Example 28.2, we observe \(X = 11\) and want to estimate \(s\). In other words, statistics is the inverse of probability. This idea is illustrated in Figure 28.1.

Figure 28.1: Relationship between probability and statistics for mark-recapture. Note that \(f(x)\) denotes the hypergeometric PMF.

Properties of the population (or model), such as \(s\), are called parameters, while properties of the sample (or data), such as \(X\), are called statistics. How do we estimate a parameter using data? The probability distribution still plays an important role. However, the unknown quantity is now the parameter \(s\), instead of the value of the random variable \(X\). This motivates the following definition:

Definition 28.1 (Likelihood) Let \(f_\theta(x)\) be a PMF (or PDF) with parameter(s) \(\theta\). The likelihood of \(\theta\) is defined as \[ L_x(\theta) = f_\theta(x). \]

In other words, the likelihood is simply the PMF (or PDF) regarded as a function of the unknown parameter \(\theta\), instead of \(x\).

Let us determine the likelihood for the mark-recapture problem.

Example 28.3 (Mark-Recapture Likelihood) If \(X = 11\) of the recaptured snails are marked, then the likelihood of \(s\) is \[ L_{11}(s) = f_{50, s-50, 40}(11) = \frac{\binom{50}{11} \binom{s-50}{29}}{\binom{s}{40}}. \] In other words, the likelihood represents \(P(X = 11)\) for different values of \(s\).

We can graph the likelihood as follows:
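```r
# Evaluate the likelihood L_11(s) = f(11; 50, s - 50, 40) over a grid of
# population sizes. (The plotting range of s is our choice.)
s <- 100:500
likelihood <- dhyper(11, m = 50, n = s - 50, k = 40)
plot(s, likelihood, type = "l",
     xlab = "Population Size s", ylab = "Likelihood")
```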

How do we use the likelihood to estimate \(s\)? Since the likelihood represents the probability of observing the data, one idea is to choose \(s\) to make this probability as large as possible. This principle is stated below.

Definition 28.2 (Principle of Maximum Likelihood) One way to estimate a parameter \(\theta\) is to choose the value of \(\theta\) that maximizes the likelihood. That is,

\[ \hat\theta = \arg\max_\theta L_x(\theta). \]

This value is called the maximum likelihood estimate (or MLE).

To find the MLE for the mark-recapture problem, we find the value of \(s\) that maximizes \(L_{11}(s)\). From the graph above, we see that the likelihood is maximized somewhere between \(150\) and \(200\). To determine the exact value, we compute the likelihood for all values of \(s\) in this range:
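```r
# Compute the likelihood for each value of s between 150 and 200.
s <- 150:200
likelihood <- dhyper(11, m = 50, n = s - 50, k = 40)
data.frame(s, likelihood)

# Identify the value of s that maximizes the likelihood.
s[which.max(likelihood)]  # 181
max(likelihood)           # 0.15854
```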

We see that the likelihood is maximized at \(s = 181\), where it achieves a maximum value of \(0.15854\). Therefore, the MLE for the size of the snail population is \(\hat s = 181\). This value makes intuitive sense. The data suggests that approximately \(11 / 40 = 0.275\) of all snails are marked. Since we marked \(50\) snails, the number of snails in the population should be \(50 / 0.275 \approx 181.82\), which is very close to the MLE.

This is no accident. We can derive the MLE as a function of the number of marked snails \(M\), the size of the second sample \(n\), and the number of marked snails recaptured \(x\). To do this, we consider the ratio \(L_x(s) / L_x(s - 1)\):

\[ \begin{align} \frac{L_x(s)}{L_x(s - 1)} &= \frac{\frac{\binom{M}{x} \binom{s - M}{n - x}}{\binom{s}{n}}}{\frac{\binom{M}{x} \binom{s - 1 - M}{n - x}}{\binom{s - 1}{n}}} \\ &= \frac{(s - n)(s - M)}{s(s - M - n + x)}. \end{align} \]

The likelihood is increasing if and only if this ratio is greater than \(1\)β€”that is, when \[ \begin{align*} (s - n)(s - M) &> s(s - M - n + x) \\ s^2 - sM - sn + nM &> s^2 - sM - sn + sx \\ nM &> sx. \end{align*} \] In other words, the likelihood will increase as long as \(s < \frac{nM}{x}\), and it will decrease when \(s > \frac{nM}{x}\). Therefore, the likelihood is maximized when \(s\) is the greatest integer not exceeding \(\frac{nM}{x}\): \[ \hat s = \left\lfloor \frac{nM}{x} \right\rfloor. \]

This captures the intuition that the best estimate of the population size is the value of \(s\) that makes \(\frac{M}{s} \approx \frac{x}{n}\).
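As a quick check, we can plug the numbers from our example into this formula. (This snippet assumes the values \(M = 50\), \(n = 40\), and \(x = 11\) from above.)

```r
# Closed-form MLE: the greatest integer not exceeding n * M / x.
floor(40 * 50 / 11)  # 181, matching the grid search above
```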

28.2 Skew Dice

A skew die is one whose faces are irregular. Are skew dice fair? One way to find out is to roll the die and collect some data.

You should already be familiar with how to solve problems like the following.

Example 28.4 (Skew Dice as a Probability Problem) Suppose that a skew die has a probability \(p = 0.18\) of landing on a six. If the die is rolled 25 times, what is the probability that six comes up exactly 7 times?

Assuming that the dice rolls are independent, the number of sixes is a \(\text{Binomial}(n=25, p=0.18)\) random variable \(X\). Let \(f_{n, p}(x)\) be the PMF. Then, \[ P(X = 7) = f_{25, 0.18}(7) = \binom{25}{7} (0.18)^{7} (1 - 0.18)^{18} \approx .0827. \]

We can also use R to evaluate this probability:
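```r
# P(X = 7) under the Binomial(n = 25, p = 0.18) model.
dbinom(7, size = 25, prob = 0.18)  # approximately 0.0827
```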

But the whole point of rolling the die is to determine the probability of landing on each face. That is, the statistics question is likely more compelling than the probability question.

Figure 28.2: Relationship between probability and statistics for skew dice. Note that \(f(x)\) denotes the binomial PMF.

Example 28.5 (Skew Dice as a Statistics Problem) Suppose that a skew die is rolled 25 times and six comes up exactly 7 times. What can we infer about the probability \(p\) that this die lands on a six?

We can use maximum likelihood. The likelihood of \(p\) based on this data is \[ L_7(p) = f_{25, p}(7) = \binom{25}{7} p^7 (1 - p)^{18}. \]

We graph this likelihood below. Note that \(p\) is a continuous parameter; it can take on any value between \(0\) and \(1\).
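```r
# Evaluate the likelihood L_7(p) = dbinom(7, 25, p) on a fine grid of p.
p <- seq(0, 1, by = 0.001)
likelihood <- dbinom(7, size = 25, prob = p)
plot(p, likelihood, type = "l",
     xlab = "Probability of a Six p", ylab = "Likelihood")
```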

Where is this likelihood maximized? Because \(p\) is a continuous parameter, we can maximize \(L_7(p)\) by taking the derivative and setting it equal to 0:

\[ \begin{align} 0 &= \frac{d}{dp} L_7(p) \\ &= \frac{d}{dp} \binom{25}{7} p^7 (1 - p)^{18} \\ &= \binom{25}{7} \big(7p^6 (1 - p)^{18} - 18 p^7 (1 - p)^{17}\big) \\ &= \binom{25}{7} p^6 (1 - p)^{17} \big(7(1 - p) - 18p\big) \end{align} \]

There are three solutions to this equation: \(p = 0, 1, \frac{7}{25}\). The first two are minima, where the likelihood is zero, so the MLE is \[ \hat p = \frac{7}{25} = 0.28. \]

The MLE in Example 28.5 is intuitive. If the skew die landed on six \(7\) times in \(25\) rolls, then our best estimate for the probability of landing on six is \(\frac{7}{25}\). We can show this fact more generally, by replacing \(25\) by \(n\) and \(7\) by \(x\).

The likelihood of \(p\) is \[ L_x(p) = \binom{n}{x} p^x (1 - p)^{n - x}. \] Setting the derivative with respect to \(p\) equal to zero, as before, we obtain \[ \begin{align} 0 &= \frac{d}{dp} L_x(p) \\ &= \frac{d}{dp} \binom{n}{x} p^x (1 - p)^{n - x} \\ &= \binom{n}{x} \big(x p^{x - 1} (1 - p)^{n - x} - (n - x) p^x (1 - p)^{n - x - 1}\big) \\ &= \binom{n}{x} p^{x - 1} (1 - p)^{n - x - 1} \big(x (1 - p) - (n - x) p\big). \end{align} \] The solution to this equation corresponding to a maximum is \[ \hat p = \frac{x}{n}. \]
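As a sanity check, we can also maximize the likelihood numerically. The sketch below uses R's optimize() function with the data from Example 28.5, where \(n = 25\) and \(x = 7\):

```r
# Numerically maximize L_7(p) over the interval (0, 1).
result <- optimize(function(p) dbinom(7, size = 25, prob = p),
                   interval = c(0, 1), maximum = TRUE)
result$maximum  # approximately 0.28 = 7/25
```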