12  Named Distributions

So far, we have encountered a few distributions which arise so frequently that they have been given names, like Bernoulli, binomial, and geometric. In this chapter, we review these named distributions and introduce a few more. This chapter is also intended to serve as a reference in case you need to look up the formula for the PMF or expected value of a particular distribution.

12.1 Bernoulli distribution

We introduced the Bernoulli distribution in Definition 8.5. We restate its definition here.

Definition 12.1 (Bernoulli distribution) \(X\) is said to be a Bernoulli random variable with parameter \(p\) (\(X \sim \text{Bernoulli}(p)\)) if its PMF \(f(x)\) is of the form:

\[ \begin{array}{c|cc} x & 0 & 1 \\ \hline f(x) & 1 - p & p \end{array} \]

Notice that this PMF can be written more concisely as a formula: \[ f(x) = p^x (1 - p)^{1-x}; \qquad x = 0, 1. \]

Proposition 12.1 (Bernoulli expectation and variance) Let \(X\) be a \(\text{Bernoulli}(p)\) random variable. Then:

\[ \begin{align} \text{E}\!\left[ X \right] &= p & \text{Var}\!\left[ X \right] &= p(1-p) \end{align} \]

The expected value is \[\text{E}\!\left[ X \right] = 0 \cdot (1 - p) + 1 \cdot p = p. \]

To calculate the variance, we first observe that \(X^2 = X\), since squaring either \(0\) or \(1\) does not change the value. Therefore: \[ \text{E}\!\left[ X^2 \right] = \text{E}\!\left[ X \right] = p. \]

By Proposition 11.3, the variance is \[ \text{Var}\!\left[ X \right] = \text{E}\!\left[ X^2 \right] - \text{E}\!\left[ X \right]^2 = p - p^2 = p(1 - p).\]
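For a quick numerical check of Proposition 12.1, we can simulate Bernoulli draws in R using rbinom with size = 1; the value \(p = 0.3\) below is an arbitrary illustrative choice.

```r
# Simulate Bernoulli(p) draws and compare the sample mean and variance
# to p and p(1 - p); p = 0.3 is an arbitrary illustrative choice.
set.seed(42)
p <- 0.3
x <- rbinom(100000, size = 1, prob = p)
mean(x)  # should be close to p = 0.3
var(x)   # should be close to p * (1 - p) = 0.21
```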

The Bernoulli distribution is used to model random variables that have only two possible values, \(0\) or \(1\). One example of a Bernoulli random variable is an indicator random variable, which is \(1\) when an event \(E\) happens and \(0\) otherwise.

Example 12.1 (Indicator Random Variables) Recall Example 3.3, where due to a hospital mix-up, \(n\) babies were returned randomly to \(n\) couples. Let \(E_i\) be the event that the \(i\)th couple receives their own baby.

We can define the indicator random variable \(I_{E_i}\) as \[ I_{E_i}(\omega) = \begin{cases} 1 & \omega \in E_i \\ 0 & \omega \in E_i^c \end{cases}. \]

Then \(I_{E_i}\) is a Bernoulli random variable with parameter \[ p = P(I_{E_i} = 1) = P(E_i) = \frac{1}{n} \]

By Proposition 12.1, we know its expectation and variance are: \[ \begin{aligned} \text{E}\!\left[ I_{E_i} \right] &= p = \frac{1}{n} & \text{Var}\!\left[ I_{E_i} \right] &= p(1 - p) = \frac{n-1}{n^2}. \end{aligned} \]

Note that multiplying two indicators \(I_A I_B\) results in the indicator for the intersection, \(I_{A \cap B}\). This is because \(I_A I_B\) equals \(1\) if and only if both \(I_A\) and \(I_B\) equal \(1\); otherwise, \(I_A I_B\) equals \(0\).

Therefore, \(\displaystyle \text{E}\!\left[ I_{E_1} I_{E_2} \right] = \text{E}\!\left[ I_{E_1 \cap E_2} \right] = P(E_1 \cap E_2) = \frac{1}{n(n-1)}\).
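We can check these formulas with a small simulation; the sketch below assumes \(n = 5\) couples (an arbitrary choice) and uses a random permutation from sample(n) to represent the assignment of babies to couples.

```r
# Simulation sketch of Example 12.1 with n = 5 couples (n is arbitrary).
# perm[i] is the baby received by couple i.
set.seed(1)
n <- 5
sims <- replicate(100000, {
  perm <- sample(n)
  c(I1   = as.integer(perm[1] == 1),                  # couple 1 gets own baby
    I1I2 = as.integer(perm[1] == 1 && perm[2] == 2))  # couples 1 and 2 both do
})
rowMeans(sims)  # approximately 1/n = 0.2 and 1/(n*(n-1)) = 0.05
```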

12.2 Binomial Distribution

We introduced the binomial distribution in Section 8.4 as the number of “heads” in \(n\) tosses of a coin with probability \(p\) of landing on “heads”. We derived the PMF of this random variable, which we restate below.

Definition 12.2 (Binomial distribution) \(X\) is said to be a \(\text{Binomial}(n, p)\) random variable if its PMF \(f(x)\) is of the form:

\[ f(x) = \binom{n}{x} p^x (1 - p)^{n-x}; \qquad x = 0, 1, \dots, n. \]

Notice that when there is only \(n = 1\) toss, this reduces to the \(\text{Bernoulli}(p)\) PMF, as it should.

Next, we restate the formulas for the expected value and variance of the binomial distribution that we derived in Example 9.3 and Example 11.7.

Proposition 12.2 (Binomial expectation and variance) Let \(X\) be a \(\text{Binomial}(n, p)\) random variable. Then:

\[ \begin{align} \text{E}\!\left[ X \right] &= np & \text{Var}\!\left[ X \right] &= np(1-p) \end{align} \]

We have seen several examples of binomial random variables already; the next example presents another.

Example 12.2 (Lottery balls with replacement) Consider a lottery drum with \(M\) white balls and \(N\) black balls. Suppose we draw \(n\) balls at random with replacement. That is, each time a ball is drawn, the ball is placed back into the drum before drawing again so the same ball can be drawn twice. Let \(X\) be the number of white balls drawn.

Because the ball is replaced after each draw, this situation is equivalent to tossing a coin \(n\) times, where “heads” corresponds to a white ball and “tails” corresponds to a black ball. The probability of “heads” is \(p = \frac{M}{M+N}\), the probability of drawing a white ball. Now, \(X\), the number of white balls in the \(n\) draws, is also the number of “heads” in \(n\) coin tosses, which is how we defined a \(\text{Binomial}(n, p=\frac{M}{M+N})\) random variable.

Now, we can use known facts about the binomial distribution. For example, the probability of drawing exactly \(k\) white balls is \[ f_X(k) = \binom{n}{k} \Big(\frac{M}{M + N}\Big)^k \Big(1 - \frac{M}{M + N}\Big)^{n-k}, \] and the expected number of white balls is \[ \text{E}\!\left[ X \right] = n \frac{M}{M + N}. \]
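In R, these quantities are easy to compute with dbinom. The numbers below (\(M = 18\), \(N = 25\), \(n = 6\)) are made up purely for illustration.

```r
# Lottery with replacement: M = 18 white, N = 25 black, n = 6 draws (made-up numbers)
M <- 18; N <- 25; n <- 6
p <- M / (M + N)
dbinom(2, size = n, prob = p)  # P(X = 2): probability of exactly 2 white balls
n * p                          # E[X]: expected number of white balls
```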

12.3 Hypergeometric Distribution

In most lotteries, the balls are drawn without replacement so that the same ball cannot be drawn twice. How does this change the answers to Example 12.2?

If \(X\) represents the number of white balls in \(n\) random draws without replacement from a drum with \(M\) white balls and \(N\) black balls, then \(X\) is said to be a \(\text{Hypergeometric}(M, N, n)\) random variable.

What is the PMF of \(X\)? In other words, what is \(P(X=x)\), the probability that exactly \(x\) white balls are drawn? We cannot use the binomial formula because each draw changes the composition of the drum; this is not like \(n\) tosses of a coin because the probability of “heads” changes with each toss. In order to calculate this probability, we have to go back to the naive definition of probability.

  • How many possible outcomes are there? In Chapter 2, we saw that the number of distinct ways to select \(n\) items from \(M + N\) items, ignoring order, is \(\binom{M + N}{n}\).
  • How many outcomes are in the event \(\{ X = x \}\)? There are \(\binom{M}{x}\) ways to choose \(x\) white balls and \(\binom{N}{n - x}\) ways to choose the remaining \(n - x\) black balls. By Theorem 2.1, the number of possible ways to choose \(x\) white balls and \(n - x\) black balls is \(\binom{M}{x} \binom{N}{n - x}\).

Putting these pieces together, we obtain the PMF of a \(\text{Hypergeometric}(M, N, n)\) random variable.

Definition 12.3 (Hypergeometric distribution) \(X\) is said to be a \(\text{Hypergeometric}(M, N, n)\) random variable if its PMF \(f(x)\) is of the form:

\[ f(x) = \frac{\binom{M}{x} \binom{N}{n - x}}{\binom{M + N}{n}}; \qquad x = 0, 1, \dots, n. \tag{12.1}\]

Next, we present formulas for the expected value and variance of the hypergeometric distribution. Notice that the expected value matches that of the binomial distribution, while the variance is slightly different.

Proposition 12.3 (Hypergeometric expectation and variance) Let \(X\) be a \(\text{Hypergeometric}(M, N, n)\) random variable. Then:

\[ \begin{align} \text{E}\!\left[ X \right] &= n \frac{M}{M + N} & \text{Var}\!\left[ X \right] &= n \frac{M}{M + N} \frac{N}{M + N} \frac{M + N - n}{M + N - 1} \end{align} \]

Since calculating the expectation and variance requires calculating \(\text{E}\!\left[ X \right]\) and \(\text{E}\!\left[ X^2 \right]\), we will start by deriving a general formula for \(\text{E}\!\left[ X^k \right]\) for \(k \geq 1\).

\[\begin{align*} \text{E}\!\left[ X^k \right] &= \sum_{x=0}^n x^k \frac{\binom{M}{x} \binom{N}{n-x}}{\binom{M + N}{n}} & \text{(LotUS)} \\ &= M \sum_{x=1}^n x^{k-1} \frac{\binom{M - 1}{x-1} \binom{N}{n-x}}{\binom{M + N}{n}} & \left( x\binom{M}{x} = M\binom{M-1}{x-1} \right) \\ &= n\frac{M}{M + N} \sum_{x=1}^n x^{k-1} \frac{\binom{M - 1}{x-1} \binom{N}{n-x}}{\binom{M + N - 1}{n - 1}} & \left( \binom{M + N}{n} = \frac{M + N}{n} \binom{M + N - 1}{n - 1} \right) \\ &= n\frac{M}{M + N} \sum_{i=0}^{n-1} (i+1)^{k-1} \frac{\binom{M-1}{i} \binom{N}{n-1-i}}{\binom{M + N - 1}{n-1}} & \left(i = x - 1 \right) \\ &= n\frac{M}{M + N} \text{E}\!\left[ (Y+1)^{k-1} \right] & \text{(LotUS)}, \end{align*}\] where \(Y \sim \text{Hypergeometric}(M-1, N, n-1)\).

Setting \(k = 1\), we obtain \[ \text{E}\!\left[ X \right] = n\frac{M}{M + N} \text{E}\!\left[ (Y + 1)^0 \right] = n\frac{M}{M + N}, \] and setting \(k=2\), we obtain \[ \begin{aligned} \text{E}\!\left[ X^2 \right] &= n\frac{M}{M + N} \text{E}\!\left[ (Y+1)^1 \right] \\ &= n\frac{M}{M + N} \left( (n - 1) \frac{M - 1}{M + N - 1} + 1 \right) \\ &= n \frac{M}{M + N} \left( \frac{nM + N - n}{M + N - 1} \right). \end{aligned} \] Thus, the variance is \[ \begin{aligned} \text{Var}(X) &= \text{E}\!\left[ X^2 \right] - \text{E}\!\left[ X \right]^2 \\ &= n \frac{M}{M + N} \left( \frac{nM + N - n}{M + N - 1} - \frac{nM}{M + N} \right) \\ &= n \frac{M}{M + N} \left( \frac{(nM + N - n)(M + N)- nM(M + N - 1)}{(M + N)(M + N - 1)} \right) \\ &= n \frac{M}{M + N} \frac{N}{M + N} \frac{M + N - n}{M + N - 1}. \end{aligned} \]
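As a sanity check, we can compare these formulas against the PMF numerically in R using dhyper; the values \(M = 7\), \(N = 5\), \(n = 4\) below are arbitrary. Note that in R's dhyper(x, m, n, k), m is the number of white balls, n the number of black balls, and k the number of draws.

```r
# Numerical check of Proposition 12.3 (M = 7, N = 5, n = 4 are arbitrary choices)
M <- 7; N <- 5; n <- 4
x <- 0:n
pmf <- dhyper(x, m = M, n = N, k = n)
sum(x * pmf)                                       # E[X]; compare to n * M / (M + N)
sum(x^2 * pmf) - sum(x * pmf)^2                    # Var[X] computed from the PMF
n * (M / (M+N)) * (N / (M+N)) * (M+N-n) / (M+N-1)  # Var[X] from the formula
```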

Of course, the power of the hypergeometric distribution lies in the fact that it can model many phenomena, not just lottery balls.

Example 12.3 (Capture-Recapture) Capture-recapture is a method in ecology to estimate the number of animals in a population. Suppose there are an unknown number of fish, \(f\), in a pond. An ecologist captures \(10\) different fish, tags them, and releases them back into the pond. One week later, she returns to the pond, captures \(20\) different fish, and counts the number that are tagged, \(X\).

Assuming that she is equally likely to catch any of the \(f\) fish in the pond on both trips, the number of tagged fish \(X\) follows a hypergeometric distribution. To see this, we map the fish to lottery balls.

  • The white balls represent the \(10\) tagged fish.
  • The black balls represent the remaining \(f - 10\) untagged fish.
  • The draws represent the \(20\) fish that she captures on her return trip. Note that the draws are made without replacement because she captures \(20\) distinct fish.

Therefore, \(X \sim \text{Hypergeometric}(M=10, N=f-10, n=20)\).

If we knew, for example, that there were \(f = 100\) fish in the pond, we could calculate the probability that fewer than 2 fish are tagged.

\[ \begin{aligned} P(X < 2) &= P(X = 0) + P(X = 1) \\ &= \frac{\binom{10}{0} \binom{90}{20}}{\binom{100}{20}} + \frac{\binom{10}{1} \binom{90}{19}}{\binom{100}{20}} \\ &\approx .3630. \end{aligned} \]

We could also have calculated this using the dhyper function in R.
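Here is what that calculation might look like; recall that in dhyper(x, m, n, k), m counts the tagged (white) fish, n the untagged (black) fish, and k the number recaptured.

```r
# P(X < 2) for Example 12.3, assuming f = 100 fish in the pond
dhyper(0, m = 10, n = 90, k = 20) + dhyper(1, m = 10, n = 90, k = 20)
phyper(1, m = 10, n = 90, k = 20)  # same probability via the CDF; approximately 0.3630
```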

In general, when solving problems with the hypergeometric distribution, it helps to explicitly map the problem onto the lottery ball analogy.

Example 12.4 (Flush in Texas Hold’em) Texas Hold’em is a popular variant of poker. First, each player is dealt two cards of their own. Then, five “community” cards are dealt in the center of the table. The player wins if they have the best poker hand among the seven cards (the two cards in their hand, plus the five community cards shared by all the players).

Let’s say you are dealt the following cards:

  • Ace of spades
  • 9 of spades.

With two spades already in hand, you might hope for a flush of spades, which requires five total spades. In order for this to happen, there would need to be \(3\) (or more) spades among the \(5\) community cards. In other words, if \(X\) is the number of spades among the community cards, then the probability you achieve a flush of spades is \(P(X \geq 3)\).

To calculate this probability, we first note that \(X\) follows a hypergeometric distribution. To see why, make the analogy with lottery balls:

  • The white balls represent the \(11\) remaining spades in the deck. (There are \(13\) spades in the deck, but \(2\) of them are in your hand.)
  • The black balls represent the \(39\) other cards (non-spades) in the deck.
  • The draws represent the \(5\) community cards. Because cards are dealt without replacement, the lottery ball analogy is appropriate.

Therefore, \(X \sim \text{Hypergeometric}(M=11, N=39, n=5)\), so we can calculate the probability by plugging in the appropriate numbers into the hypergeometric PMF: \[\begin{align*} P(X \geq 3) &= \frac{\binom{11}{3} \binom{39}{2}}{\binom{50}{5}} + \frac{\binom{11}{4} \binom{39}{1}}{\binom{50}{5}} + \frac{\binom{11}{5} \binom{39}{0}}{\binom{50}{5}} \\ &= \frac{19371}{302680} \approx 0.0640. \end{align*}\]

We could have also calculated this using dhyper in R.
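For instance, the calculation might look like the following, with m = 11 remaining spades, n = 39 non-spades, and k = 5 community cards.

```r
# P(X >= 3) spades among the 5 community cards (Example 12.4)
sum(dhyper(3:5, m = 11, n = 39, k = 5))
1 - phyper(2, m = 11, n = 39, k = 5)  # same probability via the CDF; approximately 0.0640
```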

12.4 Geometric and Negative Binomial Distributions

In Section 9.3, we defined a \(\text{Geometric}(p)\) random variable, which counts the number of coin tosses until we get “heads”. We argued that its PMF is \[ f(k) = (1-p)^{k-1} p; \quad k = 1, 2, \dots \] because in order for the first “heads” to be on toss \(k\), the first \(k-1\) tosses must all be “tails”.

We can generalize the geometric distribution. Let \(X\) count the number of tosses of a coin (with probability \(p\) of “heads”) until we get \(r\) “heads”. \(X\) is said to be a negative binomial random variable. Notice that the geometric distribution is a special case, when \(r = 1\).

What is the PMF of \(X\)? Let’s build intuition by looking at a specific case. Suppose we want to know the probability that it takes exactly 5 tosses to get 3 “heads.” That is, \(X \sim \text{NegativeBinomial}(r=3, p)\), and we want to know \(P(X = 5)\). Here are all the possible sequences where the 3rd “heads” happens on the 5th toss:

  1. HHTTH
  2. HTHTH
  3. HTTHH
  4. THHTH
  5. THTHH
  6. TTHHH

What can we learn from this specific example?

  • First, all of the sequences have 3 “heads” and 2 “tails,” so they have the same probability \(p^3 (1-p)^2\).
  • So all that remains is to count the number of sequences where the 3rd “heads” occurs on the 5th toss.
    • The last toss must be “heads.” If the sequence were something like HHTHT, the 3rd “heads” would occur on the 4th toss.
    • Since the last toss must be “heads,” we just need to count the number of ways of arranging the remaining \(r - 1 = 2\) “heads” among the first \(5 - 1 = 4\) tosses.
    • The answer is \(\binom{4}{2} = 6\).

We can generalize this argument to any number of “heads” \(r\) and any number of tosses \(x\) to obtain a general formula for the PMF of a negative binomial random variable.

Definition 12.4 (Negative binomial distribution) If \(X\) is a random variable with PMF \[ f(x) = \binom{x-1}{r-1} p^r (1-p)^{x-r}; \qquad x = r, r+1, \dots \tag{12.2}\] for some \(r\) and \(p\), we say that \(X\) is a negative binomial random variable with parameters \(r\) and \(p\). We denote this by \(X \sim \text{NegativeBinomial}(r,p)\).

If we plug in \(r = 1\) into Equation 12.2, we obtain \[ f(x) = \binom{x - 1}{0} p^1 (1 - p)^{x - 1} = (1 - p)^{x-1} p; \quad x = 1, 2, \dots, \] which is precisely the PMF for a \(\text{Geometric}(p)\) random variable.
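We can confirm this correspondence numerically in R. One caveat: R's dgeom and dnbinom count only the “tails” before the success(es), not the total number of tosses, so the total count \(x\) corresponds to \(x - 1\) in R's parametrization (the value \(p = 0.3\) below is arbitrary).

```r
# Check that NegativeBinomial(r = 1, p) matches Geometric(p); p = 0.3 is arbitrary.
# R's dnbinom and dgeom count failures ("tails"), so total tosses x maps to x - 1.
p <- 0.3
x <- 1:10
all.equal(dnbinom(x - 1, size = 1, prob = p), dgeom(x - 1, prob = p))
all.equal(dgeom(x - 1, prob = p), (1 - p)^(x - 1) * p)  # matches the PMF above
```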

We can also derive formulas for the expected value and variance.

Proposition 12.4 (Negative binomial expectation and variance) Let \(X\) be a \(\text{NegativeBinomial}(r, p)\) random variable. Then: \[ \begin{align} \text{E}\!\left[ X \right] &= \frac{r}{p} & \text{Var}\!\left[ X \right] &= \frac{r(1-p)}{p^2} \end{align} \]

Let \(X \sim \text{NegativeBinomial}(r,p)\). Then, \[\begin{align*} \text{E}\!\left[ X^k \right] &= \sum_{x=r}^\infty x^k \binom{x-1}{r-1} p^r (1-p)^{x-r} & \text{(LotUS)}\\ &= \sum_{x=r}^\infty x^{k-1} r \binom{x}{r} p^r (1-p)^{x-r} & \left( x \binom{x - 1}{r - 1} = r \binom{x}{r} \right) \\ &= r \sum_{i=r + 1}^\infty (i - 1)^{k-1} \binom{i - 1}{r} p^r (1-p)^{i-1-r} & \left( i = x + 1 \right) \\ &= \frac{r}{p} \sum_{i=r + 1}^\infty (i - 1)^{k-1} \underbrace{\binom{i - 1}{(r + 1) - 1} p^{r + 1} (1-p)^{i-(r + 1)}}_{\text{NegativeBinomial}(r+1, p) \text{ PMF}} & \text{(rewrite in terms of $r + 1$)} \\ &= \frac{r}{p} \text{E}\!\left[ (Y-1)^{k-1} \right] & \text{(LotUS)}, \end{align*}\] where \(Y \sim \text{NegativeBinomial}(r+1, p)\).

Setting \(k=1\), we get \[ \text{E}\!\left[ X \right] = \frac{r}{p} \text{E}\!\left[ (Y - 1)^0 \right] = \frac{r}{p}, \] and setting \(k = 2\), we get \[ \text{E}\!\left[ X^2 \right] = \frac{r}{p} \text{E}\!\left[ (Y-1)^1 \right] = \frac{r}{p} \left( \frac{r+1}{p} - 1 \right). \] Thus, \[ \text{Var}\!\left[ X \right] = \text{E}\!\left[ X^2 \right] - \text{E}\!\left[ X \right]^2 = \frac{r}{p}\left( \frac{r+1}{p} - 1 - \frac{r}{p} \right) = \frac{r(1-p)}{p^2}. \]

Example 12.5 (Number of interviews) A medical researcher needs to recruit 20 subjects for a clinical trial. Each person she interviews has a 60% chance of being eligible to participate in the study, independently of any other subject. How many people should she expect to interview? What is the probability she will have to interview at least 40 people?

If \(X\) is the number of interviews required, then \(X \sim \text{NegativeBinomial}(r=20, p=0.60)\). To convince ourselves, we make an analogy with coin tosses.

  • Each coin toss represents a person that she interviews.
  • A “heads” means that the person is eligible.
  • She will toss the coin until she gets \(r = 20\) “heads.”
  • \(X\) counts the number of tosses.

Therefore, the expected number of people she has to interview is \[ \text{E}\!\left[ X \right] = \frac{r}{p} = \frac{20}{0.6} \approx 33.33, \] and the probability is \[ P(X \geq 40) = \sum_{k = 40}^\infty \binom{k-1}{19} (0.6)^{20} (0.4)^{k-20}. \]

To avoid an infinite sum, we can use the complement rule: \[ P(X \geq 40) = 1 - P(X \leq 39) = 1 - \sum_{k = 20}^{39} \binom{k-1}{19} (0.6)^{20} (0.4)^{k-20}. \]

But this sum is still impractical to evaluate by hand. We can use dnbinom in R to calculate this probability. Note that dnbinom uses a different definition of the negative binomial distribution that counts only the “tails,” rather than the total number of tosses. To convert our values into the number of tails, we need to subtract \(r = 20\) from the values we plug into the PMF.
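Under that convention, interviewing at least \(40\) people means observing at least \(40 - 20 = 20\) ineligible people, so the calculation might look like this:

```r
# Example 12.5: E[X] and P(X >= 40), where X is the total number of interviews.
# R's pnbinom counts failures (ineligible people), so X >= 40 means failures >= 20.
r <- 20; p <- 0.6
r / p                                # E[X], approximately 33.33
1 - pnbinom(19, size = r, prob = p)  # P(X >= 40) = 1 - P(at most 19 failures)
```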

12.5 Poisson Distribution

Imagine that you are in a room of \(n\) people, including yourself. Each person in the room has contributed $1 to a central pot, so there is a total of $\(n\) in the pot. The money in the pot will be redistributed back to the people in the room as follows: each dollar is equally likely to go to any one of the \(n\) people, independently of the other dollars in the pot. Some people will end up with more than $1, while others end up with nothing.

As \(n \to \infty\), what is the probability you end up with nothing? There are two lines of reasoning that lead to contradictory answers:

  1. As \(n \to \infty\), the number of dollars in the pot goes to infinity, so it seems that the probability that you end up with at least one of these infinite dollars should approach \(1\); i.e., the probability that you end up with nothing is \(0\).
  2. As \(n \to \infty\), the chance that you earn each dollar, \(1/n\), decreases to \(0\). So it seems that the probability that you end up with nothing is \(1\).

Which line of reasoning is correct? It turns out that both are wrong!

We can solve the problem using the binomial distribution.

Example 12.6 (Ending up with zero dollars) Each of the \(n\) dollars has a probability of \(1/n\) of being returned to you. So, if \(X\) is the number of dollars you receive, then \(X \sim \text{Binomial}(n,1/n)\). The probability you receive zero dollars is \[ f_X(0) = \binom{n}{0} \left( \frac{1}{n} \right)^0 \left( 1 - \frac{1}{n} \right)^{n} = \left( 1 - \frac{1}{n} \right)^n. \] From calculus, we know that \[ \lim_{n \to \infty} \left( 1 - \frac{1}{n} \right)^n = e^{-1}. \]

Here’s one way to establish the limit. Replace the integer \(n\) by \(1 / x \in \mathbb{R}\), and take the limit as \(x \to 0\): \[ \lim_{n \to \infty} \left( 1 - \frac{1}{n} \right)^n = \lim_{x \to 0} \left( 1 - x \right)^{1/x}, \] and observe that this limit has indeterminate form \(1^\infty\). We can determine the limit by applying L’Hopital’s rule to the logarithm: \[ \lim_{x\to 0 } \ln ( 1 - x )^{1/x} = \lim_{x\to 0} \frac{\ln(1 - x)}{x} = \lim_{x\to 0} \frac{-\frac{1}{1 - x}}{1} = -1. \]

Since the limit of the (natural) logarithm is \(-1\), the limit of the original expression must be \(e^{-1}\).
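A quick numerical check in R makes the limit concrete:

```r
# (1 - 1/n)^n approaches e^{-1} as n grows
n <- c(10, 100, 1000, 10000, 100000)
cbind(n, value = (1 - 1/n)^n, limit = exp(-1))
```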

It turns out that the above phenomenon is not a coincidence. In general, a binomial distribution with large \(n\) and small \(p\) can be approximated by a PMF involving \(e\)!

Theorem 12.1 (Poisson approximation to the binomial) Let \(X \sim \text{Binomial}(n,p = \mu/n)\), where \(\mu\) is a constant. Then, the PMF of \(X\) approaches \[ f(x) = e^{-\mu} \frac{\mu^x}{x!}; \qquad x = 0,1,2, \dots \] as \(n \to \infty\).

Let \(X \sim \text{Binomial}(n, p = \mu/ n)\). Then its PMF is \[ f(x) = \binom{n}{x} \left( \frac{\mu}{n} \right)^x \left( 1 - \frac{\mu}{n} \right)^{n-x}. \]

Now, we regroup this expression into four terms \[ \begin{aligned} f(x) &= \underbrace{\frac{\mu^x}{x!}}_{\text{(a)}} \cdot \underbrace{\frac{n!}{(n-x)! n^x}}_{\text{(b)}} \cdot \underbrace{\left(1 - \frac{\mu}{n}\right)^n}_{\text{(c)}} \cdot \underbrace{\left(1 - \frac{\mu}{n}\right)^{-x}}_{\text{(d)}}, \end{aligned} \] and take the limit of each of these terms as \(n \to \infty\):

  1. \(\frac{\mu^x}{x!}\) does not depend on \(n\) so it is constant.
  2. \(\frac{n!}{(n-x)! n^x}\) can be expanded as \(\frac{n(n-1)\dots (n-x+1)}{n^x}\). There are \(x\) terms in the numerator and \(x\) terms in the denominator. The terms in the numerator are close to \(n\) when \(n\) is large, so the limit is \(1\).
  3. \(\left(1 - \frac{\mu}{n}\right)^n \to e^{-\mu}\), as discussed in Example 12.6.
  4. \(\left(1 - \frac{\mu}{n}\right)^{-x} \to 1^{-x} = 1\), since \(\frac{\mu}{n} \to 0\).

Note that we can apply Theorem 12.1 to Example 12.6 by taking \(\mu = 1\). Theorem 12.1 says that the probability is \[ f(0) = e^{-1} \frac{1^0}{0!} = e^{-1}, \] which matches the answer we got in Example 12.6.

Theorem 12.1 motivates the following named distribution.

Definition 12.5 (Poisson distribution) If \(X\) is a random variable with PMF \[ f(x) = e^{-\mu} \frac{\mu^x}{x!}; \qquad x = 0,1,2, \dots \] for some \(\mu\), \(X\) is said to be a Poisson random variable with parameter \(\mu\). We denote it by \(X \sim \text{Poisson}(\mu)\).

Next, we derive the expectation and variance of a Poisson random variable. Since the Poisson distribution arises as an approximation to a \(\text{Binomial}(n, p=\frac{\mu}{n})\) distribution, we might conjecture that the Poisson expectation is \[ \text{E}\!\left[ X \right] = np = n\frac{\mu}{n} = \mu. \] This intuition turns out to be correct, but this fact requires proof.

Proposition 12.5 (Poisson expectation and variance) Let \(X \sim \text{Poisson}(\mu)\). Then, \[ \begin{aligned} \text{E}\!\left[ X \right] &= \mu & \text{Var}\!\left[ X \right] &= \mu \end{aligned} \]

We derive a general formula for \(\text{E}\!\left[ X^k \right]\) for \(k \geq 1\).

\[ \begin{align} \text{E}\!\left[ X^k \right] &= \sum_{x=0}^\infty x^k e^{-\mu} \frac{\mu^x}{x!} & \text{(LotUS)} \\ &= \sum_{x=1}^\infty x^k e^{-\mu} \frac{\mu^x}{x!} & \text{($x=0$ term is zero)} \\ &= \sum_{x=1}^\infty x^{k-1} e^{-\mu} \frac{\mu^x}{(x-1)!} & \text{(cancel $x$ with one factor of $x!$)} \\ &= \sum_{i=0}^\infty (i + 1)^{k-1} e^{-\mu} \frac{\mu^{i+1}}{i!} & (i = x - 1) \\ &= \mu \sum_{i=0}^\infty (i + 1)^{k-1} \underbrace{e^{-\mu} \frac{\mu^i}{i!}}_{\text{$\text{Poisson}(\mu)$ PMF}} \\ &= \mu \text{E}\!\left[ (X + 1)^{k-1} \right]. & \text{(LotUS)} \end{align} \]

Setting \(k=1\), we obtain the expectation that we conjectured, \[ \text{E}\!\left[ X \right] = \mu \text{E}\!\left[ (X + 1)^{0} \right] = \mu. \]

To obtain the variance, we first set \(k=2\) to obtain \(\displaystyle \text{E}\!\left[ X^2 \right] = \mu \text{E}\!\left[ (X + 1)^{1} \right] = \mu (\mu + 1)\). Therefore, \[ \text{Var}\!\left[ X \right] = \text{E}\!\left[ X^2 \right] - \text{E}\!\left[ X \right]^2 = \mu( \mu + 1 - \mu) = \mu. \]
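We can also check Proposition 12.5 by simulation with rpois; the rate \(\mu = 1.2\) below is an arbitrary choice.

```r
# Simulation check of Proposition 12.5 (mu = 1.2 is arbitrary)
set.seed(7)
mu <- 1.2
x <- rpois(100000, lambda = mu)
mean(x)  # should be close to mu
var(x)   # should also be close to mu
```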

The next example shows a natural situation where the Poisson distribution arises.

Example 12.7 (Detecting Radiation) A Geiger counter is a device that measures the level of ionizing radiation. It makes a clicking sound each time an ionization event is detected. In each millisecond, there is a small probability \(p = 1 / 50000\) of a click. If we assume that milliseconds are independent, then the number of clicks over one minute is \[ X \sim \text{Binomial}(n=60000, p=1/50000) \approx \text{Poisson}(\mu=np=1.2). \]

What is the probability that there is more than 1 click over this minute?

\[ P(X > 1) = 1 - P(X = 0) - P(X = 1) = 1 - e^{-1.2} \frac{1.2^0}{0!} - e^{-1.2} \frac{1.2^1}{1!} \approx .3373. \]

We could use R to calculate this probability and the exact binomial probability.
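For example, the calculation might look like the following; both values come out to approximately \(0.3373\).

```r
# Example 12.7: Poisson approximation vs. exact binomial probability of P(X > 1)
1 - ppois(1, lambda = 1.2)                   # Poisson(1.2) approximation
1 - pbinom(1, size = 60000, prob = 1/50000)  # exact Binomial(60000, 1/50000)
```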

The Poisson approximation is accurate to over 5 decimal places! Because the Poisson distribution only requires specifying the overall rate \(\mu\), rather than \(n\) and \(p\), it is often more convenient than the binomial distribution.

In Example 12.7, the clicks of the Geiger counter happen at random times. The times at which these clicks occur constitute a random process called the Poisson process because the number of clicks over any time interval is a Poisson random variable. We will revisit Poisson processes in Section 18.4.