\[
\def\X{\textcolor{orange}{X}}
\def\Y{\textcolor{blue}{Y}}
\def\RR{\textcolor{red}{R}}
\def\S{\textcolor{gray}{S}}
\]
In the preceding chapters, we calculated the probability of different events. We described each event as a set of outcomes. It is usually easier to assign a numeric value to each outcome and describe an event in terms of that numeric value. A random variable \(X(\omega)\), often abbreviated \(X\), is simply a function that assigns a numeric value to each possible outcome \(\omega\).
Discrete Random Variables
Definition 8.1 (Random variable) Let \(\Omega\) be a sample space of an experiment. Then, a random variable \(X\) is a function from the sample space \(\Omega\) to the real numbers \(\mathbb{R}\). In other words, for each outcome \(\omega \in \Omega\), \(X(\omega)\) is a real number.
Example 8.1 (Numbers of heads and tails) Suppose a fair coin is tossed three times. The sample space \(\Omega\) consists of the \(2^3 = 8\) equally likely sequences of heads and tails. We can define different random variables on this sample space, depending on what quantity we are interested in. For example,
- \(\X = \text{the number of heads}\)
- \(\Y = \text{the number of tails}\)
are both random variables, and they are illustrated in Figure 8.1.
That is, the random variable \(\X\) is defined as: \[
\X(\omega) = \begin{cases}
0 & \omega \in \{ \text{TTT} \} \\
1 & \omega \in \{ \text{HTT}, \text{THT}, \text{TTH} \} \\
2 & \omega \in \{ \text{HHT}, \text{THH}, \text{HTH} \} \\
3 & \omega \in \{ \text{HHH} \} \\
\end{cases}, \]
while the random variable \(\Y\) is defined as: \[
\Y(\omega) = \begin{cases}
0 & \omega \in \{ \text{HHH} \} \\
1 & \omega \in \{ \text{HHT}, \text{THH}, \text{HTH} \} \\
2 & \omega \in \{ \text{HTT}, \text{THT}, \text{TTH} \} \\
3 & \omega \in \{ \text{TTT} \} \\
\end{cases}.
\]
It is usually easier to express events of interest in terms of the random variables. For example, we can express the event \(\{ \text{more heads than tails} \}\) in a number of ways, such as:
- \(\{ \X \geq 2 \}\) (which is shorthand for \(\{ \omega: \X(\omega) \geq 2 \}\))
- \(\{ \Y \leq 1 \}\)
- \(\{ \X > \Y \}\)
So instead of writing \(P(\text{more heads than tails})\), we could instead write \(P(\X \geq 2)\) or \(P(\X > \Y)\).
In general, there are many random variables that could be defined on any sample space, and the “right” random variable is the one that helps solve the problem.
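To make the definition concrete, here is a minimal Python sketch (not part of the text's examples) that enumerates the eight outcomes, defines \(\X\) and \(\Y\) as ordinary functions of an outcome, and computes the probabilities above by counting. The helper names `X`, `Y`, and `prob` are just for illustration.

```python
from itertools import product
from fractions import Fraction

# All 8 equally likely outcomes of three tosses, e.g. "HHT".
outcomes = ["".join(seq) for seq in product("HT", repeat=3)]

def X(omega):
    """Number of heads in the outcome omega."""
    return omega.count("H")

def Y(omega):
    """Number of tails in the outcome omega."""
    return omega.count("T")

def prob(event):
    """Probability of an event, given as a predicate on outcomes."""
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))

print(prob(lambda w: X(w) >= 2))    # P(X >= 2) = 1/2
print(prob(lambda w: X(w) > Y(w)))  # P(X > Y)  = 1/2
```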
Example 8.2 (Payouts in roulette) In Example 1.4, we introduced the casino game roulette. We discussed various bets in roulette, but we did not discuss their payouts. Since the payout depends on the random outcome of the roulette wheel, it is a random variable.
Suppose we place a $1 bet on our favorite number, 23. A bet on a single number pays 35-to-1. That is, if the ball lands in the 23 pocket, then we win $35; if it lands in any other pocket, then we lose the $1 wagered.
We might be interested in the random variable \(\S\), the payout from this bet. Note that \[
\S(\omega) = \begin{cases} 35 & \omega = 23 \\ -1 & \omega \neq 23 \end{cases}
\tag{8.1}\]
The probability of winning a bet on a single number is only \[ P(\S > 0) = \frac{1}{38}. \tag{8.2}\]
Alternatively, we could have placed a $1 bet on red. However, this bet only pays 1-to-1. That is, if the ball lands in any one of the 18 red pockets, we win $1; otherwise, we lose the $1 we wagered.
If \(\RR\) is the random variable representing the payout from a $1 bet on red, then \[
\RR(\omega) = \begin{cases} 1, & \omega \in \{ 1, 3, 5, 7, 9, 12, 14, 16, 18, 19, 21, 23, 25, 27, 30, 32, 34, 36\} \\ -1, & \omega \in \{ 0, 00, 2, 4, 6, 8, 10, 11, 13, 15, 17, 20, 22, 24, 26, 28, 29, 31, 33, 35 \}\end{cases}
\tag{8.3}\] We see that the probability of winning with a bet on red is \[P(\RR > 0) = \frac{18}{38}. \tag{8.4}\] There are more chances to win, but we win less when we do win.
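As a quick sanity check of Equation 8.1 through Equation 8.4, here is a hypothetical sketch that models the 38 pockets of an American wheel and defines the payouts \(\S\) and \(\RR\) as functions of the pocket. The names `S`, `R`, and `prob` are ours, not a standard API.

```python
from fractions import Fraction

# The 38 pockets of an American wheel: 0, 00, and 1 through 36.
pockets = ["0", "00"] + [str(i) for i in range(1, 37)]
reds = {"1", "3", "5", "7", "9", "12", "14", "16", "18",
        "19", "21", "23", "25", "27", "30", "32", "34", "36"}

def S(omega):
    """Payout of a $1 bet on 23 (pays 35-to-1)."""
    return 35 if omega == "23" else -1

def R(omega):
    """Payout of a $1 bet on red (pays 1-to-1)."""
    return 1 if omega in reds else -1

def prob(event):
    """Probability of an event, assuming all 38 pockets are equally likely."""
    return Fraction(sum(1 for w in pockets if event(w)), len(pockets))

print(prob(lambda w: S(w) > 0))  # 1/38
print(prob(lambda w: R(w) > 0))  # 18/38, printed in lowest terms as 9/19
```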
All the random variables we have encountered so far have taken only a limited set of values. For example, \(\X\) and \(\Y\) in Example 8.1 only assumed the values \(\{ 0, 1, 2, 3 \}\), while \(\S\) in Example 8.2 only assumed the values \(\{ -1, 35 \}\). These are all examples of discrete random variables. Later, in Chapter 18, we will encounter another type of random variable, called a continuous random variable.
Definition 8.2 (Discrete random variable) A random variable \(X\) is said to be discrete if there is a finite or countable set of values \(x_1, x_2, \dots\) such that \(\displaystyle \sum_i P(X = x_i) = 1\).
Probability Mass Function
Discrete random variables are described by their probability mass function (or PMF, for short). The PMF specifies the probability of each possible value of the random variable.
Definition 8.3 (PMF of a discrete random variable) The probability mass function (or PMF) of a discrete random variable \(X\) is the function \(f_X\) defined as \(f_X(x) = P(X = x)\).
Note that the values of \(f_X(x)\) are necessarily between \(0\) and \(1\) because they represent probabilities.
The PMF describes how probability is distributed among the possible values of the random variable. We refer to this informally as its “distribution”.
Example 8.3 (Distributions of heads and tails) In Example 8.1, we defined two random variables:
- \(\X\), the number of heads in three tosses of a fair coin, and
- \(\Y\), the number of tails in three tosses of a fair coin.
Because the \(8\) outcomes in the sample space are equally likely, we can calculate the PMF by simply counting the number of outcomes corresponding to each value and dividing by \(8\).
For example, we can calculate the PMF of \(\X\) as follows:
- \(f_{\X}(0) = P(\X = 0) = \frac{1}{8}\)
- \(f_{\X}(1) = P(\X = 1) = \frac{3}{8}\)
- \(f_{\X}(2) = P(\X = 2) = \frac{3}{8}\)
- \(f_{\X}(3) = P(\X = 3) = \frac{1}{8}\)
It is common to lay these probabilities out in a table.
| \(x\) | \(0\) | \(1\) | \(2\) | \(3\) |
|---|---|---|---|---|
| \(f_{\X}(x)\) | \(\frac{1}{8}\) | \(\frac{3}{8}\) | \(\frac{3}{8}\) | \(\frac{1}{8}\) |
The PMF of \(\X\) is graphed in Figure 8.2.
What about the PMF of \(\Y\)? Verify for yourself that it is the same!
| \(y\) | \(0\) | \(1\) | \(2\) | \(3\) |
|---|---|---|---|---|
| \(f_{\Y}(y)\) | \(\frac{1}{8}\) | \(\frac{3}{8}\) | \(\frac{3}{8}\) | \(\frac{1}{8}\) |
Even though \(\X\) and \(\Y\) have the same PMF, they are not the same random variable. In fact, \(\X = \Y\) would imply that the number of heads is always equal to the number of tails. But this is impossible when a coin is tossed 3 times! There is no outcome \(\omega\) for which \(\X(\omega) = \Y(\omega)\), so \[ P(\X = \Y) = 0. \]
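Here is a short sketch, under the same equally-likely assumption, that tabulates both PMFs by counting outcomes. It confirms that \(\X\) and \(\Y\) have the same distribution even though they never take the same value on any outcome.

```python
from itertools import product
from fractions import Fraction
from collections import Counter

outcomes = ["".join(seq) for seq in product("HT", repeat=3)]
n = len(outcomes)

# PMFs obtained by counting the outcomes that give each value.
pmf_X = {x: Fraction(c, n) for x, c in Counter(w.count("H") for w in outcomes).items()}
pmf_Y = {y: Fraction(c, n) for y, c in Counter(w.count("T") for w in outcomes).items()}

print(pmf_X == pmf_Y)                                       # True: same distribution
print(sum(w.count("H") == w.count("T") for w in outcomes))  # 0: X(w) never equals Y(w)
```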
Example 8.4 (Distributions of roulette payouts) In Example 8.2, we introduced two random variables, \(\S\) and \(\RR\), which represented one’s payouts from $1 bets on the number 23 and red, respectively.
The PMF of \(\S\) is
| \(x\) | \(-1\) | \(35\) |
|---|---|---|
| \(f_{\S}(x)\) | \(\frac{37}{38}\) | \(\frac{1}{38}\) |
and the PMF of \(\RR\) is
| \(x\) | \(-1\) | \(1\) |
|---|---|---|
| \(f_{\RR}(x)\) | \(\frac{20}{38}\) | \(\frac{18}{38}\) |
Notice that the probabilities in any PMF always sum to \(1\). This is because the events of the form \(\{ X = x_i \}\) form a partition of the sample space.
In order for a function \(f(x)\) to be a valid PMF, it must satisfy two properties:
- \(f(x) \geq 0\) for any \(x\), and
- There exist values \(x_1, x_2, \dots\) such that \(\displaystyle \sum_i f(x_i) = 1\).
In fact, any function \(f\) satisfying these two properties is the PMF of some random variable.
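These two conditions are easy to check mechanically. Below is a minimal helper (an illustration only, not a fixed API) for a PMF stored as a dictionary mapping values to probabilities.

```python
from fractions import Fraction

def is_valid_pmf(f):
    """Check that all probabilities are nonnegative and that they sum to 1."""
    return all(p >= 0 for p in f.values()) and sum(f.values()) == 1

pmf_S = {35: Fraction(1, 38), -1: Fraction(37, 38)}
print(is_valid_pmf(pmf_S))  # True
```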
Cumulative Distribution Function
The PMF of a discrete random variable specifies the probability that the random variable is equal to a given value.
There is a related function, the cumulative distribution function (or CDF), which specifies the probability that the random variable is less than or equal to a given value. We can calculate the CDF by summing the PMF over the appropriate values.
Definition 8.4 (CDF of a discrete random variable) The cumulative distribution function (or CDF) of a random variable \(X\) with PMF \(f\) is the function \(F\) defined as \[
F(x) = P(X \leq x) = \sum_{x_i \leq x} f(x_i).
\]
Example 8.5 (CDF of the number of heads) Using the PMF of \(\X\) that we derived in Example 8.3, we can calculate the CDF of \(\X\) to be: \[
\begin{aligned}
F_{\X}(x) &= \begin{cases} 0 & x < 0 \\ \frac{1}{8} & 0 \leq x < 1 \\ \frac{1}{8} + \frac{3}{8} & 1 \leq x < 2 \\ \frac{1}{8} + \frac{3}{8} + \frac{3}{8} & 2 \leq x < 3 \\ \frac{1}{8} + \frac{3}{8} + \frac{3}{8} + \frac{1}{8} & x \geq 3 \end{cases} \\
&= \begin{cases} 0 & x < 0 \\ .125 & 0 \leq x < 1 \\ .5 & 1 \leq x < 2 \\ .875 & 2 \leq x < 3 \\ 1 & x \geq 3 \end{cases}.
\end{aligned}
\tag{8.5}\]
This CDF is graphed in Figure 8.3.
We can use the CDF to quickly evaluate probabilities. For example, we can obtain the probability of getting more heads than tails, \(P(\X \geq 2)\), from Equation 8.5 with almost no calculation: \[ P(\X \geq 2) = 1 - P(\X \leq 1) = 1 - F_{\X}(1) = 1 - \frac{1}{2} = \frac{1}{2}. \]
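Definition 8.4 translates directly into code: sum the PMF over all values not exceeding \(x\). A sketch, assuming the dictionary representation of the PMF used above:

```python
from fractions import Fraction

pmf_X = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def cdf(f, x):
    """F(x) = P(X <= x), obtained by summing the PMF over values <= x."""
    return sum(p for xi, p in f.items() if xi <= x)

print(cdf(pmf_X, 1))      # F(1) = 1/2
print(1 - cdf(pmf_X, 1))  # P(X >= 2) = 1/2
```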
Example 8.5 suggests several properties of a CDF:
- It is non-decreasing.
- As \(x\) approaches \(-\infty\), \(F(x)\) approaches \(0\).
- As \(x\) approaches \(\infty\), \(F(x)\) approaches \(1\).
- It is continuous from the right, and it has a limit from the left. (Mathematicians call such a function “càdlàg”, an acronym for the French phrase “continue à droite, limite à gauche.”)
We can also recover the PMF of a discrete random variable if we know its CDF.
Example 8.6 (Recovering the PMF from the CDF) Suppose \(\X\) is a random variable with the CDF \(F(x)\) given in Equation 8.5.
Notice that the CDF jumps at the values \(x=0, 1, 2, 3\). For example, the size of the jump at \(x=2\) is the difference between \[ F_{\X}(1.9) = P(\X \leq 1.9) = f(0) + f(1) \] and \[ F_{\X}(2.1) = P(\X \leq 2.1) = f(0) + f(1) + f(2), \] which is \(f(2)\), the PMF evaluated at \(x = 2\).
In other words, the size of each jump at \(x\) is precisely \(f(x)\), the PMF evaluated at \(x\). So we can determine the value of \(f(x)\) by calculating the size of each jump at \(x\).
- At \(x = 0\), the CDF jumps from \(0\) to \(1/8\), so \(f(0) = 1/8\).
- At \(x = 1\), the CDF jumps from \(1/8\) to \(1/2\), so \(f(1) = 1/2 - 1/8 = 3/8\).
- At \(x = 2\), the CDF jumps from \(1/2\) to \(7/8\), so \(f(2) = 7/8 - 1/2 = 3/8\).
- At \(x = 3\), the CDF jumps from \(7/8\) to \(1\), so \(f(3) = 1/8\).
This matches the PMF of \(\X\) that we derived in Example 8.3.
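The jump calculation can also be carried out numerically. The sketch below recovers the PMF by comparing the CDF just to the left of each candidate point with the CDF at the point itself; the half-unit offset is arbitrary, and any offset smaller than the gap between values would do.

```python
from fractions import Fraction

pmf_X = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def cdf(f, x):
    """F(x) = P(X <= x)."""
    return sum(p for xi, p in f.items() if xi <= x)

# The size of the jump at each x recovers f(x).
for x in [0, 1, 2, 3]:
    jump = cdf(pmf_X, x) - cdf(pmf_X, x - Fraction(1, 2))
    print(x, jump)   # 0 1/8, 1 3/8, 2 3/8, 3 1/8
```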
Bernoulli and Binomial Distributions
In this section, we introduce two types of random variables that are so common that they have been given names. Both can be described in terms of coin tossing.
Suppose we have a coin that has a probability \(p\) of coming up heads. (Note that this coin is not necessarily fair.)
- The number of heads that come up in a single toss of this coin is a Bernoulli random variable.
- The number of heads that come up in \(n\) tosses of this coin is a binomial random variable.
If \(X\) is a Bernoulli random variable, then clearly \(X\) is either \(0\) or \(1\), and \(P(X = 1) = p\).
Definition 8.5 (Bernoulli random variable) If \(X\) is a random variable with the PMF \[
f(x) = \begin{cases} p, & x = 1 \\ 1-p, & x = 0 \end{cases}
\] for some \(0 \leq p \leq 1\), \(X\) is said to be a Bernoulli random variable with parameter \(p\). We write this as \(X \sim \text{Bernoulli}(p)\).
If \(X\) is a binomial random variable, the possible values of \(X\) are \(0, 1, \dots, n\). To determine the probability of each value, observe that the event \(\left\{ X = x \right\}\) means that there are \(x\) heads in the \(n\) tosses. For example, if \(n=4\) and \(x=2\), then \[
\{ X = 2 \} = \{ \text{HHTT}, \text{HTHT}, \text{HTTH}, \text{THHT}, \text{THTH}, \text{TTHH} \}.
\]
We can use Proposition 6.1 to calculate that the probability of any one of these sequences of \(x\) heads (and \(n-x\) tails) is \[
p^x (1 - p)^{n-x},
\] and the number of sequences is \[
\binom{n}{x},
\] so by Proposition 4.4, \[
P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}.
\]
Definition 8.6 (Binomial distribution) If \(X\) is a random variable with PMF \[
f(x) = \binom{n}{x} p^x (1-p)^{n-x};\qquad x = 0, 1, \dots, n
\tag{8.6}\] for some \(n\) and \(0 < p < 1\), \(X\) is said to be a binomial random variable with parameters \(n\) and \(p\). We write this as \(X \sim \text{Binomial}(n,p)\).
Notice that the binomial distribution reduces to the Bernoulli distribution when \(n=1\).
Example 8.7 (Coin tosses as binomial) Recall Example 8.3, where a fair coin was tossed three times and \(\X\) was the number of heads.
Since each toss is independent, with probability \(p = 1/2\) of landing heads, we see that \(\X\) matches the description for a \(\text{Binomial}(n=3, p=1/2)\) random variable.
Therefore, its PMF can be written as the formula: \[
\begin{aligned}
f(x) &= {3 \choose x} (1/2)^x (1 - 1/2)^{3-x}; & x = 0, 1, 2, 3.
\end{aligned}
\tag{8.7}\]
By plugging in \(x=0, 1, 2, 3\) into Equation 8.7, we can verify that this formula yields the same probabilities that we obtained in Example 8.3.
- \(\displaystyle f(0) = {3 \choose 0} (1/2)^0 (1 - 1/2)^{3-0} = \frac{1}{8}\)
- \(\displaystyle f(1) = {3 \choose 1} (1/2)^1 (1 - 1/2)^{3-1} = \frac{3}{8}\)
- \(\displaystyle f(2) = {3 \choose 2} (1/2)^2 (1 - 1/2)^{3-2} = \frac{3}{8}\)
- \(\displaystyle f(3) = {3 \choose 3} (1/2)^3 (1 - 1/2)^{3-3} = \frac{1}{8}\)
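Equation 8.7 is also easy to evaluate directly; the following sketch uses Python's `math.comb` for the binomial coefficient and exact fractions for the probabilities (the helper name `binom_pmf` is ours).

```python
from math import comb
from fractions import Fraction

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

for x in range(4):
    print(x, binom_pmf(x, 3, Fraction(1, 2)))  # 0 1/8, 1 3/8, 2 3/8, 3 1/8
```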
Even though we described the Bernoulli and binomial distributions in terms of coin tosses, these distributions can be applied in many settings. In order to apply them, we need to translate the particular situation into coin tosses. The following examples illustrate this process.
Example 8.8 (Number of albino children) In Example 1.6, we saw that the probability that two carrier parents will have an albino child is \(1/4\). Now suppose that the same parents have \(5\) children (with no twins). How many of these children will be albino?
The number of children who are albino is a \(\textrm{Binomial}(n= 5, p= 1/4)\) random variable \(X\). To see why, make the analogy with coin tossing:
- A coin is tossed \(n=5\) times, once for each child.
- A “heads” means that the child is albino. This probability is \(p=1/4\).
- Because there are no twins, the children inherit the OCA2 gene independently. So this situation really is like tossing a coin repeatedly.
Now that we have established that \(X\) is binomial, we can immediately write down its PMF:
\[ f(x) = \binom{5}{x} \left(\frac{1}{4}\right)^x \left(1 - \frac{1}{4}\right)^{5-x}; \qquad x=0,1,2,3,4,5. \]
We can plug values into this PMF to obtain the probabilities:
| \(x\) | \(0\) | \(1\) | \(2\) | \(3\) | \(4\) | \(5\) |
|---|---|---|---|---|---|---|
| \(f(x)\) | \(\frac{243}{1024}\) | \(\frac{405}{1024}\) | \(\frac{270}{1024}\) | \(\frac{90}{1024}\) | \(\frac{15}{1024}\) | \(\frac{1}{1024}\) |
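These probabilities can be reproduced in a few lines, using the same kind of calculation as in the previous sketch:

```python
from math import comb
from fractions import Fraction

p = Fraction(1, 4)
pmf = {x: comb(5, x) * p**x * (1 - p)**(5 - x) for x in range(6)}
for x in range(6):
    print(x, pmf[x])  # 0 243/1024, 1 405/1024, 2 270/1024, 3 90/1024, 4 15/1024, 5 1/1024
```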
The next example is another historical problem that engaged one of the greatest mathematicians of all time.
Example 8.9 (The Newton-Pepys problem) In 1693, Isaac Newton and Samuel Pepys, an English writer most famous today for his diary, corresponded about the following problem. Pepys wrote Newton a letter asking which of the following three events has the greatest chance of occurring:
- A. Six fair dice are tossed independently and at least one six appears.
- B. Twelve fair dice are tossed independently and at least two sixes appear.
- C. Eighteen fair dice are tossed independently and at least three sixes appear.
Pepys thought that option C was the most likely and asked Newton to verify this. We can compute the three probabilities ourselves using the binomial distribution.
In scenario A, let \(X\) denote the number of sixes in six tosses. Then, \(X \sim \textrm{Binomial}(n= 6, p= \frac{1}{6})\). The probability can now be obtained by plugging the appropriate values into the binomial PMF: \[
P(X \geq 1) = 1 - P(X = 0) = 1 - \binom{6}{0} \left( \frac{1}{6} \right)^0 \left( \frac{5}{6} \right)^6 \approx 0.6651.
\] In scenario B, let \(Y\) denote the number of sixes in twelve tosses. Then, \(Y \sim \textrm{Binomial}(n= 12, p= \frac{1}{6})\) and \[\begin{align*}
P(Y \geq 2) &= 1 - P(Y = 0) - P(Y = 1) \\
&= 1 - \binom{12}{0} \left( \frac{1}{6} \right)^0 \left( \frac{5}{6} \right)^{12} - \binom{12}{1} \left( \frac{1}{6} \right)^1 \left( \frac{5}{6} \right)^{11} \\
&\approx 0.6187.
\end{align*}\] In scenario C, let \(Z\) denote the number of sixes in eighteen tosses. Then, \(Z \sim \textrm{Binomial}(n= 18, p= \frac{1}{6})\) and \[\begin{align*}
P(Z \geq 3) &= 1 - P(Z = 0) - P(Z = 1) - P(Z = 2) \\
&= 1 - \binom{18}{0} \left( \frac{1}{6} \right)^0 \left( \frac{5}{6} \right)^{18} - \binom{18}{1} \left( \frac{1}{6} \right)^1 \left( \frac{5}{6} \right)^{17} - \binom{18}{2} \left( \frac{1}{6} \right)^2 \left( \frac{5}{6} \right)^{16} \\
&\approx 0.5973.
\end{align*}\]
Newton obtained these probabilities and correctly concluded that scenario A is the most likely.
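A short sketch that reproduces Newton's three probabilities by summing the binomial PMF over the complementary values (the helper names are ours, for illustration only):

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def prob_at_least(k, n, p=1/6):
    """P(at least k sixes in n dice) = 1 - P(fewer than k sixes)."""
    return 1 - sum(binom_pmf(x, n, p) for x in range(k))

print(prob_at_least(1, 6))   # A: ~0.6651
print(prob_at_least(2, 12))  # B: ~0.6187
print(prob_at_least(3, 18))  # C: ~0.5973
```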
Geometric Distribution
In this section, we introduce another type of random variable that is so common that it has a name. It can also be described in terms of coin tossing.
Once again, suppose we have a coin that has a probability \(p\) of coming up heads. However, instead of tossing the coin a fixed number of times, we now toss the coin until heads comes up. The random variable \(X\) is the number of tosses.
Unlike the random variables we have considered so far, there is no upper bound to the possible values of \(X\) because tails could hypothetically keep coming up. However, the idea is the same; to determine the distribution of \(X\), we need to determine \[ f_X(x) = P(X = x); \qquad x = 1, 2, 3, \dots \] as a function of \(x\).
The event \(\left\{ X = x \right\}\) means that the first \(x-1\) tosses were tails and the \(x\)th toss was heads. Because the tosses are independent, the probability of this is \[ P(X = x) = (1 - p)^{x-1} p. \]
This motivates the following definition.
Definition 8.7 (Geometric distribution) If \(X\) is a random variable with the PMF \[
f_X(x) = (1-p)^{x-1} p, \qquad x = 1, 2, \dots,
\tag{8.8}\] for some \(0 < p \leq 1\), \(X\) is said to be a geometric random variable with parameter \(p\). We write this as \(X \sim \text{Geometric}(p)\).
To apply the geometric distribution to problems other than coin tossing, it helps to make the analogy with coin tossing.
Example 8.10 (Number of rolls in craps) Suppose that in a round of craps (Example 1.9), the point has been set at five. What is the probability that the round lasts longer than 4 rolls, not counting the come-out roll?
The number of (additional) rolls until the round ends is a \(\text{Geometric}(p=\frac{10}{36})\) random variable \(X\). To see why, make the analogy with coin tossing:
- A coin is tossed repeatedly, representing the dice rolls.
- A “heads” means that either a five or a seven was rolled. This probability is \(p=\frac{10}{36}\).
- The dice rolls are independent, just like the coin tosses.
Therefore, the PMF of \(X\) is \[
f_X(x) = \Big(1 - \frac{10}{36}\Big)^{x-1} \frac{10}{36}; \qquad x=1, 2, 3, \dots.
\]
We can use this PMF to calculate the probability that the round lasts longer than 4 rolls: \[
\begin{aligned}
P(X > 4) &= \sum_{x=5}^\infty f_X(x) \\
&= \sum_{x=5}^\infty \Big(1 - \frac{10}{36}\Big)^{x-1} \frac{10}{36} \\
&= \frac{10}{36} \Big(1 - \frac{10}{36}\Big)^{4} \sum_{x'=0}^\infty\Big(1 - \frac{10}{36}\Big)^{x'} & \text{(pull out constants and reindex sum)} \\
&= \frac{10}{36} \Big(1 - \frac{10}{36}\Big)^{4} \frac{1}{1 - \Big(1 - \frac{10}{36}\Big)} & \text{(sum of geometric series)}\\
&= \Big(1 - \frac{10}{36}\Big)^{4}.
\end{aligned}
\]
Notice that we used the formula for the sum of a geometric series \[
\sum_{k=0}^\infty r^k = \frac{1}{1 - r}; \qquad |r| < 1.
\] In fact, this is why this distribution is known as the geometric distribution.
In retrospect, we did not need the geometric distribution to answer this question. In order for the round to last longer than \(4\) rolls, we must roll something other than a five or a seven \(4\) times in a row. By independence, the probability of this is \[
\Big(1 - \frac{10}{36}\Big)^{4} \approx .272.
\]
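A sketch that checks this calculation both ways: summing the geometric PMF numerically (truncating the infinite series at a large cutoff) and using the direct argument \(\left(1 - \tfrac{10}{36}\right)^4\).

```python
p = 10 / 36  # probability that a roll ends the round (a five or a seven)

def geom_pmf(x, p):
    """P(X = x) for X ~ Geometric(p), x = 1, 2, 3, ..."""
    return (1 - p) ** (x - 1) * p

# P(X > 4): sum the tail of the PMF (truncated, for a numerical approximation)
# and compare with the closed-form answer (1 - p)^4.
tail_sum = sum(geom_pmf(x, p) for x in range(5, 500))
print(tail_sum)      # ~0.272
print((1 - p) ** 4)  # ~0.272
```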
Exercises
When a question asks for the distribution of a random variable, any one of the following is sufficient:
- the PMF (either as a table or a formula)
- the CDF (either as a table or a formula)
- the name of the distribution (e.g., Bernoulli, binomial, geometric), along with the values of all parameters
Exercise 8.1 (Secret Santa random variable) Recall the Secret Santa example (Example 1.7). If \(X\) represents the number of friends who draw their own name,
- Determine the distribution of \(X\). Why is it not \(\textrm{Binomial}(n= 4, p= 1/4)\)?
- Verify that \(P(X > 0)\) agrees with the answer we obtained in Example 4.7.
Exercise 8.2 (Field bet in craps) One of the most popular side bets in craps (Figure 1.8) is the field bet. Unlike the pass line bet, which is resolved over multiple rolls, the field bet is resolved in a single roll. The field bet wins if the next roll is a 2, 3, 4, 9, 10, 11, or 12. Typically, 2 and 12 pay double (2-to-1), while the other numbers pay 1-to-1. What is the PMF of the payout if you make a $1 field bet?
Exercise 8.3 (Wine competition) In a wine competition, there are five wines from Napa and five wines from Bordeaux. Each judge is asked to rank the wines from 1 (best) to 10 (worst). Ties are not allowed. Suppose that a judge cannot tell the difference between the ten wines so that all \(10!\) possible rankings are equally likely. Let \(R\) be the rank of the best Napa wine on this judge’s scorecard. Calculate the PMF of \(R\), and check that it is a valid PMF.
Exercise 8.4 (Random variables in roulette) Xavier places a $5 bet on red on each of 3 spins of a roulette wheel.
- Let \(X\) be the number of bets he wins. Find the distribution of \(X\).
- Let \(W\) be his payout over the 3 spins (which may be negative). Find the distribution of \(W\).
Exercise 8.5 (Trying keys at random) A friend asked you to house-sit while she is on vacation. She gave you a keychain with \(n\) keys, but she forgot to tell you which key was the one to her apartment. You decide to try keys at random until you find the right one.
- Let \(X\) be the number of keys that you need to try if you discard keys that do not work. What are the PMF and CDF of \(X\)?
- Let \(Y\) be the number of keys that you need to try if you do not discard keys that do not work. What are the PMF and CDF of \(Y\)?
Exercise 8.6 (Properties of the geometric distribution)
- Show that Equation 8.8 is a valid PMF.
- Derive a simple formula (not involving sums) for the CDF of a geometric random variable. Note that your formula must be valid for all values of \(x\), not just integer values. (Hint: It might be helpful to express your formula in terms of \(\lfloor x \rfloor\), which is defined to be the largest integer \(\leq x\).)