Xavier and Yvette are playing roulette together. They both place $1 on red (for the red/black bet) on 3 spins of the roulette wheel before Xavier has to leave. After Xavier leaves, Yvette continues the bet on 2 more spins. Let \(X\) be the number of bets Xavier wins and \(Y\) be the number of bets Yvette wins.
We know that \(X \sim \text{Binomial}(n=3, p=18/38)\) and \(Y \sim \text{Binomial}(n=5, p=18/38)\). But how are they related to each other? To answer this question, we need to describe the joint distribution of \(X\) and \(Y\).
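Before deriving the joint distribution exactly, it can help to see it empirically. Here is a minimal simulation sketch in Python (not part of the original example); the only assumption is the American-roulette win probability \(p = 18/38\). Because Yvette's total is Xavier's wins plus her two solo spins, the simulation makes the dependence between \(X\) and \(Y\) visible.

```python
import random
from collections import Counter

random.seed(42)
p = 18 / 38          # probability that a red/black bet wins
trials = 100_000

counts = Counter()
for _ in range(trials):
    shared = sum(random.random() < p for _ in range(3))  # the 3 spins both bet on
    solo = sum(random.random() < p for _ in range(2))    # Yvette's 2 extra spins
    counts[(shared, shared + solo)] += 1                 # (X, Y) for this trial

for (x, y), n in sorted(counts.items()):
    print(f"P(X={x}, Y={y}) ≈ {n / trials:.4f}")
```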
Joint PMF
Definition 13.1 (Joint PMF of \(X\) and \(Y\)) The joint distribution of two random variables \(X\) and \(Y\) is described by the joint PMF \[
f_{X,Y}(x,y) = P(X = x, Y = y).
\]
The joint PMF is typically written as a table.
Example 13.1 (Joint PMF of Xavier and Yvette’s wins) Xavier can win any number of games between 0 and 3, while Yvette can win any number of games between 0 and 5. We can list the possibilities and their probabilities in a table.
| | \(x = 0\) | \(x = 1\) | \(x = 2\) | \(x = 3\) |
|---|---|---|---|---|
| \(y = 0\) | \(f(0,0)\) | \(f(1,0)\) | \(f(2,0)\) | \(f(3,0)\) |
| \(y = 1\) | \(f(0,1)\) | \(f(1,1)\) | \(f(2,1)\) | \(f(3,1)\) |
| \(y = 2\) | \(f(0,2)\) | \(f(1,2)\) | \(f(2,2)\) | \(f(3,2)\) |
| \(y = 3\) | \(f(0,3)\) | \(f(1,3)\) | \(f(2,3)\) | \(f(3,3)\) |
| \(y = 4\) | \(f(0,4)\) | \(f(1,4)\) | \(f(2,4)\) | \(f(3,4)\) |
| \(y = 5\) | \(f(0,5)\) | \(f(1,5)\) | \(f(2,5)\) | \(f(3,5)\) |
To fill out this table, we first observe that many of these probabilities are 0.
- Whenever Xavier wins, Yvette also wins, so Yvette must win at least as many bets as Xavier. That is, \(f(x, y) = 0\) if \(x > y\).
- Since Yvette plays only two more bets than Xavier, she can have at most two more wins. That is, \(f(x, y) = 0\) if \(y > x + 2\).
| | \(x = 0\) | \(x = 1\) | \(x = 2\) | \(x = 3\) |
|---|---|---|---|---|
| \(y = 0\) | \(f(0,0)\) | \(0\) | \(0\) | \(0\) |
| \(y = 1\) | \(f(0,1)\) | \(f(1,1)\) | \(0\) | \(0\) |
| \(y = 2\) | \(f(0,2)\) | \(f(1,2)\) | \(f(2,2)\) | \(0\) |
| \(y = 3\) | \(0\) | \(f(1,3)\) | \(f(2,3)\) | \(f(3,3)\) |
| \(y = 4\) | \(0\) | \(0\) | \(f(2,4)\) | \(f(3,4)\) |
| \(y = 5\) | \(0\) | \(0\) | \(0\) | \(f(3,5)\) |
The remaining probabilities are non-zero and must be calculated. Let’s do \(f(1,2) = P(X = 1, Y = 2)\) as an example.
The event \(\{ X = 1, Y = 2 \}\) means that Xavier and Yvette win 1 of the 3 bets they play together, and Yvette wins 1 of the 2 bets she plays by herself. Hence, using the fact that the first three spins and the last two spins are independent, \[\begin{align*}
f(1,2) &= P(\text{win 1 of first 3 bets, win 1 of last 2 bets}) \\
&= P(\text{win 1 of first 3 bets}) P(\text{win 1 of last 2 bets}) \\
&= \binom{3}{1} \left( \frac{18}{38} \right)^1 \left( \frac{20}{38} \right)^2 \cdot \binom{2}{1} \left( \frac{18}{38} \right)^1 \left( \frac{20}{38} \right)^1 \\
&\approx 0.1963.
\end{align*}\]
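As a quick sanity check, this arithmetic is easy to reproduce in Python using only the standard library (a sketch, not part of the original example):

```python
from math import comb

p = 18 / 38  # probability of winning a single red/black bet

# P(win 1 of first 3 bets) * P(win 1 of last 2 bets)
f_1_2 = comb(3, 1) * p**1 * (1 - p)**2 * comb(2, 1) * p**1 * (1 - p)**1
print(round(f_1_2, 4))  # 0.1963
```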
Following the same logic, we can develop a general formula for the joint PMF when \(0 \leq x \leq 3\) and \(x \leq y \leq x + 2\):
\[\begin{align*}
f(x, y) &= P(\text{win $x$ of first 3 bets, win $y - x$ of last 2 bets}) \\
&= P(\text{win $x$ of first 3 bets}) P(\text{win $y - x$ of last 2 bets}) \\
&= \binom{3}{x} \left( \frac{18}{38} \right)^x \left( \frac{20}{38} \right)^{3 - x} \cdot \binom{2}{y - x} \left( \frac{18}{38} \right)^{y - x} \left( \frac{20}{38} \right)^{2 - (y - x)} \\
&= \binom{3}{x} \binom{2}{y - x} \left( \frac{18}{38} \right)^{y} \left( \frac{20}{38} \right)^{5 - y}
\end{align*} \tag{13.1}\]
By plugging in values into the formula above, we obtain the following table for the joint PMF:
| | \(x = 0\) | \(x = 1\) | \(x = 2\) | \(x = 3\) |
|---|---|---|---|---|
| \(y = 0\) | \(.0404\) | \(0\) | \(0\) | \(0\) |
| \(y = 1\) | \(.0727\) | \(.1090\) | \(0\) | \(0\) |
| \(y = 2\) | \(.0327\) | \(.1963\) | \(.0981\) | \(0\) |
| \(y = 3\) | \(0\) | \(.0883\) | \(.1766\) | \(.0294\) |
| \(y = 4\) | \(0\) | \(0\) | \(.0795\) | \(.0530\) |
| \(y = 5\) | \(0\) | \(0\) | \(0\) | \(.0238\) |
As a sanity check, we can make sure the probabilities in the table add up to \(1\) (up to rounding error)!
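Equation 13.1 also makes it easy to carry out this sanity check programmatically. Here is a short Python sketch that tabulates the joint PMF and confirms that it sums to 1 (the guard clause handles the structural zeros):

```python
from math import comb, isclose

p = 18 / 38

def joint_pmf(x, y):
    """Joint PMF from Equation 13.1; zero unless x <= y <= x + 2."""
    if not (0 <= y - x <= 2):
        return 0.0
    return comb(3, x) * comb(2, y - x) * p**y * (1 - p)**(5 - y)

total = sum(joint_pmf(x, y) for x in range(4) for y in range(6))
assert isclose(total, 1.0)  # sums to 1, up to floating-point error
```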
In some situations, the random variables \(X\) and \(Y\) do not provide information about each other. In this case, we say that \(X\) and \(Y\) are independent.
The next definition formalizes independence for random variables. It is based on the definition of independence for events (Proposition 6.1).
Definition 13.2 (Independent Random Variables) Two (discrete) random variables \(X\) and \(Y\) are independent if \[
P(X = x, Y = y) = P(X = x) P(Y = y)
\tag{13.2}\] for all values \(x\) and \(y\).
We can state Equation 13.2 in terms of the joint PMF and the PMFs of \(X\) and \(Y\) individually: \[
f(x,y) = f_X(x) f_Y(y)
\] for all values \(x\) and \(y\).
The next example illustrates an application of independence to obtain a surprising result.
Example 13.2 (Coin Competition) Wei tosses a fair coin 3 times. Van tosses a different fair coin 4 times. The winner is the person who gets more heads. What is the probability that Van wins?
Because Van tosses one more coin than Wei, it seems that Van is more likely to win. To formalize this problem, let \(W\) be the number of heads that Wei gets and \(V\) be the number of heads that Van gets. We are interested in \(P(V > W)\).
Since Wei and Van are tossing different coins, \(W\) and \(V\) are independent. Their joint PMF is \[
f(v, w) = f_V(v) \cdot f_W(w).
\] Since \(V \sim \text{Binomial}(n=4, p=1/2)\) and \(W \sim \text{Binomial}(n=3, p=1/2)\), the joint PMF is \[ f(v, w) = \binom{4}{v} \big(\frac{1}{2}\big)^v \big(\frac{1}{2}\big)^{4-v} \cdot \binom{3}{w} \big(\frac{1}{2}\big)^w \big(\frac{1}{2}\big)^{3-w} = \frac{\binom{4}{v} \binom{3}{w}}{2^7}. \]
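In code, independence means the joint PMF can be assembled directly as a product of the two marginal PMFs. A minimal sketch (not from the original text):

```python
from math import comb

def pmf_V(v):  # V ~ Binomial(n=4, p=1/2)
    return comb(4, v) / 2**4

def pmf_W(w):  # W ~ Binomial(n=3, p=1/2)
    return comb(3, w) / 2**3

def joint(v, w):  # independence: f(v, w) = f_V(v) * f_W(w)
    return pmf_V(v) * pmf_W(w)

print(joint(2, 1))  # C(4,2) * C(3,1) / 2**7 = 18/128 = 0.140625
```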
By substituting different values for \(v\) and \(w\), we can write out the joint PMF in table form:
| | \(v = 0\) | \(v = 1\) | \(v = 2\) | \(v = 3\) | \(v = 4\) |
|---|---|---|---|---|---|
| \(w = 0\) | \(.0078125\) | \(.03125\) | \(.046875\) | \(.03125\) | \(.0078125\) |
| \(w = 1\) | \(.0234375\) | \(.09375\) | \(.140625\) | \(.09375\) | \(.0234375\) |
| \(w = 2\) | \(.0234375\) | \(.09375\) | \(.140625\) | \(.09375\) | \(.0234375\) |
| \(w = 3\) | \(.0078125\) | \(.03125\) | \(.046875\) | \(.03125\) | \(.0078125\) |
You should verify that the sum of these values is \(1\) (up to rounding error).
To calculate \(P(V > W)\), we have to sum the joint PMF over the relevant values. We could express this probability as a double sum,
\[ P(V > W) = \sum_{v=0}^4 \sum_{w=0}^{v-1} f(v, w), \]
but it is easier to simply highlight the relevant entries in the table.
| | \(v = 0\) | \(v = 1\) | \(v = 2\) | \(v = 3\) | \(v = 4\) |
|---|---|---|---|---|---|
| \(w = 0\) | \(.0078125\) | \(\boxed{.03125}\) | \(\boxed{.046875}\) | \(\boxed{.03125}\) | \(\boxed{.0078125}\) |
| \(w = 1\) | \(.0234375\) | \(.09375\) | \(\boxed{.140625}\) | \(\boxed{.09375}\) | \(\boxed{.0234375}\) |
| \(w = 2\) | \(.0234375\) | \(.09375\) | \(.140625\) | \(\boxed{.09375}\) | \(\boxed{.0234375}\) |
| \(w = 3\) | \(.0078125\) | \(.03125\) | \(.046875\) | \(.03125\) | \(\boxed{.0078125}\) |
Therefore, \[\begin{align*}
P(V > W) &= f(1,0) + f(2,0) + f(3, 0) + f(4, 0) \\
& \qquad + f(2, 1) + f(3,1) + f(4, 1) + f(3, 2) + f(4, 2) + f(4, 3) \\
&= .5.
\end{align*}\]
Surprisingly, Van only has a 50% chance of winning. Intuitively, this is because there is also the possibility that Van and Wei tie.
- The probability that they tie is \[ P(V = W) = f(0, 0) + f(1, 1) + f(2, 2) + f(3, 3) \approx .2734.\]
- By the complement rule, the probability that Wei wins is \[ P(W > V) \approx 1 - .5 - .2734 = .2266. \]
So Van is more likely to win than Wei, but because they can also tie, Van only has a 50% chance of winning outright.
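All three probabilities can also be checked with a direct double sum over the joint PMF; here is a short sketch:

```python
from math import comb

def joint(v, w):  # f(v, w) = C(4, v) * C(3, w) / 2**7
    return comb(4, v) * comb(3, w) / 2**7

pairs = [(v, w) for v in range(5) for w in range(4)]
p_van_wins = sum(joint(v, w) for v, w in pairs if v > w)
p_tie      = sum(joint(v, w) for v, w in pairs if v == w)
p_wei_wins = sum(joint(v, w) for v, w in pairs if v < w)
print(p_van_wins, p_tie, p_wei_wins)  # 0.5 0.2734375 0.2265625
```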
Marginal Distribution
If we know the joint PMF \(f(x, y)\) of two random variables \(X\) and \(Y\), can we determine the PMF of \(X\) by itself? The answer is yes! Because the sets \(\{ Y = y\}\) form a partition of the sample space, we have \[
\begin{aligned}
f_X(x) = P(X = x) &= P\left(\bigcup_y \{ X = x, Y = y \} \right) \\
&= \sum_y P(X = x, Y = y) \\
&= \sum_y f(x, y)
\end{aligned}
\]
If we write the joint PMF as a table, the above calculation corresponds to summing each column.
| | \(x = 0\) | \(x = 1\) | \(x = 2\) | \(\dots\) |
|---|---|---|---|---|
| \(y = 0\) | \(f(0,0)\) | \(f(1,0)\) | \(f(2,0)\) | \(\dots\) |
| \(y = 1\) | \(f(0,1)\) | \(f(1,1)\) | \(f(2,1)\) | \(\dots\) |
| \(y = 2\) | \(f(0,2)\) | \(f(1,2)\) | \(f(2,2)\) | \(\dots\) |
| \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
| total | \(f_X(0)\) | \(f_X(1)\) | \(f_X(2)\) | \(\dots\) |
It is natural to write these sums in the margins of the table. For this reason, the PMF of \(X\) is called the marginal PMF in this context.
Definition 13.3 (Marginal PMF) The marginal PMF of \(X\) refers to the PMF of \(X\) when it is calculated from a joint PMF. Specifically, the marginal PMF \(f_X\) can be calculated by summing the joint PMF over all the possible values of \(Y\): \[
f_X(x) = \sum_y f(x,y).
\]
Similarly, the marginal PMF of \(Y\) is calculated by summing the joint PMF over all the possible values of \(X\): \[
f_Y(y) = \sum_x f(x,y).
\]
The next example illustrates how to calculate marginal PMFs from a joint PMF.
Example 13.3 (Marginal distribution of Xavier and Yvette’s roulette wins) In Example 13.1, we determined the joint PMF of \(X\), the number of bets that Xavier wins, and \(Y\), the number of bets that Yvette wins. The joint PMF is specified in the table below.
| | \(x = 0\) | \(x = 1\) | \(x = 2\) | \(x = 3\) |
|---|---|---|---|---|
| \(y = 0\) | \(.0404\) | \(0\) | \(0\) | \(0\) |
| \(y = 1\) | \(.0727\) | \(.1090\) | \(0\) | \(0\) |
| \(y = 2\) | \(.0327\) | \(.1963\) | \(.0981\) | \(0\) |
| \(y = 3\) | \(0\) | \(.0883\) | \(.1766\) | \(.0294\) |
| \(y = 4\) | \(0\) | \(0\) | \(.0795\) | \(.0530\) |
| \(y = 5\) | \(0\) | \(0\) | \(0\) | \(.0238\) |
To calculate the marginal PMF of \(X\), we sum each column. (Normally, this would be written in the margins of the table above, but we write it in a separate table.)
| \(x\) | \(0\) | \(1\) | \(2\) | \(3\) |
|---|---|---|---|---|
| \(f_X(x)\) | \(.1458\) | \(.3936\) | \(.3542\) | \(.1062\) |
To calculate the marginal PMF of \(Y\), we sum each row.
| \(y\) | \(0\) | \(1\) | \(2\) | \(3\) | \(4\) | \(5\) |
|---|---|---|---|---|---|---|
| \(f_Y(y)\) | \(.0404\) | \(.1817\) | \(.3271\) | \(.2943\) | \(.1325\) | \(.0238\) |
Verify that these are both valid PMFs (they sum to 1, up to rounding error).
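The marginals are also easy to compute programmatically by summing the joint PMF from Equation 13.1. A sketch (the tiny discrepancies with the tables above arise because the tables sum entries that were already rounded):

```python
from math import comb, isclose

p = 18 / 38

def joint_pmf(x, y):  # Equation 13.1, with the structural zeros handled
    if not (0 <= y - x <= 2):
        return 0.0
    return comb(3, x) * comb(2, y - x) * p**y * (1 - p)**(5 - y)

f_X = [sum(joint_pmf(x, y) for y in range(6)) for x in range(4)]  # column sums
f_Y = [sum(joint_pmf(x, y) for x in range(4)) for y in range(6)]  # row sums

assert isclose(sum(f_X), 1.0) and isclose(sum(f_Y), 1.0)
print([round(v, 4) for v in f_X])  # [0.1458, 0.3936, 0.3543, 0.1063]
```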
The above approach is fine if the numerical probabilities are all that is needed. However, if we want to understand what is going on, then we should derive a formula for the marginal PMFs using Equation 13.1. For example, the marginal PMF of \(Y\) is
\[
\begin{aligned}
f_Y(y) &= \sum_x f(x, y) \\
&= \sum_x \binom{3}{x} \binom{2}{y - x} \left( \frac{18}{38} \right)^{y} \left( \frac{20}{38} \right)^{5 - y} \\
&= \binom{5}{y} \left( \frac{18}{38} \right)^{y} \left( \frac{20}{38} \right)^{5 - y},
\end{aligned}
\] where we used Vandermonde’s identity (Exercise 2.19) in the last step. This PMF corresponds to one of the named distributions we learned in Chapter 12, specifically the binomial distribution. This should come as no surprise, since we know that the number of bets that Yvette wins is \(Y \sim \text{Binomial}(n=5, p=\frac{18}{38})\).
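If Vandermonde's identity feels mysterious, it can at least be verified numerically for this example; a quick sketch:

```python
from math import comb

# Vandermonde's identity: sum over x of C(3, x) * C(2, y - x) equals C(5, y)
for y in range(6):
    lhs = sum(comb(3, x) * comb(2, y - x)
              for x in range(4) if 0 <= y - x <= 2)
    assert lhs == comb(5, y)
```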
Conditional Distribution
The conditional PMF is another way to describe the information that one random variable provides about another. The definition of conditional PMF is a natural analog of the definition of conditional probability (Definition 5.1).
Definition 13.4 (Conditional PMF) The conditional PMF of \(X\) given \(Y\) is \[
f_{X \mid Y}(x \mid y) := P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)} = \frac{f(x,y)}{f_Y(y)}.
\]
Let’s calculate a conditional distribution from the joint distribution.
Example 13.4 (Conditional Distribution of Xavier’s Wins) The next day, Xavier has forgotten how many times he won. If Yvette remembers that she won \(3\) times, what can Xavier conclude about how many times he won? In other words, what is \(f_{X \mid Y}(x \mid 3)\)?
First, we will calculate this using the joint PMF table.
| | \(x = 0\) | \(x = 1\) | \(x = 2\) | \(x = 3\) | \(f_Y(y)\) |
|---|---|---|---|---|---|
| \(y = 0\) | \(.0404\) | \(0\) | \(0\) | \(0\) | |
| \(y = 1\) | \(.0727\) | \(.1090\) | \(0\) | \(0\) | |
| \(y = 2\) | \(.0327\) | \(.1963\) | \(.0981\) | \(0\) | |
| \(y = 3\) | \(\textcolor{red}{0}\) | \(\textcolor{red}{.0883}\) | \(\textcolor{red}{.1766}\) | \(\textcolor{red}{.0294}\) | \(.2943\) |
| \(y = 4\) | \(0\) | \(0\) | \(.0795\) | \(.0530\) | |
| \(y = 5\) | \(0\) | \(0\) | \(0\) | \(.0238\) | |
By conditioning on \(y = 3\), we are restricting ourselves to the highlighted row above, whose sum is \(f_Y(3) = .2943\). Then, \[\begin{alignat*}{2}
f_{X \mid Y}(0 \mid 3) &= \frac{0}{.2943} = 0 \qquad \qquad &&f_{X \mid Y}(1 \mid 3) = \frac{.0883}{.2943} = .3 \\
f_{X \mid Y}(2 \mid 3) &= \frac{.1766}{.2943} = .6 &&f_{X \mid Y}(3 \mid 3) = \frac{.0294}{.2943} = .1
\end{alignat*}\]
Note that these probabilities add up to 1! This is another illustration of Proposition 5.1, that conditional probabilities are probability functions. Once we condition on \(\{ Y = 3 \}\), this becomes the sample space, and all of the usual laws of probability apply in this universe, including the fact that PMFs sum to \(1\).
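Computationally, conditioning on \(\{ Y = 3 \}\) amounts to renormalizing that row of the table; a sketch:

```python
# f(x, 3) values read from the y = 3 row of the joint PMF table
row = {0: 0.0, 1: 0.0883, 2: 0.1766, 3: 0.0294}
f_Y_3 = sum(row.values())  # 0.2943

cond = {x: fx / f_Y_3 for x, fx in row.items()}  # renormalize the row
print(cond)  # {0: 0.0, 1: ≈0.3, 2: ≈0.6, 3: ≈0.1}
```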
To obtain additional insight into this problem, let’s derive an explicit formula for the conditional PMF, using Equation 13.1 for the joint PMF. \[
\begin{aligned}
f_{X | Y}(x | y) &= \frac{f(x, y)}{f_Y(y)} \\
&= \frac{\binom{3}{x} \binom{2}{y - x} \left( \frac{18}{38} \right)^{y} \left( \frac{20}{38} \right)^{5 - y}}{\binom{5}{y} \left( \frac{18}{38} \right)^y \left( \frac{20}{38} \right)^{5-y}} \\
&= \frac{\binom{3}{x} \binom{2}{y - x}}{\binom{5}{y}}; \qquad x = 0, 1, \dots, y
\end{aligned}
\]
This conditional PMF corresponds to one of the named distributions that we learned in Chapter 12, the hypergeometric distribution. To be precise, the conditional distribution of \(X\) given \(Y\) is \[ X | Y = y \sim \text{Hypergeometric}(n=3, M=y, N=5). \tag{13.3}\]
Here is one way to make sense of Equation 13.3. If we know that Yvette won \(y\) times, then the \(y\) wins should be equally likely to be anywhere among the \(5\) bets. We can represent the \(5\) bets by a lot of \(5\) cans, of which \(y\) are defective, and the \(3\) bets that Xavier played are like a random sample of \(3\) cans from the lot. Therefore, the number of bets that Xavier wins follows a hypergeometric distribution.
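As a final check, we can confirm Equation 13.3 numerically: dividing the joint PMF by the marginal reproduces the hypergeometric PMF for every pair \((x, y)\), and notice that \(p = 18/38\) cancels out entirely, matching the intuition above. A sketch:

```python
from math import comb

p = 18 / 38

def joint_pmf(x, y):  # Equation 13.1
    if not (0 <= y - x <= 2):
        return 0.0
    return comb(3, x) * comb(2, y - x) * p**y * (1 - p)**(5 - y)

def marginal_Y(y):  # Y ~ Binomial(5, p)
    return comb(5, y) * p**y * (1 - p)**(5 - y)

for y in range(6):
    for x in range(4):
        hyper = comb(3, x) * comb(2, y - x) / comb(5, y) if 0 <= y - x <= 2 else 0.0
        assert abs(joint_pmf(x, y) / marginal_Y(y) - hyper) < 1e-12
```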
Exercises
Exercise 13.1 (Joint distribution of spades and hearts in a poker hand) Suppose you are dealt a poker hand of \(5\) cards from a well-shuffled deck of cards. Let \(X\) be the number of spades in the hand, and let \(Y\) be the number of hearts in the hand.
- Find the joint PMF of \(X\) and \(Y\).
- Calculate the marginal PMFs of \(X\) and \(Y\) from the joint PMF, and check that they make sense. (Hint: They should correspond to one of the named distributions.)
Exercise 13.2 (Joint Distribution of Heads and Tails) Suppose you toss a fair coin four times. Let \(X\) be the number of heads in the first three tosses and \(Y\) be the number of tails in the last three tosses. Find the joint PMF of \(X\) and \(Y\).
Exercise 13.3 (Predicting Tennis) You have built a model predicting the outcomes of tennis matches. Suppose you are trying to predict how Aryna Sabalenka and Iga Swiatek will perform in the U.S. Open, starting from the round of 8.
Let \(X\) be the number of games Sabalenka wins during the match and \(Y\) be the number of games Swiatek wins during the match. Your model predicts the following:
| | \(x = 0\) | \(x = 1\) | \(x = 2\) | \(x = 3\) |
|---|---|---|---|---|
| \(y = 0\) | \(0.12\) | \(0.112\) | \(0.0504\) | \(0.1176\) |
| \(y = 1\) | \(0.054\) | \(0.0504\) | \(0.02268\) | \(0.05292\) |
| \(y = 2\) | \(0.0504\) | \(0.04704\) | \(0\) | \(0.10584\) |
| \(y = 3\) | \(0.0756\) | \(0.07056\) | \(0.07056\) | \(0\) |
- What do \(P(X = 2, Y = 2) = P(X = 3, Y = 3) = 0\) seem to indicate?
- Calculate the marginal PMFs of \(X\) and \(Y\). Are \(X\) and \(Y\) independent?
- Compute \(P(X = 3 \mid Y = 2)\) and \(P(Y = 3 \mid X = 2)\). Should they sum to \(1\)? Why or why not?