13  Joint Distributions

Xavier and Yvette are playing roulette together. They both place $1 on red (for the red/black bet) on 3 spins of the roulette wheel before Xavier has to leave. After Xavier leaves, Yvette continues the bet on 2 more spins. Let \(X\) be the number of bets Xavier wins and \(Y\) be the number of bets Yvette wins.

We know that \(X \sim \text{Binomial}(n=3, p=18/38)\) and \(Y \sim \text{Binomial}(n=5, p=18/38)\). But how are they related to each other? To answer this question, we need to describe the joint distribution of \(X\) and \(Y\).

13.1 Joint PMF

Definition 13.1 (Joint PMF of \(X\) and \(Y\)) The joint distribution of two random variables \(X\) and \(Y\) is described by the joint PMF \[ f_{X,Y}(x,y) = P(X = x, Y = y). \]

The joint PMF is typically written as a table.

Example 13.1 (Joint PMF of Xavier and Yvette’s wins) Xavier can win any number of games between 0 and 3, while Yvette can win any number of games between 0 and 5. We can list the possibilities and their probabilities in a table.

| \(y \,\backslash\, x\) | \(0\) | \(1\) | \(2\) | \(3\) |
|:---:|:---:|:---:|:---:|:---:|
| \(0\) | \(f(0,0)\) | \(f(1,0)\) | \(f(2,0)\) | \(f(3,0)\) |
| \(1\) | \(f(0,1)\) | \(f(1,1)\) | \(f(2,1)\) | \(f(3,1)\) |
| \(2\) | \(f(0,2)\) | \(f(1,2)\) | \(f(2,2)\) | \(f(3,2)\) |
| \(3\) | \(f(0,3)\) | \(f(1,3)\) | \(f(2,3)\) | \(f(3,3)\) |
| \(4\) | \(f(0,4)\) | \(f(1,4)\) | \(f(2,4)\) | \(f(3,4)\) |
| \(5\) | \(f(0,5)\) | \(f(1,5)\) | \(f(2,5)\) | \(f(3,5)\) |

To fill out this table, we first observe that many of these probabilities are 0.

  1. Xavier and Yvette play the first 3 bets together, so whenever Xavier wins a bet, Yvette wins that same bet. Therefore, Yvette must win at least as many bets as Xavier. That is, \(f(x, y) = 0\) if \(x > y\).
  2. Since Yvette plays only two more bets than Xavier, she can have at most two more wins. That is, \(f(x, y) = 0\) if \(y > x + 2\).

| \(y \,\backslash\, x\) | \(0\) | \(1\) | \(2\) | \(3\) |
|:---:|:---:|:---:|:---:|:---:|
| \(0\) | \(f(0,0)\) | \(0\) | \(0\) | \(0\) |
| \(1\) | \(f(0,1)\) | \(f(1,1)\) | \(0\) | \(0\) |
| \(2\) | \(f(0,2)\) | \(f(1,2)\) | \(f(2,2)\) | \(0\) |
| \(3\) | \(0\) | \(f(1,3)\) | \(f(2,3)\) | \(f(3,3)\) |
| \(4\) | \(0\) | \(0\) | \(f(2,4)\) | \(f(3,4)\) |
| \(5\) | \(0\) | \(0\) | \(0\) | \(f(3,5)\) |

The remaining probabilities are non-zero and must be calculated. Let’s do \(f(1,2) = P(X = 1, Y = 2)\) as an example.

The event \(\{ X = 1, Y = 2 \}\) means that Xavier and Yvette win 1 of the 3 bets they play together, and Yvette wins 1 of the 2 bets she plays by herself. Hence, using the fact that the first three spins and the last two spins are independent, \[\begin{align*} f(1,2) &= P(\text{win 1 of first 3 bets, win 1 of last 2 bets}) \\ &= P(\text{win 1 of first 3 bets}) P(\text{win 1 of last 2 bets}) \\ &= \binom{3}{1} \left( \frac{18}{38} \right)^1 \left( \frac{20}{38} \right)^2 \cdot \binom{2}{1} \left( \frac{18}{38} \right)^1 \left( \frac{20}{38} \right)^1 \\ &\approx 0.1963. \end{align*}\]
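
For readers following along in code, this product of binomial probabilities is easy to verify. Here is a minimal Python sketch (our own check using `scipy.stats.binom`; it is not part of the example itself):

```python
from scipy.stats import binom

p = 18 / 38  # probability of winning a single red/black bet

# P(win 1 of first 3 bets) * P(win 1 of last 2 bets);
# the two factors multiply because the spins are independent
f_1_2 = binom.pmf(1, 3, p) * binom.pmf(1, 2, p)
print(round(f_1_2, 4))  # 0.1963
```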

Following the same logic, we can develop a general formula for the joint PMF, valid for \(0 \leq x \leq y \leq 5\):

\[\begin{align*} f(x, y) &= P(\text{win $x$ of first 3 bets, win $y - x$ of last 2 bets}) \\ &= P(\text{win $x$ of first 3 bets}) P(\text{win $y - x$ of last 2 bets}) \\ &= \binom{3}{x} \left( \frac{18}{38} \right)^x \left( \frac{20}{38} \right)^{3 - x} \cdot \binom{2}{y - x} \left( \frac{18}{38} \right)^{y - x} \left( \frac{20}{38} \right)^{2 - (y - x)} \\ &= \binom{3}{x} \binom{2}{y - x} \left( \frac{18}{38} \right)^{y} \left( \frac{20}{38} \right)^{5 - y} \end{align*} \tag{13.1}\]

By plugging values into the formula above, we obtain the following table for the joint PMF:

| \(y \,\backslash\, x\) | \(0\) | \(1\) | \(2\) | \(3\) |
|:---:|:---:|:---:|:---:|:---:|
| \(0\) | \(.0404\) | \(0\) | \(0\) | \(0\) |
| \(1\) | \(.0727\) | \(.1090\) | \(0\) | \(0\) |
| \(2\) | \(.0327\) | \(.1963\) | \(.0981\) | \(0\) |
| \(3\) | \(0\) | \(.0883\) | \(.1766\) | \(.0294\) |
| \(4\) | \(0\) | \(0\) | \(.0795\) | \(.0530\) |
| \(5\) | \(0\) | \(0\) | \(0\) | \(.0238\) |

As a sanity check, we can make sure the probabilities in the table add up to \(1\) (up to rounding error)!
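
Here is a minimal Python sketch of both steps (our own code, not part of the example): it implements Equation 13.1, rebuilds the table, and performs the sanity check. The guard clause encodes the convention that \(\binom{2}{y-x} = 0\) unless \(0 \leq y - x \leq 2\), which produces the zeros in the table.

```python
from math import comb

p = 18 / 38  # probability of winning a single red/black bet

def joint_pmf(x, y):
    """Equation 13.1. math.comb rejects out-of-range arguments, so guard them."""
    if not (0 <= x <= 3 and 0 <= y - x <= 2):
        return 0.0
    return comb(3, x) * comb(2, y - x) * p**y * (1 - p)**(5 - y)

table = [[joint_pmf(x, y) for x in range(4)] for y in range(6)]
for row in table:
    print([f"{v:.4f}" for v in row])

# sanity check: all probabilities sum to 1 (up to floating-point error)
print(sum(sum(row) for row in table))
```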

In some situations, the random variables \(X\) and \(Y\) do not provide information about each other. In this case, we say that \(X\) and \(Y\) are independent.

The next definition formalizes independence for random variables. It is based on the definition of independence for events (Definition 6.1).

Definition 13.2 (Independent Random Variables) Two (discrete) random variables \(X\) and \(Y\) are independent if \[ P(X = x, Y = y) = P(X = x) P(Y = y) \tag{13.2}\] for all values \(x\) and \(y\).

We can state Equation 13.2 in terms of the joint PMF and the PMFs of \(X\) and \(Y\) individually: \[ f(x,y) = f_X(x) f_Y(y) \] for all values \(x\) and \(y\).
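
In code, this criterion amounts to checking that the joint PMF table equals the outer product of its row and column sums (the marginal PMFs, defined formally in Section 13.2). Here is a sketch; the function `is_independent` is our own, not from any library:

```python
import numpy as np

def is_independent(joint, tol=1e-9):
    """Check Equation 13.2: does joint[y, x] = f_Y(y) * f_X(x) for all x, y?"""
    f_x = joint.sum(axis=0)  # column sums: PMF of X
    f_y = joint.sum(axis=1)  # row sums: PMF of Y
    return np.allclose(joint, np.outer(f_y, f_x), atol=tol)
```

Applied to the roulette table from Example 13.1, this returns `False`: for instance, \(f(3, 0) = 0\) even though \(P(X = 3)\) and \(P(Y = 0)\) are both positive.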

The next example illustrates an application of independence to obtain a surprising result.

Example 13.2 (Coin Competition) Wei tosses a fair coin 3 times. Van tosses a different fair coin 4 times. The winner is the person who gets more heads. What is the probability that Van wins?

Because Van tosses one more coin than Wei, it seems that Van is more likely to win. To formalize this problem, let \(W\) be the number of heads that Wei gets and \(V\) be the number of heads that Van gets. We are interested in \(P(V > W)\).

Since Wei and Van are tossing different coins, \(W\) and \(V\) are independent. Their joint PMF is \[ f(v, w) = f_V(v) \cdot f_W(w). \] Since \(V \sim \text{Binomial}(n=4, p=1/2)\) and \(W \sim \text{Binomial}(n=3, p=1/2)\), the joint PMF is \[ f(v, w) = \binom{4}{v} \big(\frac{1}{2}\big)^v \big(\frac{1}{2}\big)^{4-v} \cdot \binom{3}{w} \big(\frac{1}{2}\big)^w \big(\frac{1}{2}\big)^{3-w} = \frac{\binom{4}{v} \binom{3}{w}}{2^7}. \]

By substituting different values for \(v\) and \(w\), we can write out the joint PMF in table form:

| \(w \,\backslash\, v\) | \(0\) | \(1\) | \(2\) | \(3\) | \(4\) |
|:---:|:---:|:---:|:---:|:---:|:---:|
| \(0\) | \(.0078125\) | \(.03125\) | \(.046875\) | \(.03125\) | \(.0078125\) |
| \(1\) | \(.0234375\) | \(.09375\) | \(.140625\) | \(.09375\) | \(.0234375\) |
| \(2\) | \(.0234375\) | \(.09375\) | \(.140625\) | \(.09375\) | \(.0234375\) |
| \(3\) | \(.0078125\) | \(.03125\) | \(.046875\) | \(.03125\) | \(.0078125\) |

You should verify that these values sum to exactly \(1\); each entry is an exact multiple of \(1/128\), so no rounding is involved.
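
Because the joint PMF factors into the two individual PMFs, the entire table is an outer product. A minimal sketch (our own code, assuming numpy and scipy):

```python
import numpy as np
from scipy.stats import binom

f_v = binom.pmf(np.arange(5), 4, 0.5)  # Van: Binomial(n=4, p=1/2)
f_w = binom.pmf(np.arange(4), 3, 0.5)  # Wei: Binomial(n=3, p=1/2)

joint = np.outer(f_w, f_v)  # joint[w, v] = f_W(w) * f_V(v)
print(joint.sum())          # 1.0
```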

To calculate \(P(V > W)\), we have to sum the joint PMF over the relevant values. We could express this probability as a double sum,

\[ P(V > W) = \sum_{v=0}^4 \sum_{w=0}^{v-1} f(v, w), \]

but it is easier to simply highlight the relevant entries in the table.

| \(w \,\backslash\, v\) | \(0\) | \(1\) | \(2\) | \(3\) | \(4\) |
|:---:|:---:|:---:|:---:|:---:|:---:|
| \(0\) | \(.0078125\) | \(\boxed{.03125}\) | \(\boxed{.046875}\) | \(\boxed{.03125}\) | \(\boxed{.0078125}\) |
| \(1\) | \(.0234375\) | \(.09375\) | \(\boxed{.140625}\) | \(\boxed{.09375}\) | \(\boxed{.0234375}\) |
| \(2\) | \(.0234375\) | \(.09375\) | \(.140625\) | \(\boxed{.09375}\) | \(\boxed{.0234375}\) |
| \(3\) | \(.0078125\) | \(.03125\) | \(.046875\) | \(.03125\) | \(\boxed{.0078125}\) |

Therefore, \[\begin{align*} P(V > W) &= f(1,0) + f(2,0) + f(3, 0) + f(4, 0) \\ & \qquad + f(2, 1) + f(3,1) + f(4, 1) + f(3, 2) + f(4, 2) + f(4, 3) \\ &= .5. \end{align*}\]

Surprisingly, Van only has a 50% chance of winning. Intuitively, this is because there is also the possibility that Van and Wei tie.

  • The probability that they tie is \[ P(V = W) = f(0, 0) + f(1, 1) + f(2, 2) + f(3, 3) \approx .2734.\]
  • By the complement rule, the probability that Wei wins is \[ P(W > V) = 1 - P(V > W) - P(V = W) \approx 1 - .5 - .2734 = .2266. \]

So Van is more likely to win than Wei, but because they can also tie, Van only has a 50% chance of winning outright.
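
All three probabilities can be computed from the joint table with boolean masks. The sketch below (our own code) rebuilds the table and sums the entries where \(v > w\), \(v = w\), and \(v < w\):

```python
import numpy as np
from scipy.stats import binom

w, v = np.arange(4), np.arange(5)
joint = np.outer(binom.pmf(w, 3, 0.5), binom.pmf(v, 4, 0.5))  # joint[w, v]

# index grids: W[i, j] = w[i], V[i, j] = v[j]
W, V = np.meshgrid(w, v, indexing="ij")
print(joint[V > W].sum())   # P(V > W)  = 0.5
print(joint[V == W].sum())  # P(V = W) ~= 0.2734
print(joint[V < W].sum())   # P(W > V) ~= 0.2266
```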

13.2 Marginal Distribution

If we know the joint PMF \(f(x, y)\) of two random variables \(X\) and \(Y\), can we determine the PMF of \(X\) by itself? The answer is yes! Because the sets \(\{ Y = y\}\) form a partition of the sample space, we have \[ \begin{aligned} f_X(x) = P(X = x) &= P\left(\bigcup_y \{ X = x, Y = y \} \right) \\ &= \sum_y P(X = x, Y = y) \\ &= \sum_y f(x, y) \end{aligned} \]

If we write the joint PMF as a table, the above calculation corresponds to summing each column.

| \(y \,\backslash\, x\) | \(0\) | \(1\) | \(2\) | \(\dots\) |
|:---:|:---:|:---:|:---:|:---:|
| \(0\) | \(f(0,0)\) | \(f(1,0)\) | \(f(2,0)\) | \(\dots\) |
| \(1\) | \(f(0,1)\) | \(f(1,1)\) | \(f(2,1)\) | \(\dots\) |
| \(2\) | \(f(0,2)\) | \(f(1,2)\) | \(f(2,2)\) | \(\dots\) |
| \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
| total | \(f_X(0)\) | \(f_X(1)\) | \(f_X(2)\) | \(\dots\) |

It is natural to write these sums in the margins of the table. For this reason, the PMF of \(X\) in this context is called the marginal PMF.

Definition 13.3 (Marginal PMF) The marginal PMF of \(X\) refers to the PMF of \(X\) when it is calculated from a joint PMF. Specifically, the marginal PMF \(f_X\) can be calculated by summing the joint PMF over all the possible values of \(Y\): \[ f_X(x) = \sum_y f(x,y). \]

Similarly, the marginal PMF of \(Y\) is calculated by summing the joint PMF over all the possible values of \(X\): \[ f_Y(y) = \sum_x f(x,y). \]

The next example illustrates how to calculate marginal PMFs from a joint PMF.

Example 13.3 (Marginal distribution of Xavier and Yvette’s roulette wins) In Example 13.1, we determined the joint PMF of \(X\), the number of bets that Xavier wins, and \(Y\), the number of bets that Yvette wins. The joint PMF is specified in the table below.

| \(y \,\backslash\, x\) | \(0\) | \(1\) | \(2\) | \(3\) |
|:---:|:---:|:---:|:---:|:---:|
| \(0\) | \(.0404\) | \(0\) | \(0\) | \(0\) |
| \(1\) | \(.0727\) | \(.1090\) | \(0\) | \(0\) |
| \(2\) | \(.0327\) | \(.1963\) | \(.0981\) | \(0\) |
| \(3\) | \(0\) | \(.0883\) | \(.1766\) | \(.0294\) |
| \(4\) | \(0\) | \(0\) | \(.0795\) | \(.0530\) |
| \(5\) | \(0\) | \(0\) | \(0\) | \(.0238\) |

To calculate the marginal PMF of \(X\), we sum each column. (Normally, this would be written in the margins of the table above, but we write it in a separate table.)

| \(x\) | \(0\) | \(1\) | \(2\) | \(3\) |
|:---:|:---:|:---:|:---:|:---:|
| \(f_X(x)\) | \(.1458\) | \(.3936\) | \(.3542\) | \(.1062\) |

To calculate the marginal PMF of \(Y\), we sum each row.

| \(y\) | \(0\) | \(1\) | \(2\) | \(3\) | \(4\) | \(5\) |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| \(f_Y(y)\) | \(.0404\) | \(.1817\) | \(.3271\) | \(.2943\) | \(.1325\) | \(.0238\) |

Verify that these are both valid PMFs (they sum to 1, up to rounding error).

The above approach is fine if all we need are the numerical probabilities. However, if we want insight, then we should derive a formula for the marginal PMF using Equation 13.1. For example, the marginal PMF of \(Y\) is

\[ \begin{aligned} f_Y(y) &= \sum_x f(x, y) \\ &= \sum_x \binom{3}{x} \binom{2}{y - x} \left( \frac{18}{38} \right)^{y} \left( \frac{20}{38} \right)^{5 - y} \\ &= \binom{5}{y} \left( \frac{18}{38} \right)^{y} \left( \frac{20}{38} \right)^{5 - y}, \end{aligned} \] where we used Vandermonde’s identity in the last step. This PMF corresponds to one of the named distributions we learned in Chapter 12, specifically the binomial distribution. This is no surprise, as we know that the number of bets that Yvette wins is \(Y \sim \text{Binomial}(n=5, p=18/38)\).
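
We can confirm this numerically. The sketch below (our own code, not part of the example) rebuilds the joint table from Equation 13.1, computes both marginals by summing along the appropriate axis, and checks that they match the binomial PMFs we expect:

```python
import numpy as np
from math import comb
from scipy.stats import binom

p = 18 / 38

def joint_pmf(x, y):
    """Equation 13.1, with a guard for out-of-range binomial coefficients."""
    if not (0 <= x <= 3 and 0 <= y - x <= 2):
        return 0.0
    return comb(3, x) * comb(2, y - x) * p**y * (1 - p)**(5 - y)

joint = np.array([[joint_pmf(x, y) for x in range(4)] for y in range(6)])

f_X = joint.sum(axis=0)  # sum each column
f_Y = joint.sum(axis=1)  # sum each row

print(np.allclose(f_X, binom.pmf(np.arange(4), 3, p)))  # True: Binomial(3, 18/38)
print(np.allclose(f_Y, binom.pmf(np.arange(6), 5, p)))  # True: Binomial(5, 18/38)
```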

13.3 Conditional Distribution

The conditional PMF is another way to describe the information that one random variable provides about another. The definition of conditional PMF is the natural analog of Definition 5.1.

Definition 13.4 (Conditional PMF) The conditional PMF of \(X\) given \(Y\) is \[ f_{X \mid Y}(x \mid y) := P(X = x \mid Y = y) = \frac{f(x,y)}{f_Y(y)}. \]

Let’s calculate a conditional distribution from the joint distribution.

Example 13.4 (Conditional Distribution of Xavier’s Wins) The next day, Xavier has forgotten how many times he won. If Yvette remembers that she won \(3\) times, what information does Xavier have now? In other words, what is \(f_{X \mid Y}(x \mid 3)\)?

First, we will calculate this using the joint PMF table.

| \(y \,\backslash\, x\) | \(0\) | \(1\) | \(2\) | \(3\) | total |
|:---:|:---:|:---:|:---:|:---:|:---:|
| \(0\) | \(.0404\) | \(0\) | \(0\) | \(0\) | |
| \(1\) | \(.0727\) | \(.1090\) | \(0\) | \(0\) | |
| \(2\) | \(.0327\) | \(.1963\) | \(.0981\) | \(0\) | |
| \(3\) | \(\textcolor{red}{0}\) | \(\textcolor{red}{.0883}\) | \(\textcolor{red}{.1766}\) | \(\textcolor{red}{.0294}\) | \(.2943\) |
| \(4\) | \(0\) | \(0\) | \(.0795\) | \(.0530\) | |
| \(5\) | \(0\) | \(0\) | \(0\) | \(.0238\) | |

By conditioning on \(y = 3\), we are restricting ourselves to the highlighted row above, whose sum is \(f_Y(3) = .2943\). Then, \[\begin{alignat*}{2} f_{X \mid Y}(0 \mid 3) &= \frac{0}{.2943} = 0 \qquad \qquad &&f_{X \mid Y}(1 \mid 3) = \frac{.0883}{.2943} = .3 \\ f_{X \mid Y}(2 \mid 3) &= \frac{.1766}{.2943} = .6 &&f_{X \mid Y}(3 \mid 3) = \frac{.0294}{.2943} = .1 \end{alignat*}\]

Note that these probabilities add up to 1! This is another illustration of Theorem 5.2, that conditional probabilities are probability functions. If we condition on \(\{ Y = 3 \}\), it becomes our universe, and all of the usual laws of probability apply in this universe, including the fact that PMFs sum to \(1\).
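
Computationally, conditioning on \(\{ Y = 3 \}\) is just renormalizing the \(y = 3\) row of the joint table, as this short sketch (our own code) shows:

```python
import numpy as np

# the y = 3 row of the joint PMF table from Example 13.1
row = np.array([0, .0883, .1766, .0294])

cond = row / row.sum()  # f_{X|Y}(x | 3) = f(x, 3) / f_Y(3)
print(cond)             # approximately [0, 0.3, 0.6, 0.1]
```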

To obtain additional insight into this problem, let’s derive an explicit formula for the conditional PMF, using Equation 13.1 for the joint PMF. \[ \begin{aligned} f_{X | Y}(x | y) &= \frac{f(x, y)}{f_Y(y)} \\ &= \frac{\binom{3}{x} \binom{2}{y - x} \left( \frac{18}{38} \right)^{y} \left( \frac{20}{38} \right)^{5 - y}}{\binom{5}{y} \left( \frac{18}{38} \right)^y \left( \frac{20}{38} \right)^{5-y}} \\ &= \frac{\binom{3}{x} \binom{2}{y - x}}{\binom{5}{y}}; \qquad x = 0, 1, \dots, y \end{aligned} \]

This conditional PMF corresponds to one of the named distributions that we learned in Chapter 12, the hypergeometric distribution. To be precise, the conditional distribution of \(X\) given \(Y\) is \[ X | Y = y \sim \text{Hypergeometric}(M=y, N=5 - y, n=3). \tag{13.3}\]

Here is one way to make sense of Equation 13.3. If we know that Yvette won \(y\) times, then the \(y\) wins should be equally likely to be anywhere over the \(5\) bets. We can simulate the \(5\) bets by placing \(y\) white balls and \(5 - y\) black balls into a lottery drum and drawing the lottery balls one by one. The number of white balls in the first \(3\) draws corresponds to the number of times that Xavier won, and this has a hypergeometric distribution.
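
As a final check, we can compare this conditional PMF to scipy's hypergeometric distribution. Note that `scipy.stats.hypergeom` parameterizes the distribution differently from Equation 13.3: `hypergeom(M, n, N)` takes the population size `M`, the number of success states `n`, and the number of draws `N`. A sketch of the check for \(y = 3\) (our own code):

```python
from scipy.stats import hypergeom

y = 3  # condition on Yvette winning 3 of the 5 bets

# population of 5 bets, y of them wins, Xavier played 3 of them
dist = hypergeom(M=5, n=y, N=3)

for x in range(4):
    print(x, dist.pmf(x))  # 0.0, 0.3, 0.6, 0.1 -- matches f_{X|Y}(x | 3)
```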