16  From Conditionals to Marginals

In Chapter 13, we calculated conditional and marginal distributions from the joint distribution \(f(x, y)\) of two random variables.

However, in many situations, the conditional distribution is known but not the joint distribution. The next example illustrates such a situation.

Example 16.1 (Corned Beef at the Food Truck) The number of customers \(N\) that purchase from a food truck at lunchtime follows a \(\text{Poisson}(\mu=85.2)\) distribution. About \(30\%\) of customers order its signature corned beef sandwich, independently of other customers. Let \(X\) be the number of corned beef sandwiches they sell at lunchtime.

In this example, we know:

  • the marginal distribution of \(N\): \[ N \sim \text{Poisson}(\mu=85.2), \]
  • the conditional distribution of \(X\), given \(N\): \[ X | \{ N = n \} \sim \text{Binomial}(n, p=0.3), \]

but we do not know the joint distribution of \(X\) and \(N\), nor the marginal distribution of \(X\).

The food truck wants to know the (marginal) distribution of \(X\) so that they know how much corned beef to prepare. To find the marginal distribution, we can first find the joint distribution: \[\begin{align*} f_{N, X}(n, x) &= f_N(n) \cdot f_{X|N}(x | n) \\ &= e^{-85.2} \frac{85.2^n}{n!} \cdot \frac{n!}{x! (n-x)!} (0.3)^x (1 - 0.3)^{n-x}; & 0 \leq n < \infty, 0 \leq x \leq n \end{align*}\]

and sum over \(n\) to get the marginal distribution of \(X\): \[\begin{align*} f_X(x) &= \sum_n f_{N, X}(n, x) \\ &= \sum_{n=x}^\infty e^{-85.2} \frac{85.2^n}{n!} \cdot \frac{n!}{x! (n-x)!} (0.3)^x (1 - 0.3)^{n-x} & (\text{probability is $0$ for $n < x$}) \\ &= e^{-85.2 \cdot 0.3} \frac{(85.2 \cdot 0.3)^x}{x!} \sum_{n=x}^\infty e^{-85.2 (1 - 0.3)} \frac{(85.2 (1 - 0.3))^{n-x}}{(n-x)!} & (\text{regroup terms}) \\ &= e^{-85.2 \cdot 0.3} \frac{(85.2 \cdot 0.3)^x}{x!} \underbrace{\sum_{m=0}^\infty e^{-85.2(1 - 0.3)} \frac{(85.2(1 - 0.3))^{m}}{m!}}_{=1} & (\text{substitute } m = n - x)\\ &= e^{-85.2 \cdot 0.3} \frac{(85.2 \cdot 0.3)^x}{x!}. \end{align*}\]

In the last line, we used the fact that the expression inside the sum is a \(\text{Poisson}(\mu=85.2(1 - 0.3))\) pmf. When we sum any pmf over all of its possible values, we get \(1\).

The last step is to recognize the final expression for \(f_X(x)\) as the pmf of a Poisson random variable. In particular, \[X \sim \text{Poisson}(\mu=85.2 \cdot 0.3).\]
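This result can also be checked by simulation. The following R snippet is our own quick sanity check, not part of the original derivation; it uses the values \(\mu = 85.2\) and \(p = 0.3\) from the example.

```r
# Simulate many lunchtimes: draw the number of customers N, then thin it
# with probability 0.3 to get the number of corned beef sandwiches X.
set.seed(1)
N <- rpois(100000, lambda = 85.2)
X <- rbinom(100000, size = N, prob = 0.3)

mean(X)  # should be close to 85.2 * 0.3 = 25.56
var(X)   # for a Poisson random variable, the variance equals the mean
```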

The algebra in Example 16.1 was tricky, but the idea was not. To obtain the marginal distribution of \(X\), we:

  1. multiplied the distribution of \(N\) by the conditional distribution of \(X | N\) to get the joint distribution of \(N\) and \(X\) \[ f_{N, X}(n, x) = f_N(n) f_{X|N}(x | n) \]
  2. summed the joint distribution over the values of \(N\) to get the distribution of \(X\). \[ f_X(x) = \sum_n f_{N, X}(n, x) \]

We can express this recipe as a single formula: \[ f_X(x) = \sum_n f_N(n) f_{X|N}(x | n). \tag{16.1}\] In fact, Equation 16.1 is none other than the Law of Total Probability! To see this, write each pmf in terms of the corresponding probability: \[ P(X = x) = \sum_n P(N = n) P(X = x | N = n). \tag{16.2}\] Since the sets \(A_n = \{ N = n \}\) form a partition of the sample space, Equation 16.2 is just a restatement of Theorem 7.1.
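To make Equation 16.1 concrete, here is a short R snippet (our own illustration, using the numbers from Example 16.1) that evaluates the sum numerically for one value of \(x\) and compares it with the Poisson pmf we derived:

```r
# Evaluate f_X(x) = sum_n f_N(n) * f_{X|N}(x | n), truncating the
# infinite sum at n = 500 (the omitted terms are negligible).
x <- 20
n <- x:500
sum(dpois(n, lambda = 85.2) * dbinom(x, size = n, prob = 0.3))

# Compare with the Poisson(25.56) pmf from Example 16.1.
dpois(x, lambda = 25.56)
```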

16.1 Law of Total Expectation

How many corned beef sandwiches should the food truck expect to sell at lunchtime? Since we already showed in Example 16.1 that \(X\) follows a \(\text{Poisson}(\mu=85.2 \cdot 0.3)\) distribution, and we know the expected value of a Poisson, the answer is: \[ \text{E}[ X ] = \mu = 85.2 \cdot 0.3 = 25.56. \] But determining the (marginal) distribution of \(X\) was a lot of work! And in most situations, the marginal distribution will not be a familiar distribution, or it may even be infeasible to calculate altogether.

Is there a way to calculate \(\text{E}[ X ]\) without first finding the distribution of \(X\)? Yes, using the Law of Total Expectation, which is the natural extension of the Law of Total Probability (Equation 16.2) to expected values. In math, it says: \[ \text{E}[ X ] = \sum_n P(N = n) \text{E}[ X | N = n ]. \tag{16.3}\] In words, it says that we can calculate an overall expectation \(\text{E}[ X ]\) by:

  1. calculating the conditional expectations \(\text{E}[ X | N = n ]\) and
  2. weighting them by the probabilities of \(N\).

We will define the notation \(\text{E}[ X | N = n ]\) in Definition 16.1 and prove the Law of Total Expectation in Theorem 16.1. But let’s first appreciate its power by applying it to two examples.

Example 16.2 (Expected Rolls in Craps) On average, how long does a round of craps last? Let’s determine the expected number of rolls in a round of craps.

Let \(X\) be the number of rolls after the come-out roll. The distribution of \(X\) is complicated. But the distribution of \(X\) is simple if we condition on the value of the come-out roll.

  • If the come-out roll is 2, 3, 7, 11, or 12, then the round ends and \(X = 0\).
  • Otherwise, the come-out roll determines the point, and the number of remaining rolls \(X\) is a \(\text{Geometric}(p)\) random variable, where \(p\) depends on the point.
    • For example, if the point is 4, then the round ends when either a 4 or a 7 is rolled, so \(p = 9 / 36\).
    • But if the point is 9, then the round ends when either a 9 or a 7 is rolled, so \(p = 10/36\).

The outcome of the come-out roll is a random variable \(N\). Let’s calculate the conditional expectation \(\text{E}[ X | N = n ]\) for different values of the come-out roll. We use the fact that the expected value of a \(\text{Geometric}(p)\) random variable is \(1 / p\). (We derived this previously using algebra, but it is also possible to derive this using conditional expectation. See Example 16.4.)

| Come-Out Roll \(n\) | Distribution of \(X \mid N = n\) | \(\text{E}[ X \mid N = n ]\) |
|---|---|---|
| 2 | \(X = 0\) | \(0\) |
| 3 | \(X = 0\) | \(0\) |
| 4 | \(\text{Geometric}(p=9/36)\) | \(36/9\) |
| 5 | \(\text{Geometric}(p=10/36)\) | \(36/10\) |
| 6 | \(\text{Geometric}(p=11/36)\) | \(36/11\) |
| 7 | \(X = 0\) | \(0\) |
| 8 | \(\text{Geometric}(p=11/36)\) | \(36/11\) |
| 9 | \(\text{Geometric}(p=10/36)\) | \(36/10\) |
| 10 | \(\text{Geometric}(p=9/36)\) | \(36/9\) |
| 11 | \(X = 0\) | \(0\) |
| 12 | \(X = 0\) | \(0\) |

Now, we can use the Law of Total Expectation to calculate \(\text{E}[ X ]\). \[ \begin{aligned} \text{E}[ X ] &= \sum_n P(N = n) \text{E}[ X | N = n ] \\ &= P(N = 2) \underbrace{\text{E}[ X | N = 2 ]}_0 + P(N = 3) \underbrace{\text{E}[ X | N = 3 ]}_0 + \underbrace{P(N = 4)}_{3/36} \underbrace{\text{E}[ X | N = 4 ]}_{36/9} + \dots \end{aligned} \] This sum is rather tedious to calculate by hand, so let’s use R to finish the calculation.
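The original R code is not reproduced here; the snippet below is a minimal sketch of one way to carry out the calculation.

```r
# P(N = n) for the come-out roll n = 2, 3, ..., 12 (two fair dice)
probs <- c(1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1) / 36

# E[X | N = n] from the table above, in the same order n = 2, ..., 12
cond_exp <- c(0, 0, 36/9, 36/10, 36/11, 0, 36/11, 36/10, 36/9, 0, 0)

# Law of Total Expectation: weight each conditional expectation by P(N = n)
sum(probs * cond_exp)
```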

From R, we see that \(\text{E}[ X ] = 2.3758\).

The total number of rolls in the round includes the come-out roll, so it can be represented by the random variable \(1 + X\). Its expected value is \[ \text{E}[ 1 + X ] = 1 + \text{E}[ X ] = 3.3758. \]
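As an optional check (not in the original text), we can also simulate many rounds of craps in R and compare the average number of rolls with 3.3758:

```r
# Simulate one round of craps and return the total number of rolls,
# including the come-out roll.
simulate_round <- function() {
  roll <- sum(sample(1:6, 2, replace = TRUE))   # come-out roll
  if (roll %in% c(2, 3, 7, 11, 12)) return(1)   # round ends immediately
  point <- roll
  rolls <- 1
  repeat {
    roll <- sum(sample(1:6, 2, replace = TRUE))
    rolls <- rolls + 1
    if (roll == point || roll == 7) return(rolls)
  }
}

set.seed(42)
mean(replicate(100000, simulate_round()))  # should be close to 3.3758
```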

Let’s also apply the Law of Total Expectation to Example 16.1, which is another situation where the distribution of \(X\) was complicated to calculate, but the conditional distribution \(X | N = n\) is simple.

Example 16.3 (Corned Beef Expectations) How many corned beef sandwiches should the food truck expect to sell?

Since the conditional distribution \(X | N = n\) is binomial, we know that \[ \text{E}[ X | N = n ] = np = n(0.3). \] Substituting this into the Law of Total Expectation (Equation 16.3), the expected number of corned beef sandwiches is \[\begin{align*} \text{E}[ X ] &= \sum_{n=0}^\infty e^{-85.2} \frac{(85.2)^n}{n!} \cdot n(0.3) \\ &= (0.3) \underbrace{\sum_{n=0}^\infty n \cdot e^{-85.2} \frac{(85.2)^n}{n!}}_{\text{E}[ N ] = 85.2} \\ &= 25.56. \end{align*}\]
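The same calculation can be checked numerically in R (our own sketch, truncating the infinite sum at a large value of \(n\)):

```r
# E[X] = sum_n P(N = n) * E[X | N = n], with E[X | N = n] = n * 0.3
n <- 0:500
sum(dpois(n, lambda = 85.2) * n * 0.3)  # 25.56, matching 85.2 * 0.3
```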

Notice how much simpler it was to calculate the expectation \(\text{E}[ X ]\), compared with the distribution of \(X\) in Example 16.1. This illustrates the power of the Law of Total Expectation!

16.2 Definitions and Proofs

To prove the Law of Total Expectation, we first need to define the conditional expectation \(\text{E}[ X | N = n ]\). The definition is intuitive, and in fact, we have “calculated” it in the examples above.

Definition 16.1 (Conditional Expectation Given an Event) The expectation of \(X\) conditional on an event \(A\) is defined as:

\[ \begin{aligned} \text{E}[ X | A ] &\overset{\text{def}}{=} \sum_x x P(X = x | A) \\ &= \sum_x x f_{X|A}(x) \end{aligned} \tag{16.4}\]

That is, instead of weighting the possible values of \(X\) by the probabilities of \(X\), we weight by the probabilities conditional on \(A\). These probabilities together comprise a (conditional) pmf \(f_{X|A}(x)\).

To define conditional expectations like \(\text{E}[ X | N = n ]\), we simply condition on the event \(A = \{ N = n \}\) in Definition 16.1: \[ \begin{aligned} \text{E}[ X | N=n ] &= \sum_x x P(X = x | N=n) \\ &= \sum_x x f_{X|N}(x|n). \end{aligned} \tag{16.5}\]
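For example, here is a quick R illustration (our own, using the corned beef setting from Example 16.1) of computing \(\text{E}[ X | N = n ]\) directly from Equation 16.5:

```r
# E[X | N = n] for the corned beef example, where X | {N = n} is Binomial(n, 0.3)
n <- 80
x <- 0:n
sum(x * dbinom(x, size = n, prob = 0.3))  # equals n * 0.3 = 24
```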

Now, armed with Equation 16.5, we can state and prove the Law of Total Expectation.

Theorem 16.1 (Law of Total Expectation) \[ \text{E}[ X ] = \sum_n P(N = n) \text{E}[ X | N = n ]. \]

Proof

The proof follows by expanding \(\text{E}[ X | N = n ]\) using its definition (Equation 16.5) and rearranging the sums.

\[\begin{align*} \sum_n P(N = n) \text{E}[ X | N=n ] &= \sum_n P(N = n) \sum_x x P(X = x | N = n) \\ &= \sum_x x \underbrace{\sum_n P(N = n) P(X = x | N = n)}_{P(X = x)} \\ &= \text{E}[ X ] \end{align*}\]

The last step follows from Equation 16.2.

Let’s use the Law of Total Expectation to derive the expected value of a \(\text{Geometric}(p)\) random variable.

Example 16.4 (Geometric Expectation) Let \(X\) be a \(\text{Geometric}(p)\) random variable. What is \(\text{E}[ X ]\)?

Let \(N\) be the outcome of the first Bernoulli trial. If \(N = 1\), then \(X = 1\). But if \(N = 0\), the remaining number of trials is a \(\text{Geometric}(p)\) random variable.

In other words,

  • \(\text{E}[ X | N = 1 ] = 1\)
  • \(\text{E}[ X | N = 0 ] = 1 + \text{E}[ X ]\), since we have used up 1 trial already and the remaining number of trials is again \(\text{Geometric}(p)\), with the same expectation.

By the Law of Total Expectation (Equation 16.3), we have the expression: \[ \begin{aligned} \text{E}[ X ] &= P(N = 0) \text{E}[ X | N = 0 ] + P(N = 1) \text{E}[ X | N = 1 ] \\ &= (1 - p) (1 + \text{E}[ X ]) + p(1). \end{aligned} \]

Expanding the right-hand side gives \(\text{E}[ X ] = 1 + (1 - p) \text{E}[ X ]\). Solving this equation for \(\text{E}[ X ]\), we obtain: \[ \text{E}[ X ] = \frac{1}{p}. \]

Notice in this example that we calculated \(\text{E}[ X | N = 0 ]\) by intuition and reasoning, rather than by algebra. This will usually be the case when working with conditional expectations.
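If you want to check this result numerically, here is a small R simulation (our own; note that R’s rgeom() counts failures before the first success, so the number of trials is rgeom() + 1):

```r
# Estimate E[X] for a Geometric(p) number of trials by simulation
p <- 0.25
mean(rgeom(100000, prob = p) + 1)  # should be close to 1 / p = 4
```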

16.3 Exercises

Exercise 16.1 A fair coin is tossed repeatedly.

  1. Let \(X\) be the number of tosses until the sequence HH first appears. What is \(\text{E}[ X ]\)? Hint: Condition on the outcome of the first toss, and use Theorem 16.1.
  2. Let \(Y\) be the number of tosses until the sequence HT first appears. What is \(\text{E}[ Y ]\)?
  3. Compare \(\text{E}[ X ]\) and \(\text{E}[ Y ]\). Can you provide intuition for this result?

Exercise 16.2 A stream of bits is being transmitted over a noisy channel. Each bit has a 1% chance of being corrupted, independently of any other bit. Let \(X\) be the number of bits that are transmitted until the first corruption.

  1. Interpret \(\text{E}[ X | X > 10 ]\) in the context of this problem.
  2. Calculate \(\text{E}[ X | X > 10 ]\).

Exercise 16.3 You play the following game. You roll a fair, six-sided die. You can either take the amount on that die (in dollars) or roll the die again and take whatever amount comes up the second time (in dollars).

  1. What is your optimal strategy? (In other words, when should you take the first roll, and when should you take the option to roll again?) How much money would you expect to win by playing this optimal strategy?
  2. Now suppose that you have the option to roll the die a third time if you are unhappy with the outcome of the second roll. What is your optimal strategy now, and how much money would you expect to win with this strategy?