14  Expectations Involving Multiple Random Variables

In Chapter 13, we described the distribution of multiple random variables completely using the joint PMF. In this chapter, we will summarize multiple random variables using expectations of the form \[ \text{E}[ g(X, Y) ]. \]

14.1 2D LotUS

The general tool for calculating expectations of the form \(\text{E}[ g(X, Y) ]\) is 2D LotUS. It is the natural generalization of (1D) LotUS, which we introduced in Chapter 11 for calculating expectations of functions of a single variable, \(\text{E}[ g(X) ]\).

Theorem 14.1 (2D LotUS) Let \(X\) and \(Y\) be random variables with joint PMF \(f(x,y)\). Then, for a function \(g: \mathbb{R}^2 \to \mathbb{R}\), \[ \text{E}[ g(X,Y) ] = \sum_x \sum_y g(x,y) f(x,y). \tag{14.1}\]

Similar to Theorem 11.1, we calculate the expectation of \(g(X, Y)\) by weighting the possible values of \(g(x, y)\) by the corresponding probabilities, which are given by the joint PMF (because there are two random variables).
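When the joint PMF is available as a table, Equation 14.1 translates directly into a nested sum. Below is a minimal Python sketch of this idea; the function name `lotus_2d` and the dictionary representation of the joint PMF are our own choices for illustration, not part of any particular library.

```python
# A minimal sketch of 2D LotUS (Equation 14.1): store the joint PMF as a
# dictionary mapping (x, y) pairs to probabilities, then weight each value
# of g(x, y) by the corresponding probability f(x, y).
def lotus_2d(g, joint_pmf):
    """Compute E[g(X, Y)] = sum over (x, y) of g(x, y) * f(x, y)."""
    return sum(g(x, y) * p for (x, y), p in joint_pmf.items())
```

Pairs with zero probability contribute nothing to the sum, so they can simply be omitted from the dictionary.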

Example 14.1 (Win difference in roulette) In Example 13.1, we determined the joint PMF of \(X\) (Xavier’s wins) and \(Y\) (Yvette’s wins) to be

| \(y \backslash x\) | \(0\) | \(1\) | \(2\) | \(3\) |
|---|---|---|---|---|
| \(0\) | \(.0404\) | \(0\) | \(0\) | \(0\) |
| \(1\) | \(.0727\) | \(.1090\) | \(0\) | \(0\) |
| \(2\) | \(.0327\) | \(.1963\) | \(.0981\) | \(0\) |
| \(3\) | \(0\) | \(.0883\) | \(.1766\) | \(.0294\) |
| \(4\) | \(0\) | \(0\) | \(.0795\) | \(.0530\) |
| \(5\) | \(0\) | \(0\) | \(0\) | \(.0238\) |

What is the expected win differential, i.e., \(\text{E}[ Y-X ]\)?

By 2D LotUS, \[ \text{E}[ Y - X ] = \sum_x \sum_y (y-x) f(x,y). \] Omitting the terms with \(f(x,y) = 0\), the above sum becomes \[\begin{align*} \text{E}[ Y - X ] &= (0-0) .0404 + (1-0) .0727 + (2-0) .0327 \\ &\quad + (1-1) .1090 + (2-1) .1963 + (3-1) .0883 \\ &\quad + (2-2) .0981 + (3-2) .1766 + (4-2) .0795 \\ &\quad + (3-3) .0294 + (4-3) .0530 + (5-3) .0238 \\ &= .9474. \end{align*}\]
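As a check, here is the same calculation in Python, using only the nonzero entries of the joint PMF above. This is a sketch; the dictionary encoding of the table is our own.

```python
# Nonzero entries of the joint PMF from Example 14.1, keyed by (x, y).
joint_pmf = {
    (0, 0): .0404, (0, 1): .0727, (0, 2): .0327,
    (1, 1): .1090, (1, 2): .1963, (1, 3): .0883,
    (2, 2): .0981, (2, 3): .1766, (2, 4): .0795,
    (3, 3): .0294, (3, 4): .0530, (3, 5): .0238,
}

# 2D LotUS with g(x, y) = y - x.
ev_diff = sum((y - x) * p for (x, y), p in joint_pmf.items())
print(ev_diff)  # about 0.947; matches 2 * 18/38 up to rounding in the table
```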

Of course, this result could have been obtained more simply. \(Y - X\) is the number of wins in the last two spins (the spins Yvette plays by herself), so \[Y - X \sim \textrm{Binomial}(n= 2, p= \frac{18}{38}),\] and we have a formula for the expectation of a binomial random variable: \[ \text{E}[ Y-X ] = np = 2 \cdot \frac{18}{38} \approx .9474. \]
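A simulation provides another sanity check. The sketch below assumes, as in Example 13.1, that Yvette's five spins include Xavier's three, so that \(Y - X\) is the number of wins in the last two spins; the seed, variable names, and number of repetitions are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)
p = 18 / 38
n_sims = 100_000

# Each row is one repetition of 5 spins; Xavier's wins come from the first 3
# spins, Yvette's from all 5 (the shared spins make X and Y dependent).
spins = rng.random((n_sims, 5)) < p
X = spins[:, :3].sum(axis=1)
Y = spins.sum(axis=1)

print((Y - X).mean())  # close to 2 * 18/38, i.e., about 0.9474
```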

Because Equation 14.1 is cumbersome to evaluate, 2D LotUS is usually a tool of last resort. The remainder of this chapter is devoted to shortcuts for specific functions \(g(x, y)\) that allow us to avoid 2D LotUS. But when in doubt, remember that 2D LotUS is always an option.

14.2 Linearity of Expectation

When \(g(x, y)\) is a linear function, there is a remarkable simplification.

Theorem 14.2 (Linearity of Expectation) Let \(X\) and \(Y\) be random variables. Then, \[ \text{E}[ X + Y ] = \text{E}[ X ] + \text{E}[ Y ]. \]

Proof

Using 2D LotUS with \(g(x,y) = x + y\), we see that \[\begin{align*} \text{E}[ X + Y ] &= \sum_x \sum_y (x+y) f(x,y) \\ &= \sum_x \sum_y x f(x,y) + \sum_x \sum_y y f(x,y) \\ &= \sum_x x \sum_y f(x,y) + \sum_y y \sum_x f(x,y) \\ &= \sum_x x f_X(x) + \sum_y y f_Y(y) \\ &= \text{E}[ X ] + \text{E}[ Y ]. \end{align*}\]

This result is more remarkable than it appears. It says that \(\text{E}[ X + Y ]\), which depends in principle on the joint distribution of \(X\) and \(Y\), can be calculated using only the distribution of \(X\) and the distribution of \(Y\) individually. That is, no matter how \(X\) and \(Y\) are related to each other, \(\text{E}[ X + Y ]\) is the same value.
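A quick simulation illustrates this point. The sketch below compares two scenarios with the same \(\text{Binomial}(3, \frac{18}{38})\) marginals but opposite dependence structures; the parameter values and seed are just for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 18 / 38
n_sims = 100_000

X = rng.binomial(3, p, size=n_sims)
Y_independent = rng.binomial(3, p, size=n_sims)  # Y independent of X
Y_identical = X                                  # Y perfectly dependent on X

# Both averages are close to E[X] + E[Y] = 6 * 18/38, about 2.84,
# regardless of how X and Y are related.
print((X + Y_independent).mean())
print((X + Y_identical).mean())
```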

The next example shows how linearity of expectation makes quick work of expected value calculations that otherwise would be cumbersome.

Example 14.2 (Revisiting Example 14.1) In Example 14.1, \(g(x, y) = y - x\) is a linear function, so \[ \text{E}[ Y - X ] = \text{E}[ Y ] + \text{E}[ -X ] = \text{E}[ Y ] - \text{E}[ X ],\] where the last step uses the fact that a constant factor (here, \(-1\)) can be pulled out of an expectation.

We know that \(X \sim \textrm{Binomial}(n= 3, p= \frac{18}{38})\) and \(Y \sim \textrm{Binomial}(n= 5, p= \frac{18}{38})\), so \[ \text{E}[ Y-X ] = \text{E}[ Y ] - \text{E}[ X ] = 5 \cdot \frac{18}{38} - 3 \cdot \frac{18}{38} = 2 \cdot \frac{18}{38}. \]

One way to use linearity of expectation is to break down a complicated random variable into a sum of simpler random variables. The next example applies this strategy to derive the expectation of the binomial distribution. We saw in Example 9.3 that this expectation can be messy to calculate directly.

Example 14.3 (Binomial expectation using linearity) Let \(X \sim \text{Binomial}(n,p)\). That is, \(X\) is the number of heads in \(n\) tosses of a coin with probability \(p\) of heads. Then, \[ X = I_1 + I_2 + \dots + I_n, \] where \(I_k\) is the indicator of heads on the \(k\)th toss. Each \(I_k\) is \(\text{Bernoulli}(p)\), so \(\text{E}[ I_k ] = p\).

By linearity of expectation: \[ \text{E}[ X ] = \text{E}[ I_1 + \cdots + I_n ] = \text{E}[ I_1 ] + \cdots + \text{E}[ I_n ] = p + \cdots + p = np. \]

Not only does linearity simplify the proof, but it also provides insight into why the formula is true.
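The indicator decomposition is also easy to check by simulation. The sketch below uses the arbitrary values \(n = 10\) and \(p = 0.3\).

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 10, 0.3
n_sims = 100_000

# One column per toss: I_k = 1 if the kth toss is heads.
indicators = rng.random((n_sims, n)) < p
X = indicators.sum(axis=1)  # X = I_1 + ... + I_n

print(X.mean())  # close to n * p = 3.0
```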

The same strategy works for deriving the expectation of the hypergeometric distribution. In fact, the hypergeometric distribution has the same expectation as the binomial, precisely because linearity of expectation does not depend on the relationship between the random variables.

Example 14.4 (Hypergeometric expectation using linearity) Let \(Y \sim \text{Hypergeometric}(M, N, n)\). That is, \(Y\) is the number of white balls in \(n\) draws without replacement from a lottery drum containing \(M\) white balls and \(N\) black balls.

We can break down \(Y\) into a sum of indicator random variables, \[ Y = I_1 + I_2 + \dots + I_n, \] where \(I_k\) is the indicator that the \(k\)th draw is a white ball. Each draw is equally likely to be any of the \(M + N\) balls in the drum, so \(I_k \sim \text{Bernoulli}(p=\frac{M}{M + N})\) and \(\text{E}[ I_k ] = \frac{M}{M + N}\).

By linearity of expectation: \[ \text{E}[ Y ] = \text{E}[ I_1 + \cdots + I_n ] = \text{E}[ I_1 ] + \cdots + \text{E}[ I_n ] = \frac{M}{M + N} + \cdots + \frac{M}{M + N} = n\frac{M}{M + N}. \] Note that the indicator random variables are not independent because the draws are made without replacement. If \(I_1 = 1\), then \(I_2\) is less likely to equal \(1\) because there is one fewer white ball in the drum.

If the draws had been made with replacement, then the indicator random variables would be independent, and \(Y\) would be \(\text{Binomial}(n, p=\frac{M}{M + N})\). But \(\text{E}[ Y ]\) would be the same, \(n\frac{M}{M + N}\). This illustrates the fact that linearity of expectation does not depend on the relationship between the random variables \(I_k\).
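The same check works for the hypergeometric case. The sketch below uses the illustrative values \(M = 7\) white balls, \(N = 13\) black balls, and \(n = 5\) draws without replacement.

```python
import numpy as np

rng = np.random.default_rng(2)
M, N, n = 7, 13, 5
n_sims = 20_000

drum = np.array([1] * M + [0] * N)  # 1 = white ball, 0 = black ball

# For each repetition, shuffle the drum and take the first n balls
# (a draw without replacement), then count the white ones.
Y = np.array([rng.permutation(drum)[:n].sum() for _ in range(n_sims)])

print(Y.mean())  # close to n * M / (M + N) = 5 * 7/20 = 1.75
```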

14.3 Expectation of Products

When \(g(x, y) = xy\), evaluating \(\text{E}[ g(X, Y) ] = \text{E}[ XY ]\) requires 2D LotUS in general. However, when \(X\) and \(Y\) are independent, we can break up the expectation.

Theorem 14.3 (Expectation of a product of independent random variables) If \(X\) and \(Y\) are independent random variables, then \[ \text{E}[ XY ] = \text{E}[ X ] \text{E}[ Y ]. \] Moreover, for functions \(g\) and \(h\), \[ \text{E}[ g(X) h(Y) ] = \text{E}[ g(X) ] \text{E}[ h(Y) ]. \]

Proof

Using 2D LotUS, we see that \[\begin{align*} \text{E}[ XY ] &= \sum_x \sum_y xy f(x,y) \\ &= \sum_x \sum_y xy f_X(x) f_Y(y) \qquad (\text{$X$ and $Y$ are independent}) \\ &= \left( \sum_x x f_X(x) \right) \left( \sum_y y f_Y(y) \right) \\ &= \text{E}[ X ] \text{E}[ Y ]. \end{align*}\]

The proof of the second part is similar.
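As a numerical illustration of Theorem 14.3, the sketch below simulates two independent binomials (chosen to match the \(X\) and \(Z\) that appear in Example 14.5 below) and compares the simulated \(\text{E}[ XZ ]\) with \(\text{E}[ X ]\text{E}[ Z ]\). The seed and simulation size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
p = 18 / 38
n_sims = 100_000

# X and Z are independent, so E[XZ] should match E[X] E[Z].
X = rng.binomial(3, p, size=n_sims)
Z = rng.binomial(2, p, size=n_sims)

print((X * Z).mean())       # simulated E[XZ]
print(X.mean() * Z.mean())  # E[X] E[Z]; the two agree up to simulation error
```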

The independence condition is absolutely necessary, as the next example illustrates.

Example 14.5 (EV of a product) Returning to Xavier and Yvette, recall that the joint PMF of \(X\) and \(Y\) is

| \(y \backslash x\) | \(0\) | \(1\) | \(2\) | \(3\) |
|---|---|---|---|---|
| \(0\) | \(.0404\) | \(0\) | \(0\) | \(0\) |
| \(1\) | \(.0727\) | \(.1090\) | \(0\) | \(0\) |
| \(2\) | \(.0327\) | \(.1963\) | \(.0981\) | \(0\) |
| \(3\) | \(0\) | \(.0883\) | \(.1766\) | \(.0294\) |
| \(4\) | \(0\) | \(0\) | \(.0795\) | \(.0530\) |
| \(5\) | \(0\) | \(0\) | \(0\) | \(.0238\) |

First, we will compute \(\text{E}[ XY ]\) using 2D LotUS. \[\begin{align*} \text{E}[ XY ] &= \sum_x \sum_y xy f(x,y) \\ &= (0 \cdot 0) .0404 + (0 \cdot 1) .0727 + (0 \cdot 2) .0327 \\ &\quad + (1 \cdot 1) .1090 + (1 \cdot 2) .1963 + (1 \cdot 3) .0883 \\ &\quad + (2 \cdot 2) .0981 + (2 \cdot 3) .1766 + (2 \cdot 4) .0795 \\ &\quad + (3 \cdot 3) .0294 + (3 \cdot 4) .0530 + (3 \cdot 5) .0238 \\ &\approx 4.11. \end{align*}\]

This is not the same as \(\displaystyle \text{E}[ X ]\text{E}[ Y ] = \left( 3 \cdot \frac{18}{38} \right) \left( 5 \cdot \frac{18}{38} \right) \approx 3.37,\) which is to be expected: \(X\) and \(Y\) are not independent, so we cannot apply Theorem 14.3 directly.
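The contrast is easy to reproduce in code. The sketch below computes \(\text{E}[ XY ]\) by 2D LotUS from the nonzero entries of the joint PMF and compares it with \(\text{E}[ X ]\text{E}[ Y ]\); the dictionary encoding of the table is our own.

```python
# Nonzero entries of the joint PMF of (X, Y), keyed by (x, y).
joint_pmf = {
    (0, 0): .0404, (0, 1): .0727, (0, 2): .0327,
    (1, 1): .1090, (1, 2): .1963, (1, 3): .0883,
    (2, 2): .0981, (2, 3): .1766, (2, 4): .0795,
    (3, 3): .0294, (3, 4): .0530, (3, 5): .0238,
}

# 2D LotUS with g(x, y) = x * y.
ev_xy = sum(x * y * p for (x, y), p in joint_pmf.items())
print(ev_xy)              # approximately 4.11

p = 18 / 38
print((3 * p) * (5 * p))  # E[X] E[Y] is approximately 3.37, not the same
```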

However, there is a way to apply Theorem 14.3 to this problem and avoid 2D LotUS; it involves introducing an additional random variable.

Let \(Z\) be the number of wins in the two spins Yvette plays by herself, so \(Y = X + Z\). Now, \(X\) and \(Z\) are independent, and we can apply Theorem 14.3 to \(\text{E}[ XZ ]\): \[ \begin{align*} \text{E}[ XY ] &= \text{E}[ X(X+Z) ] \\ &= \text{E}[ X^2 ] + \text{E}[ XZ ] \\ &= \text{E}[ X^2 ] + \text{E}[ X ]\text{E}[ Z ]. \end{align*} \]

These expectations only involve one random variable. Moreover, we know that \(X \sim \textrm{Binomial}(n= 3, p= \frac{18}{38})\) and \(Z \sim \textrm{Binomial}(n= 2, p= \frac{18}{38})\), so:

  • \(\displaystyle \text{E}[ X ] = np = 3\frac{18}{38}\)
  • \(\displaystyle \text{E}[ Z ] = np = 2\frac{18}{38}\)
  • \(\text{E}[ X^2 ] = \text{Var}[ X ] + \text{E}[ X ]^2\) (by Proposition 11.2), so \(\displaystyle \text{E}[ X^2 ] = np(1-p) + (np)^2 = 3\frac{18}{38}\left(3 \frac{18}{38} + \frac{20}{38}\right)\).

Plugging these values into the above formula, we obtain \[ \begin{align*} \text{E}[ XY ] &= \text{E}[ X^2 ] + \text{E}[ X ]\text{E}[ Z ] \\ &= 3 \frac{18}{38} \left( 3 \frac{18}{38} + \frac{20}{38} \right) + \left( 3 \frac{18}{38} \right) \left( 2 \frac{18}{38} \right) \\ &\approx 4.11, \end{align*}\] which matches the answer that we obtained using 2D LotUS. However, this calculation only required evaluating expectations of one random variable.
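The decomposition is also quick to verify numerically; here is a short sketch using the binomial mean and variance formulas.

```python
p = 18 / 38

# E[X^2] = Var[X] + E[X]^2 for X ~ Binomial(3, p)
ev_x2 = 3 * p * (1 - p) + (3 * p) ** 2

# E[X] E[Z] for independent X ~ Binomial(3, p) and Z ~ Binomial(2, p)
ev_x_ev_z = (3 * p) * (2 * p)

print(ev_x2 + ev_x_ev_z)  # approximately 4.11, matching the 2D LotUS answer
```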

Why would we be interested in \(\text{E}[ XY ]\)? It turns out to be useful for summarizing the relationship between \(X\) and \(Y\). We take up this issue in Chapter 15.