27  Conditional Expectations

In Chapter 17, we defined the notation \(\text{E}[ Y | X ]\) and used it to calculate \(\text{E}[ Y ]\). The trick was the Law of Total Expectation, which we now prove for continuous random variables.

Theorem 27.1 (Law of Total Expectation) \[ \text{E}[ Y ] = \text{E}[ \text{E}[ Y | X ] ]. \]

First, \(\text{E}[ Y|X ] \overset{\text{def}}{=}g(X)\), where \[ g(x) = \text{E}[ Y|X=x ] = \int_{-\infty}^\infty y f_{Y|X}(y|x)\,dy. \] Note that to calculate \(\text{E}[ Y|X=x ]\), we simply integrate \(y\) times the conditional PDF of \(Y\) given \(X\).

Therefore, \[ \begin{aligned} \text{E}[ \text{E}[ Y|X ] ] &= \text{E}[ g(X) ] & \text{(definition of $\text{E}[ Y|X ]$)} \\ &= \int_{-\infty}^\infty g(x) f_X(x)\,dx & \text{(LotUS)} \\ &= \int_{-\infty}^\infty \left[\int_{-\infty}^\infty y f_{Y|X}(y|x)\,dy \right] f_X(x)\,dx & \text{(formula for $g(x)$)} \\ &= \int_{-\infty}^\infty \int_{-\infty}^\infty y f_{Y|X}(y|x) f_X(x) \,dy \,dx & \text{(bring $f_X(x)$ inside inner integral)} \\ &= \int_{-\infty}^\infty \int_{-\infty}^\infty y f_{X, Y}(x, y) \,dy \,dx & \text{(definition of conditional PDF)} \\ &= \text{E}[ Y ] & \text{(2D LotUS)}. \end{aligned} \]
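The identity is also easy to sanity-check by simulation. The sketch below is an added illustration, not part of the text; it assumes NumPy is available and uses an arbitrary hypothetical model, \(X \sim \text{Exponential}(1)\) with \(Y | X \sim \text{Normal}(X, 1)\), so that \(\text{E}[ Y|X ] = X\) and both sides of the identity equal \(1\).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Hypothetical model (not from the text): X ~ Exponential(1), Y | X ~ Normal(X, 1),
# so E[Y | X] = X and E[Y] = E[E[Y | X]] = E[X] = 1.
x = rng.exponential(scale=1.0, size=n)
y = rng.normal(loc=x, scale=1.0)

print(y.mean())  # estimate of E[Y]; should be close to 1
print(x.mean())  # estimate of E[E[Y | X]]; should also be close to 1
```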

The Law of Total Expectation can make it easy to calculate expected values.

Example 27.1 (Bayes’ billiard balls and expectation) Consider Example 26.2. What is \(\text{E}[ X ]\), the expected number of balls to the left of the first ball (wherever it landed)?

By the Law of Total Expectation (Theorem 27.1), \[ \text{E}[ X ] = \text{E}[ \text{E}[ X | U ] ] = \text{E}[ nU ] = n\text{E}[ U ] = \frac{n}{2}. \] In the second equality, we used the fact that the conditional distribution of \(X\) given \(U\) is \(\text{Binomial}(n, p=U)\), so \(\text{E}[ X|U ] = nU\).

We can check our answer above because we actually determined the PMF of \(X\) in Example 26.2. It was \[ f_X(k) = \frac{1}{n+1}; \qquad k=0, 1, \dots, n. \] Therefore, \[ \text{E}[ X ] = \sum_{k=0}^n k\frac{1}{n+1} = \frac{n(n+1)}{2} \frac{1}{n+1} = \frac{n}{2}.\] But the beauty of the Law of Total Expectation is that we did not have to determine the marginal PMF of \(X\) first.
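A quick Monte Carlo check of this example is sketched below. It is an added illustration (not part of the text), assumes NumPy, and uses a hypothetical choice of \(n = 10\).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10               # hypothetical choice of n, just for illustration
sims = 1_000_000

# As in Example 26.2: U ~ Uniform(0, 1), then X | U ~ Binomial(n, U).
u = rng.uniform(size=sims)
x = rng.binomial(n, u)

print(x.mean())      # should be close to n / 2 = 5
```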

Theorem 27.1 is one property of conditional expectations, but conditional expectations of continuous random variables satisfy many other properties, identical to those in Chapter 17. We restate them here.

Proposition 27.1 (Linearity of Conditional Expectation) \[ \begin{aligned} \text{E}[ Y_1 + Y_2 | X ] = \text{E}[ Y_1 | X ] + \text{E}[ Y_2 | X ] \end{aligned} \]

Proposition 27.2 (Pulling Out What’s Given) \[ \begin{aligned} \text{E}[ g(X) Y | X ] = g(X) \text{E}[ Y | X ] \end{aligned} \]

Proposition 27.3 (Conditional Expectation of an Independent Random Variable) If \(X\) and \(Y\) are independent random variables, then \[ \text{E}[ Y | X ] = \text{E}[ Y ]. \]
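As a small added illustration of how these properties combine (this example is not from the text), suppose \(X\) and \(Y\) are independent. Then \[ \begin{aligned} \text{E}[ X + XY | X ] &= \text{E}[ X | X ] + \text{E}[ XY | X ] & \text{(linearity)} \\ &= X + X \text{E}[ Y | X ] & \text{(pulling out what's given)} \\ &= X + X \text{E}[ Y ] & \text{(independence)}. \end{aligned} \]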

27.1 Law of Total Variance

There is an analogous result for calculating a variance from conditional expectations and conditional variances.

Theorem 27.2 (Law of Total Variance) \[ \text{Var}[ Y ] = \text{E}[ \text{Var}[ Y | X ] ] + \text{Var}[ \text{E}[ Y | X ] ]. \]

The statement of Theorem 27.2 relies on Definition 17.2 for conditional variance, which is the same for discrete and continuous random variables. The proof of Theorem 27.2 is also identical to the proof of Theorem 17.2.
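Like the Law of Total Expectation, Theorem 27.2 can be sanity-checked by simulation. The sketch below is an added illustration (assuming NumPy) that reuses the hypothetical model \(X \sim \text{Exponential}(1)\), \(Y | X \sim \text{Normal}(X, 1)\); there, \(\text{Var}[ Y|X ] = 1\) and \(\text{E}[ Y|X ] = X\), so the right-hand side is \(\text{E}[ 1 ] + \text{Var}[ X ] = 1 + 1 = 2\).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Hypothetical model: X ~ Exponential(1), Y | X ~ Normal(X, 1).
# Var[Y | X] = 1 and E[Y | X] = X, so Var[Y] = E[1] + Var[X] = 2.
x = rng.exponential(scale=1.0, size=n)
y = rng.normal(loc=x, scale=1.0)

print(y.var())  # estimate of Var[Y]; should be close to 2
```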

Example 27.2 (Expectation and variance of \(T\)) Recall Example 26.4, where we defined the random variable \(T\) as follows:

  • \(V \sim \textrm{Exponential}(\lambda=1)\)
  • \(T | \{ V = v \} \sim \textrm{Normal}(\mu= 0, \sigma= v^\alpha)\) for some \(\alpha \neq 0\).

We saw that the PDF of \(T\) could only be calculated in closed form when \(\alpha = -1/2\), in which case \(T\) has a \(t\)-distribution. For other values of \(\alpha\), the distribution is not tractable. Nevertheless, we can determine the expectation and variance of \(T\) using the Laws of Total Expectation and Total Variance.

First, note that

  • \(\text{E}[ T | V ] = 0\) and
  • \(\text{Var}[ T | V ] = V^{2\alpha}\).

Now, by the Law of Total Expectation, \[ \text{E}[ T ] = \text{E}[ \text{E}[ T | V ] ] = \text{E}[ 0 ] = 0 \] for any \(\alpha\).

By the Law of Total Variance, \[ \begin{aligned} \text{Var}[ T ] &= \text{E}[ \text{Var}[ T | V ] ] + \text{Var}[ \text{E}[ T | V ] ] \\ &= \text{E}[ V^{2\alpha} ] + \text{Var}[ 0 ] \\ &= \text{E}[ V^{2\alpha} ]. \end{aligned} \]

Since \(V\) is standard exponential, we can easily determine \(\text{E}[ V^{2\alpha} ]\) for different \(\alpha\) (a short calculation follows the list below). Therefore,

  • when \(\alpha = 1/2\), \(\text{Var}[ T ] = \text{E}[ V ] = 1\),
  • when \(\alpha = 1\), \(\text{Var}[ T ] = \text{E}[ V^2 ] = 2\), and
  • when \(\alpha = -1/2\), \(\text{Var}[ T ] = \text{E}[ V^{-1} ] = \infty\).
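These values follow from the gamma integral for the moments of a standard exponential random variable: \[ \text{E}[ V^{k} ] = \int_0^\infty v^{k} e^{-v}\,dv = \Gamma(k+1), \qquad k > -1, \] so \(\text{E}[ V ] = \Gamma(2) = 1\) and \(\text{E}[ V^2 ] = \Gamma(3) = 2\), while for \(\alpha = -1/2\) the exponent is \(k = -1\) and the integral diverges near \(0\).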

Remember that \(\alpha = -1/2\) corresponds to the \(t\)-distribution. We conclude that the \(t\)-distribution (with 2 degrees of freedom) has infinite variance!
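These conclusions are easy to check numerically. The sketch below is an added illustration (assuming NumPy): it simulates \(T\) for \(\alpha = 1/2\) and \(\alpha = 1\), where the sample variance should be near \(1\) and \(2\); for \(\alpha = -1/2\), the sample variance never stabilizes, consistent with the infinite variance of the \(t\)-distribution with 2 degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# V ~ Exponential(1), then T | V ~ Normal(0, sd = V^alpha).
v = rng.exponential(scale=1.0, size=n)

for alpha in [0.5, 1.0]:
    t = rng.normal(loc=0.0, scale=v**alpha)
    print(alpha, t.var())   # roughly 1 for alpha = 1/2 and 2 for alpha = 1

# For alpha = -1/2, the sample variance is dominated by a few extreme draws
# and does not converge as n grows, consistent with Var[T] being infinite.
```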