26  From Conditionals to Marginals

In Chapter 16, we considered situations where the distribution of two discrete random variables \(X\) and \(Y\) was specified by

  1. first specifying the PMF of \(X\),
  2. then specifying the conditional PMF of \(Y\) given \(X\).

We saw that the (marginal) PMF of \(Y\) could be calculated using the Law of Total Probability (Theorem 7.1) \[ f_Y(y) = \sum_x f(x, y) = \sum_x f_X(x) f_{Y|X}(y|x). \tag{26.1}\]

We now generalize Equation 26.1 to the situation where \(X\) or \(Y\) may be continuous.

26.1 Laws of Total Probability

As a warm-up, we revisit Example 23.7, where we had two identical light bulbs whose lifetimes (in years) are i.i.d. \(\textrm{Exponential}(\lambda=0.4)\) random variables \(X\) and \(Y\). We calculated \(P(X < Y)\) in two ways, using double integrals and using symmetry. Now, we offer a third way: using the Law of Total Probability.

Example 26.1 (Lifetimes of light bulbs (Law of Total Probability version)) We want to calculate \(P(B) \overset{\text{def}}{=}P(X < Y)\). This event involves two random variables, so calculating its probability directly would require a double integral.

However, if we condition on the event \(\{ X = x \}\), we obtain a probability that only involves one random variable: \[ P(B | X = x) = P(X < Y | X = x) = P(x < Y). \] In the last step, we substituted the “known” value of \(X\), and since \(X\) and \(Y\) are independent, the distribution of \(Y\) does not change. Now, plugging in the CDF of the exponential distribution, we see that \[ P(B | X = x) = 1 - F_Y(x) = e^{-0.4 x}; \qquad x > 0. \]

Finally, we need to aggregate these conditional probabilities into the overall probability of \(B\). To do this, we use the continuous analog of the Law of Total Probability. \[ P(B) = \int_{-\infty}^\infty P(B | X = x) f_X(x)\,dx. \tag{26.2}\] Equation 26.2 is just like its discrete cousin, except with the PMF replaced by the PDF and the sum replaced by an integral.

Substituting the formulas for \(P(B | X = x)\) and \(f_X(x)\) into Equation 26.2, we obtain \[ P(B) = \int_0^\infty e^{-0.4x} \cdot 0.4 e^{-0.4 x}\,dx = \frac{1}{2}, \] which matches the answer that we obtained using a double integral and symmetry in Example 23.7.
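We can also check this answer numerically. The following is a minimal sketch (in Python with NumPy and SciPy, which are our choices and not part of the text) that evaluates the integral in Equation 26.2 and compares it to a Monte Carlo estimate of \(P(X < Y)\).

```python
import numpy as np
from scipy import integrate

rate = 0.4
rng = np.random.default_rng(0)

# Numerically evaluate Equation 26.2:
# P(B) = integral over x > 0 of P(B | X = x) * f_X(x) = e^{-0.4x} * 0.4 e^{-0.4x}
integral, _ = integrate.quad(
    lambda x: np.exp(-rate * x) * rate * np.exp(-rate * x), 0, np.inf
)

# Monte Carlo check: simulate many (X, Y) pairs and estimate P(X < Y).
x = rng.exponential(scale=1 / rate, size=1_000_000)
y = rng.exponential(scale=1 / rate, size=1_000_000)

print(integral)        # 0.5, up to numerical error
print(np.mean(x < y))  # close to 0.5
```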

Example 26.1 reinforces a point we made after Example 23.7. Double integrals are rarely the most intuitive way to solve real problems in probability and statistics. If you think carefully about the structure of a problem, you can usually avoid double integrals (or calculus altogether)!

Proposition 26.1 (Laws of Total Probability) Let \(X\) and \(Y\) be random variables.

  1. If \(X\) is a discrete random variable with PMF \(f_X(x)\), then the PMF or PDF of \(Y\) (depending on whether \(Y\) is discrete or continuous, respectively) is given by \[ f_Y(y) = \sum_x f_X(x) f_{Y|X}(y|x). \tag{26.3}\]
  2. If \(X\) is a continuous random variable with PDF \(f_X(x)\), then the PMF or PDF of \(Y\) (depending on whether \(Y\) is discrete or continuous, respectively) is described by \[ f_Y(y) = \int_{-\infty}^\infty f_X(x) f_{Y|X}(y|x)\,dx. \tag{26.4}\]

Note that \(f_{Y|X}(y|x)\) denotes a conditional PMF or PDF, depending on whether \(Y\) is discrete or continuous, respectively.

We have already seen several applications of Proposition 26.1 in Chapter 16, where \(X\) and \(Y\) were both discrete. So we will focus on the other three cases in the examples below.
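To see Equation 26.3 in action when \(X\) is discrete but \(Y\) is continuous, here is a small sketch (in Python; the model, with rates we chose purely for illustration, is hypothetical and not from the text). It simulates the two-stage experiment and compares the histogram of \(Y\) with the weighted average of conditional PDFs given by Equation 26.3.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Hypothetical model: X takes the values 1 and 2 with probabilities 0.3 and 0.7,
# and Y | {X = x} ~ Exponential(lambda = x).
p_x = {1: 0.3, 2: 0.7}

# Two-stage simulation: draw X, then draw Y from its conditional distribution.
xs = rng.choice(list(p_x), p=list(p_x.values()), size=100_000)
ys = rng.exponential(scale=1 / xs)

# Marginal PDF of Y from Equation 26.3: f_Y(y) = sum_x f_X(x) f_{Y|X}(y|x).
grid = np.linspace(0, 5, 200)
f_y = sum(p * rate * np.exp(-rate * grid) for rate, p in p_x.items())

plt.hist(ys, bins=100, range=(0, 5), density=True, alpha=0.5)
plt.plot(grid, f_y)
plt.xlabel("y")
plt.show()
```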

First, we use the Law of Total Probability to derive a general formula for the PDF of the sum of two independent continuous random variables. This is both an example of the Law of Total Probability and a useful result in its own right.

Proposition 26.2 (PDF of a sum) Let \(X\) and \(Y\) be independent continuous random variables with PDFs \(f_X\) and \(f_Y\), respectively. What is the PDF of their sum \(T = X + Y\)?

We know the distribution of \(X\). We can determine the conditional distribution of \(T\) given \(X\) as follows:

  • The distribution of \(T | \{ X = x\}\) is the distribution of \(X + Y | \{ X = x\}\) by definition.
  • The distribution of \(X + Y | \{ X = x \}\) is the same as the distribution of \(x + Y\). To see this, replace \(X\) by its “known” value \(x\) and then use independence of \(X\) and \(Y\).

The above argument shows that the conditional distribution of \(T | \{ X = x \}\) is the distribution of \(x + Y\), which is a location transform of \(Y\). That is, the conditional PDF of \(T\) given \(X\) is \[ f_{T | X}(t | x) = f_Y(t - x). \]

By Proposition 26.1 (part 2), the (marginal) PDF of \(T\) is \[ f_T(t) = \int_{-\infty}^\infty f_X(x) f_{T|X}(t | x)\,dx = \int_{-\infty}^\infty f_X(x) f_Y(t - x)\,dx. \tag{26.5}\]

This formula for the PDF of \(T = X + Y\) is called the convolution formula.
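As a numerical illustration of Equation 26.5 (a sketch using SciPy, which is our choice of tool and not from the text), we can evaluate the convolution integral for two independent \(\textrm{Exponential}(\lambda=0.4)\) lifetimes and compare it with a simulation-based density estimate for the sum.

```python
import numpy as np
from scipy import integrate

rate = 0.4

def f_exp(x, lam=rate):
    """Exponential(lambda) PDF."""
    return np.where(x >= 0, lam * np.exp(-lam * x), 0.0)

def f_sum(t):
    """PDF of T = X + Y via the convolution formula (Equation 26.5)."""
    integrand = lambda x: f_exp(x) * f_exp(t - x)
    value, _ = integrate.quad(integrand, 0, t)  # the integrand is zero outside [0, t]
    return value

# Compare with a crude Monte Carlo density estimate near t = 3.
rng = np.random.default_rng(0)
sums = (rng.exponential(scale=1 / rate, size=1_000_000)
        + rng.exponential(scale=1 / rate, size=1_000_000))
eps = 0.05
print(f_sum(3.0))                                     # convolution formula
print(np.mean(np.abs(sums - 3.0) < eps) / (2 * eps))  # simulated density estimate
```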

Example 26.2 (Bayes’s billiards balls) In the same paper that introduced Bayes’ rule, Thomas Bayes (1763) considered a binomial random variable \(X\) where the probability of “heads” \(p\) is unknown. We can model this probability as a random variable \(U\), which is equally likely to be any value between \(0\) and \(1\).

  • \(U \sim \textrm{Uniform}(\alpha= 0, \beta= 1)\)
  • \(X | \{ U = u \} \sim \text{Binomial}(n, p=u)\)

Since \(U\) is continuous and \(X\) is discrete, we are in case 2 of Proposition 26.1, so we use Equation 26.4 to derive the PMF of \(X\):

\[ \begin{aligned} f_X(k) = P(X = k) &= \int_{-\infty}^\infty f_U(u) f_{X|U}(k|u)\,du \\ &= \int_0^1 \binom{n}{k} u^k (1 - u)^{n-k}\,du & k=0, 1, \dots, n \end{aligned} \tag{26.6}\]

The integrand in Equation 26.6 is a polynomial in \(u\), so for any particular choice of \(n\) and \(k\), we can expand \((1 - u)^{n-k}\) and evaluate the integral. However, it is not obvious how to come up with a general formula.
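For concreteness, here is a sketch (using SciPy, our choice of tool) that evaluates Equation 26.6 numerically for \(n = 6\) and every \(k\); the values hint at a pattern.

```python
from scipy import integrate
from scipy.special import comb

n = 6

# Evaluate Equation 26.6 numerically for each k = 0, 1, ..., n.
for k in range(n + 1):
    integrand = lambda u, k=k: comb(n, k) * u**k * (1 - u) ** (n - k)
    value, _ = integrate.quad(integrand, 0, 1)
    print(k, round(value, 6))  # every value is approximately 0.142857 = 1/7
```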

Bayes realized that the general formula must be \(f_X(k) = \frac{1}{n+1}\). That is, all values \(k=0, 1, \dots, n\) are equally likely. He argued this by imagining \(n+1\) balls being rolled across a table. (Later writers interpreted these to be billiards balls on a pool table.)

Suppose that each of the \(n+1\) balls is equally likely to land anywhere on the opposite side, independently of the other balls. If the opposite side has length \(1\), then the positions of the balls are i.i.d. \(\textrm{Uniform}(\alpha= 0, \beta= 1)\) random variables. To map this situation onto the model above,

  • let \(U\) be the position of the first ball, and
  • let \(X\) be the number of the remaining \(n\) balls that lie to the left of the first ball. Conditional on the position of the first ball \(\{ U = u \}\), each of the remaining balls has probability \(u\) of landing to its left, and the balls are independent by assumption. Therefore, \(X | \{ U = u\}\) is binomial.

This situation is illustrated in Figure 26.1 for \(n=6\).

Figure 26.1: Illustration of Bayes’s billiards balls, with the first ball highlighted in red.

What is the marginal distribution of \(X\)? Since the \(n+1\) positions are i.i.d., the first ball is equally likely to hold any rank among the \(n+1\) balls. If it is the leftmost ball, then \(X = 0\); if it is the rightmost ball, then \(X = n\); each rank in between corresponds to exactly one of the values in between. Therefore, \[ f_X(k) = \frac{1}{n+1}; \qquad k=0, 1, \dots, n. \]
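This symmetry argument is easy to check by simulation. Below is a minimal sketch (NumPy, our choice) that rolls \(n + 1 = 7\) uniform positions many times, counts how many of the last \(n = 6\) land to the left of the first, and tabulates the distribution of \(X\).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# Roll n + 1 balls uniformly on (0, 1), many times.
positions = rng.uniform(size=(100_000, n + 1))

# X = number of the remaining n balls to the left of the first ball.
x = np.sum(positions[:, 1:] < positions[:, [0]], axis=1)

# Each value 0, 1, ..., n should occur with probability about 1/(n + 1) = 1/7.
print(np.bincount(x, minlength=n + 1) / len(x))
```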

26.2 Bayes’ Rule

Proposition 26.3 (Bayes’ rule for random variables) Let \(X\) and \(Y\) be random variables. Then,

\[ f_{X | Y}(x|y) = \frac{f_X(x) f_{Y|X}(y|x)}{f_Y(y)}, \tag{26.7}\]

where \(f_X\), \(f_Y\), \(f_{X|Y}\) and \(f_{Y|X}\) are PMFs or PDFs, depending on whether \(X\) and \(Y\) are discrete or continuous.

We will prove this for the case where \(X\) and \(Y\) are both continuous. The case where \(X\) and \(Y\) are both discrete is already covered by Theorem 7.2. The case where \(X\) is discrete and \(Y\) is continuous (or vice versa) is beyond the scope of this book.

We can expand the joint PDF in two ways: \[ f_{X, Y}(x, y) = f_Y(y) f_{X|Y}(x|y) = f_X(x) f_{Y|X}(y|x). \] Dividing both sides by \(f_Y(y)\), we obtain Equation 26.7.

Example 26.3 (Applying Bayes’ rule to Bayes’s billiards balls) Continuing with Example 26.2, suppose we roll \(n+1 = 19\) balls, and \(X = 12\) of the last \(n = 18\) balls end up to the left of the first ball. In light of this information, what can we say about \(U\), the position of the first ball?

Before we had this information, the position of the first ball was equally likely to be anywhere between \(0\) and \(1\). Now that we know that 2/3 of the 18 remaining balls ended up to the left of the first ball, it seems that the first ball is more likely to be closer to \(1\) than to \(0\).

To make this precise, we determine the conditional PDF of \(U\) given \(X\). To do this, we use Proposition 26.3. \[ \begin{align} f_{U|X}(u|x) &= \frac{f_U(u) f_{X|U}(x|u)}{f_X(x)} \\ &= \frac{1 \cdot \binom{n}{x} u^x(1-u)^{n-x}}{\frac{1}{n+1}} \\ &= (n+1) \binom{n}{x} u^x (1 - u)^{n-x} \end{align} \] for \(0 < u < 1\) and \(x = 0, 1, \dots, n\).

Substituting in \(n=18\) and \(x=12\), the conditional distribution of \(U\) is \[ f_{U|X}(u|12) = 19 \binom{18}{12} u^{12} (1 - u)^6; \qquad 0 < u < 1. \]

This PDF is graphed below. As expected, most of the probability is to the right of center.

Figure 26.2: Conditional PDF of \(U\) given \(X = 12\)
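A plot like Figure 26.2 can be produced with a few lines of code; here is one possible sketch (Matplotlib and SciPy are our choices, since the text does not specify a tool).

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.special import comb

n, x = 18, 12
u = np.linspace(0, 1, 500)

# Conditional PDF of U given X = 12, from Bayes' rule above.
f_u_given_x = (n + 1) * comb(n, x) * u**x * (1 - u) ** (n - x)

plt.plot(u, f_u_given_x)
plt.xlabel("u")
plt.ylabel("conditional PDF of U given X = 12")
plt.show()
```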

We can summarize this distribution by \(\text{E}[ U|X=12 ]\), the position where we expect the first ball to be, given this information.

\[ \begin{align} \text{E}[ U|X=12 ] &= \int_{-\infty}^\infty u f_{U|X}(u|12)\,du \\ &= \int_0^1 u\cdot 19 \binom{18}{12} u^{12} (1 - u)^6\,du \\ &= 19 \binom{18}{12} \int_0^1 u^{13} (1 - u)^6\,du \end{align} \] To evaluate this integral, we can use Equation 26.6. By substituting \(n=19\) and \(k=13\), we see that \[ \int_0^1 \binom{19}{13} u^{13} (1 - u)^6\,du = \frac{1}{20}. \]

Therefore: \[ \begin{align} \text{E}[ U|X=12 ] &= \frac{19 \binom{18}{12}}{\binom{19}{13}} \int_0^1 \binom{19}{13} u^{13} (1 - u)^6\,du \\ &= \frac{19 \binom{18}{12}}{\binom{19}{13}} \frac{1}{20} \\ &= \frac{13}{20}. \end{align} \]
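As a quick check of this arithmetic, we can evaluate the conditional expectation numerically (a sketch using SciPy, our choice of tool).

```python
from scipy import integrate
from scipy.special import comb

n, x = 18, 12

# E[U | X = 12] = integral over (0, 1) of u * f_{U|X}(u | 12)
integrand = lambda u: u * (n + 1) * comb(n, x) * u**x * (1 - u) ** (n - x)
value, _ = integrate.quad(integrand, 0, 1)
print(value)  # 0.65 = 13/20
```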

On average, we expect the first ball to be at \(13/20 = .65\), which is close to but not equal to \(12/18 \approx .667\), the fraction of the remaining balls that ended up to the left of the first ball.

26.3 Further Examples

Example 26.4 (The \(t\)-distribution) Consider a random variable \(T\) defined as follows:

  • \(V \sim \textrm{Exponential}(\lambda=1)\)
  • \(T | \{ V = v \} \sim \textrm{Normal}(\mu= 0, \sigma= v^\alpha)\) for some \(\alpha \neq 0\).

That is, the standard deviation of the normal distribution is itself a random variable, a power of a standard exponential random variable. Since the exponential distribution only generates positive numbers, \(V\) (and therefore \(V^\alpha\)) is guaranteed to be positive.

To get a feel for \(T\), let’s simulate 10000 draws of \(T\). Try varying the value of \(\alpha\) to see how the distribution of \(T\) changes.
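One way to carry out this simulation (a sketch in Python with NumPy and Matplotlib, since the text does not fix a language):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
alpha = -1/2   # try varying this value

# Two-stage simulation: draw V, then draw T given V.
v = rng.exponential(scale=1.0, size=10_000)  # V ~ Exponential(lambda = 1)
t = rng.normal(loc=0.0, scale=v**alpha)      # T | {V = v} ~ Normal(0, sigma = v^alpha)

plt.hist(t, bins=200, range=(-10, 10), density=True)
plt.xlabel("t")
plt.show()
```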

The distribution of \(T\) looks bell-shaped, like the normal distribution, except with a higher probability of very large values.

To find the distribution of \(T\), we apply Proposition 26.1. Since both \(V\) and \(T\) are continuous, we are in case 2 of Proposition 26.1, so we use Equation 26.4 to derive the PDF of \(T\):

\[ \begin{align} f_T(t) &= \int_{-\infty}^\infty f_V(v) f_{T|V}(t|v)\,dv \\ &= \int_{0}^\infty e^{-v} \frac{1}{v^{\alpha}\sqrt{2\pi}} e^{-t^2 / (2v^{2\alpha})}\,dv \end{align} \]

This integral is hopeless to simplify by hand, unless \(\alpha = -1/2\). In that special case, it becomes

\[ \begin{align} f_T(t) &= \int_{0}^\infty e^{-v} \frac{v^{1/2}}{\sqrt{2\pi}} e^{- t^2 v / 2}\,dv \\ &= \frac{1}{\sqrt{2\pi}} \int_{0}^\infty v^{1/2} e^{-\left(1 + \frac{t^2}{2}\right) v} \,dv. \end{align} \]

Now, the integrand looks almost like \(\text{E}[ X^{1/2} ]\), where \(X \sim \textrm{Exponential}(\lambda=1 + \frac{t^2}{2})\), except it is missing the extra factor of \(\lambda\). We can multiply and divide by \(\lambda\).

\[ \begin{align} f_T(t) &= \frac{1}{\sqrt{2\pi}} \left(1 + \frac{t^2}{2}\right)^{-1} \underbrace{\int_{0}^\infty v^{1/2} \left(1 + \frac{t^2}{2}\right) e^{-\left(1 + \frac{t^2}{2}\right) v}\,dv}_{\text{E}[ X^{1/2} ]} \end{align} \]

Now, since every exponential is a scale transformation of a standard exponential \(Z\) (Proposition 22.3), we can express \(X = \left(1 + \frac{t^2}{2} \right)^{-1} Z\), and therefore

\[ \begin{align} f_T(t) &= \frac{1}{\sqrt{2\pi}} \left(1 + \frac{t^2}{2}\right)^{-1} \text{E}[ X^{1/2} ] \\ &= \frac{1}{\sqrt{2\pi}} \left(1 + \frac{t^2}{2}\right)^{-3/2} \text{E}[ Z^{1/2} ] \end{align} \]

Only the factor \(\left(1 + \frac{t^2}{2}\right)^{-3/2}\) depends on \(t\); the constant \(\frac{\text{E}[ Z^{1/2} ]}{\sqrt{2\pi}}\) does not, so we absorb it into a normalizing constant \(k\) to be evaluated numerically. That is,

\[ \begin{align} f_T(t) &= \frac{1}{k} \left(1 + \frac{t^2}{2}\right)^{-3/2} & k &\overset{\text{def}}{=}\int_{-\infty}^\infty \left(1 + \frac{t^2}{2}\right)^{-3/2}\,dt. \end{align} \]
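The constant \(k\) can be evaluated numerically, for example as in this sketch (SciPy is our choice of tool).

```python
import numpy as np
from scipy import integrate

# Normalizing constant k = integral over the real line of (1 + t^2/2)^(-3/2)
k, _ = integrate.quad(lambda t: (1 + t**2 / 2) ** (-1.5), -np.inf, np.inf)
print(k)  # approximately 2.828
```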

\(f_T\) is the PDF of an important distribution in statistics called the \(t\)-distribution (specifically with 2 degrees of freedom). We graph this PDF to see how it agrees with the simulated values of \(T\).
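One possible version of that comparison (again a Python sketch, with our choice of tools, reusing the simulation from above with \(\alpha = -1/2\)):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import integrate

rng = np.random.default_rng(0)

# Simulate T with alpha = -1/2, as above.
v = rng.exponential(scale=1.0, size=10_000)
t = rng.normal(loc=0.0, scale=v ** (-0.5))

# PDF of T derived above, normalized numerically.
k, _ = integrate.quad(lambda s: (1 + s**2 / 2) ** (-1.5), -np.inf, np.inf)
grid = np.linspace(-10, 10, 400)
pdf = (1 / k) * (1 + grid**2 / 2) ** (-1.5)

# Overlay the PDF on a histogram of the simulated draws.
plt.hist(t, bins=200, range=(-10, 10), density=True, alpha=0.5)
plt.plot(grid, pdf)
plt.xlabel("t")
plt.show()
```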