# Lesson 21 Sums of Random Variables

## Theory

Let \(X\) and \(Y\) be random variables. What is the distribution of their sum—that is, the random variable \(T = X + Y\)?

In principle, we already know how to calculate this. To determine the distribution of \(T\), we need to calculate \[ f_T(t) \overset{\text{def}}{=} P(T = t) = P(X + Y = t), \] which we can do by summing the joint p.m.f. over the appropriate values: \[\begin{equation} \sum_{(x, y):\ x + y = t} f(x, y). \tag{21.1} \end{equation}\]

For example, to calculate the *total* number of bets that Xavier and Yolanda win, we
calculate \(P(X + Y = t)\) for \(t = 0, 1, 2, \ldots, 8\). The probabilities that we
would need to sum for \(t=4\) are highlighted in the joint p.m.f. table below:
\[ \begin{array}{rr|cccc}
& 5 & 0 & 0 & 0 & .0238 \\
& 4 & \fbox{0} & 0 & .0795 & .0530 \\
y & 3 & 0 & \fbox{.0883} & .1766 & .0294 \\
& 2 & .0327 & .1963 & \fbox{.0981} & 0 \\
& 1 & .0726 & .1090 & 0 & \fbox{0} \\
& 0 & .0404 & 0 & 0 & 0 \\
\hline
& & 0 & 1 & 2 & 3\\
& & & & x
\end{array}. \]
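The sum in (21.1) can be checked numerically. The following is a minimal sketch that copies the joint p.m.f. table above into a dictionary and adds up the entries along each diagonal \(x + y = t\). The names `rows`, `joint`, and `pmf_of_sum` are illustrative choices, not notation from this lesson.

```python
# Joint p.m.f. f(x, y) copied from the table above.
# Each key is a value of y; each list holds f(x, y) for x = 0, 1, 2, 3.
rows = {
    5: [0, 0, 0, .0238],
    4: [0, 0, .0795, .0530],
    3: [0, .0883, .1766, .0294],
    2: [.0327, .1963, .0981, 0],
    1: [.0726, .1090, 0, 0],
    0: [.0404, 0, 0, 0],
}
joint = {(x, y): p for y, row in rows.items() for x, p in enumerate(row)}


def pmf_of_sum(joint):
    """Apply (21.1): add f(x, y) over all pairs (x, y) with x + y = t."""
    f_T = {}
    for (x, y), p in joint.items():
        f_T[x + y] = f_T.get(x + y, 0.0) + p
    return f_T


f_T = pmf_of_sum(joint)
for t in sorted(f_T):
    print(t, round(f_T[t], 4))
```

For \(t = 4\), the result is the sum of the four boxed entries, \(0 + .0883 + .0981 + 0 = .1864\). The probabilities in the table add up to \(.9997\) rather than exactly 1, presumably because the entries are rounded.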

For a fixed value of \(t\), \(x\) determines the value of \(y\) (and vice versa). In particular, \(y = t - x\). So we can write (21.1) as a sum over \(x\): \[\begin{equation} f_T(t) = \sum_x f(x, t-x). \tag{21.2} \end{equation}\]

This is the general equation for the p.m.f. of the sum \(T\). If the random variables are independent, then we can actually say more.

**Theorem 21.1 (Sum of Independent Random Variables)** Let \(X\) and \(Y\) be independent random variables. Then, the p.m.f. of \(T = X + Y\) is the **convolution** of the p.m.f.s of \(X\) and \(Y\): \[\begin{equation} f_T = f_X * f_Y. \tag{21.3} \end{equation}\] The convolution operator \(*\) in (21.3) is defined as follows: \[ f_T(t) = \sum_x f_X(x) \cdot f_Y(t-x). \] Note that the verb form of “convolution” is **convolve**, not “convolute”, even though many students find convolution quite convoluted!

*Proof.* This follows from (21.2), after observing that independence means that the joint distribution is the product of the marginal distributions (Theorem 19.1): \[ f(x, t-x) = f_X(x) \cdot f_Y(t-x). \]
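As an illustration of Theorem 21.1, here is a minimal sketch of a convolution for p.m.f.s stored as dictionaries. The function name `convolve_pmfs` and the two small p.m.f.s are hypothetical, chosen only for illustration. The code loops over pairs \((x, y)\) and accumulates \(f_X(x) \cdot f_Y(y)\) at \(t = x + y\), which gives the same result as summing \(f_X(x) \cdot f_Y(t - x)\) over \(x\) for each \(t\).

```python
def convolve_pmfs(f_X, f_Y):
    """Return the p.m.f. of T = X + Y for independent X and Y.

    Each p.m.f. is a dict mapping a value to its probability.
    """
    f_T = {}
    for x, px in f_X.items():
        for y, py in f_Y.items():
            # Independence: P(X = x, Y = y) = f_X(x) * f_Y(y).
            f_T[x + y] = f_T.get(x + y, 0.0) + px * py
    return f_T


# Hypothetical example p.m.f.s (not from the lesson).
f_X = {0: 0.5, 1: 0.5}
f_Y = {0: 0.2, 1: 0.3, 2: 0.5}

f_T = convolve_pmfs(f_X, f_Y)
for t in sorted(f_T):
    print(t, round(f_T[t], 4))
```

For instance, \(f_T(1) = f_X(0) f_Y(1) + f_X(1) f_Y(0) = 0.5 \cdot 0.3 + 0.5 \cdot 0.2 = 0.25\).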

**Theorem 21.2 (Sum of Independent Binomials)** Let \(X\) and \(Y\) be independent \(\text{Binomial}(n, p)\) and \(\text{Binomial}(m, p)\) random variables, respectively. Then \(T = X + Y\) follows a \(\text{Binomial}(n + m, p)\) distribution.

*Proof.* We apply Theorem 21.1 to binomial p.m.f.s.

\[\begin{align*} f_T(t) &= \sum_{x=0}^t f_X(x) \cdot f_Y(t-x) \\ &= \sum_{x=0}^t \binom{n}{x} p^x (1-p)^{n-x} \cdot \binom{m}{t-x} p^{t-x} (1-p)^{m-(t-x)} \\ &= \sum_{x=0}^t \binom{n}{x} \binom{m}{t-x} p^t (1-p)^{n+m-t} \\ &= \binom{n+m}{t} p^t (1-p)^{n+m-t}, \end{align*}\] which is the p.m.f. of a \(\text{Binomial}(n + m, p)\) random variable.

In the last equality, we used the fact that \[\begin{equation} \sum_{x=0}^t \binom{n}{x} \binom{m}{t-x} = \binom{n+m}{t}. \tag{21.4} \end{equation}\] This equation is known as Vandermonde’s identity. One way to see it is to observe \[ \sum_{x=0}^t \frac{\binom{n}{x} \binom{m}{t-x}}{\binom{n+m}{t}} = 1, \] since we are summing the p.m.f. of a \(\text{Hypergeometric}(t, n, m)\) random variable over all of its possible values \(0, 1, 2, \ldots, t\). Now, if we multiply both sides of this equality by \[ \binom{n+m}{t}, \] we obtain Vandermonde’s identity (21.4).
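Both Vandermonde's identity (21.4) and the conclusion of Theorem 21.2 can be spot-checked numerically. The sketch below does so for one hypothetical choice of \(n\), \(m\), and \(p\); it is a sanity check, not a proof. The helper `binom_pmf` is an illustrative name, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def binom_pmf(k, n, p):
    """P.m.f. of a Binomial(n, p) random variable evaluated at k."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical parameters, chosen only for illustration.
n, m, p = 5, 7, 0.3

# Vandermonde's identity (21.4), checked exactly in integer arithmetic.
vandermonde_holds = all(
    sum(comb(n, x) * comb(m, t - x) for x in range(t + 1)) == comb(n + m, t)
    for t in range(n + m + 1)
)

# Largest gap between the convolution of the two binomial p.m.f.s and
# the Binomial(n + m, p) p.m.f., over all t.
max_gap = max(
    abs(
        sum(binom_pmf(x, n, p) * binom_pmf(t - x, m, p) for x in range(t + 1))
        - binom_pmf(t, n + m, p)
    )
    for t in range(n + m + 1)
)

print(vandermonde_holds, max_gap)
```

The identity is checked with exact integers, while the p.m.f. comparison uses floating point, so `max_gap` should be tiny but need not be exactly zero.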

We can also see, without any algebra, that the sum of two independent binomials must be binomial.
\(X\) represents the number of \(\fbox{1}\)s in \(n\) draws with replacement from a box.
\(Y\) represents the number of \(\fbox{1}\)s in \(m\) *separate* draws with replacement from the *same* box:

- The draws must be *separate* because we need \(X\) to be independent of \(Y\).
- We can use the *same* box because \(p\) (which corresponds to how many \(\fbox{1}\)s and \(\fbox{0}\)s there are in the box) is the same for \(X\) and \(Y\).

Therefore, \(T = X + Y\) counts the \(\fbox{1}\)s in all \(n + m\) draws with replacement from that box, so \(T\) must follow a \(\text{Binomial}(n + m, p)\) distribution.
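The box model can also be simulated. The sketch below draws \(n\) and then \(m\) separate tickets with replacement from the same box, records the total number of \(\fbox{1}\)s, and compares the empirical frequencies with the \(\text{Binomial}(n + m, p)\) p.m.f. The parameter values, the seed, and the helper name `count_ones` are assumptions made for illustration.

```python
import random
from math import comb

random.seed(21)  # fixed seed so the run is reproducible

# Hypothetical parameters, chosen only for illustration.
n, m, p = 4, 6, 0.25
trials = 100_000


def count_ones(draws, p):
    """Number of 1s in `draws` draws with replacement, each a 1 w.p. p."""
    return sum(random.random() < p for _ in range(draws))


counts = [0] * (n + m + 1)
for _ in range(trials):
    x = count_ones(n, p)  # X: the first n draws
    y = count_ones(m, p)  # Y: m separate draws from the same box
    counts[x + y] += 1

empirical = [c / trials for c in counts]
exact = [comb(n + m, t) * p**t * (1 - p) ** (n + m - t) for t in range(n + m + 1)]
max_gap = max(abs(e - q) for e, q in zip(empirical, exact))

print(max_gap)
```

With this many trials, the empirical frequencies should agree with the exact p.m.f. to roughly two decimal places.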

## Essential Practice

- Let \(X\) and \(Y\) be independent \(\text{Poisson}(\mu)\) and \(\text{Poisson}(\nu)\) random variables. Use convolution to find the distribution of \(X + Y\). (*Hint:* It is a named distribution.) Then, by making an analogy to a Poisson process, explain why this must be the distribution of \(X + Y\). (The binomial theorem will come in handy: \(\sum_{x=0}^n \binom{n}{x} a^x b^{n-x} = (a + b)^n\).)

- Let \(X\) and \(Y\) be independent \(\text{Geometric}(p)\) random variables. Use convolution to find the distribution of \(X + Y\). (*Hint:* It is a named distribution. It may help to remember that \(\sum_{i=1}^m 1 = m = \binom{m}{1}\).) Then, by making an analogy to a box model, explain why this has to be the distribution of \(X + Y\).

- Give an example of two \(\text{Binomial}(n=3, p=0.5)\) random variables \(X\) and \(Y\), where \(T = X + Y\) does not follow a \(\text{Binomial}(n=6, p=0.5)\) distribution. Why does this not contradict Theorem 21.2?

## Additional Exercises

- Let \(X\) and \(Y\) be independent random variables with the p.m.f.

  \(x\) | 1 | 2 | 3 | 4 | 5 | 6 |
  ---|---|---|---|---|---|---|
  \(f(x)\) | \(1/6\) | \(1/6\) | \(1/6\) | \(1/6\) | \(1/6\) | \(1/6\) |

  Use convolution to find the p.m.f. of \(T = X + Y\). Why does the answer make sense?
  (*Hint:* \(X\) and \(Y\) represent the outcomes when you roll two fair dice.)