Lesson 21 Sums of Random Variables

Theory

Let \(X\) and \(Y\) be random variables. What is the distribution of their sum—that is, the random variable \(T = X + Y\)?

In principle, we already know how to calculate this. To determine the distribution of \(T\), we need to calculate \[ f_T(t) \overset{\text{def}}{=} P(T = t) = P(X + Y = t), \] which we can do by summing the joint p.m.f. over the appropriate values: \[\begin{equation} \sum_{(x, y):\ x + y = t} f(x, y). \tag{21.1} \end{equation}\]

For example, to find the distribution of the total number of bets that Xavier and Yolanda win, we calculate \(P(X + Y = t)\) for \(t = 0, 1, 2, \ldots, 8\). The probabilities that we would need to sum for \(t=4\) are highlighted in the joint p.m.f. table below: \[ \begin{array}{rr|cccc} & 5 & 0 & 0 & 0 & .0238 \\ & 4 & \fbox{0} & 0 & .0795 & .0530 \\ y & 3 & 0 & \fbox{.0883} & .1766 & .0294 \\ & 2 & .0327 & .1963 & \fbox{.0981} & 0 \\ & 1 & .0726 & .1090 & 0 & \fbox{0} \\ & 0 & .0404 & 0 & 0 & 0 \\ \hline & & 0 & 1 & 2 & 3\\ & & & & x \end{array}. \]

For a fixed value of \(t\), \(x\) determines the value of \(y\) (and vice versa). In particular, \(y = t - x\). So we can write (21.1) as a sum over \(x\): \[\begin{equation} f_T(t) = \sum_x f(x, t-x). \tag{21.2} \end{equation}\]
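Equation (21.2) can be sketched in a few lines of Python. The function name `pmf_of_sum` and the nested-list layout `joint[x][y]` are illustrative choices, not part of the lesson; the numbers are copied from the joint p.m.f. table above.

```python
# Joint p.m.f. f(x, y) from the table above, stored as joint[x][y].
# x runs over 0..3 (the table's columns); y runs over 0..5 (its rows).
joint = [
    [.0404, .0726, .0327, 0, 0, 0],      # x = 0
    [0, .1090, .1963, .0883, 0, 0],      # x = 1
    [0, 0, .0981, .1766, .0795, 0],      # x = 2
    [0, 0, 0, .0294, .0530, .0238],      # x = 3
]


def pmf_of_sum(joint):
    """Return [f_T(0), f_T(1), ...] using f_T(t) = sum over x of f(x, t - x)."""
    n_x, n_y = len(joint), len(joint[0])
    f_T = [0.0] * (n_x + n_y - 1)
    for t in range(len(f_T)):
        for x in range(n_x):
            y = t - x                    # y is determined by t and x
            if 0 <= y < n_y:             # skip pairs that fall outside the table
                f_T[t] += joint[x][y]
    return f_T


f_T = pmf_of_sum(joint)
print(f_T[4])  # the four highlighted cells: 0 + .0883 + .0981 + 0
```

For \(t = 4\), the loop visits exactly the four highlighted cells, so the result is \(0 + .0883 + .0981 + 0 = .1864\), up to floating-point rounding.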

This is the general equation for the p.m.f. of the sum \(T\). If the random variables are independent, then we can actually say more.

Theorem 21.1 (Sum of Independent Random Variables) Let \(X\) and \(Y\) be independent random variables. Then, the p.m.f. of \(T = X + Y\) is the convolution of the p.m.f.s of \(X\) and \(Y\): \[\begin{equation} f_T = f_X * f_Y. \tag{21.3} \end{equation}\] The convolution operator \(*\) in (21.3) is defined as follows: \[ (f_X * f_Y)(t) \overset{\text{def}}{=} \sum_x f_X(x) \cdot f_Y(t-x). \] Note that the verb form of “convolution” is “convolve”, not “convolute”, even though many students find convolution quite convoluted!

Proof. This follows from (21.2), after observing that independence means that the joint distribution is the product of the marginal distributions (Theorem 19.1): \[ f(x, t-x) = f_X(x) \cdot f_Y(t-x). \]
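As a minimal sketch of the convolution formula, the function below convolves two p.m.f.s stored as lists, where index \(x\) holds \(P(X = x)\). The name `convolve_pmfs` and the two example p.m.f.s are made up for illustration and do not come from the lesson.

```python
def convolve_pmfs(f_X, f_Y):
    """Return the p.m.f. of X + Y, assuming X and Y are independent.

    f_X[x] = P(X = x) for x = 0, 1, ..., len(f_X) - 1, and likewise for f_Y.
    """
    f_T = [0.0] * (len(f_X) + len(f_Y) - 1)
    for t in range(len(f_T)):
        for x in range(len(f_X)):
            y = t - x
            if 0 <= y < len(f_Y):            # f_Y(t - x) is 0 outside its support
                f_T[t] += f_X[x] * f_Y[y]
    return f_T


# Hypothetical p.m.f.s: X takes values 0, 1 and Y takes values 0, 1, 2.
f_X = [0.5, 0.5]
f_Y = [0.2, 0.3, 0.5]
f_T = convolve_pmfs(f_X, f_Y)
print(f_T)
```

Here \(f_T(1) = f_X(0) f_Y(1) + f_X(1) f_Y(0) = 0.15 + 0.10 = 0.25\), and the full result is \(0.1, 0.25, 0.4, 0.25\) for \(t = 0, 1, 2, 3\), up to floating-point rounding.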

Theorem 21.2 (Sum of Independent Binomials) Let \(X\) and \(Y\) be independent \(\text{Binomial}(n, p)\) and \(\text{Binomial}(m, p)\) random variables, respectively. Then \(T = X + Y\) follows a \(\text{Binomial}(n + m, p)\) distribution.

Proof. We apply Theorem 21.1 to binomial p.m.f.s.

\[\begin{align*} f_T(t) &= \sum_{x=0}^t f_X(x) \cdot f_Y(t-x) \\ &= \sum_{x=0}^t \binom{n}{x} p^x (1-p)^{n-x} \cdot \binom{m}{t-x} p^{t-x} (1-p)^{m-(t-x)} \\ &= \sum_{x=0}^t \binom{n}{x} \binom{m}{t-x} p^t (1-p)^{n+m-t} \\ &= \binom{n+m}{t} p^t (1-p)^{n+m-t}, \end{align*}\] which is the p.m.f. of a \(\text{Binomial}(n + m, p)\) random variable.

In the last equality, we used the fact that \[\begin{equation} \sum_{x=0}^t \binom{n}{x} \binom{m}{t-x} = \binom{n+m}{t}. \tag{21.4} \end{equation}\] This equation is known as Vandermonde’s identity. One way to see it is to observe \[ \sum_{x=0}^t \frac{\binom{n}{x} \binom{m}{t-x}}{\binom{n+m}{t}} = 1, \] since we are summing the p.m.f. of a \(\text{Hypergeometric}(t, n, m)\) random variable over all of its possible values \(0, 1, 2, \ldots, t\). Now, if we multiply both sides of this equality by \[ \binom{n+m}{t}, \] we obtain Vandermonde’s identity (21.4).
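Both Theorem 21.2 and Vandermonde's identity (21.4) are easy to check numerically. The sketch below does so for one arbitrary choice of parameters (\(n = 3\), \(m = 5\), \(p = 0.3\), picked only for illustration); the helper `binom_pmf` is a hypothetical name, and `math.comb` requires Python 3.8 or later.

```python
from math import comb

n, m, p = 3, 5, 0.3   # arbitrary illustrative values


def binom_pmf(k, size, prob):
    """P(K = k) for K ~ Binomial(size, prob); zero outside 0..size."""
    if k < 0 or k > size:
        return 0.0
    return comb(size, k) * prob**k * (1 - prob)**(size - k)


# Left side of the proof: convolve the Binomial(n, p) and Binomial(m, p) p.m.f.s.
convolved = [
    sum(binom_pmf(x, n, p) * binom_pmf(t - x, m, p) for x in range(t + 1))
    for t in range(n + m + 1)
]

# Right side: the Binomial(n + m, p) p.m.f.
direct = [binom_pmf(t, n + m, p) for t in range(n + m + 1)]

max_gap = max(abs(a - b) for a, b in zip(convolved, direct))
print(max_gap)  # should be at the level of floating-point rounding

# Vandermonde's identity (21.4), checked exactly with integers.
# comb(n, x) is 0 when x > n, so out-of-range terms drop out automatically.
vandermonde_holds = all(
    sum(comb(n, x) * comb(m, t - x) for x in range(t + 1)) == comb(n + m, t)
    for t in range(n + m + 1)
)
print(vandermonde_holds)
```

A check like this is not a proof, but it is a quick way to catch an algebra slip in a derivation.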

There is also a more intuitive way to see that the sum of two independent binomials must be binomial. \(X\) represents the number of \(\fbox{1}\)s in \(n\) draws with replacement from a box. \(Y\) represents the number of \(\fbox{1}\)s in \(m\) separate draws with replacement from the same box:

The draws must be separate because we need \(X\) to be independent of \(Y\).
We can use the same box because \(p\) (which corresponds to how many \(\fbox{1}\)s and \(\fbox{0}\)s there are in the box) is the same for \(X\) and \(Y\).

Instead of drawing \(n\) times for \(X\) and \(m\) times for \(Y\), we could simply draw \(n+m\) times with replacement from this box. \(X+Y\) is then the number of \(\fbox{1}\)s in these \(n+m\) draws and must also be binomial.
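The box-model argument can also be illustrated by simulation. The sketch below draws \(n\) tickets for \(X\) and \(m\) separate tickets for \(Y\) from the same box, then compares the empirical distribution of \(X + Y\) with the \(\text{Binomial}(n + m, p)\) p.m.f. The parameter values, the seed, and the number of repetitions are arbitrary choices made for illustration.

```python
import random
from math import comb

rng = random.Random(0)      # fixed seed so the run is reproducible
n, m, p = 3, 5, 0.3         # arbitrary illustrative values
reps = 20000                # number of simulated (X, Y) pairs


def count_ones(draws):
    """Number of 1s in `draws` draws with replacement; each is a 1 with probability p."""
    return sum(rng.random() < p for _ in range(draws))


# X uses n draws and Y uses m separate draws from the same box.
totals = [count_ones(n) + count_ones(m) for _ in range(reps)]

# Relative frequency of each value of T = X + Y.
empirical = [totals.count(t) / reps for t in range(n + m + 1)]

# Binomial(n + m, p) p.m.f. for comparison.
theoretical = [
    comb(n + m, t) * p**t * (1 - p)**(n + m - t) for t in range(n + m + 1)
]

for t in range(n + m + 1):
    print(t, round(empirical[t], 4), round(theoretical[t], 4))
```

The two columns should agree to within simulation error, which shrinks as `reps` grows.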

Essential Practice

Let \(X\) and \(Y\) be independent \(\text{Poisson}(\mu)\) and \(\text{Poisson}(\nu)\) random variables. Use convolution to find the distribution of \(X + Y\). (Hint: It is a named distribution. The binomial theorem will come in handy: \(\sum_{x=0}^n \binom{n}{x} a^x b^{n-x} = (a + b)^n\).) Then, by making an analogy to a Poisson process, explain why this must be the distribution of \(X + Y\).
Let \(X\) and \(Y\) be independent \(\text{Geometric}(p)\) random variables. Use convolution to find the distribution of \(X + Y\). (Hint: It is a named distribution. It may help to remember that \(\sum_{i=1}^m 1 = m = \binom{m}{1}\).) Then, by making an analogy to a box model, explain why this has to be the distribution of \(X + Y\).
Give an example of two \(\text{Binomial}(n=3, p=0.5)\) random variables \(X\) and \(Y\), where \(T = X + Y\) does not follow a \(\text{Binomial}(n=6, p=0.5)\) distribution. Why does this not contradict Theorem 21.2?

Additional Exercises

Let \(X\) and \(Y\) be independent random variables, each with the p.m.f. \[ \begin{array}{r|cccccc} x & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline f(x) & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \end{array} \]

Use convolution to find the p.m.f. of \(T = X + Y\). Why does the answer make sense? (Hint: \(X\) and \(Y\) represent the outcomes when you roll two fair dice.)