In Chapter 32, we proved the Weak Law of Large Numbers (Theorem 32.1), which states that if \(X_1, \dots, X_n\) are i.i.d. with finite mean \(\mu\) and finite variance, then the sample mean \(\bar{X}\) converges to \(\mu\) in probability: \[
\bar{X} \stackrel{p}{\to} \mu.
\] In Example 32.1 and Example 32.2, we were able to visualize \(\bar{X}\) concentrating at \(\mu\) as \(n\) got very large. However, if we want to calculate any probabilities involving \(\bar{X}\), we need to know its distribution.
Since \[
\bar{X} = \frac{1}{n} \sum_{i=1}^n X_i,
\] if we know the distribution of \(\sum_{i=1}^n X_i\), we can find the distribution of \(\bar{X}\).
The process used to determine the distribution of a sum is called convolution. Convolution is done with two independent random variables at a time. That is, to determine the distribution of \(S_n = \sum_{i=1}^n X_i\),
- we first determine the distribution of \(S_2 = X_1 + X_2\),
- then determine the distribution of \(S_3 = (X_1 + X_2) + X_3 = S_2 + X_3\),
- then determine the distribution of \(S_4 = (X_1 + X_2 + X_3) + X_4 = S_3 + X_4\),
and so on. Once we have a way to calculate the distribution of the sum of two independent random variables, we can calculate the distribution of the sum of any number of independent random variables.
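This pairwise procedure is easy to carry out numerically when the random variables are integer-valued with finite support. The following sketch (our own illustration, assuming NumPy is available) builds the PMF of the sum of four fair dice by convolving the PMF of one die with itself three times.

```python
import numpy as np

# PMF of a fair six-sided die, stored so that index k holds P(X = k)
# (index 0 holds P(X = 0) = 0).
pmf_die = np.array([0, 1, 1, 1, 1, 1, 1]) / 6

# Convolve pairwise, as described above: after the loop,
# pmf_sum[k] = P(X_1 + X_2 + X_3 + X_4 = k).
pmf_sum = pmf_die
for _ in range(3):
    pmf_sum = np.convolve(pmf_sum, pmf_die)

print(pmf_sum.sum())   # 1.0, up to floating-point error
print(pmf_sum[14])     # P(sum of four dice = 14), about 0.113
```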
Discrete Convolutions
Let \(X\) and \(Y\) be independent discrete random variables with PMFs \(f_X(x)\) and \(f_Y(y)\), respectively. The method for determining the PMF of \(S = X + Y\) is conceptually straightforward, though the computations can be tedious in practice.
Since the PMF of \(S\) is \[
f_S(s) = P(S = s) = P(X + Y = s),
\] we just need to evaluate this probability for all possible values of \(s\). When \(X\) and \(Y\) are both integer-valued, the event \(\left\{ X + Y = s \right\}\) can be expressed as a disjoint union of all the different possibilities of \(X\)-values and \(Y\)-values that yield a sum of \(s\): \[
\cdots \cup \left\{ X = 0, Y = s \right\} \cup \left\{ X = 1, Y = s-1 \right\} \cup \left\{ X = 2, Y = s-2 \right\} \cup \cdots.
\]
Therefore, the PMF of \(S\) is \[
\begin{align*}
f_S(s) = P(X + Y = s) &= \sum_{x = -\infty}^\infty P(X = x, Y = s-x) \\
&= \sum_{x=-\infty}^\infty P(X = x) P(Y = s - x) \\
&= \sum_{x=-\infty}^\infty f_X(x) f_Y(s - x),
\end{align*}
\] where we used independence of \(X\) and \(Y\) in the second step.
This method is summarized in the following theorem.
Theorem 33.1 (Discrete convolution) Let \(X\) and \(Y\) be independent discrete random variables with PMFs \(f_X(x)\) and \(f_Y(y)\), respectively. Then, the PMF of \(S = X + Y\) is \[
f_S(s) = \sum_x f_X(x) f_Y(s - x).
\tag{33.1}\]
\(f_S\) is said to be the discrete convolution of \(f_X\) and \(f_Y\) and is denoted \[
f_S = f_X * f_Y.
\]
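Equation 33.1 translates directly into code. The sketch below is a minimal illustration (the helper `discrete_convolution` is our own), representing each PMF as a dictionary mapping values to probabilities.

```python
# A direct implementation of Equation 33.1; PMFs are dictionaries
# mapping each possible value to its probability.
def discrete_convolution(f_X, f_Y):
    f_S = {}
    for x, p_x in f_X.items():
        for y, p_y in f_Y.items():
            # Each pair (x, y) contributes f_X(x) f_Y(y) to P(S = x + y),
            # which is exactly the sum in Equation 33.1, grouped by s.
            f_S[x + y] = f_S.get(x + y, 0.0) + p_x * p_y
    return f_S

# Sanity check: two independent Bernoulli(1/2) sum to Binomial(2, 1/2).
f = {0: 0.5, 1: 0.5}
print(discrete_convolution(f, f))   # {0: 0.25, 1: 0.5, 2: 0.25}
```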
We now apply Theorem 33.1 to an example.
Example 33.1 (Convolution of independent Geometrics) Let \(X\) and \(Y\) be i.i.d. \(\text{Geometric}(p)\) random variables. What is the distribution of \(S = X + Y\)?
Since \(X\) and \(Y\) both take integer values greater than or equal to 1, it follows that \(S\) takes integer values greater than or equal to 2. For an integer \(s \geq 2\), the summand in Equation 33.1 is nonzero only when \(x \geq 1\) and \(s - x \geq 1\), that is, for \(x = 1, 2, \dots, s-1\). Thus, for an integer \(s \geq 2\), \[\begin{align*}
f_S(s) &= \sum_{x=1}^{s-1} f_X(x) f_Y(s-x) \\
&= \sum_{x=1}^{s-1} \left( (1-p)^{x-1} p \right) \left( (1-p)^{s-x-1} p \right) \\
&= \sum_{x=1}^{s-1} (1-p)^{s-2} p^2 \\
&= (s-1) (1-p)^{s-2} p^2 \\
&= \binom{s - 1}{2 - 1} p^2 (1-p)^{s-2}.
\end{align*}\] This is the PMF of \(\text{NegativeBinomial}(2,p)\) (Definition 12.4), and so, \[
S = X + Y \sim \text{NegativeBinomial}(2,p).
\] This is in line with our motivation of a negative binomial random variable as the sum of independent geometric random variables in Section 12.4.
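We can also check this identity numerically. In the sketch below (our own illustration, assuming SciPy is available), note that `scipy.stats.nbinom` counts failures before the \(r\)-th success rather than total trials, so the book's \(\text{NegativeBinomial}(2, p)\) PMF at \(s\) corresponds to `nbinom.pmf(s - 2, 2, p)`.

```python
import numpy as np
from scipy.stats import geom, nbinom

p = 0.3
s_values = np.arange(2, 20)

# The convolution sum from Example 33.1: sum_{x=1}^{s-1} f_X(x) f_Y(s - x).
conv = np.array([sum(geom.pmf(x, p) * geom.pmf(s - x, p)
                     for x in range(1, s)) for s in s_values])

# scipy's nbinom counts failures before the 2nd success, so the PMF of the
# book's NegativeBinomial(2, p) at s total trials is nbinom.pmf(s - 2, 2, p).
print(np.allclose(conv, nbinom.pmf(s_values - 2, 2, p)))   # True
```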
We can apply Theorem 33.1 to independent random variables that are not identically distributed.
Example 33.2 (Convolution of independent Poissons) Let \(X \sim \text{Poisson}(\mu_1)\) and \(Y \sim \text{Poisson}(\mu_2)\) be independent. What is the distribution of \(S = X+Y\)?
\(X\) and \(Y\) both take values \(0, 1, 2, \dots\), and so, \(S\) will take values \(0, 1, 2, \dots\). Let \(s \geq 0\) be an integer; then \(x\) can take values \(0, 1, \dots, s\). Thus, \[\begin{align*}
f_S(s) &= \sum_{x=0}^s P(X = x) P(Y = s-x) \\
&= \sum_{x=0}^s \left( \frac{\mu_1^x e^{-\mu_1}}{x!} \right) \left( \frac{\mu_2^{s-x} e^{-\mu_2}}{(s-x)!} \right) \\
&= \sum_{x=0}^s e^{-\mu_1 - \mu_2} \frac{\mu_1^x \mu_2^{s-x}}{x!(s-x)!} \\
&= e^{-\mu_1 - \mu_2} \sum_{x=0}^s \frac{1}{x!(s-x)!} \mu_1^x \mu_2^{s-x} \\
&= e^{-\mu_1 - \mu_2} \sum_{x=0}^s \frac{s!}{x!(s-x)!} \frac{\mu_1^x \mu_2^{s-x}}{s!} \\
&= e^{-\mu_1 - \mu_2} \sum_{x=0}^s \binom{s}{x} \frac{\mu_1^x \mu_2^{s-x}}{s!} \\
&= \frac{e^{-\mu_1 - \mu_2}}{s!} \sum_{x=0}^s \binom{s}{x} \mu_1^x \mu_2^{s-x} \\
&= \frac{e^{-\mu_1 - \mu_2}}{s!} (\mu_1 + \mu_2)^s,
\end{align*}\] which is the PMF of \(\text{Poisson}(\mu_1 + \mu_2)\)! Thus, \[
S = X+Y \sim \text{Poisson}(\mu_1 + \mu_2).
\]
In other words, the sum of two independent Poissons is another Poisson distribution with the parameters added. An immediate consequence of Example 33.2 is that if \(X_1, \dots, X_n\) are i.i.d. \(\text{Poisson}(\mu)\), then \[
X_1 + \cdots + X_n \sim \text{Poisson}(n \mu).
\]
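As a quick numerical check of Example 33.2 (our own sketch, assuming SciPy is available):

```python
import numpy as np
from scipy.stats import poisson

mu1, mu2 = 1.5, 2.5
s_values = np.arange(0, 15)

# The convolution sum from Example 33.2: sum_{x=0}^{s} f_X(x) f_Y(s - x).
conv = np.array([sum(poisson.pmf(x, mu1) * poisson.pmf(s - x, mu2)
                     for x in range(s + 1)) for s in s_values])

print(np.allclose(conv, poisson.pmf(s_values, mu1 + mu2)))   # True
```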
Continuous Convolutions
Let \(X\) and \(Y\) be independent continuous random variables. The process of finding the PDF of \(S = X+Y\) is similar to Theorem 33.1, except with PMFs replaced by PDFs and sums replaced by integrals.
Theorem 33.2 (Continuous convolution) Let \(X\) and \(Y\) be independent continuous random variables with PDFs \(f_X(x)\) and \(f_Y(y)\), respectively. Then, the PDF of \(S = X + Y\) is \[
f_S(s) = \int_{-\infty}^\infty f_X(x) f_Y(s - x)\,dx.
\tag{33.2}\]
\(f_S\) is said to be the convolution of \(f_X\) and \(f_Y\) and is denoted \[
f_S = f_X * f_Y.
\]
In Chapter 20, we discussed a general way to find the distribution of a transformed random variable. We apply that same recipe here:
- Compute the CDF \(F_S(s)\).
- Differentiate (with respect to \(s\)) to get the PDF \(f_S(s)\).
Once we have the PDF, we can compute probabilities related to \(S = X + Y\). In general, \[\begin{align*}
F_S(s) &= P(X + Y \leq s) \\
&= \iint_{x+y \leq s} f_{X,Y}(x,y) \, dy \, dx \\
&= \iint_{x+y \leq s} f_X(x) f_Y(y) \, dy \, dx \qquad \qquad \text{($X$ and $Y$ are independent)} \\
&= \int_{-\infty}^\infty \int_{-\infty}^{s-x} f_X(x) f_Y(y) \, dy \, dx \\
&= \int_{-\infty}^\infty f_X(x) \int_{-\infty}^{s-x} f_Y(y) \, dy \, dx \\
&= \int_{-\infty}^\infty f_X(x) \left( F_Y(y) \right) \Biggr|_{-\infty}^{s-x} \, dx \\
&= \int_{-\infty}^\infty f_X(x) F_Y(s-x) \, dx. \qquad \qquad \text{(since $F_Y(-\infty) = 0$)}
\end{align*}\]
Now, we differentiate \(F_S\) to obtain the PDF \(f_S\), moving the derivative inside the integral (which is justified under mild conditions on the densities). \[\begin{align*}
f_S(s) &= \frac{d}{ds} \int_{-\infty}^\infty f_X(x) F_Y(s-x) \, dx \\
&= \int_{-\infty}^\infty f_X(x) \frac{d}{ds} F_Y(s-x) \, dx \\
&= \int_{-\infty}^\infty f_X(x) f_Y(s-x) \, dx.
\end{align*}\]
Now we apply Theorem 33.2 to several examples. The first example illustrates how tedious the computations can be in practice.
Example 33.3 (Convolution of independent standard normals) Let \(X\) and \(Y\) be i.i.d. \(\textrm{Normal}(\mu= 0, \sigma^2= 1)\). What is the distribution of \(S = X+Y\)?
By Theorem 33.2, we calculate \[\begin{align*}
f_S(s) &= \int_{-\infty}^\infty f_X(x) f_Y(s-x) \, dx \\
&= \int_{-\infty}^\infty \left( \frac{1}{\sqrt{2\pi}} \exp\left\{ -\frac{x^2}{2} \right\} \right) \left( \frac{1}{\sqrt{2\pi}} \exp \left\{ -\frac{(s-x)^2}{2} \right\} \right) \, dx \\
&= \int_{-\infty}^\infty \frac{1}{2\pi} \exp \left\{ -\frac{x^2}{2} - \frac{(s-x)^2}{2} \right\} \, dx \\
&= \frac{1}{2\pi} \int_{-\infty}^\infty \exp \left\{ - \frac{2x^2 - 2sx + s^2}{2} \right\} \, dx \\
&= \frac{1}{2\pi} \int_{-\infty}^\infty \exp \left\{ - \frac{1}{2} \left( \left( \sqrt{2}x - \frac{s}{\sqrt{2}} \right)^2 + \frac{s^2}{2} \right) \right\} \, dx \\
&= \frac{1}{2\pi} \exp \left\{ -\frac{s^2}{4} \right\} \int_{-\infty}^\infty \exp \left\{ -\frac{1}{2} \left( \sqrt{2}x - \frac{s}{\sqrt{2}} \right)^2 \right\} \, dx \\
&= \frac{1}{2\pi} \exp \left\{ -\frac{s^2}{4} \right\} \int_{-\infty}^\infty \exp\left\{ -\frac{y^2}{2} \right\} \, \frac{dy}{\sqrt{2}} \\
&= \frac{1}{2 \sqrt{\pi}} \exp \left\{ -\frac{s^2}{4} \right\} \int_{-\infty}^\infty \underbrace{\frac{1}{\sqrt{2\pi}} \exp\left\{ -\frac{y^2}{2} \right\}}_{\text{PDF of Normal(0,1)}} \, dy \\
&= \frac{1}{2\sqrt{\pi}} \exp \left\{ -\frac{s^2}{4} \right\} \\
&= \frac{1}{\sqrt{2\pi} \cdot \sqrt{2}} \exp \left\{ -\frac{s^2}{2 \cdot 2} \right\},
\end{align*}\] which is the PDF of \(\textrm{Normal}(\mu= 0, \sigma^2= 2)\). Hence, \[
S = X + Y \sim \textrm{Normal}(\mu= 0, \sigma^2= 2).
\]
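As a sanity check, we can evaluate the convolution integral in Equation 33.2 numerically and compare it with this closed form. The helper `convolve_pdfs` below is our own illustration, assuming SciPy is available.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def convolve_pdfs(f_X, f_Y, s):
    """Evaluate Equation 33.2, the convolution integral, at the point s."""
    value, _ = quad(lambda x: f_X(x) * f_Y(s - x), -np.inf, np.inf)
    return value

# The numerical convolution of two standard normal PDFs should match
# the Normal(0, sigma^2 = 2) PDF derived above.
for s in [-1.0, 0.0, 2.5]:
    print(convolve_pdfs(norm.pdf, norm.pdf, s), norm.pdf(s, scale=np.sqrt(2)))
```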
The calculation in Example 33.3 was already quite formidable, and we have not even begun to consider adding general normal random variables! Clearly, a simpler approach is needed. We will present a simpler approach in Chapter 34 and revisit this example in Example 34.6.
When applying Theorem 33.2, we often need to be mindful of the support of the distributions, as the next example illustrates.
Example 33.4 (Convolution of independent uniforms) Let \(X\) and \(Y\) be independent \(\textrm{Uniform}(a= 0, b= 1)\). What is the distribution of \(S = X + Y\)?
By Theorem 33.2, \[
f_S(s) = \int_{-\infty}^\infty f_X(x) f_Y(s-x) \, dx.
Since both \(f_X(x)\) and \(f_Y(s-x)\) take values 0 or 1, it follows that the product will also be 0 or 1. In fact, the integrand will be 1 only if both \(f_X(x)\) and \(f_Y(s-x)\) are 1, that is, only where their supports overlap. As a function of \(x\), the support of \(f_X(x)\) is the interval \((0, 1)\), while the support of \(f_Y(s-x)\) is the interval \((s-1, s)\). Depending on the value of \(s\), these two intervals intersect in different ways.
The two supports do not intersect if \(s < 0\) or if \(s > 2\).
Hence, if \(s < 0\) or \(s > 2\), then \[
f_X(x) f_Y(s-x) = 0,
\] and so, \(f_S(s) = 0\). Now, there are two ways for the supports to overlap.
If \(0 < s < 1\), the supports overlap between \(0\) and \(s\), and thus, \[
f_S(s) = \int_{-\infty}^\infty f_X(x) f_Y(s-x) \, dx = \int_0^s \, dx = s
\] for \(0 < s < 1\).
The last type of intersection to consider happens when \(1 < s < 2\). In this case, the supports overlap between \(s-1\) and \(1\), and so, \[
f_S(s) = \int_{-\infty}^\infty f_X(x) f_Y(s-x) \, dx = \int_{s-1}^1 \, dx = 2-s
\] for \(1 < s < 2\).
Grouping everything together, we have \[
f_S(s) = \begin{cases} s, & 0 < s < 1 \\ 2-s, & 1< s < 2 \\ 0, & \text{otherwise} \end{cases}.
\]
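A quick Monte Carlo check of this triangular PDF (our own sketch, assuming NumPy is available): simulate many sums of two independent uniforms and compare a histogram-based density estimate with the formula above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6
s_samples = rng.uniform(0, 1, n) + rng.uniform(0, 1, n)

# Density-normalized histogram vs. the triangular PDF at the bin midpoints.
hist, edges = np.histogram(s_samples, bins=40, range=(0, 2), density=True)
mids = (edges[:-1] + edges[1:]) / 2
f_S = np.where(mids < 1, mids, 2 - mids)
print(np.max(np.abs(hist - f_S)))   # small, on the order of Monte Carlo error
```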
In the last example, we demonstrate how convolutions can be iterated; in the second step, we convolve two random variables with different distributions.
Example 33.5 (Sum of independent exponentials) Let \(X\) and \(Y\) be independent \(\text{Exponential}(\lambda)\). We first calculate the density of \(S = X+Y\).
By Theorem 33.2, for \(s > 0\), \[\begin{align*}
f_S(s) &= \int_{-\infty}^\infty f_X(x) f_Y(s-x) \, dx \\
&= \int_0^s f_X(x) f_Y(s-x) \, dx & (\text{$f_X(x) = 0$ for $x < 0$ and $f_Y(s-x) = 0$ for $x > s$}) \\
&= \int_0^s \left( \lambda e^{-\lambda x} \right) \left( \lambda e^{-\lambda (s-x)} \right) \, dx \\
&= \int_0^s \lambda^2 e^{-\lambda s} \, dx \\
&= \lambda^2 s e^{-\lambda s}.
\end{align*}\]
What happens if we add a third exponential? Let \(W \sim \text{Exponential}(\lambda)\) be independent of \(X\) and \(Y\) (and thus of \(S\)). What is the density of \(T = X+Y+W = S+W\)?
Again, by Theorem 33.2, for \(t > 0\), \[\begin{align*}
f_T(t) &= \int_{-\infty}^\infty f_S(s) f_W(t-s) \, ds \\
&= \int_0^t f_S(s) f_W(t-s) \, ds & (\text{$f_S(s) = 0$ for $s < 0$ and $f_W(t - s) = 0$ for $s > t$}) \\
&= \int_0^t \left( \lambda^2 s e^{-\lambda s} \right) \left( \lambda e^{-\lambda (t - s)} \right) \, ds \\
&= \int_0^t \lambda^3 s e^{-\lambda t} \, ds \\
&= \lambda^3 e^{-\lambda t} \int_0^t s \, ds \\
&= \frac{\lambda^3}{2} t^2 e^{-\lambda t}.
\end{align*}\]
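As a sanity check, we can simulate \(T = X + Y + W\) and compare the empirical density with the formula just derived (our own sketch, assuming NumPy is available).

```python
import numpy as np

rng = np.random.default_rng(0)
lam, n = 2.0, 10**6

# T = X + Y + W, a sum of three independent Exponential(lambda) draws.
t_samples = rng.exponential(scale=1/lam, size=(3, n)).sum(axis=0)

# Density-normalized histogram vs. the derived PDF at the bin midpoints.
hist, edges = np.histogram(t_samples, bins=40, range=(0, 4), density=True)
mids = (edges[:-1] + edges[1:]) / 2
f_T = lam**3 / 2 * mids**2 * np.exp(-lam * mids)
print(np.max(np.abs(hist - f_T)))   # small, on the order of Monte Carlo error
```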
Exercises
Exercise 33.1 Let \(X\) and \(Y\) be independent continuous random variables. Show that the PDF of \(X - Y\) is \[
f_{X-Y}(z) = \int_{-\infty}^\infty f_X(x) f_Y(x-z) \, dx.
\]
Exercise 33.2 Let \(X \sim \text{Binomial}(n,p)\) and \(Y \sim \text{Binomial}(m,p)\) be independent. What is the distribution of \(S = X+Y\)?
Explain why the result makes sense intuitively.
Exercise 33.3 Let \(X \sim \text{NegativeBinomial}(r,p)\) and \(Y \sim \text{NegativeBinomial}(s,p)\) be independent. What is the distribution of \(T = X+Y\)?
Explain why the result makes sense intuitively.
Exercise 33.5 Using induction, derive a general formula for the PDF of \(S_n = X_1 + \dots + X_n\), where the \(X_i\) are i.i.d. \(\text{Exponential}(\lambda)\). Note that the cases \(n = 2\) and \(n = 3\) were already done in Example 33.5.