In this chapter, we introduce a new named continuous distribution, the gamma distribution. This distribution arises most naturally as the sum of i.i.d. exponential random variables.
Let \(X_i\) be i.i.d. \(\text{Exponential}(\lambda)\) random variables. What is the distribution of \(S_n = \sum_{i=1}^n X_i\)?
Of course, \(S_1 = X_1\), which is \(\text{Exponential}(\lambda)\) and has PDF \(f(x) = \lambda e^{-\lambda x}\) for \(x > 0\). We know further from Example 33.5 that:
\(S_2 = X_1 + X_2\) has PDF \(f(x) = \lambda^2 x e^{-\lambda x}\) for \(x > 0\).
\(S_3 = X_1 + X_2 + X_3\) has PDF \(f(x) = \frac{\lambda^3}{2} x^2 e^{-\lambda x}\) for \(x > 0\).
Based on these three examples, we might make the following conjecture.
Proposition 35.1 (Distribution of the sum of exponentials) The PDF of \(S_n\) is \[
f(x) = \frac{\lambda^n}{(n-1)!} x^{n-1} e^{-\lambda x}; x > 0.
\tag{35.1}\]
Proof
Equation 35.1 can be proved in a number of ways. For example, Exercise 33.5 asked you to prove it by induction. Here is an alternative approach using moment generating functions.
The MGF of \(S_n\), as the sum of i.i.d. exponential random variables, must be \[
M_{S_n}(t) = \text{E}\!\left[ e^{tS_n} \right] = \text{E}\!\left[ e^{t(X_1 + \dots + X_n)} \right] = \text{E}\!\left[ e^{tX_1} \right] \cdots \text{E}\!\left[ e^{tX_n} \right] = M_{X_1}(t)^n.
\] All that remains is to calculate \(M_{X_1}(t)\), the MGF of the \(\text{Exponential}(\lambda)\) distribution. \[
M_{X_1}(t) = \text{E}\!\left[ e^{tX_1} \right] = \int_{0}^\infty e^{tx} \lambda e^{-\lambda x}\,dx = \lambda \int_0^\infty e^{-(\lambda - t)x}\,dx = \frac{\lambda}{\lambda - t},
\] where \(t < \lambda\), since otherwise the integral diverges to \(\infty\).
Now, we just need to show that Equation 35.1 has the same MGF. We can calculate the MGF using LotUS: \[\begin{align}
\int_0^\infty e^{tx} \cdot \frac{\lambda^n}{(n-1)!} x^{n-1} e^{-\lambda x}\,dx &= \lambda^n \int_0^\infty \frac{1}{(n-1)!} x^{n-1} e^{-(\lambda - t) x}\,dx \\
&= \frac{\lambda^n}{(\lambda - t)^n} \underbrace{\int_0^\infty \frac{(\lambda - t)^n}{(n-1)!} x^{n-1} e^{-(\lambda - t) x}\,dx}_1 \\
&= \left(\frac{\lambda}{\lambda - t}\right)^n,
\end{align}\] for \(t < \lambda\) (since otherwise the integral diverges). Since the MGFs match, Equation 35.1 must be the PDF of \(S_n\).
The distribution of \(S_n\) is called the \(\text{Gamma}(n, \lambda)\) distribution, with PDF given by Equation 35.1. Because we know that it arose from the sum of \(n\) i.i.d. exponentials, we can immediately conclude the following about the gamma distribution:
In the special case \(n = 1\), it reduces to the \(\text{Exponential}(\lambda)\) distribution.
Its expectation is \(\text{E}\!\left[ S_n \right] = n\text{E}\!\left[ X_1 \right] = \frac{n}{\lambda}\).
Its variance is \(\text{Var}\!\left[ S_n \right] = n\text{Var}\!\left[ X_1 \right] = \frac{n}{\lambda^2}\).
The following data contains the prices of 100 home sales (in thousands of dollars) in Gainesville, Florida. (Agresti 2015) Suppose we wish to model the prices using a gamma distribution. Two candidates are \(\text{Gamma}(n=3, \lambda=.021)\) and \(\text{Gamma}(n=4, \lambda=.021)\), shown in blue and orange, respectively.
Unfortunately, \(n=3\) is too far to the left and \(n=4\) is too far to the right. Can we model the data using some value of \(n\) in between?
At first glance, the answer appears to be no. Equation 35.1 contains \((n - 1)!\), and factorials are only defined for integers. But what if we could generalize the factorial to non-integer values?
Definition 35.1 (Gamma function) The gamma function\(\Gamma(\alpha)\) is defined by \[
\Gamma(\alpha) = \int_0^\infty t^{\alpha - 1} e^{-t} \, dt
\tag{35.2}\] for any \(\alpha > 0\).
We can evaluate \(\Gamma(\alpha)\) for particular values of \(\alpha\):
\(\Gamma(1) = \int_0^\infty e^{-t}\,dt = 1\), since this is the total probability under a standard exponential PDF.
\(\Gamma(2) = \int_0^\infty t e^{-t}\,dt = 1\), since this is \(\text{E}\!\left[ Z \right]\) for a standard exponential random variable \(Z\).
\(\Gamma(3) = \int_0^\infty t^2 e^{-t}\,dt = 2\), since this is \(\text{E}\!\left[ Z^2 \right]\) for a standard exponential random variable \(Z\).
The gamma function generalizes the factorial to non-integer values.
Theorem 35.1 (Gamma function generalizes the factorial) The gamma function (Equation 35.2) satisfies the recursion \[
\Gamma(\alpha + 1) = \alpha\Gamma(\alpha),
\tag{35.3}\]
which implies that for positive integer values of \(n\), \[\Gamma(n) = (n - 1)! \tag{35.4}\]
Proof
To establish the recursion Equation 35.3, we use integration by parts.
Figure 35.1 shows the gamma function on top of \((n - 1)!\) (for integer values of \(n\)). Notice how the gamma function connects the dots, allowing us to make sense of non-integer factorials, like \(3.5!\).
Therefore, we can replace \((n-1)!\) in Equation 35.1 with \(\Gamma(n)\) and generalize the gamma distribution to non-integer parameters. As a reminder that \(n\) need not be an integer, we will rename this parameter to \(\alpha\).
Definition 35.2 (Gamma distribution) A random variable \(X\) is said to follow a \(\text{Gamma}(\alpha, \lambda)\) distribution if its PDF is \[
f_X(x) = \begin{cases} \displaystyle\frac{\lambda^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\lambda x} & x > 0 \\ 0 & \text{otherwise} \end{cases},
\] where \(\alpha > 0\) parametrizes the shape and \(\lambda > 0\) parametrizes the rate.
When the shape parameter \(\alpha\) is not an integer, the gamma distribution does not have a simple interpretation (unlike for integer values of \(\alpha\), where it can be interpreted as the sum of i.i.d. exponentials). Nevertheless, non-integer shape parameters afford us flexibility in modeling.
Next, we calculate formulas for the expected value and variance. Previously, we obtained these formulas for integer shape parameters, by representing the gamma random variable as a sum of i.i.d. exponentials and appealing to linearity of expectation. However, to prove these formulas for general gamma distributions, we have to evaluate an integral.
Proposition 35.2 (Expectation and variance of gamma) Let \(X\) be a \(\text{Gamma}(\alpha, \lambda)\) random variable.
\[
\begin{aligned}
\text{E}\!\left[ X \right] &= \frac{\alpha}{\lambda} & \text{Var}\!\left[ X \right] &= \frac{\alpha}{\lambda^2}.
\end{aligned}
\]
Proof
We compute the first moment:
\[\begin{align*}
\text{E}\!\left[ X \right] &= \int_0^\infty x \cdot \frac{\lambda^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\lambda x} \, dx \\
&= \frac{1}{\Gamma(\alpha)} \int_0^\infty (\lambda x)^\alpha e^{-\lambda x} \, dx \qquad \qquad (y = \lambda x) \\
&= \frac{1}{\Gamma(\alpha)} \int_0^\infty \frac{1}{\lambda} y^\alpha e^{-y} \, dy \\
&= \frac{1}{\lambda \Gamma(\alpha)} \int_0^\infty y^{(\alpha+1)-1} e^{-y} \, dy \\
&= \frac{1}{\lambda \Gamma(\alpha)} \cdot \Gamma(\alpha+1) \\
&= \frac{1}{\lambda \Gamma(\alpha)} \cdot \alpha \Gamma(\alpha) \\
&= \frac{\alpha}{\lambda}.
\end{align*}\] In the second-to-last step, we used the recursion property (Equation 35.3).
We can compute the second moment similarly: \[\begin{align*}
\text{E}\!\left[ X^2 \right] &= \int_0^\infty x^2 \cdot \frac{\lambda^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\lambda x} \, dx \\
&= \frac{1}{\lambda \Gamma(\alpha)} \int_0^\infty (\lambda x)^{\alpha + 1} e^{-\lambda x} \, dx \qquad \qquad (y = \lambda x) \\
&= \frac{1}{\lambda \Gamma(\alpha)} \int_0^\infty \frac{1}{\lambda} y^{\alpha + 1} e^{-y} \, dy \\
&= \frac{1}{\lambda^2 \Gamma(\alpha)} \int_0^\infty y^{(\alpha+2) - 1} e^{-y} \, dy \\
&= \frac{1}{\lambda^2 \Gamma(\alpha)} \cdot \Gamma(\alpha+2) \\
&= \frac{1}{\lambda^2 \Gamma(\alpha)} \cdot (\alpha+1) \alpha \Gamma(\alpha) \\
&= \frac{\alpha(\alpha+1)}{\lambda^2}.
\end{align*}\] In the second-to-last step, we used the recursion property (Equation 35.3) twice.
Thus, the variance is \[
\text{Var}\!\left[ X \right] = \frac{\alpha(\alpha+1)}{\lambda^2} - \frac{\alpha^2}{\lambda^2} = \frac{\alpha}{\lambda^2}.
\]
Of course, we could have also derived the expectation and variance if we knew its moment generating function. The MGF is derived below; Exercise 35.1 asks you to rederive the expectation and variance using Equation 35.5.
Proposition 35.3 (MGF of gamma) Let \(X\) be a \(\text{Gamma}(\alpha, \lambda)\) random variable. Then, its MGF is
\[
M_X(t) = \left(\frac{\lambda}{\lambda - t}\right)^{\alpha}; t < \lambda.
\tag{35.5}\]
The MGF provides quick proofs of the following properties.
Proposition 35.4 (Properties of the gamma distribution) Let \(X\) and \(Y\) be independent \(\text{Gamma}(\alpha,\lambda)\) and \(\text{Gamma}(\beta,\lambda)\) random variables, respectively.
Then,
\[
aX \sim \text{Gamma}\left(\alpha, \frac{\lambda}{a}\right).
\tag{35.6}\] and
The MGF of \(aX\) is \[\begin{align*}
M_{aX}(t) &= \text{E}\!\left[ e^{t \cdot aX} \right] \\
&= \text{E}\!\left[ e^{(at)X} \right] \\
&= M_X(at) \\
&= \left( \frac{\lambda}{\lambda - at} \right)^\alpha \\
&= \left( \frac{\lambda/a}{\lambda/a - t} \right)^\alpha,
\end{align*}\] which is the MGF of \(\text{Gamma}\left( \alpha, \frac{\lambda}{a} \right)\). Thus, \[
aX \sim \text{Gamma}\left( \alpha, \frac{\lambda}{a} \right)
\] by Theorem 34.1.
Next, the MGF of \(X + Y\) is \[\begin{align*}
M_{X+Y}(t) = M_X(t) M_Y(t) = \left( \frac{\lambda}{\lambda - t} \right)^\alpha \left( \frac{\lambda}{\lambda - t} \right)^\beta = \left( \frac{\lambda}{\lambda - t} \right)^{\alpha+\beta},
\end{align*}\] which is the MGF of \(\text{Gamma}(\alpha+\beta,\lambda)\). Thus, \[
X+Y \sim \text{Gamma}(\alpha+\beta, \lambda)
\] by Theorem 34.1.
Proposition 35.4 is intuitive when both \(\alpha\) and \(\beta\) are integers because then \(X\) is the sum of \(\alpha\) i.i.d. exponentials and \(Y\) is the sum of \(\beta\) i.i.d. exponentials.
If we scale \(X\) by \(a\), that is equivalent to scaling each exponential by \(a\), which we know from Proposition 22.3 results in an exponential with rate \(\lambda / a\). Therefore, we are simply adding \(\alpha\) i.i.d. exponential random variables with rate \(\lambda / a\), which follows a \(\text{Gamma}(\alpha, \frac{\lambda}{a})\) distribution.
If we add \(X\) and \(Y\), then together they represent the sum of \(\alpha + \beta\) i.i.d. exponentials, which follows a \(\text{Gamma}(\alpha + \beta, \lambda)\) distribution.
35.2 Maximum Likelihood Estimation
Now that we can model data using the gamma distribution with non-integer values of \(\alpha\), the question is how do we find the best value of \(\alpha\). The answer is maximum likelihood, of course.
The likelihood for a sample of data \(X_1, \dots, X_n\) is \[ L(\alpha, \lambda) = \prod_{i=1}^n \frac{\lambda^\alpha}{\Gamma(\alpha)} X_i^{\alpha - 1} e^{-\lambda X_i}, \] so the log-likelihood is \[ \ell(\alpha, \lambda) = n\alpha \log\lambda - n\log\Gamma(\alpha) + (\alpha - 1) \sum_{i=1}^n \log X_i - \lambda \sum_{i=1}^n X_i.\]
If we know \(\alpha\), then the optimal \(\lambda\) satisfies \[
\frac{\partial\ell}{\partial\lambda} = \frac{n\alpha}{\lambda} - \sum_{i=1}^n X_i = 0,
\] so the optimal value of \(\lambda\) (for a given value of \(\alpha\)) is \[ \hat\lambda(\alpha) = \frac{\alpha}{\bar X}. \tag{35.8}\]
Substituting this back into the log-likelihood, we see that the optimal \(\alpha\) must maximize \[
\ell(\alpha, \hat\lambda(\alpha)) = n\alpha \log \frac{\alpha}{\bar X} - n\log\Gamma(\alpha) + (\alpha - 1) \sum_{i=1}^n \log X_i - n\alpha
\] or equivalently, \[
J(\alpha) = \frac{1}{n} \ell(\alpha, \hat\lambda(\alpha)) = \alpha \log \frac{\alpha}{\bar X} - \log\Gamma(\alpha) + (\alpha - 1) \frac{1}{n} \sum_{i=1}^n \log X_i - \alpha.
\tag{35.9}\]
Solving for the optimal \(\alpha\) by hand is hopeless, not least because there is an \(\alpha\) nested inside a gamma function. However, for any given data set, we can find the optimal \(\alpha\) numerically using Newton’s method.
Example 35.1 (Newton’s method for maximum likelihood)
Isaac Newton (1643-1727)
Newton’s method, first published by Isaac Newton in 1669, is a general method for maximizing (or minimizing) a function \(J(\alpha)\).
We start with a guess for \(\alpha\) and iteratively update the guess as follows: \[
\alpha \leftarrow \alpha - \frac{J'(\alpha)}{J''(\alpha)}.
\tag{35.10}\] After several iterations, \(\alpha\) will converge to a (local) maximum or minimum.
To maximize Equation 35.9 using Newton’s method, we will first need to calculate its derivatives.
The first derivative is \[ J'(\alpha) = \log\alpha - \log \bar X - \underbrace{\frac{\Gamma'(\alpha)}{\Gamma(\alpha)}}_{\psi(\alpha)} + \frac{1}{n}\sum_{i=1}^n \log X_i \] and involves the digamma function\(\psi(\alpha) \overset{\text{def}}{=}\frac{\Gamma'(\alpha)}{\Gamma(\alpha)}\), built into many scientific programming libraries and languages, including R.
The second derivative is \[ J''(\alpha) = \frac{1}{\alpha} - \psi'(\alpha), \] which involves the derivative of the digamma function, called the trigamma function.
Now, we implement Newton’s method on the prices data from above in R.
Notice that Newton’s method converges after only about 5 or 6 iterations. Try different starting guesses for \(\alpha\). No matter where you start, Newton’s method should converge to the same estimate.
Now that we have the optimal value of \(\alpha\), we can plug it into Equation 35.8 to obtain the corresponding value of \(\lambda\). Finally, we plot the estimated gamma distribution on top of the data.
If you are disappointed by the fit of this curve to the data, do not blame the gamma distribution! This is merely the member of the gamma family that fits this data the best. The fact that it does not fit the data particularly well suggests that we should look beyond the gamma distribution for a model.
35.3 Chi-Square Distribution
The gamma distribution with non-integer shape parameter \(\alpha\) also arises in the guise of the chi-square, or \(\chi^2\), distribution, an important distribution in statistics.
Let \(Z_1, \dots, Z_n\) be i.i.d. standard normal. The random variable \[S_n = Z_1^2 + \dots + Z_n^2\] is said to follow a \(\chi^2_n\) distribution. But the \(\chi^2_n\) distribution is not a new distribution; it is merely a special case of the gamma distribution!
Proposition 35.5 (Chi-square is gamma) The \(\chi^2_n\) distribution is the \(\text{Gamma}(\alpha=\frac{n}{2}, \lambda=\frac{1}{2})\) distribution.
Proof
We can prove this using MGFs. First, we calculate the MGF of \(Z_1^2\): \[\begin{align*}
M_{Z_1^2}(t) &= \text{E}\!\left[ e^{tZ_1^2} \right] \\
&= \int_{-\infty}^\infty e^{tz^2} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} z^2}\,dz & \text{(LotUS)} \\
&= \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} (1 - 2t) z^2}\,dz \\
&= \frac{1}{\sqrt{1 - 2t}} \underbrace{\int_{-\infty}^\infty \frac{1}{\frac{1}{\sqrt{1 - 2t}} \sqrt{2\pi}} e^{-\frac{1}{2} (1 - 2t) z^2}\,dz}_{= 1} \\
&= \left(\frac{1}{1 - 2t}\right)^{1/2}
\end{align*}\]
Another way to write this is \[ M_{Z_1^2}(t) = \left(\frac{\frac{1}{2}}{\frac{1}{2} - t}\right)^{1/2}, \] which we recognize from Proposition 35.3 as the MGF of a \(\text{Gamma}(\alpha=\frac{1}{2}, \lambda=\frac{1}{2})\) distribution.
Now, \(S_n = Z_1^2 + \dots + Z_n^2\) is just the sum of \(n\) independent such gamma random variables. By Equation 35.7, we know that its distribution must therefore be \(\text{Gamma}(\alpha=\frac{n}{2}, \lambda=\frac{1}{2})\).
35.4 Exercises
Exercise 35.1 Rederive the expectation and variance of the gamma distribution (Proposition 35.2) using the MGF of the gamma distribution (Equation 35.5).
Agresti, Alan. 2015. Foundations of Linear and Generalized Linear Models. John Wiley & Sons.