41 Beta Distribution
In this chapter, we introduce a new named continuous distribution, the beta distribution. This distribution arises most naturally as the distribution of the order statistics of i.i.d. standard uniform random variables.
Let \(U_1, \dots, U_n\) be i.i.d. \(\textrm{Uniform}(a=0, b=1)\) random variables. What is the distribution of \(U_{(k)}\)? We already computed this in Example 40.2 using Proposition 40.1. The next proposition restates this result.
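Proposition 41.1 (Distribution of a uniform order statistic) Let \(U_1, \dots, U_n\) be i.i.d. \(\textrm{Uniform}(a=0, b=1)\). Then, the PDF of \(U_{(k)}\) is \[ f(x) = \frac{n!}{(k-1)! \, (n-k)!} x^{k-1} (1 - x)^{n-k}, \qquad 0 < x < 1. \] Substituting \(\alpha = k\) and \(\beta = n - k + 1\), we can rewrite this PDF as \[ f(x) = \frac{(\alpha + \beta - 1)!}{(\alpha - 1)! \, (\beta - 1)!} x^{\alpha - 1} (1 - x)^{\beta - 1}, \qquad 0 < x < 1. \tag{41.2}\]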
The distribution of \(U_{(k)}\) is called the \(\textrm{Beta}(\alpha=k, \beta=n-k+1)\) distribution. Notice that the standard uniform distribution is a special case of the beta distribution when \(n = k = 1\), so that \(\alpha = \beta = 1\).
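A quick simulation can confirm this correspondence. Here is a minimal sketch (variable names are illustrative) comparing the moments of \(U_{(2)}\) for \(n = 5\) with those of \(\text{Beta}(2, 4)\):

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)
n, k = 5, 2

# Simulate many batches of n standard uniforms; take the k-th smallest of each.
samples = np.sort(rng.uniform(size=(100_000, n)), axis=1)[:, k - 1]

# Moments of the claimed Beta(alpha=k, beta=n-k+1) distribution.
dist = beta(k, n - k + 1)
print(samples.mean(), dist.mean())  # both approximately k / (n + 1) = 1/3
print(samples.var(), dist.var())
```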
41.1 Generalizing the Beta Distribution
Because the beta distribution is supported on \((0, 1)\), it can be used to model data that is naturally a proportion, such as relative humidity. Suppose we want to model the (daily average) relative humidity in Austin, Texas using a beta distribution. The daily averages from 2023 are below. Some candidates are \(\text{Beta}(\alpha=7, \beta=1)\) (shown in blue) and \(\text{Beta}(\alpha=7, \beta=2)\) (shown in orange).
Unfortunately, \(\beta=2\) is a bit too far to the left, but \(\beta=1\) is too far to the right. Ideally, we would pick a value of \(\beta\) in between. Can we model the data using fractional values of \(\beta\) (and \(\alpha\))?
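Here is a minimal sketch of how the two candidate PDFs could be plotted (the humidity data itself is not reproduced here):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta

x = np.linspace(0.01, 0.99, 500)
plt.plot(x, beta(7, 1).pdf(x), color="blue", label=r"Beta($\alpha=7$, $\beta=1$)")
plt.plot(x, beta(7, 2).pdf(x), color="orange", label=r"Beta($\alpha=7$, $\beta=2$)")
plt.xlabel("relative humidity (as a proportion)")
plt.ylabel("density")
plt.legend()
plt.show()
```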
If we look at the PDF of the beta distribution (Equation 41.2), we see that \(\alpha\) and \(\beta\) appear inside factorials. We can generalize the beta distribution to non-integer values of \(\alpha\) and \(\beta\) by replacing the factorial by the gamma function (Definition 38.1), as we did for the gamma distribution in Definition 38.2.
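Since \(\Gamma(n) = (n-1)!\) for positive integers \(n\), this generalization leaves the integer case unchanged. For any \(\alpha, \beta > 0\), the \(\text{Beta}(\alpha, \beta)\) distribution has PDF \[ f(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha) \Gamma(\beta)} x^{\alpha - 1} (1 - x)^{\beta - 1}, \qquad 0 < x < 1. \tag{41.3}\]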
When \(\alpha\) and \(\beta\) are not integers, the beta distribution does not have a simple interpretation (unlike for integer values, where it is the distribution of an order statistic of i.i.d. uniforms). Nevertheless, non-integer parameters afford us flexibility in modeling.
Next, we calculate formulas for the expected value and variance.
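For \(X \sim \text{Beta}(\alpha, \beta)\), integrating against the PDF and repeatedly applying the identity \(\Gamma(z + 1) = z \Gamma(z)\) yields \[ \text{E}\!\left[ X \right] = \frac{\alpha}{\alpha + \beta}, \qquad \text{Var}\!\left[ X \right] = \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}. \]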
The support of the beta distribution is \((0, 1)\), so if we shift or scale a beta random variable, it will no longer follow a beta distribution. Nevertheless, location-scale transformations of beta random variables can be used to model random variables with bounded support. For example, if we wanted to model a continuous random variable \(Y\) that only takes on values in the interval \((a, b)\), we could model it as \[ Y = (b - a) X + a, \] where \(X\) is a beta random variable.
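In SciPy, this transformation corresponds exactly to the loc and scale arguments of scipy.stats.beta; a minimal sketch (the parameter values are illustrative):

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)
a, b = 10.0, 20.0  # desired support (a, b)

# Manual location-scale transformation of a Beta(7, 2) sample.
X = beta(7, 2).rvs(size=5, random_state=rng)
Y = (b - a) * X + a

# Equivalently, loc/scale build the transformed variable directly: Y = scale * X + loc.
Y2 = beta(7, 2, loc=a, scale=b - a).rvs(size=5, random_state=rng)
```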
41.2 Maximum Likelihood Estimation
Now that we can model data using the beta distribution with non-integer values of \(\alpha\) and \(\beta\), the question becomes: how do we find the best values of \(\alpha\) and \(\beta\)? The answer, of course, is maximum likelihood, just as in Section 38.2.
The likelihood for an i.i.d. sample \(X_1, \dots, X_n\) from a \(\text{Beta}(\alpha,\beta)\) distribution is \[ L(\alpha, \beta) = \prod_{i=1}^n \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha) \Gamma(\beta)} X_i^{\alpha - 1} (1 - X_i)^{\beta - 1}, \] so the log-likelihood is \[ \ell(\alpha, \beta) = n\log\Gamma(\alpha+\beta) - n\log\Gamma(\alpha) - n\log\Gamma(\beta) + (\alpha - 1)\sum_{i=1}^n \log X_i + (\beta - 1) \sum_{i=1}^n \log(1 - X_i).\]
Equivalently, we can instead minimize \[ J(\alpha, \beta) = -\frac{1}{n} \ell(\alpha, \beta). \]
Recalling the digamma function \(\psi(\alpha) \overset{\text{def}}{=}\frac{d}{d\alpha} \log \Gamma(\alpha)\) from Example 38.2, we can calculate the gradient of \(J\) to obtain the first-order conditions: \[ \begin{align} \frac{\partial J}{\partial\alpha} &= -\psi(\alpha+\beta) + \psi(\alpha) - \frac{1}{n}\sum_{i=1}^n \log X_i = 0 \\ \frac{\partial J}{\partial\beta} &= -\psi(\alpha+\beta) + \psi(\beta) - \frac{1}{n}\sum_{i=1}^n \log (1 - X_i) = 0 \end{align} \tag{41.4}\]
Solving these equations by hand is hopeless because there are \(\alpha\)s and \(\beta\)s nested inside digamma functions. However, for any given data set, we can find the optimal \(\alpha\) and \(\beta\) numerically using Newton’s method, as we did in Example 38.2.
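Here is a minimal sketch of that computation, assuming the data are stored in an array xs of proportions (the function name and starting point are illustrative). The Hessian of \(J\) involves the trigamma function \(\psi'\), available as polygamma(1, .) in SciPy:

```python
import numpy as np
from scipy.special import digamma, polygamma

def beta_mle(xs, alpha=1.0, beta_=1.0, tol=1e-10, max_iter=100):
    """Newton's method for the first-order conditions (41.4).

    No safeguards (e.g., keeping alpha and beta positive) are included, for brevity.
    """
    s1 = np.mean(np.log(xs))      # (1/n) sum_i log x_i
    s2 = np.mean(np.log(1 - xs))  # (1/n) sum_i log (1 - x_i)
    theta = np.array([alpha, beta_])
    for _ in range(max_iter):
        a, b = theta
        grad = np.array([
            -digamma(a + b) + digamma(a) - s1,
            -digamma(a + b) + digamma(b) - s2,
        ])
        tri = polygamma(1, a + b)  # trigamma evaluated at alpha + beta
        hess = np.array([
            [polygamma(1, a) - tri, -tri],
            [-tri, polygamma(1, b) - tri],
        ])
        step = np.linalg.solve(hess, grad)  # Newton step: H^{-1} grad
        theta = theta - step
        if np.max(np.abs(step)) < tol:
            return theta
    return theta
```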
Notice that the MLE for \(\beta\) is approximately \(1.63\), which is in between the integer values of \(1\) and \(2\) that we tried previously.
Finally, we plot the estimated beta distribution on top of the data.
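A sketch of that plot, where humidity stands in for the array of 2023 daily averages and the estimates come from the Newton iteration above (scipy.stats.beta.fit with the location and scale pinned to 0 and 1 would give the same answer):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta

alpha_hat, beta_hat = beta_mle(humidity)  # humidity is a placeholder array
# alpha_hat, beta_hat, _, _ = beta.fit(humidity, floc=0, fscale=1)  # equivalent

x = np.linspace(0.01, 0.99, 500)
plt.hist(humidity, bins=20, density=True, alpha=0.5)
plt.plot(x, beta(alpha_hat, beta_hat).pdf(x))
plt.xlabel("relative humidity (as a proportion)")
plt.ylabel("density")
plt.show()
```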
41.3 Connection between the Gamma and Beta Distributions
There is a deep connection between the gamma distribution introduced in Chapter 38 and the beta distribution. In particular, if \(X\) and \(Y\) are independent gamma random variables with the same rate parameter, then it turns out that \(\displaystyle \frac{X}{X + Y}\) (which is always a number between \(0\) and \(1\)) follows a beta distribution.
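Proposition 41.3 (Gamma-beta connection) Let \(X \sim \text{Gamma}(\alpha, \lambda)\) and \(Y \sim \text{Gamma}(\beta, \lambda)\) be independent. Then \[ \frac{X}{X + Y} \sim \text{Beta}(\alpha, \beta), \] and \(\displaystyle \frac{X}{X + Y}\) is independent of \(X + Y \sim \text{Gamma}(\alpha + \beta, \lambda)\).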
Proposition 41.3 is a beautiful connection with a simple interpretation. If \(X\) and \(Y\) represent “times” spent doing two activities (e.g., doing homework and studying for exams), then \(\frac{X}{X + Y}\) represents the fraction of time in the first activity. If we model \(X\) and \(Y\) as independent gamma random variables, then Proposition 41.3 says that \(\frac{X}{X + Y}\) will be beta and, moreover, will be independent of \(X + Y\), the total time spent doing both activities.
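Here is a minimal simulation sketch of Proposition 41.3 (parameter values are illustrative): the ratio should match the beta moments, and its sample correlation with the total should be near zero.

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)
alpha_, beta_, lam = 3.0, 5.0, 2.0

# numpy parameterizes the gamma distribution by shape and scale = 1 / rate.
X = rng.gamma(shape=alpha_, scale=1 / lam, size=100_000)
Y = rng.gamma(shape=beta_, scale=1 / lam, size=100_000)
B, T = X / (X + Y), X + Y

print(B.mean(), beta(alpha_, beta_).mean())  # both approximately 3/8
print(np.corrcoef(B, T)[0, 1])               # approximately 0
```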
Proposition 41.3 offers a charming calculus-free derivation of the expectation of the beta distribution.
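Let \(X \sim \text{Gamma}(\alpha, \lambda)\) and \(Y \sim \text{Gamma}(\beta, \lambda)\) be independent, so that \(B \overset{\text{def}}{=}\frac{X}{X + Y} \sim \text{Beta}(\alpha, \beta)\). Because \(B\) is independent of \(X + Y\), \[ \text{E}\!\left[ X \right] = \text{E}\!\left[ B \cdot (X + Y) \right] = \text{E}\!\left[ B \right] \text{E}\!\left[ X + Y \right], \] so \[ \text{E}\!\left[ B \right] = \frac{\text{E}\!\left[ X \right]}{\text{E}\!\left[ X + Y \right]} = \frac{\alpha / \lambda}{(\alpha + \beta) / \lambda} = \frac{\alpha}{\alpha + \beta}. \]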
41.4 Exercises
Exercise 41.1 (Second-price auction) In a second-price auction, the winner is the highest bidder, but they only pay the amount of the next-highest bidder. If there are five participants in an auction, and their bids are independent and uniformly distributed between \(a\) and \(b\), how much is the winner expected to pay?
Hint: Start with the case where \(a = 0\) and \(b = 1\).
Exercise 41.2 (Moments of the beta distribution) Let \(X \sim \text{Beta}(\alpha,\beta)\). Show that the \(k\)th moment of \(X\) is \[ \text{E}\!\left[ X^k \right] = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha+\beta+k)} \cdot \frac{\Gamma(\alpha+k)}{\Gamma(\alpha)}. \]
Exercise 41.3 (Mode of the beta distribution) Show that, for \(\alpha, \beta > 1\), the beta PDF reaches its maximum value at \(x = \frac{\alpha-1}{\alpha+\beta-2}\).
Exercise 41.4 (Power of uniform is beta) Suppose \(U \sim \textrm{Uniform}(a=0, b=1)\). Show that \(X = U^k\) follows a beta distribution (with what parameters?) for \(k > 0\).
Exercise 41.5 (Generalizing the gamma-beta connection) Suppose that \(X_1, \dots, X_n\) are independent \(\text{Gamma}(\alpha_i, \lambda)\) random variables. Determine the distribution of \(\displaystyle \frac{X_1}{X_1 + \dots + X_n}\), and show that it is independent of \(X_1 + \dots + X_n\).
Hint: You should be able to do this with minimal calculation if you use the result of Proposition 41.3 and properties of the gamma distribution.
Exercise 41.6 (Another proof of beta variance) Let \(X\) and \(Y\) be independent \(\text{Gamma}(\alpha, \lambda)\) and \(\text{Gamma}(\beta, \lambda)\) random variables.
- Argue that \[ \text{E}\!\left[ \frac{X^c}{(X + Y)^c} \right] = \frac{\text{E}\!\left[ X^c \right]}{\text{E}\!\left[ (X + Y)^c \right]} \] for all real numbers \(c\) for which these expectations are finite.
- Use this fact to rederive the variance of the \(\text{Beta}(\alpha, \beta)\) distribution.
Exercise 41.7 (The \(F\)-distribution) Let \(X\) and \(Y\) be independent \(\text{Gamma}(\alpha, \lambda)\) and \(\text{Gamma}(\beta, \lambda)\) random variables.
- Express \(\frac{X}{Y}\) in terms of a beta random variable \(B\). Specify the distribution of \(B\). (Hint: Use the results of Proposition 41.3.)
- Use the result from part (a) to find the PDF of \(\frac{X}{Y}\).
- Find the PDF of \(W \overset{\text{def}}{=}\frac{X / \alpha}{Y / \beta}\). \(W\) is said to follow an \(F\)-distribution with \(2\alpha\) and \(2\beta\) degrees of freedom. (Hint: This is simply a scale-transformation of the random variable from part (b).)
Exercise 41.8 (Limiting distribution of order statistics) Let \(U_1, \dots, U_n\) be i.i.d. \(\textrm{Uniform}(a=0, b=1)\). For a fixed integer \(1 \leq k \leq n\), show that \(n U_{(k)}\) converges in distribution as \(n \to \infty\). Identify the limiting distribution.
Hint: Combine Proposition 41.1 with Proposition 41.3. You should be able to apply the Law of Large Numbers to the denominator.