22  Named Distributions

In this chapter, we introduce three named continuous distributions. Each of these three distributions can be derived from location-scale transformations (Section 21.3) of a single distribution. This makes it easy to derive the expected value and variance of these named distributions.

We have already encountered specific instances of all three distributions, so this chapter is more of a synthesis and a review of facts that you already know.

22.1 Uniform

The uniform distribution is used to model continuous random variables that are “equally likely” to take on any value in a range. In Example 18.5, we modeled the distance of one of Jackie Joyner-Kersee’s long jumps as a uniform random variable.

Definition 22.1 (Uniform distribution) A random variable \(X\) is said to have a \(\text{Uniform}(a, b)\) distribution if its PDF is

\[ f(x) = \begin{cases} \frac{1}{b - a} & a < x < b \\ 0 & \text{otherwise} \end{cases}. \tag{22.1}\]

This PDF is graphed below:

Figure 22.1: PDF of the Uniform Distribution

Equivalently, a random variable has a \(\text{Uniform}(a, b)\) distribution if its CDF is \[ F(x) = \int_{-\infty}^x f(t)\,dt = \begin{cases} 0 & x \leq a \\ \frac{x-a}{b-a} & a < x < b \\ 1 & x \geq b \end{cases}.\]

This CDF is graphed below.

Figure 22.2: CDF of the Uniform Distribution

Using Definition 22.1, the distance \(X\) of one of Joyner-Kersee’s long jumps is a \(\textrm{Uniform}(a= 6.3, b= 7.5)\) random variable. We can use the PDF or CDF to calculate probabilities such as \(P(7.0 < X < 7.3)\).

In fact, R has a built-in function for the CDF of a uniform distribution, punif, which we can use to calculate probabilities. For example, to determine \(P(7.0 < X < 7.3)\), we can calculate \(F(7.3) - F(7.0)\) in R as follows:
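
Here is one way to make this calculation; punif takes the endpoints of the interval as its min and max arguments:

```r
# P(7.0 < X < 7.3) = F(7.3) - F(7.0) for X ~ Uniform(6.3, 7.5)
punif(7.3, min = 6.3, max = 7.5) - punif(7.0, min = 6.3, max = 7.5)
```

This evaluates to \(\frac{7.3 - 7.0}{7.5 - 6.3} = 0.25\).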

We can derive any uniform distribution as a location-scale transformation of the standard uniform distribution—that is, a \(\textrm{Uniform}(a= 0, b= 1)\) distribution. To see this, let \(U\) be a standard uniform random variable. If we scale \(U\) by \((b - a)\) and shift by \(a\), then the resulting variable will be uniformly distributed between \(a\) and \(b\).

Proposition 22.1 (Location-Scale Transformation of the Uniform) Let \(U\) be a standard uniform random variable, and let \[X \overset{\text{def}}{=}(b - a) U + a.\]

The PDF of \(U\), according to Equation 22.1, is \[ f_U(x) = \begin{cases} 1 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}. \]

By Proposition 20.3, the PDF of \(X\) is \[ \begin{aligned} f_X(x) &= \frac{1}{b - a} f_U\Big(\frac{x - a}{b - a}\Big) \\ &= \begin{cases} \frac{1}{b - a} & 0 < \frac{x - a}{b - a} < 1 \\ 0 & \text{otherwise} \end{cases} \\ &= \begin{cases} \frac{1}{b - a} & a < x < b \\ 0 & \text{otherwise}, \end{cases} \end{aligned} \] which is the PDF of a \(\text{Uniform}(a, b)\) random variable (Equation 22.1), as we wanted to show.
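
Although it is not needed for the proof, a quick simulation can illustrate Proposition 22.1. (The endpoints and sample size below are arbitrary choices.)

```r
# Illustrative check of Proposition 22.1 (a, b, and sample size chosen arbitrarily)
a <- 6.3; b <- 7.5
u <- runif(100000)    # standard uniform draws
x <- (b - a) * u + a  # location-scale transform
range(x)              # every transformed draw lies in (a, b)
hist(x)               # histogram is roughly flat between a and b
```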

22.1.1 Expectation and Variance

Proposition 22.1 provides a simple way to derive the expectation and variance of any uniform distribution. First, we calculate the expectation \(\text{E}\!\left[ U \right]\) and variance \(\text{Var}\!\left[ U \right]\) for a standard uniform random variable \(U\). Then, we use properties of expected value and variance for linear transformations to derive the expectation and variance of a general uniform random variable.

Proposition 22.2 Let \(X\) be a \(\text{Uniform}(a, b)\) random variable.

\[ \begin{aligned} \text{E}\!\left[ X \right] &= \frac{a + b}{2} & \text{Var}\!\left[ X \right] &= \frac{(b - a)^2}{12}. \end{aligned} \]

First, let \(U\) be a \(\textrm{Uniform}(a= 0, b= 1)\) random variable. We calculate its expected value and variance:

\[\begin{aligned} \text{E}\!\left[ U \right] &= \int_0^1 x \,dx \\ &= \frac{1}{2} \\ \text{Var}\!\left[ U \right] &= \text{E}\!\left[ U^2 \right] - \text{E}\!\left[ U \right]^2 \\ &= \int_0^1 x^2\,dx - \Big(\frac{1}{2}\Big)^2 \\ &= \frac{1}{3} - \frac{1}{4} \\ &= \frac{1}{12}. \end{aligned}\]

Since \(X = (b - a) U + a\), we can use Proposition 21.2 to determine its expected value to be \[ \begin{aligned} \text{E}\!\left[ X \right] &= \text{E}\!\left[ (b - a) U + a \right] \\ &= (b - a) \text{E}\!\left[ U \right] + a \\ &= (b - a) \frac{1}{2} + a \\ &= \frac{a + b}{2} \end{aligned} \] and Proposition 21.3 to determine its variance to be \[ \begin{aligned} \text{Var}\!\left[ X \right] &= \text{Var}\!\left[ (b - a) U + a \right] \\ &= (b - a)^2 \text{Var}\!\left[ U \right] \\ &= (b - a)^2 \frac{1}{12}. \end{aligned} \]

Using Proposition 22.2, it is easy to calculate that the expected distance of one of Joyner-Kersee’s long jumps is \[\text{E}\!\left[ X \right] = \frac{6.3 + 7.5}{2} = 6.9,\] and the variance of one of her long jumps is \[\text{Var}\!\left[ X \right] = \frac{(7.5 - 6.3)^2}{12} = 0.12.\]
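
As a quick sanity check, we can simulate many jumps and compare the sample mean and variance to these values. (The simulation is only illustrative; the sample size is an arbitrary choice.)

```r
# Simulation check of Proposition 22.2 on the Joyner-Kersee example
x <- runif(1e6, min = 6.3, max = 7.5)
mean(x)  # approximately (6.3 + 7.5) / 2 = 6.9
var(x)   # approximately (7.5 - 6.3)^2 / 12 = 0.12
```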

22.2 Exponential

The exponential distribution is used to model the time until some event. In Example 18.9, we used this distribution to model the time until the first click of a Geiger counter.

Definition 22.2 (Exponential Distribution) A random variable \(X\) is said to have an \(\text{Exponential}(\lambda)\) distribution if its PDF is

\[ f(x) = \begin{cases} \lambda e^{-\lambda x} & x > 0 \\ 0 & \text{otherwise} \end{cases}, \tag{22.2}\]

where \(\lambda\) is the rate at which events occur.

This PDF is graphed below:

Figure 22.3: PDF of the Exponential Distribution

Equivalently, a random variable has an \(\text{Exponential}(\lambda)\) distribution if its CDF is \[ F(x) = \int_{-\infty}^x f(t)\,dt = \begin{cases} 0 & x \leq 0 \\ 1 - e^{-\lambda x} & x > 0 \end{cases}.\]

This CDF is graphed below.

Figure 22.4: CDF of the Exponential Distribution

Using Definition 22.2, we can describe the time (in seconds) until the first click of a Geiger counter from Example 18.9, \(T\), as an \(\textrm{Exponential}(\lambda=2.6)\) random variable. We can use the PDF or CDF to calculate probabilities such as \(P(T > 1.2)\).

In fact, R has a built-in function for the CDF of an exponential distribution, pexp, which we can use to calculate probabilities. For example, to determine \(P(T > 1.2)\), we can calculate \(1 - F(1.2)\) in R as follows:
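
For example, passing \(\lambda\) as the rate argument:

```r
# P(T > 1.2) = 1 - F(1.2) for T ~ Exponential(2.6)
1 - pexp(1.2, rate = 2.6)
pexp(1.2, rate = 2.6, lower.tail = FALSE)  # equivalent
```

Both lines evaluate to \(e^{-2.6 \cdot 1.2} = e^{-3.12} \approx 0.044\).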

We can derive any exponential distribution as a scale transformation of the standard exponential distribution—that is, an \(\textrm{Exponential}(\lambda=1)\) distribution. To see this, let \(Z\) be a standard exponential random variable. Note that \(Z\) is measured in time units chosen so that the rate is \(1\). To convert back to the original time units, we need to scale \(Z\) by \(\frac{1}{\lambda}\).

Proposition 22.3 (Scale Transformation of the Exponential) Let \(Z\) be a standard exponential random variable, and let \[X \overset{\text{def}}{=}\frac{1}{\lambda} Z.\]

The PDF of \(Z\), according to Equation 22.2, is \[ f_Z(z) = \begin{cases} e^{-z} & z > 0 \\ 0 & \text{otherwise} \end{cases}. \]

By Proposition 20.3, the PDF of \(X\) is \[ \begin{aligned} f_X(x) &= \frac{1}{1 / \lambda} f_Z\Big(\frac{x}{1 / \lambda}\Big) \\ &= \lambda f_Z(\lambda x) \\ &= \begin{cases} \lambda e^{-\lambda x} & x > 0 \\ 0 & \text{otherwise} \end{cases}, \end{aligned} \] which is the PDF of an \(\text{Exponential}(\lambda)\) random variable (Equation 22.2), as we wanted to show.
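
Here, too, a brief simulation can illustrate the proposition. (The rate and sample size below are arbitrary choices.)

```r
# Illustrative check of Proposition 22.3 (lambda and sample size chosen arbitrarily)
lambda <- 2.6
z <- rexp(100000, rate = 1)  # standard exponential draws
x <- z / lambda              # scale transform
mean(x <= 0.5)               # empirical P(X <= 0.5)
pexp(0.5, rate = lambda)     # CDF of Exponential(2.6) at 0.5; should be close
```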

22.2.1 Expectation and Variance

Proposition 22.3 provides a simple way to derive the expectation and variance of any exponential distribution. First, we calculate the expectation \(\text{E}\!\left[ Z \right]\) and variance \(\text{Var}\!\left[ Z \right]\) for a standard exponential random variable \(Z\). Then, we use properties of expected value and variance for linear transformations to derive the expectation and variance of a general exponential random variable.

Proposition 22.4 Let \(X\) be an \(\text{Exponential}(\lambda)\) random variable.

\[ \begin{aligned} \text{E}\!\left[ X \right] &= \frac{1}{\lambda} & \text{Var}\!\left[ X \right] &= \frac{1}{\lambda^2}. \end{aligned} \]

First, let \(Z\) be an \(\textrm{Exponential}(\lambda=1)\) random variable. We calculate its expected value and variance:

\[\begin{aligned} \text{E}\!\left[ Z \right] &= \int_0^\infty z e^{-z} \,dz \\ &= \underbrace{z}_u \cdot \underbrace{-e^{-z}}_v\Big|_0^\infty - \int_0^\infty \underbrace{-e^{-z}}_{v} \,\underbrace{dz}_{du} \\ &= - ze^{-z} \Big|_0^\infty - e^{-z} \Big|_0^\infty \\ &= (0 - 0) - (0 - 1) \\ &= 1 \\ \text{Var}\!\left[ Z \right] &= \text{E}\!\left[ Z^2 \right] - \text{E}\!\left[ Z \right]^2 \\ &= \int_0^\infty z^2 e^{-z} \,dz - (1)^2 \\ &= \underbrace{z^2}_u \cdot \underbrace{-e^{-z}}_v\Big|_0^\infty - \int_0^\infty \underbrace{-e^{-z}}_{v} \cdot \underbrace{2z\, dz}_{du} - 1 \\ &= \underbrace{- z^2 e^{-z} \Big|_0^\infty}_0 + 2 \underbrace{\int_0^\infty z e^{-z} \,dz}_{=\text{E}\!\left[ Z \right] = 1} - 1 \\ &= 1. \end{aligned}\]

Since \(X = \frac{1}{\lambda} Z\), we can use Proposition 21.2 to determine its expected value to be \[ \begin{aligned} \text{E}\!\left[ X \right] &= \text{E}\!\left[ \frac{1}{\lambda} Z \right] \\ &= \frac{1}{\lambda} \text{E}\!\left[ Z \right] \\ &= \frac{1}{\lambda} \end{aligned} \] and Proposition 21.3 to determine its variance to be \[ \begin{aligned} \text{Var}\!\left[ X \right] &= \text{Var}\!\left[ \frac{1}{\lambda} Z \right] \\ &= \frac{1}{\lambda^2} \text{Var}\!\left[ Z \right] \\ &= \frac{1}{\lambda^2}. \end{aligned} \]

Using Proposition 22.4, it is easy to write down the expected time of the first click of the Geiger counter from Example 18.9: \[\text{E}\!\left[ T \right] = \frac{1}{\lambda} = \frac{1}{2.6},\] and the variance of the time of the first click: \[\text{Var}\!\left[ T \right] = \frac{1}{\lambda^2} = \frac{1}{2.6^2}.\]
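
As with the uniform distribution, a simulation provides a quick sanity check. (The sample size is an arbitrary choice.)

```r
# Simulation check of Proposition 22.4 on the Geiger counter example
t <- rexp(1e6, rate = 2.6)
mean(t)  # close to 1 / 2.6 (about 0.385)
var(t)   # close to 1 / 2.6^2 (about 0.148)
```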

22.2.2 Memoryless Property

The exponential distribution is frequently used to model the time until an event. What are some practical implications of using the exponential model?

One property of the exponential distribution is that it is memoryless. That is, if the event has not happened by time \(t\), then the remaining time until the event happens has the same exponential distribution.

Proposition 22.5 (Memoryless property of the exponential distribution) The exponential distribution is memoryless. That is, if \(X\) is an \(\text{Exponential}(\lambda)\) random variable, then for all \(s, t > 0\), \[ P(X > s + t | X > t) = P(X > s). \tag{22.3}\]

Here is another way to state the memoryless property: conditional on \(X > t\), the remaining time \(X - t\) has the same distribution as the original distribution of \(X\).

We will use the fact that the CDF of \(X\) is \(F(x) = 1 - e^{-\lambda x}\), so \[ P(X > x) = 1 - F(x) = 1 - (1 - e^{-\lambda x}) = e^{-\lambda x}. \]

Now, we simply apply the definition of conditional probability: \[ \begin{aligned} P(X > s + t | X > t) &= \frac{P(\{ X > s + t \} \cap \{ X > t \})}{P(X > t)} \\ &= \frac{P(X > s + t)}{P(X > t)} \\ &= \frac{e^{-\lambda(s + t)}}{e^{-\lambda t}} \\ &= e^{-\lambda s} \\ &= P(X > s). \end{aligned} \] In the second line above, we used the fact that the event \(B = \{ X > t \}\) is “redundant” if the event \(A = \{ X > s + t \}\) happened, so \(A \cap B = A\). In the language of set theory, \(A \subset B\) implies \(A \cap B = A\).
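
We can also observe the memoryless property in a simulation. (The values of \(\lambda\), \(s\), and \(t\) below are arbitrary choices for illustration.)

```r
# Illustrative check of Equation 22.3 (lambda, s, t chosen arbitrarily)
lambda <- 2.6; s <- 0.5; t <- 1.0
x <- rexp(1e6, rate = lambda)
mean(x > s + t) / mean(x > t)               # estimate of P(X > s + t | X > t)
mean(x > s)                                 # estimate of P(X > s)
pexp(s, rate = lambda, lower.tail = FALSE)  # exact value e^(-lambda * s)
```

All three values should be close to \(e^{-2.6 \cdot 0.5} \approx 0.27\).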

The memoryless property is controversial. In real-world situations, we expect that if we have already waited a long time, then the remaining waiting time will be shorter. But when we use the exponential distribution to model waiting times, this is never the case, because of the memoryless property. For this reason, statisticians and engineers typically use non-memoryless distributions, such as the Weibull distribution, to model waiting times.

22.3 Normal

The normal distribution is the most important continuous model in probability and statistics. In Example 18.7, we used the normal distribution to model the high temperature in May in Iqaluit.

To begin our formal discussion of the normal distribution, we will first define the standard normal distribution, which is a bell-shaped PDF centered around \(0\). Then, we will apply location-scale transformations to the standard normal distribution to construct general normal distributions, which are bell-shaped PDFs that can have any width and be centered around values other than \(0\).

22.3.1 Standard Normal Distribution

Definition 22.3 A random variable has the standard normal distribution if its PDF is \[f(x) \propto e^{-x^2/2}; \qquad -\infty < x < \infty. \] The \(\propto\) symbol means “proportional to”. That is, there is a constant \(k\) such that \[ f(x) = \frac{1}{k} e^{-x^2 / 2}; \qquad -\infty < x < \infty. \]

Notice that the support of the standard normal distribution is the entire real line.

The constant \(k\) is the unique value that makes the total probability equal to one, \[ k = \int_{-\infty}^\infty e^{-x^2/2}\,dx. \]

Unfortunately, this integral is not easy to evaluate because \(e^{-x^2 / 2}\) has no elementary antiderivative. We can approximate the integral numerically using R.
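
For instance, R’s built-in integrate function gives a numerical value for \(k\):

```r
# Numerical approximation of the normalizing constant k
integrate(function(x) exp(-x^2 / 2), lower = -Inf, upper = Inf)
```

The result is approximately \(2.5066\).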

In Example 23.9, we will use tools from joint distributions to show that \(k = \sqrt{2\pi}\), so that the standard normal PDF is \[ f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2 / 2}; \qquad -\infty < x < \infty. \tag{22.4}\]

This PDF is graphed in Figure 22.5.

Figure 22.5: PDF of the Standard Normal Distribution

Equivalently, a random variable has a standard normal distribution if its CDF is \[ \Phi(x) \overset{\text{def}}{=}\int_{-\infty}^x \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\,dt. \tag{22.5}\]

Because the PDF (Equation 22.4) has no elementary antiderivative, the CDF cannot be simplified beyond Equation 22.5. For this reason, we will often express probabilities in terms of the standard normal CDF \(\Phi\). Note that we are using the symbol \(\Phi\) for the standard normal CDF, as opposed to the more generic \(F\). This is an indication of the importance of the standard normal CDF.

Example 22.1 Let \(Z\) be a standard normal random variable. What is \(P(-2 < Z < 2)\)?

This probability corresponds to the red shaded area below.

Figure 22.6: PDF of the Standard Normal Distribution

We can express this probability in terms of the CDF \(\Phi\).

\[ \begin{aligned} P(-2 < Z < 2) &= P(Z \leq 2) - P(Z \leq -2) \\ &= \Phi(2) - \Phi(-2) \end{aligned} \tag{22.6}\]

Because the standard normal curve is symmetric around 0, it must be the case that \[ \begin{aligned} P(Z > 2) &= P(Z < -2) \\ 1 - \Phi(2) &= \Phi(-2). \end{aligned} \] Substituting this into Equation 22.6, we obtain the equivalent expression \[ P(-2 < Z < 2) = \Phi(2) - (1 - \Phi(2)) = 2\Phi(2) - 1.\]

In order to evaluate this probability numerically, we use the pnorm function in R, which corresponds to \(\Phi\). The two lines of code below should produce the same answer, based on our calculations above.
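
One natural way to write them:

```r
pnorm(2) - pnorm(-2)  # Phi(2) - Phi(-2)
2 * pnorm(2) - 1      # equivalent, by symmetry
```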

The probability is about 95%.

Following the same procedure, we can show that

  • \(P(-1 < Z < 1) = \Phi(1) - \Phi(-1) \approx 68\%\),
  • \(P(-2 < Z < 2) = \Phi(2) - \Phi(-2) \approx 95\%\), and
  • \(P(-3 < Z < 3) = \Phi(3) - \Phi(-3) \approx 99.7\%\),

leading to the 68-95-99.7 rule for calculating normal probabilities. This rule is helpful for approximating probabilities numerically when a standard normal CDF calculator (like R) is not available. For example, if we wanted to approximate \(P(Z \geq 2.5)\), we could observe that this probability is somewhere between \(P(Z \geq 3)\) and \(P(Z \geq 2)\), which we can obtain using the 68-95-99.7 rule.

A lower bound for this probability is \[ \begin{aligned} P(Z \geq 3) &= \frac{1}{2} P(|Z| \geq 3) & \text{(symmetry)} \\ &= \frac{1}{2}(1 - P(-3 < Z < 3)) & \text{(complement rule)} \\ &\approx \frac{1}{2}(1 - .997) & \text{(68-95-99.7 rule)} \\ &= .0015, \end{aligned} \] and a similar calculation shows an upper bound to be \(P(Z \geq 2) \approx .025\). Therefore, \(P(Z \geq 2.5)\) should be somewhere between 0.15% and 2.50%.

Since we have a calculator handy, we can check this approximation by evaluating \(P(Z \geq 2.5) = 1 - \Phi(2.5)\) to high precision.
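
For instance:

```r
1 - pnorm(2.5)                  # P(Z >= 2.5) = 1 - Phi(2.5)
pnorm(2.5, lower.tail = FALSE)  # equivalent
```

Both lines return approximately \(0.0062\), which indeed falls between the bounds of \(0.0015\) and \(0.025\) from the 68-95-99.7 rule.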

22.3.2 General Normal Distribution

The general normal distribution is defined as a location-scale transformation of a standard normal distribution. That is, the general normal PDF is bell-shaped, like the standard normal PDF (Equation 22.4), but with a different width and centered around a value that is not necessarily \(0\).

Definition 22.4 (Normal Distribution) Let \(Z\) be a standard normal random variable, as defined in Definition 22.3. Then, \(X\) is a \(\text{Normal}(\mu, \sigma^2)\) random variable if

\[ X \overset{\text{def}}{=}\mu + \sigma Z. \tag{22.7}\]

By Proposition 20.3, the PDF of \(X\) is \[ \begin{aligned} f_X(x) &= \frac{1}{\sigma} f_Z\Big(\frac{x - \mu}{\sigma}\Big) \\ &= \frac{1}{\sigma\sqrt{2\pi}} e^{-(x - \mu)^2 / (2\sigma^2)}. \end{aligned} \tag{22.8}\]

However, we do not usually have to work with Equation 22.8 because we can always convert a normal random variable into a standard normal random variable, by a process called standardization. Standardization is simply the inverse of Equation 22.7. It says that \[ Z = \frac{X - \mu}{\sigma}. \tag{22.9}\] The next example illustrates how standardization can be used to solve problems.

Example 22.2 (Calculating normal probabilities) In Example 18.7, we modeled the daily high May temperature (in Celsius) in Iqaluit as a random variable \(C\) with PDF \[ f_C(x) = \frac{1}{7.519885} e^{-x^2/18}; \qquad -\infty < x < \infty. \]

In light of Definition 22.4, we now know that \(C \sim \textrm{Normal}(\mu= 0, \sigma^2= 9)\): matching the exponent with Equation 22.8 gives \(\mu = 0\) and \(2\sigma^2 = 18\), so \(\sigma = 3\). (We can check that \(\sigma\sqrt{2\pi} = 3\sqrt{2\pi} \approx 7.519885\).)

What is \(P(C > 5)\), the probability that the high temperature is above 5 degrees Celsius? The trick is to standardize \(C\) into \(Z\):

\[ \begin{aligned} P(C > 5) &= P\left(\frac{C - 0}{3} > \frac{5 - 0}{3} \right) \\ &= P\left(Z > \frac{5}{3} \right) \\ &= 1 - \Phi\left(\frac{5}{3} \right). \end{aligned} \]

This is the exact answer. However, if we wish to obtain a numerical approximation, we can use R.
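
For example:

```r
1 - pnorm(5 / 3)                                # after standardizing
pnorm(5, mean = 0, sd = 3, lower.tail = FALSE)  # equivalent, without standardizing
```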

The probability is only about 4.78% that the temperature exceeds 5 degrees Celsius.

22.3.3 Expectation and Variance

Because we defined the (general) normal distribution as a location-scale transformation of a standard normal random variable, we can derive the expectation and variance easily. First, we calculate the expectation \(\text{E}\!\left[ Z \right]\) and variance \(\text{Var}\!\left[ Z \right]\) for a standard normal random variable \(Z\). Then, we use Proposition 21.2 and Proposition 21.3 to obtain the expectation and variance of a general normal random variable.

Proposition 22.6 Let \(X\) be a \(\text{Normal}(\mu, \sigma^2)\) random variable.

\[ \begin{aligned} \text{E}\!\left[ X \right] &= \mu & \text{Var}\!\left[ X \right] &= \sigma^2. \end{aligned} \]

First, let \(Z\) be a standard normal (i.e., \(\textrm{Normal}(\mu= 0, \sigma^2= 1)\)) random variable. We calculate its expected value and variance:

\[\begin{aligned} \text{E}\!\left[ Z \right] &= \int_{-\infty}^\infty x \cdot \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \,dx \\ &= 0 \\ \text{Var}\!\left[ Z \right] &= \text{E}\!\left[ Z^2 \right] - \text{E}\!\left[ Z \right]^2 \\ &= \int_{-\infty}^\infty x^2 \cdot \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \,dx - 0^2\\ &= 1 \end{aligned}\] The expectation integral is \(0\) because the integrand is an odd function; the variance integral can be evaluated by integration by parts, as in the proof of Proposition 22.4.

Since \(X = \mu + \sigma Z\), we can use Proposition 21.2 to determine its expected value to be \[ \begin{aligned} \text{E}\!\left[ X \right] &= \text{E}\!\left[ \mu + \sigma Z \right] \\ &= \mu + \sigma \text{E}\!\left[ Z \right] \\ &= \mu \end{aligned} \] and Proposition 21.3 to determine its variance to be \[ \begin{aligned} \text{Var}\!\left[ X \right] &= \text{Var}\!\left[ \mu + \sigma Z \right] \\ &= \sigma^2 \text{Var}\!\left[ Z \right] \\ &= \sigma^2. \end{aligned} \]
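
As with the other distributions in this chapter, a simulation offers a quick sanity check of Proposition 22.6. (The parameter values and sample size below are arbitrary choices.)

```r
# Simulation check of Proposition 22.6 (mu, sigma, sample size chosen arbitrarily)
mu <- 10; sigma <- 3
x <- mu + sigma * rnorm(1e6)  # location-scale transform of standard normal draws
mean(x)  # approximately mu = 10
var(x)   # approximately sigma^2 = 9
```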