18  Random Variables

\[ \def\mean{\textcolor{red}{2.6}} \def\Pois{\text{Poisson}} \]

The random variables that we have encountered so far have all been discrete. That is, these random variables can only take on individually separate and distinct values. For example, the number of babies returned to the correct mother can be 0, 1, 2, or 4, but it cannot be 1.3 or \(\pi\).

Not all random variables are discrete. For example, the distance that a long jump athlete leaps is a random variable. Even a professional’s leaps will vary between attempts. For example, when Jackie Joyner-Kersee (Figure 18.1) won gold at the 1988 Seoul Olympics, she made three successful long jump attempts; the distances were 7.00, 7.16, and 7.40 meters. (The last jump earned her the gold medal and stands as the Olympic record to this day.) Even these distances have already been rounded to the nearest centimeter. If we were to measure the distances more precisely, Joyner-Kersee’s last jump may have actually been 7.398 meters or 7.4035 meters.

When there are too many possible outcomes to count, as in long jump outcomes, the probability of any individual outcome will be zero. Yet the probability that Joyner-Kersee leaps a distance between 7.00 and 7.20 meters is not zero. That is, even if the probability of each individual outcome is zero, the probability that the outcome falls in some interval, like \((7.0, 7.2)\), may be non-zero.

Figure 18.1: U.S. track and field Olympian Jackie Joyner-Kersee

In mathematical terms, if \(X\) represents how far Joyner-Kersee goes, then \(P(X = x) = 0\) for every distance \(x\), and yet \(P(7.0 < X < 7.2) > 0\).

This is just like the dartboard example from Chapter 3! The probability that the dart lands on any particular spot is zero, but the probability that it lands somewhere on the dartboard is certainly not zero!

Random variables like \(X\), where any value in an interval is possible, are called continuous. The next section provides a taste of how to calculate probabilities for a continuous random variable.

18.1 A Simple Continuous Model

Here is a simple model for how far Joyner-Kersee jumps: let’s suppose that she is equally likely to leap any distance between 6.3 and 7.5 meters. Let’s use this model to calculate \(P(7.0 \leq X \leq 7.3)\), the probability that her jump is between 7.0 and 7.3 meters.

To calculate this probability, note that Joyner-Kersee is equally likely to jump any distance within a \(7.5 - 6.3 = 1.2\) meter window. So the probability that it lands in a \(7.3 - 7.0 = .3\) meter window must be \[ P(7.0 < X < 7.3) = \frac{7.3 - 7.0}{7.5 - 6.3} = \frac{.3}{1.2} = .25. \]
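To make the "length of the window over the total length" reasoning concrete, here is a minimal Python sketch. The function name and its default endpoints are assumptions made for this illustration; the only modeling input is that every distance between 6.3 and 7.5 meters is equally likely.

```python
def uniform_interval_prob(a, b, low=6.3, high=7.5):
    """P(a < X < b) when X is equally likely to fall anywhere in [low, high]."""
    # Clip the requested window to the range of possible distances.
    left = max(a, low)
    right = min(b, high)
    if right <= left:
        return 0.0
    # Probability = length of the window / length of the full range.
    return (right - left) / (high - low)


p_window = uniform_interval_prob(7.0, 7.3)
print(round(p_window, 4))
```

Up to floating-point rounding, the printed value should agree with the hand calculation of .25.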

Example 18.1 (Probability of Jumping Less than 7 Meters) What is the probability Joyner-Kersee’s long jump is less than 7 meters?

In other words, what is \(P(X < 7)\)? This is the probability that the distance is in the interval \((6.3, 7.0)\), since our model assumes that she would not jump less than 6.3 meters.

Once again, because all distances within the \(1.2\) meter window are equally likely, the probability that the distance is in the \(.7\)-meter window \((6.3, 7.0)\) is \[ P(X < 7) = P(6.3 < X < 7.0) = \frac{7.0 - 6.3}{7.5 - 6.3} = \frac{.7}{1.2} \approx .5833.\]

Example 18.2 (Probability of Jumping At Most 7 Meters) What about the probability that Joyner-Kersee’s long jump is at most 7 meters?

Although this question sounds very similar to Example 18.1, it is different because besides jumping less than 7 meters, she could also jump exactly 7 meters (\(X = 7\)).

In other words, this question is asking, what is \(P(X \leq 7.00)\)? But the only difference between this probability and the probability in Example 18.1 is \(P(X = 7.00)\), which is zero for a continuous random variable like \(X\).

We can prove this formally using the probability axioms: \[ P(X \leq 7.00) = P(X < 7.00) + \underbrace{P(X = 7.00)}_0. \] So the answer is the same as in Example 18.1, \[ P(X \leq 7.00) = P(X < 7.00) = \frac{.7}{1.2} \approx .5833. \]

Caution!

The answers to Example 18.1 and Example 18.2 are only the same because \(X\) is a continuous random variable. Because \(X\) is continuous, we know that there is zero probability that she jumps exactly \(7.0000...\) meters.

But if \(X\) were discrete, then \(P(X = 7.0)\) may not be zero, in which case \(P(X < 7.00)\) and \(P(X \leq 7.00)\) would be different.

However, this model, known as the uniform distribution, is unlikely to be a good model for the distance of a long jump because it assumes that all distances are equally likely. In reality, very short and very long distances should be less likely than intermediate distances.

In order to develop better models for the distance of a long jump, we need a way to describe general continuous random variables.

18.2 Cumulative Distribution Function

How can we describe continuous random variables like \(X\), how far Joyner-Kersee jumps? In Section 8.3, we saw that random variables can be described by their cumulative distribution function (or CDF, for short) \(F(x)\), which represents the “probability up to \(x\)”.

We can use the CDF to describe continuous random variables, too.

Example 18.3 (CDF of \(X\)) Let’s determine the CDF of \(X\), how far Joyner-Kersee jumps (in meters) under the uniform model: \[ F(x) = P(X \leq x).\]

First, consider the case where \(x\) is between \(6.3\) and \(7.5\). If we follow Example 18.2, the probability that the distance jumped is in the interval \((6.3, x)\) is \[ F(x) = \frac{x - 6.3}{7.5 - 6.3}; 6.3 \leq x \leq 7.5. \]

But what if \(x\) is not between \(6.3\) and \(7.5\) meters?

  • If \(x < 6.3\), then \(F(x) = 0\) because the model assumes that she cannot jump less than 6.3 meters.
  • If \(x > 7.5\), then \(F(x) = 1\), since the model assumes that she cannot jump more than 7.5 meters, so she is guaranteed to jump a distance less than or equal to 7.5.

Putting all the cases together, the CDF can be written as \[F(x) = \begin{cases} 0 & x < 6.3 \\ \frac{x - 6.3}{1.2} & 6.3 \leq x \leq 7.5 \\ 1 & x > 7.5 \end{cases}, \tag{18.1}\] and it looks like this.

Once we have the CDF, it is easy to calculate probabilities by plugging in different values for \(x\).

Example 18.4 (Calculating Probabilities Using the CDF) What is the probability that Joyner-Kersee jumps a distance between 7.0 and 7.3 meters?

\[ \begin{align*} P(7.0 < X < 7.3) &= \underbrace{P(X < 7.3)}_{=P(X \leq 7.3)} - P(X \leq 7.0) \\ &= F(7.3) - F(7.0) \\ &= \frac{7.3 - 6.3}{1.2} - \frac{7.0 - 6.3}{1.2} \\ &= .25. \end{align*} \]
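The same calculation can be sketched in Python. The function name `long_jump_cdf` is made up for this illustration; it simply encodes the three cases of Equation 18.1.

```python
def long_jump_cdf(x, low=6.3, high=7.5):
    """CDF of Equation 18.1: F(x) = P(X <= x) under the uniform model."""
    if x < low:
        return 0.0  # she cannot jump less than `low` meters
    if x > high:
        return 1.0  # she is guaranteed to jump at most `high` meters
    return (x - low) / (high - low)


# P(7.0 < X < 7.3) = F(7.3) - F(7.0)
prob = long_jump_cdf(7.3) - long_jump_cdf(7.0)
print(round(prob, 4))
```

Any other interval probability under this model can be obtained the same way, by subtracting two values of the CDF.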

The properties of the CDF from Section 8.3 carry over to continuous random variables:

  1. non-decreasing with a left limit of 0 and a right limit of 1, and
  2. right-continuous.

In fact, the CDF of a continuous random variable is continuous. That is why they are called continuous random variables in the first place. Any continuous function \(F(x)\) that satisfies these properties is the CDF of a continuous random variable. We can use different CDFs to model non-uniform continuous random variables.

18.3 Probability Density Function

In Chapter 8, we saw that discrete random variables could alternatively be described by their PMF, which specifies the probability of each possible value: \[ f(x) = P(X = x). \] But the PMF is useless for continuous random variables because the probability of any particular value is always zero.

The analog of the PMF for continuous random variables is the probability density function, or PDF for short. We will start with an informal definition, then move to a more formal definition.

18.3.1 Informal Definition

Informally, the PDF is a function \(f(x)\) that describes how likely it is for a random variable to be “near” \(x\). A possible PDF is shown in Figure 18.2. Because \(f(7.2) > f(6.8)\), we know that \(X\) is more likely to be “near” 7.2 than 6.8, even though the probability that \(X\) is equal to either value is zero.

Figure 18.2: A hypothetical PDF \(f(x)\)

As we will see in Proposition 18.1, probabilities correspond to areas under the PDF. Since there cannot be any area at a single point, \(P(X = 6.8)\) and \(P(X = 7.2)\) are zero. But \(P(6.8 \leq X < 6.81)\) and \(P(7.2 \leq X < 7.21)\) are not zero and correspond to the areas of the red shaded regions in Figure 18.2. By comparing these areas, we can determine that

\[ P(6.8 \leq X < 6.81) < P(7.2 \leq X < 7.21). \] For this reason, we say that \(X\) is more likely to be “near” 7.2 than 6.8.

18.3.2 Formal Definition

Let’s define the PDF more formally. If we want the PDF \(f(x)\) to describe the probability that the random variable is “near” \(x\), we need the probability that \(X\) is in a small interval \([x, x + \varepsilon)\) to be \[ P(x \leq X < x + \varepsilon) \approx f(x) \cdot \varepsilon. \tag{18.2}\] Notice that this probability gets smaller as we decrease \(\varepsilon\). This makes sense because as we decrease \(\varepsilon\), the probability approaches \(P(X = x)\), which is zero for any continuous random variable.

Now, we can rewrite the left-hand side of Equation 18.2 in terms of the CDF so that Equation 18.2 becomes: \[ F(x + \varepsilon) - F(x) \approx f(x) \cdot \varepsilon. \] Next, we divide both sides by \(\varepsilon\) and take the limit as \(\varepsilon\) approaches 0 (since the approximation is only meant for small \(\varepsilon\)). In the limit, the approximation becomes an equality: \[ \begin{aligned} \underbrace{\lim_{\varepsilon\to 0} \frac{F(x + \varepsilon) - F(x)}{\varepsilon}}_{F'(x)} = f(x). \end{aligned}\]

This is the basis for a rigorous definition of the PDF \(f(x)\).

Definition 18.1 (Probability Density Function) The probability density function (or PDF) \(f(x)\) of a continuous random variable is the derivative of its CDF:

\[ f(x) = F'(x). \]

18.3.3 Examples

Let’s use Definition 18.1 to determine the PDF of \(X\), the distance that Joyner-Kersee jumps under the uniform model.

Example 18.5 (Determining the PDF of \(X\)) Starting from the CDF \(F\) given by Example 18.3, let’s calculate the PDF of \(X\) using Definition 18.1.

\[ f(x) = F'(x), \]

We can calculate the derivative piece by piece. Since \[F(x) = \begin{cases} 0 & x < 6.3 \\ \frac{x - 6.3}{1.2} & 6.3 \leq x \leq 7.5 \\ 1 & x > 7.5 \end{cases},\] we can take the derivative of each case separately to obtain: \[F'(x) = \begin{cases} 0 & x < 6.3 \\ \frac{1}{1.2} & 6.3 \leq x \leq 7.5 \\ 0 & x > 7.5 \end{cases}.\]

We can simplify the PDF of \(X\) to \[ f(x) = \begin{cases} \frac{1}{1.2} & 6.3 \leq x \leq 7.5 \\ 0 & \text{otherwise} \end{cases} \tag{18.3}\] and graph it as follows.

Figure 18.3: PDF of \(X\)

This PDF makes intuitive sense. It is constant between \(6.3\) and \(7.5\) meters to reflect the uniform model’s assumption that Joyner-Kersee is equally likely to jump any distance between 6.3 and 7.5 meters. It is zero outside of this range because the model assumes that these are the only possible distances. The range where a PDF is non-zero is called the support of the random variable. So the support of \(X\) is \((6.3, 7.5)\).
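As a numerical sanity check on Example 18.5, the sketch below approximates the derivative of the CDF with the difference quotient from Equation 18.2. The function names and the step size `eps` are choices made for this illustration, not part of the model.

```python
def long_jump_cdf(x, low=6.3, high=7.5):
    """CDF of the uniform model (Equation 18.1)."""
    if x < low:
        return 0.0
    if x > high:
        return 1.0
    return (x - low) / (high - low)


def approx_pdf(cdf, x, eps=1e-6):
    """Difference quotient (F(x + eps) - F(x)) / eps, an approximation to F'(x)."""
    return (cdf(x + eps) - cdf(x)) / eps


# Two points outside the range of possible distances and two points inside it.
for x in (6.0, 6.8, 7.2, 8.0):
    print(x, round(approx_pdf(long_jump_cdf, x), 4))
```

Inside the range of possible distances, the quotient should be close to \(\frac{1}{1.2} \approx .8333\); outside of it, the quotient is zero.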

Units of the PDF

What are the units on the vertical axis of Figure 18.3? In other words, what are the units of the PDF? Unlike a PMF, the values of a PDF are not probabilities. Instead, they represent probabilities per unit. In the case of the PDF \(f(x)\), the units are “probability per meter”.

We can also calculate probabilities using the PDF. The next result follows from Definition 18.1 and the Fundamental Theorem of Calculus.

Proposition 18.1 (Integrate the PDF to Get Probabilities) To calculate the probability that a continuous random variable \(X\) falls in some interval \((a, b)\), integrate the PDF \(f(x)\) over that interval: \[ P(a < X < b) = \int_a^b f(x)\,dx. \tag{18.4}\]

Proof. Note that we already know how to calculate the probability using the CDF \(F(x)\): \[ \begin{align*} P(a < X < b) &= P(a < X \leq b) & \text{(since $P(X = b) = 0$)} \\ &= P(X \leq b) - P(X \leq a) \\ &= F(b) - F(a). \end{align*}\] Now, since \(F\) is an antiderivative of \(f\) by Definition 18.1, we can use the Fundamental Theorem of Calculus to conclude that \[ F(b) - F(a) = \int_a^b f(x)\,dx. \]

In other words, areas under the PDF correspond to probabilities. The next example shows how we could have solved Example 18.4 using the PDF.

Example 18.6 (Calculating Probabilities Using the PDF) The probability that Joyner-Kersee jumps a distance between 7.0 and 7.3 meters is equal to the area of the red shaded region below.

We can calculate the area of the red shaded region in two ways:

  1. By geometry, the red shaded region is a rectangle with base \((7.3 - 7.0) = .3\ \text{meters}\) and height \(\frac{1}{1.2} \frac{\text{probability}}{\text{meter}}\), so the probability is \[ P(7.0 < X < 7.3) = (.3\ \text{meters}) \cdot (\frac{1}{1.2}\ \frac{\text{probability}}{\text{meter}}) = .25.\] Notice how the units cancel to give a probability in the end.
  2. By calculus, the area under the PDF between \(7.0\) and \(7.3\) is \[ \begin{align*} P(7.0 < X < 7.3) = \int_{7.0}^{7.3} f(x)\,dx &= \int_{7.0}^{7.3} \frac{1}{1.2} \,dx \\ &= \frac{1}{1.2} x \Big|_{7.0}^{7.3} = \frac{1}{1.2} (7.3) - \frac{1}{1.2} (7.0) = .25. \end{align*} \] Note that the second line is just calculus. Probability tells us what integral to set up, but calculus computes the integral. Many students of probability find continuous random variables more challenging, not necessarily because of the probability but because of the calculus.
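When an integral is tedious to do by hand, the area can also be approximated numerically. The sketch below uses a simple midpoint rule as a stand-in for the exact calculus. The helper names and the number of slices `n` are assumptions of this illustration.

```python
def long_jump_pdf(x, low=6.3, high=7.5):
    """PDF of Equation 18.3: constant on [low, high], zero elsewhere."""
    return 1.0 / (high - low) if low <= x <= high else 0.0


def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


# Area under the PDF between 7.0 and 7.3, i.e. P(7.0 < X < 7.3).
prob = integrate(long_jump_pdf, 7.0, 7.3)
print(round(prob, 4))
```

Because this PDF is constant on the interval, the midpoint rule reproduces the rectangle area from the geometric argument, up to floating-point rounding.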

What is the total area under a PDF, \(\int_{-\infty}^\infty f(x)\,dx\)? This is just the probability that the random variable is any real number, which is always 1. This means that it is not necessary to include a scale on the \(y\)-axis; the scale is whatever makes the total area equal to 1.

Together with the requirement that \(f(x) \geq 0\) to avoid negative probabilities, this property defines a valid PDF. That is, any function \(f(x)\) that satisfies the two properties:

  1. \(\int_{-\infty}^\infty f(x)\,dx = 1\)
  2. \(f(x) \geq 0\)

could represent the PDF of a continuous random variable.

We can use Property 1 to determine the scale of the PDF when it is not given, as shown in the next example.

Example 18.7 (Temperature in Iqaluit) In 1999, Canada carved a new territory out of the Northwest Territories to be independently governed by the native Inuit. This new territory, called Nunavut, occupies the northernmost reaches of the Earth.

On the whole, Nunavut is very cold. In the capital, Iqaluit, the high temperature (in Celsius) for a day in May is equally likely to be below freezing as above freezing. More specifically, the daily high temperature \(C\) can be modeled as a continuous random variable with PDF \[ f_C(x) = \frac{1}{k} e^{-x^2/18}; -\infty < x < \infty \tag{18.5}\] where \(k\) is a constant. Since \(k\) is unspecified, we do not know the scale of the PDF. However, we do know that it has the shape shown in Figure 18.4.

Inuit women and child

Figure 18.4: PDF of the daily high temperature (in Celsius) in Iqaluit in May

Note that this PDF extends infinitely far in both directions. This reminds us that models like Equation 18.5 are only approximations. As you might have learned in chemistry class, temperatures cannot be less than absolute zero \((-273.15^\circ C)\), but the probability of that event under this model is so small that it does not affect the practical usefulness of this approximation.

To quote the great statistician George Box, “All models are wrong, but some are useful.” Equation 18.5 can still be a useful model of temperature, even if it is technically wrong.

George Box (1919-2013)

To determine the scale \(k\), which is called a normalizing constant, we use the property that the total area under any PDF must be 1: \[ \begin{aligned} \int_{-\infty}^\infty \frac{1}{k} e^{-x^2/18}\,dx &= 1 & \Rightarrow & & k &= \int_{-\infty}^\infty e^{-x^2/18}\,dx. \end{aligned} \]

Unfortunately, the expression \(e^{-x^2 / 18}\) has no elementary antiderivative, so this integral cannot be evaluated by hand. But we can use R to get a numerical approximation:

So the PDF (Equation 18.5) is approximately \[ f_C(x) \approx \frac{1}{7.519885} e^{-x^2/18}; -\infty < x < \infty. \]
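As an independent check of this value, here is a minimal Python sketch that approximates the same integral with a hand-rolled Simpson's rule and compares it with the closed form \(\sqrt{18\pi}\) that follows from the Gaussian integral. The function names, the truncation of the range to \([-60, 60]\), and the number of subintervals are all assumptions of this illustration.

```python
import math


def unnormalized_density(x):
    """Shape of the PDF in Equation 18.5, without the 1/k factor."""
    return math.exp(-x**2 / 18)


def simpson(f, a, b, n=2_000):
    """Composite Simpson's rule with n subintervals (n is made even if needed)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior points alternate between weight 4 (odd i) and weight 2 (even i).
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


# Assumption: the area outside [-60, 60] is negligible, because
# exp(-60**2 / 18) = exp(-200) is far below floating-point precision here.
k = simpson(unnormalized_density, -60.0, 60.0)
print(round(k, 6))
print(round(math.sqrt(18 * math.pi), 6))
```

The two printed numbers should agree with each other and with the normalizing constant quoted above.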

18.4 Case Study: Radioactive Particles

In Example 12.7, we introduced the Geiger counter, a device that measures the level of ionizing radiation. It makes a clicking sound each time an ionization event is detected. The clicks occur at random times, and the times at which they occur are well modeled as a Poisson process. This means that the total number of clicks, counting from time \(0\) to time \(t\), is a random variable \(N_t\), which follows a \(\Pois(\mu=\lambda t)\) distribution, where \(\lambda\) is the rate of clicks. (The higher the value of \(\lambda\), the higher the level of radiation in the air.)

Suppose that we turn on the Geiger counter in a building with \(\lambda = \mean\) clicks per minute. Let \(T\) be the time of the first click (in minutes), counting from the moment that we powered up the device. In Poisson process lingo, \(T\) is called the “first arrival time”.

Example 18.8 (CDF of the First Arrival Time) What is the CDF of \(T\)?

First, we use the complement rule: \[ F_T(x) = P(T \leq x) = 1 - P(T > x). \] Note that the CDF is zero, unless \(x\) is positive.

In words, the event \(\{ T > x \}\) means, “The first arrival happened after time \(x\).” Another way to say the same thing is “No arrivals happened between time \(0\) and time \(x\).” Therefore, \(\{T > x\}\) is the same event as \(\{ N_x = 0 \}\).

This is helpful because we know that \(N_x\) follows a \(\Pois(\mu=\mean x)\) distribution and can calculate this probability. The CDF of \(T\) is \[ \begin{aligned} F_T(x) &= 1 - P(T > x) \\ &= 1 - P(N_{x} = 0) \\ &= 1 - e^{-\mean x} \frac{(\mean x)^0}{0!} \\ &= 1 - e^{-\mean x}. \end{aligned} \tag{18.6}\]

Remember that this formula is valid when \(x\) is positive. The CDF is zero otherwise. Therefore, \(\{ x > 0\}\) is the support of \(T\), and we can express the full CDF as \[ F_T(x) = \begin{cases} 1 - e^{-\mean x} & x > 0 \\ 0 & x \leq 0 \end{cases}. \]
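The argument in Example 18.8 can be mirrored step by step in code. In this sketch, `RATE` is the value of \(\lambda\) used in the text (2.6 clicks per minute); the function names are made up for the illustration.

```python
import math

RATE = 2.6  # clicks per minute, the value of lambda used in the text


def poisson_pmf(n, mu):
    """P(N = n) for N following a Poisson(mu) distribution."""
    return math.exp(-mu) * mu**n / math.factorial(n)


def first_arrival_cdf(x, rate=RATE):
    """F_T(x) = 1 - P(N_x = 0) for x > 0, and zero otherwise."""
    if x <= 0:
        return 0.0
    # {T > x} is the same event as {no arrivals between time 0 and time x}.
    return 1.0 - poisson_pmf(0, rate * x)


for x in (-1.0, 0.5, 1.0, 2.0):
    print(x, round(first_arrival_cdf(x), 4))
```

For positive \(x\), each value should match \(1 - e^{-2.6 x}\), since the Poisson probability of zero arrivals reduces to \(e^{-2.6 x}\).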

Now, we can use Definition 18.1 to derive the PDF from the CDF.

Example 18.9 (PDF of the First Arrival Time) What is the PDF of \(T\)?

We take the derivative of the CDF (Equation 18.6) to obtain the PDF: \[ \begin{aligned} f_T(x) &= F'_T(x) \\ &= \mean e^{-\mean x}. \end{aligned}\] This formula is valid on the support of \(T\), \(x > 0\). The PDF is zero elsewhere.

Therefore, we can write the PDF as: \[ f_T(x) = \begin{cases} \mean e^{-\mean x} & x > 0 \\ 0 & \text{otherwise} \end{cases}. \tag{18.7}\]

So what does the PDF in Equation 18.7 look like? Let’s graph it.

Figure 18.5: PDF of \(T\)

We see that at a rate of \(\lambda = \mean\) clicks per minute, the first click is most likely to happen soon after the Geiger counter is turned on, but there is a small (but non-zero) probability that we could be waiting for a while.

Let’s use the CDF and the PDF to calculate the probability that we need to wait more than 1.2 minutes for the first click.

Example 18.10 (Probability of the First Arrival Time) What is the probability that it takes more than \(1.2\) minutes for the Geiger counter to register a click?

We can answer this question by plugging \(1.2\) into the CDF: \[ P(T > 1.2) = 1 - F_T(1.2) = 1 - (1 - e^{-\mean (1.2)}) \approx .044 \] or by integrating the PDF from \(1.2\) to \(\infty\): \[ \begin{aligned} P(T > 1.2) &= \int_{1.2}^\infty \mean e^{-\mean x}\,dx \\ &= -e^{-\mean x} \Big|_{1.2}^\infty = (-0) - (-e^{-\mean(1.2)}) \approx .044. \end{aligned} \] Once again, it is important to separate the probability (setting up the integral in the first line) from the calculus (computing the integral in the second line). This integral corresponds to calculating the area of the red shaded region below.

Figure 18.6: The probability that the first arrival happens after 1.2 minutes is about 4.4%.
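Both routes to \(P(T > 1.2)\) can be checked numerically. The sketch below evaluates the CDF directly and also integrates the PDF with a midpoint rule. The helper names, the number of slices, and the decision to stop the integral at \(x = 20\) instead of \(\infty\) are assumptions of this illustration.

```python
import math

RATE = 2.6  # clicks per minute, the value of lambda used in the text


def first_arrival_pdf(x, rate=RATE):
    """PDF of Equation 18.7."""
    return rate * math.exp(-rate * x) if x > 0 else 0.0


def first_arrival_cdf(x, rate=RATE):
    """CDF of Equation 18.6, extended by zero for x <= 0."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


via_cdf = 1.0 - first_arrival_cdf(1.2)
# Assumption: the tail beyond x = 20 is negligible (exp(-2.6 * 20) is below 1e-22).
via_pdf = integrate(first_arrival_pdf, 1.2, 20.0)
print(round(via_cdf, 3), round(via_pdf, 3))
```

Both numbers should round to .044, matching the two hand calculations.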

18.5 Calculating the CDF from the PDF

We have seen several examples (Example 18.5, Example 18.9) where we determined the PDF by taking the derivative of the CDF. What if we wanted to go in the other direction? If you guessed that we integrate, you would be correct!

Proposition 18.2 (PDF to CDF) To go from the PDF \(f(x)\) to the CDF \(F(x)\), we integrate: \[ F(x) = \int_{-\infty}^x f(t)\,dt. \]

By Proposition 18.1, \[ F(x) = P(X \leq x) = P(-\infty < X \leq x) = \int_{-\infty}^x f(t)\,dt. \] Note that because \(x\) already appears in the limits of integration, we use \(t\) as the variable inside the integral to avoid mixing the two up.

Let’s apply Proposition 18.2 to an example.

Example 18.11 Starting from the PDF \(f\) given by Equation 18.3, let’s calculate the CDF of \(X\) using Proposition 18.2.

\[ F(x) = \int_{-\infty}^x f(t)\,dt, \] There are three cases:

  1. If \(x \leq 6.3\), then \(f(t) = 0\) for all \(t\) up to \(x\), so \[F(x) = 0.\]
  2. If \(6.3 < x < 7.5\), then \(f(t) = \frac{1}{1.2}\) between \(6.3\) and \(x\), so \[ F(x) = \int_{6.3}^{x} \frac{1}{1.2}\,dt = \frac{1}{1.2} t \Big|_{6.3}^x = \frac{1}{1.2} (x - 6.3) = \frac{x - 6.3}{1.2}.\]
  3. If \(x \geq 7.5\), then we have all of the area between \(6.3\) and \(7.5\), so \[ F(x) = \int_{6.3}^{7.5} \frac{1}{1.2}\,dt = \frac{1}{1.2} t \Big|_{6.3}^{7.5} = \frac{1}{1.2} (7.5 - 6.3) = 1. \]

Putting the three cases together, we get exactly the CDF \(F\) from Equation 18.1.
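The three cases can also be checked numerically by accumulating area under the PDF and comparing the result with the closed-form CDF. In this sketch, the function names are made up, and the lower limit \(-\infty\) is replaced by `lower = 6.0`, which is safe only because this particular PDF is zero below 6.3.

```python
def long_jump_pdf(x, low=6.3, high=7.5):
    """PDF of Equation 18.3."""
    return 1.0 / (high - low) if low <= x <= high else 0.0


def long_jump_cdf(x, low=6.3, high=7.5):
    """Closed-form CDF of Equation 18.1, used here for comparison."""
    if x < low:
        return 0.0
    if x > high:
        return 1.0
    return (x - low) / (high - low)


def cdf_from_pdf(pdf, x, lower=6.0, n=20_000):
    """Midpoint-rule approximation of the integral of pdf from `lower` to x."""
    if x <= lower:
        return 0.0
    width = (x - lower) / n
    return sum(pdf(lower + (i + 0.5) * width) for i in range(n)) * width


# One point from each case, plus an extra point inside the support.
for x in (6.0, 6.9, 7.3, 8.0):
    print(x, round(cdf_from_pdf(long_jump_pdf, x), 3), round(long_jump_cdf(x), 3))
```

The two columns should agree up to the small error that the midpoint rule makes near the jumps of the PDF at 6.3 and 7.5.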

18.6 Exercises

Exercise 18.1 Harry and Sally agree to meet at Katz’s Deli at noon. But punctuality is not Harry’s strong suit; he is equally likely to arrive any time between 11:50 AM (10 minutes early) and 12:30 PM (30 minutes late). Let \(H\) be the random variable representing how late Harry is, in minutes. (A negative value of \(H\) would mean that Harry is early.)

  1. What continuous model would be appropriate for \(H\)? Write down the PDF and CDF of \(H\).
  2. Express the event that Harry arrives on time in terms of \(H\) and calculate its probability.

Exercise 18.2 Suppose we have quantitative data, such as stock prices or country populations. What does the distribution of first digits look like? That is, what percentage of observations do you expect to start with the digit 1? What about the digit 9?

If you’ve never tried this, look up a list of stock prices or country populations and count how many start with a 1. It may be more than you expect! This phenomenon is called Benford’s Law.

Here is one model that explains Benford’s Law. Suppose the quantitative data can be modeled by a random variable \(X\) with PDF \[ f(x) = \begin{cases}\frac{c}{x^2} & x \geq 6 \\ 0 & \text{otherwise} \end{cases}.\]

  1. Determine the value of \(c\) that makes this a valid PDF.
  2. Calculate \(P(\text{first digit of $X$ is 1})\). (Hint: You will have to calculate the probability of disjoint intervals. These probabilities form a geometric series.)
  3. Calculate \(P(\text{first digit of $X$ is 9})\) and compare with your answer to part 2.