\[ \newcommand{\or}{\textrm{ or }} \newcommand{\and}{\textrm{ and }} \newcommand{\not}{\textrm{not }} \newcommand{\Pois}{\textrm{Poisson}} \newcommand{\E}{\textrm{E}} \newcommand{\var}{\textrm{Var}} \]
\[ \def\mean{\textcolor{red}{2.6}} \def\Pois{\text{Poisson}} \]
The random variables that we have encountered so far have all been discrete. That is, these random variables can only take on individually separate and distinct values. For example, the number of babies returned to the correct mother can be 0, 1, 2, or 4, but it cannot be 1.3 or \(\pi\).
Not all random variables are discrete. For example, the distance that a long jump athlete leaps is a random variable. Even a professional’s leaps will vary between attempts. For example, when Jackie Joyner-Kersee (Figure 18.1) won gold at the 1988 Seoul Olympics, she made three successful long jump attempts; the distances were 7.00, 7.16, and 7.40 meters. (The last jump earned her the gold medal and stands as the Olympic record to this day.) Even these distances have already been rounded to the nearest centimeter. If we were to measure the distances more precisely, Joyner-Kersee’s last jump may have actually been 7.398 meters or 7.4035 meters.
When there are too many possible outcomes to count, as in long jump outcomes, the probability of any individual outcome will be zero. Yet the probability that Joyner-Kersee leaps a distance between 7.00 and 7.20 meters is not zero. That is, even if the probability of each individual outcome is zero, the probability that the outcome falls in some interval, like \((7.0, 7.2)\), may be non-zero.
In mathematical terms, if \(X\) represents how far Joyner-Kersee goes, then \(P(X = x) = 0\) for every individual distance \(x\), yet \(P(7.0 < X < 7.2) > 0\).
This is just like the dartboard example from Chapter 3! The probability that the dart lands on any particular spot is zero, but the probability that it lands somewhere on the dartboard is certainly not zero!
Random variables like \(X\), where any value in an interval is possible, are called continuous. The next section provides a taste of how to calculate probabilities for a continuous random variable.
Here is a simple model for how far Joyner-Kersee jumps: let’s suppose that she is equally likely to leap any distance between 6.3 and 7.5 meters. Let’s use this model to calculate \(P(7.0 \leq X \leq 7.3)\), the probability that her jump is between 7 and 7.3 meters.
To calculate this probability, note that Joyner-Kersee is equally likely to jump any distance within a \(7.5 - 6.3 = 1.2\) meter window. So the probability that her jump lands in a \(7.3 - 7.0 = 0.3\) meter window must be \[ P(7.0 \leq X \leq 7.3) = \frac{7.3 - 7.0}{7.5 - 6.3} = \frac{0.3}{1.2} = 0.25. \]
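This calculation is easy to check in Python. Here is a small sketch; the helper function and its name are ours, not from the text:

```python
# Uniform model: the jump distance X is equally likely to be any
# value between 6.3 and 7.5 meters.
LO, HI = 6.3, 7.5

def uniform_prob(a, b):
    """P(a <= X <= b) for X uniform on (LO, HI)."""
    a, b = max(a, LO), min(b, HI)   # clip the interval to the support
    return max(b - a, 0.0) / (HI - LO)

print(round(uniform_prob(7.0, 7.3), 4))  # 0.25
```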
However, the uniform distribution is unlikely to be a good model for the distance of a long jump, because it assumes that all distances are equally likely; in reality, very short and very long distances should be less likely than intermediate distances.
In order to develop better models for the distance of a long jump, we need a way to describe general continuous random variables.
How can we describe continuous random variables like \(X\), how far Joyner-Kersee jumps? In Section 8.3, we saw that random variables can be described by their cumulative distribution function (or CDF, for short) \(F(x) = P(X \leq x)\), which represents the “probability up to \(x\)”.
We can use the CDF to describe continuous random variables, too.
Once we have the CDF, it is easy to calculate probabilities by plugging in different values for \(x\).
The properties of the CDF from Section 8.3 carry over to continuous random variables:
In fact, the CDF of a continuous random variable is continuous. That is why they are called continuous random variables in the first place. Any continuous function \(F(x)\) that satisfies these properties is the CDF of a continuous random variable. We can use different CDFs to model non-uniform continuous random variables.
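For concreteness, here is a sketch of the CDF for the uniform jump model above (the function is our own illustration). Note how probabilities of intervals come out as differences of CDF values:

```python
def F(x, lo=6.3, hi=7.5):
    """CDF of the uniform jump model: F(x) = P(X <= x)."""
    if x < lo:
        return 0.0
    if x > hi:
        return 1.0
    return (x - lo) / (hi - lo)

# Probabilities of intervals are differences of CDF values:
print(round(F(7.3) - F(7.0), 4))  # 0.25
```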
In Chapter 8, we saw that discrete random variables could alternatively be described by their PMF, which specifies the probability of each possible value: \[ f(x) = P(X = x). \] But the PMF is useless for continuous random variables because the probability of any particular value is always zero.
The analog of the PMF for continuous random variables is the probability density function, or PDF for short. We will start with an informal definition, then move to a more formal definition.
Informally, the PDF is a function \(f(x)\) that describes how likely it is for a random variable to be “near” \(x\). A possible PDF is shown in Figure 18.2. Because \(f(7.2) > f(6.8)\), we know that \(X\) is more likely to be “near” 7.2 than 6.8, even though the probability that \(X\) is equal to either value is zero.
As we will see in Proposition 18.1, probabilities correspond to areas under the PDF. Since there cannot be any area at a single point, \(P(X = 6.8)\) and \(P(X = 7.2)\) are zero. But \(P(6.8 \leq X < 6.81)\) and \(P(7.2 \leq X < 7.21)\) are not zero and correspond to the areas of the red shaded regions in Figure 18.2. By comparing these areas, we can determine that
\[ P(6.8 \leq X < 6.81) < P(7.2 \leq X < 7.21). \] For this reason, we say that \(X\) is more likely to be “near” 7.2 than 6.8.
Let’s define the PDF more formally. If we want the PDF \(f(x)\) to describe the probability that the random variable is “near” \(x\), we need the probability that \(X\) is in a small interval \([x, x + \varepsilon)\) to be \[ P(x \leq X < x + \varepsilon) \approx f(x) \cdot \varepsilon. \tag{18.2}\] Notice that this probability gets smaller as we decrease \(\varepsilon\). This makes sense because as we decrease \(\varepsilon\), the probability approaches \(P(X = x)\), which is zero for any continuous random variable.
Now, we can rewrite the left-hand side of Equation 18.2 in terms of the CDF so that Equation 18.2 becomes: \[ F(x + \varepsilon) - F(x) \approx f(x) \cdot \varepsilon. \] Next, we divide both sides by \(\varepsilon\) and take the limit as \(\varepsilon\) approaches 0, where the approximation becomes exact: \[ \underbrace{\lim_{\varepsilon\to 0} \frac{F(x + \varepsilon) - F(x)}{\varepsilon}}_{F'(x)} = f(x). \]
This is the basis for a rigorous definition of the PDF \(f(x)\).
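Here is a quick numerical check of Equation 18.2, using an arbitrary continuous CDF of our own choosing, \(F(x) = x^3\) on \([0, 1]\), whose derivative is \(f(x) = 3x^2\):

```python
def F(x):
    """An arbitrary continuous CDF: F(x) = x^3 on [0, 1]."""
    return min(max(x, 0.0), 1.0) ** 3

def f(x):
    """Its derivative, the PDF: f(x) = 3x^2 on [0, 1]."""
    return 3 * x**2 if 0 <= x <= 1 else 0.0

x, eps = 0.5, 1e-6
prob = F(x + eps) - F(x)         # P(x <= X < x + eps): tiny ...
ratio = prob / eps               # ... but prob/eps is close to f(x)
print(abs(ratio - f(x)) < 1e-3)  # True
```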
Let’s use Definition 18.1 to determine the PDF of \(X\), the distance that Joyner-Kersee jumps under the uniform model.
This PDF makes intuitive sense. It is constant between \(6.3\) and \(7.5\) meters to reflect the uniform model’s assumption that Joyner-Kersee is equally likely to jump any distance between 6.3 and 7.5 meters. It is zero outside of this range because the model assumes that these are the only possible distances. The range where a PDF is non-zero is called the support of the random variable. So the support of \(X\) is \((6.3, 7.5)\).
We can also calculate probabilities using the PDF. The next result follows from Definition 18.1 and the Fundamental Theorem of Calculus.
In other words, areas under the PDF correspond to probabilities. The next example shows how we could have solved Example 18.4 using the PDF.
What is the total area under a PDF, \(\int_{-\infty}^\infty f(x)\,dx\)? This is just the probability that the random variable is any real number, which is always 1. This means that it is not necessary to include a scale on the \(y\)-axis; the scale is whatever makes the total area equal to 1.
Together with the requirement that \(f(x) \geq 0\) to avoid negative probabilities, this property defines a valid PDF. That is, any function \(f(x)\) that satisfies the two properties:
could represent the PDF of a continuous random variable.
We can use Property 1 to determine the scale of the PDF when it is not given, as shown in the next example.
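As a hypothetical illustration of this idea (this mini-example is ours, not the text’s): suppose \(f(x) = cx\) on \((0, 2)\) and zero elsewhere. Property 1 requires \(\int_0^2 cx\,dx = 2c = 1\), so \(c = 1/2\). A numerical check:

```python
# Hypothetical PDF (our example): f(x) = c*x on (0, 2), 0 elsewhere.
# Property 1 forces the total area, 2c, to equal 1, so c = 1/2.
def total_area(c, n=100_000):
    """Midpoint Riemann sum of c*x over [0, 2]."""
    h = 2 / n
    return sum(c * (i + 0.5) * h * h for i in range(n))

print(round(total_area(0.5), 6))  # 1.0
```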
In Example 12.7, we introduced the Geiger counter, a device that measures the level of ionizing radiation. It makes a clicking sound each time an ionization event is detected. The clicks occur at random times, and these times are well modeled by a Poisson process. This means that the total number of clicks from time \(0\) to time \(t\) is a random variable \(N_t\), which follows a \(\Pois(\mu=\lambda t)\) distribution, where \(\lambda\) is the rate of clicks. (The higher the value of \(\lambda\), the higher the level of radiation in the air.)
Suppose that we turn on the Geiger counter in a building with \(\lambda = \mean\) clicks per minute. Let \(T\) be the time of the first click (in minutes), counting from the moment that we powered up the device. In Poisson process lingo, \(T\) is called the “first arrival time”.
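In brief, the key step in finding the CDF of \(T\): the first click happens after time \(t\) exactly when there are zero clicks in \([0, t]\). Using the \(\Pois(\lambda t)\) PMF at \(0\), \[ P(T > t) = P(N_t = 0) = e^{-\lambda t}, \qquad\text{so}\qquad F(t) = P(T \leq t) = 1 - e^{-\lambda t} \quad \text{for } t \geq 0. \]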
Now, we can use Definition 18.1 to derive the PDF from the CDF.
So what does the PDF in Equation 18.7 look like? Let’s graph it.
We see that at a rate of \(\lambda = \mean\) clicks per minute, the first click is most likely to happen soon after the Geiger counter is turned on, but there is a small (but non-zero) probability that we could be waiting for a while.
Let’s use the CDF and the PDF to calculate the probability that we need to wait more than 1.2 seconds for the first click.
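A quick numerical sketch of this calculation, using the survival probability \(P(T > t) = e^{-\lambda t}\). Note that \(\lambda\) is in clicks per minute, so we convert 1.2 seconds to minutes first:

```python
import math

lam = 2.6        # clicks per minute, as in the text
t = 1.2 / 60     # 1.2 seconds, converted to minutes

# Survival probability from the CDF: P(T > t) = 1 - F(t) = e^(-lam*t)
p = math.exp(-lam * t)
print(round(p, 4))  # 0.9493
```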
We have seen several examples (Example 18.5, Example 18.9) where we determined the PDF by taking the derivative of the CDF. What if we wanted to go in the other direction? If you guessed that we integrate, you would be correct!
Let’s apply Proposition 18.2 to an example.
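As a numerical sanity check of the PDF-to-CDF direction (a sketch of ours, using the Geiger-counter PDF \(f(t) = \lambda e^{-\lambda t}\)), integrating the PDF with a midpoint rule recovers the closed-form CDF:

```python
import math

lam = 2.6  # clicks per minute

def f(t):
    """Exponential PDF of the first arrival time."""
    return lam * math.exp(-lam * t) if t >= 0 else 0.0

def F_numeric(t, n=10_000):
    """CDF recovered by integrating f from 0 to t (midpoint rule)."""
    h = t / n
    return sum(f((i + 0.5) * h) for i in range(n)) * h

t = 1.0
print(abs(F_numeric(t) - (1 - math.exp(-lam * t))) < 1e-6)  # True
```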
Exercise 18.1 Harry and Sally agree to meet at Katz’s Deli at noon. But punctuality is not Harry’s strong suit; he is equally likely to arrive any time between 11:50 AM (10 minutes early) and 12:30 PM (30 minutes late). Let \(H\) be the random variable representing how late Harry is, in minutes. (A negative value of \(H\) would mean that Harry is early.)
Exercise 18.2 Suppose we have quantitative data, such as stock prices or country populations. What does the distribution of first digits look like? That is, what percentage of observations do you expect to start with the digit 1? What about the digit 9?
If you’ve never tried this, look up a list of stock prices or country populations and count how many start with a 1. It may be more than you expect! This phenomenon is called Benford’s Law.
Here is one model that explains Benford’s Law. Suppose the quantitative data can be modeled by a random variable \(X\) with PDF \[ f(x) = \begin{cases}\frac{c}{x^2} & x \geq 6 \\ 0 & \text{otherwise} \end{cases}.\]
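One way to explore this model numerically (our own simulation sketch, which takes as given the normalizing constant \(c = 6\) that follows from Property 1): the CDF is \(F(x) = 1 - 6/x\) for \(x \geq 6\), so inverse-CDF sampling gives \(X = 6/(1-U)\) for \(U\) uniform on \((0, 1)\).

```python
import random

random.seed(0)
n = 100_000
counts = {str(d): 0 for d in range(1, 10)}
for _ in range(n):
    u = random.random()
    x = 6 / (1 - u)          # inverse-CDF sample from f(x) = 6/x^2, x >= 6
    counts[str(x)[0]] += 1   # tally the leading digit

for d in "19":
    print(d, round(counts[d] / n, 3))  # leading 1s far outnumber leading 9s
```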