20  Transformations

\[ \def\defeq{\overset{\text{def}}{=}} \def\mean{\textcolor{red}{2.6}} \]

In applications, we are usually less interested in the random variable \(X\) than in a transformation of \(X\), \(g(X)\). This is a new random variable with its own distribution. In this chapter, we will learn how to derive the PDF of \(Y = g(X)\) from the PDF of \(X\).

One situation in which transformations naturally arise is unit conversion, such as converting a temperature from degrees Celsius to degrees Fahrenheit.

Here is another situation in which transformations arise. The long jump is one of the seven events in the heptathlon. Competitors earn points for each event, and the competitor with the most total points is the winner. But the events are measured in different units: some results are distances in meters, while others are times in seconds.

How do we put all these different contests on the same scale? That brings us to the subject of transformations.

Example 20.1 The result of each heptathlon contest is transformed in a different way so that the resulting points are on a similar scale. For example, the transformation that is used for scoring the long jump in the women’s heptathlon is:

\[ g(x) = 124.7435 (x - 2.1)^{1.41}; \quad x \geq 2.1, \] where \(x\) is the distance in meters. Joyner-Kersee was also a gold-medal heptathlete. If we want to know how many points she earns in the heptathlon from the long jump, then we need to study the random variable \(g(X)\).

Let’s assume that \(X\) is uniformly distributed between 6.3 and 7.5 meters, as in Example 18.5. Clearly, her score \(S = g(X)\) will be somewhere between \[ \begin{aligned} 124.7435(6.3 - 2.1)^{1.41} &= 943.625 & \text{and} & & 124.7435(7.5 - 2.1)^{1.41} &= 1344.91. \end{aligned} \] But how will her score be distributed between these two values? Will it also be uniformly distributed? Let’s run a simulation to find out.
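The original code cell is not reproduced here, but a minimal simulation along these lines (using NumPy, an assumed choice of tooling) illustrates the idea:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate 10,000 long jump distances, uniform between 6.3 and 7.5 meters.
distances = rng.uniform(6.3, 7.5, size=10_000)

# Transform each distance into a heptathlon score.
scores = 124.7435 * (distances - 2.1) ** 1.41

# Every simulated score falls between the two endpoints computed above.
print(scores.min(), scores.max())

# If S were uniform, the bottom fifth and top fifth of the score range
# would contain about the same number of simulated values.
lo, hi = 943.625, 1344.91
width = (hi - lo) / 5
print(np.sum(scores < lo + width), np.sum(scores > hi - width))
```

A histogram of `scores` reproduces the simulation described next.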

The simulation suggests that not all scores are equally likely. Lower scores (near 1000) are (slightly) more likely than higher scores (above 1300).

20.1 The PDF of a Transformed Random Variable

The heptathlon scoring problem is a special case of a general question: if \(X\) is a continuous random variable whose distribution is known, what is the distribution of \(Y = g(X)\), where \(g\) is a known function?

There is a simple strategy based on what we learned in Chapter 18.

General Strategy for Deriving the PDF of \(Y = g(X)\)
  1. First, determine the CDF of \(Y\).
  2. Take the derivative of the CDF to get the PDF.

Let’s apply this strategy to the heptathlon scoring example.

Example 20.2 Recall that the long jump distance \(X\) is uniformly distributed between 6.3 and 7.5 meters. We determined the CDF of \(X\) in Example 18.3 to be \[ F_X(x) = \begin{cases} 0 & x \leq 6.3 \\ \frac{x - 6.3}{1.2} & 6.3 < x < 7.5 \\ 1 & x \geq 7.5 \end{cases}. \] Now, we want to determine the distribution of \(S\), the score from the long jump, which is a transformation of \(X\): \[ S = g(X) = 124.7435 (X - 2.1)^{1.41}. \]

According to the strategy, we first determine the CDF of \(S\). We will only determine the CDF on the support of \(S\), which is \([943.625, 1344.91]\).

\[\begin{align*} F_S(s) &= P(S \leq s) & \text{(definition of CDF of $S$)} \\ &= P(124.7435 (X - 2.1)^{1.41} \leq s) & \text{(substitute $g(X)$ for $S$)} \\ &= P\Bigg(X \leq \big(\frac{s}{124.7435}\big)^{1/1.41} + 2.1\Bigg) & \text{(solve for $X$)} \\ &= F_X\Big(\big(\frac{s}{124.7435}\big)^{1/1.41} + 2.1\Big) & \text{(definition of CDF of $X$)} \\ &= \frac{\Big(\big(\frac{s}{124.7435}\big)^{1/1.41} + 2.1\Big) - 6.3}{1.2} & \text{(plug into known CDF of $X$)} \\ &= \frac{s^{1/1.41} - 128.7617}{36.7891} & \text{(simplify)} \end{align*}\]

Next, we take the derivative of the CDF to get the PDF: \[f_S(s) = F'_S(s) = \frac{1}{36.7891} \frac{1}{1.41} s^{1/1.41 - 1}.\]

Remember that this formula is only valid on the support of \(S\), which is \(943.625 < s < 1344.91\). If we wanted to write down the entire PDF, we would also need to take into account the fact that the PDF is zero outside of the support. \[f_S(s) = \begin{cases} \frac{1}{36.7891} \frac{1}{1.41} s^{1/1.41 - 1} & 943.625 < s < 1344.91 \\ 0 & \text{otherwise} \end{cases}. \]
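As a quick numerical sanity check (ours, not part of the original text), we can confirm that this PDF integrates to 1 over the support; a sketch using NumPy:

```python
import numpy as np

# PDF of S derived above, valid on 943.625 < s < 1344.91.
def f_S(s):
    return (1 / 36.7891) * (1 / 1.41) * s ** (1 / 1.41 - 1)

# Integrate numerically over the support using the trapezoidal rule.
s = np.linspace(943.625, 1344.91, 100_001)
total = np.sum((f_S(s[:-1]) + f_S(s[1:])) / 2 * np.diff(s))
print(total)  # close to 1 (up to rounding in the constants)
```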

This PDF depends on the value of \(s\), so it is certainly not a uniform distribution. To check the answer, we can graph this PDF on top of the histogram from Example 20.1.

In general, we apply the strategy in this section to determine the PDF of a transformed random variable \(Y = g(X)\). However, for specific classes of transformations \(g\), there are simple formulas for the PDF. The next two sections explore two important classes of transformations.

20.2 Location-Scale Transformations

In this section, we will explore an important class of transformations called location-scale transformations.

Definition 20.1 (Location-Scale Transformation) Let \(X\) be a random variable, and let \(a\) and \(b\) be constants.

  • A location transformation is a transformation of the form \[ g(X) = X + b. \]
  • A scale transformation is a transformation of the form \[ g(X) = aX. \] Scale transformations arise commonly when we change units. For example, if \(X\) is measured in meters, and we want to convert it to centimeters, we would multiply \(X\) by \(a = 100\).
  • A location-scale transformation is a combination of the two. It is a transformation of the form \[ g(X) = aX + b. \]

Let’s examine location transformations first. If we add \(b\) to a random variable \(X\), then the support and all the probabilities should shift by \(b\). This is illustrated in Figure 20.1.

Figure 20.1: What a location transformation \(Y = X + b\) does to a PDF

Now let’s develop this observation into a formula.

Proposition 20.1 (Location Transformation) Suppose \(X\) is a continuous random variable with PDF \(f_X(x)\). Let \(Y = X + b\) be a location transformation of \(X\) for some constant \(b\). Then the PDF of \(Y\) is \[ f_Y(x) = f_X(x - b). \tag{20.1}\]

Proof. We apply the strategy from Section 20.1. First, we determine the CDF of \(Y\), in terms of the CDF of \(X\).

\[\begin{align*} F_Y(x) &= P(Y \leq x) \\ &= P(X + b \leq x) \\ &= P(X \leq x - b) \\ &= F_X(x - b) \end{align*}\]

Now we take the derivative to obtain the PDF of \(Y\).

\[ f_Y(x) = \frac{d}{dx} F_Y(x) = \frac{d}{dx} F_X(x - b) = f_X(x - b). \]

Next, let’s examine scale transformations. Suppose we multiply a random variable \(X\) by the constant \(a = 1.5\), producing a new random variable \(Y = 1.5 X\). The support will be stretched out by a factor of \(1.5\) so that if the possible values of \(X\) range from \(0\) to \(6\), the possible values of \(Y\) will range from \(0\) to \(9\). Furthermore, the PDF is “squashed” by a factor of \(1.5\). This effect is illustrated in Figure 20.2.

Figure 20.2: What a scale transformation \(Y = aX\) does to a PDF

While it should be clear that a scale transformation stretches the PDF, it may be less obvious why it also squashes the PDF. Here are two ways to see that the squashing is necessary:

  • If we stretched the PDF without squashing it, there would be too much area under the PDF. The total area under a PDF must equal 1, before and after the transformation.
  • Think of a scale transformation as a change of units, and consider the units on the vertical axis. If \(X\) is in meters and \(Y = 100 X\) is in centimeters, then the units on the vertical axis change from “probability per meter” to “probability per centimeter”. One centimeter is shorter than one meter, so there should be less probability per centimeter than probability per meter!

Now that we understand the intuition, we can formalize the result.

Proposition 20.2 (Scale Transformation) Suppose \(X\) is a continuous random variable with PDF \(f_X(x)\). Let \(Y = aX\) be a scale transformation of \(X\) for some constant \(a \neq 0\). Then the PDF of \(Y\) is \[ f_Y(x) = \frac{1}{|a|} f_X\Big(\frac{x}{a}\Big). \tag{20.2}\]

Proof. We will prove the result for \(a > 0\), leaving the case \(a < 0\) to Exercise 20.1.

We apply the strategy from Section 20.1. First, we determine the CDF of \(Y\), in terms of the CDF of \(X\).

\[\begin{align*} F_Y(x) &= P(Y \leq x) \\ &= P(aX \leq x) \\ &= P\Big(X \leq \frac{x}{a}\Big) \\ &= F_X\Big(\frac{x}{a}\Big) \end{align*}\]

Now we take the derivative to obtain the PDF of \(Y\), remembering to apply the Chain Rule in the last step.

\[ f_Y(x) = \frac{d}{dx} F_Y(x) = \frac{d}{dx} F_X\Big(\frac{x}{a}\Big) = \frac{1}{a} f_X\Big( \frac{x}{a} \Big). \] This matches Equation 20.2 because \(|a| = a\) when \(a > 0\).
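To build confidence in this formula, we can check it by simulation. The following sketch (our own check, not from the text) takes \(X\) uniform on \((0, 6)\) for illustration, so that \(Y = 1.5X\) ranges over \((0, 9)\) as in the discussion above:

```python
import numpy as np

rng = np.random.default_rng(0)

# For illustration: X uniform on (0, 6), scale factor a = 1.5.
a = 1.5
X = rng.uniform(0, 6, size=100_000)
Y = a * X

# Proposition 20.2 predicts f_Y(x) = (1/a) f_X(x/a) = (1/1.5)(1/6) = 1/9
# on (0, 9).  Estimate the density of Y from the simulation.
counts, edges = np.histogram(Y, bins=9, range=(0, 9), density=True)
print(counts)  # each entry should be close to 1/9
```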

Finally, a location-scale transformation is just a scale transformation followed by a location transformation. So we can obtain the next result by simply combining Proposition 20.1 and Proposition 20.2.

Proposition 20.3 (Location-Scale Transformation) Suppose \(X\) is a continuous random variable with PDF \(f_X(x)\). Let \(Y = aX + b\) be a location-scale transformation of \(X\) for some constants \(a \neq 0\) and \(b\). Then the PDF of \(Y\) is \[ f_Y(x) = \frac{1}{|a|} f_X\Big(\frac{x - b}{a}\Big). \tag{20.3}\]

Proof. We will define the intermediate random variable \(Z = aX\). Then, \(Y = Z + b\) is a location transformation of \(Z\), so by Proposition 20.1: \[ f_Y(x) = f_Z(x - b). \] But \(Z = aX\) is a scale transformation of \(X\), so by Proposition 20.2: \[ f_Z(x - b) = \frac{1}{|a|} f_X\Big( \frac{x - b}{a} \Big). \]

Now let’s apply location-scale transformations to an example.

Example 20.3 (Converting Celsius to Fahrenheit) In Example 18.7, we modeled the daily high temperature \(C\) as a continuous random variable with PDF \[ f_C(x) = \frac{1}{k} e^{-x^2/18}; -\infty < x < \infty, \] where \(k\) was a constant that makes the total area equal to 1. We determined the constant \(k\) to be about \(7.5\). This PDF was graphed in Figure 18.4.

An American visitor to Iqaluit might want to know the temperature in Fahrenheit. But this is just a location-scale transformation of the temperature in Celsius! In particular: \[ F = g(C) = \textcolor{green}{\frac{9}{5}} C + \textcolor{orange}{32}. \]

We can derive the PDF of \(F\) using Proposition 20.3 above, with \(a = \textcolor{green}{9/5}\) and \(b = \textcolor{orange}{32}\). Therefore, the PDF of \(F\) is:

\[ \begin{aligned} f_F(x) &= \frac{1}{\textcolor{green}{9/5}} f_C\Big(\frac{x-\textcolor{orange}{32}}{\textcolor{green}{9/5}}\Big) \\ &= \frac{1}{\textcolor{green}{9/5}} \frac{1}{k} e^{\displaystyle -\Big( \frac{x-\textcolor{orange}{32}}{\textcolor{green}{9/5}} \Big)^2 / 18} \\ &\approx \frac{1}{13.53579} e^{-(x - 32)^2 / 58.32} \end{aligned} \tag{20.4}\]

Let’s graph the PDF in Fahrenheit (Equation 20.4):

Figure 20.3: PDF of the daily high temperature (in Fahrenheit) in Iqaluit in May

Compared to Figure 18.4, this PDF is centered around the freezing point in Fahrenheit (\(32^\circ\)) instead of the freezing point in Celsius (\(0^\circ\)).
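As a sanity check of our own (assuming NumPy), we can verify numerically that the transformed PDF in Equation 20.4 still integrates to 1:

```python
import numpy as np

# PDF of the temperature in Fahrenheit, from Equation 20.4.
def f_F(x):
    return (1 / 13.53579) * np.exp(-(x - 32) ** 2 / 58.32)

# Integrate with the trapezoidal rule over a wide range; the tails
# beyond [-200, 264] contribute negligibly.
x = np.linspace(-200, 264, 200_001)
total = np.sum((f_F(x[:-1]) + f_F(x[1:])) / 2 * np.diff(x))
print(total)  # close to 1
```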

20.3 The Probability Integral Transform

Let \(X\) be a continuous random variable with CDF \(F(x)\). Then \(U = F(X)\) is also a continuous random variable. What is the distribution of \(U\)?

This particular class of transformations, where we plug a random variable into its own CDF, is called the probability integral transform. At first, it may not be clear why anyone would do such a thing, but we will see that it is actually one of the most useful tricks in all of probability.

Theorem 20.1 (Probability Integral Transform) Let \(X\) be a continuous random variable with CDF \(F(x)\). Then \(U = F(X)\) is uniformly distributed between 0 and 1. That is, its PDF is \[ f_U(x) = \begin{cases} 1 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}. \]

Proof. For simplicity, we will assume that \(F(x)\) is strictly increasing, although the theorem is valid even without this assumption.

Using the strategy from Section 20.1, we first calculate the CDF of \(U\). Note that \(U = F(X)\) is a probability, so its value must be between 0 and 1. For \(0 < x < 1\), we have: \[ \begin{align*} F_U(x) &= P(U \leq x) \\ &= P(F(X) \leq x) \\ &= P(X \leq F^{-1}(x)) \\ &= F(F^{-1}(x)) \\ &= x. \end{align*} \]

In the third line, we used the fact that \(F(x)\) is strictly increasing to conclude that the inverse CDF \(F^{-1}(p)\) is well-defined for \(0 < p < 1\), with \(F^{-1}(F(x)) = x\).

Finally, we take the derivative to obtain the PDF of \(U\): \[f_U(x) = F'_U(x) = 1,\] and this formula is valid on the support of \(U\), \(0 < x < 1\). The full PDF is \[ f_U(x) = \begin{cases} 1 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}. \]
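We can see Theorem 20.1 in action with a quick simulation (our own illustration, assuming NumPy), using an exponential random variable with CDF \(F(x) = 1 - e^{-x}\) for \(x > 0\):

```python
import numpy as np

rng = np.random.default_rng(1)

# X is exponential with rate 1, so its CDF is F(x) = 1 - exp(-x).
X = rng.exponential(scale=1.0, size=100_000)
U = 1 - np.exp(-X)  # the probability integral transform U = F(X)

# If Theorem 20.1 holds, U is uniform on (0, 1): each of the ten
# equal-width bins should contain about 10% of the simulated values.
counts, _ = np.histogram(U, bins=10, range=(0, 1), density=True)
print(counts)  # each entry should be close to 1
```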

What use is knowing that \(U = F(X)\) has a standard uniform distribution? Some direct applications of the probability integral transform are suggested in Exercise 20.4. But the most compelling application derives from the inverse transformation: \(X = F^{-1}(U)\). This suggests that we can simulate values of \(X\) from any (continuous) distribution by first simulating \(U\) and then calculating \(F^{-1}(U)\). This trick is called inverse transform sampling, and it follows immediately from Theorem 20.1.

Proposition 20.4 (Inverse Transform Sampling) Let \(U\) be a uniform random variable on \((0, 1)\), and let \(F(x)\) be a valid CDF, with \(F^{-1}(p)\) well-defined for all \(p \in (0, 1)\). Then, \[ X = F^{-1}(U) \] is a random variable whose CDF is \(F(x)\).

Proposition 20.4 is useful because a programming language may not have a built-in function to simulate from the distribution you want, but every programming language has a function to generate uniform random numbers between 0 and 1.

Example 20.4 (Simulating the First Arrival Time) Suppose we want to simulate the time \(T\) that the Geiger counter in Example 18.8 clicks for the first time after it is turned on. We showed in Example 18.8 that the CDF of \(T\) is \[ F_T(x) = \begin{cases} 1 - e^{-\mean x} & x > 0 \\ 0 & x \leq 0 \end{cases}. \]

We can obtain the inverse CDF \(F_T^{-1}(p)\) by solving \[ p = F_T(x) = 1 - e^{-\mean x} \] for \(x\). The solution is \[ x = F_T^{-1}(p) = - \frac{1}{\mean} \ln(1 - p). \]

Therefore, by Proposition 20.4, we should be able to simulate \(T\) by first simulating a uniform random number \(U\) between 0 and 1 and then calculating \[ T = F_T^{-1}(U) = -\frac{1}{\mean} \ln(1 - U). \]

Let’s simulate 10000 \(T\)s using this approach and see how well they agree with the PDF of \(T\), which we know from Example 18.9 to be \[ f_T(t) = \begin{cases} \mean e^{-\mean t} & t > 0 \\ 0 & \text{otherwise} \end{cases}. \]
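The original simulation cell is not shown; a minimal version of it (NumPy is an assumed choice of tooling) might look like:

```python
import numpy as np

rng = np.random.default_rng(2)

rate = 2.6  # clicks per second, from Example 18.8

# Inverse transform sampling: push uniform random numbers through
# the inverse CDF of T derived above.
U = rng.uniform(0, 1, size=10_000)
T = -np.log(1 - U) / rate

# The simulated values are all positive, and their average should be
# close to 1/2.6, the mean of an exponential with rate 2.6.
print(T.min(), T.mean())
```

A histogram of `T` can then be compared against the PDF \(f_T(t)\) above.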

20.4 Exercises

Exercise 20.1 Complete the proof of Proposition 20.2 by showing that Equation 20.2 also holds when \(a < 0\).

Hint: Remember that when you multiply or divide both sides of an inequality by a negative number, the direction of the inequality flips!

Exercise 20.2 Here’s one way to draw a random square. Start by picking a length \(U\), which is equally likely to be any value between 0 and 1. Now, draw a square whose sides are length \(U\). What is the PDF of \(S\), the area of the random square? Check your answer by simulation.

Exercise 20.3 Write code to simulate a random variable \(X\) with the half-triangle PDF \[ \begin{equation} f(x) = \begin{cases} 1 - \frac{x}{2} & 0 \leq x < 2 \\ 0 & \text{otherwise} \end{cases}. \end{equation} \tag{20.5}\]

Exercise 20.4 A common problem in statistics is to determine whether data \(x\) is too large to have plausibly come from a distribution with CDF \(F\). One way to do this is to calculate the probability of observing \(x\) or greater, \(p = 1 - F(x)\), and if this \(p\)-value is small (say, less than \(.05\)), then we conclude that \(x\) did not come from that distribution.

Now, suppose that the data \(X\) is a random variable that really does have CDF \(F\). What is the distribution of the \(p\)-value?