\[ \newcommand{\or}{\textrm{ or }} \newcommand{\and}{\textrm{ and }} \newcommand{\not}{\textrm{not }} \newcommand{\Pois}{\textrm{Poisson}} \newcommand{\E}{\textrm{E}} \newcommand{\var}{\textrm{Var}} \]
\[ \def\defeq{\overset{\text{def}}{=}} \def\mean{\textcolor{red}{2.6}} \]
In applications, we are often less interested in a random variable \(X\) itself than in a transformation of it, \(g(X)\). The transformed quantity \(Y = g(X)\) is a new random variable with its own distribution. In this chapter, we will learn how to derive the PDF of \(Y = g(X)\) from the PDF of \(X\).
One situation in which transformations naturally arise is unit conversion. For example:
Here is another situation in which transformations arise. The long jump is one of the seven contests in the heptathlon. Competitors earn points for each event in the heptathlon, and the competitor with the most total points is the winner. But the heptathlon consists of many different contests:
How do we put all these different contests on the same scale? That brings us to the subject of transformations.
The heptathlon scoring problem is a special case of a general question: if \(X\) is a continuous random variable whose distribution is known, what is the distribution of \(Y = g(X)\), where \(g\) is a known function?
There is a simple strategy based on what we learned in Chapter 18.
Let’s apply this strategy to the heptathlon scoring example.
In general, we can use the strategy in this section to determine the PDF of any transformed random variable \(Y = g(X)\). However, for certain classes of transformations \(g\), there are simple formulas for the PDF. The next two sections explore two important classes of transformations.
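As a small illustration of deriving a PDF through the CDF and then checking it by simulation, here is a minimal sketch. It assumes a hypothetical example that is not from the text, \(X \sim \textrm{Uniform}(0, 1)\) and \(Y = \sqrt{X}\). Then \(F_Y(y) = P(\sqrt{X} \leq y) = P(X \leq y^2) = y^2\) for \(0 \leq y \leq 1\), and differentiating gives \(f_Y(y) = 2y\). The helper names `cdf_y` and `pdf_y` are our own.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical example (not from the text): X ~ Uniform(0, 1) and Y = g(X) = sqrt(X).
n = 100_000
x = rng.uniform(0, 1, size=n)
y = np.sqrt(x)


def cdf_y(t):
    """Derived CDF: F_Y(t) = P(sqrt(X) <= t) = P(X <= t**2) = t**2 on [0, 1]."""
    return np.asarray(t, dtype=float) ** 2


def pdf_y(t):
    """Derived PDF: differentiating F_Y gives f_Y(t) = 2t on [0, 1]."""
    return 2 * np.asarray(t, dtype=float)


# Check the CDF: empirical proportion of simulated Y values <= t vs. t**2.
grid = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
empirical_cdf = np.array([np.mean(y <= t) for t in grid])
max_cdf_error = np.max(np.abs(empirical_cdf - cdf_y(grid)))

# Check the PDF: density histogram of Y vs. 2t at the bin midpoints.
hist_density, edges = np.histogram(y, bins=10, range=(0, 1), density=True)
midpoints = (edges[:-1] + edges[1:]) / 2
max_pdf_error = np.max(np.abs(hist_density - pdf_y(midpoints)))

print(f"max CDF error: {max_cdf_error:.4f}, max PDF error: {max_pdf_error:.4f}")
```

Both discrepancies should be small and shrink as `n` grows; they are simulation noise, not a proof.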
In this section, we will explore an important class of transformations called location-scale transformations.
Let’s examine location transformations first. If we add a constant \(b\) to a random variable \(X\), then the support and the entire PDF shift by \(b\). This is illustrated in Figure 20.1.
Now let’s develop this observation into a formula.
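The shift can be written as a formula via the CDF: \(F_Y(y) = P(X + b \leq y) = F_X(y - b)\), so \(f_Y(y) = f_X(y - b)\). The sketch below checks this numerically under an assumed example that is not from the text: \(X\) is exponential with rate 1 (PDF \(e^{-x}\) for \(x \geq 0\)) and \(b = 2\).

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical example: X ~ Exponential(rate 1), so f_X(x) = e^{-x} for x >= 0.
b = 2.0
n = 100_000
x = rng.exponential(scale=1.0, size=n)
y = x + b  # location transformation


def pdf_x(t):
    t = np.asarray(t, dtype=float)
    # clip keeps exp() from overflowing for very negative t; those points get 0 anyway
    return np.where(t >= 0, np.exp(-np.clip(t, 0, None)), 0.0)


def pdf_y(t):
    """Shifted PDF: f_Y(y) = f_X(y - b)."""
    return pdf_x(np.asarray(t, dtype=float) - b)


# The support of Y is [b, infinity): no simulated value falls below b.
min_y = y.min()

# Histogram density computed by hand (counts / (n * width)) so that values of Y
# beyond the plotted range still count toward the total of n.
counts, edges = np.histogram(y, bins=20, range=(b, b + 4))
width = edges[1] - edges[0]
hist_density = counts / (n * width)
midpoints = (edges[:-1] + edges[1:]) / 2
max_pdf_error = np.max(np.abs(hist_density - pdf_y(midpoints)))
```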
Next, let’s examine scale transformations. Suppose we multiply a random variable \(X\) by the constant \(a = 1.5\), producing a new random variable \(Y = 1.5 X\). The support is stretched by a factor of \(1.5\): if the possible values of \(X\) range from \(0\) to \(6\), then the possible values of \(Y\) range from \(0\) to \(9\). At the same time, the PDF is “squashed” vertically by a factor of \(1.5\), meaning its height is divided by \(1.5\). This effect is illustrated in Figure 20.2.
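To see both effects with the numbers above, here is a minimal sketch that assumes, purely for illustration, that \(X\) is uniform on \([0, 6]\), so its PDF has height \(1/6\). Stretching the support to \([0, 9]\) while keeping total area 1 forces the height down to \((1/6)/1.5 = 1/9\).

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Hypothetical example matching the numbers above: X ~ Uniform(0, 6), so f_X(x) = 1/6 on [0, 6].
a = 1.5
n = 100_000
x = rng.uniform(0, 6, size=n)
y = a * x  # scale transformation

# Stretching: the possible values move from [0, 6] to [0, 9].
support = (y.min(), y.max())

# Squashing: the height 1/6 is divided by a, giving (1/6) / 1.5 = 1/9.
predicted_height = (1 / 6) / a

hist_density, _ = np.histogram(y, bins=9, range=(0, 9), density=True)
max_height_error = np.max(np.abs(hist_density - predicted_height))
print(f"predicted height {predicted_height:.4f}, histogram heights "
      f"{hist_density.min():.4f} to {hist_density.max():.4f}")
```

Note that the area \(9 \times \frac{1}{9} = 1\) is preserved, which is exactly why the height has to shrink.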
While it should be clear that a scale transformation stretches the PDF, it may be less obvious why it also squashes the PDF. Here are two ways to see that the squashing is necessary:
Now that we understand the intuition, we can formalize the result.
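For a scale factor \(a > 0\), the CDF-based strategy produces both the stretch and the squash in one line. Dividing both sides of the inequality by a positive \(a\) leaves its direction unchanged, so

\[ F_Y(y) = P(aX \leq y) = P\left(X \leq \frac{y}{a}\right) = F_X\left(\frac{y}{a}\right) \quad\Longrightarrow\quad f_Y(y) = \frac{d}{dy} F_X\left(\frac{y}{a}\right) = \frac{1}{a} f_X\left(\frac{y}{a}\right). \]

The argument \(y/a\) stretches the PDF horizontally, and the chain-rule factor \(1/a\) squashes it vertically. With \(a = 1.5\), this gives \(f_Y(y) = \frac{1}{1.5} f_X\left(\frac{y}{1.5}\right)\). The case \(a < 0\), where the inequality flips, is the subject of Exercise 20.1.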
Finally, a location-scale transformation \(Y = aX + b\) is just a scale transformation followed by a location transformation. So we can obtain the next result by combining Proposition 20.1 and Proposition 20.2.
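Combining a scale by \(a \neq 0\) with a shift by \(b\) gives the standard location-scale formula \(f_Y(y) = \frac{1}{|a|} f_X\left(\frac{y - b}{a}\right)\). The sketch below checks it against a case where the answer is already known: if \(X\) is standard normal, then \(aX + b\) is normal with mean \(b\) and standard deviation \(|a|\). The helper names and the choice \(a = -3\), \(b = 5\) are illustrative assumptions; a negative \(a\) is used to show why the absolute value matters.

```python
import numpy as np


def location_scale_pdf(pdf_x, a, b):
    """Return the PDF of Y = a*X + b, given the PDF of X (requires a != 0)."""
    if a == 0:
        raise ValueError("a must be nonzero")
    return lambda y: pdf_x((np.asarray(y, dtype=float) - b) / a) / abs(a)


def std_normal_pdf(z):
    z = np.asarray(z, dtype=float)
    return np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)


def normal_pdf(y, mu, sigma):
    y = np.asarray(y, dtype=float)
    return np.exp(-(y - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))


a, b = -3.0, 5.0
pdf_y = location_scale_pdf(std_normal_pdf, a, b)

# Compare with the known Normal(mean b, sd |a|) density on a grid.
grid = np.linspace(-10, 20, 301)
max_diff = np.max(np.abs(pdf_y(grid) - normal_pdf(grid, mu=b, sigma=abs(a))))

# Riemann-sum check that the transformed PDF still integrates to (about) 1.
total_area = np.sum(pdf_y(grid)) * (grid[1] - grid[0])
```

Dropping the absolute value would make `pdf_y` negative everywhere when `a` is negative, which no PDF can be.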
Now let’s apply location-scale transformations to an example.
Let \(X\) be a continuous random variable with CDF \(F(x)\). Then \(U = F(X)\) is also a continuous random variable. What is the distribution of \(U\)?
This particular class of transformations, where we plug a random variable into its own CDF, is called the probability integral transformation. At first, it may not be clear why anyone would do such a thing, but we will see that it is actually one of the most useful tricks in all of probability.
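A quick simulation is consistent with the answer used below, that \(U = F(X)\) is standard uniform. This minimal sketch assumes, as an illustration not taken from the text, that \(X\) is exponential with rate \(\lambda = 2\), whose CDF is \(F(x) = 1 - e^{-\lambda x}\). Note that NumPy's `exponential` is parameterized by the scale \(1/\lambda\), not the rate.

```python
import numpy as np

rng = np.random.default_rng(seed=3)

# Hypothetical example: X ~ Exponential(rate lam = 2), CDF F(x) = 1 - exp(-lam * x).
lam = 2.0
x = rng.exponential(scale=1 / lam, size=100_000)

# Probability integral transform: plug X into its own CDF.
u = 1 - np.exp(-lam * x)

# If U is standard uniform, then P(U <= t) = t for every t in [0, 1].
grid = np.linspace(0.1, 0.9, 9)
empirical_cdf = np.array([np.mean(u <= t) for t in grid])
max_error = np.max(np.abs(empirical_cdf - grid))
```

The same check works for any continuous distribution whose CDF you can evaluate; the particular shape of \(X\) washes out.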
What use is knowing that \(U = F(X)\) has a standard uniform distribution? Some direct applications of the probability integral transform are suggested in Exercise 20.4. But the most compelling application derives from the inverse transformation: \(X = F^{-1}(U)\). This suggests that we can simulate values of \(X\) from any (continuous) distribution by first simulating \(U\) and then calculating \(F^{-1}(U)\). This trick is called inverse transform sampling, and it follows immediately from Theorem 20.1.
Proposition 20.4 is useful because your programming language may not have a built-in function to simulate from the distribution you want, but almost every language provides a function that generates uniform random numbers between 0 and 1.
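Here is a minimal sketch of inverse transform sampling for an exponential distribution with rate \(\lambda\), chosen as an illustrative example because its CDF \(F(x) = 1 - e^{-\lambda x}\) inverts in closed form to \(F^{-1}(u) = -\log(1 - u)/\lambda\). Only a uniform generator is used; NumPy's built-in exponential sampler is deliberately avoided. The function name `sample_exponential` is our own.

```python
import numpy as np


def sample_exponential(lam, size, rng):
    """Inverse transform sampling for the Exponential distribution with rate lam.

    The CDF F(x) = 1 - exp(-lam * x) inverts to F^{-1}(u) = -log(1 - u) / lam.
    """
    u = rng.uniform(0, 1, size=size)  # standard uniform draws in [0, 1)
    return -np.log1p(-u) / lam        # log1p(-u) computes log(1 - u) accurately


rng = np.random.default_rng(seed=4)
lam = 0.5
samples = sample_exponential(lam, 100_000, rng)

# The sample mean should be close to the exponential mean 1 / lam = 2.
sample_mean = samples.mean()

# The empirical CDF should track F(x) = 1 - exp(-lam * x).
grid = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
empirical_cdf = np.array([np.mean(samples <= t) for t in grid])
max_cdf_error = np.max(np.abs(empirical_cdf - (1 - np.exp(-lam * grid))))
```

Because \(1 - U\) is also standard uniform, `-np.log(u) / lam` would work in principle, but the uniform generator can return exactly 0, and `np.log(0)` is `-inf`; using \(1 - u\), which lies in \((0, 1]\), avoids that edge case.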
Exercise 20.1 Complete the proof of Proposition 20.2 by showing that Equation 20.2 also holds when \(a < 0\).
Hint: Remember that when you multiply or divide both sides of an inequality by a negative number, the direction of the inequality flips!
Exercise 20.2 Here’s one way to draw a random square. Start by picking a length \(U\), which is equally likely to be any value between 0 and 1. Now, draw a square whose sides are length \(U\). What is the PDF of \(S\), the area of the random square? Check your answer by simulation.
Exercise 20.3 Write code to simulate a random variable \(X\) with the half-triangle PDF \[ f(x) = \begin{cases} 1 - \frac{x}{2} & 0 \leq x < 2 \\ 0 & \text{otherwise} \end{cases}. \tag{20.5} \]
Exercise 20.4 A common problem in statistics is to determine whether data \(x\) is too large to have plausibly come from a distribution with CDF \(F\). One way to do this is to calculate the probability of observing \(x\) or greater, \(p = 1 - F(x)\), and if this \(p\)-value is small (say, less than \(.05\)), then we conclude that \(x\) did not come from that distribution.
Now, suppose that the data \(X\) is a random variable that really does have CDF \(F\). What is the distribution of the \(p\)-value?