42  Multivariate Transformations

Given i.i.d. data \(X_1, \dots, X_n\), we know how to determine the distributions of certain statistics, such as their sum.

A general statistic is of the form \(g(X_1, \dots, X_n)\). In this chapter, we will learn a general strategy for deriving the distribution of such a statistic when the random variables are continuous and \(g\) is differentiable.

42.1 Univariate Transformations

As a warm-up, consider the case of a single continuous random variable \(X\) with PDF \(f_X(x)\). What is the PDF of \(Y = g(X)\)?

In Chapter 20, we presented a general strategy for deriving the PDF of \(Y\). First, find the CDF of \(Y\), then differentiate to obtain the PDF. However, when \(g(x)\) is a differentiable function that is strictly increasing (or decreasing), there is a simple formula.

Theorem 42.1 (Univariate transformations) Let \(X\) be a continuous random variable with PDF \(f_X(x)\) with support \((a, b)\) (where \(a\) and \(b\) may be \(\pm\infty\)). Let \(g(x)\) be a differentiable one-to-one function on \((a, b)\). Then, the PDF of \(Y = g(X)\) is \[ f_Y(y) = f_X(x) \left| \frac{dx}{dy} \right| = f_X(x) \left| \frac{1}{g'(x)} \right|, \tag{42.1}\] where \(x = g^{-1}(y)\).

Proof

If \(g(x)\) is one-to-one and differentiable, then it is strictly increasing or strictly decreasing on \((a, b)\); we will assume that its derivative never vanishes, so that either \(g'(x) > 0\) for all \(x \in (a, b)\) or \(g'(x) < 0\) for all \(x \in (a, b)\).

First, suppose \(g'(x) > 0\). Then, the CDF of \(Y\) is \[ F_Y(y) = P(Y \leq y) = P(g(X) \leq y) = P(X \leq g^{-1}(y)) = F_X(g^{-1}(y)) \] for all \(y\) in the support of \(Y\). Note that we used the fact that \(g\) is strictly increasing to assert that the events \(\{ g(X) \leq y \}\) and \(\{ X \leq g^{-1}(y) \}\) are equivalent.

Now, we differentiate to obtain the PDF: \[ f_Y(y) = f_X(g^{-1}(y)) \frac{dx}{dy}, \] where we applied the chain rule to \(x = g^{-1}(y)\).

On the other hand, if \(g'(x) < 0\), then \[ F_Y(y) = P(Y \leq y) = P(g(X) \leq y) = P(X \geq g^{-1}(y)) = 1 - F_X(g^{-1}(y)), \] and \[ f_Y(y) = - f_X(g^{-1}(y)) \frac{dx}{dy}. \]

Note that \(\frac{dx}{dy} > 0\) in the case where \(g'(x) > 0\), and \(\frac{dx}{dy} < 0\) in the case where \(g'(x) < 0\). If we add an absolute value sign, we can cover both cases with a single formula: \[ f_X(g^{-1}(y)) \left| \frac{dx}{dy} \right|. \]

Notice that by the inverse function theorem, we can write \[ \frac{dx}{dy} = \frac{1}{\frac{dy}{dx}} = \frac{1}{g'(x)}. \]

One natural application of Theorem 42.1 is to location-scale transformations.

Example 42.1 (Location-scale transformations revisited) Here is another proof of Proposition 20.3.

Let \(Y = aX + b\), where \(a \neq 0\), and consider \(g(x) = ax + b\). Then, \(g'(x) = a\) and by Theorem 42.1, \[ f_Y(y) = f_X(x) \left| \frac{1}{g'(x)} \right| = f_X(x) \frac{1}{|a|}. \]

Although this is technically correct, we usually express the PDF of \(Y\) in terms of \(y\). Since \[ x = g^{-1}(y) = \frac{y-b}{a}, \] we can substitute to obtain \[ f_Y(y) = f_X \left( \frac{y-b}{a} \right) \frac{1}{|a|}, \] which matches the result from Proposition 20.3.
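
To see Theorem 42.1 in action numerically, here is a minimal sketch in Python (assuming NumPy is available; the specific choices \(X \sim \text{Exponential}(1)\), \(a = 2\), \(b = 3\) are just for illustration). It simulates \(Y = aX + b\) and compares a histogram of the simulated values to the formula \(f_Y(y) = f_X\left(\frac{y-b}{a}\right)\frac{1}{|a|}\).

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate X ~ Exponential(1) and apply the location-scale transform Y = aX + b.
a, b = 2.0, 3.0
x = rng.exponential(scale=1.0, size=100_000)
y = a * x + b

# Empirical density of Y from a normalized histogram.
counts, edges = np.histogram(y, bins=60, density=True)
centers = (edges[:-1] + edges[1:]) / 2

# Density predicted by Theorem 42.1: f_Y(y) = f_X((y - b) / a) / |a|,
# where f_X(x) = exp(-x) for x > 0.
predicted = np.exp(-(centers - b) / a) / abs(a)

# The two should agree up to Monte Carlo error.
print(np.max(np.abs(counts - predicted)))
```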

42.2 Multivariate Transformations

Now, we consider the case where we have \(n\) random variables \(X_1, \dots, X_n\), and \(g\) is a differentiable one-to-one function from \(A \subset \mathbb{R}^n\) to \(B \subset \mathbb{R}^n\). That is, \[ (Y_1, \dots, Y_n) = g(X_1, \dots, X_n). \] Then, the natural generalization of Theorem 42.1 is the following.

Theorem 42.2 (Multivariate transformations) Let \(X_1, \dots, X_n\) be continuous random variables with joint PDF \(f_{X_1, \dots, X_n}(x_1, \dots, x_n)\) with support \(A \subset \mathbb{R}^n\). Let \(g\) be a differentiable one-to-one function from \(A\) to \(B \subset \mathbb{R}^n\).

Then, the joint PDF of \((Y_1, \dots, Y_n) = g(X_1, \dots, X_n)\) is \[ f_{Y_1, \dots, Y_n}(y_1, \dots, y_n) = f_{X_1, \dots, X_n}(x_1, \dots, x_n) \left| \det \left[\frac{\partial (x_1, \dots, x_n)}{\partial (y_1, \dots, y_n)}\right] \right|, \tag{42.2}\] where \((x_1, \dots, x_n) = g^{-1}(y_1, \dots, y_n)\) and \[ \left[\frac{\partial (x_1, \dots, x_n)}{\partial (y_1, \dots, y_n)}\right] \overset{\text{def}}{=}\begin{bmatrix} \frac{\partial x_1}{\partial y_1} & \frac{\partial x_1}{\partial y_2} & \dots & \frac{\partial x_1}{\partial y_n} \\ \frac{\partial x_2}{\partial y_1} & \frac{\partial x_2}{\partial y_2} & \dots & \frac{\partial x_2}{\partial y_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial x_n}{\partial y_1} & \frac{\partial x_n}{\partial y_2} & \dots & \frac{\partial x_n}{\partial y_n} \end{bmatrix} \] is the \(n \times n\) Jacobian matrix of partial derivatives.

Note that \[ \det \left[\frac{\partial (x_1, \dots, x_n)}{\partial (y_1, \dots, y_n)}\right] = \det \left[\frac{\partial (y_1, \dots, y_n)}{\partial (x_1, \dots, x_n)}\right]^{-1} = \frac{1}{\det \left[\frac{\partial (y_1, \dots, y_n)}{\partial (x_1, \dots, x_n)}\right]}, \] so we can calculate the Jacobian of the \(x\)s with respect to the \(y\)s, or the Jacobian of the \(y\)s with respect to the \(x\)s, whichever is easier.
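
As a quick numerical illustration of this reciprocal relationship, here is a minimal sketch in Python (assuming NumPy; the linear map \((u, v) = (x + y, x - y)\) is an arbitrary example, with inverse \(x = (u+v)/2\), \(y = (u-v)/2\)).

```python
import numpy as np

# Jacobian of the forward map (u, v) = (x + y, x - y) with respect to (x, y).
J_forward = np.array([[1.0,  1.0],
                      [1.0, -1.0]])

# Jacobian of the inverse map x = (u + v)/2, y = (u - v)/2 with respect to (u, v).
J_inverse = np.array([[0.5,  0.5],
                      [0.5, -0.5]])

print(np.linalg.det(J_forward))   # approximately -2
print(np.linalg.det(J_inverse))   # approximately -1/2

# The determinants are reciprocals, so their product is 1.
print(np.linalg.det(J_forward) * np.linalg.det(J_inverse))
```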

Let’s apply Theorem 42.2 to some examples.

Example 42.2 (Gamma-beta connection revisited) In Proposition 41.3, we showed that if \(X\) and \(Y\) are independent \(\text{Gamma}(\alpha, \lambda)\) and \(\text{Gamma}(\beta, \lambda)\) random variables, then

\[ T \overset{\text{def}}{=}\frac{X}{X + Y} \]

follows a \(\text{Beta}(\alpha, \beta)\) distribution and is independent of \(X + Y\). Another way to show this is using Jacobians, and this approach will yield an additional insight.

Let \(S = X + Y\). Then, by Theorem 42.2, the joint PDF of \(S\) and \(T\) is \[ f_{S, T}(s, t) = f_{X, Y}(x, y) \left| \det\left[ \frac{\partial (x, y)}{\partial(s, t)} \right] \right|. \]

In this case, it is easier to calculate the inverse Jacobian, since we already know \((s, t) = g(x, y)\), but we would need to do some work to determine \((x, y) = g^{-1}(s, t)\).

\[\begin{align} \det\left[ \frac{\partial (s, t)}{\partial(x, y)} \right] &= \det \begin{bmatrix} \frac{\partial s}{\partial x} & \frac{\partial s}{\partial y} \\ \frac{\partial t}{\partial x} & \frac{\partial t}{\partial y} \end{bmatrix} \\ &= \det \begin{bmatrix} 1 & 1 \\ \frac{y}{(x + y)^2} & -\frac{x}{(x + y)^2} \end{bmatrix} \\ &= -\frac{1}{x + y}. \end{align}\]

We next solve for \(x\) and \(y\) in terms of \(s\) and \(t\). Since \(s = x + y\), we see that \[ t = \frac{x}{x+y} = \frac{x}{s}, \] and hence, \(x = st\) and \(y = s - x = s - st\). Next,

\[\begin{alignat*}{2} f_{S, T}(s, t) &= f_{X, Y}(x, y) \left| \frac{1}{\frac{-1}{x + y}} \right| \\ &= \left( \frac{\lambda^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\lambda x} \right) \left( \frac{\lambda^\beta}{\Gamma(\beta)} y^{\beta - 1} e^{-\lambda y} \right) (x + y) \\ &= \frac{\lambda^\alpha}{\Gamma(\alpha)} (st)^{\alpha - 1} e^{-\lambda st} \frac{\lambda^\beta}{\Gamma(\beta)} (s - st)^{\beta - 1} e^{-\lambda (s - st)} s & (x = st, y = s - st) \\ &= \frac{\lambda^\alpha} {\Gamma(\alpha)} s^{\alpha - 1} t^{\alpha - 1} \frac{\lambda^\beta}{\Gamma(\beta)} s^{\beta-1} (1-t)^{\beta-1} e^{-\lambda s} s \\ &= \frac{\lambda^{\alpha+\beta}}{\Gamma(\alpha)\Gamma(\beta)} \left( s^{\alpha+\beta-1} e^{-\lambda s} \right) \left( t^{\alpha-1}(1-t)^{\beta-1} \right). \end{alignat*}\] We recognize the \(s\) terms as the main part of a \(\text{Gamma}(\alpha+\beta,\lambda)\) PDF and the \(t\) terms as the main part of a \(\text{Beta}(\alpha, \beta)\) PDF. Thus, we can fill in the appropriate constants to get \[ f_{S,T}(s,t) = \underbrace{\frac{\lambda^{\alpha+\beta}}{\Gamma(\alpha + \beta)} s^{\alpha + \beta - 1} e^{-\lambda s}}_{f_S(s)} \cdot \underbrace{\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha) \Gamma(\beta)} t^{\alpha - 1} (1 - t)^{\beta - 1}}_{f_T(t)}. \] This factorization shows that \(S \sim \text{Gamma}(\alpha + \beta, \lambda)\) and \(T \sim \text{Beta}(\alpha, \beta)\). But it shows something more: \(S\) and \(T\) are also independent!
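
As a sanity check on this factorization, here is a minimal simulation sketch in Python (assuming NumPy; the parameter values \(\alpha = 3\), \(\beta = 2\), \(\lambda = 1.5\) are arbitrary choices for illustration). It verifies that the sample means of \(S\) and \(T\) match the \(\text{Gamma}(\alpha + \beta, \lambda)\) and \(\text{Beta}(\alpha, \beta)\) means, and that \(S\) and \(T\) are approximately uncorrelated, consistent with independence.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, lam = 3.0, 2.0, 1.5    # arbitrary illustrative parameters
n = 200_000

# Gamma(alpha, lambda) with rate lambda corresponds to scale = 1/lambda in NumPy.
x = rng.gamma(shape=alpha, scale=1 / lam, size=n)
y = rng.gamma(shape=beta, scale=1 / lam, size=n)

s = x + y          # should be Gamma(alpha + beta, lambda)
t = x / (x + y)    # should be Beta(alpha, beta), independent of s

print(s.mean(), (alpha + beta) / lam)     # both close to 3.33
print(t.mean(), alpha / (alpha + beta))   # both close to 0.6
print(np.corrcoef(s, t)[0, 1])            # close to 0
```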

42.2.1 Simulating normal random variables

Theorem 42.2 provides a basis for simulating normal random variables. We begin by discussing how it can be used to simulate a single normal random variable, which is harder than it sounds.

Example 42.3 (Box-Muller transform) How does a computer simulate values from a normal distribution? We saw a general way to simulate continuous random variables in Proposition 20.4. According to that result, if we start with a \(\textrm{Uniform}(a= 0, b= 1)\) random variable \(U\), then
\[ Z \overset{\text{def}}{=}\Phi^{-1}(U) \tag{42.3}\] follows a \(\textrm{Normal}(\mu= 0, \sigma^2= 1)\) distribution.

However, the standard normal CDF \(\Phi(z)\) has no closed-form formula, so Equation 42.3 is not practical. Here is a simpler way that relies only on standard mathematical functions such as logarithms and cosine, due to Box and Muller (1958).

Let \(U \sim \textrm{Uniform}(a= 0, b= 1)\) and \(T \sim \textrm{Exponential}(\lambda=1)\) be independent random variables. Note that \(T\) is easy to generate using Proposition 20.4 (see Example 20.4). Box and Muller (1958) showed that \[\begin{align} X &= \sqrt{2 T} \cos (2\pi U) \\ Y &= \sqrt{2 T} \sin (2\pi U) \end{align}\] are independent standard normal random variables.

To see this, we apply Theorem 42.2. We first need the Jacobian \[\begin{align*} \det\left[ \frac{\partial(x,y)}{\partial(t,u)} \right] &= \det \left[ \begin{matrix} \frac{\partial x}{\partial t} & \frac{\partial x}{\partial u} \\ \frac{\partial y}{\partial t} & \frac{\partial y}{\partial u} \end{matrix} \right] \\ &= \det\left[ \begin{matrix} \frac{1}{\sqrt{2t}} \cos(2\pi u) & - 2\pi \sqrt{2t} \sin(2\pi u) \\ \frac{1}{\sqrt{2t}} \sin(2\pi u) & 2\pi \sqrt{2t} \cos(2\pi u) \end{matrix} \right] \\ &= 2\pi \cos^2(2\pi u) + 2\pi \sin^2(2\pi u) \\ &= 2\pi, \end{align*}\] which implies \(\displaystyle \det\left[ \frac{\partial(t,u)}{\partial(x,y)} \right] = \frac{1}{2\pi}\).

Therefore, the joint PDF of \(X\) and \(Y\) is \[ f_{X,Y}(x,y) = f_{T,U}(t,u) \cdot \left|\frac{1}{2\pi}\right| = e^{-t} \cdot 1 \cdot \frac{1}{2\pi}. \]

In order to express this joint PDF in terms of \(x\) and \(y\), we observe that \[ \begin{align} x^2 + y^2 &= 2t \cos^2(2\pi u) + 2t \sin^2(2\pi u) = 2t & \Rightarrow & & t &= \frac{x^2 + y^2}{2}. \end{align} \] Substituting this into the expression above, we obtain \[\begin{align*} f_{X,Y}(x,y) &= \frac{1}{2\pi} e^{-\frac{x^2+y^2}{2}} \\ &= \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \cdot \frac{1}{\sqrt{2\pi}} e^{-y^2/2}. \end{align*}\]

This factorization implies that \(X\) and \(Y\) are independent standard normal. In particular, if we integrate over \(y\), we obtain \[ f_X(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}. \]

The code below implements the Box-Muller transform.
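
Here is a minimal sketch in Python (assuming NumPy; the function name box_muller is a hypothetical choice). It generates \(U\) and \(T\) from uniforms, as in Proposition 20.4, and returns the resulting standard normal draws.

```python
import numpy as np

def box_muller(n, rng=None):
    """Return 2n independent standard normal draws via the Box-Muller transform."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(size=n)                  # U ~ Uniform(0, 1)
    t = -np.log(1 - rng.uniform(size=n))     # T ~ Exponential(1), via the inverse CDF
    x = np.sqrt(2 * t) * np.cos(2 * np.pi * u)
    y = np.sqrt(2 * t) * np.sin(2 * np.pi * u)
    return np.concatenate([x, y])            # X and Y are independent standard normals

z = box_muller(100_000, rng=np.random.default_rng(42))
print(z.mean(), z.std())                     # close to 0 and 1
```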

Notice that we can now simulate a normal random variable with any mean and variance by simply applying a location-scale transform.
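
For example, continuing the sketch above (mu and sigma are whatever mean and standard deviation we want):

```python
mu, sigma = 5.0, 2.0    # illustrative choices: Normal(mu = 5, sigma^2 = 4)
normals = mu + sigma * box_muller(100_000, rng=np.random.default_rng(1))
print(normals.mean(), normals.std())    # close to 5 and 2
```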

With the Box-Muller transform, we get two independent normal random variables for the price of one. In Example 25.4, we discussed how to transform independent random variables into correlated ones. We can now derive the joint PDF of these correlated normal random variables using Theorem 42.2.

Example 42.4 (Bivariate normal distribution) Let \(Z\) and \(W\) be independent standard normal random variables (perhaps generated by the Box-Muller transform above). Define \[ \begin{align} X &= Z \\ Y &= \rho Z + \sqrt{1 - \rho^2} W. \end{align} \] Now, \(X\) is clearly standard normal, and \(Y\), as the sum of independent normals, is also standard normal (by Example 34.6). But \(X\) and \(Y\) are no longer independent; they will have correlation \(\rho\), as we showed in Example 25.4. \(X\) and \(Y\) are said to follow a bivariate normal distribution.

The joint PDF of \(Z\) and \(W\) is clearly \[ f_{Z, W}(z, w) = f_Z(z) \cdot f_W(w) = \frac{1}{2\pi} e^{-(z^2 + w^2)/2}. \] What is the joint PDF of \(X\) and \(Y\)?

We first compute the Jacobian \[\begin{align*} \det\left[ \frac{\partial(x,y)}{\partial(z,w)} \right] &= \det\left[ \begin{matrix} \frac{\partial x}{\partial z} & \frac{\partial x}{\partial w} \\ \frac{\partial y}{\partial z} & \frac{\partial y}{\partial w} \end{matrix} \right] \\ &= \det \left[ \begin{matrix} 1 & 0 \\ \rho & \sqrt{1-\rho^2} \end{matrix} \right] \\ &= \sqrt{1-\rho^2}. \end{align*}\] We then compute the inverses \[ z = x \qquad \text{and} \qquad w = \frac{y - \rho x}{\sqrt{1-\rho^2}}. \] Next, by Theorem 42.2, \[\begin{align*} f_{X,Y}(x,y) &= f_{Z,W}(z,w) \frac{1}{\sqrt{1-\rho^2}} \\ &= \frac{1}{\sqrt{1-\rho^2}} \cdot \frac{1}{2\pi} \exp\left\{ -\frac{z^2 + w^2}{2} \right\} \\ &= \frac{1}{2\pi \sqrt{1-\rho^2}} \exp \left\{ -\frac{1}{2} \left( x^2 + \left( \frac{y-\rho x}{\sqrt{1 - \rho^2}} \right)^2 \right) \right\} \\ &= \frac{1}{2\pi \sqrt{1-\rho^2}} \exp \left\{ -\frac{1}{2} \frac{(1-\rho^2)x^2 + (y-\rho x)^2}{1-\rho^2} \right\} \\ &= \frac{1}{2\pi \sqrt{1-\rho^2}} \exp \left\{ -\frac{x^2 - 2\rho xy + y^2}{2(1-\rho^2)} \right\}. \end{align*}\]

We can see that if \(\rho \neq 0\), then \(X\) and \(Y\) are not independent; if \(\rho = 0\), we get the joint PDF of independent standard normals.
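
To check this construction by simulation, here is a minimal sketch in Python (assuming NumPy; \(\rho = 0.7\) is an arbitrary illustrative value). Each coordinate should look standard normal, and the sample correlation should be close to \(\rho\).

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.7                     # arbitrary illustrative correlation
n = 200_000

z = rng.standard_normal(n)    # Z ~ Normal(0, 1)
w = rng.standard_normal(n)    # W ~ Normal(0, 1), independent of Z

x = z
y = rho * z + np.sqrt(1 - rho**2) * w

print(x.std(), y.std())           # both close to 1
print(np.corrcoef(x, y)[0, 1])    # close to rho = 0.7
```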

42.3 Exercises

Exercise 42.1 Let \(X\) and \(Y\) denote the coordinates of a point chosen uniformly in the unit circle. That is, the joint density is \[ f_{X,Y}(x,y) = \frac{1}{\pi}, \qquad x^2 + y^2 \leq 1. \] Find the joint density of the polar coordinates \(R = \sqrt{X^2+Y^2}\) and \(\Theta = \tan^{-1}(Y/X)\).

Exercise 42.2 Let \((X,Y)\) denote a random point in the plane, and assume the rectangular coordinates \(X\) and \(Y\) are independent standard normal random variables. Find the joint distribution of \(R\) and \(\Theta\), the polar coordinates.

Exercise 42.3 Let \(X_1, \dots, X_n\) be i.i.d. \(\text{Exponential}(\lambda)\). Define \[ Y_k = X_1 + \cdots + X_k \] for \(1 \leq k \leq n\).

  1. Find the joint density of \(Y_1, \dots, Y_n\).
  2. Use part 1 to find the density of \(Y_n\). Does your result make sense?
  3. Find the conditional density of \(Y_1, \dots, Y_{n-1}\) given \(Y_n = t\).

Exercise 42.4 Let \(Z_1\) and \(Z_2\) be i.i.d. \(\text{Normal}(0,1)\). Let \(X = Z_1 + Z_2\) and \(Y = Z_1 - Z_2\); we know both are \(\text{Normal}(0,2)\). Are they independent?

Exercise 42.5 Let \(X\) and \(Y\) be i.i.d. \(\text{Exponential}(\lambda)\). Let \(T = X + Y\) and \(\displaystyle W = \frac{X}{Y}\). Are \(T\) and \(W\) independent? Find the marginal densities of \(T\) and \(W\).