23  Joint Distributions

In this chapter, we describe the distributions of multiple continuous random variables. To get the most out of this chapter, please first review how we described the distributions of multiple discrete random variables. You will see that many concepts are identical, except with PMFs replaced by PDFs and sums replaced by integrals.

23.1 Joint PDF

Previously, we described the distribution of two discrete random variables X and Y by their joint PMF $f_{X,Y}(x, y) = P(X = x, Y = y)$. The joint PMF is useless for continuous random variables X and Y because $P(X = x, Y = y) = 0$ for any values x and y. Instead, we describe the distribution of two continuous random variables by their joint PDF.

The joint PDF is the natural extension of the PDF to multiple random variables. Whereas a PDF $f_X(x)$ can be visualized as a curve, the joint PDF $f_{X,Y}(x, y)$ is a surface over the xy-plane, as shown in Figure 23.1.

Figure 23.1: The joint PDF of X and Y

The probability of any event B is the volume under this surface above B. We begin with an example where it is easy to determine this volume geometrically.

Example 23.1 (When Harry met Sally?) Harry and Sally plan to meet at Katz’s Deli for lunch. Let X and Y be the times that Harry and Sally arrive, respectively, in minutes after noon.

Suppose the joint PDF of X and Y is

$$f_{X,Y}(x, y) = \begin{cases} c, & 0 \leq x \leq 30,\ 0 \leq y \leq 60 \\ 0, & \text{else,} \end{cases}$$

where c is a constant. That is, the surface is flat over the support of the distribution, as shown in Figure 23.2.

Figure 23.2: The joint PDF of X and Y

In order to calculate any probabilities, we first need to determine the height of this surface. To do so, we use the fact that the total probability must be 1; that is, the total volume under the surface must be 1. Since the solid under the surface is a rectangular prism, its volume is

$$\text{total volume under surface} = \text{base} \times \text{height} = (30 \times 60)\, c.$$

Setting this total volume equal to 1, we obtain $c = \frac{1}{1800}$.

Using this information, we can calculate probabilities. For example, suppose Harry and Sally will each wait 15 minutes for the other to arrive. What is the probability that they meet? In other words, we want to determine $P((X, Y) \in B) = P(|X - Y| < 15)$.

To do this, we determine the set B in the xy-plane, and calculate the volume under the joint PDF above B, the blue prism shown in Figure 23.3.

Figure 23.3: The probability of the event B where Harry and Sally arrive less than 15 minutes apart.

Figure 23.3 shows the full picture, but it is unwieldy to draw. For calculating probabilities, we usually just need a bird’s-eye view of the event B. Consider Figure 23.3 from the perspective of Rand the Raven, who is flying high above the xy-plane. The picture from Rand’s perspective is shown in Figure 23.4.

Figure 23.4: A bird’s-eye view of Figure 23.3.

To calculate the volume of the prism, we just need to determine the area of its base B in Figure 23.4 and multiply by the height of the surface, $c = \frac{1}{1800}$. That is,

$$P(|X - Y| < 15) = \text{volume under surface above } B = (\text{area of } B) \times \text{height of surface}.$$

The area of B can be determined using basic geometry. One way is to take the area of the 30×45 rectangle that encloses B and subtract the areas of the two triangles that lie outside B:

$$(\text{area of } B) = 30 \cdot 45 - \frac{1}{2} \cdot 30 \cdot 30 - \frac{1}{2} \cdot 15 \cdot 15 = 787.5.$$

Therefore, the probability that Harry and Sally meet is

$$P(|X - Y| < 15) = 787.5 \cdot \frac{1}{1800} = .4375.$$
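As a sanity check on the geometric answer, the following Python sketch estimates the same probability by simulation. It assumes only the model of Example 23.1: a point (X, Y) drawn uniformly from the rectangle, which is done here by drawing each coordinate uniformly. The function name `estimate_meeting_probability`, the sample size, and the seed are illustrative choices, not part of the original example.

```python
import random

def estimate_meeting_probability(n_samples=200_000, wait=15.0, seed=0):
    """Monte Carlo estimate of P(|X - Y| < wait) for a flat joint PDF
    on the rectangle 0 <= x <= 30, 0 <= y <= 60 (Example 23.1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x = rng.uniform(0.0, 30.0)  # Harry's arrival, minutes after noon
        y = rng.uniform(0.0, 60.0)  # Sally's arrival, minutes after noon
        if abs(x - y) < wait:
            hits += 1
    return hits / n_samples

# Value from the geometric argument: (area of B) * c, with c = 1/1800.
exact = (30 * 45 - 0.5 * 30 * 30 - 0.5 * 15 * 15) / 1800

print("simulated:", estimate_meeting_probability())
print("geometric:", exact)
```

The simulated value should land close to the geometric one, with the gap shrinking as `n_samples` grows.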

We were able to solve Example 23.1 using geometry because the joint PDF surface, $f_{X,Y}(x, y)$, was flat. However, in general, the joint PDF surface will be curved, and integration is necessary to calculate volumes under the surface. Specifically, we will need to calculate the double integral of the joint PDF over B:

$$P((X, Y) \in B) = \iint_B f_{X,Y}(x, y)\, dx\, dy.$$

The next example illustrates how integration can be used to calculate a probability involving two random variables.

Example 23.2 (Blood Types) The ABO blood group system is used to classify human blood types. These blood types are caused by three genetic variants, or alleles, on chromosome 9: O, A, and B. The frequencies of these three alleles vary from population to population. In one population, it might be 70% O, 15% A, and 15% B, while in another population, it might be 60% O, 10% A, and 30% B.

The frequencies of these three alleles in a population are random variables; let’s call them X, Y, and Z, respectively. The joint distribution of X and Y can be modeled by

$$f(x, y) = \begin{cases} 12x^2, & x, y > 0,\ x + y < 1 \\ 0, & \text{otherwise.} \end{cases}$$

Note that $X + Y + Z = 1$, so Z is determined once we know X and Y.

What is $P(Z > .3) = P(1 - X - Y > .3)$, the probability that more than 30% of the alleles in a population will be B? To calculate this probability, we sketch a bird’s-eye view of the support of the distribution and the set $B = \{1 - X - Y > .3\}$ in Figure 23.5.

Figure 23.5: Joint PDF of X and Y, along with the event $B = \{1 - X - Y > .3\}$.

The surface f(x,y) is not flat, so to calculate the volume under the surface, we will need to calculate a double integral over B. To evaluate this integral, we convert the double integral into an iterated integral.

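$$\begin{aligned}
P(1 - X - Y > .3) &= \iint_B f(x, y)\, dx\, dy \\
&= \int_0^{.7} \underbrace{\int_0^{.7 - y} 12x^2\, dx}_{4x^3 \big|_0^{.7 - y}}\, dy \\
&= \int_0^{.7} 4(.7 - y)^3\, dy \\
&= .2401.
\end{aligned}$$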

We would have arrived at the same answer if we had set up the iterated integral in the other order, dydx instead of dxdy.

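$$\begin{aligned}
P(1 - X - Y > .3) &= \int_0^{.7} \int_0^{.7 - x} 12x^2\, dy\, dx \\
&= \int_0^{.7} 12x^2 \int_0^{.7 - x} 1\, dy\, dx \\
&= \int_0^{.7} 12x^2 (.7 - x)\, dx \\
&= .2401.
\end{aligned}$$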

So there is only about a .24 probability that the frequency of B alleles in a population exceeds 30%.
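The two iterated integrals in Example 23.2 can also be checked numerically. The sketch below uses a plain midpoint rule, which is a technique swapped in here for illustration rather than anything the example relies on; the helper names `midpoint_integral`, `prob_dx_dy`, and `prob_dy_dx` are hypothetical.

```python
def midpoint_integral(g, a, b, n=400):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def joint_pdf(x, y):
    """Joint PDF of Example 23.2: 12 x^2 on the triangle x, y > 0, x + y < 1."""
    return 12.0 * x**2 if (x > 0 and y > 0 and x + y < 1) else 0.0

def prob_dx_dy():
    """Integrate over x first (inner), then over y (outer)."""
    return midpoint_integral(
        lambda y: midpoint_integral(lambda x: joint_pdf(x, y), 0.0, 0.7 - y),
        0.0, 0.7)

def prob_dy_dx():
    """Integrate over y first (inner), then over x (outer)."""
    return midpoint_integral(
        lambda x: midpoint_integral(lambda y: joint_pdf(x, y), 0.0, 0.7 - x),
        0.0, 0.7)

# Both orders should agree with the exact value 0.7**4 = 0.2401,
# up to the small error of the midpoint rule.
print(prob_dx_dy(), prob_dy_dx())
```

Note how the inner limits change with the order of integration: the inner variable always runs from 0 up to 0.7 minus the outer variable, which is the boundary of the set B.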

23.2 Marginal Distribution

For discrete random variables, we saw that we can recover the PMF of X from the joint PMF of X and Y by summing over all the possible values of Y:

$$f_X(x) = \sum_y f_{X,Y}(x, y).$$

For continuous random variables, the idea is the same, except we integrate the PDF instead of summing the PMF.

Definition 23.1 (Marginal PDF) Let $f_{X,Y}(x, y)$ be the joint PDF of X and Y. Then, the marginal PDF of X is

$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy.$$

The marginal PDF of Y, fY(y), can be found similarly by integrating the joint PDF with respect to x.

Let us revisit Example 23.1 and determine the marginal distributions of the times that Harry and Sally arrive.

Example 23.3 (Marginal distribution of Harry’s and Sally’s arrival times) The marginal distribution of X, Harry’s arrival time after noon, is

$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy = \int_0^{60} \frac{1}{1800}\, dy = \frac{1}{30}$$

for $0 < x < 30$. We recognize this as the PDF of the $\text{Uniform}(a = 0, b = 30)$ distribution.

Similarly, the marginal distribution of Y, Sally’s arrival time after noon, is

$$f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx = \int_0^{30} \frac{1}{1800}\, dx = \frac{1}{60}$$

for $0 < y < 60$. We recognize this as the PDF of the $\text{Uniform}(a = 0, b = 60)$ distribution.

That is, Harry is equally likely to arrive at any time between 12:00 and 12:30, and Sally is equally likely to arrive at any time between 12:00 and 1:00.

For Harry and Sally, the joint PDF ends up being the product of the marginal PDFs:

$$f_{X,Y}(x, y) = \frac{1}{1800} = \frac{1}{30} \cdot \frac{1}{60} = f_X(x) f_Y(y)$$

for $0 < x < 30$, $0 < y < 60$.

This proves to be a useful characterization of independence for continuous random variables.

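Definition 23.2 (Joint PDF of independent random variables) Two continuous random variables X and Y are independent if their joint PDF is the product of the marginals:

$$f_{X,Y}(x, y) = f_X(x) f_Y(y). \tag{23.1}$$

Notice that for any events $A, B \subset \mathbb{R}$,

$$\begin{aligned}
P(X \in A, Y \in B) &= \iint_{A \times B} f_{X,Y}(x, y)\, dy\, dx \\
&= \int_A \int_B f_X(x) f_Y(y)\, dy\, dx \\
&= \int_A f_X(x) \underbrace{\int_B f_Y(y)\, dy}_{P(Y \in B)}\, dx \\
&= P(Y \in B) \int_A f_X(x)\, dx \\
&= P(Y \in B)\, P(X \in A),
\end{aligned}$$

so this definition leads to all the properties of independence that we expect.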

The next example illustrates the importance of respecting the support of the distribution when calculating marginal PDFs.

Example 23.4 (Marginal distribution of allele frequencies) In Example 23.2, we examined a particular model for the joint distribution of the allele frequencies of the O and A blood types.

The marginal distribution of X, the allele frequency of the O blood type, is

$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy.$$

However, the support of this joint distribution is $\{(x, y) : x, y > 0,\ x + y < 1\}$. This support was illustrated in Figure 23.5. For a given value of x, the values of y that are in the support are $(0, 1 - x)$. Therefore:

$$f_X(x) = \int_0^{1 - x} 12x^2\, dy = 12x^2 (1 - x), \qquad 0 < x < 1. \tag{23.2}$$

Equation 23.2 is graphed in Figure 23.6. Populations are more likely than not to have more than 50% O alleles.

Figure 23.6: Marginal PDF of the allele frequency of the O blood type

Similarly, the marginal PDF of Y, the frequency of the A allele, is

$$f_Y(y) = \int_0^{1 - y} 12x^2\, dx = 4(1 - y)^3, \qquad 0 < y < 1. \tag{23.3}$$

Equation 23.3 is graphed in Figure 23.7. Populations are most likely to have a small frequency of A alleles.

Figure 23.7: Marginal PDF of the allele frequency of the A blood type

Notice that the product of the marginal PDFs is $f_X(x) f_Y(y) = 48x^2 (1 - x)(1 - y)^3$ for $0 < x, y < 1$, which is very different from the joint PDF $f_{X,Y}(x, y)$. For instance, the product is positive at points with $x + y > 1$, where the joint PDF is zero. Therefore, X and Y are not independent. This is intuitive because the frequencies of the alleles must add up to 1, so an increase in the frequency of one allele must be associated with a decrease in another.
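The marginal PDFs in Example 23.4, and the failure of the product rule, can be checked numerically. This is a minimal sketch using a midpoint rule (an illustrative choice, with hypothetical helper names); the integration limits are restricted to the support, mirroring the point of the example.

```python
def midpoint_integral(g, a, b, n=2000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def joint_pdf(x, y):
    """Joint PDF of Example 23.2: 12 x^2 on the triangle x, y > 0, x + y < 1."""
    return 12.0 * x**2 if (x > 0 and y > 0 and x + y < 1) else 0.0

def marginal_x(x):
    """f_X(x): integrate out y over the support, 0 < y < 1 - x."""
    return midpoint_integral(lambda y: joint_pdf(x, y), 0.0, 1.0 - x)

def marginal_y(y):
    """f_Y(y): integrate out x over the support, 0 < x < 1 - y."""
    return midpoint_integral(lambda x: joint_pdf(x, y), 0.0, 1.0 - y)

# Compare the joint PDF with the product of the marginals at two points.
# The second point lies outside the support, so the joint PDF is 0 there
# while the product of the marginals is not.
for x, y in [(0.2, 0.1), (0.6, 0.6)]:
    print((x, y), "joint:", joint_pdf(x, y),
          "product:", marginal_x(x) * marginal_y(y))
```

The numerical marginals should match $12x^2(1 - x)$ and $4(1 - y)^3$ closely, and the mismatch between the two printed columns reflects the dependence between X and Y.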

23.3 Conditional Distribution

The conditional PDF is a way to describe the information that one random variable provides about another. The definition of the conditional PDF is the continuous analog of the conditional PMF.

Definition 23.3 (Conditional PDF) The conditional PDF of X given Y is

$$f_{X|Y}(x|y) = \frac{f_{X,Y}(x, y)}{f_Y(y)},$$

provided $f_Y(y) \neq 0$.

Example 23.5 (Sally’s arrival time) Suppose Harry arrives at the deli at 12:15. What is the distribution of Sally’s arrival time, conditional on this information?

In other words, we want to know $f_{Y|X}(y|15)$. Using our results from Example 23.3,

$$f_{Y|X}(y|15) = \frac{f_{X,Y}(15, y)}{f_X(15)} = \frac{1/1800}{1/30} = \frac{1}{60}$$

for $0 < y < 60$. Notice that this is the same as the marginal PDF $f_Y(y)$.

In fact, for any $0 < x < 30$, $f_{Y|X}(y|x) = \frac{1}{60}$ for $0 < y < 60$. This means that regardless of Harry’s arrival time, Sally is equally likely to arrive at any time between 12:00 and 1:00. In other words, their arrival times are independent!

Example 23.6 (Conditional frequencies of blood type alleles) In Example 23.2, we considered a model for the frequencies of the O and A blood type alleles.

If the frequency of the O blood type in a population is 60%, what is the distribution of the A blood type? By Definition 23.3, the conditional PDF of Y given X is

$$f_{Y|X}(y|x) = \frac{f_{X,Y}(x, y)}{f_X(x)} = \frac{12x^2}{12x^2 (1 - x)} = \frac{1}{1 - x}$$

for $0 < y < 1 - x$. In particular, $f_{Y|X}(y|.6) = \frac{1}{.4}$ for $0 < y < .4$, so the conditional distribution of Y given $X = .6$ is $\text{Uniform}(a = 0, b = .4)$.

On the other hand, if the frequency of the A blood type in a population is 30%, what is the distribution of the O blood type? By Definition 23.3, the conditional PDF of X given Y is

$$f_{X|Y}(x|y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} = \frac{12x^2}{4(1 - y)^3}$$

for $0 < x < 1 - y$. In particular, $f_{X|Y}(x|.3) = \frac{12x^2}{1.372}$ for $0 < x < .7$. This is not one of the named distributions that we learned.

We can see that X and Y are not independent by noting either that $f_{Y|X}(y|x) \neq f_Y(y)$ or that $f_{X|Y}(x|y) \neq f_X(x)$.
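The conditional PDFs in Example 23.6 can be sketched in a few lines of Python by dividing the joint PDF by the marginals from Equations 23.2 and 23.3. The function names and the midpoint-rule helper are hypothetical, introduced only for this illustration.

```python
def midpoint_integral(g, a, b, n=2000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def joint_pdf(x, y):
    """Joint PDF of Example 23.2: 12 x^2 on the triangle x, y > 0, x + y < 1."""
    return 12.0 * x**2 if (x > 0 and y > 0 and x + y < 1) else 0.0

def marginal_x(x):
    """Marginal PDF of X (Equation 23.2)."""
    return 12.0 * x**2 * (1.0 - x) if 0 < x < 1 else 0.0

def marginal_y(y):
    """Marginal PDF of Y (Equation 23.3)."""
    return 4.0 * (1.0 - y) ** 3 if 0 < y < 1 else 0.0

def cond_y_given_x(y, x):
    """f_{Y|X}(y|x) = joint / marginal of X (requires f_X(x) != 0)."""
    return joint_pdf(x, y) / marginal_x(x)

def cond_x_given_y(x, y):
    """f_{X|Y}(x|y) = joint / marginal of Y (requires f_Y(y) != 0)."""
    return joint_pdf(x, y) / marginal_y(y)

# Given X = 0.6, the conditional PDF of Y is flat on (0, 0.4).
print([cond_y_given_x(y, 0.6) for y in (0.1, 0.2, 0.3)])

# Given Y = 0.3, the conditional PDF of X is not flat, but it is still a
# PDF: its integral over (0, 0.7) should be close to 1.
area = midpoint_integral(lambda x: cond_x_given_y(x, 0.3), 0.0, 0.7)
print(area)
```

The flat values in the first line of output correspond to the Uniform(a=0, b=.4) distribution found above, since a uniform PDF on an interval of length .4 has height 1/.4.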

23.4 Further Examples

Example 23.7 (Lifetimes of light bulbs) You buy two identical light bulbs of the same brand and type. The lifetimes of the light bulbs are independent and identically distributed (i.i.d.) random variables X and Y.

Suppose we model their lifetimes (in years) by $\text{Exponential}(\lambda = 0.4)$ distributions. Then, their joint PDF, by Equation 23.1, is

$$f(x, y) = 0.4 e^{-0.4x} \cdot 0.4 e^{-0.4y}$$

for $x, y > 0$.

What is the probability that the first light bulb lasts shorter than the second, $P(X < Y)$? This event is sketched in Figure 23.8.

Figure 23.8: Probability that the first light bulb lasts shorter than the second.

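We can calculate this probability by setting up a double integral:

$$\begin{aligned}
P(X < Y) &= \int_0^{\infty} \int_x^{\infty} 0.4 e^{-0.4x} \cdot 0.4 e^{-0.4y}\, dy\, dx \\
&= \int_0^{\infty} 0.4 e^{-0.4x} \underbrace{\int_x^{\infty} 0.4 e^{-0.4y}\, dy}_{e^{-0.4x}}\, dx \\
&= \int_0^{\infty} 0.4 e^{-0.8x}\, dx \\
&= \frac{1}{2}.
\end{aligned}$$

In hindsight, this answer was obvious by symmetry. Since the two light bulbs are identical, there is no reason that one light bulb should be more likely to last longer than the other. Therefore, $P(X < Y) = P(Y < X)$, and since $P(X = Y) = 0$ for continuous random variables, these two events must each have probability $\frac{1}{2}$.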
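A short simulation gives a third way to arrive at the same answer. The sketch below draws i.i.d. exponential lifetimes with Python's standard `random.expovariate`; the function name `estimate_first_fails_first`, the sample size, and the seed are illustrative assumptions.

```python
import random

def estimate_first_fails_first(rate=0.4, n_samples=200_000, seed=0):
    """Monte Carlo estimate of P(X < Y) for i.i.d. Exponential(rate) lifetimes."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x = rng.expovariate(rate)  # lifetime of the first bulb, in years
        y = rng.expovariate(rate)  # lifetime of the second bulb, in years
        if x < y:
            hits += 1
    return hits / n_samples

print(estimate_first_fails_first())          # should be near 1/2
print(estimate_first_fails_first(rate=2.0))  # the rate should not matter
```

Changing `rate` should leave the estimate near one half, which is consistent with the symmetry argument: the answer depends only on the lifetimes being i.i.d. and continuous, not on the particular exponential distribution.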

Example 23.7 illustrates that there is often a way to avoid double integrals in practical problems. When the distribution is uniform, as in Example 23.1, the probability can be calculated geometrically. When the random variables are i.i.d., we can appeal to symmetry. We will see more techniques in the upcoming chapters.

Example 23.8 (Three light bulbs) What if, in Example 23.7, there had been three identical light bulbs, whose lifetimes X, Y, and Z are i.i.d.? What is $P(X < Y < Z)$?

To calculate this by integration, we would need to set up a triple integral over their joint PDF, fX,Y,Z(x,y,z).

Instead, we will appeal to symmetry. $X < Y < Z$ is only one possible ordering of the lifetimes of the three light bulbs, but there are other possible orderings, such as $Y < Z < X$ or $Z < Y < X$. By symmetry, no ordering should be more likely than any other. So all $3! = 6$ orderings are equally likely and

$$P(X < Y < Z) = \frac{1}{3!} = \frac{1}{6}.$$
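The symmetry argument can be illustrated by simulating the three lifetimes and tallying which ordering occurs. This sketch assumes the same Exponential(λ=0.4) model as Example 23.7; the function name `ordering_frequencies` and the labels used for the orderings are hypothetical.

```python
import random
from collections import Counter

def ordering_frequencies(rate=0.4, n_samples=120_000, seed=0):
    """Simulate three i.i.d. Exponential(rate) lifetimes and return the
    relative frequency of each ordering, keyed by strings like 'X<Y<Z'."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        lifetimes = {
            "X": rng.expovariate(rate),
            "Y": rng.expovariate(rate),
            "Z": rng.expovariate(rate),
        }
        # Sort the bulb labels from shortest to longest lifetime.
        order = "<".join(sorted(lifetimes, key=lifetimes.get))
        counts[order] += 1
    return {order: count / n_samples for order, count in counts.items()}

freqs = ordering_frequencies()
for order in sorted(freqs):
    print(order, round(freqs[order], 3))
```

Each of the six orderings should appear with relative frequency close to 1/6.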

The next example is a surprising application of joint PDFs. Earlier, we saw that the PDF of the standard normal distribution is

$$f_Z(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}, \qquad -\infty < x < \infty.$$

Where does the normalizing constant $1/\sqrt{2\pi}$ come from? To answer this question, we will first make the problem more complicated by considering the joint distribution of two independent standard normal random variables.

Example 23.9 (Normalizing constant of the standard normal) Write the standard normal PDF as $\frac{1}{k} e^{-x^2/2}$. Determining the normalizing constant k is not straightforward because there is no “simple” expression for the indefinite integral $\int e^{-x^2/2}\, dx$. (If you don’t believe it, try to find the antiderivative!)

The solution is to introduce two independent standard normal random variables. Let’s call them X and Y. Because they are independent, their joint PDF must be the product of the marginal PDFs:

$$f(x, y) = f_X(x) f_Y(y) = \frac{1}{k} e^{-x^2/2} \cdot \frac{1}{k} e^{-y^2/2} = \frac{1}{k^2} e^{-(x^2 + y^2)/2}.$$

As we saw above, the total volume under any joint PDF must equal one.

But this volume can be obtained as a solid of revolution! Specifically, the solid under this surface can be obtained by rotating the red shaded region in Figure 23.9 around the vertical axis.

Figure 23.9: The red region can be rotated around the vertical axis to obtain the solid under $\frac{1}{k^2} e^{-(x^2 + y^2)/2}$.

The volume of this solid of revolution is most easily calculated using cylindrical shells. The shell at radius x has a height of $f(x) = \frac{1}{k^2} e^{-x^2/2}$ and a base with area $2\pi x\, dx$, so the total volume is

$$\begin{aligned}
\text{Total Volume} &= \int_0^{\infty} \frac{1}{k^2} e^{-x^2/2} \cdot 2\pi x\, dx \\
&= \frac{2\pi}{k^2} \int_0^{\infty} e^{-x^2/2}\, x\, dx \\
&= \frac{2\pi}{k^2} \left[ -e^{-x^2/2} \right]_0^{\infty} \\
&= \frac{2\pi}{k^2}.
\end{aligned}$$

Since we know the total volume must be one, we can solve for k:

$$\frac{2\pi}{k^2} = 1 \quad \Longrightarrow \quad k = \sqrt{2\pi}.$$
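Although the antiderivative of $e^{-x^2/2}$ has no simple form, the value of k can still be checked numerically. The sketch below uses a midpoint rule on a truncated range; the truncation at 10 is an assumption that costs almost nothing because the integrand is of order $e^{-50}$ there. The helper and variable names are hypothetical.

```python
import math

def midpoint_integral(g, a, b, n=20_000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

# k is the area under e^{-x^2/2} over the whole real line (truncated here).
k_numeric = midpoint_integral(lambda x: math.exp(-x * x / 2), -10.0, 10.0)

# Shell integral from the example with the factor 1/k^2 left out:
# the integral of 2*pi*x*e^{-x^2/2} over (0, infinity), truncated at 10.
shell_integral = midpoint_integral(
    lambda x: 2 * math.pi * x * math.exp(-x * x / 2), 0.0, 10.0)

print(k_numeric, math.sqrt(2 * math.pi))  # the two values should agree closely
print(shell_integral, k_numeric**2)       # both should be close to 2*pi
```

The second line of output reflects the key step of the example: the shell integral equals $2\pi$, and setting it equal to $k^2$ gives $k = \sqrt{2\pi}$.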