25  Covariance

In Chapter 15, we defined the covariance between two discrete random variables as a measure of the relationship between them. In this chapter, we extend covariance to continuous random variables.

Definition 25.1 (Covariance) Let \(X\) and \(Y\) be random variables. Then, the covariance of \(X\) and \(Y\) is \[ \text{Cov}[ X, Y ] = \text{E}[ (X - \text{E}[ X ])(Y - \text{E}[ Y ]) ]. \tag{25.1}\]

Notice that Definition 25.1 is identical to Definition 15.1 because we defined covariance in terms of expected values. Since the properties of expectation are the same for both discrete and continuous random variables, all of the properties of covariance from Chapter 15 (and their proofs) carry over. Therefore, in this chapter, we only restate these properties; for their proofs, please consult Chapter 15.

25.1 Properties of Covariance

Proposition 25.1 (Shortcut formula for covariance) An equivalent formula to Equation 25.1 is \[ \text{Cov}[ X, Y ] = \text{E}[ XY ] - \text{E}[ X ] \text{E}[ Y ]. \tag{25.2}\]

Example 25.1 (Covariance between the times that Harry and Sally arrive) Continuing Example 24.3, what is \(\text{Cov}[ X, Y ]\), the covariance between the time that Harry arrives and the time that Sally arrives?

By Proposition 25.1, it is \[ \text{Cov}[ X, Y ] = \text{E}[ XY ] - \text{E}[ X ] \text{E}[ Y ]. \] But since Harry and Sally arrive at independent times, \(\text{E}[ XY ] = \text{E}[ X ]\text{E}[ Y ]\), so the covariance is zero.
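We can confirm this with a quick simulation. The sketch below stands in for Example 24.3 by assuming that Harry arrives \(\textrm{Uniform}(0, 30)\) and Sally arrives \(\textrm{Uniform}(0, 60)\) minutes after noon, independently (these distributions are an assumption here; consult Example 24.3 for the actual setup).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Assumed arrival times (minutes after noon), independent:
# Harry ~ Uniform(0, 30), Sally ~ Uniform(0, 60).
X = rng.uniform(0, 30, size=n)
Y = rng.uniform(0, 60, size=n)

# Sample covariance of X and Y: should be close to 0.
print(np.cov(X, Y)[0, 1])
```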

Example 25.1 illustrates a more general phenomenon: when two random variables \(X\) and \(Y\) are independent, their covariance is zero.

Proposition 25.2 (Independent random variables have covariance zero) Let \(X\) and \(Y\) be independent random variables. Then, \[ \text{Cov}[ X, Y ] = 0. \]

Example 25.2 (Covariance between the times that the first and second person arrive) Continuing Example 24.3, let \(L\) be the time that the first person (Harry or Sally) arrives, and let \(M\) be the time that the second person arrives. What is \(\text{Cov}[ L, M ]\)?

By Proposition 25.1, it is \[ \text{Cov}[ L, M ] = \text{E}[ LM ] - \text{E}[ L ] \text{E}[ M ]. \] We showed in Example 24.3 that \(\text{E}[ LM ]\) is not equal to \(\text{E}[ L ] \text{E}[ M ]\). The covariance is precisely the difference between them. Using the values we calculated previously, \[ \text{Cov}[ L, M ] = 450 - (12.5)(32.5) = 43.75. \]

The covariance is positive, which makes sense because if the first person arrives late (i.e., \(L\) is large), then the second person must arrive even later (i.e., \(M\) is also large).
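A simulation bears this out. Under the same assumed arrival distributions as in the sketch above, the sample covariance of \(L\) and \(M\) should land near \(43.75\).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Same assumed arrival distributions as before.
X = rng.uniform(0, 30, size=n)
Y = rng.uniform(0, 60, size=n)

L = np.minimum(X, Y)   # time the first person arrives
M = np.maximum(X, Y)   # time the second person arrives

# Sample covariance of L and M: should be close to 43.75.
print(np.cov(L, M)[0, 1])
```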

Proposition 25.3 (Relationship between variance and covariance) Let \(X\) be a random variable. Then, \[ \text{Var}[ X ] = \text{Cov}[ X, X ]. \]

Proposition 25.4 (Properties of covariance) Let \(X, Y, Z\) be random variables and \(a\) be a constant. Then, the following are true.

  1. (Symmetry) \(\text{Cov}[ X, Y ] = \text{Cov}[ Y, X ]\).
  2. (Constants cannot covary) \(\text{Cov}[ X, a ] = 0\).
  3. (Multiplying by a constant) \(\text{Cov}[ a X, Y ] = a \text{Cov}[ X, Y ] = \text{Cov}[ X, a Y ]\).
  4. (Distributive property) \(\text{Cov}[ X+Y, Z ] = \text{Cov}[ X, Z ] + \text{Cov}[ Y, Z ]\) and \(\text{Cov}[ X, Y+Z ] = \text{Cov}[ X, Y ] + \text{Cov}[ X, Z ]\).

The properties of covariance, in conjunction with Proposition 25.3, can be used to calculate variances, as we show in the next example.

Example 25.3 (Stock portfolio diversification) Suppose we invest all of our savings in a single stock with return \(X_1\), a \(\textrm{Normal}(\mu= 5\%, \sigma= 12\%)\) random variable. The probability that we get a positive return is \[ P(X_1 > 0) = P(\frac{X_1 - 5}{12} > \frac{0 - 5}{12}) = P(Z > -\frac{5}{12}) = 1 - \Phi(-\frac{5}{12}) \approx .6615. \]

There is nearly a \(1/3\) chance that we lose money. One way to measure the risk of this stock portfolio is the variance: \[ \text{Var}[ X_1 ] = \sigma^2 = 144. \]
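We can verify this probability numerically; the sketch below uses SciPy's normal CDF (the choice of library is an assumption, since the book's code environment is not shown here).

```python
from scipy.stats import norm

# P(X1 > 0) for X1 ~ Normal(mu=5, sigma=12), in percent units.
print(1 - norm.cdf(0, loc=5, scale=12))   # approximately 0.6615

# Equivalently, standardize and use the standard normal CDF.
print(1 - norm.cdf(-5 / 12))              # same value
```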

Now, suppose we divide up our savings evenly among 3 independent stocks with returns \(X_1, X_2, X_3\), each of which is a \(\textrm{Normal}(\mu= 5\%, \sigma= 12\%)\) random variable. That is, the return of our portfolio is \[ R = \frac{1}{3} X_1 + \frac{1}{3} X_2 + \frac{1}{3} X_3. \] The expected return of this portfolio is still \[ \text{E}[ R ] = \frac{1}{3} \text{E}[ X_1 ] + \frac{1}{3} \text{E}[ X_2 ] + \frac{1}{3} \text{E}[ X_3 ] = 5\%, \] but the risk is now only \[ \begin{aligned} \text{Var}[ R ] &= \text{Cov}[ R, R ] \\ &= \text{Cov}[ \frac{1}{3} X_1 + \frac{1}{3} X_2 + \frac{1}{3} X_3, \frac{1}{3} X_1 + \frac{1}{3} X_2 + \frac{1}{3} X_3 ] \\ &= \frac{1}{9} \text{Var}[ X_1 ] + \frac{1}{9} \text{Var}[ X_2 ] + \frac{1}{9} \text{Var}[ X_3 ] + \text{all other terms $0$} \\ &= \frac{3 \cdot 144}{9} \\ &= 48. \end{aligned} \]
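A quick simulation confirms this calculation. The sketch below (with NumPy, again an assumption) draws the three independent returns and computes the sample mean and variance of the portfolio return.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Three independent Normal(mu=5, sigma=12) returns per simulation.
X = rng.normal(loc=5, scale=12, size=(n, 3))

# Equally weighted portfolio return R = (X1 + X2 + X3) / 3.
R = X.mean(axis=1)

print(R.mean())  # approximately 5
print(R.var())   # approximately 48
```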

We can make the risk smaller by dividing our portfolio among more and more independent stocks. There is just one catch: it is difficult to find stocks whose returns are truly independent.

Suppose that the three stocks \(X_1, X_2, X_3\) are not independent but instead have covariance \(\text{Cov}[ X_i, X_j ] = 60\) for each pair \(i \neq j\). The expected return is still \(5\%\) by linearity of expectation, but the risk is now

\[ \begin{aligned} \text{Var}[ R ] &= \text{Cov}[ R, R ] \\ &= \text{Cov}[ \frac{1}{3} X_1 + \frac{1}{3} X_2 + \frac{1}{3} X_3, \frac{1}{3} X_1 + \frac{1}{3} X_2 + \frac{1}{3} X_3 ] \\ &= 3 \cdot \frac{1}{9} \text{Var}[ X_1 ] + 6 \cdot \frac{1}{9} \text{Cov}[ X_1, X_2 ] \\ &= \frac{3 \cdot 144 + 6 \cdot 60}{9} \\ &= 88. \end{aligned} \]

There is still a benefit to diversification, but it is diluted by the positive covariance between the stocks.
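To check this by simulation, we need draws with the stated covariances. The sketch below assumes the returns are jointly normal (the example specifies only the means, variances, and pairwise covariances, so joint normality is an added assumption) and uses NumPy's multivariate normal sampler.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

mean = [5, 5, 5]
cov = [[144,  60,  60],
       [ 60, 144,  60],
       [ 60,  60, 144]]   # Var = 144 on the diagonal, Cov = 60 off it

# Assumed jointly normal returns with the stated covariance matrix.
X = rng.multivariate_normal(mean, cov, size=n)
R = X.mean(axis=1)

print(R.var())  # approximately 88
```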

25.2 Correlation

One consequence of Proposition 25.4 is that covariance is sensitive to the choice of units.

For example, in Example 25.2, we calculated the covariance to be \(43.75\ \text{minutes}^2\). If \(X\) and \(Y\) had instead been measured in hours, then the covariance would be \[ \text{Cov}[ \frac{X}{60}, \frac{Y}{60} ] = \frac{1}{60^2} \text{Cov}[ X, Y ] = 0.01215\ \text{hours}^2. \] The magnitude of the covariance can vary wildly depending on the choice of units!

Since changing the units does not meaningfully affect the relationship between two random variables, we might prefer a summary that is insensitive to the choice of units. The next definition provides exactly such a summary; in fact, it is unitless.

Definition 25.2 (Correlation coefficient) Let \(X\) and \(Y\) be random variables. The correlation coefficient of \(X\) and \(Y\) is \[ \text{Corr}[ X, Y ] = \frac{\text{Cov}[ X, Y ]}{\sqrt{\text{Var}[ X ] \text{Var}[ Y ]}}. \tag{25.3}\]

Notice that the correlation coefficient has the same sign as the covariance. So we can still interpret a positive correlation coefficient as indicating that \(X\) and \(Y\) tend to move in the same direction and a negative correlation coefficient as indicating that \(X\) and \(Y\) tend to move in opposite directions.

The next result shows that the correlation coefficient does not change when we change units.

Proposition 25.5 (Invariance of the correlation coefficient) Let \(X\) and \(Y\) be random variables, and let \(X' = aX + b\) and \(Y' = cY + d\) for constants \(a, c > 0\) and \(b, d \in \mathbb{R}\). Then, \[ \text{Corr}[ X', Y' ] = \text{Corr}[ X, Y ]. \tag{25.4}\]

We evaluate \(\text{Corr}[ X', Y' ]\) by first calculating the numerator, \(\text{Cov}[ X', Y' ]\), using properties of covariance (Proposition 25.4):

\[ \begin{align} \text{Cov}[ X', Y' ] &= \text{Cov}[ aX + b, cY + d ] \\ &= ac\text{Cov}[ X, Y ]. \end{align} \]

Next, we calculate \(\text{Var}[ X' ]\) and \(\text{Var}[ Y' ]\) using Proposition 25.3 and properties of covariance:

\[ \begin{align} \text{Var}[ X' ] &= \text{Cov}[ X', X' ] \\ &= \text{Cov}[ aX + b, aX + b ] \\ &= a^2 \text{Var}[ X ] \\ \text{Var}[ Y' ] &= \text{Cov}[ Y', Y' ] \\ &= \text{Cov}[ cY + d, cY + d ] \\ &= c^2 \text{Var}[ Y ]. \end{align} \]

Therefore, the correlation is \[ \begin{align} \text{Corr}[ X', Y' ] &= \frac{\text{Cov}[ X', Y' ]}{\sqrt{\text{Var}[ X' ] \text{Var}[ Y' ]}} \\ &= \frac{ac \text{Cov}[ X, Y ]}{\sqrt{a^2 \text{Var}[ X ] \cdot c^2 \text{Var}[ Y ]}} \\ &= \frac{ac}{ac} \frac{\text{Cov}[ X, Y ]}{\sqrt{\text{Var}[ X ] \cdot \text{Var}[ Y ]}} \\ &= \text{Corr}[ X, Y ]. \end{align} \]

The next example provides a recipe for simulating correlated random variables.

Example 25.4 (Simulating correlated random variables) It is easy to simulate independent random variables \(X\) and \(Y\). The code below simulates 1000 instances of independent standard normal random variables.
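Here is one way to do this, sketched with NumPy (the specific library is an assumption; any random number generator works).

```python
import numpy as np

rng = np.random.default_rng(0)

# 1000 independent draws of X and Y, each standard normal.
X = rng.normal(size=1000)
Y = rng.normal(size=1000)

# Sample correlation: should be close to 0.
print(np.corrcoef(X, Y)[0, 1])
```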

Note that \(\text{Corr}[ X, Y ] = 0\), since independent random variables have zero covariance, by Proposition 25.2.

But as we discussed in Example 25.3, most things in life are not independent. How do we simulate correlated random variables \(X\) and \(Y\)? That is, we want to simulate \(X\) and \(Y\) with \(\text{Corr}[ X, Y ] = \rho\). Here is one possible way.

Start with independent random variables \(Z\) and \(W\) with expected value \(\text{E}[ Z ] = \text{E}[ W ] = 0\) and variance \(\text{Var}[ Z ] = \text{Var}[ W ] = 1\). Now, define \[ \begin{align} X &= Z \\ Y &= \rho Z + \sqrt{1 - \rho^2} W \end{align} \tag{25.5}\]

Now, we check that \(\text{Corr}[ X, Y ] = \rho\). First, we calculate \(\text{Cov}[ X, Y ]\), using properties of covariance (Proposition 25.4):

\[ \begin{align} \text{Cov}[ X, Y ] &= \text{Cov}[ Z, \rho Z + \sqrt{1 - \rho^2} W ] \\ &= \rho \underbrace{\text{Cov}[ Z, Z ]}_{\text{Var}[ Z ] = 1} + \sqrt{1 - \rho^2} \underbrace{\text{Cov}[ Z, W ]}_0 \\ &= \rho \end{align} \]

Next, we evaluate \(\text{Var}[ X ]\) and \(\text{Var}[ Y ]\), using Proposition 25.3 and properties of covariance:

\[ \begin{align} \text{Var}[ X ] &= \text{Var}[ Z ] \\ &= 1 \\ \text{Var}[ Y ] &= \text{Cov}[ Y, Y ] \\ &= \text{Cov}[ \rho Z + \sqrt{1 - \rho^2} W, \rho Z + \sqrt{1 - \rho^2} W ] \\ &= \rho^2 \text{Var}[ Z ] + 2 \rho\sqrt{1 - \rho^2} \text{Cov}[ Z, W ] + (1 - \rho^2) \text{Var}[ W ] \\ &= \rho^2 + (1 - \rho^2) \\ &= 1 \end{align} \]

Therefore, the correlation between \(X\) and \(Y\) is \[ \text{Corr}[ X, Y ] = \frac{\text{Cov}[ X, Y ]}{\sqrt{\text{Var}[ X ] \text{Var}[ Y ]}} = \frac{\rho}{\sqrt{1 \cdot 1}} = \rho. \]

Although \(X\) and \(Y\) have expected value \(0\) and variance \(1\), we can apply a linear transformation to give them any expected value and variance we like, without changing the correlation (by Proposition 25.5): \[ \begin{align} X' &= \mu_1 + \sigma_1 X \\ Y' &= \mu_2 + \sigma_2 Y. \end{align} \]

The code below illustrates how to simulate random variables with correlation \(\rho = 0.6\).
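One possible sketch, with NumPy, takes \(Z\) and \(W\) to be standard normal (any mean-zero, variance-one choice works); the shift-and-scale constants at the end are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.6
n = 1000

# Step 1: independent Z and W with mean 0 and variance 1
# (standard normals are one convenient choice).
Z = rng.normal(size=n)
W = rng.normal(size=n)

# Step 2: combine them according to Equation 25.5.
X = Z
Y = rho * Z + np.sqrt(1 - rho**2) * W

print(np.corrcoef(X, Y)[0, 1])  # close to 0.6

# Shifting and scaling (hypothetical means and SDs shown here)
# does not change the correlation, by Proposition 25.5.
X_prime = 10 + 2 * X
Y_prime = -5 + 3 * Y
print(np.corrcoef(X_prime, Y_prime)[0, 1])  # still close to 0.6
```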

In fact, \(X\) and \(Y\) are also marginally normal, although we do not show this fact here.

Notice that Example 25.4 only allows us to simulate random variables with correlations \(\rho\) between \(-1\) and \(1\). (If \(|\rho| > 1\), then \(\sqrt{1 - \rho^2}\) in Equation 25.5 would be an imaginary number.)

This is not a shortcoming of Example 25.4. In fact, the magnitude of the correlation coefficient can never exceed \(1\), a consequence of the following inequality.

Theorem 25.1 (Cauchy-Schwarz Inequality) Let \(X\) and \(Y\) be random variables. Then,

\[|\text{Cov}[ X, Y ]| \leq \sqrt{\text{Var}[ X ] \text{Var}[ Y ]}. \tag{25.6}\]

Proof

First, we prove Equation 25.6 assuming \(\text{Var}[ X ] = \text{Var}[ Y ] = 1\). In this case, we want to show that \(|\text{Cov}[ X, Y ]| \leq 1\).

We know that the variance of any random variable is non-negative. Consider the variance of \(X - Y\): \[ \begin{align} \text{Var}[ X - Y ] &= \text{Cov}[ X - Y, X - Y ] \\ &= \text{Cov}[ X, X ] - \text{Cov}[ X, Y ] - \text{Cov}[ Y, X ] + \text{Cov}[ Y, Y ] \\ &= \underbrace{\text{Var}[ X ]}_{1} - 2\text{Cov}[ X, Y ] + \underbrace{\text{Var}[ Y ]}_{1} \\ &= 2 - 2\text{Cov}[ X, Y ] \geq 0. \\ \end{align} \] Rearranging this inequality, we obtain \[ \text{Cov}[ X, Y ] \leq 1. \] This is one half of the inequality. To show the other half, we apply the same argument to the variance of \(X + Y\): \[ \begin{align} \text{Var}[ X + Y ] &= 2 + 2\text{Cov}[ X, Y ] \geq 0 \\ \end{align} \]

and obtain \[ \text{Cov}[ X, Y ] \geq -1. \] Putting the two halves together, we obtain \[ -1 \leq \text{Cov}[ X, Y ] \leq 1, \tag{25.7}\] which establishes Equation 25.6 for the case where \(\text{Var}[ X ] = \text{Var}[ Y ] = 1\).

What if \(\sigma_X^2 = \text{Var}[ X ]\) or \(\sigma_Y^2 = \text{Var}[ Y ]\) is not \(1\)? We can convert \(X\) and \(Y\) to random variables \(X' = \frac{X}{\sigma_X}\) and \(Y' = \frac{Y}{\sigma_Y}\), each with variance \(1\). Now, we can apply Equation 25.7 to \(X'\) and \(Y'\) to obtain \[ -1 \leq \text{Cov}[ \frac{X}{\sigma_X}, \frac{Y}{\sigma_Y} ] \leq 1, \] or equivalently, \[ -\sigma_X \sigma_Y \leq \text{Cov}[ X, Y ] \leq \underbrace{\sigma_X \sigma_Y}_{\sqrt{\text{Var}[ X ]\text{Var}[ Y ]}}, \] which establishes Equation 25.6 in general.

You may be familiar with the Cauchy-Schwarz inequality from linear algebra, which says that for any two vectors \(\vec v, \vec w \in \mathbb{R}^n\), \[ |\vec v \cdot \vec w| \leq ||\vec v|| \ ||\vec w||. \] In fact, Theorem 25.1 is really the same theorem. To see why, consider the following analogy between probability and linear algebra:

  • Random variables \(X\) and \(Y\) are like vectors \(\vec v\) and \(\vec w\).
  • The covariance between two random variables, \(\text{Cov}[ X, Y ]\), is like the inner product between two vectors \(\vec v \cdot \vec w\).
  • The covariance between a random variable and itself is the variance: \(\text{Cov}[ X, X ] = \text{Var}[ X ]\). Similarly, the inner product between a vector and itself is the squared magnitude: \(\vec v \cdot \vec v = || \vec v ||^2\).

Each property of covariance (Proposition 25.4) corresponds to an analogous property of the inner product. Therefore, any theorem for the inner product will have an analogous theorem for covariance.

Note that Theorem 25.1 immediately implies that the correlation coefficient is between \(-1\) and \(1\). To see why, divide both sides of Equation 25.6 by \(\sqrt{\text{Var}[ X ] \text{Var}[ Y ]}\) to obtain \[ \left| \frac{\text{Cov}[ X, Y ]}{\sqrt{\text{Var}[ X ]\text{Var}[ Y ]}} \right| \leq 1. \] The quantity inside the absolute value is precisely \(\text{Corr}[ X, Y ]\). This fact is summarized in the following corollary.

Corollary 25.1 (Correlation cannot exceed 1) For any random variables \(X\) and \(Y\), \[ -1 \leq \text{Corr}[ X, Y ] \leq 1. \]