25  Covariance

In Chapter 15, we defined the covariance between two discrete random variables as a measure of the relationship between them. The same definition works for continuous random variables.

Definition 15.1 (Covariance)

Let \(X\) and \(Y\) be random variables. Then, the covariance between \(X\) and \(Y\) is \[ \text{Cov}\!\left[ X, Y \right] \overset{\text{def}}{=}\text{E}\!\left[ XY \right] - \text{E}\!\left[ X \right]\text{E}\!\left[ Y \right]. \]

Example 25.1 (Covariance between waiting times) Continuing Example 24.3, what is \(\text{Cov}\!\left[ X, Y \right]\), the covariance between the times that you and your friend have to wait?

By Equation 15.1, it is \[ \text{Cov}\!\left[ X, Y \right] = \text{E}\!\left[ XY \right] - \text{E}\!\left[ X \right] \text{E}\!\left[ Y \right]. \] But since the waiting times \(X\) and \(Y\) are independent, \(\text{E}\!\left[ XY \right] = \text{E}\!\left[ X \right]\text{E}\!\left[ Y \right]\), so the covariance is \(0\). This makes sense because covariance is a measure of the relationship between two random variables, and \(X\) and \(Y\) have “no” relationship.
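
As a sanity check, we can estimate this covariance by simulation. The sketch below uses illustrative rates \(\lambda_1 = \frac{1}{3}\) and \(\lambda_2 = \frac{1}{5}\) (these specific values are our assumption, not taken from Example 24.3); the sample covariance should be close to \(0\).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
lam1, lam2 = 1/3, 1/5   # illustrative rates (an assumption, not from Example 24.3)

# Independent exponential waiting times; NumPy parametrizes by the mean 1/lambda.
x = rng.exponential(scale=1/lam1, size=n)
y = rng.exponential(scale=1/lam2, size=n)

# Cov[X, Y] = E[XY] - E[X] E[Y]
cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)
print(cov_xy)  # close to 0, as independence implies
```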

Example 25.2 (Covariance between the times that the first and second person enter) Continuing Example 24.3, let \(L\) be the time that the first person enters the park, and let \(M\) be the time that the second person enters the park. What is \(\text{Cov}\!\left[ L, M \right]\)?

By Equation 15.1, it is \[ \text{Cov}\!\left[ L, M \right] = \text{E}\!\left[ LM \right] - \text{E}\!\left[ L \right] \text{E}\!\left[ M \right]. \] We showed in Example 24.3 that \(\text{E}\!\left[ LM \right]\) is not equal to \(\text{E}\!\left[ L \right] \text{E}\!\left[ M \right]\). (In fact, the covariance is precisely the difference between these two quantities.) Using the values derived previously, \[ \begin{align} \text{Cov}\!\left[ L, M \right] &= \frac{1}{\lambda_1\lambda_2} - \left( \frac{1}{\lambda_1 + \lambda_2} \right) \left( \frac{1}{\lambda_1} + \frac{1}{\lambda_2} - \frac{1}{\lambda_1 + \lambda_2} \right) \\ &= \frac{1}{(\lambda_1 + \lambda_2)^2}. \end{align} \] The covariance is positive, which makes sense because if the first person to enter had to wait a long time (i.e., \(L\) is large), then the second person must have waited even longer (i.e., \(M\) is also large).
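
We can check this closed-form answer by simulation. Continuing the sketch above, \(L\) and \(M\) are the minimum and maximum of the two independent waiting times, and the sample covariance should be close to \(\frac{1}{(\lambda_1 + \lambda_2)^2}\).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
lam1, lam2 = 1/3, 1/5   # illustrative rates (an assumption)

x = rng.exponential(scale=1/lam1, size=n)
y = rng.exponential(scale=1/lam2, size=n)

# L is when the first person enters; M is when the second person enters.
L, M = np.minimum(x, y), np.maximum(x, y)

cov_lm = np.mean(L * M) - np.mean(L) * np.mean(M)
print(cov_lm, 1 / (lam1 + lam2) ** 2)  # the two values should agree closely
```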

25.1 Properties of Covariance

Since covariance is defined in terms of expectation, and the properties of expectation are the same for discrete and continuous random variables, all of the properties of covariance from Chapter 15 (and their proofs) carry over. Therefore, in this chapter, we only restate these properties.

Proposition 15.1 (Properties of covariance)

Let \(X, Y, Z\) be random variables and \(a\) be a constant. Then, the following are true.

  1. (Symmetry) \(\text{Cov}\!\left[ X, Y \right] = \text{Cov}\!\left[ Y, X \right]\).
  2. (Constants cannot covary) \(\text{Cov}\!\left[ a, Y \right] = 0\).
  3. (Multiplying by a constant) \(\text{Cov}\!\left[ a X, Y \right] = a \text{Cov}\!\left[ X, Y \right] = \text{Cov}\!\left[ X, a Y \right]\).
  4. (Distributive property) \(\text{Cov}\!\left[ X+Y, Z \right] = \text{Cov}\!\left[ X, Z \right] + \text{Cov}\!\left[ Y, Z \right]\) and \(\text{Cov}\!\left[ X, Y+Z \right] = \text{Cov}\!\left[ X, Y \right] + \text{Cov}\!\left[ X, Z \right]\).

Proposition 15.2 (Relationship between variance and covariance)

Let \(X\) be a random variable. Then, \[ \text{Var}\!\left[ X \right] = \text{Cov}\!\left[ X, X \right]. \]

Together, these properties allow us to calculate the variance of sums of random variables, as the next example shows.

Example 25.3 (Portfolio Theory and Asset Diversification) Portfolio theory is a field of economics and finance focused on creating an optimal investment portfolio. In portfolio theory, the returns of assets, such as stocks and bonds, are modeled as random variables \(X_i\). If we invest a fraction \(w_i\) of our portfolio in asset \(i\), then the return of our portfolio is \[ R = \sum_{i=1}^n w_i X_i. \]

Portfolio theory is concerned with determining the optimal fractions \(w_i\). Clearly, we prefer a portfolio with a higher expected return \(\text{E}\!\left[ R \right]\). But if two portfolios have the same expected return, we prefer the one with less risk \(\text{Var}\!\left[ R \right]\).

To be concrete, suppose that the return of each asset \(X_i\) has mean \(\text{E}\!\left[ X_i \right] = 5\%\) and variance \(\text{Var}\!\left[ X_i \right] = 144\) (in units of \(\%^2\), i.e., a standard deviation of \(12\%\)). If we invest the entire portfolio in one asset, say \(X_1\), then \(R = X_1\), so these values also represent the expected return and risk of the portfolio.

If there are three independent assets \(X_1, X_2, X_3\), and we invest equally in each asset, \[ w_1 = w_2 = w_3 = \frac{1}{3}, \]

then the expected return of our portfolio is still \[ \text{E}\!\left[ R \right] = \frac{1}{3} \text{E}\!\left[ X_1 \right] + \frac{1}{3} \text{E}\!\left[ X_2 \right] + \frac{1}{3} \text{E}\!\left[ X_3 \right] = 5\%, \] but the risk is now only \[ \begin{aligned} \text{Var}\!\left[ R \right] &= \text{Cov}\!\left[ R, R \right] \\ &= \text{Cov}\!\left[ \frac{1}{3} X_1 + \frac{1}{3} X_2 + \frac{1}{3} X_3, \frac{1}{3} X_1 + \frac{1}{3} X_2 + \frac{1}{3} X_3 \right] \\ &= \frac{1}{9} \text{Var}\!\left[ X_1 \right] + \frac{1}{9} \text{Var}\!\left[ X_2 \right] + \frac{1}{9} \text{Var}\!\left[ X_3 \right] + \text{all other terms $0$} \\ &= \frac{3 \cdot 144}{9} \\ &= 48. \end{aligned} \]

We can make the risk smaller and smaller by dividing our portfolio among more and more independent assets. This strategy is called diversification.

There is just one catch: it is difficult to find assets that are truly independent. Suppose that the three stocks \(X_1, X_2, X_3\) are not independent but have covariance \(\text{Cov}\!\left[ X_i, X_j \right] = 60\) for \(i \neq j\). The expected return is still \(5\%\) by linearity of expectation, but the risk is now \[ \begin{aligned} \text{Var}\!\left[ R \right] &= \text{Cov}\!\left[ R, R \right] \\ &= \text{Cov}\!\left[ \frac{1}{3} X_1 + \frac{1}{3} X_2 + \frac{1}{3} X_3, \frac{1}{3} X_1 + \frac{1}{3} X_2 + \frac{1}{3} X_3 \right] \\ &= 3 \cdot \frac{1}{9} \text{Var}\!\left[ X_1 \right] + 6 \cdot \frac{1}{9} \text{Cov}\!\left[ X_1, X_2 \right] \\ &= \frac{3 \cdot 144 + 6 \cdot 60}{9} \\ &= 88. \end{aligned} \]

There is still a benefit to diversification, but it is diluted by the positive covariance between the assets.
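
Both risk calculations are instances of the general identity \(\text{Var}\!\left[ R \right] = \vec w^\top \Sigma \vec w\), where \(\Sigma\) is the covariance matrix of the asset returns; this follows from the distributive and scaling properties of covariance. A minimal sketch reproducing the two numbers above:

```python
import numpy as np

w = np.full(3, 1/3)                  # equal weights

# Independent assets: Var[X_i] = 144, Cov[X_i, X_j] = 0 for i != j
sigma_indep = 144 * np.eye(3)

# Correlated assets: same variances, but Cov[X_i, X_j] = 60 for i != j
sigma_corr = np.full((3, 3), 60.0)
np.fill_diagonal(sigma_corr, 144.0)

# Var[R] = w^T Sigma w, by the distributive property of covariance
print(w @ sigma_indep @ w)  # approximately 48
print(w @ sigma_corr @ w)   # approximately 88
```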

25.2 Correlation

One consequence of Proposition 15.1 is that covariance is sensitive to the choice of units. For example, in Example 25.2, suppose the waiting times are measured in minutes, with \(\text{E}\!\left[ X \right] = \text{E}\!\left[ Y \right] = 3\) minutes, so that \(\lambda_1 = \lambda_2 = \frac{1}{3}\). Then, \[ \text{Cov}\!\left[ L, M \right] = \frac{1}{(\frac{1}{3} + \frac{1}{3})^2} = 2.25\ \text{minutes}^2. \]

However, if \(L\) and \(M\) had instead been measured in seconds, then the covariance would be \[ \text{Cov}\!\left[ 60 L , 60 M \right] = 60^2 \,\text{Cov}\!\left[ L, M \right] = 8100\ \text{seconds}^2. \] The magnitude of the covariance can vary wildly depending on the choice of units!

Since changing the units does not meaningfully affect the relationship between two random variables, we might prefer a summary of that relationship that is insensitive to the choice of units. The next definition provides such a summary; in fact, it is unitless.

Definition 25.1 (Correlation coefficient) Let \(X\) and \(Y\) be random variables. The correlation coefficient of \(X\) and \(Y\) is \[ \text{Corr}[ X, Y ] = \frac{\text{Cov}\!\left[ X, Y \right]}{\sqrt{\text{Var}\!\left[ X \right] \text{Var}\!\left[ Y \right]}}. \tag{25.1}\]

Notice that the correlation coefficient has the same sign as the covariance. So we can still interpret a positive correlation coefficient as indicating that \(X\) and \(Y\) tend to move in the same direction and a negative correlation coefficient as indicating that \(X\) and \(Y\) tend to move in opposite directions.
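
In practice, a correlation coefficient is computed exactly as Equation 25.1 suggests: divide the (sample) covariance by the product of the (sample) standard deviations. A sketch, reusing the simulated \(L\) and \(M\) from before (the rates are still our illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
lam1, lam2 = 1/3, 1/5   # illustrative rates (an assumption)

x = rng.exponential(scale=1/lam1, size=n)
y = rng.exponential(scale=1/lam2, size=n)
L, M = np.minimum(x, y), np.maximum(x, y)

# Correlation per Definition 25.1: Cov[L, M] / sqrt(Var[L] Var[M])
cov_lm = np.mean(L * M) - np.mean(L) * np.mean(M)
corr_lm = cov_lm / np.sqrt(np.var(L) * np.var(M))
print(corr_lm)                   # positive, between 0 and 1
print(np.corrcoef(L, M)[0, 1])   # built-in version agrees
```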

The next result formally establishes that the correlation coefficient does not change when we change units.

Proposition 25.1 (Invariance of the correlation coefficient) Let \(X\) and \(Y\) be random variables, and let \(X' = aX + b\) and \(Y' = cY + d\) for constants \(a, c > 0\) and \(b, d \in \mathbb{R}\). Then, \[ \text{Corr}[ X', Y' ] = \text{Corr}[ X, Y ]. \]

Proof

We evaluate \(\text{Corr}[ X', Y' ]\) by first calculating the numerator, \(\text{Cov}\!\left[ X', Y' \right]\), using properties of covariance (Proposition 15.1):

\[ \begin{align} \text{Cov}\!\left[ X', Y' \right] &= \text{Cov}\!\left[ aX + b, cY + d \right] \\ &= ac\text{Cov}\!\left[ X, Y \right]. \end{align} \]

Next, we calculate \(\text{Var}\!\left[ X' \right]\) and \(\text{Var}\!\left[ Y' \right]\) using Proposition 15.2 and properties of covariance:

\[ \begin{align} \text{Var}\!\left[ X' \right] &= \text{Cov}\!\left[ X', X' \right] \\ &= \text{Cov}\!\left[ aX + b, aX + b \right] \\ &= a^2 \text{Var}\!\left[ X \right] \\ \text{Var}\!\left[ Y' \right] &= \text{Cov}\!\left[ Y', Y' \right] \\ &= \text{Cov}\!\left[ cY + d, cY + d \right] \\ &= c^2 \text{Var}\!\left[ Y \right]. \end{align} \]

Therefore, the correlation is \[ \begin{align} \text{Corr}[ X', Y' ] &= \frac{\text{Cov}\!\left[ X', Y' \right]}{\sqrt{\text{Var}\!\left[ X' \right] \text{Var}\!\left[ Y' \right]}} \\ &= \frac{ac \text{Cov}\!\left[ X, Y \right]}{\sqrt{a^2 \text{Var}\!\left[ X \right] \cdot c^2 \text{Var}\!\left[ Y \right]}} \\ &= \frac{ac}{ac} \frac{\text{Cov}\!\left[ X, Y \right]}{\sqrt{\text{Var}\!\left[ X \right] \cdot \text{Var}\!\left[ Y \right]}} \\ &= \text{Corr}[ X, Y ]. \end{align} \]
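
A quick numerical check of Proposition 25.1, using the minutes-versus-seconds example from above (with both rates equal to \(\frac{1}{3}\), so each waiting time has mean \(3\) minutes): rescaling multiplies the covariance by \(60^2\) but leaves the correlation unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Both rates are 1/3, so each waiting time has mean 3 minutes.
x = rng.exponential(scale=3, size=n)
y = rng.exponential(scale=3, size=n)
L, M = np.minimum(x, y), np.maximum(x, y)

def cov(u, v):
    return np.mean(u * v) - np.mean(u) * np.mean(v)

def corr(u, v):
    return cov(u, v) / np.sqrt(np.var(u) * np.var(v))

print(cov(L, M), cov(60 * L, 60 * M))    # ~2.25 min^2 vs. ~8100 sec^2
print(corr(L, M), corr(60 * L, 60 * M))  # identical: correlation is unitless
```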

What values can the correlation coefficient take? It turns out that the correlation coefficient is always between \(-1\) and \(1\). This fact is a consequence of the next theorem.

Theorem 25.1 (Cauchy-Schwarz Inequality) Let \(X\) and \(Y\) be random variables. Then,

\[|\text{Cov}\!\left[ X, Y \right]| \leq \sqrt{\text{Var}\!\left[ X \right] \text{Var}\!\left[ Y \right]}. \tag{25.2}\]

Proof

First, we prove Equation 25.2 assuming \(\text{Var}\!\left[ X \right] = \text{Var}\!\left[ Y \right] = 1\). In this case, we want to show that \(|\text{Cov}\!\left[ X, Y \right]| \leq 1\).

We know that the variance of any random variable is non-negative. Consider the variance of \(X - Y\): \[ \begin{align} \text{Var}\!\left[ X - Y \right] &= \text{Cov}\!\left[ X - Y, X - Y \right] \\ &= \text{Cov}\!\left[ X, X \right] - \text{Cov}\!\left[ X, Y \right] - \text{Cov}\!\left[ Y, X \right] + \text{Cov}\!\left[ Y, Y \right] \\ &= \underbrace{\text{Var}\!\left[ X \right]}_{1} - 2\text{Cov}\!\left[ X, Y \right] + \underbrace{\text{Var}\!\left[ Y \right]}_{1} \\ &= 2 - 2\text{Cov}\!\left[ X, Y \right] \geq 0. \\ \end{align} \] Rearranging this inequality, we obtain \[ \text{Cov}\!\left[ X, Y \right] \leq 1. \] This is one half of the inequality. To show the other half, we apply the same argument to the variance of \(X + Y\): \[ \begin{align} \text{Var}\!\left[ X + Y \right] &= 2 + 2\text{Cov}\!\left[ X, Y \right] \geq 0 \\ \end{align} \]

and obtain \[ \text{Cov}\!\left[ X, Y \right] \geq -1. \] Putting the two halves together, we obtain \[ -1 \leq \text{Cov}\!\left[ X, Y \right] \leq 1, \tag{25.3}\] which establishes Equation 25.2 for the case where \(\text{Var}\!\left[ X \right] = \text{Var}\!\left[ Y \right] = 1\).

What if \(\sigma_X^2 = \text{Var}\!\left[ X \right]\) or \(\sigma_Y^2 = \text{Var}\!\left[ Y \right]\) is not \(1\)? We can scale \(X\) and \(Y\) to random variables with variance \(1\), \(X' = \frac{X}{\sigma_X}\) and \(Y' = \frac{Y}{\sigma_Y}\). Now, we can apply Equation 25.3 to \(X'\) and \(Y'\) to obtain \[ -1 \leq \text{Cov}\!\left[ \frac{X}{\sigma_X}, \frac{Y}{\sigma_Y} \right] \leq 1, \] or equivalently, \[ -\sigma_X \sigma_Y \leq \text{Cov}\!\left[ X, Y \right] \leq \underbrace{\sigma_X \sigma_Y}_{\sqrt{\text{Var}\!\left[ X \right]\text{Var}\!\left[ Y \right]}}, \] which establishes Equation 25.2 in general.
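
As an illustrative spot-check of Equation 25.2 (not a substitute for the proof), we can verify the inequality on simulated data in which \(Y\) depends strongly on \(X\); the construction below is our own choice.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)
y = 2 * x + rng.normal(size=1_000_000)   # Y depends strongly on X

cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)
bound = np.sqrt(np.var(x) * np.var(y))
print(abs(cov_xy) <= bound)   # True: |Cov[X, Y]| <= sqrt(Var[X] Var[Y])
print(abs(cov_xy), bound)     # roughly 2 vs. sqrt(1 * 5), about 2.236
```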

You may be familiar with the Cauchy-Schwarz inequality from linear algebra, which says that for any two vectors \(\vec v, \vec w \in \mathbb{R}^n\), \[ |\vec v \cdot \vec w| \leq ||\vec v|| \ ||\vec w||. \] In fact, Theorem 25.1 is really the same theorem. To see why, consider the following analogy between probability and linear algebra:

  • Random variables \(X\) and \(Y\) are like vectors \(\vec v\) and \(\vec w\).
  • The covariance between two random variables, \(\text{Cov}\!\left[ X, Y \right]\), is like the inner product between two vectors \(\vec v \cdot \vec w\).
  • The covariance between a random variable and itself is the variance: \(\text{Cov}\!\left[ X, X \right] = \text{Var}\!\left[ X \right]\). Similarly, the inner product between a vector and itself is the squared magnitude: \(\vec v \cdot \vec v = || \vec v ||^2\).

Each property of covariance (Proposition 15.1) corresponds to an analogous property of the inner product. Therefore, any theorem for the inner product will have an analogous theorem for covariance.
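
The analogy is concrete for sample data: after centering two data vectors, the sample covariance is (up to a factor of \(\frac{1}{n}\)) exactly their dot product, and the sample correlation is the cosine of the angle between them. A small sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)
y = x + rng.normal(size=n)

# Center the data vectors, as covariance centers the random variables.
xc, yc = x - x.mean(), y - y.mean()

print(np.dot(xc, yc) / n)             # sample covariance as an inner product
print(np.cov(x, y, bias=True)[0, 1])  # NumPy's (biased) version agrees

# Correlation = cosine of the angle between the centered vectors
print(np.dot(xc, yc) / (np.linalg.norm(xc) * np.linalg.norm(yc)))
print(np.corrcoef(x, y)[0, 1])
```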

Theorem 25.1 immediately implies that the correlation coefficient is between \(-1\) and \(1\). To see why, divide both sides of Equation 25.2 by \(\sqrt{\text{Var}\!\left[ X \right] \text{Var}\!\left[ Y \right]}\) (assuming both variances are nonzero) to obtain \[ \left| \frac{\text{Cov}\!\left[ X, Y \right]}{\sqrt{\text{Var}\!\left[ X \right]\text{Var}\!\left[ Y \right]}} \right| \leq 1. \] The quantity inside the absolute value is precisely \(\text{Corr}[ X, Y ]\). This fact is summarized in the following corollary.

Corollary 25.1 (Correlation cannot exceed 1) For any random variables \(X\) and \(Y\), \[ -1 \leq \text{Corr}[ X, Y ] \leq 1. \]