15  Covariance

How do we measure the relationship between two random variables \(X\) and \(Y\)? The next definition offers one way.

Definition 15.1 (Covariance) Let \(X\) and \(Y\) be random variables. Then, the covariance of \(X\) and \(Y\) is \[ \text{Cov}[ X, Y ] = \text{E}[ (X - \text{E}[ X ])(Y - \text{E}[ Y ]) ]. \tag{15.1}\]

The sign of the covariance indicates the direction of the relationship between \(X\) and \(Y\).

Equation 15.1 can be evaluated using 2D LotUS (Equation 14.1), but fortunately, there is a shortcut formula that is usually easier to apply.

Proposition 15.1 (Shortcut formula for covariance) An equivalent formula to Equation 15.1 is \[ \text{Cov}[ X, Y ] = \text{E}[ XY ] - \text{E}[ X ] \text{E}[ Y ]. \tag{15.2}\]

Proof

Let \(\mu_X = \text{E}[ X ]\) and \(\mu_Y = \text{E}[ Y ]\). Now, \[\begin{align*} \text{Cov}[ X, Y ] &= \text{E}[ (X - \mu_X)(Y - \mu_Y) ] \\ &= \text{E}[ XY ] - \text{E}[ X \mu_Y ] - \text{E}[ \mu_X Y ] + \text{E}[ \mu_X \mu_Y ] \\ &= \text{E}[ XY ] - \mu_X \mu_Y - \mu_X \mu_Y + \mu_X \mu_Y \\ &= \text{E}[ XY ] - \text{E}[ X ]\text{E}[ Y ]. \end{align*}\]

The shortcut formula is usually easier to apply than the original definition (Definition 15.1).

Example 15.1 (Covariance of Xavier and Yvette’s wins) In Example 14.5, for Xavier’s wins \(X\) and Yvette’s wins \(Y\), we calculated \(\text{E}[ XY ] \approx 4.11\) and \(\text{E}[ X ]\text{E}[ Y ] \approx 3.37\). We can use these values to calculate the covariance using the shortcut formula: \[ \text{Cov}[ X, Y ] = \text{E}[ XY ] - \text{E}[ X ]\text{E}[ Y ] \approx 4.11 - 3.37 = 0.74. \]

The covariance is positive, which makes sense because Xavier and Yvette share the first three bets. Whenever Xavier wins one of those bets, Yvette wins it too, so \(X\) and \(Y\) tend to move together.
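We can sanity-check this value with a quick Monte Carlo simulation. Below is a minimal sketch in Python (assuming, as in Example 14.5, that each spin wins independently with probability \(18/38\), that Xavier bets on the first three spins, and that Yvette bets on all five); the random seed and number of simulations are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims = 1_000_000
p = 18 / 38  # probability of winning a single even-money roulette bet

# Simulate 5 independent spins; Xavier bets on the first 3, Yvette on all 5.
wins = rng.random((n_sims, 5)) < p
X = wins[:, :3].sum(axis=1)
Y = wins.sum(axis=1)

# Sample covariance of X and Y (np.cov returns the 2x2 covariance matrix).
print(np.cov(X, Y)[0, 1])  # should be close to 0.74
```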

15.1 Relation to Independence and Variance

If two random variables \(X\) and \(Y\) are independent, then they do not move together, so their covariance should be zero.

Proposition 15.2 (Independent random variables have covariance zero) Let \(X\) and \(Y\) be independent random variables. Then, \[ \text{Cov}[ X, Y ] = 0. \]

The converse of this statement is not true. It is possible for two random variables to have zero covariance, but not be independent. We will see an example in Example 15.2.

Proof

This is easiest to see using the shortcut formula (Proposition 15.1).

\[ \text{Cov}[ X, Y ] = \text{E}[ XY ] - \text{E}[ X ] \text{E}[ Y ] = \text{E}[ X ] \text{E}[ Y ] - \text{E}[ X ] \text{E}[ Y ] = 0, \]

where we used Theorem 14.3 to write \(\text{E}[ XY ] = \text{E}[ X ]\text{E}[ Y ]\).

The covariance is also intimately related to the variance. In fact, the variance is simply the covariance of a random variable with itself.

Proposition 15.3 (Relationship between variance and covariance) Let \(X\) be a random variable. Then, \[ \text{Var}[ X ] = \text{Cov}[ X, X ]. \]

Proof

From the definition of covariance (Definition 15.1), \[ \text{Cov}[ X, X ] = \text{E}[ (X - \text{E}[ X ])(X - \text{E}[ X ]) ] = \text{E}[ (X - \text{E}[ X ])^2 ] = \text{Var}[ X ]. \]

15.2 Properties of Covariance

Now, we highlight some properties of covariance that will allow us to compute covariances without 2D LotUS.

Proposition 15.4 (Properties of covariance) Let \(X, Y, Z\) be random variables and \(a\) be a constant. Then, the following are true.

  1. (Symmetry) \(\text{Cov}[ X, Y ] = \text{Cov}[ Y, X ]\).
  2. (Constants cannot covary) \(\text{Cov}[ X, a ] = 0\).
  3. (Multiplying by a constant) \(\text{Cov}[ a X, Y ] = a \text{Cov}[ X, Y ] = \text{Cov}[ X, a Y ]\).
  4. (Distributive property) \(\text{Cov}[ X+Y, Z ] = \text{Cov}[ X, Z ] + \text{Cov}[ Y, Z ]\) and \(\text{Cov}[ X, Y+Z ] = \text{Cov}[ X, Y ] + \text{Cov}[ X, Z ]\).

Proof

Property 1 follows immediately from Equation 15.1.

To prove Property 2, note that \(\text{E}[ a ] = a\) because \(a\) is a constant, so \(a - \text{E}[ a ] = 0\). Therefore, \[ \text{Cov}[ X, a ] = \text{E}[ (X - \text{E}[ X ])\underbrace{(a - \text{E}[ a ])}_0 ] = 0. \]

To prove Property 3, we plug in \(aX\) for \(X\) in Equation 15.1 and use properties of expectation. \[ \begin{aligned} \text{Cov}[ aX, Y ] &= \text{E}[ (aX - \text{E}[ aX ])(Y - \text{E}[ Y ]) ] \\ &= \text{E}[ (aX - a\text{E}[ X ])(Y - \text{E}[ Y ]) ] \\ &= \text{E}[ a(X - \text{E}[ X ])(Y - \text{E}[ Y ]) ] \\ &= a\text{E}[ (X - \text{E}[ X ])(Y - \text{E}[ Y ]) ] \\ &= a\text{Cov}[ X, Y ]. \end{aligned} \] The identity \(\text{Cov}[ X, aY ] = a\text{Cov}[ X, Y ]\) follows by symmetry (Property 1).

To prove Property 4, we plug the random variables into Equation 15.1 and use properties of expectation, such as linearity. \[ \begin{aligned} \text{Cov}[ X + Y, Z ] &= \text{E}[ (X + Y - \text{E}[ X + Y ])(Z - \text{E}[ Z ]) ] \\ &= \text{E}[ (X + Y - \text{E}[ X ] - \text{E}[ Y ])(Z - \text{E}[ Z ]) ] \\ &= \text{E}[ (X - \text{E}[ X ] + Y - \text{E}[ Y ])(Z - \text{E}[ Z ]) ] \\ &= \text{E}[ (X - \text{E}[ X ])(Z - \text{E}[ Z ]) + (Y - \text{E}[ Y ])(Z - \text{E}[ Z ]) ] \\ &= \text{E}[ (X - \text{E}[ X ])(Z - \text{E}[ Z ]) ] + \text{E}[ (Y - \text{E}[ Y ])(Z - \text{E}[ Z ]) ] \\ &= \text{Cov}[ X, Z ] + \text{Cov}[ Y, Z ]. \end{aligned} \] The identity \(\text{Cov}[ X, Y+Z ] = \text{Cov}[ X, Y ] + \text{Cov}[ X, Z ]\) then follows by symmetry (Property 1).

These properties can simplify the calculation of covariances substantially.
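As a quick sanity check, the same four identities hold for sample covariances computed from data, since the sample covariance is bilinear in exactly the same way. The sketch below (Python with numpy; the simulated data and the constant \(a\) are arbitrary choices) verifies each property numerically.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
X, Y, Z = rng.standard_normal((3, n))  # three arbitrary data vectors
a = 2.5

def cov(u, v):
    """Sample covariance between two arrays."""
    return np.cov(u, v)[0, 1]

print(np.isclose(cov(X, Y), cov(Y, X)))                  # 1. symmetry
print(np.isclose(cov(X, np.full(n, a)), 0.0))            # 2. constants cannot covary
print(np.isclose(cov(a * X, Y), a * cov(X, Y)))          # 3. multiplying by a constant
print(np.isclose(cov(X + Y, Z), cov(X, Z) + cov(Y, Z)))  # 4. distributive property
```

All four checks should print True.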

Example 15.2 (Sum and difference of die rolls) Suppose we roll two fair six-sided dice; let \(X\) be the outcome on the first die and \(Y\) be the outcome on the second die. Note that \(X\) and \(Y\) are independent, so \(\text{Cov}[ X, Y ] = 0\).

Let \(S = X+Y\) be the sum and \(D = X-Y\) be the difference. Then, the covariance of \(S\) and \(D\) is \[\begin{align*} \text{Cov}[ S, D ] &= \text{Cov}[ X+Y, X-Y ] \\ &= \text{Cov}[ X, X ] - \text{Cov}[ X, Y ] + \text{Cov}[ Y, X ] - \text{Cov}[ Y, Y ] \\ &= \text{Var}[ X ] - \text{Var}[ Y ] \\ &= 0. \end{align*}\]

We could have calculated the variance of a 6-sided die, but there is no need because \(X\) and \(Y\) have the same distribution and thus the same variance.

Although \(S\) and \(D\) have zero covariance, they are not independent. To see why, consider \(P(D = 0)\) and \(P(D = 0 | S = 12)\).

  • \(P(D = 0) = 6/36\), since a difference of \(0\) means that the two dice show the same number.
  • \(P(D = 0 | S = 12) = 1\), since a sum of \(12\) necessarily means that both dice showed a six, so \(D = 0\).

Since we have found values of \(x\) and \(y\) for which \(P(D = x | S = y) \neq P(D = x)\), \(D\) and \(S\) cannot be independent. (Per Definition 13.2, independence requires that these probabilities be equal for all \(x\) and \(y\).)

This example illustrates that the converse to Proposition 15.2 is not true. Two random variables can have zero covariance without being independent.
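A short simulation makes both facts concrete: the sample covariance of \(S\) and \(D\) is close to zero, yet conditioning on \(S = 12\) pins down \(D\) completely. Here is a minimal sketch in Python (the number of rolls is arbitrary).

```python
import numpy as np

rng = np.random.default_rng(2)
n_sims = 1_000_000
X = rng.integers(1, 7, n_sims)  # first die
Y = rng.integers(1, 7, n_sims)  # second die
S, D = X + Y, X - Y

print(np.cov(S, D)[0, 1])        # close to 0
print(np.mean(D == 0))           # close to 6/36 ≈ 0.167
print(np.mean(D[S == 12] == 0))  # exactly 1.0
```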

One useful trick is to express a complicated random variable as a sum of simpler random variables and then use the properties.

Example 15.3 (Revisiting Example 15.1) In Example 15.1, we calculated the covariance between Xavier’s wins \(X\) and Yvette’s wins \(Y\). This was not too difficult because we had already calculated \(\text{E}[ XY ]\) in Example 14.5. But if we did not already know \(\text{E}[ XY ]\), this would be quite an involved calculation.

Here is a simpler way that uses properties of covariance. If we define \(I_k\) to be the indicator of a win on the \(k\)th spin of the roulette wheel, for \(k = 1, \dots, 5\), then

  • \(X = I_1 + I_2 + I_3\)
  • \(Y = I_1 + I_2 + I_3 + I_4 + I_5\)

Now, we can use the distributive property of covariance to write

\[ \begin{align} \text{Cov}[ X, Y ] &= \text{Cov}[ I_1 + I_2 + I_3, I_1 + I_2 + I_3 + I_4 + I_5 ] \\ &= \text{Cov}[ I_1, I_1 ] + \text{Cov}[ I_1, I_2 ] + \dots + \text{Cov}[ I_3, I_5 ]. \end{align} \] There are many terms, but do not despair! Because the spins are independent, \[ \text{Cov}[ I_j, I_k ] = 0; \qquad j \neq k. \]

That leaves us with just the terms of the form \(\text{Cov}[ I_k, I_k ]\), each of which equals \(\text{Var}[ I_k ]\) by Proposition 15.3. That is,

\[\begin{align} \text{Cov}[ X, Y ] &= \text{Var}[ I_1 ] + \text{Var}[ I_2 ] + \text{Var}[ I_3 ] \\ &= 3 \cdot \frac{18}{38} (1 - \frac{18}{38}) \\ &\approx 0.7479. \end{align}\]

Because variances can be written as covariances, we can use properties of covariance to derive properties of variances.

Example 15.4 (Variance of a linear transformation) Let \(X\) be a random variable and \(a, b\) be constants. Then, \[ \begin{aligned} \text{Var}[ aX + b ] &= \text{Cov}[ aX + b, aX + b ] \\ &= \text{Cov}[ aX, aX ] + \underbrace{\text{Cov}[ aX, b ]}_{0} + \underbrace{\text{Cov}[ b, aX ]}_{0} + \underbrace{\text{Cov}[ b, b ]}_{0} \\ &= a^2 \text{Cov}[ X, X ] \\ &= a^2 \text{Var}[ X ]. \end{aligned} \] The three terms involving the constant \(b\) vanish by Property 2, and the remaining term equals \(a^2 \text{Cov}[ X, X ]\) by Property 3.

Adding a constant does not affect the variance, while multiplying by a constant scales the variance by that constant squared.
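The sketch below (Python; the choice of distribution and of the constants \(a\) and \(b\) is arbitrary) illustrates this numerically with simulated data.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.exponential(size=1_000_000)  # any random variable will do
a, b = -3.0, 10.0

# The sample variance of aX + b should match a^2 times the sample variance of X.
print(np.var(a * X + b))
print(a**2 * np.var(X))
```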

We previously derived the variance of the binomial distribution in Example 11.7 by evaluating a messy sum. Here is a simpler approach using properties of covariance.

Example 15.5 (Binomial variance using covariance) Let \(X \sim \text{Binomial}(n,p)\). In Example 14.3, we wrote \[ X = I_1 + I_2 + \dots + I_n, \] where \(I_k\) was the indicator of heads on the \(k\)th toss, and we applied linearity of expectation to evaluate \(\text{E}[ X ]\).

We can apply properties of covariance to the same decomposition in order to evaluate \(\text{Var}[ X ]\).

\[\begin{align*} \text{Var}[ X ] &= \text{Cov}[ X, X ] \\ &= \text{Cov}[ I_1 + \cdots + I_n, I_1 + \cdots + I_n ] \\ &= \sum_k \underbrace{\text{Cov}[ I_k, I_k ]}_{\text{Var}[ I_k ]} + \sum_{j \neq k} \underbrace{\text{Cov}[ I_j, I_k ]}_0 \\ &= \sum_k p(1-p) \\ &= np(1-p). \end{align*}\]

In the third equality, we used independence of the coin tosses to conclude that \(\text{Cov}[ I_j, I_k ] = 0\) for \(j \neq k\).
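If you want to double-check the result, the following Python sketch (with hypothetical values of \(n\) and \(p\)) computes the variance directly from the binomial PMF and compares it with \(np(1-p)\).

```python
from math import comb

n, p = 10, 0.3  # hypothetical parameters

# Exact variance from the binomial PMF: Var[X] = E[X^2] - E[X]^2.
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
mean = sum(k * pk for k, pk in enumerate(pmf))
second_moment = sum(k**2 * pk for k, pk in enumerate(pmf))

print(second_moment - mean**2)  # 2.1 (up to floating-point error)
print(n * p * (1 - p))          # 2.1
```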

What about the variance of the hypergeometric distribution? In Example 14.4, we saw that we could calculate the expected value using the same representation.

Example 15.6 (Hypergeometric variance using covariance) Let \(Y \sim \text{Hypergeometric}(M, N, n)\). In Example 14.4, we wrote \(Y\) as a sum of indicator random variables, \[ Y = I_1 + I_2 + \dots + I_n, \] where \(I_k\) is the indicator that the \(k\)th draw is a white ball.

If we apply properties of covariance to evaluate \(\text{Var}[ Y ]\), as in Example 15.5,

\[\begin{align*} \text{Var}[ Y ] &= \text{Cov}[ Y, Y ] \\ &= \text{Cov}[ I_1 + \cdots + I_n, I_1 + \cdots + I_n ] \\ &= \sum_k \underbrace{\text{Cov}[ I_k, I_k ]}_{\text{Var}[ I_k ]} + \sum_{j \neq k} \text{Cov}[ I_j, I_k ], \end{align*}\] we encounter a snag. The draws are not independent, so we cannot conclude that \(\text{Cov}[ I_j, I_k ] = 0\) for \(j \neq k\). We need to calculate this covariance using Proposition 15.1. Recalling that these random variables are indicators,

\[ \begin{aligned} \text{Cov}[ I_j, I_k ] &= \text{E}[ I_jI_k ] - \text{E}[ I_j ]\text{E}[ I_k ] \\ &= P(I_j = 1, I_k = 1) - P(I_j = 1) P(I_k = 1) \\ &= \frac{M}{M + N} \frac{M - 1}{M + N - 1} - \frac{M}{M + N} \frac{M}{M + N} \\ &= -\frac{M}{M + N} \frac{N}{M + N} \frac{1}{M + N - 1} \end{aligned} \] Note that this covariance is negative. This makes sense because if you draw a white ball, that makes it less likely that you’ll draw a white ball again. Therefore, \(I_j\) and \(I_k\) tend to move in opposite directions.

Now, substituting this covariance into the complete expression, we have

\[\begin{align*} \text{Var}[ Y ] &= \sum_k \left(\frac{M}{M + N} \frac{N}{M + N}\right) + \sum_{j \neq k} \left( -\frac{M}{M + N} \frac{N}{M + N} \frac{1}{M + N - 1} \right) \\ &= n \left(\frac{M}{M + N} \frac{N}{M + N}\right) - n (n - 1)\left(\frac{M}{M + N} \frac{N}{M + N} \frac{1}{M + N - 1} \right) \\ &= n \frac{M}{M + N} \frac{N}{M + N} \left(1 - \frac{n - 1}{M + N - 1} \right), \end{align*}\] which is the same as the formula we derived in Proposition 12.3, expressed in a slightly different form.
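As a check on the algebra, the sketch below (Python, with hypothetical values of \(M\), \(N\), and \(n\)) computes the variance directly from the hypergeometric PMF and compares it with the formula we just derived.

```python
from math import comb

M, N, n = 10, 15, 7  # hypothetical: 10 white balls, 15 black balls, 7 draws

# Exact variance from the hypergeometric PMF: P(Y = k) = C(M, k) C(N, n - k) / C(M + N, n).
total = comb(M + N, n)
pmf = {k: comb(M, k) * comb(N, n - k) / total
       for k in range(max(0, n - N), min(M, n) + 1)}
mean = sum(k * pk for k, pk in pmf.items())
second_moment = sum(k**2 * pk for k, pk in pmf.items())
print(second_moment - mean**2)  # 1.26

# The formula derived above.
p_white = M / (M + N)
print(n * p_white * (1 - p_white) * (1 - (n - 1) / (M + N - 1)))  # 1.26
```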