Lesson 30 Properties of Covariance
Optional Video
This is an old video of mine. I’m including it in case it is helpful, but you do not need to watch it.
Theory
Calculating variance and covariance, even from the shortcut formulas (28.2) and (29.2), is tedious. Just as linearity simplified the calculation of expected values, the properties we learn in this lesson will simplify the calculation of variances and covariances.
Theorem 30.1 (Properties of Covariance) Let $X, Y, Z$ be random variables, and let $c$ be a constant. Then:

1. Covariance-Variance Relationship: $\text{Var}[X] = \text{Cov}[X, X]$. (This was also Theorem 29.1.)
2. Pulling Out Constants:
   $$\text{Cov}[cX, Y] = c \cdot \text{Cov}[X, Y]$$
   $$\text{Cov}[X, cY] = c \cdot \text{Cov}[X, Y]$$
3. Distributive Property:
   $$\text{Cov}[X + Y, Z] = \text{Cov}[X, Z] + \text{Cov}[Y, Z]$$
   $$\text{Cov}[X, Y + Z] = \text{Cov}[X, Y] + \text{Cov}[X, Z]$$
4. Symmetry: $\text{Cov}[X, Y] = \text{Cov}[Y, X]$
5. Constants cannot covary: $\text{Cov}[X, c] = 0$.
These properties follow immediately from the definition of covariance (29.1), so we omit their proofs.
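Although we omit the proofs, the properties are easy to sanity-check numerically. Here is a minimal sketch, assuming NumPy is available; the particular random variables and the constant are arbitrary illustrative choices. Each pair of printed numbers should agree up to simulation error.

```python
# Numerical sanity check of Theorem 30.1, assuming NumPy.
# Covariances are estimated from a large sample, so the identities
# hold only approximately (up to simulation error).
import numpy as np

rng = np.random.default_rng(0)
size = 1_000_000
X = rng.normal(size=size)
Y = X + rng.normal(size=size)   # deliberately correlated with X
Z = rng.normal(size=size)
c = 3.0

def cov(a, b):
    """Sample covariance: E[ab] - E[a]E[b]."""
    return np.mean(a * b) - np.mean(a) * np.mean(b)

print(cov(X, X), np.var(X))                   # 1. Covariance-Variance Relationship
print(cov(c * X, Y), c * cov(X, Y))           # 2. Pulling Out Constants
print(cov(X + Y, Z), cov(X, Z) + cov(Y, Z))   # 3. Distributive Property
print(cov(X, Y), cov(Y, X))                   # 4. Symmetry
print(cov(X, np.full(size, c)))               # 5. Constants cannot covary (≈ 0)
```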
Example 30.1 How does the variance change when we multiply a random variable by a constant $a$? We answer this question using properties of covariance:
$$
\begin{aligned}
\text{Var}[aX] &= \text{Cov}[aX, aX] && \text{(Covariance-Variance Relationship)} \\
&= a \cdot a \cdot \text{Cov}[X, X] && \text{(Pulling Out Constants)} \\
&= a^2 \, \text{Var}[X] && \text{(Covariance-Variance Relationship)}
\end{aligned}
$$
It makes sense that the variance should scale by $a^2$, since the variance is in squared units.

Example 30.2 Let $X, Y, Z, W$ be random variables. Then, by applying the distributive property twice:
$$
\begin{aligned}
\text{Cov}[X + Y, Z + W] &= \text{Cov}[X, Z + W] + \text{Cov}[Y, Z + W] \\
&= \text{Cov}[X, Z] + \text{Cov}[X, W] + \text{Cov}[Y, Z] + \text{Cov}[Y, W]
\end{aligned}
$$
The result should remind you of FOILing from your high school algebra class, i.e., $(x + y)(z + w) = xz + xw + yz + yw$. That is because covariance, like multiplication, has a “distributive property”.

Here is a cute application of the properties of covariance that emphasizes the point that two variables can have zero covariance without being independent.
Example 30.3 (Covariance between the Sum and the Difference) Two fair six-sided dice are rolled. Let $X$ be the number on the first die. Let $Y$ be the number on the second die.

If $S = X + Y$ is their sum and $D = X - Y$ is their difference, what is $\text{Cov}[S, D]$?
$$
\begin{aligned}
\text{Cov}[S, D] &= \text{Cov}[X + Y, X - Y] \\
&= \text{Cov}[X, X] + \text{Cov}[X, -Y] + \text{Cov}[Y, X] + \text{Cov}[Y, -Y] \\
&= \text{Var}[X] - \text{Cov}[X, Y] + \text{Cov}[X, Y] - \text{Var}[Y] \\
&= \text{Var}[X] - \text{Var}[Y] \\
&= 0.
\end{aligned}
$$
In the second-to-last line, we used the fact that $X$ and $Y$ both represent the outcome when a fair, six-sided die is rolled, so they must have the same distribution and the same variance. You could calculate $\text{Var}[X]$ and $\text{Var}[Y]$ if you’d like (each turns out to be about 2.917), but it is not necessary in this example because we know they will cancel.
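If you would like to see this result empirically, here is a short simulation sketch, assuming NumPy. It also shows that $S$ and $D$ are far from independent, even though their covariance is zero: knowing that the sum is 12 forces the difference to be 0.

```python
# Simulation check of Example 30.3, assuming NumPy.
import numpy as np

rng = np.random.default_rng(0)
size = 1_000_000
X = rng.integers(1, 7, size=size)   # first die
Y = rng.integers(1, 7, size=size)   # second die
S, D = X + Y, X - Y

print(np.mean(S * D) - np.mean(S) * np.mean(D))   # ≈ 0
print(np.unique(D[S == 12]))   # only 0: S and D are dependent
```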
Here is yet another illustration of the power of these properties: deriving the formula for the variance of the binomial distribution. Compare the simplicity of this derivation with Example 28.3.
Example 30.4 (Variance of the Binomial Distribution) In Example 26.3, we argued that a $\text{Binomial}(n, N_1, N_0)$ random variable $X$ could be broken down as the sum of simpler random variables:
$$X = Y_1 + Y_2 + \ldots + Y_n,$$
where $Y_i$ represents the outcome of the $i$th draw from the box. Since the draws are made with replacement, the $Y_i$s are independent.
The distribution of each $Y_i$ is

| $y$ | $0$ | $1$ |
|---|---|---|
| $f(y)$ | $\frac{N_0}{N}$ | $\frac{N_1}{N}$ |

where $N = N_1 + N_0$ is the total number of tickets in the box. It is not hard to calculate that $E[Y_i] = \frac{N_1}{N}$ and
$$\text{Var}[Y_i] = E[Y_i^2] - E[Y_i]^2 = \left(0^2 \cdot \frac{N_0}{N} + 1^2 \cdot \frac{N_1}{N}\right) - \left(\frac{N_1}{N}\right)^2 = \frac{N_1}{N} \frac{N_0}{N}.$$
Now, we will use properties of covariance to express $\text{Var}[X]$ in terms of $\text{Var}[Y_i]$, which we calculated above:
$$
\begin{aligned}
\text{Var}[X] &= \text{Cov}[X, X] \\
&= \text{Cov}[Y_1 + Y_2 + \ldots + Y_n, Y_1 + Y_2 + \ldots + Y_n] \\
&= \text{Cov}[Y_1, Y_1] + \text{Cov}[Y_1, Y_2] + \ldots + \text{Cov}[Y_n, Y_n]
\end{aligned}
$$
Because the $Y_i$s are independent, all covariances of the form $\text{Cov}[Y_i, Y_j]$ for $i \neq j$ are zero. That leaves just terms of the form $\text{Cov}[Y_i, Y_i]$, each of which equals $\text{Var}[Y_i]$ by Property 1:
$$
\begin{aligned}
&= \text{Cov}[Y_1, Y_1] + \text{Cov}[Y_2, Y_2] + \ldots + \text{Cov}[Y_n, Y_n] \\
&= \text{Var}[Y_1] + \text{Var}[Y_2] + \ldots + \text{Var}[Y_n] \\
&= \frac{N_1}{N} \frac{N_0}{N} + \frac{N_1}{N} \frac{N_0}{N} + \ldots + \frac{N_1}{N} \frac{N_0}{N} \\
&= n \frac{N_1}{N} \frac{N_0}{N}.
\end{aligned}
$$
This derivation gives insight into why the variance of a binomial distribution is $n \frac{N_1}{N} \frac{N_0}{N}$: the variance of each draw is $\frac{N_1}{N} \frac{N_0}{N}$, and the variances add across the $n$ independent draws.

It is instructive to compare this derivation with the one for the hypergeometric distribution.
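As a quick numerical check of the formula, here is a simulation sketch, assuming NumPy; the box composition below is an arbitrary illustrative choice.

```python
# Simulation check of Example 30.4, assuming NumPy.
# Illustrative box: N1 = 3 tickets labeled 1, N0 = 7 tickets labeled 0,
# with n = 10 draws made with replacement.
import numpy as np

rng = np.random.default_rng(0)
N1, N0, n = 3, 7, 10
N = N1 + N0

draws = rng.binomial(n, N1 / N, size=1_000_000)
print(draws.var())                 # simulated variance
print(n * (N1 / N) * (N0 / N))     # formula: n (N1/N) (N0/N) = 2.1
```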
Example 30.5 (Variance of the Hypergeometric Distribution) In Example 26.3, we saw that a $\text{Hypergeometric}(n, N_1, N_0)$ random variable $X$ can be broken down in exactly the same way as a binomial random variable:
$$X = Y_1 + Y_2 + \ldots + Y_n,$$
where $Y_i$ represents the outcome of the $i$th draw from the box. However, since the draws are made without replacement, the $Y_i$s are no longer independent. (Knowing that one draw was a 1 makes it less likely for another draw to be a 1.)
Each $Y_i$ still has expected value $E[Y_i] = \frac{N_1}{N}$ and variance $\text{Var}[Y_i] = \frac{N_1}{N} \frac{N_0}{N}$. But now we also need to consider the covariance between two different draws, since the draws are not independent. You calculated the covariance between two draws without replacement in Lesson 29. It is
$$\text{Cov}[Y_i, Y_j] = -\frac{1}{N - 1} \frac{N_1}{N} \frac{N_0}{N}.$$
Now, we will use properties of covariance to express $\text{Var}[X]$ in terms of $\text{Var}[Y_i]$ and $\text{Cov}[Y_i, Y_j]$, which we calculated above:
$$
\begin{aligned}
\text{Var}[X] &= \text{Cov}[X, X] \\
&= \text{Cov}[Y_1 + Y_2 + \ldots + Y_n, Y_1 + Y_2 + \ldots + Y_n] \\
&= \text{Cov}[Y_1, Y_1] + \text{Cov}[Y_1, Y_2] + \ldots + \text{Cov}[Y_n, Y_n]
\end{aligned}
$$
We have $n$ terms of the form $\text{Cov}[Y_i, Y_i] = \text{Var}[Y_i]$ and $n(n-1)$ terms of the form $\text{Cov}[Y_i, Y_j]$ for $i \neq j$:
$$
\begin{aligned}
&= n \, \text{Var}[Y_i] + n(n-1) \, \text{Cov}[Y_i, Y_j] \\
&= n \frac{N_1}{N} \frac{N_0}{N} - n(n-1) \frac{1}{N-1} \frac{N_1}{N} \frac{N_0}{N} \\
&= n \frac{N_1}{N} \frac{N_0}{N} \left(1 - \frac{n-1}{N-1}\right)
\end{aligned}
$$
Notice that the variance of the hypergeometric is the same as the variance of the corresponding binomial, except for the factor $\left(1 - \frac{n-1}{N-1}\right)$. This factor is less than 1, so the variance of the hypergeometric is always less than that of the corresponding binomial. This makes sense because the draws are made without replacement in the hypergeometric distribution. Each time we draw a 1, we are less likely to draw a 1 again (and more likely to draw a 0). As a result, the number of 1s is more likely to be somewhere near the middle in the hypergeometric distribution, so the variance will be smaller. The figure below compares the p.m.f.s of the two distributions.
The factor $\left(1 - \frac{n-1}{N-1}\right)$ is called the finite population correction. If our box had infinitely many tickets, then drawing without replacement would be essentially the same as drawing with replacement. Thus, in the limit as $N \to \infty$, the hypergeometric distribution becomes the binomial distribution, and the finite population correction disappears. The correction is necessary only because our boxes contain a finite number of tickets.
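The two variance formulas are also easy to compare numerically. Here is a sketch assuming SciPy is available; the box below is an illustrative choice. (SciPy's `hypergeom` is parametrized by the population size, the number of 1s, and the number of draws, in that order.)

```python
# Comparing the variances from Examples 30.4 and 30.5, assuming SciPy.
from scipy.stats import binom, hypergeom

N1, N0, n = 10, 40, 25   # illustrative box: 10 ones, 40 zeros, 25 draws
N = N1 + N0

var_binom = binom.var(n, N1 / N)
var_hyper = hypergeom.var(N, N1, n)
fpc = 1 - (n - 1) / (N - 1)        # finite population correction

print(var_binom)          # n (N1/N) (N0/N) = 4.0
print(var_hyper)          # smaller, because draws are without replacement
print(var_binom * fpc)    # matches var_hyper
```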
Essential Practice
Let $W_1$ be your net winnings on a single spin of a roulette wheel when you bet $1 on a single number. This bet pays 35 to 1, meaning that for each dollar you bet, you win $35 if the ball lands on that number and lose $1 otherwise. We calculated the p.m.f., expected value, and variance of $W_1$ in Examples 22.1 and 28.1.

Let $W_1, W_2, \ldots, W_{10}$ be independent random variables with the same distribution as $W_1$.

Consider the random variables $X = 10 W_1$ and $Y = W_1 + W_2 + \ldots + W_{10}$. Which one represents…

- …your net winnings if you bet $1 on that number on each of 10 spins of the roulette wheel?
- …your net winnings if you bet $10 on that number on a single spin of the roulette wheel?

Now, calculate $E[X]$, $E[Y]$, $\text{Var}[X]$, and $\text{Var}[Y]$. How do they compare?
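If you would like to check your answers, here is a simulation sketch, assuming NumPy and an American wheel with 38 pockets (the usual setting for a 35-to-1 single-number bet; adjust if your wheel differs).

```python
# Simulation scaffold for the roulette problem, assuming NumPy and an
# American wheel with 38 pockets (win probability 1/38 on a single number).
import numpy as np

rng = np.random.default_rng(0)
sims = 1_000_000
W = rng.choice([35, -1], p=[1/38, 37/38], size=(sims, 10))   # W_1, ..., W_10

X = 10 * W[:, 0]     # $10 on a single spin
Y = W.sum(axis=1)    # $1 on each of 10 spins

print(X.mean(), Y.mean())   # compare E[X] and E[Y]
print(X.var(), Y.var())     # compare Var[X] and Var[Y]
```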
Consider the following three scenarios:

- A fair coin is tossed 3 times. $X$ is the number of heads and $Y$ is the number of tails.
- A fair coin is tossed 4 times. $X$ is the number of heads in the first 3 tosses, $Y$ is the number of heads in the last 3 tosses.
- A fair coin is tossed 6 times. $X$ is the number of heads in the first 3 tosses, $Y$ is the number of heads in the last 3 tosses.

Use properties of covariance to calculate $\text{Cov}[X, Y]$ for each of these three scenarios. You should not need to use LOTUS or the shortcut formula for covariance.

Hint 1: For the first scenario, write $Y$ as a function of $X$.

Hint 2: For the second scenario, write $X = A + B$ and $Y = B + C$, where $A, B, C$ are independent random variables.
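After you have worked out the three covariances by hand, you can check them against a simulation. Here is a minimal sketch, assuming NumPy.

```python
# Simulation scaffold for the three coin scenarios, assuming NumPy.
import numpy as np

rng = np.random.default_rng(0)
tosses = rng.integers(0, 2, size=(1_000_000, 6))   # enough tosses for all three

def cov(a, b):
    return np.mean(a * b) - np.mean(a) * np.mean(b)

X = tosses[:, :3].sum(axis=1)               # heads in tosses 1-3 (all scenarios)
print(cov(X, 3 - X))                        # scenario 1: Y = 3 - X
print(cov(X, tosses[:, 1:4].sum(axis=1)))   # scenario 2: overlapping windows
print(cov(X, tosses[:, 3:6].sum(axis=1)))   # scenario 3: disjoint windows
```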
(This problem is challenging but rewarding.)
A poker hand (5 cards) is dealt off the top of a well-shuffled deck of 52 cards. Let $X$ be the number of diamonds in the hand. Let $Y$ be the number of hearts in the hand.

- Do you think $\text{Cov}[X, Y]$ is positive, negative, or zero? Explain.
- Let $D_i$ ($i = 1, \ldots, 5$) be a random variable that is 1 if the $i$th card is a diamond and 0 otherwise. What is $E[D_i]$?
- Let $H_i$ ($i = 1, \ldots, 5$) be a random variable that is 1 if the $i$th card is a heart and 0 otherwise. Of course, $E[H_i]$ is the same as $E[D_i]$, since there are the same number of hearts as diamonds in a 52-card deck. What is $\text{Cov}[D_i, H_i]$? What is $\text{Cov}[D_i, H_j]$ when $i \neq j$? (Keep in mind that $D_i$ and $H_i$ are indicator random variables that only take on the values 0 or 1.)
  Hint: Make a table for the joint p.m.f. There are only 4 possible outcomes.
- Use your answers to parts b and c (and the properties of covariance, of course) to calculate $\text{Cov}[X, Y]$.
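Once you have an answer, a simulation like the sketch below (assuming NumPy; the suit coding is an arbitrary choice) can confirm its sign and rough magnitude.

```python
# Simulation scaffold for the poker-hand problem, assuming NumPy.
# Suits are coded 0-3, with diamonds = 0 and hearts = 1 (arbitrary coding).
import numpy as np

rng = np.random.default_rng(0)
sims = 100_000
deck = np.repeat(np.arange(4), 13)   # 52 cards, 13 of each suit

X = np.empty(sims)   # diamonds per hand
Y = np.empty(sims)   # hearts per hand
for s in range(sims):
    hand = rng.choice(deck, size=5, replace=False)   # deal 5 cards
    X[s], Y[s] = (hand == 0).sum(), (hand == 1).sum()

print(np.mean(X * Y) - X.mean() * Y.mean())   # estimate of Cov[X, Y]
```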
Additional Practice
Recall the coupon collector problem from Lesson 26:
McDonald’s decides to give a Pokemon toy with every Happy Meal. Each time you buy a Happy Meal, you are equally likely to get any one of the 6 types of Pokemon. Let $X$ be the number of Happy Meals you have to buy until you “catch ’em all”.

In that lesson, you calculated $E[X]$ using linearity of expectation. Now, use properties of covariance to calculate $\text{Var}[X]$.
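Here is a simulation sketch for checking your answer, assuming NumPy (toy types are coded 0 through 5).

```python
# Simulation scaffold for the coupon collector problem, assuming NumPy.
import numpy as np

rng = np.random.default_rng(0)

def meals_to_collect_all(n_types=6):
    """Buy Happy Meals until every toy type has been seen."""
    seen, count = set(), 0
    while len(seen) < n_types:
        seen.add(int(rng.integers(n_types)))
        count += 1
    return count

X = np.array([meals_to_collect_all() for _ in range(100_000)])
print(X.mean(), X.var())   # compare with E[X] from Lesson 26 and your Var[X]
```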
At Diablo Canyon nuclear plant, radioactive particles hit a Geiger counter according to a Poisson process with a rate of 3.5 particles per second. Let $X$ be the number of particles detected in the first 2 seconds. Let $Z$ be the number of particles detected in the first 3 seconds. Find $\text{Cov}[X, Z]$.

Hint: Note that $X$ and $Z$ are not independent. However, you should be able to write $Z = X + Y$, where $Y$ is a random variable that is independent of $X$.
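Following the hint, here is a simulation sketch, assuming NumPy: simulate $X$ together with an independent count $Y$ for the third second, and estimate the covariance of $X$ and $Z = X + Y$.

```python
# Simulation scaffold for the Geiger counter problem, assuming NumPy.
import numpy as np

rng = np.random.default_rng(0)
sims = 1_000_000
rate = 3.5   # particles per second

X = rng.poisson(rate * 2, size=sims)   # particles in the first 2 seconds
Y = rng.poisson(rate * 1, size=sims)   # particles in the third second, independent of X
Z = X + Y                              # particles in the first 3 seconds

print(np.mean(X * Z) - X.mean() * Z.mean())   # estimate of Cov[X, Z]
```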