14 Expectations Involving Multiple Random Variables
In Chapter 13, we saw that the relationship between two random variables can be completely described by the joint PMF. However, the joint PMF is usually more information than we need. Can we summarize the relationship between two random variables with a single number? As we will see in this chapter and Chapter 15, \(\text{E}\!\left[ XY \right]\) contains useful information about the relationship between \(X\) and \(Y\). In order to calculate \(\text{E}\!\left[ XY \right]\), we need a general way to evaluate expectations of the form \(\text{E}\!\left[ g(X, Y) \right]\), discussed in the next section.
14.1 2D LotUS
To calculate expectations of the form \(\text{E}\!\left[ g(X, Y) \right]\), we need the following generalization of LotUS (Theorem 11.1) to two variables.
As in Theorem 11.1, we calculate the expectation of \(g(X, Y)\) by weighting the possible values \(g(x, y)\) by their corresponding probabilities, which are now supplied by the joint PMF (because there are two random variables).
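To make this concrete, here is a minimal Python sketch of 2D LotUS; the joint PMF below is a made-up example, used only to illustrate the computation.

```python
# A hypothetical joint PMF, stored as a dictionary {(x, y): probability}.
joint_pmf = {
    (0, 0): 0.1, (0, 1): 0.2,
    (1, 0): 0.3, (1, 1): 0.4,
}

def expectation(g, pmf):
    """2D LotUS: weight each value g(x, y) by the joint probability."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

print(expectation(lambda x, y: x + y, joint_pmf))  # E[X + Y] = 1.3
print(expectation(lambda x, y: x * y, joint_pmf))  # E[XY] = 0.4
```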
14.2 Expectation of sums
Most traits are polygenic—that is, determined by multiple genes. If \(X\) and \(Y\) represent the numbers of dominant alleles of two genes, and each dominant allele increases the risk of some trait, then \(X + Y\) summarizes a child’s overall risk of the trait. We might want to know their expected risk, \(\text{E}\!\left[ X + Y \right]\). This expectation can be evaluated using 2D LotUS.
Notice that \(\text{E}\!\left[ X + Y \right]\) agrees with \(\text{E}\!\left[ X \right] + \text{E}\!\left[ Y \right] = 2\), no matter what the relationship between \(X\) and \(Y\) is. This is not a coincidence, as the next result shows.
Theorem 14.2 is more remarkable than it first appears. It says that \(\text{E}\!\left[ X + Y \right]\), which in principle could depend on the joint distribution of \(X\) and \(Y\), can be calculated using only the distributions of \(X\) and \(Y\) individually. That is, no matter how \(X\) and \(Y\) are related to each other, \(\text{E}\!\left[ X + Y \right]\) will have the same value.
Linearity of expectation is an incredibly powerful tool. If we can break down a complicated random variable into a sum of simpler random variables, then we can use linearity of expectation to calculate its expectation, even if we do not know its PMF.
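As a quick numerical illustration, here is a simulation sketch (the box of tickets is hypothetical) in which \(X\) and \(Y\) are dependent because the draws are made without replacement, yet \(\text{E}\!\left[ X + Y \right]\) still equals \(\text{E}\!\left[ X \right] + \text{E}\!\left[ Y \right]\).

```python
import random

# X and Y are two draws without replacement from a hypothetical box of
# tickets, so they are dependent. Linearity still gives
# E[X + Y] = E[X] + E[Y].
trials = 100_000
total_x = total_y = total_sum = 0
for _ in range(trials):
    x, y = random.sample([1, 2, 3, 4], 2)  # draws without replacement
    total_x += x
    total_y += y
    total_sum += x + y

# Each of E[X] and E[Y] is 2.5, so E[X + Y] should be about 5.0.
print(total_x / trials, total_y / trials, total_sum / trials)
```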
Finally, we use linearity of expectation to provide simple derivations of the binomial and hypergeometric expectations. Compare this next derivation with Proposition 9.2.
Not only does linearity simplify the proof, but it also provides insight into why the formula is true.
The same strategy works for the hypergeometric distribution. In fact, the hypergeometric distribution has the same expectation as the binomial, precisely because of linearity of expectation, which holds regardless of the relationship between the random variables.
In Example 14.4, suppose the draws had been made with replacement. Then the indicator random variables would be independent, and \(Y\) would be \(\text{Binomial}(n, p=\frac{M}{N})\). But the expectation would still be \[ \text{E}\!\left[ Y \right] = np = n\frac{M}{N}. \] This reinforces the fact that linearity of expectation does not depend on the relationship between the random variables we are adding.
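Here is a short simulation sketch of this fact, with illustrative values of \(N\), \(M\), and \(n\): drawing with or without replacement changes the distribution of \(Y\), but its expectation is \(n \frac{M}{N}\) either way.

```python
import random

# A box with M = 3 "success" tickets out of N = 10; we draw n = 5.
N, M, n, trials = 10, 3, 5, 100_000
box = [1] * M + [0] * (N - M)

# Without replacement (hypergeometric) vs. with replacement (binomial):
without_repl = sum(sum(random.sample(box, n)) for _ in range(trials)) / trials
with_repl = sum(sum(random.choices(box, k=n)) for _ in range(trials)) / trials

print(without_repl, with_repl)  # both approximately n * M / N = 1.5
```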
14.3 Expectation of products
We can also use 2D LotUS to evaluate \(\text{E}\!\left[ XY \right]\). Unlike \(X + Y\), the random variable \(XY\) does not have a simple interpretation. However, as we will see in Chapter 15, its expectation, \(\text{E}\!\left[ XY \right]\), captures useful information about the relationship between \(X\) and \(Y\).
Even though \(X\) and \(Y\) are \(\textrm{Binomial}(n= 2, p= 1/2)\) in all three cases, the value of \(\text{E}\!\left[ XY \right]\) is different in each case. This is because \(\text{E}\!\left[ XY \right]\) captures information about the relationship between the two variables, not just \(X\) and \(Y\) individually.
In particular, notice that \(\text{E}\!\left[ XY \right] \neq \text{E}\!\left[ X \right] \text{E}\!\left[ Y \right]\) in general. However, in Example 14.5, we saw that they are equal when \(X\) and \(Y\) are independent. This is no accident, as the next theorem shows.
However, unlike linearity of expectation, Theorem 14.3 holds only when the random variables are independent. In general, to evaluate \(\text{E}\!\left[ XY \right]\), we need to use 2D LotUS.
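The contrast is easy to see with two small hypothetical joint PMFs that have the same marginals, one independent and one perfectly dependent.

```python
def e_product(pmf):
    """E[XY] by 2D LotUS."""
    return sum(x * y * p for (x, y), p in pmf.items())

# Independent case: f(x, y) = f_X(x) * f_Y(y) for fair 0/1 marginals.
indep = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}

# Perfectly dependent case: Y = X, with the same marginals.
dep = {(0, 0): 0.5, (1, 1): 0.5}

# In both cases E[X] = E[Y] = 0.5, so E[X]E[Y] = 0.25.
print(e_product(indep))  # 0.25: equals E[X]E[Y] under independence
print(e_product(dep))    # 0.50: exceeds E[X]E[Y] under dependence
```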
14.4 Exercises
Exercise 14.1 (Using 2D LotUS) Using the joint PMF from Exercise 13.1,
- Compute \(\text{E}\!\left[ X+Y \right]\) using 2D LotUS. Is it equal to \(\text{E}\!\left[ X \right] + \text{E}\!\left[ Y \right]\)? Why or why not?
- Compute \(\text{E}\!\left[ XY \right]\) using 2D LotUS. Is it equal to \(\text{E}\!\left[ X \right] \text{E}\!\left[ Y \right]\)? Why or why not?
Exercise 14.2 (Expected number of Secret Santa matches, revisited) In Exercise 9.1, you showed that the expected number of friends who draw their own name in Secret Santa is \(1\) when there are \(n = 4\) friends. You calculated this using the PMF of the number of friends who draw their own name, which you derived in Exercise 8.1.
How many friends are expected to draw their own name if there are \(n\) friends? There is no simple expression in general for the PMF of \(X\), the number of friends who draw their own name. Nevertheless, derive \(\text{E}\!\left[ X \right]\) using linearity of expectation.
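If you want to check your answer numerically, here is a minimal simulation sketch; the trial count is arbitrary.

```python
import random

# Estimate the expected number of friends who draw their own name
# when n names are shuffled uniformly at random.
def expected_matches(n, trials=100_000):
    total = 0
    for _ in range(trials):
        draw = list(range(n))
        random.shuffle(draw)
        total += sum(1 for friend, name in enumerate(draw) if friend == name)
    return total / trials

print(expected_matches(4))   # compare with Exercise 9.1
print(expected_matches(10))  # compare with your general answer
```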
Exercise 14.3 (Expected number of birthday matches) In Example 2.3, we showed that the probability that at least two people in a room of \(n\) people share a birthday is surprisingly high, even for a moderately sized room, such as \(n = 30\).
- Calculate the expected number of days with a shared birthday. (Your expression should be in terms of \(n\), but try plugging in specific values of \(n\) to see if your expected value makes sense in light of Example 2.3.)
- Calculate the expected number of pairs of people who share a birthday. (For example, if Aldo, Bri, and Chen all have a birthday on April 20, then that would be three pairs of people who share a birthday: Aldo and Bri, Aldo and Chen, and Bri and Chen.)
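Here is a simulation sketch for checking both answers, assuming \(365\) equally likely birthdays.

```python
import random
from collections import Counter

# Estimate (a) the expected number of days with a shared birthday and
# (b) the expected number of pairs of people who share a birthday.
def birthday_expectations(n, trials=20_000):
    days_total = pairs_total = 0
    for _ in range(trials):
        counts = Counter(random.randrange(365) for _ in range(n))
        days_total += sum(1 for c in counts.values() if c >= 2)
        pairs_total += sum(c * (c - 1) // 2 for c in counts.values())
    return days_total / trials, pairs_total / trials

print(birthday_expectations(30))  # compare with your expressions at n = 30
```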
Exercise 14.4 (Expectation of the negative binomial) In Proposition 12.2, we showed directly that the expectation of a \(\text{NegativeBinomial}(r, p)\) random variable \(X\) is \(\text{E}\!\left[ X \right] = \frac{r}{p}\).
Express \(X\) as a sum of simpler random variables, and provide a simpler derivation of this result using linearity of expectation. (Hint: You may use the result from Example 9.7.)
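For a numerical check of \(\text{E}\!\left[ X \right] = \frac{r}{p}\), here is a simulation sketch with illustrative values of \(r\) and \(p\).

```python
import random

# Simulate a NegativeBinomial(r, p) random variable: the number of
# trials needed to get the r-th success.
def negative_binomial_draw(r, p):
    num_trials = successes = 0
    while successes < r:
        num_trials += 1
        if random.random() < p:
            successes += 1
    return num_trials

r, p, sims = 3, 0.4, 100_000
print(sum(negative_binomial_draw(r, p) for _ in range(sims)) / sims)
# approximately r / p = 7.5
```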
Exercise 14.5 (Another proof of inclusion-exclusion) Let \(A_1, A_2, \dots, A_n\) be events. Let \(I_{\bigcup_i A_i}\) be the indicator that at least one of the events occurs, and let \(I_k\) be the indicator of the event \(A_k\).
- Express \(I_{\bigcup_i A_i}\) in terms of \(I_1, \dots, I_n\).
- Use linearity of expectation to calculate \(\text{E}\!\left[ I_{\bigcup_i A_i} \right]\) and establish the inclusion-exclusion formula (Equation 4.6).
Exercise 14.6 (Happy Meals) McDonald’s decides to give a Pokemon toy with each Happy Meal purchased. Each time you purchase a Happy Meal, you are equally likely to get any one of the 6 types of Pokemon toys. What is the expected number of Happy Meals you need to purchase until you “catch them all”?
Hint: Express your random variable as a sum of geometric random variables.
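Here is a simulation sketch you can use to check your answer; the toys are labeled \(0\) through \(5\) for convenience.

```python
import random

# Count the Happy Meals needed to collect all 6 toy types, each
# equally likely with every purchase.
def meals_until_complete(n_types=6):
    collected, meals = set(), 0
    while len(collected) < n_types:
        collected.add(random.randrange(n_types))  # toy in this meal
        meals += 1
    return meals

trials = 100_000
print(sum(meals_until_complete() for _ in range(trials)) / trials)
# compare with your sum of geometric expectations
```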
Exercise 14.7 (Cars in traffic) Suppose that \(n\) cars are traveling, each at a different constant speed, down a one-lane road. Assuming that the cars are equally likely to be in any order, what is the expected number of cars that will eventually be stuck behind a slower car?
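Here is a simulation sketch for checking your answer; modeling the speeds as a random permutation captures the assumption that the cars are equally likely to be in any order.

```python
import random

# A car is eventually stuck if some car ahead of it is slower.
def expected_stuck(n, trials=50_000):
    total = 0
    for _ in range(trials):
        speeds = random.sample(range(n), n)  # speeds, front car first
        slowest_ahead = speeds[0]            # the front car is never stuck
        for s in speeds[1:]:
            if s > slowest_ahead:
                total += 1                   # a slower car is ahead
            else:
                slowest_ahead = s            # new slowest car; not stuck
    return total / trials

print(expected_stuck(10))  # compare with your answer at n = 10
```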
Exercise 14.8 (Expected number of hash collisions) In Exercise 2.8, you calculated the probability of a collision in a hash table. In this exercise, we will consider the expected number of collisions in a hash table with \(k\) records and \(n\) addresses, assuming \(k \leq n\).
- Calculate the expected number of “empty” addresses (i.e., addresses with no records stored).
- Calculate the expected number of addresses with collisions.
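A simulation sketch for checking both answers, with illustrative values of \(k\) and \(n\):

```python
import random
from collections import Counter

# Hash k records uniformly at random into n addresses.
def hash_table_expectations(k, n, trials=20_000):
    empty_total = collision_total = 0
    for _ in range(trials):
        counts = Counter(random.randrange(n) for _ in range(k))
        empty_total += n - len(counts)  # addresses receiving no record
        collision_total += sum(1 for c in counts.values() if c >= 2)
    return empty_total / trials, collision_total / trials

print(hash_table_expectations(k=10, n=20))
```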
Exercise 14.9 (Quicksort) Sorting is a fundamental problem in computer science. Here is one recipe, called quicksort, for sorting a list such as \[ [3, 8, 1, 4, 9, 6]. \]
1. Pick a random number from the list to serve as a “pivot”. Suppose we pick the number \(4\).
2. Organize the list into three parts: a list of numbers less than the pivot, the pivot itself, and a list of numbers greater than the pivot. For example, if the pivot is \(4\), then the above list becomes \[ [3, 1], \underline{4}, [8, 9, 6]. \]
3. Now, for each of the lists above, repeat steps 1 and 2. If a list has only one number or is empty, then it is sorted. Applying this to the above list, suppose we choose \(3\) as the pivot for the left list and \(8\) as the pivot for the right list. Then, we end up with the following list, which is sorted. \[ [1], \underline{3}, [], \underline{4}, [6], \underline{8}, [9] \]
How long does it take for quicksort to sort a list of \(n\) numbers? One way to measure this is to count the number of comparisons that are made between two numbers.
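Here is one way the recipe above might be implemented in Python, instrumented to count comparisons. It is a sketch that assumes the numbers in the list are distinct.

```python
import random

# Sort a list of distinct numbers with quicksort, returning the sorted
# list along with the number of comparisons made.
def quicksort(nums):
    if len(nums) <= 1:
        return nums, 0  # already sorted; no comparisons needed
    pivot = random.choice(nums)
    rest = [x for x in nums if x != pivot]  # assumes distinct numbers
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x > pivot]
    comparisons = len(rest)  # each remaining number is compared to the pivot
    sorted_left, c_left = quicksort(left)
    sorted_right, c_right = quicksort(right)
    return sorted_left + [pivot] + sorted_right, comparisons + c_left + c_right

print(quicksort([3, 8, 1, 4, 9, 6]))
```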
- Let \(X\) be the number of comparisons made by quicksort (when a pivot is chosen randomly). Define indicator random variables \(I_{jk}\), for \(j < k\), in terms of the \(j\)th smallest and \(k\)th smallest numbers, \(x_{(j)}\) and \(x_{(k)}\), such that \[ X = \sum_{j=1}^{n-1} \sum_{k=j+1}^n I_{jk}. \]
- Calculate \(\text{E}\!\left[ X \right]\). (Hint: In order for a comparison to be made between two numbers, one of them must be chosen as a pivot when they are in the same list. If any number between them is chosen as a pivot first, then they will end up in different lists and never be compared.)
- Come up with a simple approximation for \(\text{E}\!\left[ X \right]\) by approximating the sums by integrals.
Exercise 14.10 (Ratio of die rolls) Suppose you roll a fair die twice. Let \(X\) be the outcome of the first roll and \(Y\) be the outcome of the second roll.
- Compute \(\text{E}\!\left[ \displaystyle \frac{X}{Y} \right]\). Is this equal to \(\displaystyle \frac{\text{E}\!\left[ X \right]}{\text{E}\!\left[ Y \right]}\)?
- Explain how you could use Theorem 14.3 here to obtain the correct answer.