A random variable can take on many possible values. How do we reduce these values to a single number summary?
You may be familiar with the average (or mean) as a way of summarizing a list of numbers. For example, the list of six numbers \[
1, 1, 2, 3, 3, 10
\tag{9.1}\] can be summarized by their average: \[
\bar x = \frac{1 + 1 + 2 + 3 + 3 + 10}{6} = \frac{20}{6} \approx 3.33.
\tag{9.2}\]
There is another way to calculate this average. Instead of summing over the six numbers, we can sum over the four distinct values, \(1\), \(2\), \(3\), and \(10\), weighting each value by how often it appears: \[
\bar x = \frac{2}{6} \cdot 1 + \frac{1}{6} \cdot 2 + \frac{2}{6} \cdot 3 + \frac{1}{6} \cdot 10 = \frac{20}{6} \approx 3.33.
\tag{9.3}\]
The expected value is a summary of a random variable that is inspired by the weighted average (Equation 9.3), where we weight each possible outcome of a random variable by the probability of that outcome. In fact, the numbers in Equation 9.1 are the six faces of the die that Koopa rolls in the game Super Mario Party. The outcome of one of Koopa’s rolls is a random variable \(X\) with PMF
\[\begin{array}{c|cccc}
x & 1 & 2 & 3 & 10 \\
\hline
f(x) & \frac{2}{6} & \frac{1}{6} & \frac{2}{6} & \frac{1}{6}
\end{array}\]
so the expected value of \(X\) is \[
\text{expected roll} = \frac{2}{6} \cdot 1 + \frac{1}{6} \cdot 2 + \frac{2}{6} \cdot 3 + \frac{1}{6} \cdot 10 = \frac{20}{6} \approx 3.33.
\]
Notice that the expected value is not a possible value of \(X\)! In what sense is \(\frac{20}{6}\) “expected”? The expected value refers to a long-run average. If Koopa rolls the die repeatedly, generating many instances \(X_1, X_2, X_3, \dots\), then the average of the rolls will approach \(\frac{20}{6}\). That is, \[ \lim_{n\to\infty} \frac{X_1 + X_2 + \dots + X_n}{n} = \frac{20}{6}.\]
The code below simulates \(n = 1000\) rolls of Koopa’s die and calculates the average of the rolls. Run the code below and see how close it is to \(\frac{20}{6}\). What happens if you increase the number of rolls \(n\)?
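Here is a minimal sketch of such a simulation in Python with NumPy (the language and library are assumptions here; the book’s own code chunk may differ):

```python
import numpy as np

rng = np.random.default_rng()

# Koopa's die has the six faces 1, 1, 2, 3, 3, 10.
faces = [1, 1, 2, 3, 3, 10]

n = 1000                            # number of rolls; try increasing this
rolls = rng.choice(faces, size=n)   # simulate n independent rolls
print(rolls.mean())                 # should be close to 20/6 ≈ 3.33
```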
This means that on average, Koopa will be able to move about \(3.33\) spaces per roll. Would he be better off with this die or a standard die with faces labeled \(1\) to \(6\)? One way to answer this question is to compare their expected values. A standard die would have expected value \[ \text{expected roll} = \frac{1}{6} \cdot 1 + \frac{1}{6} \cdot 2 + \frac{1}{6} \cdot 3 + \frac{1}{6} \cdot 4 + \frac{1}{6} \cdot 5 + \frac{1}{6} \cdot 6 = 3.5, \] so a standard die would allow him to move a bit farther per roll.
9.1 Definition and Examples
We are ready to define the expected value formally.
Definition 9.1 (Expected value of a discrete r.v.) Let \(X\) be a discrete random variable with PMF \(f_X(x)\). Then, the expected value (or EV) of \(X\) is \[
\text{E}\!\left[ X \right] = \sum_x x f_X(x) = \sum_x xP(X = x),
\tag{9.4}\]
where the sum is taken over all possible values \(x\).
Let’s calculate the expected values of some random variables!
Example 9.1 (Expected value of roulette bets) Recall the casino game roulette, where a ball is equally likely to land in any of the 38 slots, labeled 1 through 36, 0, and 00. Hence, each of the 38 numbers has probability \(\frac{1}{38}\) of occurring.
Consider the following two types of bets:
You bet on 1 outcome out of 38. This is called a straight-up bet.
You bet on 18 outcomes out of 38. This is called the red/black bet.
For each $1 wagered, the straight-up bet pays $36 and the red/black bet pays $2.
Let \(S\) and \(R\) denote the random variables representing the profit of the straight-up bet and the red/black bet, respectively, for a $1 wager.
These two bets are very different; with the straight-up bet, you feel like you lose all the time. But if we calculate their expected values, \[\begin{align*}
\text{E}\!\left[ S \right] &= 35 \cdot \frac{1}{38} + (-1) \cdot \frac{37}{38} = -\frac{2}{38} = -\frac{1}{19}, \\
\text{E}\!\left[ R \right] &= 1 \cdot \frac{18}{38} + (-1) \cdot \frac{20}{38} = -\frac{2}{38} = -\frac{1}{19},
\end{align*}\] we see that the two bets have the same expected value. In fact, all bets in roulette have the same expected value of \(-\frac{1}{19}\)! This is a reminder that the expected value only tells part of the story.
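As a quick sanity check, the following simulation sketch in Python (a hypothetical example, not from the text; the labeling of slots as 0 through 37 is just a convenience) compares the average profits of the two bets to \(-\frac{1}{19} \approx -0.053\):

```python
import numpy as np

rng = np.random.default_rng()
n = 1_000_000

# Each spin lands in one of 38 equally likely slots, labeled 0, ..., 37 here.
spins = rng.integers(0, 38, size=n)

# Straight-up bet on one slot: profit is +$35 if it hits, -$1 otherwise.
profit_s = np.where(spins == 0, 35, -1)

# Red/black bet on 18 slots: profit is +$1 if any of them hits, -$1 otherwise.
profit_r = np.where(spins < 18, 1, -1)

print(profit_s.mean(), profit_r.mean())  # both near -1/19 ≈ -0.053
```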
Another way of thinking about the expected value of a random variable is: the EV is where the PMF of the random variable “balances.” In other words, the EV is the center of mass of the PMF.
Let us look at the PMFs of the random variables \(R\) and \(S\) from Example 9.1.
In both cases, we can visualize how the PMFs of \(R\) and \(S\) are balanced on the red fulcrums.
If we do the same thing with Koopa’s die roll, it is not as obvious whether the PMF is balanced at the EV.
It does turn out that the PMF is indeed balanced on the fulcrum at the EV.
Before we present the result and the proof, we need to quantify the rotation force (or torque) each blue bar exerts against the fulcrum.
If a bar of weight (probability, in this case) \(p\) is \(d\) units left of the fulcrum, the torque applied is \(p \cdot d\); the torque is positive if the rotation force is counterclockwise. On the other hand, if the same bar is \(d\) units right of the fulcrum, the torque applied is \(p \cdot (-d)\).
Theorem 9.1 (Expected value is the center of mass) Let \(X\) be a random variable. Then, \(\text{E}\!\left[ X \right]\) is the center of mass of the PMF of \(X\).
If there is a fulcrum at \(\text{E}\!\left[ X \right]\), then the torque exerted by a bar of probability \(f_X(x_i)\) at position \(x_i\) would be \[
f_X(x_i) \cdot (\text{E}\!\left[ X \right] - x_i).
\] Hence, the total torque exerted at the fulcrum is \[\begin{align*}
\sum_x f_X(x) \cdot (\text{E}\!\left[ X \right] - x) &= \sum_x f_X(x) \cdot \text{E}\!\left[ X \right] - \sum_x f_X(x) \cdot x \\
&= \text{E}\!\left[ X \right] \sum_x f_X(x) - \text{E}\!\left[ X \right] \\
&= \text{E}\!\left[ X \right] \cdot 1 - \text{E}\!\left[ X \right] \\
&= 0.
\end{align*}\] Therefore, the total torque exerted at the fulcrum is 0, and so, the fulcrum is at the center of mass.
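A quick numerical check of Theorem 9.1 using Koopa’s die (a minimal Python sketch, assuming the PMF given earlier in this chapter):

```python
# PMF of Koopa's die roll.
pmf = {1: 2/6, 2: 1/6, 3: 2/6, 10: 1/6}

# Expected value: sum of x * f(x) over all possible values x.
ev = sum(x * p for x, p in pmf.items())

# Total torque about a fulcrum placed at the expected value.
torque = sum(p * (ev - x) for x, p in pmf.items())

print(ev, torque)  # ev ≈ 3.33; torque is 0 up to floating-point error
```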
Example 9.2 (Phenylketonuria and expected value) In Example 8.8, we argued that \(X\), the number of children who are born with phenylketonuria (PKU) to a couple who are both carriers, was a \(\textrm{Binomial}(n= 5, p= .25)\) random variable. We derived its PMF to be:
\[\begin{array}{c|cccccc}
x & 0 & 1 & 2 & 3 & 4 & 5 \\
\hline
f(x) & \frac{243}{1024} & \frac{405}{1024} & \frac{270}{1024} & \frac{90}{1024} & \frac{15}{1024} & \frac{1}{1024}
\end{array}\]
What is \(\text{E}\!\left[ X \right]\), the expected number of children who are born with PKU? Applying Definition 9.1, \[ \text{E}\!\left[ X \right] = 0 \cdot \frac{243}{1024} + 1 \cdot \frac{405}{1024} + 2 \cdot \frac{270}{1024} + 3 \cdot \frac{90}{1024} + 4 \cdot \frac{15}{1024} + 5 \cdot \frac{1}{1024} = \frac{1280}{1024} = 1.25. \]
Notice that the expected value in Example 9.2 was simply the number of children (\(n=5\)) times the probability that each child has PKU (\(p=.25\)). This was no accident; the expected value of a binomial random variable will always be \(np\). We derive this useful formula in the next example.
Example 9.3 (Binomial expectation) Let \(X \sim \text{Binomial}(n,p)\). Using the combinatorial identity \(\displaystyle k \binom{n}{k} = n \binom{n-1}{k-1}\), we see that \[\begin{align*}
\text{E}\!\left[ X \right] &= \sum_{k=0}^n k \binom{n}{k} p^k (1-p)^{n-k} & \text{(definition of expectation)} \\
&= \sum_{k=1}^n k \binom{n}{k} p^k (1-p)^{n-k} & \text{($k=0$ term is $0$)} \\
&= np \sum_{k=1}^n \binom{n-1}{k-1} p^{k-1} (1-p)^{n-k} & \text{(pull out $np$)} \\
&= np \sum_{j=0}^{n-1} \underbrace{\binom{n-1}{j} p^j (1-p)^{n-1-j}}_{\text{Binomial}(n-1, p)\ \text{PMF}} & \text{($j = k - 1$)} \\
&= np,
\end{align*}\] where the last step follows because every PMF must sum to \(1\).
Therefore, whenever we encounter a \(\text{Binomial}(n, p)\) random variable, we know its expectation will be \(np\), so we do not need to calculate it from Definition 9.1, as we did in Example 9.2.
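For instance, here is a short check of the formula on the \(\text{Binomial}(5, 0.25)\) distribution from Example 9.2 (a Python sketch using the standard-library fractions module; the PMF values are copied from the table above):

```python
from fractions import Fraction as F

# PMF of X ~ Binomial(5, 0.25), copied from Example 9.2.
pmf = {0: F(243, 1024), 1: F(405, 1024), 2: F(270, 1024),
       3: F(90, 1024),  4: F(15, 1024),  5: F(1, 1024)}

# Definition 9.1 versus the shortcut n * p.
ev = sum(x * p for x, p in pmf.items())
print(ev, float(ev), 5 * 0.25)  # 5/4, 1.25, 1.25
```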
This derivation was quite cumbersome. In Example 14.3, we will present an alternative derivation of the binomial expectation that involves less algebra and offers more intuition.
9.2 Paradoxes
Expected value can yield surprising answers, especially when infinities are involved. This section presents two fascinating historical examples.
Example 9.4 (Pascal’s wager) Should we believe in God? Suppose \(p\) is the probability that God exists.
If we were to adopt the belief that God exists and live a lifestyle accordingly, we incur a finite amount of loss, associated with sacrificing certain pleasures and luxuries.
After dying, if God does exist, we would enjoy infinite time in heaven, which would equate to infinite gains. If God does not exist, then there is nothing after.
On the other hand, if we did not believe in God and God turned out to be real, we would suffer infinite losses in hell.
The quasi-EV calculation, if we were to believe in God, would be \[
\text{E}\!\left[ X \right] = -\text{finite sacrifices} + p \cdot \text{infinite gains} + (1-p) \cdot 0 = \infty
\] as long as \(p > 0\), where \(X\) represents our lifetime gains. Pascal argued that since no one can ever be certain that God does not exist, \(p > 0\), and so, we should all strive to believe in God.
Example 9.5 (St. Petersburg paradox) A casino offers a game of chance for a single player in which a fair coin is tossed at each stage. The initial stake begins at 2 dollars and is doubled every time tails appears. The first time heads appears, the game ends and the player wins whatever is the current stake.
Thus, the player wins 2 dollars if heads appears on the first toss, 4 dollars if tails appears on the first toss and heads on the second, 8 dollars if tails appears on the first two tosses and heads on the third, and so on.
Mathematically, the player wins \(2^{k+1}\) dollars, where \(k\) is the number of consecutive tails tosses.
What would be a fair price to pay the casino for entering the game?
Let us consider the expected payout at each stage. The probability of the first heads appearing on the first toss is 1/2, and the payout is $2. The probability of first heads appearing on the second toss is 1/4, and the payout is $4, and so on. So, if \(X\) represents the total payout, then \[\begin{align*}
\text{E}\!\left[ X \right] &= 2 \cdot \frac{1}{2} + 4 \cdot \frac{1}{4} + 8 \cdot \frac{1}{8} + \cdots \\
&= 1 + 1 + 1 + \cdots \\
&= \infty.
\end{align*}\] Thus, we should be willing to pay any price to enter this game since the expected payout is infinite!
However, most of us would balk at paying $50 to play such a game.
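A simulation makes this tension concrete: even over many plays, the average payout stays modest. Here is a minimal Python sketch (the payout rule follows Example 9.5; the helper function `play_once` is hypothetical):

```python
import numpy as np

rng = np.random.default_rng()

def play_once():
    # Toss a fair coin until heads appears; the stake doubles with each tails.
    payout = 2
    while rng.random() < 0.5:   # tails
        payout *= 2
    return payout

payouts = [play_once() for _ in range(100_000)]
print(np.mean(payouts))  # a modest number, despite the infinite expected value
```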
9.3 Geometric Distribution and Tail Sum Formula
In the St. Petersburg paradox (Example 9.5), a coin was tossed until heads appeared. If \(X\) represents the number of tosses, what is the PMF of \(X\)?
In order to calculate the PMF, we need to determine \[ f(k) = P(X = k) \] as a function of \(k\). Note that \(k = 1, 2, \dots\); there is no upper bound to what the random variable \(X\) can be!
The event \(\left\{ X = k \right\}\) means that the first \(k-1\) tosses are tails and the \(k\)th toss is heads. Because the tosses are independent, the probability of this is \[ P(X = k) = (1 - p)^{k-1} p, \] where \(p\) is the probability of heads.
This motivates the following definition.
Definition 9.2 (Geometric distribution) If \(X\) is a random variable with the PMF \[
f_X(x) = (1-p)^{x-1} p, \qquad x = 1, 2, \dots,
\] for some \(0 < p \leq 1\), \(X\) is said to be a geometric random variable with parameter \(p\). We use the notation \(X \sim \text{Geometric}(p)\).
To apply the geometric distribution to problems other than coin tossing, it helps to make the analogy with coin tossing.
Example 9.6 (Rolling a two for the first time) With a fair die, what is the probability of getting our first two on the fifth roll?
In this case, we are concerned with the outcomes “two” and “not two”, which have probabilities \(1/6\) and \(5/6\), respectively. The outcome “two” corresponds to “heads”, and the other outcomes correspond to “tails.”
Thus, if \(X\) is the number of rolls until the first two, then \(X \sim \text{Geometric}(1/6)\). Hence, the desired probability is \[
P(X = 5) = \left( \frac{5}{6} \right)^4 \frac{1}{6} \approx 0.08038.
\]
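We can also check this probability by simulation (a Python sketch; the helper function `rolls_until_two` is hypothetical):

```python
import numpy as np

rng = np.random.default_rng()

def rolls_until_two():
    # Roll a fair die until the first two appears; return the number of rolls.
    count = 1
    while rng.integers(1, 7) != 2:
        count += 1
    return count

samples = np.array([rolls_until_two() for _ in range(100_000)])
print(np.mean(samples == 5))  # should be close to 0.08038
```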
What about the expected value of the geometric distribution? We could try to evaluate the sum \[ \text{E}\!\left[ X \right] = \sum_{k=1}^\infty k (1 - p)^{k-1} p, \] but this is not an easy sum to evaluate. The optional section below explains how.
Evaluating the sum for geometric expectation
First, we factor out \(p\): \[\text{E}\!\left[ X \right] = p \sum_{k=1}^\infty k(1 - p)^{k-1}.\]
To evaluate the summation, we first define \(x = 1 - p\) to obtain \[ \sum_{k=1}^\infty k(1 - p)^{k-1} = \sum_{k=1}^\infty k x^{k-1}. \]
Notice that each term is the derivative of \(x^k\). In other words, this is the term-by-term derivative of a power series: \[ \sum_{k=1}^\infty k x^{k-1} = \frac{d}{dx} \sum_{k=0}^\infty x^k = \frac{d}{dx} \left( \frac{1}{1-x} \right) = \frac{1}{(1-x)^2}. \] Substituting \(x = 1 - p\), this sum equals \(\frac{1}{p^2}\).
Multiplying by \(p\), we obtain the expected value \[ \text{E}\!\left[ X \right] = p \sum_{k=1}^\infty k(1 - p)^{k-1} = p \frac{1}{p^2} = \frac{1}{p}. \]
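A quick numerical check of this formula, truncating the infinite sum at a large value of \(k\) (a plain-Python sketch):

```python
p = 1 / 6

# Truncate the infinite series E[X] = sum_{k >= 1} k (1 - p)^(k - 1) p.
ev = sum(k * (1 - p) ** (k - 1) * p for k in range(1, 1000))

print(ev, 1 / p)  # both approximately 6
```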
Instead, we will calculate this expected value by using an alternative formula, which is derived in the result below.
Proposition 9.1 (Tail Sum Expectation) Let \(X\) be a random variable that is nonnegative-integer-valued. That is, it takes on the values \(0, 1, 2, \dots\). Then \[
\text{E}\!\left[ X \right] = \sum_{k=0}^\infty P(X > k).
\]
Proof 1: Triangular Array Proof
Because \(X\) is nonnegative-integer-valued, the expected value is \[ \text{E}\!\left[ X \right] = \sum_{k=0}^\infty k \cdot P(X = k). \] Another way to interpret each term in this summation, \(k \cdot P(X = k)\), is \[ \underbrace{P(X = k) + P(X = k) + \dots + P(X = k)}_{\text{$k$ times}}. \] Stacking these terms for \(k = 1, 2, 3, \dots\) produces a triangular array: \[\begin{array}{cccc}
P(X = 1) & & & \\
P(X = 2) & P(X = 2) & & \\
P(X = 3) & P(X = 3) & P(X = 3) & \\
\vdots & \vdots & \vdots & \ddots
\end{array}\]
The definition of expected value says to first sum each row of the triangular array to obtain terms of the form \(k \cdot P(X = k)\), then sum these terms to obtain the expected value. But the entries of the array can be summed in any order (because they are non-negative). If we instead first sum each column of the array, then we obtain terms of the form \(P(X > k)\), which we can sum to obtain the expected value.
Proof 2: Algebraic Proof
We can also write this proof using summations. \[\begin{align*}
\sum_{x=0}^\infty P(X > x) &= \sum_{x=0}^\infty \sum_{k=x+1}^\infty P(X = k) \\
&= \sum_{k=1}^\infty \sum_{x = 0}^{k-1} P(X = k) \\
&= \sum_{k=1}^\infty k \cdot P(X = k) \\
&= \text{E}\!\left[ X \right].
\end{align*}\]
The key trick was switching the order of the two summations in the second line. This is equivalent to switching from summing over columns to summing over rows in the triangular array above.
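To see Proposition 9.1 in action on a concrete distribution, here is a Python sketch that checks it against Definition 9.1 for the \(\text{Binomial}(5, 0.25)\) distribution from Example 9.2 (the use of scipy here is an assumption, not the book’s code):

```python
from scipy.stats import binom

# X ~ Binomial(5, 0.25), the distribution from Example 9.2.
dist = binom(5, 0.25)

# Expected value from Definition 9.1: sum of k * P(X = k).
ev_definition = sum(k * dist.pmf(k) for k in range(6))

# Expected value from the tail sum formula: sum of P(X > k) for k = 0, 1, 2, ...
# (P(X > k) = 0 for k >= 5, so the first six terms capture the whole sum.)
ev_tail_sum = sum(dist.sf(k) for k in range(6))   # sf(k) = P(X > k)

print(ev_definition, ev_tail_sum)  # both 1.25
```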
Proposition 9.1 provides an easier way to calculate the expected value of a \(\text{Geometric}(p)\) random variable.
Example 9.7 Let \(X\) be a \(\text{Geometric}(p)\) random variable. Since \(X\) only takes on the values 1, 2, 3, …, we can apply Proposition 9.1. But we first need to determine \(P(X > k)\).
We can calculate \(P(X > k)\) by summing the geometric PMF, or by observing that \(\{ X > k\}\) means the first \(k\) trials were all failures, so \[ P(X > k) = (1 - p)^k; \qquad k=0, 1, 2, \dots. \]