11  LotUS and Variance

11.1 Law of the Unconscious Statistician

In Chapter 9, we saw that expected value is a good summary of a random variable. In Chapter 10, we discussed how a random variable behaves under transformations. In this chapter, we combine the two ideas and find the expected value of a function of a random variable.

Of course, one way to calculate \(\text{E}[ g(X) ]\) is to first determine the PMF of the transformed random variable \(g(X)\) and calculate the expected value using Equation 9.4. For example, in Example 10.3, we determined the PMF of \(D = |5 - 2L|\), the drunkard’s distance from the bar entrance after 5 steps, to be

\[
\begin{array}{c|ccc}
x & 1 & 3 & 5 \\ \hline
f_D(x) & 20/32 & 10/32 & 2/32
\end{array}
\]

so the drunkard is expected to be \[ \text{E}[ D ] = 1 \cdot \frac{20}{32} + 3 \cdot \frac{10}{32} + 5 \cdot \frac{2}{32} = 1.875\ \text{feet} \tag{11.1}\] from the bar entrance after 5 steps.
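This weighted sum is simple enough to check in R (a minimal sketch):

```r
# E[D] directly from the PMF of D
sum(c(1, 3, 5) * c(20, 10, 2) / 32)   # 1.875
```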

But determining the PMF of \(D\) was a lot of work! Fortunately, the following result allows us to calculate the expected value using the PMF of \(L\).

Theorem 11.1 (Law of the Unconscious Statistician (LotUS)) Let \(X\) be a discrete random variable with PMF \(f_X\). Then, \[ \text{E}[ g(X) ] = \sum_x g(x) f_X(x), \tag{11.2}\] where the sum is over the possible values of \(X\).

Note that \(Y = g(X)\) is itself a random variable, so by the definition of expected value, we know that \[ \text{E}[ g(X) ] = \text{E}[ Y ] = \sum_y y f_Y(y). \]

Recall how we found the PMF of \(Y\) using Equation 10.5. For each possible value of \(Y\), we sum the probabilities of the corresponding values of \(X\): \[ f_Y(y) = \sum_{x: g(x) = y} f_X(x). \]

Substituting this into the expression above, we obtain \[ \begin{align} \text{E}[ g(X) ] &= \sum_y y \sum_{x: g(x) = y} f_X(x) \\ &= \sum_y \sum_{x: g(x)=y} g(x) f_X(x) \\ &= \sum_x g(x) f_X(x). \end{align} \]

The last line follows because the sets \(\{ x: g(x) = y \}\) for different values of \(y\) are a partition of all the possible values of \(x\).

Theorem 11.1 says that to determine the average value of \(g(X)\), we weight the possible values of \(g(X)\) by their corresponding probabilities. Some statisticians even take Equation 11.2 to be the definition of expected value itself. But it is not clear that this definition is consistent because there are two ways to calculate \(\text{E}[ g(X) ]\):

  1. Use Equation 11.2.
  2. Derive the PMF of \(Y = g(X)\) and calculate \(\text{E}[ Y ] = \sum_y y f_Y(y)\).

The fact that these two methods agree is a theorem, which requires a proof. Because it is easy to forget that Theorem 11.1 is a theorem, this result has earned the moniker “Law of the Unconscious Statistician.”

Now let’s apply LotUS to the random walk example.

Example 11.1 (Expected Distance in a Random Walk) LotUS says that we only need to work with the PMF of \(L\) to determine \[ \text{E}[ D ] = \text{E}[ |5 - 2L| ], \] where \(L\) is the number of steps to the left.

We know that \(L \sim \text{Binomial}(n=5, p=1/2)\). Since the transformation is \(g(L) = |5 - 2L|\), Theorem 11.1 says that \[ \begin{align} \text{E}[ |5 - 2L| ] &= \sum_x g(x) f_L(x) \\ &= \sum_{x=0}^5 |5 - 2x| \frac{\binom{5}{x}}{2^5} \\ &= |5 - 2(0)| \frac{\binom{5}{0}}{2^5} + |5 - 2(1)| \frac{\binom{5}{1}}{2^5} + |5 - 2(2)| \frac{\binom{5}{2}}{2^5} \\ &\quad + |5 - 2(3)| \frac{\binom{5}{3}}{2^5} + |5 - 2(4)| \frac{\binom{5}{4}}{2^5} + |5 - 2(5)| \frac{\binom{5}{5}}{2^5}. \end{align} \]

The easiest way to evaluate this sum is using R.
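For instance, the following sketch uses `dbinom()` to supply the \(\text{Binomial}(n=5, p=1/2)\) PMF of \(L\):

```r
# Possible values of L and the transformation g(L) = |5 - 2L|
x <- 0:5
g_x <- abs(5 - 2 * x)

# Binomial(5, 1/2) PMF of L
f_x <- dbinom(x, size = 5, prob = 1/2)

# LotUS: E[|5 - 2L|] = sum over x of g(x) * f_L(x)
sum(g_x * f_x)   # 1.875
```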

We get the same answer as before, \(\text{E}[ D ] = \text{E}[ |5 - 2L| ] = 1.875\). Although the calculation was tedious, it was still much less involved than calculating the PMF of \(D\), as in Example 10.3.

Caution!

Because \(L\) is binomial, we also know that \(\text{E}[ L ] = np = 2.5\). Perhaps surprisingly, this information is of no help in calculating \(\text{E}[ |5 - 2L| ]\). In particular, \[ \underbrace{\text{E}[ |5 - 2L| ]}_{1.875} \neq \underbrace{|5 - 2(2.5)|}_{0}. \]

In general, we cannot interchange the expectation and the transformation: \[ \text{E}[ g(X) ] \neq g(\text{E}[ X ]). \] Calculating \(\text{E}[ g(X) ]\) requires computing the sum anew; we cannot simply plug \(\text{E}[ X ]\) into the function.
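A two-line comparison in R (a sketch reusing the \(\text{Binomial}(5, 1/2)\) PMF from above) makes the gap concrete:

```r
x <- 0:5
f_x <- dbinom(x, size = 5, prob = 1/2)

sum(abs(5 - 2 * x) * f_x)   # E[g(L)] = 1.875
abs(5 - 2 * sum(x * f_x))   # g(E[L]) = 0
```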

LotUS also offers another perspective on the St. Petersburg Paradox (Example 9.5).

Example 11.2 (St. Petersburg Paradox as a Transformation) In Example 9.5, a fair coin was tossed until heads appeared. If \(X\) represents the number of tosses, then \(X \sim \text{Geometric}(p=1/2)\). We know that its PMF is \(f_X(k) = 1/2^k\) and \(\text{E}[ X ] = 1/p = 2\).

The “paradox” arises in the payout. The payout starts at $2 and doubles each time tails is tossed. We can describe the payout \(Y\) as a transformation of \(X\): \[ Y = 2^X. \]

To calculate \(\text{E}[ Y ]\), we can use LotUS: \[ \text{E}[ Y ] = \text{E}[ 2^X ] = \sum_k g(k) \cdot f_X(k) = \sum_{k=1}^\infty 2^k \cdot \frac{1}{2^k} = \sum_{k=1}^\infty 1 = \infty. \]

Once again, LotUS allowed us to avoid finding the PMF of \(Y\) explicitly; we were able to work with the known PMF of \(X\). Also, observe that we cannot simply interchange the expectation and the transformation: \[ \text{E}[ g(X) ] \neq g(\text{E}[ X ]) = 2^{\text{E}[ X ]} = 4. \]
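The divergence is easy to see numerically as well: every term \(2^k \cdot (1/2)^k\) equals \(1\), so the partial sums grow without bound. A quick sketch in R:

```r
# Partial sums of sum_k 2^k * (1/2)^k: each term is exactly 1
k <- 1:50
partial <- cumsum(2^k * (1/2)^k)
partial[c(1, 10, 50)]   # 1, 10, 50
```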

There is one situation where it is possible to interchange expectation and transformation—when \(g\) is a linear transformation, of the form \[ g(x) = ax + b. \] In these situations, we can bypass LotUS.

Proposition 11.1 (Linear Transformations) Let \(X\) be a random variable and let \(a\) and \(b\) be constants. Then,

\[\text{E}[ aX + b ] = a \text{E}[ X ] + b \tag{11.3}\]

By LotUS, \[ \begin{aligned} \text{E}[ aX + b ] &= \sum_x (ax + b) f_X(x) & \text{(LotUS)} \\ &= \sum_x ax f_X(x) + \sum_x b f_X(x) & \text{(split up sum)} \\ &= a \underbrace{\sum_x x f_X(x)}_{\text{E}[ X ]} + b \underbrace{\sum_x f_X(x)}_1 & \text{(pull out constants)} \\ &= a \text{E}[ X ] + b. \end{aligned} \] In the last step, we used the fact that any PMF sums to 1.

When the transformation is linear, applying Proposition 11.1 is much easier than applying LotUS.

Example 11.3 (Expectation of a linear transformation) In Example 10.2, we argued that if \(X \sim \text{Binomial}(n, p)\), then \(n - X \sim \text{Binomial}(n, 1-p)\). By Example 9.3, the expectation of a binomial random variable is the number of trials times the success probability, so we must have \(\text{E}[ n - X ] = n(1 - p)\).

But \(g(x) = n - x\) is a linear transformation with \(a = -1\) and \(b = n\). So applying Proposition 11.1, the expected value must be \[ \text{E}[ n - X ] = n - \text{E}[ X ] = n - np = n(1 - p). \] The remarkable thing is that this expectation does not depend on the fact that \(X\) is binomial. It is true for any distribution of \(X\), as long as \(\text{E}[ X ] = np\).
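A quick numerical check of this identity, using the illustrative values \(n = 5\) and \(p = 0.3\) (chosen arbitrarily):

```r
n <- 5; p <- 0.3
k <- 0:n

# LotUS with g(x) = n - x, versus the linear-transformation shortcut
sum((n - k) * dbinom(k, size = n, prob = p))   # 3.5
n * (1 - p)                                    # 3.5
```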

However, calculating \(\text{E}[ g(X) ]\) requires LotUS in general. The next example drives this point home.

Example 11.4 (Expected Value of a Call Option) In Example 10.4, we considered a call option with a strike price of $55. If the price of the underlying stock is a random variable \(X\) with PMF

\[
\begin{array}{c|cccc}
x & 50 & 53 & 57 & 60 \\ \hline
f_X(x) & 1/8 & 2/8 & 4/8 & 1/8
\end{array}
\]

then the value of the option at maturity is \(g(X) = \max(X - 55, 0)\). What is \(\text{E}[ g(X) ]\)?

By LotUS, the expected value of this option is \[ \begin{align} \text{E}[ g(X) ] &= \max(50 - 55, 0) \cdot \frac{1}{8} + \max(53 - 55, 0) \cdot \frac{2}{8} \\ &\quad \quad + \max(57 - 55, 0) \cdot \frac{4}{8} + \max(60 - 55, 0) \cdot \frac{1}{8} \\ &= 0 \cdot \frac{1}{8} + 0 \cdot \frac{2}{8} + 2 \cdot \frac{4}{8} + 5 \cdot \frac{1}{8} \\ &= \$1.625. \end{align} \]

To check this answer, we calculate the expected value of \(Y = \max(X - 55, 0)\) directly using the PMF that we derived in Example 10.4: \[ \text{E}[ Y ] = 0 \cdot \frac{3}{8} + 2 \cdot \frac{4}{8} + 5 \cdot \frac{1}{8} = \$1.625. \]
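Both routes amount to the same weighted sum, which can also be verified in R (a sketch using `pmax()` for the elementwise maximum):

```r
# Stock price PMF and option payoff g(x) = max(x - 55, 0)
x <- c(50, 53, 57, 60)
f_x <- c(1, 2, 4, 1) / 8

sum(pmax(x - 55, 0) * f_x)   # 1.625
```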

11.2 Variance

In Example 9.1, we saw that a $1 straight-up bet and a $1 red/black bet in roulette have the same expected value of \(-\frac{1}{19}\). However, the two bets are clearly different: the straight-up bet is much harder to win, but this is offset by its much larger payout. The difference is captured by the variance, which measures how much the possible outcomes deviate from the expected value.

Definition 11.1 (Variance of a discrete random variable) Let \(X\) be a discrete random variable. Then, the variance of \(X\) is \[ \text{Var}[ X ] = \text{E}[ (X - \text{E}[ X ])^2 ]. \tag{11.4}\]

Let’s use Equation 11.4 to calculate the variance of the different roulette bets.

Example 11.5 (Variance of roulette bets) Let \(S\) be the profit from a $1 straight-up bet. Then \(\text{E}[ S ] = -\frac{1}{19}\).

The variance is \[ \begin{aligned} \text{Var}[ S ] &= \text{E}[ (S - \text{E}[ S ])^2 ] \\ &= \sum_x (x - (-\frac{1}{19}))^2 \cdot f_S(x) \\ &= (35 - (-\frac{1}{19}))^2 \cdot \frac{1}{38} + (-1 - (-\frac{1}{19}))^2 \cdot \frac{37}{38} \\ &\approx 33.208. \end{aligned} \]

Now let \(R\) be the profit from a $1 bet on reds. Then \(\text{E}[ R ] = -\frac{1}{19}\) and the variance is \[ \begin{aligned} \text{Var}[ R ] &= \text{E}[ (R - \text{E}[ R ])^2 ] \\ &= \sum_x (x - (-\frac{1}{19}))^2 \cdot f_R(x) \\ &= (1 - (-\frac{1}{19}))^2 \cdot \frac{18}{38} + (-1 - (-\frac{1}{19}))^2 \cdot \frac{20}{38} \\ &\approx 0.997. \end{aligned} \]

So the variances of the two bets are very different, even though their expected values are the same.
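These definition-based calculations are straightforward to reproduce in R (a minimal sketch):

```r
# Straight-up bet: +$35 w.p. 1/38, -$1 w.p. 37/38
s <- c(35, -1); f_s <- c(1, 37) / 38
E_S <- sum(s * f_s)
sum((s - E_S)^2 * f_s)   # ~ 33.208

# Red/black bet: +$1 w.p. 18/38, -$1 w.p. 20/38
r <- c(1, -1); f_r <- c(18, 20) / 38
E_R <- sum(r * f_r)
sum((r - E_R)^2 * f_r)   # ~ 0.997
```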

There is another version of the variance formula that is often easier for computations.

Proposition 11.2 (Shortcut Formula for Variance) \[ \text{Var}[ X ] = \text{E}[ X^2 ] - (\text{E}[ X ])^2 \tag{11.5}\]

Let \(\mu = \text{E}[ X ]\). Note that \(\mu\) is a constant.

Using LotUS, we can expand the term inside the sum to obtain the shortcut formula. \[ \begin{align*} \text{Var}[ X ] &= \text{E}[ (X - \mu)^2 ] \\ &= \sum_x (x-\mu)^2 f_X(x) \\ &= \sum_x (x^2 - 2\mu x + \mu^2) f_X(x) \\ &= \underbrace{\sum_x x^2 f_X(x)}_{\text{E}[ X^2 ]} - 2 \mu \underbrace{\sum_x x f_X(x)}_{\text{E}[ X ]} + \mu^2 \underbrace{\sum_x f_X(x)}_1 \\ &= \text{E}[ X^2 ] - 2\mu \text{E}[ X ] + \mu^2 \\ &= \text{E}[ X^2 ] - 2\mu^2 + \mu^2 \\ &= \text{E}[ X^2 ] - (\text{E}[ X ])^2. \end{align*} \]

The shortcut formula is easier to use because usually \(\text{E}[ X ]\) is already known, so one just needs to compute \(\text{E}[ X^2 ]\). Let’s use the shortcut formula on the roulette example from Example 11.5.

Example 11.6 (Variance with the shortcut formula) For the $1 straight-up bet, we know that \(\text{E}[ S ] = -\frac{1}{19}\) so we just need to compute \(\text{E}[ S^2 ]\) using LotUS: \[ \text{E}[ S^2 ] = (35)^2 \cdot \frac{1}{38} + (-1)^2 \cdot \frac{37}{38} = \frac{631}{19}. \] By Proposition 11.2, the variance is \[ \text{Var}[ S ] = \text{E}[ S^2 ] - (\text{E}[ S ])^2 = \frac{631}{19} - (-\frac{1}{19})^2 \approx 33.208.\]

For the $1 bet on reds, we know that \(\text{E}[ R ] = -\frac{1}{19}\). To calculate \(\text{E}[ R^2 ]\), we could use LotUS again, or we could observe that the only possible values of \(R\) are \(-1\) or \(1\); in either case, \(R^2 = 1\), so \(\text{E}[ R^2 ] = 1\).

By Proposition 11.2, the variance is \[ \text{Var}[ R ] = \text{E}[ R^2 ] - (\text{E}[ R ])^2 = 1 - (-\frac{1}{19})^2 \approx 0.997.\]
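The shortcut versions are just as easy to check numerically (a sketch continuing the roulette example):

```r
# Straight-up bet: Var[S] = E[S^2] - (E[S])^2
s <- c(35, -1); f_s <- c(1, 37) / 38
sum(s^2 * f_s) - sum(s * f_s)^2   # ~ 33.208

# Red/black bet: E[R^2] = 1, so Var[R] = 1 - (1/19)^2
1 - (1/19)^2                      # ~ 0.997
```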

In Example 9.3, we determined that the expectation of a \(\text{Binomial}(n,p)\) random variable \(X\) is \(\text{E}[ X ] = np\). Now, we will use Proposition 11.2 to determine a similar expression for \(\text{Var}[ X ]\).

Example 11.7 (Variance of a binomial) First, we use LotUS to calculate \(\text{E}[ X^2 ]\):

\[ \begin{align*} \text{E}[ X^2 ] &= \sum_{k=0}^n k^2 \binom{n}{k} p^k (1-p)^{n-k} \\ &= \sum_{k=1}^n k^2 \binom{n}{k} p^k (1-p)^{n-k} & (\text{the $k=0$ term is 0}) \\ &= np \sum_{k=1}^n k \binom{n-1}{k-1} p^{k-1} (1-p)^{n-k} & \left( k \binom{n}{k} = n \binom{n-1}{k-1} \right)\\ &= np \sum_{j=0}^{n-1} (j+1) \binom{n-1}{j} p^j (1-p)^{(n-1)-j} & \left( \text{reindexing $j = k-1$} \right) \\ &= np \text{E}[ Y+1 ], \end{align*} \] where \(Y \sim \text{Binomial}(n-1,p)\). In the last step, we observed that the sum is of the form \(\sum_j (j+1) f_Y(j)\), which by LotUS, is \(\text{E}[ Y + 1 ]\).

By Proposition 11.1 and Example 9.3, \[ \text{E}[ Y + 1 ] = \text{E}[ Y ] + 1 = (n-1)p + 1. \]

Substituting this into the expression above, we see that \[ \text{E}[ X^2 ] = np \text{E}[ Y + 1 ] = np ( (n-1)p + 1 ) = np(np + (1 - p)), \] so \[ \text{Var}[ X ] = \text{E}[ X^2 ] - (\text{E}[ X ])^2 = np(np + (1 - p)) - (np)^2 = np(1-p). \]
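A numerical check of the formula \(\text{Var}[ X ] = np(1-p)\), for the illustrative values \(n = 10\) and \(p = 0.3\):

```r
n <- 10; p <- 0.3
k <- 0:n
f_k <- dbinom(k, size = n, prob = p)

E_X  <- sum(k * f_k)      # np = 3
E_X2 <- sum(k^2 * f_k)    # E[X^2] by LotUS
E_X2 - E_X^2              # 2.1
n * p * (1 - p)           # 2.1
```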

This derivation was quite cumbersome. In Example 15.5, we will present an alternative derivation of the binomial variance that involves less algebra and offers more intuition.