In statistics, we typically model data \(Y_1, \dots, Y_n\) as independent random variables. One of the most common models is \[
Y_i \sim \text{Normal}(\mu_i, \sigma^2),
\tag{45.1}\] or equivalently, in random vector notation as \[
\vec Y \sim \textrm{MVN}(\vec \mu, \sigma^2 I).
\]
Note that we do not assume that \(Y_1, \dots, Y_n\) are identically distributed; we allow them to have different means. However, we assume that they all have the same variance. This assumption, called homoskedasticity (from Greek for “equal scatter”), is essential to simplifying calculations, as we will see.
Here is one example of a model where we might choose to model data as independent normal, but with different means.
Example 45.1 (Linear regression) Let \(Y_1, \dots, Y_n\) represent the heights of boys ages 5 to 15. We know that children grow taller over time, so if we also know their ages \(x_1, \dots, x_n\), we might model their heights as independent random variables \[
Y_i \sim \text{Normal}(\alpha + \beta x_i, \sigma^2),
\] where the mean \(\mu_i = \alpha + \beta x_i\) is a linear function of age. In other words, it might be reasonable to assume that the heights of the boys are independent normal with the same variance, but with means that grow with age.
If we let \(\vec x \overset{\text{def}}{=}(x_1, \dots, x_n)\), then we can write \(Y_1, \dots, Y_n\) in random vector notation as \[
\vec Y \sim \textrm{MVN}(\vec\mu = \vec 1 \alpha + \vec x \beta, \Sigma = \sigma^2 I).
\]
To estimate \(\alpha\), \(\beta\), and \(\sigma\), we can maximize the (log-)likelihood \[
\ell_{\vec Y}(\alpha, \beta, \sigma^2) = -\frac{n}{2} \log (2\pi \sigma^2) - \frac{1}{2} (\vec Y - (\underbrace{\vec 1 \alpha + \vec x \beta}_{\vec\mu}))^\intercal (\underbrace{\sigma^2 I}_{\Sigma})^{-1} (\vec Y - (\underbrace{\vec 1 \alpha + \vec x \beta}_{\vec\mu})).
\tag{45.2}\]
Simplifying a bit, we see that the MLEs of \(\alpha\) and \(\beta\) can be obtained by solving \[
(\hat\alpha, \hat\beta) = \underset{\alpha,\beta}{\arg\min}\ \left\| \vec Y - \begin{bmatrix} \vec 1 & \vec x \end{bmatrix} \begin{pmatrix}\alpha \\ \beta \end{pmatrix} \right\|^2,
\] which is a least-squares problem. Once we know \(\hat\alpha\) and \(\hat\beta\), we can substitute \[
\hat{\vec\mu} = \vec 1 \hat\alpha + \vec x \hat\beta
\] into Equation 45.2 to obtain the MLE of \(\sigma^2\): \[
\hat\sigma^2 = \underset{\sigma^2}{\arg\max}\ \left( -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \left\| \vec{Y} - \hat{\vec{\mu}} \right\|^2 \right).
\] By taking the derivative with respect to \(\sigma^2\) and setting it equal to 0, \[
-\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \left\|\vec Y - \hat{\vec\mu}\right\|^2 = 0,
\] we see that \[
\hat\sigma^2 = \frac{1}{n} \left\|\vec Y - \hat{\vec\mu}\right\|^2.
\]
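To make the estimation recipe concrete, here is a minimal simulation sketch in Python (using NumPy; all parameter values are hypothetical, chosen only for illustration) that generates data from the model and computes \(\hat\alpha\), \(\hat\beta\), and \(\hat\sigma^2\) exactly as derived above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: ages and heights simulated from the model itself
n = 200
x = rng.uniform(5, 15, size=n)                        # ages
alpha_true, beta_true, sigma_true = 80.0, 6.0, 7.0    # assumed "true" values
Y = rng.normal(alpha_true + beta_true * x, sigma_true)

# MLEs of alpha and beta: solve the least-squares problem min ||Y - [1 x](alpha, beta)||^2
X = np.column_stack([np.ones(n), x])
(alpha_hat, beta_hat), *_ = np.linalg.lstsq(X, Y, rcond=None)

# MLE of sigma^2: plug the fitted mean back into the log-likelihood
mu_hat = X @ np.array([alpha_hat, beta_hat])
sigma2_hat = np.mean((Y - mu_hat) ** 2)               # (1/n) ||Y - mu_hat||^2

print(alpha_hat, beta_hat, sigma2_hat)
```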
In this chapter, we will develop tools to analyze the sampling distribution of the estimators \(\hat{\vec\mu}\) and \(\hat\sigma^2\).
Projection and Independence
In this section, we will generalize the result from Theorem 44.2. There, we showed that if \(\vec X\) is a vector of i.i.d. \(\text{Normal}(\mu, \sigma^2)\) random variables, then \[ \vec 1 \bar X = P_{\vec{1}} \vec X, \] where \(\displaystyle P_{\vec{1}} \overset{\text{def}}{=}\frac{\vec{1} \vec{1}^\intercal}{n}\), is independent of \[ \vec X - \vec 1 \bar X = (I - P_{\vec{1}})\vec X. \] It turns out that \(P_{\vec{1}}\) is an orthogonal projection matrix.
Definition 45.1 (Orthogonal projection matrix) Let \(P\) be an \(n \times n\) matrix. \(P\) is an orthogonal projection matrix if \(P^2 = P\) and \(P^\intercal = P\).
In particular, \(P\) is the orthogonal projection onto \(C(P) \subseteq \mathbb{R}^n\), where \(C(P)\) is the column space of \(P\).
To understand why \(P^2 = P\), we observe that the vector is already in the subspace after the first projection, so any subsequent projections do nothing. This property is called idempotence (from Latin for “same power”).
Coming back to \(P_{\vec{1}}\), it has the following properties (verified numerically in the sketch after this list):
- \(P_{\vec{1}} \vec{1} = \vec{1}\)
- \(P_{\vec{1}} \vec{v} = \vec{0}\) for any \(\vec{v} \in C(\vec{1})^\perp\); i.e., \(\vec{v} \perp \vec{1}\).
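As a quick numerical sanity check, the sketch below (a Python illustration with an arbitrary choice of \(n\) and \(\vec v\)) verifies these two properties, along with idempotence and symmetry.

```python
import numpy as np

n = 5
one = np.ones(n)
P1 = np.outer(one, one) / n               # P_1 = (1 1^T) / n

assert np.allclose(P1 @ P1, P1)           # idempotent: P^2 = P
assert np.allclose(P1.T, P1)              # symmetric: P^T = P
assert np.allclose(P1 @ one, one)         # P_1 fixes vectors in C(1)

v = np.array([1.0, -1.0, 2.0, 0.0, -2.0]) # v is orthogonal to 1 (entries sum to 0)
assert np.allclose(P1 @ v, np.zeros(n))   # P_1 annihilates vectors in C(1)^perp
```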
Now, we will show that Theorem 44.2 holds for independent normal random variables that do not necessarily have the same mean, as well as arbitrary projection matrices.
Theorem 45.1 (Independence of projection and residual) As in Equation 45.1, let \[\vec Y = (Y_1, \dots, Y_n) \sim \textrm{MVN}(\vec\mu, \sigma^2 I),\] and let \(P\) be any orthogonal projection matrix.
Then, the projection \(P\vec Y\) and the vector of residuals \((I - P)\vec Y\) are independent.
First, note that \[
\begin{pmatrix} P \vec{Y} \\ (I - P)\vec{Y} \end{pmatrix} = \begin{bmatrix} P \\ I - P \end{bmatrix} \vec Y,
\] is multivariate normal by Proposition 44.3. Therefore, if we can show that their cross-covariance is zero, then they must be independent by Theorem 44.1.
\[
\begin{align}
\text{Cov}\!\left[ P\vec Y, (I - P)\vec Y \right] &= P \underbrace{\text{Var}\!\left[ \vec Y \right]}_{\sigma^2 I} (I - P)^\intercal \\
&= \sigma^2 P(I - P) \\
&= \sigma^2 (P - P^2) \\
&= 0_{n \times n}.
\end{align}
\]
Notice that the assumption of homoskedasticity was critical to the proof of Theorem 45.1. If \(\text{Var}\!\left[ \vec Y \right]\) were not proportional to the identity matrix \(I\), then we would not have been able to combine \(P\) and \((I - P)\).
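The following simulation sketch (with assumed values of \(\vec\mu\), \(\sigma\), and \(P = P_{\vec 1}\), chosen only for illustration) confirms empirically that the coordinates of \(P\vec Y\) are uncorrelated with those of \((I - P)\vec Y\).

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed setup for illustration: P = P_1 in R^3, different means, common variance
n, sigma, reps = 3, 2.0, 100_000
P = np.ones((n, n)) / n
mu = np.array([1.0, 5.0, -3.0])

Y = rng.normal(mu, sigma, size=(reps, n))   # each row is one draw of Y
PY = Y @ P                                  # rows of P Y   (P is symmetric)
RY = Y @ (np.eye(n) - P)                    # rows of (I - P) Y

# Empirical cross-covariances between the first coordinate of P Y and the residuals
cross = np.cov(np.column_stack([PY[:, 0], RY]).T)[0, 1:]
print(cross)                                # all entries should be close to 0
```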
The next example demonstrates an application of Theorem 45.1.
Example 45.2 (Independence in linear regression) In Example 45.1, we noted that the MLEs of \(\alpha\) and \(\beta\) could be obtained by solving a least-squares problem: \[
(\hat\alpha, \hat\beta) = \underset{\alpha,\beta}{\arg\min}\ \left\| \vec Y - \begin{bmatrix} \vec 1 & \vec x \end{bmatrix} \begin{pmatrix}\alpha \\ \beta \end{pmatrix} \right\|^2 = \underset{\alpha,\beta}{\arg\min}\ \left\| \vec{Y} - \hat{\vec{\mu}} \right\|^2.
\] The solution is obtained by projecting \(\vec Y\) onto \(\text{span}(\{ \vec 1, \vec x \})\). That is, \[
\hat{\vec \mu} = \underbrace{\begin{bmatrix} \vec 1 & \vec x \end{bmatrix}}_X \begin{pmatrix}\hat\alpha \\ \hat\beta \end{pmatrix} = P_X \vec Y,
\] where \(P_X \overset{\text{def}}{=}X (X^\intercal X)^{-1} X^\intercal\) is the projection matrix onto \(C(X)\). Notice that \(P_X \vec Y\) is a vector representing the “normal” height for each child based on their age.
We also saw that the MLE of \(\sigma^2\) was \[
\hat\sigma^2 = \frac{1}{n} \left\| \vec Y - \hat{\vec\mu} \right\|^2,
\] which can be written as \[
\frac{1}{n} \left\| (I - P_X)\vec Y \right\|^2.
\] \((I - P_X)\vec Y\) is a vector representing how much each child differs from the norm for their age, and \(\hat\sigma^2\) is the average of these squared differences.
By Theorem 45.1, we know that \(P_X\vec Y\) and \((I - P_X)\vec Y\) are independent. Since \(\hat{\vec\mu} = P_X\vec Y\), while \(\hat\sigma^2\) is a function only of \((I - P_X)\vec Y\), they must be independent.
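Here is a simulation sketch of this independence (with hypothetical regression parameters). Independence implies zero correlation, so the sample correlation between \(\hat\beta\) (a function of \(P_X\vec Y\)) and \(\hat\sigma^2\) (a function of \((I - P_X)\vec Y\)) should be near zero; the check illustrates, but of course does not prove, the result.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical regression setup, replicated many times
n, reps = 30, 20_000
x = np.linspace(5, 15, n)
X = np.column_stack([np.ones(n), x])
P = X @ np.linalg.inv(X.T @ X) @ X.T        # P_X, the projection onto C(X)
M = np.eye(n) - P                           # I - P_X
mu = X @ np.array([80.0, 6.0])              # assumed alpha = 80, beta = 6
sigma = 7.0

beta_hats = np.empty(reps)
sigma2_hats = np.empty(reps)
for r in range(reps):
    Y = rng.normal(mu, sigma)
    beta_hats[r] = np.linalg.lstsq(X, Y, rcond=None)[0][1]   # function of P_X Y
    sigma2_hats[r] = np.mean((M @ Y) ** 2)                    # function of (I - P_X) Y

print(np.corrcoef(beta_hats, sigma2_hats)[0, 1])              # should be near 0
```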
Finally, we make a simple but important observation: if \(P\) is a projection matrix, then so is \(I - P\). It is symmetric, and \[
(I - P)^2 = I - 2P + P^2 = I - 2P + P = I - P.
\] In particular, \(P\) is the matrix of orthogonal projection onto \(C(P)\), and \(I - P\) is the matrix of orthogonal projection onto \(C(P)^\perp\).
Length and the Chi-Square Distribution
In this section, we will focus on the case where the data \(Z_1, \dots, Z_n\) are i.i.d. standard normal. That is, \[
\vec Z \overset{\text{def}}{=}(Z_1, \dots, Z_n) \sim \textrm{MVN}(\vec 0, I).
\tag{45.3}\] We will discuss how the results here extend to the more general case (Equation 45.1) at the end.
First, we determine the distribution of the squared length of such a random vector: \[
||\vec Z||^2 = Z_1^2 + Z_2^2 + \dots + Z_n^2.
\] In Section 35.3, we took this to be the definition of the chi-square distribution with \(n\) degrees of freedom, or \(\chi^2_n\), which we saw was equivalent to the \(\textrm{Gamma}(\alpha= \frac{n}{2}, \lambda= \frac{1}{2})\) distribution.
Now, suppose we multiply \(\vec Z\) by a diagonal matrix \(D\) of the following form: \[
D = \begin{bmatrix} I_{k \times k} \\ & 0_{(n-k) \times (n-k)} \end{bmatrix}.
\tag{45.4}\] Then, \(D\) zeroes out the elements of \(\vec Z\) after the first \(k\) so \[
|| D \vec Z ||^2 = Z_1^2 + \dots + Z_k^2 \sim \chi_k^2.
\tag{45.5}\] This seemingly trivial observation is the basis for a more profound result. It turns out that if \(P\) is any projection matrix onto a subspace of dimension \(k\), then \[
||P\vec Z||^2 \sim \chi^2_k.
\] The argument proceeds as follows:
- We can rotate the axes so that in the new coordinate system, \(P\) is a diagonal matrix of the form Equation 45.4.
- The distribution of \(\vec Z\) does not change under rotation; it is still Equation 45.3 in this new coordinate system.
- In this new coordinate system, the distribution is clearly \(\chi_k^2\) by Equation 45.5.
The next two results make this precise. We start by establishing the second point, that the distribution of \(\vec Z\) does not change under rotation.
Lemma 45.1 (Rotating a multivariate normal) Let \(Q\) be any orthogonal matrix. That is, the columns of \(Q\) are an orthonormal basis of \(\mathbb{R}^n\) so that \(Q^\intercal Q = I\). Then, \(\vec Z' \overset{\text{def}}{=}Q\vec Z\) has the same distribution as \(\vec Z\). That is, \(\vec Z'\) is also a vector of i.i.d. standard normals.
The distribution of \(\vec Z' \overset{\text{def}}{=}Q\vec Z\) is also multivariate normal by Proposition 44.3. By Proposition 43.1, its mean vector is \[ \text{E}\!\left[ \vec Z' \right] = \text{E}\!\left[ Q\vec Z \right] = Q\text{E}\!\left[ \vec Z \right] = Q \vec 0 = \vec 0 \] and its covariance matrix is \[ \text{Var}\!\left[ \vec Z' \right] = \text{Var}\!\left[ Q\vec Z \right] = Q\underbrace{\text{Var}\!\left[ \vec Z \right]}_{I} Q^\intercal = \underbrace{QQ^\intercal}_I = I, \] which together completely characterize a multivariate normal distribution. So \(\vec Z'\) has the same distribution as \(\vec Z\).
Here is the geometric intuition for this result. The PDF of \(\vec Z\) is \[
f_{\vec Z}(\vec z) = \frac{1}{(2\pi)^{n/2}} e^{-\frac{1}{2} \sum_{i=1}^n z_i^2},
\] which only depends on the distance from the origin, so it is spherically symmetric. On the other hand, the orthogonal matrix \(Q\) is simply a rotation around the origin. So multiplying by \(Q\) may change the value of \(\vec Z\), but does not change its distribution.
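The rotation invariance is easy to see in simulation. The sketch below (an illustration with an arbitrary orthogonal matrix \(Q\) obtained from a QR decomposition) checks that the coordinates of \(Q\vec Z\) still behave like i.i.d. standard normals.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

n, reps = 4, 50_000
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))   # a fixed orthogonal matrix

Z = rng.normal(size=(reps, n))                 # rows are i.i.d. N(0, I) vectors
Zp = Z @ Q.T                                   # rows of Z' = Q Z

# Each coordinate of Z' should still look standard normal...
print(stats.kstest(Zp[:, 0], "norm").pvalue)
# ...and the sample covariance of Z' should be close to the identity
print(np.round(np.cov(Zp.T), 2))
```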
Now we are ready to establish the main theorem, which is a special case of a more general result proved by the Scottish-American statistician Bill Cochran. Despite the theoretical nature of this result, Cochran was in fact the consummate applied statistician—for example, serving on the scientific advisory committee for the U.S. Surgeon General’s 1964 report that cigarette smoking causes lung cancer. Cochran served as the chair of the Johns Hopkins Department of Biostatistics, before moving to Harvard to found the Department of Statistics there.
William Cochran (1909-1980)
Theorem 45.2 (Cochran (1934)) Let \(P\) be a rank-\(k\) projection matrix; that is, it projects vectors onto a subspace of dimension \(k\). Then, \[
|| P \vec Z ||^2 \sim \chi_k^2.
\]
Since \(P\) projects onto a subspace of dimension \(k\), we can let \[\{ \vec q_1, \dots, \vec q_k \}\] be an orthonormal basis of this subspace, which exists by the Gram-Schmidt algorithm. Note that all of these vectors are eigenvectors of \(P\) with eigenvalue \(1\) because they are already in the subspace, so \[ P\vec q_j = \vec q_j; \qquad j=1, \dots, k. \]
We can extend this to an orthonormal basis of \(\mathbb{R}^n\). Applying Gram-Schmidt to \(\left\{ \vec{q_1}, \dots, \vec{q_k}, \vec{e_1}, \dots, \vec{e_n} \right\}\) (and discarding the vectors that reduce to \(\vec 0\)) yields \[
\left\{ \vec{q_1}, \dots, \vec{q_k}, \vec{q_{k+1}}, \dots, \vec{q_n} \right\}.
\] Note that \(\left\{ \vec{q_{k+1}}, \dots, \vec{q_n} \right\}\) is an orthonormal basis of \(C(P)^\perp\). Hence,
\[ P\vec q_j = \vec 0; \qquad j=k+1, \dots, n. \]
Therefore, \(PQ = QD\), where \(Q\) is an orthogonal matrix consisting of the eigenvectors \[
Q \overset{\text{def}}{=}\begin{bmatrix} \vert & & \vert & \vert & & \vert \\ \vec q_1 & \dots & \vec q_k & \vec q_{k+1} & \dots & \vec q_n \\ \vert & & \vert & \vert & & \vert \end{bmatrix},
\] and \(D\) is a diagonal matrix consisting of the eigenvalues \[
D = \begin{bmatrix} I_{k \times k} \\ & 0_{(n-k) \times (n-k)} \end{bmatrix}.
\] Equivalently, we can write \(P = QDQ^\intercal\), since \(Q^{-1} = Q^\intercal\).
Substituting this diagonalization into the above, we see that \[
|| P\vec Z ||^2 = || (QDQ^\intercal)\vec Z ||^2 = \vec Z^\intercal (Q D \underbrace{Q^\intercal) (Q}_{I} D Q^\intercal) \vec Z = || D Q^\intercal \vec Z ||^2.
\] But by Lemma 45.1, we know that \(Q^\intercal \vec Z\) has the same distribution as \(\vec Z\), so \(|| D Q^\intercal \vec Z||^2\) must have the same distribution as \(||D \vec Z ||^2 = Z_1^2 + \dots + Z_k^2\), which is \(\chi^2_k\).
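As a numerical illustration of Theorem 45.2 (a sketch with assumed dimensions \(n\) and \(k\)), we can project i.i.d. standard normal vectors onto a random \(k\)-dimensional subspace and compare the squared lengths to the \(\chi^2_k\) distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# A projection onto a random k-dimensional subspace of R^n (assumed dimensions)
n, k, reps = 10, 3, 100_000
A = rng.normal(size=(n, k))                    # n x k matrix, full column rank
P = A @ np.linalg.inv(A.T @ A) @ A.T           # rank-k projection onto C(A)

Z = rng.normal(size=(reps, n))
lengths_sq = np.sum((Z @ P) ** 2, axis=1)      # ||P z||^2 for each draw (P is symmetric)

print(lengths_sq.mean(), stats.chi2(k).mean())             # both should be about k
print(stats.kstest(lengths_sq, stats.chi2(k).cdf).pvalue)  # consistent with chi^2_k
```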
Although data rarely can be modeled as i.i.d. standard normal, we can often reduce them to standard normal random variables, as the next result shows.
Theorem 45.3 (Distribution of the sample variance for normal data) Let \(X_1, \dots, X_n\) be i.i.d. \(\text{Normal}(\mu, \sigma^2)\), and let \(S^2\) be the sample variance. Then,
\[
\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}.
\]
First, we observe that the random variable in question can be written as \[\begin{align}
\frac{(n-1) S^2}{\sigma^2} &= \frac{\sum_{i=1}^n (X_i - \bar X)^2}{\sigma^2} \\
&= \sum_{i=1}^n \left(\frac{X_i - \mu}{\sigma} - \frac{\bar X - \mu}{\sigma}\right)^2 \\
&= \sum_{i=1}^n (Z_i - \bar Z)^2,
\end{align}\] where \(Z_1, \dots, Z_n\) are i.i.d. standard normal.
Let \(\vec Z \overset{\text{def}}{=}(Z_1, \dots, Z_n)\). Then, we can write \[
\sum_{i=1}^n (Z_i - \bar Z)^2 = || \vec Z - \vec 1 \bar Z ||^2 = ||(I - P_{\vec{1}}) \vec Z||^2.
\tag{45.6}\]
Since \(P_{\vec{1}}\) projects onto a \(1\)-dimensional subspace, \((I - P_{\vec{1}})\) projects onto an \((n-1)\)-dimensional subspace. By Theorem 45.2, \(\left\| (I - P_{\vec{1}})\vec Z \right\|^2\) follows a \(\chi_{n-1}^2\) distribution.
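A quick simulation sketch (with assumed values of \(n\), \(\mu\), and \(\sigma\)) confirms that \((n-1)S^2/\sigma^2\) is consistent with the \(\chi^2_{n-1}\) distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

n, mu, sigma, reps = 8, 10.0, 3.0, 100_000
X = rng.normal(mu, sigma, size=(reps, n))
S2 = X.var(axis=1, ddof=1)                     # sample variance of each row

stat = (n - 1) * S2 / sigma**2
print(stats.kstest(stat, stats.chi2(n - 1).cdf).pvalue)   # consistent with chi^2_{n-1}
```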
One consequence of Theorem 45.3 is that we can easily determine the bias and variance of \(S^2\) for normal data.
Example 45.3 (Bias and variance of the sample variance) In Chapter 38, we showed that \(S^2\) is unbiased for the variance \(\sigma^2\) for i.i.d. random variables from any distribution. We can verify this for i.i.d. normal random variables in particular using Theorem 45.3: \[
\text{E}\!\left[ S^2 \right] = \frac{\sigma^2}{n-1} \underbrace{\text{E}\!\left[ \chi_{n-1}^2 \right]}_{n-1} = \sigma^2,
\] where we have abused notation by letting \(\chi_{n-1}^2\) denote a random variable with that distribution. The expectation of the \(\chi_{n-1}^2\) distribution follows from the fact that it is identical to the \(\textrm{Gamma}(\alpha= \frac{n-1}{2}, \lambda= \frac{1}{2})\) distribution, and the expectation of the gamma distribution is \(\alpha/\lambda\).
Similarly, \[
\text{Var}\!\left[ S^2 \right] = \left(\frac{\sigma^2}{n-1}\right)^2 \underbrace{\text{Var}\!\left[ \chi_{n-1}^2 \right]}_{2(n-1)} = \frac{2\sigma^4}{n-1}
\] because the variance of the gamma distribution is \(\alpha/\lambda^2\). However, note that this formula for the variance only holds when the data are normal; there is no simple formula for the variance of \(S^2\) in general.
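The sketch below (with assumed parameter values) verifies both moment formulas by simulation and, as a contrast, shows that the variance formula fails for non-normal data, using Exponential(1) samples (variance \(1\)).

```python
import numpy as np

rng = np.random.default_rng(6)

n, sigma, reps = 10, 2.0, 200_000

# Normal data: both formulas should check out
S2_normal = rng.normal(0.0, sigma, size=(reps, n)).var(axis=1, ddof=1)
print(S2_normal.mean(), sigma**2)                  # E[S^2] = sigma^2
print(S2_normal.var(), 2 * sigma**4 / (n - 1))     # Var[S^2] = 2 sigma^4 / (n - 1)

# Exponential(1) data: S^2 is still unbiased for the variance (which is 1),
# but its variance is noticeably larger than 2 / (n - 1)
S2_expo = rng.exponential(1.0, size=(reps, n)).var(axis=1, ddof=1)
print(S2_expo.mean(), 1.0)
print(S2_expo.var(), 2 * 1.0 / (n - 1))
```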
Finally, we will examine the sampling distribution of \(\hat\sigma^2\) from Example 45.1. Even though we showed in Example 45.2 that \[
\hat\sigma^2 = \frac{1}{n}|| (I - P_X)\vec Y ||^2,
\] Theorem 45.2 does not apply directly because \(\vec Y\) does not have mean \(\vec 0\). However, this is only a minor setback, as we see next.
Corollary 45.1 (Alternative form of Cochran’s theorem) Let \(P\) be a rank-\(k\) projection matrix; that is, it projects vectors onto a subspace of dimension \(k\). Suppose \[ \vec W \sim \textrm{MVN}(\vec 0, P). \] Then, \[
|| \vec W ||^2 \sim \chi^2_k.
\]
Let \(\vec Z\) be defined as in Equation 45.3. Then, \(\vec W\) has the same distribution as \(P\vec Z\), since \[
P\vec Z \sim \textrm{MVN}(\vec\mu=\vec 0, \Sigma=\underbrace{PP^\intercal}_P),
\] where we used the following properties of projection matrices to simplify the covariance matrix: \(P^\intercal = P\) and \(P^2 = P\). Therefore, \[
||\vec W||^2 = ||P\vec Z||^2 \sim \chi^2_k
\] by Theorem 45.2.
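As an illustration of Corollary 45.1 (a sketch with an arbitrary rank-\(k\) projection), we can draw \(\vec W\) directly from the \(\textrm{MVN}(\vec 0, P)\) distribution and check that \(\|\vec W\|^2\) is consistent with \(\chi^2_k\).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

n, k, reps = 6, 2, 50_000
A = rng.normal(size=(n, k))
P = A @ np.linalg.inv(A.T @ A) @ A.T           # a rank-k projection matrix

# Draw W ~ MVN(0, P) directly; the covariance is singular but positive semidefinite
W = rng.multivariate_normal(np.zeros(n), P, size=reps)
lengths_sq = np.sum(W**2, axis=1)

print(stats.kstest(lengths_sq, stats.chi2(k).cdf).pvalue)  # consistent with chi^2_k
```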
In other words, Corollary 45.1 says that the original vector that we are projecting need not have mean \(\vec 0\), as long as its projection has mean \(\vec 0\). As a result, Corollary 45.1 is applicable to more situations than Theorem 45.2, including \(\hat\sigma^2\) from Example 45.1.
Example 45.4 (Distribution of \(\hat\sigma^2\) in linear regression) Continuing Example 45.2, we first determine the distribution of \((I - P_X)\vec Y\). Since \[
\vec Y \sim \textrm{MVN}(\vec\mu = \vec 1 \alpha + \vec x \beta, \Sigma = \sigma^2 I),
\] we have by Proposition 44.3 and Proposition 43.1 that \[
(I - P_X)\vec Y \sim \textrm{MVN}\big(\vec\mu = \underbrace{(I - P_X)(\vec 1 \alpha + \vec x \beta)}_{\vec 0}, \Sigma = \underbrace{(I - P_X) (\sigma^2 I) (I - P_X)^\intercal}_{\sigma^2 (I - P_X)}\big).
\] To simplify the expressions, we used the fact that \(\vec 1 \alpha + \vec x \beta\) is already in the column span of \(X\), as well as properties of projection matrices.
Therefore, we see that \((I - P_X)\vec Y\) can be written as \(\sigma \vec W\), where \[
\vec W \sim \textrm{MVN}(\vec\mu=\vec 0, \Sigma=I - P_X),
\] and since \(I - P_X\) is a projection matrix of rank \(n-2\), Corollary 45.1 applies to \(\vec W\).
Now, the MLE of \(\sigma^2\) can be written as \[
\hat\sigma^2 = \frac{1}{n} ||(I - P_X)\vec Y||^2 = \frac{\sigma^2}{n} ||\vec W||^2,
\] so applying Corollary 45.1 to \(\vec W\) yields \[
\frac{n \hat\sigma^2}{\sigma^2} \sim \chi^2_{n-2}.
\tag{45.7}\] In other words, \(\hat\sigma^2 \sim \textrm{Gamma}(\alpha= \frac{n-2}{2}, \lambda= \frac{n}{2\sigma^2})\).
Here is one useful application of Equation 45.7. Since we know the distribution, we can evaluate its expectation \[
\text{E}\!\left[ \hat\sigma^2 \right] = \frac{n-2}{n} \sigma^2
\tag{45.8}\] to see that the MLE is biased for \(\sigma^2\). However, Equation 45.8 also suggests the fix. We can obtain an unbiased estimator of the variance in linear regression by simply rescaling: \[
\hat\sigma^2_{\text{unbiased}} = \frac{n}{n-2} \hat\sigma^2 = \frac{1}{n-2} ||\vec Y - \hat{\vec\mu} ||^2.
\]
This is the estimator of variance that is preferred in linear regression, and we now know its sampling distribution: \[
\frac{(n-2) \hat\sigma_{\text{unbiased}}^2}{\sigma^2} \sim \chi^2_{n-2},
\] or in other words, \(\hat\sigma^2_{\text{unbiased}} \sim \textrm{Gamma}(\alpha= \frac{n-2}{2}, \lambda= \frac{n-2}{2\sigma^2})\).
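The following simulation sketch (with hypothetical regression parameters) checks Equation 45.7 and the effect of the rescaling.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)

n, reps = 25, 50_000
x = np.linspace(5, 15, n)
X = np.column_stack([np.ones(n), x])
mu = X @ np.array([80.0, 6.0])                 # assumed alpha = 80, beta = 6
sigma = 7.0

sigma2_mle = np.empty(reps)
for r in range(reps):
    Y = rng.normal(mu, sigma)
    resid = Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]
    sigma2_mle[r] = np.mean(resid**2)

# Equation 45.7: n * sigma2_hat / sigma^2 should follow chi^2_{n-2}
print(stats.kstest(n * sigma2_mle / sigma**2, stats.chi2(n - 2).cdf).pvalue)

# The MLE is biased; rescaling by n / (n - 2) removes the bias
print(sigma2_mle.mean(), (n - 2) / n * sigma**2)
print((n / (n - 2) * sigma2_mle).mean(), sigma**2)
```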
Notice that in Example 45.4, we were able to apply Corollary 45.1 to \((I - P_X)\vec Y\) only because it had mean \(\vec 0\). We cannot apply Corollary 45.1 to the complementary projection \(\hat{\vec\mu} = P_X \vec Y\), and in fact, the distribution of \(||\hat{\vec\mu}||^2\) has no simple description.
Exercises
Exercise 45.1 (Alternative proof of Theorem 45.3) Our goal is to reprove Theorem 45.3 using moment generating functions. Let \(X_1, \dots, X_n\) be i.i.d. \(\text{Normal}(\mu, \sigma^2)\) and let \[
S^2 = \frac{1}{n-1} \sum_{i=1}^n \left( X_i - \bar{X} \right)^2.
\]
- First, let \(Z_1, \dots, Z_n\) be i.i.d. \(\text{Normal}(0,1)\). Show that \[
\sum_{i=1}^n Z_i^2 = \sum_{i=1}^n \left( Z_i - \bar{Z} \right)^2 + n \bar{Z}^2.
\]
- Next, write down the MGF of both sides of the above result to determine the MGF of \[
\sum_{i=1}^n \left( Z_i - \bar{Z} \right)^2.
\] It may be helpful to recall that \(\bar{Z}\) is independent of \(\sum \left( Z_i - \bar{Z}\right)^2\). Use the MGF to determine its distribution.
- Finally, use location-scale transformations \(X_i = \mu + \sigma Z_i\) to conclude that \[
\frac{n-1}{\sigma^2} S^2 \sim \chi_{n-1}^2.
\]
Exercise 45.2 (Difference of means) Let \(X_1, \dots, X_m, Y_1, \dots, Y_n\) be i.i.d. \(\text{Normal}(\mu, \sigma^2)\).
- What is the distribution of \(\bar X - \bar Y\)?
- Let \(S_X^2\) be the sample variance of \(X_1, \dots, X_m\), and \(S_Y^2\) be the sample variance of \(Y_1, \dots, Y_n\). To estimate \(\sigma^2\), we could use either \(S_X^2\) or \(S_Y^2\), or better yet, we could use the pooled estimate of variance: \[
S_{\text{pooled}}^2 \overset{\text{def}}{=}\frac{(m-1) S_X^2 + (n-1) S_Y^2}{m+n-2}.
\tag{45.9}\] Show that \(S_{\text{pooled}}^2\) is an unbiased estimator of \(\sigma^2\) and determine the exact distribution of \((m+n-2)\frac{S_{\text{pooled}}^2}{\sigma^2}\).
- Argue that \(S_{\text{pooled}}^2\) is independent of \(\bar X - \bar Y\).
Cochran, William G. 1934. “The Distribution of Quadratic Forms in a Normal System, with Applications to the Analysis of Covariance.” Mathematical Proceedings of the Cambridge Philosophical Society 30 (2): 178–91. Cambridge University Press.