## Example Problems on Random Effects

ST703 Homework 9 on Random Effects

Problems: 1, 2, 3, 4

# 1

Assume that the one-way random-effects model in Box 14.1 is appropriate for the protein content data in Exercise 14.1.

## a.

Construct the ANOVA table showing the mean squares and their expected values.

$\begin{array}{c c c c c c c} \text{Source} & \text{d.f} & \text{Sum of Squares} & \text{Mean Square} & \text{F-ratio} & p \text{-value} & E\\ \hline \text{Model} & 9 & 11.879 & 12.431 & 2.17935 & 0.07 & \sigma^2+ 3 \sigma^2_T \\ \text{Error} & 20 & 114.08 & 5.709 & & & \sigma^2\\ \text{Corrected Total} & 29 & 225.959 & & & & \\ \hline \end{array}$ \begin{align} df_{Model} & = t - 1 \\ & = 10 - 1 = 9 \\ \\ df_{Error} & = N - t \\ & = 30 - 10 = 20 \\ \\ df_{Total} & = df_{Model} + df_{Error} \\ & = 9 + 20 = 29 \\ \\ \bar{y}_{++} & = \frac{1172.8}{30} \\ & = 39.093 \\ \\ SSModel & = \sum^{t}_{i = 1} \sum_{j = 1}^n (\bar{y}_{i+} - \bar{y}_{++})^2 \\ & = 3 \left(\left(\frac{107.1}{3}-39.093\right)^2+\left(\frac{107.6}{3}-39.093\right)^2+\left(\frac{114.3}{3}-39.093\right)^2+\left(\frac{116.1}{3}-39.093\right)^2+\left(\frac{118.9}{3}-39.093\right)^2+\left(\frac{119.2}{3}-39.093\right)^2+\left(\frac{120.4}{3}-39.093\right)^2+\left(\frac{120.7}{3}-39.093\right)^2+\left(\frac{123}{3}-39.093\right)^2+\left(\frac{125.5}{3}-39.093\right)^2\right) \\ \\ MSModel & = \frac{ SSModel }{ dfModel } \\ & = \frac{111.879}{9} = 12.431 \\ \\ SSE & = \sum_{i=1}^{t} \sum_{j=1}^{n} (y_{ij} - \bar{y}){i+})^2 \\ & = \left(28.6\, -\frac{107.1}{3}\right)^2+\left(36.3\, -\frac{107.1}{3}\right)^2+\left(42.2\, -\frac{107.1}{3}\right)^2+\left(35.6\, -\frac{107.6}{3}\right)^2+\left(35.9\, -\frac{107.6}{3}\right)^2+\left(36.1\, -\frac{107.6}{3}\right)^2+\left(37.9\, -\frac{114.3}{3}\right)^2+\left(38.1\, -\frac{114.3}{3}\right)^2+\left(38.3\, -\frac{114.3}{3}\right)^2+\left(37.2\, -\frac{116.1}{3}\right)^2+\left(39.4\, -\frac{116.1}{3}\right)^2+\left(39.5\, -\frac{116.1}{3}\right)^2+\left(38.9\, -\frac{118.9}{3}\right)^2+\left(39.6\, -\frac{118.9}{3}\right)^2+\left(40.4\, -\frac{118.9}{3}\right)^2+\left(39.6\, -\frac{119.2}{3}\right)^2+\left(39.7\, -\frac{119.2}{3}\right)^2+\left(39.9\, -\frac{119.2}{3}\right)^2+\left(38.3\, -\frac{120.4}{3}\right)^2+\left(41.1\, -\frac{120.4}{3}\right)^2+\left(38.9\, -\frac{120.7}{3}\right)^2+\left(40.8\, -\frac{120.7}{3}\right)^2+\left(39.6\, -\frac{123}{3}\right)^2+\left(42.4\, -\frac{123}{3}\right)^2+\left(42.1\, -\frac{125.3}{3}\right)^2+\left(40.2\, -\frac{125.5}{3}\right)^2+\left(43.2\, -\frac{125.5}{3}\right)^2+\left(41-\frac{120.4}{3}\right)^2+\left(41-\frac{120.7}{3}\right)^2+\left(41-\frac{123}{3}\right)^2 \\ & = 114.08 \\ \\ MSE & = \frac{ SSE }{ dfError } \\ & = \frac{ 114.08 }{ 20 } = 5.704 \\ \\ SSTotal & = SSModel + SSE \\ & = 111.879\, +114.08 = 225.959 \\ \\ F & = \frac{ MSModel }{ MSE } \\ & = \frac{ 12.431 }{ 5.704 } = 2.18625 \\ \\ \end{align}

We compare this $F$ value to an $F^{9}_{20}$ distribution to get a p-value of $0.07$.

## b.

On the basis of the ANOVA table constructed in (a), would it be reasonable to conclude that there is a significant plant-to-plant variation in the protein content of seeds produced by the $F_3$ generation plants?

\begin{align} H_0: & \text{There is no plant-to-plant variation} \\ H_A: & \text{There is plant-to-plant variation} \end{align}

Using a significance level of 0.05, since our p-value 0.07 > 0.05, we fail to reject the null hypothesis, there is no significant plant-to-plant variation.

## c.

Estimate the components of variance, and comment on their relative magnitudes.

\begin{align} \hat{\sigma}^2 & = MSE \\ & = 5.709 \\ \\ \hat{\sigma}^2 + n_0 \hat{\sigma}^2_{T} & = MSModel \\ \hat{\sigma}^2_{T} & = \frac{ MSModel - \hat{\sigma}^2 }{ n } \\ & = \frac{ 12.431 - 5.709 }{ 3 } \\ & = 2.24067 \end{align}

Here the intragroup variation, $\sigma^2$, is $5.709 / 2.24067 = 2.548$ time larger than the intergroup variation, $\sigma_T^2$.

## d.

Calculate an estimate of the coefficient of variation of the protein content measurements, and comment of its value.

$CV = \frac{ \sqrt{\sigma^2_T + \sigma^2} }{ | \mu | } = \frac{ \sqrt{2.24067 + 5.709} }{ 39.093 } = 0.072$

The standard deviation of the response is about 7.2% of its mean.

## f.

On the basis of the observed data, would it be reasonable to conclude that the average protein content of seeds produced in the $F_3$ generation is more than 40%?

\begin{align} H_0: & \mu \leq 40 \\ H_A: & \mu > 40 \end{align}

We can address this with a confidence lower bound.

$\bar{Y}_{++} - t_{t-1, \alpha} \sqrt{\frac{ MSModel }{ n t }}$

Where $t_{t - 1, alpha} = t_{10 - 1, 0.05} = 1.833113$.

\begin{align} 39.093 & -1.833113 \cdot \sqrt{\frac{ 12.431 }{ 3 \cdot 10 }} \\ (37.913, & \infty) \end{align}

Since the lower bound is less than 40, we fail to reject the null in favor of the alternative. The average protein content of seeds produced in the $F_3$ generations is not more than 40%.

# 2

To see how the effect of diet supplements on the weights of wethers varied with environmental conditions, a study was conducted in four randomly selected locations; each location represented a randomly selected environment. The experimenters randomly assigned 24 crossbred wethers to the four locations in such a way that each location had six wethers. Within each location, the animals were randomized to receive three diets, with two animals on each diet. Teh four-week weight gains in wethers are as follows:

$\begin{array}{c c c c c} \hline \text{Diet} & & \rlap{\text{Location}} & & \\ & 1 & 2 & 3 & 4 \\ \hline 1 & 2.10 & 2.02 & 2.16 & 1.98 \\ & 2.32 & 2.04 & 2.18 & 1.86 \\ \hline 2 & 2.24 & 2.30 &2.22 & 1.64 \\ & 2.22 & 2.12 & 2.18 & 1.73 \\ \hline 3 & 2.27 & 2.14 & 2.26 & 1.83 \\ & 2.24 & 2.17 & 2.21 & 1.89 \\ \hline \end{array}$

## a.

Write a suitable ANOVA model for the data. Explain all the terms in your model.

$Y_{ijk} = \mu + \alpha_i + B_j + (\alpha B)_{ij} + E_{ijk}$
• $Y_{ijk}$ - weight of the $k$th wether in the $i$th diet group in the $j$th location.
• $\mu$ - the overall mean weight.
• $\alpha_i$ - the fixed effect of the $i$th diet.
• $i = 1,2,3 \rightarrow a = 3$
• $\alpha_i \stackrel{iid}{\sim} N(0, \sigma_\alpha^2)$
• $B_j$ - the random effect of the $j$th location.
• $j = 1,2,3,4 \rightarrow b = 4$
• $B_j \stackrel{iid}{\sim} N(0, \sigma_B^2)$
• $(\alpha B)_{ij}$ - the random interaction effect of the $ij$th group.
• $(\alpha B)_{ij} \stackrel{iid}{\sim} n(0, \sigma^2_{\alpha B}))$
• $E_{ijk}$ - random error of the $k$th wether in the $i$th diet group in the $j$th location.
• $E_{ijk} \stackrel{iid}{\sim} N(0, \sigma^2)$
• $\sigma^2$ - the variance within each a group.

Finally, $\alpha_i, B_j, (\alpha B)_{ij}, E_{ijk}$ are mutually independent.

## b.

Obtain estimates of the components of the variance due to location, interaction between location and diet, and error. Interpret the results.

We will use SAS to help us obtain our estimates.

proc mixed data=wethers method=type3 cl covtest;
class diet location;
model weight = diet / ddfm = satterthwaite solution;
random location diet*location;
run; We will need the following mean square values.

\begin{align} MSE & = \frac{ 1 }{ n a b - a b } \sum_i \sum_j \sum_k (y_{ijk} - \bar{y}_{ij+})^2 \\ & = 0.004779 \\ \\ MSB & = \frac{ 1 }{ b - 1 } \sum_i \sum_j \sum_k (\bar{y}_{+j+} - \bar{y}_{+++})^2 \\ & 0.213104 \\ \\ MSAB & = \frac{ 1 }{ (a-1)(b-1) } \sum_i \sum_j \sum_k (\bar{y}_{ij+} - \bar{y}_{i++} - \bar{y}_{+j+} + \bar{y}_{+++})^2 \\ & = 0.014887 \end{align}

We can use these to estimate our variance components. Notice that $n = n_{ij} = 2$.

\begin{align} \hat{\sigma}^2 & = MSE \\ & = 0.004779 \\ \\ \hat{\sigma}_B^2 & = \frac{ MSB - MSAB }{ n \cdot a } \\ & = \frac{ 0.213104 - 0.014887 }{ 2 \cdot 3 } \\ & = 0.0330362 \\ \\ \hat{\sigma}_{\alpha B}^2 & = \frac{ MSAB - MSE }{ n } \\ & =\frac{ 0.014887 - 0.004779 }{ 2 } \\ & = 0.005054 \end{align}
• $\hat{\sigma}^2$ estimates the population variability to be 0.004779.
• $\hat{\sigma}_B^2$ estimates the variability between location groups to be 0.0330362.
• $\hat{\sigma}_{\alpha B}$ estimates the variability between diet-location groups to be 0.005054.

## c.

Construct the appropriate ANOVA table for the data and perform all ANOVA $F$-tests. Interpret the results.

We can use our ANOVA table from (b). First, we must give the formulas for the pieces that we haven’t calculated.

\begin{align} df_E & = a \cdot b \\ df_{\alpha} & = a - 1 \\ df_{B} & = b - 1\\ df_{\alpha B} & = (a - 1) (b-1) \\ \\ SSE & = MSE \cdot df_{E} \\ SSB & = MSB \cdot df_{B} \\ SS\alpha B & = MS\alpha B \cdot df_{\alpha B} \\ SS\alpha & = \sum_i \sum_j \sum_k (\bar{y}_{i++} - \bar{y}_{+++})^2 \\ \\ MS\alpha & = SS\alpha / df_{\alpha} \\ \\ SSTotal & = SSA + SSB + SSAB + SSE \end{align}

Now we can do our F-tests. We will start with the interaction term and then proceed with the main effects. We will use a significance value of 0.05 throughout.

\begin{align} H_0: & \sigma^2_{\alpha \beta} = 0 \\ H_A: & \sigma^2_{\alpha \beta} \neq 0 \end{align}

We can check $F = \frac{ MSAB }{ MSE } = \frac{ 0.014887 }{ 0.004779 } = 3.12$. Comparing this to an $F^{6}_{12}$ distribution gives $p = 0.0445 < 0.05$, so we would reject the null in favor of the alternative that there is an interaction effect.

Though we have an interaction effect, we will also check our main effects. We will start by looking at the diet effect.

\begin{align} H_0: & \sigma_{\alpha}^2 = 0 \\ H_A: & \sigma_{\alpha}^2 \neq 0 \end{align}

We can check $F = \frac{ MSA }{ MSAB } = \frac{ 0.00554 }{ 0.014887 } = 0.37$. Comparing this to an $F^{2}_{6}$ we get a p value of $0.7035 > 0.05$, so we fail to reject the null in alternative; there is no diet effect.

Then we can check the location effect.

\begin{align} H_0: & \sigma^2_{B} = 0 \\ H_A: & \sigma^2_{B} \neq 0 \end{align}

We can check $F = \frac{ MSB }{ MSAB } = \frac{ 0.213104 }{ 0.014887 } = 14.31$. Comparing this to an $F^{3}_{6}$ distribution, we get a p-value of $0.0038 < 0.05$, so we reject the null in favor of the alternative; there is a location effect.

## d.

Construct a 90% confidence interval for the true mean weight gain of animals fed diet 1 in a randomly selected location.

First we will need to find the standard error.

\begin{align} V(\bar{Y}_{1++}) & = V[ \mu + \alpha_1 + \bar{B}_{+} + \bar{(\alpha \beta)}_{1+} + \bar{E}_{1++}] \\ & = V[ \bar{B}_{+} + \bar{(\alpha \beta)}_{1+} + \bar{E}_{1++}] \\ & = V(\bar{B}_{+}) + V(\bar{(\alpha \beta)}_{1+}) +V(\bar{E}_{1++}) \end{align}

Notice that the covariance terms are 0 because of the independence structure. We will now look at each term individually.

\begin{align} V(\bar{B}_{+}) & = \frac{ \sum V(B_i) }{ 4^2 } \\ & = \frac{ \sum \sigma^2_B }{ 16 } \\ & = \frac{ 4 \hat{\sigma}^2_B }{ 16 } \\ & = \frac{ \hat{\sigma}^2_B }{ 4 } \\ \\ V(\bar{(\alpha \beta)}_{1+}) & = V\Big[ \frac{ \sum (\alpha \beta)_{ij} }{ 4 } \Big] \\ & = \frac{ \sum V((\alpha \beta)_{ij}) }{ 4^2 } \\ & = \frac{ 4 \hat{\sigma}^2_{AB} }{ 16 } \\ & = \frac{ \hat{\sigma}^2_{AB} }{ 4 } \\ \\ V(\bar{E}_{++}) & = V\Big[ \frac{ \sum_{i=1}^{4} \sum_{j=1}^{2} E_{ij} }{ 8 } \Big] \\ & = \frac{ \sum \sum V(E_{ij}) }{ 8^2 } \\ & = \frac{ 8 \hat{\sigma}^2 }{ 8^2 } \\ & = \frac{ \hat{\sigma}^2 }{ 8 } \end{align}

Now we can go back and combine them.

\begin{align} V(\bar{Y}_{1++}) & = V(\bar{B}_{+}) + V(\bar{(\alpha \beta)}_{1+}) +V(\bar{E}_{1++}) \\ & = \frac{ 1 }{ 4 } \frac{ MSB - MSAB }{ 6 } + \frac{ 1 }{ 4 } \frac{ MSAB - MSE }{ 2 } + \frac{ MSE }{ 8 } \\ & = \frac{ MSB }{ 24 } + \frac{ MSAB }{ 12 } \end{align}

Because this variance includes multiple mean squares, we will estimate our degrees of freedom with the Satterthwaite approximation.

\begin{align} \hat{df} & = \frac{ (\frac{ 1 }{ 24 }MSB + \frac{ 1 }{ 12 } MSAB)^2 }{ \frac{ (\frac{ 1 }{ 24 }MSB)^2 }{ 3 } + \frac{ (\frac{ 1 }{ 12 }MSAB)^2 }{ 6 } } \\ & = \frac{\left(\frac{0.213104}{24}+\frac{0.014887}{12}\right)^2}{\frac{1}{6} \left(\frac{0.014887}{12}\right)^2+\frac{1}{3} \left(\frac{0.213104}{24}\right)^2} \\ & = 3.85919 \end{align}

We can also calculate our standard error.

\begin{align} SE(\bar{Y}_{1++}) & =\sqrt{\frac{ 1 }{ 24 } MSB + \frac{ 1 }{ 12 } MSAB} \\ & = \sqrt{\frac{0.213104}{24}+\frac{0.014887}{12}} \\ & = 0.100598 \end{align}

Now we can calculate our confidence interval.

\begin{align} \bar{y}_{1++} & \pm t_{3.85919, 0.10 / 2} \sqrt{SE(\bar{Y}_{1++})} \\ 2.0825 & \pm 2.154393 \cdot 0.100598 \\ (1.86577, & 2.29923) \end{align}

We are 90% confident that the true value of the mean weight gain of animals fed diet 1 is between 1.86577 and 2.29923.

## e.

Construct a set of Bonferonni 90% simultaneous confidence intervals for all possible difference between diet means. Interpret the intervals.

We can use SAS to calculate these confidence intervals. Notice that we have 3 comparisons. Notice that the adjusted confidence intervals all contain 0, so none of the differences are significant.

# 3

To estimate the average cost of hospital stay, a random sample of four hospitals (factor $A$) was selected from the population of all hospitals in a large region. For each hospital, patient admission records for three randomly selected days (factor $B$) were examined. Two patients were selected at random from all the patients admitted on each day to each hospital. On the basis o the average daily hospital bill (in dollars) for each selected patient, the following ANOVA table was constructed:

$\begin{array}{c c} \hline \text{Source} & \text{Mean square} \\ \hline A \text{ (hospitals)} & 2240.05 \\ B(A) \text{ (days in hospital)} & 312.32 \\ \text{Error (patients)} & 122.02 \\ \hline \end{array}$