Example Problems on Random Effects
ST703 Homework 9 on Random Effects
Problems: 1, 2, 3, 4
1
Assume that the one-way random-effects model in Box 14.1 is appropriate for the protein content data in Exercise 14.1.
a.
Construct the ANOVA table showing the mean squares and their expected values.
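| Source | d.f. | Sum of Squares | Mean Square | F-ratio | p-value | E(MS) |
|---|---|---|---|---|---|---|
| Model | 9 | 111.879 | 12.431 | 2.179 | 0.07 | $\sigma^2 + 3\sigma_T^2$ |
| Error | 20 | 114.08 | 5.704 | | | $\sigma^2$ |
| Corrected Total | 29 | 225.959 | | | | |

With $t = 10$ plants and $n = 3$ seeds per plant ($N = 30$), the degrees of freedom are

$$df_{Model} = t - 1 = 10 - 1 = 9, \qquad df_{Error} = N - t = 30 - 10 = 20, \qquad df_{Total} = df_{Model} + df_{Error} = 9 + 20 = 29.$$

The grand mean is

$$\bar{y}_{++} = \frac{1172.8}{30} = 39.093.$$

The model sum of squares uses the ten plant totals, each divided by 3 to give the plant mean:

$$\begin{aligned}
SS_{Model} &= \sum_{i=1}^{t}\sum_{j=1}^{n}(\bar{y}_{i+} - \bar{y}_{++})^2 \\
&= 3\Big[(\tfrac{107.1}{3} - 39.093)^2 + (\tfrac{107.6}{3} - 39.093)^2 + (\tfrac{114.3}{3} - 39.093)^2 + (\tfrac{116.1}{3} - 39.093)^2 + (\tfrac{118.9}{3} - 39.093)^2 \\
&\qquad + (\tfrac{119.2}{3} - 39.093)^2 + (\tfrac{120.4}{3} - 39.093)^2 + (\tfrac{120.7}{3} - 39.093)^2 + (\tfrac{123}{3} - 39.093)^2 + (\tfrac{125.5}{3} - 39.093)^2\Big] \\
&= 111.879
\end{aligned}$$

$$MS_{Model} = \frac{SS_{Model}}{df_{Model}} = \frac{111.879}{9} = 12.431$$

The error sum of squares compares each observation with its own plant mean:

$$\begin{aligned}
SSE &= \sum_{i=1}^{t}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{i+})^2 \\
&= (28.6 - \tfrac{107.1}{3})^2 + (36.3 - \tfrac{107.1}{3})^2 + (42.2 - \tfrac{107.1}{3})^2 \\
&\quad + (35.6 - \tfrac{107.6}{3})^2 + (35.9 - \tfrac{107.6}{3})^2 + (36.1 - \tfrac{107.6}{3})^2 \\
&\quad + (37.9 - \tfrac{114.3}{3})^2 + (38.1 - \tfrac{114.3}{3})^2 + (38.3 - \tfrac{114.3}{3})^2 \\
&\quad + (37.2 - \tfrac{116.1}{3})^2 + (39.4 - \tfrac{116.1}{3})^2 + (39.5 - \tfrac{116.1}{3})^2 \\
&\quad + (38.9 - \tfrac{118.9}{3})^2 + (39.6 - \tfrac{118.9}{3})^2 + (40.4 - \tfrac{118.9}{3})^2 \\
&\quad + (39.6 - \tfrac{119.2}{3})^2 + (39.7 - \tfrac{119.2}{3})^2 + (39.9 - \tfrac{119.2}{3})^2 \\
&\quad + (38.3 - \tfrac{120.4}{3})^2 + (41.1 - \tfrac{120.4}{3})^2 + (41 - \tfrac{120.4}{3})^2 \\
&\quad + (38.9 - \tfrac{120.7}{3})^2 + (40.8 - \tfrac{120.7}{3})^2 + (41 - \tfrac{120.7}{3})^2 \\
&\quad + (39.6 - \tfrac{123}{3})^2 + (42.4 - \tfrac{123}{3})^2 + (41 - \tfrac{123}{3})^2 \\
&\quad + (42.1 - \tfrac{125.5}{3})^2 + (40.2 - \tfrac{125.5}{3})^2 + (43.2 - \tfrac{125.5}{3})^2 \\
&= 114.08
\end{aligned}$$

$$MSE = \frac{SSE}{df_{Error}} = \frac{114.08}{20} = 5.704$$

$$SS_{Total} = SS_{Model} + SSE = 111.879 + 114.08 = 225.959$$

$$F = \frac{MS_{Model}}{MSE} = \frac{12.431}{5.704} = 2.179$$

We compare this F value to an $F_{9,20}$ distribution to get a p-value of 0.07.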
b.
On the basis of the ANOVA table constructed in (a), would it be reasonable to conclude that there is a significant plant-to-plant variation in the protein content of seeds produced by the F3 generation plants?
$$H_0: \sigma_T^2 = 0 \ \text{(there is no plant-to-plant variation)} \qquad H_A: \sigma_T^2 > 0 \ \text{(there is plant-to-plant variation)}$$

Using a significance level of 0.05, our p-value of 0.07 exceeds 0.05, so we fail to reject the null hypothesis. The data do not show significant plant-to-plant variation.
c.
Estimate the components of variance, and comment on their relative magnitudes.
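$$\hat{\sigma}^2 = MSE = 5.704$$

$$\hat{\sigma}^2 + n\hat{\sigma}_T^2 = MS_{Model} \quad\Rightarrow\quad \hat{\sigma}_T^2 = \frac{MS_{Model} - \hat{\sigma}^2}{n} = \frac{12.431 - 5.704}{3} = 2.2423$$

Here the estimated within-plant (intragroup) variance, $\hat{\sigma}^2$, is $5.704/2.2423 = 2.544$ times larger than the estimated between-plant (intergroup) variance, $\hat{\sigma}_T^2$.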
d.
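Calculate an estimate of the coefficient of variation of the protein content measurements, and comment on its value.
$$\widehat{CV} = \frac{\sqrt{\hat{\sigma}_T^2 + \hat{\sigma}^2}}{|\hat{\mu}|} = \frac{\sqrt{2.2423 + 5.704}}{39.093} = 0.072$$

The estimated standard deviation of a single protein measurement is about 7.2% of the mean.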
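The arithmetic in parts (c) and (d) can be checked with a few lines of Python. This is only a sketch: the function name `one_way_components` is made up for illustration, and the inputs are the mean square, error sum of squares and grand mean reported in part (a), with $MSE$ taken as $114.08/20$.

```python
import math

def one_way_components(ms_model, ms_error, n):
    """Method-of-moments estimates for a balanced one-way random-effects model."""
    sigma2 = ms_error                      # within-plant variance
    sigma2_t = (ms_model - ms_error) / n   # between-plant variance
    return sigma2, sigma2_t

# Inputs taken from part (a): SSE = 114.08 on 20 d.f., MS_Model = 12.431,
# n = 3 seeds per plant, grand mean 39.093.
ms_error = 114.08 / 20
sigma2, sigma2_t = one_way_components(12.431, ms_error, n=3)

# Coefficient of variation: SD of a single measurement relative to the mean.
cv = math.sqrt(sigma2 + sigma2_t) / abs(39.093)

print(f"sigma^2   = {sigma2:.3f}")
print(f"sigma_T^2 = {sigma2_t:.4f}")
print(f"ratio     = {sigma2 / sigma2_t:.3f}")
print(f"CV        = {cv:.3f}")
```

A negative value of `sigma2_t` would be possible whenever the model mean square falls below the error mean square; the sketch does not truncate at zero.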
f.
On the basis of the observed data, would it be reasonable to conclude that the average protein content of seeds produced in the F3 generation is more than 40%?
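$$H_0: \mu \le 40 \qquad H_A: \mu > 40$$

We can address this with a 95% lower confidence bound for $\mu$:

$$\bar{Y}_{++} - t_{t-1,\alpha}\sqrt{\frac{MS_{Model}}{nt}},$$

where $t_{t-1,\alpha} = t_{10-1,0.05} = 1.833113$.

$$39.093 - 1.833113 \cdot \sqrt{\frac{12.431}{3 \cdot 10}} = 37.913 \quad\Rightarrow\quad (37.913, \infty)$$

Since the lower bound is less than 40, we fail to reject the null hypothesis. The data do not support the conclusion that the average protein content of seeds produced by the F3 generation is more than 40%.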
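As a quick numerical check of the bound, the sketch below plugs the reported values into the formula. The helper name `lower_confidence_bound` is hypothetical, and the critical value is simply the one quoted in the text rather than something computed here.

```python
import math

def lower_confidence_bound(grand_mean, ms_model, n, t, t_crit):
    """One-sided lower bound for mu in a balanced one-way random-effects model.

    The variance of the grand mean is estimated by MS_Model / (n * t),
    which carries t - 1 degrees of freedom.
    """
    se = math.sqrt(ms_model / (n * t))
    return grand_mean - t_crit * se

# t_crit = 1.833113 is the upper 5% point of t with 9 d.f., as quoted in the text.
bound = lower_confidence_bound(39.093, 12.431, n=3, t=10, t_crit=1.833113)

# H0: mu <= 40 is rejected only if the whole interval lies above 40.
reject_h0 = bound > 40

print(f"lower bound = {bound:.3f}, reject H0: {reject_h0}")
```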
2
To see how the effect of diet supplements on the weights of wethers varied with environmental conditions, a study was conducted in four randomly selected locations; each location represented a randomly selected environment. The experimenters randomly assigned 24 crossbred wethers to the four locations in such a way that each location had six wethers. Within each location, the animals were randomized to receive three diets, with two animals on each diet. The four-week weight gains in wethers are as follows:

| Diet | Location 1 | Location 2 | Location 3 | Location 4 |
|---|---|---|---|---|
| 1 | 2.10 | 2.02 | 2.16 | 1.98 |
| | 2.32 | 2.04 | 2.18 | 1.86 |
| 2 | 2.24 | 2.30 | 2.22 | 1.64 |
| | 2.22 | 2.12 | 2.18 | 1.73 |
| 3 | 2.27 | 2.14 | 2.26 | 1.83 |
| | 2.24 | 2.17 | 2.21 | 1.89 |

a.
Write a suitable ANOVA model for the data. Explain all the terms in your model.
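$$Y_{ijk} = \mu + \alpha_i + B_j + (\alpha B)_{ij} + E_{ijk}$$

- $Y_{ijk}$ - weight gain of the $k$th wether on the $i$th diet in the $j$th location.
- $\mu$ - the overall mean weight gain.
- $\alpha_i$ - the fixed effect of the $i$th diet.
  - $i = 1, 2, 3 \rightarrow a = 3$
  - $\sum_i \alpha_i = 0$ (diet is a fixed factor, so the $\alpha_i$ are constants rather than random variables)
- $B_j$ - the random effect of the $j$th location.
  - $j = 1, 2, 3, 4 \rightarrow b = 4$
  - $B_j \overset{iid}{\sim} N(0, \sigma_B^2)$
- $(\alpha B)_{ij}$ - the random interaction effect of the $i$th diet with the $j$th location.
  - $(\alpha B)_{ij} \overset{iid}{\sim} N(0, \sigma_{\alpha B}^2)$
- $E_{ijk}$ - random error of the $k$th wether on the $i$th diet in the $j$th location.
  - $E_{ijk} \overset{iid}{\sim} N(0, \sigma^2)$
  - $\sigma^2$ - the variance among wethers within the same diet-location group.

Finally, $B_j$, $(\alpha B)_{ij}$, and $E_{ijk}$ are mutually independent.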
b.
Obtain estimates of the components of the variance due to location, interaction between location and diet, and error. Interpret the results.
We will use SAS to help us obtain our estimates.
proc mixed data=wethers method=type3 cl covtest;
class diet location;
model weight = diet / ddfm = satterthwaite solution;
random location diet*location;
run;
We will need the following mean square values.
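$$MSE = \frac{1}{nab - ab}\sum_i\sum_j\sum_k (y_{ijk} - \bar{y}_{ij+})^2 = 0.004779$$

$$MSB = \frac{1}{b-1}\sum_i\sum_j\sum_k (\bar{y}_{+j+} - \bar{y}_{+++})^2 = 0.213104$$

$$MSAB = \frac{1}{(a-1)(b-1)}\sum_i\sum_j\sum_k (\bar{y}_{ij+} - \bar{y}_{i++} - \bar{y}_{+j+} + \bar{y}_{+++})^2 = 0.014887$$

We can use these to estimate our variance components. Notice that $n = n_{ij} = 2$.

$$\hat{\sigma}^2 = MSE = 0.004779$$

$$\hat{\sigma}_B^2 = \frac{MSB - MSAB}{n \cdot a} = \frac{0.213104 - 0.014887}{2 \cdot 3} = 0.0330362$$

$$\hat{\sigma}_{\alpha B}^2 = \frac{MSAB - MSE}{n} = \frac{0.014887 - 0.004779}{2} = 0.005054$$

- $\hat{\sigma}^2$ estimates the variance among wethers within the same diet and location to be 0.004779.
- $\hat{\sigma}_B^2$ estimates the variance between locations to be 0.0330362.
- $\hat{\sigma}_{\alpha B}^2$ estimates the variance due to the diet-by-location interaction to be 0.005054.

Location is by far the largest source of variation: its component is roughly 6.5 times the interaction component and 6.9 times the error component.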
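The three estimates follow mechanically from the mean squares, which a short sketch makes explicit. The function name `mixed_two_way_components` and the dictionary keys are illustrative choices, not part of the SAS output; the mean squares are the values reported above.

```python
def mixed_two_way_components(ms_b, ms_ab, ms_e, a, n):
    """Method-of-moments variance components for a balanced two-way mixed model
    with fixed factor A (a levels), random factor B, and n replicates per cell."""
    sigma2 = ms_e
    sigma2_ab = (ms_ab - ms_e) / n
    sigma2_b = (ms_b - ms_ab) / (n * a)
    return {"location": sigma2_b, "diet*location": sigma2_ab, "error": sigma2}

components = mixed_two_way_components(
    ms_b=0.213104, ms_ab=0.014887, ms_e=0.004779, a=3, n=2
)

# Share of the total random variation attributed to each source.
total = sum(components.values())
for name, value in components.items():
    print(f"{name:14s} {value:.7f}  ({value / total:.1%} of total)")
```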
c.
Construct the appropriate ANOVA table for the data and perform all ANOVA F-tests. Interpret the results.
We can use our ANOVA table from (b). First, we must give the formulas for the pieces that we haven’t calculated.
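$$df_E = ab(n-1) = 12, \qquad df_\alpha = a - 1 = 2, \qquad df_B = b - 1 = 3, \qquad df_{\alpha B} = (a-1)(b-1) = 6$$

$$SSE = MSE \cdot df_E, \qquad SSB = MSB \cdot df_B, \qquad SS_{\alpha B} = MSAB \cdot df_{\alpha B}$$

$$SS_\alpha = \sum_i\sum_j\sum_k (\bar{y}_{i++} - \bar{y}_{+++})^2, \qquad MS_\alpha = SS_\alpha / df_\alpha$$

$$SS_{Total} = SS_\alpha + SSB + SS_{\alpha B} + SSE$$

Now we can do our F-tests. We will start with the interaction term and then proceed with the main effects. We will use a significance level of 0.05 throughout.

$$H_0: \sigma_{\alpha B}^2 = 0 \qquad H_A: \sigma_{\alpha B}^2 > 0$$

We compute $F = \frac{MSAB}{MSE} = \frac{0.014887}{0.004779} = 3.12$. Comparing this to an $F_{6,12}$ distribution gives $p = 0.0445 < 0.05$, so we reject the null in favor of the alternative that there is a diet-by-location interaction effect.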
Though we have an interaction effect, we will also check our main effects. We will start by looking at the diet effect.
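$$H_0: \alpha_1 = \alpha_2 = \alpha_3 = 0 \qquad H_A: \text{at least one } \alpha_i \ne 0$$

Because diet is a fixed factor, the hypothesis is stated in terms of the diet effects rather than a variance. We compute $F = \frac{MSA}{MSAB} = \frac{0.00554}{0.014887} = 0.37$. Comparing this to an $F_{2,6}$ distribution, we get a p-value of $0.7035 > 0.05$, so we fail to reject the null hypothesis; there is no evidence of a diet effect.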
Then we can check the location effect.
$$H_0: \sigma_B^2 = 0 \qquad H_A: \sigma_B^2 > 0$$

We compute $F = \frac{MSB}{MSAB} = \frac{0.213104}{0.014887} = 14.31$. Comparing this to an $F_{3,6}$ distribution, we get a p-value of $0.0038 < 0.05$, so we reject the null in favor of the alternative; there is a location effect.
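The key point in these tests is which mean square serves as the denominator. The sketch below records that choice in a small lookup table and recomputes the three F-ratios from the mean squares quoted in the text; the dictionary names are illustrative only, and the p-values are not recomputed.

```python
# Mean squares and degrees of freedom as reported in the text.
ms = {"diet": 0.00554, "location": 0.213104, "diet*location": 0.014887, "error": 0.004779}
df = {"diet": 2, "location": 3, "diet*location": 6, "error": 12}

# Denominator used for each test in the text: the interaction is tested
# against error, and both main effects against the interaction mean square.
denominator = {
    "diet*location": "error",
    "diet": "diet*location",
    "location": "diet*location",
}

f_ratios = {}
for term, denom in denominator.items():
    f_ratios[term] = ms[term] / ms[denom]
    print(f"{term:14s} F = {f_ratios[term]:6.2f} on ({df[term]}, {df[denom]}) d.f.")
```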
d.
Construct a 90% confidence interval for the true mean weight gain of animals fed diet 1 in a randomly selected location.
First we will need to find the standard error.
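$$V(\bar{Y}_{1++}) = V\left[\mu + \alpha_1 + \bar{B}_{+} + \overline{(\alpha B)}_{1+} + \bar{E}_{1++}\right] = V\left[\bar{B}_{+} + \overline{(\alpha B)}_{1+} + \bar{E}_{1++}\right] = V(\bar{B}_{+}) + V(\overline{(\alpha B)}_{1+}) + V(\bar{E}_{1++})$$

Notice that the covariance terms are 0 because of the independence structure. We will now look at each term individually.

$$V(\bar{B}_{+}) = \frac{\sum_{j} V(B_j)}{4^2} = \frac{4\sigma_B^2}{16} = \frac{\sigma_B^2}{4}$$

$$V(\overline{(\alpha B)}_{1+}) = V\left[\frac{\sum_{j} (\alpha B)_{1j}}{4}\right] = \frac{\sum_{j} V((\alpha B)_{1j})}{4^2} = \frac{4\sigma_{\alpha B}^2}{16} = \frac{\sigma_{\alpha B}^2}{4}$$

$$V(\bar{E}_{1++}) = V\left[\frac{\sum_{j=1}^{4}\sum_{k=1}^{2} E_{1jk}}{8}\right] = \frac{\sum_j\sum_k V(E_{1jk})}{8^2} = \frac{8\sigma^2}{8^2} = \frac{\sigma^2}{8}$$

Now we can go back and combine them, substituting the variance component estimates from part (b).

$$\hat{V}(\bar{Y}_{1++}) = \frac{1}{4}\cdot\frac{MSB - MSAB}{6} + \frac{1}{4}\cdot\frac{MSAB - MSE}{2} + \frac{MSE}{8} = \frac{MSB}{24} + \frac{MSAB}{12}$$

Because this variance includes multiple mean squares, we will estimate our degrees of freedom with the Satterthwaite approximation.

$$\widehat{df} = \frac{\left(\frac{1}{24}MSB + \frac{1}{12}MSAB\right)^2}{\frac{\left(\frac{1}{24}MSB\right)^2}{3} + \frac{\left(\frac{1}{12}MSAB\right)^2}{6}} = \frac{\left(\frac{0.213104}{24} + \frac{0.014887}{12}\right)^2}{\frac{1}{3}\left(\frac{0.213104}{24}\right)^2 + \frac{1}{6}\left(\frac{0.014887}{12}\right)^2} = 3.85919$$

We can also calculate our standard error.

$$SE(\bar{Y}_{1++}) = \sqrt{\frac{1}{24}MSB + \frac{1}{12}MSAB} = \sqrt{\frac{0.213104}{24} + \frac{0.014887}{12}} = 0.100598$$

Now we can calculate our confidence interval.

$$\bar{y}_{1++} \pm t_{3.85919,\,0.10/2} \cdot SE(\bar{Y}_{1++}) = 2.0825 \pm 2.154393 \cdot 0.100598 \quad\Rightarrow\quad (1.86577, 2.29923)$$

We are 90% confident that the true mean weight gain of animals fed diet 1 in a randomly selected location is between 1.86577 and 2.29923.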
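The Satterthwaite step generalizes to any linear combination of independent mean squares, which the following sketch illustrates. The function `satterthwaite` is a hypothetical helper written for this example; the t quantile is the one quoted in the text, not computed here.

```python
import math

def satterthwaite(terms):
    """Variance estimate and approximate d.f. for a linear combination of mean squares.

    terms: list of (coefficient, mean_square, df) tuples.
    """
    variance = sum(c * ms for c, ms, _ in terms)
    denom = sum((c * ms) ** 2 / df for c, ms, df in terms)
    return variance, variance ** 2 / denom

# V-hat(Ybar_1++) = MSB/24 + MSAB/12, where MSB has 3 d.f. and MSAB has 6 d.f.
variance, df_hat = satterthwaite([(1 / 24, 0.213104, 3), (1 / 12, 0.014887, 6)])
se = math.sqrt(variance)

# t quantile quoted in the text for 3.85919 d.f. and alpha/2 = 0.05.
t_crit = 2.154393
mean_diet1 = 2.0825
lower, upper = mean_diet1 - t_crit * se, mean_diet1 + t_crit * se

print(f"df = {df_hat:.5f}, SE = {se:.6f}, 90% CI = ({lower:.5f}, {upper:.5f})")
```

Passing the terms as a list keeps the helper reusable for other combinations of mean squares, such as a variance that also involved $MSE$.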
e.
Construct a set of Bonferroni 90% simultaneous confidence intervals for all possible differences between diet means. Interpret the intervals.
We can use SAS to calculate these confidence intervals. Notice that we have 3 comparisons.
Notice that the adjusted confidence intervals all contain 0, so none of the differences are significant.
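The SAS output itself is not reproduced here. As a rough cross-check, the sketch below recomputes Bonferroni intervals in Python under stated assumptions: the diet means are computed from the data table, the standard error of a difference is taken to be $\sqrt{2\,MSAB/(nb)}$ on 6 d.f. (the interaction mean square is the error term for diet in this balanced design), and the t quantile comes from a small numerical routine written for this example (`t_cdf` and `t_quantile` are not library functions). It is not guaranteed to match the SAS listing digit for digit.

```python
import math
from itertools import combinations

def t_cdf(x, df, steps=2000):
    """Student t CDF for x >= 0, using Simpson's rule on the density."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(u):
        return const * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps
    total = pdf(0) + pdf(x)
    total += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """Quantile for p > 0.5, found by bisection on t_cdf."""
    lo, hi = 0.0, 100.0
    for _ in range(80):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Diet means computed from the data table (8 animals per diet).
diet_means = {1: 2.0825, 2: 2.08125, 3: 2.12625}
ms_ab, n, b, df_ab = 0.014887, 2, 4, 6

se_diff = math.sqrt(2 * ms_ab / (n * b))   # SE of a difference of two diet means
pairs = list(combinations(diet_means, 2))

# Bonferroni: split alpha = 0.10 over the three two-sided intervals.
t_crit = t_quantile(1 - 0.10 / (2 * len(pairs)), df_ab)

intervals = {}
for i, l in pairs:
    diff = diet_means[i] - diet_means[l]
    low, high = diff - t_crit * se_diff, diff + t_crit * se_diff
    intervals[(i, l)] = (low, high)
    print(f"diet {i} - diet {l}: {diff:+.5f}  ({low:+.4f}, {high:+.4f})")
```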
3
To estimate the average cost of hospital stay, a random sample of four hospitals (factor A) was selected from the population of all hospitals in a large region. For each hospital, patient admission records for three randomly selected days (factor B) were examined. Two patients were selected at random from all the patients admitted on each day to each hospital. On the basis of the average daily hospital bill (in dollars) for each selected patient, the following ANOVA table was constructed:

| Source | Mean square |
|---|---|
| A (hospitals) | 2240.05 |
| B(A) (days in hospital) | 312.32 |
| Error (patients) | 122.02 |

The average of all 24 patient bills was $380.24.
a.
Estimate the components of variability in the daily hospital cost due to difference between hospitals, due to differences between days, and due to differences between patients. Calculate the proportion of total variability in daily costs due to each of these sources. Comment on the results.
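$$\hat{\sigma}^2 = MSE = 122.02$$

$$\hat{\sigma}_A^2 = \frac{MSA - MSB(A)}{nb} = \frac{2240.05 - 312.32}{2 \cdot 3} = 321.288$$

$$\hat{\sigma}_{B(A)}^2 = \frac{MSB(A) - MSE}{n} = \frac{312.32 - 122.02}{2} = 95.15$$

The estimated total variance is $122.02 + 321.288 + 95.15 = 538.458$, so the proportions are

$$\frac{\hat{\sigma}^2}{\hat{\sigma}^2 + \hat{\sigma}_A^2 + \hat{\sigma}_{B(A)}^2} = 0.227, \qquad \frac{\hat{\sigma}_A^2}{\hat{\sigma}^2 + \hat{\sigma}_A^2 + \hat{\sigma}_{B(A)}^2} = 0.597, \qquad \frac{\hat{\sigma}_{B(A)}^2}{\hat{\sigma}^2 + \hat{\sigma}_A^2 + \hat{\sigma}_{B(A)}^2} = 0.177$$

The hospital effect accounts for about 60% of the variability, followed by error (patient-to-patient differences) at 23% and the nested effect of days within hospital at 18%.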
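Since the proportions are easy to get slightly wrong by hand, the sketch below recomputes the three components and their shares directly from the mean squares in the ANOVA table. The function name `nested_components` and the dictionary keys are illustrative assumptions.

```python
def nested_components(ms_a, ms_b_in_a, ms_e, b, n):
    """Method-of-moments variance components for a balanced nested random design:
    b levels of B within each level of A, n observations per level of B."""
    sigma2 = ms_e
    sigma2_b = (ms_b_in_a - ms_e) / n
    sigma2_a = (ms_a - ms_b_in_a) / (n * b)
    return {"hospitals": sigma2_a, "days within hospitals": sigma2_b, "patients": sigma2}

# Mean squares from the table; b = 3 days per hospital, n = 2 patients per day.
components = nested_components(2240.05, 312.32, 122.02, b=3, n=2)

total = sum(components.values())
shares = {name: value / total for name, value in components.items()}

for name in components:
    print(f"{name:22s} {components[name]:9.3f}  share = {shares[name]:.3f}")
```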
b.
Construct a 95% confidence interval for the expected daily cost for a patient admitted to one of the hospitals in the region.
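We want a confidence interval for the overall mean $\mu$, which we estimate with $\bar{Y}_{+++}$. Here $a = 4$ hospitals, $b = 3$ days per hospital, and $n = 2$ patients per day.

$$\bar{Y}_{+++} = 380.24$$

$$\hat{V}(\bar{Y}_{+++}) = \frac{\hat{\sigma}_A^2}{a} + \frac{\hat{\sigma}_{B(A)}^2}{ab} + \frac{\hat{\sigma}^2}{nab} = \frac{MSA - MSB(A)}{nab} + \frac{MSB(A) - MSE}{nab} + \frac{MSE}{nab} = \frac{MSA}{nab} = \frac{2240.05}{2 \cdot 4 \cdot 3} = 93.3354$$

We do not need to use a Satterthwaite approximation since there is only one MS term, so we can use $df_A = 3$. We get a t-value of $t_{0.05/2,\,3} = 3.182446$.

$$380.24 \pm 3.182446 \cdot \sqrt{93.3354} \quad\Rightarrow\quad (349.494, 410.986)$$

We are 95% confident that the true expected daily cost for a patient admitted to one of the hospitals in the region is between 349.494 and 410.986 dollars.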
c.
Construct a 95% lower confidence interval for the expected daily cost to a patient admitted to one of the hospitals in the region. Interpret the result.
We proceed as above, but use the one-sided t-value $t_{0.05,\,3} = 2.353363$ and keep only the lower limit.

$$380.24 - 2.353363 \cdot \sqrt{93.3354} = 357.504 \quad\Rightarrow\quad (357.504, \infty)$$

We are 95% confident that the true expected daily cost for a patient admitted to one of the hospitals in the region is at least 357.504 dollars.
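Both intervals for the expected daily cost rest on the same standard error, $\sqrt{MSA/(nab)}$. A minimal sketch of the two calculations, using the critical values quoted in the text for 3 d.f. (the helper name `grand_mean_se` is made up for this example):

```python
import math

def grand_mean_se(ms_a, a, b, n):
    """Standard error of the grand mean in a balanced nested random-effects design."""
    return math.sqrt(ms_a / (n * a * b))

# a = 4 hospitals, b = 3 days per hospital, n = 2 patients per day.
se = grand_mean_se(2240.05, a=4, b=3, n=2)
mean = 380.24

# Critical values quoted in the text for 3 d.f.
t_two_sided = 3.182446   # upper 2.5% point
t_one_sided = 2.353363   # upper 5% point

two_sided = (mean - t_two_sided * se, mean + t_two_sided * se)
lower_bound = mean - t_one_sided * se

print(f"SE = {se:.4f}")
print(f"95% two-sided interval: ({two_sided[0]:.3f}, {two_sided[1]:.3f})")
print(f"95% lower bound: {lower_bound:.3f}")
```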
4
In an experiment to compare the serum amylase values determined by four laboratories, 160 serum specimens with a known amylase value of 4.2 were used. The specimens were randomized to 16 technicians (four from each lab) in such a way that each technician was assigned ten specimens. The technicians made independent measurements of the amylase values of the specimens assigned to them. On the basis of the results, the following ANOVA table was constructed:

| Source | Mean square |
|---|---|
| A (labs) | 9.07 |
| B(A) (technician) | 1.63 |
| Error (specimens) | 1.02 |

The totals for each lab may be summarized as follows:

| Lab | 1 | 2 | 3 | 4 | Total |
|---|---|---|---|---|---|
| Total | 152 | 144 | 184 | 176 | 656 |

a.
Write a model for the data on the assumption that the four technicians in each lab constitute a random sample from the population of technicians working in that lab. Explain all the terms in the model.
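$$Y_{ijk} = \mu + \alpha_i + B_{j(i)} + E_{ijk}$$

- $Y_{ijk}$ - measured amylase value for the $k$th specimen handled by technician $j$ in lab $i$
  - $k = 1, \ldots, 10 \rightarrow n = 10$
- $\mu$ - the overall mean amylase value
- $\alpha_i$ - the fixed effect of lab $i$
  - $i = 1, 2, 3, 4 \rightarrow a = 4$
- $B_{j(i)}$ - the random effect of technician $j$ in lab $i$
  - $j = 1, 2, 3, 4 \rightarrow b = 4$
  - $B_{j(i)} \overset{iid}{\sim} N(0, \sigma_{B(A)}^2)$
- $E_{ijk}$ - the random error for the $k$th specimen handled by technician $j$ in lab $i$
  - $E_{ijk} \overset{iid}{\sim} N(0, \sigma^2)$

Finally, the $B_{j(i)}$ and $E_{ijk}$ are mutually independent.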
b.
Perform a pairwise multiple comparison of the mean values for the four labs. Interpret the result.
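For each pair of labs $i \ne l$, our hypotheses are

$$H_0: \mu_i = \mu_l \qquad H_A: \mu_i \ne \mu_l$$

First we need the average value for each lab. Notice that there are $4 \cdot 10 = 40$ replicates in each, so we need to divide the totals by 40.

| Lab | 1 | 2 | 3 | 4 | Total |
|---|---|---|---|---|---|
| Average | 3.8 | 3.6 | 4.6 | 4.4 | 4.1 |

Then we can find the variance. We will use the fact that $\alpha_i$ is fixed and that our covariances are 0.

$$\begin{aligned}
V(\bar{Y}_{i++} - \bar{Y}_{l++}) &= V\left(\mu + \alpha_i + \bar{B}_{+(i)} + \bar{E}_{i++} - (\mu + \alpha_l + \bar{B}_{+(l)} + \bar{E}_{l++})\right) \\
&= V\left(\bar{B}_{+(i)} + \bar{E}_{i++} - \bar{B}_{+(l)} - \bar{E}_{l++}\right) \\
&= \frac{2\sigma_{B(A)}^2}{b} + \frac{2\sigma^2}{nb}
\end{aligned}$$

Substituting the estimates of the variance components gives

$$\hat{V}(\bar{Y}_{i++} - \bar{Y}_{l++}) = 2\left(\frac{MSB(A)}{nb} - \frac{MSE}{nb} + \frac{MSE}{nb}\right) = \frac{2\,MSB(A)}{nb} = \frac{2 \cdot 1.63}{40} = 0.0815$$

$$SE(\bar{Y}_{i++} - \bar{Y}_{l++}) = \sqrt{0.0815} = 0.285482$$

We again do not need Satterthwaite approximated degrees of freedom because we only have one MS term; we can just use $df_{B(A)} = 12$. We will use a Tukey adjustment since we are performing all pairwise comparisons. Our critical value is $q_{4,12,0.05} \cdot \sqrt{\tfrac{1}{2}} = 4.19866 \cdot \sqrt{\tfrac{1}{2}} = 2.9689$. We will compare $t = \frac{ |\bar{Y}_{i++} - \bar{Y}_{l++}| }{ SE(\bar{Y}_{i++} - \bar{Y}_{l++}) }$ to this value.

$$\begin{array}{cccl}
i & l & t & \text{result} \\
\hline
1 & 2 & 0.700569563 & \text{Fail To Reject} \\
1 & 3 & 2.802278252 & \text{Fail To Reject} \\
1 & 4 & 2.101708689 & \text{Fail To Reject} \\
2 & 3 & 3.502847815 & \text{Reject} \\
2 & 4 & 2.802278252 & \text{Fail To Reject} \\
3 & 4 & 0.700569563 & \text{Fail To Reject}
\end{array}$$

Notice that the only significant difference is between labs 2 and 3.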
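The six comparisons can be reproduced from the lab totals and the technician mean square alone. In this sketch the studentized range value 4.19866 is taken from the text rather than computed, and the variable names are illustrative.

```python
import math
from itertools import combinations

lab_totals = {1: 152, 2: 144, 3: 184, 4: 176}
n, b = 10, 4          # specimens per technician, technicians per lab
ms_b_in_a = 1.63      # technician-within-lab mean square, 12 d.f.

# Each lab mean averages n * b = 40 measurements.
means = {lab: total / (n * b) for lab, total in lab_totals.items()}
se_diff = math.sqrt(2 * ms_b_in_a / (n * b))

# Studentized range value quoted in the text (4 means, 12 d.f., alpha = 0.05),
# rescaled so it can be compared with a t-type statistic.
q_crit = 4.19866
t_crit = q_crit / math.sqrt(2)

results = {}
for i, l in combinations(means, 2):
    t_stat = abs(means[i] - means[l]) / se_diff
    results[(i, l)] = (t_stat, t_stat > t_crit)
    verdict = "Reject" if t_stat > t_crit else "Fail to reject"
    print(f"labs {i} vs {l}: t = {t_stat:.4f}  {verdict}")
```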
c.
Let σ2B(A) and σ2 denote, respectively, the components of variance due to differences between technicians and due to differences between specimens. Compute an estimate of the quantity
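$$\rho = \frac{\sigma_{B(A)}^2}{\sigma_{B(A)}^2 + \sigma^2}.$$

What conclusions can be drawn from the estimated value of $\rho$?

$$\hat{\rho} = \frac{\hat{\sigma}_{B(A)}^2}{\hat{\sigma}_{B(A)}^2 + \hat{\sigma}^2} = \frac{\frac{MSB(A) - MSE}{n}}{\frac{MSB(A) - MSE}{n} + MSE} = \frac{\frac{1.63 - 1.02}{10}}{\frac{1.63 - 1.02}{10} + 1.02} = 0.0564$$

About 5.6% of the variability among measurements within a lab is attributable to differences between technicians; the remaining 94.4% is specimen-to-specimen variation.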
d.
Let μi denote the true mean amylase values for lab i (i=1,…,4). Construct a set of 90% simultaneous confidence intervals for the contrasts:
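$$\theta_1 = \mu_1 - \mu_2, \qquad \theta_2 = \tfrac{1}{2}(\mu_1 + \mu_2) - \tfrac{1}{2}(\mu_3 + \mu_4), \qquad \theta_3 = \mu_3 - \mu_4.$$

Interpret the intervals.

We will use a Bonferroni adjustment. Notice that $\theta_1$ and $\theta_3$ are both pairwise differences, so we can use the standard error from part (b). We will need to compute a new variance for $\theta_2$.

$$\begin{aligned}
V(\hat{\theta}_2) &= V\left(\tfrac{1}{2}(\bar{B}_{+(1)} + \bar{E}_{1++}) + \tfrac{1}{2}(\bar{B}_{+(2)} + \bar{E}_{2++}) - \tfrac{1}{2}(\bar{B}_{+(3)} + \bar{E}_{3++}) - \tfrac{1}{2}(\bar{B}_{+(4)} + \bar{E}_{4++})\right) \\
&= \frac{1}{4}\left[\frac{4\sigma_{B(A)}^2}{b} + \frac{4\sigma^2}{nb}\right] = \frac{\sigma_{B(A)}^2}{b} + \frac{\sigma^2}{nb}
\end{aligned}$$

$$SE(\hat{\theta}_2) = \sqrt{\frac{MSB(A)}{nb}} = \sqrt{\frac{1.63}{40}} = 0.201866$$

| | $\hat{\theta}$ | SE | lower | upper |
|---|---|---|---|---|
| $\theta_1$ | 0.2 | 0.285482 | −0.486024665 | 0.886024665 |
| $\theta_2$ | −0.8 | 0.201866 | −1.285092073 | −0.314907927 |
| $\theta_3$ | 0.2 | 0.285482 | −0.486024665 | 0.886024665 |

The three intervals hold simultaneously with at least 90% confidence.

Our interval for $\theta_1$ tells us that the difference between the true mean amylase values of labs 1 and 2 is between −0.486 and 0.886. The interval contains 0, so these two labs are not distinguishable.

Our interval for $\theta_2$ tells us that the difference between the average of the true means of labs 1 and 2 and the average of the true means of labs 3 and 4 is between −1.285 and −0.315. The interval lies entirely below 0, so labs 1 and 2 report lower values on average than labs 3 and 4.

Our interval for $\theta_3$ tells us that the difference between the true mean amylase values of labs 3 and 4 is between −0.486 and 0.886. The interval contains 0, so these two labs are not distinguishable.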
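All three contrasts have the form $\sum_i c_i \bar{Y}_{i++}$, whose estimated variance is $\sum_i c_i^2 \cdot MSB(A)/(nb)$. The sketch below applies that formula to each coefficient vector. The critical value 2.40304 is not stated in the text: it is the value implied by the tabulated intervals (half-width divided by SE), which appears consistent with a Bonferroni split of $\alpha = 0.10$ over three two-sided intervals on 12 d.f., and it is treated here as an assumed input.

```python
import math

means = [3.8, 3.6, 4.6, 4.4]    # lab averages from part (b)
ms_b_in_a, n, b = 1.63, 10, 4   # technician mean square, specimens, technicians

# Coefficient vectors for the three contrasts.
contrasts = {
    "theta1": [1, -1, 0, 0],
    "theta2": [0.5, 0.5, -0.5, -0.5],
    "theta3": [0, 0, 1, -1],
}

# Assumed input: critical value implied by the tabulated intervals
# (0.686025 / 0.285482), not a number stated in the text.
t_crit = 2.40304

intervals = {}
for name, coef in contrasts.items():
    estimate = sum(c * m for c, m in zip(coef, means))
    se = math.sqrt(sum(c * c for c in coef) * ms_b_in_a / (n * b))
    intervals[name] = (estimate, se, estimate - t_crit * se, estimate + t_crit * se)
    print(f"{name}: estimate = {estimate:+.1f}, SE = {se:.6f}, "
          f"interval = ({intervals[name][2]:+.4f}, {intervals[name][3]:+.4f})")
```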