Data Transformations in Analysis of Variance

13.10 Data Transformations in Analysis of Variance

In Chapter 11, considerable attention was given to transformation of the response y in situations where a linear regression model was being fit to a set of data. Obviously, the same concept applies to multiple linear regression, though it was not discussed in Chapter 12. In the regression modeling discussion, emphasis was placed on the transformations of y that would produce a model that fit the data better than the model in which y enters linearly. For example, if the “time” structure is exponential in nature, then a log transformation on y linearizes the structure, and thus more success is anticipated when one uses the transformed response. While the primary purpose for data transformation discussed thus far has been to improve the fit of the model, there are certainly other reasons to transform or reexpress the response y, and many of them are related to assumptions that are being made (i.e., assumptions on which the validity of the analysis depends).

One very important assumption in analysis of variance is the homogeneous variance assumption discussed early in Section 13.4. We assume a common variance $\sigma^2$. If the variance differs a great deal from treatment to treatment and we perform the standard ANOVA discussed in this chapter (and future chapters), the results can be substantially flawed. In other words, the analysis of variance is not robust to violations of the homogeneous variance assumption. As we have discussed thus far, this is the centerpiece of motivation for the residual plots discussed in the previous section and illustrated in Figures 13.9, 13.10, and 13.11. These plots allow us to detect nonhomogeneous variance problems. However, what do we do about them? How can we accommodate them?

Where Does Nonhomogeneous Variance Come From?

Often, but not always, nonhomogeneous variance in ANOVA is present because of the distribution of the responses. Now, of course we assume normality in the response. But there certainly are situations in which tests on means are needed even though the distribution of the response is one of the nonnormal distributions discussed in Chapters 5 and 6, such as Poisson, lognormal, exponential, or gamma. ANOVA-type problems certainly exist with count data, time to failure data, and so on.

We demonstrated in Chapters 5 and 6 that, apart from the normal case, the variance of a distribution will often be a function of the mean, say $\sigma_i^2 = g(\mu_i)$. For example, in the Poisson case $\mathrm{Var}(Y_i) = \mu_i = \sigma_i^2$ (i.e., the variance is equal to the mean). In the case of the exponential distribution, $\mathrm{Var}(Y_i) = \sigma_i^2 = \mu_i^2$ (i.e., the variance is equal to the square of the mean). For the case of the lognormal, a log transformation produces a normal distribution with constant variance $\sigma^2$.

The same concepts that we used in Chapter 4 to determine the variance of a nonlinear function can be used as an aid to determine the nature of the variance stabilizing transformation $g(y_i)$. Recall the first-order Taylor series expansion of $g(y_i)$ around $y_i = \mu_i$:

$$g(y_i) \approx g(\mu_i) + (y_i - \mu_i)\, g'(\mu_i), \qquad \text{where } g'(\mu_i) = \left[\frac{\partial g(y_i)}{\partial y_i}\right]_{y_i = \mu_i}.$$

The transformation function $g(y)$ must be independent of $\mu$ in order to suffice as the variance stabilizing transformation. From the above,

$$\mathrm{Var}[g(y_i)] \approx [g'(\mu_i)]^2 \sigma_i^2.$$

As a result, $g(y_i)$ must be such that $g'(\mu_i) \propto 1/\sigma_i$. Thus, if we suspect that the response is Poisson distributed, $\sigma_i = \mu_i^{1/2}$, so $g'(\mu_i) \propto 1/\mu_i^{1/2}$. Thus, the variance stabilizing transformation is $g(y_i) = y_i^{1/2}$. From this illustration and similar manipulation for the exponential and gamma distributions, we have the following.

Distribution     Variance Stabilizing Transformation
Poisson          $g(y) = y^{1/2}$
Exponential      $g(y) = \ln y$
Gamma            $g(y) = \ln y$
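The Poisson and exponential entries in the table can be checked informally by simulation. The sketch below is only an illustration under stated assumptions: the treatment means 5, 20, and 50 are arbitrary, the data are simulated rather than taken from the text, and the helper names `poisson_sample` and `variance_before_and_after` are hypothetical. It draws samples at each mean and compares the sample variance of the raw response with the sample variance of the transformed response. By the first-order approximation $\mathrm{Var}[g(y_i)] \approx [g'(\mu_i)]^2 \sigma_i^2$, the square root of a Poisson response should have variance near $[1/(2\mu_i^{1/2})]^2 \mu_i = 1/4$ whatever the mean, and the log of an exponential response should have a variance that does not change with the mean.

```python
import math
import random
import statistics


def poisson_sample(mean, rng):
    """Draw one Poisson variate by Knuth's multiplication method.

    This is an assumed, simple sampler that is adequate for moderate means;
    it is not taken from the text.
    """
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def variance_before_and_after(draw, transform, means, n=10000, seed=1):
    """Return (mean, raw variance, transformed variance) for each mean.

    draw(mean, rng) produces one simulated response; transform is the
    candidate variance stabilizing transformation g.
    """
    rng = random.Random(seed)
    rows = []
    for mean in means:
        raw = [draw(mean, rng) for _ in range(n)]
        transformed = [transform(y) for y in raw]
        rows.append((mean,
                     statistics.variance(raw),
                     statistics.variance(transformed)))
    return rows


# Hypothetical treatment means chosen only for illustration.
means = [5, 20, 50]

# Poisson response with g(y) = y^(1/2).
poisson_rows = variance_before_and_after(poisson_sample, math.sqrt, means)

# Exponential response with g(y) = ln y; expovariate takes the rate 1/mean.
exponential_rows = variance_before_and_after(
    lambda mean, rng: rng.expovariate(1.0 / mean), math.log, means)

for label, rows in (("Poisson, sqrt", poisson_rows),
                    ("Exponential, log", exponential_rows)):
    print(label)
    for mean, raw_var, transformed_var in rows:
        print(f"  mean={mean:3d}  raw variance={raw_var:10.3f}  "
              f"transformed variance={transformed_var:.3f}")
```

In both cases the raw variance should grow with the mean while the transformed variance stays roughly flat across the three means, which is the behavior the homogeneous variance assumption requires. The simulation says nothing about normality of the transformed response, which is a separate assumption.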

Exercises

13.25 Four kinds of fertilizer $f_1$, $f_2$, $f_3$, and $f_4$ are used to study the yield of beans. The soil is divided into 3 blocks, each containing 4 homogeneous plots. The yields in kilograms per plot and the corresponding treatments are as follows:

[Table of yields by block; only the column headings "Block 1" and "Block 2" and the entry $f_2 = 39.3$ survive.]

Conduct an analysis of variance at the 0.05 level of significance using the randomized complete block model.

13.26 Three varieties of potatoes are being compared for yield. The experiment is conducted by assigning each variety at random to one of 3 equal-size plots at each of 4 different locations. The following yields for varieties A, B, and C, in 100 kilograms per plot, were recorded:

[Table with columns Location 1 through Location 4; only the entries B: 13, A: 18, and C: 12 survive.]

Perform a randomized complete block analysis of variance to test the hypothesis that there is no difference in the yielding capabilities of the 3 varieties of potatoes. Use a 0.05 level of significance. Draw conclusions.

13.27 The following data are the percents of foreign additives measured by 5 analysts for 3 similar brands of strawberry jam, A, B, and C:

[Table with columns Analyst 1 through Analyst 5; only the entries B: 2.7 and C: 3.6 survive.]

Perform a randomized complete block analysis of variance to test the hypothesis, at the 0.05 level of significance, that the percent of foreign additives is the same for all 3 brands of jam. Which brand of jam appears to have fewer additives?

13.28 The following data represent the final grades obtained by 5 students in mathematics, English, French, and biology:

[Table of grades by student (rows) and subject (columns: Math, English, French, Biology); the data are missing.]

Test the hypothesis that the courses are of equal difficulty. Use a P-value in your conclusions and discuss your findings.

13.29 In a study on The Periphyton of the South River, Virginia: Mercury Concentration, Productivity, and Autotropic Index Studies, conducted by the Department of Environmental Sciences and Engineering at Virginia Tech, the total mercury concentration in periphyton total solids was measured at 6 different stations on 6 different days. Determine whether the mean mercury content is significantly different between the stations by using the following recorded data. Use a P-value and discuss your findings.

                         Station
Date       CA     CB     E1     E2     E3     E4
April 8    0.45   3.24   1.33   2.04   3.93   5.93
June 23    0.10   0.10   0.99   4.31   9.92   6.49
[...]      0.25   0.25   1.65   3.13   7.39   4.43
July 15    0.15   0.16   2.17   3.50   8.82   5.39
July 23    0.17   0.39   4.30   2.91   5.50   4.29

[One of the 6 rows and one date label are missing.]

13.30 A nuclear power facility produces a vast amount of heat, which is usually discharged into aquatic systems. This heat raises the temperature of the aquatic system, resulting in a greater concentration of chlorophyll a, which in turn extends the growing season. To study this effect, water samples were collected monthly at 3 stations for a period of 12 months. Station A is located closest to a potential heated water discharge, station C is located farthest away from the discharge, and station B is located halfway between stations A and C. The following concentrations of chlorophyll a were recorded.

[Table of concentrations by month (rows) and station (columns); the data are missing.]

Perform an analysis of variance and test the hypothesis, at the 0.05 level of significance, that there is no difference in the mean concentrations of chlorophyll a at the 3 stations.

13.31 In a study conducted by the Department of Health and Physical Education at Virginia Tech, 3 diets were assigned for a period of 3 days to each of 6 subjects in a randomized complete block design. The subjects, playing the role of blocks, were assigned the following 3 diets in a random order:

Diet 1: mixed fat and carbohydrates,
Diet 2: high fat,
Diet 3: high carbohydrates.

At the end of the 3-day period, each subject was put on a treadmill and the time to exhaustion, in seconds, was measured. Perform the analysis of variance, separating out the diet, subject, and error sum of squares. Use a P-value to determine if there are significant differences among the diets, using the following recorded data.

                   Subject
Diet    1     2     3     4     5     6
1       84    35    91    57    56    45
2       91    48    71    45    61    61
3       122   [...]

13.32 Organic arsenicals are used by forestry personnel as silvicides. The amount of arsenic that the body takes in when exposed to these silvicides is a major health problem. It is important that the amount of exposure be determined quickly so that a field worker with a high level of arsenic can be removed from the job. In an experiment reported in the paper “A Rapid Method for the Determination of Arsenic Concentrations in Urine at Field Locations,” published in the American Industrial Hygiene Association Journal (Vol. 37, 1976), urine specimens from 4 forest service personnel were divided equally into 3 samples each so that each individual’s urine could be analyzed for arsenic by a university laboratory, by a chemist using a portable system, and by a forest-service employee after a brief orientation. The following arsenic levels, in parts per million, were recorded:

[Table of arsenic levels by individual (rows) and analyst (columns: Employee, Chemist, Laboratory); the data are missing.]

Perform an analysis of variance and test the hypothesis, at the 0.05 level of significance, that there is no difference in the arsenic levels for the 3 methods of analysis.

13.33 Scientists in the Department of Plant Pathology at Virginia Tech devised an experiment in which 5 different treatments were applied to 6 different locations in an apple orchard to determine if there were significant differences in growth among the treatments. Treatments 1 through 4 represent different herbicides and treatment 5 represents a control. The growth period was from May to November in 1982, and the amounts of new growth, measured in centimeters, for samples selected from the 6 locations in the orchard were recorded as follows:

                       Locations
Treatment    1     2     3     4     5     6
[...]        [...] 82    444   170   437   134
[...]        [...] 56    50    443   701   373
4            607   650   493   257   490   262
5            388   263   185   103   518   622

[One row is missing, and two rows have lost their treatment label and first entry.]

Perform an analysis of variance, separating out the treatment, location, and error sum of squares. Determine if there are significant differences among the treatment means. Quote a P-value.

13.34 In the paper “Self-Control and Therapist Control in the Behavioral Treatment of Overweight Women,” published in Behavioral Research and Therapy (Vol. 10, 1972), two reduction treatments and a control treatment were studied for their effects on the weight change of obese women. The two reduction treatments were a self-induced weight reduction program and a therapist-controlled reduction program. Each of 10 subjects was assigned to one of the 3 treatment programs in a random order and measured for weight loss. The following weight changes were recorded:

[Table of weight changes by subject (rows) and treatment (columns: Control, Self-induced, Therapist); only fragments survive: 0.75; subject 4, −0.25; subject 5, −2.25; subject 6, −1.00; 4.00; subject 7, −1.00.]

Perform an analysis of variance and test the hypothesis, at the 0.01 level of significance, that there is no difference in the mean weight losses for the 3 treatments. Which treatment was best?

13.35 In the book Design of Experiments for the Quality Improvement, published by the Japanese Standards Association (1989), a study on the amount of dye needed to get the best color for a certain type of fabric was reported. The three amounts of dye, 1/3% wof (1/3% of the weight of a fabric), 1% wof, and 3% wof, were each administered at two different plants. The color density of the fabric was then observed four times for each level of dye at each plant.

                         Amount of Dye
           1/3%            1%              3%
Plant 1    5.2   6.0       12.3  10.5      22.4  17.8
           5.9   5.9       12.4  10.9      22.5  18.4
Plant 2    6.5   5.5       14.5  11.8      29.0  23.2
           6.4   5.9       16.0  13.6      29.7  24.0

Perform an analysis of variance to test the hypothesis, at the 0.05 level of significance, that there is no difference in the color density of the fabric for the three levels of dye. Consider plants to be blocks.

13.36 An experiment was conducted to compare three types of coating materials for copper wire. The purpose of the coating is to eliminate “flaws” in the wire. Ten different specimens of length 5 millimeters were randomly assigned to receive each coating, and the thirty specimens were subjected to an abrasive wear type process. The number of flaws was measured for each, and the results are as follows:

                          Material
1                   2                   3
6   8   4   5       3   3   5   4       12  8   7   14
7   7   9   6       2   4   4   5       18  6   7   18
7   8               4   3               8   5

Suppose it is assumed that the Poisson process applies and thus the model is $Y_{ij} = \mu_i + \epsilon_{ij}$, where $\mu_i$ is the mean of a Poisson distribution and $\sigma^2_{Y_{ij}} = \mu_i$.

(a) Do an appropriate transformation on the data and perform an analysis of variance.
(b) Determine whether or not there is sufficient evidence to choose one coating material over the other. Show whatever findings suggest a conclusion.
(c) Do a plot of the residuals and comment.
(d) Give the purpose of your data transformation.
(e) What additional assumption is made here that may not have been completely satisfied by your transformation?
(f) Comment on (e) after doing a normal probability plot on the residuals.
