Example 12-14  Wine Quality

Table 12-13 presents data on taste-testing 38 brands of pinot noir wine (the data were first reported in an article by Kwan, Kowalski, and Skogenboe in the Journal of Agricultural and Food Chemistry (1979, Vol. 27), and the data set also appears as one of the default data sets in the Minitab software package). The response variable is y = quality, and we wish to find the "best" regression equation that relates quality to the other five parameters.
Figure 12-12 is the matrix of scatter plots for the wine quality data. We notice some indications of possible linear relationships between quality and the regressors, but there is no obvious visual impression of which regressors would be appropriate. Table 12-14 lists the all-possible-regressions output from the software. In this analysis, we asked the computer software to present the best three equations for each subset size. Note that the software reports the values of R^2, R^2_adj, C_p, and S = √MS_E for each model. From Table 12-14 we see that the three-variable equation with x_2 = aroma, x_4 = flavor, and x_5 = oakiness produces the minimum-C_p equation, whereas the four-variable model, which adds x_1 = clarity to the previous three regressors, results in the maximum R^2_adj (or minimum MS_E). The three-variable model is
    ŷ = 6.47 + 0.580 x_2 + 1.20 x_4 − 0.602 x_5
and the four-variable model is
    ŷ = 4.99 + 1.79 x_1 + 0.530 x_2 + 1.26 x_4 − 0.659 x_5
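As a rough sketch of how an all-possible-regressions table such as Table 12-14 is produced, the following Python fragment enumerates every subset of the regressors and computes R^2_adj and C_p for each. The data here are synthetic stand-ins (the wine data are not reproduced in this section), so the particular subsets selected are illustrative only.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in data: five regressors, response driven by x_2, x_4, x_5
n, k = 38, 5
X = rng.normal(size=(n, k))
y = 6.5 + 0.6 * X[:, 1] + 1.2 * X[:, 3] - 0.6 * X[:, 4] \
    + rng.normal(scale=0.8, size=n)

def fit_sse(cols):
    """Least-squares fit on the chosen columns (plus intercept); returns SSE."""
    A = np.column_stack([np.ones(n), X[:, cols]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

sst = float(((y - y.mean()) ** 2).sum())
sigma2_full = fit_sse(list(range(k))) / (n - k - 1)   # MS_E of the full model

rows = []
for size in range(1, k + 1):
    for cols in itertools.combinations(range(k), size):
        sse = fit_sse(list(cols))
        p = size + 1                                   # parameters incl. intercept
        r2_adj = 1 - (sse / (n - p)) / (sst / (n - 1))
        cp = sse / sigma2_full - (n - 2 * p)           # Mallows' C_p
        rows.append((cols, r2_adj, cp))

best = min(rows, key=lambda r: r[2])                   # smallest C_p
print("min C_p subset:", best[0], "C_p =", round(best[2], 2))
```

Note that, by construction, the full model always has C_p = k + 1, so C_p is useful only for comparing subsets against each other.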
Table 12-13  Wine Quality Data
(Columns: x_1 = Clarity, x_2 = Aroma, x_3 = Body, x_4 = Flavor, x_5 = Oakiness, y = Quality; the 38 rows of data values are not reproduced here.)
FIGURE 12-12  A matrix of scatter plots from computer software for the wine quality data.
Table 12-14  All Possible Regressions Computer Output for the Wine Quality Data
Best Subsets Regression: Quality versus Clarity, Aroma, . . .
Response is quality
(For the best three subsets of each size, the output lists R-Sq, R-Sq (adj), C_p, and S, with X marks indicating which of the five regressors appear in each model; the numerical entries are not reproduced here.)
These models should now be evaluated further using residual plots and the other techniques discussed earlier in the chapter to see whether either model is satisfactory with respect to the underlying assumptions and to determine whether one of them is preferable. It turns out that the residual plots do not reveal any major problems with either model. The value of PRESS for the three-variable model is 56.0524, and for the four-variable model, it is 60.3327. Because PRESS is smaller in the model with three regressors, and because it is the model with the smallest number of predictors, it would likely be the preferred choice.
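The PRESS values quoted above can be computed without actually refitting the model n times, because each leave-one-out residual equals e_i/(1 − h_ii), where h_ii is the i-th diagonal element (leverage) of the hat matrix. A minimal numpy sketch of this shortcut, run on synthetic data since the wine data are not reproduced here:

```python
import numpy as np

def press_statistic(X, y):
    """PRESS via the leverage shortcut: each leave-one-out residual
    equals e_i / (1 - h_ii), so no model refitting is required."""
    A = np.column_stack([np.ones(len(y)), X])    # add intercept column
    H = A @ np.linalg.pinv(A.T @ A) @ A.T        # hat matrix
    e = y - H @ y                                # ordinary residuals
    h = np.diag(H)                               # leverages h_ii
    return float(np.sum((e / (1 - h)) ** 2))

# Tiny illustrative data set (hypothetical, not the wine data)
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = 1.0 + X @ np.array([0.5, 1.2, -0.6]) + rng.normal(scale=0.3, size=20)
print("PRESS =", round(press_statistic(X, y), 4))
```

To compare two candidate models, as in the example above, one would simply call `press_statistic` with each model's regressor columns and prefer the smaller value.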
Stepwise Regression

Stepwise regression is probably the most widely used variable selection technique. The procedure iteratively constructs a sequence of regression models by adding or removing variables at each step. The criterion for adding or removing a variable at any step is usually expressed in terms of a partial F-test. Let f_in be the value of the F-random variable for adding a variable to the model, and let f_out be the value of the F-random variable for removing a variable from the model. We must have f_in ≥ f_out, and usually f_in = f_out.
Stepwise regression begins by forming a one-variable model using the regressor variable that has the highest correlation with the response variable Y. This will also be the regressor producing the largest F-statistic. For example, suppose that at this step, x_1 is selected. At the second step, the remaining K − 1 candidate variables are examined, and the variable for which the partial F-statistic
    F_j = SS_R(β_j | β_1, β_0) / MS_E(x_j, x_1)        (12-49)

is a maximum is added to the equation, provided that f_j > f_in. In Equation 12-49, MS_E(x_j, x_1) denotes the mean square for error for the model containing both x_1 and x_j. Suppose that this
procedure indicates that x_2 should be added to the model. Now the stepwise regression algorithm determines whether the variable x_1 added at the first step should be removed. This is
done by calculating the F-statistic
    F_1 = SS_R(β_1 | β_2, β_0) / MS_E(x_1, x_2)        (12-50)
If the calculated value f_1 < f_out, the variable x_1 is removed; otherwise it is retained, and we would attempt to add a regressor to the model containing both x_1 and x_2.
In general, at each step the set of remaining candidate regressors is examined, and the regressor with the largest partial F-statistic is entered, provided that the observed value of f exceeds f_in. Then the partial F-statistic for each regressor in the model is calculated, and the regressor with the smallest observed value of f is deleted if the observed f < f_out. The procedure continues until no other regressors can be added to or removed from the model.
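The add/remove logic just described can be sketched in Python. This is a minimal illustration on synthetic data with f_in = f_out = 4.0; the function names and data are hypothetical, not taken from any particular software package's implementation.

```python
import numpy as np

def sse(X_cols, y):
    """SSE of a least-squares fit on the given columns plus an intercept."""
    A = np.column_stack([np.ones(len(y))] + X_cols)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return float(r @ r)

def stepwise(X, y, f_in=4.0, f_out=4.0):
    """Forward/backward stepwise selection driven by partial F statistics."""
    n, k = X.shape
    in_model = []
    for _ in range(2 * k + 2):             # cap iterations to guard against cycling
        changed = False
        # Forward step: add the candidate with the largest partial F, if > f_in.
        candidates = [j for j in range(k) if j not in in_model]
        if candidates:
            sse_cur = sse([X[:, c] for c in in_model], y)
            best_j, best_f = None, -np.inf
            for j in candidates:
                trial = in_model + [j]
                sse_new = sse([X[:, c] for c in trial], y)
                ms_e = sse_new / (n - len(trial) - 1)
                f = (sse_cur - sse_new) / ms_e        # partial F for adding x_j
                if f > best_f:
                    best_j, best_f = j, f
            if best_f > f_in:
                in_model.append(best_j)
                changed = True
        # Backward step: drop the in-model regressor with the smallest
        # partial F, if that F falls below f_out.
        if len(in_model) > 1:
            sse_cur = sse([X[:, c] for c in in_model], y)
            ms_e = sse_cur / (n - len(in_model) - 1)
            worst_j, worst_f = None, np.inf
            for j in in_model:
                reduced = [c for c in in_model if c != j]
                f = (sse([X[:, c] for c in reduced], y) - sse_cur) / ms_e
                if f < worst_f:
                    worst_j, worst_f = j, f
            if worst_f < f_out:
                in_model.remove(worst_j)
                changed = True
        if not changed:
            break
    return sorted(in_model)

# Synthetic data: only x_1, x_3, x_4 (0-based indices) influence y.
rng = np.random.default_rng(2)
X = rng.normal(size=(38, 5))
y = 6.5 + 0.6 * X[:, 1] + 1.2 * X[:, 3] - 0.6 * X[:, 4] \
    + rng.normal(scale=0.5, size=38)
print("selected regressors:", stepwise(X, y))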
Stepwise regression is almost always performed using a computer program. The analyst exercises control over the procedure by the choice of f_in and f_out. Some stepwise regression computer programs require that numerical values be specified for f_in and f_out. Because the number of degrees of freedom on MS_E depends on the number of variables in the model, which changes from step to step, a fixed value of f_in and f_out causes the type I and type II error rates to vary. Some computer programs allow the analyst to specify the type I error levels for f_in and f_out. However, the "advertised" significance level is not the true level, because the variable selected is the one that maximizes (or minimizes) the partial F-statistic at that stage. Sometimes it is useful to experiment with different values of f_in and f_out (or different advertised type I error rates) in several different runs to see whether this substantially affects the choice of the final model.