
Example 12-15 Wine Quality Stepwise Regression

Table 12-15 gives the software stepwise regression output for the wine quality data. The software uses fixed values of α for entering and removing variables. The default level is α = 0.15 for both decisions, and the output in Table 12-15 uses the default values. Notice that the variables were entered in the order flavor (step 1), oakiness (step 2), and aroma (step 3) and that no variables were removed. No other variable could be entered, so the algorithm terminated. This is the three-variable model found by all possible regressions that results in a minimum value of C_p.
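To make the algorithm concrete, here is a minimal Python sketch of stepwise selection built on statsmodels. It is not the commercial package's exact implementation: the function name and the DataFrame `wine` with columns Quality, Clarity, Aroma, Body, Flavor, and Oakiness are assumptions for illustration. Because each candidate enters with one degree of freedom, comparing its t-test p-value with α is equivalent to comparing its partial F-statistic with f_in or f_out.

```python
# Sketch of stepwise regression with alpha-to-enter and alpha-to-remove.
# Assumes a pandas DataFrame of candidate regressors X and response y.
import statsmodels.api as sm

def stepwise(y, X, alpha_in=0.15, alpha_out=0.15):
    """Alternate forward and backward partial F-tests until no change."""
    selected = []                                    # regressors currently in the model
    while True:
        changed = False
        # Forward step: enter the most significant remaining candidate.
        remaining = [c for c in X.columns if c not in selected]
        best_var, best_p = None, 1.0
        for c in remaining:
            fit = sm.OLS(y, sm.add_constant(X[selected + [c]])).fit()
            if fit.pvalues[c] < best_p:              # 1-df partial F-test == t-test
                best_var, best_p = c, fit.pvalues[c]
        if best_var is not None and best_p < alpha_in:
            selected.append(best_var)
            changed = True
        # Backward step: remove the least significant entered variable, if any.
        if selected:
            fit = sm.OLS(y, sm.add_constant(X[selected])).fit()
            worst = fit.pvalues[selected].idxmax()
            if fit.pvalues[worst] > alpha_out:
                selected.remove(worst)
                changed = True
        if not changed:
            return selected

# Hypothetical usage, assuming `wine` holds the wine quality data:
# stepwise(wine["Quality"],
#          wine[["Clarity", "Aroma", "Body", "Flavor", "Oakiness"]])
# would be expected to enter Flavor, then Oakiness, then Aroma.
```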

Forward Selection

The forward selection procedure is a variation of stepwise regression and is based on the principle that regressors should be added to the model one at a time until there are no remaining candidate regressors that produce a significant increase in the regression sum of squares. That is, variables are added one at a time as long as their partial F-value exceeds f_in. Forward selection is a simplification of stepwise regression that omits the partial F-test for deleting variables from the model that have been added at previous steps. This is a potential weakness of forward selection; the procedure does not explore the effect that adding a regressor at the current step has on regressor variables added at earlier steps. Notice that if we were to apply forward selection to the wine quality data, we would obtain exactly the same results as we did with stepwise regression in Example 12-15, because stepwise regression terminated without deleting a variable.
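In code, forward selection is simply the stepwise sketch above with the removal step deleted. The following sketch makes the same assumptions as before:

```python
# Forward selection: add regressors while the best candidate's p-value
# (equivalently, its partial F-statistic) clears the entry threshold.
import statsmodels.api as sm

def forward_selection(y, X, alpha_in=0.15):
    selected = []
    while True:
        remaining = [c for c in X.columns if c not in selected]
        best_var, best_p = None, 1.0
        for c in remaining:
            fit = sm.OLS(y, sm.add_constant(X[selected + [c]])).fit()
            if fit.pvalues[c] < best_p:
                best_var, best_p = c, fit.pvalues[c]
        if best_var is None or best_p >= alpha_in:
            return selected               # no candidate clears f_in; stop
        selected.append(best_var)
```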

Backward Elimination

The backward elimination algorithm begins with all K candidate regressors in the model. Then the regressor with the smallest partial F-statistic is deleted if this F-statistic is insignificant, that is, if f < f_out. Next, the model with K − 1 regressors is fit, and the next regressor for potential elimination is found. The algorithm terminates when no further regressor can be deleted.
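A corresponding sketch of backward elimination, under the same illustrative assumptions as the earlier sketches, starts from the full model and strips the weakest regressor while it fails the retention test:

```python
# Backward elimination: drop the regressor with the smallest partial
# F-statistic (largest p-value) while that p-value exceeds alpha_out.
import statsmodels.api as sm

def backward_elimination(y, X, alpha_out=0.10):
    selected = list(X.columns)            # begin with all K candidates
    while selected:
        fit = sm.OLS(y, sm.add_constant(X[selected])).fit()
        worst = fit.pvalues[selected].idxmax()
        if fit.pvalues[worst] <= alpha_out:
            break                         # every remaining regressor is significant
        selected.remove(worst)
    return selected
```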

Table 12-16 shows the computer software package output for backward elimination applied to the wine quality data. The α value for removing a variable is α = 0.10. Notice that this procedure removes body at step 1 and then clarity at step 2, terminating with the three-variable model found previously.

  5"- t 12-15 Stepwise Regression Output for the Wine Quality Data

  Stepwise Regression: Quality versus Clarity, Aroma, . . .

  Alpha-to-Enter: 0.15 Alpha-to-Remove: 0.15 Response is Quality on 5 predictors, with N = 38

  R-Sq(adj)

  C–p


  5"- t 12-16 Backward Elimination Output for the Wine Quality Data

  Stepwise Regression: Quality versus Clarity, Aroma, . . .

  Backward elimination. Alpha-to-Remove: 0.1 Response is Quality on 5 predictors, with N = 38

  R-Sq(adj)

Some Comments on Final Model Selection

We have illustrated several different approaches to the selection of variables in multiple linear regression. The final model obtained from any model-building procedure should be subjected to the usual adequacy checks, such as residual analysis, lack-of-fit testing, and examination of the effects of influential points. The analyst may also consider augmenting the original set of candidate variables with cross-products, polynomial terms, or other transformations of the original variables that might improve the model.

A major criticism of variable selection methods such as stepwise regression is that the analyst may conclude that there is one "best" regression equation. Generally, this is not the case, because several equally good regression models can often be found. One way to avoid this problem is to use several different model-building techniques and see whether different models result. For example, we have found the same model for the wine quality data using stepwise regression, forward selection, and backward elimination. The same model was also one of the two best found from all possible regressions. Because the results from different variable selection methods frequently do not agree, their agreement here is a good indication that the three-variable model is the best regression equation.

If the number of candidate regressors is not too large, the all-possible regressions method is recommended. We usually recommend using the minimum MS_E and C_p evaluation criteria in conjunction with this procedure. The all-possible regressions approach can find the "best" regression equation with respect to these criteria, but stepwise-type methods offer no such assurance. Furthermore, the all-possible regressions procedure is not distorted by dependencies among the regressors as stepwise-type methods are.
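For a handful of candidates the enumeration is easy to sketch. The code below, under the same illustrative assumptions as the earlier sketches, fits every nonempty subset and reports MS_E and Mallows' C_p = SSE_p/σ̂² − n + 2p, with σ̂² estimated by the full model's MS_E:

```python
# All possible regressions: evaluate every nonempty subset by MS_E and C_p.
from itertools import combinations
import pandas as pd
import statsmodels.api as sm

def all_possible_regressions(y, X):
    n = len(y)
    full = sm.OLS(y, sm.add_constant(X)).fit()
    sigma2 = full.ssr / full.df_resid     # sigma^2 estimated by full-model MS_E
    rows = []
    for k in range(1, X.shape[1] + 1):
        for subset in combinations(X.columns, k):
            fit = sm.OLS(y, sm.add_constant(X[list(subset)])).fit()
            p = k + 1                     # parameters, including the intercept
            rows.append({"subset": subset,
                         "MS_E": fit.ssr / (n - p),
                         "C_p": fit.ssr / sigma2 - n + 2 * p})
    return pd.DataFrame(rows).sort_values("C_p")
```

Screening the resulting table for subsets with small MS_E and C_p close to p reproduces the comparison that identified the three-variable flavor, oakiness, and aroma model.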