Chapter 15 - Multiple Regression Model Building

  Statistics for Managers Using Microsoft® Excel 5th Edition

Learning Objectives

  In this chapter, you learn:

   To use quadratic terms in a regression model

   To use transformed variables in a regression model

   To measure the correlation among independent variables

   To build a regression model, using either the stepwise or best-subsets approach

   To avoid the pitfalls involved in developing a multiple regression model

  Nonlinear Relationships

   The relationship between the dependent variable and an
    independent variable may not be linear

   Review the scatter plot to check for nonlinear relationships

   Example: Quadratic model

    Yi = β0 + β1X1i + β2X1i² + εi

   The second independent variable is the square of the
    first variable

  Quadratic Regression Model

  Model form:

    Yi = β0 + β1X1i + β2X1i² + εi

  where:
    β0 = Y intercept
    β1 = regression coefficient for linear effect of X on Y
    β2 = regression coefficient for quadratic effect on Y
    εi = random error in Y for observation i

  Quadratic Regression Model

  A linear fit does not give random residuals

  [Figure: Y vs. X scatter plots with residual plots — the linear fit leaves a curved, patterned set of residuals; the nonlinear fit gives random residuals.]
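The quadratic model above can be fit by ordinary least squares by treating X and X² as two predictor columns. A minimal sketch in Python with numpy (the function name and data are illustrative, not from the text):

```python
import numpy as np

def fit_quadratic(x, y):
    """Fit Y = b0 + b1*X + b2*X^2 by ordinary least squares.

    Returns the coefficient vector (b0, b1, b2).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with intercept, linear, and squared columns
    X = np.column_stack([np.ones_like(x), x, x ** 2])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

# Example: data generated from a known quadratic (no noise),
# so least squares recovers the coefficients essentially exactly
x = np.arange(10.0)
y = 2.0 + 3.0 * x + 0.5 * x ** 2
b0, b1, b2 = fit_quadratic(x, y)
print(round(b0, 4), round(b1, 4), round(b2, 4))  # 2.0 3.0 0.5
```

The same fit is what Excel produces when a Time-squared column is added to the worksheet, as in the example later in the chapter.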

  Quadratic Regression Model

    Yi = β0 + β1X1i + β2X1i² + εi

  Quadratic models may be considered when the scatter diagram
  takes on one of the following shapes:

  [Figure: four Y vs. X scatter shapes]
    β1 < 0, β2 > 0    β1 > 0, β2 > 0    β1 < 0, β2 < 0    β1 > 0, β2 < 0

  β1 = the coefficient of the linear term
  β2 = the coefficient of the squared term

  Quadratic Regression Equation

   Test for Overall Relationship

    H0: β1 = β2 = 0 (no overall relationship between X and Y)

    H1: β1 and/or β2 ≠ 0 (there is a relationship between X and Y)

    F-test statistic = MSR/MSE

   Collect data and use Excel to calculate the regression equation:

    Ŷi = b0 + b1X1i + b2X1i²

  Testing for Significance: Quadratic Effect

  Testing the Quadratic Effect

  Compare the quadratic regression equation

    Ŷi = b0 + b1X1i + b2X1i²

  with the linear regression equation

    Ŷi = b0 + b1X1i

  Testing for Significance: Quadratic Effect

  Testing the Quadratic Effect

  Consider the quadratic regression equation

    Ŷi = b0 + b1X1i + b2X1i²

  Hypotheses

    H0: β2 = 0 (The quadratic term does not improve the model)

    H1: β2 ≠ 0 (The quadratic term improves the model)

  Testing for Significance: Quadratic Effect

  Testing the Quadratic Effect

  Hypotheses

    H0: β2 = 0 (The quadratic term does not improve the model)

    H1: β2 ≠ 0 (The quadratic term improves the model)

  The test statistic is

    t = (b2 − β2) / S_b2,  with d.f. = n − 3

  where:
    b2 = estimated slope
    β2 = hypothesized slope (zero)
    S_b2 = standard error of the slope b2

  Testing for Significance: Quadratic Effect

  Testing the Quadratic Effect

  If the t test for the quadratic effect is significant,
  keep the quadratic term in the model.
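The t statistic above can be computed directly from the least-squares fit. A sketch with numpy (the function name and example data are illustrative; the sin term plays the role of noise so the residuals are not exactly zero):

```python
import numpy as np

def quadratic_t_test(x, y):
    """t statistic for H0: beta2 = 0 in Y = b0 + b1*X + b2*X^2 + e.

    Returns (b2, se_b2, t), with d.f. = n - 3.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    X = np.column_stack([np.ones_like(x), x, x ** 2])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    mse = resid @ resid / (n - 3)          # d.f. = n - 3
    cov = mse * np.linalg.inv(X.T @ X)     # covariance matrix of the estimates
    se_b2 = np.sqrt(cov[2, 2])             # S_b2, the standard error of b2
    return b[2], se_b2, b[2] / se_b2

# Curved data: true quadratic coefficient 0.3 plus a small disturbance
x = np.arange(1.0, 15.0)
y = 1.0 + x + 0.3 * x ** 2 + np.sin(x)
b2, se_b2, t = quadratic_t_test(x, y)
```

A large |t| (compared with the t distribution on n − 3 degrees of freedom) is evidence that the quadratic term improves the model.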

  Quadratic Regression Example

  Filter Purity increases as filter time increases:

    Purity   Time
       3       1
       7       2
       8       3
      15       5
      22       7
      33       8
      40      10
      54      12
      67      13
      70      14
      78      15
      85      15
      87      16
      99      17

  [Figure: scatter plot of Purity (0–100) vs. Time (0–20)]
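As a check outside Excel, the linear and quadratic fits to the table above can be reproduced with numpy.polyfit (a sketch; r² is computed by hand because polyfit does not report it):

```python
import numpy as np

# Purity (Y) and filter time (X) from the table above
purity = np.array([3, 7, 8, 15, 22, 33, 40, 54, 67, 70, 78, 85, 87, 99], float)
time = np.array([1, 2, 3, 5, 7, 8, 10, 12, 13, 14, 15, 15, 16, 17], float)

def r_squared(y, yhat):
    """Coefficient of determination: 1 - SSE/SST."""
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Linear fit: Y = b0 + b1*Time
lin = np.polyfit(time, purity, 1)
r2_lin = r_squared(purity, np.polyval(lin, time))

# Quadratic fit: Y = b0 + b1*Time + b2*Time^2
quad = np.polyfit(time, purity, 2)        # returns [b2, b1, b0]
r2_quad = r_squared(purity, np.polyval(quad, time))

print(f"linear r2 = {r2_lin:.4f}, quadratic r2 = {r2_quad:.4f}")
```

The coefficients and r² values should agree, up to rounding, with the Excel regression output shown on the next slides.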

  Quadratic Regression Example

  Simple (linear) regression results:

    Ŷ = -11.283 + 5.985 Time

  The t statistic, F statistic, and adjusted r² are all high,
  but the residuals are not random:

                 Coefficients  Standard Error     t Stat    P-value
    Intercept       -11.28267        3.46805   -3.25332    0.00691
    Time              5.98520        0.30966   19.32819  2.078E-10

    Regression Statistics
    R Square            0.96888
    Adjusted R Square   0.96628
    Standard Error      6.15997
    F                 373.57904
    Significance F   2.0778E-10

  [Figure: Time residual plot — the residuals show a curved, non-random pattern]

  Quadratic Regression Example

   Quadratic regression results:

    Ŷ = 1.539 + 1.565 Time + 0.245 (Time)²

                   Coefficients  Standard Error    t Stat    P-value
    Intercept           1.53870        2.24465   0.68550    0.50722
    Time                1.56496        0.60179   2.60052    0.02467
    Time-squared        0.24516        0.03258   7.52406  1.165E-05

    Regression Statistics
    R Square            0.99494
    Adjusted R Square   0.99402
    Standard Error      2.59513
    F                 1080.7330
    Significance F    2.368E-13

  [Figures: Time residual plot and Time-squared residual plot — the residuals now appear random]

  The quadratic term is significant and improves the model:
  adjusted r² is higher, S_YX is lower, and the residuals are
  now random.

Using Transformations in Regression Analysis

  Idea: Non-linear models can often be transformed to a linear form

   Can be estimated by least squares if transformed

   Transform X or Y or both to get a better fit or to deal with
    violations of regression assumptions

   Can be based on theory, logic, or scatter plots

  The Square Root Transformation

  The square-root transformation

    Yi = β0 + β1 √X1i + εi

  Used to

   overcome violations of the equal variance assumption

   fit a non-linear relationship

  The Square Root Transformation

  Shape of original relationship          Relationship when transformed

    Yi = β0 + β1 √X1i + εi  (curved in X)   →   linear when plotted against √X1i

  [Figure: Y vs. X and Y vs. √X scatter shapes for b1 > 0 and b1 < 0]
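The transformation amounts to regressing Y on √X instead of X. A minimal sketch with illustrative data that are exactly linear in √X:

```python
import numpy as np

# Data following Y = 1 + 4*sqrt(X): linear in sqrt(X), curved in X
x = np.array([1.0, 4.0, 9.0, 16.0, 25.0])
y = 1.0 + 4.0 * np.sqrt(x)

# Regress Y on the transformed variable sqrt(X)
b1, b0 = np.polyfit(np.sqrt(x), y, 1)
print(b0, b1)  # intercept 1.0, slope 4.0
```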

The Log Transformation

  The Multiplicative Model:

  Original multiplicative model          Transformed multiplicative model

    Yi = β0 X1i^β1 εi            →         log Yi = log β0 + β1 log X1i + log εi

  The Exponential Model:

  Original exponential model             Transformed exponential model

    Yi = e^(β0 + β1X1i + β2X2i) εi   →     ln Yi = β0 + β1X1i + β2X2i + ln εi
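The multiplicative model can be estimated by ordinary least squares on the log-transformed data, as the slide shows. A sketch using base-10 logs and illustrative data:

```python
import numpy as np

def fit_multiplicative(x, y):
    """Estimate Y = b0 * X^b1 by least squares on log10-transformed data."""
    lx, ly = np.log10(x), np.log10(y)
    b1, log_b0 = np.polyfit(lx, ly, 1)   # slope and intercept in log space
    return 10 ** log_b0, b1              # transform the intercept back

# Data generated from Y = 2 * X^1.5, so the fit recovers b0 = 2, b1 = 1.5
x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
y = 2.0 * x ** 1.5
b0, b1 = fit_multiplicative(x, y)
```

The exponential model is handled the same way, except that only Y is transformed (ln Y is regressed on X1 and X2 directly).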

Interpretation of Coefficients

  For the multiplicative model:

    log Yi = log β0 + β1 log X1i + log εi

  When both dependent and independent variables are transformed:

   The coefficient of the independent variable Xk can be
    interpreted as follows: a 1 percent change in Xk leads to an
    estimated bk percentage change in the mean value of Y.

Collinearity

  Collinearity: High correlation exists among
   two or more independent variables

  The correlated variables contribute redundant
   information to the multiple regression model

  Collinearity

  Including two highly correlated independent
   variables can adversely affect the regression results

   No new information provided

   Can lead to unstable coefficients (large
    standard errors and low t-values)

   Coefficient signs may not match prior expectations

Some Indications of Strong Collinearity

   Incorrect signs on the coefficients

   Large change in the value of a previous coefficient
    when a new variable is added to the model

   A previously significant variable becomes insignificant
    when a new independent variable is added

  Detecting Collinearity: Variance Inflationary Factor

  VIF is used to measure collinearity:

    VIF_j = 1 / (1 − R_j²)

  where R_j² is the coefficient of determination from a regression
  model that uses X_j as the dependent variable and all other X
  variables as the independent variables

  If VIF_j > 5, X_j is highly correlated with the
  other independent variables

Model Building
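VIF_j can be computed by regressing each X_j on the remaining predictors, exactly as defined above. A sketch with numpy (the simulated data are illustrative):

```python
import numpy as np

def vif(X):
    """Variance inflationary factor VIF_j = 1/(1 - R_j^2) for each column of X.

    R_j^2 comes from regressing column j on the other columns (with intercept).
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        b, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ b
        r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return out

# Two nearly collinear predictors plus one unrelated predictor
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.05, size=50)   # almost a copy of x1
x3 = rng.normal(size=50)
vifs = vif(np.column_stack([x1, x2, x3]))
```

Here x1 and x2 flag large VIFs (well above the rule-of-thumb cutoff of 5), while the unrelated x3 stays near 1.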

  Goal is to develop a model with the best set of
   independent variables

   Easier to interpret if unimportant variables are removed

   Lower probability of collinearity

  Stepwise regression procedure

   Provides evaluation of alternative models as variables
    are added

  Best-subsets approach

   Try all combinations and select the model with the
    highest adjusted r²

Stepwise Regression

  Idea: develop the least squares regression equation in steps,
   adding one independent variable at a time and evaluating
   whether existing variables should remain or be removed

  The coefficient of partial determination is the measure of the
   marginal contribution of each independent variable, given that
   other independent variables are in the model

  Best Subsets Regression

  Idea: estimate all possible regression equations using all
  possible combinations of independent variables

  Choose the best model by looking for the highest adjusted r²

  The model with the largest adjusted r² will also have the
  smallest S_YX

  Stepwise regression and best subsets regression can be
  performed using Excel with the PHStat add-in
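The best-subsets idea can be sketched directly: fit every combination of predictors and keep the one with the highest adjusted r². A sketch with illustrative simulated data (for p predictors this fits 2^p − 1 models, so it is only practical for small p):

```python
import numpy as np
from itertools import combinations

def adjusted_r2(y, X_cols, n):
    """Fit OLS with intercept on the given columns; return adjusted r^2."""
    k = X_cols.shape[1]
    A = np.column_stack([np.ones(n), X_cols])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ b
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def best_subsets(X, y):
    """Try every non-empty column subset; return (best_subset, best_adj_r2)."""
    n, p = X.shape
    best = (None, -np.inf)
    for r in range(1, p + 1):
        for cols in combinations(range(p), r):
            adj = adjusted_r2(y, X[:, cols], n)
            if adj > best[1]:
                best = (cols, adj)
    return best

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 4))
# Y depends only on columns 0 and 2, so the best subset should contain them
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.1, size=60)
cols, adj = best_subsets(X, y)
```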

  Alternative Best Subsets Criterion

   Calculate the value C_p for each potential regression model

   Consider models with C_p values close to or below k + 1

   k is the number of independent variables in the model
    under consideration

  Alternative Best Subsets Criterion

   The C_p Statistic

    C_p = [(1 − R_k²)(n − T)] / (1 − R_T²) − [n − 2(k + 1)]

  Where
    k    = number of independent variables included in a
           particular regression model
    T    = total number of parameters to be estimated in the
           full regression model
    R_k² = coefficient of multiple determination for the model
           with k independent variables
    R_T² = coefficient of multiple determination for the full
           model with all T estimated parameters
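The C_p formula translates directly to code. A small sketch; note that when the candidate model is the full model itself (k + 1 = T and R_k² = R_T²), C_p reduces to k + 1:

```python
def c_p(r2_k, r2_T, n, k, T):
    """C_p = (1 - R_k^2)(n - T) / (1 - R_T^2) - (n - 2*(k + 1)).

    k: independent variables in the candidate model
    T: total parameters in the full model (including the intercept)
    """
    return (1.0 - r2_k) * (n - T) / (1.0 - r2_T) - (n - 2 * (k + 1))

# Candidate model = full model: k = 3, T = k + 1 = 4, same R^2,
# so C_p comes out to exactly k + 1 = 4
value = c_p(r2_k=0.95, r2_T=0.95, n=30, k=3, T=4)
print(value)  # 4.0
```

Models with C_p well above k + 1 suffer from bias; models with C_p near or below k + 1 are the candidates worth examining further.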

  8 Steps in Model Building

  1. Choose independent variables to include in the model.

  2. Estimate the full model and check if any VIFs > 5.

      If no VIF > 5, go to step 3.

      If one VIF > 5, remove this variable.

      If more than one, eliminate the variable with the highest
      VIF and repeat step 2.

  3. Perform best subsets regression with remaining variables.

  4. List all models with C_p close to or less than (k + 1).

  5. Choose the best model. Do extra variables make a significant
     contribution? Consider parsimony.

  6. Perform complete analysis with the chosen model, including
     residual analysis.

  7. Transform the model if necessary to deal with violations of
     linearity or other model assumptions.

  8. Use the model for prediction and inference.

Model Validation

  The final step in the model-building process is to validate the
  selected regression model.

   Collect new data and compare the results.

   Compare the results of the regression model to previous results.

   If the data set is large, split the data into two parts and
    cross-validate the results.

Model Building Flowchart

  [Flowchart:]

  Choose X1, X2, …, Xk → Run regression to find VIFs → Any VIF > 5?

   Yes: More than one? If so, remove the variable with the highest
    VIF and rerun the regression; if only one, remove this X.

   No: Run best subsets regression to obtain "best" models in
    terms of C_p → Do complete analysis → Add quadratic terms
    and/or transform variables as indicated → Perform predictions

Pitfalls and Ethical Considerations

  To avoid pitfalls and address ethical considerations:

   Understand that interpretation of the estimated regression
    coefficients is performed holding all other independent
    variables constant

   Evaluate residual plots for each independent variable

   Evaluate interaction terms

  Pitfalls and Ethical Considerations

To avoid pitfalls and address ethical considerations:

   Obtain VIFs for each independent variable before determining
    which variables should be included in the model.

   Examine several alternative models using best-subsets
    regression.

   Use other methods when the assumptions necessary for
    least-squares regression have been seriously violated.

Chapter Summary

  In this chapter, we have:

   Developed the quadratic regression model

   Discussed using transformations in regression models

    The multiplicative model

    The exponential model

   Described collinearity

   Discussed model building

    Stepwise regression

    Best subsets

   Addressed pitfalls in multiple regression and ethical
    considerations