What factors determine the variances and covariance of b1 and b2?
1. σ²: the greater the uncertainty about the y_t values, the greater the uncertainty about b1, b2, and their relationship.
2. The more spread out the x_t values are, the more confidence we have in b1, b2, etc.
3. The larger the sample size, T, the smaller the variances and covariances.
4. The variance of b1 is large when the (squared) x_t values are far from zero in either direction.
5. Changing the slope, b2, has no effect on the intercept, b1, when the sample mean of x is zero. But if the sample mean is positive, the covariance between b1 and b2 will be negative, and vice versa.

Gauss-Markov Theorem
Under the first five assumptions of the simple linear regression model, the ordinary least squares estimators b1 and b2 have the smallest variance of all linear and unbiased estimators of β1 and β2. This means that b1 and b2 are the Best Linear Unbiased Estimators (BLUE) of β1 and β2.

Implications of Gauss-Markov
1. b1 and b2 are "best" within the class of linear and unbiased estimators.
2. "Best" means smallest variance within the class of linear unbiased estimators.
3. All of the first five assumptions must hold to satisfy Gauss-Markov.
4. Gauss-Markov does not require assumption six: normality.
5. Gauss-Markov is not based on the least squares principle but on the estimators b1 and b2 themselves.

Gauss-Markov implications, continued
6. If we are not satisfied with restricting our estimation to the class of linear and unbiased estimators, we should ignore the Gauss-Markov Theorem and use some nonlinear and/or biased estimator instead. (Note: a biased or nonlinear estimator could have smaller variance than those satisfying Gauss-Markov.)
7. Gauss-Markov applies to the estimators b1 and b2, not to particular sample values (estimates) of b1 and b2.

Probability Distribution of the Least Squares Estimators
b1 ~ N( β1 , σ² Σ x_t² / (T Σ (x_t − x̄)²) )
b2 ~ N( β2 , σ² / Σ (x_t − x̄)² )

y_t and ε_t normally distributed
The least squares estimator of β2 can be expressed as a linear combination of the y_t's:
b2 = Σ w_t y_t , where w_t = (x_t − x̄) / Σ (x_t − x̄)²
b1 = ȳ − b2 x̄
This means that b1 and b2 are normal, since linear combinations of normals are normal.

Normally distributed under the Central Limit Theorem
If the first five Gauss-Markov assumptions hold, and the sample size, T, is sufficiently large, then the least squares estimators b1 and b2 have a distribution that approximates the normal distribution with greater accuracy the larger the sample size T.

Consistency
We would like our estimators, b1 and b2, to collapse onto the true population values, β1 and β2, as the sample size T goes to infinity. One way to achieve this consistency property is for the variances of b1 and b2 to go to zero as T goes to infinity. Since the formulas for the variances of the least squares estimators show that they do, in fact, go to zero, b1 and b2 are consistent estimators of β1 and β2.

Estimating the variance of the error term, σ²
ê_t = y_t − b1 − b2 x_t
σ̂² = Σ_{t=1..T} ê_t² / (T − 2)
σ̂² is an unbiased estimator of σ².

The Least Squares Predictor
Given a value of the explanatory variable, x_0, we would like to predict a value of the dependent variable, y_0. The least squares predictor is:
ŷ_0 = b1 + b2 x_0
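To make these sampling-distribution claims concrete, here is a minimal Monte Carlo sketch (not part of the slides; the parameter values and data are made up) that checks the variance formula for b2 and the unbiasedness of σ̂² by repeated simulation:

```python
import numpy as np

# Monte Carlo sketch: the empirical variance of b2 should match
# sigma^2 / sum((x - xbar)^2), and sigma_hat^2 = sum(e_hat^2)/(T-2)
# should average out to the true sigma^2.
rng = np.random.default_rng(0)
beta1, beta2, sigma, T, reps = 5.0, 2.0, 3.0, 40, 20_000
x = rng.uniform(0, 10, size=T)            # fixed regressor values
Sxx = np.sum((x - x.mean()) ** 2)

b2_draws, s2_draws = [], []
for _ in range(reps):
    y = beta1 + beta2 * x + rng.normal(0, sigma, size=T)
    b2 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
    b1 = y.mean() - b2 * x.mean()
    resid = y - b1 - b2 * x
    b2_draws.append(b2)
    s2_draws.append(np.sum(resid ** 2) / (T - 2))

print("var(b2): empirical", np.var(b2_draws), "theoretical", sigma**2 / Sxx)
print("E[sigma_hat^2]:", np.mean(s2_draws), "true sigma^2:", sigma**2)
```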
Chapter 5: Inference in the Simple Regression Model

Assumptions of the Simple Linear Regression Model
1. y_t = β1 + β2 x_t + ε_t
2. E(ε_t) = 0  ⟺  E(y_t) = β1 + β2 x_t
3. var(ε_t) = σ² = var(y_t)
4. cov(ε_i, ε_j) = cov(y_i, y_j) = 0
5. x_t ≠ c for every observation
6. ε_t ~ N(0, σ²)  ⟺  y_t ~ N(β1 + β2 x_t, σ²)

Probability Distribution of the Least Squares Estimators
b1 ~ N( β1 , σ² Σ x_t² / (T Σ (x_t − x̄)²) )
b2 ~ N( β2 , σ² / Σ (x_t − x̄)² )

Error Variance Estimation
Unbiased estimator of the error variance: σ̂² = Σ ê_t² / (T − 2)
Transform to a chi-square distribution: (T − 2) σ̂² / σ² ~ χ²_(T−2)

Correct and incorrect decisions
We make a correct decision if:
- the null hypothesis is false and we decide to reject it;
- the null hypothesis is true and we decide not to reject it.
Our decision is incorrect if:
- the null hypothesis is true and we decide to reject it (a Type I error);
- the null hypothesis is false and we decide not to reject it (a Type II error).

Standardizing b2
b2 ~ N( β2 , σ² / Σ (x_t − x̄)² )
Create a standardized normal random variable, Z, by subtracting the mean of b2 and dividing by its standard deviation:
Z = (b2 − β2) / √var(b2) ~ N(0, 1)

Simple Linear Regression
y_t = β1 + β2 x_t + ε_t, where E(ε_t) = 0
y_t ~ N(β1 + β2 x_t, σ²) since E(y_t) = β1 + β2 x_t
ε_t = y_t − β1 − β2 x_t
Therefore, ε_t ~ N(0, σ²).

Create a Chi-Square
ε_t ~ N(0, σ²), but we want N(0, 1):
ε_t / σ ~ N(0, 1)  (standard normal)
(ε_t / σ)² ~ χ²_(1)  (chi-square)

Sum of Chi-Squares
Σ_{t=1..T} (ε_t / σ)² = (ε_1/σ)² + (ε_2/σ)² + . . . + (ε_T/σ)²
= χ²_(1) + χ²_(1) + . . . + χ²_(1) = χ²_(T)
Therefore, Σ_{t=1..T} (ε_t / σ)² ~ χ²_(T).

Chi-Square degrees of freedom
Since the errors ε_t = y_t − β1 − β2 x_t are not observable, we estimate them with the sample residuals ê_t = y_t − b1 − b2 x_t. Unlike the errors, the sample residuals are not independent, since they use up two degrees of freedom by using b1 and b2 to estimate β1 and β2. We get only T − 2 degrees of freedom instead of T.

Student-t Distribution
t = Z / √(V / m) ~ t_(m), where Z ~ N(0, 1) and V ~ χ²_(m)

Here:
t = Z / √(V / (T − 2)) ~ t_(T−2),
where Z = (b2 − β2) / √var(b2), var(b2) = σ² / Σ (x_i − x̄)², and V = (T − 2) σ̂² / σ².

Substituting (notice the cancellations):
t = [ (b2 − β2) / √( σ² / Σ (x_i − x̄)² ) ] / √( [ (T − 2) σ̂² / σ² ] / (T − 2) )
  = (b2 − β2) / √( σ̂² / Σ (x_i − x̄)² )
  = (b2 − β2) / √v̂ar(b2)
  = (b2 − β2) / se(b2)

Student's t-statistic
t = (b2 − β2) / se(b2) ~ t_(T−2)
t has a Student-t distribution with T − 2 degrees of freedom.

Figure 5.1: the Student-t distribution, with critical values −t_c and t_c, an area of α/2 in each tail (the rejection region for a two-sided test), and central area 1 − α.

Probability statements
P(−t_c ≤ t ≤ t_c) = 1 − α
P(t < −t_c) = P(t > t_c) = α/2
P(−t_c ≤ (b2 − β2)/se(b2) ≤ t_c) = 1 − α

Confidence Intervals
Two-sided (1 − α)×100% C.I. for β1: ( b1 − t_{α/2} se(b1) , b1 + t_{α/2} se(b1) )
Two-sided (1 − α)×100% C.I. for β2: ( b2 − t_{α/2} se(b2) , b2 + t_{α/2} se(b2) )
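As a concrete illustration of the t-ratio and interval construction, here is a small sketch with made-up data (nothing here comes from the slides' own example):

```python
import numpy as np
from scipy import stats

# Sketch: build se(b2), the t-statistic for H0: beta2 = 0, and a 95%
# confidence interval, following t = (b2 - beta2)/se(b2) ~ t(T-2).
x = np.array([3., 5., 6., 8., 10., 12., 14., 15.])
y = np.array([7., 9., 12., 15., 16., 20., 23., 26.])
T = len(y)

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()
resid = y - b1 - b2 * x
sigma2_hat = np.sum(resid ** 2) / (T - 2)          # unbiased estimator of sigma^2
se_b2 = np.sqrt(sigma2_hat / np.sum((x - x.mean()) ** 2))

t_stat = b2 / se_b2                                # tests H0: beta2 = 0
t_c = stats.t.ppf(0.975, df=T - 2)                 # two-sided 5% critical value
ci = (b2 - t_c * se_b2, b2 + t_c * se_b2)
print(f"b2={b2:.4f}, se={se_b2:.4f}, t={t_stat:.2f}, 95% CI={ci}")
```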
Student-t vs. Normal Distribution
1. Both are symmetric, bell-shaped distributions.
2. The Student-t distribution has fatter tails than the normal.
3. The Student-t converges to the normal as the sample size goes to infinity.
4. The Student-t is conditional on the degrees of freedom (df).
5. The normal is a good approximation to the Student-t, to the first few decimal places, when df > 30 or so.

Hypothesis Tests
1. A null hypothesis, H0.
2. An alternative hypothesis, H1.
3. A test statistic.
4. A rejection region.

Rejection Rules
1. Two-Sided Test: if the value of the test statistic falls in the critical region in either tail of the t-distribution, then we reject the null hypothesis in favor of the alternative.
2. Left-Tail Test: if the value of the test statistic falls in the critical region in the left tail of the t-distribution, then we reject the null hypothesis in favor of the alternative.
3. Right-Tail Test: if the value of the test statistic falls in the critical region in the right tail of the t-distribution, then we reject the null hypothesis in favor of the alternative.

Format for Hypothesis Testing
1. Determine the null and alternative hypotheses.
2. Specify the test statistic and its distribution as if the null hypothesis were true.
3. Select α and determine the rejection region.
4. Calculate the sample value of the test statistic.
5. State your conclusion.

Practical vs. statistical significance in economics
Practically but not statistically significant: when the sample size is very small, a large average gap between the salaries of men and women might not be statistically significant.
Statistically but not practically significant: when the sample size is very large, a tiny correlation (say, ρ = 0.00000001) between the winning numbers in the PowerBall Lottery and the Dow-Jones Stock Market Index might be statistically significant.

Type I and Type II errors
Type I error: we make the mistake of rejecting the null hypothesis when it is true. α = P(rejecting H0 when it is true).
Type II error: we make the mistake of failing to reject the null hypothesis when it is false. β = P(failing to reject H0 when it is false).

Prediction Intervals
A (1 − α)×100% prediction interval for y_0 is:
ŷ_0 ± t_c se(f)
where the forecast error is f = ŷ_0 − y_0, se(f) = √v̂ar(f), and
var(f) = σ² [ 1 + 1/T + (x_0 − x̄)² / Σ (x_t − x̄)² ].
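A small sketch, with made-up data, of how the prediction interval above is computed (x0 and the sample values are illustrative only):

```python
import numpy as np
from scipy import stats

# Sketch of the prediction interval y0_hat +/- t_c * se(f), with
# var(f) = sigma_hat^2 * (1 + 1/T + (x0 - xbar)^2 / sum((x_t - xbar)^2)).
x = np.array([2., 4., 5., 7., 9., 11., 13., 16.])
y = np.array([5., 8., 9., 13., 14., 19., 20., 25.])
T, x0 = len(y), 12.0

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()
sigma2_hat = np.sum((y - b1 - b2 * x) ** 2) / (T - 2)

y0_hat = b1 + b2 * x0
var_f = sigma2_hat * (1 + 1/T + (x0 - x.mean())**2 / np.sum((x - x.mean())**2))
t_c = stats.t.ppf(0.975, df=T - 2)
print("95% prediction interval:",
      (y0_hat - t_c * np.sqrt(var_f), y0_hat + t_c * np.sqrt(var_f)))
```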
Chapter 6: The Simple Linear Regression Model

Explaining Variation in y_t
Predicting y_t without any explanatory variables:
y_t = β1 + e_t
Σ_{t=1..T} e_t² = Σ_{t=1..T} (y_t − β1)²
∂ Σ e_t² / ∂β1 = −2 Σ (y_t − b1) = 0
Σ y_t − T b1 = 0, so b1 = ȳ.
Without any x's, the best predictor of y_t is simply the sample mean ȳ.

Explaining Variation in y_t
y_t = b1 + b2 x_t + ê_t
Explained variation: ŷ_t = b1 + b2 x_t
Unexplained variation: ê_t = y_t − ŷ_t = y_t − b1 − b2 x_t

Explaining Variation in y_t
y_t = ŷ_t + ê_t
Using ȳ as the baseline: y_t − ȳ = (ŷ_t − ȳ) + ê_t
Σ_{t=1..T} (y_t − ȳ)² = Σ (ŷ_t − ȳ)² + Σ ê_t²  (the cross-product term drops out)
SST = SSR + SSE

Total Variation in y_t
SST = total sum of squares. SST measures the variation of y_t around ȳ:
SST = Σ_{t=1..T} (y_t − ȳ)²

Explained Variation in y_t
SSR = regression sum of squares. Fitted values: ŷ_t = b1 + b2 x_t.
SSR measures the variation of ŷ_t around ȳ:
SSR = Σ_{t=1..T} (ŷ_t − ȳ)²

Unexplained Variation in y_t
SSE = error sum of squares. SSE measures the variation of y_t around ŷ_t:
SSE = Σ_{t=1..T} (y_t − ŷ_t)² = Σ ê_t²

Table 6.1 Analysis of Variance Table
Source of Variation   DF      Sum of Squares   Mean Square
Explained             1       SSR              SSR/1
Unexplained           T − 2   SSE              SSE/(T − 2)  [= σ̂²]
Total                 T − 1   SST

Coefficient of Determination
0 ≤ R² ≤ 1
What proportion of the variation in y_t is explained?
R² = SSR / SST

Coefficient of Determination
SST = SSR + SSE. Dividing by SST:
1 = SSR/SST + SSE/SST
R² = SSR/SST = 1 − SSE/SST

R² is only a descriptive measure. R² does not measure the quality of the regression model. Focusing solely on maximizing R² is not a good idea.

Correlation Analysis
Population: ρ = cov(X,Y) / √( var(X) var(Y) )
Sample:     r = côv(X,Y) / √( v̂ar(X) v̂ar(Y) )

Correlation Analysis
v̂ar(X) = Σ (x_t − x̄)² / (T − 1)
v̂ar(Y) = Σ (y_t − ȳ)² / (T − 1)
côv(X,Y) = Σ (x_t − x̄)(y_t − ȳ) / (T − 1)

Sample Correlation Coefficient
r = Σ (x_t − x̄)(y_t − ȳ) / √( Σ (x_t − x̄)² Σ (y_t − ȳ)² )

Correlation Analysis and R²
For simple linear regression analysis, r² = R². R² is also the squared correlation between y_t and ŷ_t, measuring "goodness of fit".

Regression Computer Output
Table 6.2 Computer-Generated Least Squares Results (typical computer output of regression estimates)
Variable    Parameter Estimate   Standard Error   T for H0: Parameter = 0   Prob > |T|
INTERCEPT   40.7676              22.1387          1.841                     0.0734
X           0.1283               0.0305           4.201                     0.0002

Regression Computer Output
b1 = 40.7676, b2 = 0.1283
se(b1) = √v̂ar(b1) = √490.12 = 22.1387
se(b2) = √v̂ar(b2) = √0.0009326 = 0.0305
t = b1 / se(b1) = 40.7676 / 22.1387 = 1.84
t = b2 / se(b2) = 0.1283 / 0.0305 = 4.20

Regression Computer Output
Table 6.3 Analysis of Variance Table (sources of variation in the dependent variable)
Source        DF   Sum of Squares   Mean Square
Explained     1    25221.2229       25221.2229
Unexplained   38   54311.3314       1429.2455
Total         39   79532.5544
R-square: 0.3171

Regression Computer Output
SSE = Σ ê_t² = 54311
SST = Σ (y_t − ȳ)² = 79532
SSR = Σ (ŷ_t − ȳ)² = 25221
SSE / (T − 2) = σ̂² = 1429.2455
R² = SSR/SST = 1 − SSE/SST = 0.317

Reporting Regression Results
Reporting standard errors: ŷ_t = 40.7676 + 0.1283 x_t   (s.e.) (22.1387) (0.0305)
Reporting t-statistics:    ŷ_t = 40.7676 + 0.1283 x_t   (t) (1.84) (4.20)
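A minimal sketch of the SST = SSR + SSE decomposition and of R²; the data here are made up, not the sample behind Tables 6.2 and 6.3:

```python
import numpy as np

# Sketch of the variance decomposition SST = SSR + SSE and R^2 = SSR/SST.
x = np.array([1., 3., 4., 6., 8., 9., 11., 14.])
y = np.array([2., 5., 6., 9., 10., 13., 15., 18.])

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()
y_hat = b1 + b2 * x

SST = np.sum((y - y.mean()) ** 2)
SSR = np.sum((y_hat - y.mean()) ** 2)
SSE = np.sum((y - y_hat) ** 2)
print("SST =", SST, "  SSR + SSE =", SSR + SSE)       # equal up to rounding
print("R^2 =", SSR / SST, "= 1 - SSE/SST =", 1 - SSE / SST)
print("squared correlation of y and y_hat:", np.corrcoef(y, y_hat)[0, 1] ** 2)
```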
Reporting Regression Results
R² = 0.317. This R² value may seem low, but it is typical in studies involving cross-sectional data analyzed at the individual or micro level. A considerably higher R² value would be expected in studies involving time-series data analyzed at an aggregate or macro level.

Effects of Scaling the Data: changing the scale of x
y_t = β1 + β2 x_t + e_t
y_t = β1 + (c β2)(x_t / c) + e_t
y_t = β1 + β2* x_t* + e_t, where β2* = c β2 and x_t* = x_t / c.
The estimated coefficient and standard error change, but the other statistics are unchanged.

Effects of Scaling the Data: changing the scale of y
y_t = β1 + β2 x_t + e_t
y_t / c = (β1 / c) + (β2 / c) x_t + (e_t / c)
y_t* = β1* + β2* x_t + e_t*, where y_t* = y_t / c, β1* = β1 / c, β2* = β2 / c, e_t* = e_t / c.
All statistics are changed except for the t-statistics and the R² value.

Effects of Scaling the Data: changing the scale of x and y (by the same factor c)
y_t = β1 + β2 x_t + e_t
y_t / c = (β1 / c) + β2 (x_t / c) + (e_t / c)
y_t* = β1* + β2 x_t* + e_t*, where y_t* = y_t / c, β1* = β1 / c, x_t* = x_t / c, e_t* = e_t / c.
No change in R², in the t-statistics, or in the regression results for β2, but all other statistics change.

Functional Forms
The term "linear" in a simple regression model does not mean a linear relationship between the variables, but a model in which the parameters enter the model in a linear way.

Linear vs. Nonlinear
Linear statistical models:
y_t = β1 + β2 x_t + e_t
ln(y_t) = β1 + β2 x_t + e_t
y_t = β1 + β2 ln(x_t) + e_t
y_t = β1 + β2 x_t² + e_t
Nonlinear statistical models:
y_t = β1 + β2 x_t^(β3) + e_t
y_t = β1 + β2 x_t + exp(β3 x_t) + e_t
y_t = (β1 + β2 x_t + e_t)^(β3)

[Figure: a nonlinear relationship between food expenditure (y) and income (x).]

Useful Functional Forms
1. Linear  2. Reciprocal  3. Log-Log  4. Log-Linear  5. Linear-Log  6. Log-Inverse
Look at each form and its slope and elasticity.

Linear: y_t = β1 + β2 x_t + e_t; slope: β2; elasticity: β2 x_t / y_t
Reciprocal: y_t = β1 + β2 (1/x_t) + e_t; slope: −β2 / x_t²; elasticity: −β2 / (x_t y_t)
Log-Log: ln(y_t) = β1 + β2 ln(x_t) + e_t; slope: β2 y_t / x_t; elasticity: β2
Log-Linear: ln(y_t) = β1 + β2 x_t + e_t; slope: β2 y_t; elasticity: β2 x_t
Linear-Log: y_t = β1 + β2 ln(x_t) + e_t; slope: β2 (1/x_t); elasticity: β2 (1/y_t)
Log-Inverse: ln(y_t) = β1 − β2 (1/x_t) + e_t; slope: β2 y_t / x_t²; elasticity: β2 / x_t

Error Term Properties
1. E(e_t) = 0
2. var(e_t) = σ²
3. cov(e_i, e_j) = 0
4. e_t ~ N(0, σ²)

Economic Models
1. Demand Models  2. Supply Models  3. Production Functions  4. Cost Functions  5. Phillips Curve

1. Demand Models: quantity demanded (y^d) and price (x); constant elasticity:
ln(y_t^d) = β1 + β2 ln(x_t) + e_t
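A minimal sketch of estimating a constant-elasticity (log-log) demand model; the data are simulated and the elasticity value is made up for illustration:

```python
import numpy as np

# Sketch of a constant-elasticity demand model ln(y) = beta1 + beta2*ln(x) + e,
# estimated by OLS on the logged data; beta2 is the price elasticity.
rng = np.random.default_rng(1)
price = rng.uniform(1.0, 10.0, size=200)
true_elasticity = -1.2
quantity = np.exp(4.0 + true_elasticity * np.log(price) + rng.normal(0, 0.2, 200))

ln_q, ln_p = np.log(quantity), np.log(price)
b2 = np.sum((ln_p - ln_p.mean()) * (ln_q - ln_q.mean())) / np.sum((ln_p - ln_p.mean()) ** 2)
b1 = ln_q.mean() - b2 * ln_p.mean()
print("estimated elasticity b2:", b2, " (true value used in simulation:", true_elasticity, ")")
```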
2. Supply Models: quantity supplied (y^s) and price (x); constant elasticity:
ln(y_t^s) = β1 + β2 ln(x_t) + e_t

3. Production Functions: output (y) and input (x); constant elasticity. Cobb-Douglas production function:
ln(y_t) = β1 + β2 ln(x_t) + e_t

4a. Cost Functions: total cost (y) and output (x):
y_t = β1 + β2 x_t² + e_t

4b. Cost Functions: average cost (y/x) and output (x):
y_t / x_t = β1 / x_t + β2 x_t + e_t / x_t

5. Phillips Curve: wage rate (w_t), time (t), and the unemployment rate (u_t):
%Δw_t = (w_t − w_{t−1}) / w_{t−1} = γα + γη (1 / u_t)
Nonlinear in both the variables and the parameters.

Chapter 7: The Multiple Regression Model

Two Explanatory Variables
y_t = β1 + β2 x_t2 + β3 x_t3 + e_t
∂y_t / ∂x_t2 = β2 and ∂y_t / ∂x_t3 = β3: the x_t's affect y_t separately.
But least squares estimation of β2 now depends upon both x_t2 and x_t3.

Correlated Variables
y_t = β1 + β2 x_t2 + β3 x_t3 + e_t, with y_t = output, x_t2 = capital, x_t3 = labor.
Suppose there are always 5 workers per machine. If the number of workers per machine is never varied, it becomes impossible to tell whether the machines or the workers are responsible for changes in output.

The General Model
y_t = β1 + β2 x_t2 + β3 x_t3 + . . . + βK x_tK + e_t
The parameter β1 is the intercept (constant) term. The "variable" attached to β1 is x_t1 = 1. Usually, the number of explanatory variables is said to be K − 1 (ignoring x_t1 = 1), while the number of parameters is K, namely β1, . . ., βK.

Statistical Properties of e_t
1. E(e_t) = 0
2. var(e_t) = σ²
3. cov(e_t, e_s) = 0 for t ≠ s
4. e_t ~ N(0, σ²)

Statistical Properties of y_t
1. E(y_t) = β1 + β2 x_t2 + . . . + βK x_tK
2. var(y_t) = var(e_t) = σ²
3. cov(y_t, y_s) = cov(e_t, e_s) = 0 for t ≠ s
4. y_t ~ N(β1 + β2 x_t2 + . . . + βK x_tK, σ²)

Assumptions
1. y_t = β1 + β2 x_t2 + . . . + βK x_tK + e_t
2. E(y_t) = β1 + β2 x_t2 + . . . + βK x_tK
3. var(y_t) = var(e_t) = σ²
4. cov(y_t, y_s) = cov(e_t, e_s) = 0 for t ≠ s
5. The values of x_tk are not random.
6. y_t ~ N(β1 + β2 x_t2 + . . . + βK x_tK, σ²)

Least Squares Estimation
y_t = β1 + β2 x_t2 + β3 x_t3 + e_t
Define S ≡ S(β1, β2, β3) = Σ_{t=1..T} (y_t − β1 − β2 x_t2 − β3 x_t3)²
and the deviations from the means:
y_t* = y_t − ȳ,  x_t2* = x_t2 − x̄2,  x_t3* = x_t3 − x̄3

Least Squares Estimators
b2 = [ Σ y_t* x_t2* Σ x_t3*² − Σ y_t* x_t3* Σ x_t2* x_t3* ] / [ Σ x_t2*² Σ x_t3*² − (Σ x_t2* x_t3*)² ]
b3 = [ Σ y_t* x_t3* Σ x_t2*² − Σ y_t* x_t2* Σ x_t2* x_t3* ] / [ Σ x_t2*² Σ x_t3*² − (Σ x_t2* x_t3*)² ]
b1 = ȳ − b2 x̄2 − b3 x̄3
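To check the deviation-form formulas above, here is a short sketch on simulated data (the parameter values are arbitrary), compared against a generic least squares solver:

```python
import numpy as np

# Sketch of the two-regressor least squares formulas in deviation form,
# checked against np.linalg.lstsq on made-up data.
rng = np.random.default_rng(2)
T = 100
x2 = rng.normal(10, 2, T)
x3 = rng.normal(5, 1, T)
y = 3.0 + 1.5 * x2 - 2.0 * x3 + rng.normal(0, 1, T)

y_, x2_, x3_ = y - y.mean(), x2 - x2.mean(), x3 - x3.mean()    # deviations from means
den = np.sum(x2_**2) * np.sum(x3_**2) - np.sum(x2_ * x3_) ** 2
b2 = (np.sum(y_ * x2_) * np.sum(x3_**2) - np.sum(y_ * x3_) * np.sum(x2_ * x3_)) / den
b3 = (np.sum(y_ * x3_) * np.sum(x2_**2) - np.sum(y_ * x2_) * np.sum(x2_ * x3_)) / den
b1 = y.mean() - b2 * x2.mean() - b3 * x3.mean()

X = np.column_stack([np.ones(T), x2, x3])
print("deviation formulas:", b1, b2, b3)
print("lstsq check:       ", np.linalg.lstsq(X, y, rcond=None)[0])
```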
Dangers of Extrapolation
Statistical models generally are good only "within the relevant range". This means that extending them to extreme data values outside the range of the original data often leads to poor and sometimes ridiculous results. If height is normally distributed and the normal ranges from minus infinity to plus infinity, pity the man minus three feet tall.

Error Variance Estimation
Unbiased estimator of the error variance: σ̂² = Σ ê_t² / (T − K)
Transform to a chi-square distribution: (T − K) σ̂² / σ² ~ χ²_(T−K)

Gauss-Markov Theorem
Under the assumptions of the multiple regression model, the ordinary least squares estimators have the smallest variance of all linear and unbiased estimators. This means that the least squares estimators are the Best Linear Unbiased Estimators (BLUE).

Variances
y_t = β1 + β2 x_t2 + β3 x_t3 + e_t
var(b2) = σ² / [ (1 − r23²) Σ (x_t2 − x̄2)² ]
var(b3) = σ² / [ (1 − r23²) Σ (x_t3 − x̄3)² ]
where r23 = Σ (x_t2 − x̄2)(x_t3 − x̄3) / √( Σ (x_t2 − x̄2)² Σ (x_t3 − x̄3)² ).
When r23 = 0, these reduce to the simple regression formulas.

Variance Decomposition
The variance of an estimator is smaller when:
1. the error variance, σ², is smaller;
2. the sample size, T, is larger (Σ (x_t2 − x̄2)² grows with T);
3. the variable's values are more spread out;
4. the correlation, r23, is close to zero.

Covariances
y_t = β1 + β2 x_t2 + β3 x_t3 + e_t
cov(b2, b3) = −r23 σ² / [ (1 − r23²) √( Σ (x_t2 − x̄2)² Σ (x_t3 − x̄3)² ) ]
where r23 is defined as above.

Covariance Decomposition
The covariance between any two estimators is larger in absolute value when:
1. the error variance, σ², is larger;
2. the sample size, T, is smaller;
3. the values of the variables are less spread out;
4. the correlation, r23, is high.

Var-Cov Matrix
y_t = β1 + β2 x_t2 + β3 x_t3 + e_t
The least squares estimators b1, b2, and b3 have covariance matrix:
cov(b1, b2, b3) =
[ var(b1)      cov(b1,b2)   cov(b1,b3) ]
[ cov(b1,b2)   var(b2)      cov(b2,b3) ]
[ cov(b1,b3)   cov(b2,b3)   var(b3)    ]

Normality
y_t = β1 + β2 x_t2 + β3 x_t3 + . . . + βK x_tK + e_t
e_t ~ N(0, σ²) implies and is implied by y_t ~ N(β1 + β2 x_t2 + . . . + βK x_tK, σ²).
Since b_k is a linear function of the y_t's:
b_k ~ N(β_k, var(b_k))
z = (b_k − β_k) / √var(b_k) ~ N(0, 1) for k = 1, 2, . . ., K

Student-t
Since the population variance of b_k, var(b_k), is generally unknown, we estimate it with v̂ar(b_k), which uses σ̂² instead of σ².
t = (b_k − β_k) / √v̂ar(b_k) = (b_k − β_k) / se(b_k)
t has a Student-t distribution with df = T − K.

Interval Estimation
P( −t_c ≤ (b_k − β_k)/se(b_k) ≤ t_c ) = 1 − α
where t_c is the critical value for T − K degrees of freedom such that P(t ≥ t_c) = α/2.
P( b_k − t_c se(b_k) ≤ β_k ≤ b_k + t_c se(b_k) ) = 1 − α
Interval endpoints: ( b_k − t_c se(b_k) , b_k + t_c se(b_k) )
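A small sketch of how correlation between the two regressors inflates var(b2) through the 1/(1 − r23²) factor; σ² and the x values here are made up:

```python
import numpy as np

# Sketch illustrating var(b2) = sigma^2 / ((1 - r23^2) * sum((x2 - xbar2)^2)):
# the higher the correlation r23 between the regressors, the larger var(b2).
sigma2 = 4.0
x2 = np.linspace(0, 10, 50)

for rho in (0.0, 0.5, 0.9, 0.99):
    # construct x3 with (approximately) the chosen correlation with x2
    rng = np.random.default_rng(3)
    x3 = rho * (x2 - x2.mean()) / x2.std() + np.sqrt(1 - rho**2) * rng.normal(size=50)
    r23 = np.corrcoef(x2, x3)[0, 1]
    var_b2 = sigma2 / ((1 - r23**2) * np.sum((x2 - x2.mean()) ** 2))
    print(f"r23 = {r23:5.2f}  ->  var(b2) = {var_b2:.5f}")
```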
Chapter 8: Hypothesis Testing and Nonsample Information

Chapter 8 Overview
1. Student-t Tests
2. Goodness-of-Fit
3. F-Tests
4. ANOVA Table
5. Nonsample Information
6. Collinearity
7. Prediction

Student-t Test
y_t = β1 + β2 X_t2 + β3 X_t3 + β4 X_t4 + e_t
Student-t tests can be used to test any linear combination of the regression coefficients, for example:
H0: β1 = 0    H0: β2 + β3 + β4 = 1    H0: 3β2 − 7β3 = 21    H0: β2 − β3 ≤ 5
Every such t-test has exactly T − K degrees of freedom, where K = the number of coefficients estimated (including the intercept).

One-Tail Test
y_t = β1 + β2 X_t2 + β3 X_t3 + β4 X_t4 + e_t
H0: β3 ≤ 0 vs. H1: β3 > 0
t = b3 / se(b3) ~ t_(T−K), df = T − K = T − 4
Reject H0 if t falls in the right-tail rejection region of area α beyond t_c.

Two-Tail Test
y_t = β1 + β2 X_t2 + β3 X_t3 + β4 X_t4 + e_t
H0: β2 = 0 vs. H1: β2 ≠ 0
t = b2 / se(b2) ~ t_(T−K), df = T − K = T − 4
Reject H0 if t falls beyond −t_c or t_c (area α/2 in each tail).

Goodness-of-Fit
0 ≤ R² ≤ 1
Coefficient of determination: R² = SSR / SST = Σ (ŷ_t − ȳ)² / Σ (y_t − ȳ)²

Adjusted R-Squared
Original:  R² = SSR/SST = 1 − SSE/SST
Adjusted:  R̄² = 1 − [ SSE/(T − K) ] / [ SST/(T − 1) ]

Computer Output
Table 8.2 Summary of Least Squares Results
Variable      Coefficient   Std Error   t-value   p-value
constant      104.79        6.48        16.17     0.000
price         −6.642        3.191       −2.081    0.042
advertising   2.984         0.167       17.868    0.000
t = b2 / se(b2) = −6.642 / 3.191 = −2.081

Reporting Your Results
Reporting standard errors: ŷ_t = 104.79 − 6.642 X_t2 + 2.984 X_t3   (s.e.) (6.48) (3.191) (0.167)
Reporting t-statistics:    ŷ_t = 104.79 − 6.642 X_t2 + 2.984 X_t3   (t) (16.17) (−2.081) (17.868)

Single Restriction F-Test
y_t = β1 + β2 X_t2 + β3 X_t3 + β4 X_t4 + e_t
H0: β2 = 0 vs. H1: β2 ≠ 0
df_n = J = 1, df_d = T − K = 49
F = [ (SSE_R − SSE_U) / J ] / [ SSE_U / (T − K) ] = [ (1964.758 − 1805.168) / 1 ] / [ 1805.168 / (52 − 3) ] = 4.33
By definition this is the t-statistic squared: t = −2.081, so F = t² = 4.33.

Multiple Restriction F-Test
y_t = β1 + β2 X_t2 + β3 X_t3 + β4 X_t4 + e_t
H0: β2 = 0 and β4 = 0 vs. H1: H0 not true
df_n = J = 2, df_d = T − K = 49
F = [ (SSE_R − SSE_U) / J ] / [ SSE_U / (T − K) ]
First run the restricted regression by dropping X_t2 and X_t4 to get SSE_R. Next run the unrestricted regression to get SSE_U.

F-Tests
F = [ (SSE_R − SSE_U) / J ] / [ SSE_U / (T − K) ]
F-tests of this type are always right-tailed, even for left-sided or two-sided hypotheses, because any deviation from the null will make the F value bigger (move it rightward). The rejection region is the area α to the right of the critical value F_c.

F-Test of the Entire Equation
y_t = β1 + β2 X_t2 + β3 X_t3 + e_t
H0: β2 = β3 = 0 vs. H1: H0 not true
df_n = J = 2, df_d = T − K = 49
F = [ (13581.35 − 1805.168) / 2 ] / [ 1805.168 / (52 − 3) ] = 159.828
We ignore β1. Why?
F_c = 3.187 at α = 0.05: reject H0.
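A sketch reproducing the two F statistics above directly from the quoted sums of squared errors (T = 52 and K = 3 are taken from the slides' own computation):

```python
from scipy import stats

# Sketch of the F-test computation F = ((SSE_R - SSE_U)/J) / (SSE_U/(T-K)),
# using the SSE values quoted on the slides.
SSE_U, T, K = 1805.168, 52, 3

# single restriction (H0: beta2 = 0): J = 1
SSE_R1, J1 = 1964.758, 1
F1 = ((SSE_R1 - SSE_U) / J1) / (SSE_U / (T - K))
print("single restriction F =", round(F1, 2), " (equals t^2 = (-2.081)^2)")

# all slopes zero (H0: beta2 = beta3 = 0): J = 2
SSE_R2, J2 = 13581.35, 2
F2 = ((SSE_R2 - SSE_U) / J2) / (SSE_U / (T - K))
print("overall F =", round(F2, 2), "  p-value =", stats.f.sf(F2, J2, T - K))
```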
ANOVA Table
Table 8.3 Analysis of Variance Table
Source        DF   Sum of Squares   Mean Square   F-Value
Explained     2    11776.18         5888.09       158.828
Unexplained   49   1805.168         36.84
Total         51   13581.35
p-value: 0.0001
R² = SSR / SST = 11776.18 / 13581.35 = 0.867

Nonsample Information
ln(y_t) = β1 + β2 ln(X_t2) + β3 ln(X_t3) + β4 ln(X_t4) + e_t
A certain production process is known to be Cobb-Douglas with constant returns to scale:
β2 + β3 + β4 = 1, so β4 = 1 − β2 − β3.
Substituting the restriction into the model:
ln(y_t / X_t4) = β1 + β2 ln(X_t2 / X_t4) + β3 ln(X_t3 / X_t4) + e_t
that is, y_t* = β1 + β2 X_t2* + β3 X_t3* + e_t.
Run least squares on the transformed model. Interpret the coefficients the same as in the original model.

Collinear Variables
The term "independent variable" means an explanatory variable is independent of the error term, but not necessarily independent of the other explanatory variables. Since economists typically have no control over the implicit "experimental design", explanatory variables tend to move together, which often makes sorting out their separate influences rather problematic.

Effects of Collinearity
A high degree of collinearity will produce:
1. no least squares output when collinearity is exact;
2. large standard errors and wide confidence intervals;
3. insignificant t-values even with a high R² and a significant F-value;
4. estimates sensitive to the deletion or addition of a few observations or of "insignificant" variables;
5. good "within-sample" (same proportions) but poor "out-of-sample" (different proportions) prediction.

Identifying Collinearity
Evidence of high collinearity includes:
1. a high pairwise correlation between two explanatory variables;
2. a high R-squared when regressing one explanatory variable at a time on each of the remaining explanatory variables;
3. a statistically significant F-value when the t-values are statistically insignificant;
4. an R-squared that doesn't fall by much when dropping any of the explanatory variables.

Mitigating Collinearity
Since high collinearity is not a violation of any least squares assumption, but rather a lack of adequate information in the sample:
1. collect more data with better information;
2. impose economic restrictions as appropriate;
3. impose statistical restrictions when justified;
4. if all else fails, at least point out that the poor model performance might be due to the collinearity problem (or it might not).

Prediction
y_t = β1 + β2 X_t2 + β3 X_t3 + e_t
Given a set of values for the explanatory variables, (1, X_02, X_03), the best linear unbiased predictor of y is:
ŷ_0 = b1 + b2 X_02 + b3 X_03
This predictor is unbiased in the sense that the average value of the forecast error is zero.
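A sketch of imposing the constant-returns restriction by substitution, as described above; the data and true parameter values are simulated purely for illustration:

```python
import numpy as np

# Sketch of imposing the nonsample restriction beta2 + beta3 + beta4 = 1
# (constant returns to scale) by substituting beta4 = 1 - beta2 - beta3 and
# running OLS on the transformed log model.
rng = np.random.default_rng(4)
T = 200
x2, x3, x4 = (rng.uniform(1, 10, T) for _ in range(3))
ln_y = 0.5 + 0.3*np.log(x2) + 0.5*np.log(x3) + 0.2*np.log(x4) + rng.normal(0, 0.05, T)

# transformed variables: ln(y/x4), ln(x2/x4), ln(x3/x4)
y_star = ln_y - np.log(x4)
X_star = np.column_stack([np.ones(T), np.log(x2) - np.log(x4), np.log(x3) - np.log(x4)])
b1, b2, b3 = np.linalg.lstsq(X_star, y_star, rcond=None)[0]
b4 = 1 - b2 - b3                      # recovered from the restriction
print("b2, b3, b4 =", round(b2, 3), round(b3, 3), round(b4, 3))
```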
Chapter 9: Extensions of the Multiple Regression Model

Topics for This Chapter
1. Intercept Dummy Variables
2. Slope Dummy Variables
3. Different Intercepts and Slopes
4. Testing Qualitative Effects
5. Are Two Regressions Equal?
6. Interaction Effects
7. Dummy Dependent Variables

Intercept Dummy Variables
Dummy variables are binary (0, 1).
y_t = β1 + β2 X_t + β3 D_t + e_t
y_t = speed of a car in miles per hour, X_t = age of the car in years, D_t = 1 if the car is red, D_t = 0 otherwise.
Police claim: red cars travel faster. H0: β3 = 0 vs. H1: β3 > 0.

Red cars:   y_t = (β1 + β3) + β2 X_t + e_t
Other cars: y_t = β1 + β2 X_t + e_t
The two groups share the slope β2 but have different intercepts, β1 + β3 and β1.

Slope Dummy Variables
y_t = β1 + β2 X_t + β3 D_t X_t + e_t
y_t = value of portfolio, X_t = years; stock portfolio: D_t = 1, bond portfolio: D_t = 0; β1 = initial investment.
Stocks: y_t = β1 + (β2 + β3) X_t + e_t
Bonds:  y_t = β1 + β2 X_t + e_t

Different Intercepts and Slopes
y_t = β1 + β2 X_t + β3 D_t + β4 D_t X_t + e_t
y_t = harvest weight of corn, X_t = rainfall; "miracle" seed: D_t = 1, regular seed: D_t = 0.
Miracle: y_t = (β1 + β3) + (β2 + β4) X_t + e_t
Regular: y_t = β1 + β2 X_t + e_t

Testing for discrimination in starting wage
y_t = β1 + β2 X_t + β3 D_t + e_t
y_t = wage rate, X_t = years of experience; for men D_t = 1, for women D_t = 0.
Men:   y_t = (β1 + β3) + β2 X_t + e_t
Women: y_t = β1 + β2 X_t + e_t
H0: β3 = 0 vs. H1: β3 > 0.

Different rates of wage increase
y_t = β1 + β5 X_t + β6 D_t X_t + e_t
For men D_t = 1, for women D_t = 0.
Men:   y_t = β1 + (β5 + β6) X_t + e_t
Women: y_t = β1 + β5 X_t + e_t
Men and women have the same starting wage, β1, but their wage rates increase at different rates (difference = β6). β6 > 0 means that men's wage rates are increasing faster than women's wage rates.

An Ineffective Affirmative Action Plan
y_t = β1 + β2 X_t + β3 D_t + β4 D_t X_t + e_t
Men:   y_t = (β1 + β3) + (β2 + β4) X_t + e_t
Women: y_t = β1 + β2 X_t + e_t
Women are given the higher starting wage, β1, while men get the lower starting wage, β1 + β3, since β3 < 0 (women are started at a higher wage). But men get a faster rate of increase in their wages, β2 + β4, which is higher than the rate of increase for women, β2, since β4 > 0.

Testing Qualitative Effects
1. Test for differences in intercept.
2. Test for differences in slope.
3. Test for differences in both intercept and slope.

Y_t = β1 + β2 X_t + β3 D_t + β4 D_t X_t + e_t; men: D_t = 1, women: D_t = 0.
Testing for discrimination in the starting wage (intercept): H0: β3 ≤ 0 vs. H1: β3 > 0, using b3 / √(Est. var(b3)) ~ t_(n−4).
Testing for discrimination in wage increases (slope): H0: β4 ≤ 0 vs. H1: β4 > 0, using b4 / √(Est. var(b4)) ~ t_(n−4).
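A short sketch of fitting the intercept-and-slope dummy specification above on simulated data (the group effects are arbitrary illustrative values), showing how the two group lines are recovered:

```python
import numpy as np

# Sketch of a regression with intercept and slope dummies,
# y = beta1 + beta2*X + beta3*D + beta4*D*X + e, with D = 1 for one group.
rng = np.random.default_rng(5)
T = 300
X = rng.uniform(0, 20, T)                     # e.g. years of experience
D = (rng.uniform(size=T) < 0.5).astype(float)
y = 8 + 0.9*X + 2.0*D + 0.4*D*X + rng.normal(0, 1, T)

Z = np.column_stack([np.ones(T), X, D, D * X])
b = np.linalg.lstsq(Z, y, rcond=None)[0]
print("b1..b4 =", np.round(b, 3))
print("group D=0 line: intercept", round(b[0], 2), "slope", round(b[1], 2))
print("group D=1 line: intercept", round(b[0] + b[2], 2), "slope", round(b[1] + b[3], 2))
```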
Testing both intercept and slope
H0: β3 = β4 = 0 vs. H1: otherwise
SSE_R = Σ_{t=1..T} (y_t − b1 − b2 X_t)²
SSE_U = Σ_{t=1..T} (y_t − b1 − b2 X_t − b3 D_t − b4 D_t X_t)²
[ (SSE_R − SSE_U) / 2 ] / [ SSE_U / (T − 4) ] ~ F_(2, T−4)

Are Two Regressions Equal? (variations of "The Chow Test")
I. Assuming equal variances (pooling):
y_t = β1 + β2 X_t + β3 D_t + β4 D_t X_t + e_t; men: D_t = 1, women: D_t = 0
H0: β3 = β4 = 0 vs. H1: otherwise
y_t = wage rate, X_t = years of experience. This model assumes equal wage rate variance for men and women.

II. Allowing for unequal variances (running three regressions):
Everyone:   y_t = β1 + β2 X_t + e_t, giving SSE_R (forcing men and women to have the same β1, β2)
Men only:   y_tm = δ1 + δ2 X_tm + e_tm, giving SSE_m
Women only: y_tw = γ1 + γ2 X_tw + e_tw, giving SSE_w
(allowing men and women to be different), where SSE_U = SSE_m + SSE_w.
F = [ (SSE_R − SSE_U) / J ] / [ SSE_U / (T − K) ]
J = number of restrictions, K = number of unrestricted coefficients; here J = 2 and K = 4.
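A sketch of the "three regressions" version of the Chow test described above, on simulated men/women samples (all values are made up):

```python
import numpy as np
from scipy import stats

# Sketch of the "are two regressions equal?" F-test built from three
# regressions: pooled (restricted), men only, women only (unrestricted).
def sse(y, X):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return np.sum((y - X @ b) ** 2)

rng = np.random.default_rng(6)
Tm = Tw = 60
Xm, Xw = rng.uniform(0, 15, Tm), rng.uniform(0, 15, Tw)
ym = 10 + 1.2 * Xm + rng.normal(0, 1, Tm)      # "men" sample
yw = 11 + 0.8 * Xw + rng.normal(0, 1, Tw)      # "women" sample

X_all = np.concatenate([Xm, Xw])
y_all = np.concatenate([ym, yw])
SSE_R = sse(y_all, np.column_stack([np.ones(Tm + Tw), X_all]))     # same line forced
SSE_U = (sse(ym, np.column_stack([np.ones(Tm), Xm]))
         + sse(yw, np.column_stack([np.ones(Tw), Xw])))            # separate lines
J, K, T = 2, 4, Tm + Tw
F = ((SSE_R - SSE_U) / J) / (SSE_U / (T - K))
print("F =", round(F, 2), "  p-value =", stats.f.sf(F, J, T - K))
```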
Interaction Variables
1. Interaction Dummies
2. Polynomial Terms (a special case of continuous interaction)
3. Interaction Among Continuous Variables

1. Interaction Dummies
Wage gap between men and women: y_t = wage rate, X_t = experience.
For men M_t = 1, for women M_t = 0; for black B_t = 1, for nonblack B_t = 0.
No interaction (wage gap assumed the same across race):
y_t = β1 + β2 X_t + β3 M_t + β4 B_t + e_t
Interaction (wage gap depends on race):
y_t = β1 + β2 X_t + β3 M_t + β4 B_t + β5 M_t B_t + e_t

2. Polynomial Terms
Polynomial regression: y_t = income, X_t = age.
y_t = β1 + β2 X_t + β3 X_t² + β4 X_t³ + e_t
Linear in the parameters but nonlinear in the variables.
[Figure: income plotted against age from about 20 to 90; people retire at different ages, or not at all.]

Rate at which income changes as we age:
∂y_t / ∂X_t = β2 + 2 β3 X_t + 3 β4 X_t²
The slope changes as X_t changes.

3. Continuous Interaction
y_t = β1 + β2 Z_t + β3 B_t + β4 Z_t B_t + e_t
Exam grade = f(sleep: Z_t, study time: B_t).
Sleep and study time do not act independently: more study time will be more effective when combined with more sleep, and less effective when combined with less sleep.

Continuous interaction
Your mind sorts things out while you sleep (when you have things to sort out).
∂y_t / ∂B_t = β3 + β4 Z_t : your studying is more effective with more sleep.
∂y_t / ∂Z_t = β2 + β4 B_t

If Z_t + B_t = 24 hours, then B_t = 24 − Z_t:
y_t = β1 + β2 Z_t + β3 (24 − Z_t) + β4 Z_t (24 − Z_t) + e_t
    = (β1 + 24 β3) + (β2 − β3 + 24 β4) Z_t − β4 Z_t² + e_t
    = δ1 + δ2 Z_t + δ3 Z_t² + e_t
Sleep needed to maximize your exam grade:
∂y_t / ∂Z_t = δ2 + 2 δ3 Z_t = 0, so Z_t = −δ2 / (2 δ3), where δ2 > 0 and δ3 < 0.

Dummy Dependent Variables
1. Linear Probability Model
2. Probit Model
3. Logit Model

Linear Probability Model
y_i = β1 + β2 X_i2 + β3 X_i3 + β4 X_i4 + e_i
y_i = 1 if the worker quits the job, 0 if the worker does not quit.
X_i2 = total hours of work each week, X_i3 = weekly paycheck, X_i4 = hourly pay (X_i3 divided by X_i2).

Linear Probability Model
ŷ_i = b1 + b2 X_i2 + b3 X_i3 + b4 X_i4
Read the predicted values of y_i off the regression line (e.g. plotting ŷ_i against X_i2, with the observed y_i at 0 and 1).

Problems with the Linear Probability Model:
1. Probability estimates are sometimes less than zero or greater than one.
2. Heteroskedasticity is present, in that the model generates a nonconstant error variance.

Probit Model
Latent variable: z_i = β1 + β2 X_i2 + . . .
Normal probability density function: f(z_i) = (1/√(2π)) exp(−0.5 z_i²)
Normal cumulative probability function: F(z_i) = P[Z ≤ z_i] = ∫ from −∞ to z_i of (1/√(2π)) exp(−0.5 u²) du

Probit Model
Since z_i = β1 + β2 X_i2 + . . ., we can substitute to get:
p_i = P[Z ≤ β1 + β2 X_i2] = F(β1 + β2 X_i2)
(p_i traces an S-shaped curve in X_i2, e.g. total hours of work each week.)

Logit Model
Define p_i, the probability of quitting the job:
p_i = 1 / ( 1 + exp( −(β1 + β2 X_i2 + . . .) ) )
For β2 > 0, p_i approaches 1 as X_i2 → +∞ and approaches 0 as X_i2 → −∞.

Maximum Likelihood
Maximum likelihood estimation (MLE) is used to estimate probit and logit functions. The small-sample properties of MLE are not known, but in large samples MLE is normally distributed, and it is consistent and asymptotically efficient.
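A small sketch contrasting linear and logit probabilities (the coefficients are illustrative, not estimates from any data set), to show why the logit form keeps p_i between 0 and 1:

```python
import numpy as np

# Sketch comparing a linear probability line with the logit transformation
# p = 1 / (1 + exp(-(beta1 + beta2*x))).
beta1, beta2 = -4.0, 0.1             # illustrative coefficients, not estimates
hours = np.array([0., 20., 40., 60., 80., 120.])

p_logit = 1.0 / (1.0 + np.exp(-(beta1 + beta2 * hours)))
p_linear = -0.3 + 0.012 * hours      # a made-up linear probability line
for h, pl, pg in zip(hours, p_linear, p_logit):
    print(f"hours={h:5.0f}  linear p={pl:6.3f}  logit p={pg:5.3f}")
# the linear line can go below 0 or above 1; the logit curve cannot
```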
Chapter 10: Heteroskedasticity

The Nature of Heteroskedasticity
Heteroskedasticity is a systematic pattern in the errors where the variances of the errors are not constant. Ordinary least squares assumes that all observations are equally reliable. For efficiency (accurate estimation and prediction), reweight observations to ensure equal error variance.

Regression Model
y_t = β1 + β2 x_t + e_t
zero mean: E(e_t) = 0
homoskedasticity: var(e_t) = σ²
nonautocorrelation: cov(e_t, e_s) = 0, t ≠ s
heteroskedasticity: var(e_t) = σ_t²

[Figures: scatter of consumption (y_t) against income (x_t). In the homoskedastic case the spread of the points around the regression line is the same at every income level; in the heteroskedastic case the spread grows with income, with rich people showing more variation in consumption than poor people.]

Properties of Least Squares under heteroskedasticity
1. Least squares is still linear and unbiased.
2. Least squares is not efficient.
3. The usual formulas give incorrect standard errors for least squares.
4. Confidence intervals and hypothesis tests based on the usual standard errors are wrong.

y_t = β1 + β2 x_t + e_t, heteroskedasticity: var(e_t) = σ_t²
Incorrect formula for the least squares variance: var(b2) = σ² / Σ (x_t − x̄)²
Correct formula for the least squares variance: var(b2) = Σ σ_t² (x_t − x̄)² / [ Σ (x_t − x̄)² ]²

Hal White's Standard Errors
White's estimator of the least squares variance:
est. var(b2) = Σ ê_t² (x_t − x̄)² / [ Σ (x_t − x̄)² ]²
In large samples, White's standard error (the square root of the estimated variance) is a correct, accurate, consistent measure.

Two Types of Heteroskedasticity
1. Proportional heteroskedasticity (a continuous function of x_t, for example).
2. Partitioned heteroskedasticity (discrete categories or groups).

Proportional Heteroskedasticity
y_t = β1 + β2 x_t + e_t, where E(e_t) = 0, cov(e_t, e_s) = 0 for t ≠ s, and var(e_t) = σ_t² = σ² x_t.
The variance is assumed to be proportional to the value of x_t.

Correcting for proportional heteroskedasticity
variance: σ_t² = σ² x_t; standard deviation: σ_t = σ √x_t (proportional to √x_t).
To correct for heteroskedasticity, divide the model by √x_t:
y_t / √x_t = β1 (1 / √x_t) + β2 (x_t / √x_t) + e_t / √x_t
that is, y_t* = β1 x_t1* + β2 x_t2* + e_t*, where
var(e_t*) = var( e_t / √x_t ) = (1 / x_t) var(e_t) = (1 / x_t) σ² x_t = σ².
e_t is heteroskedastic, but e_t* is homoskedastic.
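A sketch of this transformation (weighted least squares on the rescaled data) using simulated observations whose error standard deviation grows with √x; the parameter values are made up:

```python
import numpy as np

# Sketch of the correction for proportional heteroskedasticity var(e_t) = sigma^2 * x_t:
# divide every term of the model by sqrt(x_t), then run least squares on the
# transformed data, whose error is homoskedastic.
rng = np.random.default_rng(7)
T = 500
x = rng.uniform(1, 20, T)                           # e.g. income
y = 4 + 0.7 * x + rng.normal(0, 1, T) * np.sqrt(x)  # error sd grows with sqrt(x)

w = 1.0 / np.sqrt(x)
y_star = y * w
X_star = np.column_stack([w, x * w])                # transformed intercept and slope columns
b1, b2 = np.linalg.lstsq(X_star, y_star, rcond=None)[0]
print("weighted (transformed) estimates:", round(b1, 3), round(b2, 3))

X = np.column_stack([np.ones(T), x])
print("ordinary least squares estimates:", np.round(np.linalg.lstsq(X, y, rcond=None)[0], 3))
```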

1. Decide which variable is proportional to the