

10.15 Generalized Least Squares
These steps describe weighted least squares:
1. Decide which variable is proportional to the heteroskedasticity (x_t in the previous example).
2. Divide all terms in the original model by the square root of that variable (divide by √x_t).
3. Run least squares on the transformed model, which has new y_t*, x_t1* and x_t2* variables but no intercept.

10.16 Partitioned Heteroskedasticity
y_t = β1 + β2 x_t + e_t,  t = 1, ..., 100
y_t = bushels per acre of corn
x_t = gallons of water per acre (rain or other)
error variance of “field” corn:  var(e_t) = σ1²,  t = 1, ..., 80
error variance of “sweet” corn:  var(e_t) = σ2²,  t = 81, ..., 100

10.17 Reweighting Each Group’s Observations
“field” corn (t = 1, ..., 80):  y_t = β1 + β2 x_t + e_t,  var(e_t) = σ1²
divide through by σ1:  y_t/σ1 = β1 (1/σ1) + β2 (x_t/σ1) + e_t/σ1
“sweet” corn (t = 81, ..., 100):  y_t = β1 + β2 x_t + e_t,  var(e_t) = σ2²
divide through by σ2:  y_t/σ2 = β1 (1/σ2) + β2 (x_t/σ2) + e_t/σ2

10.18 Apply Generalized Least Squares
Run least squares separately on the data for each group.
σ̂1² provides an estimator of σ1² using the 80 observations on “field” corn.
σ̂2² provides an estimator of σ2² using the 20 observations on “sweet” corn.
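The transformation in steps 1–3 can be carried out directly. Below is a minimal NumPy sketch (not from the slides; the function name and arguments are illustrative), assuming the error variance is proportional to x_t and that x_t > 0:

```python
import numpy as np

def wls_proportional(y, x):
    """Weighted least squares when var(e_t) is proportional to x_t (x_t > 0).

    Divide every term of y_t = b1 + b2*x_t + e_t by sqrt(x_t), then run
    ordinary least squares on the transformed variables (no intercept).
    """
    w = np.sqrt(x)
    y_star = y / w                              # transformed dependent variable
    X_star = np.column_stack([1.0 / w, x / w])  # transformed intercept and slope columns
    beta, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
    return beta        # beta[0] estimates b1, beta[1] estimates b2
```

For the partitioned case of slides 10.16–10.18, the analogous step is to divide each group’s observations by that group’s estimated error standard deviation before pooling.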

10.19 Detecting Heteroskedasticity
Determine the existence and nature of heteroskedasticity:
1. Residual plots provide information on the exact nature of the heteroskedasticity (partitioned or proportional) to aid in correcting for it.
2. The Goldfeld-Quandt test checks for the presence of heteroskedasticity.

10.20 Residual Plots
[scatter plot of the residuals e_t against x_t]
Plot the residuals against one variable at a time, after sorting the data by that variable, to try to find a heteroskedastic pattern in the data.

10.21 Goldfeld-Quandt Test
The Goldfeld-Quandt test can be used to detect heteroskedasticity in either the proportional case or for comparing two groups in the discrete case. For proportional heteroskedasticity, it is first necessary to determine which variable (such as x_t) is proportional to the error variance. Then sort the data from the largest to the smallest values of that variable.

10.22 Goldfeld-Quandt Test Statistic
H0: σ1² = σ2²   vs.   H1: σ1² > σ2²
GQ = σ̂1²/σ̂2²  ~  F[T1−K1, T2−K2]   (use the F table)
In the proportional case, drop the middle r observations, where r ≈ T/6, then run separate least squares regressions on the first T1 observations and on the last T2 observations. Small values of GQ support H0, while large values support H1.

10.23 More General Model
The structure of the heteroskedasticity could be more complicated:
σ_t² = σ² exp{α1 z_t1 + α2 z_t2}
z_t1 and z_t2 are any observable variables upon which we believe the variance could depend.
Note: the function exp{·} ensures that σ_t² is positive.

10.24 More General Model (continued)
ln σ_t² = ln σ² + α1 z_t1 + α2 z_t2
ln σ_t² = α + α1 z_t1 + α2 z_t2,  where α = ln σ²
H0: α1 = 0 and α2 = 0   vs.   H1: α1 ≠ 0 and/or α2 ≠ 0
Using the least squares residuals ê_t, estimate ln ê_t² = α + α1 z_t1 + α2 z_t2 + ν_t and apply the usual F test.

11.1 Autocorrelation (Chapter 11)
Copyright © 1997 John Wiley & Sons, Inc. All rights reserved.

11.2 The Nature of Autocorrelation
For efficiency (accurate estimation/prediction), all systematic information needs to be incorporated into the regression model. Autocorrelation is a systematic pattern in the errors that can be either attracting (positive) or repelling (negative) autocorrelation.

11.3 [plots of residuals e_t against time t]
Positive autocorrelation: the residuals cross the zero line too seldom (attracting).
No autocorrelation: the residuals cross the zero line randomly.
Negative autocorrelation: the residuals cross the zero line too often (repelling).

11.4 Regression Model
y_t = β1 + β2 x_t + e_t
zero mean:  E(e_t) = 0
homoskedasticity:  var(e_t) = σ²
nonautocorrelation:  cov(e_t, e_s) = 0,  t ≠ s
autocorrelation:  cov(e_t, e_s) ≠ 0,  t ≠ s
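As a concrete illustration of the Goldfeld-Quandt procedure on slides 10.21–10.22, here is a minimal NumPy/SciPy sketch (not from the slides; the names and the default drop fraction are illustrative), assuming y is a vector, X is a design matrix that already contains a constant column, and sort_by is the variable suspected to drive the error variance:

```python
import numpy as np
from scipy import stats

def goldfeld_quandt(y, X, sort_by, drop_frac=1/6):
    """Sort by the suspect variable (largest to smallest), drop the middle
    r ~ T/6 observations, fit OLS on each remaining group and compare variances."""
    order = np.argsort(sort_by)[::-1]          # largest to smallest
    y, X = y[order], X[order]
    T, K = X.shape
    r = int(round(T * drop_frac))
    T1 = (T - r) // 2
    T2 = T - r - T1

    def sigma2_hat(ys, Xs):
        b, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
        resid = ys - Xs @ b
        return resid @ resid / (len(ys) - K)

    gq = sigma2_hat(y[:T1], X[:T1]) / sigma2_hat(y[-T2:], X[-T2:])
    p_value = 1.0 - stats.f.cdf(gq, T1 - K, T2 - K)   # large GQ supports H1
    return gq, p_value
```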
11.5 Order of Autocorrelation
y_t = β1 + β2 x_t + e_t
1st order:  e_t = ρ e_{t-1} + ν_t
2nd order:  e_t = ρ1 e_{t-1} + ρ2 e_{t-2} + ν_t
3rd order:  e_t = ρ1 e_{t-1} + ρ2 e_{t-2} + ρ3 e_{t-3} + ν_t
We will assume first order autocorrelation, AR(1):  e_t = ρ e_{t-1} + ν_t

11.6 First Order Autocorrelation
y_t = β1 + β2 x_t + e_t,  e_t = ρ e_{t-1} + ν_t,  where −1 < ρ < 1
E(ν_t) = 0,  var(ν_t) = σ_ν²,  cov(ν_t, ν_s) = 0 for t ≠ s
These assumptions about ν_t imply the following about e_t:
E(e_t) = 0
var(e_t) = σ_e² = σ_ν²/(1 − ρ²)
cov(e_t, e_{t-k}) = σ_e² ρ^k  for k > 0
corr(e_t, e_{t-k}) = ρ^k  for k > 0

11.7 Autocorrelation creates some problems for least squares:
1. The least squares estimator is still linear and unbiased, but it is not efficient.
2. The formulas normally used to compute the least squares standard errors are no longer correct, and confidence intervals and hypothesis tests that use them will be wrong.

11.8 Generalized Least Squares
AR(1):  y_t = β1 + β2 x_t + e_t,  e_t = ρ e_{t-1} + ν_t
Substitute in for e_t:  y_t = β1 + β2 x_t + ρ e_{t-1} + ν_t
Now we need to get rid of e_{t-1} (continued).

11.9 Lag the errors once:
e_t = y_t − β1 − β2 x_t
e_{t-1} = y_{t-1} − β1 − β2 x_{t-1}
y_t = β1 + β2 x_t + ρ(y_{t-1} − β1 − β2 x_{t-1}) + ν_t  (continued)

11.10
y_t = β1 + β2 x_t + ρ y_{t-1} − ρβ1 − ρβ2 x_{t-1} + ν_t
y_t − ρ y_{t-1} = β1(1 − ρ) + β2(x_t − ρ x_{t-1}) + ν_t
y_t* = β1* + β2 x_t2* + ν_t
where  y_t* = y_t − ρ y_{t-1},  β1* = β1(1 − ρ),  x_t2* = x_t − ρ x_{t-1}

11.11 Problems estimating this model with least squares:
1. One observation is used up in creating the transformed (lagged) variables, leaving only T − 1 observations for estimating the model.
2. The value of ρ is not known. We must find some way to estimate it.

11.12 Recovering the 1st Observation
Dropping the 1st observation and applying least squares is not the best linear unbiased estimation method. Efficiency is lost because the variance of the error associated with the 1st observation is not equal to that of the other errors. This is a special case of the heteroskedasticity problem, except that here all errors are assumed to have equal variance except the 1st error.

11.13 Recovering the 1st Observation
The 1st observation should fit the original model as:  y_1 = β1 + β2 x_1 + e_1,  with error variance  var(e_1) = σ_e² = σ_ν²/(1 − ρ²).
Note: the other observations all have error variance σ_ν².
We could include this as the 1st observation for our estimation procedure, but we must first transform it so that it has the same error variance as the other observations.

11.14
Given any constant c:  var(c e_1) = c² var(e_1).
If c = √(1 − ρ²), then  var(√(1 − ρ²) e_1) = (1 − ρ²) var(e_1) = (1 − ρ²) σ_e² = (1 − ρ²) σ_ν²/(1 − ρ²) = σ_ν².
The transformation ν_1 = √(1 − ρ²) e_1 therefore has variance σ_ν².

11.15
y_1 = β1 + β2 x_1 + e_1, and the transformed error ν_1 = √(1 − ρ²) e_1 has variance σ_ν².
Multiply through by √(1 − ρ²) to get:
√(1 − ρ²) y_1 = β1 √(1 − ρ²) + β2 √(1 − ρ²) x_1 + √(1 − ρ²) e_1
This transformed first observation may now be added to the other T − 1 observations to obtain the fully restored set of T observations.
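Slides 11.10–11.15 describe the full GLS transformation for AR(1) errors, including the rescaled first observation. Below is a minimal NumPy sketch of that transformation (not from the slides; the function name is illustrative), assuming ρ is known or has already been estimated, that −1 < ρ < 1, and that y and x are one-dimensional arrays:

```python
import numpy as np

def ar1_gls_transform_and_fit(y, x, rho):
    """Transform y_t = b1 + b2*x_t + e_t with AR(1) errors so the transformed
    errors are uncorrelated with constant variance, then run OLS."""
    # quasi-difference observations t = 2, ..., T
    y_star = y[1:] - rho * y[:-1]
    c_star = np.full(len(y) - 1, 1.0 - rho)   # transformed intercept column
    x_star = x[1:] - rho * x[:-1]
    # rescale the first observation by sqrt(1 - rho^2) and put it back
    w = np.sqrt(1.0 - rho ** 2)
    y_star = np.concatenate([[w * y[0]], y_star])
    c_star = np.concatenate([[w], c_star])
    x_star = np.concatenate([[w * x[0]], x_star])
    X_star = np.column_stack([c_star, x_star])
    beta, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
    return beta          # beta[0] estimates b1, beta[1] estimates b2
```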
11.16 Estimating the Unknown ρ Value
First, use least squares to estimate the model y_t = β1 + β2 x_t + e_t.
If we had values for the e_t’s, we could estimate e_t = ρ e_{t-1} + ν_t.
The residuals from this estimation are:  ê_t = y_t − b1 − b2 x_t

11.17
Next, estimate the following by least squares:  ê_t = ρ ê_{t-1} + ν_t
The least squares solution is:
ρ̂ = Σ_{t=2..T} ê_t ê_{t-1} / Σ_{t=2..T} ê_{t-1}²

11.18 Durbin-Watson Test
H0: ρ = 0   vs.   H1: ρ ≠ 0, ρ > 0, or ρ < 0
The Durbin-Watson test statistic, d, is:
d = Σ_{t=2..T} (ê_t − ê_{t-1})² / Σ_{t=1..T} ê_t²

11.19 Testing for Autocorrelation
The test statistic d is approximately related to ρ̂ as:  d ≈ 2(1 − ρ̂).
When ρ̂ = 0, the Durbin-Watson statistic is d ≈ 2.
When ρ̂ = 1, the Durbin-Watson statistic is d ≈ 0.
Tables of critical values for d are not always readily available, so it is easier to use the p-value that most computer programs provide for d. Reject H0 if p-value < α, the significance level.

11.20 Prediction with AR(1) Errors
When errors are autocorrelated, the previous period’s error may help us predict next period’s error. The best predictor, ŷ_{T+1}, for next period is:
ŷ_{T+1} = β̃1 + β̃2 x_{T+1} + ρ̂ ẽ_T
where β̃1 and β̃2 are generalized least squares estimates and ẽ_T is given by:
ẽ_T = y_T − β̃1 − β̃2 x_T

11.21
For h periods ahead, the best predictor is:
ŷ_{T+h} = β̃1 + β̃2 x_{T+h} + ρ̂^h ẽ_T
Assuming |ρ̂| < 1, the influence of ρ̂^h ẽ_T diminishes the further we go into the future (the larger h becomes).

12.1 Pooling Time-Series and Cross-Sectional Data (Chapter 12)
Copyright © 1997 John Wiley & Sons, Inc. All rights reserved.

12.2 Pooling Time and Cross Sections
y_it = β1it + β2it x2it + β3it x3it + e_it   for the i-th firm in the t-th time period
If left unrestricted, this model requires different equations for each firm in each time period.

12.3 Seemingly Unrelated Regressions
SUR models impose the restrictions:  β1it = β1i,  β2it = β2i,  β3it = β3i,  so that
y_it = β1i + β2i x2it + β3i x3it + e_it
Each firm gets its own coefficients β1i, β2i and β3i, but those coefficients are constant over time.

12.4 Two-Equation SUR Model
The investment expenditures (INV) of General Electric (G) and Westinghouse (W) may be related to their stock market value (V) and actual capital stock (K) as follows:
INV_Gt = β1G + β2G V_Gt + β3G K_Gt + e_Gt
INV_Wt = β1W + β2W V_Wt + β3W K_Wt + e_Wt
i = G, W;  t = 1, ..., 20
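The estimator of ρ on slide 11.17 and the Durbin-Watson statistic on slide 11.18 are both simple functions of the OLS residuals. A minimal NumPy sketch (the function name is illustrative):

```python
import numpy as np

def rho_hat_and_durbin_watson(residuals):
    """Estimate rho from OLS residuals and compute the Durbin-Watson statistic d."""
    e = np.asarray(residuals, dtype=float)
    rho_hat = np.sum(e[1:] * e[:-1]) / np.sum(e[:-1] ** 2)
    d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)   # d is roughly 2*(1 - rho_hat)
    return rho_hat, d
```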
12.5 Estimating Separate Equations
For now, make the assumption of no correlation between the error terms across equations.
We make the usual error term assumptions:
E(e_Gt) = 0,  var(e_Gt) = σ_G²,  cov(e_Gt, e_Gs) = 0
E(e_Wt) = 0,  var(e_Wt) = σ_W²,  cov(e_Wt, e_Ws) = 0
No correlation across equations:  cov(e_Gt, e_Wt) = 0,  cov(e_Gt, e_Ws) = 0

12.6 Dummy Variable Model
The dummy variable model assumes homoskedasticity:  σ_G² = σ_W².
INV_t = β1G + δ1 D_t + β2G V_t + δ2 D_t V_t + β3G K_t + δ3 D_t K_t + e_t
For Westinghouse observations D_t = 1; otherwise D_t = 0.
β1W = β1G + δ1,  β2W = β2G + δ2,  β3W = β3G + δ3

12.7 Problem with OLS on Each Equation
The first assumption of the Gauss-Markov theorem concerns the model specification. If the model is not fully and correctly specified, the Gauss-Markov properties might not hold. Any correlation of error terms across equations must be part of the model specification.

12.8 Correlated Error Terms
Any correlation between the dependent variables of two or more equations that is not due to their explanatory variables is, by default, due to correlated error terms.

12.9 Which of the following models would be likely to produce positively correlated errors, and which would produce negatively correlated errors?
1. Sales of Pepsi vs. sales of Coke (uncontrolled factor: outdoor temperature).
2. Investments in bonds vs. investments in stocks (uncontrolled factor: computer/appliance sales).
3. Movie admissions vs. golf course admissions (uncontrolled factor: weather conditions).
4. Sales of butter vs. sales of bread (uncontrolled factor: bagels and cream cheese).

12.10 Joint Estimation of the Equations
INV_Gt = β1G + β2G V_Gt + β3G K_Gt + e_Gt
INV_Wt = β1W + β2W V_Wt + β3W K_Wt + e_Wt
cov(e_Gt, e_Wt) = σ_GW

12.11 Seemingly Unrelated Regressions
When the error terms of two or more equations are correlated, efficient estimation requires the use of a Seemingly Unrelated Regressions (SUR) type estimator to take the correlation into account. Be sure to use the SUR procedure in your regression software program to estimate any equations that you believe might have correlated errors.

12.12 Separate vs. Joint Estimation
SUR will give exactly the same results as estimating each equation separately with OLS if either or both of the following two conditions are true:
1. Every equation has exactly the same set of explanatory variables with exactly the same values.
2. There is no correlation between the error terms of any of the equations.

12.13 Test for Correlation
Test the null hypothesis of zero correlation, H0: σ_GW = 0, using
r_GW² = σ̂_GW² / (σ̂_G² σ̂_W²)
λ = T r_GW²  ~  χ²(1)  (asymptotically)

12.14
Start with the residuals ê_Gt and ê_Wt from each equation estimated separately:
σ̂_GW = (1/T) Σ ê_Gt ê_Wt
σ̂_G² = (1/T) Σ ê_Gt²
σ̂_W² = (1/T) Σ ê_Wt²

12.15 Fixed Effects Model
For each i-th cross section in the t-th time period:
y_it = β1it + β2it x2it + β3it x3it + e_it
Fixed effects models impose the restrictions:  β1it = β1i,  β2it = β2,  β3it = β3,  so that
y_it = β1i + β2 x2it + β3 x3it + e_it
Each i-th cross section has its own constant intercept β1i.
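The correlation test on slides 12.13–12.14 uses only the single-equation residuals. A minimal NumPy/SciPy sketch (function and argument names are illustrative), assuming e_g and e_w are residual arrays of the same length T:

```python
import numpy as np
from scipy import stats

def sur_correlation_test(e_g, e_w):
    """LM test of H0: sigma_GW = 0 using residuals from separately estimated equations."""
    T = len(e_g)
    s_gw = np.mean(e_g * e_w)
    s_g2 = np.mean(e_g ** 2)
    s_w2 = np.mean(e_w ** 2)
    lam = T * s_gw ** 2 / (s_g2 * s_w2)          # lambda = T * r_GW^2
    p_value = 1.0 - stats.chi2.cdf(lam, df=1)    # asymptotically chi-square(1)
    return lam, p_value
```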
12.16 Dummy Variable Representation
The fixed effects model is conveniently represented using dummy variables:
y_it = β11 D1i + β12 D2i + β13 D3i + β14 D4i + β2 x2it + β3 x3it + e_it
D1i = 1 if North, 0 if not;  D2i = 1 if East, 0 if not;  D3i = 1 if South, 0 if not;  D4i = 1 if West, 0 if not
y_it = millions of bushels of corn produced
x2it = price of corn in dollars per bushel
x3it = price of soybeans in dollars per bushel
Each cross-sectional unit gets its own intercept, but each cross-sectional intercept is constant over time.

12.17 Test for Equality of Fixed Effects
H0: β11 = β12 = β13 = β14   vs.   H1: H0 not true
The joint null hypothesis may be tested with the F-statistic:
F = [(SSE_R − SSE_U)/J] / [SSE_U/(NT − K)]  ~  F(J, NT − K)
SSE_R is the restricted error sum of squares (one intercept).
SSE_U is the unrestricted error sum of squares (four intercepts).
N is the number of cross-sectional units (N = 4).
K is the number of parameters in the model (K = 6).
J is the number of restrictions being tested (J = N − 1 = 3).
T is the number of time periods.

12.18 Random Effects Model
y_it = β1i + β2 x2it + β3 x3it + e_it,  with  β1i = β1 + µ_i
β1 is the population mean intercept. µ_i is an unobservable random error that accounts for the cross-sectional differences.

12.19 Random Intercept Term
β1i = β1 + µ_i,  where i = 1, ..., N
The µ_i are independent of one another and of e_it, with E(µ_i) = 0 and var(µ_i) = σ_µ².
Consequently, E(β1i) = β1 and var(β1i) = σ_µ².

12.20 Random Effects Model
y_it = β1i + β2 x2it + β3 x3it + e_it
y_it = (β1 + µ_i) + β2 x2it + β3 x3it + e_it
y_it = β1 + β2 x2it + β3 x3it + (µ_i + e_it)
y_it = β1 + β2 x2it + β3 x3it + ν_it

12.21
ν_it = µ_i + e_it
ν_it has zero mean:  E(ν_it) = 0
ν_it is homoskedastic:  var(ν_it) = σ_µ² + σ_e²
The errors from the same firm in different time periods are correlated:  cov(ν_it, ν_is) = σ_µ² for t ≠ s.
The errors from different firms are always uncorrelated:  cov(ν_it, ν_js) = 0 for i ≠ j.

13.1 Simultaneous Equations Models (Chapter 13)
Copyright © 1997 John Wiley & Sons, Inc. All rights reserved.

13.2 Keynesian Macro Model
Assumptions of the simple Keynesian model:
1. Consumption, c, is a function of income, y.
2. Total expenditures = consumption + investment.
3. Investment is assumed independent of income.

13.3 The Structural Equations
consumption is a function of income:  c = β1 + β2 y
income is either consumed or invested:  y = c + i

13.4 The Statistical Model
The consumption equation:  c_t = β1 + β2 y_t + e_t
The income identity:  y_t = c_t + i_t
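The F test on slide 12.17 compares the restricted (single intercept) and unrestricted (one intercept per cross section) sums of squared errors. A minimal SciPy sketch (the function name is illustrative), assuming the two SSE values have already been computed:

```python
from scipy import stats

def fixed_effects_f_test(sse_r, sse_u, N, T, K):
    """F test of H0: all cross-sectional intercepts are equal.

    sse_r: restricted SSE (one intercept); sse_u: unrestricted SSE (N intercepts);
    N, T: numbers of cross sections and time periods; K: parameters in the unrestricted model.
    """
    J = N - 1                                           # number of restrictions
    F = ((sse_r - sse_u) / J) / (sse_u / (N * T - K))
    p_value = 1.0 - stats.f.cdf(F, J, N * T - K)
    return F, p_value
```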
13.5 The Simultaneous Nature of Simultaneous Equations
c_t = β1 + β2 y_t + e_t
y_t = c_t + i_t
[flow diagram: a shock to e_t changes c_t, which changes y_t through the identity, which feeds back into c_t]
Since y_t contains e_t, they are correlated.

13.6 The Failure of Least Squares
The least squares estimators of the parameters in a structural simultaneous equation are biased and inconsistent because of the correlation between the random error and the endogenous variables on the right-hand side of the equation.

13.7 Single vs. Simultaneous Equations
[diagram: in a single equation, y_t and e_t separately determine c_t; in simultaneous equations, c_t, y_t, i_t and e_t are jointly determined]

13.8 Deriving the Reduced Form
c_t = β1 + β2 y_t + e_t  and  y_t = c_t + i_t
c_t = β1 + β2 (c_t + i_t) + e_t
(1 − β2) c_t = β1 + β2 i_t + e_t

13.9 The Reduced Form Equation
(1 − β2) c_t = β1 + β2 i_t + e_t
c_t = β1/(1 − β2) + [β2/(1 − β2)] i_t + [1/(1 − β2)] e_t
c_t = π11 + π21 i_t + ν_t

13.10 Reduced Form Equation
c_t = π11 + π21 i_t + ν_t
where  π11 = β1/(1 − β2),  π21 = β2/(1 − β2),  and  ν_t = [1/(1 − β2)] e_t

13.11
y_t = c_t + i_t,  where c_t = π11 + π21 i_t + ν_t,  so
y_t = π11 + (1 + π21) i_t + ν_t
It is sometimes useful to give this equation its own reduced form parameters:
y_t = π12 + π22 i_t + ν_t

13.12
c_t = π11 + π21 i_t + ν_t
y_t = π12 + π22 i_t + ν_t
Since c_t and y_t are related through the identity y_t = c_t + i_t, the error term ν_t of these two equations is the same, and it is easy to show that:
π12 = π11 = β1/(1 − β2)
π22 = 1 + π21 = 1/(1 − β2)

13.13 Identification
The structural parameters are β1 and β2. The reduced form parameters are π11 and π21. Once the reduced form parameters are estimated, the identification problem is to determine whether the original structural parameters can be expressed uniquely in terms of the reduced form parameters:
β1 = π11/(1 + π21)
β2 = π21/(1 + π21)

13.14 Identification
An equation is exactly identified if its structural (behavioral) parameters can be uniquely expressed in terms of the reduced form parameters. An equation is over-identified if there is more than one solution for expressing its structural (behavioral) parameters in terms of the reduced form parameters. An equation is under-identified if its structural (behavioral) parameters cannot be expressed in terms of the reduced form parameters.

13.15 The Identification Problem
A system of M equations containing M endogenous variables must exclude at least M − 1 variables from a given equation in order for the parameters of that equation to be identified and to be able to be consistently estimated.

13.16 Two Stage Least Squares
y_t1 = β1 + β2 y_t2 + β3 x_t1 + e_t1
y_t2 = α1 + α2 y_t1 + α3 x_t2 + e_t2
Problem: the right-hand endogenous variables y_t2 and y_t1 are correlated with the error terms.

13.17
Problem: the right-hand endogenous variables y_t2 and y_t1 are correlated with the error terms.
Solution: first, derive the reduced form equations by solving the two equations for the two unknowns y_t1 and y_t2:
y_t1 = π11 + π21 x_t1 + π31 x_t2 + ν_t1
y_t2 = π12 + π22 x_t1 + π32 x_t2 + ν_t2
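For the exactly identified consumption equation, the mapping on slide 13.13 can be applied directly to reduced form estimates (indirect least squares). A minimal NumPy sketch (not from the slides; the function name is illustrative), assuming c and i are arrays of consumption and investment observations:

```python
import numpy as np

def indirect_least_squares(c, i):
    """Estimate the reduced form c_t = pi11 + pi21*i_t + v_t by OLS, then
    recover the structural consumption parameters beta1 and beta2."""
    X = np.column_stack([np.ones(len(i)), i])
    (pi11, pi21), *_ = np.linalg.lstsq(X, np.asarray(c, dtype=float), rcond=None)
    beta2 = pi21 / (1.0 + pi21)
    beta1 = pi11 / (1.0 + pi21)
    return beta1, beta2
```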
13.18 2SLS: Stage I
y_t1 = π11 + π21 x_t1 + π31 x_t2 + ν_t1
y_t2 = π12 + π22 x_t1 + π32 x_t2 + ν_t2
Use least squares to get fitted values:
ŷ_t1 = π̂11 + π̂21 x_t1 + π̂31 x_t2,  so that  y_t1 = ŷ_t1 + ν̂_t1
ŷ_t2 = π̂12 + π̂22 x_t1 + π̂32 x_t2,  so that  y_t2 = ŷ_t2 + ν̂_t2

13.19 2SLS: Stage II
Substitute y_t1 = ŷ_t1 + ν̂_t1 and y_t2 = ŷ_t2 + ν̂_t2 into the structural equations:
y_t1 = β1 + β2 (ŷ_t2 + ν̂_t2) + β3 x_t1 + e_t1
y_t2 = α1 + α2 (ŷ_t1 + ν̂_t1) + α3 x_t2 + e_t2

13.20 2SLS: Stage II (continued)
y_t1 = β1 + β2 ŷ_t2 + β3 x_t1 + u_t1,  where  u_t1 = β2 ν̂_t2 + e_t1
y_t2 = α1 + α2 ŷ_t1 + α3 x_t2 + u_t2,  where  u_t2 = α2 ν̂_t1 + e_t2
Run least squares on each of the above equations to get the 2SLS estimates β̃1, β̃2, β̃3, α̃1, α̃2 and α̃3.

14.1 Nonlinear Least Squares (Chapter 14)
Copyright © 1997 John Wiley & Sons, Inc. All rights reserved.

14.2 Review of the Least Squares Principle
A. “Regression” model with only an intercept term:
y_t = α + e_t,  so  e_t = y_t − α
Minimize the sum of squared errors:  SSE = Σ (y_t − α)²
∂SSE/∂α = −2 Σ (y_t − α) = 0
Σ y_t − Tα = 0
This yields an exact analytical solution:  α̂ = (1/T) Σ y_t = ȳ

14.3 Review of Least Squares
B. Regression model without an intercept term:
y_t = β x_t + e_t,  so  e_t = y_t − β x_t
SSE = Σ (y_t − β x_t)²
∂SSE/∂β = −2 Σ x_t (y_t − β x_t) = 0
Σ x_t y_t − β Σ x_t² = 0
This yields an exact analytical solution:  β̂ = Σ x_t y_t / Σ x_t²

14.4 Review of Least Squares
C. Regression model with both an intercept and a slope:
y_t = α + β x_t + e_t
SSE = Σ (y_t − α − β x_t)²
∂SSE/∂α = −2 Σ (y_t − α − β x_t) = 0
∂SSE/∂β = −2 Σ x_t (y_t − α − β x_t) = 0
This yields an exact analytical solution:
β̂ = Σ (x_t − x̄)(y_t − ȳ) / Σ (x_t − x̄)²
α̂ = ȳ − β̂ x̄

14.5 Nonlinear Least Squares
D. Nonlinear regression model:
y_t = x_t^β + e_t
SSE = Σ (y_t − x_t^β)²
∂SSE/∂β = −2 Σ x_t^β ln(x_t) (y_t − x_t^β) = 0
Σ [x_t^β ln(x_t) y_t] − Σ [x_t^{2β} ln(x_t)] = 0
Problem: an exact analytical solution to this does not exist. A numerical search algorithm must be used to find the value of β that satisfies it.

14.6 Find the Minimum of the Nonlinear SSE
[plot of SSE = Σ (y_t − x_t^β)² against β, showing the value of β at which SSE is smallest]

14.7 Conclusion
The least squares principle is still appropriate when the model is nonlinear, but it is harder to find the solution.

14.8 Nonlinear least squares optimization methods: the Gauss-Newton method (optional appendix).
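Because the first-order condition for the model on slide 14.5 has no closed-form solution, β must be found by numerical search. A minimal SciPy sketch (not from the slides; the function name is illustrative), assuming y and x are arrays and x contains positive values:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def nls_power_model(y, x):
    """Numerically minimize SSE(beta) = sum((y_t - x_t**beta)**2)
    for the nonlinear model y_t = x_t**beta + e_t (requires x_t > 0)."""
    sse = lambda beta: np.sum((y - x ** beta) ** 2)
    result = minimize_scalar(sse)    # one-dimensional numerical search over beta
    return result.x                  # the minimizing value of beta
```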
14.9 The Gauss-Newton Algorithm
1. Apply the Taylor series expansion to the nonlinear model around some initial value b_o.
2. Run ordinary least squares (OLS) on the linear part of the Taylor series to get b_m.
3. Perform a Taylor series expansion around the new b_m to get b_{m+1}.
4. Relabel b_{m+1} as b_m and rerun steps 2–4.
5. Stop when b_{m+1} − b_m becomes very small.

14.10 The Gauss-Newton Method
y_t = f(X_t, b) + ε_t,  for t = 1, ..., n.
Do a Taylor series expansion around the vector b = b_o as follows:
f(X_t, b) = f(X_t, b_o) + f′(X_t, b_o)(b − b_o) + (b − b_o)ᵀ f″(X_t, b_o)(b − b_o) + R_t
so that
y_t = f(X_t, b_o) + f′(X_t, b_o)(b − b_o) + ε_t*
where  ε_t* ≡ (b − b_o)ᵀ f″(X_t, b_o)(b − b_o) + R_t + ε_t

14.11
y_t = f(X_t, b_o) + f′(X_t, b_o)(b − b_o) + ε_t*
y_t − f(X_t, b_o) = f′(X_t, b_o) b − f′(X_t, b_o) b_o + ε_t*
y_t − f(X_t, b_o) + f′(X_t, b_o) b_o = f′(X_t, b_o) b + ε_t*
y_t*_o = f′(X_t, b_o) b + ε_t*,  where  y_t*_o ≡ y_t − f(X_t, b_o) + f′(X_t, b_o) b_o
This is linear in b. Gauss-Newton just runs OLS on this transformed (truncated) Taylor series.

14.12
In matrix terms, for t = 1, ..., n:  y*_o = f′(X, b_o) b + ε*
b̂ = [f′(X, b_o)ᵀ f′(X, b_o)]⁻¹ f′(X, b_o)ᵀ y*_o
This is analogous to linear OLS, where y = Xb + ε led to the solution b̂ = (XᵀX)⁻¹ Xᵀy, except that X is replaced with the matrix of first partial derivatives f′(X, b_o) and y is replaced by y*_o (i.e. “y” = y*_o and “X” = f′(X, b_o)).

14.13
Recall that  y*_o ≡ y − f(X, b_o) + f′(X, b_o) b_o.  Now define  y**_o ≡ y − f(X, b_o),
so that  y*_o = y**_o + f′(X, b_o) b_o.
Substitute this into the Gauss-Newton solution  b̂ = [f′(X, b_o)ᵀ f′(X, b_o)]⁻¹ f′(X, b_o)ᵀ y*_o  to get:
b̂ = b_o + [f′(X, b_o)ᵀ f′(X, b_o)]⁻¹ f′(X, b_o)ᵀ y**_o

14.14
Now call this value of b̂ “b_1”:
b_1 = b_o + [f′(X, b_o)ᵀ f′(X, b_o)]⁻¹ f′(X, b_o)ᵀ y**_o
More generally, in going from iteration m to iteration m+1 we obtain the general expression:
b_{m+1} = b_m + [f′(X, b_m)ᵀ f′(X, b_m)]⁻¹ f′(X, b_m)ᵀ y**_m

b_{m+1} = [f′(X, b_m)ᵀ f′(X, b_m)]⁻¹ f′(X, b_m)ᵀ y*_m
b_{m+1} = b_m + [f′(X, b_m)ᵀ f′(X, b_m)]⁻¹ f′(X, b_m)ᵀ y**_m
Thus, the Gauss-Newton nonlinear OLS solution can be expressed in two alternative, but equivalent, forms (a numerical sketch of the iteration follows below):

1. replacement form:  b_{m+1} = [f′(X, b_m)ᵀ f′(X, b_m)]⁻¹ f′(X, b_m)ᵀ y*_m
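Below is a minimal NumPy sketch of the iteration in its updating form, b_{m+1} = b_m + [f′ᵀ f′]⁻¹ f′ᵀ (y − f(X, b_m)). It is not from the slides: the function names, the user-supplied f and fprime callables, and the tolerance are illustrative assumptions.

```python
import numpy as np

def gauss_newton(f, fprime, X, y, b0, tol=1e-8, max_iter=100):
    """Gauss-Newton iteration for nonlinear least squares.

    f(X, b)      -> n-vector of fitted values
    fprime(X, b) -> n-by-k matrix of first partial derivatives (the 'X' of each OLS step)
    """
    b = np.asarray(b0, dtype=float)
    for _ in range(max_iter):
        F = fprime(X, b)                       # Jacobian at the current b_m
        y_star2 = y - f(X, b)                  # y** = y - f(X, b_m)
        step, *_ = np.linalg.lstsq(F, y_star2, rcond=None)
        b = b + step                           # updating form of the solution
        if np.max(np.abs(step)) < tol:         # stop when b_{m+1} - b_m is very small
            break
    return b

# Example for the single-parameter model y_t = x_t**beta + e_t (x_t > 0):
# f      = lambda X, b: X ** b[0]
# fprime = lambda X, b: (X ** b[0] * np.log(X)).reshape(-1, 1)
```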