15.1 POLYNOMIAL REGRESSION

For this case the sum of the squares of the residuals is

S_r = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i - a_2 x_i^2 \right)^2    (15.2)

To generate the least-squares fit, we take the derivative of Eq. (15.2) with respect to each of the unknown coefficients of the polynomial, as in

\frac{\partial S_r}{\partial a_0} = -2 \sum \left( y_i - a_0 - a_1 x_i - a_2 x_i^2 \right)

\frac{\partial S_r}{\partial a_1} = -2 \sum x_i \left( y_i - a_0 - a_1 x_i - a_2 x_i^2 \right)

\frac{\partial S_r}{\partial a_2} = -2 \sum x_i^2 \left( y_i - a_0 - a_1 x_i - a_2 x_i^2 \right)

These equations can be set equal to zero and rearranged to develop the following set of normal equations:

n a_0 + \left( \sum x_i \right) a_1 + \left( \sum x_i^2 \right) a_2 = \sum y_i

\left( \sum x_i \right) a_0 + \left( \sum x_i^2 \right) a_1 + \left( \sum x_i^3 \right) a_2 = \sum x_i y_i

\left( \sum x_i^2 \right) a_0 + \left( \sum x_i^3 \right) a_1 + \left( \sum x_i^4 \right) a_2 = \sum x_i^2 y_i

FIGURE 15.1 (a) Data that are ill-suited for linear least-squares regression. (b) Indication that a parabola is preferable.

where all summations are from i = 1 through n. Note that the preceding three equations are linear and have three unknowns: a_0, a_1, and a_2. The coefficients of the unknowns can be calculated directly from the observed data. For this case, we see that the problem of determining a least-squares second-order polynomial is equivalent to solving a system of three simultaneous linear equations.

The second-order case can be easily extended to an mth-order polynomial as in

y = a_0 + a_1 x + a_2 x^2 + \cdots + a_m x^m + e

The foregoing analysis can be easily extended to this more general case. Thus, we can recognize that determining the coefficients of an mth-order polynomial is equivalent to solving a system of m + 1 simultaneous linear equations. For this case, the standard error is formulated as

s_{y/x} = \sqrt{\frac{S_r}{n - (m + 1)}}    (15.3)

This quantity is divided by n - (m + 1) because m + 1 data-derived coefficients—a_0, a_1, . . . , a_m—were used to compute S_r; thus, we have lost m + 1 degrees of freedom. In addition to the standard error, a coefficient of determination can also be computed for polynomial regression with Eq. (14.20).

EXAMPLE 15.1 Polynomial Regression

Problem Statement. Fit a second-order polynomial to the data in the first two columns of Table 15.1.

TABLE 15.1 Computations for an error analysis of the quadratic least-squares fit.

x_i      y_i      (y_i - ȳ)^2    (y_i - a_0 - a_1 x_i - a_2 x_i^2)^2
0        2.1        544.44         0.14332
1        7.7        314.47         1.00286
2       13.6        140.03         1.08160
3       27.2          3.12         0.80487
4       40.9        239.22         0.61959
5       61.1       1272.11         0.09434
Σ      152.6       2513.39         3.74657

Solution. The following can be computed from the data:

m = 2          \sum x_i = 15          \sum x_i^4 = 979
n = 6          \sum y_i = 152.6       \sum x_i y_i = 585.6
x̄ = 2.5        \sum x_i^2 = 55        \sum x_i^2 y_i = 2488.8
ȳ = 25.433     \sum x_i^3 = 225

Therefore, the simultaneous linear equations are

\begin{bmatrix} 6 & 15 & 55 \\ 15 & 55 & 225 \\ 55 & 225 & 979 \end{bmatrix}
\begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} =
\begin{Bmatrix} 152.6 \\ 585.6 \\ 2488.8 \end{Bmatrix}

These equations can be solved to evaluate the coefficients. For example, using MATLAB (the right-hand side is entered as a column vector so that the backslash operator can be applied):

N = [6 15 55;15 55 225;55 225 979];
r = [152.6 585.6 2488.8]';
a = N\r

a =
    2.4786
    2.3593
    1.8607

Therefore, the least-squares quadratic equation for this case is

y = 2.4786 + 2.3593x + 1.8607x^2

The standard error of the estimate based on the regression polynomial is [Eq. (15.3)]

s_{y/x} = \sqrt{\frac{3.74657}{6 - (2 + 1)}} = 1.1175

The coefficient of determination is

r^2 = \frac{2513.39 - 3.74657}{2513.39} = 0.99851

and the correlation coefficient is r = 0.99925.

FIGURE 15.2 Fit of a second-order polynomial (least-squares parabola).

These results indicate that 99.851 percent of the original uncertainty has been explained by the model. This result supports the conclusion that the quadratic equation represents an excellent fit, as is also evident from Fig. 15.2.
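More generally, the normal equations for any order m can be assembled directly from the data and solved with the backslash operator. The following MATLAB sketch is not part of the original example; the variable names N, r, and yhat are illustrative. It reproduces the Example 15.1 fit from the raw data of Table 15.1 and then evaluates the standard error of Eq. (15.3) and the coefficient of determination of Eq. (14.20).

% Minimal sketch, assuming the data of Table 15.1: assemble and solve
% the normal equations for an mth-order polynomial least-squares fit.
x = [0 1 2 3 4 5]';                  % independent variable
y = [2.1 7.7 13.6 27.2 40.9 61.1]';  % dependent variable
m = 2;                               % polynomial order
n = length(x);
N = zeros(m+1);                      % normal-equation coefficient matrix
r = zeros(m+1,1);                    % right-hand-side vector
for i = 0:m
    for j = 0:m
        N(i+1,j+1) = sum(x.^(i+j));  % sums of powers of x
    end
    r(i+1) = sum(x.^i .* y);         % sums of x^i * y
end
a = N\r;                             % a(1) = a0, a(2) = a1, a(3) = a2
yhat = polyval(flipud(a), x);        % polyval expects descending powers
Sr = sum((y - yhat).^2);             % sum of squares of residuals
syx = sqrt(Sr/(n - (m + 1)))         % standard error, Eq. (15.3)
r2 = 1 - Sr/sum((y - mean(y)).^2)    % coefficient of determination, Eq. (14.20)

The same coefficients (returned in descending-power order) could also be obtained with MATLAB's built-in polyfit(x,y,m).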

15.2 MULTIPLE LINEAR REGRESSION

Another useful extension of linear regression is the case where y is a linear function of two or more independent variables. For example, y might be a linear function of x_1 and x_2, as in

y = a_0 + a_1 x_1 + a_2 x_2 + e

Such an equation is particularly useful when fitting experimental data where the variable being studied is often a function of two other variables. For this two-dimensional case, the regression "line" becomes a "plane" (Fig. 15.3).

As with the previous cases, the "best" values of the coefficients are determined by formulating the sum of the squares of the residuals:

S_r = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_{1,i} - a_2 x_{2,i} \right)^2    (15.4)

and differentiating with respect to each of the unknown coefficients:

\frac{\partial S_r}{\partial a_0} = -2 \sum \left( y_i - a_0 - a_1 x_{1,i} - a_2 x_{2,i} \right)

\frac{\partial S_r}{\partial a_1} = -2 \sum x_{1,i} \left( y_i - a_0 - a_1 x_{1,i} - a_2 x_{2,i} \right)

\frac{\partial S_r}{\partial a_2} = -2 \sum x_{2,i} \left( y_i - a_0 - a_1 x_{1,i} - a_2 x_{2,i} \right)

FIGURE 15.3 Graphical depiction of multiple linear regression where y is a linear function of x_1 and x_2.

The coefficients yielding the minimum sum of the squares of the residuals are obtained by setting the partial derivatives equal to zero and expressing the result in matrix form as

\begin{bmatrix}
n & \sum x_{1,i} & \sum x_{2,i} \\
\sum x_{1,i} & \sum x_{1,i}^2 & \sum x_{1,i} x_{2,i} \\
\sum x_{2,i} & \sum x_{1,i} x_{2,i} & \sum x_{2,i}^2
\end{bmatrix}
\begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} =
\begin{Bmatrix} \sum y_i \\ \sum x_{1,i} y_i \\ \sum x_{2,i} y_i \end{Bmatrix}    (15.5)

EXAMPLE 15.2 Multiple Linear Regression

Problem Statement. The following data were created from the equation y = 5 + 4x_1 - 3x_2:

x_1     x_2     y
0       0        5
2       1       10
2.5     2        9
1       3        0
4       6        3
7       2       27

Use multiple linear regression to fit these data.

Solution. The summations required to develop Eq. (15.5) are computed in Table 15.2. Substituting them into Eq. (15.5) gives

\begin{bmatrix} 6 & 16.5 & 14 \\ 16.5 & 76.25 & 48 \\ 14 & 48 & 54 \end{bmatrix}
\begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} =
\begin{Bmatrix} 54 \\ 243.5 \\ 100 \end{Bmatrix}    (15.6)

which can be solved for

a_0 = 5     a_1 = 4     a_2 = -3

which is consistent with the original equation from which the data were derived.

The foregoing two-dimensional case can be easily extended to m dimensions, as in

y = a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_m x_m + e

TABLE 15.2 Computations required to develop the normal equations for Example 15.2.

y      x_1     x_2     x_1^2    x_2^2    x_1 x_2    x_1 y    x_2 y
5      0       0        0        0        0          0        0
10     2       1        4        1        2         20       10
9      2.5     2        6.25     4        5         22.5     18
0      1       3        1        9        3          0        0
3      4       6       16       36       24         12       18
27     7       2       49        4       14        189       54
Σ 54   16.5   14       76.25    54       48        243.5    100
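As a complement to the hand computations above, the fit of Example 15.2 can be reproduced numerically. The following MATLAB sketch is not part of the book's example; the variable names A, b, and Z are illustrative. It builds and solves the normal equations of Eq. (15.5) from the raw data and also shows an equivalent least-squares solution obtained by applying the backslash operator directly to the overdetermined system.

% Minimal sketch, assuming the data of Example 15.2.
x1 = [0 2 2.5 1 4 7]';
x2 = [0 1 2 3 6 2]';
y  = [5 10 9 0 3 27]';
n  = length(y);
% Normal equations of Eq. (15.5)
A = [ n        sum(x1)       sum(x2);
      sum(x1)  sum(x1.^2)    sum(x1.*x2);
      sum(x2)  sum(x1.*x2)   sum(x2.^2) ];
b = [ sum(y); sum(x1.*y); sum(x2.*y) ];
a = A\b                               % yields a0 = 5, a1 = 4, a2 = -3
% Equivalent approach: least-squares solution of Z*a = y, where each
% row of Z is [1 x1 x2] evaluated at one data point.
Z = [ones(n,1) x1 x2];
a = Z\y                               % same coefficients

For a rectangular system such as Z*a = y, MATLAB's backslash operator returns the least-squares solution, so both approaches yield the same coefficients.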