Simple Linear Regression Case Study

11.11 Simple Linear Regression Case Study

In the manufacture of commercial wood products, it is important to estimate the relationship between the density of a wood product and its stiffness. A relatively new type of particleboard is being considered that can be formed with considerably more ease than the accepted commercial product. It is necessary to know at what density the stiffness is comparable to that of the well-known, well-documented commercial product. A study was done by Terrance E. Conners, Investigation of Certain Mechanical Properties of a Wood-Foam Composite (M.S. Thesis, Department of Forestry and Wildlife Management, University of Massachusetts). Thirty particleboards were produced at densities ranging from roughly 8 to 26 pounds per cubic foot, and the stiffness was measured in pounds per square inch. Table 11.8 shows the data.

It is necessary for the data analyst to focus on an appropriate fit to the data and use inferential methods discussed in this chapter. Hypothesis testing on the slope of the regression, as well as confidence or prediction interval estimation, may well be appropriate. We begin by demonstrating a simple scatter plot of the raw data with a simple linear regression superimposed. Figure 11.23 shows this plot.

The simple linear regression fit to the data produced the fitted model $\hat{y} = -25{,}433.739 + 3884.976x$ ($R^2 = 0.7975$),


Table 11.8: Density and Stiffness for 30 Particleboards (columns: Density, x, in pounds per cubic foot; Stiffness, y, in pounds per square inch)

Figure 11.23: Scatter plot of the wood density data.

Figure 11.24: Residual plot for the wood density data.

and the residuals were computed. Figure 11.24 shows the residuals plotted against the measurements of density. This is hardly an ideal or healthy set of residuals. They do not show a random scatter around a value of zero. In fact, clusters of positive and negative values suggest that a curvilinear trend in the data should be investigated.
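
The fit-then-inspect step described above can be sketched in a few lines. The Table 11.8 measurements are not used here; the `density` and `stiffness` values below are synthetic placeholders drawn from a smooth convex curve, and the helper names `fit_line` and `sign_runs` are illustrative assumptions rather than anything defined in the text. Counting runs of same-signed residuals is just one simple way to put a number on the "clusters of positive and negative values".

```python
import math

def fit_line(x, y):
    """Return (b0, b1, r2) for the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return b0, b1, 1.0 - sse / sst

def sign_runs(residuals):
    """Count runs of consecutive same-signed residuals (data ordered by x)."""
    signs = [r > 0 for r in residuals]
    return 1 + sum(a != b for a, b in zip(signs, signs[1:]))

# SYNTHETIC placeholder data (NOT the Table 11.8 measurements): a smooth
# convex curve over the 8 to 26 lb/ft^3 range, chosen only to show the pattern.
density = [8 + 18 * i / 29 for i in range(30)]
stiffness = [4000.0 * math.exp(0.12 * d) for d in density]

b0, b1, r2 = fit_line(density, stiffness)
res = [y - (b0 + b1 * x) for x, y in zip(density, stiffness)]

print(f"intercept = {b0:.1f}, slope = {b1:.1f}, R^2 = {r2:.4f}")
print("runs of same-signed residuals:", sign_runs(res))
```

A random scatter around zero would typically produce many short runs. On this convex synthetic curve the residuals fall into three long runs (positive, negative, positive), which is the kind of clustering that points toward a curvilinear trend.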

To assess the normal error assumption, a normal probability plot of the residuals was generated. This is the type of plot discussed in Section 8.8, in which the horizontal axis represents the empirical normal distribution function on a scale that produces a straight-line plot when plotted against the residuals. Figure 11.25 shows the normal probability plot of the residuals. The normal probability plot does not reflect the straight-line appearance that one would like to see. This is another symptom of a faulty, perhaps overly simplistic choice of a regression model.
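
As a rough sketch of how the points of such a plot can be computed, the code below pairs each ordered residual with a standard normal quantile. The plotting position (i - 3/8)/(n + 1/4) is one common choice and is an assumption here, as are the helper names. The correlation of the plotted points is added only as a numerical stand-in for judging straightness by eye; it is not a method taken from the text. The residual values are hypothetical and do not come from Figure 11.25.

```python
import math
from statistics import NormalDist

def normal_quantiles(n):
    """Standard normal quantiles at the assumed positions (i - 3/8)/(n + 1/4)."""
    z = NormalDist()  # mean 0, standard deviation 1
    return [z.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def normal_plot_points(residuals):
    """Pair each ordered residual with its standard normal quantile."""
    return list(zip(normal_quantiles(len(residuals)), sorted(residuals)))

def plot_correlation(points):
    """Pearson correlation of the plotted points; near 1 means nearly straight."""
    n = len(points)
    q_bar = sum(q for q, _ in points) / n
    r_bar = sum(r for _, r in points) / n
    sqq = sum((q - q_bar) ** 2 for q, _ in points)
    srr = sum((r - r_bar) ** 2 for _, r in points)
    sqr = sum((q - q_bar) * (r - r_bar) for q, r in points)
    return sqr / math.sqrt(sqq * srr)

# HYPOTHETICAL residuals, built only for illustration:
# one set that is exactly normal-shaped, one that is right-skewed.
z = normal_quantiles(30)
symmetric = [50.0 * q for q in z]
skewed = [50.0 * (math.exp(q) - 1.0) for q in z]

print("symmetric:", round(plot_correlation(normal_plot_points(symmetric)), 4))
print("skewed:   ", round(plot_correlation(normal_plot_points(skewed)), 4))
```

The skewed set gives a visibly lower correlation than the symmetric one, mirroring a plot that bends away from a straight line.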

Figure 11.25: Normal probability plot of residuals for wood density data (axes: Residual Quantile and Standard Normal Quantile).

Both types of residual plots and, indeed, the scatter plot itself suggest here that a somewhat complicated model would be appropriate. One possible approach is to use a natural log transformation. In other words, one might choose to regress $\ln y$ against $x$. This produces the regression

$\widehat{\ln y} = 8.257 + 0.125x$ ($R^2 = 0.9016$).
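
On the original stiffness scale, this fitted log model corresponds to an exponential curve. As a sketch using only the coefficients reported above, and writing $\tilde{y}$ (a symbol introduced here, not in the text) for the back-transformed fit, which is not a bias-corrected estimate of mean stiffness:

```latex
\[
\tilde{y} \;=\; \exp\!\bigl(\widehat{\ln y}\bigr)
        \;=\; e^{8.257 + 0.125x}
        \;=\; e^{8.257}\, e^{0.125x},
\qquad e^{0.125} \approx 1.133 .
\]
```

Read this way, each additional pound per cubic foot of density multiplies the fitted stiffness by about 1.133, roughly a 13% increase, rather than adding a fixed amount as in the straight-line model.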

To gain some insight into whether the transformed model is more appropriate, consider Figures 11.26 and 11.27, which reveal plots of the residuals in stiffness [i.e., $y_i - \operatorname{antilog}(\widehat{\ln y})$] against density. Figure 11.26 appears to be closer to a random pattern around zero, while Figure 11.27 is certainly closer to a straight line. This, in addition to the higher $R^2$-value, would suggest that the transformed model is more appropriate.
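
The comparison just described can be sketched as follows: fit y on x, fit ln y on x, then compute residuals on the original stiffness scale as y_i minus the antilog of the fitted ln y. The data below are again synthetic placeholders (an exponential trend with a small deterministic wiggle), not the Table 11.8 measurements, and `fit_line` is an illustrative helper name.

```python
import math

def fit_line(x, y):
    """Return (b0, b1, r2) for the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return b0, b1, 1.0 - sse / sst

# SYNTHETIC placeholder data (NOT the Table 11.8 measurements).
density = [8 + 18 * i / 29 for i in range(30)]
stiffness = [4000.0 * math.exp(0.12 * d) * (1 + 0.05 * math.sin(2.0 * i))
             for i, d in enumerate(density)]

# Untransformed model: regress y on x.
a0, a1, r2_linear = fit_line(density, stiffness)

# Transformed model: regress ln y on x.
log_stiffness = [math.log(y) for y in stiffness]
b0, b1, r2_log = fit_line(density, log_stiffness)

# Residuals on the original scale: y_i - antilog(fitted ln y_i).
res_back = [y - math.exp(b0 + b1 * x) for x, y in zip(density, stiffness)]

print(f"R^2 (y on x)    = {r2_linear:.4f}")
print(f"R^2 (ln y on x) = {r2_log:.4f}")
print(f"largest |back-transformed residual| = {max(abs(r) for r in res_back):.1f}")
```

Because the two R² values are computed on different response scales, they are not strictly comparable on their own. Examining the back-transformed residuals on the stiffness scale, as the text does, gives a more direct comparison.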