Simple Linear Regression Case Study
11.11 Simple Linear Regression Case Study
In the manufacture of commercial wood products, it is important to estimate the relationship between the density of a wood product and its stiffness. A relatively new type of particleboard is being considered that can be formed with considerably more ease than the accepted commercial product. It is necessary to know at what density the stiffness is comparable to that of the well-known, well-documented commercial product. A study was done by Terrance E. Conners, Investigation of Certain Mechanical Properties of a Wood-Foam Composite (M.S. Thesis, Department of Forestry and Wildlife Management, University of Massachusetts). Thirty particleboards were produced at densities ranging from roughly 8 to 26 pounds per cubic foot, and the stiffness was measured in pounds per square inch. Table 11.8 shows the data.
It is necessary for the data analyst to focus on an appropriate fit to the data and use inferential methods discussed in this chapter. Hypothesis testing on the slope of the regression, as well as confidence or prediction interval estimation, may well be appropriate. We begin by demonstrating a simple scatter plot of the raw data with a simple linear regression superimposed. Figure 11.23 shows this plot.
[Table 11.8: Density and Stiffness for 30 Particleboards. Columns: Density, x; Stiffness, y.]
[Figure 11.23: Scatter plot of the wood density data.]
[Figure 11.24: Residual plot for the wood density data.]
The simple linear regression fit to the data produced the fitted model
$\hat{y} = -25{,}433.739 + 3884.976x \quad (R^2 = 0.7975),$
and the residuals were computed. Figure 11.24 shows the residuals plotted against the measurements of density. This is hardly an ideal or healthy set of residuals. They do not show a random scatter around a value of zero. In fact, clusters of positive and negative values suggest that a curvilinear trend in the data should be investigated.
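The fit-and-inspect step described above can be sketched in a few lines. The values of Table 11.8 are not reproduced here, so the snippet generates synthetic stand-in data from a curved trend; that data, the seed, and the helper names `fit_line` and `r_squared` are assumptions made for illustration, not the study's measurements or method.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def r_squared(y, fitted):
    """Coefficient of determination, 1 - SSE/SST."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst


# Synthetic stand-in data (NOT Table 11.8): 30 densities spread over 8..26
# with a stiffness response that curves upward.
rng = random.Random(0)
density = [8 + 18 * i / 29 for i in range(30)]
stiffness = [math.exp(8.0 + 0.12 * d + rng.gauss(0, 0.15)) for d in density]

b0, b1 = fit_line(density, stiffness)
fitted = [b0 + b1 * d for d in density]
residuals = [yi - fi for yi, fi in zip(stiffness, fitted)]
r2 = r_squared(stiffness, fitted)

# Signs of the residuals in order of increasing density.
signs = "".join("+" if r > 0 else "-" for r in residuals)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, R^2 = {r2:.4f}")
print(signs)
```

Long runs of like signs in the printed string, rather than a random mix, would be the numerical counterpart of the clustered residuals that suggest a curvilinear trend.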
To check the normal error assumption, a normal probability plot of the residuals was generated. This is the type of plot discussed in Section 8.8, in which the horizontal axis represents the empirical normal distribution function on a scale that produces a straight-line plot when plotted against the residuals. Figure 11.25 shows the normal probability plot of the residuals. The normal probability plot does not reflect the straight-line appearance that one would like to see. This is another symptom of a faulty, perhaps overly simplistic choice of a regression model.
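As a rough sketch of how the points of such a normal probability plot can be computed, each ordered residual is paired with a standard normal quantile. The plotting position (i - 3/8)/(n + 1/4) is one common convention and is an assumption here, as are the helper names and the made-up residual values used in the demonstration.

```python
from statistics import NormalDist


def normal_plot_points(residuals):
    """Pair each ordered residual with a standard normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Assumed plotting positions: f_i = (i - 3/8) / (n + 1/4), i = 1..n
    quantiles = [
        std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)
    ]
    return list(zip(quantiles, ordered))


def straightness(points):
    """Pearson correlation of the plotted points; near 1 means nearly straight."""
    n = len(points)
    q_bar = sum(q for q, _ in points) / n
    r_bar = sum(r for _, r in points) / n
    sqq = sum((q - q_bar) ** 2 for q, _ in points)
    srr = sum((r - r_bar) ** 2 for _, r in points)
    sqr = sum((q - q_bar) * (r - r_bar) for q, r in points)
    return sqr / (sqq * srr) ** 0.5


# Hypothetical residuals, invented for illustration only.
example_residuals = [-4100.0, -2500.0, -900.0, -300.0, 200.0, 650.0, 1800.0, 9400.0]
points = normal_plot_points(example_residuals)
for q, r in points:
    print(f"{q:8.3f} {r:10.1f}")
print(f"correlation = {straightness(points):.3f}")
```

The correlation is only a convenient numerical summary of straightness; the visual judgment of the plot described in the text remains the primary diagnostic.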
[Figure 11.25: Normal probability plot of residuals for wood density data. Axes: Residual Quantile, Standard Normal Quantile.]
Both types of residual plots and, indeed, the scatter plot itself suggest here that a somewhat more complicated model would be appropriate. One possible approach is to use a natural log transformation. In other words, one might choose to regress ln y against x. This produces the regression
$\widehat{\ln y} = 8.257 + 0.125x \quad (R^2 = 0.9016).$
To gain some insight into whether the transformed model is more appropriate, consider Figures 11.26 and 11.27, which reveal plots of the residuals in stiffness [i.e., $y_i - \operatorname{antilog}(\widehat{\ln y})$] against density. Figure 11.26 appears to be closer to a random pattern around zero, while Figure 11.27 is certainly closer to a straight line. This, in addition to the higher $R^2$ value, would suggest that the transformed model is more appropriate.
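To make the back-transformation concrete, the sketch below evaluates both fitted models from the text and computes a residual in stiffness units as the observed value minus the antilog of the fitted ln y. The coefficients are those quoted above; the function names and the three densities chosen for the demonstration (the approximate ends of the studied range and an arbitrary interior value) are assumptions for illustration.

```python
import math

# Coefficients quoted in the text for the two fitted models.
LIN_B0, LIN_B1 = -25433.739, 3884.976  # stiffness regressed on density
LOG_B0, LOG_B1 = 8.257, 0.125          # ln(stiffness) regressed on density


def predict_linear(x):
    """Predicted stiffness from the straight-line model."""
    return LIN_B0 + LIN_B1 * x


def predict_log_model(x):
    """Predicted stiffness from the transformed model: antilog of fitted ln y."""
    return math.exp(LOG_B0 + LOG_B1 * x)


def stiffness_residual(x, y):
    """Residual in original units: y - antilog(fitted ln y)."""
    return y - predict_log_model(x)


# Illustrative densities only; these are not rows of Table 11.8.
for density in (8, 15, 26):
    print(
        f"x = {density:2d}  linear: {predict_linear(density):12.1f}  "
        f"log model: {predict_log_model(density):12.1f}"
    )
```

Because the transformed model is linear in ln y, its predictions in stiffness units grow exponentially with density, which is how it can accommodate the curvature that the straight-line fit leaves in its residuals.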