11.1 Introduction to Linear Regression
Often, in practice, one is called upon to solve problems involving sets of variables when it is known that there exists some inherent relationship among the variables. For example, in an industrial situation it may be known that the tar content in the outlet stream in a chemical process is related to the inlet temperature. It may be of interest to develop a method of prediction, that is, a procedure for estimating the tar content for various levels of the inlet temperature from experimental information. Now, of course, it is highly likely that for many example runs in which the inlet temperature is the same, say 130°C, the outlet tar content will not be the same. This is much like what happens when we study several automobiles with the same engine volume. They will not all have the same gas mileage. Houses in the same part of the country that have the same square footage of living space will not all be sold for the same price. Tar content, gas mileage (mpg), and the price of houses (in thousands of dollars) are natural dependent variables, or responses, in these three scenarios. Inlet temperature, engine volume (cubic feet), and square feet of living space are, respectively, natural independent variables, or regressors. A reasonable form of a relationship between the response Y and the regressor x is the linear relationship
Y = β₀ + β₁x,
where, of course, β₀ is the intercept and β₁ is the slope. The relationship is illustrated in Figure 11.1. If the relationship is exact, then it is a deterministic relationship between two scientific variables and there is no random or probabilistic component to it. However, in the examples listed above, as well as in countless other scientific and engineering phenomena, the relationship is not deterministic (i.e., a given x does not always give the same value for Y). As a result, important problems here are probabilistic in nature since the relationship above cannot be viewed as being exact. The concept of regression analysis deals with finding the best relationship between Y and x, quantifying the strength of that relationship, and using methods that allow for prediction of the response values given values of the regressor x.

Figure 11.1: A linear relationship; β₀: intercept; β₁: slope.
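The distinction between a deterministic line and a statistical relationship can be sketched in a short simulation. The intercept, slope, and noise level below are illustrative assumptions, not values from the text: repeated runs at the same inlet temperature yield different tar-content readings, which is exactly why the relationship cannot be treated as exact.

```python
import random

# Hypothetical "true" line Y = beta0 + beta1 * x; coefficients are illustrative.
beta0, beta1 = 3.0, 0.05
random.seed(1)

def observe_tar_content(inlet_temp_c):
    """One experimental run: the deterministic line plus random scatter."""
    return beta0 + beta1 * inlet_temp_c + random.gauss(0.0, 0.5)

# Five runs at the same inlet temperature (130 C) give five different
# responses, clustered around the deterministic value 3.0 + 0.05*130 = 9.5.
runs = [observe_tar_content(130.0) for _ in range(5)]
print(runs)
```

The deterministic part of the model fixes the center of the scatter; the random error term accounts for everything the single regressor does not explain.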
In many applications, there will be more than one regressor (i.e., more than one independent variable that helps to explain Y ). For example, in the case where the response is the price of a house, one would expect the age of the house to contribute to the explanation of the price, so in this case the multiple regression structure might be written
Y = β₀ + β₁x₁ + β₂x₂,

where Y is price, x₁ is square footage, and x₂ is age in years. In the next chapter, we will consider problems with multiple regressors. The resulting analysis is termed multiple regression, while the analysis of the single regressor case is called simple regression. As a second illustration of multiple regression, a chemical engineer may be concerned with the amount of hydrogen lost from samples of a particular metal when the material is placed in storage. In this case, there may be two inputs, storage time x₁ in hours and storage temperature x₂ in degrees centigrade. The response would then be hydrogen loss Y in parts per million.

In this chapter, we deal with the topic of simple linear regression, treating only the case of a single regressor variable in which the relationship between y and x is linear. For the case of more than one regressor variable, the reader is referred to Chapter 12. Denote a random sample of size n by the set {(xᵢ, yᵢ); i = 1, 2, . . . , n}. If additional samples were taken using exactly the same values of x, we should expect the y values to vary. Hence, the value yᵢ in the ordered pair (xᵢ, yᵢ) is a value of some random variable Yᵢ.
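Given such a random sample {(xᵢ, yᵢ)}, fitting the line amounts to estimating β₀ and β₁ from the data. A minimal sketch, using the standard least-squares formulas (developed in the sections that follow) on made-up illustrative data, might look like this:

```python
# A random sample {(x_i, y_i); i = 1, ..., n} -- illustrative data only,
# e.g. storage time in hours (x) and hydrogen loss in ppm (y).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 2.9, 3.2, 4.1, 4.8]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Standard least-squares estimates: slope b1 = S_xy / S_xx, intercept
# b0 = y_bar - b1 * x_bar.
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

print(f"fitted line: y = {b0:.3f} + {b1:.3f} x")
# prints "fitted line: y = 1.440 + 0.660 x"
```

A second sample drawn at the same x values would produce different y values, hence different estimates b₀ and b₁; this sampling variability is what the inference methods of this chapter quantify.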