John Wiley & Sons, Inc. Applied Statistics and Probability for Engineers, by Montgomery and Runger. Atatürk University.
Regression & Correlation
STATISTICS and PROBABILITY LECTURE: Linear Regression & Correlation
Prof. Dr. İrfan KAYMAZ, Atatürk University, Engineering Faculty, Department of Mechanical Engineering
Objectives of this lecture
After carefully following this lecture, you should be able to do the following:
1. Use simple linear regression to build empirical models of engineering and scientific data.
2. Understand how the method of least squares is used to estimate the parameters in a linear regression model.
3. Analyze residuals to determine whether the regression model is an adequate fit to the data and whether any underlying assumptions are violated.
4. Use the regression model to predict a future observation and construct an appropriate prediction interval on that observation.
5. Apply the correlation model.
Empirical Models
• Many problems in engineering and science involve exploring the relationships between two or more variables.
• Regression analysis is a statistical technique that is very useful for these types of problems.
• For example, in a chemical process, suppose that the yield of the product is related to the process-operating temperature.
• Regression analysis can be used to build a model to predict yield at a given temperature level.
Figure: Scatter diagram of oxygen purity versus hydrocarbon level from the table.
Based on the scatter diagram, it is probably reasonable to assume that the mean of the random variable Y is related to x by the following straight-line relationship:
$$E(Y \mid x) = \mu_{Y \mid x} = \beta_0 + \beta_1 x$$
where the slope and intercept of the line are called regression coefficients. The simple linear regression model is given by
$$Y = \beta_0 + \beta_1 x + \varepsilon$$
where $\varepsilon$ is the random error term.
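We think of the regression model as an empirical model. Suppose that the mean and variance of $\varepsilon$ are 0 and $\sigma^2$, respectively. Then
$$E(Y \mid x) = E(\beta_0 + \beta_1 x + \varepsilon) = \beta_0 + \beta_1 x + E(\varepsilon) = \beta_0 + \beta_1 x$$
The variance of Y given x is
$$V(Y \mid x) = V(\beta_0 + \beta_1 x + \varepsilon) = V(\beta_0 + \beta_1 x) + V(\varepsilon) = 0 + \sigma^2 = \sigma^2$$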
The true regression model is a line of mean values:
$$\mu_{Y \mid x} = \beta_0 + \beta_1 x$$
• where $\beta_1$ can be interpreted as the change in the mean of Y for a unit change in x.
• Also, the variability of Y at a particular value of x is determined by the error variance, $\sigma^2$.
• This implies there is a distribution of Y-values at each x and that the variance of this distribution is the same at each x.
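The statement that Y has a distribution at each x, with mean on the regression line and the same variance everywhere, can be checked with a small simulation. The sketch below is only an illustration: the parameter values are hypothetical, and the errors are assumed to be normally distributed, which the model above does not require.

```python
import random
import statistics


def simulate_y(x, beta0, beta1, sigma, n, seed=0):
    """Draw n values of Y = beta0 + beta1*x + eps, with eps ~ N(0, sigma^2).

    Normal errors are an assumption made for this illustration only.
    """
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for _ in range(n)]


# Hypothetical parameter values, chosen only for illustration.
beta0, beta1, sigma = 2.0, 0.5, 1.0

for x in (1.0, 4.0):
    ys = simulate_y(x, beta0, beta1, sigma, n=20000)
    print(
        f"x = {x}: sample mean {statistics.fmean(ys):.3f} "
        f"(line gives {beta0 + beta1 * x:.3f}), "
        f"sample variance {statistics.variance(ys):.3f} "
        f"(sigma^2 = {sigma ** 2:.3f})"
    )
```

At both values of x the sample mean should land close to the line and the sample variance close to the same error variance, which is what the constant-variance assumption says.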
Figure: The distribution of Y for a given value of x for the oxygen purity-hydrocarbon data.
Simple Linear Regression
• The case of simple linear regression considers a single regressor or predictor x and a dependent or response variable Y.
• The observation Y at each level of x is a random variable, and its expected value is
$$E(Y \mid x) = \beta_0 + \beta_1 x$$
• We assume that each observation, Y, can be described by the model
$$Y = \beta_0 + \beta_1 x + \varepsilon$$
• Suppose that we have n pairs of observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.
Figure: Deviations of the data from the estimated regression model.
• The method of least squares is used to estimate the parameters $\beta_0$ and $\beta_1$ by minimizing the sum of the squares of the vertical deviations shown in the figure.
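• The n observations in the sample can be expressed as
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, 2, \ldots, n$$
• The sum of the squares of the deviations of the observations from the true regression line is
$$L = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$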
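To make the least squares criterion concrete, the sum of squared vertical deviations can be evaluated for any candidate line. The sketch below uses a small made-up data set (not the lecture's data) and a hypothetical helper name; it only shows that different choices of intercept and slope give different values of the criterion, and that least squares picks the pair with the smallest value.

```python
def sum_squared_deviations(xs, ys, beta0, beta1):
    """Return the sum over i of (y_i - beta0 - beta1 * x_i)^2."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    return sum((y - beta0 - beta1 * x) ** 2 for x, y in zip(xs, ys))


# Made-up data for illustration only (not from the lecture).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

# Two candidate lines: the first is the least squares line for this data,
# the second is an arbitrary alternative.
loss_a = sum_squared_deviations(xs, ys, beta0=2.2, beta1=0.6)
loss_b = sum_squared_deviations(xs, ys, beta0=2.0, beta1=0.5)
print(f"line A: {loss_a:.2f}, line B: {loss_b:.2f}")
```

For this data the first line gives a sum of about 2.40 and the second about 3.75, so the first is the better fit in the least squares sense.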
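Definition
The least squares estimates of the intercept and slope in the simple linear regression model are
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} y_i x_i - \frac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}$$
where $\bar{x}$ and $\bar{y}$ are the sample means of the $x_i$ and $y_i$. The fitted regression line is $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$.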
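Notation
$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$
$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$
With this notation, the slope estimate is $\hat{\beta}_1 = S_{xy} / S_{xx}$.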
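As a minimal sketch of how the least squares estimates are computed in practice, the function below assumes the usual textbook shortcut formulas S_xx = Σx_i² − (Σx_i)²/n and S_xy = Σx_i y_i − (Σx_i)(Σy_i)/n, with slope S_xy/S_xx and intercept ȳ − slope·x̄. The function name and the data are made up for illustration.

```python
def least_squares_fit(xs, ys):
    """Return (beta0_hat, beta1_hat) for the line y = beta0 + beta1 * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")

    sum_x = sum(xs)
    sum_y = sum(ys)
    # Corrected sums of squares and cross products.
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    if s_xx == 0:
        raise ValueError("all x values are equal; the slope is undefined")

    beta1_hat = s_xy / s_xx
    beta0_hat = sum_y / n - beta1_hat * sum_x / n
    return beta0_hat, beta1_hat


# Made-up data for illustration only (not from the lecture).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1 = least_squares_fit(xs, ys)
print(f"fitted line: y_hat = {b0:.2f} + {b1:.2f} x")
```

Here S_xx = 10 and S_xy = 6, so the slope is 0.6 and the intercept is 4 − 0.6·3 = 2.2.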
Example
Adequacy of the Regression Model
Coefficient of Determination ($R^2$)
• The quantity
$$R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_E}{SS_T}$$
is called the coefficient of determination and is often used to judge the adequacy of a regression model. Here $SS_R$ is the regression sum of squares, $SS_E$ the error sum of squares, and $SS_T$ the total sum of squares.
• $0 \le R^2 \le 1$.
• We often refer (loosely) to $R^2$ as the proportion of variability in the data explained or accounted for by the regression model.
• For the oxygen purity regression model, $R^2 = SS_R / SS_T = 152.13 / 173.38 = 0.877$.
• Thus, the model accounts for 87.7% of the variability in the data.
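The arithmetic above, and the general computation of R² from a fitted line, can be sketched as follows. The two sums of squares are the figures quoted in the lecture; the small data set and the function name are made up for illustration. Note that SS_R / SS_T equals 1 − SS_E / SS_T only when the coefficients are the least squares estimates.

```python
def r_squared(xs, ys, beta0_hat, beta1_hat):
    """R^2 = SS_R / SS_T for the fitted line y_hat = beta0_hat + beta1_hat * x."""
    y_bar = sum(ys) / len(ys)
    ss_t = sum((y - y_bar) ** 2 for y in ys)  # total sum of squares
    if ss_t == 0:
        raise ValueError("all y values are equal; R^2 is undefined")
    # Regression sum of squares: spread of the fitted values about y_bar.
    ss_r = sum((beta0_hat + beta1_hat * x - y_bar) ** 2 for x in xs)
    return ss_r / ss_t


# Sums of squares quoted in the lecture for the oxygen purity model.
r2_oxygen = 152.13 / 173.38
print(round(r2_oxygen, 3))  # 0.877

# Made-up data (not from the lecture) with its least squares line
# y_hat = 2.2 + 0.6 x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
r2_toy = r_squared(xs, ys, 2.2, 0.6)
print(round(r2_toy, 3))
```

For the made-up data, SS_R = 3.6 and SS_T = 6, so the line accounts for about 60% of the variability.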
Correlation
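It is possible to draw inferences about the correlation coefficient $\rho$ in this model. The estimator of $\rho$ is the sample correlation coefficient
$$R = \frac{\sum_{i=1}^{n} Y_i (X_i - \bar{X})}{\left[\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2\right]^{1/2}} = \frac{S_{XY}}{(S_{XX}\, SS_T)^{1/2}}$$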
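A minimal sketch of the sample correlation coefficient, assuming the usual definition r = S_xy / sqrt(S_xx · SS_T) with SS_T = Σ(y_i − ȳ)². The function name and the data are made up for illustration.

```python
import math


def sample_correlation(xs, ys):
    """Return r = S_xy / sqrt(S_xx * SS_T) for paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")

    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_t = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s_xx == 0 or ss_t == 0:
        raise ValueError("correlation is undefined when x or y is constant")
    return s_xy / math.sqrt(s_xx * ss_t)


# Made-up data for illustration only (not from the lecture).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

r = sample_correlation(xs, ys)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```

For a simple linear regression fitted by least squares, r² equals the coefficient of determination R² of that fit, and the sign of r matches the sign of the fitted slope.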
It is often useful to test the hypotheses
$$H_0: \rho = 0 \qquad H_1: \rho \neq 0$$
The appropriate test statistic for these hypotheses is
$$T_0 = \frac{R\sqrt{n-2}}{\sqrt{1-R^2}}$$
which follows the t distribution with $n - 2$ degrees of freedom when $H_0$ is true.
Reject $H_0$ if $|t_0| > t_{\alpha/2,\,n-2}$.
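The test can be sketched in a few lines, assuming the usual statistic t0 = r·sqrt(n − 2) / sqrt(1 − r²). The values of r and n below are made up for illustration, and the critical value is taken from a standard t table rather than computed, so the example needs no statistics library.

```python
import math


def correlation_t_statistic(r, n):
    """Return t0 = r * sqrt(n - 2) / sqrt(1 - r^2) for testing H0: rho = 0."""
    if n <= 2:
        raise ValueError("need more than two observations")
    if abs(r) >= 1:
        raise ValueError("the statistic is undefined when |r| = 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def reject_h0(r, n, t_critical):
    """Two-sided decision: reject H0 when |t0| exceeds the table value."""
    return abs(correlation_t_statistic(r, n)) > t_critical


# Made-up values for illustration: r^2 = 0.6 from n = 5 observations.
r, n = math.sqrt(0.6), 5
t0 = correlation_t_statistic(r, n)

# t_{0.025, 3} = 3.182 from a standard t table (alpha = 0.05, n - 2 = 3).
decision = reject_h0(r, n, t_critical=3.182)
print(f"t0 = {t0:.3f}, reject H0: {decision}")
```

Here t0 is about 2.12, which does not exceed 3.182, so with only five observations even a sample correlation of about 0.77 is not significant at the 5% level.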
Find the correlation between pull strength and wire length.
Next Week
Have a good holiday…