JAYSON N. PAYLA

BERNADETTE F. TUBO, Ph.D.

MS Statistics I

SIMPLE LINEAR REGRESSION

6.1 THE MODEL

The simple linear regression model for $n$ observations can be written as

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, 2, \ldots, n. \tag{1}$$

The designation simple indicates that there is only one $x$ to predict the response $y$, and linear means that the model is linear in $\beta_0$ and $\beta_1$. For example, a model such as $y_i = \beta_0 + \beta_1 x_i^2 + \varepsilon_i$ is linear in $\beta_0$ and $\beta_1$, whereas the model $y_i = \beta_0 + e^{\beta_1 x_i} + \varepsilon_i$ is not linear.

In this chapter, we assume that $y_i$ and $\varepsilon_i$ are random variables and that the values of $x_i$ are known constants, which means that the same values $x_1, x_2, \ldots, x_n$ would be used in repeated sampling. The case in which the $x$ variables are random variables is treated in Chapter 10. To complete the model, we make the following additional assumptions:

1. $E(\varepsilon_i) = 0$ for all $i = 1, 2, \ldots, n$, or equivalently, $E(y_i) = \beta_0 + \beta_1 x_i$.
2. $\mathrm{var}(\varepsilon_i) = \sigma^2$ for all $i = 1, 2, \ldots, n$, or equivalently, $\mathrm{var}(y_i) = \sigma^2$.
3. $\mathrm{cov}(\varepsilon_i, \varepsilon_j) = 0$ for all $i \neq j$, or equivalently, $\mathrm{cov}(y_i, y_j) = 0$.

6.2 ESTIMATION OF $\beta_0$, $\beta_1$, AND $\sigma^2$

Using a random sample of $n$ observations $y_1, y_2, \ldots, y_n$ and the accompanying fixed values $x_1, x_2, \ldots, x_n$, we can estimate the parameters $\beta_0$, $\beta_1$, and $\sigma^2$. To obtain the estimates $\hat\beta_0$ and $\hat\beta_1$, we use the method of least squares.
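Before deriving the estimators, it may help to see the model and assumptions 1-3 as a simulation. The Python sketch below is a minimal illustration; the values n = 20, beta0 = 2, beta1 = 0.5, sigma = 1, and the normal error distribution are illustrative assumptions, not part of the model:

import numpy as np

rng = np.random.default_rng(0)

n = 20
x = np.linspace(1, 10, n)            # fixed, known constants: reused in every repeated sample
beta0, beta1, sigma = 2.0, 0.5, 1.0  # illustrative parameter values (assumed)

# One realization of y_i = beta0 + beta1 * x_i + eps_i, with E(eps_i) = 0,
# var(eps_i) = sigma^2, and independent draws so that cov(eps_i, eps_j) = 0.
eps = rng.normal(0.0, sigma, size=n)
y = beta0 + beta1 * x + eps

Normality is stronger than assumptions 1-3 require; any error distribution with mean 0, common variance, and zero covariances would serve.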

In the least-squares approach, we seek estimators $\hat\beta_0$ and $\hat\beta_1$ that minimize the sum of squares of the deviations $y_i - \hat y_i$ of the $n$ observed $y_i$'s from their predicted values $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$:

$$\hat\varepsilon'\hat\varepsilon = \sum_{i=1}^{n} \hat\varepsilon_i^2 = \sum_{i=1}^{n} (y_i - \hat y_i)^2 = \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2 \tag{2}$$

Note that the predicted value $\hat y_i$ estimates $E(y_i)$, not $y_i$; that is, $\hat\beta_0 + \hat\beta_1 x_i$ estimates $\beta_0 + \beta_1 x_i$, not $\beta_0 + \beta_1 x_i + \varepsilon_i$. A better notation would be $\widehat{E(y_i)}$, but $\hat y_i$ is commonly used.

To find the values of $\hat\beta_0$ and $\hat\beta_1$ that minimize $\hat\varepsilon'\hat\varepsilon$ in (2), we differentiate with respect to $\hat\beta_0$ and $\hat\beta_1$ and set the results equal to 0:

$$\frac{\partial\,\hat\varepsilon'\hat\varepsilon}{\partial \hat\beta_0} = -2\sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0 \tag{3}$$

$$\frac{\partial\,\hat\varepsilon'\hat\varepsilon}{\partial \hat\beta_1} = -2\sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)x_i = 0 \tag{4}$$

The solutions to (3) and (4) are given by

$$\hat\beta_1 = \frac{\sum_{i=1}^{n}(x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n}(x_i - \bar x)^2} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar x\bar y}{\sum_{i=1}^{n} x_i^2 - n\bar x^2} \tag{5}$$

$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x. \tag{6}$$
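A direct translation of (5) and (6) into code can serve as a check on the algebra. The sketch below is minimal, assuming x and y are array-like sequences of equal length; the function name least_squares_fit is ours, introduced here for illustration:

import numpy as np

def least_squares_fit(x, y):
    """Return (beta0_hat, beta1_hat) computed via equations (5) and (6)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xbar, ybar = x.mean(), y.mean()
    # Equation (5): centered cross-products over centered sum of squares.
    beta1_hat = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    # Equation (6): the fitted line passes through (xbar, ybar).
    beta0_hat = ybar - beta1_hat * xbar
    return beta0_hat, beta1_hat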

Let us start by noting that $\bar x = \frac{1}{n}\sum_{i=1}^{n} x_i \Rightarrow \sum_{i=1}^{n} x_i = n\bar x$ and, similarly, $\sum_{i=1}^{n} y_i = n\bar y$. Also,

$$\begin{aligned}
\sum_{i=1}^{n}(x_i - \bar x)^2 &= \sum_{i=1}^{n}(x_i^2 - 2x_i\bar x + \bar x^2) \\
&= \sum_{i=1}^{n} x_i^2 - 2\bar x\sum_{i=1}^{n} x_i + n\bar x^2 \\
&= \sum_{i=1}^{n} x_i^2 - 2\bar x \cdot n\bar x + n\bar x^2 \\
&= \sum_{i=1}^{n} x_i^2 - n\bar x^2,
\end{aligned}$$

and, by the same expansion, $\sum_{i=1}^{n}(x_i - \bar x)(y_i - \bar y) = \sum_{i=1}^{n} x_i y_i - n\bar x\bar y$.

Proof: To solve (6).

$$\begin{aligned}
-2\sum_{i=1}^{n}(y_i - \hat\beta_0 - \hat\beta_1 x_i) &= 0 \\
\sum_{i=1}^{n}(y_i - \hat\beta_0 - \hat\beta_1 x_i) &= 0 \\
\sum_{i=1}^{n} y_i - \sum_{i=1}^{n}\hat\beta_0 - \sum_{i=1}^{n}\hat\beta_1 x_i &= 0 \\
n\bar y - n\hat\beta_0 - n\hat\beta_1\bar x &= 0 \\
n(\bar y - \hat\beta_0 - \hat\beta_1\bar x) &= 0 \\
\bar y - \hat\beta_0 - \hat\beta_1\bar x &= 0 \\
\hat\beta_0 &= \bar y - \hat\beta_1\bar x
\end{aligned}$$

Proof: To solve (5).

$$\begin{aligned}
-2\sum_{i=1}^{n}(y_i - \hat\beta_0 - \hat\beta_1 x_i)x_i &= 0 \\
\sum_{i=1}^{n}(x_i y_i - \hat\beta_0 x_i - \hat\beta_1 x_i^2) &= 0 \\
\sum_{i=1}^{n} x_i y_i - \hat\beta_0\sum_{i=1}^{n} x_i - \hat\beta_1\sum_{i=1}^{n} x_i^2 &= 0 \\
\sum_{i=1}^{n} x_i y_i - (\bar y - \hat\beta_1\bar x)n\bar x - \hat\beta_1\sum_{i=1}^{n} x_i^2 &= 0 \qquad \text{(substituting (6) and } \textstyle\sum_i x_i = n\bar x) \\
\sum_{i=1}^{n} x_i y_i - n\bar x\bar y + n\hat\beta_1\bar x^2 - \hat\beta_1\sum_{i=1}^{n} x_i^2 &= 0 \\
\sum_{i=1}^{n} x_i y_i - n\bar x\bar y &= \hat\beta_1\left(\sum_{i=1}^{n} x_i^2 - n\bar x^2\right) \\
\sum_{i=1}^{n}(x_i - \bar x)(y_i - \bar y) &= \hat\beta_1\sum_{i=1}^{n}(x_i - \bar x)^2 \qquad \text{(by the identities above)} \\
\hat\beta_1 &= \frac{\sum_{i=1}^{n}(x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n}(x_i - \bar x)^2}
\end{aligned}$$

To verify that $\hat\beta_0$ and $\hat\beta_1$ in (5) and (6) minimize $\hat\varepsilon'\hat\varepsilon$ in (2), we can examine the second derivatives or simply observe that $\hat\varepsilon'\hat\varepsilon$ has no maximum, so that setting the first derivatives to zero yields a minimum.

Example 6.2 Students in a class (taught by one of the authors) claimed that doing the homework had not helped prepare them for the midterm exam. The exam score $y$ and homework score $x$ (average up to the time of the midterm) for the 18 students in the class were as follows:

[Table of the 18 students' exam scores $y$ and homework scores $x$.]

Using (5) and (6), we obtain

$$\hat\beta_1 = \frac{\sum_{i=1}^{n} x_i y_i - n\bar x\bar y}{\sum_{i=1}^{n} x_i^2 - n\bar x^2} = \frac{81{,}195 - 18(58.056)(61.389)}{80{,}199 - 18(58.056)^2} = 0.8726$$

$$\hat\beta_0 = \bar y - \hat\beta_1\bar x = 61.389 - 0.8726(58.056) = 10.73$$

The prediction equation is thus given by

$$\hat y = 10.73 + 0.8726x$$
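The arithmetic can be verified from the summary statistics alone; a quick sketch using the totals from the computation above:

# Check of Example 6.2 from n, sum(x_i * y_i), sum(x_i^2), xbar, ybar.
n = 18
sum_xy = 81_195.0
sum_x2 = 80_199.0
xbar, ybar = 58.056, 61.389

beta1_hat = (sum_xy - n * xbar * ybar) / (sum_x2 - n * xbar ** 2)
beta0_hat = ybar - beta1_hat * xbar
print(round(beta1_hat, 4), round(beta0_hat, 2))
# ~0.8727 and 10.73; any tiny discrepancy from 0.8726 comes from the rounded means.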

This equation and the 18 points are plotted in Figure 6.1. It is readily apparent in the plot that the slope $\hat\beta_1$ is the rate of change of $\hat y$ as $x$ varies and that the intercept $\hat\beta_0$ is the value of $\hat y$ at $x = 0$. Note that the three assumptions were not used in deriving the least-squares estimators $\hat\beta_1$ and $\hat\beta_0$ in (5) and (6). It is not necessary that $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$ be based on $E(y_i) = \beta_0 + \beta_1 x_i$; that is, $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$ can be fit to a set of data for which $E(y_i) \neq \beta_0 + \beta_1 x_i$. This is illustrated in Figure 6.2, where a straight line has been fitted to curved data.

However, if the three assumptions hold, then the least-squares estimators $\hat\beta_0$ and $\hat\beta_1$ are unbiased and have minimum variance among all linear unbiased estimators. Using the three assumptions, we obtain the following means and variances of $\hat\beta_0$ and $\hat\beta_1$:

$$E(\hat\beta_1) = \beta_1 \tag{7}$$

$$E(\hat\beta_0) = \beta_0 \tag{8}$$

$$\mathrm{var}(\hat\beta_1) = \frac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar x)^2} \tag{9}$$

$$\mathrm{var}(\hat\beta_0) = \sigma^2\left[\frac{1}{n} + \frac{\bar x^2}{\sum_{i=1}^{n}(x_i - \bar x)^2}\right] \tag{10}$$
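Formulas (9) and (10) translate directly into code. The sketch below is illustrative only, since it takes the true $\sigma^2$ as known; in practice $\sigma^2$ is replaced by the estimate $s^2$ introduced in (11) below:

import numpy as np

def lse_variances(x, sigma2):
    """Return (var_beta0, var_beta1) per equations (10) and (9)."""
    x = np.asarray(x, dtype=float)
    sxx = np.sum((x - x.mean()) ** 2)  # sum of squared deviations of the x_i
    var_beta1 = sigma2 / sxx                                   # equation (9)
    var_beta0 = sigma2 * (1.0 / x.size + x.mean() ** 2 / sxx)  # equation (10)
    return var_beta0, var_beta1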

Proof: To show the unbiasedness of the slope estimator $\hat\beta_1$ in (7).

$$\begin{aligned}
\hat\beta_1 &= \frac{\sum_{i=1}^{n}(x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n}(x_i - \bar x)^2} = \frac{\sum_{i=1}^{n}(x_i - \bar x)y_i}{\sum_{i=1}^{n}(x_i - \bar x)^2} \qquad \text{(since } \textstyle\sum_i (x_i - \bar x)\bar y = 0) \\
&= \frac{\sum_{i=1}^{n}(x_i - \bar x)(\beta_0 + \beta_1 x_i + \varepsilon_i)}{\sum_{i=1}^{n}(x_i - \bar x)^2} \\
&= \frac{1}{\sum_{i=1}^{n}(x_i - \bar x)^2}\left[\beta_0\sum_{i=1}^{n}(x_i - \bar x) + \beta_1\sum_{i=1}^{n}(x_i - \bar x)x_i + \sum_{i=1}^{n}(x_i - \bar x)\varepsilon_i\right] \\
&= \frac{1}{\sum_{i=1}^{n}(x_i - \bar x)^2}\left[0 + \beta_1\sum_{i=1}^{n}(x_i - \bar x)^2 + \sum_{i=1}^{n}(x_i - \bar x)\varepsilon_i\right] \\
&= \beta_1 + \frac{\sum_{i=1}^{n}(x_i - \bar x)\varepsilon_i}{\sum_{i=1}^{n}(x_i - \bar x)^2}
\end{aligned}$$

Taking expectations,

$$E(\hat\beta_1) = \beta_1 + \frac{\sum_{i=1}^{n}(x_i - \bar x)E(\varepsilon_i)}{\sum_{i=1}^{n}(x_i - \bar x)^2} = \beta_1,$$

since $E(\varepsilon_i) = 0$ for all $i$.
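Unbiasedness can also be seen by simulation. A minimal Monte Carlo sketch (all numeric values, and the normal errors, are illustrative assumptions) averages $\hat\beta_1$ over repeated samples drawn with the same fixed $x$:

import numpy as np

rng = np.random.default_rng(1)
n, reps = 20, 10_000
x = np.linspace(1, 10, n)            # fixed across repeated samples
beta0, beta1, sigma = 2.0, 0.5, 1.0  # illustrative true values (assumed)

xbar = x.mean()
sxx = np.sum((x - xbar) ** 2)

b1_hats = np.empty(reps)
for r in range(reps):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=n)
    b1_hats[r] = np.sum((x - xbar) * (y - y.mean())) / sxx  # equation (5)

print(b1_hats.mean())                   # close to beta1 = 0.5, as (7) predicts
print(b1_hats.var(), sigma ** 2 / sxx)  # empirical variance close to (9)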


Proof: To show the unbiasedness of the intercept estimator $\hat\beta_0$ in (8). Note that $\bar y = \beta_0 + \beta_1\bar x + \bar\varepsilon$, where $\bar\varepsilon = \frac{1}{n}\sum_{i=1}^{n}\varepsilon_i$ is the mean of the true errors, not of the residuals (so $\bar\varepsilon \neq \bar{\hat\varepsilon}$), and $E[\bar\varepsilon] = \frac{1}{n}\sum_{i=1}^{n} E[\varepsilon_i] = 0$. Then

$$\begin{aligned}
\hat\beta_0 &= \bar y - \hat\beta_1\bar x = \beta_0 + \beta_1\bar x + \bar\varepsilon - \hat\beta_1\bar x = \beta_0 + \bar x(\beta_1 - \hat\beta_1) + \bar\varepsilon \\
E[\hat\beta_0] &= \beta_0 + \bar x(\beta_1 - E[\hat\beta_1]) + E[\bar\varepsilon] = \beta_0 + \bar x(\beta_1 - \beta_1) + 0 = \beta_0.
\end{aligned}$$

Proof: To show (9). Using $\hat\beta_1 = \frac{\sum_{i=1}^{n}(x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n}(x_i - \bar x)^2} = \frac{\sum_{i=1}^{n}(x_i - \bar x)y_i}{\sum_{i=1}^{n}(x_i - \bar x)^2}$ and assuming $\mathrm{var}(y_i) = \sigma^2$ and $\mathrm{cov}(y_i, y_j) = 0$, so that the variance of the sum is the sum of the variances, we have

$$\begin{aligned}
\mathrm{var}(\hat\beta_1) &= \mathrm{var}\left[\frac{\sum_{i=1}^{n}(x_i - \bar x)y_i}{\sum_{i=1}^{n}(x_i - \bar x)^2}\right] \\
&= \frac{\sum_{i=1}^{n}(x_i - \bar x)^2\,\mathrm{var}(y_i)}{\left[\sum_{i=1}^{n}(x_i - \bar x)^2\right]^2} \\
&= \frac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar x)^2}
\end{aligned}$$


Proof: To show (10). $\hat\beta_0$ can be written in the form

$$\hat\beta_0 = \bar y - \hat\beta_1\bar x = \sum_{i=1}^{n}\frac{y_i}{n} - \bar x\,\frac{\sum_{i=1}^{n}(x_i - \bar x)y_i}{\sum_{i=1}^{n}(x_i - \bar x)^2} = \sum_{i=1}^{n}\left[\frac{1}{n} - \frac{\bar x(x_i - \bar x)}{\sum_{j=1}^{n}(x_j - \bar x)^2}\right]y_i.$$

Then

$$\begin{aligned}
\mathrm{var}(\hat\beta_0) &= \sum_{i=1}^{n}\left[\frac{1}{n} - \frac{\bar x(x_i - \bar x)}{\sum_{j=1}^{n}(x_j - \bar x)^2}\right]^2\mathrm{var}(y_i) \\
&= \sigma^2\sum_{i=1}^{n}\left[\frac{1}{n^2} - \frac{2\bar x(x_i - \bar x)}{n\sum_{j=1}^{n}(x_j - \bar x)^2} + \frac{\bar x^2(x_i - \bar x)^2}{\left[\sum_{j=1}^{n}(x_j - \bar x)^2\right]^2}\right] \\
&= \sigma^2\left[\frac{n}{n^2} - \frac{2\bar x\sum_{i=1}^{n}(x_i - \bar x)}{n\sum_{i=1}^{n}(x_i - \bar x)^2} + \frac{\bar x^2\sum_{i=1}^{n}(x_i - \bar x)^2}{\left[\sum_{i=1}^{n}(x_i - \bar x)^2\right]^2}\right] \\
&= \sigma^2\left[\frac{1}{n} - 0 + \frac{\bar x^2}{\sum_{i=1}^{n}(x_i - \bar x)^2}\right] \qquad \text{(since } \textstyle\sum_i (x_i - \bar x) = 0) \\
&= \sigma^2\left[\frac{1}{n} + \frac{\bar x^2}{\sum_{i=1}^{n}(x_i - \bar x)^2}\right]
\end{aligned}$$

The method of least squares does not yield an estimator of $\mathrm{var}(y_i) = \sigma^2$; minimization of $\hat\varepsilon'\hat\varepsilon$ yields only $\hat\beta_0$ and $\hat\beta_1$. To estimate $\sigma^2$, we use the definition $\sigma^2 = E[y_i - E(y_i)]^2$. By assumption 2, $\sigma^2$ is the same for each $y_i$, $i = 1, 2, \ldots, n$. Using $\hat y_i$ as an estimator of $E(y_i)$, we estimate $\sigma^2$ by an average from the sample, that is,

$$s^2 = \frac{\sum_{i=1}^{n}(y_i - \hat y_i)^2}{n-2} = \frac{\sum_{i=1}^{n}(y_i - \hat\beta_0 - \hat\beta_1 x_i)^2}{n-2} = \frac{SSE}{n-2} \tag{11}$$

where $\hat\beta_0$ and $\hat\beta_1$ are given by (5) and (6) and $SSE = \sum_{i=1}^{n}(y_i - \hat y_i)^2$. The deviation $\hat\varepsilon_i = y_i - \hat y_i$ is often called the residual of $y_i$, and SSE is called the residual sum of squares or error sum of squares. With $n-2$ in the denominator, $s^2$ is an unbiased estimator of $\sigma^2$:

$$E(s^2) = \frac{E(SSE)}{n-2} = \frac{(n-2)\sigma^2}{n-2} = \sigma^2 \tag{12}$$
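Equation (11) in code, as a minimal sketch (the helper assumes the coefficient estimates have already been computed, e.g. by the least_squares_fit sketch above):

import numpy as np

def sigma2_hat(x, y, beta0_hat, beta1_hat):
    """Unbiased estimate of sigma^2 per equation (11): SSE / (n - 2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = y - (beta0_hat + beta1_hat * x)  # eps_hat_i = y_i - y_hat_i
    sse = np.sum(residuals ** 2)                 # residual (error) sum of squares
    return sse / (y.size - 2)                    # n - 2 in the denominator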

Proof: First a lemma. For any $\varepsilon_1, \ldots, \varepsilon_n$ with common mean $\mu$, common variance $\sigma^2$, and zero covariances (assumptions 1-3 suffice),

$$\begin{aligned}
E\left[\sum_{i=1}^{n}(\varepsilon_i - \bar\varepsilon)^2\right] &= E\sum_{i=1}^{n}[(\varepsilon_i - \mu) + (\mu - \bar\varepsilon)]^2 = E\sum_{i=1}^{n}[(\varepsilon_i - \mu) - (\bar\varepsilon - \mu)]^2 \\
&= E\left[\sum_{i=1}^{n}(\varepsilon_i - \mu)^2 - n(\bar\varepsilon - \mu)^2\right] \qquad \text{(expanding, with } \textstyle\sum_i(\varepsilon_i - \mu) = n(\bar\varepsilon - \mu)) \\
&= \sum_{i=1}^{n}\mathrm{var}(\varepsilon_i) - n\,\mathrm{var}(\bar\varepsilon) = n\sigma^2 - \frac{n\sigma^2}{n} = (n-1)\sigma^2.
\end{aligned}$$

Then, using the expression for SSE in (13) below,

$$\begin{aligned}
E(s^2) &= \frac{E(SSE)}{n-2} \\
&= \frac{1}{n-2}E\left[\sum_{i=1}^{n}(y_i - \bar y)^2 - \frac{\left[\sum_{i=1}^{n}(x_i - \bar x)(y_i - \bar y)\right]^2}{\sum_{i=1}^{n}(x_i - \bar x)^2}\right] \\
&= \frac{1}{n-2}E\left[\sum_{i=1}^{n}(y_i - \bar y)^2 - \hat\beta_1^2\sum_{i=1}^{n}(x_i - \bar x)^2\right] \\
&= \frac{1}{n-2}\left[E\sum_{i=1}^{n}(y_i - \bar y)^2 - E(\hat\beta_1^2)\sum_{i=1}^{n}(x_i - \bar x)^2\right].
\end{aligned}$$

For the second term, $E(\hat\beta_1^2) = \mathrm{var}(\hat\beta_1) + [E(\hat\beta_1)]^2 = \frac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar x)^2} + \beta_1^2$, so that

$$E(\hat\beta_1^2)\sum_{i=1}^{n}(x_i - \bar x)^2 = \sigma^2 + \beta_1^2\sum_{i=1}^{n}(x_i - \bar x)^2.$$

For the first term, $y_i - \bar y = \beta_1(x_i - \bar x) + (\varepsilon_i - \bar\varepsilon)$, so

$$\begin{aligned}
E\sum_{i=1}^{n}(y_i - \bar y)^2 &= E\sum_{i=1}^{n}\left[\beta_1^2(x_i - \bar x)^2 + 2\beta_1(x_i - \bar x)(\varepsilon_i - \bar\varepsilon) + (\varepsilon_i - \bar\varepsilon)^2\right] \\
&= \beta_1^2\sum_{i=1}^{n}(x_i - \bar x)^2 + 0 + (n-1)\sigma^2,
\end{aligned}$$

using $E(\varepsilon_i - \bar\varepsilon) = 0$ and the lemma above. Therefore

$$E(s^2) = \frac{1}{n-2}\left[\beta_1^2\sum_{i=1}^{n}(x_i - \bar x)^2 + (n-1)\sigma^2 - \sigma^2 - \beta_1^2\sum_{i=1}^{n}(x_i - \bar x)^2\right] = \frac{(n-2)\sigma^2}{n-2} = \sigma^2.$$


Intuitively, we divide by $n-2$ in (11) instead of $n-1$ as in $s^2 = \frac{\sum_{i=1}^{n}(y_i - \bar y)^2}{n-1}$, because $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$ has two estimated parameters and should thereby be a better estimator of $E(y_i)$ than $\bar y$. Thus we expect $SSE = \sum_{i=1}^{n}(y_i - \hat y_i)^2$ to be less than $\sum_{i=1}^{n}(y_i - \bar y)^2$. In fact, using (5) and (6), we can write the numerator of (11) in the form

$$SSE = \sum_{i=1}^{n}(y_i - \hat y_i)^2 = \sum_{i=1}^{n}(y_i - \bar y)^2 - \frac{\left[\sum_{i=1}^{n}(x_i - \bar x)(y_i - \bar y)\right]^2}{\sum_{i=1}^{n}(x_i - \bar x)^2} \tag{13}$$

which shows that $\sum_{i=1}^{n}(y_i - \hat y_i)^2$ is indeed smaller than $\sum_{i=1}^{n}(y_i - \bar y)^2$.

Proof: To show that $SSE = \sum_{i=1}^{n}(y_i - \hat y_i)^2$ in (11) can be expressed in the form given in (13).

$$\begin{aligned}
SSE &= \sum_{i=1}^{n}(y_i - \hat y_i)^2 \\
&= \sum_{i=1}^{n}(y_i - \hat\beta_0 - \hat\beta_1 x_i)^2 \\
&= \sum_{i=1}^{n}(y_i - \bar y + \hat\beta_1\bar x - \hat\beta_1 x_i)^2 \qquad \text{(substituting (6) for } \hat\beta_0) \\
&= \sum_{i=1}^{n}\left[(y_i - \bar y) - \hat\beta_1(x_i - \bar x)\right]^2 \\
&= \sum_{i=1}^{n}(y_i - \bar y)^2 - 2\hat\beta_1\sum_{i=1}^{n}(y_i - \bar y)(x_i - \bar x) + \hat\beta_1^2\sum_{i=1}^{n}(x_i - \bar x)^2 \\
&= \sum_{i=1}^{n}(y_i - \bar y)^2 - 2\frac{\left[\sum_{i=1}^{n}(x_i - \bar x)(y_i - \bar y)\right]^2}{\sum_{i=1}^{n}(x_i - \bar x)^2} + \frac{\left[\sum_{i=1}^{n}(x_i - \bar x)(y_i - \bar y)\right]^2}{\sum_{i=1}^{n}(x_i - \bar x)^2} \qquad \text{(substituting (5) for } \hat\beta_1) \\
&= \sum_{i=1}^{n}(y_i - \bar y)^2 - \frac{\left[\sum_{i=1}^{n}(x_i - \bar x)(y_i - \bar y)\right]^2}{\sum_{i=1}^{n}(x_i - \bar x)^2}
\end{aligned}$$
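Identity (13) is also easy to confirm numerically. The sketch below (with arbitrary illustrative data) computes SSE both ways and checks that they agree:

import numpy as np

rng = np.random.default_rng(2)
n = 25
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=n)  # illustrative data

xbar, ybar = x.mean(), y.mean()
sxx = np.sum((x - xbar) ** 2)
sxy = np.sum((x - xbar) * (y - ybar))
beta1_hat = sxy / sxx                # equation (5)
beta0_hat = ybar - beta1_hat * xbar  # equation (6)

sse_direct = np.sum((y - beta0_hat - beta1_hat * x) ** 2)  # left side of (13)
sse_shortcut = np.sum((y - ybar) ** 2) - sxy ** 2 / sxx    # right side of (13)
print(np.isclose(sse_direct, sse_shortcut))  # True: the two forms of SSE agree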
