Linear regression
STAT-S-301 Exercise session 2
TA: Elise Petit ([email protected])
October 13th, 2016
Linear regression model with a single regressor

Linear regression attempts to model the relationship between two variables by fitting a linear equation to a sample of n observations (i = 1, ..., n):

Y_i = \beta_0 + \beta_1 X_i + u_i

where:
- Y_i = dependent variable (regressand)
- X_i = independent variable (regressor)
- β_0 + β_1 X_i = population regression line/function, with two coefficients/parameters: β_0, the intercept of the population regression line, and β_1 = \frac{\Delta Y_i}{\Delta X_i}, the slope, i.e. the change in Y_i associated with a unit change in X_i
- u_i = error term, the difference between Y_i and its predicted value from the population regression line; this term contains all the other factors besides X that determine the value of Y for a specific observation i.

The main goal will be to predict the value of Y for a given value of X.
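To make the role of each term concrete, here is a minimal Python sketch that simulates a sample from this model; the parameter values and the error distribution are illustrative assumptions, not part of the course material.

    import numpy as np

    # Hypothetical population parameters, chosen only for illustration
    beta0, beta1, n = 2.0, 0.5, 100

    rng = np.random.default_rng(seed=1)
    X = rng.uniform(0, 10, size=n)   # regressor X_i
    u = rng.normal(0, 1, size=n)     # error term u_i, mean zero given X_i
    Y = beta0 + beta1 * X + u        # dependent variable Y_i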
Ordinary Least Squares Estimator (OLS)

The issue is that the population regression line coefficients are unknown: we need to estimate them using a random sample. One way to do so is to use the OLS estimators, which try to construct a regression line that is as close as possible to the observed sample data. Closeness is optimized by minimizing the sum of squared residuals

SSR = \sum_{i=1}^{n} u_i^2 = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2 .

OLS estimators:

\hat{\beta}_1 = \frac{s_{XY}}{s_X^2} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}

\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}

OLS sample regression line: \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X
OLS predicted values: \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i
OLS residuals: \hat{u}_i = Y_i - \hat{Y}_i
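As a quick numerical check of these formulas, the following sketch computes the OLS intercept, slope, predicted values and residuals on a simulated sample (the data-generating values are assumptions made up for the example):

    import numpy as np

    # Simulated sample (illustrative values only)
    rng = np.random.default_rng(seed=1)
    n = 100
    X = rng.uniform(0, 10, size=n)
    Y = 2.0 + 0.5 * X + rng.normal(0, 1, size=n)

    # OLS estimators, exactly as in the formulas above
    beta1_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    beta0_hat = Y.mean() - beta1_hat * X.mean()

    Y_hat = beta0_hat + beta1_hat * X   # OLS predicted values
    u_hat = Y - Y_hat                   # OLS residuals

    print(f"beta0_hat = {beta0_hat:.3f}, beta1_hat = {beta1_hat:.3f}")

With these simulated values, the estimates should land close to the β_0 = 2 and β_1 = 0.5 used to generate the data.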
Least squares assumptions

While OLS is often chosen for its desirable theoretical properties, it is important to note that OLS estimators are only appropriate estimators under three main assumptions.

1. Exogenous explanatory variables: E[u_i | X_i] = 0. The conditional distribution of the error term given X_i has a mean of zero for all observations. This means that the other factors contained in the error term are unrelated to X_i in the sense that, for any given value of X_i, the mean of the distribution of these other factors is zero.

2. Independently and identically distributed variables: (X_i, Y_i), i = 1, ..., n, are i.i.d. The joint values of X_i and Y_i are i.i.d. across observations. This arises when observations are drawn by simple random sampling from a single large population. This typically does not hold for panel and time series data.

3. Large outliers are unlikely: 0 < E[X_i^4] < ∞ and 0 < E[Y_i^4] < ∞. Observations with values of X_i and Y_i that are far outside the usual range of the sample data are unlikely to arise. This can be expressed in mathematical terms by assuming that X and Y have nonzero finite fourth moments (i.e., finite kurtosis).
Measures of fit

How well does the regression line describe the data?

The R²

The regression R² is the fraction of the sample variance of Y_i explained by X_i:

R^2 = \frac{ESS}{TSS} = \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}

where ESS is the explained sum of squares and TSS is the total sum of squares. Equivalently, it can be written in terms of the fraction of the variance of Y_i not explained by X_i:

R^2 = 1 - \frac{SSR}{TSS} = 1 - \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}

where SSR is the sum of squared residuals.

Since R² is a fraction, it ranges between 0 and 1. The higher the value of R², the better the regression is at predicting Y_i. Let's look at the two extreme situations:

1. If X_i explains none of the variation of Y_i: \hat{Y}_i = \hat{\beta}_0 = \bar{Y} ⇒ ESS = 0 ⇒ R^2 = 0
2. If X_i explains all of the variation of Y_i: \hat{Y}_i = Y_i ⇒ SSR = 0 ⇒ R^2 = 1
Standard Error of the Regression (SER)

The SER is an estimator of the standard deviation of u_i, the regression error. Since u_i is unknown, we use its sample counterpart, the OLS residuals \hat{u}_i:

SER = s_{\hat{u}} = \sqrt{s_{\hat{u}}^2} = \sqrt{\frac{SSR}{n-2}}

(In this formula the variance is computed by dividing SSR by n - 2 instead of n, since two regression coefficients were estimated.)

The SER represents the spread of the observations around the regression line.
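Both measures of fit can be computed directly from the residuals. The sketch below does so on the same kind of simulated sample as before (again, purely illustrative data):

    import numpy as np

    # Simulated sample and OLS fit (illustrative values only)
    rng = np.random.default_rng(seed=1)
    n = 100
    X = rng.uniform(0, 10, size=n)
    Y = 2.0 + 0.5 * X + rng.normal(0, 1, size=n)
    beta1_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    beta0_hat = Y.mean() - beta1_hat * X.mean()
    Y_hat = beta0_hat + beta1_hat * X
    u_hat = Y - Y_hat

    ESS = np.sum((Y_hat - Y.mean()) ** 2)   # explained sum of squares
    SSR = np.sum(u_hat ** 2)                # sum of squared residuals
    TSS = np.sum((Y - Y.mean()) ** 2)       # total sum of squares

    R2 = ESS / TSS                          # equivalently 1 - SSR / TSS
    SER = np.sqrt(SSR / (n - 2))            # divide by n - 2: two coefficients estimated
    print(f"R2 = {R2:.3f}, SER = {SER:.3f}")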
Sampling distribution of the OLS estimators

Under the three Least Squares assumptions (parallel results hold for β_0):

- Unbiased estimators: \mu_{\hat{\beta}_1} = E[\hat{\beta}_1] = \beta_1

- Variance inversely proportional to n:

\sigma_{\hat{\beta}_1}^2 = var[\hat{\beta}_1] = \frac{\sigma_v^2}{n (\sigma_X^2)^2}, \quad \text{where } v_i = (X_i - \mu_X) u_i

(For the intercept, var[\hat{\beta}_0] = \frac{var[H_i u_i]}{n (E[H_i^2])^2}, where H_i = 1 - \frac{\mu_X}{E[X_i^2]} X_i.)

- Law of large numbers: \hat{\beta}_1 \xrightarrow{p} \beta_1

- Central limit theorem: \frac{\hat{\beta}_1 - E[\hat{\beta}_1]}{\sqrt{var[\hat{\beta}_1]}} \sim N(0, 1)
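These properties can be illustrated by simulation: draw many samples from the same population, re-estimate β_1 on each, and look at the empirical distribution of the estimates. A minimal sketch, with made-up population values:

    import numpy as np

    # Monte Carlo illustration (population values are assumptions for the example)
    beta0, beta1, n, reps = 2.0, 0.5, 100, 5000
    rng = np.random.default_rng(seed=2)

    beta1_hats = np.empty(reps)
    for r in range(reps):
        X = rng.uniform(0, 10, size=n)
        Y = beta0 + beta1 * X + rng.normal(0, 1, size=n)
        beta1_hats[r] = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)

    # The average of the estimates should be close to beta1 (unbiasedness),
    # and their spread shrinks as n grows (variance proportional to 1/n).
    print(f"mean = {beta1_hats.mean():.4f}, std = {beta1_hats.std():.4f}")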
Hypothesis tests and confidence intervals

The same general approach as the one used for the population mean can be applied to testing hypotheses about the coefficients β_1 or β_0, since both coefficients also have a normal sampling distribution in large samples. Let's develop the approach for a two-sided test on β_1 as an example.

Hypotheses: H_0: \beta_1 = \beta_{1,0} versus H_1: \beta_1 \neq \beta_{1,0}

Step 1: compute the standard error of \hat{\beta}_1:

SE(\hat{\beta}_1) = \hat{\sigma}_{\hat{\beta}_1} = \sqrt{\hat{\sigma}_{\hat{\beta}_1}^2} = \sqrt{\frac{1}{n} \cdot \frac{\hat{\sigma}_v^2}{(\hat{\sigma}_X^2)^2}} = \sqrt{\frac{1}{n} \cdot \frac{\frac{1}{n-2} \sum_{i=1}^{n} (X_i - \bar{X})^2 \hat{u}_i^2}{\left[\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2\right]^2}}

Step 2: compute the t-statistic:

t_{obs} = \frac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)}

Step 3: compute the p-value:

p\text{-value} = P_{H_0}(|t| > |t_{obs}|) = 2\Phi(-|t_{obs}|)

Step 4: reject H_0 or not (e.g., reject H_0 at the 5% significance level if the p-value is below 0.05).

Similarly, we can construct a 95% confidence interval for β_1:

\beta_1 \in \left[\hat{\beta}_1 \pm 1.96 \, SE(\hat{\beta}_1)\right]
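Putting the four steps together, the sketch below tests H_0: β_1 = 0 against a two-sided alternative and builds a 95% confidence interval, using the SE(β̂_1) formula given above (the simulated data and the null value β_{1,0} = 0 are assumptions made for the example):

    import numpy as np
    from scipy.stats import norm

    # Simulated sample and OLS fit (illustrative values only)
    rng = np.random.default_rng(seed=1)
    n = 100
    X = rng.uniform(0, 10, size=n)
    Y = 2.0 + 0.5 * X + rng.normal(0, 1, size=n)
    beta1_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    beta0_hat = Y.mean() - beta1_hat * X.mean()
    u_hat = Y - (beta0_hat + beta1_hat * X)

    # Step 1: SE(beta1_hat), following the formula above
    num = np.sum((X - X.mean()) ** 2 * u_hat ** 2) / (n - 2)
    den = (np.sum((X - X.mean()) ** 2) / n) ** 2
    se_beta1 = np.sqrt(num / den / n)

    # Step 2: t-statistic for H0: beta1 = beta1_0 (here beta1_0 = 0, an assumed null)
    beta1_0 = 0.0
    t_obs = (beta1_hat - beta1_0) / se_beta1

    # Step 3: two-sided p-value
    p_value = 2 * norm.cdf(-abs(t_obs))

    # Step 4: reject H0 at the 5% level if p_value < 0.05; 95% confidence interval
    ci = (beta1_hat - 1.96 * se_beta1, beta1_hat + 1.96 * se_beta1)
    print(f"t = {t_obs:.2f}, p = {p_value:.4f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")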
Exercise

Consider the standard simple linear regression model

Y_i = \beta_0 + \beta_1 X_i + u_i

where Y_i is the dependent variable, X_i is the regressor and u_i is the error term, with i = 1, ..., n.

1. Give the matrix notation related to the above model. Specify the dimensions of each matrix in the equation.
2. Explain how to estimate β_0 and β_1 using the matrix notation.
3. Derive, using the matrix notation, the estimated parameters for β_0 and β_1.
4. Prove that the estimator \hat{\beta}_1 is a linear function of Y_1, ..., Y_n.
5. Prove that it is conditionally unbiased.