Linear regression
STAT-S-301 Exercise session 2
TA: Elise Petit ([email protected])
October 13th, 2016
Linear regression model with a single regressor

Linear regression attempts to model the relationship between two variables by fitting a linear equation to a sample of n observations (i = 1, ..., n):

Y_i = \beta_0 + \beta_1 X_i + u_i

where:
- Y_i = dependent variable (regressand)
- X_i = independent variable (regressor)
- β_0 + β_1 X_i = population regression line/function, with two coefficients/parameters: β_0, the intercept of the population regression line, and β_1 = \frac{\Delta Y_i}{\Delta X_i}, the slope, i.e. the change in Y_i associated with a unit change in X_i
- u_i = error term, the difference between Y_i and its predicted value from the population regression line; this term contains all the other factors besides X that determine the value of Y for a specific observation i.

The main goal will be to predict the value of Y for a given value of X.
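To make the role of each term concrete, here is a minimal Python sketch that simulates a sample from this model; the parameter values and the error distribution are illustrative assumptions, not part of the course material.

    import numpy as np

    # Hypothetical population parameters, chosen only for illustration
    beta0, beta1, n = 2.0, 0.5, 100

    rng = np.random.default_rng(seed=1)
    X = rng.uniform(0, 10, size=n)   # regressor X_i
    u = rng.normal(0, 1, size=n)     # error term u_i, mean zero given X_i
    Y = beta0 + beta1 * X + u        # dependent variable Y_i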
Ordinary Least Squares Estimator (OLS)

The issue is that the population regression line coefficients are unknown: we need to estimate them using a random sample. One way to do so is to use the OLS estimators, which try to construct a regression line that is as close as possible to the observed sample data. Closeness is optimized by minimizing the sum of squared residuals

SSR = \sum_{i=1}^{n} u_i^2 = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2 .

OLS estimators:

\hat{\beta}_1 = \frac{s_{XY}}{s_X^2} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}

\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}

OLS sample regression line: \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X
OLS predicted values: \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i
OLS residuals: \hat{u}_i = Y_i - \hat{Y}_i
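As a quick numerical check of these formulas, the following sketch computes the OLS intercept, slope, predicted values and residuals on a simulated sample (the data-generating values are assumptions made up for the example):

    import numpy as np

    # Simulated sample (illustrative values only)
    rng = np.random.default_rng(seed=1)
    n = 100
    X = rng.uniform(0, 10, size=n)
    Y = 2.0 + 0.5 * X + rng.normal(0, 1, size=n)

    # OLS estimators, exactly as in the formulas above
    beta1_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    beta0_hat = Y.mean() - beta1_hat * X.mean()

    Y_hat = beta0_hat + beta1_hat * X   # OLS predicted values
    u_hat = Y - Y_hat                   # OLS residuals

    print(f"beta0_hat = {beta0_hat:.3f}, beta1_hat = {beta1_hat:.3f}")

With these simulated values, the estimates should land close to the β_0 = 2 and β_1 = 0.5 used to generate the data.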
Least squares assumptions

While OLS is often chosen for its desirable theoretical properties, it is important to note that OLS estimators are only appropriate estimators under three main assumptions.

1. Exogenous explanatory variables: E[u_i | X_i] = 0. The conditional distribution of the error term given X_i has a mean of zero for all observations. This means that the other factors contained in the error term are unrelated to X_i in the sense that, for any given value of X_i, the mean of the distribution of these other factors is zero.

2. Independently and identically distributed variables: (X_i, Y_i), i = 1, ..., n, are i.i.d. The joint values of X_i and Y_i are i.i.d. across observations. This arises when observations are drawn by simple random sampling from a single large population. This typically does not hold for panel and time series data.

3. Large outliers are unlikely: 0 < E[X_i^4] < ∞ and 0 < E[Y_i^4] < ∞. Observations with values of X_i and Y_i that are far outside the usual range of the sample data are unlikely to arise. This can be expressed in mathematical terms by assuming that X and Y have nonzero finite fourth moments (i.e., finite kurtosis).
Measures of fit

How well does the regression line describe the data?

The R²

The regression R² is the fraction of the sample variance of Y_i explained by X_i:

R^2 = \frac{ESS}{TSS} = \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}

where ESS is the explained sum of squares and TSS is the total sum of squares. Equivalently, it can be written in terms of the fraction of the variance of Y_i not explained by X_i:

R^2 = 1 - \frac{SSR}{TSS} = 1 - \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}

where SSR is the sum of squared residuals.

Since R² is a fraction, it ranges between 0 and 1. The higher the value of R², the better the regression is at predicting Y_i. Let's look at the two extreme situations:

1. If X_i explains none of the variation of Y_i: \hat{Y}_i = \hat{\beta}_0 = \bar{Y} ⇒ ESS = 0 ⇒ R^2 = 0
2. If X_i explains all of the variation of Y_i: \hat{Y}_i = Y_i ⇒ SSR = 0 ⇒ R^2 = 1
Standard Error of the Regression (SER)

The SER is an estimator of the standard deviation of u_i, the regression error. Since u_i is unknown, we use its sample counterpart, the OLS residuals \hat{u}_i:

SER = s_{\hat{u}} = \sqrt{s_{\hat{u}}^2} = \sqrt{\frac{SSR}{n-2}}

(In this formula the variance is computed by dividing SSR by n - 2 instead of n, since two regression coefficients were estimated.)

The SER represents the spread of the observations around the regression line.
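Both measures of fit can be computed directly from the residuals. The sketch below does so on the same kind of simulated sample as before (again, purely illustrative data):

    import numpy as np

    # Simulated sample and OLS fit (illustrative values only)
    rng = np.random.default_rng(seed=1)
    n = 100
    X = rng.uniform(0, 10, size=n)
    Y = 2.0 + 0.5 * X + rng.normal(0, 1, size=n)
    beta1_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    beta0_hat = Y.mean() - beta1_hat * X.mean()
    Y_hat = beta0_hat + beta1_hat * X
    u_hat = Y - Y_hat

    ESS = np.sum((Y_hat - Y.mean()) ** 2)   # explained sum of squares
    SSR = np.sum(u_hat ** 2)                # sum of squared residuals
    TSS = np.sum((Y - Y.mean()) ** 2)       # total sum of squares

    R2 = ESS / TSS                          # equivalently 1 - SSR / TSS
    SER = np.sqrt(SSR / (n - 2))            # divide by n - 2: two coefficients estimated
    print(f"R2 = {R2:.3f}, SER = {SER:.3f}")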
Sampling distribution of the OLS estimators

Under the three Least Squares assumptions (parallel results hold for β_0):

- Unbiased estimators: \mu_{\hat{\beta}_1} = E[\hat{\beta}_1] = \beta_1

- Variance inversely proportional to n:

\sigma_{\hat{\beta}_1}^2 = var[\hat{\beta}_1] = \frac{\sigma_v^2}{n (\sigma_X^2)^2}, \quad \text{where } v_i = (X_i - \mu_X) u_i

(For the intercept, var[\hat{\beta}_0] = \frac{var[H_i u_i]}{n (E[H_i^2])^2}, where H_i = 1 - \frac{\mu_X}{E[X_i^2]} X_i.)

- Law of large numbers: \hat{\beta}_1 \xrightarrow{p} \beta_1

- Central limit theorem: \frac{\hat{\beta}_1 - E[\hat{\beta}_1]}{\sqrt{var[\hat{\beta}_1]}} \sim N(0, 1)
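These properties can be illustrated by simulation: draw many samples from the same population, re-estimate β_1 on each, and look at the empirical distribution of the estimates. A minimal sketch, with made-up population values:

    import numpy as np

    # Monte Carlo illustration (population values are assumptions for the example)
    beta0, beta1, n, reps = 2.0, 0.5, 100, 5000
    rng = np.random.default_rng(seed=2)

    beta1_hats = np.empty(reps)
    for r in range(reps):
        X = rng.uniform(0, 10, size=n)
        Y = beta0 + beta1 * X + rng.normal(0, 1, size=n)
        beta1_hats[r] = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)

    # The average of the estimates should be close to beta1 (unbiasedness),
    # and their spread shrinks as n grows (variance proportional to 1/n).
    print(f"mean = {beta1_hats.mean():.4f}, std = {beta1_hats.std():.4f}")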
Hypothesis tests and confidence intervals

The same general approach as the one used for the population mean can be applied to testing hypotheses about the coefficients β_1 or β_0, since both coefficients also have a normal sampling distribution in large samples. Let's develop the approach for a two-sided test on β_1 as an example.

Hypotheses: H_0: \beta_1 = \beta_{1,0} versus H_1: \beta_1 \neq \beta_{1,0}

Step 1: compute the standard error of \hat{\beta}_1:

SE(\hat{\beta}_1) = \hat{\sigma}_{\hat{\beta}_1} = \sqrt{\hat{\sigma}_{\hat{\beta}_1}^2} = \sqrt{\frac{1}{n} \cdot \frac{\hat{\sigma}_v^2}{(\hat{\sigma}_X^2)^2}} = \sqrt{\frac{1}{n} \cdot \frac{\frac{1}{n-2} \sum_{i=1}^{n} (X_i - \bar{X})^2 \hat{u}_i^2}{\left[\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2\right]^2}}

Step 2: compute the t-statistic:

t_{obs} = \frac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)}

Step 3: compute the p-value:

p\text{-value} = P_{H_0}(|t| > |t_{obs}|) = 2\Phi(-|t_{obs}|)

Step 4: reject H_0 or not (e.g., reject H_0 at the 5% significance level if the p-value is below 0.05).

Similarly, we can construct a 95% confidence interval for β_1:

\beta_1 \in \left[\hat{\beta}_1 \pm 1.96 \, SE(\hat{\beta}_1)\right]
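Putting the four steps together, the sketch below tests H_0: β_1 = 0 against a two-sided alternative and builds a 95% confidence interval, using the SE(β̂_1) formula given above (the simulated data and the null value β_{1,0} = 0 are assumptions made for the example):

    import numpy as np
    from scipy.stats import norm

    # Simulated sample and OLS fit (illustrative values only)
    rng = np.random.default_rng(seed=1)
    n = 100
    X = rng.uniform(0, 10, size=n)
    Y = 2.0 + 0.5 * X + rng.normal(0, 1, size=n)
    beta1_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    beta0_hat = Y.mean() - beta1_hat * X.mean()
    u_hat = Y - (beta0_hat + beta1_hat * X)

    # Step 1: SE(beta1_hat), following the formula above
    num = np.sum((X - X.mean()) ** 2 * u_hat ** 2) / (n - 2)
    den = (np.sum((X - X.mean()) ** 2) / n) ** 2
    se_beta1 = np.sqrt(num / den / n)

    # Step 2: t-statistic for H0: beta1 = beta1_0 (here beta1_0 = 0, an assumed null)
    beta1_0 = 0.0
    t_obs = (beta1_hat - beta1_0) / se_beta1

    # Step 3: two-sided p-value
    p_value = 2 * norm.cdf(-abs(t_obs))

    # Step 4: reject H0 at the 5% level if p_value < 0.05; 95% confidence interval
    ci = (beta1_hat - 1.96 * se_beta1, beta1_hat + 1.96 * se_beta1)
    print(f"t = {t_obs:.2f}, p = {p_value:.4f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")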
Exercise

Consider the standard simple linear regression model

Y_i = \beta_0 + \beta_1 X_i + u_i

where Y_i is the dependent variable, X_i is the regressor and u_i is the error term, with i = 1, ..., n.

1. Give the matrix notation related to the above model. Specify the dimensions of each matrix in the equation.
2. Explain how to estimate β_0 and β_1 using the matrix notation.
3. Derive, using the matrix notation, the estimated parameters for β_0 and β_1.
4. Prove that the estimator \hat{\beta}_1 is a linear function of Y_1, ..., Y_n.
5. Prove that it is conditionally unbiased.