DEPARTMENT OF ECONOMICS
Unit ECON 12122: Introduction to Econometrics
Notes 3: Estimating Regression Parameters

These notes provide a summary of the lectures. They are not a complete account of the unit material. You should also consult the reading given in the unit outline and the lectures.

3.1 Introduction

Having already begun to discuss how to interpret regression results, it is time to discuss methods of estimating the parameters of the linear regression model. There are a number of different methods of doing this. In this unit, one method will be covered. It is called least squares or, to distinguish it from the many different types of least squares, ordinary least squares (OLS).

The essence of OLS is very simple. Suppose we have a multiple regression model with one dependent variable, three explanatory variables and a sample of n observations on each of these variables, i.e.

    E(Yi | X1i, X2i, X3i) = β0 + β1X1i + β2X2i + β3X3i        i = 1, 2, ..., n

This can be written as

    Yi = β0 + β1X1i + β2X2i + β3X3i + ui        i = 1, 2, ..., n

where E(ui | X1i, X2i, X3i) = 0. We wish to estimate the four coefficients. We use a minimum distance estimator. The distances to be minimised are the ui². Thus the least squares estimator minimises

    S = Σ ui² = Σ (Yi − β0 − β1X1i − β2X2i − β3X3i)²

Our estimators of β0, β1, β2, β3 are those values which minimise S. The necessary conditions for this are that
    ∂S/∂β0 = −2 Σ (Yi − β0 − β1X1i − β2X2i − β3X3i) = 0
    ∂S/∂β1 = −2 Σ (Yi − β0 − β1X1i − β2X2i − β3X3i) X1i = 0
    ∂S/∂β2 = −2 Σ (Yi − β0 − β1X1i − β2X2i − β3X3i) X2i = 0
    ∂S/∂β3 = −2 Σ (Yi − β0 − β1X1i − β2X2i − β3X3i) X3i = 0

This is a set of four linear equations in four unknowns – the least squares estimators of β0, β1, β2, β3. Solving these equations by hand is sufficiently arduous that in practice the use of a computer is essential. However, the structure of these first order conditions should be clear, as should the way they generalise to a model containing k parameters. Solving the four equations above algebraically is not straightforward without the use of a different kind of algebra. However, explicit algebraic solutions can easily be obtained for the simple regression model.
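Before turning to the simple regression model, the following minimal sketch shows how a computer solves the four first order conditions, written in matrix form as the normal equations X'Xβ = X'Y. The data are simulated and the coefficient values and variable names are illustrative assumptions, not part of the unit material.

```python
# Illustrative sketch only: simulated data, assumed coefficient values.
import numpy as np

rng = np.random.default_rng(0)
n = 500
X1, X2, X3 = rng.normal(size=(3, n))          # three explanatory variables
u = rng.normal(size=n)                        # disturbances with E(u|X) = 0
Y = 1.0 + 0.5 * X1 - 0.3 * X2 + 2.0 * X3 + u  # assumed true (β0, β1, β2, β3)

# Stack a column of ones for the constant term β0
X = np.column_stack([np.ones(n), X1, X2, X3])

# The four first order conditions can be written as X'(Y - Xβ) = 0,
# i.e. the normal equations X'X β = X'Y; solve them directly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)  # close to (1.0, 0.5, -0.3, 2.0) for a large enough sample
```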
3.2 The least squares estimators of the simple regression model.

The simple regression model takes the following form:

    E(Yi | Xi) = α + βXi        i = 1, 2, ..., n

This can be written as

    Yi = α + βXi + ui        i = 1, 2, ..., n

where E(ui | Xi) = 0. We wish to estimate α and β. As before, the least squares estimator minimises

    S = Σ ui² = Σ (Yi − α − βXi)²

We choose the values of α and β that minimise S. The first order conditions for this are:
    ∂S/∂α = −2 Σ (Yi − α − βXi) = 0                            (1)

    ∂S/∂β = −2 Σ [(Yi − α − βXi) Xi] = 0                       (2)

These two linear equations (sometimes called the normal equations) can be solved for the least squares estimators of α and β.
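Rearranged, (1) and (2) read Σ Yi = nα + β Σ Xi and Σ XiYi = α Σ Xi + β Σ Xi², a two-equation linear system. The sketch below is illustrative only (simulated data, assumed parameter values) and solves that system directly.

```python
# Illustrative sketch only: solving the two normal equations (1) and (2)
# as a 2x2 linear system, using simulated data and assumed parameter values.
import numpy as np

rng = np.random.default_rng(1)
n = 100
X = rng.normal(size=n)
Y = 2.0 + 1.5 * X + rng.normal(size=n)   # assumed true α = 2.0, β = 1.5

# (1): ΣY = nα + βΣX       (2): ΣXY = αΣX + βΣX²
A = np.array([[n,       X.sum()],
              [X.sum(), (X**2).sum()]])
b = np.array([Y.sum(), (X * Y).sum()])
alpha_hat, beta_hat = np.linalg.solve(A, b)
print(alpha_hat, beta_hat)
```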
3.3 The first order conditions.

Before solving for these estimators it is worth considering (1) and (2) in some detail. Since (1) and (2) will be used to provide expressions for the least squares estimators of α and β, the least squares estimators must satisfy these two equations. If α̂ is the least squares estimator of α, and β̂ of β, then the least squares residual can be defined as

    ûi = Yi − α̂ − β̂Xi        i = 1, 2, ..., n        (3)

Note that ûi ≠ ui. Using (3) for the least squares residual we can write (1) and (2) as

    Σ ûi = 0                                           (4)

    Σ (ûi Xi) = 0                                      (5)

(4) states that the least squares residuals sum to zero over the sample. Since this is derived from the first order condition for the constant term (α), if the model does not contain a constant term (α = 0), then (4) will no longer hold. (5) states that the cross-products of the least squares residuals (ûi) and the explanatory variable (Xi) sum to zero over the sample. This is an important property of the least squares residuals and therefore of the least squares estimators, α̂ and β̂. Such a cross-product can be thought of as a covariance, so (5) can be regarded as saying that the covariance between the least squares residuals and the explanatory variable is zero.
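These two properties are easy to check numerically. The sketch below is illustrative only (simulated data; np.polyfit is used simply to obtain the least squares fit) and verifies that the residuals satisfy (4) and (5) up to rounding error.

```python
# Illustrative check of (4) and (5): least squares residuals sum to zero
# and have zero cross-product with the explanatory variable.
import numpy as np

rng = np.random.default_rng(2)
n = 100
X = rng.normal(size=n)
Y = 1.0 + 0.8 * X + rng.normal(size=n)     # assumed data generating process

beta_hat, alpha_hat = np.polyfit(X, Y, 1)  # least squares fit of Y on X
u_hat = Y - alpha_hat - beta_hat * X       # least squares residuals

print(u_hat.sum())         # (4): essentially zero
print((u_hat * X).sum())   # (5): essentially zero
```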
3.4 Solving for α̂ and β̂

The simplest way to solve for α̂ and β̂ is the following. Take (1) and re-arrange:

    Σ Yi = Σ α + β Σ Xi = nα + β Σ Xi

so that

    nα = Σ Yi − β Σ Xi

Since the least squares estimators of α and β (α̂ and β̂ respectively) satisfy this equation, it can be solved for α̂ in terms of β̂. Thus

    α̂ = Ȳ − β̂X̄                                        (6)

where Ȳ and X̄ are the sample means of Yi and Xi respectively. (6) shows that the constant term is defined so that the least squares line passes through the point of sample means. Once we have solved for β̂, (6) provides a simple way of calculating α̂. Using (6), substitute for α in (2):
    Σ [ (Yi − (Ȳ − β̂X̄) − β̂Xi) Xi ] = 0

or

    Σ [ (Yi − Ȳ)Xi − β̂(Xi − X̄)Xi ] = 0

Thus, noting that Σ (Xi − X̄)Xi = Σ (Xi − X̄)²,

    β̂ = Σ (Yi − Ȳ)Xi / Σ (Xi − X̄)²  =  (Σ XiYi − nX̄Ȳ) / (Σ Xi² − nX̄²)        (7)

The solution for β̂ shows that the least squares estimator of the slope is the ratio of the covariance between Yi and Xi to the variance of Xi.

Equation (7) can be expressed in a simpler way. In order to do this, we simplify the simple regression model further. We define both Yi and Xi in terms of deviations from their sample means. Thus let yi = (Yi − Ȳ) and xi = (Xi − X̄); then both yi and xi have sample means of zero and the constant term (α) drops out of the model. The new simplified model is
    yi = βxi + ui        i = 1, 2, ..., n        (8)

where ui is redefined to be (ui − ū). The constant term is set to zero in (8): since the means of yi and xi are both zero, the estimated least squares line must pass through the origin. The least squares estimator of β is
    β̂ = Σ xi yi / Σ xi²                                        (9)

(9) is identical to (7) and is much easier to use and manipulate (use (7) and set Ȳ and X̄ to zero).
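As an illustrative check (simulated data, assumed parameter values), the sketch below computes the slope from both (7) and (9) and the intercept from (6), confirming that (7) and (9) give the same answer.

```python
# Illustrative sketch: the slope from (7) and from the deviation form (9)
# coincide, and (6) then gives the intercept. Data are simulated.
import numpy as np

rng = np.random.default_rng(3)
n = 100
X = rng.normal(loc=5.0, size=n)
Y = 2.0 - 0.7 * X + rng.normal(size=n)     # assumed true α = 2.0, β = -0.7

# Equation (7)
beta_7 = ((Y - Y.mean()) * X).sum() / ((X - X.mean())**2).sum()

# Deviation form, equation (9)
y, x = Y - Y.mean(), X - X.mean()
beta_9 = (x * y).sum() / (x**2).sum()

alpha_hat = Y.mean() - beta_9 * X.mean()   # equation (6)
print(beta_7, beta_9, alpha_hat)           # beta_7 equals beta_9 up to rounding
```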
3.5 The properties of the least squares estimators.

The discussion will focus on β̂. These results can be generalised to include α̂ (see the textbooks). One way of re-writing (9) is

    β̂ = Σ wi yi        where wi = xi / Σ xi²

β̂ is therefore a linear function of yi and is termed a linear estimator. This linearity property is very useful when we want to derive the probability density of β̂ (see below).

Another way of writing (9) is obtained by substituting for yi from (8):

    β̂ = Σ xi yi / Σ xi²
       = Σ xi (βxi + ui) / Σ xi²
       = β + Σ xi ui / Σ xi²                                   (10)
(10) is an important expression. It says that the least squares estimator β̂ is made up of two components. The first is the true value β. The second is a function of the unobserved disturbances ui and the explanatory variable xi. For β̂ to be a 'good' estimator of β, the second term in (10) must be 'small' or 'unimportant' in some sense.

From (10) it is easy to show that β̂ is unbiased, i.e. E(β̂) = β. To show this we must show that

    E[ Σ (xi ui) / Σ xi² ] = 0                                 (11)

If we condition on xi (assume the xi are fixed), then

    E[ Σ (xi ui) / Σ xi² ] = Σ (xi E(ui)) / Σ xi² = 0    as E(ui) = 0

A corollary of this result is that if

    E[ Σ (xi ui) / Σ xi² ] ≠ 0,

then β̂ will be biased. The expression inside the square brackets is the ratio of the cross-product between xi and ui to the sum of squares of xi. If xi were random, this can be thought of as a covariance. Thus if the covariance is non-zero, i.e. there is a relationship between the disturbances and the explanatory variable, then the OLS estimator will be biased.
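A small Monte Carlo sketch illustrates this point. The sample size, parameter values and the way the correlated disturbance is constructed below are all assumptions made purely for illustration.

```python
# Illustrative Monte Carlo: β̂ is unbiased when x and u are unrelated,
# and biased when the disturbance is correlated with the regressor.
import numpy as np

rng = np.random.default_rng(4)
beta, n, reps = 1.0, 50, 5000

def ols_slope(x, y):
    return (x * y).sum() / (x**2).sum()     # deviation-form estimator (9)

clean, biased = [], []
for _ in range(reps):
    x = rng.normal(size=n)
    x = x - x.mean()                        # work in deviation form
    u = rng.normal(size=n)
    clean.append(ols_slope(x, beta * x + u))
    v = u + 0.5 * x                         # disturbance correlated with x
    biased.append(ols_slope(x, beta * x + v))

print(np.mean(clean))    # close to the true value 1.0
print(np.mean(biased))   # close to 1.5: E(Σxu/Σx²) ≠ 0, so β̂ is biased
```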
The final property of the OLS estimator concerns its variance. To derive an expression for this variance we begin by re-arranging (10).
    β̂ − β = Σ (xi ui) / Σ xi²                                  (12)

The variance of β̂ can then be defined, since β̂ is unbiased, as

    E(β̂ − β)² = E[ Σ (xi ui) / Σ xi² ]²                        (13)

Expanding the right hand side of (13) we find

    E(β̂ − β)² = (1 / (Σ xi²)²) E[ Σ xi²ui²  +  2 Σ_{i=1}^{n} Σ_{j=i+1}^{n} xi xj ui uj ]

In order to take expectations of this expression we need to know something about the variances of the ui, that is E(ui²), and the covariances between each ui and uj, that is E(ui uj). Unfortunately economic theory usually has very little to say about these quantities. So here we make the following assumptions:

    E(ui²) = σ²    for all i            (homoscedasticity)
    E(ui uj) = 0   for all i ≠ j        (no serial correlation)

These assumptions are important. If they are true then

    E(β̂ − β)² = σ² / Σ xi²
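This variance formula can be checked by simulation. In the illustrative sketch below (all numbers are assumed), the xi are held fixed across replications and the disturbances are drawn independently with constant variance, so the sampling variance of β̂ should be close to σ²/Σxi².

```python
# Illustrative check of Var(β̂) = σ² / Σxi² with fixed regressors and
# i.i.d. disturbances (homoscedastic, no serial correlation).
import numpy as np

rng = np.random.default_rng(5)
n, beta, sigma, reps = 40, 0.8, 2.0, 20000

x = rng.normal(size=n)
x = x - x.mean()                 # fixed regressor in deviation form

slopes = []
for _ in range(reps):
    u = rng.normal(scale=sigma, size=n)
    y = beta * x + u
    slopes.append((x * y).sum() / (x**2).sum())

print(np.var(slopes))            # simulated variance of β̂
print(sigma**2 / (x**2).sum())   # theoretical value σ² / Σxi²
```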
Having derived the variance of the OLS estimator, we need to know whether this variance is small relative to that of other estimators. It turns out that a theorem, the Gauss-Markov theorem, proves that it is small in the sense that the OLS estimator is best linear unbiased.
The Gauss-Markov theorem states that if

    E(Yi | Xi) = α + βXi                  (model correct)
    E(ui uj) = σ²   for i = j             (homoscedasticity)
    E(ui uj) = 0    for i ≠ j             (no serial correlation)

where ui = Yi − E(Yi | Xi), then there are no other linear unbiased estimators of α and β which have a smaller variance than the least squares estimators.

3.6 Testing Hypotheses

In order to test hypotheses about α and β, we need to know the distribution of α̂ and β̂. To do this we can proceed in two ways.

(1) Assume ui ~ N(0, σ²). Since the normal distribution remains normal after a linear transformation, we can exploit the linearity of the model and the linearity of the least squares estimators, so that

    ui ~ N(0, σ²)  ⇒  Yi ~ N(α + βXi, σ²)  ⇒  β̂ ~ N(β, σ² / Σ xi²)

(2) We can appeal to a central limit theorem concerning the least squares estimators and assume that the sample we are using is sufficiently large for the asymptotic distribution of β̂ to be approximately true. Under the central limit theorem, whatever the distribution of ui,

    β̂ ~ N(β, σ² / Σ xi²)    as n → ∞
There is one problem with using these results on the normality of β̂: generally σ² is unknown. To get round this problem it must be estimated. It turns out that, in the simple regression model, the following is an unbiased estimator of σ²:

    σ̂² = Σ ûi² / (n − 2)

or, in the multiple regression model,

    σ̂² = Σ ûi² / (n − k)

where k is the number of parameters. It can then be shown that

    (β̂ − β) / √(σ̂² / Σ xi²)  ~  t(n − 2)    (or t(n − k) in the multiple regression model)

This expression can be used to test hypotheses about β.
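As a final illustrative sketch (simulated data; the null hypothesis β = 0 is assumed purely for the example, and SciPy is used only to obtain the p-value from the t distribution), the t-ratio would be computed as follows.

```python
# Illustrative sketch: t-statistic for H0: β = 0 in the simple regression
# model, with σ² estimated by Σû²/(n − 2).
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n = 30
X = rng.normal(size=n)
Y = 1.0 + 0.4 * X + rng.normal(size=n)     # assumed data generating process

x, y = X - X.mean(), Y - Y.mean()
beta_hat = (x * y).sum() / (x**2).sum()    # equation (9)
alpha_hat = Y.mean() - beta_hat * X.mean() # equation (6)
u_hat = Y - alpha_hat - beta_hat * X

sigma2_hat = (u_hat**2).sum() / (n - 2)            # unbiased estimator of σ²
se_beta = np.sqrt(sigma2_hat / (x**2).sum())       # standard error of β̂

t_stat = (beta_hat - 0.0) / se_beta                # H0: β = 0
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)    # two-sided p-value, t(n-2)
print(beta_hat, t_stat, p_value)
```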