ECON 7710, By WONG Wing Keung, Professor of Economics
REGRESSION ON DUMMY VARIABLES

Reference : Gujarati, Chapter 9; Neter, Chapter 10; Stewart, Session 3.6.

We use dummy variables to represent qualitative explanatory variables in regression analysis. A dummy variable usually takes the values 0 and 1. For example,

Yi = β0 + β1Di + ui    (1)

where Y = annual salary of a college professor and

Di = 1 if male college professor
   = 0 otherwise

Consider the following example with a SAS program:
DATA a ;
  INPUT Y D @@ ;
  label Y = 'annual salary of a college professor'
        D = 'Sex : 1 for male and 0 for female' ;
CARDS;
22.0 1  19.0 0
18.0 0  21.7 1
18.5 0  21.0 1
20.5 1  17.0 0
17.5 0  21.2 1
;
proc plot ;
  plot y * D = '*' ;
run ;
proc reg data = a ;
  model Y = D ;
  PLOT p.*D = 'p' y*D = '*' / overlay ;
run ;
[Plot of Y*D, symbol used is '*': Y (annual salary, 17.0 to 22.0) on the vertical axis against D (Sex : 1 for male and 0 for female) on the horizontal axis. The five points at D = 0 lie between 17.0 and 19.0; the five points at D = 1 lie between 20.5 and 22.0.]
Model: MODEL1
Dependent Variable: Y    annual salary of a college professor

Analysis of Variance

                     Sum of        Mean
Source      DF      Squares      Square    F Value    Prob>F
Model        1     26.89600    26.89600     55.342    0.0001
Error        8      3.88800     0.48600
C Total      9     30.78400

Root MSE     0.69714    R-square    0.8737
Dep Mean    19.64000    Adj R-sq    0.8579

Parameter Estimates

                   Parameter      Standard    T for H0:
Variable    DF      Estimate         Error  Parameter=0    Prob > |T|
INTERCEP     1     18.000000    0.31176915       57.735        0.0001
D            1      3.280000    0.44090815        7.439        0.0001
[Overlay plot from PROC REG: predicted value of Y (symbol 'p') and observed Y (symbol '*') on the vertical axis (16 to 22) against D (Sex : 1 for male and 0 for female) on the horizontal axis (0.0 to 1.0).]
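The fitted regression is

ŷi = 18.00 + 3.28Di
     (0.31)  (0.44)

with R² = 0.8737; standard errors are in parentheses. The mean salary of a female college professor is

E(Yi | Di = 0) = β0

and the mean salary of a male college professor is

E(Yi | Di = 1) = β0 + β1

so the estimated mean salaries are 18.00 for female professors and 18.00 + 3.28 = 21.28 for male professors.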
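As a cross-check outside SAS, the following Python sketch (standard library only, not part of the original notes) recomputes the two coefficients from the ten data points. It illustrates that with a single 0/1 regressor the OLS intercept equals the mean of the D = 0 group and the OLS slope equals the difference between the two group means. The variable names are illustrative.

```python
# Salary data from the SAS example: (Y, D) pairs, D = 1 for male, 0 for female.
data = [(22.0, 1), (19.0, 0), (18.0, 0), (21.7, 1), (18.5, 0),
        (21.0, 1), (20.5, 1), (17.0, 0), (17.5, 0), (21.2, 1)]

def mean(values):
    return sum(values) / len(values)

female = [y for y, d in data if d == 0]
male = [y for y, d in data if d == 1]

# Group-mean view of the regression.
b0 = mean(female)               # estimate of beta0
b1 = mean(male) - mean(female)  # estimate of beta1

# The usual simple-regression formulas give the same numbers.
d_bar = mean([d for _, d in data])
y_bar = mean([y for y, _ in data])
sxy = sum((d - d_bar) * (y - y_bar) for y, d in data)
sxx = sum((d - d_bar) ** 2 for _, d in data)
slope = sxy / sxx
intercept = y_bar - slope * d_bar

print(round(b0, 2), round(b1, 2))
print(round(intercept, 2), round(slope, 2))
```

Both lines should agree with the INTERCEP and D estimates in the SAS output above.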
Regression on One Quantitative Variable and One Qualitative Variable with Two Classes

For example,

Yi = β0 + β1Di + β2Xi + ui

where Y = annual salary of a college professor, X = years of teaching experience, and

Di = 1 if male college professor
   = 0 otherwise

Then the mean salary of a female college professor is

E(Yi | Di = 0) = β0 + β2Xi

and the mean salary of a male college professor is

E(Yi | Di = 1) = β0 + β1 + β2Xi
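Subtracting the two conditional means at the same value of Xi shows what β1 measures. The short derivation below is an added illustration in LaTeX notation and uses only the symbols defined above.

```latex
E(Y_i \mid D_i = 1) - E(Y_i \mid D_i = 0)
  = (\beta_0 + \beta_1 + \beta_2 X_i) - (\beta_0 + \beta_2 X_i)
  = \beta_1
```

The two regression lines therefore have the same slope β2 and differ only in the intercept, by β1, at every level of teaching experience.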
Regression on One Quantitative Variable and One Qualitative Variable with More Than Two Classes

A qualitative variable may consist of more than two classes, e.g. race in Singapore.

Rule : the number of dummies should be one less than the number of categories of the variable.

For example, race in Singapore consists of Chinese, Malay, Indian and others. Then we can define

D1i = 1 if Chinese
    = 0 otherwise

D2i = 1 if Malay
    = 0 otherwise

D3i = 1 if Indian
    = 0 otherwise

With Y and X defined as before, the model will be

Yi = β0 + β1D1i + β2D2i + β3D3i + β4Xi + ui
Then the mean salary of a Chinese professor is

E(Yi | D1i = 1, D2i = 0, D3i = 0) = β0 + β1 + β4Xi

The mean salary of a Malay professor is

E(Yi | D1i = 0, D2i = 1, D3i = 0) = β0 + β2 + β4Xi

The mean salary of an Indian professor is

E(Yi | D1i = 0, D2i = 0, D3i = 1) = β0 + β3 + β4Xi

The mean salary of a professor of other races is

E(Yi | D1i = 0, D2i = 0, D3i = 0) = β0 + β4Xi
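The coding scheme and the four conditional means can be sketched in Python as follows. This is an added illustration, not part of the original notes: the function names and the coefficient values in `beta` are hypothetical, chosen only to make the arithmetic easy to follow. The last lines show why the rule asks for one fewer dummy than classes: a fourth dummy would make the dummies sum to 1 for every observation, which duplicates the intercept column.

```python
CLASSES = ["Chinese", "Malay", "Indian", "Others"]
BASE = "Others"  # base class: all three dummies are 0

def race_dummies(race):
    """Return (D1, D2, D3) as defined in the text."""
    if race not in CLASSES:
        raise ValueError(f"unknown class: {race}")
    return tuple(int(race == c) for c in CLASSES if c != BASE)

def mean_salary(race, x, beta):
    """E(Y | race, X = x) for beta = (b0, b1, b2, b3, b4)."""
    b0, b1, b2, b3, b4 = beta
    d1, d2, d3 = race_dummies(race)
    return b0 + b1 * d1 + b2 * d2 + b3 * d3 + b4 * x

# Hypothetical coefficients (assumption, for illustration only).
beta = (20.0, 3.0, 1.0, 2.0, 0.5)
for race in CLASSES:
    print(race, race_dummies(race), mean_salary(race, 10, beta))

# With four dummies (one per class) every row would sum to 1,
# which is exactly the intercept column: perfect collinearity.
four_dummies = [tuple(int(race == c) for c in CLASSES) for race in CLASSES]
rows_sum_to_one = all(sum(row) == 1 for row in four_dummies)
print(rows_sum_to_one)
```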
Example

Use dummy variables in seasonal analysis. Let

D2i = 1 if second quarter
    = 0 otherwise

D3i = 1 if third quarter
    = 0 otherwise

D4i = 1 if fourth quarter
    = 0 otherwise

The model is:

Profiti = β1 + β2D2i + β3D3i + β4D4i + β5Salei + ui

The first quarter is the base period. Then profit in Spring (the first quarter) is:

E(Profiti | D2i = 0, D3i = 0, D4i = 0) = β1 + β5Salei

Profit in Summer (the second quarter) is:

E(Profiti | D2i = 1, D3i = 0, D4i = 0) = β1 + β2 + β5Salei

and so on.
Interaction Effect

Consider the following model:

Yi = β0 + β1D1i + β2D2i + β3Xi + ui

where Y = annual expenditure on clothing, X = income, and

D1i = 1 if female
    = 0 if male

D2i = 1 if college graduate
    = 0 otherwise

There may be interaction between D1i and D2i. If so, the model becomes

Yi = β0 + β1D1i + β2D2i + β3D1iD2i + β4Xi + ui

Then

E(Yi | D1i = 1, D2i = 1) = (β0 + β1 + β2 + β3) + β4Xi
and so on.

If there are interactions between D1i and X and between D2i and X, the model becomes

Yi = β0 + β1D1i + β2D2i + β3D1iD2i + β4Xi + β5D1iXi + β6D2iXi + ui

Note that the dummy variables in this model affect the intercept as well as the slope. For example,

E(Yi | D1i = 1, D2i = 0) = β0 + β1 + β4Xi + β5Xi = (β0 + β1) + (β4 + β5)Xi

E(Yi | D1i = 0, D2i = 1) = β0 + β2 + β4Xi + β6Xi = (β0 + β2) + (β4 + β6)Xi

and so on.
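The intercept and slope implied for each of the four groups can be tabulated mechanically. The Python sketch below is an added illustration: `group_line` is a hypothetical helper, and the values in `beta` are made-up numbers used only to show the bookkeeping, not estimates from any data.

```python
def group_line(d1, d2, beta):
    """Return (intercept, slope) of E(Y | D1 = d1, D2 = d2) as a line in X.

    beta = (b0, b1, b2, b3, b4, b5, b6) follows the full interaction model:
    Y = b0 + b1*D1 + b2*D2 + b3*D1*D2 + b4*X + b5*D1*X + b6*D2*X + u
    """
    b0, b1, b2, b3, b4, b5, b6 = beta
    intercept = b0 + b1 * d1 + b2 * d2 + b3 * d1 * d2
    slope = b4 + b5 * d1 + b6 * d2
    return intercept, slope

# Hypothetical coefficients (assumption, for illustration only).
beta = (100.0, 20.0, 30.0, 5.0, 0.5, 0.25, 0.125)

for d1 in (0, 1):
    for d2 in (0, 1):
        print(d1, d2, group_line(d1, d2, beta))
```

For D1 = 1, D2 = 0 the helper returns (β0 + β1, β4 + β5), matching the first expression above; the D1·D2 term enters only the intercept because the model has no three-way term D1·D2·X.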
Piecewise Linear Regression

The interaction effect can be used in modelling piecewise linear regression. Consider the following model:

Yi = β0 + β1Xi + β2(Xi − X*)Di + ui

where X* = threshold value of X, also known as a knot, and

Di = 1 if Xi ≥ X*
   = 0 if Xi < X*

Refer to Figures 15.8 and 15.9 for the knot.

When Xi < X*,

E(Yi | Di = 0) = β0 + β1Xi

When Xi ≥ X*,

E(Yi | Di = 1) = (β0 − β2X*) + (β1 + β2)Xi

To test whether there is no knot effect, we simply test H0 : β2 = 0.
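A quick check, added here in LaTeX notation, confirms that the two segments meet at the knot, so this specification changes the slope without introducing a jump. Evaluating the second segment at X_i = X^*:

```latex
(\beta_0 - \beta_2 X^{*}) + (\beta_1 + \beta_2) X^{*}
  = \beta_0 + \beta_1 X^{*} - \beta_2 X^{*} + \beta_2 X^{*}
  = \beta_0 + \beta_1 X^{*}
```

This equals the value of the first segment, β0 + β1X, at X = X*, so the fitted line is continuous and only its slope changes, from β1 to β1 + β2.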
If there is a "jump" at the knot, we can use the following model:

Yi = β0 + β1Di + β2Xi + β3XiDi + ui

To test that there is no jump and no knot effect, we simply test H0 : β1 = β3 = 0.

For example, the following program first fits the piecewise model without a jump, with the knot at X* = 5500:

DATA a ;
  INPUT Y X @@ ;
  label Y = 'Total Cost, dollars'
        X = 'Total Output' ;
  /* D = 1 above the knot; no observation lies exactly at 5500 */
  if X > 5500 then D = 1 ; else D = 0 ;
  X1 = (X - 5500)*D ;
CARDS;
256 1000 414 2000 634 3000 778 4000 1003 5000
1839 6000 2081 7000 2423 8000 2734 9000 2914 10000
;
proc plot ;
  plot Y * X = '*' ;
run ;
proc reg data = a ;
  model Y = X X1 ;
  test X1 ;
  PLOT p.*x = 'p' y*x = '*' / overlay ;
run ;

The output is

[Plot of Y*X, symbol used is '*': Total Cost, dollars (0 to 3000) on the vertical axis against Total Output (1000 to 10000) on the horizontal axis.]
Model: MODEL1
Dependent Variable: Y    Total Cost, dollars

Analysis of Variance

                       Sum of           Mean
Source      DF        Squares         Square    F Value
Model        2   8832644.8985   4416322.4492    129.608
Error        7   238521.50152    34074.50022
C Total      9      9071166.4

Root MSE     184.59280    R-square    0.9737
Dep Mean    1507.60000    Adj R-sq    0.9662

Parameter Estimates

                    Parameter        Standard    T for H0:
Variable    DF       Estimate           Error  Parameter=0
INTERCEP     1    -145.716667    176.73414648       -0.824
X            1       0.279126      0.04600814        6.067
X1           1       0.094500      0.08255241        1.145

Variable    DF    Variable Label
INTERCEP     1    Intercept
X            1    Total Output
X1           1

Dependent Variable: Y
Test:   Numerator:     44651.2500    DF:    1    F value:    1.3104
        Denominator:      34074.5    DF:    7    Prob>F:     0.2899
[Overlay plot from PROC REG: predicted value of Y (symbol 'p') and observed Y (symbol '*') on the vertical axis (0 to 3000) against Total Output X (1000 to 10000) on the horizontal axis.]
To see the jump effect as well as the knot effect, we use the following:

DATA a ;
  set a ;
  XD = X * D ;
run ;
proc reg data = a ;
  model Y = D X XD ;
  test D , XD ;
run ;
The output is

Model: MODEL1
Dependent Variable: Y    Total Cost, dollars

Analysis of Variance

                      Sum of          Mean
Source      DF       Squares        Square     F Value
Model        3     9062580.9     3020860.3    2111.136
Error        6    8585.50000    1430.91667
C Total      9     9071166.4

Root MSE      37.82746    R-square    0.9991
Dep Mean    1507.60000    Adj R-sq    0.9986
C.V.           2.50912

Parameter Estimates

                    Parameter        Standard    T for H0:
Variable    DF       Estimate           Error  Parameter=0
INTERCEP     1      59.600000     39.67377387        1.502
D            1      96.200000    104.96693924        0.916
X            1       0.185800      0.01196209       15.532
XD           1       0.094500      0.01691695        5.586

Dependent Variable: Y
Test:   Numerator:    137293.6258    DF:    2    F value:    95.9480
        Denominator:     1430.917    DF:    6    Prob>F:      0.0001
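Both regressions and both F tests can be cross-checked outside SAS. The Python sketch below is an added illustration using only the standard library; it is not part of the original notes. The helpers `solve` and `ols` are hypothetical names for a small normal-equations solver. One assumption to note: output is measured in thousands of units inside the fit to keep the normal equations well conditioned, and the slope coefficients are divided by 1000 afterwards to return to the scale of the SAS output.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(rows, y):
    """Return (coefficients, error sum of squares) for least squares of y on rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    sse = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    return beta, sse

# Data from the SAS program: total cost (Y) and total output (X).
Y = [256, 414, 634, 778, 1003, 1839, 2081, 2423, 2734, 2914]
X = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000]
KNOT = 5500
n = len(Y)

x = [v / 1000 for v in X]                  # output in thousands (rescaling assumption)
D = [1 if v > KNOT else 0 for v in X]      # same rule as the SAS DATA step
x1 = [(xv - KNOT / 1000) * d for xv, d in zip(x, D)]
xd = [xv * d for xv, d in zip(x, D)]

# Restricted model: a single straight line, Y on X only.
_, sse_line = ols([[1.0, xv] for xv in x], Y)

# Knot model: Y = X X1, with the test of X1 (1 restriction, n - 3 error df).
knot_beta, sse_knot = ols([[1.0, xv, z] for xv, z in zip(x, x1)], Y)
f_knot = ((sse_line - sse_knot) / 1) / (sse_knot / (n - 3))

# Jump-and-knot model: Y = D X XD, with the joint test of D and XD
# (2 restrictions, n - 4 error df).
jump_beta, sse_jump = ols([[1.0, d, xv, z] for d, xv, z in zip(D, x, xd)], Y)
f_jump = ((sse_line - sse_jump) / 2) / (sse_jump / (n - 4))

print("knot model:", knot_beta[0], knot_beta[1] / 1000, knot_beta[2] / 1000, f_knot)
print("jump model:", jump_beta[0], jump_beta[1],
      jump_beta[2] / 1000, jump_beta[3] / 1000, f_jump)
```

The printed values should agree with the two SAS outputs up to rounding. Read together, the two tests say that a change of slope alone is not significant (F = 1.3104, Prob>F = 0.2899), while the jump and the slope change are jointly highly significant (F = 95.9480, Prob>F = 0.0001).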