Model Averaging for Linear Regression

Erkki P. Liski
University of Tampere, Department of Mathematics, Statistics and Philosophy
Outline

- The Model
- Model selection
- Model average estimator (MAE)
- Why MAE?
- General structure of MAE
- Selecting the model weights
- Finite Sample Performance
The Model

Homoscedastic linear regression.

Variables: the response y and the predictors x_1, x_2, ....

The model:

    y = \mu + \epsilon, \qquad \mu = \sum_{j=1}^{\infty} \beta_j x_j,
    E(\epsilon \mid x) = 0, \qquad E(\epsilon^2 \mid x) = \sigma^2,

where \beta_1, \beta_2, \ldots and \sigma^2 are unknown parameters and x = (x_1, x_2, \ldots). Further, E(\mu^2) < \infty and \sum_{j=1}^{\infty} \beta_j x_j converges in mean square.
Model Selection

Covariates: K potential predictors x_1, \ldots, x_K are available. Observe (y_1, x_1), \ldots, (y_n, x_n), where x_i = (x_{i1}, x_{i2}, \ldots, x_{iK}).

Approximating linear model:

    y_i = \sum_{j=1}^{K} x_{ij} \beta_j + b_i + \epsilon_i, \qquad i = 1, 2, \ldots, n,

where

    b_i = \sum_{j=K+1}^{\infty} \beta_j x_{ij}

is the approximation error.

Multiple models are present: model m is identified with the indicators \{ I_{\{j \in m\}} \mid j = 1, 2, \ldots, K \}, i.e. with a subset m \subset \{1, 2, \ldots, K\}.
A Class of Approximating Models A

The M \times K incidence matrix

    A = \begin{pmatrix} \iota_1^T \\ \vdots \\ \iota_m^T \\ \vdots \\ \iota_M^T \end{pmatrix}

with 0/1 entries describes the models in A; the 1's in row m display the predictors included in the m-th model.

The regression matrix of model m is

    X_m = X \operatorname{diag}(\iota_m),

where \iota_m (the m-th row of A) is the vector of diagonal entries of \operatorname{diag}(\iota_m) and X denotes the n \times K regression matrix.
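As a small numerical sketch (hypothetical data, numpy assumed), the regression matrix X_m = X diag(ι_m) is obtained by zeroing out the columns of X that model m excludes:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 20, 4
X = rng.standard_normal((n, K))

# Incidence matrix A for M = 3 nested candidate models:
# the 1's in row m mark the predictors included in model m.
A = np.array([[1, 0, 0, 0],
              [1, 1, 0, 0],
              [1, 1, 1, 0]])

# Regression matrix of model m: X_m = X diag(iota_m),
# where iota_m is the m-th row of A.
X_m = [X @ np.diag(A[m]) for m in range(A.shape[0])]

# Included columns equal those of X; excluded columns are zero.
print(X_m[1][:, 2].sum())  # excluded column -> 0.0
```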
Approximating Model m

Model m takes the form

    y = X_m \beta_m + b_m + \epsilon.

The LSE of \beta_m is

    \hat{\beta}_m = (X_m^T X_m)^{+} X_m^T y,

and that of \mu_m = X_m \beta_m is

    \hat{\mu}_m = H_m y, \qquad m \in \mathcal{M},

where H_m = X_m (X_m^T X_m)^{+} X_m^T is a projector.
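The LSE and the projector H_m above translate directly into numpy (a sketch with simulated data; the Moore–Penrose inverse handles the zeroed-out columns of X_m):

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 30, 5
X = rng.standard_normal((n, K))
y = rng.standard_normal(n)

iota = np.array([1, 1, 0, 0, 0])          # predictors included in model m
Xm = X @ np.diag(iota)                    # X_m = X diag(iota_m)

# LSE via the Moore-Penrose inverse, matching (X_m^T X_m)^+ X_m^T y
beta_hat = np.linalg.pinv(Xm.T @ Xm) @ Xm.T @ y

# Hat matrix of model m and the fitted values mu_hat_m = H_m y
Hm = Xm @ np.linalg.pinv(Xm.T @ Xm) @ Xm.T
mu_hat = Hm @ y

# H_m is a projector: symmetric and idempotent
print(np.allclose(Hm @ Hm, Hm))  # True
```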
Model Average Estimator (MAE)

- MAE is an alternative to model selection.
- A model selection procedure can be unstable.
- When is combining better than selection?
- How to measure the uncertainty in selection?

Draper 1995 (JRSS B); Raftery et al. 1997 (JASA); Burnham & Anderson 2002 (book); Hjort & Claeskens 2003 (JASA); Hansen 2007 (Econometrica).

MAE of \beta and \mu:

    \hat{\beta}_w = \sum_{m=1}^{M} w_m \hat{\beta}_m, with weights w_m \geq 0, \sum_{m=1}^{M} w_m = 1,

    \hat{\mu}_w = H_w y, where H_w = \sum_{m=1}^{M} w_m H_m is the implied hat matrix.
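A minimal sketch of the averaged fit (hypothetical nested models and weights): averaging the hat matrices and averaging the individual fits give the same \hat{\mu}_w.

```python
import numpy as np

rng = np.random.default_rng(2)
n, K, M = 40, 4, 3
X = rng.standard_normal((n, K))
y = X[:, 0] + 0.5 * X[:, 1] + rng.standard_normal(n)

def hat(Xm):
    """Projector H_m = X_m (X_m^T X_m)^+ X_m^T."""
    return Xm @ np.linalg.pinv(Xm.T @ Xm) @ Xm.T

# Nested candidate models: model m uses the first m+1 predictors.
H = [hat(X[:, :m + 1]) for m in range(M)]

w = np.array([0.2, 0.5, 0.3])                 # w_m >= 0, sum to 1
Hw = sum(wm * Hm for wm, Hm in zip(w, H))     # implied hat matrix H_w
mu_w = Hw @ y                                 # MAE fit

# Equivalent form: weighted average of the individual fits H_m y
mu_w2 = sum(wm * (Hm @ y) for wm, Hm in zip(w, H))
```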
The Algebraic Structure of MAE

Define

    K = A A^T = \begin{pmatrix} k_1 & k_{12} & \cdots & k_{1M} \\ k_{21} & k_2 & \cdots & k_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ k_{M1} & k_{M2} & \cdots & k_M \end{pmatrix}.

Properties of H_w, with model weights w^T = (w_1, \ldots, w_M):

(i) \operatorname{tr}(H_w) = \sum_{m=1}^{M} w_m k_m.
(ii) \operatorname{tr}(H_w^2) = w^T K w.
(iii) \lambda_{\max}(H_w) \leq 1.
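Properties (i)–(iii) can be checked numerically for a set of nested models, where the entries of K = AA^T count shared predictors (a sketch with simulated data):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 50, 4
X = rng.standard_normal((n, p))

# Incidence matrix for three nested models.
A = np.array([[1, 0, 0, 0],
              [1, 1, 0, 0],
              [1, 1, 1, 1]])
Kmat = A @ A.T                         # k_m on the diagonal, k_ml off it

def hat(Xm):
    return Xm @ np.linalg.pinv(Xm.T @ Xm) @ Xm.T

H = [hat(X @ np.diag(row)) for row in A]
w = np.array([0.3, 0.3, 0.4])
Hw = sum(wm * Hm for wm, Hm in zip(w, H))

k = np.diag(Kmat)                      # model sizes k_m
tr_Hw = np.trace(Hw)                   # (i): equals sum_m w_m k_m
tr_Hw2 = np.trace(Hw @ Hw)             # (ii): equals w^T K w (nested case)
lam_max = np.linalg.eigvalsh(Hw).max() # (iii): at most 1
```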
The Risk under Squared-Error Loss

Squared-error loss: L(w) = \| \mu - \hat{\mu}_w \|^2.

The conditional risk of \hat{\mu}_w:

    R(w) = E(L(w) \mid x_1, \ldots, x_n) = \| (I - H_w) \mu \|^2 + \sigma^2 w^T K w = w^T (B + \sigma^2 K) w,

where

    B = \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1M} \\ \vdots & \vdots & \ddots & \vdots \\ b_{M1} & b_{M2} & \cdots & b_{MM} \end{pmatrix}, \qquad b_{mk} = b_m^T (I - H_m)(I - H_k) b_k.

At least two of the weights are non-zero in the optimal w. Example: suppose M = 2 and w^T = (w, 1 - w). Then w \in (0, 1) unless b_{11} = b_{12} or b_{22} = b_{12}.
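For M = 2 the risk R(w) = w^T Q w with Q = B + \sigma^2 K is a scalar quadratic in w, so the interior minimizer has a closed form. A sketch with a hypothetical 2x2 risk matrix:

```python
import numpy as np

# Hypothetical Q = B + sigma^2 * K; the risk is R(w) = w^T Q w
# with weight vector w^T = (w, 1 - w).
Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])

# Setting dR/dw = 0 gives the interior minimizer in closed form:
w_opt = (Q[1, 1] - Q[0, 1]) / (Q[0, 0] + Q[1, 1] - 2 * Q[0, 1])

# Cross-check against a fine grid over [0, 1].
grid = np.linspace(0, 1, 10001)
risks = [np.array([w, 1 - w]) @ Q @ np.array([w, 1 - w]) for w in grid]
w_grid = grid[int(np.argmin(risks))]

print(w_opt)  # 0.4 -- strictly inside (0, 1): both weights are non-zero
```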
Selecting the Model Weights

Mallows' criterion (MMAE) for MAE (Hansen 2007):

    C(w) = \| (I - H_w) y \|^2 + 2 \sigma^2 k_w, \qquad k_w = \sum_{m=1}^{M} w_m k_m.

Select \hat{w} = \arg\min_w \hat{C}(w), where \hat{C}(w) is C(w) with \sigma^2 replaced by an estimate.

Properties of C(w):

    E[C(w)] = E[L(w)] + n \sigma^2

and

    L(\hat{w}) / \inf_w L(w) \to_p 1 \quad as n \to \infty.
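The criterion is easy to minimize directly; for M = 2 a grid search over the simplex suffices. A sketch (simulated data; estimating \sigma^2 from the largest candidate model is one common choice, not prescribed by the slide):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
X = rng.standard_normal((n, 3))
y = X[:, 0] + 0.3 * X[:, 1] + rng.standard_normal(n)

def hat(Xm):
    return Xm @ np.linalg.pinv(Xm.T @ Xm) @ Xm.T

# Two nested candidate models with k_1 = 1 and k_2 = 3 predictors.
H1, H2 = hat(X[:, :1]), hat(X)
k1, k2 = 1, 3

# sigma^2 estimated from the residuals of the largest model.
sigma2 = float(y @ (np.eye(n) - H2) @ y / (n - k2))

def mallows(w):
    """C(w) = ||(I - H_w) y||^2 + 2 sigma^2 k_w for w^T = (w, 1-w)."""
    Hw = w * H1 + (1 - w) * H2
    kw = w * k1 + (1 - w) * k2
    return float(np.sum((y - Hw @ y) ** 2) + 2 * sigma2 * kw)

grid = np.linspace(0, 1, 201)
w_hat = grid[int(np.argmin([mallows(w) for w in grid]))]
```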
Smoothed AIC and BIC (SAIC & SBIC)

    w_m = \exp(-\tfrac{1}{2} AIC_m) / \sum_{l=1}^{M} \exp(-\tfrac{1}{2} AIC_l) (Buckland et al. 1997, Burnham & Anderson 2002),

    w_m = \exp(-\tfrac{1}{2} BIC_m) / \sum_{l=1}^{M} \exp(-\tfrac{1}{2} BIC_l),

where the AIC and BIC criteria for model m are

    AIC_m = n \ln \hat{\sigma}_m^2 + 2 k_m \quad and \quad BIC_m = n \ln \hat{\sigma}_m^2 + k_m \ln n.

Smoothed MDL (SMDL):

    w_m = \exp(-MDL_m) / \sum_{l=1}^{M} \exp(-MDL_l),

where

    MDL_m = n \ln \hat{s}_m^2 + k_m \ln F_m + \ln[k_m (n - k_m)], \qquad F_m = \| \hat{\mu}_m \|^2 / (k_m \hat{s}_m^2)

(Rissanen 2000, 2007; Liski 2006).
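All three smoothed schemes are softmax transforms of a criterion, which should be computed with a log-sum-exp shift to avoid underflow when criterion values are large. A sketch (hypothetical AIC values):

```python
import numpy as np

def smoothed_weights(criterion_values, scale=0.5):
    """Weights w_m proportional to exp(-scale * C_m), computed stably.

    scale = 0.5 reproduces the smoothed AIC/BIC weights above;
    scale = 1.0 reproduces the smoothed MDL weights."""
    c = -scale * np.asarray(criterion_values, dtype=float)
    c -= c.max()                    # log-sum-exp shift: exp stays bounded
    w = np.exp(c)
    return w / w.sum()

# Hypothetical AIC values for M = 3 candidate models.
aic = [210.3, 208.1, 215.7]
w_saic = smoothed_weights(aic, scale=0.5)
```

Without the shift, exp(-0.5 * 210.3) already underflows toward zero in double precision, so the naive formula can return 0/0.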
Finite Sample Performance

The simulation model is the infinite-order regression

    y_i = \sum_{j=1}^{\infty} \beta_j x_{ij} + \epsilon_i,

where

- x_{ij} \sim N(0, 1) iid (x_{i1} = 1), \epsilon_i \sim N(0, 1), and x_{ij} \perp \epsilon_i.
- \beta_j = c \sqrt{2\alpha}\, j^{-\alpha - 1/2} and the population R^2 = c^2 / (1 + c^2).
- 50 \leq n \leq 1000 and M = 3 n^{1/3}.
- 0.5 \leq \alpha \leq 1.5; for larger \alpha the coefficients \beta_j decline more quickly.
- c is selected such that 0.1 \leq R^2 \leq 0.9.
- Performance measure: the mean of the predictive loss \| \mu - \hat{\mu} \|^2 over the simulations.
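One draw from this design can be sketched as follows (the infinite sum is truncated at a hypothetical J terms, which is an implementation choice, not part of the slide's specification):

```python
import numpy as np

def simulate(n, alpha, c, J=500, rng=None):
    """One sample from the infinite-order design, truncated at J terms."""
    rng = rng if rng is not None else np.random.default_rng()
    X = rng.standard_normal((n, J))
    X[:, 0] = 1.0                                  # x_{i1} = 1 (intercept)
    j = np.arange(1, J + 1)
    beta = c * np.sqrt(2 * alpha) * j ** (-alpha - 0.5)
    mu = X @ beta                                  # conditional mean
    y = mu + rng.standard_normal(n)                # add N(0, 1) noise
    return X, y, mu

X, y, mu = simulate(n=100, alpha=1.0, c=1.0, rng=np.random.default_rng(7))
```

With c = 1 the population R^2 = c^2/(1 + c^2) = 0.5, inside the slide's range.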
Simulation Results: Comments

- AIC and Mallows' C behave similarly; MMAE is better than SAIC.
- SAIC has lower risk than AIC.
- MMAE is better than SBIC in most cases.

    Method | Performs well for
    SBIC   | n and R^2 small, \alpha large
    BIC    | 'small' models
    AIC    | 'large' models

- SMDL is better than MDL.
- SMDL emulates the best-performing criterion.
- SMDL has the best overall performance.
References

Buckland, S. T., Burnham, K. P. and Augustin, N. H. (1997), Model Selection: An Integral Part of Inference. Biometrics, 53, 603–618.

Burnham, K. P. and Anderson, D. R. (2002), Model Selection and Multimodel Inference, Springer.

Draper, D. (1995), Assessment and Propagation of Model Uncertainty. Journal of the Royal Statistical Society B, 57, 45–70.

Hansen, B. E. (2007), Least Squares Model Averaging. Econometrica, forthcoming.

Hjort, N. L. and Claeskens, G. (2003), Frequentist Model Average Estimators. Journal of the American Statistical Association, 98, 879–899.

Liski, E. P. (2006), Normalized ML and the MDL Principle for Variable Selection in Linear Regression. In: Festschrift for Tarmo Pukkila on His 60th Birthday, 159–172.

Raftery, A. E., Madigan, D. and Hoeting, J. A. (1997), Bayesian Model Averaging for Linear Regression Models. Journal of the American Statistical Association, 92, 179–191.

Rissanen, J. (2000), MDL Denoising. IEEE Transactions on Information Theory, 46, 2537–2543.

Rissanen, J. (2007), Information and Complexity in Statistical Modeling, Springer.