MODEL SELECTION IN LINEAR MIXED MODELS USING MDL CRITERION WITH AN APPLICATION TO SPLINE SMOOTHING

Erkki P. Liski¹ and Antti Liski²

¹Department of Mathematics and Statistics, University of Tampere, FIN-33014 Tampere, FINLAND, [email protected]
²Institute of Signal Processing, Tampere University of Technology, P.O. Box 553, FIN-33101 Tampere, FINLAND, [email protected]

ABSTRACT

For spline smoothing one can rewrite the smoothing problem as a linear mixed model (LMM) in which the smoothing parameter appears as the variance of the spline basis coefficients. Smoothing methods that use basis functions with penalization can utilize maximum likelihood (ML) theory in the LMM framework ([8]). We introduce the minimum description length (MDL) model selection criterion in the LMM and propose an automatic data-based spline smoothing method based on the MDL criterion.

1. INTRODUCTION

This paper considers model selection for the LMM using the MDL principle ([4], [5] and [6]). Regression splines that use basis functions with penalization can be fitted conveniently using the machinery of LMMs, and thereby borrow from a rich source of existing methodology [8]. In this article we present the MDL criterion under the LMM for choosing the number of knots, the amount of smoothing and the basis jointly. A simulation experiment was conducted to compare the performance of the MDL method with that of the corresponding techniques based on the Akaike information criterion AIC, the corrected AICc and generalized cross-validation GCV.

The linear mixed model may be written as

y = Xβ + Zb + ε,   ε ∼ N(0, σ²Iₙ),   b ∼ N(0, φ²Iₘ),   Cov(b, ε′) = 0,   (1)

where X and Z are known n × p and n × m matrices, respectively, b is the m × 1 vector of random effects that occur in the n × 1 data vector y, and β is the p × 1 vector of unknown fixed effects parameters. Compared with the ordinary linear regression model, the difference is the term Zb, which may take various forms, thus creating a rich class of models. Under these conditions we have

y ∼ N(Xβ, σ²V)   (2)

and

y|b ∼ N(Xβ + Zb, σ²Iₙ),   (3)

where V = λZZ′ + Iₙ and λ = φ²/σ². There are different types of LMMs, and different ways of classifying them. For these we refer to the large literature on mixed models (see e.g. [1] and [9]).

2. ESTIMATION FOR LINEAR MIXED MODELS

Estimation of the fixed effects β and the random effects b entails the penalized least squares criterion

‖y − Xβ − Zb‖² + (1/λ)‖b‖²,   (4)

where λ > 0 is a tuning parameter. For a given λ, minimizing (4) with respect to β and b leads to the so-called mixed model equations (e.g. [9], Section 7.6)

⎡ X′X   X′Z            ⎤ ⎡ β̂ ⎤   ⎡ X′y ⎤
⎣ Z′X   Z′Z + (1/λ)Iₘ ⎦ ⎣ b̂ ⎦ = ⎣ Z′y ⎦.   (5)

Let δ = (β′, b′)′, M = (X, Z) and let D = diag(0, …, 0, 1, …, 1) be the (p + m) × (p + m) diagonal matrix whose first p diagonal elements are zero and whose remaining m diagonal elements are one. Then we obtain from (5)

δ̂ = (β̂′, b̂′)′ = (M′M + (1/λ)D)⁻¹M′y.   (6)

Given the estimator (6), the conditional ML estimator of σ² is

σ̂² = n⁻¹‖y − Mδ̂‖² = n⁻¹y′(I − H)²y,   (7)

and the fitted values are

ŷ = Hy,   (8)

where the hat matrix H can be written as H = M(M′M + (1/λ)D)⁻¹M′. The conditional likelihood arises from the conditional distribution y|b ∼ N(Xβ + Zb, σ²Iₙ) corresponding to (3).

3. MODEL SELECTION IN LINEAR MIXED MODELS USING MDL CRITERION

Let the variable η index the set of candidate models. We consider a set of normal models of the form

y|bη ∼ N(Xηβη + Zηbη, σ²Iₙ),

where Xη and Zη are n × pη and n × mη matrices, respectively, corresponding to the candidate model η. Here βη and bη are the pη × 1 and mη × 1 parameter vectors of the model η. Note that the estimates β̂η, b̂η and σ̂η² depend on the parameter λ ∈ [0, ∞]. In this conditional framework we specify a model by giving the pair (η, λ), and we denote γ = (η, λ).

Rissanen [4] developed an MDL criterion based on the normalized maximum likelihood (NML) coding scheme. Assume that the response data are modelled with a set of density functions f(y; γ, θ), where the parameter vector θ varies within a specified parameter space. The NML function is defined by

f̂(y; γ) = f(y; γ, θ̂) / C(γ),   (9)

where θ̂ = θ̂(y) is the ML estimator of θ and

C(γ) = ∫ f(x; γ, θ̂(x)) dx   (10)

is the normalizing constant. The integral in (10) is taken over the sample space. Thus f̂(y; γ) defines a density function, provided that C(γ) is bounded. The expression

− log f̂(y; γ) = − log f(y; γ, θ̂) + log C(γ)   (11)

is taken as the "shortest code length" for the data y that can be obtained with the model γ, and it is called the stochastic complexity of y given γ ([4]). Here the estimate θ̂ = (δ̂, σ̂²) is given by (6) and (7). The last term in (11) is called the parametric complexity, where the normalizing constant C(γ) is defined by (10).

4. SPLINE SMOOTHING USING MDL CRITERION

We consider a parametric regression spline model

r(x; β, b) = β₁ + β₂x + ⋯ + βₚx^(p−1) + Σⱼ₌₁ᵐ bⱼzⱼ(x),   (12)

where the first p terms form a polynomial of order p − 1 in x, the covariates z₁(x), …, zₘ(x) are elements of a smoothing basis, and β = (β₁, …, βₚ)′ and b = (b₁, …, bₘ)′ are unknown parameters. Then (12) can be written as

yᵢ = xᵢ′β + zᵢ′b + εᵢ,

where xᵢ = (1, xᵢ, …, xᵢ^(p−1))′ and zᵢ = (z₁(xᵢ), …, zₘ(xᵢ))′. Typically xᵢ is low-dimensional and zᵢ is a high-dimensional basis vector linearly independent of xᵢ. A convenient choice is the truncated power basis of degree p − 1, in which case the ith row of Z is zᵢ = ((xᵢ − κ₁)₊^(p−1), …, (xᵢ − κₘ)₊^(p−1)), where x₊ denotes the positive part: for any number x, x₊ equals x if x is positive and 0 otherwise. The knots κ₁, …, κₘ are fixed values covering the range of x₁, …, xₙ. Penalized spline estimation for smoothing was made popular in statistics by Eilers and Marx [2].

In smoothing we control three modeling parameters: the degree p − 1 of the regression spline, the number of knots m and the smoothing parameter λ. The fitted values for a spline regression are given by (8). In addition to the value of λ, the degree of the regression spline and the number and location of the knots must be specified. Here we adopt the procedure in which the knots are located at "equally spaced" sample quantiles of x₁, …, xₙ. A model estimator γ̂ is obtained by minimizing the MDL selection criterion with respect to the model γ = (p, m, λ) using numerical optimization routines. The performance of the method was compared with that of the corresponding techniques based on the Akaike information criterion AIC, the corrected AICc and generalized cross-validation GCV using simulation experiments.
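As an illustration of the procedure described above, the following Python sketch (not from the paper; all function names are hypothetical) builds the truncated power basis with knots at equally spaced sample quantiles, fits the mixed model via (6)–(8), and selects γ = (p, m, λ) by minimizing a two-part code length in the spirit of (11). The parametric complexity log C(γ) is replaced here by a crude effective-dimension penalty, since the exact C(γ) for this model is not derived in this excerpt; likewise, a grid search stands in for the paper's numerical optimization routines.

```python
import numpy as np

def spline_design(x, p, m):
    """Design matrices for a degree p-1 truncated power basis.

    X holds the polynomial terms 1, x, ..., x^(p-1); Z holds the basis
    functions (x - kappa_j)_+^(p-1), with the m knots kappa_j placed at
    equally spaced sample quantiles of x, as in the paper's knot rule.
    """
    X = np.vander(x, p, increasing=True)                          # n x p
    knots = np.quantile(x, np.linspace(0, 1, m + 2)[1:-1])        # m interior knots
    Z = np.maximum(x[:, None] - knots[None, :], 0.0) ** (p - 1)   # n x m
    return X, Z

def fit_lmm(y, X, Z, lam):
    """Penalized least squares fit: delta_hat from (6), fitted values from (8)
    and the conditional ML variance estimate from (7)."""
    M = np.hstack([X, Z])                                         # M = (X, Z)
    D = np.diag(np.r_[np.zeros(X.shape[1]), np.ones(Z.shape[1])])
    delta = np.linalg.solve(M.T @ M + D / lam, M.T @ y)           # (6)
    fitted = M @ delta                                            # (8): H y = M delta_hat
    sigma2 = np.mean((y - fitted) ** 2)                           # (7)
    return delta, fitted, sigma2

def code_length(y, X, Z, lam):
    """Two-part code length in the spirit of (11): conditional negative
    log-likelihood plus a PLACEHOLDER complexity term (an assumption --
    the paper's exact parametric complexity log C(gamma) is not derived
    in this excerpt), here an effective-dimension penalty via tr(H)."""
    n = len(y)
    M = np.hstack([X, Z])
    D = np.diag(np.r_[np.zeros(X.shape[1]), np.ones(Z.shape[1])])
    _, _, sigma2 = fit_lmm(y, X, Z, lam)
    neg_loglik = 0.5 * n * np.log(2 * np.pi * sigma2) + 0.5 * n
    dof = np.trace(M @ np.linalg.solve(M.T @ M + D / lam, M.T))   # tr(H)
    return neg_loglik + 0.5 * dof * np.log(n)

def select_model(x, y, degrees=(2, 3, 4), n_knots=(5, 10, 20),
                 lambdas=10.0 ** np.arange(-2, 5)):
    """Grid-search stand-in for numerical optimization over (p, m, lambda)."""
    best = None
    for p in degrees:
        for m in n_knots:
            X, Z = spline_design(x, p, m)
            for lam in lambdas:
                crit = code_length(y, X, Z, lam)
                if best is None or crit < best[0]:
                    best = (crit, p, m, lam)
    return best
```

Note that the fit never forms H explicitly for prediction; solving the (p + m)-dimensional system (6) directly is cheaper than inverting an n × n hat matrix.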

5. REFERENCES

[1] Demidenko, E. (2004). Mixed Models. Wiley, New York.
[2] Eilers, P. H. C. and Marx, B. D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11, 89–121.
[3] Green, P. J. and Silverman, B. W. (1994). Nonparametric Regression and Generalized Linear Models. Chapman & Hall, London.
[4] Rissanen, J. (1996). Fisher information and stochastic complexity. IEEE Transactions on Information Theory, 42, 40–47.
[5] Rissanen, J. (2000). MDL denoising. IEEE Transactions on Information Theory, 46, 2537–2543.
[6] Rissanen, J. (2007). Information and Complexity in Statistical Modeling. Springer, New York.
[7] Ruppert, D. (2002). Selecting the number of knots for penalized splines. Journal of Computational and Graphical Statistics, 11, 735–754.
[8] Ruppert, D., Wand, M. P. and Carroll, R. J. (2003). Semiparametric Regression. Cambridge University Press, Cambridge.
[9] Searle, S. R., Casella, G. and McCulloch, C. E. (1992). Variance Components. Wiley, New York.
