MODEL SELECTION IN LINEAR MIXED MODELS USING MDL CRITERION WITH AN APPLICATION TO SPLINE SMOOTHING

Erkki P. Liski¹ and Antti Liski²

¹Department of Mathematics and Statistics, University of Tampere, FIN-33014 Tampere, FINLAND, [email protected]
²Institute of Signal Processing, Tampere University of Technology, P.O. Box 553, FIN-33101 Tampere, FINLAND, [email protected]

ABSTRACT

For spline smoothing one can rewrite the smoothing problem as a linear mixed model (LMM) in which the smoothing parameter appears as the variance of the spline basis coefficients. Smoothing methods that use basis functions with penalization can utilize maximum likelihood (ML) theory in the LMM framework ([8]). We introduce the minimum description length (MDL) model selection criterion in the LMM and propose an automatic data-based spline smoothing method based on the MDL criterion.

1. INTRODUCTION

This paper considers model selection for the LMM using the MDL principle ([4], [5] and [6]). Regression splines that use basis functions with penalization can be fitted conveniently using the machinery of LMMs, and thereby borrow from a rich source of existing methodology [8]. In this article we present the MDL criterion under the LMM for choosing the number of knots, the amount of smoothing and the basis jointly. A simulation experiment was conducted to compare the performance of the MDL method with that of the corresponding techniques based on the Akaike information criterion AIC, the corrected AICc and generalized cross-validation GCV.

The linear mixed model may be written as

y = Xβ + Zb + ε,   ε ∼ N(0, σ²Iₙ),   b ∼ N(0, φ²Iₘ),   Cov(b, ε′) = 0,   (1)

where X and Z are known n × p and n × m matrices, respectively, b is the m × 1 vector of random effects that occur in the n × 1 data vector y, and β is the p × 1 vector of unknown fixed effects parameters. Compared with the ordinary linear regression model, the difference is the term Zb, which may take various forms, thus creating a rich class of models. Under these conditions we have

y ∼ N(Xβ, σ²V)   (2)

and

y|b ∼ N(Xβ + Zb, σ²Iₙ),   (3)

where V = λZZ′ + Iₙ and λ = φ²/σ². There are different types of LMMs, and different ways of classifying them. For these we refer to the large literature on mixed models (see e.g. [1] and [9]).

2. ESTIMATION FOR LINEAR MIXED MODELS

Estimation of the fixed effects β and the random effects b entails the penalized least squares criterion

‖y − Xβ − Zb‖² + (1/λ)‖b‖²,   (4)

where λ > 0 is a tuning parameter. For a given λ, minimizing (4) with respect to β and b leads to the so-called mixed model equations (e.g. [9], Section 7.6)

⎡ X′X   X′Z            ⎤ ⎡ β̂ ⎤   ⎡ X′y ⎤
⎣ Z′X   Z′Z + (1/λ)Iₘ ⎦ ⎣ b̂ ⎦ = ⎣ Z′y ⎦.   (5)

Let δ = (β′, b′)′, M = (X, Z) and let D = diag(0, …, 0, 1, …, 1) be the (p + m) × (p + m) diagonal matrix whose first p diagonal elements are zero and whose remaining m diagonal elements are one. Then we obtain from (5)

δ̂ = (β̂′, b̂′)′ = (M′M + (1/λ)D)⁻¹M′y.   (6)

Given the estimator (6), the conditional ML estimator of σ² is

σ̂² = n⁻¹‖y − Mδ̂‖² = n⁻¹y′(I − H)²y,   (7)

and the fitted values are

ŷ = Hy,   (8)

where the hat matrix H can be written as H = M(M′M + (1/λ)D)⁻¹M′. The conditional likelihood arises from the conditional distribution y|b ∼ N(Xβ + Zb, σ²Iₙ) corresponding to (3).

3. MODEL SELECTION IN LINEAR MIXED MODELS USING MDL CRITERION

Let the variable η index the set of candidate models. We consider a set of normal models of the form

y|bη ∼ N(Xηβη + Zηbη, σ²Iₙ),

where Xη and Zη are n × pη and n × mη matrices, respectively, corresponding to the candidate model η. Here βη and bη are the pη × 1 and mη × 1 parameter vectors of the model η. Note that the estimates β̂η, b̂η and σ̂η² depend on the parameter λ ∈ [0, ∞]. In this conditional framework we specify a model by giving the pair (η, λ), and we denote γ = (η, λ).

Rissanen [4] developed an MDL criterion based on the normalized maximum likelihood (NML) coding scheme. Assume that the response data are modelled with a set of density functions f(y; γ, θ), where the parameter vector θ varies within a specified parameter space. The NML function is defined by

f̂(y; γ) = f(y; γ, θ̂) / C(γ),   (9)

where θ̂ = θ̂(y) is the ML estimator of θ and

C(γ) = ∫ f(x; γ, θ̂(x)) dx   (10)

is the normalizing constant. The integral in (10) is taken over the sample space. Thus f̂(y; γ) defines a density function, provided that C(γ) is bounded. The expression

− log f̂(y; γ) = − log f(y; γ, θ̂) + log C(γ)   (11)

is taken as the "shortest code length" for the data y that can be obtained with the model γ, and it is called the stochastic complexity of y given γ ([4]). Here the estimate θ̂ = (δ̂, σ̂²) is given by (6) and (7). The last term in (11) is called the parametric complexity, where the normalizing constant C(γ) is defined by (10).

4. SPLINE SMOOTHING USING MDL CRITERION

We consider a parametric regression spline model

r(x; β, b) = β₁ + β₂x + ⋯ + βₚx^(p−1) + Σⱼ₌₁ᵐ bⱼzⱼ(x),   (12)

where the first p terms form a polynomial of order p − 1 in x, the covariates z₁(x), …, zₘ(x) are elements of a smoothing basis, and β = (β₁, …, βₚ)′ and b = (b₁, …, bₘ)′ are unknown parameters. Then (12) can be written as

yᵢ = xᵢ′β + zᵢ′b + εᵢ,

where xᵢ = (1, xᵢ, …, xᵢ^(p−1))′ and zᵢ = (z₁(xᵢ), …, zₘ(xᵢ))′. Typically xᵢ is low-dimensional and zᵢ is a high-dimensional basis vector linearly independent of xᵢ. A convenient choice is the truncated power basis of degree p − 1, in which case the ith row of Z is zᵢ = ((xᵢ − κ₁)₊^(p−1), …, (xᵢ − κₘ)₊^(p−1)), where x₊ denotes the positive part: for any number x, x₊ equals x if x is positive and 0 otherwise. The knots κ₁, …, κₘ are fixed values covering the range of x₁, …, xₙ. Penalized spline estimation for smoothing was made popular in statistics by Eilers and Marx [2].

In smoothing we control three modeling parameters: the degree p − 1 of the regression spline, the number of knots m and the smoothing parameter λ. The fitted values for a spline regression are given by (8). In addition to the value of λ, the degree of the regression spline and the number and location of the knots must be specified. Here we adopt the procedure in which the knots are located at "equally spaced" sample quantiles of x₁, …, xₙ. A model estimator γ̂ is obtained by minimizing the MDL selection criterion with respect to the model γ = (p, m, λ) using numerical optimization routines. The performance of the method was compared with that of the corresponding techniques based on the Akaike information criterion AIC, the corrected AICc and generalized cross-validation GCV using simulation experiments.
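As an illustration of the procedure described above, the following Python sketch (not from the paper; all function names are hypothetical) builds the truncated power basis with knots at equally spaced sample quantiles, fits the mixed model via (6)–(8), and selects γ = (p, m, λ) by minimizing a two-part code length in the spirit of (11). The parametric complexity log C(γ) is replaced here by a crude effective-dimension penalty, since the exact C(γ) for this model is not derived in this excerpt; likewise, a grid search stands in for the paper's numerical optimization routines.

```python
import numpy as np

def spline_design(x, p, m):
    """Design matrices for a degree p-1 truncated power basis.

    X holds the polynomial terms 1, x, ..., x^(p-1); Z holds the basis
    functions (x - kappa_j)_+^(p-1), with the m knots kappa_j placed at
    equally spaced sample quantiles of x, as in the paper's knot rule.
    """
    X = np.vander(x, p, increasing=True)                          # n x p
    knots = np.quantile(x, np.linspace(0, 1, m + 2)[1:-1])        # m interior knots
    Z = np.maximum(x[:, None] - knots[None, :], 0.0) ** (p - 1)   # n x m
    return X, Z

def fit_lmm(y, X, Z, lam):
    """Penalized least squares fit: delta_hat from (6), fitted values from (8)
    and the conditional ML variance estimate from (7)."""
    M = np.hstack([X, Z])                                         # M = (X, Z)
    D = np.diag(np.r_[np.zeros(X.shape[1]), np.ones(Z.shape[1])])
    delta = np.linalg.solve(M.T @ M + D / lam, M.T @ y)           # (6)
    fitted = M @ delta                                            # (8): H y = M delta_hat
    sigma2 = np.mean((y - fitted) ** 2)                           # (7)
    return delta, fitted, sigma2

def code_length(y, X, Z, lam):
    """Two-part code length in the spirit of (11): conditional negative
    log-likelihood plus a PLACEHOLDER complexity term (an assumption --
    the paper's exact parametric complexity log C(gamma) is not derived
    in this excerpt), here an effective-dimension penalty via tr(H)."""
    n = len(y)
    M = np.hstack([X, Z])
    D = np.diag(np.r_[np.zeros(X.shape[1]), np.ones(Z.shape[1])])
    _, _, sigma2 = fit_lmm(y, X, Z, lam)
    neg_loglik = 0.5 * n * np.log(2 * np.pi * sigma2) + 0.5 * n
    dof = np.trace(M @ np.linalg.solve(M.T @ M + D / lam, M.T))   # tr(H)
    return neg_loglik + 0.5 * dof * np.log(n)

def select_model(x, y, degrees=(2, 3, 4), n_knots=(5, 10, 20),
                 lambdas=10.0 ** np.arange(-2, 5)):
    """Grid-search stand-in for numerical optimization over (p, m, lambda)."""
    best = None
    for p in degrees:
        for m in n_knots:
            X, Z = spline_design(x, p, m)
            for lam in lambdas:
                crit = code_length(y, X, Z, lam)
                if best is None or crit < best[0]:
                    best = (crit, p, m, lam)
    return best
```

Note that the fit never forms H explicitly for prediction; solving the (p + m)-dimensional system (6) directly is cheaper than inverting an n × n hat matrix.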

5. REFERENCES

[1] Demidenko, E. (2004). Mixed Models. Wiley, New York.
[2] Eilers, P. H. C. and Marx, B. D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11, 89–121.
[3] Green, P. J. and Silverman, B. W. (1994). Nonparametric Regression and Generalized Linear Models. Chapman & Hall, London.
[4] Rissanen, J. (1996). Fisher information and stochastic complexity. IEEE Transactions on Information Theory, 42, 40–47.
[5] Rissanen, J. (2000). MDL denoising. IEEE Transactions on Information Theory, 46, 2537–2543.
[6] Rissanen, J. (2007). Information and Complexity in Statistical Modeling. Springer, New York.
[7] Ruppert, D. (2002). Selecting the number of knots for penalized splines. Journal of Computational and Graphical Statistics, 11, 735–754.
[8] Ruppert, D., Wand, M. P. and Carroll, R. J. (2003). Semiparametric Regression. Cambridge University Press, Cambridge.
[9] Searle, S. R., Casella, G. and McCulloch, C. E. (1992). Variance Components. Wiley, New York.
