Robust Estimation via Influence Function in the Logistic Regression Model
Nino Kordzakhia, Gita Mishra
Department of Statistics, The University of Newcastle, Callaghan, NSW 2308, Australia
[email protected]

In this paper M-estimators are studied for multiple logistic regression models. Under the assumption that the sequence of distributions corresponding to the contaminated models is contiguous with respect to that of the pure models, the asymptotic normality of the estimators is established. The optimal influence function is given as the analytical solution of a minimax problem, which consists of minimizing the mean squared deviance under worst-case contamination.
1. Introduction

There are two main approaches to finding robust parameter estimators: the first minimizes some functional of the likelihood process (Bianco and Yohai (1996)), and the second is based on influence functions (Hampel et al. (1986)), which is extended here to the multiple logistic regression model. The component processes defined below are used to form the contaminated sequence via the general replacement model:

$Y_i \sim \mathrm{Bernoulli}(p_i)$, the core process;
$W_i \sim \mathrm{Bernoulli}(q_i)$, the contaminating process; and
$\varepsilon_i^n \sim \mathrm{Bernoulli}(v/\sqrt{n})$, $v \ge 0$;

where $Y_i$ is driven by a generalised linear model with a vector of explanatory variables $X^T = (X_1, \ldots, X_r)$, the vector of parameters $\theta^T = (\theta_1, \ldots, \theta_r)$ and the link function $G$ such that $p_i = G(X_i^T \theta)$. Note that $X$ has distribution $\mu$. The observed sequence is then given by

$$\tilde{Y}_i^n = (1 - \varepsilon_i^n)\, Y_i + \varepsilon_i^n W_i,$$

where $(Y_1, \ldots, Y_n)$ and $(W_1, \ldots, W_n)$ are assumed to be samples of i.i.d. random variables. The conditional probability density functions of $Y$ and $W$ given $X = x$ are, respectively,

$$g(y; x) = G(x^T\theta)\,\delta(y - 1) + (1 - G(x^T\theta))\,\delta(y),$$
$$f(y; x) = F(x)\,\delta(y - 1) + (1 - F(x))\,\delta(y),$$

where $\delta$ is the generalized delta function.
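The replacement model can be illustrated with a short simulation. The following is a minimal sketch, not the authors' simulation study: it assumes a logistic link for $G$, standard normal covariates for the distribution of $X$, and a constant contaminating success probability `q`. The function name `simulate_replacement_model` and all of its parameters are hypothetical.

```python
import numpy as np

def simulate_replacement_model(n, theta, v, q=0.5, seed=0):
    """Draw one sample of size n from the general replacement model.

    Assumptions (not from the paper): covariates are standard normal,
    G is the logistic function, and q is constant across observations.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    X = rng.normal(size=(n, theta.size))          # explanatory variables
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))        # p_i = G(X_i^T theta)
    Y = rng.binomial(1, p)                        # core process
    W = rng.binomial(1, q, size=n)                # contaminating process
    rate = min(1.0, v / np.sqrt(n))               # contamination probability v / sqrt(n)
    eps = rng.binomial(1, rate, size=n)           # replacement indicators
    Y_tilde = (1 - eps) * Y + eps * W             # observed sequence
    return X, Y, W, eps, Y_tilde

X, Y, W, eps, Y_tilde = simulate_replacement_model(n=10_000, theta=[1.0, -0.5], v=5.0)
print("fraction of replaced observations:", eps.mean())
```

With `n = 10_000` and `v = 5.0` the replacement probability is 0.05, so roughly one observation in twenty is drawn from the contaminating process, and the fraction shrinks at rate $n^{-1/2}$ as the sample grows.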
2. Robust Estimates
Let the sequence of probability measures $P_v^n$ be generated by $\tilde{Y}^n = (\tilde{Y}_1^n, \ldots, \tilde{Y}_n^n)$ with

$$\tilde{Y}_i^n \sim \mathrm{Bernoulli}\!\left(p_i + \frac{v}{\sqrt{n}}\,(q_i - p_i)\right).$$

The localizing vector parameter $u$ is defined so that the probabilities of 'success' $p_i^u = P(Y_i = 1)$, $i = 1, \ldots, n$, in the pure logistic model are $p_i^u = G(X_i^T \theta_n)$, where $\theta_n = \theta + u\, n^{-1/2}$. The family of probability measures $P_v^n$ is then parameterized by $u$, and the corresponding family is denoted by $P_{v,u}^n$. Under the condition of ergodicity, the likelihood ratio

$$Z_{v,u}^n = \frac{dP_{v,u}^n}{dP^n}$$

is exponentially normal, which implies the contiguity of $\{P_{v,u}^n\}_{n \ge 1}$ with respect to $\{P^n\}_{n \ge 1}$; that is, $P_{v,u}^n \triangleleft P^n$.
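The two local perturbations in this section, contamination of order $v/\sqrt{n}$ and a parameter shift of order $u/\sqrt{n}$, can be written out as a small sketch. The helper names `contaminated_prob` and `localized_theta` are hypothetical, and the numerical values are chosen only for illustration.

```python
import numpy as np

def contaminated_prob(p, q, v, n):
    """Success probability of the observed variable: p + (v / sqrt(n)) * (q - p)."""
    return p + (v / np.sqrt(n)) * (q - p)

def localized_theta(theta, u, n):
    """Local alternative: theta + u * n^(-1/2)."""
    return np.asarray(theta, dtype=float) + np.asarray(u, dtype=float) / np.sqrt(n)

# The mixture form (1 - e) * p + e * q with e = v / sqrt(n) gives the same value.
p, q, v, n = 0.3, 0.8, 2.0, 100
e = v / np.sqrt(n)
print(contaminated_prob(p, q, v, n), (1 - e) * p + e * q)
print(localized_theta([1.0, -1.0], [2.0, 4.0], n))
```

Both perturbations vanish as $n$ grows, which is what keeps the contaminated and pure sequences of models close enough for contiguity to be plausible.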
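Theorem 1. If $\hat{\theta}_n$ is a linear consistent estimator of the true parameter $\theta$ w.r.t. $P^n$, then

$$\sqrt{n}\,(\hat{\theta}_n - \theta) \xrightarrow{\;d\,(P_{v,u}^n)\;} \xi + V_{u,v},$$

where $\xi \sim N(0, V)$,

$$V = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \psi(y; x; \theta)\, \psi(y; x; \theta)^T\, g(y; x; \theta)\, dy\, \mu(dx),$$

$$V_{u,v} = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \left[ \left(\frac{\partial g(y; x)}{\partial \theta}\right)^{T} u + v\,\bigl(f(y; x) - g(y; x)\bigr) \right] \psi(y; x; \theta)\, dy\, \mu(dx).$$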
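Let the score function be

$$l(y; x; \theta) = \frac{\partial g(y; x) / \partial \theta}{g(y; x)};$$

then define the optimal influence function $\psi^*$ as the function which minimizes

$$\rho^2(\psi, l) = \int_{\mathbb{R}^{r+1}} \bigl(\psi(y; x; \theta) - l(y; x; \theta)\bigr)^T \bigl(\psi(y; x; \theta) - l(y; x; \theta)\bigr)\, g(y; x)\, \mu(dx)\, dy,$$

subject to a constraint on the Gross Error Sensitivity

$$\mathrm{GES}(\psi) = \sup_f \left\| \int_{\mathbb{R}^{r+1}} \bigl| f(y; x) - g(y; x) \bigr|\, \psi(y; x; \theta)\, \mu(dx)\, dy \right\| < C.$$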
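The minimization of $\rho^2(\psi, l)$ is equivalent to the minimization of the trace of the variance-covariance matrix $V$. Applying the Lagrange-multiplier technique, the solution to the minimax problem is

$$\psi^*(y; x) = \bigl[\, l(y; x; \theta) - \gamma(x) \,\bigr]_{-C}^{C},$$

where $[\,\cdot\,]_{-C}^{C}$ denotes componentwise truncation to $[-C, C]$ and $\gamma(x) = (\gamma_1(x), \gamma_2(x), \ldots, \gamma_{r+1}(x))$ is a vector of solutions of the following system of equations:

$$\int_{-\infty}^{\infty} \bigl[\, l_i(y; x; \theta) - \gamma_i(x) \,\bigr]_{-C}^{C}\, g(y; x; \theta)\, dy = 0, \qquad i = 1, \ldots, r + 1.$$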
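For a concrete case, assume that $G$ is the logistic function. The definition of the score then gives $l(y; x; \theta) = (y - G(x^T\theta))\,x$, and because $y$ takes only the values 0 and 1, the integral in the centering equation reduces to a two-term sum. The sketch below solves that equation for each component by bisection and evaluates the truncated influence function. It assumes that the bracket notation means componentwise truncation to $[-C, C]$; the names `score`, `centering`, `psi` and `gamma` are hypothetical, and bisection is one possible solver, not necessarily the one used by the authors.

```python
import numpy as np

def logistic(t):
    return 1.0 / (1.0 + np.exp(-t))

def score(y, x, theta):
    """Score for the logistic link: (y - G(x^T theta)) * x."""
    return (y - logistic(x @ theta)) * x

def centering(x, theta, C, tol=1e-12):
    """Solve G*clip(l_i(1) - g) + (1 - G)*clip(l_i(0) - g) = 0 for each component i."""
    G = logistic(x @ theta)
    l1, l0 = score(1.0, x, theta), score(0.0, x, theta)
    gamma = np.empty_like(x, dtype=float)
    for i in range(x.size):
        def h(g, a=l1[i], b=l0[i]):
            # Left-hand side of the centering equation; non-increasing in g.
            return G * np.clip(a - g, -C, C) + (1 - G) * np.clip(b - g, -C, C)
        lo = min(l1[i], l0[i]) - C   # both terms are clipped to +C here, so h(lo) = C > 0
        hi = max(l1[i], l0[i]) + C   # both terms are clipped to -C here, so h(hi) = -C < 0
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if h(mid) > 0:
                lo = mid
            else:
                hi = mid
        gamma[i] = 0.5 * (lo + hi)
    return gamma

def psi(y, x, theta, C):
    """Truncated, centered score: clip(l - gamma, -C, C)."""
    return np.clip(score(y, x, theta) - centering(x, theta, C), -C, C)

x = np.array([1.0, 2.0, -3.0])
theta = np.array([0.5, -0.25, 0.1])
print(psi(1.0, x, theta, C=0.5), psi(0.0, x, theta, C=0.5))
```

When `C` is large enough that no truncation occurs, the centering vector is zero and `psi` reduces to the ordinary score, since the score has conditional mean zero under the pure model.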
The behaviour of the robust estimator will be illustrated with a simulation study.
REFERENCES

Bianco, A.M. and Yohai, V.J. (1996). Robust Estimation in the Logistic Regression Model. In Robust Statistics, Data Analysis, and Computer Intensive Methods (ed. H. Rieder), 17-34. Springer-Verlag, New York.

Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J. and Stahel, W.A. (1986). Robust Statistics: The Approach Based on Influence Functions. Wiley & Sons, New York.
RÉSUMÉ

This paper presents results on the asymptotic normality of M-estimators of the parameters in a logistic regression model. Starting from the condition of contiguity of the 'contaminated' distributions with respect to the 'pure' distributions, an optimality criterion is presented together with the exact solution of the corresponding optimization problem.