A New Nonparametric Estimation Method: Local and Nonlinear

Andrzej S. Kozek*
Department of Computer Science, University of Wroclaw
Przesmycki St. 20, 51-151 Wroclaw, Poland

March 15, 1991

Abstract

In the paper we consider a new class (called NoLoess) of nonparametric estimators of the regression function $r(x) = E(Y \mid X = x)$. The new estimators link nonparametric and parametric methods. NoLoess extends the class of Macaulay, Cleveland, and Stone estimators and replaces their locally linear models (Loess) given by (1.3) with a general locally parametric nonlinear model $g(x;\theta)$. The parameter $\theta$ is estimated locally by fitting $g(x;\theta)$ to a sub-sample $(X_{i_1}, Y_{i_1}), \ldots, (X_{i_{k(x)}}, Y_{i_{k(x)}})$ with $X_{i_1}, \ldots, X_{i_{k(x)}}$ being close to $x$. The resulting nonparametric estimator of $r(x)$ is of the form $g(x;\hat\theta(x))$. Under proper regularity assumptions it is strongly consistent. An interactive software package FS 2.0, running under MS DOS on IBM PC and compatibles and implementing a variety of nonparametric Loess- and NoLoess-type estimators, has been developed.

Abbreviated Title: NoLoEss

Keywords: regression, nonparametric, local, nonlinear, strong consistency, applications.

* The author is visiting through academic year 1991-1992 at the University of Arkansas, Fayetteville.

1. Introduction

The importance of using the correct model, or a satisfactory approximation to it, is well recognized in statistics. In this direction, in 1990 Parzen proposed a unification of methods in statistical data analysis. His four-step scheme for the model identification process is based both on parametric and nonparametric methods, and on their comparison. Consistent tests of fit of parametric models for the regression function $r(x) = E(Y \mid X = x)$ have also been proposed in Kozek (1990, 1991). The tests are based on a comparison of parametric and nonparametric estimators of $r(x)$. For tests and discussion on a similar subject in the case of a non-random design $x$ we refer to Eubank and Spiegelman (1990). In the present note we follow the idea of comparing parametric and nonparametric estimators and propose an extension of nonparametric methods for estimation of a regression function. Here we consider only the random design.

Let $(X, Y)$ be a $(p+1)$-dimensional random vector (r.v.), $X \in R^p$, $Y \in R^1$, and let $(X, Y), (X_1, Y_1), \ldots, (X_n, Y_n)$ be independent identically distributed (i.i.d.) r.v. It is well known (cf. Härdle et al. (1988) and Hall and Jones (1990)) that the most frequently considered nonparametric estimators, such as Nadaraya-Watson (NW), $k$-Nearest Neighbor (NN), and $p$-Optimal Quantile (OQ) (see Kozek and Schuster (1990) for more details), minimize a local fit function

$$\mathcal{M}_0(\theta) = \sum_{i=1}^{n} (Y_i - \theta)^2\, K\!\left(\frac{x - X_i}{h_v}\right), \tag{1.1}$$

where a suitable window bandwidth $h_v$ is specific for a given estimator type, and where $K(\cdot)$ is a nonnegative kernel. We propose to extend (1.1) to the form

$$\mathcal{M}(\theta) = \sum_{i=1}^{n} M\big(Y_i - g(X_i;\theta)\big)\, K\!\left(\frac{x - X_i}{h_v}\right) W_M(X_i), \tag{1.2}$$

where $g(x;\theta)$ is a `trial' parametric regression function (in general $g$ may be a nonlinear function of the vector parameter $\theta$), $M$ is a convex function defining a conditional M-functional, and $W_M$ is a nonnegative weight function. The estimators of conditional M-functionals are of the form $g(x;\hat\theta(x))$, where $\hat\theta(x)$ is the value of $\theta$ which minimizes (1.2). They extend several important classes of nonparametric estimators considered so far. We have already mentioned the NW, NN and OQ estimators of the regression function, which are of the form (1.2) with $M(y) = y^2$ and $g(x;\theta) = \theta$, $\theta \in R^1$. If $M(y) = y^2$ and $g(x;\theta) = \theta_0 + \vartheta \cdot x$, with $\cdot$ standing for the usual inner product on $R^p$ and $\theta = (\theta_0, \theta_1, \ldots, \theta_p) = (\theta_0, \vartheta)$, then we get the class of estimators locally linear in parameters, called Loess (Cleveland (1979)) and introduced by Macaulay (1931), Stone (1977), and Cleveland (1979). Since our estimators can be considered as nonlinear Loess we shall use the abbreviation NoLoess in the sequel.

The case of $\theta \in R^1$ and a convex function $M$ with a bounded derivative admits an easy interpretation. Then $M'(y) = c \cdot (G(y) - \alpha)$, where $G$ is a cumulative distribution function (c.d.f.) of some r.v. $Z$, and the integral $\int M(y - \theta)\, F(dy)$ achieves its minimum at a $(1 - \alpha)$-quantile of the c.d.f. of the r.v. $Y - Z$, where $Y \sim F$ (Kozek (1984), p. 155). If the probability distribution of $Y$ has a symmetry center and $M$ is symmetric about zero, then the minimum of the integral coincides with the symmetry center, median, and expectation of $F$. Clearly, this coincidence fails in general when $F$ is not symmetric. Since $M$ is used to make the corresponding estimator robust and no deviation from expectation is acceptable, the conditional M-functionals require an additional symmetry assumption on the conditional probability distribution of $Y$ given $X = x$. Estimators of functionals of conditional probability distributions have already been considered in the literature, cf. Härdle et al. (1988), Hall and Jones (1990), and the references quoted in these papers.

Our experience with the software packages FS and FS 2.0¹ shows that Loess-type estimators typically perform much better in small and moderate sample sizes than NW, NN, and OQ estimators do. This should not be surprising, since in the case of a box-type kernel NW, NN, and OQ approximate $r(x)$ using piecewise constant functions while Loess locally uses a linear space of functions, e.g. polynomials. Smooth kernels smooth the resulting fit and `make up' for the lack of any formal smoothness requirement of the kind present in the case of splines. Formal justification of the better performance of Loess estimators in terms of the reduction of the Mean Squared Error (MSE) can be found e.g. in Fan (1991). NW, NN, OQ and Loess lack, however, the superb feature of parametric models: the ease of parameter interpretation. We hope that NoLoess and the interpretation of the dependence of the parameters $\theta(x)$ on $x$ may help researchers to improve inadequate parametric models or, in the `weak or no dependence' case, may support the model and its interpretation. We recommend for $g(x;\theta)$ a function which represents expert knowledge about the actual regression function. In this case only an almost parametric fit, with a typically small or moderate correction through the dependence of the parameters on $x$, is required.

¹ FS and FS 2.0 were written by the author with the assistance of K.K. Kozek and use a system of windows written by J. Witkowski.

2. Fit Short 2.0: an Implementation of NoLoess

Fit Short 2.0 (FS 2.0) is an interactive package written in Borland's Turbo Pascal². FS 2.0 implements NoLoess in the form given by (1.2) for $x \in R^1$ and with a broad range of options.

A smoothing parameter $v_0$ is called `optimal' if it equals the argmin of

$$\Lambda(v) = \sum_{i=1}^{n} \rho\big(Y_i - g(X_i;\hat\theta_v)\big)\, \Xi^{*}_{W}, \tag{2.1}$$

where $\rho$ meets the same conditions as $M$ in (1.2),

$$\Xi^{*}_{W} = \Xi\!\left(\frac{K(0)}{\sum K\!\left(\frac{x - X_i}{h_v}\right)}\right) W(X_i),$$

and $W$ is a bounded and positive weight function. The user can either supply a value of the smoothing parameter $v$ or choose an `automatic' selection of an optimal smoothing parameter $v_0$. FS 2.0 provides a collection of penalizing functions $\Xi$ including those listed in Härdle (1991). All calculations reported in Section 5 were obtained using FS 2.0. We postpone a more detailed description of the modular structure of the package and of the implemented numerical solutions to another paper.

² Turbo Pascal is a registered trademark of Borland Int., Inc.

3. Technical Requirements for NoLoess

In this section we list the assumptions necessary for the consistency result from Section 4.

A. $(X, Y)$ is a random vector in $\mathcal{X} \times R^1$, $\mathcal{X} \subset R^p$, such that

$$Y = r(X) + \varepsilon, \tag{3.1}$$

where $r$ is bounded and has bounded and continuous derivatives on $\mathcal{X}$.

B. $X$ and $\varepsilon$ are independent, and $\varepsilon$ has a symmetric probability distribution; both $X$ and $\varepsilon$ are assumed to have probability density functions, $f_X$ and $f_\varepsilon$ respectively, which are differentiable and have bounded derivatives.

C. $M$ is a symmetric strictly convex function on $R^1$ satisfying the condition $M(2t) < C \cdot M(t)$ for all $t$, and with a continuous, bounded derivative $M'$.

D. $g(x;\theta)$ is bounded for every $\theta$ and has a continuous vector of derivatives $\dot g(x;\theta)$, the euclidean norm of which has a bounded envelope on $\mathcal{X} \times \Theta$, $\Theta \subset R^q$.

E. For every $x \in \mathcal{X}$

$$(r(x) - \delta,\ r(x) + \delta) \subset \{\gamma \in R^1 : \gamma = g(x;\theta),\ \theta \in \Theta\}.$$

F. $K(x) = k(|x|)$, where $k$ is differentiable, bounded, positive and decreasing on $[0, 1)$, and $k(1) = 0$. $W_M$ and $W$ are bounded and positive.

G. $E\,M(\varepsilon) < \infty$, and $E\,(M'(\varepsilon))^2 < \infty$.

H. For every $x \in \mathcal{X}$ and $h > 0$ the function

$$\lambda(x, \theta, h) = E\, M'\big(Y - g(X;\theta)\big)\, \dot g(X;\theta)\, K\big((x - X)/h\big)\, W_M(X)$$

has a unique zero at $\theta_0(x, h)$.

I. For every $0 < \delta < \delta_0$ and $0 < h < h_0$

$$\inf |\lambda(x, \theta, h)| > c \cdot h^p, \tag{3.2}$$

where the infimum is over $(x, \theta) \in \mathcal{X} \times B(\theta_0(x, h), \delta)$, and $B(\theta, \delta)$ stands for the ball in $R^p$ with center at $\theta$ and radius $\delta$.

4. Consistency of NoLoess

Theorem. Let $(X, Y), (X_i, Y_i)$, $i = 1, \ldots, n$, be independent random variables, and assume that the conditions stated in Section 3 are fulfilled. If an estimator $g(x;\hat\theta(x))$ satisfies

$$E_{P_n}\, M'\big(Y - g(X;\hat\theta)\big)\, \dot g(X;\hat\theta)\, K\big((x - X)/h_n\big) = 0, \tag{4.1}$$

where $P_n$ is the empirical probability measure generated by $(X_i, Y_i)$, $i = 1, \ldots, n$, then

$$g(x;\hat\theta(x)) \to r(x) \quad \text{a.s.}$$

If $r(x) = g(x;\theta_0)$ for some $\theta_0$ then the convergence holds also for a constant kernel $K(x) \equiv 1$.

Proof. Let $\mathcal{F}^1 = \{M'(y - g(x;\theta)) \cdot \dot g(x;\theta) : \theta \in \Theta\}$ and $\mathcal{F}^2 = \{K((z - x)/h_n) : z \in \mathcal{X}\}$ be families of functions. It is easy to see that the conditions of Section 3 imply that $\mathcal{F}^1$ and $\mathcal{F}^2$ have polynomial discrimination. Thus, by the triangle inequality, the family $\mathcal{F}_n = \{f_1 \cdot f_2 : f_1 \in \mathcal{F}^1,\ f_2 \in \mathcal{F}^2\}$ also has polynomial discrimination. By the Approximation Lemma (Pollard (1984), p. 27) we get a bound for the covering number

$$N_1(\eta, P, \mathcal{F}_n) \le A\, \eta^{-W},$$

where the constants $A$ and $W$ do not depend on $\eta \in (0, 1)$. Since for every function $f \in \mathcal{F}_n$ we have $|f| \le \text{const} \cdot K((x - X)/h)$, we get $(E|f|^2)^{1/2} \le \text{const} \cdot h_n^p$ (see Pollard (1984), pp. 35-36 for more details). Now, we infer from Theorem 37, Chapter 2 in Pollard (1984) that

$$\sup_{f \in \mathcal{F}_n} |E_{P_n} f - E_P f| \le \text{const} \cdot h_n^p.$$

Let $\psi$ stand for a function from $\mathcal{F}^1$. By Lemma 1 in Greblicki et al. (1984) we get

$$\frac{1}{n h_n^d} \sum_{i=1}^{n} \psi(X_i, Y_i, \theta)\, K\big((x - X_i)/h\big) \xrightarrow{\ \text{a.s.}\ } E\big(\psi(X, Y, \theta) \mid x = X\big) \cdot f_X(x).$$

Suppose that the estimator $g(x;\hat\theta(x))$ is not consistent with probability 1. Then assumption (3.2) on $\lambda(x, \theta, h)$ and the convergence obtained above contradict property (??) of the estimators.

The consistency in the case of the parametric model follows from Theorem 2 in Huber (1967) and the observation that under the assumptions of Section 3 the function $E\, M(Y - g(X;\theta)) \cdot K\big((x - X)/h\big)$ attains for a.e. $x$ its minimum at the same point $\theta_0$, which is unique for every nonsingular probability distribution of $X$. Indeed, (3.1), the independence of $X$ and $\varepsilon$, and the Anderson Lemma imply that $g(x;\theta_0) = g(x;\theta(x))$ for a.e. $x$, where $\theta(x)$ minimizes the conditional expectation $E\big(M(Y - g(X;\theta)) \cdot K((s - X)/h) \mid X = x\big)$. Now, (3.2) implies that $\theta_0 = \theta(x)$ a.s. $\Box$

5. Example

We applied NoLoess and FIT SHORT 2.0 to data on volatile organic compounds (VOC) emission considered in Dunn and Chao (1992)³. Dunn and Chao considered models based on systems of differential equations with linear terms which relate several factors. These authors obtained the corresponding analytic solution and fitted the parameters by the nonlinear least squares method. The original fit was not satisfactory, but it was considerably improved with a proper system of weights. In the remaining part of the paper we briefly report on our attempt to answer the question whether simplified models with variable coefficients can be useful in modeling the VOC emission effects.

One of the considered simplified models was given by a differential equation (d.e.)

$$\frac{d}{dx} c(x) = a - b \cdot c(x), \qquad c(x_0) = d,$$

with solution for $x_0 = 48.0$ of the form

$$c(x) = A + B \exp(-C(x - 48.0)). \tag{5.1}$$

An alternative trial model was suggested by a linear fit for a log-log plot of the data and was given by a d.e.

$$\frac{d}{dx} c(x) = a \cdot c^{b}(x), \qquad b > 1.0, \qquad c(x_0) = d.$$

Its solution is of the form

$$c(x) = A/(x + B)^{C}. \tag{5.2}$$

We let $\theta = (A, B, C)$, and use $g(x; A, B, C) = A + B \exp(-C(x - 48.0))$ in the `exponential' case (5.1), and $g(x; A, B, C) = A/(x + B)^C$ in the `power' case (5.2). Since the data are highly non-uniform we have chosen the OQ type of the window bandwidth.

Figure 1 shows the behavior of the parametric estimators ($K = \rho = \Xi = W = W_M \equiv 1.0$) and the nonparametric ones (OQ; $K(x) = c \cdot (1 - x^2)^2$; $M(x) = \rho(x) = |x| - \ln(1 + |x|)$; $\Xi(u) = 1/(1 - u)^2$; $W = W_M \equiv 1.0$). The nonparametric estimators have optimal $p = 0.12$ in the `exponential' case and $p = 0.16$ in the `power' case. Model (5.1) shows very weak dependence of the fit given by (2.1) on the window bandwidth, and the corresponding parametric estimator practically coincides with the nonparametric estimators (cf. the better fit on Fig. 1). This suggests that a d.e. describing a model for VOC emission should include nonlinear expressions of $c(x)$, and may indicate the presence of slow desorption processes resulting from interactions between the VOC particles. Such interactions are adequately described in a model d.e. by $c^2(x)$ and correspond to the value $C = 1$ in (5.2).

Figure 2 shows the joint pointwise behavior of the parameters $A, B, C$ in the case of a small window and the `exponential' model, and for both small and large windows and the `power' model. The nine graphs show their rather small stochastic fluctuations and display the amount of numerical instability. It seems to be caused by a tolerant, but calculation-speeding, stopping rule in finding the minimum of (1.2) and by a flat graph of $\mathcal{M}$ near the minimum.

Whenever a parametric model $g(x;\theta)$ is acceptable, the function $\Lambda(p)$ given by (2.1) and the value of $p$ minimizing it provide information on how many observations are necessary to estimate adequately the parameters of the model. In the present case model (5.2) seems to meet this requirement and also suggests changes in the full model of the VOC desorption.
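The local fit behind NoLoess can be illustrated with a minimal Python sketch of minimizing (1.2) at a single point. It is not the FS 2.0 code, which was written in Turbo Pascal and is not reproduced in this paper. The sketch borrows the kernel $K(x) \propto (1 - x^2)^2$ and the loss $M(x) = |x| - \ln(1 + |x|)$ from the example above, takes $W_M \equiv 1$, and makes several assumptions for illustration only: a hypothetical one-parameter trial function $g(x;\theta) = \exp(-\theta x)$, a fixed bandwidth instead of the OQ rule, a golden-section search as the minimizer, and noiseless synthetic data.

```python
import math


def biweight_kernel(u):
    """Kernel proportional to (1 - u^2)^2 on |u| < 1, zero elsewhere."""
    return (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0


def loss(y):
    """Symmetric convex loss M(y) = |y| - ln(1 + |y|) with bounded derivative."""
    a = abs(y)
    return a - math.log1p(a)


def trial_model(x, theta):
    """Hypothetical one-parameter trial function g(x; theta) = exp(-theta * x)."""
    return math.exp(-theta * x)


def local_criterion(theta, x0, xs, ys, h):
    """Local fit function (1.2) at the point x0, with the weight W_M set to 1."""
    return sum(
        loss(y - trial_model(x, theta)) * biweight_kernel((x0 - x) / h)
        for x, y in zip(xs, ys)
    )


def golden_section_min(f, lo, hi, tol=1e-10):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # The minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # The minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def noloess_estimate(x0, xs, ys, h, theta_bounds=(0.0, 5.0)):
    """Return the local parameter theta_hat(x0) and the fit g(x0; theta_hat(x0))."""
    theta_hat = golden_section_min(
        lambda t: local_criterion(t, x0, xs, ys, h), *theta_bounds
    )
    return theta_hat, trial_model(x0, theta_hat)


# Noiseless synthetic data generated from the trial model with theta = 0.5.
xs = [0.1 * i for i in range(1, 51)]
ys = [math.exp(-0.5 * x) for x in xs]
theta_hat, fit = noloess_estimate(2.0, xs, ys, h=1.0)
```

Because the synthetic data follow the trial model exactly, the local parameter estimate does not depend on the point `x0`, which is the `weak or no dependence' situation described in the Introduction. With real data one would repeat the call over a grid of points and inspect how `theta_hat` varies with `x0`; a multi-parameter model such as (5.1) or (5.2) would need a multivariate minimizer in place of the golden-section search.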

³ The author would like to thank Professor J.E. Dunn for calling his attention to VOC emission modeling problems and for permission to use his research data from the U.S. Environmental Protection Agency, Air and Energy Engineering Research Laboratory, Research Triangle Park, NC.

6. References

Cleveland, W.S. (1979), "Robust Locally Weighted Regression and Smoothing Scatterplots", Journal of the American Statistical Association, 74, 829-836.

Dunn, J.E. and Chen, T. (1992), "Critical Evaluation of the Diffusion Hypothesis in the Theory of VOC Sources and Sinks", ASTM Symposium on Modeling of Indoor Air Quality and Exposure.

Eubank, R.L. and Spiegelman, C.H. (1990), "Testing the Goodness of Fit of a Linear Model Via Nonparametric Regression Techniques", Journal of the American Statistical Association, 85, 387-392.

Fan, J. (1991), "Local Linear Regression Smoothers and their Minimax Efficiencies", submitted for publication.

Greblicki, W., Krzyżak, A., and Pawlak, M. (1984), "Distribution-free pointwise consistency of kernel regression estimates", The Annals of Statistics, 12, 1570-1575.

Hall, P. and Jones, M.C. (1990), "Adaptive M-estimators in nonparametric regression", The Annals of Statistics, 18, 1712-1728.

Härdle, W., Janssen, P., and Serfling, R. (1988), "Strong Uniform Consistency Rates for Estimators of Conditional Functionals", The Annals of Statistics, 16, 1428-1449.

Huber, P.J. (1967), "The behavior of maximum likelihood estimates under nonstandard conditions", in Proc. Fifth Berkeley Sympos. Math. Statist. Probab. 1, 221-233. Univ. of California Press, Berkeley.

Kozek, A.S. (1984), "Influence curve for minimum distance estimators under supremum metrics", Statistics & Decisions, Supplement Issue No 1, 131-158.

Kozek, A.S. (1990), "A nonparametric test of fit of a linear model", Commun. Statist.-Theory Meth., 19, 169-179.

Kozek, A.S. (1991), "A nonparametric test of fit of a parametric model", Journal of Multivariate Analysis, 37, 66-75.

Kozek, A.S. and Schuster, E.F. (1990), "Optimal quantile principle for selecting variable bandwidth in regression estimators", Proceedings of the Computer Science and Statistics: 22nd Annual Symposium on the Interface, ed. R. LePage, pp. 401-405.

Kozek, A.S. and Schuster, E.F. (1991), "On `Fit the Short Curve' principle for smoothing nonparametric estimators", Proceedings of the Computer Science and Statistics: 23rd Annual Symposium on the Interface, pp. 196-199.

Macaulay, F.R. (1931), The Smoothing of Time Series, New York: National Bureau of Economic Research.

Parzen, E. (1990), "Unification of Statistical Methods for Continuous and Discrete Data", Proceedings of the Computer Science and Statistics: 22nd Annual Symposium on the Interface, ed. R. LePage, pp. 235-242.

Pollard, D. (1984), Convergence of Stochastic Processes. New York, NY: Springer-Verlag.

Stone, C.J. (1977), "Consistent nonparametric regression", The Annals of Statistics, 5, 595-645.
