Metrika (2005) 62: 1–15 DOI: 10.1007/s001840400349

A consistent estimator for nonlinear regression models*

S. Baran

Faculty of Informatics, University of Debrecen, P. O. Box 12, H-4010 Debrecen, Hungary (E-mail: [email protected])

Received October 2003

*Supported by the Hungarian National Science Foundation OTKA under Grants No. F 032060/2000 and F 046061/2004 and by the Bolyai Grant of the Hungarian Academy of Sciences.

Abstract. In this paper an estimator for the general (nonlinear) regression model with random regressors is studied which is based on the Fourier transform of a certain weight function. Consistency and asymptotic normality of the estimator are established, and simulation results are presented to illustrate the theoretical ones.

Key words: Nonlinear regression, consistency, asymptotic normality, strong mixing, Fourier transform.

1. Introduction

In statistical work one often encounters situations where the linear model is not appropriate to describe the relationship between the observed processes. Throughout this paper we deal with the model

$$y_i = g(x_i, \beta_0) + \varepsilon_i, \qquad i \in \mathbb{N}, \tag{1}$$

where $g$ is a known function and $\beta_0$ denotes the true value of the unknown parameter $\beta$ to be estimated on the basis of observations on $y_i$ and $x_i$. The usual method of estimation in this case is least squares; however, the least squares estimator (LSE) is very sensitive to outlying values of the error terms. For nonrandom regressors $x_i$ the strong consistency of the LSE was proved by Jennrich [12] under certain conditions on the sequence $x_1, x_2, \ldots$ and the response function $g$. Malinvaud [14] proved consistency of the LSE under another set of conditions and, besides this, gave



an example of a model where the LSE is not consistent. Both in [12] and [14] a compact parameter set was assumed. Bunke and Schmidt [6] considered special nonlinear models with a linear parameter vector which could take any value from the appropriate vector space, while Läuter [13] generalized the results of Jennrich [12] to unbounded parameter spaces.

Another popular estimator is the least absolute deviations estimator (LADE), which has received much attention because of its robustness properties. Conditions for the consistency of the LADE are given in [15]. We remark that a good overview of estimation methods can be found e.g. in [10] or in the recent monograph of Seber and Wild [17].

For the linear model, i.e. when $g(x,\beta) = x^\top\beta$, and random regressors $x_i$, An et al. [1] introduced a new estimator of the unknown parameter, based on the Fourier transform of a symmetric weight function, and proved its strong consistency in the case of i.i.d. observations $(y_i, x_i)$. In [3] the author generalized the estimator proposed in [1] to the linear measurement error model by using the so-called deconvolution method (see e.g. [18] or [2]) and, assuming i.i.d. observations, proved its strong consistency. In [5] the results of [1] and [3] are generalized to the case of dependent (strong mixing) processes.

In this paper we extend the method of An et al. [1] to the model (1). In Section 2 we describe the new estimator and give some models where this estimator can be applied. Theorems about the consistency of this new estimator are formulated in Section 3, while Section 4 deals with asymptotic normality. Proofs of the theorems of these two sections are given in Section 6; we remark that these proofs can be found in a more detailed form in [4]. Finally, in Section 5 we present some simulation results in which we compare the performance of our estimator with the performances of the LSE and LADE. Throughout the paper we use the same notation $\|\cdot\|$ for the Euclidean norm of vectors and for the Frobenius norm of matrices.

2. The estimator and the idea behind it

Consider the model (1) and assume that the parameter set is $p$-dimensional: $\beta \in H \subset \mathbb{R}^p$; $x_i$ is $q$-dimensional, and $y_i$ and $\varepsilon_i$ are scalar random variables ($i \in \mathbb{N}$) that have the same distributions as the random variables $x$, $y$ and $\varepsilon$, respectively. Moreover, assume that the sequences $\{x_i\}$ and $\{\varepsilon_i\}$ are independent of each other. Our results will be derived and analyzed under the following assumption:

Assumption A. For all $\beta \in H$, $\beta \ne \beta_0$, the random variable $g(x,\beta) - g(x,\beta_0)$ is non-degenerate, i.e.

$$\mathrm{Var}\big(g(x,\beta) - g(x,\beta_0)\big) \ne 0.$$

Assumption A is needed in order to ensure that our model can recognize the true value of the parameter $\beta$. Now, let $w(t)$ be a continuous density kernel function that satisfies

$$w(t) = w(-t) \ge 0, \quad t > 0, \qquad \text{and} \qquad \int_{-\infty}^{\infty} |t|\, w(t)\, dt < \infty, \tag{2}$$

and let $\varphi_w(v)$ denote the Fourier transform of $w$, that is

$$\varphi_w(v) := \int_{-\infty}^{\infty} e^{itv} w(t)\, dt.$$
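For the kernels used in the sequel, $\varphi_w$ is available in closed form; for instance, by the standard transforms of the normal and Laplace densities,

$$\varphi_w(v) = e^{-a^2 v^2/2} \quad \text{for the } N(0,a^2) \text{ density}, \qquad \varphi_w(v) = \frac{a^2}{a^2 + v^2} \quad \text{for } w(t) = \frac{a}{2}\, e^{-a|t|}.$$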


Let us define the estimator $\hat\beta_n$ of $\beta_0$ as the maximum point of

$$A_n(\beta) := \frac{1}{n^2} \sum_{\ell=1}^{n} \sum_{s=1}^{n} \varphi_w\big(y_\ell - y_s - (g(x_\ell,\beta) - g(x_s,\beta))\big). \tag{3}$$

Here the kernel function $w(t)$ should be chosen in such a way that its Fourier transform has a closed form. The usual kernels are the densities of the normal distribution $N(0,a^2)$, the uniform distribution on $[-a,a]$, the symmetric exponential (Laplace) distribution, that is $w(t) := (a/2)\,e^{-a|t|}$, or a mixture of two Laplace distributions, e.g.

$$w(t) := \frac{a}{3}\big(2e^{-a|t|} - e^{-2a|t|}\big). \tag{4}$$

The idea behind this method is the same as in [1]. Let

$$\varphi(t,\beta) := E \exp\big(it(y - g(x,\beta))\big) \qquad \text{and} \qquad A(\beta) := \int_{-\infty}^{\infty} |\varphi(t,\beta)|^2\, w(t)\, dt.$$

As $x$ and $\varepsilon$ are independent,

$$|\varphi(t,\beta)| = \big|E e^{it\varepsilon}\big|\; \big|E e^{it(g(x,\beta_0) - g(x,\beta))}\big| \le \big|E e^{it(g(x,\beta_0) - g(x,\beta))}\big| \le 1,$$

where, by Assumption A, if $\beta \ne \beta_0$ the last term is strictly less than 1 except for countably many values of $t$. Hence,

$$A(\beta_0) = \sup_{\beta \in \mathbb{R}^p} A(\beta) > A(\alpha) \qquad \text{for every } \alpha \ne \beta_0.$$
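To make (3) concrete, here is a minimal sketch, assuming a scalar response, a vectorized response function `g`, and the $N(0,a^2)$ kernel with $\varphi_w(v) = e^{-a^2 v^2/2}$; the names `A_n` and `fit_ae`, the Nelder-Mead search, and the default tuning parameter are our own illustrative choices, not specifications from the paper.

```python
# Minimal sketch of the estimator (3) with the N(0, a^2) density kernel,
# for which phi_w(v) = exp(-a^2 v^2 / 2). Illustrative only; the paper
# prescribes neither an optimizer nor a default tuning parameter.
import numpy as np
from scipy.optimize import minimize

def A_n(beta, y, x, g, a=0.1):
    """Criterion (3): mean of phi_w over all pairwise residual differences."""
    r = y - g(x, beta)               # residuals y_i - g(x_i, beta)
    d = r[:, None] - r[None, :]      # y_l - y_s - (g(x_l, b) - g(x_s, b))
    return np.mean(np.exp(-0.5 * (a * d) ** 2))

def fit_ae(y, x, g, beta0, a=0.1):
    """beta_hat_n: a local maximizer of A_n, started from beta0."""
    res = minimize(lambda b: -A_n(b, y, x, g, a),
                   np.asarray(beta0, dtype=float), method="Nelder-Mead")
    return res.x
```

Since $A_n$ may have several local maxima, in practice one would combine this local search with a grid of starting points over $H$; note also that the double sum makes each evaluation $O(n^2)$, which is the computational cost alluded to in Section 5.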

In fact, $A_n(\beta)$ is the empirical analogue of $A(\beta)$ and, as we will see in Section 3, under certain conditions it converges to $A(\beta)$, uniformly in $\beta$.

Finally, we give some examples of regression models where Assumption A is satisfied; a hypothetical fitting snippet for Example 1 follows the examples below.

Example 1. Polynomial model. Let

$$g(u,\beta) = b_0 + b_1 u + b_2 u^2 + \cdots + b_p u^p,$$

where $u, b_i \in \mathbb{R}$, $i = 0, 1, \ldots, p$, and the parameter is $\beta = (b_0, b_1, \ldots, b_p)^\top$, and assume that the random vector $(1, x, \ldots, x^p)^\top$ is non-degenerate (e.g. if $x$ is absolutely continuous).

Example 2. Trigonometric polynomial model. Let

$$g(u,\beta) = \frac{a_0}{2} + \sum_{\ell=1}^{k} \big(a_\ell \cos(\ell u) + b_\ell \sin(\ell u)\big),$$

where $u, a_i, b_j \in \mathbb{R}$, $i = 0, 1, \ldots, k$, $j = 1, 2, \ldots, k$, and the parameter is $\beta = (a_0, a_1, b_1, \ldots, a_k, b_k)^\top$, and, similarly to Example 1, assume that the random vector $(1, \cos(x), \sin(x), \ldots, \cos(kx), \sin(kx))^\top$ is non-degenerate.
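As an illustration, the following hypothetical snippet fits the quadratic case of Example 1 under heavy-tailed errors, reusing `fit_ae` from the sketch in Section 2; the sample size, true parameter, and tuning parameter are arbitrary choices of ours.

```python
# Hypothetical usage for the quadratic case of Example 1; reuses fit_ae
# from the sketch in Section 2. True parameter: (1.0, -2.0, 0.5).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 200)          # absolutely continuous regressor
eps = rng.standard_cauchy(200)         # heavy-tailed errors
g = lambda u, b: b[0] + b[1] * u + b[2] * u ** 2
y = g(x, np.array([1.0, -2.0, 0.5])) + eps

beta_hat = fit_ae(y, x, g, beta0=np.zeros(3), a=0.5)
```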


Example 3. Exponential model. Let

$$g(u,\beta) = \exp(u^\top c)\, s,$$

where $u, c \in \mathbb{R}^q$, $s \in \mathbb{R}$, and the parameter is $\beta = (c^\top, s)^\top$, and assume that $x$ is a non-degenerate random vector.

Example 4. Periodic model (sinusoid). Let

$$g(u,\beta) = R \sin(\omega u),$$

where $u, R, \omega \in \mathbb{R}$ and the parameter is $\beta = (R, \omega)^\top$, and assume that $x$ is an absolutely continuous random variable.

Example 5. Gaussian regression curve. Let

$$g(u,\beta) = \exp\big(-\|u - m\|^2/\gamma\big)\, s,$$

where $u, m \in \mathbb{R}^q$, $\gamma$ and $s$ are positive real numbers, and the parameter is $\beta = (m^\top, \gamma, s)^\top$, and assume that $x$ is a non-degenerate, absolutely continuous random vector with non-symmetric components.

3. Consistency

In the present paper we assume that the sequences $\{x_i\}$ and $\{\varepsilon_i\}$ are strongly mixing. If $\mathcal{A}$ and $\mathcal{B}$ are $\sigma$-algebras, let

$$\alpha(\mathcal{A},\mathcal{B}) := \sup_{A \in \mathcal{A},\, B \in \mathcal{B}} |P(AB) - P(A)P(B)|.$$

Let $\mathcal{M}_k^\ell$ denote the $\sigma$-algebra generated by $\{x_i^\top, \varepsilon_i : k \le i \le \ell\}$ and let

$$\alpha(n) := \sup_{1 \le k} \alpha\big(\mathcal{M}_1^k, \mathcal{M}_{k+n}^\infty\big).$$

The mixing condition we impose is

$$\alpha(n) \to 0 \quad \text{as} \quad n \to \infty. \tag{5}$$

Theorem 1. Suppose that $(y_i, x_i^\top)$, $i \in \mathbb{N}$, satisfy model equation (1) and Assumption A, the parameter set $H$ is compact, $g$ is continuous in $\beta$,

$$E \sup_{\beta \in H} |g(x,\beta)| < \infty, \tag{6}$$

and mixing condition (5) is satisfied for $(x_i^\top, \varepsilon_i)$, $i \in \mathbb{N}$. Then $\hat\beta_n$ is a consistent estimator of the true parameter $\beta_0$.

The proof of this theorem is based on the uniform law of large numbers (ULLN) (see e.g. [9]). Condition (6) and the compactness of $H$ are needed in order to apply it. We remark that the latter is a natural requirement, because in practical applications the parameters usually take their values from bounded sets. However, since consistency of e.g. the LSE can be proved without this assumption (see [13]), the case of an unbounded parameter set $H$ might be a direction for further research.
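As an aside on the mixing assumption: stationary Gaussian AR(1) processes with $|\rho| < 1$ are strongly mixing with geometrically decaying $\alpha(n)$, so both (5) and the later condition (10) hold for them. The snippet below (ours, purely illustrative) generates such regressors and errors; the joint sequence $(x_i^\top, \varepsilon_i)$ inherits strong mixing because the two processes are independent.

```python
# Stationary Gaussian AR(1) inputs as an example of strongly mixing
# {x_i} and {eps_i} satisfying (5); illustrative setup, not from the paper.
import numpy as np

def ar1(n, rho, sigma, rng):
    """Stationary Gaussian AR(1): z_i = rho * z_{i-1} + N(0, sigma^2) noise."""
    z = np.empty(n)
    z[0] = rng.normal(0.0, sigma / np.sqrt(1.0 - rho ** 2))  # stationary start
    for i in range(1, n):
        z[i] = rho * z[i - 1] + rng.normal(0.0, sigma)
    return z

rng = np.random.default_rng(1)
x = ar1(500, rho=0.6, sigma=1.0, rng=rng)     # mixing regressors
eps = ar1(500, rho=0.3, sigma=2.0, rng=rng)   # mixing errors, independent of x
```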


Remark 1. If $H$ is convex and $g$ is differentiable with respect to $\beta$, then Theorem 1 remains valid if (6) is replaced by

$$E \sup_{\beta \in H} \Big\| \frac{\partial g(x,\beta)}{\partial \beta} \Big\| < \infty. \tag{7}$$

Example 6. Consider the special polynomial model $g(u,\beta) = (u + \beta)^2$, where $u \in \mathbb{R}$ and $\beta \in [a,b] \subset \mathbb{R}$, $-\infty < a < \beta_0 < b < \infty$. Assume that $E|x| < \infty$, but the second moment of $x$ is not finite. Then $\sup_{\beta \in [a,b]} |\partial g(x,\beta)/\partial\beta| = 2\sup_{\beta \in [a,b]} |x + \beta| \le 2(|x| + \max(|a|,|b|))$ has finite expectation, while $\sup_{\beta \in [a,b]} (x+\beta)^2$ does not, so (7) is satisfied but (6) is not valid.

Naturally, in the case of i.i.d. observations $(y_i, x_i^\top)$, $i \in \mathbb{N}$, Theorem 1 remains valid without changes, so consistency of the estimator $\hat\beta_n$ is proved. However, in this case we can prove even strong consistency, as we are able to apply the following lemma, which can be found e.g. in [7, Appendix I, Theorem 2]. The statement can also be derived from the results discussed in [19, Chapter 2].

Lemma 1. Let $\mathcal{G}$ be a collection of subsets of $\mathbb{R}^m$ and for all $M > 0$ let

$$K_M := \big\{ u = (u_1, \ldots, u_m)^\top \in \mathbb{R}^m \,\big|\, \|u\|_\infty := \max_{1 \le i \le m} |u_i| \le M \big\}.$$

Assume that for all $\delta > 0$ and $D \in \mathcal{G}$ the $\delta$-neighborhood $C_\delta^D$ of the boundary of $D \cap K_M$ has Lebesgue measure $\lambda(C_\delta^D) \le \psi(\delta, M)$, where $\psi(\delta, M)$ does not depend on $D$ and, for each fixed $M$, $\psi(\delta, M) \to 0$ as $\delta \to 0$. If $\xi$ is an $m$-dimensional random vector that has an absolutely continuous distribution with respect to the $m$-dimensional Lebesgue measure, then $\mathcal{G}$ is a Glivenko-Cantelli class with respect to the distribution $P_\xi$ of $\xi$, i.e.

$$\lim_{n \to \infty} \sup_{D \in \mathcal{G}} \big| P_{\xi,n}(D) - P_\xi(D) \big| = 0 \quad \text{a.s.},$$

where $P_{\xi,n}$ is the empirical measure of the i.i.d. sample $\xi_1, \xi_2, \ldots, \xi_n$ drawn from the distribution $P_\xi$.

The strong consistency result has the following form.

Theorem 2. Suppose that $(y_i, x_i^\top)$, $i \in \mathbb{N}$, are i.i.d. observations on $(y, x^\top)$, where the random vector $(y, x^\top)$ satisfies model equation (1) and Assumption A, and its distribution is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}^{q+1}$. Assume that the parameter set $H$ is compact, $g(u,\beta)$ is differentiable with respect to $u$, and $\frac{\partial g(u,\beta)}{\partial u}$ is continuous in both of its variables. Moreover, assume that $\varphi_w(v)$ is differentiable and

$$\int_{-\infty}^{\infty} \Big| \frac{d\varphi_w(v)}{dv} \Big|\, dv < \infty. \tag{8}$$

Then $\hat\beta_n$ is a strongly consistent estimator of the true parameter $\beta_0$.

Remark 2. Compactness of $H$, differentiability of $g$ and the condition on the distribution of $(y, x^\top)$ are needed to verify that the conditions of Lemma 1 are satisfied for the collection $\mathcal{G}$ of sets


$$D(v,\beta) = \big\{ (z, u^\top) \in \mathbb{R}^{q+1} \,\big|\, z - g(u,\beta) < v \big\}, \qquad v \in \mathbb{R},\ \beta \in H, \tag{9}$$

and in this way $\mathcal{G}$ is a Glivenko-Cantelli class with respect to the distribution of $(y, x^\top)$. For example, in the linear model, i.e. if $g(u,\beta) = u^\top\beta$, for all $v \in \mathbb{R}$ and $\beta \in H$ the set $D(v,\beta)$ is a half-space of $\mathbb{R}^{q+1}$, so $\mathcal{G}$ is automatically a Glivenko-Cantelli class (see e.g. [16]).
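We also note that condition (8) is easy to verify for the usual kernels; for the $N(0,a^2)$ density kernel, for instance,

$$\int_{-\infty}^{\infty} \Big| \frac{d\varphi_w(v)}{dv} \Big|\, dv = \int_{-\infty}^{\infty} a^2 |v|\, e^{-a^2 v^2/2}\, dv = 2 < \infty.$$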

4. Asymptotic normality

To prove asymptotic normality we need stronger conditions. The following mixing condition will be appropriate for our purposes:

$$\beta(\alpha, m) := \sum_{k=1}^{\infty} \alpha^{m/(2+m)}(k) < \infty, \tag{10}$$

where $m > 0$. Observe that condition (10) implies (5). We are going to use the following CLT for mixing processes (see [11, Corollary 1]).

Lemma 2. Let $\eta_i$, $i \in \mathbb{N}$, be a centered vector-valued stochastic process (i.e. $E\eta_i = 0$, $i \in \mathbb{N}$), let $S_n = \sum_{i=1}^{n} \eta_i$, and let $R_n$ denote the variance matrix of $S_n$. Assume that mixing condition (10) is satisfied for some $m > 0$ and, with the same $m$,

$$\sup_{i \in \mathbb{N}} E\|\eta_i\|^{2+m} < \infty.$$

Then $\limsup_n n^{-1} \sum_{i,j=1}^{n} \|\mathrm{Cov}(\eta_i, \eta_j)\| < \infty$. If, moreover,

$$\lim_{n \to \infty} n^{-1} R_n = R,$$

where $R$ is a positive definite matrix, then $(R_n)^{-1/2} S_n \Rightarrow N(0, I)$ as $n \to \infty$.

Theorem 3. Suppose that $(y_i, x_i^\top)$, $i \in \mathbb{N}$, satisfy model equation (1) and Assumption A, the parameter set $H$ is compact and convex, $\beta_0$ is an interior point of $H$, $g$ is twice differentiable with respect to $\beta$,

$$\mathrm{Var}\Big( \frac{\partial g(x,\beta_0)}{\partial \beta} \Big)$$

is a regular matrix, and

$$E \sup_{\beta \in H} \Big\| \frac{\partial g(x,\beta)}{\partial \beta} \Big\|^2 < \infty, \tag{11}$$

$$E \sup_{\beta \in H} \Big\| \frac{\partial^2 g(x,\beta)}{\partial \beta\, \partial \beta^\top} \Big\| < \infty. \tag{12}$$

Assume that

$$\lim_{c \to 0} E \sup_{\|\beta_1 - \beta_2\| \le c} \Big\| \frac{\partial g(x,\beta_1)}{\partial \beta} \Big\|^2 |g(x,\beta_1) - g(x,\beta_2)| = 0, \tag{13}$$

$$\lim_{c \to 0} E \sup_{\|\beta_1 - \beta_2\| \le c} \Big\| \frac{\partial^2 g(x,\beta_1)}{\partial \beta\, \partial \beta^\top} \Big\| \, |g(x,\beta_1) - g(x,\beta_2)| = 0, \tag{14}$$

$$\lim_{c \to 0} E \sup_{\|\beta_1 - \beta_2\| \le c} \Big\| \frac{\partial^2 g(x,\beta_1)}{\partial \beta\, \partial \beta^\top} - \frac{\partial^2 g(x,\beta_2)}{\partial \beta\, \partial \beta^\top} \Big\| = 0. \tag{15}$$

Suppose also that for some positive constant $m$

$$E \Big\| \frac{\partial^2 g(x,\beta)}{\partial \beta\, \partial \beta^\top} \Big\|^{1+2m} < \infty, \tag{16}$$

$$E \Big\| \frac{\partial g(x,\beta)}{\partial \beta} \Big\|^{2+4m} < \infty, \tag{17}$$

for all $\beta \in H$, and mixing condition (10) holds for $(x_i^\top, \varepsilon_i)$, $i \in \mathbb{N}$, with the same constant. Moreover, assume that the density kernel $w(t)$ has finite third absolute moment, and

$$\lim_{n \to \infty} n^{-1}\, \mathrm{Var}\Big( \sum_{k=1}^{n} h(x_k, \varepsilon_k) \Big) = R, \tag{18}$$

where

$$h(u,v) := \int_{-\infty}^{\infty} t \Big( \frac{\partial g(u,\beta_0)}{\partial \beta} - E \frac{\partial g(x,\beta_0)}{\partial \beta} \Big) \big( \sin(tv)\, E\cos(t\varepsilon) - \cos(tv)\, E\sin(t\varepsilon) \big)\, w(t)\, dt,$$

$u \in \mathbb{R}^q$, $v \in \mathbb{R}$, and $R$ is a positive definite matrix. In this case, if $\hat\beta_n$ is consistent then it is asymptotically normal.

Conditions (11) and (12) ensure that one can interchange expectation and differentiation, while conditions (13)-(15) are the continuity conditions in the ULLN. The other important tool of the proof is the Rosenthal inequality for mixing processes [8, Theorem 2, p. 26]. Conditions (16) and (17) are the appropriate moment conditions needed in order to apply it.

Remark 3. Observe that if $g$ is twice continuously differentiable, then condition (12) implies (15). Moreover, conditions

$$E \sup_{\beta \in H} \Big\| \frac{\partial g(x,\beta)}{\partial \beta} \Big\|^3 < \infty \qquad \text{and} \qquad E \sup_{\beta \in H} \Big\| \frac{\partial^2 g(x,\beta)}{\partial \beta\, \partial \beta^\top} \Big\|^2 < \infty$$

[…]

5. Simulation results

Example 7. […] is in $\mathbb{R}^2$, the original parameters are $l_0 = m_0 = 0.5$, and the samples on $x = (x^{(1)}, x^{(2)})^\top$ and $\varepsilon$ are i.i.d. The components of the regressor $x$ are both normal with mean 2 and variance 1, and they are independent of each other. Data are taken at $1 \le i \le n$, where $n$ ranges from 15 to 60.

Fig. 4. The MSEs of the estimates of $m$ and $\gamma$ in Example 8, plotted against the sample size $n$; panel (a): normal error, panel (b): Cauchy error. Solid line: AE; dash-dot line: LADE; dotted line: LSE.

We consider two different kernel functions $w(t)$: the density of the $N(0,a^2)$ distribution with tuning parameter $a = 0.1$, and the mixture of two Laplace distributions defined by (4); in the latter case $a = 5$. The error term $\varepsilon$: a) has normal distribution with mean 0 and variance 9; b) has Bernoulli distribution with $P(\varepsilon = 4) = P(\varepsilon = -4) = \frac{1}{2}$; c) satisfies $\varepsilon = 3\varepsilon^*$, where $\varepsilon^*$ has Cauchy distribution with parameters $(0,1)$.

In Figure 1 the means of the estimates of the two parameters are plotted versus the sample size, while Figure 2 shows the MSEs. These figures clearly show the consistency of the AE for both kernel functions in all three cases. For normal and Bernoulli error terms its performance is similar to that of the LSE and LADE, but computationally it is not as efficient. In the Cauchy error case the LSE gives completely false estimates and hence huge MSEs, so they are not indicated in Figures 1c and 2c; instead, Table 1 gives the means and the MSEs of this estimator for three different sample sizes. Figures 1c and 2c show that in the Cauchy error case, especially for large samples, the AE is a bit better than the LADE, so it can be well applied in models where the regression error has large variance.
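Both kernels admit closed-form Fourier transforms, which is what makes the criterion cheap to evaluate. The sketch below is our own: the mixture formula is our derivation from (4) via the standard transform pair $e^{-c|t|} \leftrightarrow 2c/(c^2 + v^2)$, and a quadrature sanity check is included.

```python
# Closed-form Fourier transforms of the two kernels used in Example 7:
# the N(0, a^2) density and the Laplace mixture (4). The mixture formula
# is our own derivation from the pair e^{-c|t|} <-> 2c/(c^2 + v^2).
import numpy as np
from scipy.integrate import quad

def phi_normal(v, a=0.1):
    return np.exp(-0.5 * (a * v) ** 2)

def phi_mixture(v, a=5.0):
    return (4 * a ** 2 / 3) * (1 / (a ** 2 + v ** 2) - 1 / (4 * a ** 2 + v ** 2))

# sanity check: phi_mixture should match the defining integral of cos(t v) w(t)
a, v = 5.0, 1.3
w = lambda t: (a / 3) * (2 * np.exp(-a * abs(t)) - np.exp(-2 * a * abs(t)))
val, _ = quad(lambda t: np.cos(t * v) * w(t), -np.inf, np.inf)
assert abs(val - phi_mixture(v, a)) < 1e-6
```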


Example 8. Consider a special case of the Gaussian regression model of Example 5:

$$y = 5 \exp\big(-(x - m)^2/\gamma\big) + \varepsilon,$$

where the parameter vector $\beta = (m, \gamma)^\top$ is in $\mathbb{R}^2$ and the original parameters are $m_0 = 3$, $\gamma_0 = 2$. Data are taken at $1 \le i \le n$, where $n$ ranges from 10 to 100. We consider a normal kernel function with tuning parameter $a = 2$. The regressors are i.i.d. normal with mean 4 and variance 1, and we have i.i.d. observations of the error term $\varepsilon$, too. The distribution of $\varepsilon$ is: a) normal with mean 0 and variance 0.5; b) Cauchy with parameters $(0,1)$.

Figure 3 shows the means of the estimates of the two parameters, while in Figure 4 the MSEs are plotted versus the sample size. Similarly to Example 7, one can see that when the variance of the regression error is small, the performances of the three estimators are rather similar. However, Figures 3b and 4b clearly show the advantage of the AE in the Cauchy error case.
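The following sketch (our implementation, not the paper's code) runs one Monte Carlo replication of this comparison in the Cauchy case; the starting point, the optimizer, and the plain unconstrained parametrization of $\gamma$ are our own choices, and in practice one would reparametrize (e.g. optimize $\log\gamma$) to keep $\gamma$ positive.

```python
# Sketch of the Example 8 comparison: AE (normal kernel, a = 2) vs. LSE
# vs. LADE under Cauchy errors; starting point and optimizer are our choices.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)
n = 100
x = rng.normal(4.0, 1.0, n)
eps = rng.standard_cauchy(n)                     # case b) Cauchy errors
g = lambda u, b: 5.0 * np.exp(-(u - b[0]) ** 2 / b[1])
y = g(x, np.array([3.0, 2.0])) + eps             # m0 = 3, gamma0 = 2

def neg_A_n(b, a=2.0):
    r = y - g(x, b)
    d = r[:, None] - r[None, :]
    return -np.mean(np.exp(-0.5 * (a * d) ** 2))

start = np.array([2.0, 1.0])                     # gamma should stay positive
ae = minimize(neg_A_n, start, method="Nelder-Mead").x
lse = minimize(lambda b: np.sum((y - g(x, b)) ** 2), start, method="Nelder-Mead").x
lade = minimize(lambda b: np.sum(np.abs(y - g(x, b))), start, method="Nelder-Mead").x
print("AE:", ae, "LSE:", lse, "LADE:", lade)
```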

6. Proofs

Proof of Theorem 1. Let $U_n(\beta) := A_n(\beta) - A(\beta)$ and let $\delta > 0$ and $\beta \in H$ be fixed. Then

$$P\big(|U_n(\beta)| > \delta\big) \le \frac{1}{\delta} \int_{-\infty}^{\infty} \big( E|V_n(\beta, t)| + E|V_n(\beta, -t)| \big)\, w(t)\, dt,$$

where

$$V_n(\beta, t) := \frac{1}{n} \sum_{k=1}^{n} \Big( e^{it(y_k - g(x_k,\beta))} - E e^{it(y - g(x,\beta))} \Big).$$

Mixing condition (5) implies that

$$\lim_{n \to \infty} E|V_n(\beta, t)| = 0 \qquad \text{and} \qquad \lim_{n \to \infty} E|V_n(\beta, -t)| = 0,$$

so $A_n(\beta) \to A(\beta)$ in probability as $n \to \infty$. Moreover, for all $\delta > 0$ and $c > 0$,

$$E \sup_{\|\beta_1 - \beta_2\| \le c} |U_n(\beta_1) - U_n(\beta_2)| \le 4\, E \sup_{\|\beta_1 - \beta_2\| \le c} |g(x,\beta_1) - g(x,\beta_2)| \int_{-\infty}^{\infty} |t|\, w(t)\, dt.$$

As $g(u,\beta)$ is continuous in $\beta$ and $H$ is compact,

$$\lim_{c \to 0} \sup_{\|\beta_1 - \beta_2\| \le c} |g(u,\beta_1) - g(u,\beta_2)| = 0, \qquad u \in \mathbb{R}^q,$$

so by the ULLN $A_n(\beta) \to A(\beta)$ in probability as $n \to \infty$, uniformly in $\beta$. This completes the proof. □


Proof of Theorem 2. The proof of this theorem is based on the ideas given in [1] and [3]. Let $F_\beta(u)$ denote the distribution function of $y - g(x,\beta)$, where $y = g(x,\beta_0) + \varepsilon$, and denote by $\hat F_{\beta,n}(u)$ the empirical distribution function of the sample $y_i - g(x_i,\beta)$, $i = 1, 2, \ldots, n$. It is easy to see that

$$A(\beta) = \int_{-\infty}^{\infty} \varphi_w(v)\, d\bar F_\beta(v), \qquad A_n(\beta) = \int_{-\infty}^{\infty} \varphi_w(v)\, d\bar F_{\beta,n}(v),$$

where

$$\bar F_\beta(v) := \int_{-\infty}^{\infty} F_\beta(v + u)\, dF_\beta(u), \qquad \bar F_{\beta,n}(v) := \int_{-\infty}^{\infty} \hat F_{\beta,n}(v + u)\, d\hat F_{\beta,n}(u).$$

Using integration by parts we obtain

$$\sup_{\beta} |A_n(\beta) - A(\beta)| \le \sup_{v,\beta} |\bar F_{\beta,n}(v) - \bar F_\beta(v)| \int_{-\infty}^{\infty} \Big| \frac{d\varphi_w(v)}{dv} \Big|\, dv,$$

and it is not difficult to see that

$$\sup_{v,\beta} |\bar F_{\beta,n}(v) - \bar F_\beta(v)| \le 2 \sup_{v,\beta} |\hat F_{\beta,n}(v) - F_\beta(v)|.$$

Hence, by condition (8), to show the almost sure convergence of $\hat\beta_n$ to $\beta_0$ it is enough to prove

$$\lim_{n \to \infty} \sup_{v,\beta} |\hat F_{\beta,n}(v) - F_\beta(v)| = 0 \quad \text{a.s.}$$

Consider the vector $\xi = (y, x^\top)$. For given $v \in \mathbb{R}$ and $\beta \in H$ the event $\{y - g(x,\beta) < v\}$ is equivalent to the event $\{\xi \in D(v,\beta)\}$, where $D(v,\beta)$ is the set defined by (9). Let $K_M$ be defined as in Lemma 1 with $m = q + 1$. For given $v \in \mathbb{R}$ and $\beta \in H$ the surface area of $D(v,\beta) \cap K_M$ is less than or equal to $2(q+1)(2M)^q + \varrho(M)$, where $\varrho(M)$ is an upper bound of the area of the surface

$$\big\{ (z, u^\top) \in \mathbb{R}^{q+1} \,\big|\, z - g(u,\beta) = v,\ |z| \le M,\ |u_i| \le M,\ i = 1, \ldots, q \big\}.$$

The existence of such a function $\varrho$ is ensured by the compactness of $H$ and the continuity of $\frac{\partial g(u,\beta)}{\partial u}$ in $\beta$ and $u$. Hence, the conditions of Lemma 1 are fulfilled, which completes the proof. □

Proof of Theorem 3. Using a Taylor series expansion and

$$\frac{\partial A_n(\hat\beta_n)}{\partial \beta} = 0$$

we obtain

$$\sqrt{n}\, \frac{\partial A_n(\beta_0)}{\partial \beta} = -\frac{\partial^2 A_n(\tilde\beta_n)}{\partial \beta\, \partial \beta^\top}\, \sqrt{n}\, (\hat\beta_n - \beta_0), \tag{19}$$

where $\tilde\beta_n$ is a point between $\hat\beta_n$ and $\beta_0$. A short calculation shows that

$$\frac{\partial A_n(\beta_0)}{\partial \beta} = U_n + R_n,$$

where


$$U_n := \frac{2}{n} \sum_{k=1}^{n} h(x_k, \varepsilon_k), \qquad R_n := \frac{2}{n^2} \sum_{k=1}^{n} \sum_{\ell=1}^{n} \int_{-\infty}^{\infty} t \big( g_1(x_k, \varepsilon_k, \varepsilon_\ell) - g_2(x_k, \varepsilon_k, \varepsilon_\ell) \big)\, w(t)\, dt,$$

and

$$g_1(u, v_1, v_2) := \Big( \frac{\partial g(u,\beta_0)}{\partial \beta} - E \frac{\partial g(x,\beta_0)}{\partial \beta} \Big) \sin(t v_1) \big( \cos(t v_2) - E \cos(t\varepsilon) \big),$$

$$g_2(u, v_1, v_2) := \Big( \frac{\partial g(u,\beta_0)}{\partial \beta} - E \frac{\partial g(x,\beta_0)}{\partial \beta} \Big) \cos(t v_1) \big( \sin(t v_2) - E \sin(t\varepsilon) \big),$$

$u \in \mathbb{R}^q$, $v_1, v_2 \in \mathbb{R}$. The Rosenthal inequality and conditions (2) and (17) imply that $\lim_{n \to \infty} \sqrt{n}\, R_n = 0$ in probability. Moreover, conditions (2), (17) and (18) ensure that we can apply Lemma 2, so $\sqrt{n}\, U_n$, and in this way the left-hand side of (19), has a normal limit distribution.

As concerns the right-hand side of (19), an easy calculation shows that for all $r, s \in \{1, 2, \ldots, p\}$

$$\frac{\partial^2 A_n(\beta)}{\partial \beta^{(r)}\, \partial \beta^{(s)}} - \frac{\partial^2 A(\beta)}{\partial \beta^{(r)}\, \partial \beta^{(s)}} = 2 U_1^{(r,s)}(\beta) + 2 U_2^{(r,s)}(\beta) - 2 U_3^{(r,s)}(\beta), \tag{20}$$

where

$$U_1^{(r,s)}(\beta) := \int_{-\infty}^{\infty} it \bigg[ \Big( \frac{1}{n} \sum_{k=1}^{n} \frac{\partial^2 g(x_k,\beta)}{\partial \beta^{(r)} \partial \beta^{(s)}}\, e^{it(y_k - g(x_k,\beta))} \Big) \Big( \frac{1}{n} \sum_{\ell=1}^{n} e^{-it(y_\ell - g(x_\ell,\beta))} \Big) - E \Big( \frac{\partial^2 g(x,\beta)}{\partial \beta^{(r)} \partial \beta^{(s)}}\, e^{it(y - g(x,\beta))} \Big) E\, e^{-it(y - g(x,\beta))} \bigg]\, w(t)\, dt,$$

$$U_2^{(r,s)}(\beta) := \int_{-\infty}^{\infty} t^2 \bigg[ \Big( \frac{1}{n} \sum_{k=1}^{n} \frac{\partial g(x_k,\beta)}{\partial \beta^{(r)}}\, e^{it(y_k - g(x_k,\beta))} \Big) \Big( \frac{1}{n} \sum_{\ell=1}^{n} \frac{\partial g(x_\ell,\beta)}{\partial \beta^{(s)}}\, e^{-it(y_\ell - g(x_\ell,\beta))} \Big) - E \Big( \frac{\partial g(x,\beta)}{\partial \beta^{(r)}}\, e^{it(y - g(x,\beta))} \Big) E \Big( \frac{\partial g(x,\beta)}{\partial \beta^{(s)}}\, e^{-it(y - g(x,\beta))} \Big) \bigg]\, w(t)\, dt,$$

$$U_3^{(r,s)}(\beta) := \int_{-\infty}^{\infty} t^2 \bigg[ \Big( \frac{1}{n} \sum_{k=1}^{n} \frac{\partial g(x_k,\beta)}{\partial \beta^{(r)}} \frac{\partial g(x_k,\beta)}{\partial \beta^{(s)}}\, e^{it(y_k - g(x_k,\beta))} \Big) \Big( \frac{1}{n} \sum_{\ell=1}^{n} e^{-it(y_\ell - g(x_\ell,\beta))} \Big) - E \Big( \frac{\partial g(x,\beta)}{\partial \beta^{(r)}} \frac{\partial g(x,\beta)}{\partial \beta^{(s)}}\, e^{it(y - g(x,\beta))} \Big) E\, e^{-it(y - g(x,\beta))} \bigg]\, w(t)\, dt.$$

Using the Rosenthal inequality, the ULLN, the finiteness of the third absolute moment of $w(t)$ and conditions (11)-(17), one can show that each


component of the right-hand side of (20) converges to 0 uniformly in $\beta$. Hence, as $\hat\beta_n$ is a consistent estimator of $\beta_0$, we have

$$\lim_{n \to \infty} \frac{\partial^2 A_n(\tilde\beta_n)}{\partial \beta\, \partial \beta^\top} = \frac{\partial^2 A(\beta_0)}{\partial \beta\, \partial \beta^\top} = -2\, \mathrm{Var}\Big( \frac{\partial g(x,\beta_0)}{\partial \beta} \Big) \int_{-\infty}^{\infty} t^2\, \varphi_\varepsilon(t)\, \varphi_\varepsilon(-t)\, w(t)\, dt \tag{21}$$

in probability, where $\varphi_\varepsilon(t)$ denotes the characteristic function of $\varepsilon$. According to the conditions of the theorem the right-hand side of (21) is a regular matrix, which completes the proof. □

Acknowledgements. I am grateful to my wife Ágnes Baran for her suggestions and remarks concerning the numerical part of the computer simulations. I would also like to thank István Fazekas for helpful discussions about this topic and for his useful remarks. I am also indebted to the anonymous referee and to the editor for their valuable comments and remarks.

References

1. An H-Z, Hickernell FJ, Zhu L-X (1997) A new class of consistent estimators for stochastic linear regressive models. J Multivariate Anal 63:242–258
2. Baran S (2000) A consistent estimator in general functional errors-in-variables models. Metrika 51:117–132
3. Baran S (2001) A new consistent estimator for linear errors-in-variables models. Comput Math Appl 41:821–833
4. Baran S (2003) A new estimator for nonlinear regression models. Institute of Mathematics and Informatics, University of Debrecen, Preprint No. 308 (Technical Report No. 2003/12)
5. Baran S (2004) A consistent estimator for linear models with dependent observations. Comm Statist Theory Methods 33 (to appear)
6. Bunke H, Schmidt WH (1980) Asymptotic results on nonlinear approximation of regression functions and weighted least squares. Math Operationsforsch Statist 11:9–22
7. Borovkov AA (1984) Mathematical Statistics. Nauka, Moscow (in Russian)
8. Doukhan P (1994) Mixing: Properties and Examples. Lecture Notes in Statistics 85, Springer, New York
9. Fazekas I, Kukush AG (1997) Asymptotic properties of an estimator in nonlinear functional errors-in-variables models with dependent error terms. Comput Math Appl 34:23–39
10. Gallant AR (1987) Nonlinear Statistical Models. Wiley, New York
11. Herrndorf N (1984) A functional central limit theorem for weakly dependent sequences of random variables. Ann Probab 12:141–153
12. Jennrich RI (1969) Asymptotic properties of non-linear least squares estimators. Ann Math Statist 40:633–643
13. Läuter H (1989) Note on the strong consistency of the least squares estimator in nonlinear regression. Statistics 20:199–210
14. Malinvaud E (1970) The consistency of nonlinear regressions. Ann Math Statist 41:956–969
15. Oberhofer W (1982) The consistency of nonlinear regression minimizing the $L_1$-norm. Ann Statist 10:316–319
16. Pollard D (1984) Convergence of Stochastic Processes. Springer, New York
17. Seber GAF, Wild CJ (2003) Nonlinear Regression. Wiley, New Jersey
18. Stefanski LA (1989) Unbiased estimation of a nonlinear function of a normal mean with application to measurement error models. Comm Statist Theory Methods 18:4335–4358
19. van der Vaart AW, Wellner JA (1996) Weak Convergence and Empirical Processes. With Applications to Statistics. Springer, New York