THEORY PROBAB. APPL. Vol. 44, No. 2

Translated from Russian Journal

MARTINGALE MODELS OF STOCHASTIC APPROXIMATION AND THEIR CONVERGENCE∗

E. VALKEILA† AND A. V. MELNIKOV‡

(Translated by M. V. Khatuntseva)

Abstract. Stochastic approximation procedures are studied from the point of view of the general theory of stochastic processes. Convergence results are obtained by uniform methods covering both the discrete- and the continuous-time case. The asymptotic analysis of the procedures (a.s. convergence, asymptotic normality) is based on the stochastic Lyapunov method, and the study of the rate of convergence of stochastic approximation algorithms is based on the law of the iterated logarithm for martingales.

Key words. stochastic approximation, martingale methods, stochastic exponents, stochastic Lyapunov method

PII. S0040585X97977549

1. Introduction. After the publication in 1951 of the famous paper of Robbins and Monro, which introduced "a stochastic component" into the classical Newton tangent method, stochastic approximation firmly entered probability theory and its applications. The principal part of the investigation of stochastic approximation procedures concerns the different types of their stability ("in probability," "almost sure," "in distribution"). The Lyapunov method underlies many of these studies, and the techniques used were based on the theory of diffusion (Markov) processes and their relations with differential equations (continuous models), and on central limit theorems and the law of large numbers (discrete models). From our point of view, the outcome of some twenty years of such investigations was summarized by Nevel'son and Hasminskii [15]. In this monograph, where diffusion and discrete models of stochastic approximation are studied separately, the essential unity of the methods used is already visible. A more careful analysis of this monograph and of a vast number of other publications (we mention the well-known monographs of Kushner, Tsypkin, etc.) shows that in both models martingale limit theorems form the basis of the study of stability, and the main technical tool is the Kolmogorov–Itô formula. The development of stochastic calculus in the 1970s and 1980s provided a fine technique for the introduction and detailed study of the asymptotic properties of models of stochastic approximation which do not distinguish between continuous and discrete time. These circumstances define the line of this paper and its place in the rather diffuse area called stochastic approximation.

∗ Received by the editors July 17, 1997; revised November 11, 1998. This work was supported by the Research Grants Committee of the University of Helsinki, by Russian Foundation for Basic Research grants 96-15-96033 and 96-01-01929, and by INTAS-RFBR grant 95-0061.
http://www.siam.org/journals/tvp/44-2/97754.html † Department of Mathematics, University of Helsinki, P. O. Box 4, FIN-00014 Finland ([email protected]). ‡ Steklov Mathematical Institute RAN, Gubkin St. 8, 117966 Moscow, Russia ([email protected]).

1.1. Let us consider the following equation:

(1.1)    $R(x) = 0,$

where $R$ is a continuous function on the real line, $R\colon \mathbf{R}^1 \to \mathbf{R}^1$. Assume that (1.1) has a unique solution $x^*$. In what follows we assume that the function $R$ satisfies the following condition:

R.0 For all $x \ne x^*$ the inequality $(x - x^*)\,R(x) > 0$ holds.

If we know the function $R$ explicitly, then we can either solve (1.1) or at least approximate its solution numerically. For example, if the function $R$ is continuously differentiable, then we can use the Newton method (see, for example, [20, p. 209]). However, as a rule we do not know the function $R$ completely and, moreover, our observations of the values of $R$ contain errors. This often happens in problems of mathematical statistics (regression analysis, recursive estimation, etc.), where (1.1) is solved with the help of stochastic approximation, the so-called Robbins–Monro method (see [5, pp. 238–252] or [15]). The development of this method leads to the following procedure for solving (1.1):

(1.2)    $\theta_t = \theta_0 - \displaystyle\int_0^t R(\theta_{s-})\,\gamma_s\,da_s - \int_0^t \gamma_s\,dm_s,$

where $a$ is a predictable increasing process, $\gamma$ is a positive predictable process, and $m$ is a locally square integrable martingale.

First we outline the contents of the paper, and then we characterize a spectrum of related work. In the next section we show that under some additional assumptions on $\gamma$ and $m$ the procedure converges almost surely to the solution $x^*$. We consider there the convergence of the standard procedure, where $\gamma = (1 + a_-)^{-1}$, and the convergence of the procedure with slowly varying weights, where $\gamma = (1 + a_-)^{-r}$ with $0 < r < 1$. The proof is based on two facts: a suitable analogue for (1.2) of the stochastic Lyapunov method, and a modification of results on the convergence of positive semimartingales in [10] and [21]. Our approach also gives information on the rate of convergence of the approximation under consideration. In section 3 we prove the asymptotic normality of the standard procedure and of the procedure with slowly varying weights under the assumption that $a$ and $\gamma$ are deterministic. The proof of the asymptotic normality is based on the central limit theorem for square integrable martingales, adapted to our case. In section 4 we study approximation procedures with averaging: we prove the a.s. convergence of the procedure with averaging and study its asymptotic normality. In section 5 we study the explicit pathwise behavior of the procedure (1.2). The method, first proposed by Gaposhkin and Krasulina [24] for a discrete model, is based on the law of the iterated logarithm for locally square integrable martingales. In section 6 we give examples illustrating the results obtained. Throughout the paper we use stochastic exponents defined with respect to an increasing function. All the necessary background and some attractive properties of these exponents which we have not found in the literature are given in the appendix.

1.2.
Stochastic approximation algorithms as strong solutions of stochastic differential equations with respect to semimartingales and the method of stochastic exponents for studying their convergence were proposed by Melnikov [12] and were

extended in [13], [14], and other papers. Le Breton and Novikov [8], [9] studied stochastic approximation in the case when the function $R$ is linear, in the special case of Gaussian errors (convergence a.s., asymptotic normality, the law of the iterated logarithm). A detailed analysis of the nonlinear model (1.2) with Gaussian errors was given in [13]. In the present paper we require only that the martingale $m$ in model (1.2) be square integrable, and this is the primary contribution of this paper compared with [12], [13], and [14]. The systematic use of the method of stochastic exponents, as a method of separate interest, is specific to this paper as well as to [12], [13], and [14]. We also note [23], where the convergence (a.s.) of a procedure more general than (1.2) is studied.

2. Convergence of the algorithm.

2.1. General definitions.

2.1.1. Suppose that all processes are right continuous and have left limits. In what follows we use the following notation: if $f$ is right continuous and has left limits (i.e., càdlàg), we set $f_{u-} = \lim_{s \uparrow u} f_s$ and denote by $\Delta f_t = f_t - f_{t-}$ the jump of the function $f$.

Let $k$ be a nonnegative function with the property $k_s\,\Delta a_s < 1$ for all $s \ge 0$. Set $(k \cdot a)_t = \int_0^t k_s\,da_s$, where the integral is the ordinary Lebesgue–Stieltjes integral. Recall (see [10]) that the unique solution of the linear differential equation $dY_t = -Y_{t-}\,k_t\,da_t$, $Y_0 = 1$, is given by

(2.1)    $Y_t = E_t(-k \cdot a) = e^{-(k \cdot a)_t} \displaystyle\prod_{s \le t} (1 - k_s\,\Delta a_s)\,e^{k_s\,\Delta a_s}.$
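For a purely discontinuous increasing function $a$ the exponential factors in (2.1) cancel and the stochastic exponent reduces to the finite product $\prod_{s \le t}(1 - k_s\,\Delta a_s)$, while the defining equation becomes the recursion $Y_n = Y_{n-1}(1 - k_n\,\Delta a_n)$. A minimal illustrative sketch (the choices of $k$ and $a$ below are toy examples of ours, not taken from the paper):

```python
from fractions import Fraction

def stochastic_exponent(k, da):
    """E_t(-k.a) for a purely discontinuous increasing a:
    exp(-(k.a)_t) * prod (1 - k_s da_s) exp(k_s da_s) collapses to
    the plain product prod (1 - k_s da_s)."""
    E = Fraction(1)
    for ks, d in zip(k, da):
        E *= 1 - ks * d
    return E

# toy data: unit jumps da_n = 1 and k_n = 1/(2n), so k_n * da_n < 1
k = [Fraction(1, 2 * n) for n in range(1, 6)]
da = [Fraction(1)] * 5

# solve dY = -Y_- k da, Y_0 = 1, step by step: Y_n = Y_{n-1} (1 - k_n da_n)
Y = Fraction(1)
for ks, d in zip(k, da):
    Y = Y - Y * ks * d

assert Y == stochastic_exponent(k, da)   # (2.1) for purely discrete a
```

The exact rational arithmetic confirms that the product formula solves the linear equation step by step.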

2.1.2. Let us give some additional facts needed for understanding (1.2). For this purpose we need some definitions and statements from the general theory of stochastic processes (see [7] and [10] for more details). Fix a stochastic basis $(\Omega, \mathcal{F}, \mathbf{F}, \mathbf{P})$. Recall that predictable processes are measurable with respect to the sigma-algebra $\mathcal{P}$ generated by all left continuous processes defined on the basis $(\Omega, \mathcal{F}, \mathbf{F}, \mathbf{P})$. If $m$ is a locally square integrable martingale, denote by $\langle m, m\rangle$ the predictable process of bounded variation which is the compensator of $m^2$, in the sense that the process $m^2 - \langle m, m\rangle$ is a local martingale. If $h$ is a predictable process with $(h^2 \cdot \langle m, m\rangle)_\infty < \infty$ ($\mathbf{P}$-a.s.), then the stochastic integral $h \cdot m$ is defined and is also a locally square integrable martingale. If $m$ is a local martingale, denote by $m^c$ the continuous martingale part of $m$ and by $m^d$ its purely discontinuous component. We define the "bracket" of $m$ by $[m, m]_t = \langle m^c, m^c\rangle_t + \sum_{s \le t} (\Delta m_s)^2$. If $m$ is a locally square integrable martingale, then $[m, m] - \langle m, m\rangle$ is a local martingale. Finally, $X$ is a semimartingale if $X = X_0 + A + M$, where $A$ is a process of locally bounded variation and $M$ is a local martingale; the semimartingale $X$ is called special if $A$ can be chosen predictable.

Let $a$ with $a_0 = 0$ be a predictable increasing process, let $\gamma$ be a positive predictable process, and let $m$ be a locally square integrable martingale. The general stochastic approximation procedure is the unique strong solution of the following stochastic differential equation with respect to the special semimartingale $a + m$:
$$\theta_t = \theta_0 - \int_0^t R(\theta_{s-})\,\gamma_s\,da_s - \int_0^t \gamma_s\,dm_s.$$
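In the simplest discrete-time case ($a_t = n$, unit jumps) this procedure is the classical Robbins–Monro recursion $\theta_n = \theta_{n-1} - \gamma_n\,(R(\theta_{n-1}) + \varepsilon_n)$, where $\varepsilon_n$ are martingale differences. A toy sketch (the regression function $R(x) = x - 2$ and the Gaussian noise below are illustrative choices of ours):

```python
import random

def robbins_monro(R, theta0, n_steps, noise_std=0.1, seed=0):
    """Discrete Robbins-Monro: theta_n = theta_{n-1} - gamma_n (R(theta_{n-1}) + eps_n)
    with the standard weights gamma_n = 1/n."""
    rng = random.Random(seed)
    theta = theta0
    for n in range(1, n_steps + 1):
        eps = rng.gauss(0.0, noise_std)   # martingale-difference noise (dm)
        theta -= (R(theta) + eps) / n     # gamma_n = 1/n
    return theta

# R(x) = x - 2 satisfies R.0 with root x* = 2
theta = robbins_monro(lambda x: x - 2.0, theta0=0.0, n_steps=5000)
assert abs(theta - 2.0) < 0.05
```

Despite never observing $R$ without noise, the recursion localizes the root; this is the scheme whose continuous-time generalization the paper studies.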

2.2. The convergence of the stochastic approximation procedure.

2.2.1. Let the following two assumptions be satisfied:

R.1 The functions $k$ and $a$ are such that $0 < E_t(-k \cdot a) \to 0$ as $t \to \infty$.

R.2.T The function $R$ satisfies the condition
$$k_t\,(x - x^*)^2 + \gamma_t^2\,\Delta a_t\,R^2(x) \le 2\,\gamma_t\,(x - x^*)\,R(x)$$
for all $t \ge T$, $x \in \mathbf{R}$, where $T \ge 0$.

The procedure $\theta = \theta(\gamma, a)$ depends on the predictable processes $\gamma$ and $a$. We say that the square integrable martingale $m$ is a noise of the procedure $\theta(\gamma, a)$, and the process $k$ from conditions R.1 and R.2.T is a Lyapunov control parameter of the procedure $\theta(\gamma, a)$. Of course, there are many stochastic approximation procedures for the problem "$R(x) = 0$," as well as many Lyapunov control parameters for a given approximation procedure $\theta(\gamma, a)$. In addition to conditions R.1 and R.2.T the following condition on the noise level is needed:

M The locally square integrable martingale $m$ is such that
$$\int_0^\infty \bigl(E_s(-k \cdot a)\bigr)^{-1}\,\gamma_s^2\,d\langle m, m\rangle_s < \infty.$$

If $X$ is a random process, we denote by $\{X \to\}$ the set of $\omega$ where the path $X_t(\omega)$ has a finite limit as $t \to \infty$. By assumption, conditions R.1, R.2.T, and M are satisfied almost surely with respect to the measure $\mathbf{P}$. Let $T$ be a finite predictable stopping time.

2.2.2. The following theorem is a reformulation of Theorem 3.1 in [14, p. 190], where condition R.2.0 (condition R.2.T with $T = 0$) was used instead of the more natural condition R.2.T. That is why we give a brief proof.

Theorem 2.1. Let conditions R.0, R.1, R.2.T, and M be fulfilled. Then, as $t \to \infty$,

(2.2)    $\theta_t \to x^*$ a.s.

Moreover, for almost all $\omega \in \Omega$ and all $t \ge 0$

(2.3)    $(\theta_t - x^*)^2 \le C(\omega)\,E_t(-k \cdot a)$

(the constant $C$ is identified in the proof).

Proof. We apply the Itô formula to $Z_t := (E_t(-k \cdot a))^{-1}(\theta_t - x^*)^2$ and obtain

(2.4)    $Z_t = (\theta_0 - x^*)^2 + A_t + B_t + M_t,$

where
$$A_t = \int_0^t \bigl(E_s(-k \cdot a)\bigr)^{-1}\,\gamma_s^2\,d\langle m, m\rangle_s,$$
$$B_t = \int_0^t \bigl(E_s(-k \cdot a)\bigr)^{-1}\,\bigl[k_s\,(\theta_{s-} - x^*)^2 + \Delta a_s\,\gamma_s^2\,R^2(\theta_{s-}) - 2\,\gamma_s\,(\theta_{s-} - x^*)\,R(\theta_{s-})\bigr]\,da_s.$$

We do not specify the local martingale $M$ (see [14, p. 189]) in (2.4). Before we continue, let us state a lemma on the convergence of nonnegative semimartingales, which is a modification of a result of Spreij (see [21, p. 245], [18], and [10, p. 115], where the first general results of this kind were obtained). It is also useful to mention [23], in which the general Spreij result is proved by the method proposed in [21].

Lemma 2.1. Suppose that the special semimartingale $X$ has the following properties:
(i) $X_t \ge 0$ for all $t \ge 0$;
(ii) $X_t = X_0 + A_t - B_t + M_t$, where $A$ is a nonnegative predictable increasing process, $B$ is a predictable process of bounded variation for which there exists a finite predictable stopping time $T$ such that $B_t \ge B_s \ge B_T$ for all $t \ge s \ge T$ (i.e., $B$ increases after $T$), and $M$ is a local martingale.
Then (a.s.)

(2.5)    $\{A_\infty < \infty\} \subset \{X \to,\ B \to\}.$

The proof is obvious: the process $X - A$ is a supermartingale bounded from below after the stopping time $T$, and so it converges almost surely.

Using condition R.2.T in (2.4), we see that the positive semimartingale $Z$ satisfies (ii). From condition M it follows that $Z$ converges almost surely. Since the positive semimartingale $Z$ converges, $Z_\infty^* := \sup_t |Z_t| < \infty$. So we obtain (2.3) with $C = Z_\infty^*$, and condition R.1 implies (2.2). Theorem 2.1 is proved.

2.3. Convergence of the standard procedure and the procedure with "slowly varying weights."

2.3.1. Let us consider the standard procedure, i.e., the case where $\gamma = (1 + a_-)^{-1}$. We introduce the following additional condition on the function $R$.

R.3 The function $R$ satisfies $R(x) = \beta\,(x - x^*) + U(x - x^*)$, where $U(y) = O(y^2)$ as $y \to 0$.

In addition, we suppose that the procedure $\theta(\gamma, a)$ has the following special structure: the weight $\gamma$ is of the form $\gamma = \alpha/(1 + a_-)$, where $\alpha > 0$ is a "free" constant; the Lyapunov control parameter $k$ is of the form $k = \beta\gamma$, i.e., $(k \cdot a)_t = \int_0^t \alpha\beta\,(1 + a_{s-})^{-1}\,da_s$, where the constant $\beta$ is taken from R.3; and the constants $\alpha$, $\beta$ are such that

(2.6)    $\alpha\beta < 1 < 2\,\alpha\beta$

and, together with $a$, satisfy the condition $\alpha\beta\,\Delta a/(1 + a_-) < 1$. As a rule, $a_t = t$ or $a_t = [t]$, but this is not necessary here. In what follows we suppose that

(2.7)    $a_\infty = \infty$ and $[\gamma \cdot a,\ \gamma \cdot a]_\infty < \infty.$

Furthermore, we have (for details see the appendix)
$$E_t(-\beta\gamma \cdot a) = (1 + a_t)^{-\alpha\beta}\,\pi_t(-\alpha\beta),$$
where $\pi_t(\delta) = \prod_{s \le t} (1 + \delta\,\Delta v_s)/(1 + \Delta v_s)^{\delta}$ and $v = (1 + a_-)^{-1} \cdot a$. Since $\alpha\beta\,\Delta v < 1$, we have $0 < \pi_\infty(-\alpha\beta) < \infty$. We write $E(-\beta\gamma \cdot a) = E(-\alpha\beta)$ for short.

Let us study the convergence of the procedure $\theta(\gamma, a)$. From condition (2.7) we obtain R.1. Suppose that after some fixed time $t_0$ the following inequality is fulfilled:

(2.8)    $\beta\,(x - x^*)^2 + \dfrac{\alpha\,\Delta a_t}{1 + a_{t-}}\,R^2(x) \le 2\,R(x)\,(x - x^*)$

for all $x \in \mathbf{R}$; i.e., condition R.2.T is satisfied. Since $a_\infty = \infty$, this is an insignificant limitation on the "free" constant $\alpha$. Instead of condition M we suppose that the increasing process $\langle m, m\rangle$ is absolutely continuous with respect to the increasing function $a$ and

(2.9)    $0 < \sigma_t^2 \le C < \infty$

with $\sigma_t^2 := d\langle m, m\rangle_t/da_t$. Furthermore, in view of (2.6) and the results of the appendix,
$$\int_0^\infty \bigl(E_t(-\alpha\beta)\bigr)^{-1}\,\gamma_t^2\,d\langle m, m\rangle_t \le C(\alpha, \beta, a) \int_0^\infty (1 + a_{t-})^{\alpha\beta - 2}\,da_t < \infty,$$
and we obtain condition M. So we have proved the following theorem for the standard procedure.

Theorem 2.2. Let conditions (2.6), (2.7), and (2.9) hold, let (2.8) be satisfied after some $t_0$, and let R.3 hold. Then the standard procedure converges almost surely, and for all $t$ and almost all $\omega$
$$|\theta_t - x^*| \le C(\omega)\,(1 + a_t)^{-\alpha\beta/2}.$$

2.3.2. Let us consider a procedure with "slowly varying weights," i.e., the case $\gamma = (1 + a_-)^{-r}$ with some $0 < r < 1$. Let $\gamma(r) = \alpha\,(1 + a_-)^{-r}$ and let the function $R$ satisfy R.0 and R.3. We choose the Lyapunov control parameter $k = \alpha\beta\,(1 + a_-)^{-1}$. Suppose that R.2.T is fulfilled, i.e.,

(2.10)    $\dfrac{\alpha\beta}{1 + a_{t-}}\,(x - x^*)^2 + \dfrac{\alpha^2\,\Delta a_t}{(1 + a_{t-})^{2r}}\,R^2(x) \le \dfrac{2\alpha}{(1 + a_{t-})^{r}}\,(x - x^*)\,R(x)$

after some fixed $t_0 > 0$. Note that this weakens the assumption on the function $R$ if we choose the parameter $r$ small enough. Let the martingale $m$ be such that $\langle m, m\rangle = \sigma^2 \cdot a$, where $\sigma_t^2 \le C < \infty$ for all $t \ge 0$, as above. Suppose that the "free" parameters $\alpha$, $r$ together with $\beta$ satisfy the inequality

(2.11)    $1 - \dfrac{r}{2} < \alpha\beta < 2r - 1.$

Since $E_t(-\alpha\beta) = (1 + a_t)^{-\alpha\beta}\,\pi_t(-\alpha\beta)$, using the above assumptions on $\alpha$ and $r$ in (2.11), we obtain that $\int_0^\infty (E_t(-\alpha\beta))^{-1}\,(1 + a_{t-})^{-2r}\,da_t < \infty$. So condition M is fulfilled, and we obtain the following theorem.

Theorem 2.3. Let (2.7) hold, let (2.10) be satisfied after some $t_0$, and let (2.11), (2.9), and R.3 hold. Then the procedure with "slowly varying weights" converges (a.s.) with the rate
$$|\theta_t - x^*| \le C(\omega)\,(1 + a_t)^{-\alpha\beta/2}.$$

Remark 2.1. The rate of convergence in Theorem 2.3 also depends on $r$ in view of the restriction (2.11).

3. Asymptotic normality. In this section we study the asymptotic normality of the standard procedure and the procedure with slowly varying weights. We assume that the increasing process $a$ is deterministic, and hence $\gamma$ and $\gamma(r)$ are deterministic too.
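Before turning to asymptotic normality, the almost sure rate $(1 + a_t)^{-\alpha\beta/2}$ of Theorems 2.2 and 2.3 can be illustrated numerically. The sketch below (linear $R(x) = \beta x$, Gaussian noise, all toy choices of ours) tracks the normalized squared deviation $Z_n = (1 + n)^{\alpha\beta}\,\theta_n^2$, which by (2.3) should stay bounded along the path:

```python
import random

def standard_procedure_sup_Z(alpha=0.75, beta=1.0, sigma=1.0,
                             n_steps=20000, seed=1):
    """Standard procedure with gamma_n = alpha/n and R(x) = beta*x (root x* = 0).
    Returns sup_n (1+n)^(alpha*beta) * theta_n^2, the path maximum of Z_n."""
    rng = random.Random(seed)
    theta, sup_Z = 1.0, 0.0
    for n in range(1, n_steps + 1):
        theta -= alpha / n * (beta * theta + rng.gauss(0.0, sigma))
        sup_Z = max(sup_Z, (1.0 + n) ** (alpha * beta) * theta * theta)
    return sup_Z

# alpha*beta = 0.75 satisfies (2.6): alpha*beta < 1 < 2*alpha*beta
sup_Z = standard_procedure_sup_Z()
assert sup_Z < 200.0   # Z_n stays bounded along the path, as (2.3) predicts
```

The boundedness of $\sup_n Z_n$ is exactly the statement that $|\theta_n| \le C(\omega)\,(1 + n)^{-\alpha\beta/2}$ along this path.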

3.1. Asymptotic normality of the standard procedure.

3.1.1. We start with a description of the asymptotic normality of procedure (1.2). To prove the asymptotic normality, we suppose that

(3.1)    $\sigma_t^2 \stackrel{\mathbf{P}}{\longrightarrow} \sigma^2,$

where $\sigma^2 > 0$ is a constant and $\stackrel{\mathbf{P}}{\longrightarrow}$ means convergence in probability.

Set $g_s^t(\alpha; a) = (1 + a_t)^{\alpha - 1/2}\,(1 + a_{s \wedge t-})^{1 - \alpha}$, where $s, t \ge 0$. Note that $\lim_t g_s^t(\alpha\beta; a) = \infty$ and that $g_s^t(\alpha\beta; a)$ increases in $s$. Define the process $L(\varepsilon, g^t(\alpha\beta; a))$ by
$$L_t\bigl(\varepsilon, g^t(\alpha\beta; a)\bigr) = \int_0^t \int_{\{|x| > \varepsilon g_s^t(\alpha\beta;\, a)\}} \bigl(g_s^t(\alpha\beta; a)\bigr)^{-2}\,x^2\,\nu^m(ds, dx).$$

Let the jumps of the noise $m$ satisfy the following Lindeberg condition:

L $L_t(\varepsilon, g^t(\alpha\beta; a)) \stackrel{\mathbf{P}}{\longrightarrow} 0$ for any $\varepsilon > 0$ as $t \to \infty$.

The following theorem is the general result of this section.

Theorem 3.1. Let (3.1), (2.6), (2.7), and (2.9) hold, let (2.8) be satisfied after some $t_0$, and let R.3 and L hold. Then

(3.2)    $(1 + a_t)^{1/2}\,(\theta_t - x^*) \stackrel{d}{\longrightarrow} N\Bigl(0,\ \dfrac{\alpha^2\sigma^2}{2\alpha\beta - 1}\Bigr),$
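A Monte Carlo sanity check of the limit variance in (3.2), in the simplest discrete setting (linear $R(x) = \beta x$, so $U \equiv 0$, i.i.d. Gaussian noise; all parameter choices below are ours): the sample variance of $(1 + n)^{1/2}\theta_n$ should approach $\alpha^2\sigma^2/(2\alpha\beta - 1)$.

```python
import random

def normalized_final(alpha, beta, sigma, n_steps, rng):
    """One path of the standard procedure; returns (1+n)^(1/2) * theta_n."""
    theta = 0.0
    for n in range(1, n_steps + 1):
        theta -= alpha / n * (beta * theta + rng.gauss(0.0, sigma))
    return (1.0 + n_steps) ** 0.5 * theta

alpha, beta, sigma = 0.75, 1.0, 1.0        # alpha*beta = 0.75 lies in (1/2, 1)
rng = random.Random(2)
samples = [normalized_final(alpha, beta, sigma, 1000, rng) for _ in range(2000)]
var = sum(x * x for x in samples) / len(samples)   # the limit mean is 0

limit_var = alpha**2 * sigma**2 / (2 * alpha * beta - 1)   # = 1.125 here
assert abs(var - limit_var) < 0.25
```

The tolerance is generous because of Monte Carlo error and the finite-horizon bias; the point is only that the empirical variance is governed by $\alpha^2\sigma^2/(2\alpha\beta - 1)$, not, say, by $\sigma^2$ alone.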

where $\stackrel{d}{\longrightarrow}$ denotes convergence in distribution and $N(\mu, \sigma^2)$ is a normal random variable with mean $\mu$ and variance $\sigma^2$.

3.1.2. Before starting the proof, let us describe the method we use to prove the theorem. Set $h_t = E_t^{-1/(2\alpha\beta)}(-\alpha\beta)$. Recall that
$$E_t^{-1/(2\alpha\beta)}(-\alpha\beta) = (1 + a_t)^{1/2}\,\bigl(\pi_t(-\alpha\beta)\bigr)^{-1/(2\alpha\beta)}.$$

We write the normalized deviation in the following form:
$$Y_t = h_t\,(\theta_t - x^*) = f_t\,(\theta_0 - x^*) + f_t M_t + f_t A_t$$
with $f_t = E_t^{1 - 1/(2\alpha\beta)}(-\alpha\beta)$. Here $M$ is a locally square integrable martingale and $A$ is a predictable process. Note that $f_t = (1 + a_t)^{1/2 - \alpha\beta}\,(\pi_t(-\alpha\beta))^{1 - 1/(2\alpha\beta)} \to 0$ as $t \to \infty$.

Let us fix $t$ and define the process $Y(t, s)$ as

(3.3)    $Y(t, s) = f_t\,(\theta_0 - x^*) + f_t M_s^t + f_t A_s^t.$

Here $X^t$ is a stopped process: $X_s^t = X_{s \wedge t}$. The function $f$ is deterministic, and hence the process $s \mapsto f_t M_s^t$ is a square integrable martingale and $s \mapsto f_t A_s^t$ is a predictable process.

Let us now consider a sequence $(t_n)$ with $t_n \uparrow \infty$ as $n \to \infty$. We want to show that $Y_{t_n} \stackrel{d}{\longrightarrow} N(0, C^2)$ as $n \to \infty$. Since $Y_{t_n} = Y(t_n, t_n)$, it suffices to prove that
$$Y(t_n, t_n) \stackrel{d}{\longrightarrow} N(0, C^2).$$

Finally, making the time change $t \to t/t_n$, we obtain a sequence of special semimartingales on the interval $[0, 1]$, defined as
$$Y_u^n = Y_0^n + M_u^n + A_u^n,$$
where $Y_0^n = f_{t_n}(\theta_0 - x^*)$, $A_u^n = f_{t_n} A_{u t_n}^{t_n}$, and $M_u^n = f_{t_n} M_{u t_n}^{t_n}$. Here the filtration is defined as $\mathcal{F}_u^n = \mathcal{F}_{u t_n}$. Obviously, $Y_1^n = Y_{t_n}$, and we prove the asymptotic normality of $Y_{t_n}$ by showing that $Y_0^n \to 0$, $A_1^n \stackrel{\mathbf{P}}{\longrightarrow} 0$, and $M_1^n \stackrel{d}{\longrightarrow} N(0, C^2)$. From these facts the convergence (3.2) follows.

3.1.3. We start with the proof of Theorem 3.1. In what follows we often use the following simple lemma, which is easily proved by integration by parts using (1.2) and the fact that
$$\bigl(E_t(-\alpha\beta)\bigr)^{-1} = 1 + \beta \int_0^t \bigl(E_s(-\alpha\beta)\bigr)^{-1}\,\gamma_s\,da_s.$$

Lemma 3.1. Let the function $R$ satisfy condition R.3. Then

(3.4)    $\bigl(E_t(-\alpha\beta)\bigr)^{-1}(\theta_t - x^*) = \theta_0 - x^* - \displaystyle\int_0^t \gamma_s\,U(\theta_{s-})\,\bigl(E_s(-\alpha\beta)\bigr)^{-1}\,da_s - \int_0^t \gamma_s\,\bigl(E_s(-\alpha\beta)\bigr)^{-1}\,dm_s.$
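The identity $(E_t(-\alpha\beta))^{-1} = 1 + \beta\int_0^t (E_s(-\alpha\beta))^{-1}\gamma_s\,da_s$ used above can be checked exactly in discrete time ($a_n = n$, $\gamma_n = \alpha/n$, so $k_n\,\Delta a_n = \alpha\beta/n$) with rational arithmetic; the parameter values are an arbitrary admissible choice of ours:

```python
from fractions import Fraction

alpha, beta = Fraction(3, 4), Fraction(1)   # alpha*beta in (1/2, 1), cf. (2.6)

E_inv = Fraction(1)   # (E_n(-alpha*beta))^{-1}, with E_0 = 1
acc = Fraction(0)     # beta * sum_{k<=n} (E_k)^{-1} gamma_k   (Delta a = 1)
for n in range(1, 50):
    E_inv /= 1 - alpha * beta / n        # E_n = E_{n-1} (1 - alpha*beta/n)
    acc += beta * E_inv * (alpha / n)    # gamma_n = alpha/n
    assert E_inv == 1 + acc              # the integration-by-parts identity
```

The identity holds exactly at every step, since $E_n^{-1} - E_{n-1}^{-1} = E_n^{-1}\,\alpha\beta/n = \beta\,E_n^{-1}\gamma_n$.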

For simplicity, we assume that $x^* = 0$. We study the asymptotic normality of the expression

(3.5)    $E_t^{-1/(2\alpha\beta)}(-\alpha\beta)\,\theta_t.$

Multiply $\theta_t$ by $E_t^{-1}(-\alpha\beta)$, use Lemma 3.1, and multiply the result by $E_t^{1 - 1/(2\alpha\beta)}(-\alpha\beta)$; we obtain the following equality:

(3.6)    $E_t^{-1/(2\alpha\beta)}(-\alpha\beta)\,\theta_t = A_0(t) - A_t - V_t,$

where
$$A_0(t) = E_t^{1 - 1/(2\alpha\beta)}(-\alpha\beta)\,\theta_0,$$
$$A_t = E_t^{1 - 1/(2\alpha\beta)}(-\alpha\beta) \int_0^t E_s^{-1}(-\alpha\beta)\,U(\theta_{s-})\,\gamma_s\,da_s,$$
$$V_t = E_t^{1 - 1/(2\alpha\beta)}(-\alpha\beta) \int_0^t E_s^{-1}(-\alpha\beta)\,\gamma_s\,dm_s.$$

By assumption (2.6), $1 - 1/(2\alpha\beta) > 0$ and, hence, using (2.7) we obtain that

(3.7)    $A_0(t) = \bigl((1 + a_t)^{-\alpha\beta}\,\pi_t(-\alpha\beta)\bigr)^{1 - 1/(2\alpha\beta)}\,\theta_0 \longrightarrow 0$ as $t \to \infty$.

The next step is the proof of the convergence $A_t \to 0$ (a.s.) as $t \to \infty$. Recall that $Z_t = (\theta_t - x^*)^2\,(E_t(-\alpha\beta))^{-1}$ and $Z_\infty^* < \infty$.

Lemma 3.2 (see [15, p. 142] and [14, Lemma 3.1, p. 191]). Let conditions R.0, R.1, R.2.T, and M be fulfilled. Then for any $\varepsilon > 0$ and $\delta > 0$ there exists a deterministic time $T = T(\varepsilon, \delta)$ such that

(3.8)    $\mathbf{P}\Bigl\{\omega\colon \displaystyle\sup_{t \ge T} \theta_t^2 < \delta\Bigr\} \ge 1 - \varepsilon.$

Proof. Obviously, $\sup_{t \ge T} \theta_t^2 \le E_T(-\alpha\beta)\,Z_\infty^*$. The random variable $Z_\infty^*$ is bounded (a.s.), and from condition R.1 it follows that $E_t(-\alpha\beta) \to 0$. So, since $\beta\gamma \cdot a$ is a deterministic function, we can find a number $T := T(\delta)$ such that
$$\mathbf{P}\Bigl\{\sup_{t \ge T} \theta_t^2 \ge \delta\Bigr\} \le \mathbf{P}\bigl\{Z_\infty^* \ge \delta\,E_T^{-1}(-\alpha\beta)\bigr\} < \varepsilon,$$
i.e., we have (3.8).

Now use Lemma 3.2, choose a $\delta > 0$ such that $|U(x)| \le c\,|x|^2$, and, for $\varepsilon > 0$, let $T$ be as in (3.8). Then we can write
$$|A_t| \le E_t^{1 - 1/(2\alpha\beta)}(-\alpha\beta) \biggl(\int_0^T \gamma_s\,E_s^{-1}(-\alpha\beta)\,\bigl|U(\theta_{s-})\bigr|\,da_s + \int_T^t \gamma_s\,E_s^{-1}(-\alpha\beta)\,\bigl|U(\theta_{s-})\bigr|\,da_s\biggr).$$

The time $T$ is fixed, so using condition R.1 we obtain, as in (3.7), that
$$E_t^{1 - 1/(2\alpha\beta)}(-\alpha\beta) \int_0^T \gamma_s\,E_s^{-1}(-\alpha\beta)\,\bigl|U(\theta_{s-})\bigr|\,da_s \longrightarrow 0 \quad \text{a.s.}$$

Using (2.3), on the set $\{\sup_{t \ge T} \theta_t^2 < \delta\}$ we obtain
$$E_t^{1 - 1/(2\alpha\beta)}(-\alpha\beta) \int_T^t \gamma_s\,E_s^{-1}(-\alpha\beta)\,\bigl|U(\theta_{s-})\bigr|\,da_s \le E_t^{1 - 1/(2\alpha\beta)}(-\alpha\beta)\,c\,Z_\infty^* \int_0^t \gamma_s\,da_s.$$

In view of $E_t(-B) \le e^{-B_t}$, we obtain $E_t^{1 - 1/(2\alpha\beta)}(-\alpha\beta) \int_0^t \gamma_s\,da_s \to 0$ a.s., and by Lemma 3.2 we have $A_t \stackrel{\mathbf{P}}{\longrightarrow} 0$. Hence, for the asymptotic normality of the procedure (1.2) it is sufficient to study the asymptotic normality of the expression
$$V_t = E_t^{1 - 1/(2\alpha\beta)}(-\alpha\beta) \int_0^t E_s^{-1}(-\alpha\beta)\,\gamma_s\,dm_s.$$

We return now to the method described in subsection 3.1.2. Choose $t_n \uparrow \infty$ and consider the sequence $Y_1^n = f_{t_n}\theta_0 + A_1^n + M_1^n$. We have shown that $f_{t_n}\theta_0 \to 0$ and $A_1^n \stackrel{\mathbf{P}}{\longrightarrow} 0$ as $n \to \infty$. To finish, we must show that

(3.9)    $\langle M^n, M^n\rangle_1 \stackrel{\mathbf{P}}{\longrightarrow} \dfrac{\alpha^2\sigma^2}{2\alpha\beta - 1}\,\bigl(\pi_\infty(-\alpha\beta)\bigr)^{-1/(\alpha\beta)},$

(3.10)    $\displaystyle\int_0^1 \int_{\{|x| > \varepsilon\}} x^2\,\nu^{M^n}(ds, dx) \stackrel{\mathbf{P}}{\longrightarrow} 0.$

It then follows from the central limit theorem for locally square integrable martingales that
$$M_1^n \stackrel{d}{\longrightarrow} N\Bigl(0,\ \dfrac{\alpha^2\sigma^2}{2\alpha\beta - 1}\,\pi_\infty^{-1/(\alpha\beta)}(-\alpha\beta)\Bigr)$$
(see [10, Theorem 5.5.4, p. 314]).

It is obvious that
$$\langle M^n, M^n\rangle_1 = E_{t_n}^{2 - 1/(\alpha\beta)}(-\alpha\beta) \int_0^{t_n} \gamma_s^2\,E_s^{-2}(-\alpha\beta)\,d\langle m, m\rangle_s.$$

Note that $E_t^{2 - 1/(\alpha\beta)}(-\alpha\beta) = (1 + a_t)^{1 - 2\alpha\beta}\,(\pi_t(-\alpha\beta))^{2 - 1/(\alpha\beta)}$. Since $(1 + a_t)^{1 - 2\alpha\beta} \to 0$ as $t \to \infty$, the asymptotic behavior of
$$E_t^{2 - 1/(\alpha\beta)}(-\alpha\beta) \int_0^t \gamma_s^2\,E_s^{-2}(-\alpha\beta)\,d\langle m, m\rangle_s$$
depends on the tail behavior of the integral. Hence we can replace the study of the limit of this expression by the corresponding study of the expression

(3.11)    $E_t^{2 - 1/(\alpha\beta)}(-\alpha\beta) \displaystyle\int_0^t \gamma_s^2\,E_s^{-2}(-\alpha\beta)\,\sigma^2\,da_s,$

where we used condition (3.1). Lemma A.3 in the appendix implies that

(3.12)    $E_t^{2 - 1/(\alpha\beta)}(-\alpha\beta) \displaystyle\int_0^t \gamma_s^2\,E_s^{-2}(-\alpha\beta)\,d\langle m, m\rangle_s \stackrel{\mathbf{P}}{\longrightarrow} \dfrac{\alpha^2\sigma^2}{2\alpha\beta - 1}\,\pi_\infty(-\alpha\beta)^{-1/(\alpha\beta)}.$

Hence condition (3.9) is valid for the sequence $M_1^n$. Let us show that condition (3.10) is valid too. Write $s_n = s t_n$ and note first that

(3.13)    $(\Delta M_{s_n})^2 \le C\bigl(\pi(-\alpha\beta), a\bigr)\,\bigl(g_s^{t_n}(\alpha\beta; a)\bigr)^{-2}\,(\Delta m_{s_n})^2,$

where the constant $C$ is such that $0 < C(\pi(-\alpha\beta), a) < \infty$. From inequality (3.13) it follows that

(3.14)    $\mathbf{1}\bigl\{|\Delta M_{s_n}| > \varepsilon\bigr\} \le \mathbf{1}\bigl\{|\Delta m_{s_n}| > C^{-1}\,g_s^{t_n}(\alpha\beta; a)\,\varepsilon\bigr\}.$

Let us use relations (3.13) and (3.14) to obtain

(3.15)    $\displaystyle\int_0^1 \int_{\{|x| > \varepsilon\}} x^2\,\nu^{M^n}(ds, dx) \le K\bigl(\pi(-\alpha\beta), a\bigr)\,L_{t_n}\bigl(\varepsilon', g^{t_n}(\alpha\beta; a)\bigr),$

with $\varepsilon' = \varepsilon C^{-1}$ and a finite constant $K$. So (3.10) follows from condition L. We have shown that
$$E_t^{-1/(2\alpha\beta)}(-\alpha\beta)\,\theta_t \stackrel{d}{\longrightarrow} N\Bigl(0,\ \dfrac{\alpha^2\sigma^2}{2\alpha\beta - 1}\,\pi_\infty^{-1/(\alpha\beta)}(-\alpha\beta)\Bigr).$$
However, $(1 + a_t)^{1/2} = E_t^{-1/(2\alpha\beta)}(-\alpha\beta)\,\pi_t^{1/(2\alpha\beta)}(-\alpha\beta)$; thus statement (3.2) is proved.

Remark 3.1 (see [14] and [13]). Let $m$ be a Gaussian martingale with independent increments and $\langle m, m\rangle = \sigma^2 \cdot a$, $\sigma_t^2 \to \sigma^2$ as $t \to \infty$. In this case no condition on the jumps of the martingale $m$ is needed for the asymptotic normality of the procedure $\theta(\alpha/(1 + a_-), a)$.

3.2. The asymptotic normality of the procedure with "slowly varying weights."

3.2.1. We study the asymptotic normality of the procedure with slowly varying weights. Let $v(r) = (1 + a_-)^{-r} \cdot a$. Suppose that

(3.16)    $[v(r), v(r)]_\infty < \infty.$

We have

(3.17)    $E_t\bigl(-\alpha\beta v(r)\bigr) = \exp\Bigl(-\dfrac{\alpha\beta}{1 - r}\,\bigl((1 + a_t)^{1 - r} - 1\bigr)\Bigr)\,\rho_t\bigl(-\alpha\beta; v(r)\bigr),$

where
$$\rho_t\bigl(-\alpha\beta; v(r)\bigr) = \prod_{s \le t} \Bigl(1 - \frac{\alpha\beta\,\Delta a_s}{(1 + a_{s-})^r}\Bigr)\,\exp\Bigl(\frac{\alpha\beta}{1 - r}\,\Delta (1 + a_s)^{1 - r}\Bigr).$$

Using condition (3.16), we have $0 < \rho_\infty(-\alpha\beta; v(r)) < \infty$ (see Lemma A.4 in the appendix). In what follows we use the notation $E(-\alpha\beta v(r)) = E(-\alpha\beta; r)$. Define the function $h^t := h^t(\alpha\beta, r; a)$ by
$$h_s^t = (1 + a_t)^{-r/2}\,\bigl(E_t(-\alpha\beta; r)\bigr)^{-1}\,E_{s \wedge t}(-\alpha\beta; r)\,(1 + a_{s \wedge t-})^{r}.$$

Theorem 3.2. Let the function $a$ be deterministic and $\gamma(r) = \alpha\,(1 + a_-)^{-r}$. Let conditions (2.7) and (2.10) be fulfilled after some $t_0$, let (2.11), (2.9), R.3, (3.1), and (3.16) be fulfilled, and let condition L be fulfilled with $h^t(\alpha\beta, r; a)$ replacing $g^t(\alpha\beta; a)$. Then

(3.18)    $(1 + a_t)^{r/2}\,(\theta_t - x^*) \stackrel{d}{\longrightarrow} N\Bigl(0,\ \dfrac{\alpha\sigma^2}{2\beta}\Bigr)$ as $t \to \infty$.

Proof. Multiplying $\theta_t - x^*$ by $(E_t(-\alpha\beta; r))^{-1}$, we obtain

(3.19)    $\bigl(E_t(-\alpha\beta; r)\bigr)^{-1}(\theta_t - x^*) = (\theta_0 - x^*) - A_t - M_t,$

where
$$A_t = \int_0^t \bigl(E_s(-\alpha\beta; r)\bigr)^{-1}\,\frac{\alpha}{(1 + a_{s-})^r}\,U(\theta_{s-})\,da_s,$$
$$M_t = \int_0^t \bigl(E_s(-\alpha\beta; r)\bigr)^{-1}\,\frac{\alpha}{(1 + a_{s-})^r}\,dm_s.$$

Multiplying (3.19) by $B_t = (1 + a_t)^{r/2}\,E_t(-\alpha\beta; r)$, we obtain

(3.20)    $(1 + a_t)^{r/2}\,(\theta_t - x^*) = B_t\,(\theta_0 - x^*) - B_t A_t - B_t M_t.$

By Lemma A.4, $B_t \to 0$ as $t \to \infty$ and, hence, $B_t(\theta_0 - x^*) \to 0$. The remaining steps are similar to those in the proof of Theorem 3.1, and we give only the main ones. We use Lemma 3.2 to choose a fixed time $T$ such that $\mathbf{P}\{\sup_{t \ge T} \theta_t^2 < \delta\} \ge 1 - \varepsilon$, where $\varepsilon, \delta > 0$. Further, if $t \ge T$, then $|U(\theta_{t-})| \le Z_\infty^*\,E_t(-\alpha\beta)$, and we obtain that $B_t A_t \stackrel{\mathbf{P}}{\longrightarrow} 0$ (see Lemma A.6). We then use Lemma A.5 to check the relation $B_t^2\,\langle M, M\rangle_t \stackrel{\mathbf{P}}{\longrightarrow} \alpha\sigma^2/(2\beta)$, which gives condition (3.9) for the processes $B_{t_n} M_{s t_n}$, $s \in [0, 1]$. To finish, we use condition L with respect to $h^t(\alpha\beta, r; a)$ to check (3.10) for the martingale $M_1^n := B_{t_n} M_{t_n}$. As a result we obtain (3.18).

4. Averaged procedures.

4.1. Convergence properties of averaged procedures.

4.1.1. We study the convergence properties of the averaged procedures. Consider procedure (1.2) under conditions R.0, R.1, R.2.T, and M. Define the averaged estimate
$$\bar\theta_t = \frac{1}{1 + a_t} \int_0^t \theta_s\,da_s$$
with $\bar\theta_t = \theta_0$ on $\{(\omega, s)\colon a_s = 0\}$. By Theorem 2.1 we have $\theta_t \to x^*$ a.s. The statement that $\bar\theta_t \to x^*$ a.s. (as $a_t \to \infty$ a.s. when $t \to \infty$) follows from the next simple lemma.

Lemma 4.1. Let the functions $a$, $b$ be such that $a$ increases, $a_0 = 0$, $a_\infty = \infty$, and $\lim_{t \to \infty} b_t = 0$. Then, as $t \to \infty$,
$$\bar b_t := \frac{1}{1 + a_t} \int_0^t b_s\,da_s \longrightarrow 0.$$

4.1.2. Now we consider the standard procedure, where $\gamma = \alpha\,(1 + a_-)^{-1}$. Using Theorem 2.2, we obtain the following theorem on the rate of convergence of the standard averaged procedure.

Theorem 4.1. Let conditions (2.6), (2.7), and (2.9) hold, let (2.8) be fulfilled after some $t_0$, and let R.3 hold. Then the standard averaged procedure converges (a.s.) and
$$|\bar\theta_t - x^*| \le C(\omega)\,(1 + a_t)^{-\alpha\beta/2}.$$

Proof. The statement follows from the estimate
$$|\bar\theta_t - x^*| \le \frac{C(\omega)}{1 + a_t} \int_0^t (1 + a_s)^{-\alpha\beta/2}\,da_s.$$

Remark 4.1. For the averaged procedure with "slowly varying weights" we obtain the same rate of convergence using Theorem 2.3.

4.2. Asymptotic normality of the averaged standard procedure.

4.2.1. We saw above that the upper bound for the rate of convergence of the averaged procedure is no better than that for the original estimates. One way to compare the quality of different procedures for estimating $x^*$ is to prove a central limit theorem and compare the variances of the limiting normal distributions. From this point of view the stochastic approximation procedures do not behave optimally. However, sometimes the averaged procedures are optimal in the sense that their limit achieves the asymptotically optimal variance (see [16], [19], [17], [2], [22], [8], and [9]). We proved the asymptotic normality of the standard procedure under assumptions (3.1), R.0, (2.6), (2.7), (2.9), (2.8) after some $t_0$, R.3, and L.
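In discrete time the averaged estimate of subsection 4.1 is just the running mean $\bar\theta_n = \frac{1}{1+n}\sum_{k \le n}\theta_k$, a Polyak–Ruppert-type average of the trajectory. A toy sketch (linear $R$, Gaussian noise; all choices ours) illustrating Lemma 4.1 and Theorem 4.1:

```python
import random

def averaged_procedure(alpha=0.75, beta=1.0, n_steps=5000, seed=3):
    """Run the standard procedure theta_n and its running average theta_bar_n."""
    rng = random.Random(seed)
    theta, acc = 1.0, 0.0
    for n in range(1, n_steps + 1):
        theta -= alpha / n * (beta * theta + rng.gauss(0.0, 1.0))
        acc += theta                       # discrete integral of theta da
    return theta, acc / (1 + n_steps)      # (theta_n, averaged estimate)

theta, theta_bar = averaged_procedure()
# both converge to the root x* = 0 (Theorem 2.2 and Lemma 4.1)
assert abs(theta) < 0.1 and abs(theta_bar) < 0.1
```

As Lemma 4.1 asserts, averaging preserves the a.s. convergence; the interest of Section 4 is in how it changes the limiting variance.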
We will show that if the above assumptions are valid, then the averaged standard procedure is asymptotically normal too.

4.2.2. First, let us describe the scheme used in the proof. As before, it depends essentially on the assumption that the process $a$ is deterministic. Let $\theta_t$ satisfy (1.2). Set for simplicity $\theta_0 = x^* = 0$. By Lemma 3.1,
$$\theta_t = E_t(-\alpha\beta)\,(-U_t - M_t),$$

where
$$U_t = \int_0^t \bigl(E_s(-\alpha\beta)\bigr)^{-1}\,U(\theta_{s-})\,\gamma_s\,da_s$$
(recall that $R(x) = \beta(x - x^*) + U(x)$) and $M_t = \int_0^t (E_s(-\alpha\beta))^{-1}\gamma_s\,dm_s$. Now consider the integral $\int_0^t \theta_s\,da_s$. Using the representation above and integration by parts, we obtain the following formula for the averaged procedure:

(4.1)    $(1 + a_t)^{1/2}\,\bar\theta_t = (1 + a_t)^{-1/2} \displaystyle\int_0^t \bigl(J_t(-\alpha\beta) - J_{s-}(-\alpha\beta)\bigr)\,d(M_s - U_s),$

where $J_t(-\alpha\beta) = \int_0^t E_{s-}(-\beta\gamma \cdot a)\,da_s$. Note that if $a$ is continuous, then
$$J_t(-\alpha\beta) = \frac{1}{1 - \alpha\beta}\,\bigl((1 + a_t)^{1 - \alpha\beta} - 1\bigr).$$

Let us fix $t$ and define the process $Y(t, u)$ by the formula
$$Y(t, u) = (1 + a_t)^{-1/2} \int_0^u \bigl(J_t(-\alpha\beta) - J_{s-}(-\alpha\beta)\bigr)\,d(M_s^t - U_s^t).$$

Choose $t_n$ and define again a sequence of semimartingales $Y^n$ on the interval $[0, 1]$ by the formula $Y_u^n = Y(t_n, u t_n)$. Since $a$ is deterministic, the process
$$u \longmapsto N_u^n := (1 + a_{t_n})^{-1/2} \int_0^{u t_n} \bigl(J_{t_n}(-\alpha\beta) - J_{s-}(-\alpha\beta)\bigr)\,dM_s^{t_n}$$
is a locally square integrable martingale. Hence it is necessary to show that

(4.2)    $Y_1^n = A_1^n + M_1^n$

has a normal limit, where $M_u^n = N_u^n$ is a locally square integrable martingale and
$$A_u^n = -(1 + a_{t_n})^{-1/2} \int_0^{u t_n} \bigl(J_{t_n}(-\alpha\beta) - J_{s-}(-\alpha\beta)\bigr)\,dU_s^{t_n}$$
is a predictable process on the interval $[0, 1]$.

4.2.3. Let us return to the proof of the asymptotic normality of the averaged procedure. Using Lemma 3.2, we choose a fixed $T > 0$ such that $|U(\theta_{s-})| \le C(\omega)\,E_s(-\alpha\beta)$ for $s \ge T$, this inequality holding for all $\omega \in B$, where $\mathbf{P}(B) > 1 - \varepsilon$. For $\omega \in B$ we have
$$A_1^n \le C\bigl(\pi_\infty(-\alpha\beta)\bigr)\,(1 + a_{t_n})^{-1/2}\,\Bigl(C(\omega, T) + (1 + a_{t_n})^{1 - \alpha\beta}\,\log(1 + a_{t_n}) + (1 + a_{t_n})^{-\alpha\beta}\Bigr)$$
(see the appendix for the computations) and, hence,

(4.3)    $A_1^n \stackrel{\mathbf{P}}{\longrightarrow} 0.$

Using Lemma A.2, we obtain that

(4.4)    $\langle M^n, M^n\rangle_1 \stackrel{\mathbf{P}}{\longrightarrow} \dfrac{\alpha^2\sigma^2}{(1 - \alpha\beta)^2}\,\Bigl(\dfrac{1}{2\alpha\beta - 1} - \dfrac{2}{\alpha\beta} + 1\Bigr) = \dfrac{2\alpha\sigma^2}{\beta\,(2\alpha\beta - 1)}.$
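The algebra behind the limit in (4.4), and the variance comparison of Remark 4.2 below, can be checked with exact rational arithmetic (a quick sanity check of the closed form, not part of the proof; the numerical values of $\alpha$, $\beta$, $\sigma^2$ are arbitrary admissible choices of ours):

```python
from fractions import Fraction as F

def averaged_limit_var(alpha, beta, sigma2):
    """(alpha^2 sigma^2/(1-ab)^2) * (1/(2ab-1) - 2/ab + 1), ab = alpha*beta."""
    ab = alpha * beta
    return alpha**2 * sigma2 / (1 - ab)**2 * (F(1) / (2*ab - 1) - F(2)/ab + 1)

for alpha, beta, sigma2 in [(F(3, 4), F(1), F(1)), (F(7, 10), F(9, 8), F(2))]:
    ab = alpha * beta
    assert F(1, 2) < ab < 1                          # condition (2.6)
    closed = 2 * alpha * sigma2 / (beta * (2*ab - 1))
    assert averaged_limit_var(alpha, beta, sigma2) == closed   # the identity in (4.4)
    standard = alpha**2 * sigma2 / (2*ab - 1)        # Theorem 3.1 variance
    assert closed > standard                         # Remark 4.2: averaging is worse here
```

The last assertion reflects that $2\alpha\sigma^2/(\beta(2\alpha\beta - 1)) = (2/(\alpha\beta)) \cdot \alpha^2\sigma^2/(2\alpha\beta - 1)$ and $\alpha\beta < 1$.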

Now we have to check the Lindeberg condition for the martingales $M^n$. Note that $|\Delta M_u^n| \le 2C\,|\Delta V_{u t_n}|$, where $V$ is defined in (3.6). Hence the Lindeberg condition L implies the corresponding condition for the martingale sequence $M^n$ at time $u = 1$. As a result we obtain the following theorem.

Theorem 4.2. Let conditions (3.1), R.0, (2.6), (2.7), and (2.9) hold, let (2.8) be fulfilled after some $t_0$, and let R.3 and L hold. Then

(4.5)    $(1 + a_t)^{1/2}\,\bar\theta_t \stackrel{d}{\longrightarrow} N\Bigl(0,\ \dfrac{2\alpha\sigma^2}{\beta\,(2\alpha\beta - 1)}\Bigr).$

Remark 4.2. Since
$$\frac{2\alpha\sigma^2}{\beta\,(2\alpha\beta - 1)} > \frac{\alpha^2\sigma^2}{2\alpha\beta - 1},$$
the limiting asymptotic variance of the averaged procedure is bigger than that of the standard procedure.

4.3. Asymptotic normality of the averaged procedure with slowly varying weights.

4.3.1. Let $\theta_t$ be the unique strong solution of (1.2) with $\gamma = \gamma(r)$. Moreover, suppose that assumptions (2.7) and (2.10) are satisfied after some $t_0$; that (2.11), (2.9), R.3, (3.1), and (3.16) hold; and let condition L be satisfied with $h^t(\alpha\beta, r; a)$ instead of $g^t(\alpha\beta; a)$. Recall that here $\gamma(r) = \alpha\,(1 + a_-)^{-r}$ and
$$E_t(-\alpha\beta; r) = \exp\Bigl(-\frac{\alpha\beta}{1 - r}\,\bigl((1 + a_t)^{1 - r} - 1\bigr)\Bigr)\,\rho_t(-\alpha\beta; r),$$
where
$$\rho_t(-\alpha\beta; r) = \prod_{s \le t} \Bigl(1 - \frac{\alpha\beta\,\Delta a_s}{(1 + a_{s-})^r}\Bigr)\,\exp\Bigl(\frac{\alpha\beta}{1 - r}\,\Delta(1 + a_s)^{1 - r}\Bigr).$$

It turns out that we can again compute as if the function $a$ were continuous.

4.3.2. In order to simplify the expressions, we set $\theta_0 = x^* = 0$. By Lemma 3.1,
$$\theta_t = -E_t(-\alpha\beta; r)\,\bigl(U_t(r) + M_t(r)\bigr),$$
where
$$U_t(r) = \int_0^t \bigl(E_s(-\alpha\beta; r)\bigr)^{-1}\,U(\theta_{s-})\,\gamma_s(r)\,da_s,$$
$$M_t(r) = \int_0^t \bigl(E_s(-\alpha\beta; r)\bigr)^{-1}\,\gamma_s(r)\,dm_s.$$

With the notation $H_t(-\alpha\beta; r) = \int_0^t E_s(-\alpha\beta; r)\,da_s$ we obtain the following representation of the averaged procedure:

(4.6)    $(1 + a_t)^{1/2}\,\bar\theta_t = (1 + a_t)^{-1/2} \displaystyle\int_0^t \bigl(H_t(-\alpha\beta; r) - H_{s-}(-\alpha\beta; r)\bigr)\,d\bigl(M_s(r) + U_s(r)\bigr).$

Again we consider a sequence $t_n \uparrow \infty$ and define a sequence of semimartingales $Y^n$ on the interval $[0, 1]$ with the decomposition $Y_u^n = A_u^n + M_u^n$,

where
$$A_u^n = (1 + a_{t_n})^{-1/2} \int_0^{u t_n} \bigl(H_{t_n}(-\alpha\beta; r) - H_{s-}(-\alpha\beta; r)\bigr)\,dU_s^{t_n}(r),$$
$$M_u^n = (1 + a_{t_n})^{-1/2} \int_0^{u t_n} \bigl(H_{t_n}(-\alpha\beta; r) - H_{s-}(-\alpha\beta; r)\bigr)\,dM_s^{t_n}(r).$$


4.3.3. We show that if

(4.7)    3/2 − αβ < r < 1,

then

(4.8)    A^n_1 →^P 0.

Since (1 + a_t)^{−1/2} → 0, it is sufficient to consider the tail behavior of the integral, and we can assume that if s ≥ T, then P{ω: |U(θ_{s−})| ≤ C(ω)(1 + a_s)^{−αβ}} > 1 − ε for any ε > 0. Note that

(H_t(−αβ; r) − H_{s−}(−αβ; r))(E_s(−αβ; r))^{−1} ≤ C(ω; α, β) a_t.

Hence

A^n_1(ω) ≤ (1 + a_{t_n})^{−1/2} C(ω, T) + C(ω; α, β)(1 + a_{t_n})^{3/2−αβ−r}.

Taking account of (4.7) we obtain (4.8).

Remark 4.3. Condition (4.7) together with the condition 1 − r/2 < αβ < 2r − 1 means that r ∈ (5/6, 1). So we have β of R.3, after which we can choose r ∈ (5/6, 1) and then fix the free parameter α.

Let us consider the martingales M^n. Integrating by parts, we obtain

H_t(−αβ; r) = (1/(αβ))(1 − E_t(−αβ; r)(1 + a_t)^r) + (1/(αβ)) ∫₀ᵗ E_s(−αβ; r) d(1 + a_s)^r.

In what follows we use the notation

K_t(−αβ; r) = ∫₀ᵗ E_s(−αβ; r) d(1 + a_s)^r.

Hence we can rewrite the martingale M^n in the form

(4.9)    M^n = M^{n,1} + M^{n,2} + M^{n,3},

where

M^{n,1}_u = (1/β)(1 + a_{t_n})^{−1/2} ∫₀^{ut_n} ((1 + a_s)/(1 + a_{s−}))^r dm_s,

M^{n,2}_u = −(1/β) E_{t_n}(−αβ; r)(1 + a_{t_n})^{r−1/2} ∫₀^{ut_n} (E_s(−αβ; r))^{−1}(1 + a_{s−})^{−r} dm_s,

M^{n,3}_u = (1/β)(1 + a_{t_n})^{−1/2} ∫₀^{ut_n} ((K_{t_n}(−αβ; r) − K_{s−}(−αβ; r))/((1 + a_{s−})^r E_s(−αβ; r))) dm_s.





We show that

(4.10)    M^{n,1}_1 →^d N(0, σ²/β²),

(4.11)    M^{n,2}_1 + M^{n,3}_1 →^P 0.

For the proof of (4.11) we need the following lemma, which follows from the Lenglart inequality (see [10]).

Lemma 4.2. If (m^n, F^n)_{n≥1} is a sequence of locally square integrable martingales with

(4.12)    ⟨m^n, m^n⟩_1 →^P 0,

then

(4.13)    (m^n)*_1 →^P 0.

Recall that d⟨m, m⟩_t/da_t ≤ C < ∞. Using this fact we obtain

⟨M^{n,2}, M^{n,2}⟩_1 ≤ C E²_{t_n}(−αβ; r)(1 + a_{t_n})^{2r−1} ∫₀^{t_n} (E_s(−αβ; r))^{−2}(1 + a_s)^{−2r} da_s.

It is not difficult to verify that

∫₀^{t_n} (E_s(−αβ; r))^{−2}(1 + a_s)^{−2r} da_s / ((E_{t_n}(−αβ; r))^{−2}(1 + a_{t_n})^{−r}) →^P C.

Hence, ⟨M^{n,2}, M^{n,2}⟩_1 →^P 0 and by Lemma 4.2 (M^{n,2})*_1 →^P 0.

Further, we consider the martingale M^{n,3}. Integrating by parts, we obtain the estimate

K_t(−αβ; r) − K_{s−}(−αβ; r) ≤ (1/(αβ(2 − 2r))) E_s(−αβ; r)(1 + a_s)^{2r−1}.

Hence, ⟨M^{n,3}, M^{n,3}⟩_1 ≤ C(1 + a_{t_n})^{2r−2}. Using Lemma 4.2, we obtain (M^{n,3})*_1 →^P 0. We have proved (4.11). Obviously,

⟨M^{n,1}, M^{n,1}⟩_1 →^P σ²/β².

Comparing the jumps of the martingale M^{n,1} with the jumps of the corresponding martingale obtained from (3.20), it is easy to see that the Lindeberg condition for M^{n,1} follows from the one for the nonaveraged procedure. We have proved the following theorem.

Theorem 4.3. Let conditions (2.7), (2.10) be satisfied after some t_0; let (2.11), (2.9), R.3, (3.1), (3.16) hold; let condition L be satisfied with h_t(αβ, r; a) instead of g_t(αβ; a); and let (4.7) hold. Then

(1 + a_t)^{1/2} θ̄_t →^d N(0, σ²/β²).

Remark 4.4. The variance σ²/β² is an optimal variance for certain least squares estimators (see [8]).
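The ordering of the limit variances can also be checked numerically. The sketch below (parameter values are illustrative, not from the paper) verifies that σ²/β² never exceeds the standard-procedure variance α²σ²/(2αβ − 1): the difference is proportional to (αβ − 1)², so equality holds exactly at α = 1/β.

```python
# Illustrative check: sigma^2/beta^2 <= alpha^2 sigma^2/(2 alpha beta - 1)
# for any admissible alpha*beta > 1/2; the gap is proportional to
# (alpha*beta - 1)^2, with equality exactly at alpha = 1/beta.
beta, sigma = 0.8, 1.3                      # assumed sample values
for ab in (0.55, 0.7, 0.85, 1.0, 1.3):      # ab = alpha*beta > 1/2
    alpha = ab / beta
    std_var = alpha**2 * sigma**2 / (2.0 * ab - 1.0)
    avg_var = sigma**2 / beta**2
    assert avg_var <= std_var + 1e-12
```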



5. On the exact asymptotic behavior of procedure (1.2).

5.1. In this section we study the exact asymptotic behavior of the paths θ_t as t → ∞. Our approach is based on the law of the iterated logarithm for martingales under the assumption that γ_t is equal to α/(1 + a_t) (the standard procedure) or α/(1 + a_t)^r, r ∈ (0, 1) (the procedure with slowly varying weights). Let M be a locally square integrable martingale and

L_t = (2⟨M, M⟩_t log log max(⟨M, M⟩_t, e²))^{1/2}.

Denote by ν^X the compensator of the jump measure of the semimartingale X. The following proposition follows from [25] and [26].

Proposition 5.1. Let the locally square integrable martingale M satisfy the following conditions (almost surely) with respect to the measure P:

e.0 ⟨M, M⟩_∞ = ∞;
e.1 ∫₀^∞ ∫_{|x|>L_s} ν^M(ds, dx) < ∞;
e.2 L_t^{−2} ∫₀ᵗ ∫_{|x|>L_s} x² ν^M(ds, dx) → 0 as t → ∞;
e.3 ∫₀^∞ ∫_{|x|≤L_s} x⁴ L_s^{−4} ν^M(ds, dx) < ∞.

Then (a.s.) lim sup_{t→∞} L_t^{−1}|M_t| ≤ 1.

The main result on the exact behavior of procedure (1.2) is the following. Let M be the locally square integrable martingale

(5.1)    M_t := ∫₀ᵗ E_s^{−1}(−βα) (α/(1 + a_{s−})) dm_s

with the square characteristic

⟨M, M⟩_t = ∫₀ᵗ E_s^{−2}(−βα) (α²/(1 + a_{s−})²) d⟨m, m⟩_s.

Theorem 5.1. For the standard procedure let conditions R.0, R.3, (2.7), (2.8), (2.9), (3.1), 1 < 2βα < 2 be satisfied and let the martingale M, defined by (5.1), satisfy e.1–e.3. Then (almost surely with respect to the measure P)

(5.2)    lim sup_{t→∞} E_t^{−1}(−βα) L_t^{−1} |θ_t − x*| ≤ 1.

Proof. Without loss of generality, we assume that θ_0 = x* = 0. We apply the Kolmogorov–Itô formula to E_t^{−1}(−βα) θ_t and obtain

(5.3)    E_t^{−1}(−βα) θ_t = −∫₀ᵗ E_s^{−1}(−βα) (α/(1 + a_{s−})) U(θ_{s−}) da_s − M_t.

Multiplying the left- and right-hand sides of (5.3) by L_t^{−1}, we obtain

(5.4)    L_t^{−1} E_t^{−1}(−βα) θ_t = −I(t) − L_t^{−1} M_t,

where

I(t) := L_t^{−1} ∫₀ᵗ E_s^{−1}(−βα) (α/(1 + a_{s−})) U(θ_{s−}) da_s.



If 2αβ > 1 we have ⟨M, M⟩_∞ = C(ω) ∫₀^∞ (1 + a_{s−})^{2(βα−1)} da_s = ∞. By Proposition 5.1, for the martingale M defined in (5.1) we have

(5.5)    lim sup_{t→∞} L_t^{−1} |M_t| ≤ 1.

Hence, to prove (5.2) it is necessary to show that lim_{t→∞} I(t) = 0. Let us fix T > 0 and write I(t) in the form

(5.6)    I(t) = L_t^{−1} ∫₀^T E_s^{−1}(−βα) (α/(1 + a_{s−})) U(θ_{s−}) da_s + L_t^{−1} ∫_T^t E_s^{−1}(−βα) (α/(1 + a_{s−})) U(θ_{s−}) da_s.

The first summand on the right-hand side of (5.6) tends to zero as t → ∞, since L_t → ∞ and the integral ∫₀^T E_s^{−1}(−βα) (α/(1 + a_{s−})) U(θ_{s−}) da_s is a finite number which may depend on ω (as well as on T). For the second summand on the right-hand side of (5.6) we use R.3, the upper bound (2.3), and (A.9) to obtain the following sequence of inequalities:

(5.7)    L_t^{−1} ∫_T^t E_s^{−1}(−βα) (α/(1 + a_{s−})) U(θ_{s−}) da_s ≤ C(ω) L_t^{−1} ∫_T^t E_s^{−1}(−βα) (α/(1 + a_{s−})) E_s(−βα) da_s ≤ C(ω) L_t^{−1} ∫_T^t (α/(1 + a_{s−})) da_s ≤ C(ω) L_t^{−1} log(1 + a_t).

However, L_t^{−1} log(1 + a_t) → 0, since L_t ≥ C(ω)(1 + a_t)^{βα−1/2} and βα > 1/2.

The following theorem gives the exact asymptotic behavior in terms of the process a.

Theorem 5.2. Let conditions R.0, R.3, (2.7), (2.8), (2.9), (2.10), (3.1), 1 < 2βα < 2 be satisfied and let the above martingale M from (5.1) satisfy e.1–e.3. Then (almost surely with respect to the measure P) we have

(5.8)    lim sup_{t→∞} (1 + a_t)^{1/2} |θ_t − x*| / (2 log log(max(1 + a_t, e²)))^{1/2} ≤ ασ/(2βα − 1)^{1/2}.

Proof. From the appendix it follows that E_t(−βα) = (1 + a_t)^{−βα} π_t(−βα). Moreover,

⟨M, M⟩_t = ∫₀ᵗ E_s^{−2}(−βα) (α² σ_s²/(1 + a_{s−})²) da_s.

From here, (3.1), and Lemma A.2, we obtain

⟨M, M⟩_t / (1 + a_t)^{2βα−1} → (α²σ²/(2βα − 1)) π_∞^{−2}(−βα).

This statement proves (5.8).

5.2. Let us consider the procedures with slowly varying weights. In this case procedure (1.2) has the following structure:

γ_t := α/(1 + a_{t−})^r, where r ∈ (0, 1).



Let the following stability condition be fulfilled: for almost all ω and all t and x

(5.9)    −2(x − x*) R(x) α/(1 + a_{t−})^r + βα(x − x*)²/(1 + a_{t−}) + Δa_t α² R²(x)/(1 + a_{t−})^{2r} ≤ 0.

Consider also the stochastic exponent E(δ; r), which is a solution of the linear equation

E_t(δ; r) = 1 + ∫₀ᵗ E_{s−}(δ; r) (δ/(1 + a_{s−})^r) da_s.
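For the simplest increasing process, the counting process a_n = n, this linear equation is solved by an explicit product, and its growth can be compared with the continuous-time asymptotics of (A.16). A numerical sketch (parameter values are illustrative):

```python
import math

# For a_n = n the solution of the linear equation is the product
# E_n(delta; r) = prod_{i<=n} (1 + delta / i^r); by (A.16) its logarithm
# grows like (delta/(1-r)) ((1+n)^{1-r} - 1) up to a bounded correction
# (the logarithm of the factor rho).
delta, r = -0.6, 0.7                        # assumed sample values
n = 50_000
logE = 0.0
for i in range(1, n + 1):
    logE += math.log(1.0 + delta / i**r)    # log of one product factor
lead = (delta / (1.0 - r)) * ((1.0 + n)**(1.0 - r) - 1.0)
assert abs(logE - lead) < 10.0              # bounded correction term
```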

Assuming that conditions R.1, R.3, (2.7), (3.1), (5.9) are satisfied and βα < 2r − 1, with the condition Σ(Δa_s/(1 + a_{s−})^r)² < ∞ instead of Σ(Δa_s/(1 + a_{s−}))² < ∞, we obtain (2.3). Denote by M(r) the martingale

M_t(r) := ∫₀ᵗ E_s^{−1}(−βα; r) (α/(1 + a_{s−})^r) dm_s.

As before, we define L = L(r) with M(r).

Theorem 5.3. Let conditions R.0, R.3, (2.7), (3.1) be satisfied, let the martingale M(r) satisfy e.1–e.3, and let 1 − r/2 < βα < 2r − 1. Then (almost surely with respect to the measure P)

(5.10)    lim sup_{t→∞} E_t^{−1}(−βα; r) L(r)_t^{−1} |θ_t − x*| ≤ 1.

Proof. Assume again θ_0 = x* = 0. Using the Kolmogorov–Itô formula we obtain that

(5.11)    E_t^{−1}(−βα; r) θ_t = −I_t(r) − M_t(r),

where

I_t(r) := ∫₀ᵗ E_s^{−1}(−βα; r) (α/(1 + a_{s−})^r) U(θ_{s−}) da_s.

If βα > 1 − r/2, then ⟨M(r), M(r)⟩_∞ = ∞ and we have e.0 for M(r). The rest of the proof is the same as for Theorem 5.1 with some obvious changes.

The following theorem gives the exact asymptotics in terms of the process a.

Theorem 5.4. Let conditions R.0, R.3, (2.7), (2.10), (3.1) be satisfied, let the martingale M(r) satisfy e.1–e.3, and let 1 − r/2 < βα < 2r − 1. Then (almost surely with respect to the measure P)

(5.12)    lim sup_{t→∞} (1 + a_t)^{r/2} |θ_t − x*| / (2 log(1 + a_t))^{1/2} ≤ σ (α(1 − r)/(2β))^{1/2}.



6. Examples. In this section we show how the general technique developed and the results obtained above can be applied to the classic procedures of stochastic approximation with discrete time.

6.1. The standard procedure. We consider the special case of procedure (1.2):

(6.1)    θ_n = θ_0 − Σ_{i=1}^n R(θ_{i−1}) (α/i) − Σ_{i=1}^n (α/i) ξ_i.

Here γ_n := α/(n + 1), m is a martingale such that m_n = Σ_{i≤n} ξ_i, E(ξ_i | F_{i−1}) = 0, E(ξ_i² | F_{i−1}) = σ_i², and a is a "counting" measure (see [10]). Let conditions R.0, R.3, (2.8) be satisfied. In view of procedure (6.1) above, the stability condition (2.8) has the following form:

β(x − x*)² + (α/n) R²(x) ≤ 2(x − x*) R(x).

Condition (2.7) follows from the above since Σ_i (i + 1)^{−1} = ∞ and Σ_i 1/i² < ∞. Suppose that (3.1) holds, i.e., σ_i² → σ² > 0 as i → ∞. For simplicity, let 0 < c ≤ σ_i² ≤ 1/c with some constant c. Finally, we assume that for some δ > 0 the Lyapunov condition

(6.2)    sup_i E|ξ_i|^{2+δ} < ∞

holds.

Proposition 6.1. Assume that procedure (6.1) satisfies conditions R.0, R.3, (2.8), and (3.1). Then (almost surely with respect to the measure P)

(6.3)    lim sup_n √n |θ_n − x*| / (2 log log n)^{1/2} ≤ ασ/(2αβ − 1)^{1/2}.

Proof. This statement, which was first obtained under weaker assumptions in [24], follows from the general Theorem 5.2. For the proof it is sufficient to verify that martingale (5.1) for procedure (6.1),

(6.4)    M_n = Σ_{i≤n} α i^{αβ−1} π_i^{−1}(−βα) ξ_i,

satisfies the assumptions of Proposition 5.1. From (6.4) it follows that ⟨M, M⟩_n = O(n^{2αβ−1}) and, hence, e.0 is satisfied. Define a process L with process (6.4) as before. For n large enough, which can depend on ω, we obtain the following estimate:

(1/C) n^{αβ−1/2} ≤ L_n ≤ C n^{αβ−1/2+ε}    for any ε > 0.

Omitting the constant C in the statements below, we obtain that

ΔM_n = α n^{αβ−1} π_n^{−1}(−αβ) ξ_n    and    k n^{αβ−1} |ξ_n| ≤ |ΔM_n| ≤ K n^{αβ−1} |ξ_n|

with constants k, K independent of ω.



Using the Chebyshev inequality, (6.2), and the remark above, which permits us to estimate the jump measure of M in terms of the jump measure of m, we obtain the inequality

E Σ_{i≤n} P{|ξ_i| > i^{1/2} | F_{i−1}} ≤ K Σ_{i≤n} i^{−1−δ/2},

which proves condition e.1. Hence, using (6.2) again, we obtain

(1/n^{2αβ−1}) E Σ_{i≤n} i^{2αβ−2} E[ξ_i² 1{|ξ_i| > i^{1/2}} | F_{i−1}] ≤ C n^{−δ/2}.

In view of the fact that n^{−δ/2} → 0 as n → ∞, we have e.2. Finally,

E Σ_{i≤n} i^{−2} E[ξ_i⁴ 1{|ξ_i| ≤ i^{1/2+ε}} | F_{i−1}] ≤ K(1 + n^{2ε−εδ−δ/2}),

and the right-hand side of the inequality is finite for all n if ε < δ/(4 − 2δ). Hence, e.3 is satisfied. Proposition 6.1 is proved.

6.2. A procedure with slowly varying weights. Let us consider a discrete procedure with slowly varying weights. Here γ_n(r) := α/n^r, i.e.,

(6.5)    θ_n = θ_0 − Σ_{i≤n} R(θ_{i−1}) (α/i^r) − Σ_{i≤n} (α/i^r) ξ_i,

where ξ_k, k ≥ 1, are defined as in the preceding subsection and satisfy (6.2). We have

Σ_i 1/i^r = ∞    and    Σ_i 1/i^{2r} < ∞,

where 1/2 < r < 1. The stability condition is

(6.6)    β(x − x*)² + α n^{1−2r} R²(x) ≤ 2(x − x*) R(x) n^{1−r}.

Proposition 6.2. Let procedure (6.5) satisfy R.0, R.3, (3.1), (6.6), and (6.2). Then (almost surely with respect to the measure P)

(6.7)    lim sup_n (n^{r/2}/(2 log n)^{1/2}) |θ_n − x*| ≤ σ (α(1 − r)/(2β))^{1/2}.

Proof. It is necessary to verify that the martingale M(r) satisfies the assumptions of Proposition 5.1, where

M(r)_n = Σ_{i=1}^n α exp((αβ/(1 − r)) i^{1−r}) i^{−r} ρ_i^{−1}(−αβ; r) ξ_i,

ρ_k(η; r) = e^{−η/(1−r)} Π_{i≤k} (1 + η i^{−r}) exp(−(η/(1 − r))(i^{1−r} − (i − 1)^{1−r})).



Here (see the appendix) 0 < c < ρ_k(−αβ; r) < 1/c for all k, where c > 0 is a deterministic constant. Using this, we obtain the following estimate for the jumps of the martingale M(r):

c exp((αβ/(1 − r)) i^{1−r}) i^{−r} |ξ_i| ≤ |ΔM(r)_i| ≤ (1/c) exp((αβ/(1 − r)) i^{1−r}) i^{−r} |ξ_i|.

Define the process L(r) with respect to M(r) and obtain the estimates

(1/K) exp((αβ/(1 − r)) i^{1−r}) i^{−r/2} ≤ L(r)_i ≤ K exp((αβ/(1 − r)) i^{1−r}) i^{−r/2+ε},

where ε > 0 is some positive number. Using the estimates above, we check conditions e.0–e.3 as in the previous subsection.

6.3. Let us consider the case m = 0 for model (1.2). Suppose that conditions R.0, R.1, and R.2.0 are fulfilled. Then Theorem 2.1 implies that

(6.8)    (θ_t − x*)² ≤ E_t(−k · a)(θ_0 − x*)².

If in (6.8) a_t = t, γ_t = α/t, and k = β, and R.2.0 is satisfied, then |θ_t − x*| ≤ C/n^{1/4}. Certainly, if we know the function R and can use the Newton method to find a root of (1.1), then we obtain the rate of convergence |θ_n − x*| ≤ C/n², at least in a neighborhood of x* (see [20, p. 210]). In what follows we shall use n instead of t.

Suppose that conditions R.3, R.0, (2.6), (3.1) are satisfied and the noise of the procedure is the martingale difference m_n = Σ_{i=1}^n ξ_i, where σ_i² = E^{F_{i−1}} ξ_i² and sup_i E|ξ_i|^{2+δ} < ∞, δ > 2. It is easy to verify that if σ_i < C < ∞ for all i, then the procedure converges. We show how to verify condition L. Here g_i^n := g_i^n(αβ) = (1 + n)^{αβ−1/2}(1 + i)^{1−αβ}. So we can write condition L as

(6.9)    Σ_{i=1}^n (g_i^n)^{−2} E^{F_{i−1}}[ξ_i² 1{|ξ_i| > g_i^n ε}] →^P 0    as n → ∞.

From the Hölder inequality it follows that

E^{F_{i−1}}[ξ_i² 1{|ξ_i| > g_i^n ε}] ≤ K^{2/(2+δ)} ε^{−δ} (g_i^n)^{−δ}.

Use this inequality in (6.9) to obtain the estimate

Σ_{i=1}^n (g_i^n)^{−2} E^{F_{i−1}}[ξ_i² 1{|ξ_i| > g_i^n ε}] ≤ K^{2/(2+δ)} ε^{−δ} Σ_{i=1}^n (g_i^n)^{−(2+δ)}.

Since rem 3.1

n

n −(2+δ) i=1 (gi )

−→ 0 as n → ∞, condition L is fulfilled. Hence, by Theo-

 d (1 + n)1/2 (θn − x ) −→ N 0,

α2 σ 2 . 2αβ − 1
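This central limit theorem is easy to probe by simulation. The Monte Carlo sketch below is illustrative only: it assumes the simplest linear regression function R(x) = β(x − x*) with x* = 0, i.i.d. Gaussian noise, and sample values α = β = σ = 1, none of which come from the paper.

```python
import math
import random

# Monte Carlo sketch (illustrative) of the CLT for the standard procedure
# (6.1) with R(x) = beta*(x - xstar), xstar = 0, Gaussian noise xi_i:
# (1+n)^{1/2} (theta_n - xstar) ~ N(0, alpha^2 sigma^2/(2 alpha beta - 1)).
random.seed(0)
alpha, beta, sigma, xstar = 1.0, 1.0, 1.0, 0.0   # assumed sample values
n, reps = 500, 3000
vals = []
for _ in range(reps):
    theta = 1.0                                  # theta_0
    for i in range(1, n + 1):
        xi = random.gauss(0.0, sigma)
        theta -= (alpha / i) * (beta * (theta - xstar) + xi)
    vals.append(math.sqrt(1 + n) * (theta - xstar))
mean = sum(vals) / reps
var = sum((v - mean) ** 2 for v in vals) / reps
limit_var = alpha**2 * sigma**2 / (2 * alpha * beta - 1)
assert abs(var - limit_var) < 0.2
```

For αβ = 1 the recursion is exactly solvable (nθ_n = −Σ_{i≤n} ξ_i), so the sample variance differs from the limit only by Monte Carlo error.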

Now we define the procedure with weights (1 + n)^{−r}. Assume that the conditions above are satisfied with γ(r) replacing γ in R.2.T and with h_i^n replacing g_i^n in L, where

h_i^n ∼ (1 + n)^{−r/2} exp((αβ/(1 − r))(1 + n)^{1−r}) exp(−(αβ/(1 − r))(1 + i)^{1−r}) (1 + i)^r.



Condition L can be verified analogously. Therefore,

(1 + n)^{r/2} θ_n →^d N(0, ασ²/(2β)).

Appendix. Method of stochastic exponents.

A.1. We give a systematic overview of the computations we need when computing certain stochastic exponents with respect to increasing processes. This work extends [8], [12], [13], and [14]. The main idea is simple: assume that a nondecreasing function a is continuous. Then (1 + a)^{−1} · a = log(1 + a) and (1 + a)^r · a = (1 + r)^{−1}(1 + a)^{r+1} for r ≠ −1. If the function a is only right-continuous, we still want to compute as if a were continuous. Of course, we must require something of the jumps of the function a so that the asymptotic behavior is qualitatively the same as for continuous a.

Let a be an increasing càdlàg function, a_0 = 0, and

(A.1)    v_t(r) = ∫₀ᵗ (1 + a_{s−})^{−r} da_s,

where 0 < r ≤ 1. Let v := v(1). In what follows we shall often use the following change of variables formula:

(A.2)    f(A_t) = f(0) + ∫₀ᵗ f_x(A_{s−}) dA_s + Σ_{s≤t} (Δf(A_s) − f_x(A_{s−}) ΔA_s),

where f is a smooth function with a continuous derivative f_x and A is an increasing càdlàg function with A_0 = 0 (see [3, p. 155] and [10]). For an increasing càdlàg function b we set [b, b]_t = Σ_{s≤t} (Δb_s)². Suppose that the function v satisfies the following two conditions:

v.1 for any t ≥ 0, v_t < ∞ and v_∞ := lim_t v_t = ∞;
v.2 [v, v]_∞ < ∞.

A.2. We study the asymptotic properties of the functions v(r), 0 < r ≤ 1, under conditions v.1 and v.2. First we note that by (A.2)

(A.3)    v_t = log(1 + a_t) + Σ_{s≤t} (Δv_s − log(1 + Δv_s)).

where f is a smooth function with a continuous derivative fx and A is an increasing c`adl` ag function with A0 = 0 (see [3, p. 155] and [10]).  For the increasing c` adl` ag function b we set [b, b]t = st (∆bs )2 . Suppose that the function v satisfies the following two conditions: v.1 For any t  0 vt < ∞ and v∞ := limt vt = ∞; v.2 [v, v]∞ < ∞. A.2. We study the asymptotic properties of the functions v(r), 0 < r  1, under conditions v.1 and v.2. First we note that by (A.2)  (∆vs − log(1 + ∆vs )) . vt = log(1 + at ) + (A.3) st

Recall that Et (b) = ebt (A.4)



st (1

+ ∆bs )e−∆bs and, hence, (A.3) implies

Et (v) = (1 + at ),  Et (−v) = (1 + at )−1 st (1 − (∆vs )2 )

and, in the more general form, for any α  −1 (A.5)

Et (α) := Et (αv) = (1 + at )α

 (1 + α∆vs ) st

(1 + ∆vs )α

Set πt (α) :=

 (1 + α∆vs ) st

(1 + ∆vs )α

.

.
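The first identity in (A.4) can be verified directly for the counting process a_n = n, an illustrative special case: there Δv_i = 1/i, the exponential factor and the e^{−Δv} factors cancel for a pure-jump process, and the product telescopes.

```python
# For a_n = n (pure jumps, Delta a_i = 1, Delta v_i = 1/i) the stochastic
# exponent reduces to the product E_n(v) = prod_{i<=n} (1 + Delta v_i),
# which telescopes: prod (i+1)/i = n + 1 = 1 + a_n, as stated in (A.4).
n = 1000
E = 1.0
for i in range(1, n + 1):
    E *= 1.0 + 1.0 / i          # factor (1 + Delta v_i)
assert abs(E - (1 + n)) < 1e-6
```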



Let Δv_t < 1 for all t ≥ 0.

Lemma A.1. Let the function v satisfy condition v.2. Then for any α ≥ −1

(A.6)    0 < π_∞(α) < ∞.

Proof. Note that

π_t(α) = exp(Σ_{s≤t} (log(1 + αΔv_s) − α log(1 + Δv_s)))

Note that     (∆as )2 ∆a    s  ∆(1 + as )1−r − (1 − r)  r(1 − r) (A.8)  .   (1 + as− )r  (1 + as− )r+1 st

st

Recall that a_t → ∞ and π_t(1) = 1 for all t ≥ 0.

Lemma A.2. Suppose that conditions v.1 and v.2 are fulfilled. Then

(A.9)    lim_t v_t / log(1 + a_t) = 1,

(A.10)    lim_t E_t(α)/(1 + a_t)^α = π_∞(α),

and for 0 < r < 1

(A.11)    lim_t v_t(r)/(1 + a_t)^{1−r} = 1/(1 − r).

Proof. The validity of relations (A.9) and (A.10) follows directly from the formulas above. As for the result (A.11), we have to show that

(1 + a_t)^{r−1} Σ_{s≤t} (Δa_s)²/(1 + a_{s−})^{r+1} → 0

as t → ∞. But this follows from the Kronecker lemma, since

(1 + a_t)^{r−1} Σ_{s≤t} (Δa_s)²/(1 + a_{s−})^{r+1} = (1 + a_t)^{r−1} Σ_{s≤t} ((Δa_s)²/(1 + a_{s−})²) (1 + a_{s−})^{1−r},

and we obtain that [v, v]_∞ = Σ_{s≤t} (Δa_s)²/(1 + a_{s−})² < ∞ by condition v.2. The lemma is proved.
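Both limits (A.9) and (A.11) are easy to check for the counting process a_n = n, for which Δv_i(r) = i^{−r} and conditions v.1, v.2 hold. A numerical sketch (the case a_n = n and the tolerances are illustrative assumptions):

```python
import math

# Check of (A.9) and (A.11) for a_n = n: here v_n = sum_{i<=n} 1/i and
# v_n(r) = sum_{i<=n} i^{-r}, so v_n / log(1+n) -> 1 and
# v_n(r) (1-r) / (1+n)^{1-r} -> 1.
n = 200_000
v1 = sum(1.0 / i for i in range(1, n + 1))
assert abs(v1 / math.log(1 + n) - 1.0) < 0.1
r = 0.7
vr = sum(i ** (-r) for i in range(1, n + 1))
assert abs(vr * (1 - r) / (1 + n) ** (1 - r) - 1.0) < 0.05
```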



Remark A.1. Instead of condition v.2, Le Breton [8] considered the following condition:

(A.12)    lim_{t→∞} Δv_t(r) = 0.

Using v.1 and (A.12), he proved (A.9) and (A.11) and, instead of (A.10), the following:

lim_{t→∞} log E_t(αv(r))/F_r(v) = α,

where F_r(v) = (1 + v)^{1−r}/(1 − r) for 0 < r < 1 and F_1(v) = log(1 + v) (see [8]). Note that condition (A.12) does not guarantee that π_t(α) converges to a positive limit.

A.3. Now introduce the parameter α > 0 and set

I_t(α; v) := (E_t(−α))^{2(1−1/(2α))} ∫₀ᵗ γ_s² (E_s(−α))^{−2} da_s.

Here γ := γ(α) = α/(1 + a_−) and 0 < α < 1 < 2α. The notation f_t ∼ g_t stands for lim_t f_t/g_t = 1 as t → ∞, where f_t > 0, g_t > 0 are some functional expressions.

Lemma A.3. Let conditions v.1 and v.2 be satisfied. Then for α < 1 < 2α

(A.13)    lim_t I_t(α; v) = (α²/(2α − 1)) (π_∞(−α; v))^{−1/α}.

Proof. For any δ ≥ −1 the exponent E_t(δv) is of the form E_t(δv) = (1 + a_t)^δ π_t(δ). Hence, by Lemma A.1, for t large enough

(A.14)    |E_t(δv) − (1 + a_t)^δ π_∞(δ)| ≤ ε (1 + a_t)^δ.

So when we consider the asymptotic behavior of the integral I_t(α; v) we can use the approximation E_t(δv) ∼ (1 + a_t)^δ π_∞(δ). Hence, for t large enough

(A.15)    I_t(α; v) ∼ (1 + a_t)^{1−2α} (π_∞(−α))^{−1/α} α² ∫₀ᵗ (1 + a_{s−})^{2α−2} da_s.

Note that in (A.15) we have also used the relation

(E_s(−α))^{−2} = (1 + a_{s−})^{2α} (1 + Δv_s)^{2α} (π_s(−α))^{−2},

and for t large enough we obtain (E_s(−α))^{−2} ∼ (1 + a_{s−})^{2α} (π_∞(−α))^{−2}, since Δv_s → 0 as s → ∞. By the assumption 2α > 1 and the fact that (1 + a_t)^{1−2α} → 0, the integral I_t(α; v) has a finite limit if the expression (1 + a_t)^{1−2α} ∫₀ᵗ (1 + a_{s−})^{2α−2} da_s has a finite limit. But this follows from (A.11). Hence, the relation

lim_t I_t(α; v) = (α²/(2α − 1)) (π_∞(−α; v))^{−1/α}

holds and so we have (A.13).

A.4. In this final section we consider the expression E(αv(r)), where v(r) is defined by (A.1). We assume that 0 < r < 1 and α > −1. It follows from (A.7) that

(A.16)    E_t(αv(r)) = exp((α/(1 − r))((1 + a_t)^{1−r} − 1)) ρ_t(α; v(r)),



where

ρ_t(α; v(r)) := Π_{s≤t} (1 + α Δa_s/(1 + a_{s−})^r) exp(−(α/(1 − r)) Δ(1 + a_s)^{1−r}).

Note that the equality log y = lim_{r→0} (y^r − 1)/r, for y > 0, easily implies

lim_{r→1} E_t(αv(r)) = E_t(αv).

Condition v.r: for 0 < r < 1 let the function a satisfy the condition

∫₀^∞ (1 + a_{s−})^{−2r} d[a, a]_s < ∞.

Lemma A.4. Assume that conditions v.1 and v.r are satisfied. Then

(A.17)    lim_t E_t(αv(r)) [exp((α/(1 − r))((1 + a_t)^{1−r} − 1))]^{−1} = ρ_∞(α; v(r)).

Proof. We have to show that 0 < ρ_∞(α; v(r)) < ∞. For t sufficiently large,

|log(1 + α Δa_t/(1 + a_{t−})^r) − (α/(1 − r)) Δ(1 + a_t)^{1−r}| ≤ C(α, r)((1 + a_{t−})^{−2r} + (1 + a_{t−})^{−(1+r)})(Δa_t)².

This together with assumption v.r proves that 0 < ρ_∞(α; v(r)) < ∞. The lemma is proved.

Before we continue, assume that the function a is continuous, a_0 = 0, a_∞ = ∞. Then it is not difficult to verify that

(1 + a_t)^r exp(−(2α/(1 − r))(1 + a_t)^{1−r}) ∫₀ᵗ (1 + a_s)^{−2r} exp((2α/(1 − r))(1 + a_s)^{1−r}) da_s → 1/(2α)

as t → ∞, α > 0, and 0 < r < 1. Set E_t(α; r) := E_t(αv(r)) and

J_t(α; v(r)) := (1 + a_t)^r E_t²(−α; r) ∫₀ᵗ (1 + a_{s−})^{−2r} E_s^{−2}(−α; r) da_s.

Lemma A.5. Let conditions v.1 and v.r be satisfied. Then for α > 0 and 0 < r < 1, lim_t J_t(α; v(r)) = 1/(2α). … Let α > 1 − r/2. Then

(A.23)    H_t(α; v(r)) → 0.

The proof follows from Lemma A.1 and the fact that E_t(−α; r)(E_s(−α; r))^{−1} ≤ 1.

REFERENCES

[1] P. E. Caines, Linear Stochastic Systems, Wiley, New York, 1988.
[2] H. F. Chen, Asymptotically efficient stochastic approximation, Stochastics Stochastics Rep., 45 (1993), pp. 1–16.
[3] C. Dellacherie and P.-A. Meyer, Probabilities and Potential B, North-Holland, Amsterdam, 1982.
[4] I. Gyöngy and N. V. Krylov, On stochastic equations with respect to semimartingales, Stochastics, 4 (1980), pp. 1–21.
[5] P. Hall and C. Heyde, Martingale Limit Theory and Its Application, Harcourt Brace Jovanovich, New York, 1980.
[6] J. Jacod, Calcul stochastique et problèmes de martingales, Springer-Verlag, Berlin, Heidelberg, 1979.
[7] J. Jacod and A. N. Shiryaev, Limit Theorems for Stochastic Processes, Springer-Verlag, Berlin, New York, 1987.
[8] A. Le Breton, About the averaging approach in Gaussian schemes for stochastic approximation, Math. Methods Statist., 2 (1993), pp. 295–315.
[9] A. Le Breton and A. A. Novikov, Averaging for estimating covariances in stochastic approximation, Math. Methods Statist., 3 (1994), pp. 244–266.
[10] R. Sh. Liptser and A. N. Shiryaev, Theory of Martingales, Kluwer, Dordrecht, 1989.



[11] N. V. Krylov and B. L. Rozovskii, Stochastic evolution equations, J. Sov. Math., 16 (1981), pp. 1233–1277.
[12] A. V. Melnikov, Stochastic approximation procedures for semimartingales, in Statistics and Control of Random Processes, Nauka, Moscow, 1989, pp. 147–156 (in Russian).
[13] A. V. Melnikov, Stochastic differential equations: Nonsmoothness of coefficients, regression models and stochastic approximation, Uspekhi Mat. Nauk, 51 (1996), pp. 43–136 (in Russian).
[14] A. V. Melnikov, A. E. Rodkina, and E. Valkeila, On a general class of stochastic approximation algorithms, in Proceedings of the Third Finnish–Soviet Symposium on Probability Theory and Mathematical Statistics, H. Niemi et al., eds., Front. Pure Appl. Probab. 1, VSP, Utrecht, 1992, pp. 183–196.
[15] M. B. Nevel'son and R. Z. Hasminskii, Stochastic Approximation and Recursive Estimation, AMS, Providence, RI, 1973.
[16] B. T. Polyak, New method of stochastic approximation type, Autom. Remote Control, 51 (1990), pp. 937–946.
[17] B. T. Polyak and A. B. Juditsky, Acceleration of stochastic approximation by averaging, SIAM J. Control Optim., 30 (1992), pp. 838–855.
[18] H. Robbins and D. Siegmund, A convergence theorem for nonnegative almost supermartingales and some applications, in Optimizing Methods in Statistics, J. S. Rustagi, ed., Academic Press, New York, 1971, pp. 233–257.
[19] D. Ruppert, Efficient Estimators from a Slowly Convergent Robbins–Monro Process, Technical Report 781, Cornell University, Ithaca, NY, 1988.
[20] H. R. Schwarz, Numerical Analysis. A Comprehensive Introduction, Wiley, New York, 1989.
[21] P. Spreij, Recursive approximate maximum likelihood estimation for a class of counting process models, J. Multivariate Anal., 39 (1991), pp. 236–245.
[22] G. Yin and I. Gupta, On a continuous time stochastic approximation problem, Acta Appl. Math., 33 (1993), pp. 3–20.
[23] N. Lazrieva, T. Sharia, and T. Toronjadze, The Robbins–Monro type stochastic differential equations, Stochastics Stochastics Rep., 61 (1997), pp. 67–87.
[24] V. F. Gaposhkin and T. P. Krasulina, On the law of iterated logarithm in stochastic approximation processes, Theory Probab. Appl., 19 (1974), pp. 844–850.
[25] A. I. Ekushov, A strong invariance principle and some of its applications, Russ. Math. Surv., 39 (1984), pp. 157–158.
[26] A. I. Ekushov, A Strong Invariance Principle for Semimartingales, Master's Thesis, MGU, Moscow, 1985 (in Russian).