Statistics 34 (2000), pp. 27–51. Reprints available directly from the publisher. Photocopying permitted by license only.

© 2000 OPA (Overseas Publishers Association) N.V. Published by license under the Gordon and Breach Science Publishers imprint. Printed in Malaysia.

ON RECURSIVE ESTIMATION IN INCOMPLETE DATA MODELS

JIAN-FENG YAO

SAMOS, Université Paris I

(Received 24 September 1997; In final form 30 September 1999)

We consider a new recursive algorithm for parameter estimation from an independent incomplete data sequence. The algorithm can be viewed as a recursive version of the well-known EM algorithm, augmented with a Monte-Carlo step which restores the missing data. Based on recent results on stochastic algorithms, we give conditions for the a.s. convergence of the algorithm. Moreover, the asymptotic variance of the estimator is reduced by a simple averaging. An application to finite mixtures is given, together with a simulation experiment.

Keywords: Incomplete data; EM algorithm; recursive estimation; mixtures; stochastic algorithm

AMS Classification Codes: 62 F 12, 62 L 20

1. INTRODUCTION

In many statistical models the data are observed only after some deterministic distortion. Examples include censored data, mixture models and the deconvolution problem. The most popular parameter estimator for these models is probably the EM estimator introduced in the seminal paper (Dempster et al., 1977). An up-to-date report on its various extensions can be found in (Meng and van Dyk, 1997), with an extensive discussion. Nevertheless, on-line parameter estimation for these models has not been fully addressed. Standard recursive estimation based on the observed likelihood (Fabian, 1978) does not directly apply, mainly because this likelihood is ill-defined for most incomplete data sequences.


Recently, Ryden (Ryden, 1998) developed a projection-based recursive likelihood estimator for the mixture problem. However, it is not clear how to extend this approach to other incompletely observed models. We propose in this paper a new recursive algorithm, called RSEM, which can be viewed as a recursive version of the EM algorithm, augmented with a Monte-Carlo data imputation step which restores the missing data.

To state the problem, let us call, following (Dempster et al., 1977), complete data some unobservable random variable $X$ with domain $\mathcal{X}$. We actually observe a transformation of it, say $Y = \phi(X)$, where $\phi$ is a known non-random map from $\mathcal{X}$ into some observation space $\mathcal{Y}$. Typically $\phi$ is a ``many-to-one'' map; we thus call the observations $Y$ incomplete data, since much information is lost by $\phi$. Assume that $X$ (resp. $Y$) has a density $\pi(x;\theta)$ (resp. $g(y;\theta)$) with respect to some $\sigma$-finite measure $dx$ (resp. $dy$). These densities are linked by
$$ g(y;\theta) = \int_{\mathcal{X}(y)} \pi(x;\theta)\,dx, $$
where $\mathcal{X}(y)$ denotes the set $\{x : \phi(x) = y\}$. It is worth noting that the conditional distribution of $X$ given $Y = y$, denoted $\mathcal{L}(X \mid Y = y;\theta)$, has the density
$$ k(x \mid y;\theta) = \mathbf{1}_{\mathcal{X}(y)}(x)\,\pi(x;\theta)/g(y;\theta). $$

Let an i.i.d. sequence $Y_1,\dots,Y_n,\dots$ of $Y$ be successively observed. For on-line parameter estimation, we consider the following recursive algorithm, named RSEM (for recursive SEM-like algorithm). This algorithm was first introduced in (Duflo, 1996). The sequence $(\gamma_n)$ below (the gains sequence) is any sequence of positive numbers decreasing to 0. The parameter space is some open subset $\Theta$ of $\mathbb{R}^p$.

ALGORITHM RSEM:

(i) Pick an arbitrary initial value $\theta_0 \in \Theta$.
(ii) At each time $n \ge 1$, perform the following steps with the new observation $Y_{n+1}$:


R-step (Restoration)  Restore the corresponding unobserved complete data $X$ by drawing an independent sample $X_{n+1}$ from the conditional distribution $\mathcal{L}(X \mid Y = Y_{n+1};\theta_n)$.

E-step (Estimation)  Update the estimate by
$$ \theta_{n+1} = \theta_n + \gamma_n\,\nabla_\theta \log\pi(X_{n+1};\theta_n). \qquad (1) $$
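To make the iteration concrete, here is a minimal Python sketch of the untruncated recursion (1). The functions `sample_conditional` and `complete_score`, which draw from $k(\cdot\mid y;\theta)$ and evaluate $\nabla_\theta\log\pi(x;\theta)$, are hypothetical placeholders to be supplied for each specific incomplete data model, and the default gain rule is only one admissible choice.

```python
import numpy as np

def rsem(observations, theta0, sample_conditional, complete_score, gamma=None):
    """Untruncated RSEM iteration (1).

    sample_conditional(y, theta) -- draws X from L(X | Y = y; theta)    (model-specific)
    complete_score(x, theta)     -- returns grad_theta log pi(x; theta) (model-specific)
    gamma(n)                     -- positive gains decreasing to 0
    """
    if gamma is None:
        gamma = lambda n: 10.0 / (n + 10.0) ** 0.75     # one admissible choice of gains
    theta = np.asarray(theta0, dtype=float)
    estimates = []
    for n, y in enumerate(observations, start=1):
        x = sample_conditional(y, theta)                              # R-step: restore missing data
        theta = theta + gamma(n) * np.asarray(complete_score(x, theta))  # E-step, rule (1)
        estimates.append(theta)
    return estimates
```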

In a non-recursive framework, the above restoration R-step is the key novelty of the class of so-called SEM, MCEM or SRE algorithms introduced in (Celeux and Diebolt, 1985), (Wei and Tanner, 1990) and (Qian and Titterington, 1991), respectively. For a recent account of SEM algorithms, we refer to (Celeux et al., 1996) or (Lavielle and Moulines, 1995). This R-step plays a role similar to the E-step of the EM algorithm: it enables the use of the likelihood $\pi$ of the complete data for parameter updating.

This paper is devoted to an accurate study of the RSEM algorithm, including convergence and asymptotic efficiency. We shall first show in §2 that the RSEM algorithm is a stochastic gradient algorithm which minimizes the Kullback-Leibler divergence from the true model. Unfortunately, for most incomplete data models of interest, it is not clear from the current state of the theory of stochastic algorithms (to our knowledge) whether or not the RSEM algorithm converges without transformation (see §2). Therefore, we shall (§3) truncate this algorithm at randomly varying bounds, in the spirit of (Chen et al., 1988; Chen, 1993). This technique is surprisingly effective and yields the a.s. convergence of the truncated algorithm. Finally, in §4, we implement the RSEM algorithm for finite mixtures and report a simulation experiment.

2. DOES THE RSEM ALGORITHM CONVERGE?

From now on, the true parameter value is denoted $\theta_*$. Let us define the Kullback-Leibler divergence
$$ K(\theta) := \int_{\mathcal{Y}} g(y;\theta_*)\,\log\frac{g(y;\theta_*)}{g(y;\theta)}\;dy, \qquad (2) $$
and state some assumptions on the smoothness of the likelihood function.


ASSUMPTIONS (S)  $\Theta$ is an open convex subset of $\mathbb{R}^p$. The likelihood $\theta \mapsto \log\pi(x;\theta)$ is twice differentiable on $\Theta$ and such that:

(a) $K$ is twice continuously differentiable on $\Theta$ and the identity (2) can be twice differentiated under the integral sign.

(b) For all $\theta$ and all $y$, we have $\nabla_\theta g(y;\theta) = \int_{\mathcal{X}(y)} \nabla_\theta \pi(x;\theta)\,dx$.

These conditions are merely what is necessary for a statistical model to be regular. In particular, Assumption (S)-(b) implies the following identity, which says that the incomplete score function is equal to the average of the unobserved complete score function:
$$ \nabla_\theta \log g(y;\theta) = \int_{\mathcal{X}(y)} \nabla_\theta \log\pi(x;\theta)\,k(x\mid y;\theta)\,dx. \qquad (3) $$

We shall also use the filtration $\mathbb{F} = (\mathcal{F}_n)$, with $\mathcal{F}_0 = \{\emptyset,\Omega\}$ and $\mathcal{F}_n = \sigma(Y_1,\dots,Y_n, X_1,\dots,X_n)$ for $n \ge 1$, which is the natural filtration associated to the RSEM algorithm.

LEMMA 1  Under Assumptions (S)-(a)-(b), the RSEM algorithm is a stochastic gradient algorithm of the form
$$ \theta_{n+1} = \theta_n - \gamma_n\,[\nabla K(\theta_n) + \varepsilon_{n+1}], $$
where
$$ \varepsilon_{n+1} = -\big\{\nabla_\theta\log\pi(X_{n+1};\theta_n) - E[\nabla_\theta\log\pi(X_{n+1};\theta_n)\mid\mathcal{F}_n]\big\} $$
is an $\mathbb{F}$-adapted noise sequence, that is, $E(\varepsilon_{n+1}\mid\mathcal{F}_n) = 0$ for all $n$.

Proof

We have by (S)-(a)-(b)
$$ \begin{aligned}
h(\theta) &:= E[\nabla_\theta\log\pi(X_{n+1};\theta_n=\theta)\mid\mathcal{F}_n] = E[\nabla_\theta\log\pi(X_1;\theta_0=\theta)] \\
&= \int_{\mathcal{Y}}\int_{\mathcal{X}(y)} \nabla_\theta\log\pi(x;\theta)\,k(x\mid y;\theta)\,g(y;\theta_*)\,dx\,dy \\
&= \int_{\mathcal{Y}} [\nabla_\theta\log g(y;\theta)]\,g(y;\theta_*)\,dy = -\nabla K(\theta). \qquad\Box
\end{aligned} $$

Therefore, the RSEM algorithm is a stochastic gradient algorithm obtained by perturbation of the gradient system
$$ \dot\theta = -\nabla K(\theta). \qquad (4) $$


The convergence analysis of such an algorithm is usually set up in two steps (see e.g. (Duflo, 1996)). First one attempts to stabilize the algorithm, that is, to check whether a.s. the sequence $(\theta_n)$ remains in a compact subset of $\Theta$. Once the algorithm is stabilized, there are many methods to ensure the a.s. convergence of $(\theta_n)$ to a point (an equilibrium) of $\{\nabla K = 0\}$. Recent results based on the so-called ODE method (Kushner and Clark, 1978) are reported in (Fort and Pages, 1996).

The gradient system (4) has a natural Lyapounov function $V = K$: for any solution $(\theta_t)_{t\ge 0}$ of the system, the function $t \mapsto K(\theta_t)$ is decreasing. If in addition $K$ is inf-compact, i.e.,
$$ \text{for all } a\in\mathbb{R},\ \{K\le a\}\ \text{is a compact subset of}\ \Theta \quad\Big(\text{i.e., } \lim_{\theta\to\partial\Theta} K(\theta) = +\infty\Big), \qquad (5) $$
the whole trajectory $(\theta_t)_{t\ge 0}$ stays in the compact set $\{K \le K(\theta_0)\}$. Therefore this inf-compactness of $K$ is a basic tool to stabilize such an algorithm. Another widely used requirement for the stabilization of this system is the following Lipschitz condition:
$$ \nabla K\ \text{is Lipschitz on}\ \Theta. \qquad (6) $$

Unfortunately, the Kullback-Leibler divergence $K$ of incomplete data models often fails to meet Conditions (5) and (6). In the example below with censored data, (6) is not fulfilled, while for a finite mixture (cf. §4), (5) is no longer satisfied. Consequently, we do not know whether or not the RSEM algorithm can be stabilized without transformation (actually our simulation experiment in §4 in the mixture case seems to show that it cannot). Therefore we shall develop in the next section a truncated version of the RSEM algorithm which will converge.

Example with censored data  Let $X$ be a real-valued variable and $Y = \min(X, c)$ with a known constant $c \in \mathbb{R}$. Let also $a = P(X < c;\theta)$, and take as reference measures $dx = \mathbf{1}_{x\ne c}\,dm(x) + \delta_c(x)$ and $dy = \mathbf{1}_{y<c}\,dm(y) + \delta_c(y)$, with $m$ the Lebesgue measure and $\delta_c$ the Dirac mass at $c$. Then
$$ g(y;\theta) = \pi(y;\theta)\,\mathbf{1}_{y<c} + (1-a)\,\mathbf{1}_{y=c}. $$

For $\alpha \ge 1$, define
$$ D_\alpha := \Big\{\theta\in\Theta:\ d(\theta,\partial\Theta)\ge \frac{1}{\alpha}\ \text{and}\ \|\theta\|\le\alpha\Big\}. \qquad (14) $$

When $\alpha\to\infty$, $D_\alpha\to\Theta$. Assume that on $D_\alpha$, $\lambda_{\max}[\Gamma(\theta)]$ can be bounded by an increasing map $\varphi(\alpha)$:
$$ \sup_{\theta\in D_\alpha} \lambda_{\max}[\Gamma(\theta)] \le \varphi(\alpha). \qquad (15) $$

Necessarily, $\lim_{\alpha\to\infty}\varphi(\alpha) = \infty$. Let $b>1$ and $c>0$ be constants. We may take, by inversion of $\varphi$, a strictly increasing sequence $(\alpha_s)$ such that
$$ \varphi(\alpha_s) \le c\,\frac{s^{2\gamma-1}}{(\log s)^b}, \qquad \text{for } s\ge 2. \qquad (16) $$
For example, if $\varphi$ is in addition continuous, we can take $\alpha_s = \varphi^{-1}\big(c\,s^{2\gamma-1}(\log s)^{-b}\big)$. Finally, let us take for large enough $s$, say $s\ge s_0$,
$$ C_s := D_{\alpha_s}. \qquad (17) $$

Thus for $n\ge s_0$, since $\theta_n\in C_{\sigma(n)}\subset C_n = D_{\alpha_n}$, we have
$$ \lambda_{\max}[\Gamma(\theta_n)] \le \varphi(\alpha_n) \le c\,\frac{n^{2\gamma-1}}{(\log n)^b}. $$
Hence (13), as well as (T.c), holds. Summarizing, we have

PROPOSITION 2  In the case where $\lambda_{\max}[\Gamma(\theta)]$ is unbounded on $\Theta$, assume there exists an upper bound $\varphi(\alpha)$ satisfying (15) for large $\alpha$. With the compact sets $C_s$ defined in (16)–(17), the series $\sum_n \gamma_n\varepsilon_{n+1}$ converges a.s. (condition (T.c)). □

Such a construction will be made explicit in §4 for finite mixtures.
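As an illustration of this construction, the following Python sketch computes $\alpha_s$ by numerically inverting an increasing bound $\varphi$ according to rule (16). Both the bisection inverse and the particular bound $\varphi(\alpha) = 3\alpha^8$ used in the example (anticipating Lemma 2 of §4) are our own illustrative choices, not part of the paper's prescription.

```python
import math

def alpha_s(s, phi, c, gamma=0.75, b=2.0, hi=1e12):
    """Solve phi(alpha_s) = c * s**(2*gamma - 1) / (log s)**b  (rule (16)) by bisection,
    assuming phi is continuous, strictly increasing, and phi(hi) exceeds the target."""
    target = c * s ** (2 * gamma - 1) / math.log(s) ** b
    lo = 0.0
    for _ in range(200):                    # plain bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if phi(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Example with the bound phi(alpha) = 3 * alpha**8 obtained later for Gaussian mixtures,
# and c chosen so that alpha_100 = 1e3:
c = 3.0 * 1e24 * math.log(100) ** 2 / math.sqrt(100)
print(alpha_s(100, lambda a: 3.0 * a ** 8, c))      # ~ 1e3
```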

4. APPLICATION TO FINITE MIXTURES

This section is devoted to showing the effectiveness of the truncated RSEM algorithm for mixture models. Since the moment estimators introduced in (Pearson, 1894), finite mixture models have been and continue to be widely analysed by statisticians. We refer to (Redner and Walker, 1984) and (Titterington et al., 1985) for the classical background. Recent advances, including Bayesian approaches, stochastic estimators and dimension testing, can be found in (Celeux and Diebolt, 1985; Dacunha-Castelle and Gassiat, 1997; Robert, 1996; Richardson and Green, 1997; Celeux et al., 1996).

For recursive estimation, (Ryden, 1998) proposed a recursive likelihood estimator using projection on some suitable compact subset of $\Theta$. Despite some similarity, several differences arise between Ryden's algorithm and the RSEM algorithm. First, the RSEM algorithm is based on the complete data likelihood, up to a restoration step; second, instead of projections on a compact subset of $\Theta$ as proposed in Ryden's procedure, the RSEM algorithm achieves stabilisation by truncations at randomly varying bounds.

Let $m$ be a positive integer, $\Gamma$ an open convex subset of $\mathbb{R}^q$ and $D = \{f(\cdot;\beta);\ \beta\in\Gamma\}$ a parametric family of univariate densities. A finite mixture with $m$ components is a variable $Y$ with density
$$ g(y;\theta) = \sum_{k=1}^m \alpha_k\, f(y;\beta_k), \qquad (18) $$
where $a := (\alpha_k)$ is a probability distribution (the mixing distribution) on the set $\{1,\dots,m\}$, and $\mathbf{f} := (\beta_1,\dots,\beta_m)$ collects the component parameters of the mixture. Therefore the whole parameter vector is $\theta = (a, \mathbf{f})$, which belongs to $\Theta = \Delta\times\Gamma^m$, where $\Delta$ denotes the open simplex $\{\alpha_1>0,\dots,\alpha_{m-1}>0,\ \alpha_1+\dots+\alpha_{m-1}<1\}$.

A classical way to view such a mixture model as an incomplete data model is to introduce an unobservable location variable $Z$ on the integers $\{1,\dots,m\}$ and to draw the observation $Y$ conditionally on this location (see (Dempster et al., 1977)). That is, consider the vector $X = (Y, Z)$ where
$$ Z\in\{1,\dots,m\},\quad\text{with } P(Z=k) = \alpha_k, \quad\text{and}\quad \mathcal{L}(Y\mid Z=k) = f(y;\beta_k)\,dy. $$
It is easy to see that the marginal distribution of $Y$ is exactly the mixture (18). The missing data here is the location variable $Z$.
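For concreteness, here is a short sketch of this complete-data representation, simulating pairs $(Y_i, Z_i)$ from a Gaussian mixture. The Gaussian components and the NumPy calls are our illustrative choices; in the incomplete data setting only the $Y_i$ would actually be observed.

```python
import numpy as np

def simulate_complete_data(n, weights, means, sds, rng=None):
    """Draw n complete observations X_i = (Y_i, Z_i): Z_i takes value k with
    probability weights[k], and Y_i | Z_i = k ~ N(means[k], sds[k]**2)."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.choice(len(weights), size=n, p=weights)            # hidden locations Z
    y = rng.normal(loc=np.asarray(means)[z], scale=np.asarray(sds)[z])
    return y, z

# Mixture M1 of Section 4.3: 0.3 N(3, 1) + 0.7 N(-3, 1); only y would be observed.
y, z = simulate_complete_data(1000, [0.3, 0.7], [3.0, -3.0], [1.0, 1.0])
```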


4.1. Kullback-Leibler Divergence Between Finite Mixtures

Let us show that, in general, the inf-compactness condition (5) does not hold for mixtures. First consider the case where one of the mixing probabilities $\alpha_k$, say $\alpha_1$, tends to 0. The density $g(\cdot;\theta)$ degenerates to the density of a smaller mixture with $m-1$ components. Therefore we may have
$$ \lim_{\alpha_1\to 0} K(\theta) < \infty, $$
when, for example, the probability distributions in $D$ are all equivalent. This finite limiting behaviour may still occur if the component parameters $\mathbf{f}$ go to the boundary. Consider indeed the family of exponential distributions $f(y;\beta_k) = \beta_k e^{-\beta_k y}$, $\beta_k\in(0,\infty)$, and take $m=2$. The parameter vector is $\theta = (\alpha,\beta_1,\beta_2)$ with $\Theta = (0,1)\times(0,\infty)\times(0,\infty)$. It is easy to see that, with $(\alpha,\beta_1)$ fixed,
$$ \lim_{\beta_2\to\infty} K(\theta) < \infty. $$
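This finite limit can be checked numerically; the sketch below evaluates $K(\theta)$ by quadrature for the two-component exponential mixture, with a fixed and arbitrarily chosen true parameter $\theta_*$ (an assumption made only for the illustration) and an increasing $\beta_2$.

```python
import numpy as np
from scipy.integrate import quad

def mix_exp_pdf(y, a, b1, b2):
    """Density of the two-component exponential mixture a*b1*exp(-b1*y) + (1-a)*b2*exp(-b2*y)."""
    return a * b1 * np.exp(-b1 * y) + (1 - a) * b2 * np.exp(-b2 * y)

def kullback(theta_star, theta):
    """K(theta) = int_0^inf g(y; theta_star) * log(g(y; theta_star)/g(y; theta)) dy, cf. (2)."""
    integrand = lambda y: mix_exp_pdf(y, *theta_star) * np.log(
        mix_exp_pdf(y, *theta_star) / mix_exp_pdf(y, *theta))
    # split the range to help the quadrature near the origin
    return quad(integrand, 0.0, 1.0, limit=200)[0] + quad(integrand, 1.0, np.inf, limit=200)[0]

theta_star = (0.5, 1.0, 3.0)          # arbitrary "true" parameter, chosen for the illustration
for b2 in (10.0, 100.0, 1000.0):      # beta_2 -> infinity with (alpha, beta_1) fixed
    print(b2, kullback(theta_star, (0.5, 1.0, b2)))   # K(theta) stays bounded
```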

4.2. Set-up of the RSEM Algorithm for Mixtures

Let $\theta_n = (\alpha_{1,n},\dots,\alpha_{m-1,n},\beta_{1,n},\dots,\beta_{m,n})$ be the current estimate at step $n$. The restoration R-step draws a sample $Z_{n+1}$ from the conditional distribution $\mathcal{L}(Z\mid Y=Y_{n+1};\theta_n)$:
$$ P(Z=k\mid Y=Y_{n+1};\theta_n) = \frac{\alpha_{k,n}\,f(Y_{n+1};\beta_{k,n})}{\sum_{\ell=1}^m \alpha_{\ell,n}\,f(Y_{n+1};\beta_{\ell,n})}, \qquad 1\le k\le m. \qquad (19) $$
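A minimal sketch of this restoration step, written here for Gaussian components (the choice of component density $f$ is ours for illustration; any parametric family would do):

```python
import numpy as np
from scipy.stats import norm

def r_step(y, weights, means, sds, rng):
    """R-step (19): draw Z with P(Z = k) proportional to alpha_k * f(y; beta_k),
    here with normal component densities f (an illustrative choice)."""
    post = np.asarray(weights) * norm.pdf(y, loc=means, scale=sds)
    post = post / post.sum()
    return rng.choice(len(post), p=post)
```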

The p.d.f. of the complete data $X=(Y,Z)$ is $\pi(x;\theta) = \pi(y,z;\theta) = \alpha_z f(y;\beta_z)$. However, the mixing probabilities $a := (\alpha_k)$ have to be kept within the simplex $\Delta$, and this constraint is particularly hard to satisfy in a recursive computation procedure. For instance, the updating rule (1) applied to $\alpha_1$ gives at step $n$, $\alpha_{1,n+1} = \alpha_{1,n} + \gamma_n\,(\partial/\partial\alpha_1)\log\pi(X_{n+1};\theta_n)$, and we see that even if $\alpha_{1,n}\in(0,1)$, $\alpha_{1,n+1}$ may escape from $(0,1)$ with positive probability. Therefore, to overcome such numerical instability, we propose a new parametrisation based on the logit transformation. That is, instead of $a = (\alpha_k)$ we shall use $v := (\omega_k)$ defined as
$$ \omega_k = \log\frac{\alpha_k}{\alpha_m}, \quad\text{for } 1\le k<m, \qquad\text{and}\quad \omega_m = 0. \qquad (20) $$


Note that $v := (\omega_k)$ belongs to $\mathbb{R}^{m-1}$ and that this transformation has an easy inversion formula. Hence the new parameters are $\theta = (\omega_1,\dots,\omega_{m-1},\beta_1,\dots,\beta_m)$. The estimation E-step writes as
$$ \begin{cases} \omega_{k,n+1} = \omega_{k,n} + \gamma_n\,[\mathbf{1}_{Z_{n+1}=k} - \alpha_{k,n}], & 1\le k<m,\\[2pt] \beta_{k,n+1} = \beta_{k,n} + \gamma_n\,\mathbf{1}_{Z_{n+1}=k}\,\nabla_\beta\log f(Y_{n+1};\beta_{k,n}), & 1\le k\le m. \end{cases} \qquad (21) $$
Recall that the final estimates are the averages $(\bar v_n)$ and $(\bar{\mathbf{f}}_n)$, as defined in (12).

4.3. A Simulation Experiment

We shall consider two univariate Gaussian mixtures with two components each, taken from (Redner and Walker, 1984). Thus $\theta = (\omega_1, m_1, \sigma_1^2, m_2, \sigma_2^2)$ and $g(\cdot;\theta) = \alpha_1\,\mathcal{N}(m_1,\sigma_1^2) + (1-\alpha_1)\,\mathcal{N}(m_2,\sigma_2^2)$. The two ``true'' mixtures are
$$ M_1 = 0.3\,\mathcal{N}(3,1) + 0.7\,\mathcal{N}(-3,1) \qquad\text{and}\qquad M_2 = 0.3\,\mathcal{N}(1,1) + 0.7\,\mathcal{N}(-1,1). $$
The first is a well-separated bimodal distribution, while the second is unimodal. Note that the smoothness conditions (S) are met in this Gaussian mixture case. We now describe in detail the set-up of the algorithm and the constants used throughout all the simulations.

Initialisation

The algorithm starts with
$$ \theta_0 = \Big(0,\ \tfrac{3}{2}\,m_{1,*},\ \tfrac{1}{2}\,\sigma^2_{1,*},\ \tfrac{3}{2}\,m_{2,*},\ \tfrac{1}{2}\,\sigma^2_{2,*}\Big). $$
Again, the same initial value is used in (Redner and Walker, 1984) for their experiments on the EM algorithm. Note that $\omega_1 = 0$ iff $\alpha_1 = 1/2$.

Gains sequence  The gains sequence is a piecewise-constant version of the general rule (8) with exponent $\gamma = 3/4$: for each $n\in[100(k-1),\,100k)$ with integer $k\ge 1$, we set
$$ \gamma_n = \frac{10}{(100k+10)^{3/4}}. $$
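Putting the pieces together, here is a sketch of one full RSEM iteration for the two-component Gaussian mixture of this experiment: the piecewise-constant gain above, the R-step (19), and the E-step (21) with the complete-data scores written out as in (26) below. The truncation at the compacts $C_s$ and the final averaging (12) are omitted, so this is only an illustrative core loop, not the exact procedure used in the simulations.

```python
import numpy as np
from scipy.stats import norm

def gain(n):
    """Piecewise-constant gain: gamma_n = 10 / (100k + 10)**(3/4) for n in [100(k-1), 100k)."""
    k = n // 100 + 1
    return 10.0 / (100 * k + 10) ** 0.75

def rsem_update(theta, y, n, rng):
    """One RSEM iteration for theta = (omega1, m1, v1, m2, v2), where v_k = sigma_k**2."""
    omega1, m1, v1, m2, v2 = theta
    a1 = 1.0 / (1.0 + np.exp(-omega1))                    # inverse logit of (20): alpha_1
    weights = np.array([a1, 1.0 - a1])
    means = np.array([m1, m2])
    var = np.array([v1, v2])
    # R-step (19): sample the hidden label Z_{n+1}
    post = weights * norm.pdf(y, loc=means, scale=np.sqrt(var))
    post = post / post.sum()
    z = rng.choice(2, p=post)
    # E-step (21), using the complete-data scores of (26); scores use the current values
    g = gain(n)
    d = y - means[z]
    omega1 = omega1 + g * (float(z == 0) - a1)
    means[z] = means[z] + g * d / var[z]
    var[z] = var[z] + g * (-0.5 / var[z] + 0.5 * d * d / var[z] ** 2)
    return np.array([omega1, means[0], var[0], means[1], var[1]])
```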

Compacts $C_s$ for truncation  For the sake of simplicity, the description is given in terms of the proportion parameter $\alpha_1$; the bounds actually used for $\omega_1$ are easily derived from it. Thus $C_s$ takes the product form $A_s\times M_s\times V_s\times M_s\times V_s$, with $\alpha_1\in A_s$, $m_k\in M_s$ and $\sigma^2_k\in V_s$. We use a two-stage set-up for $(C_s)$.

(i) For the first 100 compacts, with $0\le s<100$, a constant step size is used:
$$ A_s = [10^{-1}-s\,10^{-4},\ 1-(10^{-1}-s\,10^{-4})], \quad M_s = [-100-s,\ 100+s], \quad V_s = [10^{-3}-s\,10^{-6},\ 100+s]. $$
Note that the starting compact is
$$ C_0 = \{0.1\le\alpha_1\le 0.9,\ |m_k|\le 100,\ 10^{-3}\le\sigma^2_k\le 100\}. \qquad (22) $$

In particular, the true parameter $\theta_*$ is believed to belong to $C_0$. In a real data case, $C_0$ should be set from a rough preliminary estimate of $\theta_*$.

(ii) For the next compacts, with $s\ge 100$, we use the construction of Proposition 2. Taking into account the believed fact $\theta_*\in C_0$, this procedure yields
$$ C_s = \Big\{\frac{1}{\alpha_s}\le\alpha_1\le 1-\frac{1}{\alpha_s},\ |m_k|\le\alpha_s,\ \frac{1}{\alpha_s}\le\sigma^2_k\le\alpha_s\Big\}, \qquad (23) $$
with
$$ \alpha_s = 10^3\,\Bigg\{\frac{s^{1/2}/(\log s)^2}{100^{1/2}/(\log 100)^2}\Bigg\}^{1/8}. \qquad (24) $$
Detailed derivations of these formulae are postponed to Lemma 2 at the end of the section. It is worth noting that $\alpha_{100} = 10^3$, so that $C_{99}\subset C_{100}$, which links up the two stages well. Finally, the restart value $w$ used after each truncation is set to the starting value: $w = \theta_0$.

For each of the two mixtures $M_1$ and $M_2$, 100 independent runs of RSEM over $N = 1000$ iterations are generated and then averaged. Table I and Table II show, respectively, statistics on the finally minimised


TABLE I  Mean of the minimised Kullback distances from 100 independent runs of the truncated RSEM algorithm, with their ratio to the starting value

                 Mean of $K(\bar\theta_N)$    Starting value $K(\theta_0)$    Ratio $K(\bar\theta_N)/K(\theta_0)$
Mixture $M_1$    0.0538                        2.4819                          2.2%
Mixture $M_2$    0.0152                        0.2386                          6.3%

TABLE II  Truncation statistics from 100 independent runs of the truncated RSEM algorithm: number of runs with at least one truncation, total number of truncations $\sigma(N)$ over the 100 runs, and mean of the last truncation time $T$ over those runs with truncations

                 Number of runs actually truncated    Sum of $\sigma(N)$    Mean of $T$
Mixture $M_1$    37                                    47                    119.1
Mixture $M_2$    91                                    265                   243.5

Kullback distance $K(\bar\theta_N)$, the number of truncations $\sigma(N)$ and the last truncation time $T$, defined as
$$ T = \max\{\,n \le N:\ \theta_n + \gamma_n\nabla_\theta\log\pi(X_{n+1};\theta_n)\notin C_{\sigma(n)}\,\}. $$
From Table I we see that the mixture $M_2$ is more difficult to handle, since its minimisation ratio (6.3%) is about three times larger than that of the mixture $M_1$ (2.2%), even though the absolute Kullback distances are smaller. A histogram of the minimised Kullback distances is given in the top row of Figure 2. These distances are highly concentrated near 0.

For the truncation behaviour shown in Table II, the mixture $M_2$ needs more truncations (91 truncated runs against 37, and 265 truncations in total against 47). Also, the last truncation can occur much later (243.5 on average against 119.1). The middle row in Figure 2 shows the corresponding histograms. Note that the maximum number of truncations observed over the 100 runs is 9 for $M_2$ and 4 for $M_1$. Again these histograms are concentrated near 0. Thus it seems that a few truncations are enough to stabilize the algorithm.

Figure 3 shows the evolution of the density estimate $g(\cdot;\theta_n)$ for times $n = 0$, 250, 500 and 1000, taken from one of the 100 generated samples for the mixture $M_1$. This sample run has the following characteristics: $\sigma(N) = 1$, $K(\bar\theta_N) = 0.076$ and $T = 69$, which are close to the mean values given in Table I. This sample can then be viewed as a mean behaviour of the truncated RSEM algorithm for the mixture $M_1$.

FIGURE 2  Histograms from 100 independent runs for the mixtures $M_1$ (a, c, e) and $M_2$ (b, d, f).

FIGURE 3  Evolution of density estimates for the mixture $M_1$ (dashed lines), for times 0, 250, 500 and 1000, from top to bottom and left to right. The corresponding Kullback distances are 2.482, 0.155, 0.104 and 0.076. The true density is shown with solid lines.


An analogous sample for the mixture $M_2$ is given in Figure 4, with $\sigma(N) = 3$, $K(\bar\theta_N) = 0.0188$ and $T = 69$. It is worth noting that the algorithm starts here from a bimodal density quite far from the true unimodal density, and yields a close approximation after only 250 iterations.

Truncation compacts $C_s$ for finite mixtures of normal variables  We now prove the effectiveness of the barrier compacts $C_s$ defined by (23)–(24). According to the construction of Proposition 2, we shall use for the set $D_\alpha$ defined in (14)
$$ D_\alpha = \Big\{\frac{1}{\alpha}\le\alpha_1\le 1-\frac{1}{\alpha},\ |m_k|\le\alpha,\ \frac{1}{\alpha}\le\sigma^2_k\le\alpha\Big\}. $$
Recall that we have to bound $\lambda_{\max}(\Gamma(\theta))$ on $D_\alpha$ by some increasing map $\varphi(\alpha)$.

LEMMA 2  For a mixture of two univariate normal variables with $\theta = (\omega_1, m_1, \sigma_1^2, m_2, \sigma_2^2)$, taking into account a rough preliminary estimate $\theta_*\in C_0$, we can take as upper bound $\varphi$, for $\alpha\ge 10^3$,
$$ \varphi(\alpha) = 3\,\alpha^8. \qquad (25) $$

Moreover, for $s\ge 100$, the barrier compacts $C_s$ defined by (23)–(24) are valid.

Proof  Let us denote by $E_{n,\theta}$ and $V_{n,\theta}$, respectively, the conditional expectation and variance with respect to $[\mathcal{F}_n;\ \theta_n=\theta]$. Recall that $\Gamma(\theta) = V_{n,\theta}[\nabla_\theta\log\pi(X_{n+1};\theta_n)]$. We have
$$ \lambda_{\max}[\Gamma(\theta)] \le \operatorname{trace}[\Gamma(\theta)] = \sum_j V_{n,\theta}\Big[\frac{\partial}{\partial j}\log\pi(X_{n+1};\theta_n)\Big], $$
where $j$ stands for $\omega_1$, $m_k$ and $\sigma^2_k$. For a mixture of two univariate normal variables, we have by (21)
$$ \begin{cases} \dfrac{\partial}{\partial\omega_1}\log\pi(X_{n+1};\theta_n) = \mathbf{1}_{Z_{n+1}=1} - \alpha_{1,n},\\[6pt] \dfrac{\partial}{\partial m_k}\log\pi(X_{n+1};\theta_n) = \mathbf{1}_{Z_{n+1}=k}\,\dfrac{Y_{n+1}-m_{k,n}}{\sigma^2_{k,n}},\\[6pt] \dfrac{\partial}{\partial\sigma^2_k}\log\pi(X_{n+1};\theta_n) = \mathbf{1}_{Z_{n+1}=k}\Big[-\dfrac{1}{2\sigma^2_{k,n}} + \dfrac{1}{2\sigma^4_{k,n}}\,(Y_{n+1}-m_{k,n})^2\Big]. \end{cases} \qquad (26) $$

FIGURE 4  Evolution of density estimates for the mixture $M_2$ (dashed lines), for times 0, 250, 500 and 1000, from top to bottom and left to right. The corresponding Kullback distances are 0.2386, 0.0254, 0.0255 and 0.0188. The true density is shown with solid lines.


Straightforward computation shows that for $\theta\in D_\alpha$,
$$ V_{n,\theta}\big[\mathbf{1}_{Z_{n+1}=1}-\alpha_{1,n}\big] \le 1, $$
$$ E_{n,\theta}\Big[\mathbf{1}_{Z_{n+1}=k}\Big(\frac{Y_{n+1}-m_{k,n}}{\sigma^2_{k,n}}\Big)^2\Big] \le 2\alpha^2(\alpha^2+\mu_{2,*}), $$
$$ E_{n,\theta}\Big[\mathbf{1}_{Z_{n+1}=k}\Big(-\frac{1}{2\sigma^2_{k,n}} + \frac{1}{2\sigma^4_{k,n}}(Y_{n+1}-m_{k,n})^2\Big)^2\Big] \le \frac{\alpha^4}{2}\,\big(\alpha^4 + 2\mu_{2,*}\alpha^2 + \mu_{4,*}\big), $$
where $\mu_{j,*} = E_{\theta_*}|Y|^j$. Hence, for $\alpha\ge 1$,
$$ \operatorname{trace}[\Gamma(\theta)] \le 2\alpha^4\big(\alpha^4 + 4\mu_{2,*}\alpha^2 + \mu_{4,*} + 2\big) + 1. $$
It remains to bound the moments $\mu_{j,*}$. But if $W$ is a normal variable $\mathcal{N}(m,\sigma^2)$, we have $E|W|^j \le 2^{j-1}(|m|^j + 4\sigma^j)$. Now we have assumed $\theta_*\in C_0$, so that by (22), $|m_{k,*}|\le 100$ and $10^{-3}\le\sigma^2_{k,*}\le 100$. It follows that
$$ \mu_{2,*}\le 10^5, \qquad \mu_{4,*}\le 4\times 10^9. $$
Hence, for $\alpha\ge 10^3$,
$$ 2\alpha^4\big(\alpha^4 + 4\mu_{2,*}\alpha^2 + \mu_{4,*} + 2\big) + 1 \le 3\,\alpha^8, $$
and (25) follows.

Finally, to define $\alpha_s$ and $C_s = D_{\alpha_s}$ for $s\ge 100$, let us take $b = 2$ in the rule (16). Since $\varphi$ is continuous, and we have chosen the exponent $\gamma = 3/4$ for $(\gamma_n)$, we are looking for $\alpha_s$ satisfying
$$ \varphi(\alpha_s) = 3\,(\alpha_s)^8 = c\,\frac{s^{1/2}}{(\log s)^2}, \qquad\text{for large } s. $$
We fix the constant $c$ by the initial condition: for $s = 100$, $\alpha_s = 10^3$. Hence (24) follows. □
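As a small numerical sanity check of (24)–(25), a sketch in which the constant $c$ is fixed exactly as in the proof, through the initial condition $\alpha_{100} = 10^3$:

```python
import math

def phi(alpha):
    """Upper bound (25): phi(alpha) = 3 * alpha**8, valid on D_alpha for alpha >= 1e3."""
    return 3.0 * alpha ** 8

def alpha_s(s):
    """Barrier radius (24)."""
    ratio = (math.sqrt(s) / math.log(s) ** 2) / (math.sqrt(100) / math.log(100) ** 2)
    return 1e3 * ratio ** (1.0 / 8.0)

# c is fixed by the initial condition alpha_100 = 1e3, as in the proof:
c = phi(1e3) * math.log(100) ** 2 / math.sqrt(100)
for s in (100, 1000, 10000):
    # phi(alpha_s) should match c * s**(1/2) / (log s)**2, i.e. rule (16) with b = 2
    print(s, phi(alpha_s(s)), c * math.sqrt(s) / math.log(s) ** 2)
```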


Acknowledgement

I thank Marie Duflo for several helpful discussions on this work.

References

Brandière, O. and Duflo, M. (1996) Les algorithmes stochastiques contournent-ils les pièges? Ann. Inst. Henri Poincaré, 32, 395–427.
Celeux, G., Chauveau, D. and Diebolt, J. (1996) Stochastic versions of the EM algorithm: an experimental study in the mixture case. J. Statist. Computation and Simulation, 55, 287–314.
Celeux, G. and Diebolt, J. (1985) The SEM algorithm: a probabilistic teacher algorithm derived from the EM algorithm for the mixture problem. Computational Statistics Quarterly, 2, 73–82.
Chen, H. (1993) Asymptotically efficient stochastic approximation. Stochastics and Stochastics Reports, 45, 1–16.
Chen, H., Guo, L. and Gao, A. (1988) Convergence and robustness of the Robbins-Monro algorithm truncated at randomly varying bounds. Stochastic Processes and their Applications, 27, 217–231.
Dacunha-Castelle, D. and Gassiat, E. (1997) Testing in locally conic models and applications to mixture models. ESAIM Probab. Statist., 1, 285–317.
Dempster, A., Laird, N. and Rubin, D. (1977) Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. Roy. Statist. Soc. B, 39, 1–38.
Duflo, M. (1996) Algorithmes Stochastiques. Springer-Verlag, Berlin.
Fabian, V. (1978) On asymptotically efficient recursive estimation. Ann. of Stat., 6, 854–856.
Fort, J. and Pages, G. (1996) Convergence of stochastic algorithms: from the Kushner-Clark theorem to the Lyapounov functional method. Adv. Appl. Probab., 28, 1072–1094.
Kushner, H. and Clark, D. (1978) Stochastic Approximation for Constrained and Unconstrained Systems. Springer-Verlag, Berlin.
Lavielle, M. and Moulines, E. (1995) On a stochastic approximation version of the EM algorithm. Technical Report 95.08, Université Paris-Sud.
Meng, X. and van Dyk, D. (1997) The EM algorithm – an old folk-song sung to a fast new tune. J. Roy. Statist. Soc. B, 59, 511–567.
Pearson, K. (1894) Contributions to the mathematical theory of evolution. Phil. Trans. Royal Soc. A, 185, 71–110.
Polyak, B. T. (1990) New stochastic approximation type procedures. Avtomat. i Telemekh., 7, 98–107. English translation: Automation and Remote Control, 51, 1991.
Polyak, B. T. and Juditsky, A. B. (1992) Acceleration of stochastic approximation by averaging. SIAM J. Control and Optimization, 30, 838–855.
Qian, W. and Titterington, D. (1991) Estimation of parameters in hidden Markov models. Phil. Trans. R. Soc. Lond., 337, 407–428.
Redner, R. and Walker, H. (1984) Mixture densities, maximum likelihood and the EM algorithm. SIAM Review, 26, 195–239.
Richardson, S. and Green, P. J. (1997) On Bayesian analysis of mixtures with an unknown number of components (with discussion). J. Roy. Statist. Soc. B, 59, 731–792.
Robert, C. (1996) Mixtures of distributions: inference and estimation. In: Gilks, W. R., Richardson, S. and Spiegelhalter, D. J., Editors, Markov Chain Monte-Carlo in Practice, chapter 24, pages 441–464. Chapman and Hall, London.
Ryden, T. (1998) Asymptotically efficient recursive estimation for incomplete data models using the observed information. Metrika, 44, 119–145.


Titterington, D., Smith, A. and Makov, U. (1985) Statistical Analysis of Finite Mixture Distributions. Wiley, New York.
Wei, G. and Tanner, M. (1990) A Monte-Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms. J. American Statist. Association, 85, 699–704.

Jian-feng Yao, SAMOS, Université Paris I, 90 rue de Tolbiac, 75634 Paris Cedex, France
