EPI-CONVERGENCE OF SEQUENCES OF NORMAL ... - Project Euclid

4 downloads 0 Views 297KB Size Report
epi-convergence, for proving the almost sure convergence of the MLE. This is ... STRONG CONVERGENCE OF THE MAXIMUM LIKELIHOOD ESTIMATOR 1299.
The Annals of Statistics 1996, Vol. 24, No. 3, 1298 ] 1315

EPI-CONVERGENCE OF SEQUENCES OF NORMAL INTEGRANDS AND STRONG CONSISTENCY OF THE MAXIMUM LIKELIHOOD ESTIMATOR BY CHRISTIAN HESS Universite´ Paris Dauphine Using the variational properties of epi-convergence together with suitable results on the measurability of multifunctions and integrands, we prove a strong law of large numbers for sequences of integrands from which we deduce a general theorem of almost sure convergence Žstrong consistency. for the maximum likelihood estimator.

1. Introduction. In mathematical statistics the maximum likelihood principle is one of the most popular in order to construct estimators. On the other hand, an important property of estimators is the strong consistency, that is, the almost sure convergence toward the true value of the unknown parameter, as the sample size tends to `. Since Wald w 33x , who first gave a rigorous and general proof of the consistency for the maximum likelihood estimator ŽMLE., several other proofs of this fact have appeared, under more or less restrictive hypotheses; for example, a more recent proof can be found in the book by Ibragimov and Has’minski w 21x ŽTheorem 4.3 and Remark 4.1.. The main objective of this paper is to present an approach, based on epi-convergence, for proving the almost sure convergence of the MLE. This is done under weaker hypotheses than those usually assumed. Indeed, in the present paper, the space V of parameters is only assumed to be a Suslin metric space, the space E of observations is an arbitrary measurable space and the observations are pairwise independent Žinstead of mutually independent. identically distributed E-valued random variables. Further, the function f from the product space E = V into the positive reals, which defines the family of densities, is not assumed to be continuous with respect to the parameter v. It only satisfies a sup-compactness assumption with respect to v. This hypothesis ensures the existence of MLE’s, but we also consider the convergence of approximate MLE’s which are more realistic in concrete situations. Our method, using epi-convergence, can also apply to convergence results concerning other kinds of estimators whose construction involves an optimization procedure Žthis is typically the case of M-estimators.. The principle of our approach consists of regarding the consistency of the MLE as the approximation of a deterministic optimization problem whose solution is the true value of the unknown parameter, by a suitable sequence

Received April 1992; revised July 1995. AMS 1991 subject classifications. 28B20, 49K45, 54D35, 62F12, 62H12, 62L20. Key words and phrases. Maximum likelihood estimator, strong consistency, measurable multifunctions, normal integrands, set convergence, epi-convergence.

1298

STRONG CONVERGENCE OF THE MAXIMUM LIKELIHOOD ESTIMATOR

1299

of stochastic optimization problems associated with the samples. This allows us to apply a method for approximating optimization problems, which appeared in the 1960s in the papers of Wijsman w 34x and Mosco w 26x . It is based on a notion of convergence for sequences of functions often called ‘‘epi-convergence’’ which means the Painleve]Kuratowski convergence of the epigraphs ´ Župper graphs. of these functions, viewed as subsets of the product space V = R. Our proof of the strong consistency of the MLE precisely relies upon an epi-convergence result for the averaged sum of integrands which can be viewed as a parametrized version of the strong law of large numbers ŽTheorem 5.1.. This type of result is also useful in stochastic optimization Žsee w 23x. , in the calculus of variations Žsee w 10x. and in certain problems of mechanics involving nonhomogeneous materials Žsee w 9x. . Moreover, let us mention that this method, for studying the consistency of estimators Žand also the rate of convergence., has already been used by Dupacova and Wets w 12x under more restrictive hypotheses. On the other hand, the measurability properties of functions Žsingle-valued as well as multivalued. play an important part in our work, since they are needed to apply the strong law of large numbers to suitable sequences of functions, and to prove the existence of exact and approximate MLE’s. This is why we also provide briefly the needed results, following mainly w 6x and w 8x . The paper is organized as follows: in Section 2 we present our main result which is Theorem 2.1. In Section 3 we provide a short description of epi-convergence and we present convenient expressions of the lower and upper epi-limits. Section 4 is devoted to the measurability of multifunctions and integrands. In Section 5 we prove a strong law of large numbers for integrands, which involves epi-convergence and is one of the main arguments in the proof of Theorem 2.1. Finally, in Section 6, after having shown how our method uses epi-convergence, the proof of the main result is provided as well as some comments and examples. 2. Main result. Let Ž V, A, P . be a probability space, X a random variable defined on V with values in a measurable space Ž E, E . and m a positive s-finite measure defined on Ž E, E .. Further, let Ž V, d . be a Suslin metric space whose Borel s-field is denoted by BŽ V .; without restricting the generality we can assume d F 1 on V. Also let f be a function from E = V into Rq, the set of positive reals, which satisfies the following hypotheses: Ža. f is E m BŽ V .-measurable. Žb. For every u g V, f Ž?, u. is a probability density function relative to m. Žc. For m-almost every x g E, f Ž x, ? . is sup-compact in the following sense: for each strictly positive real r, the subset  u g V : f Ž x, u. G r 4 is compact; but the subset  u g V : f Ž x, u. s 04 is not assumed to be compact. Žd. u1 / u 2 implies f Ž?, u1 . / f Ž?, u 2 ., that is, m x g E : f Ž x, u1 . / f Ž x, u 2 .4 ) 0. Že. For some u g V, f Ž?, u. is ‘‘the’’ density function of the random variable X with respect to m.

1300

C. HESS

Žf. Let Ž X n . n G 1 be a sequence of E-valued random variables defined on V, being pairwise independent and having the same distribution as X; for any integer n G 1, let Ž x 1 , . . . , x n . be an n-tuple of possible values of the sample Ž X 1 , . . . , X n .. In mathematical statistics, V is called the set of elementary outcomes, A is the s-field of events and P is the probability measure. Further, Ž E, E . is the space of observations and V the space of parameters. The random variable X represents the outcome of any observation; it is an E-valued random variable whose density relative to the s-finite dominating measure m is assumed to be a member of the family  f Ž?, u. : u g V 4 . From condition Žd., it is clear that u is unique; it is the unknown Ž or true . value of the parameter. The problem of statistical estimation consists of obtaining an approximate value of u after having observed values x 1 , . . . , x n of a random sample X 1 , . . . , X n of X, where n, the sample size, is large enough. The approximate value of u is provided by a suitable function of X1 , . . . , X n which is called an estimator of the parameter u. We begin with some definitions and notation. First, define the function b on E by

Ž 2.1.

b Ž x . [ sup  f Ž x , u . : u g V 4 .

The likelihood function L n is defined, for any u g V and Ž x 1 , . . . , x n . g E n , by

Ž 2.2.

Ln Ž x1 , . . . , x n , u . [

n

Ł f Ž x i , u. .

is1

For each n G 1, we consider the n-fold product measure space Ž E n , E n , m n . associated with the measure space Ž E, E , m .. So, E n is the product s-field on E n and m n the n-fold product measure on the measurable space Ž E n , E n .. We also define the function Bn by

Ž 2.3.

Bn Ž x 1 , . . . , x n . [ sup  L n Ž x 1 , . . . , x n , u . : u g V 4 .

Our main result is as follows. THEOREM 2.1. Let Ž V, A, P . be a probability space, X a random variable defined on V with values in a measurable space Ž E, E . and m a positive s-finite measure defined on Ž E, E .. Further, let V be a Suslin metrizable space and f a function from E = V into Rq which satisfies hypotheses Ža. to Že. above. Also assume the integrability condition

Ž H.

HE f Ž x , u . log

ž

bŽ x . f Ž x , u.

/

m Ž dx . - q`,

where bŽ?. is defined by Ž2.1.. Then for every decreasing sequence Ž a n . n G 1 of nonnegative numbers verifying lim n a n s 0, the two following statements hold true:

STRONG CONVERGENCE OF THE MAXIMUM LIKELIHOOD ESTIMATOR

1301

ŽA. There exists a sequence Ž t n, a . n G 1 w also denoted by Ž t n . n G 1 x of a nn approximate MLE’s, namely, a sequence of maps from E n into V satisfying the two following properties: Ž j. for every n G 1, t n is En-measurable; Ž jj. for every Ž x 1 , . . . , x n . g E n , L n Ž x 1 , . . . , x n , t n Ž x 1 , . . . , x n .. G Bn Ž x 1 , . . . , x n . y a n . ŽB. For every sequence Ž t n . as above, one has for almost all v g V, lim t n Ž X 1 Ž v . , . . . , X n Ž v . . s u. n

REMARK 2.2. Ži. In case the X n’s are not mutually independent, it would be better to say that t n is an M-estimator relative to Ž2.2. rather than an MLE. Žii. In the foregoing result it can be observed that the Žexact or approximate. MLE, whose existence is provided by statement ŽA., need not be unique and that statement ŽB. is valid for any sequence of MLE’s. Moreover, condition Žc. yields the existence of exact MLE’s as well. REMARK 2.3. In w 25x , Le Cam presented miscellaneous examples where estimators constructed by the maximum likelihood method misbehave. In particular, he studied one, adapted from Bahadur w 4x , where the MLE is not consistent. In view of Theorem 2.1, it is interesting to note that in this example Žw 25x , pages 157]158. the integrability condition ŽH. does not hold. However, we shall see in Remark 6.5 that condition ŽH. is not necessary for the MLE to be consistent. REMARK 2.4. Moreover, we shall see that one of the main arguments in the proof of Theorem 2.1 is related to the possibility of approximating a normal integrand Ži.e., a measurable function defined on E = V, lower semicontinuous with respect to the second variable., by an increasing sequence of Lipschitz integrands ŽProposition 4.4.. In w 24x , Lanery ´ proved a result comparable to Theorem 2.1 for exact MLE’s. However, his approach is based on a different approximation scheme of integrands and does not use epi-convergence. On the other hand, see Proposition 3.9 in w 12x for a related result, but in a much more special setting. REMARK 2.5. Let us also mention the important monograph of Hoffmann-Jørgensen w 20x ŽI was not aware of this reference when I wrote the first version of the present paper. in which the general problem of studying the accumulation and limit points of ‘‘sequences of approximating maximums’’ Žsee Remark 6.4 for the definition. is addressed. This includes the study of the consistency of the MLE, but in w 20x ‘‘consistency’’ is used in a different meaning. For example, the above accumulation points are allowed to be outside the given parameter space V. Although the method is different from ours, one can observe that epi-convergence Žin fact hypo-convergence. is implicitly used.

1302

C. HESS

3. Epi-convergence of sequences of functions defined on a metric space. In this section we present some needed facts about epi-convergence, but we refer the reader to the monographs w 2x and w 9x . Let Ž V, d . be a metric space and f a function from V into R, the set of extended reals. f is said to be proper if it is not identically q` and if it does not take the value y`. The epigraph of f, denoted epiŽ f ., is defined by epi Ž f . [  Ž u, l . g V = R : f Ž u . F l4 . The epigraph is also called the ‘‘upper graph.’’ A function f is lower semicontinuous Žlsc. if and only if epiŽ f . is a closed subset of V = R. Now, let us recall the definition of epi-convergence. In the present section Ž f n . n G 1 w or Ž f n . for shortx denotes a sequence of functions from V into R. For any u g V, the lsc functions li e f n and ls e f n are defined by

Ž 3.1.

li e f n Ž u . [ sup lim inf kG1

nª`

inf

fn Ž v .

inf

fn Ž v . ,

vgB Ž u , 1rk .

and

Ž 3.2.

ls e f n Ž u . [ sup lim sup kG1

nª`

vgB Ž u , 1rk .

where B Ž u, 1rk . denotes the open ball of radius 1rk centered at u. The function li e f nŽ u. is the epi-limit inferior and ls e f nŽ u. is the epi-limit superior of the sequence Ž f n .. The sequence Ž f n . is said to be epi-convergent at u, if the following equality holds true:

Ž 3.3.

li e f n Ž u . s ls e f n Ž u . .

If Ž3.3. is satisfied for any u g V, we simply say that Ž f n . epi-converges. In this case, the common value of Ž3.3. defines an lsc function which is called the epi-limit of the sequence Ž f n . and will be denoted by lim e f n . Now we recall the sequential characterization of epi-convergence ŽProposition 1.14 in w 2x. . PROPOSITION 3.1. A sequence of functions Ž f n . from V into the extended reals epi-converges to a function f at v g V, if and only if the two following properties hold: Ža. For each sequence Ž vn . converging to v in V, one has f Ž v . F lim inf n f nŽ vn .. Žb. There exists a sequence Ž vn . in V converging to v and such that f Ž v . G lim sup n f nŽ vn .. The term ‘‘epi-convergence’’ comes from the fact that f s lim e f n if and only if the sequence of subsets ŽepiŽ f n .. in V = R converges to epiŽ f . in the sense of Painleve ´ and Kuratowski. Now, let us recall the variational properties of epi-convergence, which play a crucial role in the present paper. For this purpose, some notation is useful.

STRONG CONVERGENCE OF THE MAXIMUM LIKELIHOOD ESTIMATOR

1303

For any extended real-valued function f defined on V, we define the set of Ž exact . minimizers of f on V by setting

½

5

Argmin Ž f . [ u g V : f Ž u . s inf f Ž v . . vgV

More generally, for a function f such that the infimum on V is different from y`, we define, for any a G 0, the set of a-approximate minimizers by

½

5

a-Argmin Ž f . [ u g V : f Ž u . F inf f Ž v . q a . vgV

Whenever a ) 0, the set a-approximate minimizers is nonempty Žunlike the set of exact minimizers which is obtained for a s 0.. The variational properties of epi-convergence that we need are stated in the following result Žsee w 2x , Corollary 2.10, page 131.. PROPOSITION 3.2. Let Ž a n . be a sequence of positive reals converging to 0. Assume that Ž f n . is epi-convergent to f, that is, f s lim e f n . Then, the following inequality holds:

Ž 3.4.

ž

/

inf f Ž v . G lim sup inf f n Ž v . .

vgV

nª`

vgV

Further, for any n G 1, let vn be an a n-approximate minimizer of f n . If the sequence Ž vn . admits a subsequence converging to some u g V, then u belongs to ArgminŽ f . and Ž3.4. becomes

Ž 3.5.

ž

/

min f Ž v . s lim sup inf f n Ž v . . vgV

nª`

vgV

Now, let us recall the following procedure of approximation for a function which goes back to Hausdorff. For any u g V and any integer k G 1, we set f k Ž u . s inf  f Ž v . q k d Ž u, v . : v g V 4 . The function f k is the Lipschitz approximation of f, of order k. Its main properties are listed in the following proposition. PROPOSITION 3.3. Let f be a lower semicontinuous function from V into R, not identically q` and satisfying the following condition: there exist a ) 0, b g R and u 0 g V such that, for each u g V, f Ž u. q a dŽ u, u 0 . q b G 0. Then, the following properties hold: Ži. for any k ) a and u g V, one has f k Ž u. q a dŽ u, u 0 . q b G 0; Žii. for every integer k G 1, f k is finite valued and Lipschitzian with Lipschitz constant k; Žiii. for any u g V, the sequence Ž f k Ž u.. is increasing and f Ž u. s sup k f k Ž u.. The next proposition, which will serve in the proof of Theorem 5.1, provides expressions of li e f n and ls e f n in terms of the Lipschitz approximations of

1304

C. HESS

functions f n . Although this type of result is not entirely new, it extends several previous ones, and the method of proof presented here can be generalized to the case of functions with values in an ordered vector space Žsee w 22x. . Similar formulas for the Moreau]Yosida approximation were obtained by Attouch w 2x , Theorem 2.65, page 232 Žsee also w 27x. . PROPOSITION 3.4.

Ž 3.6.

Assume the following hypothesis:

there exist a ) 0, b g R and u 0 g V such that , for any integer n G 1 and for any u g V , f n Ž u . q a d Ž u, u 0 . q b G 0.

Then, for any u g V, the two following equalities hold:

Ž 3.7.

li e f n Ž u . s sup lim inf f nk Ž u . , kG1

Ž 3.8.

nª`

ls e f n Ž u . s sup lim sup f nk Ž u . . kG1

nª`

PROOF. Let us prove Ž3.7.. Fix u g V. First, we shall show that the left-hand side is greater than or equal to the right-hand side. Using the definitions of the epi-limit inferior and the Lipschitz approximation, we easily obtain li e f n Ž u . G sup pG1

ž lim inf f nª`

k n

/

Ž u . y krp s lim inf f nk Ž u . . nª`

This being true for any k G 1, we obtain the desired inequality. In order to show the reverse inequality, set, for any u g V,

Ž 3.9.

c Ž u . [ sup lim inf f nk Ž u . . kG1

nª`

We ought to prove c Ž u. G li e f nŽ u.. If c Ž u. equals q`, there is nothing to prove; otherwise, fix an integer p G 1 and a g Ž0, 1.. For every n, k G 1, consider the two following conditions:

Ž 3.10.

f n Ž v . q k d Ž u, v . G f nk Ž u . q a

; v f B Ž u, 1rp .

and

Ž 3.11.

f nk Ž u . s inf  f n Ž v . q k d Ž u, v . : v g B Ž u, 1rp . 4 .

Clearly, Ž3.10. implies Ž3.11.. On the other hand, looking at Ž3.9., we see that, for any k G 1, one can find a strictly increasing sequence of integers Ž nŽ k, m.. m G 1 verifying

Ž 3.12.

c Ž u . q a G f nŽk k , m. Ž u . .

Our goal is to obtain condition Ž3.10. with n s nŽ k, m., for any m G 1 and for any k G k 0 , where k 0 will be chosen below. So, for every v f B Ž u, 1rp . we want the following inequality to hold:

Ž 3.13.

f nŽ k , m. Ž v . q k d Ž u, v . G f nŽk k , m. Ž u . q a ,

STRONG CONVERGENCE OF THE MAXIMUM LIKELIHOOD ESTIMATOR

1305

which, in view of Ž3.12., is a consequence of

Ž 3.14.

f nŽ k , m. Ž v . q k d Ž u, v . G c Ž u . q 2.

Now, using condition Ž3.6., it is readily seen that Ž3.14. is implied by the following choice of k: k G k 0 [ integer part of  p Ž c Ž u . q 2 q a d Ž u, u 0 . q b . q a q 1 4 . Thus, we can deduce that, if k G k 0 , inequality Ž3.13. holds for any m G 1 and v f B Ž u, 1rp . which, as already noticed, implies

Ž 3.15.

f nŽk k , m. Ž u . s  f nŽ k , m. Ž v . q k d Ž u, v . : v g B Ž u, 1rp . 4 .

Consequently, using Ž3.12., we see that, for any k G k 0 , the following relationships are valid:

c Ž u . q a G lim inf f nŽk k , m. Ž u . s lim inf mª`

mª`

inf

vgB Ž u , 1rp .

f nŽ k , m. Ž v . q k d Ž u, v . ,

whence

c Ž u . q a G lim inf nª`

inf

vgB Ž u , 1rp .

fn Ž v . .

This last inequality, being true for any p G 1 and a g Ž0, 1., yields c Ž u. G li e f nŽ u., which is the desired inequality and, thus, completes the proof of Ž3.7.. The proof of Ž3.8. is similar. I REMARK 3.5. It can be observed that condition Ž3.6. is of a global nature. An inspection of the above proof shows that Ž3.6. can be replaced by the following local condition: for any u g V there exist a neighborhood W of u and b ) 0 such that, for every n G 1 and v g W, one has f nŽ v . G b . REMARK 3.6. Epi-convergence is a one-sided notion which has been devised for analyzing the behavior of sequences of minimization problems. The corresponding symmetric notion for maximization problems is that of ‘‘hypoconvergence’’ defined by considering the Painleve]Kuratowski convergence of ´ the hypographs Žlower graphs. of functions. REMARK 3.7. For a detailed treatment of epi-convergence, set convergence and miscellaneous applications, we refer the reader to w 2x , w 3x , w 5x , w 9x , w 10x , w 12x , w 14x , w 15x , w 23x , w 26x , w 29x and w 34x . 4. Some facts on the measurability of multifunctions and integrands. In this section we present several measurability properties of functions Žor integrands. defined on a product space S = V. These properties concern the measurability of several operations to be used later, infimum, argmin multifunction, sum and Lipschitz approximation. They will also serve to prove the existence of exact and approximate MLE’s. Most of the results of this section are known Žsee, e.g., w 6x , w 8x , w 17x , w 18x and w 28x. .

1306

C. HESS

Let Ž S, S . be a measurable space and Ž V, d . a separable metric space whose Borel s-field is denoted by BŽ V .. A map G defined on S which assigns to each s g S the set G Ž s . in V is called a multifunction with values in V. The graph GrŽ G . and the domain domŽ G . of G are, respectively, defined by Gr Ž G . [  Ž s, u . g S = Vru g G Ž s . 4

and dom Ž G . [  s g SrG Ž s . / B 4 . ˆ Consider the s-field S on S defined by Sˆs F l Sl , where l ranges over the

set of all positive s-finite measures on Ž S, S . and where Sl denotes the l-completion of S . Sˆ is the s-field of universally measurable Žor absolutely measurable. subsets of S, relative to S . A selector of G is a function g from dom G into V such that, for any s g domŽ G ., g Ž s . g G Ž s .. Concerning the existence of measurable selectors, we recall the following fact ŽTheorem 3.22 in w 8x. : if V is a Suslin space, then any multifunction G whose graph is S m BŽ V .-measurable admits at least one selector which is measurable relative to the s-fields Sˆ and BŽ V .. DEFINITION 4.1. Let h be an extended real-valued function defined on S = V and c an extended real-valued function on S. Define the infimum function mŽ?. and the level set multifunction G associated with h and c , by setting, for every s g S. m Ž s . [ inf  h Ž s, v . : v g V 4

and

G Ž s . [  s g S : h Ž s, v . F c Ž s . 4 .

In the special case where c Ž?. s mŽ?., the multifunction G is called the Žexact. Argmin multifunction of h. Further, assume that mŽ s . ) y` and consider, for any positive real a , the function c defined on S by c Ž s . [ mŽ s . q a . For such a function c , the multifunction G defined above is the approximate Argmin multifunction of order a . Now, we provide a result concerning the measurability of the infimum function and the level set multifunction of a function defined on S = V. It is a slight extension of Theorem 2 of w 6x , page 908. PROPOSITION 4.2. Let V be a Suslin space and h an S m BŽ V .-measurable function defined on S = V. Then: Ža. The infimum function mŽ?. is Sˆ-measurable. Žb. For any real-valued S-measurable function c , the graph of the level set multifunction G, associated with h and c , is Sˆm BŽ V .-measurable. Žc. The exact and approximate Argmin multifunctions of h have Sˆm BŽ V .measurable graphs. Consequently, each of them admits at least one Sˆ-measurable selector. PROOF. To prove Ža., it suffices to observe that, for any a g R, the following equality holds true:

 s g S : m Ž s . - a 4 s proj S Ž  Ž s, v . g S = V : f Ž s, v . - a 4 . and to invoke the measurable projection theorem Žsee, e.g., Theorem 3.23 in w 8x. .

STRONG CONVERGENCE OF THE MAXIMUM LIKELIHOOD ESTIMATOR

1307

As for Žb., since the function Ž s, v . ª hŽ s, v . y c Ž s . is S m BŽ V .measurable, GrŽ G . is a member of this product s-field. Statement Žc. follows easily. I REMARK 4.3. The version of the measurable projection theorem invoked in the above proof is due to Sainte-Beuve w 30x and extends previous results of Aumann and Von Neumann. On the other hand, Brown and Purves also proved a variant of Proposition 4.2 ŽCorollary 1 in w 6x , page 904.. Their proof is based on another projection theorem due to Arsenin and Novikov Žw 1x , w 11x and w 32x. . This second projection theorem states that if B is a nonempty member of the product s-field S m BŽ V . and satisfies the condition: ‘‘for any s g S, the set B Ž s . [  v g VrŽ s, v . g B4 is the countable union of compact subsets of V,’’ then proj S Ž B . is a member of S . Further, this theorem implies the existence of an S-measurable selector for multifunction B Ž?.. Clearly, this could provide an alternate setting for the measurability issues of the present paper. See w 18x and w 31x for more recent developments in this direction and w 16x for another variant of Proposition 4.2. The following result concerns the measurability of the Lipschitz approximation. PROPOSITION 4.4. Let V be a metric Suslin space, h an S m BŽ V .measurable function defined on S = V, u 0 g V and k G 1. In addition, assume that there exists a positive constant a and a function a Ž?. such that for every Ž s, v . g S = V, one has hŽ s, v . q a dŽ v, u 0 . q a Ž s . G 0. Then, the Lipschitz approximation of order k of h, denoted by h k , is Sˆm BŽ V .-measurable. PROOF. For every u g V the function Ž s, v . ª hŽ s, v . q k dŽ u, v . is S m BŽV.-measurable. Hence, using Proposition 4.2Ža., we can see that s ª h k Ž s, u . [ inf  h Ž s, v . q k d Ž u, v . : v g V 4 is an Sˆ-measurable function. Thus, it suffices to show that, for any s g S, h k Ž s, ? . is continuous. For this purpose define S0 s  s g S : hŽ s, ? . is identically q`4 . Clearly, on S0 the function h k Ž s, ? . is equal to the constant q`, whence it is continuous, whereas for every s g S R S0 the function h k Ž s, ? . is finite valued and Lipschitzian with Lipschitz constant k. I 5. A strong law of large numbers for integrands. Let Ž V, A, P . be a probability space, Ž E, E . a measurable space, Ž V, d . a metrizable Suslin space and g a normal integrand defined on the product space E = V, that is, an E m BŽ V .-measurable function which is lsc with respect to the second variable. Also consider an E-valued random variable X defined on V. In the proof of the main result of the section, we shall use Etemadi’s SLLN Žsee w 13x. . It extends the Kolmogorov SLLN in that the real random variables are only supposed to be pairwise independent, instead of mutually independent. For positive random variables the SLLN holds even if the expectation is equal to q`.

1308

C. HESS

So, let Ž X n . n G 1 be a sequence of E-valued random variables defined on V, pairwise independent and having the same distribution as X. The following theorem, already contained in w 16x , will be the main argument in the proof of the almost sure convergence of the MLE ŽSection 6.. It also provides an SLLN for integrands which may serve in other situations. THEOREM 5.1. Let V be a Suslin metric space and g a normal integrand on E = V with values in Rq. For any n G 1, define h n on V = V by hn Ž v , u. [

1 n

n

Ý g Ž Xi Ž v . , u .

is1

and also assume that the function f Ž?. [ E g Ž X, ? . is not identically q` on V. Then, there exists a P-negligible subset N of V such that, for every u g V and v g V R N, one has

Ž 5.1.

E g Ž X , u . s lim e h n Ž v , u . , nª`

where E stands for the expectation of a random variable. PROOF. In order to show Ž5.1., it suffices to prove the two following inequalities:

Ž 5.2. Ž 5.3.

li e h n Ž v , u . G E g Ž X , u .

; v g V R N1 , ; u g V ,

ls e h n Ž v , u . F E g Ž X , u .

; v g V R N2 , ; u g V ,

where N1 and N2 are some negligible subsets of V. First, recall that, for any v g V and for any fixed k G 1, the Lipschitz approximation, of order k, of h nŽ?, v . is defined by h kn Ž v , u . [ inf  h n Ž v , v . q k d Ž u, v . : v g V 4

; u g V.

Now, using the super-additivity of the infimum operation, we easily obtain h kn Ž v , u . G

1 n

n

Ý g k Ž Xi Ž v . , u . .

is1

An appeal to Proposition 4.4 shows that g k and h kn are Aˆm BŽ V .-measurable. Consequently, for any u g V and k G 1, we can apply Etemadi’s SLLN to the sequence Ž g k Ž X n , u.. n G 1. This proves the existence of a negligible subset N1Ž u, k . such that, for any v g V R N1Ž u, k .,

Ž 5.4.

lim inf h kh Ž v , u . G E g k Ž X , u . n

Set N1 [ D u g D D k G 1 N1Ž u, k ., where D is a dense countable subset of V. Inequality Ž5.4. is valid for v g V R N1 , k G 1 and u g D; moreover, it remains valid for any u g V, because each side of Ž5.4. defines a Lipschitzian function of u, with Lipschitz constant k. Then, taking the supremum, with respect to k, in both sides of Ž5.4. and using formula Ž3.7. together with the monotone convergence theorem, we obtain Ž5.2..

STRONG CONVERGENCE OF THE MAXIMUM LIKELIHOOD ESTIMATOR

1309

To prove Ž5.3., it is useful to put, for any k ) 1 and u g V, f k Ž u. [ inf f Ž v . q k dŽ u, v .rv g V 4 . First, observe that, due to the properness of f , f k is finite on V. On the other hand, an appeal to Fatou’s lemma shows that f is lsc on V. Further, for any u g D, p G 1 and k G 1, one can find vX s vX Ž u, p, k . g V such that f Ž vX . q k dŽ u, vX . F f k Ž u. q 1rp. Hence, for each u g D and k G 1, the following equality holds true:

Ž 5.5.

f k Ž u . s inf  f Ž vX Ž u, p, k . . q k d Ž u, vX Ž u, p, k . . : p G 1 4 .

Further, applying Etemadi’s strong law of large numbers to the sequence  g Ž X n , vX Ž u, p, k ..4n G 1 , we can see that, for every u g D, k G 1 and p G 1, there exists a negligible subset N2 Ž u, p, k . such that, for every v g V R N2 Ž u, p, k .,

f Ž vX Ž u, p, k . . s lim h n Ž v , vX Ž u, p, k . . .

Ž 5.6.

n

Put N2 [ D u g D D p G 1 D k G 1 N2 Ž u, p, k . and consider v g V R N2 . For any u g D and k G 1, we have lim sup h kn Ž v , u . F inf lim sup h n Ž v , v . q k d Ž u, v . . vgV

n

nª`

Restricting the infimum to the subset  vX Ž u, p, k . : p G 14 and using Ž5.6. and Ž5.5., we obtain lim sup h kn Ž v , u . F inf f Ž vX Ž u, p, k . , v . q k d Ž u, vX Ž u, p, k . . s f k Ž u . . n

pG1

So, we have proved, for each k G 1 and v g V R N2 ,

Ž 5.7.

lim sup h kn Ž v , u . F f k Ž u . n

; u g D.

Then, invoking once more the Lipschitz property, we conclude that Ž5.7. remains valid for all u g V. Finally, taking the supremum on k in both sides of Ž5.7. and using Ž3.8., we get Ž5.3.. I REMARK 5.2. Let us mention some extensions of Theorem 5.1. Ža. It is not difficult to check that the conclusion of this theorem still holds under the following less restrictive hypotheses: Ži. g is a normal integrand on E = V. Žii. There exist an integrable function b Ž?., a constant a ) 0 and u 0 g V such that, for any Ž x, u. in the product space E = V, g Ž x, u. q b Ž x . q a dŽ u, u 0 . G 0. Žb. Returning to Remark 3.5, it is readily seen that the conclusion of Theorem 5.1 also holds under the following local condition: for any u g V there exist a neighborhood W of u and an integrable function b Ž?. such that, for every Ž x, v . g E = V, g Ž x, v . G b Ž x .. Žc. More generally, an inspection of the above proof shows that if g is not assumed to be lsc on V, but only E m BŽ V .-measurable, the conclusion

1310

C. HESS

continues to hold at each point u such that g Ž x, ? . is lsc at u, for almost all x in E. REMARK 5.3. Results similar to Theorem 5.1 have been obtained by King and Wets w 23x when V is a reflexive separable Banach space and by Attouch and Wets w 3x when V is a separable Banach space. On the other hand, the measurably parametrized Lipschitz approximation studied in Section 4 has already been used by Castaing w 7x in order to prove epi-convergence results for certain sequences of integrands. 6. Proof of the main result. Observe first that, by condition Žc. in Section 2 and Proposition 4.2Ža., the function b defined by b Ž x . [ sup  f Ž x , u . : u g V 4 ˆ is finite valued and E-measurable. Also define the function F from V 2 into the extended reals by

Ž 6.1.

F Ž u, v . [

HE f Ž x , u . log

ž

bŽ x . f Ž x, v.

/

m Ž dx . .

REMARK 6.1. In Ž6.1., it is possible to replace the measure m by bm , which is still s-finite, and the function f by frb, so that F can be rewritten as F Ž u, v . [

1

HE f Ž x , u . log f Ž x , v . m Ž dx . .

Therefore, in view of the definition of F, f can be assumed to take its values in w 0, 1x . This will be assumed from now on. Further, as in Section 5, define the function f on V by f Ž v . [ E g Ž X, v . s F Ž u, v ., where g Ž x, v . [ ylog f Ž x, v .. Recall that u denotes the unknown value of the parameter to be estimated. REMARK 6.2. Ži. Recall that, using the convexity of the function t ª ylogŽ t . and Jensen’s inequality, it is readily shown that f Ž u. s infw f Ž v . : v g V x . Žii. In addition, because the convexity is strict, the inequality f Ž u. F f Ž u. implies u s u. PROOF OF THEOREM 2.1ŽA.. Here, the sample size n is assumed to be fixed. From condition Žc. it follows that the function Bn , defined on E n by

Ž 6.2.

Bn Ž x 1 , . . . , x n . [ sup  L n Ž x 1 , . . . , x n , u . : u g V 4 ,

x1 , . . . , x n g E n ,

is finite valued. For any fixed n and a ) 0, define the multifunction Gn, a on E n , with closed values in V, by Gn , a Ž x 1 , . . . , x n . [  u g V : L n Ž x 1 , . . . , x n , u . G Bn Ž x 1 , . . . , x n . y a 4 and also, the multifunction Gn such that Gn Ž x 1 , . . . , x n . [  u g V : L n Ž x 1 , . . . , x n , u . s Bn Ž x 1 , . . . , x n . 4 .

STRONG CONVERGENCE OF THE MAXIMUM LIKELIHOOD ESTIMATOR

1311

Now consider the n-fold product measure m n on the n-fold product space Ž E n , E n .. Using Proposition 4.2, it is not hard to prove the measurability of function Bn , and of multifunctions Gn and Gn, a , relative to the s-field Ž E n . m n which denotes the m n-completion of the product s-field E n on E n. Then, denoting by t n a selector of Gn and, for any a ) 0, by t n, a a selector of Gn, a , which are measurable with respect to Ž E n . m n , we obtain, for any Ž x 1 , . . . , x n . g E n,

Ž 6.3.

L n Ž x 1 , . . . , x n , t n Ž x 1 , . . . , x n . . s Bn Ž x 1 , . . . , x n .

and

Ž 6.4.

L n Ž x 1 , . . . , x n , t n , a Ž x 1 , . . . , x n . . G Bn Ž x 1 , . . . , x n . y a .

Clearly, t n is an MLE and t n, a an a-approximate MLE for the parameter u g V. Thus part ŽA. of Theorem 2.1 is proved. I Now, it will be convenient to introduce the following notation: g Ž x , u . [ ylog f Ž x , u .

; Ž x , u. g E = V

and, for every n G 1 and v g V, hnŽ v , u. [

1 n

n

1

Ý g Ž X i Ž v . , u . s y n log L n Ž X1Ž v . , . . . , X n Ž v . , u . .

is1

By the convention of Remark 6.1, g takes its values in the interval w 0, q`x . Further, it is clear that, for each n G 1 and v g V, the maximization of L n with respect to u is equivalent to the minimization of h nŽ v , ? .. Now, let us explain how epi-convergence is involved in our approach to the convergence of MLEs Žor more generally of M-estimators.. For each n G 1 and for almost all v in V, we have to consider the minimization problem

Ž Pn .

min  h n Ž v , u . : u g V 4 .

The SLLN for positive random variables implies that, for any u g V, E g Ž X, u. s lim n h nŽ v , u. for all v in V R N Ž u., where N Ž u. is a negligible subset of V Ždepending generally of u.. Thus it is natural to consider also the following minimization problem:

Ž P` .

min  E g Ž X , u . : u g V 4 .

Note that Ž Pn . is a stochastic minimization problem, in that the variable v appears in the objective function, whereas Ž P` . is a deterministic minimization problem in which v no longer appears. In Theorem 5.1, we have shown that, for almost all v g V, the sequence Ž h nŽ v , ? .. epi-converges on V to E g Ž X, ? .. We shall deduce, from this and from the sup-compactness assumption on f, the almost sure convergence of the MLE, as the sample size n tends to `.

1312

C. HESS

PROOF

OF

THEOREM 2.1ŽB..

Step 1. V is assumed to be compact. As already noted, we may assume that bŽ x . s 1 on E. Further, observe that hypothesis ŽH. implies that E g Ž X, ? . is proper, which allows us to apply Theorem 5.1. Thus, there exists a negligible subset N of V verifying, as n tends to `. E g Ž X , u . s lim e h n Ž v , u .

; u g V , ; v g V R N.

Now, let Ž a n . n G 1 be a decreasing sequence of positive reals converging to 0 and, for each n G 1, let t n be an a n-approximate MLE. Thus, for each v g V R N, t nŽ v . [ t nŽ X1Ž v ., . . . , X nŽ v .. satisfies h n Ž v , t n Ž v . . F inf h n Ž v , v . q a n . vgV

Consequently, using Proposition 3.2, we get

Ž 6.5.

lim sup h n Ž v , t n Ž v . . F lim sup inf h n Ž v , v . F inf E g Ž X , v . , n

n

vgV

vgV

whence from Remark 6.2Ži. lim sup h n Ž v , t n Ž v . . F E g Ž X , u . .

Ž 6.6.

n

On the other hand, for any v g V R N, one can extract a subsequence Ž t nŽ k .Ž v .. k G 1 converging to some v 0 g V Žthe subsequence and the limit point generally depend upon v .. Using Proposition 3.1, we obtain lim inf h nŽ k . Ž v , t nŽ k . Ž v . . G E g Ž X , v 0 . , k

whence by Ž6.6. E g Ž X, v 0 . F E g Ž X, u. which, by Remark 6.2Žii., implies v 0 s u. Therefore, u is the unique cluster point of the sequence Ž t nŽ v .. n G 1 so that u s lim n t nŽ v .. Step 2. The general case. Consider the embedding C from V into the Hilbert cube W, and define g X , hXn , for n G 1, and f X on E = W, V = W and W, respectively, by setting, for any v g V, x g E and w g C Ž V ., g X Ž x , w . [ g Ž x , Cy1 Ž w . . ,

hXn Ž v , w . [ h n Ž v , Cy1 Ž w . . ,

f X Ž w . [ f Ž Cy1 Ž w . . . These functions are taken to be equal to q` if w f C Ž V .. From these definitions, we deduce that, for any Ž v , w . g V = W, one has hXn Ž v , w . s

1 n

n

Ý gX Ž Xi Ž v . , w .

and

fX Ž w . s E gX Ž X , w . .

is1

For any x g E, the sup-compactness of f Ž x, ? . implies the inf-compactness of g Ž x, ? . on V, which, in turn, implies the lower semicontinuity of g X Ž x, ? . on W. Indeed, for any r g R and x g E, we have  w g W : g X Ž x, w . F r 4 s C Ž v g Vrg Ž x, v . F r 4.. Thus, for each n G 1, hXn is inf-compact, too. Further,

STRONG CONVERGENCE OF THE MAXIMUM LIKELIHOOD ESTIMATOR

1313

applying Fatou’s lemma, it is easily seen that f X is lsc on W. Now, let Ž t n . n G 1 be a sequence of an a n-approximate MLE and consider the sequence Ž tXn . n G 1 defined by tXnŽ v . [ C Ž t nŽ v ... As in the first step, we can show that the sequence Ž tXnŽ v .. n G 1 converges to some w g W. Then, using Ž3.5., it is not hard to see that w s C Ž u. for u g V, which is a solution of Ž P` .. Finally, the continuity of Cy1 yields u s lim n t nŽ v . a.s. I REMARK 6.3. Several variants of Theorem 2.1 could be given. For instance, using Remark 5.2Ža. instead of Theorem 5.1 in the proof of part ŽB., it is easily seen that condition ŽH. can be replaced by the two following conditions: Ža. There exist u 0 g V, a ) 0 and b Ž?. g L1 such that, for any Ž x, v . g E = V, f Ž x , v . F exp  a d Ž v, u 0 . q b Ž x . 4 .

Ž b.

HE f Ž x , u . log f Ž x , u . 4 y1

y

m Ž dx . - q`,

where, for any real u, uy[ maxŽyu, 0.. REMARK 6.4. It is interesting to make some additional remarks comparing our results with those obtained recently by Hoffmann-Jørgensen in the first chapter of his monograph w 20x . Ži. In w 20x , in order to study the consistency of estimators obtained by a maximization procedure, Hoffman-Jørgensen has introduced the notion of sequence of approximating maximums ŽSAMs.. In our setting, in terms of minimization Ždue to the use of epi-convergence instead of hypo-convergence., it can be observed that the sequence Ž t n . appearing in inequality Ž6.6. precisely corresponds to a SAM. In fact, the above proof shows that, in the presence of epi-convergence, a sequence Ž t n . of a n-approximate MLE’s, such that lim n a n s 0, is a SAM. Žii. On the other hand, in w 20x , no sup-compactness condition like Žc. is explicitly assumed. In our approach, this condition yields the finiteness of function bŽ?. and ensures that any limit point of the sequence Ž t nŽ v .. is a member of V. In w 20x , there is no similar requirement, since the accumulation points of sequences of MLE’s are allowed to go outside the given parameter space V. REMARK 6.5. Finally, it is important to point out that, in spite of its interest in explaining some pathological behaviors of the MLE Žsee Remark 2.3., condition ŽH. is not necessary for the MLE to be consistent, as the following example shows. Take E s NU s the set of strictly positive integers, V s the set of all probability measures on E Ž V is endowed with the l 1 metric. and m the measure on E which assigns the value 1 to each integer. Then we have f Ž x, v . s v x and bŽ x . ' 1. Using the SLLN and Lebesgue’s dominated convergence theorem, it is readily seen that the MLE is consistent, but hypothesis ŽH. need not be satisfied. Indeed, it suffices to assume

1314

C. HESS

that the true value of the parameter is a sequence u s Ž u n . in V such that Ý n G 1 u n logŽ u n .y1 s q`. Acknowledgments. I am indebted to Professors E. Lanery ´ and M. Valadier and Doctor V. Jalby for valuable discussions. I also thank Professor L. D. Brown as well as the referees for their numerous remarks which helped me to improve the preliminary versions of this paper. REFERENCES w 1x ARSENIN, W. J., LJAPUNOV, A. A. and CEGOLKOV, E. A. Ž1955.. Arbeiten zur deskriptiven mengenlehre. Deutscher Verlag der Wissenschaften, Berlin. w 2x ATTOUCH, H. Ž1984.. Variational Convergence for Functions and Operators. Pitman, Boston. w 3x ATTOUCH, H. and WETS, R. Ž1991.. Epigraphical processes: laws or large numbers for random lsc functions. Expose d’Analyse Convexe, Univ. Montpellier, ´ No. 13, Seminaire ´ France. w 4x BAHADUR, R. R. Ž1958.. Examples of inconsistency of maximum likelihood estimates. Sankhya Ser. A 20 207]210. w 5x BEER, G. Ž1993.. Topologies on Closed and Closed Convex Sets, Mathematics and Its Applications. Kluwer, Dordrecht. w 6x BROWN, L. D. and PURVES, R. Ž1973.. Measurable selections of extrema. Ann. Statist. 1 902]912. ` propos de l’existence des sections separement w 7x CASTAING, C. Ž1976.. A mesurables et separe´ ´ ´ ´ ment continues d’une multiapplication separement semi continue inferieurement. ´ ´ ´ Travaux Sem. ´ Anal. Convexe 6. w 8x CASTAING, C. and VALADIER, M. Ž1977.. Convex Analysis and Measurable Multifunctions. Lecture Notes in Math. 580. Springer, New York. w 9x DAL MASO, G. Ž1993.. An Introduction to G-Convergence. Birkhauser, Boston. ¨ w 10x DE GIORGI, E. Ž1979.. Convergence problems for functions and operators. Proceedings of the International Meeting on Recent Methods in Nonlinear Analysis, Roma 1978 ŽE. De Giorgi, E. Magenes, J. Mosco, Pitagora and Bologna, eds.. 131]188. w 11x DELLACHERIE, C. Ž1975.. Ensembles analytiques: theoremes de separation et applications. ´ ` ´ Seminaire de Probabilites IX. Lecture Notes in Math. 465. Springer, Berlin. ´ w 12x DUPACOVA, J. and WETS, R. Ž1988.. Asymptotic behavior of statistical estimators and of optimal solutions of stochastic optimization problems. Ann. Statist. 16 1517]1549. w 13x ETEMADI, N. Ž1981.. An elementary proof of the strong law of large numbers. Z. Wahrsch. Verw. Gebiete 55 119]122. w 14x HESS, C. Ž1991.. Convergence of conditional expectations for unbounded random sets. Math. Oper. Res. 16 627]649. w 15x HESS, C. Ž1991.. On multivalued martingales whose values may be unbounded: martingale selectors and Mosco convergence. J. Multivariate Anal. 39 175]201. w 16x HESS, C. Ž1991.. Epi-convergence of sequences of normal integrands and strong consistency of the maximum likelihood estimator. Cahier de Mathematique de la Decision, Univ. Paris Dauphine. 9121. w 17x HIMMELBERG, C. J. Ž1975.. Measurable relations. Fund. Math. 87 53]72. w 18x HIMMELBERG, C. J., PARTHASARATHY, T. and VAN VLECK, F. S. Ž1981.. On measurable relations. Fund. Math. 111 161]167. w 19x HOFFMANN-JøRGENSEN, J. Ž1970.. The Theory of Analytic Sets. Various Publications Series 10. Mat. Inst., Aarhus Univ. w 20x HOFFMAN-JøRGENSEN, J. Ž1992.. Asymptotic Likelihood Theory. Functional Analysis III. Various Publications Series 40. Matematisk Institute, Aarhus Univ. w 21x IBRAGIMOV, I. A. and HAS’MINSKI, R. Z. Ž1981.. Statistical Estimation, Asymptotic Theory. Springer, New York.

STRONG CONVERGENCE OF THE MAXIMUM LIKELIHOOD ESTIMATOR

1315

w 22x JALBY, V. Ž1993.. Semi-continuite, ´ convergence et approximation des applications vectorielles, Loi des grands nombres. Preprint. Laboratoire d’Analyse Convexe, Univ. Montpellier. w 23x KING, A. J. and WETS, R. Ž1991.. Epi-consistency of convex stochastic programs. Stochastics Stochastics Rep. 34 83]92. w 24x LANERY ´ , E. Ž1991.. Convergence presque sure ˆ du maximum de vraisemblance. C. R. Acad. Sci. Paris Ser. ´ I Math. 313 301]303. w 25x LE CAM, L. Ž1990.. Maximum likelihood: an introduction. Internat. Statist. Rev. 58 153]171. w 26x MOSCO, U. Ž1969.. Convergence of convex sets and of solutions of variational inequalities. Adv. in Math. 3 510]585. w 27x PENOT, J. P. and POMMELET, A. Ž1985.. Toward minimal assumptions for the infimal convolution regularization. Technical Report 23, AVAMAC, Univ. Perpignan. w 28x ROCKAFELLAR, R. T. Ž1976.. Integral functionals, normal integrands and measurable selections. Non-linear Operators and Calculus of Variations. Lecture Notes in Math. 543. Springer, Berlin. w 29x ROCKAFELLAR, R. T. and WETS, R. Ž1984.. Variational systems, an introduction. In Multifunctions and Integrands, Stochastic Analysis, Approximation and Optimization ŽG. Salinetti, ed... Lecture Notes in Math. 1091 1]54. Springer, Berlin. w 30x SAINTE-BEUVE, M. F. Ž1974.. On the extension of Von Neumann]Aumann’s theorem. J. Funct. Anal. 17 112]129. w 31x NOUGUES de Novikov et d’Arsenin. ` -SAINTE-BEUVE, M. F. Ž1981.. Une extension des theoremes ´ ` Sem. ´ Anal. Convexe 18. w 32x SAINT-RAYMOND, J. Ž1976.. Boreliens ´ `a coupes Ks . Bull. Soc. Math. France 104 389]400. w 33x WALD, A. Ž1949.. Note on the consistency of the maximum likelihood estimate. Ann. Math. Statist. 20 595]601. w 34x WIJSMAN, R. A. Ž1966.. Convergence of sequences of convex sets. II. Trans. Amer. Math. Soc. 123 32]45. CENTRE

DE RECHERCHE DE MATHEMATIQUES DECISION Ž CEREMADE. URA CNRS NO . 749 UNIVERSITE ´ PARIS DAUPHINE 75775 PARIS CEDEX 16 FRANCE E-MAIL : [email protected] DE LA