MAXIMUM LIKELIHOOD ESTIMATION OF A UNIMODAL DENSITY FUNCTION

by

EDWARD J. WEGMAN
Department of Statistics
University of North Carolina at Chapel Hill

Institute of Statistics Mimeo Series No. 608
January 1969

This research arises in part from the author's doctoral dissertation submitted to the Department of Statistics, The University of Iowa. Supported in part by PHS Research Grant No. GM-14448-01 from the Div. of General Medical Sciences and in part by N.S.F. Development Grant GU-2059.
Maximum Likelihood Estimation of a Unimodal Density Function

Edward J. Wegman

1. Introduction.

Robertson [4] has described a maximum likelihood estimate of a unimodal density when the mode is known. This estimate is represented as a conditional expectation given a σ-lattice. Discussions of such conditional expectations are given by Brunk [1] and [2].
A σ-lattice, L, of subsets of a measure space (Ω, A, λ) is a collection of subsets of Ω closed under countable unions and countable intersections and containing both Ω and ∅. A function f is measurable with respect to a σ-lattice, L, if the set [f > a] is in L for each real a. Here Ω is the real line, A is the collection of Borel sets, and λ is Lebesgue measure. Let L₂ be the set of square-integrable functions and L₂(L) be those members of L₂ which are measurable with respect to L.
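As an illustration (a small numerical sketch, not from the original; the grid points and function values below are arbitrary choices), a step function is measurable with respect to the σ-lattice of intervals containing [L, R] exactly when each nonempty level set [f > a] is an interval containing [L, R]:

```python
# Grid values of a step function that is nondecreasing up to the
# modal interval [L, R] = [-1, 1], constant on it, and nonincreasing
# afterward.  These values are hypothetical, chosen for illustration.
xs = [-3, -2, -1, 0, 1, 2, 3]
fs = [0.1, 0.3, 0.5, 0.5, 0.5, 0.2, 0.05]

def level_set(a):
    """Grid points in the level set [f > a]."""
    return [x for x, f in zip(xs, fs) if f > a]

for a in [0.0, 0.15, 0.4]:
    s = level_set(a)
    # each nonempty level set is a contiguous run of grid points ...
    assert s == xs[xs.index(s[0]): xs.index(s[-1]) + 1]
    # ... that contains the modal interval [-1, 1]
    assert -1 in s and 1 in s
```

Lowering the flat value on [0, 1] below the value at −1 would break the interval property of the level sets, which is why such a function fails to be measurable with respect to this lattice.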
Brunk [1] shows the following definition of conditional expectation with respect to L is suitable.

Definition. If f ∈ L₂, then E(f | L), the conditional expectation of f given L, is the element g of L₂(L) with

(1.1)  ∫ (f − g) · θ(g) dλ = 0

for every bounded Borel-measurable function θ, and

(1.2)  ∫_A (f − g) dλ ≤ 0  for every A ∈ L.
[Figure: both estimates are based on a sample of size 90.]
+ ∫_[R,R'] ĝ_nm(y_r(n)) dλ − ∫_[L,L'] ĝ_nm(y_f(n)) dλ.

Combining the first and third terms of the right-hand side and observing that λ[L, R] = λ[L, L'] + λ[L', R'] + λ[R', R], we have, for y in [L', R'], g(y) > ĝ_nm(y). Finally, observing that both λ[R, R'] and λ[L, L'] are ≥ 1/2n, and that g(y) > ĝ_nm(y) for all y in [L', R'], it follows that g(y) > ĝ_nm(y) for all y in [L', R]. This implies that g has a larger likelihood product than ĝ_nm, contrary to the choice of ĝ_nm having proper end points. Hence either L or R must lie in {y_1, ..., y_n}. If the latter case holds, an estimate equally likely as ĝ_nm may be computed from ĝ_nm(y_f(n)) or ĝ_nm(y_r(n)). If one or more of the sets (−∞, L), [L, R], (R, ∞) contain no observations, we may define an alternate function, equally likely as ĝ_nm, in a similar manner to the above. This completes the proof.

If the modal interval is unspecified, ĝ_nm may clearly be defined with modal interval [y_f(n)+1, y_r(n)−1] in a similar manner to the above. There are at most 2n intervals of the form [L, R] where one of L or R belongs to {y_1, ..., y_n}. Compute the 2n estimates ĝ_nm_1, ..., ĝ_nm_2n corresponding to these 2n intervals. Calculate the likelihood products,

∏_{j=1}^n ĝ_nm_i(y_j),   i = 1, 2, ..., 2n.

Select the maximum of these 2n numbers and let m_n be the corresponding center mode. (Notice that if 2 or more of the likelihood products are equal, we could have some ambiguity of definition of ĝ_nm; let us adopt the convention of choosing the estimate with the smallest center mode.) Let ĝ_n be that estimate. Clearly the likelihood product of ĝ_n is as large as that of any other estimate.

Before closing this section, we note the following lemma.

Lemma 2.2. The maximum likelihood estimate ĝ_nm which is measurable with respect to L([L, R]) is given by E(g_nm | L([L, R])), where g_nm is as before.
Proof: Let us abbreviate, for purposes of this lemma, g_nm by g and ĝ_nm by ĝ. We want to show ĝ = E(g | L([L, R])). Clearly, ĝ = E(g | L_n([L, R])), and ĝ is measurable with respect to L([L, R]). Property (1.1) is easy to verify, so let us verify (1.2) for every A in L([L, R]). We may assume A is a closed interval, [a, b], with y_f(n) ≤ a ≤ L or R ≤ b ≤ y_r(n), or both. Suppose

∫_[a,b] (g − ĝ) dλ > 0,

and say a lies in [y_i, y_i+1) and b in [y_j, y_j+1). Then since g and ĝ are both constant on [y_j, y_j+1], it follows that ∫_[b,y_j+1] (g − ĝ) dλ > 0. Thus

∫_[a,y_j+1] (g − ĝ) dλ = ∫_[a,b] (g − ĝ) dλ + ∫_[b,y_j+1] (g − ĝ) dλ.

Since both terms on the right-hand side are positive, we have

∫_[a,y_j+1] (g − ĝ) dλ > 0.

Rewriting,

∫_[a,y_i+1] (g − ĝ) dλ + ∫_[y_i+1,y_j+1] (g − ĝ) dλ > 0.

The last term on the left-hand side is nonpositive by (1.2), since [y_i+1, y_j+1] belongs to L_n([L, R]). Thus ∫_[a,y_i+1] (g − ĝ) dλ > 0. Since g and ĝ are also constant on [y_i, y_i+1], from this we have g(y_i) > ĝ(y_i). Hence ∫_[y_i,a] (g − ĝ) dλ > 0, so that

∫_[y_i,y_j+1] (g − ĝ) dλ = ∫_[y_i,a] (g − ĝ) dλ + ∫_[a,y_j+1] (g − ĝ) dλ > 0.

But [y_i, y_j+1] ∈ L_n([L, R]), which means by (1.2) that ∫_[y_i,y_j+1] (g − ĝ) dλ ≤ 0. This is a contradiction, so our assumption that ∫_[a,b] (g − ĝ) dλ > 0 is false. A similar argument follows if R < b ≤ y_r(n), or both. This concludes the proof.
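The construction of this section can be sketched numerically. The code below is a hypothetical sketch, not the paper's algorithm: it builds a histogram-type unimodal estimate with a fixed modal interval [y_k, y_l] by giving each spacing between order statistics mass 1/n and then applying the pool-adjacent-violators algorithm, which yields the nondecreasing least-squares fit left of the modal interval and the nonincreasing fit right of it. The additional pooling needed when a side exceeds the flat modal value is omitted here.

```python
import numpy as np

def pava(values, weights):
    """Weighted least-squares isotonic (nondecreasing) fit by
    pool-adjacent-violators; returns the fitted sequence as a list."""
    means, wts, counts = [], [], []
    for v, w in zip(values, weights):
        means.append(float(v)); wts.append(float(w)); counts.append(1)
        # pool adjacent blocks while the block means decrease
        while len(means) > 1 and means[-2] > means[-1]:
            m2, w2, c2 = means.pop(), wts.pop(), counts.pop()
            m1, w1, c1 = means.pop(), wts.pop(), counts.pop()
            w12 = w1 + w2
            means.append((m1 * w1 + m2 * w2) / w12)
            wts.append(w12)
            counts.append(c1 + c2)
    fit = []
    for m, c in zip(means, counts):
        fit.extend([m] * c)
    return fit

def unimodal_heights(y, k, l):
    """Spacing heights of a histogram-type unimodal estimate whose
    modal interval is [y[k], y[l]] (both taken to be observations, in
    the spirit of the 'proper end points' above).  Each spacing
    (y[i], y[i+1]) carries mass 1/n."""
    y = np.sort(np.asarray(y, dtype=float))
    n = len(y)
    lengths = np.diff(y)
    naive = (1.0 / n) / lengths                 # naive spacing heights
    left = pava(naive[:k], lengths[:k])         # nondecreasing left of y[k]
    right = pava(list(naive[l:][::-1]),         # nonincreasing right of y[l]
                 list(lengths[l:][::-1]))[::-1]
    # flat modal value: mass between y[k] and y[l] spread over its length
    middle = [((l - k) / n) / (y[l] - y[k])] * (l - k)
    return np.concatenate([left, middle, right])
```

Because pooling preserves weighted sums, the fitted heights integrate to (n − 1)/n, the total mass assigned to the spacings; looping k and l over observation pairs and comparing likelihood products would mimic the selection step described above.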
3. Conditional Expectations of the True Density.

In this section, we investigate the conditional expectation, f* = E(f | L([L, R])), of the true density f with respect to the σ-lattice of intervals containing the interval [L, R]. Notice that ∫ f² dλ < ∞, so that f* is well defined. Let L' = min{L, M} and let

K = (R − L)⁻¹ · ∫_[L,R] f dλ.

Theorem 3.1. In case i, f(L) ≥ K ≥ f(R), let

a = sup{y ≤ L' : (R − y)⁻¹ · ∫_[y,R] f dλ ≥ f(y)}.

The set defining a is nonempty, since f(y) ≤ K for y ≤ L' sufficiently small, so a, indeed, exists. Then

f*(y) = (R − a)⁻¹ · ∫_[a,R] f dλ  for y in [a, R],

and f* = f off [a, R]. Case ii, f(R) ≥ K ≥ f(L), follows in a manner similar to case i. Case iii, f(L) > K and f(R) > K, is impossible, since K is an average of f over [L, R] and the minimum of the unimodal density f on [L, R] is attained at L or R.

To see that f* is, indeed, E(f | L([L, R])), notice that f* is measurable with respect to L([L, R]); for A containing [a, R] it is clear that ∫_A (f − f*) dλ = 0, and hence it remains to verify ∫_A (f − f*) dλ ≤ 0 for A with [L, R] ⊂ A ⊂ [a, R], which follows from the defining property of a.
For y_0 > R or y_0 < L, we use methods quite analogous to those found in Robertson [4] or Wegman [5]. Let us turn our attention to L < y_0 < R. Let η be any positive number. For sufficiently large n, [L − η, R + η] ∈ L([L_n, R_n]) and y_0 ∈ [L_n, R_n]. Thus if t = ĝ_nm_n(y_0), the set [ĝ_nm_n > t] is contained in [L − η, R + η] eventually, and it is not difficult to see

(n* + 2)/n ≥ ∫_[L−η,R+η] ĝ_nm_n dλ ≥ (n* − 1)/n,

where n* is the number of observations in [L − η, R + η]. Let us write F_n(x−) for lim_{y↑x} F_n(y). Then we may rewrite the above inequality as

F_n(R + η) − F_n(L − η−) + 2/n ≥ ∫_[L−η,R+η] ĝ_nm_n dλ ≥ F_n(R + η) − F_n(L − η−) − 1/n.

Hence we have

lim_{n→∞} ∫_[L−η,R+η] ĝ_nm_n dλ = F(R + η) − F(L − η−) = ∫_[L−η,R+η] f dλ.

From this,

lim inf_{n→∞} ĝ_nm_n(y_0) ≥ [R − L + 2η]⁻¹ · ∫_[L−η,R+η] f dλ.

But this is true for all η.
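The step F_n(R + η) − F_n(L − η−) → F(R + η) − F(L − η−) is just the strong law applied to interval frequencies. A quick numerical sketch (the standard normal density, interval, sample size, and seed are arbitrary choices, not from the paper):

```python
import math
import random

random.seed(2)
L, R, eta = -0.5, 0.5, 0.1

def Phi(x):
    # standard normal distribution function F
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# true mass F(R + eta) - F(L - eta) of the enlarged interval
true_mass = Phi(R + eta) - Phi(L - eta)

n = 20000
y = [random.gauss(0.0, 1.0) for _ in range(n)]
# empirical frequency of the interval, i.e. F_n(R + eta) - F_n(L - eta-)
emp_mass = sum(1 for v in y if L - eta <= v <= R + eta) / n

# the empirical frequency approaches the true probability
assert abs(emp_mass - true_mass) < 0.02
```

With a continuous F the left limit F(L − η−) equals F(L − η), so the inclusive count above is a faithful stand-in for the expression in the text.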
On the other hand, if t = ĝ_nm_n(y_0), then T_t = [ĝ_nm_n ≥ t] is an interval, say T_t = [y_i(n), y_j(n)], with y_i(n) ≤ L_n and R_n ≤ y_j(n). We wish to show lim y_i(n) = L and lim y_j(n) = R. Clearly lim sup y_i(n) ≤ L, so to see this, it is necessary to show lim inf y_i(n) ≥ L. If this is not true, there is a subsequence, which we shall again denote y_i(n) for simplicity, and a positive constant, η, such that y_i(n) ≤ L − η. For the moment, ĝ_nm_n will mean the subsequence corresponding to y_i(n). Since ĝ_nm_n is constant on T_t and T_t ⊃ [L − η, R) eventually, we have ĝ_nm_n = ĝ_nm_n(L − η) eventually on [L − η, R). Since ĝ_nm_n(L − η) converges to f_m(L − η), we have lim sup ĝ_nm_n = f_m(L − η) on [L − η, R).
Eventually, [L − η, R + η] ∈ L([L_n, R_n]), so that by (1.2)

∫_[L−η,R+η] g_nm_n dλ ≤ ∫_[L−η,R+η] ĝ_nm_n dλ,

where ĝ_nm_n is the subsequence corresponding to y_i(n). By Fatou's lemma,

lim sup ∫_[L−η,R+η] ĝ_nm_n dλ ≤ ∫_[L−η,R+η] lim sup ĝ_nm_n dλ.

But lim sup ĝ_nm_n(x) ≤ f_m(L − η) for all x in [L − η, R + η], so that

lim sup ∫_[L−η,R+η] ĝ_nm_n dλ ≤ ∫_[L−η,R+η] f_m(L − η) dλ.

We have just seen

lim_{n→∞} ∫_[L−η,R+η] ĝ_nm_n dλ = ∫_[L−η,R+η] f dλ,

so

(4.3)  ∫_[L−η,R+η] f dλ ≤ ∫_[L−η,R+η] f_m(L − η) dλ.

But [L, R] must contain M, since otherwise the characterization of case i in Theorem 3.1 could not hold, and since f has a unique mode, f exceeds f_m(L − η) on some neighborhood of M. Hence (4.3) cannot hold. Thus lim y_i(n) = L. Similarly, y_j(n) converges to R.
Writing

ĝ_nm_n(y_0) = (y_j(n) − y_i(n))⁻¹ · ∫_[y_i(n),y_j(n)] ĝ_nm_n dλ = (y_j(n) − y_i(n))⁻¹ · (F_n(y_j(n)) − F_n(y_i(n))) + O(1/n),

it is clear by (4.1) that

lim_{n→∞} ĝ_nm_n(y_0) = (R − L)⁻¹ · (F(R) − F(L))

for L < y_0 < R, so that ĝ_nm_n converges to f* on (L, R).

Suppose now that m − ½ε, m + ½ε, m' − ½ε, and m' + ½ε are in the support of f. Pick c and d so that [c, d] contains both [m − ½ε, m + ½ε] and [m' − ½ε, m' + ½ε]; off [c, d], f_m, f_m', and f agree.
In the next lemma, ĝ_n will mean that subsequence of the sequence of maximum likelihood estimates for which the center modes converge to m'; by f_n* we shall mean the maximum likelihood estimate whose mode is m, so that L = m − ½ε and R = m + ½ε.

Lemma 5.4. Let y_i* be the smallest observation greater than y_i for which ĝ_n has a jump. For sufficiently large n,

ĝ_n(y_i) = (y_i* − y_i)⁻¹ · ∫_[y_i,y_i*] ĝ_n dλ.

Proof: For sufficiently large n, m_n − ½ε > c, and as soon as this holds, ĝ_n and f_n* agree at y_i and y_i*. This, together with (5.1) and (5.2), shows

(5.3)  ĝ_n(y_i) ≥ (y_i* − y_i)⁻¹ · ∫_[y_i,y_i*] ĝ_n dλ.

We wish to show equality holds in the last inequality. Let t_0 be the smallest observation greater than y_i* for which ĝ_n has a jump, and let T_t = [y_j*, y_k*]. Again by use of (4.2), we have

(5.4)  ĝ_n(t_0) ≥ (y_i − y_j*)⁻¹ · ∫_[y_j*,y_i] ĝ_n dλ.

Finally, letting t = ĝ_n(y_i) and using (4.1), we have

(5.5)  ĝ_n(y_i) ≤ (y_i* − y_i)⁻¹ · ∫_[y_i,y_i*] ĝ_n dλ.

Using (5.3), (5.4) and (5.5), ĝ_n(t_0) ≥ ĝ_n(y_i). But y_i ≠ y_i* is a jump point in [f_n* > t]. Thus our assumption is false, and equality holds in (5.3). This proves the lemma.
For sufficiently large n, f_n* and ĝ_n agree on (c − η, d + η)^c with probability one. Let k = min{f(c − η), f(d + η)}. Because f_n* converges to f_m almost uniformly with probability one and ĝ_n converges to f_m' almost uniformly with probability one, we have f_n* > ½k and ĝ_n > ½k on [c − η, d + η] eventually; moreover, log(f_n*) converges to log(f_m) almost uniformly with probability one and log(ĝ_n) converges to log(f_m') almost uniformly with probability one. Let δ > 0 and pick the set A of arbitrarily small measure so that P(A) · log(kε/2) > −δ. Now if y_1, ..., y_n are the ordered observations, let

L_n(f_n*) − L_n(ĝ_n) = (1/n) Σ_{i=1}^n {log(f_n*(y_i)) − log(ĝ_n(y_i))}.

Since f_n* and ĝ_n agree on (c − η, d + η)^c, only observations in (c − η, d + η) contribute to this sum eventually. Thus, if we let Σ_1 be the summation over y_i in (c − η, d + η) ∩ A, then, since f_n* > ½k and each of the estimates is bounded above by 1/ε (the modal interval has length ε),

(1/n) Σ_1 {log(f_n*(y_i)) − log(ĝ_n(y_i))} ≥ (n*/n) · log(kε/2) > −δ

eventually with probability one, where n* is the number of observations in (c − η, d + η) ∩ A; this follows since (n*/n) converges to P((c − η, d + η) ∩ A) ≤ P(A) with probability one. Similarly, if Σ_2 is the summation over y_i in [c − η, d + η] − A, then by the uniform convergence properties of log(f_n*) and log(ĝ_n) on [c − η, d + η] − A, we have, eventually with probability one,

(1/n) Σ_2 {log(f_n*(y_i)) − log(ĝ_n(y_i))} ≥ (1/n) Σ_2 {log(f_m(y_i)) − log(f_m'(y_i))} − δ.
In a similar manner, one may show eventually

−(1/n) Σ_3 {log(f_m(y_i)) − log(f_m'(y_i))} > −δ

with probability one for sufficiently large n, where Σ_3 is the sum over y_i in (c − η, d + η) ∩ A. Combining these bounds, and using the fact that f_m = f_m' = f on (c − η, d + η)^c, we have, eventually with probability one,

L_n(f_n*) − L_n(ĝ_n) ≥ (1/n) Σ_{i=1}^n {log(f_m(y_i)) − log(f_m'(y_i))} − 3δ.

By the Kolmogorov Strong Law, with probability one

lim_{n→∞} (L_n(f_n*) − L_n(ĝ_n)) ≥ ∫ log(f_m) f dλ − ∫ log(f_m') f dλ − 3δ.

Since m is the unique mode of f, ∫ log(f_m) f dλ > ∫ log(f_m') f dλ, and since δ was arbitrary, with probability one

lim_{n→∞} (L_n(f_n*) − L_n(ĝ_n)) > 0.

This is equivalent to f_n* eventually having a larger likelihood product than ĝ_n, which is a contradiction. Hence with probability one, {m_n} converges to m.
6. Summary.

For ε > 0, we have given a maximum likelihood estimate of a unimodal density. This estimate is a strongly consistent estimate of f_m = E(f | L([m − ½ε, m + ½ε])). Here m is the mode of f and f_m = f except on an interval of length ε. Of course, H(m) = ∫ log(f_m) f dλ. If ε = 0, the same method yields a maximum likelihood estimate. The lack of a uniform bound, however, causes the consistency arguments described in this paper to fail. Hence the asymptotic behavior of this sort of estimate is unknown. Wegman [5] describes a consistency argument for a related continuous estimate; the analogue for ĝ_n is not difficult.

7. Acknowledgements.
I wish to thank Professor Tim Robertson for
suggesting the topic from which this work arose and for his guidance as my thesis advisor at the University of Iowa.
REFERENCES

[1] Brunk, H.D. (1963). "On an extension of the concept of conditional expectation," Proc. Amer. Math. Soc. 14, 298-304.

[2] Brunk, H.D. (1965). "Conditional expectation given a σ-lattice and applications," Ann. Math. Statist. 36, 1339-1350.

[3] Robertson, Tim (1966). "A representation for conditional expectations given σ-lattices," Ann. Math. Statist. 37, 1279-1283.

[4] Robertson, Tim (1967). "On estimating a density which is measurable with respect to a σ-lattice," Ann. Math. Statist. 38, 482-493.

[5] Wegman, E.J. (1968). "A note on estimating a unimodal density," Institute of Statistics Mimeo Series No. 599, The Univ. of North Carolina at Chapel Hill. (Submitted to Ann. Math. Statist.)