The estimation of a convex domain when inside and outside observations are available

Marcel REMON
May 18, 1993
Abstract. The estimation of convex sets when inside and outside observations are available is often needed in current research applications. The key idea in this presentation is to estimate the measure of a convex domain when inside and outside realizations are randomly observed. One can think here of oil field detection, population polls, pattern recognition, etc. This problem is known in the literature as the third problem of Grenander [1]. Despite the apparent simplicity of the problem, it is difficult to use, for the estimation of the original convex set, all the relevant information one gets, especially the information coming from the outside observations. In 1977, Ripley and Rasson [12] proposed a solution to a similar problem formulated by D.G. Kendall: the estimation of a convex domain when only inside observations are available; see also Moore [6]. The solution in that case is a dilatation of the convex hull statistic of the inside data. We propose here an estimator for the measure of the unknown convex domain. This estimator is a generalization of the one proposed by Ripley and Rasson [12]. A comparison with another possible estimator is given.
Keywords: Poisson point process, Lebesgue measure, convex hull, invariant statistics, pattern recognition, Grenander's problems.
1 The inside/outside problem: presentation

Suppose that $X$ is a Poisson point process within a fixed window $F \subset \mathbb{R}^d$. In $F$, we have a compact convex domain $D$. We observe $(n+m)$ realizations of $X$ in $F$, of which $n$ turn out to be inside the domain $D$ and $m$ outside $D$. Let us denote by $Y$ and $Z$ the respective restrictions of $X$ to the domain $D$ and to its complement $\bar{D}$ in $F$. We suppose that the Poisson process is homogeneous on $D$ with intensity $\lambda_1$ and on $\bar{D}$ with intensity $\lambda_2$. We want to estimate the unknown convex domain $D$. This problem is indeed the third problem of Grenander [1]. Grenander's second problem, which consists of the estimation of a convex set $D$ from inside observations only, was solved by many authors: Rasson in 1976 [8], Ripley and Rasson in 1977 [12], Rasson in 1979 [9] and Moore in 1984 [6].
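This two-intensity model is easy to simulate by thinning a dominating homogeneous process. The sketch below is ours, not the paper's: it assumes numpy, takes $F$ to be the unit square, and the function names (`simulate_inside_outside`, `is_inside`) and the disc-shaped $D$ are illustrative choices.

```python
import numpy as np

def simulate_inside_outside(lam1, lam2, is_inside, rng):
    """Sample a Poisson process on the unit-square window F with intensity
    lam1 on D and lam2 on its complement, by thinning a dominating
    homogeneous process of intensity max(lam1, lam2)."""
    lam_max = max(lam1, lam2)
    n_total = rng.poisson(lam_max)                      # m(F) = 1 for the unit square
    pts = rng.random((n_total, 2))                      # uniform candidate points in F
    inside = np.array([is_inside(p) for p in pts], dtype=bool)
    lam = np.where(inside, lam1, lam2)
    keep = rng.random(n_total) < lam / lam_max          # thinning step
    return pts[keep & inside], pts[keep & ~inside]      # (Y, Z)

# Example: D is the disc of radius 0.3 centred at (0.5, 0.5).
rng = np.random.default_rng(0)
in_disc = lambda p: (p[0] - 0.5) ** 2 + (p[1] - 0.5) ** 2 <= 0.3 ** 2
y, z = simulate_inside_outside(200.0, 50.0, in_disc, rng)
```

With these intensities one expects roughly $\lambda_1 m(D) \approx 57$ inside and $\lambda_2 m(\bar{D}) \approx 36$ outside points per realization.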
2 Likelihood function and minimal sufficient statistic

Following the same decomposition as in Ripley and Rasson [12], we write $D = (g(D), s(D))$ where:

$g(D)$ is the centroid of the domain $D$;
$s(D) = D - g(D)$ is the shape of $D$ (the translation of $D$ to the origin).

For all $A, B \subset \mathbb{R}^d$:
$H(A)$ is the convex hull of the set $A$;
$A + B$ is the vector sum of $A$ and $B$;
$A \ominus B = \{ y \mid y + B \subset A \}$ (Minkowski subtraction);
$m(D)$ is the Lebesgue measure of $D$ in $\mathbb{R}^d$.

Let us denote the $n$ realizations of the Poisson point process $X$ inside $D$ by $Y = (Y_1, Y_2, \ldots, Y_n)$ and the $m$ outside realizations by $Z = (Z_1, Z_2, \ldots, Z_m)$. Writing the likelihood with these notations, we have:
$$L_D((y,z) \mid (n+m) \text{ observations in } F)$$
$$= L_D((y,z) \mid n \text{ inside and } m \text{ outside observations}) \cdot L_D(n \text{ observations} \in D, \; m \text{ observations} \in \bar{D} \mid (n+m) \text{ observations in } F)$$
$$= \left\{ \prod_{i=1}^{n} \frac{I_D(y_i)}{m(D)} \prod_{j=1}^{m} \frac{I_{\bar{D}}(z_j)}{m(\bar{D})} \right\} \cdot C_{n+m}^{n} \, \frac{[\lambda_1 m(D)]^n \, [\lambda_2 m(\bar{D})]^m}{[\lambda_1 m(D) + \lambda_2 m(\bar{D})]^{n+m}}$$
We see that the likelihood function depends on $\lambda_1, \lambda_2$ only through the ratio $\lambda = \lambda_1 / \lambda_2$. The likelihood becomes
$$L_D((y,z) \mid (n+m) \text{ observations in } F) = C_{n+m}^{n} \, \frac{\lambda^n}{[\lambda m(D) + m(\bar{D})]^{n+m}} \prod_{i=1}^{n} I_D(y_i) \prod_{j=1}^{m} I_{\bar{D}}(z_j)$$
$$= K(\lambda, m(D), m(\bar{D}), n, m) \prod_{i=1}^{n} I_D(y_i) \prod_{j=1}^{m} I_{\bar{D}}(z_j)$$
For simplicity, we denote the function $K(\lambda, m(D), m(\bar{D}), n, m)$ by $K(m(D), \lambda)$. We finally have
$$L_D((y,z) \mid (n+m) \text{ observations in } F) = K(m(D), \lambda) \, I_D(H(y)) \, I_{\bar{D}}(J(z)),$$
where $H(Y)$ is the convex hull statistic and $J(Z)$ is the "shadow" statistic defined in Hachtel, Meilijson and Nadas [2]. $J(Z)$ is defined by
$$J(z_1, z_2, \ldots, z_m) = \bigcup_{z_i \in \bar{D}} \{ a \in \mathbb{R}^d \mid a = z_i + \alpha (z_i - b), \; \alpha \geq 0, \; b \in H(y) \}$$

The pair $(H(Y), J(Z))$ is a minimal sufficient statistic for the estimation of $D$. See figure 1.
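Membership in the shadow can be tested without constructing $J(Z)$ explicitly: writing $z_i = t\,a + (1-t)\,b$ with $b \in H(y)$ and $t \in (0,1]$ gives $a = z_i + \frac{1-t}{t}(z_i - b)$, so $a$ lies in the shadow of $z_i$ exactly when $z_i \in \mathrm{conv}(H(y) \cup \{a\})$ (the case $z_i \in H(y)$ is excluded since $z_i$ is an outside point). A small sketch under these observations, assuming numpy and scipy (neither is used in the paper; the helper names are ours):

```python
import numpy as np
from scipy.spatial import Delaunay

def in_hull(points, x):
    """True if x lies in the convex hull of `points`."""
    return Delaunay(points).find_simplex(np.atleast_2d(x))[0] >= 0

def in_shadow(y, z, a):
    """True if the query point `a` belongs to the shadow J(z_1,...,z_m):
    a = z_i + alpha*(z_i - b), alpha >= 0, b in H(y), for some outside
    point z_i, i.e. z_i lies in conv(H(y) U {a})."""
    return any(in_hull(np.vstack([y, a[None, :]]), zi) for zi in z)

# Toy data: four inside points and one outside point.
y = np.array([[0.4, 0.4], [0.6, 0.4], [0.6, 0.6], [0.4, 0.6]])
z = np.array([[0.8, 0.5]])
in_shadow(y, z, np.array([0.9, 0.5]))   # a point "behind" z_1, seen from H(y)
in_shadow(y, z, np.array([0.5, 0.5]))   # a point inside H(y)
```

Such a membership test also allows $m(J(Z))$ to be approximated by Monte Carlo when the shadow has no tractable closed form.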
Figure 1
3 Estimation of the area m(D)

For most practical examples, the real parameter of interest is the measure of $D$, the exact description of the domain $D$ being of secondary interest. One would first think of the maximum likelihood estimator of $m(D)$. However, as expected with uniform models, this gives very unsatisfactory estimations. Indeed, this maximum likelihood estimator always under-estimates or over-estimates the area $m(D)$, depending on the relative number of inside and outside observations. The computation of such an estimator is moreover very difficult, except in the one-dimensional case. The natural solution is to find an unbiased estimator of $m(D)$. The estimator we propose here is a quasi-unbiased estimate of $m(D)$. It is a generalization of the estimator proposed by Ripley and Rasson [12]. Let $A_n$ and $V_n$ denote the area and the number of extreme points of $H(Y)$. Let $B_m$ and $V_m$ be the corresponding notations for $J(Z)$. The reasoning of Ripley and Rasson [12] can be developed here for both $H(Y)$ and $J(Z)$, taking the expectation over the $n$ inside and $m$ outside observations. We add a new inside observation $Y_{n+1}$
or a new outside point $Z_{m+1}$. Then,
$$P[Y_{n+1} \text{ is an extreme point of } H(\{Y_1, \ldots, Y_{n+1}\}) \mid \{Y_1, \ldots, Y_n, Z_1, \ldots, Z_m\}] = 1 - \frac{A_n}{m(D)}$$
$$P[Z_{m+1} \text{ is an extreme point of } J(\{Z_1, \ldots, Z_{m+1}\}) \mid \{Y_1, \ldots, Y_n, Z_1, \ldots, Z_m\}] = 1 - \frac{B_m}{m(\bar{D})}$$
Thus
$$P[Y_{n+1} \text{ is an extreme point}] = 1 - \frac{E[A_n]}{m(D)}, \qquad P[Z_{m+1} \text{ is an extreme point}] = 1 - \frac{E[B_m]}{m(\bar{D})}$$
So
$$E[V_{n+1}] = \sum_{i=1}^{n+1} P[Y_i \text{ is an extreme point}] = (n+1)\left[1 - \frac{E[A_n]}{m(D)}\right]$$
$$E[V_{m+1}] = \sum_{i=1}^{m+1} P[Z_i \text{ is an extreme point}] = (m+1)\left[1 - \frac{E[B_m]}{m(\bar{D})}\right]$$
and
$$m(D) = \frac{E[A_n]}{1 - \frac{E[V_{n+1}]}{n+1}}, \qquad m(\bar{D}) = \frac{E[B_m]}{1 - \frac{E[V_{m+1}]}{m+1}}$$

$A_n$ and $V_{n+1}$ do not depend on the realization of $Z$, because the outside observations do not enter into the construction of $H(Y)$. On the contrary, $B_m$ and $V_{m+1}$ do depend on the realization of $Y$, through the construction of $J(Z)$. Both $E[V_{n+1}]$ and $E[V_{m+1}]$ depend on $D$ only through its normalized shape $\frac{s(D)}{m(D)}$. We then have two unbiased estimators of $m(D)$:
$$\widehat{m}_1(D) = \frac{m(H(Y))}{1 - \frac{E[V_{n+1}]}{n+1}}, \qquad \widehat{m}_2(D) = m(F) - \frac{m(J(Z))}{1 - \frac{E[V_{m+1}]}{m+1}}$$
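In one dimension these quantities are fully explicit and the identity $m(D) = E[A_n]/(1 - E[V_{n+1}]/(n+1))$ can be checked directly: for $n$ uniform points on an interval of length $L$, $A_n$ is the sample range with $E[A_n] = L\,(n-1)/(n+1)$, and an interval always has $V_{n+1} = 2$ extreme points. A Monte Carlo verification (numpy assumed; the constants are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
L, n, reps = 2.0, 10, 200_000
y = L * rng.random((reps, n))            # n uniform inside points on D = [0, L]
A_n = y.max(axis=1) - y.min(axis=1)      # hull (interval) length per replication
E_A = A_n.mean()                          # Monte Carlo estimate of E[A_n] = L (n-1)/(n+1)
V_next = 2.0                              # V_{n+1} = 2: an interval has two extreme points
m_hat = E_A / (1 - V_next / (n + 1))      # recovers m(D) = L
```

The inflation factor $1/(1 - 2/(n+1)) = (n+1)/(n-1)$ exactly cancels the downward bias of the range, which is the one-dimensional content of the Ripley and Rasson construction.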
Indeed, for $i = 1, 2$,
$$E[\widehat{m}_i(D) \mid n+m \text{ observations}] = E[\,E[\widehat{m}_i(D) \mid n \text{ inside and } m \text{ outside obs.}] \mid n+m \text{ obs.}\,] = E[m(D) \mid n+m \text{ observations}] = m(D)$$
Any linear combination of these two estimators is still unbiased. If $\lambda_1$ and $\lambda_2$ are known, the natural combination is
$$\widehat{m}(D) = \frac{\lambda_1}{\lambda_1 + \lambda_2} \, \widehat{m}_1(D) + \frac{\lambda_2}{\lambda_1 + \lambda_2} \, \widehat{m}_2(D)$$
If $\lambda_1$ and $\lambda_2$ are unknown, we propose
$$\widehat{m}(D) = \frac{n}{n+m} \, \widehat{m}_1(D) + \frac{m}{n+m} \, \widehat{m}_2(D)$$
since
$$E[\widehat{m}(D)] = E[\,E[\widehat{m}(D) \mid n \text{ inside and } m \text{ outside observations}]\,] = E\left[\frac{n}{n+m} E_{n,m}[\widehat{m}_1(D)] + \frac{m}{n+m} E_{n,m}[\widehat{m}_2(D)]\right] = E[m(D)] = m(D)$$

In practice, $E[V_{n+1}]$ and $E[V_{m+1}]$ are unknown, except in the one-dimensional case where they are both equal to 2. One could calculate the exact values of $E[V_{n+1}]$ and $E[V_{m+1}]$ if one knew the normalized shape of the domain $D$. If this normalized shape is unknown, one can use instead its relative maximum likelihood estimator; see Rasson, Remon and Kubushishi [7]. The asymptotic values of these quantities can be calculated (see Rényi and Sulanke [11], Rasson [9]) but their computation is tremendous. In practice, we approximate $\frac{E[V_{n+1}]}{n+1}$ and $\frac{E[V_{m+1}]}{m+1}$ by $\frac{V_n}{n}$ and $\frac{V_m}{m}$, which leads to a negligibly biased estimator. See Moore [6].
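With this practical approximation, the inside-data estimator $\widehat{m}_1(D)$ is a few lines of code in two dimensions. A sketch assuming scipy's `ConvexHull` (not used in the paper; in 2-D its `volume` attribute is the area):

```python
import numpy as np
from scipy.spatial import ConvexHull

def m_hat_1(y):
    """m_hat_1(D) = m(H(Y)) / (1 - V_n/n): the hull area inflated by the
    observed fraction of extreme points, V_n/n standing in for
    E[V_{n+1}]/(n+1)."""
    hull = ConvexHull(y)          # H(Y); in 2-D, hull.volume is the area
    v_n = len(hull.vertices)      # V_n, the number of extreme points
    return hull.volume / (1.0 - v_n / len(y))

# Quick check: 500 uniform points on the unit square (true area 1).
rng = np.random.default_rng(2)
est = m_hat_1(rng.random((500, 2)))
```

The analogous $\widehat{m}_2(D)$ needs $m(J(Z))$ and the extreme points of the shadow, which generally require a Monte Carlo membership test rather than a library call.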
4 Comparison with another estimator of m(D)

Let $\widehat{\theta}_1(\lambda_1, \lambda_2)$ denote the estimator of $m(D)$ we propose in this paper for known $\lambda_1, \lambda_2$:
$$\widehat{\theta}_1(\lambda_1, \lambda_2) = \frac{\lambda_1}{\lambda_1 + \lambda_2} \cdot \frac{m(H(Y))}{1 - \frac{V_n}{n}} + \frac{\lambda_2}{\lambda_1 + \lambda_2} \left( m(F) - \frac{m(J(Z))}{1 - \frac{V_m}{m}} \right)$$
If the $\lambda_i$ are unknown, we use $\widehat{\theta}_1(\frac{n}{n+m}, \frac{m}{n+m})$, which is still a quasi-unbiased estimate. However, there exists a more natural generalization of the estimator proposed by Ripley and Rasson [12]. Indeed,
$$P[\text{a new observation } X_{n+m+1} \text{ is an extreme point of } H(Y) \text{ or } J(Z) \mid Y_1, \ldots, Y_n, Z_1, \ldots, Z_m]$$
$$= P[X_{n+m+1} \in D] \cdot \frac{m(D) - m(H(Y))}{m(D)} + P[X_{n+m+1} \in \bar{D}] \cdot \frac{m(\bar{D}) - m(J(Z))}{m(\bar{D})}$$
$$= \frac{\lambda_1 m(D)}{\lambda_1 m(D) + \lambda_2 m(\bar{D})} \cdot \frac{m(D) - m(H(Y))}{m(D)} + \frac{\lambda_2 m(\bar{D})}{\lambda_1 m(D) + \lambda_2 m(\bar{D})} \cdot \frac{m(\bar{D}) - m(J(Z))}{m(\bar{D})}$$
Thus,
$$E[V_{n+m+1}] = (n+m+1) \left[ 1 - \frac{E[\lambda_1 m(H(Y)) + \lambda_2 m(J(Z))]}{(\lambda_1 - \lambda_2)\, m(D) + \lambda_2\, m(F)} \right]$$
and
$$m(D) = \frac{E[\lambda_1 m(H(Y)) + \lambda_2 m(J(Z))]}{(\lambda_1 - \lambda_2)\left(1 - \frac{E[V_{n+m+1}]}{n+m+1}\right)} - \frac{\lambda_2\, m(F)}{\lambda_1 - \lambda_2}$$
Let $\widehat{\theta}_2(\lambda_1, \lambda_2)$ denote our new estimator of $m(D)$:
$$\widehat{\theta}_2(\lambda_1, \lambda_2) = \frac{\lambda_1 m(H(Y)) + \lambda_2 m(J(Z))}{(\lambda_1 - \lambda_2)\left(1 - \frac{V_{n+m}}{n+m}\right)} - \frac{\lambda_2\, m(F)}{\lambda_1 - \lambda_2}$$
$\widehat{\theta}_2(\lambda_1, \lambda_2)$ is a quasi-unbiased estimator of $m(D)$, as $\frac{V_{n+m}}{n+m}$ is a very good approximation of $\frac{E[V_{n+m+1}]}{n+m+1}$. However, $\widehat{\theta}_2(\lambda_1, \lambda_2)$ has many disadvantages with respect to $\widehat{\theta}_1(\lambda_1, \lambda_2)$. First, it is not defined in the uniform case, where $\lambda_1 = \lambda_2$. Secondly, $\widehat{\theta}_2(\lambda_1, \lambda_2)$ cannot be generalized to unknown $\lambda_i$, $i = 1, 2$: using $\frac{n}{n+m}$ and $\frac{m}{n+m}$ instead of $\lambda_1$ and $\lambda_2$ in its definition increases the bias of $\widehat{\theta}_2(\lambda_1, \lambda_2)$ drastically. Finally, its variance is always greater than $\mathrm{var}[\widehat{\theta}_1(\lambda_1, \lambda_2)]$. This can be seen from figure 2, where the ratio of the corresponding empirical variances, estimated from simulations, is plotted for different values of $\lambda_1$ and $\lambda_2$. Our simulations have shown a ratio ranging from 1.0 (for very different $\lambda_1$ and $\lambda_2$) to about 400.0 (for very similar $\lambda_1$ and $\lambda_2$). See figure 2.
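The variance comparison is easy to reproduce in the one-dimensional case, where $H(Y)$ is an interval, $J(Z)$ is a union of two side intervals, and $V_n = V_m = 2$, $V_{n+m} = 4$. The sketch below is ours (numpy assumed); it fixes $F = [0,1]$, $D = [0.3, 0.7]$, holds the counts $n, m$ fixed for simplicity, and uses fairly similar intensities, the regime where $\widehat{\theta}_2$ is expected to do badly:

```python
import numpy as np

rng = np.random.default_rng(3)
lam1, lam2 = 75.0, 50.0        # similar intensities: small (lam1 - lam2) denominator
n, m, reps = 30, 30, 20_000    # counts held fixed for simplicity
t1, t2 = [], []
for _ in range(reps):
    y = 0.3 + 0.4 * rng.random(n)                         # inside points on D = [0.3, 0.7]
    left = rng.random(m) < 0.5                            # outside region has two equal halves
    z = np.where(left, 0.3 * rng.random(m), 0.7 + 0.3 * rng.random(m))
    hull = y.max() - y.min()                              # m(H(Y))
    zl, zr = z[z < y.min()], z[z > y.max()]
    # shadow J(Z) in 1-D: [0, max left z] U [min right z, 1]
    shadow = (zl.max() if zl.size else 0.0) + (1.0 - (zr.min() if zr.size else 1.0))
    m1 = hull / (1 - 2 / n)                               # m_hat_1, V_n = 2 in 1-D
    m2 = 1.0 - shadow / (1 - 2 / m)                       # m_hat_2, V_m = 2 in 1-D
    t1.append(lam1 / (lam1 + lam2) * m1 + lam2 / (lam1 + lam2) * m2)
    t2.append((lam1 * hull + lam2 * shadow) / ((lam1 - lam2) * (1 - 4 / (n + m)))
              - lam2 / (lam1 - lam2))                     # theta_hat_2, V_{n+m} = 4, m(F) = 1
var1, var2 = float(np.var(t1)), float(np.var(t2))
```

Both estimators centre near the true value $m(D) = 0.4$, while the empirical variance of $\widehat{\theta}_2$ is markedly larger, consistent with the behaviour reported in figure 2.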
Figure 2: $\log[\mathrm{var}\,\widehat{\theta}_1 / \mathrm{var}\,\widehat{\theta}_2]$ plotted against $\log[\lambda_1/\lambda_2]$.
References

[1] Grenander, U., Statistical geometry: a tool for pattern analysis, Bulletin of the American Mathematical Society, Vol. 79, pp. 829-856, 1973.
[2] Hachtel, G.D., Meilijson, I., and Nadas, A., The estimation of a convex subset of $\mathbb{R}^k$ and its probability content, IBM Research Report, Yorktown Heights, N.Y., 1981.
[3] Kallenberg, O., Random Measures, Akademie-Verlag, Berlin, 1986 (4th edition).
[4] Karr, A.F., Point Processes and their Statistical Inference, Marcel Dekker, New York, 1991.
[5] Krickeberg, K., Processus ponctuels en statistique, in École d'été de Probabilités de Saint-Flour X-1980, pp. 205-313, Springer-Verlag, Berlin, 1982.
[6] Moore, M., On the estimation of a convex set, Annals of Statistics, 12, pp. 1090-1099, 1984.
[7] Rasson, J.P., Remon, M. and Kubushishi, T., Finding the edge of a Poisson forest with inside and outside observations, Internal Report 93/14, Department of Mathematics, Namur University, 1993.
[8] Rasson, J.P., De quelques problèmes d'entropie et d'inférence pour des processus ponctuels, Dissertation doctorale, FUNDP, Namur, 1976.
[9] Rasson, J.P., Estimation des formes convexes du plan, Statistiques et Analyse des Données, 1, pp. 31-46, 1979.
[10] Remon, M., On a concept of partial sufficiency: L-sufficiency, International Statistical Review, 52, 2, pp. 127-135, 1984.
[11] Rényi, A. and Sulanke, R., Über die konvexe Hülle von n zufällig gewählten Punkten II, Z. Wahrscheinlichkeitstheorie und verw. Geb., 3, pp. 138-147, 1964.
[12] Ripley, B.D., and Rasson, J.P., Finding the edge of a Poisson forest, Journal of Applied Probability, 14, pp. 483-491, 1977.
[13] Ripley, B.D., Stochastic Simulation, John Wiley & Sons, New York, 1987.
Marcel Remon
F.U.N.D.P., Département de Mathématique
Rempart de la Vierge, 8
B-5000 Namur, Belgium
Email:
[email protected]