Asymptotic Properties of Maximum Likelihood Estimators for Partially Ordered Markov Models

Noel Cressie
Department of Statistics, Iowa State University, Ames, IA 50011-1210

Hsin-Cheng Huang
Institute of Statistical Science, Academia Sinica, Taipei 115, Taiwan

ABSTRACT

Partially ordered Markov models (POMMs) are Markov random fields (MRFs) with neighborhood structures derivable from an associated partially ordered set. The most attractive feature of POMMs is that their joint distributions can be written in closed and product form. Therefore, simulation and maximum likelihood estimation for the models is quite straightforward, which is not the case in general for MRF models. In practice, one often has to modify the likelihood to account for edge components; the resulting composite likelihood for POMMs is similarly straightforward to maximize. In this article, we use a martingale approach to derive the asymptotic properties of maximum (composite) likelihood estimators for POMMs. One of our results establishes that, under regularity conditions that include Dobrushin's condition for spatial mixing, the maximum composite likelihood estimator is consistent, asymptotically normal, and also asymptotically efficient.

AMS 1991 subject classifications: Primary 62M40; secondary 62F12.

Key words and phrases: Acyclic directed graph, asymptotic efficiency, asymptotic normality, composite likelihood, consistency, Dobrushin's condition, Markov random field, martingale central limit theorem, strong mixing, triangular martingale array.

1 Introduction

Partially ordered Markov models (POMMs), introduced by Cressie and Davidson (1998), are a natural extension of one-dimensional Markov chains and a generalization of two-dimensional Markov mesh models (Abend, Harley, and Kanal, 1965). The models are actually Markov random field (MRF) models with neighborhood structures derivable from an associated partially ordered set (poset). That poset determines the conditional distribution of a datum at any site $s$, conditioned on data at sites that are less than $s$ by the partial order; specifically, the conditional distribution depends only on the data at the adjacent lower neighborhood of $s$. This development of POMMs in the spatial context has direct parallels to models on acyclic directed graphs and causal network models (e.g., Lauritzen and Spiegelhalter, 1988; Lauritzen et al., 1990). Generally, the maximum likelihood estimator (MLE) for an MRF model is analytically intractable and numerical solutions are usually computationally intensive. Therefore, several alternative estimation procedures, which are computationally efficient but less efficient than maximum likelihood (ML), have been proposed. For example, Besag (1974, 1975) proposed the coding and the maximum pseudo-likelihood methods, and Possolo (1986) proposed the logit method for binary Markov

random fields. However, on a large sub-class of MRFs, namely the POMMs, the joint distributions can be written in closed and product form, and hence ML estimation is relatively straightforward. In practice, one often has to modify the likelihood to account for edge components; the resulting composite likelihood can be similarly maximized in a straightforward manner.

Many authors have considered the consistency and the asymptotic normality of ML estimation. The literature for one-dimensional dependent processes in a multi-parameter framework includes Basawa et al. (1976), Crowder (1976), Heijmans and Magnus (1986a, 1986b), and Sarma (1986). Unfortunately, only a small proportion of the literature on ML estimation is concerned with higher dimensional, dependent random fields. Gidas has proved the consistency (Gidas, 1988) and asymptotic normality (Gidas, 1993) of the MLE for an MRF model under certain conditions. However, his parameterization is somewhat restricted in the sense that the potential function of his MRF model is linear in the parameters. That is, his MRF model belongs to a certain exponential family (which is typically not the case for POMMs). In this article, we shall adapt a martingale approach used by Crowder (1976) to derive the asymptotic properties of maximum (composite) likelihood estimators for POMMs, without making a stationarity assumption. For example, one of our results establishes that, under regularity conditions that include Dobrushin's condition (Dobrushin, 1968) for spatial mixing, the maximum composite likelihood estimator is consistent, asymptotically normal, and also asymptotically efficient.

In Section 2, we define a POMM. The consistency and asymptotic normality of the maximum (composite) likelihood estimator for POMMs are proved in Section 3. A brief discussion is given in Section 4.

2 Partially Ordered Markov Models

We shall now give the basic definitions associated with a partially ordered Markov model (POMM); Cressie and Davidson (1998), Davidson and Cressie (1993), and Davidson, Cressie, and Hua (1999) can be consulted for further details. First, we introduce the notion of directed graphs. A directed graph is a pair $(V, F)$, where $V$ denotes the set of vertices and $F$ denotes the set (possibly empty) of directed edges between vertices. A directed edge is an ordered pair $(v, v')$ that represents a directed connection from a vertex $v$ to a different vertex $v'$. A directed path is a sequence of vertices $v_1, v_2, \ldots, v_k$, $k \geq 1$, such that $(v_i, v_{i+1})$ is a directed edge for each $i = 1, \ldots, k-1$. A cycle is a path $(v_1, v_2, \ldots, v_k)$ such that $v_1 = v_k$. An acyclic directed graph is a directed graph that has no cycles in it.

To construct a poset from an acyclic directed graph $(V, F)$, we define a binary relation $\preceq$ on $V$ such that $v \preceq v'$ if either $v = v'$ or there exists a directed path from $v$ to $v'$. Then it is straightforward to check that $(V, \preceq)$ is a poset on $V$ with the partial order $\preceq$.

Before giving the definition of a POMM on $\mathbb{Z}^d$, the $d$-dimensional integer lattice, we introduce some notation and definitions. Consider a real-valued random field $\{Z(s) : s \in \mathbb{Z}^d\}$; for any set $D \subseteq \mathbb{Z}^d$, let $Z(D) \equiv \{Z(s) : s \in D\}$, and for any finite $D_1 \subseteq D$, let $p(z(D_1))$ denote the probability density of $Z(D_1)$. Further, for any finite $D_1, D_2 \subseteq D$, let $p(z(D_1) \mid z(D_2))$ denote the conditional probability density of $Z(D_1)$ conditioned on $Z(D_2) = z(D_2)$. For $s, s' \in \mathbb{Z}^d$, let $d(s, s') \equiv \max_{1 \leq i \leq d} |s_i - s'_i|$, where $s = (s_1, s_2, \ldots, s_d)$ and $s' = (s'_1, s'_2, \ldots, s'_d)$. For any $D_1, D_2 \subseteq \mathbb{Z}^d$, let $d(D_1, D_2) \equiv \inf\{d(s_1, s_2) : s_1 \in D_1, s_2 \in D_2\}$. For any $D_1 \subseteq \mathbb{Z}^d$, let $\mathrm{diam}\, D_1 \equiv \sup\{d(s, s') : s, s' \in D_1\}$.

Consider now some definitions that are related to an acyclic directed graph $(\mathbb{Z}^d, F)$ and its associated poset $(\mathbb{Z}^d, \preceq)$:

Definition 1. For any $s \in \mathbb{Z}^d$, the cone of $s$ is the set
\[ \mathrm{cone}\, s \equiv \{ s' \in \mathbb{Z}^d \setminus \{s\} : s' \preceq s \}. \]

Definition 2. For any $s \in \mathbb{Z}^d$, the adjacent lower neighborhood of $s$ is the set
\[ \mathrm{adjl}\, s \equiv \{ s' \in \mathbb{Z}^d \setminus \{s\} : (s', s) \in F \}. \]

Throughout this article, we only consider acyclic directed graphs that yield finite adjacent lower neighborhoods.
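The reachability construction of the partial order $\preceq$ from an acyclic directed graph $(V, F)$ can be sketched in a few lines of code; the vertex names and edge set below are purely illustrative.

```python
# Sketch: the partial order induced by an acyclic directed graph (V, F)
# via reachability, as described above. Names are illustrative.

def reachable(F, v, v_prime):
    """True if there is a directed path from v to v_prime in edge set F."""
    stack, seen = [v], set()
    while stack:
        u = stack.pop()
        if u == v_prime:
            return True
        if u in seen:
            continue
        seen.add(u)
        stack.extend(w for (x, w) in F if x == u)
    return False

def leq(F, v, v_prime):
    """The binary relation: v <= v' iff v == v' or a directed path exists."""
    return v == v_prime or reachable(F, v, v_prime)

# A small acyclic directed graph: a -> b -> d and a -> c -> d.
F = {("a", "b"), ("b", "d"), ("a", "c"), ("c", "d")}

assert leq(F, "a", "d")                    # a <= d via a directed path
assert not leq(F, "b", "c")                # b and c are unrelated
assert all(leq(F, v, v) for v in "abcd")   # reflexivity
```

With this relation, `adjl s` corresponds to the direct predecessors `{s' : (s', s) in F}`, while `cone s` is the full set of strict predecessors under `leq`.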

Definition 3. For any $D \subseteq \mathbb{Z}^d$, the cover of $D$ is the set
\[ \mathrm{covr}\, D \equiv \{ s \in \mathbb{Z}^d \setminus D : \mathrm{adjl}\, s \subseteq D \}. \]

Definition 4. A set $D \subseteq \mathbb{Z}^d$ is said to be bounded below, or a b-set, if its subset of minimal elements,
\[ L^0 \equiv \{ s \in D : \text{for any } s' \in D, \text{ either } s \preceq s' \text{ or } s, s' \text{ unrelated} \}, \]
is nonempty.

Definition 5. A set $D \subseteq \mathbb{Z}^d$ is said to be a lower set if $\cup_{k=0}^{\infty} \mathrm{covr}^k D = \mathbb{Z}^d$, where $\mathrm{covr}^0 D \equiv D$ and $\mathrm{covr}^k D \equiv \mathrm{covr}^{k-1}(\mathrm{covr}\, D)$, $k = 1, 2, \ldots$. It is not difficult to see that the complement of a lower set $D \subseteq \mathbb{Z}^d$ is bounded below.

Definition 6. The level sets of a b-set $D \subseteq \mathbb{Z}^d$ are a sequence of nonempty cover sets $\{L^n : n = 0, 1, \ldots\}$, defined recursively as:
\[ L^n \equiv \mathrm{covr}\left( \cup_{k=0}^{n-1} L^k \cup D^c \right) \cap D, \quad n = 1, 2, \ldots, \]
where $L^0$ is defined in Definition 4.

It is straightforward to see that, for $D$ a b-set, $\cup_{n=0}^{\infty} L^n = D$ and $L^i \cap L^j = \emptyset$, $i \neq j$. Also note that the elements in each level set $L^n$, $n = 0, 1, \ldots$, are mutually unrelated by the partial order $\preceq$, and an element in $L^i$ cannot be larger than an element in $L^j$ if $i < j$. If $D$ is a finite b-set, then $D = \cup_{n=0}^{M} L^n$, for some finite integer $M$. Define $m_n \equiv \sum_{k=0}^{n} |L^k|$, $n = 0, 1, \ldots, M$, where $|\cdot|$ denotes

the cardinality of a set in $\mathbb{Z}^d$, and write
\[ L^n = \{ s_{m_{n-1}+1}, \ldots, s_{m_n} \}, \quad n = 0, 1, \ldots, M, \]
where $m_{-1} \equiv 0$. Then $D = \{ s_1, \ldots, s_{m_M} \}$ has the property that if two elements are related, say $s_i \preceq s_j$, then $i \leq j$; $i, j = 1, \ldots, m_M$. This kind of ordering based on level sets is important later for proving the asymptotic properties of MLEs. We now have all the ingredients necessary for the definition of a POMM on $\mathbb{Z}^d$.
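The level-set decomposition can be computed by repeatedly applying the cover operation. A minimal sketch for a finite grid in $\mathbb{Z}^2$ with the illustrative neighborhood $\mathrm{adjl}(u,v) = \{(u-1,v), (u,v-1)\}$, for which the first cover step coincides with the set of minimal elements of Definition 4 and the level sets turn out to be the anti-diagonals:

```python
# Sketch: level sets (Definition 6) of a finite b-set D in Z^2, for the
# acyclic directed graph with adjl(u, v) = {(u-1, v), (u, v-1)}.
# The grid size and neighborhood are illustrative choices.

def adjl(s):
    u, v = s
    return {(u - 1, v), (u, v - 1)}

def level_sets(D):
    """Peel off level sets: L^n collects the unassigned sites of D whose
    adjacent lower neighbors all lie in earlier levels or outside D."""
    D = set(D)
    assigned = set()          # union of L^0, ..., L^{n-1}
    levels = []
    while assigned != D:
        # covr(assigned union D^c) intersected with D.
        Ln = {s for s in D - assigned
              if all(t in assigned or t not in D for t in adjl(s))}
        if not Ln:            # would loop forever if D were not a b-set
            raise ValueError("D is not bounded below for this graph")
        levels.append(Ln)
        assigned |= Ln
    return levels

D = [(u, v) for u in range(4) for v in range(4)]
levels = level_sets(D)

# For this graph the level sets are the anti-diagonals u + v = n.
assert all(all(u + v == n for (u, v) in Ln) for n, Ln in enumerate(levels))
assert len(levels) == 7   # n = 0, ..., 6 for a 4x4 grid
```

Concatenating the levels in order yields exactly the enumeration $s_1, \ldots, s_{m_M}$ with the property that $s_i \preceq s_j$ implies $i \leq j$.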

Definition 7. Let $(\mathbb{Z}^d, F)$ be an acyclic directed graph with its associated poset $(\mathbb{Z}^d, \preceq)$, and let $L$ be a nonempty lower set of $\mathbb{Z}^d$. Also, let $U_s$ denote any finite set such that $\mathrm{adjl}\, s \subseteq U_s \subseteq \mathrm{cone}\, s$, and let $V_s$ denote any finite set of points not related to $s$ by the partial order $\preceq$. Then $\{Z(s) : s \in \mathbb{Z}^d\}$ is said to be a partially ordered Markov model (POMM) on $\mathbb{Z}^d$ if, for all $s \in \mathbb{Z}^d \setminus L$ and for any $U_s$ and $V_s$,
\[ p(z(s) \mid z(U_s \cup V_s)) = p(z(s) \mid z(\mathrm{adjl}\, s)). \]

Notice that the POMM defined by Cressie and Davidson (1998) is on a finite set of sites. The existence of a POMM on $\mathbb{Z}^d$ can be proved by using Kolmogorov's extension theorem (see Durrett, 1991).
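Because the conditional distributions are specified along the partial order, a POMM on a finite region can be simulated by visiting sites in any ordering compatible with the level sets. A minimal sketch for a binary field on a grid, with an illustrative logistic conditional (not a model taken from the paper):

```python
import math
import random

# Sketch: sequential simulation of a binary POMM on an n x n grid, visiting
# sites in raster order, which is compatible with the partial order induced
# by adjl(u, v) = {(u-1, v), (u, v-1)}. The logistic conditional is an
# illustrative assumption.

def simulate_pomm(n, theta0, theta1, seed=0):
    rng = random.Random(seed)
    z = {}
    for u in range(n):          # raster order respects the partial order:
        for v in range(n):      # both adjacent lower neighbors come earlier
            s = sum(z.get(t, 0) for t in [(u - 1, v), (u, v - 1)])
            p = 1.0 / (1.0 + math.exp(-(theta0 + theta1 * s)))
            z[(u, v)] = 1 if rng.random() < p else 0
    return z

z = simulate_pomm(8, theta0=-0.5, theta1=1.0)
assert len(z) == 64
assert set(z.values()) <= {0, 1}
```

Out-of-grid neighbors are treated as 0 here, a boundary simplification; the paper instead conditions on the edge sites explicitly.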

3 Limit Theorems for Maximum Likelihood Estimation

Throughout the paper, we let $Z(\mathbb{Z}^d)$ be a POMM with lower set $L$ and level sets $\{L^n : n = 0, 1, \ldots\}$ (Definition 6), and let $\{\Delta_n : n = 1, 2, \ldots\}$ be a strictly increasing sequence of finite subsets of $\mathbb{Z}^d \setminus L$ that satisfy the following condition:
\[ \lim_{n \to \infty} |\Delta_n^o| / |\Delta_n| = 1, \tag{1} \]
where
\[ \Delta_n^o \equiv \{ s \in \Delta_n : \mathrm{adjl}\, s \subseteq \Delta_n \}, \]
for $n \in \mathbb{N} \equiv \{1, 2, \ldots\}$. Let $E_n \equiv \Delta_n \setminus \Delta_n^o$, $n \in \mathbb{N}$. For each $n \in \mathbb{N}$, suppose
\[ \Delta_n = \{ s_{n,1}, s_{n,2}, \ldots, s_{n,|\Delta_n|} \} \tag{2} \]
such that $i \leq j$ if $s_{n,i} \preceq s_{n,j}$; $i, j = 1, \ldots, |\Delta_n|$. In the spatial context of POMMs, Cressie and Davidson (1998) give the joint probability density (or mass) function of $Z(\Delta_n)$ as

\[ p(z(\Delta_n)) = \prod_{s_{n,k} \in E_n} p(z(s_{n,k}) \mid z(s_{n,1}), \ldots, z(s_{n,k-1})) \prod_{s \in \Delta_n^o} p(z(s) \mid z(\mathrm{adjl}\, s)), \quad n \in \mathbb{N}. \]
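The product form can be checked numerically on a small grid: a product of valid conditionals taken along an ordering compatible with the partial order defines a probability mass function that sums to one over all configurations. A sketch under illustrative choices (grid, neighborhood, and logistic conditional are assumptions):

```python
import itertools
import math

# Sketch: verify that a product of conditional pmfs along a partial order
# defines a joint pmf summing to 1. Grid, neighborhood, and the logistic
# conditional are illustrative choices.

SITES = [(u, v) for u in range(2) for v in range(3)]  # raster (poset) order

def cond_p(z_s, z, u, v, theta0=-0.3, theta1=0.8):
    """p(z(s) | z(adjl s)) with adjl(u,v) = {(u-1,v), (u,v-1)};
    out-of-grid neighbors contribute 0 to the sum."""
    s = sum(z.get(t, 0) for t in [(u - 1, v), (u, v - 1)])
    p1 = 1.0 / (1.0 + math.exp(-(theta0 + theta1 * s)))
    return p1 if z_s == 1 else 1.0 - p1

def joint_p(config):
    z = dict(zip(SITES, config))
    return math.prod(cond_p(z[(u, v)], z, u, v) for (u, v) in SITES)

total = sum(joint_p(c) for c in itertools.product([0, 1], repeat=len(SITES)))
assert abs(total - 1.0) < 1e-12
```

The normalization holds exactly because each factor sums to one given its predecessors, so summing out sites in reverse order telescopes to 1; no intractable normalizing constant arises, in contrast to general MRF likelihoods.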

In the context of graphical models, this result can be found in, inter alia, Kiiveri, Speed, and Carlin (1984) and Lauritzen and Spiegelhalter (1988), although it should be noted that their results do not deal with the edge effects, arising from $Z(E_n)$, in as much generality. The asymptotic properties of maximum likelihood estimators developed below are given in a spatial setting. We assume that, for $s \in \mathbb{Z}^d \setminus L$, the conditional probability density $p(z(s) \mid z(\mathrm{adjl}\, s); \theta)$ depends on the vector of parameters $\theta \in \Theta$, where $\Theta$ is an open connected subset of $\mathbb{R}^p$. Hence, for each $n \in \mathbb{N}$, the log-likelihood function based on data $Z(\Delta_n)$ is

\[ l_n(\theta) = \sum_{s_{n,k} \in E_n} \log p(Z(s_{n,k}) \mid Z(s_{n,1}), \ldots, Z(s_{n,k-1}); \theta) + \sum_{s \in \Delta_n^o} \log p(Z(s) \mid Z(\mathrm{adjl}\, s); \theta). \tag{3} \]

In practice, with only the parametric form of $p(z(s) \mid z(\mathrm{adjl}\, s); \theta)$ given, we may not know the parametric structure of $p(z(s_{n,k}) \mid z(s_{n,1}), \ldots, z(s_{n,k-1}); \theta)$, for $s_{n,k} \in E_n$, $n \in \mathbb{N}$. So, we consider also just the second part of the log-likelihood function, without edge components, given by

\[ l_n^*(\theta) = \sum_{s \in \Delta_n^o} \log p(Z(s) \mid Z(\mathrm{adjl}\, s); \theta). \tag{4} \]
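The edge-free sum of log conditionals in (4) is easy to compute and maximize numerically. A minimal sketch for an illustrative one-parameter binary model (the model, grid, and crude grid search are all assumptions for illustration, not the paper's example):

```python
import math
import random

# Sketch: composite (conditional) log-likelihood for a binary POMM with an
# illustrative one-parameter logistic conditional, maximized by a crude
# grid search. Model and estimation grid are assumptions.

def simulate(n, theta1, seed=1):
    rng, z = random.Random(seed), {}
    for u in range(n):
        for v in range(n):
            s = sum(z.get(t, 0) for t in [(u - 1, v), (u, v - 1)])
            p = 1.0 / (1.0 + math.exp(-theta1 * (2 * s - 2)))  # centered
            z[(u, v)] = 1 if rng.random() < p else 0
    return z

def composite_ll(theta1, z, n):
    """Sum of log conditional pmfs over interior sites (both adjacent
    lower neighbors inside the grid), as in the second part of (3)."""
    ll = 0.0
    for u in range(1, n):
        for v in range(1, n):
            s = z[(u - 1, v)] + z[(u, v - 1)]
            p = 1.0 / (1.0 + math.exp(-theta1 * (2 * s - 2)))
            ll += math.log(p if z[(u, v)] == 1 else 1.0 - p)
    return ll

n = 40
z = simulate(n, theta1=0.7)
grid = [k / 20.0 for k in range(-20, 41)]   # candidate parameter values
theta_hat = max(grid, key=lambda t: composite_ll(t, z, n))
assert composite_ll(theta_hat, z, n) >= composite_ll(0.0, z, n)
assert math.isfinite(theta_hat)
```

In practice one would solve the composite-likelihood equation by Newton's method or a quasi-Newton routine rather than a grid search; the point here is only that each term is a fully specified conditional density, so no normalizing constant is needed.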

We call (4) a composite likelihood after Lindsay (1988). Note that, by (1), $|E_n| / |\Delta_n| \to 0$ as $n \to \infty$; hence the first part of the log-likelihood function contains comparatively little information. Suppose that $\theta_0$ is the vector of the true values of the parameters. For each $n \in \mathbb{N}$, let $\hat{\theta}_n$ be a solution of the likelihood equation, $\partial l_n(\theta) / \partial \theta = 0$, and let $\hat{\theta}_n^*$ be a solution of the composite-likelihood equation, $\partial l_n^*(\theta) / \partial \theta = 0$. For each $n \in \mathbb{N}$ and for $s_{n,k} \in \Delta_n$, $k = 1, \ldots, |\Delta_n|$, let





\[ b^{(n)}(s_{n,k}; \theta) \equiv \left( b_1^{(n)}(s_{n,k}; \theta), \ldots, b_p^{(n)}(s_{n,k}; \theta) \right)^T \equiv \frac{\partial}{\partial \theta} \log p(Z(s_{n,k}) \mid Z(s_{n,1}), \ldots, Z(s_{n,k-1}); \theta), \quad \theta \in \Theta; \]
\[ A^{(n)}(s_{n,k}; \theta) \equiv \left[ A_{j,k'}^{(n)}(s_{n,k}; \theta) \right]_{p \times p} \equiv \frac{\partial^2}{\partial \theta \, \partial \theta^T} \log p(Z(s_{n,k}) \mid Z(s_{n,1}), \ldots, Z(s_{n,k-1}); \theta). \]

Then, by the mean value theorem, the likelihood equation can be written as
\[ \frac{\partial l_n(\theta)}{\partial \theta} = \sum_{s \in \Delta_n} b^{(n)}(s; \theta_0) + \sum_{s \in \Delta_n} \tilde{A}^{(n)}(s; \theta_0, \theta)(\theta - \theta_0) = 0, \quad n \in \mathbb{N}, \tag{5} \]
where $\tilde{A}^{(n)}(s; \theta_0, \theta)$ denotes $A^{(n)}(s; \theta)$ with rows evaluated at possibly different points on the line segment between $\theta_0$ and $\theta$.

3.1 Consistency, Asymptotic Normality, and Asymptotic Efficiency

Several approaches have been taken in proving the consistency and asymptotic normality of the generic MLE. Here, a martingale approach used by Crowder (1976) is adapted to the spatial models we are considering. For each $n \in \mathbb{N}$, let $\mathcal{F}_{n,0} \equiv \{\Omega, \emptyset\}$, and let $\mathcal{F}_{n,k} \equiv \sigma\{Z(s_{n,1}), \ldots, Z(s_{n,k})\}$ be the $\sigma$-algebra generated by $Z(s_{n,1}), \ldots, Z(s_{n,k})$, for $k = 1, \ldots, |\Delta_n|$. Then, for each $n \in \mathbb{N}$, $\mathcal{F}_{n,0} \subseteq \mathcal{F}_{n,1} \subseteq \cdots \subseteq \mathcal{F}_{n,|\Delta_n|}$, and
\[ p(z(s_{n,k}) \mid \mathcal{F}_{n,k-1}) = p(z(s_{n,k}) \mid z(\mathrm{adjl}\, s_{n,k})), \quad s_{n,k} \in \Delta_n^o. \]

We assume the following regularity conditions:

(A.1) The distributions $P_\theta$ of $Z(\mathbb{Z}^d)$ have common support for all $\theta \in \Theta$;

(A.2) $\dfrac{\partial^2}{\partial \theta \, \partial \theta^T} \log p(z(s_{n,k}) \mid z(s_{n,1}), \ldots, z(s_{n,k-1}); \theta)$ exists for all $\theta \in \Theta$, $s_{n,k} \in \Delta_n$, and $n \in \mathbb{N}$;

(A.3) $\displaystyle \frac{\partial^j}{\partial \theta^j} \int p(z(s_{n,k}) \mid z(s_{n,1}), \ldots, z(s_{n,k-1}); \theta) \, dz(s_{n,k}) = \int \frac{\partial^j}{\partial \theta^j} p(z(s_{n,k}) \mid z(s_{n,1}), \ldots, z(s_{n,k-1}); \theta) \, dz(s_{n,k}) = 0$, for all $\theta \in \Theta$, $s_{n,k} \in \Delta_n$, $n \in \mathbb{N}$, and $j = 1, 2$.

Note that (A.3) implies that the following conditions hold for all $s_{n,k} \in \Delta_n$, $n \in \mathbb{N}$:

(A.3a) $E\{ b^{(n)}(s_{n,k}; \theta_0) \mid \mathcal{F}_{n,k-1} \} = 0$;

(A.3b) $E\{ -A^{(n)}(s_{n,k}; \theta_0) \mid \mathcal{F}_{n,k-1} \} = \mathrm{var}\{ b^{(n)}(s_{n,k}; \theta_0) \mid \mathcal{F}_{n,k-1} \}$.

If, for each $n \in \mathbb{N}$ and $k = 1, 2, \ldots, |\Delta_n|$, we let
\[ M_{n,k} \equiv \sum_{i=1}^{k} b(s_{n,i}; \theta_0), \qquad W_{n,k} \equiv \sum_{i=1}^{k} \left\{ A(s_{n,i}; \theta_0) - E\left( A(s_{n,i}; \theta_0) \mid \mathcal{F}_{n,i-1} \right) \right\}, \]
then, from (A.3), $(M_{n,k}, \mathcal{F}_{n,k})$ and $(W_{n,k}, \mathcal{F}_{n,k})$ are martingales for each $n \in \mathbb{N}$. That is, $(M_{n,k}, \mathcal{F}_{n,k})$ and $(W_{n,k}, \mathcal{F}_{n,k})$ are triangular martingale arrays. The weak law of large numbers for a martingale array is given by the following proposition.

Proposition 1. Assume that $\{(S_{n,i}, \mathcal{F}_{n,i}) : i = 1, \ldots, k_n; n \in \mathbb{N}\}$ is a triangular martingale array, where $k_n \to \infty$ as $n \to \infty$. Let $X_{n,1} = S_{n,1}$ and $X_{n,i} = S_{n,i} - S_{n,i-1}$, $i = 2, \ldots, k_n$. If $\sup_{n,i} E|X_{n,i}|^{1+\delta} < \infty$ for some $\delta > 0$, then as $n \to \infty$,
\[ S_{n,k_n} / k_n \stackrel{p}{\longrightarrow} 0. \]

Proof: A proof of the result, in this form, is given by Proposition 3.2 and Corollary 3.1 of Huang and Cressie (1996). $\Box$
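Proposition 1 can be illustrated numerically with a simple triangular array of bounded martingale differences; the array below (rows of independent $\pm 1$ steps, which are in particular martingale differences) is an illustrative construction, not one from the paper.

```python
import random

# Sketch: illustrate the martingale weak law of Proposition 1. Rows of the
# triangular array are partial sums of +/-1 martingale differences (here,
# in fact, independent), so E|X_{n,i}|^{1+delta} is uniformly bounded and
# S_{n,k_n} / k_n should be near 0 for large k_n.

rng = random.Random(42)
k_n = 100_000
S = sum(rng.choice((-1, 1)) for _ in range(k_n))
assert abs(S / k_n) < 0.05   # typical deviation is of order k_n**(-1/2)
```

The bound 0.05 is loose by design: the standard deviation of $S_{n,k_n}/k_n$ here is about $k_n^{-1/2} \approx 0.003$.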

Theorem 1. Consider the log-likelihood function given by (3) and suppose that the true value of the parameter is $\theta_0$. Assume that (A.1) through (A.3) and the following conditions hold:

(A.4) $\limsup_{n \to \infty} \dfrac{1}{|\Delta_n|} \displaystyle\sum_{i=1}^{|\Delta_n|} \left\| A^{(n)}(s_{n,i}; \theta) - A^{(n)}(s_{n,i}; \theta_0) \right\| \stackrel{p}{\longrightarrow} 0$, as $\|\theta - \theta_0\| \to 0$;

(A.5) $\sup_{n,i,j} E\left| b_j^{(n)}(s_{n,i}; \theta_0) \right|^q < \infty$, for some $q > 1$;

(A.6) $P\left\{ |\Delta_n|^{-1} \displaystyle\sum_{i=1}^{|\Delta_n|} c^T \left( -A^{(n)}(s_{n,i}; \theta_0) \right) c > \varepsilon \right\} \to 1$, as $n \to \infty$, for all $\|c\| = 1$, and for some $\varepsilon > 0$.

Then there exists a solution $\hat{\theta}_n$ of the likelihood equation (5) such that as $n \to \infty$,
\[ \hat{\theta}_n \stackrel{p}{\longrightarrow} \theta_0. \]

Proof: By Proposition 1, (A.5) implies that as $n \to \infty$,
\[ \frac{1}{|\Delta_n|} \sum_{i=1}^{|\Delta_n|} b^{(n)}(s_{n,i}; \theta_0) \stackrel{p}{\longrightarrow} 0, \]
where $\{s_{n,i}\}$ are given by (2). Recall from (5) that for each $n \in \mathbb{N}$,
\[ \frac{\partial l_n(\theta)}{\partial \theta} = \sum_{i=1}^{|\Delta_n|} b^{(n)}(s_{n,i}; \theta_0) + \sum_{i=1}^{|\Delta_n|} \left\{ \tilde{A}^{(n)}(s_{n,i}; \theta_0, \theta) - A^{(n)}(s_{n,i}; \theta_0) \right\} (\theta - \theta_0) + \sum_{i=1}^{|\Delta_n|} A^{(n)}(s_{n,i}; \theta_0)(\theta - \theta_0). \tag{6} \]

Pre-multiplying this by $|\Delta_n|^{-1} (\theta - \theta_0)^T$, then (6), (A.4), and (A.6) together imply that there exists $\delta^* > 0$ such that as $n \to \infty$,
\[ P\left\{ \frac{1}{|\Delta_n|} (\theta - \theta_0)^T \frac{\partial l_n(\theta)}{\partial \theta} < 0 \right\} \to 1, \]
for $\|\theta - \theta_0\| = \delta$, and for any $0 < \delta < \delta^*$. It follows from Lemma 2 of Aitchison and Silvey (1958) that as $n \to \infty$,
\[ P\left\{ \frac{\partial l_n(\theta)}{\partial \theta} \text{ has a zero at } \hat{\theta}_n \text{ such that } \| \hat{\theta}_n - \theta_0 \| < \delta \right\} \to 1, \]
for any $0 < \delta < \delta^*$. Therefore, $\hat{\theta}_n \stackrel{p}{\longrightarrow} \theta_0$, as $n \to \infty$. This completes the proof of the theorem. $\Box$

Theorem 2. Assume that the conditions of Theorem 1 hold, except that (A.5) now holds for some $q > 2$. Define $B_n \equiv \mathrm{var}\left( \sum_{i=1}^{|\Delta_n|} b^{(n)}(s_{n,i}; \theta_0) \right)$, $n \in \mathbb{N}$. Assume also the following condition:

(A.7) $\dfrac{1}{c^T B_n c} \displaystyle\sum_{i=1}^{|\Delta_n|} E\left\{ \left( c^T b^{(n)}(s_{n,i}; \theta_0) \right)^2 \,\middle|\, \mathcal{F}_{n,i-1} \right\} \stackrel{p}{\longrightarrow} 1$, as $n \to \infty$, for all $\|c\| = 1$.

Then as $n \to \infty$,
\[ B_n^{1/2} \left( \hat{\theta}_n - \theta_0 \right) \stackrel{d}{\longrightarrow} N(0, I_p). \]

Proof: From (A.6), we have
\[ \liminf_{n \to \infty} \inf_{\|c\|=1} c^T \left( |\Delta_n|^{-1} B_n \right) c \geq \varepsilon, \tag{7} \]
for some $\varepsilon > 0$. It follows that $B_n$ is positive-definite for large $n$. Moreover, (A.4) and (A.6) imply that the inverse of $\sum \tilde{A}^{(n)}(s; \theta_0, \hat{\theta}_n)$ exists with probability tending to 1. Therefore, with probability tending to 1,
\[ B_n^{1/2} (\hat{\theta}_n - \theta_0) = B_n^{1/2} \left\{ - \sum_{i=1}^{|\Delta_n|} \tilde{A}^{(n)}(s_{n,i}; \theta_0, \hat{\theta}_n) \right\}^{-1} \left\{ \sum_{i=1}^{|\Delta_n|} b^{(n)}(s_{n,i}; \theta_0) \right\}. \]

Applying Slutsky's Theorem, it is enough to show that as $n \to \infty$,
\[ \left\{ B_n^{-1/2} \left( - \sum_{i=1}^{|\Delta_n|} \tilde{A}^{(n)}(s_{n,i}; \theta_0, \hat{\theta}_n) \right) B_n^{-1/2} \right\}^{-1} \stackrel{p}{\longrightarrow} I_p, \tag{8} \]
and
\[ B_n^{-1/2} \sum_{i=1}^{|\Delta_n|} b^{(n)}(s_{n,i}; \theta_0) \stackrel{d}{\longrightarrow} N(0, I_p). \tag{9} \]

It follows from (7) that the largest eigenvalue of $\left( |\Delta_n|^{-1} B_n \right)^{-1/2}$ is bounded above for $n$ large. Therefore, by (A.4), (A.5) with $q > 2$, (A.7), Proposition 1, and for sufficiently large $n$, we obtain
\[ B_n^{-1/2} \left\{ - \sum_{i=1}^{|\Delta_n|} \tilde{A}^{(n)}(s_{n,i}; \theta_0, \hat{\theta}_n) \right\} B_n^{-1/2} - I_p \]
\[ = \left( |\Delta_n|^{-1} B_n \right)^{-1/2} \left\{ - \frac{1}{|\Delta_n|} \sum_{i=1}^{|\Delta_n|} \left[ \tilde{A}^{(n)}(s_{n,i}; \theta_0, \hat{\theta}_n) - A^{(n)}(s_{n,i}; \theta_0) \right] \right\} \left( |\Delta_n|^{-1} B_n \right)^{-1/2} \]
\[ \quad + \left( |\Delta_n|^{-1} B_n \right)^{-1/2} \left\{ - \frac{1}{|\Delta_n|} \sum_{i=1}^{|\Delta_n|} \left[ A^{(n)}(s_{n,i}; \theta_0) - E\left( A^{(n)}(s_{n,i}; \theta_0) \mid \mathcal{F}_{n,i-1} \right) \right] \right\} \left( |\Delta_n|^{-1} B_n \right)^{-1/2} \]
\[ \quad + B_n^{-1/2} \left\{ - \sum_{i=1}^{|\Delta_n|} E\left( A^{(n)}(s_{n,i}; \theta_0) \mid \mathcal{F}_{n,i-1} \right) - B_n \right\} B_n^{-1/2} \stackrel{p}{\longrightarrow} 0, \quad \text{as } n \to \infty. \]

This gives (8). It remains to prove (9). It is not difficult to show that (A.5) with $q > 2$ implies the following Lindeberg condition:
\[ \frac{1}{c^T B_n c} \sum_{i=1}^{|\Delta_n|} E\left\{ \left( c^T b^{(n)}(s_{n,i}; \theta_0) \right)^2 I\left( \left| c^T b^{(n)}(s_{n,i}; \theta_0) \right| \geq \varepsilon \left( c^T B_n c \right)^{1/2} \right) \,\middle|\, \mathcal{F}_{n,i-1} \right\} \stackrel{p}{\longrightarrow} 0, \tag{10} \]
as $n \to \infty$, for all $\|c\| = 1$ and for any $\varepsilon > 0$. Therefore, using (A.7), (10), and the central limit theorem for a martingale array (e.g., Durrett and Resnick, 1978, Theorem 2.3), we have



\[ \left( c^T B_n c \right)^{-1/2} \sum_{i=1}^{|\Delta_n|} c^T b^{(n)}(s_{n,i}; \theta_0) \stackrel{d}{\longrightarrow} N(0, 1), \quad \text{as } n \to \infty, \]
for all $\|c\| = 1$. Since $\left\| c^T B_n^{1/2} / \left( c^T B_n c \right)^{1/2} \right\| = \|c\| = 1$, there exists an orthogonal matrix $T_n$ such that
\[ \frac{c^T B_n^{1/2} T_n}{\left( c^T B_n c \right)^{1/2}} = c^T, \quad \text{for all } n \in \mathbb{N}. \]
It follows that
\[ c^T T_n^T B_n^{-1/2} \sum_{i=1}^{|\Delta_n|} b^{(n)}(s_{n,i}; \theta_0) = \left( c^T B_n c \right)^{-1/2} \sum_{i=1}^{|\Delta_n|} c^T b^{(n)}(s_{n,i}; \theta_0) \stackrel{d}{\longrightarrow} N(0, 1), \quad \text{as } n \to \infty, \]
for all $\|c\| = 1$, which is equivalent to
\[ T_n^T B_n^{-1/2} \sum_{i=1}^{|\Delta_n|} b^{(n)}(s_{n,i}; \theta_0) \stackrel{d}{\longrightarrow} N(0, I_p), \quad \text{as } n \to \infty, \]
by the Cramér-Wold device (see Durrett, 1991). Since $T_n$ is an orthogonal matrix for all $n \in \mathbb{N}$, (9) then follows. This completes the proof. $\Box$

Note that $B_n$ in the statement of Theorem 2 is actually the Fisher information matrix of $\theta_0$ contained in $Z(\Delta_n)$. Further, we have as $n \to \infty$,

\[ B_n^{1/2} \left\{ \hat{\theta}_n - \theta_0 - B_n^{-1} \sum_{i=1}^{|\Delta_n|} b^{(n)}(s_{n,i}; \theta_0) \right\} = \left\{ B_n^{1/2} \left( - \sum_{i=1}^{|\Delta_n|} \tilde{A}^{(n)}(s_{n,i}; \theta_0, \hat{\theta}_n) \right)^{-1} B_n^{1/2} - I_p \right\} \left\{ B_n^{-1/2} \sum_{i=1}^{|\Delta_n|} b^{(n)}(s_{n,i}; \theta_0) \right\} \stackrel{p}{\longrightarrow} 0. \]

Therefore, $\hat{\theta}_n$ is also asymptotically efficient (see Basawa and Rao, 1980, Section 7.2.4). In practice, we use just the second part of the log-likelihood function given by (4); a similar result can be obtained for $\hat{\theta}_n^*$ as well (see Huang and Cressie, 1996).

3.2 Mixing Conditions

In this section, we shall show that under a certain mixing condition, a solution $\hat{\theta}_n^*$ of the composite-likelihood equation, $\partial l_n^*(\theta) / \partial \theta = 0$, is consistent and asymptotically normal. We shall also introduce Dobrushin's condition, which can be used to ensure this mixing condition. Consider a random field $Z(D)$, $D \subseteq \mathbb{Z}^d$. Let $\mathcal{F}_B^Z$ denote the $\sigma$-algebra generated by $Z(B)$ for $B \subseteq D$. The strong mixing coefficients for $Z(D)$ are defined as
\[ \alpha_{k,l}^Z(m) \equiv \sup\left\{ \alpha^Z(B_1, B_2) : |B_1| \leq k, |B_2| \leq l, d(B_1, B_2) \geq m, B_1, B_2 \subseteq D \right\}, \]
where $m \in \mathbb{N}$, $k, l \in \mathbb{N} \cup \{\infty\}$, and
\[ \alpha^Z(B_1, B_2) \equiv \sup\left\{ |P(A_1 \cap A_2) - P(A_1) P(A_2)| : A_1 \in \mathcal{F}_{B_1}^Z, A_2 \in \mathcal{F}_{B_2}^Z \right\}. \]

Definition 8. Consider a random field $Z(D)$, $D \subseteq \mathbb{Z}^d$. Let $\mu_s^Z(\cdot \mid x)$ denote the conditional probability measure of $Z(s)$ given that $Z(D \setminus \{s\}) = x$, and let $\|\nu\|_{\mathrm{var}}$ denote the total variation norm of a signed measure $\nu$. For $s \in D$, $t \in D \setminus \{s\}$, let
\[ \gamma_{s,t}^Z \equiv (1/2) \sup_{x, \tilde{x}} \left\| \mu_s^Z(\cdot \mid x) - \mu_s^Z(\cdot \mid \tilde{x}) \right\|_{\mathrm{var}}, \]
where the sup is taken over all configurations $x$ and $\tilde{x}$ identical except at site $t$. Then a random field $Z(D)$ is said to satisfy Dobrushin's condition (Dobrushin, 1968) if
\[ \sup_{s \in D} \sum_{t \in D \setminus \{s\}} \gamma_{s,t}^Z < 1. \]
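For a binary site with a logistic conditional, the coefficients in Definition 8 can be computed exactly: for two-point distributions, half the total-variation norm reduces to the difference of the success probabilities. A sketch under an illustrative autologistic-type conditional with four conditioning neighbors (the parameter values and neighborhood size are assumptions):

```python
import itertools
import math

# Sketch: exact coefficients of Definition 8 for a binary site with
# conditional p(z_s = 1 | z) = sigmoid(theta0 + theta1 * sum of 4
# neighbors). For two-point conditionals, (1/2) * total-variation norm
# equals |p - p~|. Parameter values are illustrative.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gamma_st(theta0, theta1, n_neighbors=4):
    """Max over configurations differing at one neighbor t of
    |p(z_s = 1 | x) - p(z_s = 1 | x~)|; by symmetry the value is the
    same for every conditioning neighbor t."""
    worst = 0.0
    for other in itertools.product([0, 1], repeat=n_neighbors - 1):
        a = theta0 + theta1 * sum(other)   # site t set to 0
        b = a + theta1                     # site t set to 1
        worst = max(worst, abs(sigmoid(b) - sigmoid(a)))
    return worst

g = gamma_st(theta0=0.0, theta1=0.4)
# With 4 conditioning neighbors, the simple sufficient check 4 * g < 1
# guarantees the row sums in Dobrushin's condition stay below 1.
assert 0 < g < 0.25
```

This is only the single-pair calculation; verifying Dobrushin's condition for a full POMM requires summing such coefficients over all sites $t$ that interact with $s$, which is the computation carried out in Huang and Cressie (1996) for their binomial example.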

Theorem 3. Let $Z(\mathbb{Z}^d)$ be a POMM (Definition 7) with lower set $L$ (Definition 5). Assume that (A.1), (A.2), (A.3), (A.4) hold, and (A.5) holds for some $q > 4$. Suppose that $Z(\mathbb{Z}^d \setminus L)$ satisfies Dobrushin's condition,
\[ \sup\left\{ \mathrm{diam}(\{s\} \cup \mathrm{adjl}\, s) : s \in \mathbb{Z}^d \setminus L \right\} \leq k < \infty, \]
and
\[ \liminf_{n \to \infty} \frac{1}{|\Delta_n|} c^T B_n^* c > 0, \]
for all $\|c\| = 1$, where $B_n^* \equiv \mathrm{var}\left( \sum_{s \in \Delta_n^o} b^{(n)}(s; \theta_0) \right)$, $n \in \mathbb{N}$. Then

(i) there exists a solution $\hat{\theta}_n^*$ of the composite-likelihood equation, $\partial l_n^*(\theta) / \partial \theta = 0$, such that as $n \to \infty$, $\hat{\theta}_n^* \stackrel{p}{\longrightarrow} \theta_0$;

(ii) as $n \to \infty$, $\left( B_n^* \right)^{1/2} \left( \hat{\theta}_n^* - \theta_0 \right) \stackrel{d}{\longrightarrow} N(0, I_p)$.

Proof: See Huang and Cressie (1996). $\Box$

4 Discussion

This short paper presents quite general asymptotic distribution theory for maximum likelihood estimators and maximum composite likelihood estimators based on data arising from POMMs. It is seen that POMMs are contained in the class of Markov random fields and hence our work has filled in a part of the difficult puzzle of parametric inference where data are spatially dependent. Huang and Cressie (1996) apply this theory to an example where the defining conditional distribution of the POMM is a binomial distribution. They consider the poset $(\mathbb{Z}^2, \preceq)$ with partial order defined by
\[ \mathrm{adjl}(u, v) = \{ (u-1, v), (u-1, v-1), (u, v-1), (u+1, v-1) \}, \quad (u, v) \in \mathbb{Z}^2, \]
and POMM defined by
\[ Z(u, v) \mid Z(\mathrm{adjl}(u, v)) \sim \mathrm{Bin}\left( G - 1, \; \frac{\exp(\theta^T H(u, v))}{1 + \exp(\theta^T H(u, v))} \right), \]
where $G \in \{2, 3, \ldots\}$ is the number of grey levels, $\theta \equiv (\theta_0, \theta_1, \theta_2, \theta_3, \theta_4)^T$ is the vector of parameters to be estimated, and
\[ H(u, v) \equiv \left( 1, Z(u-1, v), Z(u-1, v-1), Z(u, v-1), Z(u+1, v-1) \right)^T. \]
Then Dobrushin's condition (defined in Section 3) is shown by Huang and Cressie (1996) to be satisfied in a region of the parameter space $\mathbb{R}^5$ that can be easily visualized.
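The binomial POMM above can be simulated directly by visiting sites row by row, since all of $\mathrm{adjl}(u, v)$ lies in the same row to the left or in the previous row. A minimal sketch (out-of-grid neighbors are set to 0, a boundary simplification the paper avoids by conditioning on edge sites; the parameter values are illustrative):

```python
import math
import random

# Sketch: simulate the binomial POMM of the Discussion on a finite grid,
# with G grey levels and adjl(u,v) = {(u-1,v),(u-1,v-1),(u,v-1),(u+1,v-1)}.
# Out-of-grid neighbors contribute 0; theta values are illustrative.

def simulate_binomial_pomm(n, G, theta, seed=0):
    rng = random.Random(seed)
    z = {}
    for v in range(n):              # row by row: every site of adjl(u, v)
        for u in range(n):          # is visited before (u, v)
            H = (1,
                 z.get((u - 1, v), 0),
                 z.get((u - 1, v - 1), 0),
                 z.get((u, v - 1), 0),
                 z.get((u + 1, v - 1), 0))
            eta = sum(t * h for t, h in zip(theta, H))
            p = 1.0 / (1.0 + math.exp(-eta))
            # Bin(G - 1, p) draw as a sum of G - 1 Bernoulli trials.
            z[(u, v)] = sum(rng.random() < p for _ in range(G - 1))
    return z

G = 4
z = simulate_binomial_pomm(16, G, theta=(-0.2, 0.3, 0.1, 0.3, 0.1))
assert len(z) == 256
assert all(0 <= val <= G - 1 for val in z.values())
```

Fitting $\theta$ from such a realization amounts to a logistic-binomial regression of $Z(u, v)$ on $H(u, v)$ over the interior sites, which is exactly the composite likelihood (4) for this model.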

Acknowledgments

This research was supported by the Office of Naval Research under grant no. N00014-93-1-0001.

References

[1] Abend, K., Harley, T. J., and Kanal, L. N. (1965). Classification of binary random patterns. IEEE Transactions on Information Theory, IT-11, 538-544.
[2] Aitchison, J., and Silvey, S. D. (1958). Maximum-likelihood estimation of parameters subject to restraints. Annals of Mathematical Statistics, 29, 813-828.
[3] Basawa, I. V., Feigin, P. D., and Heyde, C. C. (1976). Asymptotic properties of maximum likelihood estimators for stochastic processes. Sankhya, Series A, 38, 259-270.
[4] Basawa, I. V., and Prakasa Rao, B. L. S. (1980). Statistical Inference for Stochastic Processes. Academic Press, London.
[5] Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society, Series B, 36, 192-236.
[6] Besag, J. (1975). Statistical analysis of non-lattice data. The Statistician, 24, 179-195.
[7] Cressie, N., and Davidson, J. L. (1998). Image analysis with partially ordered Markov models. Computational Statistics and Data Analysis, forthcoming.
[8] Crowder, M. J. (1976). Maximum likelihood estimation for dependent observations. Journal of the Royal Statistical Society, Series B, 38, 45-53.
[9] Davidson, J. L., and Cressie, N. (1993). Markov pyramid models in image analysis. In Image Algebra and Morphological Image Processing IV (E. R. Dougherty, P. D. Gader, and J. C. Serra, Eds.), Society of Photo-Optical Instrumentation Engineers (SPIE) Proceedings, Vol. 2030, 179-190. SPIE, Bellingham, WA.
[10] Davidson, J. L., Cressie, N., and Hua, X. (1999). Texture synthesis and pattern recognition for partially ordered Markov models. Pattern Recognition, tentatively accepted.
[11] Dobrushin, R. L. (1968). The description of a random field by means of conditional probabilities and conditions of its regularity. Theory of Probability and its Applications, 13, 197-224.
[12] Durrett, R., and Resnick, S. I. (1978). Functional limit theorems for dependent variables. Annals of Probability, 6, 829-846.
[13] Durrett, R. (1991). Probability: Theory and Examples. Wadsworth, Belmont, CA.
[14] Gidas, B. (1988). Consistency of maximum likelihood and pseudolikelihood estimators for Gibbs distributions. In Stochastic Differential Systems, Stochastic Control Theory and Applications (W. Fleming and P. L. Lions, Eds.), 129-145. Springer, New York.
[15] Gidas, B. (1993). Parameter estimation for Gibbs distributions from fully observed data. In Markov Random Fields (R. Chellappa and A. Jain, Eds.), 471-498. Academic Press, Boston.
[16] Heijmans, R. D. H., and Magnus, J. R. (1986a). On the first-order efficiency and asymptotic normality of maximum likelihood estimators obtained from dependent observations. Statistica Neerlandica, 40, 169-188.
[17] Heijmans, R. D. H., and Magnus, J. R. (1986b). Consistent maximum-likelihood estimation with dependent observations: the general (non-normal) case and the normal case. Journal of Econometrics, 32, 253-285.
[18] Huang, H.-C., and Cressie, N. (1996). Asymptotic properties of MLEs for partially ordered Markov models. Technical Report 96-7, Department of Statistics, Iowa State University, Ames, IA.
[19] Kiiveri, H., Speed, T. P., and Carlin, J. B. (1984). Recursive causal models. Journal of the Australian Mathematical Society, Series A, 36, 30-52.
[20] Lauritzen, S. L., and Spiegelhalter, D. J. (1988). Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society, Series B, 50, 157-224.
[21] Lauritzen, S. L., Dawid, A. P., Larsen, B. N., and Leimer, H.-G. (1990). Independence properties of directed Markov fields. Networks, 20, 491-505.
[22] Lindsay, B. G. (1988). Composite likelihood methods. In Statistical Inference from Stochastic Processes (N. U. Prabhu, Ed.), Contemporary Mathematics, Vol. 80, 221-239. American Mathematical Society, Providence, RI.
[23] Possolo, A. (1986). Estimation of binary Markov fields. Technical Report no. 77, Department of Statistics, University of Washington, Seattle, WA.
[24] Sarma, Y. R. (1986). Asymptotic properties of maximum likelihood estimators from dependent observations. Statistics and Probability Letters, 4, 309-311.
Recursive causal models. Journal of the Australian Mathematical Society, 36, 30-52. [20] Lauritzen, S.L., and Spiegelhalter, D.J. (1988). Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society, Series B, 50, 157-224. [21] Lauritzen, S.L., Dawid, A.P., Larson, B.N., and Leimer, H.G. (1990). Independence properties of directed Markov elds. Networks, 20, 491-505. [22] Lindsay, B.G. (1988). Composite likelihood methods. In Statistical Inference from Stochastic Processes (N.U. Prabhu, Ed.). Contemporary Mathematics Vol. 80, 221-239. American Mathematical Society, Providence, RI. [23] Possolo, A. (1986). Estimation of binary Markov elds. Technical Report no. 77, Department of Statistics, University of Washington, Seattle, WA. [24] Sarma, Y. R. (1986). Asymptotic properties of maximum likelihood estimators from dependent observations. Statistics and Probability Letters, 4, 309-311. 11