IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 11, NO. 8, AUGUST 1989
Simultaneous Parameter Estimation and Segmentation of Gibbs Random Fields Using Simulated Annealing

SRIDHAR LAKSHMANAN AND HALUK DERIN, MEMBER, IEEE
Abstract-An adaptive segmentation algorithm (ASA) is developed which simultaneously estimates the parameters of the underlying Gibbs random field (GRF) and segments the noisy image corrupted by additive independent Gaussian noise. The additivity and Gaussian assumptions on the noise are not critical to the algorithm and hence can be relaxed. The number of region types that comprise the image, the associated gray-levels, and the noise variance are assumed known. The algorithm, which aims at obtaining the maximum a posteriori (MAP) segmentation, is basically a simulated annealing algorithm that is interrupted at regular intervals for estimating the GRF parameters. Maximum-likelihood (ML) estimates of the parameters based on the current segmentation are used to obtain the next segmentation. It is proven that the parameter estimates and the segmentations converge in distribution to the ML estimate of the parameters and the MAP segmentation with those parameter estimates, respectively. Thus, the theoretical justification for the algorithm is established. Due to computational difficulties, however, only an approximate version of the algorithm is implemented. Specifically, the ML estimation of the parameters is replaced by the maximum pseudo-likelihood (MPL) estimation. MPL estimates are obtained by a second level of simulated annealing. This procedure provides a fast and reliable way of obtaining the MPL or coding estimates of GRF parameters. The approximate algorithm is applied on several 2- and 4-region noisy images with different noise levels and with first- and second-order neighborhoods. This algorithm constitutes a significant step toward a completely data-driven segmentation of noisy images modeled with GRF's.
Index Terms-Gibbs distributions, Gibbs random fields, image segmentation, Markov random fields, parameter estimation.
I. INTRODUCTION

THE goal of image segmentation is to partition the given image into regions based on some similarity criterion. Segmentation is performed by assigning each pixel to one of the allowed classes (or region types) based on some local processing on the neighborhood of the pixel. The existing segmentation algorithms can be grouped into two categories: statistical and structural. The algorithm presented in this paper falls into the statistical category. In this category, the images are modeled as realizations of random fields; and, for segmentation, statistically optimal estimation techniques, such as minimum mean-squared error (MMSE), maximum likelihood (ML), and
Manuscript received September 15, 1987; revised October 5, 1988. Recommended for acceptance by W. E. L. Grimson. This work was supported in part by the National Science Foundation under Grants ECS-8403685 and ECS-8617995 and the Office of Naval Research under Grant N00014-85-K-0561.
The authors are with the Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, MA 01003.
IEEE Log Number 8928493.
maximum a posteriori (MAP) estimation, are used. Recently, Markov random field (MRF) and Gibbs random field (GRF) image models have been used to obtain MAP and other related optimal segmentations (or restorations) of noisy, textured, and noisy textured images [3], [6], [7], [9]-[12], [15], [16], [20], [22]. For obtaining these MAP and other related segmentations, different techniques have been used, namely: dynamic programming [10], [11], [15], [20], stochastic relaxation/simulated annealing [16], [22], split and merge [6], and deterministic relaxation [3], [6], [7], [12].

It is desirable that a model-based segmentation algorithm such as ours is completely data-driven, in that it does not depend on the knowledge of the model parameters, or depends on as little of it as possible. In all of the above mentioned studies [3], [6], [7], [9]-[12], [15], [16], [20], the MRF/GRF model parameters necessary for the segmentation are assumed to be known. In many instances, this is not a valid assumption and the model parameters need to be estimated. To estimate the parameters, one needs realizations from the related random fields (segmentations in this case), and to obtain segmentations one needs the model parameters. This paper aims at solving precisely this problem. It presents an adaptive segmentation algorithm (ASA), which recursively segments the noisy image and estimates the parameters.

For segmentation, the algorithm uses the SA procedure. It has been proven [16] that with known parameters and proper annealing schedules, the SA procedure converges to a MAP segmentation, a global maximum of the posterior distribution. In the case of unknown parameters, we propose an adaptive algorithm that uses the SA procedure with the current estimates of the parameters and interrupts the SA procedure at regular intervals to update the parameter estimates by using the current segmentation. We show that the proposed adaptive algorithm also has desirable convergence properties.

Although the MRF/GRF model parameters are assumed unknown, the number of regions, the corresponding gray-levels, and the noise variance are assumed known. For the Gibbsian parameters, ML estimation is proposed; but in practice an approximation to ML, a procedure which maximizes a pseudo-likelihood [3], is used. This procedure, which we call maximum pseudo-likelihood (MPL) estimation, is an extension of the coding method [1]. A second level of SA is employed in maximizing the pseudo-likelihood, which presents a convenient and reliable implementation of MPL estimation. An
abridged version of this work appeared in [13] and an earlier version in [21].

An adaptive segmentation algorithm, which recursively estimates the model parameters and segments the noisy image, has also been proposed by Geman [17]. This algorithm, which also uses the SA procedure interleaved with the ML estimation of the parameters, is conceptually the same as our algorithm. Geman proposed to compute the ML estimate of the parameters by setting the gradient of the likelihood to zero. This procedure, although simple in concept and appearance, was not explicitly described or implemented in [17]. In this study, we addressed and settled a number of issues left open in [17], namely: 1) the convergence properties of the adaptive scheme, 2) the computational concerns in implementing it, and 3) its performance (experimental results). The resulting ASA is shown to yield good segmentation results and accurate estimates for the parameters. During the writing of this paper, we became aware of another study on adaptive segmentation of Gibbsian fields (Younes [26]). Younes proposes a new technique using a stochastic gradient algorithm to obtain the ML estimate of Gibbsian field parameters. He then incorporates steps of this parameter estimation scheme into the SA procedure, thus obtaining a recursive segmentation-parameter estimation algorithm similar to ours, but no experimental results are reported.

The rest of this paper is organized as follows. In Section II, we briefly describe the image model, which is basically a discrete-valued GRF corrupted by additive independent Gaussian noise. Section III presents a formal statement of the problem and the ASA as its solution. Section IV contains the main convergence result for the ASA. An approximate version of the ASA and experimental results on the segmentation of noise corrupted Gibbsian and hand-drawn images are presented in Section V, and some concluding remarks in Section VI. Appendix A presents the proof of the main convergence result, which is quite tedious. We included it here because it serves as the main theoretical justification for the algorithm. Appendix B contains the parameter estimation scheme used as part of the adaptive algorithm.
II. IMAGE MODEL

In this section, we present a description of the assumed image model. All images are defined on an $N_1 \times N_2$ rectangular lattice $L = \{(i,j): 1 \le i \le N_1,\ 1 \le j \le N_2\}$. The observed image (matrix) $y = \{y_{ij}\}$ is a realization from the random field (r.f.) $Y = \{Y_{ij}\}$ which is the sum of two r.f.'s: 1) the scene (noise-free image) r.f. $X$, and 2) the corruptive noise r.f. $W$. In other words,

$$Y_{ij} = X_{ij} + W_{ij}, \quad (i,j) \in L \tag{1}$$

or in matrix form $Y = X + W$. The noise r.f. $W = \{W_{ij}\}$ consists of i.i.d. Gaussian r.v.'s with mean 0 and variance $\sigma^2$. We point out that neither additivity nor Gaussianity of the noise are critical properties of the model. In other words, with trivial modifications, the proposed ASA would still be valid for nonadditive and non-Gaussian, e.g., multiplicative and exponential, noise. However, the independent noise assumption is necessary for the ASA.

The scene $X = \{X_{ij}\}$ is a (finite) discrete-valued r.f., where each $X_{ij}$ takes one of $M$ values in $G = \{g_1, g_2, \ldots, g_M\}$ called "gray-levels." We will use $x$, $\omega$, and $\rho$ to represent realizations of the $X$ r.f. $X$ is an MRF with respect to a neighborhood system $\eta = \{\eta_{ij}: (i,j) \in L\}$, and is described in terms of the local characteristics

$$P\big(X_{ij} = x_{ij} \mid X_{kl} = x_{kl},\ (k,l) \ne (i,j)\big) = P\big(X_{ij} = x_{ij} \mid X_{kl} = x_{kl},\ (k,l) \in \eta_{ij}\big), \quad \text{for } (i,j) \in L \tag{2}$$

where $\eta_{ij}$, the neighborhood of pixel $(i,j)$, is such that $(i,j) \notin \eta_{ij}$ and $(k,l) \in \eta_{ij}$ implies $(i,j) \in \eta_{kl}$. The neighborhood systems that are commonly used in image processing are $\eta^1, \eta^2, \ldots$, where $\eta^1$ denotes the system of first-order neighborhoods consisting of the four nearest neighbors of each pixel and $\eta^2$ denotes the system of second-order neighborhoods consisting of the eight nearest neighbors of each pixel. There is an equivalent characterization [1] of MRF's satisfying $P(X = x) > 0$, for all $x$, through their joint distributions, which are Gibbs distributions (GD's) having the form

$$P(X = x) = \frac{1}{Z} \exp\Big\{ -\sum_{c \in C} V_c(x) \Big\} \tag{3}$$

where $c$ is a set of pixels, called a clique, that consists of either a single pixel or a group of pixels such that if $(i,j) \ne (k,l)$, then $(i,j) \in c$ and $(k,l) \in c$ together imply that $(i,j) \in \eta_{kl}$; $C$ is the set of all cliques for the neighborhood system $\eta$ over the lattice $L$; $V_c(x)$, called the potential function associated with clique $c$, is an arbitrary function of the realization $x$ that depends only on the restriction of $x$ to $c$; and $Z$ is a normalizing constant called the partition function. A GD is usually expressed with a constant in the exponent called the temperature. In (3) the temperature constant is lumped with the potential functions. The clique types associated with the first-order and second-order neighborhood systems are shown in Fig. 1.

The neighborhood system $\eta$ specifies the associated cliques, and the potential functions defined for these cliques specify the GD. A more detailed discussion on MRF's and GRF's can be found in [1], [11], [16]. In this study, we use the term "distribution" generically to stand for the probability mass function in the case of discrete r.v.'s and for the probability density function in the case of continuous r.v.'s.

In this and some of our previous work [10], [11], [15], we have used a class of GD's which is particularly suitable for describing the regions in an image, that is, the scene r.f. $X$ in the above model. We have called this class of GD's the multilevel logistic (MLL) distribution. The MLL distribution is defined as follows.
Fig. 1. Clique types associated with first-order and second-order neighborhood systems. (Panels: first-order neighborhood η¹; clique types of η¹; second-order neighborhood η²; clique types of η².)
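To make the neighborhood and clique-type bookkeeping concrete, here is a small sketch (our own illustration, not code from the paper; the lattice size, offsets, and function names are assumptions) that enumerates the pair cliques of the η¹ and η² systems pictured in Fig. 1:

```python
# Sketch: neighborhoods and pair-clique types for the first- and
# second-order systems (eta^1, eta^2).  Names and sizes are ours.
N1, N2 = 8, 8  # example lattice dimensions

ETA1 = [(-1, 0), (1, 0), (0, -1), (0, 1)]              # 4 nearest neighbors
ETA2 = ETA1 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]     # 8 nearest neighbors

# One offset per pair-clique type (cf. Fig. 1); eta^2 also has triple and
# quadruple cliques, which this paper sets aside.
PAIR_TYPES_ETA1 = [(0, 1), (1, 0)]                     # horizontal, vertical
PAIR_TYPES_ETA2 = PAIR_TYPES_ETA1 + [(1, 1), (1, -1)]  # plus two diagonals

def pair_cliques(offsets, n1=N1, n2=N2):
    """All pair cliques {(i, j), (i+di, j+dj)} that fit on the lattice."""
    return [((i, j), (i + di, j + dj))
            for di, dj in offsets
            for i in range(n1) for j in range(n2)
            if 0 <= i + di < n1 and 0 <= j + dj < n2]

print(len(pair_cliques(PAIR_TYPES_ETA1)))  # 112 horizontal/vertical pairs on 8x8
```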
A parameter is associated with each clique type, except for single pixel cliques, say $\beta_k$ with clique type $k$. The potential function for all cliques of that type is then defined as

$$V_{c_k}(x) = \begin{cases} \beta_k & \text{if all } x_{ij} \text{ in } c_k \text{ are equal} \\ -\beta_k & \text{otherwise} \end{cases} \tag{4}$$

where $c_k$ denotes any clique of type $k$. For the single pixel cliques, the potential function is defined as

$$V_c(x) = \alpha_m \quad \text{if } x_{ij} = g_m \text{ for } c = (i,j) \tag{5}$$

where $\alpha_m$ is a parameter associated with region $m$. By assigning the same potential function to all cliques of a certain type, independent of their positions in the image, it is implicitly assumed that the $X$ r.f. is homogeneous. The values of the parameters $\{\beta_k\}$ influence the sizes and shapes of the resulting regions, while those of $\{\alpha_m\}$ influence the relative likelihood of each region type. The parameters $\{\alpha_m\}$ and $\{\beta_k\}$ constitute the entries of the model parameter vector, denoted by $\theta$. In this work, the scene r.f. $X$ is assumed to be an MLL field with respect to a first-order or a second-order neighborhood system. Of the Gibbs model parameters, only those associated with the single pixel and pair cliques are assumed to be nonzero and are estimated. Other model parameters such as the gray-levels $\{g_m\}$, the number of region types $M$, and the noise variance $\sigma^2$ are assumed known.

By considering higher order neighborhoods than the first and second-order, and also including all the clique types associated with the neighborhood, one would expect to get better segmentation, provided that the parameter estimation can be done accurately and also that the basic model assumption (i.e., MLL) is valid. However, this would imply a significant increase in the computational burden for parameter estimation as well as for segmentation. In this and in our previous studies, we have observed that the MLL model with the above mentioned restrictions captures the essential features of the regions in an image and also yields good segmentation results.

III. PROBLEM STATEMENT AND THE ADAPTIVE SEGMENTATION ALGORITHM

In this section, we present a formal statement of the problem and an adaptive segmentation algorithm (ASA) as its solution. For the model described above, the segmentation problem is posed as follows. It is desired to devise an estimation scheme which, based on the observed image $y$, will yield an optimal estimate $x^* = x^*(Y)|_{Y=y}$ of the noise-free scene. A commonly sought one is the MAP estimate [3], [6], [7], [9]-[12], [15], [16], [20]. Determining the MAP or any other nontrivial estimate which is optimal in some sense is a difficult estimation problem. The difficulty is compounded by the fact that the model parameters, in particular the parameter vector $\theta$, necessary for the estimation procedure are not known. Therefore, both the r.f. $X$ and its parameters $\theta$ have to be estimated simultaneously or recursively from the observed noisy image $y$. In other words, we seek an estimation scheme which, based on the observed image, will yield $(x^*, \theta^*)$, an estimate pair for $X$ and $\theta$ that is optimal in some sense. One possible criterion of optimality is

$$(x^*, \theta^*) = \arg\max_{x,\,\theta} P(X = x, Y = y \mid \theta). \tag{6}$$

The $(x^*, \theta^*)$ that satisfies (6) is the global maximum of $P(X = x, Y = y \mid \theta)$ with respect to $x$ and $\theta$. However, the maximization implied in (6) is an extremely difficult one, having no solution known to the authors. It is conceivable to propose the SA procedure to achieve this maximization, but even that is not implementable because the local characteristics with respect to the components of the parameter vector $\theta$ are not readily computable from $P(X = x, Y = y \mid \theta)$.

Since the optimality in (6) cannot be implemented with reasonable computation times, we adopt the following criterion instead:

$$x^* = \arg\max_{x} P(X = x, Y = y \mid \theta^*), \qquad \theta^* = \arg\max_{\theta} P(X = x^*, Y = y \mid \theta). \tag{7}$$

The ASA presented here can be used to achieve the maximization in (7), and furthermore a close approximation of it can be implemented. The criterion in (7) appears to be similar to the one in (6), but it is weaker. In other words, the estimate pair satisfying (7) is suboptimal with respect to the criterion in (6). It has been shown in [25] that the pair $(x^*, \theta^*)$ satisfying the criterion in (7), called the partial optimal solution, is not necessarily a local maximum. In practice, however, it is almost always a local maximum [25]. Also, $(x^*, \theta^*)$ is a global maximum with respect to $x$ for $\theta = \theta^*$ and a global maximum with respect to $\theta$ for $x = x^*$. Furthermore, the estimate pairs satisfying (6) and (7) share the following property: $x^*$ is the MAP estimate of $X$ based on $y$ and $\theta^*$, where $\theta^*$ is the ML estimate of $\theta$ based on $P(X = x^* \mid \theta)$. This is true, because $P(X = x, Y = y \mid \theta^*) = P(X = x \mid Y = y, \theta^*)\, P(Y = y \mid \theta^*)$, where $P(Y = y \mid \theta^*)$ is independent of $x$, and hence $x^*$ is the MAP estimate; and also $P(X = x^*, Y = y \mid \theta) = P(Y = y \mid X = x^*)\, P(X = x^* \mid \theta)$, where $P(Y = y \mid X = x^*)$ is independent of $\theta$, and hence $\theta^*$ is the ML estimate based on $x^*$.
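The criterion (7) is a pair of alternating maximizations, which the ASA of the next paragraphs realizes with simulated annealing. The following schematic sketch (ours, not the paper's; `map_segment` and `ml_estimate` are hypothetical stand-ins for the two maximizations) shows the structure of a search for a partial optimal solution:

```python
# Schematic only: the alternating maximizations behind criterion (7).
# `map_segment` and `ml_estimate` are hypothetical callables standing in
# for argmax_x P(X=x, Y=y | theta) and argmax_theta P(X=x | theta).
def partial_optimal_solution(y, theta0, map_segment, ml_estimate, n_rounds=10):
    theta = theta0
    x = None
    for _ in range(n_rounds):
        x = map_segment(y, theta)   # x* for the current theta
        theta = ml_estimate(x)      # theta* for the current x*
    return x, theta                 # a candidate partial optimal pair
```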
The objective now is to obtain an estimate pair $(x^*, \theta^*)$ that satisfies the optimality criterion in (7). One possible way of getting such an estimate pair is to devise a recursive estimation scheme $\{(\hat x(t), \hat\theta(t))\}$ that converges to $(x^*, \theta^*)$. We propose a procedure, which we call the adaptive segmentation algorithm (ASA), that will yield such an $(x^*, \theta^*)$. The ASA uses the SA procedure as a major component. Therefore, we first describe SA briefly.

SA is an iterative procedure for finding the mode(s) of a function (say $H$) of many variables [16]. To find the modes of $H$, the procedure treats $-H$ as the energy of a (hypothetical) physical system. Then, it simulates the physical process of annealing by slowly decreasing a parameter corresponding to the temperature of the system. Thereby, it forces the system into its lowest energy state(s), corresponding to the mode(s) of $H$. Theoretically, SA can be viewed as a procedure that generates a nonstationary Markov chain whose distribution converges to the uniform distribution over the modes of $H$. For a thorough treatment of SA and the main convergence result, we refer the reader to [16].

The ASA is basically a SA procedure over the $x$ variables, which is interrupted at regular intervals to get an ML estimate of $\theta$ based on the most recent $x$ configuration; then SA is continued with the current estimate of $\theta$, and so on. In order to introduce temperature to the annealing process, we consider the following function for maximization:

$$\Pi_T^{\theta}(x) = \frac{\big[ P(X = x, Y = y \mid \theta) \big]^{1/T}}{\sum_{\omega \in \Omega} \int_{\Theta} \big[ P(X = \omega, Y = y \mid \theta') \big]^{1/T} \, d\theta'} \tag{8}$$

for $T$ a positive number. By assuming that $\theta$ is restricted to a finite volume, we can ensure the existence of the integral in the denominator of (8). The optimality criterion in (7) is equivalent to the same criterion expressed in terms of $\Pi_T^\theta(x)$. Thus we seek to maximize $\Pi_T^\theta(x)$ in the same sense as (7). Note that although $\Pi_T^\theta(x)$ is a function of $y$, we suppress this dependence in our notation for convenience.

The updating of the $x$ variables is done according to a visitation schedule $\{n_t\}$, specified as a function of the discrete time variable $t$; $n_t$ denotes the pixel to be visited at time $t$. The visitation schedule is periodic, and a period of these visits, called a package, consists of at least one visit to each pixel of the lattice. A sequence of visits consisting of exactly one visit to each pixel, possibly in a raster scan fashion, is called an iteration. That is, a package of visits may consist of one or more iterations of visits. After each package is completed, the parameter estimate is updated by determining the ML estimate of $\theta$ based on the $x$ configuration at the end of the package. The time variable $t$ is frozen during the parameter estimation. The updated parameter estimate is kept fixed and used during the next package of $x$ updates. In other words, the parameter estimate $\hat\theta(t)$ as a function of $t$ is allowed to change only at the end of each package and is kept constant through the next package of $x$ updates. The temperature $T$ decreases with $t$ according to a prespecified annealing schedule $T(t)$.

Theoretically, the recursion consisting of a package of $x$ updates and an update of the estimate of $\theta$ should be carried out indefinitely; but, in practice, it is done a sufficient number of times to yield good estimates and segmentations. To start the recursion, an initial estimate pair $(x_0, \theta_0)$ is needed. Theoretically, the algorithm converges independent of the initial estimate. However, in practice, judiciously selected initial estimates may lead to better parameter estimates and segmentations in fewer recursion steps.

The individual updates of the $x_{ij}$'s are done as follows. At time $t$, if $n_t = (i,j)$, then $x_{ij}$ is replaced by a random value drawn from the conditional distribution

$$P\big(X_{ij} = x_{ij} \mid X' = x', Y = y\big) = \frac{\exp\Big\{ -\frac{1}{T(t)} \Big[ \sum_{c \in C_{ij}} \hat V_c^{(t)}(x) + \frac{(y_{ij} - x_{ij})^2}{2\sigma^2} \Big] \Big\}}{\sum_{g \in G} \exp\Big\{ -\frac{1}{T(t)} \Big[ \sum_{c \in C_{ij}} \hat V_c^{(t)}(x^{g}) + \frac{(y_{ij} - g)^2}{2\sigma^2} \Big] \Big\}} \tag{9}$$

where $x^{g}$ denotes the configuration $x$ with $x_{ij}$ set to $g$.
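As a concrete illustration of one package of $x$ updates under (9), here is a minimal sketch (our own construction, not the paper's code; the function names, raster visitation order, and restriction to pair cliques are assumptions consistent with the model of Section II):

```python
# Sketch: one "package" of x updates per (9).  x holds label indices
# 0..M-1, y holds observed intensities; names and layout are ours.
import numpy as np

rng = np.random.default_rng(0)

def local_energies(x, y, i, j, gray_levels, alpha, beta, pair_offsets, sigma):
    """Energy of each candidate label m at pixel (i, j): potentials of the
    cliques containing (i, j), per (4)-(5), plus the Gaussian noise term."""
    n1, n2 = x.shape
    energies = []
    for m, g in enumerate(gray_levels):
        u = alpha[m] + (y[i, j] - g) ** 2 / (2.0 * sigma ** 2)
        for k, (di, dj) in enumerate(pair_offsets):
            for s in (+1, -1):  # both neighbors along this clique type
                ii, jj = i + s * di, j + s * dj
                if 0 <= ii < n1 and 0 <= jj < n2:
                    u += beta[k] if x[ii, jj] == m else -beta[k]
        energies.append(u)
    return np.array(energies)

def package_sweep(x, y, gray_levels, alpha, beta, pair_offsets, sigma, T):
    """One iteration of visits (a one-iteration package); updates x in place."""
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            u = local_energies(x, y, i, j, gray_levels, alpha, beta,
                               pair_offsets, sigma)
            p = np.exp(-(u - u.min()) / T)       # eq. (9), stabilized
            x[i, j] = rng.choice(len(gray_levels), p=p / p.sum())
    return x
```

Lowering $T$ from sweep to sweep sharpens the conditionals toward the locally most probable label, which is the annealing mechanism described above.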
where $X'$ denotes the restriction of $X$ to $L - (i,j)$, $C_{ij}$ denotes the set of cliques containing the pixel $(i,j)$, and $\hat V_c^{(t)}(\cdot)$ represents the potential of clique $c$ for the given argument $(\cdot)$ and with the parameter estimate $\hat\theta(t)$.

Let $t = t_k$ mark the end of the $k$th package of $x$ updates. Then at $t = t_k$ the parameter estimate is updated from $\hat\theta(t_k)$ to $\hat\theta(t_k + 1)$ based on $\hat x(t_k)$, the segmentation at the end of the $k$th package, in the following way:

$$\hat\theta(t_k + 1) = \arg\max_{\theta} \Pi_{T(t_k)}^{\theta}\big(X = \hat x(t_k)\big) = \arg\max_{\theta} P\big(X = \hat x(t_k), Y = y \mid \theta\big) = \arg\max_{\theta} P\big(X = \hat x(t_k) \mid \theta\big). \tag{10}$$

The third equality in (10) follows from the model assumptions. According to (10), $\hat\theta(t_k + 1)$ is the ML estimate from $P(X = \hat x(t_k) \mid \theta)$.

Here, we claim that the ASA described above yields a sequence of estimates $\{(\hat x(t), \hat\theta(t))\}$ that converges to a $(x^*, \theta^*)$ which satisfies the criterion in (7). In the next section, we make this statement more precise, and in Appendix A, we prove it. Thus, we establish that the ASA is a theoretically justified procedure.

The implementation of the ASA, however, is not a simple matter, because the maximization in (10) is difficult to realize. This is precisely the difficulty in estimating the parameters of GD's, which necessitates the use of approximations such as the coding method [1] and the maximum pseudo-likelihood (MPL) procedure [3]. We adopt the MPL procedure, which aims at maximizing the pseudo-likelihood

$$PL(x \mid \theta) = \prod_{(i,j) \in L} P\big(X_{ij} = x_{ij} \mid X_{kl} = x_{kl},\ (k,l) \in \eta_{ij};\ \theta\big) \tag{11}$$

with respect to $\theta$, instead of the likelihood $P(X = x \mid \theta)$ in (10). It has been noted [3] that the MPL estimate that maximizes the pseudo-likelihood in (11) is a good approximation to the ML estimate of (10). Specifically, it has been shown [19] that the MPL estimate is consistent, that is, as the image size increases, the MPL estimate converges to the true value with probability 1. Also, Besag [2] has shown, for Gaussian fields, that the MPL estimates are as efficient as the ML estimates and are numerically close to them. The MPL estimate is obtainable with reasonable effort. We propose to maximize (11) through a second level of SA. A detailed account of the MPL procedure and our implementation using SA is presented in Appendix B.

Some additional approximations are necessary in implementing the ASA, such as performing only a finite number of recursion steps and making compromises in the theoretically valid annealing schedule. Thus, although the ASA presented in this paper is proven to be optimal [in the sense of (7)], the actually implemented algorithm is an approximation to it. However, these approximations, necessitated by practical limitations such as reasonable
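For reference, the pseudo-likelihood (11) is cheap to evaluate because it involves no partition function. A sketch of its logarithm for the MLL field (ours, with the same assumed conventions and pair-clique-only restriction as the earlier sketches):

```python
# Sketch: log pseudo-likelihood (11) for the MLL field; names and the
# pair-clique-only restriction are our assumptions, not the paper's code.
import numpy as np

def log_pseudo_likelihood(x, alpha, beta, pair_offsets):
    """Sum over pixels of log P(X_ij = x_ij | neighbors; theta), where each
    conditional is exp(-u_m) normalized over the M labels and u_m collects
    the potentials (4)-(5) of the cliques containing (i, j)."""
    n1, n2 = x.shape
    M = len(alpha)
    total = 0.0
    for i in range(n1):
        for j in range(n2):
            u = np.array(alpha, dtype=float)     # single-pixel term per label
            for k, (di, dj) in enumerate(pair_offsets):
                for s in (+1, -1):               # both neighbors of this type
                    ii, jj = i + s * di, j + s * dj
                    if 0 <= ii < n1 and 0 <= jj < n2:
                        agree = (np.arange(M) == x[ii, jj])
                        u += np.where(agree, beta[k], -beta[k])
            c = u.min()                          # stabilized log-sum-exp
            total += -u[x[i, j]] + c - np.log(np.exp(-(u - c)).sum())
    return total
```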
computation times, are the ones that are frequently made in most SA implementations.

As an aside, we note that the recursive nature of the ASA resembles the similar property of the EM algorithm. The EM algorithm is a recursive procedure which aims at determining the ML estimate of a parameter using incomplete data. The recursion consists of completing the data by determining the conditional expectation of the missing part of the data based on the available portion and the current parameter estimate (E-step), and determining the ML estimate of the parameter based on the current estimate of the complete data (M-step). It has been shown [8] that with this scheme the parameter estimate converges to the ML estimate based on the incomplete data. Interpreting the noisy image as the incomplete data, and the noisy image together with the underlying region process realization as the complete data, the similarity between our ASA and the EM algorithm becomes evident. However, the E-step of the EM algorithm cannot be implemented in our segmentation problem. Hence, we cannot directly make use of the EM algorithm for our purpose.

IV. CONVERGENCE RESULT FOR THE ADAPTIVE SEGMENTATION ALGORITHM

Following the introduction of some new notation, we present the main convergence theorem.
$\Omega_0(\hat\theta(t))$: the set of realizations of $X$ that are the global maxima of $\Pi^{\hat\theta(t)}(\cdot)$ [defined in (8)].

$u_s^{\hat\theta(t)}(x)$: the exponent [except for $T(t)$] in the conditional distribution (9) of $x_s$ given its neighbors, where $s$ is a single pixel. Note that the dependence on the neighboring pixels is suppressed in the notation.

$\Delta[\hat\theta(t)]$: denotes $\max_s \max |u_s^{\hat\theta(t)}(x) - u_s^{\hat\theta(t)}(\omega)|$, where the inner maximum is over pairs of realizations $x$, $\omega$ with $x_q = \omega_q$ for all $q \ne s$.

Theorem: The sequence of estimates $\{(\hat x(t), \hat\theta(t))\}$ generated by the ASA converges in distribution to an estimate pair $(x_{ik}^*, \theta_i^*)$ that satisfies the criterion in (7).

Lemma 1: $\lim_{t\to\infty} \| P(t, \cdot \mid 0, \rho) - \Pi^{\hat\theta(t)}(\cdot) \| = 0$ pointwise, for any $\rho$.

Lemma 2: $\lim_{t\to\infty} \| \Pi^{\hat\theta(t)}(\cdot) - \Pi^{\theta_i^*}(\cdot) \| = 0$ in probability.

APPENDIX A

The $x$ updates of the ASA, performed according to (9), generate a chain $\hat x(t)$ whose one-step transition probability is

$$P\big(\hat x(t) = \omega \mid \hat x(t-1) = \rho\big) = \begin{cases} P\big(X_s = \omega_s \mid X_q = \rho_q,\ q \ne s;\ \hat\theta(t), T(t)\big), & \text{if } \omega_q = \rho_q \text{ for all } q \ne s,\ s = n_t \\ 0, & \text{otherwise} \end{cases} \tag{A1}$$

where the conditional probability is the one given in (9). We make the following observations about the evolution of $\hat x(t)$: 1) The transition probability in (A1) depends on $n_t$, $\hat\theta(t)$, and $T(t)$, and consequently on $t$; therefore, $\hat x(t)$ is a nonstationary Markov chain. 2) Given $\hat x(t')$, $\hat x(t)$ for $t > t'$ does not depend on $\hat x(l)$ for $l < t'$ or on $\hat\theta(l)$ for $l \le t'$. 3) The evolution of $\hat\theta(t)$ is different from that of $\hat x(t)$: $\hat\theta(t)$ is kept constant during the time period associated with a package of visits, and it is updated at the end of each package to the ML estimate of $\theta$ based on $\hat x$ at that instant. In other words, $\hat\theta(t)$ is a deterministic function of the $\hat x$ at the end of the most recently completed package.
A.2 Some Notation

We now introduce some notation which makes the proof of the convergence theorem more convenient.

$\Pi^{\hat\theta(t)}(\cdot)$ is the uniform distribution over $\Omega_0(\hat\theta(t))$. (Note that $\Pi^{\hat\theta(t)}(\cdot)$ does not depend on $T(t)$, whereas $\Pi_{T(t)}^{\hat\theta(t)}(\cdot)$ does.)

$P(t, \omega \mid t', \rho) \triangleq P(\hat x(t) = \omega \mid \hat x(t') = \rho)$.

$P_{t',t}(\cdot) \triangleq \sum_{\rho} P(t, \cdot \mid t', \rho)\, \Pi_{T(t')}^{\hat\theta(t')}(\rho)$. Note that $P_{t',t}(\cdot)$ is a function of $\hat\theta(t')$, but we suppress the $\hat\theta(t')$ dependence from the notation.

$\lambda(t) \triangleq \inf_s \inf_{\omega} P\big(X_s = \omega_s \mid X_q = \omega_q \text{ for all } q \ne s;\ \hat\theta(t), T(t)\big)$. This corresponds to the smallest possible conditional probability given $\theta = \hat\theta(t)$.

$\Omega \triangleq G \times G \times \cdots \times G$. It is the set of all realizations of the r.f. $X$.

$\|P - Q\| \triangleq \sum_{\omega} |P(\omega) - Q(\omega)|$. This corresponds to a metric on probability measures defined on $\Omega$ [4, p. 27]. $|\cdot|$ is sometimes used to denote absolute value, and sometimes the cardinality of a space. The one intended should be clear from the context.

Let $\tau$ be the length of a "package." Then $t_k = k\tau$ marks the end of the $k$th package. Note that if one "package" consists of only one "iteration" of visits, then $\tau = N$.

$m_s(t_k) = \sup\{t: t \le t_k,\ n_t = s\}$, for $s \in L$. In other words, $m_s(t_k)$ is the last time site $s$ was visited before (or at) $t_k$.

We now show how Lemmas 1 and 2 imply the theorem. First, note that

$$\| P(t, \cdot \mid 0, \rho) - \Pi^{\theta_i^*}(\cdot) \| \le \| P(t, \cdot \mid 0, \rho) - \Pi^{\hat\theta(t)}(\cdot) \| + \| \Pi^{\hat\theta(t)}(\cdot) - \Pi^{\theta_i^*}(\cdot) \| \tag{A2}$$

due to the triangular inequality. Notice that for each $\omega$, $P(t, \omega \mid 0, \rho)$ is a sequence of real numbers in $t$, for any $\rho$; $\Pi^{\hat\theta(t)}(\omega)$ is a sequence of r.v.'s in $t$ (because $\hat\theta(t)$ is a r.v.); and $\Pi^{\theta_i^*}(\omega)$ is just a real number. It follows from Lemma 1 that the first term in the RHS of (A2) converges to zero pointwise (although it is a random sequence), and it follows from Lemma 2 that the second term in the RHS of (A2) converges to zero in probability. These combined imply that the LHS of (A2) converges to zero in probability. But, since $\Pi^{\theta_i^*}(\omega)$ is just a real number for each $\omega$, it follows that $P(t, \omega \mid 0, \rho)$ converges pointwise to $\Pi^{\theta_i^*}(\omega)$, independent of $\rho$. This means that $\lim_{t\to\infty} \hat x(t) = x_{ik}^*$ (in distribution) for some $k$ and $i$.

The argument in the previous paragraph says that $P(t, \omega \mid 0, \rho)$ converges to $\Pi^{\theta_i^*}(\omega)$. This means that for any $\epsilon > 0$, there is a $t_0(\epsilon)$ such that for $t \ge t_0(\epsilon)$, $\| P(\hat x(t) = \cdot) - \Pi^{\theta_i^*}(\cdot) \| < \epsilon$. This, on the other hand, implies that $\hat x(t) \in \Omega_0(\theta_i^*)$ w.p. $\ge 1 - \epsilon$. By the definition of $\Omega_0(\theta_i^*)$, for $\hat x(t) \in \Omega_0(\theta_i^*)$, the next update of $\hat\theta$ will yield $\theta_i^*$. Therefore, for $t \ge t_0(\epsilon)$, $\hat\theta(t) = \theta_i^*$ w.p. $\ge 1 - \epsilon$. Thus, $\lim_{t\to\infty} \hat\theta(t) = \theta_i^*$ in probability.

These results, $\lim_{t\to\infty} \hat x(t) = x_{ik}^*$ and $\lim_{t\to\infty} \hat\theta(t) = \theta_i^*$, together imply that $\lim_{t\to\infty} (\hat x(t), \hat\theta(t)) = (x_{ik}^*, \theta_i^*)$ [4, p. 27]. It now remains to prove Lemmas 1 and 2.

A.4 Proof of Lemma 1

To prove Lemma 1, we need the following two new lemmas.

Lemma 1.1:
$$\lim_{t\to\infty} \sup_{\rho_1, \rho_2} \big| P(t, \omega \mid t', \rho_1) - P(t, \omega \mid t', \rho_2) \big| = 0, \quad \text{for any } t'. \tag{A3}$$
Lemma 1.2:
$$\lim_{t\to\infty} \big\| P_{t-1,t}(\cdot) - \Pi_{T(t)}^{\hat\theta(t)}(\cdot) \big\| = 0. \tag{A4}$$
We now show how Lemmas 1.1 and 1.2 imply Lemma 1. First note that

$$\lim_{t\to\infty} \| P(t, \cdot \mid 0, \rho) - \Pi^{\hat\theta(t)}(\cdot) \| \le \lim_{t\to\infty} \| P(t, \cdot \mid 0, \rho) - P_{t-1,t}(\cdot) \| + \lim_{t\to\infty} \| P_{t-1,t}(\cdot) - \Pi^{\hat\theta(t)}(\cdot) \| \tag{A5}$$

by the triangular inequality. The second term in the RHS of (A5) is equal to zero by Lemma 1.2. We now show that the first term is also zero. Using the fact that $P(t-1, \cdot \mid 0, \rho)$ and $\Pi_{T(t-1)}^{\hat\theta(t-1)}(\cdot)$ have total mass 1, upon some calculations we obtain the following:

$$\lim_{t\to\infty} \| P(t, \cdot \mid 0, \rho) - P_{t-1,t}(\cdot) \| = \lim_{t\to\infty} \sum_\omega \Big| \sum_{\rho_1} \big[ P(t-1, \rho_1 \mid 0, \rho) - \Pi_{T(t-1)}^{\hat\theta(t-1)}(\rho_1) \big]\, P(t, \omega \mid t-1, \rho_1) \Big| \tag{A6}$$

$$\le \lim_{t\to\infty} 2 \sum_\omega \sup_{\rho_1, \rho_2} \big| P(t, \omega \mid t-1, \rho_1) - P(t, \omega \mid t-1, \rho_2) \big|. \tag{A7}$$

The RHS of (A7) is zero by Lemma 1.1. Therefore, it follows that the LHS of (A5) is zero, which completes the proof of Lemma 1. We now prove Lemmas 1.1 and 1.2.

Proof of Lemma 1.1: Using the definitions of $\lambda(\cdot)$ and $\Delta(\cdot)$, it can be shown that

$$\lambda(t) \ge \frac{1}{M} \exp\Big\{ -\frac{\Delta[\hat\theta(t)]}{T(t)} \Big\} \tag{A8}$$

where $M$ is the number of gray-levels and $\lambda(t)$ is the smallest conditional probability corresponding to $\hat\theta(t)$. By breaking $P(t_k, \omega \mid t_{k-1}, \omega')$ into a product of conditionals and using the definition of $\lambda(\cdot)$, we obtain the inequality

$$P(t_k, \omega \mid t_{k-1}, \omega') \ge \prod_{i=1}^{N} \lambda(m_i) \tag{A9}$$

where for convenience we write $m_{s_i}(t_k)$ as $m_i$. Recall that: 1) $\hat\theta$ is constant between $t_{k-1}$ and $t_k$, and 2) $T(\cdot)$ is decreasing between $t_{k-1}$ and $t_k$ (i.e., $T(t_k) \le T(m_j)$ for all $j$). Using these two facts along with (A8), we get the RHS of (A9) to be

$$\ge M^{-N} \exp\Big\{ -\frac{N \Delta[\hat\theta(t_k)]}{T(t_k)} \Big\}. \tag{A10}$$

It follows from the conditions set on $T(\cdot)$ in (14b) that (A10) is

$$\ge M^{-N} (k\tau)^{-1}. \tag{A11}$$

From (A9)-(A11) we obtain

$$P(t_k, \omega \mid t_{k-1}, \omega') \ge M^{-N} (k\tau)^{-1} \triangleq \delta(k). \tag{A12}$$

Therefore, $\delta(k)$ represents a lower bound for the transition probability from the beginning to the end of the $k$th package. Now let $t > t_1$. Using the Chapman-Kolmogorov theorem, we obtain

$$\sup_{\rho_1, \rho_2} \big| P(t, \omega \mid 0, \rho_1) - P(t, \omega \mid 0, \rho_2) \big| \le \sup_{\rho_1, \rho_2} \Big| \sum_{\omega'} P(t, \omega \mid t_1, \omega')\, P(t_1, \omega' \mid 0, \rho_1) - \sum_{\omega'} P(t, \omega \mid t_1, \omega')\, P(t_1, \omega' \mid 0, \rho_2) \Big|. \tag{A13}$$

We now establish an upper bound for the first term inside the supremum in (A13). We know from (A12) that $P(t_1, \omega' \mid 0, \rho) \ge M^{-N}/\tau = \delta(1)$. Let $\phi$ be a function over $\Omega$ such that $\phi(\omega') \ge M^{-N}/\tau$ for all $\omega'$ and $\sum_{\omega'} \phi(\omega') = 1$. Such a $\phi(\cdot)$ exists; for example, $P(t_1, \cdot \mid 0, \rho)$ itself is a possibility. It now follows that

$$\sum_{\omega'} P(t, \omega \mid t_1, \omega')\, P(t_1, \omega' \mid 0, \rho) \le \sup_{\phi} \sum_{\omega'} P(t, \omega \mid t_1, \omega')\, \phi(\omega'). \tag{A14}$$

Let $\omega^* = \arg\max_{\omega'} P(t, \omega \mid t_1, \omega')$. Then the supremum in the RHS of (A14) is obtained by choosing a $\phi(\cdot)$ that places $\delta(1)$ on each $\omega'$, and the remaining $1 - M^N \delta(1)$ on $\omega^*$. The value of the RHS of (A14) so obtained is

$$\big[ 1 - M^N \delta(1) \big]\, P(t, \omega \mid t_1, \omega^*) + \delta(1) \sum_{\omega'} P(t, \omega \mid t_1, \omega'). \tag{A15}$$

Now, looking at the second term inside the supremum in (A13), let $\omega_* = \arg\min_{\omega'} P(t, \omega \mid t_1, \omega')$. Using similar arguments as for the first term, we obtain a lower bound for the second term:

$$\sum_{\omega'} P(t, \omega \mid t_1, \omega')\, P(t_1, \omega' \mid 0, \rho) \ge \big[ 1 - M^N \delta(1) \big]\, P(t, \omega \mid t_1, \omega_*) + \delta(1) \sum_{\omega'} P(t, \omega \mid t_1, \omega'). \tag{A16}$$

Therefore, using (A15) and (A16), the LHS of (A13) becomes

$$\le \big[ 1 - M^N \delta(1) \big] \sup_{\rho_1, \rho_2} \big| P(t, \omega \mid t_1, \rho_1) - P(t, \omega \mid t_1, \rho_2) \big|. \tag{A17}$$
Using this procedure over successive packages yields the following upper bound for the LHS of (A13):

$$\prod_{k=1}^{K(t)} \big[ 1 - M^N \delta(k) \big] \cdot \sup_{\rho_1, \rho_2} \big| P(t, \omega \mid t_{K(t)}, \rho_1) - P(t, \omega \mid t_{K(t)}, \rho_2) \big|. \tag{A18}$$

Since the supremum in (A18) is less than or equal to unity, an upper bound for (A18) is given by

$$\prod_{k=1}^{K(t)} \big[ 1 - M^N \delta(k) \big]. \tag{A19}$$

As $t \to \infty$, $K(t) \to \infty$, and since $\sum_k (k\tau)^{-1}$ is divergent for every $\tau$, the product in the RHS of (A19) goes to zero [24], and consequently the LHS of (A13) goes to zero. This completes the proof of Lemma 1.1.

Proof of Lemma 1.2: We point out that Lemma 1.2 implies the pointwise convergence of a random sequence. To prove it, we state and prove the following claim, which also expresses a pointwise convergence of a random sequence.

Claim 1:
$$\lim_{t\to\infty} \big| \Pi_{T(t)}^{\hat\theta(t)}(\omega) - \Pi^{\hat\theta(t)}(\omega) \big| = 0 \quad \text{pointwise.} \tag{A20}$$

That is, for any $\delta' > 0$, there exists a $t_0 = t_0(\delta')$ such that for $t \ge t_0$, $| \Pi_{T(t)}^{\hat\theta(t)}(\omega) - \Pi^{\hat\theta(t)}(\omega) | < \delta'$. Determining $t_0(\delta')$ requires a tedious analysis which we have done. However, we choose not to include it here, mainly for the sake of brevity.

Continuing with the proof of Lemma 1.2, for any $\omega$ and any $\delta > 0$, using Claim 1, we can prove that there exists a $t_1 = t_1(\delta)$ such that for $t \ge t_1$,

$$\big\| P_{t,t+1}(\cdot) - \Pi_{T(t+1)}^{\hat\theta(t+1)}(\cdot) \big\| < \delta. \tag{A21}$$

Again, proving this involves some tedious analysis, and we choose to leave it out of this presentation. Using the definitions of $\Pi_{T(t+1)}^{\hat\theta(t+1)}(\cdot)$ and $\Pi^{\hat\theta(t)}(\cdot)$, the second term in the LHS of (A21) can be expressed as a sum over the single-site conditionals of $\Pi_{T(t+1)}^{\hat\theta(t+1)}$ weighted by $\Pi^{\hat\theta(t)}$ [(A22)], where $\Gamma(\hat\theta(t+1), \rho)$ is the number of times the conditional $\Pi_{T(t+1)}^{\hat\theta(t+1)}(\omega_{n_{t+1}}, \rho_n,\ n \ne n_{t+1})$ is nonzero in the summation $\sum_{\rho_{n_{t+1}}} \Pi_{T(t+1)}^{\hat\theta(t+1)}(\omega'_{n_{t+1}}, \rho_n,\ n \ne n_{t+1})$. Cancelling some terms, the RHS of (A22) becomes

$$\begin{cases} \dfrac{1}{|\Omega_0(\hat\theta(t+1))|} & \text{for } \omega \in \Omega_0(\hat\theta(t+1)) \\ 0 & \text{otherwise.} \end{cases} \tag{A23}$$

This expression in (A23) is the uniform distribution over $\Omega_0(\hat\theta(t+1))$. Hence, it is equal to $\Pi^{\hat\theta(t+1)}(\omega)$. Thus, from (A21) we have $\| P_{t,t+1}(\cdot) - \Pi^{\hat\theta(t+1)}(\cdot) \| < \delta$. This completes the proof of Lemma 1.2. It now remains to prove Lemma 2.

A.5 Proof of Lemma 2

Since Lemma 1 implies pointwise convergence, for any outcome and any $\epsilon > 0$, there is a $t_0(\epsilon)$ such that for $t \ge t_0(\epsilon)$, $\| P(t, \cdot \mid 0, \rho) - \Pi^{\hat\theta(t)}(\cdot) \| < \epsilon$, for any $\rho$. Since this is true for any $\rho$, it follows that

$$\big\| P(\hat x(t) = \cdot) - \Pi^{\hat\theta(t)}(\cdot) \big\| < \epsilon. \tag{A24}$$

For any $\delta > 0$, there exists an $\epsilon_0 > 0$ such that, by the above argument, there is a $t_0$, and for $t \ge t_0$ everything that follows is true. (We have devised a scheme to specify $\epsilon_0$ in terms of $\delta$, but we choose to omit it for the sake of brevity.) Let $\hat x(t) = \rho'$. Since the parameters $\theta$ are allowed to change only at the end of a package, let $t$ correspond to $t_k$, the end of the $k$th package. Then, $\hat\theta(t+1) = \arg\max_\theta P(X = \rho' \mid \theta)$ implies that
$$P\big(\hat x(t+1) = \rho' \mid \hat\theta(t+1)\big) \ge P\big(\hat x(t) = \rho' \mid \hat\theta(t)\big). \tag{A25}$$

From (A24), we have

$$P\big(\hat x(t) = \rho' \mid \hat\theta(t)\big) > \Pi^{\hat\theta(t)}(\rho') - \epsilon_0 \tag{A26}$$

and

$$P\big(\hat x(t+1) = \rho' \mid \hat\theta(t+1)\big) - \Pi^{\hat\theta(t+1)}(\rho') < \epsilon_0. \tag{A27}$$

Subtracting (A27) from (A26) and using (A25), we get

$$\Pi^{\hat\theta(t+1)}(\rho') > \Pi^{\hat\theta(t)}(\rho') - 2\epsilon_0. \tag{A28}$$

From (A24), (A25), and (A26), it follows that $\rho' \in \Omega_0(\hat\theta(t))$ and $\Omega_0(\hat\theta(t+1))$ w.p. $\ge 1 - \epsilon_0$. If $\rho' \in \Omega_0(\hat\theta(t))$ and $\Omega_0(\hat\theta(t+1))$, then (A28) implies that

$$\frac{1}{|\Omega_0(\hat\theta(t+1))|} > \frac{1}{|\Omega_0(\hat\theta(t))|} - 2\epsilon_0. \tag{A29}$$

$|\Omega_0(\cdot)|$ is integer valued. For the chosen $\epsilon_0$, (A29) implies that for $t \ge t_0(\epsilon_0)$, $|\Omega_0(\hat\theta(t+1))| \le |\Omega_0(\hat\theta(t))|$ w.p. $\ge 1 - \epsilon_0$. First, $|\Omega_0(\hat\theta(t+1))| < |\Omega_0(\hat\theta(t))|$ obviously implies that $\hat\theta(t+1) \ne \hat\theta(t)$. Second, because of (A24) and the fact that the $\hat\theta(t+1)$ is a unique
ML estimate, $|\Omega_0(\hat\theta(t+1))| = |\Omega_0(\hat\theta(t))|$ implies $\hat\theta(t+1) = \hat\theta(t)$.

Suppose that at time $t_0$, $|\Omega_0(\hat\theta(t_0))| = m$. If $\hat\theta(t_0) \ne \theta_i^*$ for some $i$, then there exists a nonempty set $H \subset \Omega_0(\hat\theta(t_0))$ such that, for any $\rho \in H$, $\arg\max_\theta P(\hat x(t_0) = \rho \mid \theta) \ne \hat\theta(t_0 + 1)$ (from the discussion above, this implies $|\Omega_0(\hat\theta(t_0 + 1))| < |\Omega_0(\hat\theta(t_0))|$), while for $\rho \in \Omega_0(\hat\theta(t_0)) - H$, $\arg\max_\theta P(\hat x(t_0) = \rho \mid \theta) = \hat\theta(t_0)$ (i.e., $\hat\theta(t_0 + 1) = \hat\theta(t_0)$). Hence,

$$\Pr\big(\hat x(t_0) = \rho \in H\big) \ge \frac{1 - \epsilon_0}{m}, \tag{A30}$$

and

$$\Pr\big(\hat x(t_0) \in \Omega_0(\hat\theta(t_0))\big) \ge 1 - \epsilon_0. \tag{A31}$$

It follows from the above argument that in $K$ subsequent steps

$$\Pr\big(\text{never get out of } \Omega_0(\hat\theta(t))\big) \ge (1 - \epsilon_0)^K. \tag{A32}$$

The event "never get out of $\Omega_0(\hat\theta(t))$" is the union of the two disjoint events "always stay in $\Omega_0(\hat\theta(t_0)) - H$" and "get out of $\Omega_0(\hat\theta(t_0)) - H$ at least once, while staying in $\Omega_0(\hat\theta(t))$ all the time." From (A30) and (A31), we also have

$$\Pr\big(\text{always stay in } \Omega_0(\hat\theta(t_0)) - H\big) \le \Big[ \Big( \frac{m-1}{m} \Big)(1 - \epsilon_0) \Big]^K. \tag{A33}$$

Let $K_m = K_m(\delta)$ be the smallest integer such that $[(m-1)/m]^{K_m} \le \delta$. Using (A32) and (A33), we have, in $K_m$ subsequent steps,

$$\Pr\Big( \text{get out of } \Omega_0(\hat\theta(t_0)) - H \text{ at least once, while staying in } \Omega_0(\hat\theta(t)) \text{ all the time} \Big) \ge (1 - \epsilon_0)^{K_m} (1 - \delta). \tag{A34}$$

The event "get out of $\Omega_0(\hat\theta(t_0)) - H$ at least once, while staying in $\Omega_0(\hat\theta(t))$ all the time" implies that $|\Omega_0(\hat\theta(t_0 + K_m))| = m - 1$ given that $|\Omega_0(\hat\theta(t_0))| = m$. Therefore, (A34) implies that

$$P\big( |\Omega_0(\hat\theta(t_0 + K_m))| = m - 1 \,\big|\, |\Omega_0(\hat\theta(t_0))| = m \big) \ge (1 - \epsilon_0)^{K_m} (1 - \delta). \tag{A35}$$

Define $K' = \sum_{m=2}^{|\Omega|} K_m$. The worst situation for reaching $\theta_i^*$ from $\hat\theta(t_0)$ is when $|\Omega_0(\hat\theta(t_0))| = |\Omega|$ and $|\Omega_0(\theta_i^*)| = 1$. Therefore,

$$\Pr\big(\text{reaching } \theta_i^* \text{ in } K' \text{ steps after } t_0\big) \ge \Pr\big( |\Omega_0(\hat\theta(t_0 + K'))| = 1 \,\big|\, |\Omega_0(\hat\theta(t_0 + (K' - K_2)))| = 2 \big) \cdot \Pr\big( |\Omega_0(\hat\theta(t_0 + (K' - K_2 - K_3)))| = 3 \,\big|\, \cdots \big) \cdots \Pr\big( |\Omega_0(\hat\theta(t_0 + K_{|\Omega|}))| = |\Omega| - 1 \,\big|\, |\Omega_0(\hat\theta(t_0))| = |\Omega| \big). \tag{A36}$$

Using (A35) and the definition of $K'$, the RHS of (A36) becomes

$$\ge (1 - \epsilon_0)^{K'} (1 - \delta)^{|\Omega| - 1}. \tag{A37}$$

For the chosen $\epsilon_0$, (A37) becomes $\ge 1 - \delta$. In conclusion, for any $\delta > 0$, there exists a $t' = t_0 + K'$ such that for $t \ge t'$,

$$\Pr\big( \Pi^{\hat\theta(t)} \ne \Pi^{\theta_i^*} \big) < \delta. \tag{A38}$$

Hence, it follows that

$$\lim_{t\to\infty} \Pi^{\hat\theta(t)}(\cdot) = \Pi^{\theta_i^*}(\cdot) \quad \text{in probability.} \tag{A39}$$

This completes the proof of Lemma 2 and therefore the proof of the theorem.

APPENDIX B
PARAMETER ESTIMATION IN MARKOV RANDOM FIELDS

B.1 Problem Statement

The adaptive segmentation algorithm presented in this paper requires that, at the end of every package of updates of $\{X_{ij}\}$ done according to (9), the parameter vector $\theta$ be updated to the value that maximizes $\Pi_T^\theta(x)$ with the current $x$ and $T$. We have shown in (10) that maximizing $\Pi_T^\theta(x)$ with respect to $\theta$ is equivalent to maximizing $P(X = x \mid \theta)$ with respect to $\theta$. Hence, the periodic update required by the ASA is an ML estimate of $\theta$ based on the current $x$.

B.2 The ML Estimate

Since we cannot compute the ML estimate of $\theta$ exactly, we have to settle for an approximation to it. There are various methods for obtaining approximations to the ML estimate [1], [3], [11], [17]. We will briefly describe two of these methods: the "coding method" and the "maximum pseudo-likelihood (MPL) method," both suggested by Besag [1], [3].

Coding Method [1]: The entire lattice $L$, upon which the process $X$ is defined, is divided into $J$ codes. The coding scheme depends on the order of the neighborhood system, and it is such that all points in $L$ belonging to the same code $l$ are conditionally independent given the other codes. Due to the above mentioned conditional independence property, the conditional joint distribution over pixels in one code given the others can be easily maximized with respect to $\theta$, at least in principle. This enables one to obtain an estimate of $\theta$ corresponding to each code given the others, thereby obtaining $J$ distinct estimates of $\theta$. In most instances, each one of these $J$ estimates is a good approximation to the ML estimate, denoted here by
$\theta^*$. That is,

$$\hat\theta_l = \arg\max_{\theta \in \Theta} \prod_{(i,j) \in \text{code } l} P\big(X_{ij} = x_{ij} \mid X_{kl} = x_{kl},\ (k,l) \in \eta_{ij};\ \theta\big), \quad l = 1, \ldots, J. \tag{B1}$$
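For concreteness, here is a sketch of coding partitions (our own layout choices; the paper does not prescribe these exact index formulas) for the two neighborhood systems used in this work:

```python
# Sketch: coding partitions for (B1).  For eta^1 a 2-color checkerboard
# suffices (J = 2); for eta^2, 2x2 blocks give J = 4.  Layout is ours.
import numpy as np

def codes_first_order(n1, n2):
    """Code index (0 or 1) per pixel: (i + j) mod 2.  No two pixels of the
    same code are first-order (4-nearest) neighbors."""
    i, j = np.indices((n1, n2))
    return (i + j) % 2

def codes_second_order(n1, n2):
    """Code index (0..3): 2*(i mod 2) + (j mod 2).  No two pixels of the
    same code are second-order (8-nearest) neighbors."""
    i, j = np.indices((n1, n2))
    return 2 * (i % 2) + (j % 2)

mask = codes_first_order(6, 6) == 0  # pixels of code 0; estimate theta on these
```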
However, the "coding method" has some drawbacks. The method uses only part of the data, and there is no optimal way of combining the $J$ estimates of $\theta$. Even though a direct average is commonly used for combining the estimates, it has not been shown to be optimal.

Maximum Pseudo-Likelihood Method [3]: Due to the above mentioned drawbacks of the "coding method," an alternative method for obtaining an approximation to $\theta^*$ was suggested in [3]. In this method, the pseudo-likelihood function

$$PL(x \mid \theta) = \prod_{(i,j) \in L} P\big(X_{ij} = x_{ij} \mid X_{kl} = x_{kl},\ (k,l) \in \eta_{ij};\ \theta\big) \tag{B2}$$
is maximized. In our experimentations with both of these methods, we have observed that the MPL yields more accurate estimates than the "coding method," and hence we have used the $\theta$ that maximizes (B2) in our adaptive scheme.

B.3 Metropolis Algorithm

To maximize (B2) over $\theta$, we use the Metropolis algorithm, which is also a stochastic relaxation procedure similar to the Gibbs sampler [16]. It is one of the many ways of implementing SA, and it has all the convergence properties of SA. The reason for choosing the Metropolis algorithm over the Gibbs sampler is the following: the Gibbs sampler involves integrating (B2) over $\theta_j$ each time $\theta_j$ is updated, which is practically impossible; the Metropolis algorithm, on the other hand, involves computing only the ratio of (B2) for two values of $\theta$, say $\theta'$ and $\hat\theta(v-1)$, which is considerably easier.

To achieve the desired maximization, we use the Metropolis algorithm in the manner described below. First, a visitation schedule $\{m_v\}$ as a function of $v$ is established, where $v$ denotes the time variable for this SA procedure (not related to the time $t$ in the main ASA). For each $v$, $m_v$ identifies a component of the parameter vector $\theta$. If $m_v = j$, then at time $v$, $\theta_j$ is updated as follows: a candidate value for $\theta_j$ is chosen at random between $\hat\theta_j(v-1) - \gamma$ and $\hat\theta_j(v-1) + \gamma$, for a $\gamma$ appropriately small and where $\hat\theta_j(v-1)$ denotes the value of $\theta_j$ before the update. This gives us a candidate parameter vector $\theta'$. The following ratio is then computed with the candidate $\theta'$ and the old value $\hat\theta(v-1)$:

$$p = \left[ \frac{PL(x \mid \theta')}{PL\big(x \mid \hat\theta(v-1)\big)} \right]^{1/T_a(v)} \tag{B3}$$

where $T_a(v)$ denotes the temperature in this SA procedure (not related to $T(t)$, the temperature in the main ASA).
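A compact sketch of this second-level SA (ours, not the paper's code; `log_pl`, the cyclic visitation schedule, and the constants `gamma` and `c_h` are illustrative assumptions), using the ratio (B3) in log form and the acceptance rule stated in (B4) below:

```python
# Sketch: Metropolis maximization of the pseudo-likelihood over theta.
# `log_pl` is a log pseudo-likelihood such as the one sketched earlier.
import numpy as np

rng = np.random.default_rng(1)

def mpl_by_metropolis(theta0, log_pl, gamma=0.1, c_h=1.0, n_steps=2000):
    theta = np.asarray(theta0, dtype=float).copy()
    lp = log_pl(theta)
    for v in range(1, n_steps + 1):
        Ta = c_h / np.log(v + 1.0)             # T_a(v) >= c_h / log v
        j = (v - 1) % theta.size               # visitation schedule m_v: cyclic
        cand = theta.copy()
        cand[j] += rng.uniform(-gamma, gamma)  # candidate within +/- gamma
        lp_cand = log_pl(cand)
        log_p = (lp_cand - lp) / Ta            # log of the ratio p in (B3)
        if log_p >= 0 or rng.random() < np.exp(log_p):  # acceptance rule (B4)
            theta, lp = cand, lp_cand
    return theta
```

For instance, with `log_pl = lambda th: -((th - 1.0) ** 2).sum()`, the iterates settle near the maximizer at 1, illustrating the annealed search without any image data.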
Fig. 10. (a) Trajectory of the parameter estimates; (b) rate of convergence of the estimates, plotted against the number of iterations. (In both graphs, different curves correspond to different starting points.)
Fig. 11. (a) Trajectory of the parameter estimates (axes: β₁, β₂); (b) rate of convergence of the estimates, plotted against the number of iterations. (In both graphs, different curves correspond to different starting points.)
Then $\hat\theta(v)$ is chosen according to the following:

$$\hat\theta(v) = \begin{cases} \theta' & \text{if } p \ge 1, \text{ or with probability } p \text{ if } p < 1 \\ \hat\theta(v-1) & \text{otherwise.} \end{cases} \tag{B4}$$

This procedure generates a sequence $\{\hat\theta(v)\}$ such that $\lim_{v\to\infty} \hat\theta(v)$ maximizes (B2), under the conditions that: 1) $\lim_{v\to\infty} T_a(v) = 0$ and 2) $T_a(v) \ge c_h/\log v$, for an appropriately specified $c_h$.

B.4 Experimental Results

We have extensively experimented with this implementation (Metropolis algorithm) of the MPL. A sample of our experiments is presented here. We note that for some special cases the computation of $p$ in (B3) can be made in a much simpler way. Depending on the number of regions and the neighborhood system, there are only a relatively few distinct terms in the numerator and denominator of (B3). Thus, computing (B3) reduces to merely computing these distinct terms and counting their respective numbers. In these special cases, the speed of this scheme can be improved by many orders of magnitude [21]. Figs. 10(a) and 11(a) depict the trajectories of the estimates starting from different initial values and converging to the final estimates. Note that the estimates converge to the same final value for a wide range of initial values. The results presented in these figures correspond to MPL estimates from two realizations for which the actual parameter values are known. Figs. 10(b) and 11(b) show the rate of convergence of $\{\hat\theta(v)\}$ to the actual values through a plot of the Euclidean distance between the two.

ACKNOWLEDGMENT

The authors would like to gratefully acknowledge the helpful discussions with Professors D. Geman and P. A. Kelly during the course of this work. They also thank the anonymous reviewers for their many useful comments and suggestions that helped improve this paper.

REFERENCES

[1] J. E. Besag, "Spatial interaction and the statistical analysis of lattice systems," J. Roy. Statist. Soc., ser. B, vol. 36, pp. 192-236, 1974.
[2] -, "Efficiency of pseudo-likelihood estimates for simple Gaussian fields," Biometrika, vol. 64, pp. 616-618, 1977.
[3] -, "On the statistical analysis of dirty pictures," J. Roy. Statist. Soc., ser. B, vol. 48, pp. 259-302, 1986.
[4] P. Billingsley, Convergence of Probability Measures. New York: Wiley, 1968.
[5] A. C. Cohen, "Estimation in mixture of two normal distributions," Technometrics, vol. 9, pp. 15-28, 1967.
[6] F. S. Cohen and D. B. Cooper, "Simple parallel hierarchical and relaxation algorithms for segmenting noncausal Markovian random
fields," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-9, pp. 195-219, Mar. 1987.
[7] R. Cristi and M. Shridhar, "A parallel algorithm for image segmentation based on the Gibbs field model," in Proc. ISCS 85, Japan.
[8] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. Roy. Statist. Soc., ser. B, vol. 39, pp. 1-38, 1977.
[9] H. Derin, H. Elliott, R. Cristi, and D. Geman, "Bayes smoothing algorithms for segmentation of binary images modeled by Markov random fields," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-6, pp. 707-720, Nov. 1984.
[10] H. Derin and W. S. Cole, "Segmentation of textured images using Gibbs random fields," Comput. Vision, Graphics, Image Processing, vol. 35, pp. 72-98, 1986.
[11] H. Derin and H. Elliott, "Modeling and segmentation of noisy and textured images using Gibbs random fields," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-9, pp. 39-55, Jan. 1987.
[12] H. Derin and C.-S. Won, "A parallel image segmentation algorithm using relaxation with varying neighborhoods and its mapping to array processors," Comput. Vision, Graphics, Image Processing, vol. 40, pp. 54-78, 1987.
[13] H. Derin and S. Lakshmanan, "Adaptive segmentation of noisy images: An EM algorithm approach," in Proc. 24th Allerton Conf. Commun., Contr., Comput., Allerton House, Monticello, IL, Oct. 1986, pp. 705-706.
[14] H. Derin, "Estimating components of univariate Gaussian mixtures using Prony's method," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-9, pp. 142-149, Jan. 1987.
[15] H. Elliott, H. Derin, R. Cristi, and D. Geman, "Application of the Gibbs distribution to image segmentation," in Proc. 1984 Int. Conf. Acoust., Speech, Signal Processing, ICASSP'84, San Diego, CA, Mar. 1984, pp. 32.5.1-32.5.4.
[16] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-6, pp. 721-741, Nov. 1984.
[17] D. Geman, "Bayesian image analysis by adaptive annealing," in Dig. 1985 Int. Geosci. Remote Sensing Symp., IGARSS'85, Amherst, MA, Oct. 1985.
[18] -, private communication, 1986, 1987.
[19] S. Geman and C. Graffigne, "Markov random field image models and their applications to computer vision," in Proc. Int. Congr. Math., A. M. Gleason, Ed., Amer. Math. Soc., Providence, RI, 1987.
[20] F. R. Hansen and H. Elliott, "Image segmentation using simple Markov random field models," Comput. Graphics Image Processing, vol. 20, pp. 101-132, 1982.
[21] S. Lakshmanan, "Adaptive segmentation of noisy images," M.S. thesis, Dep. ECE, Univ. Massachusetts, Amherst, July 1987.
[22] J. Marroquin, S. Mitter, and T. Poggio, "Probabilistic solution of ill-posed problems in computer vision," J. Amer. Statist. Assoc., vol. 82, no. 397, pp. 76-87, 1987.
[23] R. Redner and H. Walker, "Mixture densities, maximum likelihood and the EM algorithm," SIAM Rev., vol. 26, no. 2, pp. 195-239, Apr. 1984.
[24] E. C. Titchmarsh, The Theory of Functions. London: Oxford University Press, 1939, pp. 13-18.
[25] R. E. Wendell and A. P. Hurter, Jr., "Minimization of a non-separable objective function subject to disjoint constraints," Oper. Res., vol. 24, pp. 643-657, July-Aug. 1976.
[26] L. Younes, "Estimation and annealing for Gibbsian fields," Eleve de l'Ecole Normale Superieure, 45 Rue d'Ulm, 75005 Paris, France.
Sridhar Lakshmanan was born in Salem, India, on January 12, 1964. He received the B.S. degree in electronics and communication engineering from Birla Institute of Technology, Mesra, in 1985, and the M.S. degree in electrical and computer engineering from the University of Massachusetts, Amherst, in 1987. He is currently pursuing the Doctoral degree in electrical and computer engineering at the University of Massachusetts, Amherst, where he has been a Research Assistant since 1985. His research interests are in the areas of communications theory, control theory, and image processing.
Haluk Derin (S’70-M’72) was born in Istanbul, Turkey, on September 18, 1945. He received the B.S. degree from Robert College, Istanbul, in 1967, the M.S. degree from the University of Virginia, Charlottesville, in 1969, and the Ph.D. degree from Princeton University, Princeton, NJ, in 1972, all in electrical engineering. From 1972 to 1981 he was with the Department of Electrical Engineering, Middle East Technical University, Ankara, Turkey. He served as Associate Dean in the School of Engineering of the Middle East Technical University from 1979 to 1981. Following a visiting position at Syracuse University, Syracuse, NY, in 1981-1982, he joined the University of Massachusetts, Amherst, in 1982 as Associate Professor of Electrical and Computer Engineering. His current research interests are in digital image processing and statistical communication theory. Dr. Derin is a member of Eta Kappa Nu.