Image Registration and Segmentation by Maximizing the Jensen-Rényi Divergence

A. Ben Hamza and Hamid Krim
Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC 27695, USA

Abstract. Information theoretic measures provide quantitative entropic divergences between two probability distributions or data sets. In this paper, we analyze the theoretical properties of the Jensen-Rényi divergence, which is defined between any arbitrary number of probability distributions. Using the theory of majorization, we derive its maximum value, and also some performance upper bounds in terms of the Bayes risk and the asymptotic error of the nearest neighbor classifier. To gain further insight into the robustness and the application of the Jensen-Rényi divergence measure in imaging, we provide substantial numerical experiments to show the power of this entropic measure in image registration and segmentation.
1 Introduction
Information-theoretic divergence measures [1] have been successfully applied in many areas including, but not limited to, statistical pattern recognition, neural networks, signal/image processing, speech processing, graph theory and computer vision. The Kullback-Leibler (or directed) divergence [2], one of Shannon's entropy-based measures, has had success in many applications including indexing and image retrieval [3]. A generalization of the directed divergence is the so-called α-divergence based on Rényi entropy [4, 5]. The α-divergence measure has been applied to image registration/alignment as well as a variety of other problems [4]. Another entropy-based measure is the Jensen-Shannon divergence [6]. This similarity measure may be defined between any number of probability distributions. Due to this generality, the Jensen-Shannon divergence may be used as a coherence measure between any number of distributions and may be applied to a variety of signal/image processing and computer vision applications, including image edge detection [7] and segmentation of DNA sequences into homogeneous domains [8].

Inspired by the successful application of the mutual information measure [9, 10], and looking to address its limitations in often difficult imagery, we recently proposed an information-theoretic approach to ISAR image registration [11]. The objective of the proposed technique was to estimate the target motion during the imaging time, and this was accomplished using a generalized Rényi entropy-based similarity measure called the Jensen-Rényi divergence [11, 12]. This divergence in fact measures the statistical dependence between an arbitrary number
of consecutive ISAR image frames, and it is maximal when the images are geometrically aligned. In contrast to the mutual information [9, 10], the Jensen-Rényi divergence allows one to adjust the weights, and also the exponential order of the Rényi entropy, to control the measurement sensitivity of the joint histogram, that is, the relative contributions of the histograms together with their order [12]. This flexibility ultimately results in better registration accuracy. In addition to its generality in involving an arbitrary number of probability distributions with possibly different weights, the Jensen-Rényi divergence enjoys appealing mathematical properties such as convexity and symmetry, affording great flexibility in a number of applications.

In this paper, we investigate some of the theoretical properties of the Jensen-Rényi divergence as well as their implications. In particular, we derive its upper bounds, which are very useful for normalization purposes. Using the theory of majorization [13], we derive the maximum value of this divergence measure. Furthermore, we derive its performance bounds in terms of the Bayes risk and the asymptotic error of the nearest neighbor classifier within the statistical pattern recognition framework [14].

The remainder of this paper is organized as follows. In the next section, we briefly recall some facts about Rényi entropy prior to introducing the Jensen-Rényi divergence. Section 3 is devoted to the theoretical properties of this divergence measure: subject to some conditions, its convexity is established; by exploiting these properties, the maximum value of the Jensen-Rényi divergence is derived based on the theory of majorization; we then derive the maximum value of the Jensen-Rényi divergence for a weighted distribution, followed by a discussion of its application to parametric classification within the context of statistical pattern recognition, where some performance upper bounds are also derived. Finally, in Section 4, experimental results are provided to show the much improved performance of the Jensen-Rényi divergence in image registration and segmentation.
2 Jensen-Rényi Divergence Measure
Let $k \in \mathbb{N}$ and $X = \{x_1, x_2, \dots, x_k\}$ be a finite set with a probability distribution $p = (p_1, p_2, \dots, p_k)$, i.e. $\sum_{j=1}^{k} p_j = 1$ and $p_j = P(X = x_j) \ge 0$, where $P(\cdot)$ denotes the probability. Shannon's entropy is defined as $H(p) = -\sum_{j=1}^{k} p_j \log(p_j)$, and it is a measure of uncertainty, dispersion, information, and randomness. The maximum uncertainty, or equivalently minimum information, is achieved by the uniform distribution. Hence, we can think of the entropy as a measure of the uniformity of a probability distribution. Consequently, the higher the uncertainty, the more difficult it is to predict the outcome of a draw from the distribution.
[Fig. 1. Rényi entropy $R_\alpha(p)$ of a Bernoulli distribution $p = (p, 1-p)$ for $\alpha = 0, 0.2, 0.5, 0.7$, compared with Shannon entropy]
A generalization of Shannon entropy is Rényi entropy [5], given by
$$R_\alpha(p) = \frac{1}{1-\alpha} \log \sum_{j=1}^{k} p_j^{\alpha}, \qquad \alpha \in (0,1) \cup (1,\infty). \qquad (1)$$
For $\alpha > 1$, the Rényi entropy is neither concave nor convex. For $\alpha \in (0, 1)$, it is easy to see that Rényi entropy is concave, and it tends to Shannon entropy $H(p)$ as $\alpha \to 1$. One may easily verify that $R_\alpha$ is a nonincreasing function of $\alpha$, and hence
$$R_\alpha(p) \ge H(p), \qquad \forall \alpha \in (0, 1). \qquad (2)$$
When $\alpha \to 0$, Rényi entropy equals the logarithm of the cardinality of the set $\{j \in [1,k] : p_j > 0\}$. Fig. 1 depicts Rényi entropy for a Bernoulli distribution $p = (p, 1-p)$, with different values of the parameter $\alpha$. As illustrated in Fig. 1, the measure of uncertainty is at a minimum when Shannon entropy is used, and it increases as the parameter $\alpha$ decreases. Rényi entropy attains its maximum uncertainty when its exponential order $\alpha$ is equal to zero.

Definition 1. Let $p_1, p_2, \dots, p_n$ be $n$ probability distributions. The Jensen-Rényi divergence is defined as
$$JR_\alpha^{\omega}(p_1, \dots, p_n) = R_\alpha\Big(\sum_{i=1}^{n} \omega_i p_i\Big) - \sum_{i=1}^{n} \omega_i R_\alpha(p_i),$$
where $R_\alpha(p)$ is Rényi's entropy, and $\omega = (\omega_1, \omega_2, \dots, \omega_n)$ is a weight vector such that $\sum_{i=1}^{n} \omega_i = 1$ and $\omega_i \ge 0$.

Using the Jensen inequality, it is easy to check that the Jensen-Rényi divergence is nonnegative for $\alpha \in (0, 1)$. It is also symmetric and vanishes if and only if the probability distributions $p_1, p_2, \dots, p_n$ are equal, for all $\alpha > 0$. Note that the Jensen-Shannon divergence [6] is a limiting case of the Jensen-Rényi divergence as $\alpha \to 1$. Unlike other entropy-based divergence measures such as the Kullback-Leibler divergence, the Jensen-Rényi divergence has the advantage of being symmetric and generalizable to any arbitrary number of probability distributions or data sets, with the possibility of assigning weights to these distributions. Fig. 2 shows three-dimensional representations and contour plots of the Jensen-Rényi divergence with equal weights between two Bernoulli distributions, for $\alpha \in (0, 1)$ and also for $\alpha \in (1, \infty)$.
[Fig. 2. 3D and contour plots of the Jensen-Rényi divergence $JR_\alpha(p, q)$ with equal weights between two Bernoulli distributions. Top: $\alpha \in (0, 1)$. Bottom: $\alpha > 1$]
In the sequel, we will restrict $\alpha \in (0, 1)$, unless specified otherwise, and we will use base-2 logarithms, i.e., the measurement unit is bits.
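For concreteness, here is a minimal Python sketch of Rényi entropy (1) and the Jensen-Rényi divergence of Definition 1, using base-2 logarithms. It is our own illustration rather than the authors' implementation; the function names and the use of NumPy are assumptions.

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Rényi entropy R_alpha(p) in bits, Eq. (1); tends to Shannon entropy as alpha -> 1."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                          # zero-probability terms contribute nothing for alpha > 0
    if np.isclose(alpha, 1.0):
        return float(-np.sum(p * np.log2(p)))
    return float(np.log2(np.sum(p ** alpha)) / (1.0 - alpha))

def jensen_renyi(dists, weights, alpha):
    """JR_alpha^w(p_1,...,p_n) = R_alpha(sum_i w_i p_i) - sum_i w_i R_alpha(p_i)."""
    dists = np.asarray(dists, dtype=float)     # shape (n, k): one distribution per row
    w = np.asarray(weights, dtype=float)       # shape (n,): nonnegative, sums to 1
    mixture = w @ dists
    return renyi_entropy(mixture, alpha) - sum(
        wi * renyi_entropy(pi, alpha) for wi, pi in zip(w, dists))

# Example: two Bernoulli distributions with equal weights and alpha in (0, 1).
p, q = [0.9, 0.1], [0.2, 0.8]
print(jensen_renyi([p, q], [0.5, 0.5], alpha=0.5))   # positive; zero iff p == q
```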
3 Properties of the Jensen-Rényi Divergence
The following result establishes the convexity of the Jensen-Rényi divergence of a set of probability distributions.

Proposition 1. For $\alpha \in (0, 1)$, the Jensen-Rényi divergence $JR_\alpha^{\omega}$ is a convex function of $p_1, p_2, \dots, p_n$.
In addition to its convexity property, the Jensen-Rényi divergence is shown to be an adapted measure of disparity among $n$ probability distributions, as follows.

Proposition 2. The Jensen-Rényi divergence achieves its maximum value when $p_1, p_2, \dots, p_n$ are degenerate distributions, that is, $p_i = (\delta_{ij})$, where $\delta_{ij} = 1$ if $i = j$ and $0$ otherwise.

3.1 Maximum Value of the Jensen-Rényi Divergence
Since the Jensen-Rényi divergence is a convex function of $p_1, \dots, p_n$, it achieves its maximum value when the Rényi entropy of the $\omega$-weighted average of degenerate probability distributions achieves its maximum value as well. Assigning weights $\omega_i$ to the degenerate distributions $\Delta_1, \Delta_2, \dots, \Delta_n$, where $\Delta_i = (\delta_{ij})_{1 \le j \le k}$, the following upper bound, which falls out easily from the definition of the Jensen-Rényi divergence, may be used as a starting point:
$$JR_\alpha^{\omega} \le R_\alpha\Big(\sum_{i=1}^{n} \omega_i \Delta_i\Big). \qquad (3)$$
Without loss of generality, consider the Jensen-Rényi divergence with equal weights $\omega_i = 1/n$ for all $i$, and denote it simply by $JR_\alpha$, to write
$$JR_\alpha \le R_\alpha\Big(\sum_{i=1}^{n} \Delta_i / n\Big) = \frac{1}{1-\alpha} \log \sum_{j=1}^{k} \Big(\sum_{i=1}^{n} \delta_{ij} / n\Big)^{\alpha} = R_\alpha(a) + \frac{\alpha}{\alpha - 1} \log(n), \qquad (4)$$
where $a = (a_1, a_2, \dots, a_k)$ such that
$$a_j = \sum_{i=1}^{n} \delta_{ij}. \qquad (5)$$
Since $\Delta_1, \Delta_2, \dots, \Delta_n$ are degenerate distributions, it follows that $\sum_{j=1}^{k} a_j = n$. From (4), it is clear that the maximum value of $JR_\alpha$ is also a maximum value of $R_\alpha(a)$. In order to maximize $R_\alpha(a)$, the concept of majorization will be used [13]. Let $(x_{[1]}, x_{[2]}, \dots, x_{[k]})$ denote the non-increasing arrangement of the components of a vector $x = (x_1, x_2, \dots, x_k)$.

Definition 2. Let $a, b \in \mathbb{N}^k$. $a$ is said to be majorized by $b$, written $a \prec b$, if
$$\sum_{j=1}^{l} a_{[j]} \le \sum_{j=1}^{l} b_{[j]}, \quad l = 1, 2, \dots, k-1, \qquad \text{and} \qquad \sum_{j=1}^{k} a_{[j]} = \sum_{j=1}^{k} b_{[j]}.$$

Since $R_\alpha$ is a Schur-concave function, it follows that $R_\alpha(a) \ge R_\alpha(b)$ whenever $a \prec b$. The following result establishes the maximum value of the Jensen-Rényi divergence.

Proposition 3. Let $p_1, \dots, p_n$ be $n$ probability distributions. If $n \equiv r \pmod{k}$, $0 \le r < k$, then
$$JR_\alpha \le \frac{1}{1-\alpha} \log \frac{(k-r)\, q^{\alpha} + r\,(q+1)^{\alpha}}{(qk+r)^{\alpha}}, \qquad (6)$$
where $q = (n-r)/k$, and $\alpha \in (0, 1)$.

Proof. It is clear that the vector
$$g = (\underbrace{q+1, \dots, q+1}_{r}, \underbrace{q, \dots, q}_{k-r})$$
is majorized by the vector $a$ defined in (5). Therefore, $R_\alpha(a) \le R_\alpha(g)$. This completes the proof using (4).

Corollary 1. If $n \equiv 0 \pmod{k}$, then $JR_\alpha(p_1, p_2, \dots, p_n) \le \log(k)$, where $k$ is the number of components of each probability distribution.
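As a quick numerical check of Corollary 1 (a sketch that assumes the hypothetical jensen_renyi helper from the Section 2 sketch is in scope), take n degenerate distributions on k bins with n ≡ 0 (mod k); the equal-weight Jensen-Rényi divergence then attains log2(k) bits.

```python
import numpy as np

# assumes jensen_renyi from the sketch in Section 2 is in scope
n, k, alpha = 6, 3, 0.5                    # n = 6 distributions on k = 3 bins, n % k == 0
dists = np.tile(np.eye(k), (n // k, 1))    # n/k copies of each degenerate Delta_1, ..., Delta_k
weights = np.full(n, 1.0 / n)
value = jensen_renyi(dists, weights, alpha)
assert np.isclose(value, np.log2(k))       # the maximum value log2(3) ~ 1.585 bits is attained
```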
3.2 Jensen-Rényi Divergence and Mixture Models
Mixture probability distributions provide multimodal distributions that model data with greater flexibility and effectiveness. Such mixture models have been applied to a number of areas including unsupervised learning [15], and computer vision applications such as detection and object tracking [16]. It is therefore of great interest to evaluate the Jensen-Rényi divergence between two probability distributions $p$ and $q$ with weights $\{\lambda, 1-\lambda\}$, where $\lambda \in [0, 1]$.
If we denote by $r$ the weighted probability distribution defined by $r = (1 - \lambda/2)\, p + (\lambda/2)\, q$, then the Jensen-Rényi divergence can be expressed as a function of $\lambda$ as follows:
$$JR_\alpha(\lambda) = R_\alpha(r) - \frac{R_\alpha\big((1-\lambda)p + \lambda q\big) + R_\alpha(p)}{2}.$$
Proposition 4. The Jensen-Rényi divergence $JR_\alpha(\lambda)$ achieves its maximum value when $\lambda = 1$.

Proof. Let $p = (p_i)_{i=1}^{k}$ and $q = (q_i)_{i=1}^{k}$ be two distinct probability distributions. The Jensen-Rényi divergence can then be written as
$$JR_\alpha(\lambda) = \frac{1}{1-\alpha} \log \sum_{i=1}^{k} \Big(p_i + \frac{\lambda}{2}(q_i - p_i)\Big)^{\alpha} - \frac{1}{2(1-\alpha)} \bigg[ \log \sum_{i=1}^{k} \big(p_i + \lambda(q_i - p_i)\big)^{\alpha} + \log \sum_{i=1}^{k} p_i^{\alpha} \bigg].$$
Using calculus, we can show that $\lambda = 0$ is a singular point of the Jensen-Rényi divergence $JR_\alpha(\lambda)$, i.e. the first derivative $JR_\alpha'(\lambda)$ vanishes at $\lambda = 0$. Furthermore, it can be verified that the second derivative $JR_\alpha''(\lambda)$ is always positive. Hence, the first derivative $JR_\alpha'(\lambda)$ is an increasing function of $\lambda$, and therefore $JR_\alpha'(\lambda) \ge 0$ for all $\lambda \in [0, 1]$. Consequently, $JR_\alpha(\lambda)$ is an increasing function of $\lambda$. This concludes the proof.

Fig. 3 depicts the Jensen-Rényi divergence as a function of $\lambda$ when $p = (.35, .12, .53)$, $q = (.25, .34, .41)$ and $\alpha = 0.6$.
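The expression for JR_α(λ) is straightforward to evaluate numerically. The following sketch (our own illustration, reusing the hypothetical jensen_renyi helper from the Section 2 sketch) reproduces the setting of Fig. 3.

```python
import numpy as np

# assumes jensen_renyi from the sketch in Section 2 is in scope
p = np.array([0.35, 0.12, 0.53])
q = np.array([0.25, 0.34, 0.41])
alpha = 0.6

def jr_lambda(lam):
    # equal-weight JR divergence between p and (1 - lam) p + lam q,
    # whose mixture is r = (1 - lam/2) p + (lam/2) q as in the text
    return jensen_renyi([p, (1 - lam) * p + lam * q], [0.5, 0.5], alpha)

values = [jr_lambda(lam) for lam in np.linspace(0.0, 1.0, 11)]
print(values)   # increases with lambda and peaks at lambda = 1 (Proposition 4)
```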
3.3 Jensen-Rényi Divergence: Performance Bounds
In this subsection, performance bounds of the Jensen-Rényi divergence in terms of the Bayes error and of the asymptotic error of the nearest-neighbor (NN) classifier are derived. Let $C = \{c_1, c_2, \dots, c_n\}$ be a set of $n$ classes. By a classifier we mean a function $f : X \to C$ that classifies a given feature vector (pattern) $x \in X$ to the class $c = f(x)$. It is well known that the classifier minimizing the error probability $P(f(X) \ne C)$ is the Bayes classifier, whose error $L_B$ may be written in discrete form as
$$L_B = \inf_{f : X \to C} P\{f(X) \ne C\} = 1 - \sum_{j=1}^{k} \max_{1 \le i \le n} \{\omega_i\, p_{ij}\},$$
where $\omega_i = P(C = c_i)$ are the class probabilities and $p_{ij} = P(X = x_j \mid C = c_i)$ are the class-conditional probabilities. Denote $\omega = \{\omega_i\}_{1 \le i \le n}$ and $p_i = (p_{ij})_{1 \le j \le k}$, $\forall i = 1, \dots, n$.

Proposition 5. The Jensen-Rényi divergence is upper bounded by
$$JR_\alpha^{\omega}(p_1, p_2, \dots, p_n) \le R_\alpha(\omega) - 2 L_B, \qquad (7)$$
where $R_\alpha(\omega) = R_\alpha(C)$, and $\alpha \in (0, 1)$.
[Fig. 3. Jensen-Rényi divergence as a function of λ]

Proof. It follows from the definition of the Jensen-Rényi divergence that
$$R_\alpha(C) - JR_\alpha^{\omega}(p_1, p_2, \dots, p_n) = R_\alpha(C \mid X).$$
It can be shown that $H(C \mid X) \ge 2 L_B$ (see [17]); therefore Eq. (2) implies $R_\alpha(C \mid X) \ge 2 L_B$. This completes the proof.
A method that provides an estimate of the Bayes error without requiring knowledge of the underlying class distributions is based on the NN classifier, which assigns a test pattern to the class of its closest pattern according to some metric [18]. For a sufficiently large number of samples, the following result relating the Bayes error $L_B$ and the asymptotic error $L_{NN}$ of the NN classifier holds [18]:
$$\frac{n-1}{n}\left(1 - \sqrt{1 - \frac{n}{n-1}\, L_{NN}}\right) \le L_B \le L_{NN}. \qquad (8)$$
Using (7), the following inequality is deduced:
$$JR_\alpha^{\omega} \le R_\alpha(\omega) - \frac{2(n-1)}{n}\left(1 - \sqrt{1 - \frac{n}{n-1}\, L_{NN}}\right).$$
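A small numerical illustration of Proposition 5 (again a sketch reusing the hypothetical helpers from the Section 2 sketch; the class priors and class-conditional distributions below are arbitrary choices, not taken from the paper):

```python
import numpy as np

# assumes renyi_entropy and jensen_renyi from the sketch in Section 2 are in scope
alpha = 0.5
w  = np.array([0.6, 0.4])                  # class priors omega_i = P(C = c_i)
p1 = np.array([0.7, 0.2, 0.1])             # p_{1j} = P(X = x_j | C = c_1)
p2 = np.array([0.1, 0.3, 0.6])             # p_{2j} = P(X = x_j | C = c_2)

bayes_error = 1.0 - np.sum(np.max(w[:, None] * np.vstack([p1, p2]), axis=0))
lhs = jensen_renyi([p1, p2], w, alpha)              # JR_alpha^w(p_1, p_2)
rhs = renyi_entropy(w, alpha) - 2 * bayes_error     # R_alpha(w) - 2 L_B, Eq. (7)
print(lhs, rhs)                                     # ~0.17 <= ~0.55 bits: the bound holds
```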
4 Imaging Applications of the Jensen-Rényi Divergence
4.1 Image Registration
Image registration is an important problem in computer vision [19], remote sensing data processing [20] and medical image analysis [21]. The goal of image registration is to find a spatial transformation such that a similarity metric between two or more images taken at different times, from different sensors, or from different viewpoints achieves its maximum. Given two images $f_1, f_2 : \Omega \subset \mathbb{R}^2 \to \mathbb{R}$, where $\Omega$ is a bounded set (usually a rectangle), the goal of image registration in the context of the Jensen-Rényi divergence is to find the spatial transformation parameters $(\ell^*, \theta^*, \gamma^*)$ such that
$$(\ell^*, \theta^*, \gamma^*) = \arg\max_{(\ell,\theta,\gamma)} JR_\alpha^{\omega}\big(f_1, T_{(\ell,\theta,\gamma)} f_2\big) = \arg\max_{(\ell,\theta,\gamma)} JR_\alpha^{\omega}(p_1, \dots, p_n), \qquad (9)$$
where $p_i = p_i(f_1, T_{(\ell,\theta,\gamma)} f_2)$, and $T$ is a Euclidean transformation with translational parameter $\ell = (\ell_x, \ell_y)$, rotation angle $\theta$ and scaling factor $\gamma$. Denote by $X = \{x_1, x_2, \dots, x_n\}$ and $Y = \{y_1, y_2, \dots, y_n\}$ the sets of pixel intensity values of $f_1$ and $T_{(\ell,\theta,\gamma)} f_2$ respectively, and let $X$, $Y$ be two random variables taking values in these sets. Then $p_i(f_1, T_{(\ell,\theta,\gamma)} f_2) = (p_{ij})_{1 \le j \le n}$ is defined as
$$p_{ij} = P(Y = y_j \mid X = x_i), \qquad j = 1, 2, \dots, n,$$
which is the conditional probability of $T_{(\ell,\theta,\gamma)} f_2$ given $f_1$ for the corresponding pixel pairs. Here the Jensen-Rényi divergence acts as a similarity measure between images. If the two images are exactly matched, then $p_i = (\delta_{ij})_{1 \le j \le n}$, $i = 1, \dots, n$. Since the $p_i$'s are then degenerate distributions, it follows from Proposition 2 that the Jensen-Rényi divergence is maximized for a fixed $\alpha$ and $\omega$. Fig. 4(1)-4(2) show two brain MR images in which the misalignment is a Euclidean rotation. The conditional probability distributions $\{p_i\}$ are crisp, as in Fig. 4(3), when the two images are aligned, and dispersed, as in Fig. 4(4), when they are not matched.

It is worth noting that the maximization of the Jensen-Rényi divergence holds for any $\alpha$ and $\omega$ such that $0 \le \alpha \le 1$, $\omega_i \ge 0$ and $\sum_i \omega_i = 1$. If we take $\alpha = 1$ and $\omega_i = P(X = x_i)$, then the Jensen-Rényi divergence reduces to the Shannon mutual information. Indeed, the Jensen-Rényi divergence measure provides a more general framework for the image registration problem. If the two images $f_1$ and $T_{(\ell,\theta,\gamma)} f_2$ are matched, the Jensen-Rényi divergence is maximized for any valid weight, and assigning $\omega_i = P(X = x_i)$ is not always a good choice. Fig. 5 shows the registration results of the two brain images in Fig. 4 using the mutual information and the Jensen-Rényi divergence with $\alpha = 1$ and uniform weights. The peak at the matching point generated by the Jensen-Rényi divergence is clearly much higher than the peak generated by the mutual information. Setting $\omega_i = P(X = x_i)$ gives the background pixels the largest weights; in the presence of noise, the matching in the background is corrupted, and mutual information may fail to identify the registration point.
[Fig. 4. Conditional probability distributions]
This phenomenon is demonstrated in Fig. 6, which shows the registration results obtained with mutual information and with the Jensen-Rényi divergence in the presence of noise (SNR = 1.92 dB). For the Jensen-Rényi divergence, the parameters used are $\alpha = 1$ and $\omega_i = 1/n$.

The following result establishes the optimality of uniform weights for image registration in the context of the Jensen-Rényi divergence.

Proposition 6. Let $\beta$ be an $n$-dimensional uniform weight vector, and let $\omega$ be any $n$-dimensional vector such that $\omega_i \ge 0$, $\sum_{i=1}^{n} \omega_i = 1$. If the misalignment between $f_1$ and $f_2$ can be modeled by a spatial transformation $T^*$, then the following inequality holds:
$$JR_\alpha^{\beta}(p_1, \dots, p_n) \ge JR_\alpha^{\omega}(p_1, \dots, p_n), \qquad \forall \alpha \in [0, 1], \qquad (10)$$
where $p_i = p_i(f_1, T^* f_2)$, $i = 1, \dots, n$.

Proof. When $f_1$ and $f_2$ are aligned by the spatial transformation $T^*$, then $p_i$ becomes $\Delta_i$, and hence
$$JR_\alpha^{\omega}\big(p_1(f_1, T^* f_2), \dots, p_n(f_1, T^* f_2)\big) = R_\alpha(\omega).$$
[Fig. 5. Mutual information vs. Jensen-Rényi divergence with uniform weights: d(A,B) as a function of θ]

Since $\beta \prec \omega$ and $R_\alpha(\cdot)$ is Schur-concave, it follows that $R_\alpha(\beta) \ge R_\alpha(\omega)$. This completes the proof.

After assigning uniform weights to the various distributions in the Jensen-Rényi divergence, a free parameter $\alpha$, which is directly related to the measurement sensitivity, remains to be selected. In the image registration problem, one desires a sharp and distinguishable peak at the matching point. The sharpness of the Jensen-Rényi divergence can be characterized by the maximal value as well as the width of the peak; the sharpest peak is clearly a Dirac function. The following proposition, which follows from Proposition 3, establishes that the maximal value of the Jensen-Rényi divergence is independent of $\alpha$ if the two images are aligned, and that $\alpha = 0$ yields the sharpest peak, a Dirac function.

Proposition 7. Let $\beta$ be a uniform weight vector. If the misalignment between $f_1$ and $f_2$ can be modeled by a spatial transformation $T^*$, then for all $\alpha \in [0, 1]$,
$$JR_\alpha^{\beta}\big(p_1(f_1, T^* f_2), \dots, p_n(f_1, T^* f_2)\big) = \log(n). \qquad (11)$$
In particular, for $\alpha = 0$ we have $JR_\alpha^{\beta}(p_1, \dots, p_n) = 0$ for any probability distributions $p_i$ with nonvanishing components. In addition, we have $JR_\alpha^{\beta}(p_1, \dots, p_n) = \log(n)$ if and only if $p_i = \Delta_i$.
Fig. 7(1) shows the registration results of the two brain images in Fig. 4 for different choices of $\alpha$. In this case, $\alpha = 0$ is the best choice and generates a Dirac function with a peak at the matching point, as illustrated in Fig. 7(2).
[Fig. 6. Registration results in the presence of noise (SNR = 1.92 dB): (1) Image A, (2) Image B, (3) d(A,B) for mutual information and for the Jensen-Rényi divergence with α = 1 and ω_i = 1/n]

If there exists local variation between $f_1$ and $f_2$, or if the registration of the two images takes place in the presence of noise, then an exact alignment $T^*$ may not be found. The conditional probability distribution $p_i(f_1, T^* f_2)$ is no longer a degenerate distribution in this case. The next result establishes that taking $\alpha = 1$ yields a higher peak than any other choice of $\alpha$ for non-ideal alignment.

Proposition 8. Let $p_i = \Delta_i + \delta p_i$, $i = 1, 2, \dots, n$, where $\delta p_i = (\delta p_{ij})_{1 \le j \le n}$ is a real distortion vector such that $p_{ij} \ge 0$, $\sum_{j=1}^{n} \delta p_{ij} = 0$ and $\sum_{i=1}^{n} \delta p_{ij} = 0$. Let $\omega$ be a weight vector; then for all $\alpha \in [0, 1]$,
$$JR_{\alpha=1}^{\omega}(p_1, p_2, \dots, p_n) \ge JR_\alpha^{\omega}(p_1, p_2, \dots, p_n). \qquad (12)$$
Proof. Eq. (2) yields
$$\sum_{i=1}^{n} \omega_i H(p_i) \le \sum_{i=1}^{n} \omega_i R_\alpha(p_i), \qquad \forall \alpha \in (0, 1). \qquad (13)$$
Since $\sum_{j=1}^{n} \delta p_{ij} = 0$, $\sum_{i=1}^{n} \delta p_{ij} = 0$, and the Rényi entropy for $\alpha = 1$ is the Shannon entropy, the inequality (12) is equivalent to the inequality (13).

It is worth pointing out that the Jensen-Rényi divergence is not made equivalent to mutual information simply by setting $\alpha = 1$; the equivalence holds only if $\alpha = 1$ and $\omega_i = P(X = x_i)$.

Discussion: The parameter $\alpha$ of the Jensen-Rényi divergence plays the role of a scaling factor that adjusts the registration peaks, and the location of the registration point is independent of $\alpha$. In real-world applications, there is a trade-off between optimality and practicality in choosing $\alpha$. If one can model the misalignment between $f_1$ and $f_2$ completely and accurately, $\alpha = 0$ is the best choice since it generates a Dirac function at the matching point. It is, however, also the least robust selection: $\alpha = 0$ effectively treats every distribution as uniform over its support, so if the $p_i$ are not degenerate and $p_{ij} > 0$, the Jensen-Rényi divergence is zero over the whole transformation parameter space, as happens when the adopted transformation group cannot accurately model the relationship between $f_1$ and $f_2$. On the other hand, $\alpha = 1$ is the most robust choice, in spite of also resulting in the least sharp peak. The choice of $\alpha$ therefore depends largely on the accuracy of the invoked model, on the specific application, and on the available computational resources. We further showed that $\omega_i = 1/n$ is optimal; thus the best choice for non-ideal image registration in the context of the Jensen-Rényi divergence is $\{\alpha = 1, \omega_i = 1/n\}$, in comparison with mutual-information-based methods, in which the parameters are set to $\{\alpha = 1, \omega_i = P(X = x_i)\}$. Fig. 7 depicts the effect of the parameter $\alpha$ in image registration.
[Fig. 7. Effect of the order α in image registration: d(A,B) vs. θ for (1) α = 0.1, 0.3, 0.8, 1.0 and (2) α = 0]
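To make the registration procedure of Section 4.1 concrete, the following rough sketch searches exhaustively over the rotation angle with the recommended setting α = 1 and uniform weights. It is our own illustration, not the authors' code: the rotation routine from scipy.ndimage, the histogram binning, and the restriction to a pure rotation are assumptions, and the hypothetical jensen_renyi helper from the Section 2 sketch is assumed to be in scope.

```python
import numpy as np
from scipy.ndimage import rotate

def conditional_histograms(f1, f2, bins=32):
    """Rows p_i: normalized histograms of f2 intensities over pixels whose f1 intensity falls in bin i."""
    joint, _, _ = np.histogram2d(f1.ravel(), f2.ravel(), bins=bins)
    joint = joint[joint.sum(axis=1) > 0]            # drop intensity bins of f1 that are empty
    return joint / joint.sum(axis=1, keepdims=True)

def register_rotation(f1, f2, angles, alpha=1.0):
    """Angle maximizing the uniform-weight Jensen-Rényi divergence (Eq. (9), rotation only)."""
    best_angle, best_score = None, -np.inf
    for theta in angles:
        f2_rot = rotate(f2, theta, reshape=False, order=1)
        p = conditional_histograms(f1, f2_rot)
        score = jensen_renyi(p, np.full(len(p), 1.0 / len(p)), alpha)
        if score > best_score:
            best_angle, best_score = theta, score
    return best_angle, best_score

# theta_star, _ = register_rotation(image_a, image_b, angles=np.arange(0.0, 20.0, 0.5))
```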
4.2 Image Segmentation
Edge detection is a fundamental problem in image processing. The use of the Jensen-Rényi divergence in image segmentation can be formulated as follows: a sliding window $W$ is split into two equal and adjacent subwindows $W_1$ and $W_2$. For each position of the sliding window, the histograms $p_1$ and $p_2$ of the subwindows $W_1$ and $W_2$ are computed, as well as the Jensen-Rényi divergence between $p_1$ and $p_2$. We repeat this procedure for several orientations of the sliding window to ensure detection in all directions, and we then select the maximum divergence at each pixel of a given image. Consequently, we construct a divergence map in which the largest values within linear neighborhoods are identified; these largest pixel values correspond to edge points. It is worth noting that the window $W$ can be split into any finite number of equal and adjacent subwindows, since the Jensen-Rényi divergence is defined between any number of probability distributions. In the simulations we consider, without loss of generality, the Jensen-Rényi divergence between the histograms of two equal and adjacent subwindows which together form the entire sliding window.

To illustrate the behavior of the Jensen-Rényi divergence, consider an image consisting of two regions $A$ and $B$ with respective histograms $p_a$ and $p_b$, the former at the left and the latter at the right, with a vertical boundary between them, as shown in Fig. 8. The window $W$ slides from left to right, so that it passes gradually from region $A$ to region $B$. Under a hypothesis of statistical homogeneity, the histograms $p_1$ and $p_2$ are constant and equal when $W$ is located entirely inside one region, and thus the Jensen-Rényi divergence between $p_1$ and $p_2$ vanishes. If the sliding window is located over an edge, one of the two subwindows straddles the two regions; without loss of generality, we consider the subwindow $W_2$. Again under the hypothesis of homogeneity, the parts of $W_2$ located in regions $A$ and $B$ have partial histograms $p_a$ and $p_b$, with weights proportional to the sizes of the subregions $A \cap W_2$ and $B \cap W_2$, respectively. More precisely, if $\lambda \in [0, 1]$ represents the fraction of $W_2$ included in region $B$, then $p_2 = (1-\lambda)p_a + \lambda p_b$, giving rise to a histogram which varies with the position of $W$. Since the subwindow $W_1$ is located in region $A$, we have $p_1 = p_a$, and therefore $p = (1 - \lambda/2)p_a + (\lambda/2)p_b$. The corresponding Jensen-Rényi divergence can then be expressed as a function of $\lambda$:
$$JR_\alpha(\lambda) = R_\alpha(p) - \frac{R_\alpha\big((1-\lambda)p_a + \lambda p_b\big) + R_\alpha(p_a)}{2},$$
where $p = (1 - \lambda/2)p_a + (\lambda/2)p_b$ and $\alpha \in (0, 1)$.

The performance of the proposed algorithm using the Jensen-Rényi divergence for various values of $\alpha$ is illustrated in Fig. 9.
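A rough sketch of the edge-detection procedure described above, specialized to a single (vertical) orientation. This is our own illustration: the window size and bin count are arbitrary, 8-bit gray levels are assumed, and the hypothetical jensen_renyi helper from the Section 2 sketch is assumed to be in scope.

```python
import numpy as np

# assumes jensen_renyi from the sketch in Section 2 is in scope
def jr_edge_map(image, half=8, bins=16, alpha=1.0):
    """At each pixel, the Jensen-Rényi divergence between the gray-level histograms
    of the left and right half-windows W1, W2; large values indicate vertical edges."""
    h, w = image.shape
    out = np.zeros((h, w))
    for i in range(half, h - half):
        for j in range(half, w - half):
            w1 = image[i - half:i + half, j - half:j]     # left half-window W1
            w2 = image[i - half:i + half, j:j + half]     # right half-window W2
            h1, _ = np.histogram(w1, bins=bins, range=(0, 256))
            h2, _ = np.histogram(w2, bins=bins, range=(0, 256))
            out[i, j] = jensen_renyi([h1 / h1.sum(), h2 / h2.sum()], [0.5, 0.5], alpha)
    return out

# The paper repeats this for several window orientations and keeps, at each pixel,
# the maximum divergence; edge points are the local maxima of the resulting map.
```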
[Fig. 8. Vertical edge between two regions]
[Fig. 9. Edge detection results using the Jensen-Rényi divergence for various values of α: original image and α = 0.5, 1, 2, 3, 8]
We consider this application to image segmentation to be a preliminary investigation. An important issue for further research in this direction is the development of robust edge-linking algorithms to be used in conjunction with image edge detection based on the Jensen-Rényi divergence measure.
References

[1] S. Ali and S. Silvey, "A general class of coefficients of divergence of one distribution from another," J. Roy. Statist. Soc. Ser. B, vol. 28, pp. 131-142, 1966.
[2] S. Kullback and R. A. Leibler, "On information and sufficiency," Ann. Math. Statist., vol. 22, pp. 79-86, 1951.
[3] R. Stoica, J. Zerubia, and J. M. Francos, "Image retrieval and indexing: A hierarchical approach in computing the distance between textured images," IEEE Int. Conf. on Image Processing, Chicago, 1998.
[4] A. O. Hero, B. Ma, O. Michel, and J. Gorman, "Applications of entropic spanning graphs," IEEE Signal Processing Magazine, vol. 19, no. 5, pp. 85-95, Sept. 2002.
[5] A. Rényi, "On measures of entropy and information," Selected Papers of Alfréd Rényi, vol. 2, pp. 525-580, 1961.
[6] J. Lin, "Divergence measures based on the Shannon entropy," IEEE Trans. Information Theory, vol. 37, no. 1, pp. 145-151, 1991.
[7] J. F. Gomez, J. Martinez, A. M. Robles, and R. Roman, "An analysis of edge detection by using the Jensen-Shannon divergence," Journal of Mathematical Imaging and Vision, vol. 13, no. 1, pp. 35-56, Aug. 2000.
[8] R. Roman, P. Bernaola, and J. L. Oliver, "Sequence compositional complexity of DNA through an entropic segmentation method," Physical Review Letters, vol. 80, no. 6, pp. 1344-1347, Feb. 1998.
[9] P. Viola and W. M. Wells, "Alignment by maximization of mutual information," International Journal of Computer Vision, vol. 24, no. 2, pp. 137-154, 1997.
[10] F. Maes, A. Collignon, D. Vandermeulen, G. Marchal, and P. Suetens, "Multimodality image registration by maximization of mutual information," IEEE Trans. on Medical Imaging, vol. 16, no. 2, pp. 187-198, 1997.
[11] Y. He, A. Ben Hamza, H. Krim, and V. C. Chen, "An information theoretic measure for ISAR imagery focusing," Proc. SPIE, vol. 4116, San Diego, 2000.
[12] Y. He, A. Ben Hamza, and H. Krim, "A generalized divergence measure for robust image registration," IEEE Trans. Signal Processing, vol. 51, no. 5, May 2003.
[13] A. W. Marshall and I. Olkin, Inequalities: Theory of Majorization and Its Applications, Academic Press, 1979.
[14] L. Devroye, L. Györfi, and G. Lugosi, A Probabilistic Theory of Pattern Recognition, Springer, New York, 1996.
[15] M. A. Figueiredo and A. K. Jain, "Unsupervised learning of finite mixture models," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 24, no. 3, pp. 381-396, March 2002.
[16] N. Paragios and R. Deriche, "Geodesic active contours and level sets for the detection and tracking of moving objects," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 22, no. 3, pp. 266-280, March 2000.
[17] M. Hellman and J. Raviv, "Probability of error, equivocation, and the Chernoff bound," IEEE Trans. Information Theory, vol. 16, pp. 368-372, 1970.
[18] T. M. Cover and P. E. Hart, "Nearest neighbor pattern classification," IEEE Trans. Information Theory, vol. 13, pp. 21-27, January 1967.
[19] R. Kasturi and R. C. Jain, Computer Vision: Principles, IEEE Computer Society Press, Los Alamitos, CA, 1991.
[20] J. R. Jensen, Introductory Digital Image Processing: A Remote Sensing Perspective, 2nd edition, Prentice Hall, Upper Saddle River, NJ, 1996.
[21] P. A. Van den Elsen, E. J. D. Pol, and M. A. Viergever, "Medical image matching - a review with classification," IEEE Engineering in Medicine and Biology Magazine, vol. 12, no. 1, pp. 26-39, March 1993.