© 200x Society for Industrial and Applied Mathematics
SIAM J. APPL. MATH. Vol. 0, No. 0, pp. 000–000
WELL-POSEDNESS OF TWO NONRIGID MULTIMODAL IMAGE REGISTRATION METHODS∗

OLIVIER FAUGERAS† AND GERARDO HERMOSILLO†

Abstract. Image registration methods may be designed as solutions of minimization problems on a set of geometric deformations. In the nonrigid case, solving these problems often means computing the steady state of a system of evolution equations involving the “gradient” of the error criterion, defined from the Euler–Lagrange equations of the corresponding minimization problem. The well-posedness of the registration method requires showing the existence of a solution to the minimization problem, as well as that of a stable solution of the evolution equations derived from it. We provide such proofs in the case where the error criterion is derived from two different statistical similarity measures: global mutual information and local cross-covariance. We also describe our numerical implementation for solving the corresponding evolution equations and show examples of registrations of real 2D and 3D images achieved with these algorithms. The proofs are quite general and can be applied to most of the known nonrigid image registration methods.

Key words. multimodal image matching, variational methods, image registration, mutual information, cross-covariance, Euler–Lagrange equations, initial-value problems, analytical semigroups of linear operators

AMS subject classifications. 34G20, 35B65, 35D10, 35K55, 35K90, 35Q80, 47D03, 47D06, 47H06, 47H07, 47J35, 47N60

DOI. 10.1137/S0036139903424904
1. Introduction. The problem of estimating the geometric deformation between two images has a long history. Most of the existing methods have been developed for images acquired with the same sensor or at least the same modality (e.g., in the visible spectrum). They often rely on the minimization of an error criterion which takes into account two sources of a priori knowledge: (a) the properties of the images’ intensities that characterize their similarity and (b) the constraints on the possible geometric deformations. For the first point, the basic idea is that the intensities at corresponding points should be “similar,” i.e., equal as in the case of the optical flow problem [28]. This is, of course, too strict a requirement in many cases, and it has been relaxed to an average intensity similarity in the neighborhoods of the corresponding points, as in the case of local image differences [29] or more general block matching strategies [51, 41]. One more step and one can characterize similarity as a large score of a nonlocal similarity measure such as the cross-correlation [20, 21, 11, 39], the correlation ratio [49], or mutual information [55, 58, 56, 32], among several others [59, 26, 45, 31]. As for the second point, the possible geometric deformations, a first idea is to restrict the search to sets of low-dimensional parametric transformations (e.g., Euclidean, affine, quadratic, or spline interpolation between a set of control points) [36, 50]. Another example of a constrained deformation is the stereo case, in which knowledge of the fundamental matrix allows one to restrict the search for the matching point along the epipolar line [1, 60]. If the deformation is not defined parametrically,

∗ Received by the editors March 18, 2003; accepted for publication (in revised form) November 10, 2003; published electronically DATE. This work was partially supported by NSF grant DMS-9972228, EC grant Mapawamo QLG3-CT-2000-30161, INRIA ARCs IRMf and MC2, and the Mexican National Council for Science and Technology, Conacyt. http://www.siam.org/journals/siap/x-x/42490.html
† INRIA Sophia Antipolis, 2004 route des Lucioles, BP 93, 06902 Sophia-Antipolis Cedex, France ([email protected], [email protected]).
the constraint may consist of requiring some smoothness of the displacement field, possibly preserving discontinuities [53, 1, 47, 2, 34, 35, 4, 3, 18]. Concerning this regularization, we can distinguish the approaches based on explicit smoothing of the field, as in Thirion’s demons algorithm [53] (we refer to [44] for a variational interpretation of this algorithm), from those considering an additive term in the error criterion, yielding (possibly anisotropic) diffusion terms [5, 57]. For a comparison of these two approaches, we refer to the work of Cachier and Ayache [9, 10]. Approaches relying on a regularization functional induce a certain amount of competition between the similarity of the images and the smoothness of the deformation. Fluid methods do not impose this competition, although they still require a parameter that fixes the amount of desired smoothness or fluidness of the result [13, 54, 12]. Many of the previous methods are “differential” and are mostly valid for small displacements. Special techniques are required in order to recover large deformations. For instance, Alvarez, Weickert, and Sánchez [2] use a scale-space focusing strategy. Christensen, Miller, and Vannier [13] adopt a different approach. They look for a continuously invertible mapping which is obtained by the composition of small displacements. Each small displacement is calculated as the solution of an elliptic partial differential equation describing the nonlinear kinematics of fluid-elastic materials under deforming forces given by the matching term (in their case the image differences). Trouvé [54] has generalized this approach using Lie group ideas on sets of diffeomorphisms. Under a similar formalism, a very general framework which also allows for changes in the intensity values is proposed by Miller and Younes [37].

In the case of multimodal image registration (e.g., visible and infrared, anatomical and functional magnetic resonance images, etc.), the similarity in the intensities is “weak” and one has to rely on statistical definitions. Such definitions have been widely used in the case of low-dimensional parametric transformations. Mutual information was introduced by Viola and colleagues [55, 58, 56] and independently by Maes et al. [32]. The correlation ratio was first proposed as a similarity measure for image matching by Roche et al. [49]. Other statistical approaches rely on learning the joint distribution of intensities, as done, for instance, by Leventon and Grimson [31]. Extensions to more complex (nonrigid) transformations using statistical similarity measures include approaches relying on more complex parametric transformations [36, 50], block matching strategies [33, 24, 22], and parametric intensity corrections [48]. Some recent approaches rely on the computation of the gradient of the local cross-correlation [11, 39].

Several papers have discussed the theoretical well-posedness of similar registration problems in the monomodal case, for instance, [27, 14]. However, none of those treating the multimodal case has dealt significantly with the problem of the existence and uniqueness of a solution of the registration problem. Our work is an attempt to improve this situation. We consider the problem of multimodal image registration and use statistical considerations to define similarity. In order to be able to deal with strongly nonstationary variations in the intensity, we extend the usual statistical framework to act locally instead of globally in the images. Since most of the geometric deformations that occur in practice are nonparametric, we model them as dense deformation fields that are constrained to belong to some reasonable functional spaces, e.g., Sobolev spaces. We then classically define the problem as a variational one and show that it is well posed.
In detail, we consider the problem of dense matching between two images using statistical dissimilarity criteria, which are well adapted to the case of multimodal image data often encountered, e.g., in medical imaging. The minimization of the sum of the dissimilarity term and the regularization term defines, through the associated Euler–Lagrange equations, a set of coupled functional evolution equations. The conditions under which this type of evolution equation is well posed are known. We show that the matching functions that we obtain satisfy these conditions in the case where the error functional is derived from two different statistical similarity measures: global mutual information and local cross-covariance.

1.1. A generic registration problem. At a conceptual level, images are integrable real bounded functions defined on R^n (we restrict ourselves to n = 2, 3). These abstract images are not directly observable because of the physics of acquisition. What we call an image is the convolution of such a function with a C^∞ mollifier. We therefore view the images as belonging to the space of infinitely differentiable functions, C^∞(R^n). They are bounded and Lipschitz continuous, as are all their derivatives. These assumptions are not central to this article and can be greatly weakened. The reader will verify that the first image is only required to be C^2, while the second is required to be C^1 with a Lipschitz continuous first order derivative. This asymmetry is due to the fact that the two images do not play the same role in the registration problem. Weaker conditions may be possible, but we have not investigated this specific problem. The range of the values of an image is the interval [0, A], A > 0.

Let I1 and I2 be two images. Let Id be the identity mapping of R^n and h : Ω → R^n a given vector field defined on a bounded and regular region of interest Ω ⊂ R^n. By regular we mean C^2. This technical assumption is required in the proofs of Lemma 2.3 and Proposition 2.4. Images are usually defined on a product of intervals. Ω is in practice chosen as a C^2 superset of this set (e.g., a disk).
The image values are smoothly extended outside the product of intervals to reach the value of 0 on the boundary ∂Ω of Ω. The registration or matching problem may be defined as that of finding a vector field h∗ minimizing an error criterion between I1 and the warped image I2 ∘ (Id + h). The search for this function is done within a set F of admissible functions such that it minimizes an energy functional I : F → R of the form I(h) = J(h) + R(h). The term J(h) is designed to measure the “dissimilarity” between the reference image (I1) and the h-warped second image (I2 ∘ (Id + h)). The term R(h) is designed to penalize fast variations of the function h. It is a regularization term introducing an a priori preference for smoothly varying functions. Generally speaking, the set F is a dense linear subspace of a Hilbert space H, the scalar product of which is denoted by (·, ·)_H. If I is sufficiently regular, its first variation¹ at h ∈ F in the direction of k ∈ H is defined by (see, e.g., [15])

\[
\delta_k \mathcal{I}(h) = \lim_{\varepsilon \to 0} \frac{\mathcal{I}(h + \varepsilon k) - \mathcal{I}(h)}{\varepsilon}. \tag{1.1}
\]

¹ Also called the Gâteaux derivative.

If the mapping k → δ_k I(h) is linear and continuous, the Riesz representation theorem [19] guarantees the existence of a unique vector, denoted by ∇_H I(h), and called the gradient of I, which satisfies the equality δ_k I(h) = (∇_H I(h), k)_H
for every k ∈ H. The gradient depends on the choice of the scalar product (·, ·)_H though, a fact which explains our notation. If a minimizer h∗ of I exists, then the set of equations δ_k I(h∗) = 0 must hold for every k ∈ H, which is equivalent to ∇_H I(h∗) = 0. These equations are called the Euler–Lagrange equations associated with the energy functional I. They give necessary conditions for the existence of a minimizer, but they are not sufficient since they guarantee only the existence of a critical point of the functional I. These critical points can be found in many ways, including methods for nonlinear equations. Rather than solving them directly, the search for a minimizer of I is done using a “gradient descent” strategy. Given an initial estimate h0 ∈ F, a time-dependent differentiable function (also denoted by h) from the interval [0, +∞[ into H is computed as the solution of the following initial value problem:

\[
\begin{cases}
\dfrac{dh}{dt} = -\left( \nabla_H \mathcal{J}(h) + \nabla_H \mathcal{R}(h) \right), \\[2mm]
h(0)(\cdot) = h_0(\cdot).
\end{cases} \tag{1.2}
\]

The asymptotic state (i.e., when t → ∞) of h(t) is then chosen as the solution of the matching problem, provided that h(t) ∈ F for all t > 0. We shall restrict ourselves to the case when ∇_H R is an unbounded linear operator from its domain into H, whereas ∇_H J may be a nonlinear function mapping H into H.

There are three advantages in introducing the artificial time variable, the first two mostly theoretical. The first is that, as shown in Theorem 1.2, we are able to prove the well-posedness of the initial value problem (1.2). The second is that, as shown in Proposition 2.5, we are able to prove that the asymptotic state of the solution of (1.2) is indeed a zero of the Euler–Lagrange equations and, in effect, a local minimum of I. The third advantage is practical: our experiments (see section 6) have shown that when we discretize (1.2) we converge toward a local minimum at a reasonable speed.

1.2. Existence of a classical solution of (1.2).
Equation (1.2) may be viewed as a first order ordinary differential equation with values in H. By borrowing tools from functional analysis and the theory of semigroups generated by unbounded linear operators on Hilbert spaces, we can prove the existence of a unique classical solution of (1.2) under fairly general hypotheses. We refer to the books of Brezis [7], Pazy [43], and Tanabe [52] for an in-depth study of these subjects. Because of our choice of the regularization term R (described in section 2), it turns out that the mapping A : h → −∇_H R(h) is linear from its domain, a subset D(A) of H, into H. Similarly, we denote by F the (nonlinear) mapping defined by h → −∇_H J(h). F is called the matching function of the registration problem. The unknown of the initial-value problem (1.2) is an H-valued function h : [0, +∞[ → H defined on R+. We now establish the properties required for A and F in order for (1.2), which is now written as a semilinear abstract initial value problem of the form

\[
\begin{cases}
\dfrac{dh}{dt} - A h(t) = F(h(t)), \quad t > 0, \\[2mm]
h(0) = h_0 \in H,
\end{cases} \tag{1.3}
\]

to have a unique solution. We first recall the following definition.

Definition 1.1. A function h : [0, +∞[ → H is a classical solution of (1.3) if

h ∈ C([0, +∞[; H) ∩ C¹(]0, +∞[; H) ∩ C(]0, +∞[; D(A))
and (1.3) is satisfied for t > 0.

As a consequence of known results we have the following.

Theorem 1.2. If in (1.3) the linear operator −A is strongly elliptic and invertible² and the nonlinear function F is bounded and Lipschitz continuous, then there exists a unique classical solution of (1.3). Moreover, the function t → dh/dt from ]0, +∞[ into H_α (defined in the proof) is Hölder continuous.

Proof. We use the notion of an analytical semigroup of operators on H defined, e.g., in [52, 43]. Theorem 3.6.1 in [52] or Theorem 2.7 in Chapter 7 of [43] shows that if the operator −A is strongly elliptic, then A generates an analytical semigroup S_A(t), t ≥ 0, on H. It follows from [43, Chapter 2, section 2.6] that (−A)^α can be defined for 0 < α ≤ 1 and that (−A)^α is a closed linear invertible operator with domain D((−A)^α) dense in H. It is invertible because −A is. The closedness of (−A)^α implies that D((−A)^α) endowed with the graph norm of (−A)^α, i.e., the norm ‖h‖ = ‖h‖_H + ‖(−A)^α h‖_H, is a Banach space. Since (−A)^α is invertible, its graph norm is equivalent to the norm ‖h‖_α = ‖(−A)^α h‖_H. Thus, D((−A)^α) equipped with the norm ‖·‖_α is a Banach space, which we denote by H_α. The space H_α is continuously embedded in H for all α’s, which implies

\[
\|h\|_H \le k_\alpha \|h\|_{H_\alpha} \quad \forall h \in H_\alpha,
\]

for some k_α > 0. Theorems 3.1 and 3.3 of [43] allow us to conclude.

The remainder of the paper is divided into five additional sections. Section 2 discusses the regularization part of the initial-value problem (1.3). Section 3 is devoted to the definition of the two statistical dissimilarity measures we have considered and to the computations of the associated Euler–Lagrange equations. Section 4 contains the proofs of the Lipschitz continuity of the corresponding matching functions. Finally, sections 5 and 6 describe the implementation of the matching algorithms and present experimental results with real two- and three-dimensional (2D and 3D) images, respectively.

2. Regularization term. This section studies the regularization part of the initial-value problem (1.2), i.e., the term ∇_H R(h). A one-parameter family of regularization operators is considered which encourages the preservation of edges of the displacement field along the edges of the reference image. In view of the results of the previous section, we choose concrete functional spaces F and H and specify the domain of the regularization operators. We then show that these operators satisfy the properties of A which are sufficient to assert the existence of a classical solution of (1.2) according to the main result of the previous section.

2.1. Function spaces and boundary conditions. We begin with a brief description of the functional spaces that will be appropriate for our purposes. In doing this, we will make reference to Sobolev spaces, denoted by W^{k,p}(Ω). We refer to the books of Evans [19] and Brezis [7] for formal definitions and in-depth studies of the properties of these functional spaces. For the definition of ∇_H I, we use the Hilbert space

\[
\mathbf{H} = \mathbf{L}^2(\Omega) = \underbrace{L^2(\Omega) \times \cdots \times L^2(\Omega)}_{n \text{ terms}} = \left( W^{0,2}(\Omega) \right)^n.
\]

² The invertibility of A is not required, as discussed in, e.g., [43, p. 195], but it makes the proofs simpler.
The regularization functionals that we consider are of the form
\[
\mathcal{R}(h) = \kappa \int_\Omega \varphi(Dh(x))\, dx, \tag{2.1}
\]

where Dh(x) is the Jacobian of h at x, ϕ is a quadratic form of the elements of the matrix Dh(x), and κ > 0. Therefore the set of admissible functions F will be contained in the space

\[
\mathbf{H}^1(\Omega) = \left( W^{1,2}(\Omega) \right)^n.
\]

Additionally, the boundary conditions for h will be specified in F. We consider Dirichlet conditions of the form h = 0 almost everywhere on ∂Ω (in fact, because of the regularity of h, proved in Proposition 2.4, this condition holds everywhere on ∂Ω), and set

\[
\mathcal{F} = \mathbf{H}^1_0(\Omega) = \left( W^{1,2}_0(\Omega) \right)^n.
\]

Because of the special form of R(h), the corresponding regularization operator is a second order differential one, and we therefore will need the space

\[
\mathbf{H}^2(\Omega) = \left( W^{2,2}(\Omega) \right)^n
\]

for the definition of its domain.
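To fix ideas, the descent equation (1.2) on a discretization of these spaces can be sketched in a few lines of numpy. The sketch below is our illustrative stand-in, not the implementation of sections 5 and 6: it takes explicit Euler steps, uses an isotropic Laplacian in place of the anisotropic operator of section 2.2, and leaves the matching force F(h) as a caller-supplied function; the grid size, κ, and time step are arbitrary choices.

```python
import numpy as np

def laplacian(u):
    """5-point Laplacian on a unit grid; values outside the interior stay 0,
    which matches the Dirichlet condition h = 0 on the boundary."""
    L = np.zeros_like(u)
    L[1:-1, 1:-1] = (u[2:, 1:-1] + u[:-2, 1:-1] + u[1:-1, 2:] + u[1:-1, :-2]
                     - 4.0 * u[1:-1, 1:-1])
    return L

def evolve(h, matching_force, kappa=0.1, dt=0.2, steps=100):
    """Explicit Euler steps for dh/dt = kappa * Lap(h) + F(h); h has shape
    (2, ny, nx), one 2D array per component of the displacement field."""
    for _ in range(steps):
        h = h + dt * (kappa * np.stack([laplacian(c) for c in h])
                      + matching_force(h))
        h[:, [0, -1], :] = 0.0   # enforce h = 0 on the boundary of Omega
        h[:, :, [0, -1]] = 0.0
    return h
```

With a zero matching force the field simply diffuses toward the steady state h = 0, which is the expected behavior of the regularization term acting alone.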
2.2. Image-driven anisotropic diffusion. The family that we consider is obtained by defining ϕ in (2.1) by

\[
\varphi(Dh) = \frac{1}{2} \operatorname{Tr}\left( Dh\, T_{I_1} Dh^T \right), \tag{2.2}
\]

where T_{I_1} is an n × n symmetric matrix defined at every point of Ω by the following expression:

\[
T_f = \frac{(\lambda + |\nabla f|^2)\,\mathrm{Id} - \nabla f\, \nabla f^T}{(n-1)|\nabla f|^2 + n\lambda}
\quad \text{for } f : \mathbb{R}^n \to \mathbb{R} \text{ regular.}
\]
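As a quick numerical check of this definition (the helper name and sample values are ours, for illustration only), T_f can be evaluated pointwise from a gradient vector:

```python
import numpy as np

def nagel_enkelmann_tensor(grad_f, lam):
    """T_f = ((lam + |g|^2) Id - g g^T) / ((n - 1)|g|^2 + n*lam), g = grad_f,
    evaluated at a single point of the image domain."""
    g = np.asarray(grad_f, dtype=float)
    n = g.size
    norm2 = g @ g
    return ((lam + norm2) * np.eye(n) - np.outer(g, g)) / ((n - 1) * norm2 + n * lam)

# sample gradient and lambda (hypothetical values)
g = np.array([3.0, 4.0])
T = nagel_enkelmann_tensor(g, lam=0.5)
print(np.trace(T))   # the eigenvalues of T_f always sum to 1
print(T @ g)         # proportional to g: the gradient is an eigenvector
```

The two printed checks correspond to the properties discussed next: the trace (sum of eigenvalues) is identically 1 whatever ∇f, and ∇f is an eigenvector, with eigenvalue λ divided by the denominator of T_f.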
This matrix is a regularized projector onto the plane perpendicular to ∇f. It was first proposed by Nagel and Enkelmann [38] for computing optical flow while preserving the discontinuities of the deforming template. As pointed out by Alvarez, Weickert, and Sánchez [2], applying the smoothness constraint to the reference image (here I1) instead of the deforming one (here I2) allows us to avoid artifacts which appear when recovering large displacements. The matrix T_f has one eigenvector equal to ∇f, while the remaining eigenvectors span the plane perpendicular to ∇f. Its eigenvalues λ_i are all positive and satisfy Σ_i λ_i = 1, independently of ∇f. It is straightforward to verify that the Euler–Lagrange equation corresponding to (2.1) in this case is

\[
\operatorname{div}(D\varphi(Dh)) = \begin{pmatrix} \operatorname{div}(T_{I_1} \nabla h_1) \\ \vdots \\ \operatorname{div}(T_{I_1} \nabla h_n) \end{pmatrix} = 0.
\]

Thus, the regularization operator ∇_H R(h) yields a linear diffusion term with T_{I_1} as diffusion tensor. In regions where ∇I1 is small compared to the parameter λ in T_{I_1},
the diffusion tensor is almost isotropic and so is the regularization. At the edges of I1, where |∇I1| ≫ λ, the diffusion takes place mainly along these edges. This operator is thus well suited for encouraging large variations of h along the edges of the reference image I1. We define the corresponding regularization operator as follows.

Definition 2.1. The linear operator A : D(A) → H is defined as

\[
D(A) = \mathbf{H}^1_0(\Omega) \cap \mathbf{H}^2(\Omega), \qquad
A h = \begin{pmatrix} \operatorname{div}(T_{I_1} \nabla h_1) \\ \vdots \\ \operatorname{div}(T_{I_1} \nabla h_n) \end{pmatrix}.
\]

We now check that −A is strongly elliptic (or, in functional analytic language, is variational) by applying the standard variational approach [19].

Proposition 2.2. The operator −A defines a bilinear form B on the space H¹₀(Ω) which is continuous and coercive (strongly elliptic).

Proof. The proof is quite standard, and we only sketch it here. Because of the form of the operator A, it is sufficient to work on one of the coordinates, to consider the operator a : D(a) → L²(Ω) defined by a u = div(T_{I_1} ∇u), and to show that the operator u → −au defines a bilinear form b on the space H¹₀(Ω) which is continuous and coercive. Continuity is obtained from the fact that the coefficients of T_{I_1} are bounded, by integrating by parts the expression of b(u, v) and applying the Cauchy–Schwarz inequality. Coercivity is obtained from the fact that the eigenvalues of T_{I_1} are strictly positive and by using Poincaré’s inequality.

Lemma 2.3. The linear operator −κA is invertible for all κ > 0.

Proof. It is sufficient to show that the equation −κAh = f has a unique solution for all f ∈ L²(Ω). The proof of Proposition 2.2 shows that the bilinear form associated with the operator −κA is continuous and coercive in H¹(Ω); hence the Lax–Milgram theorem tells us that the equation −κAh = f has a unique weak solution in H¹₀(Ω) for all f ∈ L²(Ω). Since Ω is regular (i.e., C²), the weak solution is in H¹₀(Ω) ∩ H²(Ω) and is a strong solution.

2.3. Existence of a regular solution of (1.2).
Now that the domain of the regularization operator is defined, we give a stronger result concerning the existence of a solution of (1.2): because the domain of A is defined in terms of Sobolev spaces, a classical solution of the abstract problem corresponds in fact to the notion of weak solution in PDE theory. The existence of a regular solution in this context may be shown, assuming regularity of the boundary ∂Ω of Ω. In effect we have the following result.

Proposition 2.4. The functions (t, x) → h(t, x) and (t, x) → (∂/∂t)h(t, x) are continuous on ]0, +∞[ × Ω, and for each t > 0 the function x → h(t, x) is in C²(Ω).

Proof. It follows from a theorem due to Sobolev that H²(Ω) ⊂ C(Ω) for n = 2, 3; hence D(A) ⊂ H²(Ω) ⊂ C(Ω) and, since h(t) ∈ D(A) for t > 0, this proves the continuity of h on ]0, +∞[ × Ω. Next, because of Theorem 1.2, t → dh/dt ∈ H_α is Hölder continuous for t > 0. Moreover, a theorem due to Sobolev, which can be found, e.g., in Theorem 8.4.3 of [43], shows that if α > 3/4 (n = 3) and α > 1/2 (n = 2), H_α is a set of Hölder continuous functions, and we have the continuity of dh/dt on ]0, +∞[ × Ω. It remains to show that h(t, ·) ∈ C²(Ω). First, the function x → F(h(x)) from Ω → R^n is Hölder continuous if h is (this is proved in Propositions 4.12 and 4.22). Since h(t) ∈ H²(Ω) for t > 0 and (Sobolev) H²(Ω) ⊂ C^γ(Ω) (0 ≤ γ < 1/2 for n = 3 and 0 ≤ γ < 1 for n = 2), x → h(t, x) is Hölder continuous for t > 0. Finally, since (∂/∂t)h(t, ·) is Hölder continuous in Ω, it follows that −Ah = F(h) − dh/dt is Hölder continuous in Ω, and by a classical regularity theorem for elliptic equations [16, 23], it follows that h(t, ·) ∈ C^{2+δ}(Ω) for some δ > 0, i.e., has second order Hölder continuous derivatives in the space variable and is thus a regular solution.

2.4. Existence of minimizers. Having defined the regularization functional, we discuss in this section the existence of minimizers of the global energy functional
\[
\mathcal{I}(h) = \mathcal{J}(h) + \kappa \int_\Omega \varphi(Dh(x))\, dx. \tag{2.3}
\]
We assume that J(h) is continuous in h and bounded below. These properties are shown for the statistical dissimilarity functionals J(h) that we study in Proposition 3.1 and Theorem 4.18. In this case, a classical result (see, e.g., Chapter 8, Theorem 2, in [19]) shows that a minimizer exists if ϕ is convex and coercive. We readily check that ϕ given in (2.2) satisfies these hypotheses. As pointed out in [2], because of the smoothness of ∂_i I1, T_{I_1} has strictly positive eigenvalues and therefore, clearly, the mapping ϕ : X ∈ R^n → X T_{I_1} X^T ∈ R^+ is convex. Then the functional R(h) = ∫_Ω ϕ(Dh(x)) dx satisfies the coercivity inequality, i.e., there exist c1 > 0, c2 ≥ 0 such that ϕ(Dh(x)) ≥ c1 |Dh|² − c2, since we have ∇u^T T_{I_1} ∇u ≥ θ|∇u|² for all x ∈ Ω, where θ > 0 is the smallest eigenvalue of T_{I_1} in Ω.

2.5. Existence of a limiting steady state solution. Theorem 1.2 proves the existence of a solution of the initial-value problem (1.2) or (1.3). Proposition 2.4 gives some regularity properties of this solution. We also have to show that it satisfies the Euler–Lagrange equation in the limit. In order to see this, we introduce the function L : R+ → R defined by

\[
L(t) = \mathcal{I}(h(t)). \tag{2.4}
\]
The properties of L relevant to our goal are stated in the following.

Proposition 2.5. The function L is bounded below and differentiable. Its derivative is given by

\[
\frac{dL}{dt} = -\left\| \frac{dh(t)}{dt} \right\|_H^2
\]

and hence is continuous.

Proof. The boundedness of L follows from Proposition 3.1 in the mutual information case and from Theorem 4.18 in the local cross-covariance case. In order to prove differentiability we consider

\[
\frac{L(t+\varepsilon) - L(t)}{\varepsilon} = \frac{\mathcal{I}(h(t+\varepsilon)) - \mathcal{I}(h(t))}{\varepsilon}. \tag{2.5}
\]

We then use the first order Taylor expansion with integral remainder of the C¹ function h,

\[
h(t+\varepsilon) = h(t) + \varepsilon \int_0^1 \frac{dh}{dt}(t + \zeta\varepsilon)\, d\zeta, \tag{2.6}
\]
and replace h(t + ε) by the right-hand side of the previous equation on the right-hand side of (2.5). We denote by k(t, ε) the integral on the right-hand side of (2.6) and obtain

\[
\frac{L(t+\varepsilon) - L(t)}{\varepsilon} = \frac{\mathcal{I}(h(t) + \varepsilon k(t,\varepsilon)) - \mathcal{I}(h(t))}{\varepsilon}. \tag{2.7}
\]
In the proof of Proposition 2.4 we have shown the continuity of dh/dt on ]0, +∞[ × Ω; hence on [t, t + ε] × Ω, t > 0. This shows that k(t, ε) ∈ C(Ω), t > 0. In fact, (2.5), Theorem 1.2, and Proposition 2.4 show that k(t, ε) ∈ C²(Ω) ∩ D(A). We would now like to take the limit of both sides when ε → 0. Because of the similarity with (1.1), we define

\[
\tilde{\delta}_k \mathcal{I}(h(t)) = \lim_{\varepsilon \to 0} \frac{\mathcal{I}(h(t) + \varepsilon k(t,\varepsilon)) - \mathcal{I}(h(t))}{\varepsilon}. \tag{2.8}
\]
The difference is that the function k here also depends upon ε, unlike in (1.1). We prove that this limit is well defined and equal to δ_{k(t,0)} I(h(t)). First we show that lim_{ε→0} k(t, ε) = k(t, 0) almost everywhere in Ω. Indeed,

\[
\| k(t,\varepsilon) - k(t,0) \|_H
\le \int_0^1 \left\| \frac{dh}{dt}(t + \zeta\varepsilon) - \frac{dh}{dt}(t) \right\|_H d\zeta
\le k_\alpha \int_0^1 \left\| \frac{dh}{dt}(t + \zeta\varepsilon) - \frac{dh}{dt}(t) \right\|_{H_\alpha} d\zeta
\]

for some α, 0 < α ≤ 1 (Theorem 1.2); k_α is defined in the proof of the same theorem. Because of the same theorem, the function t → dh/dt from ]0, +∞[ into H_α is Hölder continuous, say with exponent γ > 0. This proves that

\[
\int_0^1 \left\| \frac{dh}{dt}(t + \zeta\varepsilon) - \frac{dh}{dt}(t) \right\|_{H_\alpha} d\zeta \le C \varepsilon^\gamma,
\]

where C is independent of ε, and the convergence in H follows. Since H = L²(Ω) and k(t, ε) is continuous in Ω, the L² convergence implies the convergence almost everywhere.

The proof that δ̃_k I(h(t)) = δ_{k(t,0)} I(h(t)) follows from this result, the computations done in sections 3.1.2 and 3.2, the fact that k(t, ε) converges almost everywhere toward k(t, 0), and the fact that each coordinate of k(t, ε) is bounded and hence integrable. The last two points guarantee that such integrals as ∫_Ω k(t, ε)(x) · ∇I2(x + h(t, x)) dx converge toward ∫_Ω k(t, 0)(x) · ∇I2(x + h(t, x)) dx by applying the dominated convergence theorem. The boundedness of k(t, ε) follows from its continuity on the compact set Ω. The continuity of dL/dt follows from the continuity of dh/dt (Theorem 1.2) and the continuity of the norm in H.

We have therefore proved that our criterion I decreases along the “trajectory” h(t), solution of the initial-value problem (1.3). In order to prove that the solution asymptotically satisfies the Euler–Lagrange equations, we prove the following claim.

Proposition 2.6. If ‖dh(t)/dt‖_H is bounded on ]0, +∞[, the limit when t → +∞ of the derivative dL/dt of the function L(t) defined by (2.4) is equal to 0.

Proof. We argue by contradiction and assume that dL/dt does not go to zero when t goes to infinity. Therefore there exists ε > 0 such that for
all T > 0 there exists t > T such that dL(t)/dt < −ε. We can therefore construct an infinite, strictly increasing sequence {t_n}, n ≥ 0, t_n > 0, such that dL(t_n)/dt < −ε. Let us assume that there exists η(t_n) > 0 such that dL(t_n + η(t_n))/dt = −ε/2 and dL(t)/dt < −ε/2 for all t_n ≤ t < t_n + η(t_n). If there is no such η(t_n), we are done; otherwise we choose t_{n+1} > t_n + η(t_n). We prove that η(t_n) is bounded below. Indeed we write

\[
\varepsilon/2 \le \left| \frac{dL}{dt}(t_n + \eta(t_n)) - \frac{dL}{dt}(t_n) \right|
= \left| \left\| \frac{dh}{dt}(t_n + \eta(t_n)) \right\|_H^2 - \left\| \frac{dh}{dt}(t_n) \right\|_H^2 \right|
\]
\[
= \left( \left\| \frac{dh}{dt}(t_n + \eta(t_n)) \right\|_H + \left\| \frac{dh}{dt}(t_n) \right\|_H \right)
\left| \left\| \frac{dh}{dt}(t_n + \eta(t_n)) \right\|_H - \left\| \frac{dh}{dt}(t_n) \right\|_H \right|.
\]

Let A be an upper bound on ‖dh(t)/dt‖_H; then the rightmost term in the above equation is less than or equal to

\[
2A \left\| \frac{dh}{dt}(t_n + \eta(t_n)) - \frac{dh}{dt}(t_n) \right\|_H.
\]

According to Theorem 1.2,

\[
\left\| \frac{dh}{dt}(t_n + \eta(t_n)) - \frac{dh}{dt}(t_n) \right\|_H
\le k_\alpha \left\| \frac{dh}{dt}(t_n + \eta(t_n)) - \frac{dh}{dt}(t_n) \right\|_{H_\alpha}
\le k_\alpha (\eta(t_n))^\gamma,
\]

and therefore

\[
\eta(t_n) \ge \left( \frac{\varepsilon}{4 A k_\alpha} \right)^{1/\gamma}.
\]

Consider now the sequence of intervals [t_n, t_n + η(t_n)]. On each such interval we have dL/dt ≤ −ε/2, and therefore

\[
L(t_n + \eta(t_n)) \le L(t_n) - \frac{\varepsilon}{2}\, \eta(t_n)
\le L(t_n) - \frac{\varepsilon}{2} \left( \frac{\varepsilon}{4 A k_\alpha} \right)^{1/\gamma}.
\]

This contradicts the fact that L(t) is bounded below; hence we have

\[
\lim_{t \to +\infty} \frac{dL}{dt} = 0.
\]
This result shows that, asymptotically, dL/dt = 0, therefore dh(t)/dt = 0, and hence the solution of the initial-value problem (1.2) satisfies the Euler–Lagrange equations associated with the energy functional I.

3. Statistical similarity measures and their Euler–Lagrange equations. Our approach to matching multimodal images relies on regarding the intensity values of two different modalities as samples of two random processes. Within this probabilistic framework, the link between the two modalities is characterized by their joint probability density function (pdf). In this section, we compute the variational gradient of two statistical dissimilarity functionals. Among many possible criteria, the cross-covariance and the mutual information provide us with a convenient trade-off between robustness and generality. To be able to evaluate these criteria for a given field h, we consider a nonparametric Parzen estimator [42] for the joint pdf as described
below. This estimator can be either global, assuming that the unknown relationship between the intensity values does not vary spatially, or local. We give only one example of each case because they are the ones we found to work best in our applications. However, the method and the proofs are quite generic. More examples can be found in [25]. We denote by X the random variable associated with the intensity values of I1 and by Y_h the one associated with the values of I2 ∘ (Id + h). For conciseness, we will use the notation i = (i1, i2) and I_h(x) = (I1(x), I2(x + h(x))).

3.1. Global mutual information. Our estimator is based on a one-dimensional (1D) normalized Gaussian kernel of variance β,

\[
g_\beta(i) = \frac{1}{\sqrt{2\pi\beta}} \exp\left( \frac{-i^2}{2\beta} \right),
\]

from which we construct G_β(i) = g_β(i1) g_β(i2):

\[
P(i, h) = \frac{1}{|\Omega|} \int_\Omega G_\beta(I_h(x) - i)\, dx.
\]

Note that for each pair of intensities i ∈ R² the value of the estimated joint pdf is a nonlinear functional of h. Also note that it is strictly positive. Its first variation is obtained by applying (1.1):

\[
\frac{P(i, h + \varepsilon k) - P(i, h)}{\varepsilon}
= \frac{1}{\varepsilon |\Omega|} \int_\Omega g_\beta(I_1(x) - i_1)
\left( g_\beta(I_2(x + h(x) + \varepsilon k(x)) - i_2) - g_\beta(I_2(x + h(x)) - i_2) \right) dx.
\]

We use the first order Taylor expansion with integral remainder of the C¹ function g_β for the second factor within the integral:

\[
\frac{P(i, h + \varepsilon k) - P(i, h)}{\varepsilon}
= \frac{1}{|\Omega|} \int_\Omega g_\beta(I_1(x) - i_1)\,
\frac{I_2(x + h(x) + \varepsilon k(x)) - I_2(x + h(x))}{\varepsilon}
\]
\[
\times \int_0^1 g_\beta'\big( I_2(x + h(x)) - i_2 + \zeta \left( I_2(x + h(x) + \varepsilon k(x)) - I_2(x + h(x)) \right) \big)\, d\zeta\, dx.
\]
Taking the limit when ε → 0, we obtain
1 δk P (i, h) = (3.1) ∂2 Gβ (Ih (x) − i)∇I2 (x + h(x)) · k(x) dx, |Ω| Ω where ∂2 indicates the first order partial derivative with respect to the second variable. Using this estimate, we first consider the maximization of mutual information, a concept which is borrowed from information theory. Given two random variables X and Y , their mutual information is defined as MI(X, Y ) = H(X) + H(Y ) − H(X, Y ) , where H stands for the differential entropy. The mutual information is positive and symmetric and measures how the intensity distributions of two images fail to be
independent. It can be defined in terms of the joint pdf $P$ and its marginals $p(i_1) = \int_{\mathbb{R}} P(i, h)\, di_2$ and $p(i_2, h) = \int_{\mathbb{R}} P(i, h)\, di_1$. These last two functions are also strictly positive. The following short notation will be useful:
$$E^{MI}(i, h) = -\log \frac{P(i, h)}{p(i_1)\, p(i_2, h)}. \quad (3.2)$$
The dissimilarity functional based on mutual information is then defined as the expected value of the function $E^{MI}$:
$$J_{MI}(h) = -MI(X, Y_h) = \int_{\mathbb{R}^2} P(i, h)\, E^{MI}(i, h)\, di.$$
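As a concrete illustration of the estimator and the criterion above, the following sketch builds a Parzen joint pdf from the discrete joint histogram and evaluates $J_{MI}$. It is only a sketch under stated assumptions: the images are float arrays with intensities normalized to $[0, 1]$, the warped image $I_2(\mathrm{Id} + h)$ is assumed to be computed elsewhere, and SciPy's `gaussian_filter` stands in for the recursive Parzen smoothing described in section 5 (its `sigma` plays the role of $\sqrt{\beta}$ in histogram-bin units).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def parzen_joint_pdf(I1, I2w, bins=64, sigma=2.0):
    """Parzen estimate of the joint pdf P(i, h) of the intensity pairs
    (I1(x), I2(x + h(x))): a 2D joint histogram smoothed by a Gaussian.
    `I2w` is I2 already warped by the current field h (assumption:
    the warp is computed elsewhere)."""
    H, _, _ = np.histogram2d(I1.ravel(), I2w.ravel(), bins=bins,
                             range=[[0, 1], [0, 1]], density=False)
    P = gaussian_filter(H / H.sum(), sigma)
    P /= P.sum()                      # renormalize after smoothing
    return P

def mutual_information_criterion(P, eps=1e-12):
    """J_MI(h) = -MI(X, Y_h) = sum P log(p1 p2 / P); it is more
    negative the more the two intensity distributions are dependent."""
    p1 = P.sum(axis=1, keepdims=True)     # marginal of I1
    p2 = P.sum(axis=0, keepdims=True)     # marginal of I2 o (Id + h)
    return float(np.sum(P * (np.log(p1 * p2 + eps) - np.log(P + eps))))
```

On identical inputs the criterion is strongly negative (high mutual information), while on independent noise it stays close to zero, which is the behavior the minimization exploits.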
3.1.1. Continuity of $J_{MI}(h)$. Recall that the existence of minimizers for $I(h)$ was discussed by assuming the continuity and boundedness of $J(h)$.
Proposition 3.1. Let $h_n$, $n = 1, \ldots, \infty$, be a sequence of functions of $H$ such that $h_n \to h$ almost everywhere in $\Omega$. Then $J_{MI}(h_n) \to J_{MI}(h)$. Moreover, $|J_{MI}(h)|$ is bounded.
Proof. Because $I_2$ and $g_\beta$ are continuous, $G_\beta(I_{h_n}(x) - i) \to G_\beta(I_h(x) - i)$ almost everywhere in $\Omega$ for all $i$. Since $G_\beta(I_{h_n}(x) - i) \le g_\beta(0)^2$, the dominated convergence theorem implies that $P(i, h_n) \to P(i, h)$ for all $i \in \mathbb{R}^2$. A similar reasoning shows that $p(i_2, h_n) \to p(i_2, h)$ for all $i_2 \in \mathbb{R}$. Hence, the logarithm being continuous, and $p(i_1)$, $P(i, h_n)$, $P(i, h)$, $p(i_2, h_n)$, and $p(i_2, h)$ being $> 0$,
$$P(i, h_n) \log \frac{P(i, h_n)}{p(i_1)\, p(i_2, h_n)} \to P(i, h) \log \frac{P(i, h)}{p(i_1)\, p(i_2, h)} \quad \forall i \in \mathbb{R}^2.$$
We next consider three cases to find an upper bound for $P(i, h_n) \log \frac{P(i, h_n)}{p(i_1)\, p(i_2, h_n)}$. We detail only the first one: $i_2 \le 0$. This is the case where
$$0 \le |i_2| \le |i_2 - I_2(x + h_n(x))| \le |i_2 - A| \quad \forall n \ge 1.$$
Hence
$$g_\beta(i_2 - A) \le g_\beta(i_2 - I_2(x + h_n(x))) \le g_\beta(i_2) \quad \forall n \ge 1.$$
This yields
$$\frac{g_\beta(i_2 - A)}{g_\beta(i_2)} \le \frac{P(i, h_n)}{p(i_1)\, p(i_2, h_n)} \le \frac{g_\beta(i_2)}{g_\beta(i_2 - A)},$$
and therefore
$$\left|\log \frac{P(i, h_n)}{p(i_1)\, p(i_2, h_n)}\right| \le \log \frac{g_\beta(i_2)}{g_\beta(i_2 - A)}$$
and
$$\left|P(i, h_n) \log \frac{P(i, h_n)}{p(i_1)\, p(i_2, h_n)}\right| \le g_\beta(i_2)\, p(i_1)\, \log \frac{g_\beta(i_2)}{g_\beta(i_2 - A)}.$$
The function on the right-hand side is continuous and integrable in $\mathbb{R}\, \times\, ]-\infty, A]$. The next two cases, $0 \le i_2 \le A$ and $i_2 \ge A$, are left to the reader. Combining the three cases, the dominated convergence theorem implies that
$$J_{MI}(h_n) = -\int_{\mathbb{R}^2} P(i, h_n) \log \frac{P(i, h_n)}{p(i_1)\, p(i_2, h_n)}\, di \to -\int_{\mathbb{R}^2} P(i, h) \log \frac{P(i, h)}{p(i_1)\, p(i_2, h)}\, di = J_{MI}(h).$$
The proof also shows that $|J_{MI}(h)|$ is bounded.
3.1.2. Euler–Lagrange equation. We do an explicit computation of the first variation of $J_{MI}$ by applying (1.1). First we obtain
$$\delta_k E^{MI}(i, h) = -\frac{\delta_k P(i, h)}{P(i, h)} + \frac{\delta_k p(i_2, h)}{p(i_2, h)}.$$
We then write
$$\delta_k J_{MI}(h) = \int_{\mathbb{R}^2} \bigl[\delta_k P(i, h)\, E^{MI}(i, h) + P(i, h)\, \delta_k E^{MI}(i, h)\bigr]\, di = \int_{\mathbb{R}^2} \left[(E^{MI}(i, h) - 1)\, \delta_k P(i, h) + \frac{P(i, h)}{p(i_2, h)}\, \delta_k p(i_2, h)\right] di.$$
Using the fact that
$$\int_{\mathbb{R}^2} \frac{P(i, h)}{p(i_2, h)}\, \delta_k p(i_2, h)\, di = \int_{\mathbb{R}} \delta_k p(i_2, h)\, di_2 = 0$$
(since $\int_{\mathbb{R}} p(i_2, h)\, di_2 = 1$), this yields
$$\delta_k J_{MI}(h) = \int_{\mathbb{R}^2} (E^{MI}(i, h) - 1)\, \delta_k P(i, h)\, di.$$
We then apply (3.1) to obtain
$$\delta_k J_{MI}(h) = \frac{1}{|\Omega|} \int_{\mathbb{R}^2} \int_\Omega (E^{MI}(i, h) - 1)\, \partial_2 G_\beta(I_h(x) - i)\, \nabla I_2(x + h(x)) \cdot k(x)\, dx\, di.$$
A convolution with respect to the intensity variable $i$ appears in this expression. It commutes with the derivative $\partial_2$ with respect to the second intensity variable $i_2$, and therefore
$$\delta_k J_{MI}(h) = \frac{1}{|\Omega|} \int_\Omega \bigl(G_\beta \star \partial_2 E^{MI}\bigr)\bigl(I_h(x), h\bigr)\, \nabla I_2(x + h(x)) \cdot k(x)\, dx.$$
To simplify notation, we denote by $L_{MI}$ the function $\frac{1}{|\Omega|}\, \partial_2 E^{MI}$. We then define the function $f_{MI} : \mathbb{R}^2 \times H \to \mathbb{R}$:
$$f_{MI}(i, h) = (G_\beta \star L_{MI})(i, h). \quad (3.3)$$
It is easily verified that
$$L_{MI}(i, h) = -\frac{1}{|\Omega|} \left(\frac{\partial_2 P}{P}(i, h) - \frac{p'}{p}(i_2, h)\right) = r(i_2, h) - R(i, h), \quad (3.4)$$
where
$$r(i_2, h) = \frac{1}{p(i_2, h)}\, \frac{1}{\beta|\Omega|} \int_\Omega I_2(x + h(x))\, g_\beta(I_2(x + h(x)) - i_2)\, dx, \quad (3.5)$$
$$R(i, h) = \frac{1}{P(i, h)}\, \frac{1}{\beta|\Omega|} \int_\Omega I_2(x + h(x))\, G_\beta(I_h(x) - i)\, dx. \quad (3.6)$$
Fig. 1. Effect on P (i, h) of minimizing JMI (h) with respect to h: the two images on the right show the estimated joint pdf between the two images on the left before and after optimization. Notice that the joint pdf has been clustered, but the final shape of its support remains nonfunctional.
Since $I_2(x + h(x)) \in [0, A]$ for all $h \in H$ and for all $x \in \Omega$, we have
$$0 \le r(i_2, h),\ R(i, h) \le \frac{A}{\beta} \quad \forall h \in H. \quad (3.7)$$
For each $h \in H$, the mapping $k \to \delta_k J_{MI}(h)$ is clearly linear. To show that it is continuous it is sufficient, according to the Schwarz inequality, to show that the function $x \to f_{MI}(I_h(x), h)\, \nabla I_2(x + h(x))$ is bounded, and this is a consequence of Theorem 4.11. The variational gradient $\nabla_H J_{MI}(h)$ of $J_{MI}$ can therefore be defined, and its expression is given by
$$\nabla_H J_{MI}(h)(x) = f_{MI}\bigl(I_h(x), h\bigr)\, \nabla I_2(x + h(x)).$$
The function $L_{MI}$ plays the role of an intensity comparison function. Its first term $\partial_2 P(i, h)/P(i, h)$ tends to cluster the joint pdf, while the term $-p'(i_2, h)/p(i_2, h)$ tries to prevent the marginal law $p(i_2, h)$ from becoming too clustered; i.e., it keeps the intensities of $I_2(\mathrm{Id} + h)$ as unpredictable as possible. For an example of the effect on $P(i, h)$ of minimizing $J_{MI}(h)$ with respect to $h$, see Figure 1. The reader will notice that the resulting joint pdf is more localized than the original one but that the minimization has not imposed a strong functional relation of the type $i_1 = f(i_2)$, unlike the method described in the next section. According to the notation introduced in section 1.2, we define $F_{MI} : H \to H$ by
$$F_{MI}(h)(x) = -f_{MI}\bigl(I_h(x), h\bigr)\, \nabla I_2(x + h(x)). \quad (3.8)$$

3.2. Local cross-covariance. An interesting generalization is to make the probability density estimator local, since it allows one to take into account nonstationarities in the relation between intensities. To do this, we build an estimate in the neighborhood of each point $x_0$ in $\Omega$. This is achieved by weighting our previous estimate with a normalized spatial Gaussian of variance $\gamma$:
$$G_\gamma(x) = \frac{1}{2\pi\gamma}\, e^{-\frac{|x|^2}{2\gamma}}.$$
This means that to each point $x_0$ of $\Omega$ we associate a joint pdf defined by
$$P(i, h, x_0) = \frac{1}{G_\gamma(x_0)} \int_\Omega G_\beta(I_h(x) - i)\, G_\gamma(x - x_0)\, dx,$$
where $G_\gamma(x_0) = \int_\Omega G_\gamma(x - x_0)\, dx$. This new pdf is along the line of the ideas discussed in [30], except that we consider a bidimensional local histogram at each
Table 3.1
Definitions of the local mean $\mu_1$ and variance $v_1$ of $X$, the local mean $\mu_2$ and variance $v_2$ of $Y_h$, and the local correlation $v_{1,2}$ of $X$ and $Y_h$.
$$\mu_1(x) = \int_{\mathbb{R}} i_1\, p(i_1, x)\, di_1, \qquad v_1(x) = \int_{\mathbb{R}} i_1^2\, p(i_1, x)\, di_1 - \mu_1(x)^2,$$
$$\mu_2(h, x) = \int_{\mathbb{R}} i_2\, p(i_2, h, x)\, di_2, \qquad v_2(h, x) = \int_{\mathbb{R}} i_2^2\, p(i_2, h, x)\, di_2 - \mu_2(h, x)^2,$$
$$v_{1,2}(h, x) = \int_{\mathbb{R}^2} i_1 i_2\, P(i, h, x)\, di - \mu_1(x)\, \mu_2(h, x).$$
point. Its marginals are now given by $p(i_1, x) = \int_{\mathbb{R}} P(i, h, x)\, di_2$ and $p(i_2, h, x) = \int_{\mathbb{R}} P(i, h, x)\, di_1$. Using these local estimates, we consider the case of the cross-covariance, which has been widely used as a robust comparison function for image matching. Within recent energy-minimization approaches relying on the computation of its gradient, we can mention, for instance, the works of Faugeras and Keriven [21], Cachier and Pennec [11], and Netsch et al. [39]. The cross-covariance, being a measure of the affine dependency between the intensities, is more constraining than the mutual information. A dissimilarity functional is obtained as a function of the quantities defined in Table 3.1 by averaging the local cross-covariance
$$J_{CC}(h) = \int_\Omega J_{CC}(h, x)\, dx = -\int_\Omega \frac{v_{1,2}(h, x)^2}{v_1(x)\, v_2(h, x)}\, dx.$$
The minus sign simply reflects the fact that we want to minimize the dissimilarity. The continuity and boundedness of this criterion (needed for proving the existence of a minimizer) are a consequence of Theorem 4.18. In a way similar to the case of the mutual information, its first order variation is well defined and defines a gradient given by
$$\nabla_H J_{CC}(h)(x) = f_{CC}\bigl(I_h(x), h, x\bigr)\, \nabla I_2(x + h(x)),$$
where
$$f_{CC}(i, h, x) = (G_\gamma \star G_\beta \star L_{CC})(i, h, x), \qquad L_{CC}(i, h, x) = \frac{1}{G_\gamma(x)}\, \partial_2 E^{CC}(i, h, x),$$
and
$$E^{CC}(i, h, x) = -\frac{1}{v_1(x)\, v_2(h, x)} \Bigl(2\, v_{1,2}(h, x)\, i_2\, (i_1 - \mu_1(x)) + J_{CC}(h, x)\, v_1(x)\, i_2\, (i_2 - 2\mu_2(h, x))\Bigr).$$
Hence
$$L_{CC}(i, h, x) = -\frac{2}{G_\gamma(x)} \left(\frac{v_{1,2}(h, x)}{v_2(h, x)}\, \frac{i_1 - \mu_1(x)}{v_1(x)} + J_{CC}(h, x)\, \frac{i_2 - \mu_2(h, x)}{v_2(h, x)}\right). \quad (3.9)$$
Notice that $L_{CC}$ is an affine function of $i$, and therefore $(G_\beta \star L_{CC})(i, h, x) = L_{CC}(i, h, x)$. Thus
$$f_{CC}(i, h, x) = (G_\gamma \star L_{CC})(i, h, x) = -\left(G_\gamma \star \frac{2}{G_\gamma(\cdot)} \left(\frac{v_{1,2}(h, \cdot)}{v_2(h, \cdot)}\, \frac{i_1 - \mu_1(\cdot)}{v_1(\cdot)} + J_{CC}(h, \cdot)\, \frac{i_2 - \mu_2(h, \cdot)}{v_2(h, \cdot)}\right)\right)(x).$$
As in the mutual information case, the function $L_{CC}$ compares intensities in the two images. It shows that minimizing $J_{CC}$ with respect to the field $h$ amounts to attempting to make the pairs of intensities at corresponding pixels lie on a straight line in $\mathbb{R}^2$. According to the notation introduced in section 1.1, we define $F_{CC} : H \to H$ by
$$F_{CC}(h)(x) = -f_{CC}(I_h(x), h, x)\, \nabla I_2(x + h(x)). \quad (3.10)$$
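The local statistics of Table 3.1 and the pointwise criterion $J_{CC}(h, x)$ can be sketched with Gaussian spatial smoothing standing in for the $G_\gamma$ weighting. This is only an illustrative approximation of the construction above: the normalization $1/G_\gamma(x_0)$ is absorbed into the filter's normalized kernel, the warped image is assumed precomputed, and the parameter names are ours.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_cross_covariance_criterion(I1, I2w, gamma=5.0, beta=0.05):
    """Pointwise J_CC(h, x) = -v12^2 / (v1 v2) built from local
    Gaussian-weighted moments (Table 3.1). `I2w` is I2 warped by the
    current field h; `beta` is the Parzen variance, which enters the
    local variances as an additive term."""
    g = lambda f: gaussian_filter(f, gamma)   # local averaging ~ G_gamma weighting
    mu1, mu2 = g(I1), g(I2w)
    v1 = g(I1 * I1) - mu1**2 + beta           # local variance of I1, plus beta
    v2 = g(I2w * I2w) - mu2**2 + beta         # local variance of warped I2
    v12 = g(I1 * I2w) - mu1 * mu2             # local correlation
    return -(v12**2) / (v1 * v2)              # values lie in (-1, 0]
```

On affinely related images the criterion is close to $-1$ almost everywhere, while independent images give values near $0$, reflecting the affine-dependency measure discussed above.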
For an example of the effect of minimizing JCC (h) with respect to h on P (i, h), see Figure 2. The reader will notice that the resulting joint pdf is more localized than the original one and that the minimization has imposed an affine relation between the intensities i1 and i2 , unlike the method described in the previous section.
Fig. 2. Effect on P (i, h) of minimizing JCC (h) with respect to h: the two images on the right show the estimated joint pdf between the two images on the left before and after optimization. Notice that the joint pdf has been clustered, and the final shape of its support approaches an affine function. The global estimation of the cross-covariance has been used to illustrate this effect. In the local case, each locally estimated pdf would undergo this effect.
4. Lipschitz continuity of the matching functions. This section is devoted to proving that the functions $F_{MI}$ and $F_{CC}$ are Lipschitz continuous. We begin with some elementary results on Lipschitz continuous functions that will be used very often in what follows. We state them here without proof.
Proposition 4.1. Let $H$ be a Banach space, and let us denote its norm by $\|\cdot\|_H$. Let $f_i : H \to \mathbb{R}$, $i = 1, 2$, be two Lipschitz continuous functions. We have the following:
1. $f_1 + f_2$ is Lipschitz continuous.
2. If $f_1$ and $f_2$ are bounded, then the product $f_1 f_2$ is Lipschitz continuous.
3. If $f_2 > 0$ is bounded away from zero and if $f_1$ and $f_2$ are bounded, then the ratio $f_1/f_2$ is Lipschitz continuous.
In what follows, we will need the following definitions and notations.
Definition 4.2. We denote by $H_1 = [0, A] \times H$ and $H_2 = [0, A]^2 \times H$ the Banach spaces equipped with the norms $\|(z, h)\|_{H_1} = |z| + \|h\|_H$ and $\|(z_1, z_2, h)\|_{H_2} = |z_1| + |z_2| + \|h\|_H$, respectively.
We will use several times the following (obvious) result.
Lemma 4.3. Let $f : H_2 \to \mathbb{R}$ be such that $(z_1, z_2) \to f(z_1, z_2, h)$ is Lipschitz continuous for all $h$ with a Lipschitz constant $l_f$ independent of $h$ and such that $h \to f(z_1, z_2, h)$ is Lipschitz continuous for all $(z_1, z_2)$ with a Lipschitz constant $L_f$ independent of $(z_1, z_2)$. Then $f$ is Lipschitz continuous.

4.1. Global mutual information. In the following, we will use the function $L_{MI} : [0, A]^2 \times H \to \mathbb{R}$ defined in (3.4)–(3.6). We then consider the result $f_{MI}$ of convolving $L_{MI}$ with $G_\beta$, i.e., the two functions $s : H_1 \to \mathbb{R}$, defined as
$$s(z_2, h) = (g_\beta \star r)(z_2, h) = \int_{\mathbb{R}} g_\beta(z_2 - i_2)\, r(i_2, h)\, di_2, \quad (4.1)$$
and $S : H_2 \to \mathbb{R}$, defined as
$$S(z_1, z_2, h) = (G_\beta \star R)(z, h) = \int_{\mathbb{R}^2} G_\beta(z - i)\, R(i, h)\, di. \quad (4.2)$$
We prove a series of propositions.
Proposition 4.4. For each $h \in H$, the function $[0, A] \to \mathbb{R}^+$ defined by $z_2 \to s(z_2, h)$ is Lipschitz continuous with a Lipschitz constant $l_s$, which is independent of $h$. Moreover, it is bounded by $A/\beta$.
Proof. The second part follows from (3.7). We then prove that the magnitude of the derivative of the function is bounded independently of $h$. Indeed,
$$|s'(z_2, h)| = \left|\frac{1}{\beta} \int_{-\infty}^{+\infty} (z_2 - i_2)\, g_\beta(z_2 - i_2)\, r(i_2, h)\, di_2\right| \le \frac{A}{\beta^2} \int_{-\infty}^{+\infty} |z_2 - i_2|\, g_\beta(z_2 - i_2)\, di_2 = \frac{A}{\beta^2} \sqrt{\frac{2\beta}{\pi}}.$$
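The closed-form absolute first moment of $g_\beta$ used in the last equality, $\int_{\mathbb{R}} |u|\, g_\beta(u)\, du = \sqrt{2\beta/\pi}$, can be checked numerically; the quadrature grid and tolerance below are our own choices.

```python
import numpy as np

def abs_first_moment_gaussian(beta, n=200001, L=50.0):
    """Numerical check of the identity used in the proof of
    Proposition 4.4: the absolute first moment of a zero-mean Gaussian
    of variance beta equals sqrt(2*beta/pi)."""
    u = np.linspace(-L, L, n)                 # symmetric grid with a node at 0
    g = np.exp(-u**2 / (2.0 * beta)) / np.sqrt(2.0 * np.pi * beta)
    return float((np.abs(u) * g).sum() * (u[1] - u[0]))
```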
Proposition 4.5. For each $z_2 \in [0, A]$, the function $h \to s(z_2, h) : H \to \mathbb{R}$ is Lipschitz continuous on $H$ with Lipschitz constant $L_s$, which is independent of $z_2 \in [0, A]$.
Proof. We consider
$$s(z_2, h_1) - s(z_2, h_2) = \int_{\mathbb{R}} g_\beta(z_2 - i_2)\, (r(i_2, h_1) - r(i_2, h_2))\, di_2. \quad (4.3)$$
According to (3.5), $r(i_2, h)$ is proportional to the ratio $N(i_2, h)/p(i_2, h)$ of the two functions
$$N(i_2, h) = \int_\Omega I_2(x + h(x))\, g_\beta(I_2(x + h(x)) - i_2)\, dx$$
and
$$p(i_2, h) = \int_\Omega g_\beta(I_2(x + h(x)) - i_2)\, dx.$$
We write
$$|s(z_2, h_1) - s(z_2, h_2)| \le \int_{\mathbb{R}} g_\beta(z_2 - i_2)\, \frac{N(i_2, h_2)}{p(i_2, h_1)\, p(i_2, h_2)}\, |p(i_2, h_2) - p(i_2, h_1)|\, di_2 + \int_{\mathbb{R}} g_\beta(z_2 - i_2)\, \frac{p(i_2, h_2)}{p(i_2, h_1)\, p(i_2, h_2)}\, |N(i_2, h_2) - N(i_2, h_1)|\, di_2, \quad (4.4)$$
and consider the first term on the right-hand side:
$$p(i_2, h_2) - p(i_2, h_1) = \int_\Omega \bigl(g_\beta(i_2 - I_2(x + h_2(x))) - g_\beta(i_2 - I_2(x + h_1(x)))\bigr)\, dx.$$
We use the first order Taylor expansion with integral remainder of the $C^1$ function $g_\beta$:
$$g_\beta(i + t) = g_\beta(i) + t \int_0^1 g_\beta'(i + t\alpha)\, d\alpha.$$
We can therefore write
$$g_\beta(i_2 - I_2(x + h_2(x))) - g_\beta(i_2 - I_2(x + h_1(x))) = \bigl(I_2(x + h_1(x)) - I_2(x + h_2(x))\bigr) \int_0^1 g_\beta'\bigl(i_2 - \alpha\, I_2(x + h_2(x)) - (1 - \alpha)\, I_2(x + h_1(x))\bigr)\, d\alpha.$$
We use the fact that $I_2$ is Lipschitz continuous and write
$$|p(i_2, h_2) - p(i_2, h_1)| \le \mathrm{Lip}(I_2) \int_\Omega |h_1(x) - h_2(x)| \int_0^1 \bigl|g_\beta'\bigl(i_2 - (\alpha\, I_2(x + h_2(x)) + (1 - \alpha)\, I_2(x + h_1(x)))\bigr)\bigr|\, d\alpha\, dx.$$
The Schwarz inequality implies
$$|p(i_2, h_2) - p(i_2, h_1)| \le \mathrm{Lip}(I_2)\, \|h_1 - h_2\|_H \left(\int_\Omega \left(\int_0^1 \bigl|g_\beta'\bigl(i_2 - (\alpha\, I_2(x + h_2(x)) + (1 - \alpha)\, I_2(x + h_1(x)))\bigr)\bigr|\, d\alpha\right)^2 dx\right)^{1/2}.$$
We introduce the function
$$c(i_2, h_1, h_2) = \left(\int_\Omega \left(\int_0^1 \bigl|i_2 - \alpha\, I_2(x + h_2(x)) - (1 - \alpha)\, I_2(x + h_1(x))\bigr|\; g_\beta\bigl(i_2 - \alpha\, I_2(x + h_2(x)) - (1 - \alpha)\, I_2(x + h_1(x))\bigr)\, d\alpha\right)^2 dx\right)^{1/2}$$
and notice that
$$\left(\int_\Omega \left(\int_0^1 \bigl|g_\beta'\bigl(i_2 - (\alpha\, I_2(x + h_2(x)) + (1 - \alpha)\, I_2(x + h_1(x)))\bigr)\bigr|\, d\alpha\right)^2 dx\right)^{1/2} \le \frac{1}{\beta}\, c(i_2, h_1, h_2).$$
So far we have
$$\int_{\mathbb{R}} g_\beta(z_2 - i_2)\, \frac{N(i_2, h_2)}{p(i_2, h_1)\, p(i_2, h_2)}\, |p(i_2, h_2) - p(i_2, h_1)|\, di_2 \le \frac{\mathrm{Lip}(I_2)}{\beta}\, \|h_1 - h_2\|_H \int_{\mathbb{R}} g_\beta(z_2 - i_2)\, \frac{N(i_2, h_2)\, c(i_2, h_1, h_2)}{p(i_2, h_1)\, p(i_2, h_2)}\, di_2. \quad (4.5)$$
We study the function of $z_2$ that is on the right-hand side of this inequality. First we note that the function is well defined since no problems occur when $i_2$ goes to infinity because "there are three Gaussians in the numerator and two in the denominator." We then show that this function is bounded independently of $h_1$ and $h_2$ for all $z_2 \in [0, A]$. As in the proof of Proposition 3.1, we consider three cases and detail only the first one: $i_2 \le 0$. This is the case where
$$0 \le |i_2| \le |i_2 - I_2(x + h_j(x))| \le |i_2 - A|, \quad j = 1, 2,$$
$$0 \le |i_2| \le |i_2 - (\alpha\, I_2(x + h_2(x)) + (1 - \alpha)\, I_2(x + h_1(x)))| \le |i_2 - A|, \quad 0 \le \alpha \le 1.$$
Hence
$$g_\beta(i_2 - A) \le g_\beta(i_2 - I_2(x + h_j(x))) \le g_\beta(i_2), \quad j = 1, 2,$$
$$g_\beta(i_2 - A) \le g_\beta\bigl(i_2 - (\alpha\, I_2(x + h_2(x)) + (1 - \alpha)\, I_2(x + h_1(x)))\bigr) \le g_\beta(i_2), \quad 0 \le \alpha \le 1.$$
This yields
$$\int_{-\infty}^0 g_\beta(z_2 - i_2)\, \frac{N(i_2, h_2)\, c(i_2, h_1, h_2)}{p(i_2, h_1)\, p(i_2, h_2)}\, di_2 \le |\Omega|^{-1/2} A \int_{-\infty}^0 g_\beta(z_2 - i_2)\, |i_2 - A| \left(\frac{g_\beta(i_2)}{g_\beta(i_2 - A)}\right)^2 di_2.$$
The integral on the right-hand side is well defined and defines a continuous function of $z_2$. The remaining two cases, $0 \le i_2 \le A$ and $i_2 \ge A$, are left to the reader. In all three cases, the functions of $z_2$ appearing on the right-hand side are continuous, independent of $h_1$ and $h_2$, and therefore upperbounded on $[0, A]$ by a constant independent of $h_1$ and $h_2$. Returning to inequality (4.5), we have proved that there exists a positive constant $C$ independent of $z_2$ such that
$$\int_{\mathbb{R}} g_\beta(z_2 - i_2)\, \frac{N(i_2, h_2)}{p(i_2, h_1)\, p(i_2, h_2)}\, |p(i_2, h_2) - p(i_2, h_1)|\, di_2 \le C\, \|h_1 - h_2\|_H \quad \forall z_2 \in [0, A],\ \forall h_1, h_2 \in H.$$
A similar proof can be developed for the second term on the right-hand side of inequality (4.4). In conclusion, we have proved that there exists a constant $L_s$, independent of $z_2$, such that
$$|s(z_2, h_1) - s(z_2, h_2)| \le L_s\, \|h_1 - h_2\|_H \quad \forall z_2 \in [0, A],\ \forall h_1, h_2 \in H.$$
Thus, we can state the following.
Proposition 4.6. The function $s : H_1 \to \mathbb{R}$ is Lipschitz continuous.
Proof. The proof follows from Propositions 4.4 and 4.5 and Lemma 4.3.
We now proceed with showing the same kind of properties for the function $S$.
Proposition 4.7. For all $h \in H$, the function $[0, A]^2 \to \mathbb{R}^+$ defined by $(z_1, z_2) \to S(z_1, z_2, h)$ is Lipschitz continuous with a Lipschitz constant $l_S$, which is independent of $h$. Moreover, it is bounded by $A/\beta$.
Proof. The proof follows the same pattern as the proof of Proposition 4.4.
Proposition 4.8. For all $(z_1, z_2) \in [0, A]^2$, the function $h \to S(z_1, z_2, h) : H \to \mathbb{R}$ is Lipschitz continuous with a Lipschitz constant $L_S$, which is independent of $(z_1, z_2) \in [0, A]^2$.
Proof. The proof follows the same pattern as that of Proposition 4.5.
Therefore we can state the following result.
Proposition 4.9. The function $S : H_2 \to \mathbb{R}$ is Lipschitz continuous.
Proof. The proof follows those of Propositions 4.7 and 4.8 and Lemma 4.3.
From Propositions 4.6, 4.9, and 4.1 we obtain the following.
Corollary 4.10. The function $f_{MI} : H_2 \to \mathbb{R}$ defined by $(z_1, z_2, h) \to s(z_2, h) - S(z_1, z_2, h)$ is Lipschitz continuous and bounded by $2A/\beta$. We denote by $\mathrm{Lip}(f_{MI})$ the corresponding Lipschitz constant.
We can now state the main result of this section as follows.
Theorem 4.11. The function $F_{MI} : H \to H$ defined by
$$F_{MI}(h) = -f_{MI}(I_1(\mathrm{Id}), I_2(\mathrm{Id} + h), h)\, \nabla I_2(\mathrm{Id} + h) = \bigl(S(I_1(\mathrm{Id}), I_2(\mathrm{Id} + h), h) - s(I_2(\mathrm{Id} + h), h)\bigr)\, \nabla I_2(\mathrm{Id} + h)$$
is Lipschitz continuous and bounded.
Proof. Boundedness comes from Corollary 4.10 and the fact that $|\nabla I_2|$ is bounded. This implies that $F_{MI}(h) \in H = L^2(\Omega)$ for all $h \in H$. We consider the $i$th component $F_{MI}^i$ of $F_{MI}$:
$$F_{MI}^i(h_1)(x) - F_{MI}^i(h_2)(x) = U_1 T_1^i - U_2 T_2^i, \quad i = 1, \ldots, n,$$
with $U_j = S(I_1(x), I_2(x + h_j(x)), h_j) - s(I_2(x + h_j(x)), h_j)$, $T_j^i = \partial_i I_2(x + h_j(x))$, $i = 1, \ldots, n$, and $j = 1, 2$. We continue with
$$|F_{MI}^i(h_1)(x) - F_{MI}^i(h_2)(x)| \le |U_1 - U_2|\, |T_1^i| + |U_2|\, |T_1^i - T_2^i|.$$
Because $\partial_i I_2$ is bounded, $|T_j^i| \le \|\partial_i I_2\|_\infty$. Because of Corollary 4.10, $|U_2| \le 2A/\beta$. Because $\partial_i I_2$ is Lipschitz continuous, $|T_1^i - T_2^i| \le \mathrm{Lip}(\partial_i I_2)\, |h_1(x) - h_2(x)|$. Finally, because of Corollary 4.10 and the fact that $I_2$ is Lipschitz continuous,
$$|U_1 - U_2| \le \mathrm{Lip}(f_{MI})\, \bigl(\mathrm{Lip}(I_2)\, |h_1(x) - h_2(x)| + \|h_1 - h_2\|_H\bigr).$$
Collecting all terms, we obtain
$$|F_{MI}^i(h_1)(x) - F_{MI}^i(h_2)(x)| \le C^i\, \bigl(|h_1(x) - h_2(x)| + \|h_1 - h_2\|_H\bigr)$$
for some positive constant $C^i$, $i = 1, \ldots, n$. The last inequality yields, through the application of the Cauchy–Schwarz inequality,
$$\|F_{MI}(h_1) - F_{MI}(h_2)\|_H \le L_F\, \|h_1 - h_2\|_H$$
for some positive constant $L_F$, and this completes the proof.
The following proposition is needed in the proof of Proposition 2.4.
Proposition 4.12. The function $\Omega \to \mathbb{R}^n$ such that $x \to F_{MI}(h)(x)$ satisfies
$$|F_{MI}(h)(x) - F_{MI}(h)(y)| \le K\,(|x - y| + |h(x) - h(y)|)$$
for some constant $K > 0$.
Proof. Writing for short $f_{MI}(h)(x) = f_{MI}(I_1(x), I_2(x + h(x)), h)$, we have
$$F_{MI}(h)(y) - F_{MI}(h)(x) = f_{MI}(h)(x)\, \nabla I_2(x + h(x)) - f_{MI}(h)(x)\, \nabla I_2(y + h(y)) + f_{MI}(h)(x)\, \nabla I_2(y + h(y)) - f_{MI}(h)(y)\, \nabla I_2(y + h(y)).$$
Hence
$$|F_{MI}(h)(x) - F_{MI}(h)(y)| \le |f_{MI}(h)(x)|\, |\nabla I_2(x + h(x)) - \nabla I_2(y + h(y))| + |\nabla I_2(y + h(y))|\, |f_{MI}(h)(x) - f_{MI}(h)(y)|.$$
Corollary 4.10 and the fact that the functions $I_1$, $I_2$, and the first order derivatives of $I_2$ are Lipschitz continuous imply
$$|F_{MI}(h)(x) - F_{MI}(h)(y)| \le \frac{2A}{\beta}\, \mathrm{Lip}(\nabla I_2)\,(|x - y| + |h(x) - h(y)|) + \|\nabla I_2\|_\infty\, \mathrm{Lip}(f_{MI})\, \bigl(\mathrm{Lip}(I_1)\, |x - y| + \mathrm{Lip}(I_2)\,(|x - y| + |h(x) - h(y)|)\bigr),$$
and hence the result.

4.2. Local cross-covariance. In this section we prove the Lipschitz continuity of the mapping $H \to H$ defined by $F_{CC}(h)$ (see (3.10)). The reasoning is analogous to that for the global case. We start with some estimates of the bounds of the means and variances of the relevant random variables.
Lemma 4.13. Let $\mathrm{diam}(\Omega)$ be the diameter of the open bounded set $\Omega$: $\mathrm{diam}(\Omega) = \sup_{x, y \in \Omega} |x - y|$. We denote by $G_\gamma(\mathrm{diam}(\Omega))$ the value $\inf_{x, y \in \Omega} G_\gamma(x - y)$ and define
$$K_\Omega = \frac{G_\gamma(0)}{G_\gamma(\mathrm{diam}(\Omega))}. \quad (4.6)$$
Denoting, as in section 3.2, the normalization factor $G_\gamma(x_0) = \int_\Omega G_\gamma(y - x_0)\, dy$, we have $G_\gamma(x_0) \ge |\Omega|\, G_\gamma(\mathrm{diam}(\Omega)) > 0$ and $\int_\Omega \frac{G_\gamma(x - x_0)}{G_\gamma(x_0)}\, dx_0 \le K_\Omega$ for all $x \in \Omega$.
Proof. Since $G_\gamma(x_0) = \int_\Omega G_\gamma(y - x_0)\, dy$, we have $G_\gamma(x_0) \ge |\Omega|\, G_\gamma(\mathrm{diam}(\Omega))$. Therefore
$$\int_\Omega \frac{G_\gamma(x - x_0)}{G_\gamma(x_0)}\, dx_0 \le \frac{1}{|\Omega|\, G_\gamma(\mathrm{diam}(\Omega))} \int_\Omega G_\gamma(x - x_0)\, dx_0 \le \frac{|\Omega|\, G_\gamma(0)}{|\Omega|\, G_\gamma(\mathrm{diam}(\Omega))} = K_\Omega.$$
Lemma 4.14. For all $x_0 \in \Omega$, the following inequalities are verified:
$$0 \le \mu_1(x_0) \le A \qquad \text{and} \qquad \beta \le v_1(x_0) \le \beta + A^2.$$
Proof. We have
$$\mu_1(x_0) = \int_{\mathbb{R}^2} i_1\, \frac{1}{G_\gamma(x_0)} \int_\Omega G_\beta(I_h(x) - i)\, G_\gamma(x - x_0)\, dx\, di_1\, di_2 \quad (4.7)$$
$$= \frac{1}{G_\gamma(x_0)} \int_\Omega I_1(x)\, G_\gamma(x - x_0)\, dx, \quad (4.8)$$
and the first inequalities follow from the fact that $I_1(x) \in [0, A]$. Similarly, for $v_1(x_0)$, we have
$$v_1(x_0) = \int_{\mathbb{R}^2} i_1^2\, \frac{1}{G_\gamma(x_0)} \int_\Omega G_\beta(I_h(x) - i)\, G_\gamma(x - x_0)\, dx\, di_1\, di_2 - \mu_1(x_0)^2 \quad (4.9)$$
$$= \beta + \frac{1}{G_\gamma(x_0)} \int_\Omega I_1(x)^2\, G_\gamma(x - x_0)\, dx - \left(\frac{1}{G_\gamma(x_0)} \int_\Omega I_1(x)\, G_\gamma(x - x_0)\, dx\right)^2,$$
from which the right-hand side of the second inequality follows. The application of the Cauchy–Schwarz inequality to
$$\int_\Omega I_1(x)\, G_\gamma(x - x_0)\, dx = \int_\Omega \bigl(I_1(x)\, \sqrt{G_\gamma(x - x_0)}\bigr)\, \sqrt{G_\gamma(x - x_0)}\, dx$$
yields the left-hand side.
We then characterize the mean $\mu_2(h, x_0)$; see Table 3.1.
Lemma 4.15. The function $\Omega \times H \to \mathbb{R}$ defined by $(x_0, h) \to \mu_2(h, x_0)$ is bounded and Lipschitz continuous in $H$ uniformly in $\Omega$.
Proof. According to the definition of $\mu_2$ in Table 3.1 we have
$$\mu_2(h, x_0) = \frac{1}{G_\gamma(x_0)} \int_\Omega I_2(x + h(x))\, G_\gamma(x - x_0)\, dx.$$
We have immediately $\mu_2(h, x_0) \le A$. Moreover,
$$|\mu_2(h_1, x_0) - \mu_2(h_2, x_0)| \le \frac{1}{G_\gamma(x_0)} \int_\Omega |I_2(x + h_1(x)) - I_2(x + h_2(x))|\, G_\gamma(x - x_0)\, dx.$$
Since $I_2$ is Lipschitz continuous, a combination of Lemma 4.13 and the Cauchy–Schwarz inequality yields
$$|\mu_2(h_1, x_0) - \mu_2(h_2, x_0)| \le \mathrm{Lip}(I_2)\, K_\Omega\, \|h_1 - h_2\|_H.$$
We have similar properties for the variance $v_2(h, x_0)$; see Table 3.1.
Lemma 4.16. The function $\Omega \times H \to \mathbb{R}$ defined by $(x_0, h) \to v_2(h, x_0)$ is strictly positive, bounded, and Lipschitz continuous in $H$ uniformly in $\Omega$.
Proof. According to the definition of $v_2$ in Table 3.1 we have
$$v_2(h, x_0) = \frac{1}{G_\gamma(x_0)} \int_{\mathbb{R}} i_2^2 \int_\Omega g_\beta(I_2(x + h(x)) - i_2)\, G_\gamma(x - x_0)\, dx\, di_2 - \mu_2(h, x_0)^2.$$
Hence
$$v_2(h, x_0) = \beta + \frac{1}{G_\gamma(x_0)} \int_\Omega I_2(x + h(x))^2\, G_\gamma(x - x_0)\, dx - \mu_2(h, x_0)^2.$$
Therefore $v_2(h, x_0) \le \beta + A^2$. To prove that it is strictly positive, we write
$$v_2(h, x_0) = \beta + \frac{1}{G_\gamma(x_0)} \int_\Omega I_2(x + h(x))^2\, G_\gamma(x - x_0)\, dx - \left(\frac{1}{G_\gamma(x_0)} \int_\Omega I_2(x + h(x))\, G_\gamma(x - x_0)\, dx\right)^2,$$
and the same reasoning as in Lemma 4.14 shows that $\beta \le v_2(h, x_0)$. For the second part, since $\mu_2(h, x_0)$ is bounded and Lipschitz continuous uniformly in $\Omega$ (Lemma 4.15), it suffices to show the Lipschitz continuity of the first term on the right-hand side. For this term we have, using again a combination of Lemma 4.13 and the Cauchy–Schwarz inequality,
$$\frac{1}{G_\gamma(x_0)} \left|\int_\Omega I_2(x + h_1(x))^2\, G_\gamma(x - x_0)\, dx - \int_\Omega I_2(x + h_2(x))^2\, G_\gamma(x - x_0)\, dx\right|$$
$$\le \frac{1}{G_\gamma(x_0)} \int_\Omega \bigl(I_2(x + h_1(x)) + I_2(x + h_2(x))\bigr)\, |I_2(x + h_1(x)) - I_2(x + h_2(x))|\, G_\gamma(x - x_0)\, dx \le 2A\, \mathrm{Lip}(I_2)\, K_\Omega\, \|h_1 - h_2\|_H.$$
We now show the boundedness and Lipschitz continuity of the correlation $v_{1,2}(h, x_0)$; see Table 3.1.
Proposition 4.17. The function $H \times \Omega \to \mathbb{R}$ defined by $(h, x_0) \to v_{1,2}(h, x_0)$ is bounded and Lipschitz continuous in $H$, uniformly in $\Omega$.
Proof. According to the definition of $v_{1,2}$ in Table 3.1 we have
$$v_{1,2}(h, x_0) = \frac{1}{G_\gamma(x_0)} \int_{\mathbb{R}^2} i_1 i_2 \int_\Omega G_\beta(I_h(x) - i)\, G_\gamma(x - x_0)\, dx\, di_1\, di_2 - \mu_1(x_0)\, \mu_2(h, x_0).$$
Hence
$$v_{1,2}(h, x_0) = \frac{1}{G_\gamma(x_0)} \int_\Omega I_1(x)\, I_2(x + h(x))\, G_\gamma(x - x_0)\, dx - \mu_1(x_0)\, \mu_2(h, x_0).$$
Thus we have, for all $x_0 \in \Omega$, $|v_{1,2}(h, x_0)| \le 2A^2$, which proves the first part of the proposition. For the second part, since $\mu_2(h, x_0)$ is Lipschitz continuous uniformly in $\Omega$ (Lemma 4.15) and $\mu_1$ is bounded, it suffices to show the Lipschitz continuity of the first term on the right-hand side. For this term we have, using again a combination of Lemma 4.13 and the Cauchy–Schwarz inequality,
$$\frac{1}{G_\gamma(x_0)} \left|\int_\Omega I_1(x)\, I_2(x + h_1(x))\, G_\gamma(x - x_0)\, dx - \int_\Omega I_1(x)\, I_2(x + h_2(x))\, G_\gamma(x - x_0)\, dx\right|$$
$$\le \frac{1}{G_\gamma(x_0)} \int_\Omega |I_1(x)|\, |I_2(x + h_1(x)) - I_2(x + h_2(x))|\, G_\gamma(x - x_0)\, dx \le A\, \mathrm{Lip}(I_2)\, K_\Omega\, \|h_1 - h_2\|_H.$$
Theorem 4.18. The function $H \times \Omega \to \mathbb{R}$ defined by $(h, x) \to J_{CC}(h, x)$ is bounded and Lipschitz continuous in $H$, uniformly in $\Omega$.
Proof. Indeed, by definition we have $-1 \le J_{CC}(h, x) \le 0$ for all $h \in H$. Moreover, we have
$$J_{CC}(h, x) = -\frac{v_{1,2}(h, x)^2}{v_1(x)\, v_2(h, x)}, \quad (4.10)$$
and the following properties hold true:
• $v_{1,2}(h, x)$ is bounded and Lipschitz continuous in $H$ uniformly in $\Omega$ (Proposition 4.17).
• $v_2(h, x)$ is strictly positive, bounded, and Lipschitz continuous in $H$, uniformly in $\Omega$ (Lemma 4.16).
• $v_1(x)$ is bounded and $> 0$ (Lemma 4.14).
We may therefore apply Proposition 4.1.
Theorem 4.19. The function $H_2 \times \Omega \to \mathbb{R}$ defined by $(z, h, x) \to L_{CC}(z, h, x)$ is bounded and Lipschitz continuous in $H_2$, uniformly in $\Omega$.
Proof. We have
$$L_{CC}(z, h, x) = -\frac{2}{G_\gamma(x)} \left(\frac{v_{1,2}(h, x)}{v_2(h, x)}\, \frac{z_1 - \mu_1(x)}{v_1(x)} + J_{CC}(h, x)\, \frac{z_2 - \mu_2(h, x)}{v_2(h, x)}\right).$$
This can be rewritten as
$$L_{CC}(z, h, x) = f_1(h, x)\, z_1 + f_2(h, x)\, z_2 + f_3(h, x).$$
The functions $f_1$, $f_2$, and $f_3 : H \times \Omega \to \mathbb{R}$ are bounded and Lipschitz continuous in $H$ uniformly in $\Omega$ because of Lemmas 4.13, 4.14, 4.15, and 4.16, Proposition 4.17, and Theorem 4.18. The result readily follows.
Theorem 4.20. The function $f_{CC} : H_2 \times \Omega \to \mathbb{R}$, defined by $f_{CC}(z, h, x) = (G_\gamma \star L_{CC})(z, h, x)$, is bounded and Lipschitz continuous in $H_2$ uniformly in $\Omega$.
Proof. The fact that it is bounded follows from Theorem 4.19. To obtain the Lipschitz continuity, we write
$$|f_{CC}(z, h, x) - f_{CC}(z', h', x)| = \left|\int_\Omega G_\gamma(x - x_0)\, \bigl(L_{CC}(z, h, x_0) - L_{CC}(z', h', x_0)\bigr)\, dx_0\right| \le G_\gamma(0)\, |\Omega|\, \mathrm{Lip}(L_{CC})\, \bigl(|z - z'| + \|h - h'\|_H\bigr)$$
and
$$|(G_\gamma \star L_{CC})(z, h, x) - (G_\gamma \star L_{CC})(z, h, y)| \le \int_\Omega |G_\gamma(x - x_0) - G_\gamma(y - x_0)|\, |L_{CC}(z, h, x_0)|\, dx_0 \le B_L\, \mathrm{Lip}(G_\gamma)\, |\Omega|\, |x - y|,$$
where $B_L$ is a bound on $|L_{CC}|$ (Theorem 4.19), and hence obtain the result.
Theorem 4.21. The function $F_{CC} : H \to H$ defined by
$$F_{CC}(h) = -f_{CC}(I_1(\mathrm{Id}), I_2(\mathrm{Id} + h), h, \mathrm{Id})\, \nabla I_2(\mathrm{Id} + h)$$
is Lipschitz continuous and bounded.
Proof. Since $f_{CC}$ is bounded (Theorem 4.20) as well as $\nabla I_2$, so is $F_{CC}$, which therefore belongs to $H$. For the Lipschitz continuity, we write
$$F_{CC}(h_1) - F_{CC}(h_2) = f_{CC}(I_1(\mathrm{Id}), I_2(\mathrm{Id} + h_1), h_1, \mathrm{Id})\, \bigl(\nabla I_2(\mathrm{Id} + h_2) - \nabla I_2(\mathrm{Id} + h_1)\bigr) + \nabla I_2(\mathrm{Id} + h_2)\, \bigl(f_{CC}(I_1(\mathrm{Id}), I_2(\mathrm{Id} + h_2), h_2, \mathrm{Id}) - f_{CC}(I_1(\mathrm{Id}), I_2(\mathrm{Id} + h_1), h_1, \mathrm{Id})\bigr).$$
The Lipschitz continuity follows from that of $\nabla I_2$ and of $f_{CC}$ (Theorem 4.20) and the use of the Cauchy–Schwarz inequality as in the proof of Theorem 4.11.
Proposition 4.22. The function $\Omega \to \mathbb{R}^n$ such that $x \to F_{CC}(h)(x)$ satisfies
$$|F_{CC}(h)(x) - F_{CC}(h)(y)| \le K\,(|x - y| + |h(x) - h(y)|)$$
for some constant $K > 0$.
Proof. The proof is similar to that of Proposition 4.12; it follows from Theorem 4.20 and the fact that the functions $I_1$ and $I_2$ and all their first order derivatives are Lipschitz continuous.

5. Numerical implementation. The numerical implementation of the previously described continuous matching flows involves estimating the matching term, which depends on one of the two intensity comparison functions $F_{MI}$ and $F_{CC}$, and the regularization operator, which is $\mathrm{div}(T_{I_1} Dh)$. For the discretization in time, we adopt an explicit forward scheme. Implicit schemes are difficult to devise due to the high nonlinearity of the matching functions. The way in which the time step is chosen is discussed in the following section. Alvarez, Weickert, and Sánchez [2] propose a very efficient scheme for discretizing the Nagel–Enkelmann divergence operator, which we adopt in our experiments. We use a schematic notation for the description of the finite-difference schemes. For instance, let us denote by $h_p^{i,j}$ and $L_p^{i,j}$ the components ($p = 1, 2$) of $h$ and its Laplacian $\Delta h$ at the grid point $(i, j)$ in the discrete image domain. The voxel size in all directions is assumed to be equal to one.
A possible scheme for $\alpha \Delta h$ is
$$L_p^{i,j} = \alpha \left(h_p^{i+1,j} + h_p^{i-1,j} + h_p^{i,j+1} + h_p^{i,j-1} - 4\, h_p^{i,j}\right), \quad (5.1)$$
which we write schematically as
$$L_p^{i,j} = \alpha * \begin{array}{|c|c|c|}\hline & 1 & \\\hline 1 & -4 & 1 \\\hline & 1 & \\\hline\end{array}\;\; h_p.$$
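A minimal sketch of one explicit forward time step built on the five-point scheme (5.1). For simplicity it uses the plain Laplacian regularizer with replicated (Neumann-like) boundary values and an arbitrary matching term `F`, rather than the full Nagel–Enkelmann divergence operator used in the paper.

```python
import numpy as np

def laplacian_5pt(h):
    """Five-point discrete Laplacian of scheme (5.1), unit voxel size,
    with replicated boundary values."""
    pad = np.pad(h, 1, mode='edge')
    return (pad[2:, 1:-1] + pad[:-2, 1:-1] +
            pad[1:-1, 2:] + pad[1:-1, :-2] - 4.0 * h)

def explicit_step(h, F, alpha, dt):
    """One explicit forward-Euler update of one component of the
    matching flow: h <- h + dt * (F + alpha * Lap h). Stable for
    alpha * dt <= 0.25, consistent with the bound quoted in section 5.2."""
    return h + dt * (F + alpha * laplacian_5pt(h))
```

With a zero matching term the step is a pure smoothing (heat) step, which damps the high-frequency content of the field.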
In this notation, the tables represent the discrete grid and contain the weights associated with each pixel (zero if none). The function to which the grid corresponds is written at the bottom. The scheme proposed in [2] is the following. Let $A = \mathrm{div}(T_{I_1} Dh)$, where
$$T_{I_1} = \begin{pmatrix} a & b \\ b & c \end{pmatrix}.$$
Then, for $p = 1, 2$, $A_p^{i,j}$ is obtained as a sum of elementary terms, each of which multiplies a one-sided difference of $h_p$ (along a grid axis or a diagonal) by the average of two neighboring values of the corresponding coefficient: the axis terms carry a weight $1/2$ and involve $a$ (in the $x$ direction) or $c$ (in the $y$ direction), while the diagonal terms carry a weight $\pm 1/4$ and involve the off-diagonal coefficient $b$. Rather than drawing the eight corresponding grid tables, we state the scheme in the compact notation introduced next.
The 3D case. This scheme generalizes readily to the 3D case. In order to write the 3D scheme explicitly in a compact way, we take advantage of its very simple form and introduce a more compact notation. We write
$$S_{\frac{1}{2}}(a, x^+) \equiv \frac{1}{2}\left(a^{i,j} + a^{i+1,j}\right)\left(h_p^{i+1,j} - h_p^{i,j}\right),$$
where $x^+$ indicates the direction defined by the voxels with non-null weights, starting at the center. With this notation, we write the 2D Nagel–Enkelmann operator above as
$$A_p^{i,j} = S_{\frac{1}{2}}(a, x^+) + S_{\frac{1}{2}}(a, x^-) + S_{\frac{1}{2}}(c, y^+) + S_{\frac{1}{2}}(c, y^-) + S_{\frac{1}{4}}(b, x^+y^+) + S_{\frac{1}{4}}(b, x^-y^-) - S_{\frac{1}{4}}(b, x^+y^-) - S_{\frac{1}{4}}(b, x^-y^+).$$
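Reading the compact notation literally, each term $S_w(\cdot, \cdot)$ averages two neighboring coefficient values and multiplies a one-sided difference of $h_p$. The following sketch implements the 2D operator this way; the boundary handling by edge replication is our own assumption. A useful sanity check: with $a = c = 1$ and $b = 0$ the operator reduces to the five-point Laplacian.

```python
import numpy as np

def shift(f, di, dj):
    """Shifted copy with edge replication: result[i, j] = f[i+di, j+dj]."""
    pad = np.pad(f, 1, mode='edge')
    return pad[1 + di:pad.shape[0] - 1 + di, 1 + dj:pad.shape[1] - 1 + dj]

def S(w, coef, hp, di, dj):
    """Elementary term S_w(coef, direction): the averaged coefficient
    times a one-sided difference of h_p along (di, dj)."""
    return w * (coef + shift(coef, di, dj)) * (shift(hp, di, dj) - hp)

def nagel_enkelmann_div(a, b, c, hp):
    """A^{i,j}_p = div(T D h_p) for T = [[a, b], [b, c]], as read from
    the compact notation above."""
    return (S(.5, a, hp, 1, 0) + S(.5, a, hp, -1, 0)
          + S(.5, c, hp, 0, 1) + S(.5, c, hp, 0, -1)
          + S(.25, b, hp, 1, 1) + S(.25, b, hp, -1, -1)
          - S(.25, b, hp, 1, -1) - S(.25, b, hp, -1, 1))
```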
In the 3D case, we have
$$T_{I_1} = \begin{pmatrix} a & b & c \\ b & d & e \\ c & e & f \end{pmatrix},$$
and the corresponding scheme is, for $p = 1, 2, 3$,
$$A_p^{i,j,k} = S_{\frac{1}{2}}(a, x^+) + S_{\frac{1}{2}}(a, x^-) + S_{\frac{1}{2}}(d, y^+) + S_{\frac{1}{2}}(d, y^-) + S_{\frac{1}{2}}(f, z^+) + S_{\frac{1}{2}}(f, z^-) + S_{\frac{1}{4}}(b, x^+y^+) + S_{\frac{1}{4}}(b, x^-y^-) - S_{\frac{1}{4}}(b, x^+y^-) - S_{\frac{1}{4}}(b, x^-y^+) + S_{\frac{1}{4}}(c, x^+z^+) + S_{\frac{1}{4}}(c, x^-z^-) - S_{\frac{1}{4}}(c, x^+z^-) - S_{\frac{1}{4}}(c, x^-z^+) + S_{\frac{1}{4}}(e, y^+z^+) + S_{\frac{1}{4}}(e, y^-z^-) - S_{\frac{1}{4}}(e, y^+z^-) - S_{\frac{1}{4}}(e, y^-z^+).$$
5.1. Intensity comparison functions. Concerning the intensity comparison functions, we approximate convolutions with a Gaussian kernel by recursive filtering using the smoothing operator introduced in [17]. Terms of the form $\nabla I_2(x + h(x))$ and $I_2(x + h(x))$ are calculated by trilinear interpolation. The global function $F_{MI}$ is estimated by explicitly computing the global density estimate $P(i, h)$ through recursive smoothing of the discrete joint histogram of intensities, as detailed in section 5.1.2. For the local function $F_{CC}$, a special implementation has been developed, as detailed in section 5.1.4.

5.1.1. Convolutions. The convolutions by a Gaussian kernel are approximated by recursive filtering using the smoothing operator introduced by Deriche [17]. Given a discrete 1D input sequence $x(n)$, $n = 1, \ldots, M$, its convolution by the smoothing operator
$$S_\alpha(n) = k\, (\alpha|n| + 1)\, e^{-\alpha|n|}$$
is calculated efficiently as (see [17]) $y(n) = (S_\alpha \star x)(n) = y_1(n) + y_2(n)$, where
$$y_1(n) = k\, x(n) + k\, e^{-\alpha}(\alpha - 1)\, x(n - 1) + 2\, e^{-\alpha}\, y_1(n - 1) - e^{-2\alpha}\, y_1(n - 2),$$
$$y_2(n) = k\, e^{-\alpha}(\alpha + 1)\, x(n + 1) - k\, e^{-2\alpha}\, x(n + 2) + 2\, e^{-\alpha}\, y_2(n + 1) - e^{-2\alpha}\, y_2(n + 2).$$
The normalization constant $k$ is chosen by requiring that $\int_{\mathbb{R}} S_\alpha(t)\, dt = 1$, which yields $k = \alpha/4$. This scheme is very efficient since the number of operations required is independent of the smoothing parameter $\alpha$. The smoothing filter can be readily generalized to $n$ dimensions by defining the separable filter $T_\alpha(x) = \prod_{i=1}^n S_\alpha(x_i)$.

5.1.2. Density estimation. Parzen density estimates are obtained by smoothing the discrete joint histogram of intensities. We define the piecewise constant function $v : \Omega \to [0, N]^2 \subset \mathbb{N}^2$ by quantization of $I_h(x)$ into $N + 1$ intensity levels (bins):
$$v(x) = \begin{pmatrix} \lfloor \zeta I_1(x) \rfloor \\ \lfloor \zeta I_2(x + h(x)) \rfloor \end{pmatrix} = \begin{cases} (0, 0)^T & \text{on } \Omega_{0,0}, \\ \ \ \vdots & \\ (N, N)^T & \text{on } \Omega_{N,N}, \end{cases}$$
where $\zeta = N/A$, $\lfloor \cdot \rfloor$ denotes the floor operator in $\mathbb{R}^+$, i.e., the function $\mathbb{R}^+ \to \mathbb{N}$ such that $\lfloor x \rfloor = \max\{n \in \mathbb{N} : n \le x\}$, and $\{\Omega_{k,l}\}_{(k,l) \in [0,N]^2}$ is a partition of $\Omega$. We then compute, setting $\beta' = \zeta^2 \beta$,
$$P(i, h) = \frac{1}{|\Omega|} \int_\Omega G_\beta(I_h(x) - i)\, dx = \frac{\zeta^2}{|\Omega|} \int_\Omega G_{\beta'}(\zeta(I_h(x) - i))\, dx$$
$$\approx \frac{\zeta^2}{|\Omega|} \int_\Omega G_{\beta'}(v(x) - \zeta i)\, dx = \frac{\zeta^2}{|\Omega|} \sum_{k=0}^N \sum_{l=0}^N \int_{\Omega_{k,l}} G_{\beta'}(k - \zeta i_1, l - \zeta i_2)\, dx$$
$$= \zeta^2 \sum_{k=0}^N \sum_{l=0}^N \underbrace{\frac{|\Omega_{k,l}|}{|\Omega|}}_{H(k,l)}\, G_{\beta'}(k - \zeta i_1, l - \zeta i_2) = \zeta^2\, (H \star G_{\beta'})(\zeta i),$$
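The two recursions of section 5.1.1 can be implemented directly. The following sketch reproduces the impulse response $S_\alpha(n) = k(\alpha|n| + 1)e^{-\alpha|n|}$ exactly for interior samples; at the array boundaries it implicitly assumes the signal is zero outside its support.

```python
import numpy as np

def deriche_smooth_1d(x, alpha):
    """Deriche recursive smoothing: convolution with
    S_alpha(n) = k (alpha|n| + 1) exp(-alpha|n|), k = alpha/4,
    via one causal and one anticausal second-order recursion
    (cost independent of alpha)."""
    n = len(x)
    k = alpha / 4.0
    e1, e2 = np.exp(-alpha), np.exp(-2.0 * alpha)
    y1 = np.zeros(n)
    for i in range(n):                        # causal pass
        y1[i] = (k * x[i]
                 + (k * e1 * (alpha - 1.0) * x[i - 1] if i >= 1 else 0.0)
                 + (2.0 * e1 * y1[i - 1] if i >= 1 else 0.0)
                 - (e2 * y1[i - 2] if i >= 2 else 0.0))
    y2 = np.zeros(n)
    for i in range(n - 1, -1, -1):            # anticausal pass
        y2[i] = ((k * e1 * (alpha + 1.0) * x[i + 1] if i + 1 < n else 0.0)
                 - (k * e2 * x[i + 2] if i + 2 < n else 0.0)
                 + (2.0 * e1 * y2[i + 1] if i + 1 < n else 0.0)
                 - (e2 * y2[i + 2] if i + 2 < n else 0.0))
    return y1 + y2
```

An $n$-dimensional smoothing is then obtained by applying this 1D pass separably along each axis, as with the filter $T_\alpha$ above.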
H being the discrete joint histogram. The convolution is performed by recursive filtering, as described in the previous section. Note that this way of computing P is quite efficient, since only one pass over the images is required, followed by the convolution.

5.1.3. Implementation of the computation of FMI. The global mutual information function is then estimated as follows:
• Estimate P(i, h) and its marginals.
• Estimate LMI(i, h) (equation (3.4)) using centered finite differences for the derivatives.
• Estimate fMI(i, h) = Gβ ∗ LMI(i, h) (equation (3.3)) by recursive smoothing.
• Estimate FMI(h)(x) = −fMI(Ih(x), h) ∇I2(x + h(x)) (equation (3.8)).

5.1.4. Implementation of the computation of FCC. The function fCC is estimated as

LCC(i, x) = (Gγ ∗ f1)(x) i1 + (Gγ ∗ f2)(x) i2 + (Gγ ∗ f3)(x),

where

f1(x) = −2 v1,2(h, x)/(Gγ(x) v1(x) v2(h, x)),
f2(x) = −2 JCC(h, x)/(Gγ(x) v2(h, x)),
f3(x) = −(f1(x) µ1(x) + f2(x) µ2(h, x)).
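The recursive smoothing scheme of section 5.1.1, which underlies all of the convolutions above, can be sketched in a few lines. The following is a minimal 1D numpy version assuming zero values outside the sequence; the function names are ours, not part of the paper's implementation.

```python
import numpy as np

def deriche_smooth(x, alpha):
    """Recursive implementation of the smoothing operator
    S_alpha(n) = k (alpha |n| + 1) exp(-alpha |n|), with k = alpha / 4,
    as a causal pass y1 plus an anticausal pass y2 (cost independent of alpha)."""
    x = np.asarray(x, dtype=float)
    M = x.size
    k = alpha / 4.0
    e1, e2 = np.exp(-alpha), np.exp(-2.0 * alpha)
    y1 = np.zeros(M)
    y2 = np.zeros(M)
    for n in range(M):  # causal pass: taps at n and n-1, feedback on y1
        xm1 = x[n - 1] if n >= 1 else 0.0
        ym1 = y1[n - 1] if n >= 1 else 0.0
        ym2 = y1[n - 2] if n >= 2 else 0.0
        y1[n] = k * (x[n] + e1 * (alpha - 1.0) * xm1) + 2.0 * e1 * ym1 - e2 * ym2
    for n in range(M - 1, -1, -1):  # anticausal pass: taps at n+1 and n+2
        xp1 = x[n + 1] if n + 1 < M else 0.0
        xp2 = x[n + 2] if n + 2 < M else 0.0
        yp1 = y2[n + 1] if n + 1 < M else 0.0
        yp2 = y2[n + 2] if n + 2 < M else 0.0
        y2[n] = k * (e1 * (alpha + 1.0) * xp1 - e2 * xp2) + 2.0 * e1 * yp1 - e2 * yp2
    return y1 + y2

def deriche_smooth_2d(img, alpha):
    """Separable extension T_alpha: apply the 1D filter along each axis."""
    out = np.apply_along_axis(deriche_smooth, 0, np.asarray(img, float), alpha)
    return np.apply_along_axis(deriche_smooth, 1, out, alpha)
```

Since k is fixed by the continuous normalization ∫ Sα(t) dt = 1, the discrete response sums only approximately to one; the smaller α, the better the approximation.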
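The density estimation of section 5.1.2 can likewise be sketched as follows. For brevity this version smooths the normalized joint histogram with a direct truncated Gaussian kernel rather than the recursive filter, and the bin count N, intensity range A, and variance β are illustrative defaults, not the paper's settings.

```python
import numpy as np

def parzen_joint_density(I1, I2w, N=63, A=255.0, beta=16.0):
    """Parzen estimate P(i, h) = zeta^2 (H * G_betatilde)(zeta i):
    quantize the intensity pairs (I1(x), I2(x + h(x))) into (N+1)^2 bins,
    accumulate the normalized joint histogram H(k, l) = |Omega_{k,l}| / |Omega|,
    then smooth it with a Gaussian of variance betatilde = zeta^2 * beta (in bins)."""
    zeta = N / A
    b1 = np.clip(np.floor(zeta * np.asarray(I1, float)).astype(int), 0, N)
    b2 = np.clip(np.floor(zeta * np.asarray(I2w, float)).astype(int), 0, N)
    H = np.zeros((N + 1, N + 1))
    np.add.at(H, (b1.ravel(), b2.ravel()), 1.0)  # one pass over the images
    H /= H.sum()
    var = zeta ** 2 * beta  # Parzen variance expressed in histogram bins
    r = max(1, int(np.ceil(4.0 * np.sqrt(var))))
    t = np.arange(-r, r + 1)
    g = np.exp(-t ** 2 / (2.0 * var))
    g /= g.sum()
    smooth = lambda row: np.convolve(row, g, mode="same")
    H = np.apply_along_axis(smooth, 0, H)
    H = np.apply_along_axis(smooth, 1, H)
    return zeta ** 2 * H  # one density value per bin center i = (k/zeta, l/zeta)
```

Only one pass over the images is needed to build H; the marginals required by the mutual information computation of section 5.1.3 are then obtained by summing P over rows and columns.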
All the required space-dependent quantities, like µ1(x), are computed through recursive spatial smoothing. This algorithm is similar to the one proposed in [11].

5.2. Parameters. We now discuss the way in which the different parameters of the algorithms are determined.
• γ: This is the variance of the spatial Gaussian for local density estimates. Its value does not affect the computation time, since the local statistics are calculated using the recursive smoothing filter. Thanks to this, we have conducted experiments with different values of this parameter, which have shown that the algorithms are not very sensitive to it. Qualitatively speaking, the local window has to be large enough for the statistics to be significant and small enough to account for nonstationarities of the density. We set γ to 5 in all our experiments.
• β: This is the variance of the Gaussian for the Parzen estimates. Unlike γ, determining a good value for β is important for obtaining good results. In our case, it is determined automatically as follows (we refer to [6] for a recent comprehensive study on nonparametric density estimation). We adopt a cross-validation technique based on an empirical maximum likelihood method. We denote by {ik} a set of m intensity pair samples (k = 1, . . . , m) and take the value of β which maximizes the empirical likelihood

L(β) = ∏_{k=1}^{m} P̂_{β,k}(ik),  where  P̂_{β,k}(ik) = (1/(m − nk)) Σ_{s : is ≠ ik} Gβ(ik − is)
and nk is the number of data samples for which is = ik. For the experiments shown in this paper, the optimal value of β varies in the range 15–20.
• κ: This parameter determines the coefficient of the regularization term in the energy functional. Since the range of the different matching functions F varies considerably, we normalize F by dividing it by the maximum value κ0 = ‖F(h0)‖∞, h0 being the initial field. This is equivalent to replacing κ by C = κ κ0. The behavior of the algorithms is much more stable with respect to C than to κ. κ has been kept at the fixed value of 10 in all our experiments except the one with the Eiffel tower, where the expected displacement is close to a translation, a very smooth transformation indeed. In that experiment the value of κ has been fixed to 100.
• ∆t: The time step is chosen such that the coefficient of the regularization term (i.e., C∆t) is less than a specified value. It is known, for instance, that the scheme (5.1) is stable for values of C∆t no larger than 0.25. In our experiments, we fix C∆t to a value of 0.1. The resulting ∆t is different at each resolution because C is recomputed at the beginning of each level. In some of the examples, we have computed the value of ∆t by doing a line-search in the gradient direction, looking for the minimum of the criterion. At each resolution, the line-search consistently produces values which are asymptotically (a) almost constant for that particular resolution and (b) such that C∆t is approximately 0.15–0.2. This is true only asymptotically for each resolution: the first optimal steps are much larger, which suggested using the line-search in conjunction with more sophisticated minimization algorithms such as the conjugate gradient. We discuss this point in one of the examples of section 6.
• σ: This is the scale parameter.
We adopt a multiresolution approach (see the next section), smoothing the images at each stage by a small amount. Within each stage of the multiresolution pyramid, the parameter σ is fixed to a small value, typically 0.5 voxels/pixels. One extra parameter is needed for the regularization operator.
• λ: This is the parameter controlling the anisotropic behavior of the Nagel–Enkelmann tensor. We adopt the method proposed by Alvarez, Weickert, and Sánchez [2]. Given q, which in practice is fixed to 0.1, we take the value of λ such that

q = ∫_0^λ H|∇I1|(z) dz,
where H|∇I1 | (z) is the normalized histogram of |∇I1 |. 6. Experimental results. We present results of experiments using the previously described algorithms. To recover large deformations, we use a multiresolution approach by applying the gradient descent to a set of smoothed and subsampled images. Since the functionals considered are not convex, this coarse-to-fine strategy helps avoid irrelevant extrema while reducing the computational cost of the algorithms. The first stage of the multiresolution pyramid is initialized with h0 = 0. The field resulting from the computation at a coarse level is multiplied by two and resampled at the
Fig. 3. Multiresolution strategy: the first stage of the multiresolution pyramid is initialized with h0 = 0, and the resulting field from a coarse level is scaled by two and resampled at the next finer scale (by (tri)-linear interpolation) at each step of the pyramid descent where it is used as the initial value of the corresponding evolution equation. The left column shows the pyramid; the right shows the computed deformation field at each level. This example illustrates the result of the experiment in Figure 4.
finer scale (by (tri)-linear interpolation) at each step of the pyramid descent, where it is used as the initial value of the corresponding evolution equation. The number of iterations at each level is determined automatically from the reduction of the criterion: iterations are stopped when the criterion changes by an amount less than a specified threshold. The number of resolution levels in the pyramid is chosen so that, at the coarsest level, the image has approximately n pixels (voxels) in its smallest dimension, n being manually fixed to a small value such as 8, 16, or 32 (see Figure 3).

Experiment with synthetic data (Figure 4). In this example, we use the mutual information criterion to compute the displacement field between two synthetic images. Despite its simplicity, this example illustrates the main difficulties of the problem we consider: the distortion is nonrigid, and we are unable to compare intensities directly. Moreover, this particular example is especially difficult since the discrete histogram (i.e., before Gaussian smoothing) contains only six nonzero entries, which underlines the importance of the Parzen-window regularization.

T2-PD registration (Figure 5). We use the mutual information criterion to realign two slices from a proton density (PD) and a T2-weighted magnetic resonance image (MRI) volume (same patient). A good introduction to medical image acquisition can be found in [8, 40]. An artificial geometric distortion has been applied to the
Fig. 4. Mutual information criterion with a synthetic example. From left to right: I1 (256 × 256), I2 , I2 (Id + h∗ ), and a plot of the estimated optimal displacement field h∗ .
Fig. 5. PD/T2 MRI registration using mutual information. First row, from left to right: I1 , I2 , I2 (Id + h∗ ), and the contours of I2 (Id + h∗ ) superimposed on I1 . The second row shows, from left to right: an arrow-plot of h∗ (one every three pixels shown), the dense x and y components of h∗ , and the determinant of the Jacobian of the transformation Id + h∗ .
original preregistered dataset. In order to evaluate the accuracy of the realignment, we superimposed some contours of the T2 image (initial and recovered pose) over the reference image (PD). This gives a good qualitative indication of the quality of the registration. Most of the anatomical structures appear correctly realigned.

MRI-fMRI registration (Figure 6). This example shows an experiment with MR data of the brain of a macaque monkey. The reference image is a T1-weighted anatomical volume, and the image to register is a functional, MION-contrast, MRI (fMRI). The contrast in this modality is related to the blood oxygenation level. This registration was obtained using mutual information. Each image in this figure shows the intersection of the image volume with three orthogonal planes. The crosses in each plane image mark a point of interest. Observation of these points shows that the correspondence has been improved by the registration process.

Matching of anatomical versus diffusion-tensor-related MRI (Figure 7). Our next experiment uses real 3D MR data of a human brain; this registration was also obtained using mutual information. The reference image is a T1-weighted anatomical MRI. The target image is a T2-weighted anatomical MRI from the same patient, acquired as part of the process of obtaining an image of the water diffusion tensor at each voxel (diffusion MR). Notice that the intensities in this modality are related in a noninvertible
Fig. 6. MRI-fMRI registration using the mutual information. The first and second images in each row show the reference anatomical MRI and the initial fMRI, respectively. The third image shows the registered fMRI. The two rows show two different points of interest in the volume.
Fig. 7. Matching of anatomical vs. diffusion-tensor-related MRI, using mutual information (see text).
Fig. 8. Human face template matching. First row from left to right: I1 with some reference points marked, I2 , I2 (Id + h∗ ), and I2 with the corresponding reference points according to the estimated displacement field h∗ . The second row shows, from left to right: an arrow-plot of h∗ (one every five pixels shown), its dense x and y components, and the determinant of the Jacobian of the transformation Id + h∗ . The local cross-covariance was used as similarity criterion.
way, i.e., the correspondence i1 → i2 is not a monotonic function, thereby justifying the use of mutual information. The estimated deformation field has a dominant y component, a property which is physically coherent with the applied gradient. Observation of the points marked by the crosses shows that the correspondence has been improved by the registration process.

Face template matching (Figure 8). This experiment shows template matching of human faces. The different albedos of the two skins create a “multimodal” situation, and the transformation is truly nonrigid due to the different shapes of the noses and mouths. Notice the excellent matching of the different features. This result was obtained using the local cross-covariance. The running time was approximately five minutes on a PC at 900 MHz. With the correspondences, one can interpolate the displacement field and the texture to perform fully automatic morphing.

The Eiffel tower example (Figure 9). This last experiment shows matching under varying illumination conditions. The two pictures of the Eiffel tower were taken nearly one year apart under very different weather conditions. The result was obtained using the local cross-covariance.

Conjugate gradient minimization. The explicit time discretization using a fixed time step corresponds to a steepest descent method without line-search, which is generally quite inefficient. In this section, we present results obtained with a modified version of the algorithm, which performs a line-search and uses a Fletcher–Reeves conjugate gradient minimization routine as described in [46]. Figure 10 shows plots of the decrease of the functional for the face template matching experiment of Figure 8 using both the standard steepest descent and the modified conjugate gradient version. Plots (b) and (c) show the advantage of the multiresolution approach over the single
Fig. 9. Matching under varying illumination conditions. First row, from left to right: I1, I2, I2 superimposed on I1, and I2(Id + h∗) superimposed on I1. The second row shows, from left to right: an arrow-plot of h∗ (one every three pixels shown), its dense x and y components, and the determinant of the Jacobian of the transformation Id + h∗. The local cross-covariance was used as the similarity criterion.
resolution plot in (a), as a local minimum is avoided using a five-level multiresolution pyramid (the three coarsest levels are shown in (b), the finest two in (c)). Plots (d) and (e) show that the conjugate gradient method reduces the total number of iterations required by about one order of magnitude. In this example the gain in speed is even higher, since the number of iterations at the finest level in (e) is very small, despite the fact that each iteration is slightly more costly. The result of Figure 8 is obtained in less than two seconds on a PC at 2.4 GHz using the conjugate gradient version, instead of four minutes using the multiresolution approach. The same criterion was used to stop the iterations in every case.

7. Conclusion. Image registration is best posed as an optimization problem defined over some functional space. The optimization functional is in general the sum of two terms, a data term and a regularization term. When one deals with different image modalities, the data term has to rely on image statistics rather than directly on the image intensities, which makes it considerably more involved. We have considered two such terms, a global and a local one. We have considered only one regularization term, but it is quite representative. In order to prove that the registration problem is well posed, we have shown the existence of minimizers, computed the Euler–Lagrange equations induced by our functionals, and shown that the corresponding evolution equations have a unique and regular solution for a given initial condition. To prove this last point we have used well-known theorems of functional analysis. These theorems say that if the differential operator defined by the regularization term is strictly elliptic and the gradient of the data term is Lipschitz continuous,
Fig. 10. Plots of the decrease of the functional for the face template matching experiment of Figure 8 using both the standard steepest descent and the modified conjugate gradient version. Plots (b) and (c) show the advantage of the multiresolution approach over the single resolution plot in (a), as a local minimum is avoided by using a five-level multiresolution pyramid (the three coarsest levels are shown in (b)). Plots (d) and (e) show that the conjugate gradient method allows about one order of magnitude reduction in the total number of iterations required. In this example the gain in speed is much higher since the number of iterations at the finest level in (e) is very small. The same criterion was used to stop iterations in every case.
then the functional evolution equation has a unique classical solution. We have proved that these hypotheses are satisfied in our two examples. Since they are “generic” in image registration, this suggests that most of these problems are well posed. Our numerical implementation for computing the solutions of the Euler–Lagrange equations has allowed us to demonstrate the power and the flexibility of the approach for solving a wide range of difficult image registration problems.

REFERENCES

[1] L. Alvarez, R. Deriche, J. Weickert, and J. Sánchez, Dense disparity map estimation respecting image discontinuities: A PDE and scale-space based approach, Int. J. Visual Communication and Image Representation, Special Issue on Partial Differential Equations in Image Processing, Computer Vision, and Computer Graphics, 2000.
[2] L. Alvarez, J. Weickert, and J. Sánchez, Reliable estimation of dense optical flow fields with large displacements, Int. J. Computer Vision, 39 (2000), pp. 41–56.
[3] G. Aubert, R. Deriche, and P. Kornprobst, Computing optical flow via variational techniques, SIAM J. Appl. Math., 60 (1999), pp. 156–182.
[4] G. Aubert and P. Kornprobst, A mathematical study of the relaxed optical flow problem in the space BV(Ω), SIAM J. Math. Anal., 30 (1999), pp. 1282–1308.
[5] G. Aubert and P. Kornprobst, Mathematical Problems in Image Processing: Partial Differential Equations and the Calculus of Variations, Appl. Math. Sci. 147, Springer-Verlag, New York, 2002.
[6] D. Bosq, Nonparametric Statistics for Stochastic Processes, 2nd ed., Lecture Notes in Statist. 110, Springer-Verlag, New York, 1998.
[7] H. Brezis, Analyse Fonctionnelle. Théorie et Applications, Masson, Paris, 1983.
[8] R. B. Buxton, Introduction to Functional Magnetic Resonance Imaging, Cambridge University Press, Cambridge, UK, 2002.
[9] P. Cachier and N. Ayache, Regularization in Non-Rigid Registration: I.
Trade-off between Smoothness and Intensity Similarity, Technical report 4188, INRIA, Rocquencourt, France, 2001.
[10] P. Cachier and N. Ayache, Regularization in Non-Rigid Registration: II. Isotropic Energies, Filters, and Splines, Technical report 4243, INRIA, Rocquencourt, France, 2001.
[11] P. Cachier and X. Pennec, 3d non-rigid registration by gradient descent on a Gaussian weighted similarity measure using convolutions, in Proceedings of the IEEE Workshop on Mathematical Methods in Biomedical Image Analysis (MMBIA 2000), Hilton Head, SC, 2000, IEEE Press, Los Alamitos, CA, pp. 182–189.
[12] C. Chefd'hotel, G. Hermosillo, and O. Faugeras, Flows of diffeomorphisms for multimodal image registration, in Proceedings of the International Symposium on Biomedical Imaging, IEEE, Los Alamitos, CA, 2002.
[13] G. Christensen, M. I. Miller, and M. W. Vannier, A 3d deformable magnetic resonance textbook based on elasticity, in Proceedings of the American Association for Artificial Intelligence Symposium: Applications of Computer Vision in Medical Image Processing, 1994.
[14] U. Clarenz, S. Henn, M. Rumpf, and K. Witsch, Relations between optimization and gradient flow methods with applications to image registration, in Multigrid and Related Methods for Optimization Problems, W. Hackbusch and M. Griebel, eds., Proceedings of the GAMM Seminar, 2002.
[15] R. Courant, Calculus of Variations, New York, 1946.
[16] R. Dautray and J.-L. Lions, Mathematical Analysis and Numerical Methods for Science and Technology, 6 volumes, Springer, New York, 2000.
[17] R. Deriche, Fast algorithms for low-level vision, IEEE Trans. Pattern Analysis and Machine Intelligence, 12 (1990), pp. 78–87.
[18] R. Deriche, P. Kornprobst, and G. Aubert, Optical flow estimation while preserving its discontinuities: A variational approach, in Proceedings of the 2nd Asian Conference on Computer Vision, Singapore, 1995, Vol. 2, pp. 71–80.
[19] L. C.
Evans, Partial Differential Equations, Graduate Studies in Math. 19, American Mathematical Society, Providence, RI, 1998.
[20] O. Faugeras, B. Hotz, H. Mathieu, T. Viéville, Z. Zhang, P. Fua, E. Théron, L. Moll, G. Berry, J. Vuillemin, P. Bertin, and C. Proy, Real Time Correlation Based Stereo: Algorithm Implementations and Applications, Technical report 2013, INRIA, Sophia-Antipolis, France, 1993.
[21] O. Faugeras and R. Keriven, Variational principles, surface evolution, PDEs, level set methods and the stereo problem, IEEE Trans. Image Process., 7 (1998), pp. 336–344.
[22] T. Gaens, F. Maes, D. Vandermeulen, and P. Suetens, Non-rigid multimodal image registration using mutual information, in Proceedings of the First International Conference on Medical Image Computing and Computer-Assisted Intervention, J. van Leeuwen, G. Goos, and J. Hartmanis, eds., Lecture Notes in Comput. Sci. 1496, Springer, New York, 1998.
[23] D. Gilbarg and N. S. Trudinger, Elliptic Partial Differential Equations of Second Order, 2nd ed., Grundlehren Math. Wiss., Springer-Verlag, New York, 1983.
[24] N. Hata, T. Dohi, S. Warfield, W. Wells, III, R. Kikinis, and F. A. Jolesz, Multimodality deformable registration of pre- and intra-operative images for MRI-guided brain surgery, in Proceedings of the First International Conference on Medical Image Computing and Computer-Assisted Intervention, J. van Leeuwen, G. Goos, and J. Hartmanis, eds., Lecture Notes in Comput. Sci. 1496, Springer, New York, 1998.
[25] G. Hermosillo, Variational Methods for Multimodal Image Matching, Ph.D. thesis, INRIA, Rocquencourt, France, ftp://ftp-sop.inria.fr/robotvis/html/Papers/hermosillo:02.ps.gz, 2002.
[26] D. Hill, Combination of 3D Medical Images from Multiple Modalities, Ph.D. thesis, University of London, London, 1993.
[27] W. Hinterberger, O. Scherzer, C. Schnörr, and J. Weickert, Analysis of optical flow models in the framework of calculus of variations, Numer. Funct. Anal. Optim., 23 (2002), pp. 69–89.
[28] B. K. Horn and B. G. Schunck, Determining optical flow, Artificial Intelligence, 17 (1981), pp. 185–203.
[29] T. Kanade and M. Okutomi, A stereo matching algorithm with an adaptive window: Theory and experiment, IEEE Trans. Pattern Anal. and Machine Intelligence, 16 (1994), pp. 920–932.
[30] J. J. Koenderink and A. J. van Doorn, Blur and disorder, in Proceedings of the Scale-Space Theories in Computer Vision, Second International Conference (Scale-Space'99), M.
Nielsen, P. Johansen, O. F. Olsen, and J. Weickert, eds., Lecture Notes in Comput. Sci. 1682, Springer, 1999, pp. 1–9.
[31] M. E. Leventon and W. E. L. Grimson, Multi-modal volume registration using joint intensity distributions, in Medical Image Computing and Computer-Assisted Intervention-MICCAI'98, W. M. Wells, A. Colchester, and S. Delp, eds., Lecture Notes in Comput. Sci. 1496, Springer-Verlag, Cambridge, MA, 1998.
[32] F. Maes, A. Collignon, D. Vandermeulen, G. Marchal, and P. Suetens, Multimodality image registration by maximization of mutual information, IEEE Trans. Medical Imaging, 16 (1997), pp. 187–198.
[33] J. B. A. Maintz, H. W. Meijering, and M. A. Viergever, General multimodal elastic registration based on mutual information, in Medical Imaging 1998: Image Processing, Vol. 3338, SPIE, Bellingham, WA, 1998, pp. 144–154.
[34] E. Mémin and P. Pérez, A multigrid approach for hierarchical motion estimation, in Proceedings of the 6th International Conference on Computer Vision, Bombay, India, 1998, IEEE Computer Society Press, Los Alamitos, CA, pp. 933–938.
[35] E. Mémin and P. Pérez, Dense/parametric estimation of fluid flows, in Proceedings of the IEEE International Conference on Image Processing (ICIP'99), Kobe, Japan, 1999, IEEE Press, Los Alamitos, CA.
[36] C. Meyer, J. Boes, B. Kim, and P. Bland, Evaluation of control point selection in automatic, mutual information driven, 3d warping, in Proceedings of the First International Conference on Medical Image Computing and Computer-Assisted Intervention, J. van Leeuwen, G. Goos, and J. Hartmanis, eds., Lecture Notes in Comput. Sci. 1496, 1998.
[37] M. Miller and L. Younes, Group actions, homeomorphisms, and matching: A general framework, Int. J. Computer Vision, 41 (2001), pp. 61–84.
[38] H. H. Nagel and W. Enkelmann, An investigation of smoothness constraints for the estimation of displacement vector fields from image sequences, IEEE Trans. Pattern Anal. and Machine Intelligence, 8 (1986), pp.
565–593.
[39] T. Netsch, P. Rosch, A. van Muiswinkel, and J. Weese, Towards real-time multi-modality 3d medical image registration, in Proceedings of the 8th International Conference on Computer Vision, Vancouver, Canada, IEEE Computer Society Press, Los Alamitos, CA, 2001.
[40] W. W. Orrison, J. D. Lewine, J. A. Sanders, and M. F. Hartshorne, Functional Brain Imaging, Mosby-Year Book, 1995.
[41] S. Ourselin, A. Roche, S. Prima, and N. Ayache, Block matching: A general framework to improve robustness of rigid registration of medical images, in Medical Image Computing and Computer-Assisted Intervention-MICCAI 2000, Lecture Notes in Comput. Sci. 1935, Springer-Verlag, New York, 2000.
[42] E. Parzen, On the estimation of probability density function, Ann. Math. Statist., 33 (1962), pp. 1065–1076.
[43] A. Pazy, Semigroups of Linear Operators and Applications to Partial Differential Equations, Springer-Verlag, New York, 1983.
[44] X. Pennec, P. Cachier, and N. Ayache, Understanding the “demon's algorithm”: 3d non-rigid registration by gradient descent, in Proceedings of the Second International Conference on Medical Image Computing and Computer-Assisted Intervention, Lecture Notes in Comput. Sci. 1679, Springer, 1999, pp. 597–605.
[45] G. Penney, J. Weese, J. A. Little, P. Desmedt, D. L. G. Hill, and D. J. Hawkes, A comparison of similarity measures for use in 2d-3d medical image registration, in Proceedings of the First International Conference on Medical Image Computing and Computer-Assisted Intervention, J. van Leeuwen, G. Goos, and J. Hartmanis, eds., Lecture Notes in Comput. Sci. 1496, Springer-Verlag, New York, 1998.
[46] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes in C, Cambridge University Press, Cambridge, UK, 1988.
[47] M. Proesmans, L. Van Gool, E. Pauwels, and A. Oosterlinck, Determination of optical flow and its discontinuities using non-linear diffusion, in Proceedings of the 3rd ECCV, II, Lecture Notes in Comput. Sci. 801, Springer-Verlag, New York, 1994, pp. 295–304.
[48] A. Roche, A. Guimond, J. Meunier, and N. Ayache, Multimodal elastic matching of brain images, in Proceedings of the 6th European Conference on Computer Vision, Dublin, Ireland, 2000.
[49] A. Roche, G. Malandain, X. Pennec, and N. Ayache, The correlation ratio as a new similarity metric for multimodal image registration, in Medical Image Computing and Computer-Assisted Intervention-MICCAI'98, W. M. Wells, A. Colchester, and S. Delp, eds., Lecture Notes in Comput. Sci. 1496, Springer, Cambridge, MA, 1998, pp. 1115–1124.
[50] D. Rückert, C. Hayes, C. Studholme, P. Summers, M. Leach, and D. J. Hawkes, Non-rigid registration of breast MR images using mutual information, in Medical Image Computing and Computer-Assisted Intervention-MICCAI'98, W. M. Wells, A. Colchester, and S. Delp, eds., Lecture Notes in Comput. Sci. 1496, Springer, Cambridge, MA, 1998.
[51] D. Scharstein and R. Szeliski, Stereo matching with nonlinear diffusion, Int. J. Computer Vision, 28 (1998), pp. 155–174.
[52] H. Tanabe, Equations of Evolution, Pitman, Boston, 1975.
[53] J. P. Thirion, Image matching as a diffusion process: An analogy with Maxwell's demons, Medical Image Analysis, 2 (1998), pp. 243–260.
[54] A. Trouvé, Diffeomorphisms groups and pattern matching in image analysis, Int. J. Computer Vision, 28 (1998), pp. 213–221.
[55] P. Viola, Alignment by Maximization of Mutual Information, Ph.D. thesis, MIT, Cambridge, MA, 1995.
[56] P. Viola and W. M. Wells III, Alignment by maximization of mutual information, Int. J. Computer Vision, 24 (1997), pp. 137–154.
[57] J. Weickert and C. Schnörr, A theoretical framework for convex regularizers in PDE-based computation of image motion, Int. J. Computer Vision, 45 (2001), pp. 245–264.
[58] W. M. Wells, III, P. Viola, H. Atsumi, S. Nakajima, and R. Kikinis, Multi-modal volume registration by maximization of mutual information, Medical Image Anal., 1 (1996), pp. 35–51.
[59] R. P. Woods, J. C. Mazziotta, and S. R. Cherry, MRI-PET registration with automated algorithm, J. Computer Assisted Tomography, 17 (1993), pp. 536–546.
[60] Z. Zhang, R. Deriche, O. Faugeras, and Q. T. Luong, A robust technique for matching two uncalibrated images through the recovery of the unknown epipolar geometry, Artificial Intelligence Journal, 78 (1995), pp. 87–119.