Multiscale fundamental forms: a multimodal image wavelet representation

P. Scheunders
Vision Lab, Department of Physics, University of Antwerp, Groenenborgerlaan 171, 2020 Antwerpen, Belgium
Tel.: +32/3/218 04 39, Fax: +32/3/218 03 18
Email: [email protected]
Abstract
In this paper, a new wavelet representation for multimodal images is presented. The idea for this representation is based on the first fundamental form, which provides a local measure for the contrast of a multimodal image. This concept is extended towards multiscale fundamental forms using the dyadic wavelet transform of Mallat. The multiscale fundamental forms provide a local measure for the contrast of a multimodal image at different scales, and the representation allows for a multiscale edge description of multimodal images. Two applications are presented: multispectral image fusion and colour image noise filtering. In an experimental section, the presented techniques are compared to single-valued and/or single-scale algorithms previously described in the literature. The techniques based on the new representation are demonstrated to outperform the others.

1. Introduction

With the evolution of imaging technology, an increasing number of image modalities become available. In remote sensing, sensors are used that generate a number of multispectral bands. In medical imaging, distinct image modalities reveal different features of the internal body. A lot of attention has been devoted to the classification and/or segmentation of multimodal data. Other image processing procedures include noise filtering and image enhancement. It is obvious that all these image processing and analysis techniques would benefit from the combined use of the different bands. Nevertheless, in most cases single-valued processing and analysis techniques are applied to each of the bands separately, and the results for each band are then combined in a usually heuristic manner.

A large part of image processing and analysis techniques makes use of the image edge information that is contained in the image gradient. An elegant way of describing multimodal edges is given in [2], where the image's "first fundamental form", a quadratic form, is defined at each image point. Based on this definition, a colour edge detection algorithm was described in [1], and a colour image anisotropic diffusion algorithm in [7]. In single-valued images, multiresolution techniques are used to describe edges. The wavelet transform, for example, has been successfully applied to compression, noise reduction, enhancement, classification and segmentation of greylevel images. However, when applied to multimodal images, it is applied to each band separately. In this paper, a new multimodal image wavelet representation is presented, which allows for a multiscale edge description of multimodal images. The idea for the representation is based on the first fundamental form of [2] and the dyadic wavelet representation of Mallat, presented in [5]. To demonstrate the use of the presented representation, two applications of multimodal image processing and analysis are described. First, a technique for multispectral image fusion is elaborated. Second, based on the work of [7], a colour anisotropic diffusion filter is constructed. The developed techniques are compared to the single-valued and/or single-scale versions of the algorithms.
2. Multimodal edge representation using the first fundamental form

For the derivation of the first fundamental form, we will follow [2]. Let $I(x,y)$ be a multimodal image with components $I_n(x,y)$, $n = 1,\dots,N$. The value of $I$ at a given point is an $N$-dimensional vector. To describe the gradient information of $I$, consider the differential of $I$. In a Euclidean space:
$$dI = \frac{\partial I}{\partial x}\,dx + \frac{\partial I}{\partial y}\,dy \qquad (1)$$
and its squared norm is given by (sums are over all bands of the image):
$$(dI)^2 = \begin{pmatrix} dx \\ dy \end{pmatrix}^{\!T} \begin{pmatrix} \frac{\partial I}{\partial x}\cdot\frac{\partial I}{\partial x} & \frac{\partial I}{\partial x}\cdot\frac{\partial I}{\partial y} \\[4pt] \frac{\partial I}{\partial x}\cdot\frac{\partial I}{\partial y} & \frac{\partial I}{\partial y}\cdot\frac{\partial I}{\partial y} \end{pmatrix} \begin{pmatrix} dx \\ dy \end{pmatrix} = \begin{pmatrix} dx \\ dy \end{pmatrix}^{\!T} \begin{pmatrix} \sum_n \left(\frac{\partial I_n}{\partial x}\right)^2 & \sum_n \frac{\partial I_n}{\partial x}\frac{\partial I_n}{\partial y} \\[4pt] \sum_n \frac{\partial I_n}{\partial x}\frac{\partial I_n}{\partial y} & \sum_n \left(\frac{\partial I_n}{\partial y}\right)^2 \end{pmatrix} \begin{pmatrix} dx \\ dy \end{pmatrix} \qquad (2)$$

This quadratic form is called the first fundamental form. It reflects the change in a multimodal image. The directions of maximal and minimal change are given by the eigenvectors of the $2\times 2$ matrix; the corresponding eigenvalues denote the rates of change. For a greylevel image ($N = 1$), it is easily calculated that the largest eigenvalue is given by $\lambda_1 = \|\nabla I\|^2$, i.e. the squared gradient magnitude, and the corresponding eigenvector lies in the direction of maximal gradient; the other eigenvalue equals zero. For a multimodal image, the eigenvectors and eigenvalues describe an ellipse in the image plane. When $\lambda_1 \gg \lambda_2$, the gradients of all bands are more or less in the same direction. When $\lambda_2 \simeq \lambda_1$, there is no preferential direction. The conjecture is that the multimodal edge information is reflected by the eigenvectors and eigenvalues of the first fundamental form. A particular problem that occurs is that the diagonalization does not uniquely specify the sign of the eigenvectors. This has been extensively studied in [1], where it was proven that the eigenvectors can be uniquely oriented in simply connected regions where $\lambda_2 \neq \lambda_1$. Based on this, an algorithm was proposed to orient the eigenvectors, keeping the angle continuous in local regions.
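For concreteness, the following minimal sketch computes the first fundamental form and its eigenvalues per pixel. It is an illustration under stated assumptions, not the author's code: numpy/scipy, a (N, H, W) band layout, and Gaussian derivatives standing in for the plain partial derivatives are all choices made here.

```python
# Sketch of the first fundamental form (2); an illustration, not the paper's code.
import numpy as np
from scipy.ndimage import gaussian_filter

def first_fundamental_form(image, sigma=1.0):
    """image: (N, H, W) array of N bands. Returns the per-pixel entries
    (gxx, gxy, gyy) of the 2x2 form and its eigenvalues (lam1 >= lam2)."""
    gxx = np.zeros(image.shape[1:])
    gxy = np.zeros_like(gxx)
    gyy = np.zeros_like(gxx)
    for band in image.astype(float):
        dx = gaussian_filter(band, sigma, order=(0, 1))  # dI_n/dx
        dy = gaussian_filter(band, sigma, order=(1, 0))  # dI_n/dy
        gxx += dx * dx    # sum_n (dI_n/dx)^2
        gxy += dx * dy    # sum_n (dI_n/dx)(dI_n/dy)
        gyy += dy * dy    # sum_n (dI_n/dy)^2
    # Closed-form eigenvalues of the symmetric 2x2 matrix at every pixel.
    tr, det = gxx + gyy, gxx * gyy - gxy * gxy
    disc = np.sqrt(np.maximum(tr * tr / 4.0 - det, 0.0))
    return gxx, gxy, gyy, tr / 2.0 + disc, tr / 2.0 - disc
```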
3. The dyadic wavelet transform

The wavelet transform employed in this work is based on the non-orthogonal (redundant) discrete wavelet frames introduced by Mallat [5]. Let $\theta(x,y)$ be a 2-D smoothing function. Supposing $\theta$ is differentiable, define
$$\psi^1(x,y) = \frac{\partial \theta(x,y)}{\partial x} \quad \text{and} \quad \psi^2(x,y) = \frac{\partial \theta(x,y)}{\partial y} \qquad (3)$$
The wavelet transform of a greylevel image $I(x,y)$ is then defined by:
$$D_s^1(x,y) = I * \psi_s^1(x,y) \quad \text{and} \quad D_s^2(x,y) = I * \psi_s^2(x,y) \qquad (4)$$
where $*$ denotes the convolution operator and
$$\psi_s^1(x,y) = \frac{1}{s^2}\,\psi^1\!\left(\frac{x}{s},\frac{y}{s}\right) \quad \text{and} \quad \psi_s^2(x,y) = \frac{1}{s^2}\,\psi^2\!\left(\frac{x}{s},\frac{y}{s}\right) \qquad (5)$$
denote the dilations of the functions $\psi^i$. Here $s$ is the scale parameter, which is commonly set equal to $2^j$ with $j = 1,\dots,d$. This yields the so-called dyadic wavelet transform of depth $d$. $D_{2^j}^1$ and $D_{2^j}^2$ are referred to as the detail images, since they contain the horizontal and vertical details of $I$ at scale $j$. Substitution of (3) and (5) in (4) yields the following interesting property:
$$\begin{pmatrix} D_{2^j}^1 \\[2pt] D_{2^j}^2 \end{pmatrix} = 2^j \begin{pmatrix} \frac{\partial}{\partial x}(I * \theta_{2^j}) \\[4pt] \frac{\partial}{\partial y}(I * \theta_{2^j}) \end{pmatrix} = 2^j\, \nabla (I * \theta_{2^j}) \qquad (6)$$

This stipulates that the wavelet transform of a greylevel image consists of the components of the gradient of the image, smoothed by the dilated smoothing function $\theta_{2^j}$.
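Property (6) suggests a simple way to approximate the detail images: differentiate a smoothed version of the image at each dyadic scale. The sketch below does exactly that, with a Gaussian of width $2^j$ as an assumed stand-in for the smoothing function $\theta$; the exact spline filters of [5] are not reproduced.

```python
# Sketch of dyadic detail images via property (6): D^1, D^2 at scale 2^j equal
# 2^j times the gradient of the smoothed image. Gaussian smoothing is an
# assumption standing in for Mallat's smoothing function theta.
import numpy as np
from scipy.ndimage import gaussian_filter

def dyadic_detail_images(band, depth=3):
    """band: (H, W) greylevel image -> list of (D1, D2) pairs, j = 1..depth."""
    details = []
    for j in range(1, depth + 1):
        s = 2.0 ** j
        d1 = s * gaussian_filter(band, s, order=(0, 1))  # 2^j d/dx (I * theta)
        d2 = s * gaussian_filter(band, s, order=(1, 0))  # 2^j d/dy (I * theta)
        details.append((d1, d2))
    return details
```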
4. The multiscale fundamental form

Based on (6), for multimodal images a fundamental form can be constructed at each scale. Similar to (2), and applying (6), the squared norm of the differential of $(I * \theta_{2^j})(x,y)$ is given by:
$$\left(d(I * \theta_{2^j})\right)^2 = 2^{-2j} \begin{pmatrix} dx \\ dy \end{pmatrix}^{\!T} \begin{pmatrix} G_{2^j}^{11} & G_{2^j}^{12} \\[2pt] G_{2^j}^{12} & G_{2^j}^{22} \end{pmatrix} \begin{pmatrix} dx \\ dy \end{pmatrix} \qquad (7)$$
where $G_{2^j}^{kl} = \sum_{n=1}^{N} D_{n,2^j}^k D_{n,2^j}^l$, and $D_{n,2^j}^1$ and $D_{n,2^j}^2$ are the $j$-th scale detail coefficients of the $n$-th band image. This quadratic form will be referred to as the $j$-th scale fundamental form. It reflects the change in the $j$-th scale smoothed image and therefore the edge information at the $j$-th scale. The directions of maximal and minimal change are given by the eigenvectors $v_{2^j}^+$ and $v_{2^j}^-$ of the $2\times 2$ matrices; the corresponding eigenvalues $\lambda_{2^j}^+$ and $\lambda_{2^j}^-$ denote the rates of change. The eigenvectors and eigenvalues describe an ellipse in the image plane, where the longest axis denotes the direction of the largest gradient at scale $j$ and the shortest axis the variance of the gradient at scale $j$ around that direction. For a greylevel image, one obtains
$$\lambda_{2^j}^+ = \left(D_{2^j}^1\right)^2 + \left(D_{2^j}^2\right)^2 = 2^{2j}\,\|\nabla(I * \theta_{2^j})\|^2, \qquad v_{2^j}^+ = \frac{\nabla(I * \theta_{2^j})}{\|\nabla(I * \theta_{2^j})\|} \qquad (8)$$

i.e. the first eigenvector denotes the direction of the gradient of the $j$-th scale smoothed image, while its corresponding eigenvalue denotes its length. Remark that:
$$D_{2^j}^1(x,y) = \sqrt{\lambda_{2^j}^+}\; v_{2^j,x}^+(x,y) \quad \text{and} \quad D_{2^j}^2(x,y) = \sqrt{\lambda_{2^j}^+}\; v_{2^j,y}^+(x,y) \qquad (9)$$

i.e. the original representation is obtained by projecting the first eigenvector, multiplied by the square root of the corresponding eigenvalue, onto the $x$- and $y$-axes.
In multimodal images, the edge information is contained in both eigenvalues. In this paper, the conjecture is made that the eigenvectors and eigenvalues of the multiscale fundamental forms describe the edge information of a multimodal image in a multiresolution way. The same problem as in the single-scale case occurs: the matrix diagonalization does not uniquely specify the signs of the eigenvectors. This phenomenon translates in the multimodal image problem as arbitrariness of the gradient's orientation. This orientation reflects on the sign of the detail coefficients, which can flip incoherently from one pixel to another. Therefore, the orientation must be determined before a reconstruction can be calculated. Instead of following the proposal of [1], we propose the following simpler solution to this problem. The orientation of the gradient is approximated by the orientation of the gradient of the average of all bands: the average of the bands is calculated and wavelet transformed. The scalar product of the obtained detail coefficients $\bar{D}_{2^j}^1$ and $\bar{D}_{2^j}^2$ with the first eigenvectors then determines the signs: if $\bar{D}_{2^j}^1 v_{2^j,x}^+ + \bar{D}_{2^j}^2 v_{2^j,y}^+ \geq 0$, the sign of the eigenvectors is not changed; if the scalar product is negative, the signs of $v_{2^j,x}^+$ and $v_{2^j,y}^+$ are flipped. The sign of $v_{2^j}^-$ is chosen so that the angle of its direction is $\pi/2$ more than the angle of $v_{2^j}^+$.
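Putting (7)-(9) and the sign rule together, a minimal sketch of the multiscale fundamental forms could look as follows. It reuses dyadic_detail_images() from the previous sketch; the closed-form 2x2 eigendecomposition and the tolerance handling are implementation choices made here, not prescribed by the paper.

```python
# Sketch of the multiscale fundamental forms (7)-(9) with the Section 4 sign rule.
import numpy as np

def multiscale_fundamental_forms(details_per_band):
    """details_per_band: one list of (D1, D2) pairs per band, as returned by
    dyadic_detail_images(). Returns (lam_plus, lam_minus, vx, vy) per scale,
    with the first eigenvector oriented as described in the text."""
    n_scales = len(details_per_band[0])
    forms = []
    for j in range(n_scales):
        g11 = sum(d[j][0] * d[j][0] for d in details_per_band)  # G^11_{2^j}
        g12 = sum(d[j][0] * d[j][1] for d in details_per_band)  # G^12_{2^j}
        g22 = sum(d[j][1] * d[j][1] for d in details_per_band)  # G^22_{2^j}
        tr, det = g11 + g22, g11 * g22 - g12 * g12
        disc = np.sqrt(np.maximum(tr * tr / 4.0 - det, 0.0))
        lam_plus, lam_minus = tr / 2.0 + disc, tr / 2.0 - disc
        vx, vy = g12, lam_plus - g11      # eigenvector for lam_plus (up to sign)
        norm = np.hypot(vx, vy)
        vx = np.where(norm > 1e-12, vx / np.maximum(norm, 1e-12), 1.0)
        vy = np.where(norm > 1e-12, vy / np.maximum(norm, 1e-12), 0.0)
        # Sign rule: by linearity, the details of the band average equal the
        # average of the bands' details; flip where the scalar product < 0.
        d1_avg = np.mean([d[j][0] for d in details_per_band], axis=0)
        d2_avg = np.mean([d[j][1] for d in details_per_band], axis=0)
        flip = d1_avg * vx + d2_avg * vy < 0
        vx, vy = np.where(flip, -vx, vx), np.where(flip, -vy, vy)
        forms.append((lam_plus, lam_minus, vx, vy))
    return forms
```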
5. Applications and experiments

In this section, several applications of the proposed multimodal image wavelet representation (MIWR) are described. In principle, all greylevel processing techniques that are based on gradient or multiscale contrast can be extended towards multimodal images using the MIWR. In this paper, two applications are discussed, either because they are obvious choices or because they were previously described in the literature. For comparison, single-valued techniques, i.e. where the processing is performed on each band separately, or single-scale techniques, i.e. where the first fundamental form is used, are applied.

Multispectral image fusion

Image fusion is the process of combining several, perfectly registered images into one greylevel image. This technique is applied to multispectral satellite imagery [10, 3] as well as to biomedical multimodal imagery [9], with the purpose of visualisation and of reducing the complexity for classification and segmentation tasks. In a multiresolution approach, the wavelet representations of all bands are to be combined into one greylevel image wavelet representation. In [4], for example, the detail coefficients of the different bands are compared, and for each pixel the largest one is chosen to represent the fused image.

Using the proposed MIWR, a single greylevel wavelet representation can be constructed in the following way, ignoring the second eigenvalues. A low-resolution image is obtained by averaging the low-resolution images of the original bands: $\bar{L}_{2^d} = \frac{1}{N}\sum_{n=1}^{N} L_{n,2^d}$. The obtained representation is then given by $\bar{L}_{2^d}$ and $\{D_{2^j}^{i,+}\}_{i=1,2;\, j=1,\dots,d}$, with $D_{2^j}^{i,+} = \sqrt{\lambda_{2^j}^+}\, v_{2^j,i}^+$ as in (9). Reconstruction generates a greylevel image that contains the fused edge information of the different bands (a code sketch of this fusion rule is given at the end of this section).

To demonstrate the proposed fusion technique, the following experiment is conducted. As a test image, remote sensing data is used: a Thematic Mapper image from the Huntsville area, Alabama, USA, containing 7 bands of 512x512 images from the U.S. Landsat series of satellites. The first four bands are fused into one greylevel image. In figure (1), the result is shown: on the left, the proposed technique is applied; on the right, the wavelet fusion technique of [4]. For the latter, the same redundant wavelet representation is applied to every band, and for each pixel position and at each scale, the largest detail coefficient is taken to be the detail coefficient of the fused image: $\tilde{D}_{2^j}^i(x,y) = \max_n D_{n,2^j}^i(x,y)$. One can observe an improved overall contrast using the proposed technique. This effect can be attributed to the superior description of the edge information in the MIWR.

Colour image noise filtering

In this section we will restrict ourselves to adaptive filtering techniques, based on anisotropic diffusion [6]. In [7], an anisotropic diffusion technique was described for colour images, based on the first fundamental form. Here, a multiscale version of colour anisotropic diffusion is constructed, based on the multiscale fundamental forms.

Let us first describe the single-scale version. From a colour image, the first fundamental form is calculated using (2). The eigenvectors and corresponding eigenvalues of the first fundamental form describe an ellipse in the image plane. Since the first eigenvector is directed along the gradient, it will be directed across an edge; the second eigenvector will then be directed along the edge. Anisotropic diffusion is based on the idea of smoothing an image preferably along edges while trying to keep high-frequency information across the edges. Using the first fundamental form, a locally adapted Gaussian smoothing kernel is constructed (see [11] for more information):
$$G(\mathbf{r}) = \exp\left(-\frac{1}{2}\left(\frac{(\mathbf{r}\cdot v^+)^2}{\sigma_1^2} + \frac{(\mathbf{r}\cdot v^-)^2}{\sigma_2^2}\right)\right) \qquad (10)$$
with standard deviations given by:
$$\sigma_1 = \frac{\sigma}{1 + C\lambda^+} \quad \text{and} \quad \sigma_2 = \frac{\sigma}{1 + C\lambda^-} \qquad (11)$$
where $\sigma$ is the standard deviation of the image noise and $C$ a measure for the corner strength. The advantage of this smoothing kernel is that it is more extended along and less extended across the edges. The algorithm was originally designed for greylevel images, where instead of the first fundamental form (2), a quadratic form was used in which the sum was taken over a local window around each pixel [8]. The obtained Gaussian kernel was then convolved with the image. In the case of colour images, a Gaussian kernel could be calculated for each band separately. In [7], however, it is argued that it is better to apply the same anisotropic diffusion process to all three bands, by using the first fundamental form. The single-scale colour anisotropic diffusion algorithm then looks as follows. From a colour image, the first fundamental form is calculated using (2). From this, a Gaussian kernel is calculated using (10) and (11). Each of the three bands is then convolved with this kernel.

How can this algorithm be extended to a multiscale one? Using the multiscale fundamental forms of (7), the obtained eigenvectors and eigenvalues describe the directions and rates of the gradient of the smoothed image $(I * \theta_{2^j})(x,y)$. One can now construct a Gaussian kernel for each scale that is adapted to the edge information at that particular scale. To apply the algorithm as proposed in the previous paragraph, the smoothed images should be convolved with their corresponding kernels. One has, however, no direct access to the smoothed images, but only to their derivatives (i.e. the detail images). Since taking the derivative and convolving with a Gaussian are interchangeable, one can as well convolve the detail images with the corresponding kernels. After reconstruction, a noise-filtered image is obtained. The algorithm then looks as follows:

- For each band, calculate its dyadic wavelet representation.
- For all scales, calculate the multiscale fundamental forms, using (7).
- For each scale, calculate the Gaussian kernels, using (10) and (11).
- At each scale, convolve the obtained Gaussian kernel with the detail images of every band separately.
- Reconstruct each noise-filtered band separately.

Figure 1. Fusion of the first 4 bands of a Landsat image; a: using the proposed technique; b: using wavelet maxima fusion.
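The per-scale adaptive smoothing step can be sketched as below. This is a brute-force illustration: a small anisotropic Gaussian is built at every pixel, using (10) and (11) as reconstructed above; sigma_noise, C and the window radius r are illustrative values, and the final reconstruction from the filtered detail images, which follows [5], is omitted.

```python
# Sketch of one per-scale adaptive smoothing step (eqs. (10)-(11) as
# reconstructed above); an illustration, not the author's implementation.
import numpy as np

def adaptive_smooth(detail, lam_plus, lam_minus, vx, vy,
                    sigma_noise=20.0, C=1e-4, r=3):
    """Smooth one detail image with a per-pixel anisotropic Gaussian that is
    narrow across the edge (v+) and wide along it (v-)."""
    H, W = detail.shape
    out = np.empty_like(detail, dtype=float)
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]            # local window offsets
    for i in range(H):
        for k in range(W):
            s1 = sigma_noise / (1.0 + C * lam_plus[i, k])   # across the edge
            s2 = sigma_noise / (1.0 + C * lam_minus[i, k])  # along the edge
            u = xs * vx[i, k] + ys * vy[i, k]        # r . v+
            w = -xs * vy[i, k] + ys * vx[i, k]       # r . v-, with v- = (-vy, vx)
            g = np.exp(-0.5 * ((u / s1) ** 2 + (w / s2) ** 2))
            y0, y1 = max(i - r, 0), min(i + r + 1, H)
            x0, x1 = max(k - r, 0), min(k + r + 1, W)
            gy, gx = y0 - (i - r), x0 - (k - r)      # clip kernel at the borders
            gg = g[gy:gy + (y1 - y0), gx:gx + (x1 - x0)]
            out[i, k] = (detail[y0:y1, x0:x1] * gg).sum() / gg.sum()
    return out
```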
To demonstrate the technique, the following experiment is conducted. The original RGB image peppers is corrupted with Gaussian white noise with a standard deviation of 20 (PSNR = 22.4 dB). The proposed algorithm is applied, leading to a noise-filtered image with a PSNR of 25.5 dB. For comparison, the single-scale algorithm, using the first fundamental form, is applied, leading to a PSNR of 24.1 dB. In figure (2), the results (intensity only) are displayed. One can clearly observe an improved noise reduction, while keeping the edge information, using the proposed technique.

Figure 2. Anisotropic diffusion; a: original peppers image; b: Gaussian noise added ($\sigma$ = 20); c: filtering using the first fundamental form; d: filtering using the multiscale fundamental forms.
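Finally, the fusion rule announced earlier in this section can be sketched by reusing the helpers above. The fused detail images are $\sqrt{\lambda^+}\,v^+$ per scale, and the approximation is the smoothed band average; the reconstruction step of [5] is again not shown, and all parameter names are assumptions of this sketch.

```python
# Sketch of the MIWR fusion rule of Section 5, reusing the helpers above;
# an illustrative reading of eq. (9) applied to the multiscale forms.
import numpy as np
from scipy.ndimage import gaussian_filter

def miwr_fuse(image, depth=3):
    """image: (N, H, W) multiband array -> (approximation, fused detail pairs)."""
    details_per_band = [dyadic_detail_images(b, depth) for b in image.astype(float)]
    fused = []
    for lam_plus, lam_minus, vx, vy in multiscale_fundamental_forms(details_per_band):
        amp = np.sqrt(np.maximum(lam_plus, 0.0))
        fused.append((amp * vx, amp * vy))   # D^{1,+}_{2^j}, D^{2,+}_{2^j}
    # Smoothed band average stands in for the averaged low-resolution L_{2^d}.
    approx = gaussian_filter(image.mean(axis=0), 2.0 ** depth)
    return approx, fused
```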
References

[1] A. Cumani. Edge detection in multispectral images. CVGIP: Graphical Models and Image Processing, 53:40-51, 1991.
[2] S. Di Zenzo. A note on the gradient of a multi-image. Computer Vision, Graphics, and Image Processing, 33:116-125, 1986.
[3] C. Lee and D. Landgrebe. Analyzing high-dimensional multispectral data. IEEE Trans. Geosci. Remote Sensing, 31(4):388-400, 1993.
[4] H. Li, B. Manjunath, and S. Mitra. Multisensor image fusion using the wavelet transform. Graphical Models and Image Processing, 57(3):235-245, 1995.
[5] S. Mallat and S. Zhong. Characterization of signals from multiscale edges. IEEE Trans. Pattern Anal. Machine Intell., 14:710-732, 1992.
[6] P. Perona and J. Malik. Scale-space and edge detection using anisotropic diffusion. IEEE Trans. Pattern Anal. Machine Intell., 12(7):629-639, 1990.
[7] G. Sapiro and D. Ringach. Anisotropic diffusion of multivalued images with applications to color filtering. IEEE Trans. Image Processing, 5(11):1582-1586, 1996.
[8] J. Sijbers, P. Scheunders, M. Verhoye, A. Van der Linden, D. Van Dyck, and E. Raman. Watershed-based segmentation of 3D MR data for volume quantization. Magnetic Resonance Imaging, 15(6):679-688, 1997.
[9] T. Taxt and A. Lundervold. Multispectral analysis of the brain using magnetic resonance imaging. IEEE Trans. Med. Imaging, 13(3):470-481, 1994.
[10] I. Thomas, V. Benning, and N. Ching. Classification of remotely sensed images. Adam Hilger, Bristol, 1987.
[11] G. Yang, P. Burger, D. Firmin, and S. Underwood. Structure adaptive anisotropic filtering for magnetic resonance image enhancement. Proceedings of CAIP, pages 384-391, 1995.