Fuzzy Statistical Classification Method For Multiband Image Fusion
M. Germain², M. Voorons¹, J.M. Boucher², G.B. Benie¹
¹ Centre d'Applications et de Recherche en Télédétection, Université de Sherbrooke, Québec, Canada
² École Nationale Supérieure des Télécommunications de Bretagne, Brest, France
ISIF © 2002

Abstract: We propose a new fusion algorithm based on the Dempster-Shafer theory of evidence. The main contribution of this method is a new distribution of mass functions. The distributions generally used are the consonant and the partially consonant mass distributions. The originality of this work is to define uncertain and inaccurate data by using a fuzzy statistical classification algorithm, FSEM (Fuzzy Stochastic Expectation Maximization). Applied to multiband image fusion, the method improves classification results compared with standard mass distributions.

Keywords: Evidential reasoning, Dempster-Shafer combination, mass function distribution, multiband image, classification algorithm, fuzzy statistical method, image fusion

1 Introduction
Multiband image fusion has been widely studied since remote sensing data were first exploited in the 1970s. Initially, only simple combinations of different images were used, compared on a pixel-by-pixel basis, as in the image differencing or image regression techniques. In recent years, however, image processing research has focused on merging several images in order to increase information content and to improve decision making. Such fusion has been widely used in military applications [11] as well as in civilian applications with LANDSAT or SPOT sensors. Traditional data fusion methods such as Maximum Likelihood (ML) in Bayesian theory rely on a multivariate Gaussian probability density function (pdf) to model the data set statistically. However, this kind of model fails when the number of images increases, and the quality of the training data becomes critical. Another widely used approach is the fuzzy model, in which the data set is characterized by degrees of membership [12]. The main drawback of these two approaches is their inability to represent the two key concepts of any data fusion process: inaccuracy and uncertainty [2]. This is why we prefer to use the Dempster-Shafer (DS) theory of evidence. Unlike Bayesian and fuzzy theory, DS theory can represent both inaccuracy and uncertainty through belief and plausibility functions, which are derived from a mass function. Moreover, because evidence can be assigned to any subset of the hypothesis set (the frame of discernment), the resulting model is more expressive.
The problem is to define a distribution of mass functions that represents a degree of reliability for single or compound hypotheses. The most commonly used distributions are probabilistic [10], but inaccuracy between compound hypotheses is better represented by a fuzzy distribution approach [1]. Nevertheless, the latter is not well adapted to strong noise and uncertain data.
The purpose of this paper is to define uncertain and inaccurate data using a fuzzy statistical classification algorithm, Fuzzy Stochastic Expectation Maximization (FSEM) [3]. The aim of this work is twofold. First, we generalize the FSEM algorithm to an arbitrary set of classes, whereas only the two-class case has been studied in the past. Second, we assign a coherent pdf to each mass function in the DS theory. With the proposed fusion method, classification of multiband remote sensing images is improved compared with evidential reasoning using standard mass distribution functions.
This paper is structured as follows. Section 2 gives a brief introduction to DS theory. Section 3 deals with the different ways to initialize the mass distribution functions. Section 4 describes a new classification method based on the fuzzy SEM algorithm, and in section 5 we apply this new algorithm to initialize the
mass distribution functions. Finally, section 6 gives an example of classification of a multiband remote sensing image.

2 Dempster-Shafer theory of evidence
The DS mathematical theory of evidence was first introduced by Dempster [5] and formalized by Shafer [8]. Unlike Bayesian and fuzzy methods, DS theory represents both inaccuracy and uncertainty through the definition of specific functions.

2.1 Basic principles
Let $\Theta$ denote the hypothesis set, called the frame of discernment, which represents a set of mutually exclusive and exhaustive propositions. DS theory allows us to consider any subset of $\Theta$, so we denote by $2^{\Theta}$ the set of all subsets of $\Theta$. This means that not only single propositions (also called singletons) but also any union of propositions can be used. Evidence on a subset $A \in 2^{\Theta}$ is represented by a basic probability assignment (bpa), also called the value of the mass function $m$ assigned to $A$: $m(A) \geq 0$. The function $m$ is defined on every element of $2^{\Theta}$ as follows:

$$m : 2^{\Theta} \to [0, 1] \tag{1}$$

$$\sum_{A \in 2^{\Theta}} m(A) = 1 \tag{2}$$

$$m(\emptyset) = 0 \tag{3}$$

From a mass distribution, inaccuracy and uncertainty can be represented by two functions: the belief function Bel(A) and the plausibility function Pl(A). The belief function gives the amount of evidence implied by the observation of $A$:

$$\mathrm{Bel}(A) = \sum_{B \subseteq A} m(B) \tag{4}$$

The plausibility function can be seen as the amount of evidence that does not refute $A$:

$$\mathrm{Pl}(A) = \sum_{B \cap A \neq \emptyset} m(B) \tag{5}$$

$$\mathrm{Pl}(A) = 1 - \mathrm{Bel}(\bar{A}) \tag{6}$$

where $\bar{A}$ is the complement of $A$ in $\Theta$. The belief and plausibility functions represent, respectively, the minimum and the maximum uncertainty value about $A$. The interval $[\mathrm{Bel}(A), \mathrm{Pl}(A)]$ is called the belief interval, and its length measures the inaccuracy about the uncertainty value.

2.2 Evidence combination
The greatest advantage of DS theory is its robustness in combining information coming from various sources with the DS orthogonal rule. For instance, let $m_1$ and $m_2$ be two mass distributions from two sources. For every nonempty $A \subseteq \Theta$, their DS combination is given by the following orthogonal rule:

$$(m_1 \oplus m_2)(A) = \frac{\sum_{B_1 \cap B_2 = A} m_1(B_1)\, m_2(B_2)}{1 - K}, \qquad K \neq 1 \tag{7}$$

$$K = \sum_{B_1 \cap B_2 = \emptyset} m_1(B_1)\, m_2(B_2) \tag{8}$$

K is considered as a normalization factor and is interpreted as a measure of conflict between the sources. It is the mass that the unnormalized combination would assign to the empty set. Thus, the larger K is, the more the sources conflict and the less meaningful their combination is. If K = 1, the sources are totally contradictory and rule (7) is undefined.

2.3 Decision making
There are many ways to decide which hypothesis, among single propositions and unions of propositions, is the most reliable, such as the maximum of belief, the maximum of plausibility, or compromises [e.g. max(Bel(A) + Pl(A))]. In this paper, we use the most common decision rule, i.e. the maximum of belief rule:

$$D = \arg\max_{A \in 2^{\Theta}} m(A) \tag{9}$$

where D is the decision taken among all propositions.

3 Mass function definition
As mentioned in the introduction, the most crucial step in applying DS theory is the determination of the mass functions. In practical applications, the mass distribution directly determines the efficiency of the fusion algorithms and, more precisely, of the segmentation processes.
Several methods have been described in the literature. Mass values can be defined from standard statistical methods [10]. Another possibility is a more intuitive approach based either on the estimation of an attenuation factor or on the distance to class centers. Other methods are based on fuzzy logic [1].

3.1 Statistical approach
The most widely used mass functions are derived from probabilities. For example, Gaussian conditional densities can be used to initialize the masses of single and compound hypotheses. The main mathematical framework
was described by Shafer [8] through the consonant distribution and generalized by Walley [10] with the definition of the partially consonant belief. To determine the mass functions, let Bel(A) be a belief function that is partially consonant on a partition $V = \{V_1, V_2, \ldots, V_\kappa\}$ of the frame of discernment $\Theta$. This belief function has the following representation:
$$\mathrm{Bel}(A) = C_P \sum_{k=1}^{\kappa} \Big[ \max_{\theta_i \in V_k} P(x/\theta_i) - \max_{\theta_i \in \bar{A} \cap V_k} P(x/\theta_i) \Big], \qquad \forall A \subseteq \Theta \tag{10}$$

with

$$C_P = \Big\{ \sum_{k=1}^{\kappa} \max_{\theta_i \in V_k} P(x/\theta_i) \Big\}^{-1} \tag{11}$$

where a maximum over an empty set is taken to be 0. If $\kappa = 1$, the partially consonant belief becomes the consonant belief defined by Shafer [8]:

$$\mathrm{Bel}(A) = 1 - \frac{\max_{\theta_i \in \bar{A}} P(x/\theta_i)}{\max_{\theta_i \in \Theta} P(x/\theta_i)}, \qquad \forall A \subseteq \Theta \tag{12}$$

Conversely, a partition with $\kappa = |\Theta|$ gives a dissonant belief defined by a set of Bayesian mass functions:

$$\mathrm{Bel}(A) = \mathrm{Bel}(\theta_i) = C_P\, P(x/\theta_i), \qquad A = \{\theta_i\} \tag{13}$$

3.2 Fuzzy approach
Unlike the statistical approach, a fuzzy approach seems better suited to taking inaccurate values into account, and therefore to improving the mass distribution assigned to compound hypotheses. A few works have attempted to incorporate fuzzy measures in a DS framework. Bentabet et al. [1] propose a method to determine the mass functions automatically in the context of image segmentation with fuzzy C-means (FCM) clustering. Similarly, Verikas et al. [9] apply evidence theory to a classification process using the concept of fuzzy logic.
Fuzzy set theory was introduced by Zadeh [12]. It provides a mathematical framework that models the human ability to make decisions from inaccurate, incomplete and not totally reliable information. For instance, in remote sensing image processing, the boundaries of real fields are not well defined and the transitions are smooth.
The main characteristic that distinguishes the fuzzy approach from the statistical one is the use of a degree of membership instead of a probability value. This membership function can be represented by $\mu_i(x) \in [0, 1]$, which determines the degree of membership of the element $x$ to the fuzzy set $\theta_i$.
With this theory, mass values can be assigned to each hypothesis from degrees of membership. Thus, if we define two hypotheses $\theta_1$ and $\theta_2$, the mass functions are determined from the membership functions as follows:

$$\tilde{m}(\theta_1) = \mu_1(x) \tag{14}$$

$$\tilde{m}(\theta_2) = \mu_2(x) \tag{15}$$

$$\tilde{m}(\theta_1 \cup \theta_2) = \Im\big(\mu_1(x), \mu_2(x)\big) \tag{16}$$

where $\tilde{m}(\cdot)$ is a non-normalized mass function and $\Im(\cdot)$ is a fuzzy characteristic estimation for compound hypotheses. Bentabet et al. [1] define this estimation by:

$$\Im\big(\mu_1(x), \mu_2(x)\big) = \frac{2\, \mu_1(x)\, \mu_2(x)}{\mu_1(x) + \mu_2(x)} \tag{17}$$

Finally, in practice, statistical and fuzzy approaches give complementary results. Indeed, probability models uncertainty and fuzziness models inaccuracy [7]. On the one hand, statistical approaches have the advantage of being well adapted to strong noise and to textured images [3]. On the other hand, fuzzy methods allow a better assignment of mixed pixels and therefore improve the classification of fuzzy structures. That is why, in the next sections, we study a new approach based on both statistical and fuzzy methods to determine a more efficient mass distribution.

4 Fuzzy statistical classification method
Let us consider the problem of segmenting a satellite image into two classes: "water" and "vegetation". Some pixels contain only vegetation or only water, but others, as in a boggy area, contain water and vegetation simultaneously. In the first case the pixel is called a pure pixel, and in the second case a mixed pixel.
Several algorithms address this problem. A general way to handle it is the fuzzy approach, in which each pixel has a grade of membership to every class. Conversely, classical statistical models force each pixel to be associated with exactly one class and cannot represent mixed classes.
To counter this drawback, some algorithms introduce a fuzzy version of statistical modeling. Thus, stochastic parameter estimation algorithms such as Expectation Maximization (EM), Stochastic Expectation Maximization (SEM) and Iterative Conditional Estimation (ICE) have been modified to take fuzzy data into account [4]. However, these studies only apply to the case of at most two classes.
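A small count makes the two-class limitation concrete. The Python sketch below is our own illustration and not part of any of the cited algorithms. It assumes that a mixed pixel is characterized only by the set of pure classes it combines, and counts those sets for K classes. If any mixture is allowed, there are $2^K - K - 1$ mixture types. If only mixtures of two pure classes are allowed, there are $K(K-1)/2$. The helper names `mixed_class_counts` and `admissible_labels` are hypothetical.

```python
from __future__ import annotations

from itertools import combinations
from math import comb


def mixed_class_counts(n_classes: int) -> tuple[int, int]:
    """Count the kinds of mixed pixel a K-class model must represent.

    Returns (unrestricted, pairwise):
      unrestricted: mixtures of any 2..K pure classes, i.e. 2**K - K - 1
      pairwise:     mixtures of exactly two pure classes, i.e. K*(K-1)/2
    """
    # All subsets of the K classes, minus the empty set and the K singletons.
    unrestricted = 2 ** n_classes - n_classes - 1
    # Unordered pairs of distinct classes.
    pairwise = comb(n_classes, 2)
    return unrestricted, pairwise


def admissible_labels(class_names: list[str]) -> list[tuple[str, ...]]:
    """Hypothetical helper: pure classes followed by the pairwise fuzzy
    classes that remain when a pixel may mix at most two pure classes."""
    pure = [(name,) for name in class_names]
    pairs = list(combinations(class_names, 2))
    return pure + pairs


if __name__ == "__main__":
    # Counts (unrestricted, pairwise):
    # K=2 (1, 1), K=3 (4, 3), K=5 (26, 10), K=8 (247, 28)
    for k in (2, 3, 5, 8):
        unrestricted, pairwise = mixed_class_counts(k)
        print(f"K={k}: {unrestricted} mixture types, {pairwise} if limited to pairs")
```

Take the three classes "water", "vegetation" and "trees" as an example. Here `admissible_labels` returns three pure classes and three pairwise fuzzy classes. The count reflects only the size of the label set. The actual cost of parameter estimation also depends on how each fuzzy class and its membership values are modeled.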
In fact, the computational complexity increases when the number of classes exceeds two, and such methods would undoubtedly be very time-consuming [3]. However, Salzenstein et al. [7] propose the hypothesis that each pixel belongs to at most two pure classes. This hypothesis is widely verified in practice, with fuzzy classes such as "water-vegetation", "trees-houses" and "agricultural-vegetation", whereas mixed classes such as "trees-houses-agricultural" are fairly unusual. In the following sections, we take this hypothesis into account and attempt to generalize the fuzzy statistical classification approach by using the SEM algorithm.

4.1 Principle of the fuzzy SEM algorithm
The Fuzzy SEM (FSEM) algorithm was introduced by Caillol et al. [3], and an overview of fuzzy statistical methods can be found in [4]. We propose to generalize the FSEM method to K classes by assuming that each pixel belongs to at most two pure classes. Let $S = \{1, \ldots, n\}$ be the set of pixels. We define the unobservable random field $X = (X_s)_{s \in S}$, which takes its values in a finite set of classes numbered from 1 to K. We denote by $Y = (Y_s)_{s \in S}$ the observed random field, which is a corrupted version of $X$. As stated previously, for each pixel $s$ the random variable $X_s$ can carry two types of information: hard information or fuzzy information. Caillol et al. give an example in [3] that considers only two pure classes, 1 and 2, and one fuzzy class $f_{12}$ taking its values in ]0,1[. Let $\delta_0$, $\delta_1$ be the Dirac weights at 0 and 1, and the Lebesgue measure on