Multisource Classification Using ICM and Dempster–Shafer Theory

IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 51, NO. 2, APRIL 2002


Samuel Foucher, Member, IEEE, Mickaël Germain, Student Member, IEEE, Jean-Marc Boucher, Member, IEEE, and Goze Bertin Bénié, Member, IEEE

Abstract—We propose to use evidential reasoning in order to relax Bayesian decisions given by a Markovian classification algorithm (ICM). The Dempster–Shafer rule of combination enables us to fuse decisions in a local spatial neighborhood, which we further extend to be multisource. This approach enables us to fuse information more directly. Application to the classification of very noisy images produces interesting results.

Index Terms—Data fusion, Dempster–Shafer theory, ICM algorithm, multisource classification, remote sensing.

I. INTRODUCTION

FOR the past few years, image processing research has focused on the problem of merging several images in order to increase information content. Image fusion can be done at different levels of representation: pixel level, feature level, or decision level. The present paper deals with the fusion of decisions (classes), commonly called multisource classification. Traditional methods, such as maximum likelihood (ML), are based on a multivariate Gaussian pdf employed to statistically model the data set. Whereas this is suitable for multispectral data, such a model fails when the sources of information are highly heterogeneous, e.g., a combination of radar and optical images. Moreover, the performance of ML methods decreases rapidly when the number of images increases, and the quality of the training becomes critical. In order to overcome these limits, fusion methods try to deal with the following issues: heterogeneity in the sources and in the representation format, large number of sources, imprecision in the data, non-Gaussian sources, etc.

Fusion methods can be categorized into two main approaches: the statistical approach, using a classical Bayesian framework, and methods using an Artificial Intelligence framework, such as possibility theory or Dempster–Shafer theory.

The aim of this article is twofold. First, we propose a modification of the multiscale iterated conditional mode (ICM) algorithm using a local relaxation of the Bayesian decision based on Dempster–Shafer theory. Second, we extend this approach to the multisource case. The final method produces interesting results on the classification of radar images and in the fusion of an optical (SPOT) image and a SAR (RADARSAT) image.

II. PRINCIPLE

A. ICM Classification

Markovian methods of classification try to estimate the MAP solution for the class field of the image. Annealing methods, such as the Gibbs sampler or the Metropolis algorithm, ensure convergence toward a global energy minimum, but the computational burden is high. Deterministic methods such as ICM are much faster but remain suboptimal, since finding a global minimum is not guaranteed [1]. The ICM method estimates a local MAP solution for the label by minimizing the sum of the local likelihood and Gibbs energies.

The image data from a sensor is assumed to consist of observation vectors, one per pixel of the image. The classification process estimates the class labels of the scene, the label of each pixel being chosen in the class set. The ICM algorithm [1] is one solution to this problem. It is based on maximizing the conditional probability of each label with respect to that label, given the observation at the pixel and the contextual labels, i.e., the current labels of the neighboring pixels. At each iteration of the algorithm, a plausible choice is the class label that maximizes this conditional probability, given the observation and the current class labels elsewhere. Convergence to a local maximum is fast, in contrast to global MAP algorithms such as simulated annealing.

B. Dempster–Shafer Theory Basics

Dempster–Shafer theory is a mathematical framework in which nonadditive probability models enable us to model imprecision in beliefs [6]. The hypothesis set $\Theta$, called the frame of discernment, is intended to represent a set of mutually exclusive and exhaustive propositions. In our problem of classification, the frame of discernment is the class set. Evidence on a subset $A \subseteq \Theta$ is represented with a basic probability assignment (bpa) $m(A)$. Subsets with a nonnull bpa are called focal elements and compose the kernel, and the bpa has the following properties:

$$m : 2^{\Theta} \rightarrow [0, 1] \tag{1}$$

$$m(\emptyset) = 0 \tag{2}$$

$$\sum_{A \subseteq \Theta} m(A) = 1 \tag{3}$$
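For the ICM update described in Section II-A, a minimal sketch may help fix ideas. It is an illustration under assumptions that are not taken from the paper: scalar gray-level observations, Gaussian class likelihoods with known means and a shared standard deviation, a Potts prior of strength `beta` on the 4-neighborhood, and a maximum-likelihood initialization. All function and parameter names are hypothetical.

```python
def local_energy(y, k, neighbors, means, sigma, beta):
    """Local energy of label k: likelihood energy plus Gibbs (Potts) energy."""
    # Negative Gaussian log-likelihood, up to an additive constant.
    likelihood = (y - means[k]) ** 2 / (2.0 * sigma ** 2)
    # One penalty of size beta per neighbor that disagrees with label k.
    gibbs = beta * sum(1 for n in neighbors if n != k)
    return likelihood + gibbs


def icm_sweep(obs, labels, means, sigma, beta):
    """One raster-order sweep; returns the number of labels that changed."""
    rows, cols = len(obs), len(obs[0])
    changed = 0
    for r in range(rows):
        for c in range(cols):
            neighbors = [labels[rr][cc]
                         for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                         if 0 <= rr < rows and 0 <= cc < cols]
            current = labels[r][c]
            best = current
            best_energy = local_energy(obs[r][c], current, neighbors, means, sigma, beta)
            for k in range(len(means)):
                energy = local_energy(obs[r][c], k, neighbors, means, sigma, beta)
                # Strict inequality: the current label is kept on ties.
                if energy < best_energy:
                    best, best_energy = k, energy
            if best != current:
                labels[r][c] = best
                changed += 1
    return changed


def icm(obs, means, sigma=1.0, beta=1.0, max_iter=10):
    """Run ICM from a maximum-likelihood start until no label changes."""
    labels = [[min(range(len(means)), key=lambda k: (y - means[k]) ** 2) for y in row]
              for row in obs]
    for _ in range(max_iter):
        if icm_sweep(obs, labels, means, sigma, beta) == 0:
            break
    return labels


# A 3 x 3 patch of class 0 with one outlier at the center.
patch = [[0.1, 0.0, 0.2],
         [0.0, 0.9, 0.1],
         [0.1, 0.0, 0.0]]
smoothed = icm(patch, means=[0.0, 1.0], sigma=0.5, beta=1.0)
```

With `beta = 0` the contextual term vanishes and the result is the pixelwise maximum-likelihood labeling; increasing `beta` gives more weight to the neighboring labels.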

Manuscript received December 12, 2001; revised January 21, 2002. S. Foucher and G. B. Bénié are with the Centre d’Applications et de Recherche en Télédétection, Université de Sherbrooke, Sherbrooke, QC, Canada. M. Germain and J.-M. Boucher are with the École Nationale Supérieure des Télécommunications de Bretagne, Brest, France. Publisher Item Identifier S 0018-9456(02)04315-2.

0018-9456/02$17.00 © 2002 IEEE

The belief function $\mathrm{Bel}$ gives the amount of evidence which implies the observation of $A$. With $m$ the bpa and $\Theta$ the frame of discernment, this function is defined on the frame of discernment by the relations

$$\mathrm{Bel}(A) = \sum_{B \subseteq A} m(B) \tag{4}$$

$$\mathrm{Bel}(\emptyset) = 0 \tag{5}$$

$$\mathrm{Bel}(\Theta) = 1 \tag{6}$$

The plausibility function $\mathrm{Pl}$ can be seen as the amount of evidence which does not refute $A$:

$$\mathrm{Pl}(A) = \sum_{B \cap A \neq \emptyset} m(B) \tag{7}$$

This function can be expressed in terms of the belief function, where $\bar{A}$ is the complement of $A$ in $\Theta$:

$$\mathrm{Pl}(A) = 1 - \mathrm{Bel}(\bar{A}) \tag{8}$$

Total ignorance is represented when $\Theta$ is the only focal element. On the contrary, when the focal elements are all singletons, we obtain a Bayesian representation where $\mathrm{Bel}$ and $\mathrm{Pl}$ are equal and equivalent to a probability measure on $\Theta$. When we observe the outcome of a statistical experiment, Shafer proposes an approach to assess the evidence provided by the statistical observation [4], [6].

The Dempster rule combines pieces of evidence from two independent sources with bpa's $m_1$ and $m_2$. If the conflict

$$K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C) \tag{9}$$

is strictly less than 1, the combined bpa is

$$m(\emptyset) = 0 \tag{10}$$

$$m(A) = \frac{1}{1 - K} \sum_{B \cap C = A} m_1(B)\, m_2(C), \qquad A \neq \emptyset \tag{11}$$

The Dempster rule is difficult to apply when kernels have nonsingleton focal elements. The different mass representations are a way to reduce complexity by imposing a structure on the kernel. These representations can be described by three distributions (Fig. 1). A mathematical framework has been described by Shafer [6] for the consonant distribution. This theory has been generalized by Walley [7] with the definition of the partially consonant belief: a belief function is "partially consonant" on a partition of the frame of discernment if it is "consonant" on each element of the partition. This belief function has the representation given in (12) and (13). We can notice that, with a partition reduced to a single element, the partially consonant belief becomes a consonant belief as defined by Shafer (14). Conversely, a partition into singletons shows a dissonant belief defined by a set of Bayesian masses (15).

Fig. 1. (a) Consonant distribution, (b) partially consonant distribution, and (c) dissonant distribution.

In Section II-C, we give closed algebraic formulations for these three bpa distributions (one consonant, one partially consonant, and one dissonant).

C. Local Relaxation of the Bayesian Decision With the Dempster Rule

In order to relax the first decision made by one ICM iteration, we establish the following hypothesis: in a 3 × 3 neighborhood, the labels around the central pixel are elements of evidence that determine our belief in the value of the label of the central pixel.

1) Elementary Mass Distributions and Local Combination Rule: Following one iteration of the ICM algorithm, the labels attached to the pixels can be arranged in decreasing order of their probability value. Different types of mass distributions are used, reflecting different ways to distribute our belief on the frame of discernment.

a) Consonant distribution: The elementary masses of evidence are determined by the results of the previous Bayesian decision in the ICM iteration. The choice of the elementary mass distribution on the frame of discernment is critical because it models our primary knowledge. Following Shafer [6] and Denoeux [3], we choose a consonant way to distribute elementary belief, as given in (16).
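The definitions of Section II-B (bpa properties, belief, plausibility, and the Dempster rule) can be sketched in a few lines. This is a generic illustration of the textbook definitions, not the authors' implementation: subsets of the frame are represented as `frozenset` objects, a bpa as a dictionary from subsets to masses, and the function names and example masses are hypothetical.

```python
from itertools import product


def is_valid_bpa(m, frame, tol=1e-9):
    """Check the bpa properties: masses in [0, 1] on subsets of the frame,
    no mass on the empty set, and a total mass of 1."""
    return (all(a <= frame and 0.0 <= v <= 1.0 for a, v in m.items())
            and m.get(frozenset(), 0.0) == 0.0
            and abs(sum(m.values()) - 1.0) <= tol)


def belief(m, a):
    """Bel(A): total mass of the focal elements included in A."""
    return sum(v for b, v in m.items() if b <= a)


def plausibility(m, a):
    """Pl(A): total mass of the focal elements that intersect A."""
    return sum(v for b, v in m.items() if b & a)


def dempster(m1, m2):
    """Dempster rule for two independent bpa's; raises on total conflict."""
    combined, conflict = {}, 0.0
    for (b, v1), (c, v2) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            combined[inter] = combined.get(inter, 0.0) + v1 * v2
        else:
            conflict += v1 * v2  # mass that would fall on the empty set
    if conflict >= 1.0 - 1e-12:
        raise ValueError("total conflict: the combination is undefined")
    return {a: v / (1.0 - conflict) for a, v in combined.items()}


frame = frozenset({"w1", "w2", "w3"})
# Consonant bpa: the focal elements are nested.
m1 = {frozenset({"w1"}): 0.6, frozenset({"w1", "w2"}): 0.3, frame: 0.1}
# Simple support bpa: one singleton plus the whole frame.
m2 = {frozenset({"w2"}): 0.5, frame: 0.5}
m12 = dempster(m1, m2)
```

In this example the conflict is 0.3, so the surviving products are renormalized by 0.7. The generic rule enumerates all pairs of focal elements, which is why imposing a structure on the kernel, as discussed above, matters when many sites are combined.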

The local combination using the Dempster rule is given by (17).

In the neighborhood, the evidence supporting a given class can be split into two sets of sites, distinguished by their focal elements (the focal elements of the two sets intersect only in one subset). This approach enables us to identify possible factorizations in the relation (17), and the mass combinations in the two sets can be done separately, as expressed in (18).

The second term is trivial, and the first one can be simplified by observing the relation (19), which expresses the sum of all the combinations of products of binary terms over a set of indexes. Applying it, we obtain the first term of the relation (18), given in (20). Using relations (18) and (19), we obtain the nonnormalized masses (21) and (22). The normalization constant is given in (23).

In Sections II-C1b and c, the elementary mass and the Dempster rule are both written without normalization.

b) Partially consonant distribution: The focal elements and their mass distribution are given in (24). The Dempster product is simplified in the same way as in the consonant case by regrouping pixels according to their focal sets when they have a nonnull intersection. Consequently, the nonnormalized Dempster product is given by (25). Terms in this relation can be simplified using the relation (20), which gives (26). We obtain two simple algebraic relations (27).

c) Dissonant distribution: The focal elements and their mass distribution are given in (28). In the same way as for the partially consonant distribution, we obtain two simple algebraic relations (27).

2) Decision Rule: There are many ways to decide, the most straightforward being the maximum of belief rule

$$\hat{\omega} = \arg\max_{\omega_i} \mathrm{Bel}(\{\omega_i\}) \tag{29}$$

where the maximum is taken over the classes $\omega_i$ of the frame of discernment. When the evidence is fully contradictory, we have a total conflict between the decisions in the neighborhood, and the rule of combination is no longer defined. When such a conflict occurs, we propose to take the decision in the neighborhood which has the best confidence level (30).
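The local relaxation and the decision rule can be illustrated with a closed-form combination. The sketch below does not reproduce the mass distributions (16), (24), or (28); it assumes instead that each site of the neighborhood contributes a simple support bpa, with mass `conf` on its own label and `1 - conf` on the whole frame, in the spirit of Denoeux [3]. Under that assumption the Dempster product factorizes by grouping the sites that share a label. The function name, labels, and confidence values are hypothetical.

```python
from math import prod


def relax_center_label(site_labels, site_conf, classes, eps=1e-12):
    """Fuse the decisions of a neighborhood and decide by maximum belief.

    Returns (label, beliefs); beliefs is None when the conflict is total
    and the fallback rule (most confident site) is used.
    """
    pairs = list(zip(site_labels, site_conf))
    # Mass left on the whole frame: every site abstains.
    ignorance = prod(1.0 - a for _, a in pairs)
    unnormalized = {}
    for w in classes:
        # At least one supporter of w commits to w ...
        support = 1.0 - prod(1.0 - a for l, a in pairs if l == w)
        # ... while every site holding another label abstains.
        others_abstain = prod(1.0 - a for l, a in pairs if l != w)
        unnormalized[w] = support * others_abstain
    norm = ignorance + sum(unnormalized.values())
    if norm <= eps:
        # Total conflict: keep the decision with the best confidence level.
        best = max(range(len(site_labels)), key=lambda i: site_conf[i])
        return site_labels[best], None
    beliefs = {w: v / norm for w, v in unnormalized.items()}
    return max(classes, key=lambda w: beliefs[w]), beliefs


label, beliefs = relax_center_label(
    ["forest", "forest", "soil"], [0.6, 0.5, 0.7], ["forest", "soil", "water"])
```

Here two moderately confident "forest" sites outweigh one more confident "soil" site. Because all focal elements other than the frame are singletons, the belief of a class equals its normalized mass, so the maximum of belief rule reduces to picking the largest entry.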

Fig. 2. (a) Noisy image, (b) truth, (c) ICM method without Dempster–Shafer, (d) proposed method with consonant distribution, (e) proposed method with partially consonant distribution, and (f) proposed method with dissonant distribution.

D. Extension to the Multisource Case

We consider a set of images. Each information source has a particular class set. The fusion process at the decision level aims to focus the decisions from the sources on an information set of interest. As a result, we obtain the multisource classification on the information set.

1) Decision Space Mapping: The projection of each source class set onto the information set is obtained from a priori knowledge by defining a matrix in which each entry is our belief that a source class contributes to an information class. Consequently, for a site, the bpa's in the information set are calculated from the source bpa's using this prior belief, as given in (31) and (32).

After projection, information fusion is realized when the bpa's are combined in the multisource neighborhood (33). As in Section II-C, in case of conflict, we take the decision in the neighborhood which has the maximum bpa.

III. RESULTS

A. Algorithm Implementation

In order to compare the different mass distributions, we use an artificial Gibbs field image; initialization is performed with the SEM algorithm.

B. Classification of Artificial Noisy Image

The proposed algorithm is applied to a 256 × 256 artificial image (Fig. 2) corrupted by simulated Gaussian noise. The ground truth contains four classes [Fig. 2(b)]. A simple ICM algorithm gives 89.9% correct classification [Fig. 2(c)]. We obtained good results with Dempster–Shafer (around 93% correct classification), even though the strongly noisy image was not filtered beforehand. The consonant and partially consonant distributions give similar results (91.2% and 93.2% correct classification, respectively), whereas the dissonant distribution gives better results (94.7%).

C. Fusion of Optical and Radar Images

SPOT images are very sensitive to vegetation cover density. Dense vegetation appears dark whereas bare soils are bright [Fig. 3(a)]. RADARSAT images give reliable information about lateritic soil, which appears dark on the image, whereas dense vegetation is very bright. The proposed method is used with a projection matrix built from this prior knowledge.
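To illustrate how a projection matrix of the kind described in Section II-D might act on source bpa's, here is a small sketch. It is a guess at the general mechanism, not a reproduction of relations (31) and (32): it assumes that the mass of each source class is shared among the information classes in proportion to the prior beliefs, that each row of the matrix sums to 1, and that the mass on the whole source frame (key `"*"`) stays on the whole information frame. The class names and matrix values are hypothetical placeholders, not the ones used in the experiment.

```python
def project_bpa(source_bpa, projection, info_classes):
    """Map masses on source classes to masses on information classes.

    source_bpa: dict from source class (or "*" for the whole frame) to mass.
    projection: dict of dicts, projection[source_class][info_class] being the
                prior belief that the source class contributes to that
                information class (rows assumed to sum to 1).
    """
    projected = {c: 0.0 for c in info_classes}
    # Ignorance in the source remains ignorance in the information set.
    projected["*"] = source_bpa.get("*", 0.0)
    for source_class, mass in source_bpa.items():
        if source_class == "*":
            continue
        for info_class, prior in projection[source_class].items():
            projected[info_class] += prior * mass
    return projected


# Hypothetical radar source with two classes, projected on two information classes.
radar_projection = {
    "dark": {"bare_soil": 0.9, "vegetation": 0.1},
    "bright": {"bare_soil": 0.2, "vegetation": 0.8},
}
radar_bpa = {"dark": 0.7, "bright": 0.2, "*": 0.1}
info_bpa = project_bpa(radar_bpa, radar_projection, ["bare_soil", "vegetation"])
```

Since each row sums to 1, the projected masses still sum to 1, so the result can be combined directly with the projected bpa's of the other sources in the multisource neighborhood.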

In this case, we use the multiresolution ICM algorithm proposed by Boucher [2]. In the classification into five classes, we preserved information from the radar (white), whereas information from both sensors is used to determine the vegetation classes (black: dense vegetation, dark gray: average cover, gray: low cover, light gray: bare soil). Despite the very noisy radar image, we are able to extract information without filtering. For comparison, we give the classification results obtained by a clustering algorithm (c-means) and by an unsupervised method of fusion using Dempster–Shafer theory [5].

Fig. 3. (a) SPOT image, (b) radar image, (c) multisource classification, (d) truth, (e) c-means, and (f) UDS method [5].

IV. CONCLUSION

The proposed approach takes into account local imprecision in a previous Bayesian classification in order to initiate a second decision based on the Dempster–Shafer rule of combination. We have extended this local relaxation to incorporate multisource information. Results show robustness to noise: on the artificial image, correct classification rises from 89.9% with ICM alone to 94.7% with the dissonant distribution.

REFERENCES

[1] J. Besag, “On the statistical analysis of dirty pictures,” J. R. Statist. Soc. B, vol. 48, no. 3, pp. 259–302, 1986.
[2] J. M. Boucher, G. B. Bénié, R. Fau, and S. Plehiers, “Local and global multiscale image classification,” Proc. SPIE, vol. 2303, pp. 485–493, 1994.
[3] T. Denoeux, “A k-nearest neighbor classification rule based on Dempster–Shafer theory,” IEEE Trans. Syst., Man, Cybern., vol. 25, pp. 805–813, May 1995.
[4] H. Kim and P. H. Swain, “Evidential reasoning approach to multisource-data classification,” IEEE Trans. Geosci. Remote Sensing, vol. 25, pp. 1257–1265, Oct. 1995.
[5] S. Le Hégarat-Mascle, I. Bloch, and D. Vidal-Madjar, “Application of Dempster–Shafer evidence theory to unsupervised classification in multisource remote sensing,” IEEE Trans. Geosci. Remote Sensing, vol. 35, pp. 1018–1031, Apr. 1997.
[6] G. Shafer, A Mathematical Theory of Evidence. Princeton, NJ: Princeton Univ. Press, 1976.
[7] P. Walley, “Belief function representation of statistical evidence,” Ann. Statist., vol. 15, no. 4, pp. 1439–1465, 1987.

Samuel Foucher (M’02) was born in Nantes, France, in 1969. He received the B.S. degree in physics from the University of Nantes in 1989, the telecommunication engineering degree from the Ecole Nationale Supérieure des Télécommunications de Bretagne, Brest, France, the M.S. degree in image processing from the University of Rennes, Rennes, France, in December 1996, and the Ph.D. degree in radar filtering and segmentation in September 2001.

Mickaël Germain (S’01) was born in Bressuire, France, in 1974. He received the telecommunication engineering degree from the Ecole Nationale Supérieure des Télécommunications de Bretagne, Brest, France, and the M.S. degree in image processing from the University of Rennes, Rennes, France, in November 1998. He is currently pursuing the Ph.D. degree at the University of Sherbrooke, Sherbrooke, QC, Canada. His research interests include multispectral image fusion and segmentation.

Jean-Marc Boucher (M’83) was born in 1952. He received the engineering degree in telecommunications from the Ecole Nationale Supérieure des Télécommunications, Paris, France, in 1975, and the Habilitation à Diriger des Recherches degree in 1995 from the University of Rennes 1, Rennes, France. He is currently Professor with the Department of Signal and Communications, Ecole Nationale Supérieure des Télécommunications de Bretagne, Brest, France, where he is also Education Deputy Director. His current research interests include estimation theory, Markov models and Gibbs fields, blind deconvolution, wavelets and multiscale image analysis with applications to radar and sonar image filtering and classification, multisensor seismic signal deconvolution, electrocardiographic signal processing, and speech coding. He has published 100 technical articles in these areas in international journals and conferences.

Goze Bertin Bénié (M’01) was born in Daloa, Ivory Coast. From 1977 to 1987, he received the B.A.Sc. degree in surveying and the M.Sc. and the Ph.D. degrees in photogrammetry and remote sensing from Universite Laval, Sainte-Foy, QC, Canada. He was a Postdoctoral Fellow at the Canada Centre for Remote Sensing, Digim, Inc., Lavalin, Montreal, QC, and at Intera Information Technologies, Inc., Calgary, AB, Canada, from 1987 to 1990. In 1990, he joined the Department of Geography and Remote Sensing and the Centre d’applications et de recherches en télédétection (CARTEL) of the Université de Sherbrooke, Sherbrooke, QC, Canada, as an Assistant Professor. He was the head of CARTEL from 1995 to 2000. He is currently Full Professor in image processing and geomatics. His research interests include image filtering, segmentation and classification methodology, and spatial modeling in GIS.