Multimed Tools Appl (2007) 35:1–27 DOI 10.1007/s11042-007-0115-x
Collaborative multi-step mono-level multi-strategy classification Pierre Gançarski · Cédric Wemmert
Published online: 27 June 2007 © Springer Science + Business Media, LLC 2007
Abstract This article deals with the description of a new way to learn from multiple and heterogeneous data sets, and with the integration of this method in a multiagent hybrid learning system. This system integrates different kinds of unsupervised classification methods and gives as its result a set of clusterings, together with a unifying result that represents all the others. In this new approach, the method occurrences compare their results and automatically refine them to try to make them converge towards a unique clustering that unifies all the results. Thus, the data are not actually merged, but the results of their classification are compared and refined according to the results obtained from all the other data sets. This makes it possible to produce a set of classification hierarchies whose classes are very similar, although these hierarchies were extracted from different data sets. It is then easy to build a unifying result from all of them. Keywords Complex data · Collaborative clustering · Classification combining · Per-pixel image analysis
1 Introduction The goal of clustering is to identify subsets called clusters (or classes) of the data where a cluster usually corresponds to objects that are more similar to each other than they are to objects from other clusters. Clustering is carried out in an
P. Gançarski (B) · C. Wemmert LSIIT–AFD, UMR 7005 CNRS/ULP, University Louis Pasteur, Strasbourg, France e-mail:
[email protected] C. Wemmert e-mail:
[email protected]
unsupervised way by trying to find these subsets without having a predefined notion of the cluster: it operates only from the intrinsic properties of the objects. A complete panorama of existing unsupervised classification methods is given in [10, 16]. However, on the one hand, when the data come from different media, the information contained in these sets is too complex and heterogeneous to give a unique representation of the data structure. On the other hand, there exist many methods of automatic knowledge extraction, based on very different techniques. Previous work in our laboratory on neural networks [13], statistical methods [20] or conceptual learning [19] has shown their efficiency in learning problems such as remote sensing image classification. However, each method has some limitations (imposed number of classes, initial choice of the centers of the classes, depth of the conceptual hierarchy, . . . ). A relatively recent approach can be used. It is based on the idea that the information offered by different sources and different classifiers about objects is complementary [17], and thus that the combination of different classification methods may increase their efficiency and accuracy. A single classification is produced from the results of methods having different points of view: all individual classifier opinions are used to derive a consensual decision, and each decision can be processed from a different source or medium. There are many different ways to combine multiple classifiers depending on the representational methodology. In [1, 17], the authors divide them into:
– Multi-expert approaches (Fig. 1a) like boosting [3, 6, 22, 24], bagging [5, 23] or stacked generalization [25, 26, 30]: the different classification methods work in parallel and all give their classifications; the final classification is computed by a separate combiner;
– Multi-step approaches (Fig. 1b) like cascading [2, 9]: the methods work in a serial way; each method is trained or consulted only for the patterns rejected by the previous classifier(s).
Fig. 1 Combining classifiers: a multi-expert classification; b multi-step classification
In both cases, the approach is described as:
– Multi-strategy, if different types of algorithms are used (neural and statistical for example) at the same time [4];
– Multi-representational, if the algorithms use different data or different points of view on the data [8].
In our work, we focus on unsupervised classification. In this case, we believe that the combination of classifiers should make it possible to define a classification method that decreases the importance of the initial choices. Secondly, it should also overcome some of the limitations of the methods by using the complementarity of the different classification methods used. For example, some classifiers only propose a partitioning of the data space, whereas others give a hierarchy of classes or concepts as a result. So it could be interesting to automatically produce a hierarchy of classes with the partitioning methods according to the results computed by the hierarchical methods. Many techniques for combining supervised classifiers exist. Unfortunately, it is hard to apply the same schemes to the unsupervised case. First, these techniques are often monostrategical. Secondly, the fusion of decisions is harder because there is no direct correspondence between the clusters found by the different classifiers. And finally, few of these methods are able to use different representations of the data. Nevertheless, we believe that these traditional approaches for combining classifiers can be used and improved if the methods collaborate during the entire classification process, each method using a different strategy and/or representation of the data. Thus, we propose a new method including three significant aspects:
– The collaborative multi-strategy aspect, with a classification method based on an automatic and mutual refinement of several classification results (Fig. 2);
– The multi-step aspect, with a method using the results of one step to initialise the methods of the next step (e.g. using cluster centres as initial seeds for the Kmeans algorithm, or using some statistics (means μ, standard deviations σ, predictivity, . . . ) to estimate the "best" depth and the "best" acuity for the Cobweb algorithm);
– The multi-source aspect, with the ability of each method to use a different data file.
In this paper, we first highlight the collaborative multi-strategy aspect. Secondly, we present the multi-step approach and a proposition to integrate the multi-source aspect into our work. Finally, we validate the multi-step approach with a pixel-based classification application.
Fig. 2 Collaborative classification
2 Collaborative multi-strategy classification
To compute an unsupervised multi-strategy classification, we need to solve two main problems:
– The integration of different types of unsupervised classification methods (neural networks, partitioning methods, concept formation algorithms. . . );
– The combination of very different types of result, such as a set of classes or a hierarchy of concepts.
So it is really difficult to compute, from this heterogeneous set of classification results, a unique result representing all the knowledge given by all the various methods. Indeed, to combine P classifications {R^p} using a voting method for example, it is necessary to define a correspondence function, associating each class C_k^i of a classification R^i with one class of the classification R^j, for each couple (R^i, R^j). To carry out this operation in an optimal way, this function should be bijective, as in supervised approaches. In the case of unsupervised classification, the results may not have the same number of classes and we do not have any information about the correspondence between the different classes of the different results. We propose to perform a pretreatment based on a collaborative process [29] which consists in an automatic and mutual refinement of the classification results: each classification method produces a result that is built according to the results of all the other methods. These refinements are performed to make the results of the various classifications converge: they should have almost the same number of classes, and all these classes should be statistically similar. Thus we obtain very similar results for all the method occurrences, together with links between them representing the correspondences between their classes. In many cases, it is then possible to define a bijective correspondence function and to apply a unifying technique, such as our new voting method [27]. Moreover, we constrained ourselves not to modify the classification algorithms, but to integrate them just as they are. So any unsupervised classification method can be added without being modified and, moreover, without modifying the structure and the behavior of our approach. For example, we have integrated about ten methods (Kmeans, Cobweb, EM, SOM, Weighting Kmeans, ...) in classifx, our multi-step multi-strategy system.1 The entire classification process is presented in Fig. 3. It is decomposed into three main phases:
1. First, a phase of initial classifications is performed: classifications are computed by each method with its own parameters.
2. An iterative phase of convergence of the results, which corresponds to alternations between two steps, as long as the convergence and the quality of the results improve:
   2.1 A step of evaluation of the similarity between the results and of mapping of the classes;
   2.2 A step of refinement of the results.
3. A phase of combination of the refined results.
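The three phases can be summarized by the following control-flow sketch. It is only a structural illustration, not the actual system implementation: every callable passed to it (the individual clustering methods, the conflict detection, the refinement, the agreement and the unification routines) is a hypothetical placeholder for the mechanisms detailed in the rest of this section.

```python
# Structural sketch of the collaborative process (phases 1, 2 and 3).
# All callables are placeholders for the mechanisms described in Sections 2.1-2.2.
def collaborative_clustering(methods, data, detect_conflicts, refine,
                             global_agreement, unify, max_iter=100):
    # Phase 1: initial classifications, one result per method occurrence.
    results = [run(data) for run in methods]

    best = list(results)
    best_gamma = global_agreement(results)
    for _ in range(max_iter):
        # Phase 2.1: evaluate similarities between results and map the classes.
        conflicts = detect_conflicts(results)
        if not conflicts:
            break                           # the methods agree: stop refining
        # Phase 2.2: local conflict resolutions, then global management.
        results = refine(results, conflicts)
        gamma = global_agreement(results)
        if gamma >= best_gamma:
            best, best_gamma = list(results), gamma

    # Phase 3: combination of the refined results (e.g. a voting method).
    return unify(best)
```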
1 http://geodm.u-strasbg.fr
Fig. 3 Collaborative multi-strategy classification process (Step 1: initial classifications by methods 1..m; Step 2.1: results evaluation; Step 2.2: results refining, with conflict detection, conflict choice, local resolution and local modifications management; Step 3: results unification into the unified result)
The mechanism we propose for refining the results is based on the concept of distributed local resolution of conflicts, by the iteration of four phases:
2.2.1 Detection of the conflicts by evaluating the dissimilarities between couples of results;
2.2.2 Choice of the conflicts to solve;
2.2.3 Local resolution of these conflicts (concerning the two results implied in the conflict);
2.2.4 Management of these local modifications in the global result (if they are relevant).
The process is reiterated as long as the quality of the unified result increases. The initial classifications are carried out in the usual way and do not raise particular problems. The convergence of the results and their combination require a more complex analysis. In this article, we present in particular the convergence phase and then how the results are combined. In the sequel, all the steps described are illustrated on an example in Section 2.3.
2.1 Convergence of the results (Step 2)
In order to apply an algorithm of combination of classification results in an optimal way, the correspondence function between two results should be bijective, like in the case of the combination of supervised classifications, where the classes are all identified by the same label in each result. That is, it should associate with each class C_k^i of a result only one class C_{k'}^j of another result and, reciprocally, the class corresponding to C_{k'}^j in the first result should be the class C_k^i. In the case of unsupervised classification, the results may not have the same number of classes and we do not have any information about the correspondence between the different classes of the different results. This is why it is first necessary to define a method to evaluate the similarity between results and to match the classes.
2.1.1 Evaluation and mapping (Step 2.1)
A class of the result R^j is the corresponding class of the class C_k^i of the result R^i if it is the most similar to C_k^i. To put in correspondence the classes of two distinct results (R^i, R^j), we have to estimate, for a class of the result R^i, how similar it is to the classes of the result R^j. Classically, the concept of similarity can be estimated by the calculation of a distance between the classes of the various results. A class of the result R^j will be considered as the corresponding class of the class C_k^i of the result R^i if it is the closest to C_k^i according to the selected distance. However, this definition implies that there exists a distance between the objects of the classes, which is not always the case, in particular if they are described by symbolic or more complex attributes. We therefore define the relation of correspondence and similarity between classes without using a distance measure between objects (see Section 2.3.1 for an example). For a class C_k^i in R^i:
– Its corresponding class in R^j is the class C_{k_m}^j which contains the most objects in common with C_k^i, i.e. the class such that |C_k^i ∩ C_{k_m}^j| is maximal.
– The similarity between a class C_k^i and its corresponding class is defined by

ω_k^{i,j} = ρ_k^{i,j} · α_{k_m,k}^{j,i}

It is evaluated by observing:
– The distribution of the class C_k^i in the classes of R^j:

ρ_k^{i,j} = \sum_{l=1}^{n_j} (α_{k,l}^{i,j})^2,   where   α_{k,l}^{i,j} = |C_k^i ∩ C_l^j| / |C_k^i|

– The proportion of objects of C_k^i in C_{k_m}^j:

α_{k_m,k}^{j,i} = |C_{k_m}^j ∩ C_k^i| / |C_{k_m}^j|
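As an illustration, the coefficients α, ρ and ω above can be computed directly from two label vectors defined over the same objects. The following is a minimal sketch, assuming each result is an integer label array; it is not the classifx implementation.

```python
import numpy as np

def similarity(labels_i, labels_j, k):
    """Similarity ω_k^{i,j} of class k of result R^i with respect to result R^j.

    labels_i, labels_j: integer class labels of the same objects in R^i and R^j.
    """
    in_k = (labels_i == k)                      # objects of C_k^i
    size_k = in_k.sum()
    classes_j = np.unique(labels_j)

    # alpha_{k,l}^{i,j}: proportion of C_k^i falling in each class C_l^j
    alpha = np.array([(in_k & (labels_j == l)).sum() / size_k for l in classes_j])

    rho = np.sum(alpha ** 2)                    # distribution coefficient ρ_k^{i,j}
    km = classes_j[np.argmax(alpha)]            # corresponding class C_{k_m}^j
    # alpha_{k_m,k}^{j,i}: proportion of C_{k_m}^j covered by C_k^i
    alpha_back = (in_k & (labels_j == km)).sum() / (labels_j == km).sum()
    return rho * alpha_back                     # ω_k^{i,j}
```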
Moreover, during the same classification process, it can also be interesting to use various similarity measures according to the types of methods used to produce the results.
If one uses, for example, two occurrences of the same algorithm using a distance measure (iterative partitioning for example) and another method (e.g. a concept formation type algorithm) which does not use a distance to compute the classification, it could be relevant to use the similarity by distance to evaluate the correspondences between the classes of the first two methods, and the similarity by recovery to evaluate the correspondences between the classes of the last method and those of the first two. We can now describe the refining mechanism we have defined.
2.1.2 Results refinement (Step 2.2)
The mechanism we propose for refining the results is based on the concept of distributed local resolution of conflicts. During a phase of refinement of the results, several local resolutions are performed in parallel. The detection of the conflicts (Step 2.2.1) consists, for all the classes C_k^i of R^i, in seeking all the couples (C_k^i, R^j) such that C_k^i differs from its corresponding class C_{k_m}^j in R^j: a conflict importance coefficient is calculated according to the interclass similarity between the two classes. Then a conflict is selected (Step 2.2.2) according to the conflict importance coefficient and its resolution is started (Step 2.2.3). This conflict and all those concerning the two associated methods are removed from the list of conflicts. This process is reiterated as long as the list of conflicts is not empty. Then each modification proposed by the refinement step is evaluated and possibly taken into account in the global result according to the improvement of the global agreement coefficient (Step 2.2.4).
Conflicts detection A class C_k^i of a result R^i is in conflict with its corresponding class C_{k_m}^j of the result R^j if the two classes are not identical. The conflict importance coefficient is calculated according to the interclass similarity between the two classes by τ(C_k^i, C_{k_m}^j) = 1 − ω_k^{i,j}. There is a conflict if τ(C_k^i, C_{k_m}^j) ≠ 0. See Section 2.3.2 for an example.
Choice of the conflicts to solve During a phase of refinement of the results, several local resolutions are performed in parallel, using one of several heuristics for choosing the conflict to be solved, according to the conflict importance coefficient:
– The most important conflict is solved first;
– The conflict of average importance is solved first;
– The least important conflict is solved first;
– Random choice.
The choice of the heuristic depends on the application. Experiments show that:
– With the first heuristic (most important first) the results can diverge (the disturbances induced by the resolutions of conflicts are too significant) but they converge quickly;
– With the third heuristic (least important first) the results converge too slowly.
In fact, for all our applications, we used the first heuristic.
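One possible reading of the detection and choice steps in code form is sketched below; it reuses the similarity() helper shown earlier and applies the "most important first" heuristic. The data layout and names are assumptions for illustration, not the system's code.

```python
import numpy as np

def detect_and_order_conflicts(results):
    """results: list of integer label arrays, one per method occurrence.

    Returns conflicts (i, j, k, importance) sorted with the most important
    first, using the similarity() function defined in the previous sketch.
    """
    conflicts = []
    for i, labels_i in enumerate(results):
        for j, labels_j in enumerate(results):
            if i == j:
                continue
            for k in np.unique(labels_i):
                omega = similarity(labels_i, labels_j, k)   # ω_k^{i,j}
                tau = 1.0 - omega                           # conflict importance
                if tau > 0:
                    conflicts.append((i, j, int(k), tau))
    # "most important conflict solved first"
    return sorted(conflicts, key=lambda c: c[3], reverse=True)
```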
Local resolution of a conflict The resolution of a conflict consists in applying an operator to R^i and an operator to R^j. These operators are chosen according to the classes C_k^i and C_{k'}^j involved in the conflict:
– Merging of classes: the classes to be merged are chosen according to the representative classes of C_k^i, where its representative classes compared to the result R^j are the set of classes from R^j which have more than pcr % of their objects included in C_k^i (pcr is given by the user);
– Splitting of a class into subclasses: all the objects of C_k^i are classified into subclasses;
– Reclassification of a group of objects: C_k^i is removed and its objects are reclassified in all the other existing classes.
But the simultaneous application of operators on R^i and on R^j is not always relevant. Indeed, it does not always increase the similarity of the results implied in the conflict treated (Red Queen effect: success on one side is felt by the other side as a failure to which it must respond in order to maintain its chances of survival [21]), and the iteration of conflict resolutions may lead to a trivial solution where all the methods are in agreement: a result with only one class including all the objects to classify, or a result having one class for each object. So we defined the local concordance and quality rate, which estimates the similarity and the quality for a couple of results by

γ^{i,j} = \frac{1}{2} \left( \frac{\sum_{k=1}^{n_i} (p_s · ω_k^{i,j} + p_q · δ_k^i)}{n_i} + \frac{\sum_{k=1}^{n_j} (p_s · ω_k^{j,i} + p_q · δ_k^j)}{n_j} \right)

where p_s + p_q = 1 and δ_k^i is the class quality criterion chosen by the user, 0 < δ_k^i ≤ 1 (0: worst quality, 1: best quality). For example, with methods including a distance measure, the user can select the intra-class inertia as quality criterion. Without a distance measure, he can use the class predictivity (Cobweb), or the class probability density (EM algorithm), . . . At the end of each conflict resolution, after the application of the operators, the couple of results (the two new results, the two old results, or one new result with one old result) which maximizes this rate is kept (see Section 2.3.2).
Global management of the local modifications After the resolution of all the conflicts, a global application of the modifications proposed by the refinement step is decided according to the improvement of the global agreement coefficient:

Γ = \frac{1}{m} \sum_{i=1}^{m} Γ^i   where   Γ^i = \frac{1}{m-1} \sum_{j=1, j≠i}^{m} γ^{i,j}

is the global concordance and quality rate of the result R^i with all the other results.
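The two rates can be computed directly once the ω and δ values are available. The sketch below assumes each result is summarized by its per-class ω_k^{i,j} similarities and δ_k^i quality values; it is an illustration under those assumptions, not the system's code.

```python
import numpy as np

def local_rate(omega_ij, delta_i, omega_ji, delta_j, p_s=0.5, p_q=0.5):
    """Local concordance and quality rate γ^{i,j} (p_s + p_q = 1)."""
    term_i = np.mean(p_s * np.asarray(omega_ij) + p_q * np.asarray(delta_i))
    term_j = np.mean(p_s * np.asarray(omega_ji) + p_q * np.asarray(delta_j))
    return 0.5 * (term_i + term_j)

def global_agreement(gamma):
    """Global agreement coefficient Γ from the matrix gamma[i][j] = γ^{i,j}."""
    gamma = np.asarray(gamma, dtype=float)
    m = gamma.shape[0]
    # Γ^i: mean of γ^{i,j} over the other results, then Γ: mean over all i.
    gamma_i = (gamma.sum(axis=1) - np.diag(gamma)) / (m - 1)
    return gamma_i.mean(), gamma_i
```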
Then a new iteration of the convergence step is started if the global agreement coefficient has increased, and an intermediate unified result is calculated by combining all the results. Even if the local modifications decrease this global agreement coefficient, the solution is accepted, in order to avoid falling into a local maximum. If the coefficient decreases too much, all the results are reinitialized to the best temporary solution (the one with the best global agreement coefficient). Finally, once the methods cannot solve their conflicts anymore, all the results have roughly the same number of classes and these classes are very similar. All the results can then be combined together.
2.2 Combination of the results (Step 3)
All the results tend to have the same number of classes, which are increasingly similar. There are two cases:
– It is possible to define a bijective correspondence function: it is then possible to apply a unifying technique such as a classical voting algorithm;
– It is not possible: we have defined a new voting method to perform this combination of the results [27].
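For the first case, where a bijective correspondence exists and the class labels of all refined results can therefore be aligned, a classical voting combination can be sketched as follows. Through the per-object agreement ratio it also anticipates the two notions introduced next (relevant classes and nonconsensual objects). This is a simplified illustration, not the multi-view voting method of [27].

```python
import numpy as np

def vote(aligned_results, majority=0.5):
    """aligned_results: list of label arrays whose classes already correspond.

    Returns the majority-vote labels, the agreement ratio per object and a
    boolean mask of nonconsensual objects (no label reaches a majority).
    """
    stacked = np.stack(aligned_results)              # shape (m, n_objects)
    m, n = stacked.shape
    labels = np.empty(n, dtype=int)
    agreement = np.empty(n)
    for o in range(n):
        values, counts = np.unique(stacked[:, o], return_counts=True)
        best = np.argmax(counts)
        labels[o] = values[best]
        agreement[o] = counts[best] / m              # ratio of agreeing results
    nonconsensual = agreement <= majority
    return labels, agreement, nonconsensual
```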
Relevant classes and nonconsensual objects It is possible to define two new concepts: relevant classes and nonconsensual objects. The relevant classes correspond to the groups of objects of a same class which were classified in an identical way in a majority of the results. Moreover, we can quantify this relevance by using the ratio of classifications that are in agreement. These classes are interesting to highlight because they are, in the majority of cases, the relevant classes for the user. Reciprocally, a nonconsensual object is an object that has not been classified identically in a majority of results, i.e. that does not belong to any of the relevant classes. These objects often correspond to the edges of the classes in the data space (for example, in remote sensing image classification they may correspond to mixed pixels).
2.3 A synthetic example of local conflict resolution
Suppose that the two results are the ones shown in Fig. 4.
Fig. 4 Two results at the beginning of local conflict resolution: a result R1 (classes C_1^1 . . . C_4^1), b result R2 (classes C_1^2 . . . C_6^2)
2.3.1 Evaluation and mapping
To match the classes of R1 (Fig. 4a) with the classes of R2 (Fig. 4b), we calculate (Fig. 5):
– The confusion matrices R1 to R2 and R2 to R1;
– The repartition coefficients ρ_k^{1,2} of the classes C_k^1 in the classes of R2 (and, in the same way, all ρ_k^{2,1});
– The coefficients ω_k^{1,2} for all the classes of R1.
Then we can associate each class of R1 to one class of R2 and, reciprocally, each class of R2 to one class of R1.
2.3.2 Results refinement
Figure 5 shows the values of the conflict importance coefficient. Now, we suppose that it is the conflict between C_3^1 and its corresponding class C_5^2 which is to be resolved. With pcr = 0.5, the representative classes of C_3^1 are C_4^2 and C_6^2. Let NR1 and NR2 be the results after the application of the operators (Fig. 6). To evaluate the local concordance and quality rate of each couple (R, R') ∈ {R1, NR1} × {R2, NR2} (there are four possibilities), we calculate:
– The confusion matrix R to R';
– The repartition coefficients of all the classes of R in the classes of R' (and, in the same way, of all the classes of R' in the classes of R);
– The coefficients ω for all the classes of R and R' (Fig. 7).
Suppose that the class quality criterion δ_k^i is defined as follows:

δ_k^i = \frac{D_k^i}{K_i}   with   D_k^i = \frac{I_t − I(C_k^i)}{I_t}

where K_i is the number of classes of R^i, I_t is the total inertia and I(C_k^i) is the inertia of C_k^i.
Fig. 5 Comparison between the two results (a, b)
Fig. 6 A conflict resolution: new results, a new result NR1 (classes NC_1^1 . . . NC_5^1), b new result NR2 (classes NC_1^2 . . . NC_5^2)
Figure 8 shows the (estimated) values of the local concordance and quality rate with p_s = p_q = 0.5. Then we compute the local concordance and quality rate for each couple of results (Fig. 9). The results kept after this local conflict resolution are NR1 and R2.
3 Multi-step process
The multi-step aspect can be studied from two points of view:
– Mono-level multi-step process;
– Multi-level multi-step process.
In the first case, the data are the same at all the steps of the knowledge discovery process: the methods work in a serial way (Fig. 1b). At a given step, a method works using results from previous steps: patterns rejected by the previous classifier(s) (cascading [2, 9]), qualities of previous clusters, etc. In the second case, before each step of classification, the (new) data model as well as the data themselves are built. This construction depends on the application domain. For example in remote sensing, the first level of abstraction is the "pixel" level and the second level could be the "area" level. To be able to classify areas, it is necessary to describe the area model (for example, each area is associated with a histogram of the classes of the pixels composing it, its size. . . ) but also to build the values of the areas.
Fig. 7 Comparison between the old results and the new results (a–d)
Fig. 8 A conflict resolution: concordance and quality (a–d)
Consequently, the classification operations, i.e. all the operators necessary to classify the objects, must be defined according to the data representation. In [28] we have shown the relevance of this approach in the domain of automatic extraction of urban zones from remote sensing images. In this paper, we only highlight the mono-level multi-step aspect. Our approach is different from classical approaches and is based on the gradual refinement of the parameters: after a multi-strategy clustering, the unified result is used to define a new initialization of the individual methods. For example, we have defined this computation for methods based on K initial seeds, like K-means type algorithms, and for conceptual clustering methods, like the Cobweb algorithm.
Kmeans-like algorithms initialization The algorithm used for determining the new parameters is straightforward:
– First, small classes are removed according to a cardinality threshold chosen by the user;
– K is set to the number of classes in the unified result without the small classes;
– The K new seeds are the centers of the unified classes.
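A direct translation of this re-initialization is sketched below, assuming the unified result is available as a label array over the objects; the function and parameter names are illustrative, not those of the system.

```python
import numpy as np

def kmeans_reinit(data, unified_labels, min_size=10):
    """New (K, seeds) for a K-means-like method from the unified result.

    data: (n_objects, n_attributes) array; unified_labels: unified class labels.
    min_size: cardinality threshold under which a class is considered 'small'.
    """
    classes, counts = np.unique(unified_labels, return_counts=True)
    kept = classes[counts >= min_size]          # remove the small classes
    # the K new seeds are the centers of the remaining unified classes
    seeds = np.array([data[unified_labels == c].mean(axis=0) for c in kept])
    return len(kept), seeds
```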
Fig. 9 Local concordance and quality rate
Cobweb algorithm initialization In the Cobweb algorithm:
– If the estimated standard deviation is 0, an infinite value is produced for the real-valued category utility function. To overcome this potential problem, Cobweb allows us to set the acuity setting, which is the minimum value for the standard deviation;
– If almost all the objects are different, which is almost always true in the case of continuous attributes, the predictivity of the concepts is equal to 1 only for classes reduced to a single object, so the produced hierarchy is very deep. To overcome this potential problem, Cobweb allows us to set two settings:
   – The cutoff setting, which is the threshold for the predictivity of the concepts: over this value, a concept is not specialised and the node is cut off;
   – The maximum depth setting, which is the absolute maximum depth of the produced hierarchy;
– The order in which the objects are treated can greatly impact the clustering: sometimes, the algorithm places two instances that are very similar, and that appear as the first input instances, at opposite ends of the tree.
There is no known direct relation between the cutoff, acuity and maximum depth settings and the number and quality of the classes obtained. Generally, users determine the "good" values by a trial-and-error approach. Reciprocally, we did not find studies on how to modify the parameters according to the obtained result. Currently, we simply use the unified result:
– To determine a more adequate order in which to treat the objects: they are sorted by a round-robin algorithm according to the class found for them in the previous step;
– To calculate a new acuity value: first, we calculate the average of the leaf acuities; then, to increase (resp. decrease) the number of classes, we set the acuity setting to a lower (resp. higher) value.
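These two uses of the unified result can be sketched as follows: the round-robin ordering interleaves objects from the unified classes, and the new acuity is derived from the average standard deviation of the leaves, scaled down or up depending on whether more or fewer classes are wanted. The scaling factor and the names are assumptions for illustration only.

```python
import numpy as np

def round_robin_order(unified_labels):
    """Object ordering that interleaves the unified classes (round-robin)."""
    per_class = [np.flatnonzero(unified_labels == c)
                 for c in np.unique(unified_labels)]
    order = []
    for rank in range(max(len(idx) for idx in per_class)):
        for idx in per_class:
            if rank < len(idx):
                order.append(int(idx[rank]))
    return order

def new_acuity(leaf_stds, want_more_classes, factor=0.9):
    """New acuity from the average of the leaf standard deviations (acuities)."""
    base = float(np.mean(leaf_stds))
    # a lower acuity tends to produce more classes, a higher one fewer classes
    return base * factor if want_more_classes else base / factor
```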
4 Multi-sources data
The current system could use different types of classification methods, but they were always applied to the same data: all the learning agents worked with only one common data set. Nevertheless, it would be interesting to have a system able to combine different sources of data and to extract knowledge from them. The multi-sources problem can be described as follows. There exists one "real" object O that can be viewed from different points of view, and we want to find one description of this object according to all the different points of view (Fig. 10). Each view V^i of the object is represented by a data set D^i which is composed of many elements {E_1^i, . . . , E_{N_i}^i}, and each element is described by a set of attributes.
Fig. 10 Multi-sources data
Three different cases can happen (Fig. 11):
1. All the data sets D^i contain the same elements and they are described by the same attributes, but these attributes have different values in each data set: for example, two remote sensing images of a same region, from the same satellite, but at different seasons;
2. All the data sets D^i contain the same elements but they have different descriptions, i.e. different attributes: for example, two remote sensing images of a same region, having the same resolution, but from two different satellites with different sensors;
3. The data sets D^i contain different kinds of elements with different attributes: for example, two remote sensing images of a same region, but having different resolutions, and from two different satellites with different sensors.
4.1 A new approach for multi-description objects classification
A first "classical" method to classify multi-description objects is to simply build a new unique description for each object, by merging all the attributes, and then to classify the objects according to these new descriptions, as shown in Fig. 12a. But this technique may produce many classes because the description of the object is too precise, i.e. has too many attributes, so it is hard to discriminate the objects. A second way to combine all the attributes (Fig. 12b) is to first classify the objects within each data set. These classifications are processed independently. Then a new description of each object is built, using the classes found by the first classifications.
Fig. 11 Different representations of a same real object. First case: same resolution, same attributes (xs1, xs2, xs3), different values. Second case: same resolution, different attributes (xs1, xs2, xs3 vs. tm1, tm2, tm3, tm4), different values. Third case: different resolutions, different values.
Fig. 12 Data fusion techniques
Finally, a classification is produced using these new descriptions of the objects. The first phase of classifications makes it possible to reduce the data space for the final classification, making it easier. In our approach (Fig. 12c), we use a method based on the second one presented above. Each data set is classified according to its attributes. But these classifications are not made independently: they are refined to make them converge towards a unique result. They are unified only after this refinement, by a voting method or a classification as in the second method.
4.2 Integration of the multi-sources data approach in our system
To integrate this new approach of multi-sources data use in our system, we have defined a multi-agent system in which each learning agent A^i integrates a classification method and a result I^i which is an image. Each one can use a different data set, as shown in Fig. 13. Nevertheless, we are faced with the problem of the comparison of the different results, and more precisely of the estimation of the local similarity coefficient of a class. The whole results refinement process stays unchanged. In the two first cases presented above (same elements with different descriptions), the local similarity coefficient previously defined can be used. But in the third case (different elements with different descriptions), it cannot be applied, because it is based on the computation of a confusion matrix between two classes, which requires that the classes refer to the same objects. In general, such a coefficient is very hard, or even impossible, to define. In the next section we describe how this similarity could be evaluated in the domain of multi-scale remote sensing image classification.
4.2.1 Multiscale remote sensing images classification
In remote sensing image classification, the problem of the image resolution is not easy to solve. One can have different data sets on a same region (Spot data, Landsat data, . . . ) but not with the same resolution. So it is really difficult to use these different data sets to obtain a better result, because they do not include the same objects to be classified (Fig. 14).
Fig. 13 Integration of the data sets in the learning agents: each learning agent 1..m holds a classification method, its initial parameters and a result; in the multi-source setting, each agent also has its own data set
An easy solution is to rescale all the images to a unique size. Thus, if we have n images {I_i}_{i=1...n}, each with a different size w_i × h_i, it is easy to build n new images {I_i*}_{i=1...n} having the same size W × H with W = lcm(w_i) and H = lcm(h_i). Then the system can be used without any modification and without any loss of information. The problem is that remote sensing images are very large.
Fig. 14 Comparison of objects at different scales: classification results at different resolutions vs. the "real" world
(For example, for Strasbourg, a Spot image of 512 × 512 pixels represents 262,144 objects to be classified, each described by three attributes.) So if we used this technique, the data sets to classify would be too large. To use the first method presented above (merging of all the attributes), we would have to rescale all the images, and then the cost of the classification can be very high, because the number of objects is increased and the number of attributes of each one is large. The first classifications of the second technique could be made using different objects; the problem is then to produce the final classification, because all the results contain different objects. We have the same problem with our system. But a simple way to avoid it consists of a new definition of the confusion matrix between two results. Previously, each line of the confusion matrix was given by the confusion vector α_k^{i,j} of the class C_k^i from R^i compared to the n_j classes of R^j:

α_k^{i,j} = (α_{k,l}^{i,j})_{l=1,...,n_j},   where   α_{k,l}^{i,j} = |C_k^i ∩ C_l^j| / |C_k^i|
When two results have not been computed from the same data set and when the resolutions of the two images are not the same, it is impossible to compute |C_k^i ∩ C_l^j|. So we propose a new definition of the confusion vector for a class C_k^i from R^i compared to the result R^j. Let r_i and r_j be the resolutions of the two images I^i and I^j corresponding respectively to R^i and R^j. Let λ_{I^i,I^j} be a function that associates each pixel of the image I^i with one pixel of the image I^j, with r_i ≤ r_j (Fig. 15).
Two segmentations, S1 and S2, were achieved: the first (S1) with three instances of the Kmeans algorithm, with 6, 10 and 14 initial nodes; the second one (S2) with five instances of the Kmeans algorithm, with 4, 7, 10, 13 and 16 initial nodes. In both cases, the initial nodes were randomly chosen among the data. The different coefficients found by the system are presented below:

Segmentation S1:   n_i initial   Γ^i initial   n_i final   Γ^i final
M_1^1                   6            0.53           5          0.76
M_2^1                  10            0.48           5          0.80
M_3^1                  14            0.46           5          0.79
Γ = 0.78

Segmentation S2:   n_i initial   Γ^i initial   n_i final   Γ^i final
M_1^2                   4            0.48           4          0.74
M_2^2                   7            0.52           5          0.81
M_3^2                  10            0.50           5          0.81
M_4^2                  13            0.48           5          0.78
M_5^2                  16            0.44           5          0.82
Γ = 0.79
Fig. 15 Association function λ_{I^1,I^2}(p) between the images I^1 and I^2
Let #(C, I^i, I^j) = |{p ∈ C : cl(λ_{I^i,I^j}(p)) = cl(p)}|, where cl(p) is the class associated with the pixel p. Then we calculate each α_{k,l}^{i,j} as follows:

if r_i ≤ r_j then α_{k,l}^{i,j} = #(C_k^i, I^i, I^j) / |C_k^i|
else if #(C_l^j, I^j, I^i) × r_j < |C_k^i| × r_i then α_{k,l}^{i,j} = (#(C_l^j, I^j, I^i) × r_j) / (|C_k^i| × r_i)
else α_{k,l}^{i,j} = (|C_k^i| × r_i) / (#(C_l^j, I^j, I^i) × r_j)

Properties: 0 ≤ α_{k,l}^{i,j} ≤ 1 and 0 ≤ \sum_{l=1}^{n_j} α_{k,l}^{i,j} ≤ 1.
With this new definition of the confusion vector, the distribution coefficient ρ_k^{i,j} of the class C_k^i from the result R^i compared to the result R^j keeps all its properties. ρ_k^{i,j} characterizes the distribution expressed by the confusion vector of the class C_k^i compared to the result R^j: the closer the distribution coefficient is to 1, the more C_k^i is included in one single class from R^j; conversely, the smaller ρ_k^{i,j} is, the more the objects of C_k^i are scattered among the classes of R^j. So the results are evaluated by computing all the coefficients, but using this new definition of the confusion vector. In the same way, the conflicts resolution phase is unchanged.
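A possible implementation of this cross-resolution counting, for two per-pixel results stored as label rasters of different sizes, is sketched below. It assumes the two rasters cover exactly the same ground area, so the association function λ simply maps a pixel onto the pixel of the other raster that covers the same location, and it realizes the intersection counts by that mapping (finer-to-coarser case). It is an illustration under these assumptions, not the system's actual implementation.

```python
import numpy as np

def lam(shape_src, shape_dst, row, col):
    """λ: map pixel (row, col) of the source raster to the pixel of the
    destination raster covering the same location (same ground extent)."""
    r = int(row * shape_dst[0] / shape_src[0])
    c = int(col * shape_dst[1] / shape_src[1])
    return r, c

def confusion_vector(labels_i, labels_j, k):
    """α_k^{i,j}: proportion of the class k of R^i falling in each class of R^j,
    computed by mapping every pixel of I^i onto I^j."""
    classes_j = np.unique(labels_j)
    counts = {c: 0 for c in classes_j}
    rows, cols = np.nonzero(labels_i == k)          # pixels of C_k^i
    for row, col in zip(rows, cols):
        r, c = lam(labels_i.shape, labels_j.shape, row, col)
        counts[labels_j[r, c]] += 1
    size_k = len(rows)
    return {int(c): counts[c] / size_k for c in classes_j}
```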
Fig. 16 Results of the two segmentations S1 and S2
Table 1 Confusion matrix between the two results of S1 and S2 (rows: classes C_0^2 . . . C_4^2 of S2; columns: classes C_0^1 . . . C_4^1 of S1)
But during the phase of unification of the results, the voting algorithm cannot be applied directly, because of the resolution difference among the classification results. In order to build a unique image representing all the results, we must choose a resolution for this image. Then all the results must be rescaled to this resolution, and the voting algorithm can be applied. To rescale the classification results, the system uses the association function λ_{I^i,I^j} (Fig. 15) and applies it to all the pixels of the new image if its resolution is greater, or to all the pixels of the old image otherwise.
5 Experiments
5.1 Automatic medical image segmentation
Our system was used to segment a medical image. The data was extracted from a sequence of images of slices of a human brain, obtained by magnetic resonance. Five attributes were assigned to each pixel of the image:
– Its own value (grayscale between 0 and 255);
– The value of each of its 4-neighbors.2
As shown in the two results presented in Fig. 16 (final results of the two hybrid classifications), the two segmentations S1 and S2 are visually very similar. Moreover, they both have five classes that are very similar (as shown by the confusion matrix between the two results, presented in Table 1) and no result differs too much from the others (0.74 ≤ Γ^i ≤ 0.82). We can suppose that these five classes are relevant, because they were found by two different learnings, initialized differently. This shows that the importance of the initial choices is decreased thanks to the collaboration between different classification methods. We can notice that even though the two segmentations did not evolve in the same way, they both found very similar results. Figure 17 shows the evolution graphs of the percentage of pixels classified identically by all the method occurrences at each step of each hybrid learning. The segmentation S2 (with five agents) started with a less relevant result than S1 (27% of pixels classified identically against 57%) but it converged faster (12 refinement steps against 14).
2 This information is used not as spatial information but only as (very) basic textural information.
Fig. 17 Evolution graph of the pixels classified identically by all the method occurrences during each segmentation
5.2 Per-pixel classification of remote sensing images
In order to make our method usable for remote sensing image analysis, we implemented the MuStIC system developed in the Fodomust project.3 It was initially intended for geographers and ecological experts to classify pixels from remote sensing images. It integrates some unsupervised classification tools for images using the K-means [14], Cobweb [7] or S.O.M. algorithms [18], and the Samarah module, which implements the collaborative method using all of the algorithms cited above. We have applied this system to remote sensing data from the city of Strasbourg (France): Spot 4 data with three channels (XS1, XS2, XS3) at standard resolution (200×250 pixels, 20 m/pixel) (Fig. 18). We carried out two series of tests:
1. Because of the inflexion of the curve of the empirical average (Fig. 23), we configured our system (ten experiments) to find about six classes.
2. According to the geographers from the LIV,4 we configured our system (15 experiments) to find about ten classes.
First, we present a test with six classes expected. The unsupervised classification methods used are:
– M1: K-means with four initial random nodes;
– M2: K-means with eight initial random nodes;
– M3: SOM with a 4×4 map;
– M4: Cobweb with an acuity of 7.5.
We obtained the following results:5
– R1 with four classes;
– R2 with eight classes;
– R3 with 14 classes;
– R4 with 27 classes.
3 http://lsiit.u-strasbg.fr/afd/sites/fodomust
4 Laboratoire Image et Ville—UMR 7011 CNRS/ULP, Strasbourg.
5 In all the results, the colors have been randomly chosen by the authors.
Fig. 18 Spot image of Strasbourg
These results have been refined according to our multi-strategical algorithm. We obtained the following results (Fig. 19):
– R1 with five classes;
– R2 with six classes;
– R3 with five classes;
– R4 with four classes.
Fig. 19 The four classification results
Fig. 20 The unified result
We applied our multi-view voting algorithm to these results and obtained the unifying result presented in Fig. 20a. This result is composed of five different classes. We present in Fig. 20b the voting result for all the objects:
– In white: all the methods agreed on the classification;
– In gray: one method disagreed with the other ones;
– In black: the nonconsensual objects (two results or more classified these objects differently).
Secondly, we present a test with ten classes expected. The unsupervised classification methods used are:
– M1: K-means with eight initial random nodes;
– M2: K-means with ten initial random nodes;
– M3: K-means with 12 initial random nodes.
Figure 21 shows (a) the initial unified classification before collaborative refinement, with 15 classes, and (b) the final unified classification, with ten classes. In supervised classification, it is quite easy to define a classification quality by measures such as accuracy and precision. In unsupervised classification, the evaluation of quality is really a hard problem.
Fig. 21 Classified image
Fig. 22 Evolution of the number of classes (from 15 to 10) as a function of the refinement step
Fig. 23 Empirical inertia average: inter-class inertia as a function of the number of classes
Fig. 24 Intra-class inertia evolution of the unified result along the steps, compared to the empirical average ± 2σ, with the corresponding number of classes (15, 14, 12, 11, 10)
Table 2 Inertia of the result after the second step, compared to the empirical average for the same number of classes (first test, six classes expected: results with 6 and 7 classes; second test, ten classes expected: results with 10, 11 and 12 classes)
Today, no real evaluation process has been proposed [11, 12]. There exist many different statistical measures that can be used to get an idea of the quality of our results. The cluster validation indices most frequently proposed in the literature are the inter-class inertia, the cluster compactness [15] and the Xie–Beni index [31]. In order to evaluate the quality of our results, we first used Kmeans to evaluate the intra-class inertia according to the number of classes. We calculated an empirical average of the inertia for each number of classes (see Fig. 23): we carried out the algorithm 200 times for each number of classes, with a random initialization of the centers. In this test, all the computations needed about 25 steps to propose their final number of classes (see Fig. 22). For each unified result, we carried out a second step. First, the small classes of the unified result are removed according to a cardinality threshold chosen by the user. Then a single Kmeans run is performed, with K set to the number of classes in the unified result without the small classes and the K new seeds set to the centers of the unified classes. Figure 24 shows, for one test, the intra-class inertia evolution of the unified result after this second step. We can see that the inertia of the new result is always better than the empirical average for the same number of classes. Table 2 shows that, in each experiment, the inertia of the new result is better than the empirical average for the same number of classes.
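The empirical average used as a baseline can be reproduced with a standard K-means implementation, as sketched below with scikit-learn. The number of runs (200) follows the protocol above; everything else (data loading, the range of class numbers) is illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def empirical_inertia(X, k_values, runs=200, seed=0):
    """Average intra-class inertia of K-means over `runs` random initializations,
    for each number of classes k (the empirical average curve used as a baseline)."""
    rng = np.random.RandomState(seed)
    averages = {}
    for k in k_values:
        inertias = [KMeans(n_clusters=k, init="random", n_init=1,
                           random_state=rng.randint(2**31 - 1)).fit(X).inertia_
                    for _ in range(runs)]
        averages[k] = float(np.mean(inertias))
    return averages
```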
6 Conclusion
We presented a new process of collaborative multi-strategy classification of complex data, which enables us to carry out an unsupervised multi-strategy classification on complex data, giving a single result that is the combination of all the results suggested by the various classification methods, and whose data model can integrate any type of attribute (numerical, symbolic or structured). Moreover, any type of unsupervised classification method can easily be integrated into the system. This new methodology of combination of classifiers enables many classification methods to collaborate, and allows them to refine their results automatically and mutually. This enables them to converge towards a single result (without necessarily reaching it), and to obtain very similar classes. Doing this, it is possible to put in
correspondence the classes found by the various methods and finally to apply a unification algorithm, like a voting method for example. This way, we can give the user a single result representing all the results found by the various unsupervised classification methods. Within the framework of the research on this approach, we were led to study the theoretical bases of the integration of classification methods and the unification of classification results. On the one hand, we proposed the concepts allowing the combination of unsupervised classification methods, extending the results on the combination of supervised methods. On the other hand, we presented a new theoretical approach to distributed multi-strategy classification, based not on the fusion of methods but on their collaboration, inspired by the multi-agent paradigm. This approach is not specialized in a particular domain and allows any unsupervised classification method to be integrated directly, without modification. The definition of this collaboration gave rise to a theoretical study and to the definition of an objective method of conflict resolution, a new concept that we introduced to represent the classification disagreements between the methods used. To quantify these disagreements, we introduced a new criterion, called similitude, which can be used with numerical data as well as with symbolic attributes, since it is based on the recovery of classes and not on a distance. This means our approach can be applied to symbolic data (obviously, only if the classification methods integrated in this case are able to treat such data). With this criterion, we proposed a new definition of the concept of relevant classes based on the similitude, and we gave a method to determine these classes within the framework of unsupervised hybrid classification. We also proposed a theoretical solution to the problem of automatic unification of unsupervised classifications by extension of the traditional voting methods. Finally, we presented a first approach to use multi-sources data in the case of remote sensing image classification. This proposal deals with images at different resolutions but of the same type. Our current research focuses on the possibility of the collaboration of various classification methods, where each one uses a different model of the data. In the case of remote sensing images for example, we are extending our approach to use various data sources on the same zone (radar, radiometry, photo. . . ). Moreover, we are interested in the integration of domain knowledge (ontology, training, ...) in our system to improve each step of the collaborative mining.
References 1. Alpaydin E (1998) Techniques for combining multiple learners. In: Alpaydin E (ed) Proceedings of Engineering of Intelligent Systems’98 Conference, vol 2, pp 6–12 2. Alpaydin E, Kaynak C (1998) Cascading classifiers. Kybernetika 34(4):369–374 3. Bauer E, Kohavi R (1999) An empirical comparison of voting classification algorithms: bagging, boosting, and variants. Mach Learn 36:105–142 4. Binaghi E, Pepe M, Radice F (1997) Multistrategy fuzzy learning for multisource remote sensing classifiers. In: Proceedings of the International Society for Optical Engineering, vol 3217, pp 306–317 5. Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140
6. Breiman L (1998) Arcing classifiers. Ann Stat 26(3):801–849 7. Fisher DH (1987) Knowledge acquisition via incremental conceptual clustering. Mach Learn 2:139–172 8. Frigui H, Nasraoui O (2004) Unsupervised learning of prototypes and attribute weights. Pattern Recogn 34:567–581 9. Gama J (1998) Combining classifiers with constructive induction. In: Proceedings of the 10th European Conference on Machine Learning (ECML-98). LNAI, vol 1398. Springer, Berlin, pp 178–189, 21–23 April 1998 10. Grabmeier J, Rudolph A (2002) Techniques of cluster algorithms in data mining. Data Min Knowl Disc 6(4):303–360 11. Halkidi M, Batistakis Y, Vazirgiannis M (2001) On clustering validation techniques. J Intel Inf Syst 17(2,3):107–145 12. Halkidi M, Batistakis Y, Vazirgiannis M (2002) Cluster validity methods. SIGMOD 31(2):40–45 13. Hammadi-Mesmoudi F, Korczak JJ (1995) An unsupervised neural network classifier and its application in remote sensing. In: Proceedings of International Conference on Image Processing 14. Hartigan J (1979) Algorithm AS136: a k-means clustering algorithm. 28:100–108 15. He J, Tan A-H, Tan C-L, Sung S-Y (2002) On quantitative evaluation of clustering systems. In: Wu W, Xiong H (eds) Information Retrieval and Clustering. Kluwer, Boston, MA 16. Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Comput Surv 31(3): 264–323 17. Kittler J (1998) On combining classifiers. IEEE Trans Pattern Anal Mach Intell 20(3):226–239 18. Kohonen T (1990) The self-organizing map. Proc IEEE 78(9):1464–1480 19. Korczak JJ, Blamont D, Ketterlin A (1994) Thematic image segmentation by a concept formation algorithm. In: Proceedings of the European Symposium on Satelite Remote Sensing, Rome 20. Korczak JJ, Rymarczyk M (1993) Application of classical clustering methods to digital image analysis. Technical report, CRI 21. Paredis J (1997) Coevolving cellular automata: be aware of the red queen! In: 7th Int. Conference on Genetic Algorithms (ICGA 97), pp 393–400 22. Quinlan JR (1996) Boosting first-order learning. In: Arikawa S, Sharma AK (eds) Proceedings of the 7th International Workshop on Algorithmic Learning Theory. LNAI, vol 1160. Springer, Berlin, pp 143–155, 23–25 October 1996 23. Roli F, Giacinto G, Vernazza G (2001) Methods for designing multiple classifier systems. In: Lecture notes in Computer Science, vol 2096 24. Schapire RE (1999) Theoretical views of boosting. In: Fischer P, Simon HU (eds) Proceedings of the 4th European Conference on Computational Learning Theory (COLT-99). LNAI, vol 1572. Springer, Berlin, pp 1–10, 29–31 March 1999 25. Ting KM, Low BT, Witten IH (1999) Learning from batched data: model combination versus data combination. Knowl Inf Syst 1(1):83–106, February 1999 26. Ting KM, Witten IH (1999) Issues in stacked generalization. J Artif Intell Res 10:271–289 27. Wemmert C, Gançarski P (2002) A multiview voting method to combine unsupervised classifications. In: Proceedings of the 2nd IASTED International Conference on Artificial Intelligence and Applications, AIA 2002, pp 447–453 28. Wemmert C, Gançarski P (2002) Urban thematical zones construction from remote sensing data by unsupervised classification. In: Proceedings of the 23rd Urban Data Management Symposium, UDMS 2002 29. Wemmert C, Gançarski P, Korczak JJ (2000) A collaborative approach to combine multiple learning methods. Int J Artif Intell Tools 9(1):59–78, March 30. Wolpert DH (1990) Stacked generalization. 
Technical Report LA-UR-90-3460, Complex Systems Group, Theoretical Division, and Center for Non-linear Studies, Los Alamos, NM 31. Xie X, Beni G (1991) A validity measure for fuzzy clustering. IEEE Trans Pattern Anal Mach Intell 13(4):841–847
Pierre Gançarski received his Ph.D. in Computer Science from the Strasbourg University (France) in 1988. He has been an associate professor of Computer Science at Strasbourg University since 1992. His current research interests include collaborative multi-strategical clustering and feature weighting for clustering with applications to complex data mining and remote sensing analysis.
Cédric Wemmert received his Ph.D. in Computer Science from the Strasbourg University (France) in 2000. He has been an associate professor of Computer Science at Strasbourg University since 2001. His current research interests include collaborative multi-strategical clustering with applications to complex data mining and remote sensing analysis.