A New Meta-Classifier

Massimo Buscema, Stefano Terzi
Semeion Research Center, Rome, Italy
{m.buscema; s.terzi}@semeion.it

William Tastle
Department of Management, Ithaca College, Ithaca, New York 14850, USA
[email protected]

Abstract - A taxonomy for classifying classifiers is presented. A new meta-classifier, Meta-Consensus, with a foundation in both consensus theory and the theory of independent judges, is introduced.
Introduction

It is the intention of a classification system to perform the task of classifying some object and to do so with a reasonable degree of accuracy. There exists today a rather extensive list of meta-classifiers developed around specialized algorithms to satisfy certain classification schemes. This has led to the creation of a vast library of available instruments from which an investigator must choose, each classifier possessing a particular typology. While one type of classifier might yield excellent results in one situation, it might also yield dismal results when applied to another.

Proposed algorithm: Meta-Net Meta-Classifiers

General Properties

Between 1994 and 2008 Semeion researchers conceived and developed a series of meta-classifiers [3] based on some common traits, and called them "Meta-Nets." The fundamental characteristic of the Meta-Net [5] consists of considering not only the "positive credibility" of its composing classifiers (i.e., "this pattern is white"), but also their "negative credibility" (i.e., "this pattern is not white"). The characterizing connection of the Meta-Net is therefore to connect each output node of each composing classifier with each output class. "Complete grid" connections are planned between Meta-Net inputs and outputs, and each connection can be either excitatory (positive numbers) or inhibitory (negative numbers). All Meta-Nets have a typically similar neural network architecture [1, 5]: the input nodes are the whole set of outputs of all composing classifiers, and the output nodes are the output classes of the classification problem. The connections between Meta-Net inputs and outputs always possess a complete grid structure and are defined by the specific algorithms characterizing each Meta-Net's peculiarities. The Meta-Net output vector is calculated with the probabilistic SoftMax equation [4]:

$$Net_j = \sum_{k}^{P} \sum_{i}^{N} Out_i^k \cdot w_{i,j}^k, \qquad MetaOut_j = \frac{e^{Net_j}}{\sum_{i}^{N} e^{Net_i}},$$

where $k$ is an index over the $P$ classifiers, $j$ is the index of the output units, $Out_i^k$ is the $i$-th output node of the $k$-th classifier, and $MetaOut_j$ is the $j$-th output node of the Meta-Net.

All Meta-Nets are unsupervised: each one evaluates its own output without knowing its composing classifiers' errors; it knows only the statistics of their responses. Meta-Nets are therefore strongly sensitive to the quality of the classifiers to be optimized. This means that each Meta-Net, to be considered excellent, should be composed of classifiers whose confusion matrices, in blind testing, clearly respect the following condition:

$$\forall k \in P,\ \forall i: \quad a_{i,i}^k - \sum_{j \neq i} a_{i,j}^k > 0,$$

where $a_{i,j}^k$ is a generic cell of the confusion matrix. However, in the tests that follow we shall verify that, when this condition is not properly respected, Meta-Net capacities degrade "very smoothly," in accordance with the typical behavior of artificial neural networks (ANNs). Each connection value represents the degree of plausibility with which every component classifier supports every classification node of the Meta-Net. The numerical value of each Meta-Net connection can belong to the interval between $-\infty$ (implausibility) and $+\infty$ (plausibility). The plausibility and the implausibility of each connection is a function of the probability of each Meta-Net component during the testing phase.
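For illustration, here is a minimal NumPy sketch (our naming, not the authors' code) of the Meta-Net forward pass defined by the Net and MetaOut equations above, together with a check of the dominance condition; the max-shift inside the SoftMax is a standard numerical-stability device, not part of the original formula:

```python
import numpy as np

def meta_net_output(outputs, weights):
    """Meta-Net forward pass for one pattern. outputs[k] is the
    N-dimensional output vector of base classifier k; weights[k][i, j]
    connects output node i of classifier k to Meta-Net class j."""
    net = sum(o @ w for o, w in zip(outputs, weights))  # Net_j
    e = np.exp(net - net.max())   # max-shift for numerical stability
    return e / e.sum()            # MetaOut_j (SoftMax)

def respects_dominance(conf):
    """Check the condition above on one confusion matrix:
    a_{i,i} - sum_{j != i} a_{i,j} > 0 for every row i."""
    d = np.diag(conf)
    return bool(np.all(2 * d > conf.sum(axis=1)))
```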
The Nomenclature of Topology

$a_{i,j}^k$ is the row-$i$, column-$j$ element of the confusion matrix of the $k$-th classifier;
$N$ is the dimension of the confusion matrix;
$M$ is the number of classifiers;
$w_{i,j}^k$ is the value of the weight connecting output node $i$ of the $k$-th base classifier to output node $j$ of the Meta-Net;
$I_j^k$ is the output node $j$ of the $k$-th base classifier;
$Label$ is the winner class (the output of the meta-classifier).
Weight definition

The weights are estimates based on the performance of the base classifiers evaluated on an independent testing set; the results are summarized in the confusion matrix. From a mathematical perspective, the common feature of all the Meta-Net algorithms is the specific procedure through which the plausibility of each output of any classifier is connected to each output of the global Meta-Classifier. To explain this procedure we need to start from the analysis of the confusion matrix of one classifier [5], with Target classes on the rows and Output classes on the columns:

$$A^k = \begin{pmatrix} a_{11} & \cdots & a_{1N} \\ \vdots & \ddots & \vdots \\ a_{N1} & \cdots & a_{NN} \end{pmatrix}.$$

In this matrix we need to distinguish four criteria for each cell $a_{i,j}^k$.

The first criterion represents the "Rights," that is, the plausibility with which the $k$-th classifier considers correct the records classified in the cell $a_{i,j}^k$ with respect to the (row) summation of Targets:

$$R_{i,j}^k = \frac{a_{i,j}^k}{\sum_{j}^{N} a_{i,j}^k}.$$

The second criterion represents the "Corrects," that is, the plausibility with which the $k$-th classifier considers "correct" the records classified in the cell $a_{i,j}^k$ with respect to the (column) summation of Outputs:

$$C_{i,j}^k = \frac{a_{i,j}^k}{\sum_{i}^{N} a_{i,j}^k}.$$

The third criterion relates the "Rights" to the probability that any specific Output depends on a specific Target:

$$p_{j,i}^k = p(O_j^k \mid T_i^k).$$

The fourth criterion relates the "Corrects" to the probability that any specific Target comes from a specific Output:

$$p_{i,j}^k = p(T_i^k \mid O_j^k).$$

Every weight connecting the output of each basic classifier (that is, the Meta-Net input) to an output of the Meta-Net depends not only on the sensitivity of the considered classifier, but also on its precision. In other words, each weight of the Meta-Net is the result of a function composed of the sensitivity and the precision of each cell of the confusion matrix generated in the testing phase for each basic classifier:

$$w_{i,j}^k = f(R_{i,j}^k, C_{i,j}^k).$$

Legend:
$R_{i,j}^k$ = sensitivity of the cell $i,j$ in the $k$-th basic classifier;
$C_{i,j}^k$ = precision of the cell $i,j$ in the $k$-th basic classifier;
$f(\cdot)$ = typically a fuzzy function;
$w_{i,j}^k$ = value of the weight between the $j$-th output of the $k$-th classifier and the $i$-th output of the Meta-Net.

The function composing the sensitivity and the precision into each weight of the Meta-Net can be a simple fuzzy rule, such as

$$w_{i,j}^k = \min\{R_{i,j}^k, C_{i,j}^k\},$$

or a more complex fuzzy rule, such as

$$w_{i,j}^k = (R_{i,j}^k + C_{i,j}^k) - \big((1 - R_{i,j}^k) \cdot (1 - C_{i,j}^k)\big).$$

Meta-Consensus, presented in this paper (see equations 1-5), is a particularly suitable and effective new fuzzy function composing sensitivity and precision. The Meta-Consensus function was explicitly inspired by Consensus Theory [6, 7].

Both the R and C matrices give additional information to the Meta-Classifier for the purpose of weighting each base classifier; the intention is to provide increased accuracy to the Meta-Classifier. Traditionally, the combination of the outputs of the base classifiers has been done with weighted averages, and these weights have been determined by the main diagonals of R and C. Limiting the weight calculations to the diagonal omits potentially important additional information: the precision of a value does not necessarily indicate accuracy, and here is where the Meta-Classifier gains its value, since it utilizes all the information available in the entire matrix to determine its weights. Referring to Kuncheva's work [2], given L classifiers and c classes, we can have three types of weighted averages depending on the number of weights: L weights, in which each classifier has exactly one weight; L·c weights, in which there is one weight per class; and L·c·c weights, which represent a complete connection between the outputs of the base classifiers and the outputs of the Meta-Classifier. The Meta-Net algorithm uses this third method in order to take into account how much a single base classifier might render a wrong decision. It is important to understand the meaning of the R and C values that lie off the main diagonal. For the R matrix, such a value represents the number of times the base classifier answered i when the answer should have been j; for the C matrix, it is the "precision" of the confusion between i and j: C informs us of the percentage of times in which, among all the records the classifier assigned to class i (correct and incorrect decisions), the correct class was actually j. In other words, Meta-Consensus (and all the Meta-Net algorithms) additionally considers the inhibitory credibility of each basic classifier. This is the case when a weight pushes Meta-Consensus to change opinion with respect to the classification suggested by the basic classifier. An example: suppose a basic classifier confuses the correct class A with the incorrect class B 30 times out of 100, and suppose that this confusion is a systematic mistake. The weight connecting class B of the basic classifier with class A (the correct one) of Meta-Consensus will then be strong, while the weight connecting class B of the basic classifier with class B of Meta-Consensus will be weak. Consequently, Meta-Consensus is also able to correct many systematic errors of classification generated by its basic classifiers. Armed with this theory we can now proceed to a description of the equations.
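A minimal sketch of the per-cell "Rights" and "Corrects" matrices and of the two fuzzy rules above (NumPy assumed; the function names are ours, and the code presumes every class occurs at least once in the test set):

```python
import numpy as np

def rights_and_corrects(conf):
    """Per-cell 'Rights' (R, sensitivity) and 'Corrects' (C, precision)
    matrices from a confusion matrix with Targets on the rows and
    Outputs on the columns."""
    conf = conf.astype(float)
    R = conf / conf.sum(axis=1, keepdims=True)  # R_{i,j} = a_{i,j} / row sum
    C = conf / conf.sum(axis=0, keepdims=True)  # C_{i,j} = a_{i,j} / column sum
    return R, C

def fuzzy_min_weights(R, C):
    """Simple rule: w_{i,j} = min{R_{i,j}, C_{i,j}}."""
    return np.minimum(R, C)

def fuzzy_composed_weights(R, C):
    """More complex rule: w = (R + C) - ((1 - R) * (1 - C))."""
    return (R + C) - (1.0 - R) * (1.0 - C)
```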
Specific weight equations on the Confusion Matrix

Meta-Consensus

For each output cell of the base classifiers a weight is calculated on a Meta-Classifier input node. Given the $k$ input nodes, in which $i$ and $j$ are subscripts identifying the row (sensitivity) and column (precision) values, and with $R_i^k$ and $C_j^k$ the row and column totals of the confusion matrix, equation 1 gives the weight provided by the row calculation:

$$r_{i,j}^k = \frac{a_{i,j}^k}{R_i^k} \cdot \log_2\!\left(R_i^k - \frac{(a_{i,j}^k - R_i^k) \cdot N}{2 \cdot (N-1)}\right) \qquad (1)$$

A similar weight calculation (equation 2) is made based on the column values:

$$c_{i,j}^k = \frac{a_{i,j}^k}{C_j^k} \cdot \log_2\!\left(C_j^k - \frac{(a_{i,j}^k - C_j^k) \cdot N}{2 \cdot (N-1)}\right) \qquad (2)$$

The data missing from these weight calculations are addressed by two additional weight equations that capture this missing information. Note that the individual value of the confusion matrix cell is subtracted from the sum of the row (equation 3) to yield the remaining information, which is also used to calculate the weight:

$$m_{i,j}^k = \frac{R_i^k - a_{i,j}^k}{R_i^k} \cdot \log_2\!\left(R_i^k - \frac{(R_i^k - a_{i,j}^k) \cdot N}{2 \cdot (N-1)}\right) \qquad (3)$$

In the same manner that the missing information is calculated for the rows, equation 4 captures the missing information of the columns:

$$f_{i,j}^k = \frac{C_j^k - a_{i,j}^k}{C_j^k} \cdot \log_2\!\left(C_j^k - \frac{(C_j^k - a_{i,j}^k) \cdot N}{2 \cdot (N-1)}\right) \qquad (4)$$

These are combined to yield the overall weight (equation 5) acting on the Meta-Classifier:

$$w_{i,j}^k = -\ln\!\left(\frac{m_{i,j}^k \cdot f_{i,j}^k}{r_{i,j}^k \cdot c_{i,j}^k}\right) \qquad (5)$$

These are the weights used to modify the Meta-Classifier nodes. The winner class is then

$$Label = \underset{i}{\operatorname{ArgMax}} \left\{ \sum_{j}^{N} \sum_{k}^{M} I_j^k \cdot w_{i,j}^k \right\} \qquad (6)$$
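A minimal sketch of equations 1-6 follows (NumPy assumed; this is our reading of the printed formulas, in which the row and column totals are used for $R_i^k$ and $C_j^k$; the epsilon guards are practical safeguards that are not part of the published equations):

```python
import numpy as np

EPS = 1e-12  # numerical guard; not part of the published equations

def meta_consensus_weights(conf):
    """Equations 1-5 for one base classifier, from its test-phase
    confusion matrix (Targets on rows, Outputs on columns). Assumes
    every class occurs at least once in the test set."""
    conf = conf.astype(float)
    N = conf.shape[0]
    scale = N / (2.0 * (N - 1.0))
    R = conf.sum(axis=1, keepdims=True)  # row totals R_i, shape (N, 1)
    C = conf.sum(axis=0, keepdims=True)  # column totals C_j, shape (1, N)

    # eq 1 and eq 2: information carried by the cell itself
    r = (conf / R) * np.log2(np.maximum(R - (conf - R) * scale, EPS))
    c = (conf / C) * np.log2(np.maximum(C - (conf - C) * scale, EPS))
    # eq 3 and eq 4: the "missing" row and column information
    m = ((R - conf) / R) * np.log2(np.maximum(R - (R - conf) * scale, EPS))
    f = ((C - conf) / C) * np.log2(np.maximum(C - (C - conf) * scale, EPS))
    # eq 5: overall weight acting on the Meta-Classifier
    return -np.log(np.maximum((m * f) / np.maximum(r * c, EPS), EPS))

def meta_consensus_label(outputs, weights):
    """Equation 6. outputs[k] is the output vector I^k of base
    classifier k; weights[k][i, j] follows eq 6's convention
    (Meta-Net class i on the rows, base-classifier node j on the
    columns). Returns the winner class index."""
    score = sum(w @ o for o, w in zip(outputs, weights))
    return int(np.argmax(score))
```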
Experimentation

Experimental setup

As base classifiers we chose different typologies, so that we can reasonably expect high variability. The set includes a CART decision tree (TREE) [8], a K-Nearest Neighbor (KNN), a Back-Propagation (BP) neural network [9], a Sine Network (SN) [10], a Support Vector Machine (SVM) [11], a Bayesian linear classifier (LDC), the Bayesian "naive" classifier (NAIVEBC), and a quadratic Bayesian classifier (QDC).
For comparison purposes we compare our model with the following meta-classifiers: Wernecke [12], the Dempster-Shafer combination [13], the Behavior Knowledge Space method (BKS) [14], Majority Vote (MajVote), Clustering and Selection [15], Direct KNN Decision Dependent (DynDdDirectKnn) [16], Fuzzy Integral [17], and some simple methods of output fusion, i.e., the naive Bayesian combiner (BayesComb), Simple Mean, Median, Minimum and Product. We have also used two methods that do not reuse the trained base classifiers but create their own classifier ensembles: ArcX4 [18] and AdaBoostM1 [19]. For the experiment we use six datasets. Two of them are the property of the Semeion Research Centre and come from experimental activities on real problems: Digits and Faults [20]. The other four come from the UCI Repository [21] and are commonly used in the machine learning community to evaluate the performance of different algorithms (DNA, Letters, Segment and Satimage).
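As an illustration only, an analogous pool of base classifiers can be assembled with scikit-learn stand-ins (the paper used its own implementations; the Sine Network has no off-the-shelf equivalent here and is omitted, and the 50/50 split is an arbitrary choice for this sketch):

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

base_classifiers = {
    "TREE": DecisionTreeClassifier(),    # CART decision tree
    "KNN": KNeighborsClassifier(),
    "BP": MLPClassifier(max_iter=1000),  # back-propagation network
    "SVM": SVC(probability=True),
    "LDC": LinearDiscriminantAnalysis(),
    "NAIVEBC": GaussianNB(),
    "QDC": QuadraticDiscriminantAnalysis(),
}

def fit_and_confusions(X, y):
    """Train each base classifier and collect the independent-test
    confusion matrices from which the meta-weights are computed."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5,
                                              random_state=0)
    confusions = {}
    for name, clf in base_classifiers.items():
        clf.fit(X_tr, y_tr)
        confusions[name] = confusion_matrix(y_te, clf.predict(X_te))
    return confusions
```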
Experimental Results
DIGITS dataset
Recognition of handwritten numeric characters. The dataset is composed of 1594 digits handwritten by different subjects in different situations and coded as 256-bit strings corresponding to a 16 x 16 grid. The objective is to classify each grid into the corresponding digit, 0 to 9. In this dataset a clear advantage in using meta-classifiers is not evident, because the result of the best classifier (PARZENC) is quite similar to that of the best meta-classifiers [Table I].
FAULTS dataset
Every record of this dataset represents a surface fault of a stainless steel sheet. There are six different types of faults. Each fault is described by 27 indicators representing the geometric shape of the fault and of its contour. There are 1941 records in total. The Faults dataset underlines, in a well-marked way, the efficacy of using meta-classifiers, and it further illustrates the quality of Meta-Consensus as a classifier: while it occupies the third position in terms of weighted average, it is the first among the methods that do not create their own base classifiers (Arcing, Boosting) but rather use the available ones [Table II].
DNA dataset
From the UCI Repository: Molecular Biology (Splice-junction Gene Sequences) Data Set. Splice junctions are points on a DNA sequence at which "superfluous" DNA is removed during the process of protein creation in higher organisms. The problem posed in this dataset is to recognize, given a DNA sequence, the boundaries between exons (the parts of the sequence retained after splicing) and introns (the parts that are spliced out). This problem consists of two subtasks: recognizing exon/intron boundaries (referred to as EI sites) and recognizing intron/exon boundaries (IE sites). (In the biological community, IE borders are referred to as "acceptors," while EI borders are referred to as "donors.") Details on the dataset can be found at http://archive.ics.uci.edu/ml/machine-learning-databases/molecular-biology/splice-junction-gene-sequences/splice.names [Table III].
LETTERS dataset
UCI: Letter Recognition Data Set. The objective is to identify each of a large number of rasters of black and white pixels as one of the 26 letters of the English alphabet. The characters were produced from 20 different fonts and randomly distorted to generate 20,000 different stimuli. Each character has been coded with 16 numeric attributes, scaled to integer values from 0 to 15 [Table IV].
SEGMENT dataset
UCI: Statlog (Image Segmentation) Data Set. The records have been randomly extracted from a database of seven outdoor images. These images were manually segmented to create a classification for each pixel. Each record represents a 3 x 3 region, characterized by 19 measures on the color image. There are seven classes (1 = brickface, 2 = sky, 3 = foliage, 4 = cement, 5 = window, 6 = path, 7 = grass) and 2310 records. In this dataset, too, the good performance of Meta-Consensus is evident [Table V].
SATIMAGE dataset
UCI: Statlog (Landsat Satellite) Data Set. The database consists of the multi-spectral values of pixels in 3 x 3 neighborhoods of a satellite image, together with the classification associated with the central pixel of each neighborhood. The aim is to predict this classification, given the multi-spectral values. Results are reported in Table VI.
AGGREGATE RESULTS
To synthesize the results on the single datasets and evaluate the classifiers' performances globally, we compute the position (rank) of each classifier on each of the six datasets and then take the average rank. The results of this calculation are reported in Table VII (see below). They make clearly evident the quality of the meta-classifier proposed in this paper: the fact that it takes the first position and is the only ensemble able to overcome SVM suggests that the criteria that inspired this algorithm are particularly useful for the fusion of the results of a set of classifiers.
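A minimal sketch of this aggregation (the dictionary layout is hypothetical):

```python
# ranks_by_dataset is a hypothetical dict of dicts, e.g.
# {"Digits": {"SVM": 2, "QDC": 1, ...}, "Faults": {...}, ...}
def mean_ranks(ranks_by_dataset):
    """Average each model's per-dataset rank; models missing from a
    dataset are averaged over the datasets in which they appear."""
    totals, counts = {}, {}
    for ranks in ranks_by_dataset.values():
        for model, rank in ranks.items():
            totals[model] = totals.get(model, 0) + rank
            counts[model] = counts.get(model, 0) + 1
    return sorted((totals[m] / counts[m], m) for m in totals)
```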
Conclusion and Future Work
Meta-Consensus is a meta-classifier conceived in the framework of the Meta-Net algorithms [5]. This algorithm has two main features:
• Every weight connecting the output of each basic classifier (that is, the Meta-Net input) to an output of the Meta-Net depends not only on the sensitivity of the considered classifier, but also on its precision.
• Each cell of the confusion matrix of every basic classifier generates a specific weight connecting all the outputs of the basic classifiers to all the outputs of the Meta-Net. The weight matrix of Meta-Consensus is a full matrix, in which the "residuals" (the cells outside the main diagonal) also play a role in the final classification. In other words, Meta-Consensus (and all the Meta-Net algorithms) additionally considers the inhibitory credibility of each basic classifier.
These features make Meta-Consensus more effective than the other meta-classifiers, as this benchmark study has clearly shown. With these observations it may now be necessary to take into account a possible change of philosophy in the meta-classifier literature: local sensitivity, local precision and small residuals can dramatically increase the amount of information available upon which complex decisions can be made. To paraphrase an often used phrase: the devil is in the detail.

TABLE I. RANKING OF RESULTS ON DIGITS BY WEIGHTED MEANS.
Mean Predict             A.Mean    W.Mean    Errors   Rank
QDC                      95.79%    95.79%    13.4     1
SVM                      95.73%    95.73%    13.6     2
MajVote                  95.27%    95.29%    15.0     3
BayesComb                94.81%    94.79%    16.6     4
DempsterShafer           94.78%    94.79%    16.6     4
Meta-Consensus           94.78%    94.79%    16.6     4
Wernecke                 94.76%    94.79%    16.6     4
DecisionTemplate         94.41%    94.41%    17.8     8
DynDdDirectKnn           94.40%    94.41%    17.8     8
ClusteringAndSelection   93.38%    93.41%    21.0     10
Parzen                   92.17%    92.22%    24.8     11
DBD                      90.76%    90.77%    29.4     12
KNN                      90.50%    90.59%    30.0     13
LDC                      90.26%    90.27%    31.0     14
BP                       89.56%    89.58%    33.2     15
SN                       88.88%    88.89%    35.4     16
NaiveBayes               85.01%    84.99%    47.8     17
FuzzyIntegral            83.36%    83.49%    52.6     18
TREE                     73.23%    73.32%    85.0     19
WeightedAverage          61.09%    61.2%     123.4    20

TABLE II. RANKING OF RESULTS ON FAULTS BY WEIGHTED MEANS.

Mean Predict             A.Mean    W.Mean    Errors   Rank
ArcX4                    80.35%    79.50%    79.6     1
AdaBoostM1               79.31%    78.93%    81.8     2
Meta-Consensus           77.00%    76.47%    94.8     3
MajVote                  80.44%    74.76%    98.0     4
SVM                      73.62%    74.04%    100.8    5
BayesComb                71.95%    73.83%    101.6    6
DecisionTemplate         79.98%    73.73%    102.0    7
DempsterShafer           80.58%    73.68%    102.2    8
Wernecke                 78.11%    73.37%    103.4    9
TREE                     76.22%    73.11%    104.4    10
DynDdDirectKnn           77.40%    72.59%    106.4    11
ClusteringAndSelection   74.24%    70.99%    112.6    12
Parzen                   74.15%    70.94%    112.8    13
KNN                      73.62%    70.94%    112.8    13
FuzzyIntegral            77.82%    70.74%    113.6    15
SN                       74.16%    70.68%    113.8    16
BP                       74.54%    70.53%    114.4    17
DBD                      75.73%    70.37%    115.0    18
NaiveBayes               73.60%    68.63%    121.8    19
LDC                      74.25%    64.97%    136.0    20
QDC                      77.20%    63.37%    142.2    21
WeightedAverage          75.75%    59.92%    155.6    22

TABLE III. RANKING OF RESULTS ON DNA BY WEIGHTED MEANS.

Mean Predict             A.Mean    W.Mean    Errors   Rank
DecisionTemplate         96.69%    96.70%    21.0     1
WeightedAverage          96.63%    96.64%    21.4     2
BayesComb                96.42%    96.61%    21.6     3
DempsterShafer           96.63%    96.61%    21.6     3
QDC                      96.51%    96.45%    22.6     5
Meta-Consensus           96.19%    96.39%    23.0     6
SVM                      95.94%    96.23%    24.0     7
MajVote                  96.17%    96.04%    25.2     8
DynDdDirectKnn           95.2%     95.32%    29.8     9
ClusteringAndSelection   94.92%    94.82%    33.0     10
AdaBoostM1               94.3%     94.73%    33.6     11
DBD                      94.34%    94.51%    35.0     12
ArcX4                    94.19%    94.48%    35.2     13
NaiveBayes               93.55%    94.04%    38.0     14
FuzzyIntegral            93.95%    94.01%    38.2     15
BP                       93.35%    93.69%    40.2     16
LDC                      94.41%    93.41%    42.0     17
SN                       92.77%    93.19%    43.4     18
TREE                     92.21%    93.03%    44.4     19
Wernecke                 88.92%    89.83%    64.8     20
KNN                      87.12%    88.04%    76.2     21
Parzen                   78.45%    73.63%    168.0    22

TABLE IV. RANKING OF RESULTS ON LETTERS BY WEIGHTED MEANS.

Mean Predict             A.Mean    W.Mean    Errors    Rank
SVM Mean                 97.87%    97.89%    84.60     1
Meta-Consensus           97.60%    97.62%    95.4      2
mcWeightedAverage        97.48%    97.49%    100.20    3
mcClusteringAndSelection 96.98%    96.98%    120.80    4
mcDempsterShafer         96.65%    96.66%    133.40    5
mcDecisionTemplate       96.62%    96.63%    134.80    6
mcMajVote                96.45%    96.46%    141.60    7
Parzen Mean              96.20%    96.22%    151.40    8
mcDynDdDirectKnn         96.05%    96.07%    157.20    9
DBD Mean                 95.76%    95.78%    168.80    10
mcBayesComb              95.35%    95.36%    185.40    11
KNN Mean                 94.86%    94.88%    205.00    12
SN Mean                  94.71%    94.74%    210.60    13
BP Mean                  94.19%    94.22%    231.40    14
LVQ Mean                 94.06%    94.08%    236.80    15
mcArcX4                  92.94%    92.97%    281.40    16
mcAdaBoostM1             92.62%    92.64%    294.20    17
mcFuzzyIntegral          88.65%    88.68%    452.80    18
QDC Mean                 88.49%    88.54%    458.60    19
Tree Mean                87.87%    87.89%    484.40    20
NaiveBayes Mean          73.22%    73.27%    1069.40   21
LDC Mean                 70.07%    70.17%    1193.40   22
TABLE V. RANKING OF RESULTS ON SEGMENT BY WEIGHTED MEANS.

Mean Predict             A.Mean    W.Mean    Errors   Rank
ArcX4                    97.92%    97.92%    9.6      1
Meta-Consensus           97.92%    97.92%    9.6      1
MajVote                  97.84%    97.84%    10.0     3
DecisionTemplate         97.71%    97.71%    10.6     4
Wernecke                 97.71%    97.71%    10.6     4
DempsterShafer           97.66%    97.66%    10.8     6
SN                       97.36%    97.36%    12.2     7
AdaBoostM1               97.36%    97.36%    12.2     7
BayesComb                97.36%    97.36%    12.2     7
DynDdDirectKnn           97.32%    97.32%    12.4     10
SVM                      97.14%    97.14%    13.2     11
TREE                     96.97%    96.97%    14.0     12
Parzen                   96.93%    96.93%    14.2     13
DBD                      96.88%    96.88%    14.4     14
KNN                      96.80%    96.80%    14.8     15
BP                       96.58%    96.58%    15.8     16
FuzzyIntegral            96.45%    96.45%    16.4     17
ClusteringAndSelection   96.23%    96.23%    17.4     18
LDC                      91.64%    91.64%    38.6     19
NaiveBayes               90.61%    90.61%    43.4     20
QDC                      87.75%    87.75%    56.6     21
TABLE VI. RANKING OF RESULTS ON SATIMAGE BY WEIGHTED MEANS.

Mean Predict             A.Mean    W.Mean    Errors   Rank
SVM                      90.29%    92.35%    98.4     1
Meta-Consensus           89.02%    91.73%    106.4    2
BayesComb                88.47%    91.11%    114.4    3
MajVote                  89.18%    91.05%    115.2    4
DempsterShafer           89.26%    91.02%    115.6    5
Wernecke                 89.02%    90.85%    117.8    6
SN                       89.26%    90.83%    118.0    7
DecisionTemplate         89.47%    90.75%    119.0    8
ClusteringAndSelection   88.82%    90.74%    119.2    9
WeightedAverage          88.23%    90.69%    119.8    10
AdaBoostM1               88.02%    90.66%    120.2    11
ArcX4                    87.78%    90.52%    122.0    12
DynDdDirectKnn           88.27%    90.52%    122.0    12
BP                       88.88%    90.49%    122.4    14
KNN                      88.61%    90.47%    122.6    15
Parzen                   89.32%    89.89%    130.2    16
DBD                      86.83%    87.75%    157.6    17
QDC                      82.45%    86.88%    168.8    18
FuzzyIntegral            84.54%    86.82%    169.6    19
TREE                     82.99%    85.38%    188.2    20
LDC                      81.79%    83.87%    207.6    21
NaiveBayes               80.33%    83.65%    210.4    22
TABLE VII. RANKING OF RESULTS ON THE 6 DATASETS (MEAN RANK ACROSS DIGITS, FAULTS, DNA, LETTERS, SATIMAGE AND SEGMENT).

Mean Predict             Mean Rank
Meta-Consensus           2.83
SVM                      5.00
MajVote                  5.83
DempsterShafer           6.00
BayesComb                6.33
DecisionTemplate         6.50
ArcX4                    8.60
AdaBoostM1               9.60
ClusteringAndSelection   10.67
DynDdDirectKnn           10.67
SN                       11.17
DBD                      13.50
Parzen                   13.67
KNN                      14.33
TREE                     14.33
QDC                      14.50
BP                       15.00
FuzzyIntegral            17.33
LDC                      18.17
NaiveBayes               18.50

References
[1] Ensemble Learning. In: M.A. Arbib (Ed.), The Handbook of Brain Theory and Neural Networks, Second edition. Cambridge, MA: The MIT Press, 2002.
[2] L.I. Kuncheva. Combining Pattern Classifiers: Methods and Algorithms. Wiley, 2004.
[3] M. Buscema. MetaNet: The Theory of Independent Judges. Substance Use & Misuse, Vol. 33, No. 2 (Models), Marcel Dekker, New York, 1998, pp. 439-461.
[4] J.S. Bridle. Probabilistic Interpretation of Feedforward Classification Network Outputs, with Relationships to Statistical Pattern Recognition. In: F. Fogelman Soulie and J. Herault (Eds.), Neurocomputing: Algorithms, Architectures and Applications. Berlin: Springer-Verlag, 1990, pp. 227-236.
[5] R. Kohavi and F. Provost. Glossary of Terms. Editorial for the Special Issue on Applications of Machine Learning and the Knowledge Discovery Process, Machine Learning, Vol. 30, No. 2/3, February/March 1998.
[6] W.J. Tastle, M.J. Wierman, U.R. Dumdum. Ranking Ordinal Scales Using the Consensus Measure. Issues in Information Systems, Vol. VI, No. 2, 2005, pp. 96-102.
[7] W.J. Tastle, M.J. Wierman. Consensus and dissention: A measure of ordinal dispersion. International Journal of Approximate Reasoning, 45(3):531-545, 2007.
[8] L. Breiman et al. Classification and Regression Trees. Chapman and Hall, Boca Raton, 1993.
[9] D.E. Rumelhart, G.E. Hinton, R.J. Williams. Learning Internal Representations by Error Propagation. In: D.E. Rumelhart and J.L. McClelland (Eds.), Parallel Distributed Processing, Vol. 1: Foundations, Explorations in the Microstructure of Cognition. The MIT Press, Cambridge, MA, 1986.
[10] M. Buscema, S. Terzi, M. Breda. Using sinusoidal modulated weights improve feed-forward neural networks performances in classification and functional approximation problems. WSEAS Transactions on Information Science and Applications, Issue 5, Vol. 3, May 2006, pp. 885-893.
[11] C. Cortes and V. Vapnik. Support-Vector Networks. Machine Learning, 20(3):273-297, 1995.
[12] K.D. Wernecke. A coupling procedure for the discrimination of mixed data. Biometrics, 48(2):497-506, 1992.
[13] G. Rogova. Combining the results of several neural network classifiers. Neural Networks, 7:777-781, 1994.
[14] Y.S. Huang, C.Y. Suen. A method of combining multiple experts for the recognition of unconstrained handwritten numerals. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17:90-93, 1995.
[15] L.I. Kuncheva. Clustering-and-selection model for classifier combination. In: Proc. Knowledge-Based Intelligent Engineering Systems and Allied Technologies, Brighton, UK, 2000, pp. 185-188.
[16] K. Woods, W.P. Kegelmeyer, K. Bowyer. Combination of multiple classifiers using local accuracy estimates. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19:405-410, 1997.
[17] S.B. Cho, J.H. Kim. Combining multiple neural networks by fuzzy integral for robust classification. IEEE Transactions on Systems, Man, and Cybernetics, 25:380-384, 1995.
[18] L. Breiman. Arcing classifiers. The Annals of Statistics, 26(3):801-849, 1998.
[19] Y. Freund, R.E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119-139, 1997.
[20] M. Buscema et al. Reti Neurali Artificiali e Sistemi Sociali Complessi. Vol. II - Applicazioni. Franco Angeli, Milano, 1999, pp. 288-291 [Artificial Neural Networks and Complex Social Systems. Vol. II - Applications].
[21] A. Asuncion, D.J. Newman. UCI Machine Learning Repository [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, School of Information and Computer Science, 2007.