Published: International Journal of Pattern Recognition and Artificial Intelligence vol.22, no.8, pp. 1461-1486, 2008.
Fuzzy Cognitive Maps for Pattern Recognition Applications

G.A. Papakostas(1), Y.S. Boutalis(1), D.E. Koulouriotis(2) and B.G. Mertzios(3)

(1) Democritus University of Thrace, Department of Electrical and Computer Engineering, 67100 Xanthi, Hellas
    e-mail: [email protected], [email protected]
(2) Democritus University of Thrace, Department of Production Engineering and Management, 67100 Xanthi, Hellas
    e-mail: [email protected]
(3) Thessaloniki Institute of Technology, Department of Automation, Laboratory of Control Sys. and Comp. Intell., Thessaloniki, Hellas
    e-mail: [email protected]
Abstract
A first attempt to incorporate Fuzzy Cognitive Maps (FCMs) in pattern classification applications is presented in this paper. Fuzzy Cognitive Maps, as an illustrative causal representation for the modelling and manipulation of complex systems, aim to model the behaviour of any system. By transforming a pattern classification problem into the problem of discovering how the set of patterns interacts with the classes they belong to, and with each other, we can describe the problem in terms of Fuzzy Cognitive Maps. More precisely, some FCM architectures are introduced and studied with respect to their pattern recognition abilities. An efficient novel hybrid classifier is proposed as an alternative classification structure, which exploits both neural networks and FCMs to ensure improved classification capabilities. Appropriate experiments with four well-known benchmark classification problems and a typical computer vision application establish the usefulness of Fuzzy Cognitive Maps in the pattern recognition research field. Moreover, the present paper introduces the use of more flexible FCMs by incorporating nodes with adaptively adjusted activation functions. This advanced feature gives more degrees of freedom to the FCM structure to learn and store knowledge, as needed in pattern recognition tasks.
Keywords: pattern recognition, computer vision, fuzzy cognitive maps, hybrid classifier, neural networks
1. Introduction

A crucial part of any intelligent system that learns from its environment and interacts with it is a pattern recognition process. Generally, the process of recognizing a specific pattern or object by using an appropriate intelligent structure able to make knowledge-based decisions is called pattern recognition. Artificial Neural Networks constitute an essential part of a modern decision-making system, in which the need for knowledge storage and knowledge-based decisions has been inspired by the neural networks of the human brain [1,2]. Due to their knowledge storage capability, neural networks can be used for pattern recognition tasks and classification problems [3,4], while their ability to repeatedly learn their internal representation makes them very useful in real-time image and signal processing applications. Neural networks have been successfully applied in many scientific fields as intelligent structures that can learn multidimensional linear and non-linear mappings by adjusting their connection weights. The knowledge in a neural network is stored during the training phase, when its weights take appropriate values according to the problem being processed. However, since the dynamics of a neural network are still under investigation, one has to perform appropriate trial-and-error tests in order to decide the optimal architecture of the network to be used for a specific application.

Recently, increased interest in the theory and application of Fuzzy Cognitive Maps (FCMs) [5] in engineering science has been noted. FCMs are characterized by their ability to model the dynamics of complex systems by incorporating the causal relationships between the main concepts that describe the system. They were initially used to model complex social and financial systems [6-11], where analytical descriptions do not exist or cannot be derived.

In the current paper, we attempt to use Fuzzy Cognitive Maps in pattern classification tasks, as alternatives to neural networks or even as collaborators with them, in order to achieve better classification capabilities. Appropriate experiments show that when FCMs are used in fusion with neural networks, more effective hybrid architectures can be derived, a hybridization that agrees with the conclusions already drawn from several attempts at classifier combination [12-20].

The paper is organized as follows: the fundamental theory of FCMs is described briefly in section 2, FCMs for pattern classification purposes are introduced in section 3 and, finally, the efficiency of the proposed structures is studied through appropriate simulations in section 4.
2. Fuzzy Cognitive Maps

Fuzzy Cognitive Maps (FCMs) are fuzzy signed directed graphs with feedback [21-24]. They were proposed by Kosko [25] as a modelling methodology for complex systems, able to describe the causal relationships between the main factors (concepts) that determine the dynamic behaviour of a system. In their graphical presentation, FCMs have the form of a diagram with nodes having interconnections between them, as illustrated in Figure 1.
Figure 1. A simple Fuzzy Cognitive Map
As can be seen from the above figure, an FCM consists of nodes (concepts) Ci, i=1,2,3,...,N, where N is the total number of concepts, which are characteristics, main factors, or properties of the system being modelled. The concepts are interconnected through arcs having weights that denote the cause-and-effect relationship that a concept has on the others. There are three possible types [26] of causal relationships between two concepts Ci and Cj: (i) positive, meaning that an increase or decrease in the value of the cause concept results in the effect concept moving in the same direction, described by a positive weight Wij; (ii) negative, meaning that changes in the cause and effect concepts take place in opposite directions, with the weight Wij having a negative sign; and (iii) no relation, with the interconnection weight being zero. The weight value Wij denotes the strength with which concept Ci causes Cj and takes values in the range [-1,1].
At each time step, the value Ai of each concept Ci is calculated by summing the influences of all the other concepts on it and by squashing the overall impact using a barrier function f, according to the following rule
A_i^{t+1} = f\left( A_i^t + \sum_{\substack{j=1 \\ j \neq i}}^{N} W_{ji} A_j^t \right)    (1)
where A_i^{t+1} and A_i^t are the values of concept Ci at times t+1 and t respectively, A_j^t is the value of concept Cj at time t, W_{ji} is the weight of the interconnection directed from concept Cj to concept Ci, and f is the barrier function used to restrict the concept value to a specific range, usually [0,1]. At each step, a new state of the concepts is derived according to (1) and, after a number of iterations, the FCM may arrive at one of the following states [26]:
1. A fixed equilibrium point
2. A limit cycle
3. Chaotic behaviour
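In code, rule (1) together with a fixed-point check might look as follows. This is a minimal Python/NumPy sketch; the two-concept weight matrix and the concrete squashing function are our own illustrative choices, not taken from the paper.

```python
import numpy as np

def fcm_step(A, W, f):
    """One application of rule (1): A_i(t+1) = f(A_i(t) + sum_{j != i} W_ji * A_j(t))."""
    N = len(A)
    A_next = np.empty(N)
    for i in range(N):
        influence = sum(W[j, i] * A[j] for j in range(N) if j != i)
        A_next[i] = f(A[i] + influence)
    return A_next

def run_fcm(A0, W, f, max_iter=100, tol=1e-5):
    """Iterate rule (1) until a fixed equilibrium point is reached (or max_iter)."""
    A = np.asarray(A0, dtype=float)
    for _ in range(max_iter):
        A_next = fcm_step(A, W, f)
        if np.max(np.abs(A_next - A)) < tol:   # fixed equilibrium point
            return A_next
        A = A_next
    return A

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
W = np.array([[0.0, 0.6],     # W[j, i]: influence of concept C_j on C_i
              [-0.4, 0.0]])
A = run_fcm([0.5, 0.5], W, sigmoid)
```

With these small weights and the sigmoid squashing, the iteration is a contraction and settles at a fixed point rather than a limit cycle or chaotic behaviour.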
When the FCM arrives at a fixed equilibrium point, we can conclude that the FCM has converged, and the final state corresponds to the actual state reached by the system when the initial values of the concepts are applied.

The design of FCMs relies significantly on the experience of experts, who have knowledge of the system being modelled and provide the weight values for the interconnections between concepts. In more flexible FCM structures these weights are found during a learning procedure, in a way similar to neural network training. For this reason many researchers have borrowed and adapted algorithms from neural network theory and proposed new learning procedures [27-29]. Additionally, significant work on the use of evolutionary methods for finding optimal FCM weight sets, as alternatives to gradient-based algorithms, has been done with remarkable results [6, 30-36]. In the present paper, a Simple Genetic Algorithm (SGA) [37] is used to find the optimal weight set for each of the proposed FCM structures, by minimizing an appropriate objective function.
3. Fuzzy Cognitive Mappers (FCMpers)

In the present paper FCMs are introduced into pattern recognition applications and their performance is evaluated. The main idea of the proposed architectures can be summarized in the following proposition.
Proposition 1. Since the major advantage of Fuzzy Cognitive Maps is their ability to describe the behaviour of complex systems, they could be used to model a classifier having appropriate input and output concepts and appropriate interconnections among them.
Based on the above proposition, we can define a new modelling structure suitable for pattern recognition tasks.

Definition 1. The Fuzzy Cognitive Map structure used to map a set of input concepts (ICs) to a set of output concepts (OCs), through the execution of the inference formula (1) until an equilibrium point is reached, is called a Fuzzy Cognitive Mapper (FCMper).
The operation of the Fuzzy Cognitive Mapper as a modelling methodology for the dynamics of a classifier can be defined as follows.
Definition 2. The values of the output concepts (OCs) of the state in which a Fuzzy Cognitive Mapper equilibrates (equilibrium point), when appropriate values of the input concepts (ICs) are applied, correspond to the decision of the classifier being modelled by the FCMper.
Before describing and discussing each of the novel FCM structures presented in this paper, it is useful to introduce a new representation of the bidirectional causal relationship between two concepts of an FCM. The conventional representation of this relationship, shown on the left side of Figure 2, is replaced by the one presented on the right side.
Figure 2. Bidirectional causal relationship between concepts C1 and C2
In Figure 2, the weights W12 and W21 represent the strength with which C1 influences C2 and vice versa. This representation will facilitate the illustration of the structures that follow.

In all the proposed structures the concepts are divided into two types, the input and output concepts, (ICs) and (OCs) respectively. The input concepts are considered to be steady-state concepts, which means that their values remain unchanged; they are not updated according to rule (1), but only contribute to the updating of the remaining nodes. In the next sections three FCMpers are presented as a first approach to using FCMs to model the dynamics of a classifier in pattern classification applications.
3.1 Fuzzy Cognitive Mapper-1 (FCMper1)
The first type of Fuzzy Cognitive Mapper introduced in this paper has the simple form illustrated in Figure 3 below. This structure tries to model the behaviour of a classifier, where the classifier inputs are defined as steady-state input concepts and the classifier outputs as output concepts. The input concepts correspond to the features used to distinguish the patterns' classes, while the output concepts correspond to the class labels. The output node that is activated with the highest value indicates the class to which the input feature set belongs. The operation of the FCMper of Figure 3 is based on the following Lemma.
Lemma 1
If a common set of interconnection weights can be found that leads the FCM to specific equilibrium points for different initial sets of input concepts, then the FCMper1 can map the patterns to the classes they belong to; in other words, it can recognize them.
Figure 3. Fuzzy Cognitive Mapper-1 (FCMper1)
More precisely, we try to find the weight set that, for each input set of concepts, drives the FCM to an equilibrium point where the output nodes determine the class to which the input set (features) belongs. In the model presented in Figure 3, the nodes of the output concepts, which correspond to the class labels, are fully connected in order to introduce dependence between the nodes that make the decisions. This way, the structure attempts to capture how the decision of one node affects the decisions of the others and vice versa.
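A minimal sketch of how FCMper1 could operate follows, assuming hand-picked (not learned) weights for a hypothetical two-feature, two-class problem; all names and values are illustrative:

```python
import numpy as np

def fcmper1_classify(features, W_in, W_out, A=1.0, max_iter=100, tol=1e-5):
    """FCMper1 sketch: the input concepts are steady-state (clamped to the
    feature values) and only the fully connected output concepts are updated
    by rule (1).  W_in[i, k] is the weight from input concept i to output
    concept k; W_out[j, k] the weight between output concepts j and k.
    Returns the index of the output concept with the highest equilibrium value."""
    f = lambda x: 1.0 / (1.0 + np.exp(-A * x))
    x = np.asarray(features, dtype=float)
    out = np.full(W_in.shape[1], 0.5)            # initial output-concept values
    for _ in range(max_iter):
        prev = out
        # steady input influence + influence of the other output concepts
        infl = x @ W_in + W_out.T @ prev - np.diag(W_out) * prev
        out = f(prev + infl)                     # rule (1), inputs held fixed
        if np.max(np.abs(out - prev)) < tol:
            break
    return int(np.argmax(out))

# Toy 2-feature / 2-class weights (chosen by hand, not learned):
W_in = np.array([[0.8, -0.5],
                 [-0.5, 0.8]])
W_out = np.array([[0.0, -0.3],
                  [-0.3, 0.0]])
```

The negative weights between the two output concepts make them mutually inhibitory, which is one way the fully connected output layer can express dependence between the decision nodes.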
3.2 Fuzzy Cognitive Mapper-2 (FCMper2)
The second approach to involving a Fuzzy Cognitive Mapper in a pattern recognition procedure uses the structure illustrated in Figure 4. In this case, the FCMper2 operates in a second stage. In the first stage a neural network of any type and structure is trained with the training data set for a specific application. Subsequently, the trained neural network, working in generalization mode, tries to recognize the training patterns. The decisions of the neural network constitute the values of the input concepts of the FCMper2, as depicted in Figure 4. The operation of the FCMper2 can be summarized in the following definition.
Definition 3. Assuming that the neural network decisions are not absolutely accurate, the FCMper2 is used to map these decisions to the correct decisions that the neural network ought to have made.
Figure 4. Fuzzy Cognitive Mapper-2 (FCMper2)
As can be seen from the above figure, the decisions about the class to which the patterns presented to the trained neural network belong are considered questionable. This is why the input concepts have the form of questions. The FCMper2 improves the decisions of the network by performing classification in a second stage. In this way we obtain a hybrid two-stage classifier of the form shown in Figure 5.
Figure 5. Hybrid Classifier -1
By finding the optimal weight set, the FCMper2 is expected to equilibrate at more accurate decisions than the neural network, owing to its simplicity and its ability to describe the relationships between the node concepts.
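The two-stage pipeline of Figures 4-5 might be sketched as below; `nn_predict` stands in for the trained first-stage network, and the FCMper2 weights are illustrative rather than learned:

```python
import numpy as np

def sigmoid(x, A=1.0):
    return 1.0 / (1.0 + np.exp(-A * x))

def fcmper2_refine(decisions, W_in, W_out, max_iter=50, tol=1e-5):
    """Second-stage FCM sketch: the NN decisions are the steady input concepts
    and the output concepts equilibrate to the corrected decision.
    W_in maps NN decisions to output concepts; W_out connects output concepts."""
    d = np.asarray(decisions, dtype=float)
    out = np.full(W_in.shape[1], 0.5)
    for _ in range(max_iter):
        prev = out
        out = sigmoid(prev + d @ W_in + W_out.T @ prev - np.diag(W_out) * prev)
        if np.max(np.abs(out - prev)) < tol:
            break
    return out

def hybrid_classify(features, nn_predict, W_in, W_out):
    """Hybrid Classifier-1 (Figure 5): NN in the first stage, FCMper2 second."""
    nn_decisions = nn_predict(features)              # stage 1: raw class scores
    refined = fcmper2_refine(nn_decisions, W_in, W_out)  # stage 2: equilibrium OCs
    return int(np.argmax(refined))

# Illustrative weights and a dummy stand-in for a trained network:
W_in = np.array([[1.5, -1.0],
                 [-1.0, 1.5]])
W_out = np.zeros((2, 2))
nn = lambda feats: np.asarray(feats, dtype=float)   # hypothetical NN scores
```

In a real run the `nn` stand-in would be the trained multilayer perceptron, and `W_in`/`W_out` would be the weight set found by the genetic algorithm of section 3.5.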
3.3 Fuzzy Cognitive Mapper-3 (FCMper3)

The third type of Fuzzy Cognitive Mapper has a more complicated form than the previous ones, since it contains more steady-state input nodes, as can be seen in Figure 6. This approach, called FCMper3, is a combination of the previous structures and works as a second-stage classifier, like the FCMper2 presented in the previous section. In FCMper3, however, the input nodes comprise not only the NN decisions but also the input feature vector values. In this case, the proposed model attempts to model a supervisor classifier in a second stage, which takes the decisions of the first-stage classifier (the neural network) for specific input features and maps them to more accurate ones. This structure combines the advantages of the previous FCMpers by exploiting all the available information coming from both the neural network and the input data.
Figure 6. Fuzzy Cognitive Mapper-3 (FCMper3)
Figure 7. Hybrid Classifier -2
In Figure 7, the integrated hybrid classifier with the addition of the FCMper3 in a second stage is presented. The FCMper3, as in the case of FCMper2, operates as a supervisory device that tends to correct the neural network decisions by modelling the behaviour of a more accurate classifier.
3.4 FCMpers with Adaptive Activation Functions
In the inference formula (1), the barrier function f is used to restrict the concept values to the range [0,1] by squashing them. A typical squashing function widely used in FCMs is the logistic sigmoid function
f(x) = \frac{1}{1 + e^{-Ax}}    (2)
where A is a parameter that controls the shape of the function.
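A one-line implementation of (2) is useful for seeing the effect of the shape parameter A (a sketch; the function name is ours):

```python
import numpy as np

def logistic(x, A=1.0):
    """Equation (2): the logistic sigmoid; larger A gives a steeper squashing,
    smaller A a flatter one, while f(0) = 0.5 for every A."""
    return 1.0 / (1.0 + np.exp(-A * x))
```

Letting each node carry its own A is precisely the extra degree of freedom exploited by the adaptive FCMpers discussed below.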
However, in the FCM literature there is no approach in which a different transfer function is used for each node. A first use of adjustable transfer functions was made in [6]. It is believed that, as in the case of neural networks, the activation function of each node should not merely restrict the neuron output to a certain range, but should also introduce nonlinear behaviour. Thus, if we permit each neuron to operate with a different activation function, we can improve the cognitive capability of the FCMpers. Moreover, by enabling the neurons to have variable activation functions, the FCMpers become more flexible, having more degrees of freedom, and are therefore more capable of learning and storing internal representations of the system being modelled. Logistic sigmoid activation functions of different shapes, for different values of the parameter A, are drawn in Figure 8. To test this hypothesis, the three types of FCMpers introduced previously will also be examined in their adaptive versions, by enabling their nodes to have logistic sigmoid functions of different shapes.
Figure 8. Logistic activation functions of different shape.
In our experiments, we use logistic sigmoid functions of different shapes as activation functions for the models' nodes. The parameters A of these functions are chosen optimally during the learning process by the learning algorithm. In a more advanced scenario, these functions might differ in type and not only in shape.
3.5 Learning Algorithm
As already discussed in section 2, many researchers have recently tried to enhance the functionality of FCMs by developing learning algorithms that find the appropriate set of interconnection weights. The proposed algorithms are divided into two types: those which use gradient information and are called gradient-based algorithms, and those which are based on evolutionary mechanisms. Until now, algorithms of both kinds have searched for a weight set that can lead the FCM from a specific initial state to a desired equilibrium point. However, in our case the learning algorithm should work as follows.
Definition 4. The learning algorithm for the case of FCMpers has to find a common weight set with which, for different initial states of the input concepts, the FCMpers equilibrate at different points.
In order to satisfy Definition 4, a Simple Genetic Algorithm (SGA) [37] is used in all the cases of FCMpers already discussed. The settings of the SGA used are summarized in Table 1.
Table 1. Simple Genetic Algorithm settings

Population Size       | 50
Variables Range       | [-1,1]
Maximum Generations   | 300
Elitism               | YES, 2 chromosomes
Crossover Points      | 2 points
Crossover Probability | 0.8
Mutation Probability  | 0.01
Selection Method      | Stochastic Universal Sampling (SUS)
The training procedure of the proposed FCMpers is supervised, since we provide the desired equilibrium points (output concepts) for each set of input concepts. The desired output concepts correspond to the class number to which the input concepts belong. The genetic algorithm used in our experiments is a real-valued algorithm with chromosome representation
Ch_1 = (x_1^1\ x_2^1\ x_3^1 \ldots x_n^1),\ \ldots,\ Ch_m = (x_1^m\ x_2^m\ x_3^m \ldots x_n^m)
where Ch_i is the ith chromosome and x_j^i is the jth variable (interconnection weight) of the ith chromosome. In the above formulation, the population size is equal to m. In the case of nodes with adaptive activation functions, extra variables are included in the chromosomes: the parameters A that control the shapes of the functions, one for each output node. The objective function minimized in all cases is the Mean Squared Error (MSE) between the actual and the desired output concepts of the FCMpers. The MSE performance index is defined as follows
MSE = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( OC_{ij}^{Actual} - OC_{ij}^{Desired} \right)^2    (3)
where M is the number of training patterns, N the number of FCMper output concepts, and (OC_{ij}^{Actual} - OC_{ij}^{Desired}) the difference between the actual value of the jth output concept and its corresponding desired value (target) when the ith set of input concepts appears at the FCMper's input.
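Equation (3) translates directly into code (a sketch; `oc_actual` and `oc_desired` are assumed to be M x N arrays of FCMper outputs and targets):

```python
import numpy as np

def fcmper_mse(oc_actual, oc_desired):
    """Equation (3): mean squared error over M training patterns (rows)
    and N output concepts (columns) of the FCMper."""
    a = np.asarray(oc_actual, dtype=float)
    d = np.asarray(oc_desired, dtype=float)
    M, N = a.shape
    return float(np.sum((a - d) ** 2) / (M * N))
```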
A detailed description of the learning algorithm used in all cases is given in the pseudo-code of Figure 9.

Collect FCM Input Concepts (ICs) - Output Concepts (OCs)
Build FCM Structure
Create Population
for Generation = 1 to Maximum Generations
    for Individual = 1 to Population Size
        Initialize FCM with Individual
        Fitness(Individual) = MSE (Equation 3)
    end
    Selection
    Crossover
    Mutation
    Reinsertion with Elitism
end
Initialize FCM with Best Individual
Operation in Generalization Mode
Figure 9. Pseudo-Code of the GA-based learning algorithm.
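The loop of Figure 9 could be sketched as follows. This is a simplified stand-in, not the exact SGA of Table 1: truncation selection and one-point crossover replace SUS and 2-point crossover, and the `fitness` callback is assumed to return the MSE of Equation (3) for an FCMper initialized with the candidate weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def evolve_weights(fitness, n_vars, pop_size=50, generations=300,
                   lo=-1.0, hi=1.0, pc=0.8, pm=0.01, n_elite=2):
    """Minimal real-valued GA in the spirit of Figure 9 / Table 1."""
    pop = rng.uniform(lo, hi, size=(pop_size, n_vars))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        pop = pop[np.argsort(scores)]                      # best individuals first
        next_pop = [pop[i].copy() for i in range(n_elite)] # elitism: keep the best
        parents = pop[: pop_size // 2]                     # truncation selection
        while len(next_pop) < pop_size:
            a, b = parents[rng.integers(len(parents), size=2)]
            child = a.copy()
            if rng.random() < pc:                          # one-point crossover
                cut = rng.integers(1, n_vars)
                child[cut:] = b[cut:]
            mask = rng.random(n_vars) < pm                 # uniform mutation
            child[mask] = rng.uniform(lo, hi, mask.sum())
            next_pop.append(child)
        pop = np.asarray(next_pop)
    scores = np.array([fitness(ind) for ind in pop])
    return pop[int(np.argmin(scores))]
```

Because of elitism, the best MSE found can never worsen between generations, which mirrors the "Reinsertion with Elitism" step of Figure 9.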
Additionally, in the cases of FCMper2 and FCMper3, where a neural network (a multilayer perceptron feedforward network) has to be trained in the first stage, the well-known Backpropagation (BP) algorithm with momentum term [1] is used. The settings of the backpropagation algorithm are shown in Table 2.

Table 2. Backpropagation Algorithm settings

Epochs        | 500
Learning Rate | 0.01
Momentum      | 0.9
Regarding the selection of the parameters of the SGA and BP algorithms, it is noted that they were derived by trial and error and have not been optimized. Better performance could be achieved for both algorithms if an extra optimization procedure were performed. However, in our experiments we wish to investigate the generalization and recognition capabilities of the newly introduced structures in general, without reporting optimal performance indices. An illustrative example of the way the learning process is executed is presented in the Appendix at the end of the manuscript.
4. Experimental Study

In this section a set of simulations applying the novel FCM structures to four well-known classification benchmarks and to one typical computer vision application is presented. These experiments investigate the ability of FCMpers to recognize known patterns, with either regular or adaptive activation node functions. In the tables presenting the test results, the structure of each model used in each classification problem is given. In the case of the neural network the structure has the form "Number of Inputs - Number of Hidden Nodes - Number of Classes", while in the case of the FCMpers it has the form "Number of Input Concepts - Number of Output Concepts". The adaptive version of each of the proposed FCMpers has the same name prefixed with the letter A, e.g. FCMper1 and AFCMper1. Each experiment was repeated 100 times in order to obtain statistically valid results, and the mean statistics are presented in the tables that follow.
4.1 The Two Spiral Problem

The two spiral problem is an extremely hard problem for neural networks to solve. The goal is to learn to discriminate between two sets of training points which lie on two distinct spirals in the x-y plane [38]. Figure 10 shows the mean training curves for each model and for the neural classifier used for comparison. Table 3 summarizes the classification performance of the structures under test through appropriate indices.
Figure 10. Training curves of (a) the proposed FCMpers and (b) the neural classifier, in the case of Two spirals problem.
Table 3. Classification Statistics in the case of the Two Spiral problem

Two Spiral | Neural Network | FCMper1 | AFCMper1 | FCMper2 | AFCMper2 | FCMper3 | AFCMper3
Structure  | 2-10-2         | 2-2     | 2-2      | 2-2     | 2-2      | 4-2     | 4-2
MSE        | 0.1505         | 0.2491  | 0.2485   | 0.1504  | 0.1768   | 0.2434  | 0.1389
PMSE       | 0.1389         | 0.2448  | 0.2459   | 0.1499  | 0.3158   | 0.2410  | 0.1349
CR         | 75.27%         | 56.94%  | 56.56%   | 75.59%  | 60.22%   | 54.17%  | 79.67%
Stdv(MSE)  | 0.0922         | 0.0013  | 0.0012   | 0.0905  | 0.0990   | 0.0076  | 0.1093
The experimental pattern set consists of 100 training and 50 testing patterns. While the MSE index is calculated during the training phase, when the training patterns are applied, the PMSE measurement corresponds to the Performance Mean Squared Error, computed by applying the testing pattern set to the trained structure. Moreover, CR is the Classification Rate, defined as the percentage of correctly classified patterns over the entire testing set, and Stdv is the Standard Deviation of the MSE, an index of the computational stability of the learning algorithms.

The results presented in Table 3 show that the AFCMper3 (last column) outperforms the other FCMpers and even the neural classifier. More precisely, the AFCMper3, despite having a smaller structure (only 16 adjustable parameters) than the neural network (52 weights and biases), gives a better classification rate of almost 80%, with a smaller training error.
4.2 The XOR Problem

This problem corresponds to the 2-parity problem and is the one most widely used to evaluate the performance of pattern recognition systems [2]. It is an easy problem for the neural network structure used in our simulations. The training and testing pattern sets are identical and consist of 4 patterns, the 2^2 combinations of 0s and 1s. The neural network has two outputs ([1, 0] and [0, 1]) so that it can be compared with the FCMpers, which must employ at least two output nodes. Figure 11 illustrates the learning process of each structure, while Table 4 summarizes their performance.
Figure 11. Training curves of (a) the proposed FCMpers and (b) the neural classifier, in the case of XOR problem.
Table 4. Classification Statistics in the case of the XOR problem

XOR        | Neural Network | FCMper1     | AFCMper1 | FCMper2 | AFCMper2 | FCMper3 | AFCMper3
Structure  | 2-2-2          | 2-2         | 2-2      | 2-2     | 2-2      | 4-2     | 4-2
MSE        | 0.1041         | 0.2501      | 0.0191   | 0.1119  | 0.1020   | 0.0941  | 0.0085
PMSE       | 0.1041         | 0.2501      | 0.0191   | 0.1119  | 0.2760   | 0.0941  | 0.0085
CR         | 77.5%          | 53%         | 98%      | 82.5%   | 67.5%    | 95.5%   | 100%
Stdv(MSE)  | 0.0815         | 1.731x10^-5 | 0.0234   | 0.0745  | 0.0908   | 0.0700  | 0.0101
As can be observed from Table 4, four of the six FCMper structures perform better than the neural classifier, with the Hybrid Classifier-2 (AFCMper3) again being the "winner" in this test. In this experiment the structure of the AFCMper3 is larger (16 adjustable parameters) than that of the neural network (12 weights and biases), which is the reason why the proposed hybrid classifier performs better. The very small training error of 0.0085 and the classification rate of 100%, in conjunction with the corresponding values of the neural classifier, show that the proposed structure can efficiently solve such classification problems without substantially increasing the architectural complexity.
4.3 4-Parity

The n-parity problems are widely used as test data for determining the performance of a training algorithm. These problems generate the 2^n combinations of n bits as input data, while the output is a single bit equal to 1 if there is an odd number of high bits, or 0 if there is an even number of high bits, in the input pattern [2].
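Generating the n-parity data described above is straightforward (a Python sketch; the function name is ours):

```python
import numpy as np
from itertools import product

def parity_dataset(n):
    """All 2^n input bit patterns with the n-parity target:
    1 for an odd number of high bits, 0 for an even number."""
    X = np.array(list(product([0, 1], repeat=n)))
    y = X.sum(axis=1) % 2
    return X, y
```

For n = 2 this reduces to the XOR problem of the previous section.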
Figure 12. Training curves of (a) the proposed FCMpers and (b) the neural classifier, in the case of 4-Parity problem.
Table 5. Classification Statistics in the case of the 4-Parity problem

4-Parity   | Neural Network | FCMper1    | AFCMper1 | FCMper2 | AFCMper2 | FCMper3    | AFCMper3
Structure  | 4-4-2          | 4-2        | 4-2      | 2-2     | 2-2      | 6-2        | 6-2
MSE        | 0.0648         | 0.2503     | 0.2471   | 0.0664  | 0.0902   | 0.2503     | 0.0338
PMSE       | 0.0486         | 0.2503     | 0.2471   | 0.0664  | 0.2402   | 0.2503     | 0.0338
CR         | 93.75%         | 50.37%     | 52.75%   | 93.75%  | 71.37%   | 49.75%     | 98.25%
Stdv(MSE)  | 0.0512         | 4.98x10^-5 | 0.0041   | 0.0178  | 0.0906   | 9.29x10^-5 | 0.0267
The same conclusions as in the previous problems can also be drawn for the 4-parity problem, as Figure 12 and Table 5 show. Only the AFCMper3 presents better classification behaviour than the neural network, yielding a higher classification rate and minimal errors.
4.4 Iris Data

The classification of Fisher's Iris data set is a common pattern recognition task for testing the efficiency of neural structures. The data set [39] contains 3 classes of 50 instances each, where each class refers to a type of iris plant (Setosa, Versicolor, Virginica), and each instance consists of four features (sepal length, sepal width, petal length, petal width, in cm). One class is linearly separable from the other two; the latter are not linearly separable from each other.

In order to perform this pattern classification test, a 4-4-3 multilayer neural network is used. The four inputs correspond to the measurements that describe each of the three classes, the three types of iris flowers. Of the 150 patterns, 105 randomly selected patterns constitute the training set and the remaining 45 the testing set.
Figure 13. Training curves of (a) the proposed FCMpers and (b) the neural classifier, in the case of Iris Data.
Table 6. Classification Statistics in the case of the Iris Data

Iris Data  | Neural Network | FCMper1 | AFCMper1 | FCMper2 | AFCMper2 | FCMper3 | AFCMper3
Structure  | 4-4-3          | 4-3     | 4-3      | 3-3     | 3-3      | 7-3     | 7-3
MSE        | 0.0628         | 0.1730  | 0.1599   | 0.0628  | 0.0705   | 0.1408  | 0.0281
PMSE       | 0.0596         | 0.1781  | 0.1709   | 0.0706  | 0.2676   | 0.1499  | 0.0453
CR         | 87.67%         | 64.38%  | 69.24%   | 93.23%  | 60.98%   | 70.93%  | 95.27%
Stdv(MSE)  | 0.0499         | 0.0816  | 0.0940   | 0.0235  | 0.0859   | 0.1157  | 0.0163
In this problem the Hybrid Classifier-2 (AFCMper3) clearly outperforms the other structures. The results of Table 6 show that the improvement in the classification capabilities of the simple neural classifier obtained by using the AFCMper3 in a second stage is impressive: the 87.67% classification rate of the conventional neural classifier is improved to 95.27%, with a smaller training error, when a second classification structure is used in a second stage.
4.5 Zernike Moments

In this experiment a typical classification application takes place, where Zernike moments are used as discriminative features of the objects to be classified. Initially some test objects (patterns) are selected. Figure 14b shows a wooden pyramidal puzzle, which is used for robot vision tasks in the Control Systems Lab of DUTH (Democritus University of Thrace) [3,4]. The 9 parts of the puzzle, placed in arbitrary positions, are shown in Figure 14a. The (256x256 pixels) images of these parts are the initial nine patterns of our experiments.
Figure 14. The nine work pieces that are placed (a) in arbitrary positions on the table and (b) on a 3-D truncated pyramid.
After a typical cross-validation procedure [3] it was decided that the neural classifier be a typical multilayer perceptron containing 1 hidden layer, with 10 hidden neurons and 9 (equal to the number of classes) output neurons. The number of inputs equals the number of features used in every case. The learning set contains 324 images, produced by the following procedure. The original 9 object images were transformed by combining various scaling, translation and rotation operations to produce 54 images, including the original ones. By adding noise and blurring to these 54 images, 540 more images were generated. Specifically, Gaussian noise of 6 levels in the range 0-25% (this value corresponds to the percentage of image pixels affected by the noise) and a blurring effect with 10 blur radii in the range 0-9 (the radius determines the width of the edge that is blurred) generated these degraded images. 270 degraded images randomly selected from the set of the above 540 images, plus the 54 uncorrupted images, were used to form the 324-image training set. The remaining 270 degraded images were used for testing. For experimental purposes, the first 12 Zernike moments are computed for each one of the 540 degraded plus the 54 uncorrupted images previously generated. From this set the first two moments are omitted, because they are constant for all objects and thus do not constitute discriminative quantities. Note here that the resulting images have been transformed appropriately in order to ensure scale, translation and rotation invariance, in the same way as described in [3]. This problem differs from the previous ones, since the data set has been formed directly from a real environment, the captured images of the objects.
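The degradation step used to enlarge the learning set can be sketched with NumPy alone. This is a simplified illustration, not the operators of [3]: zero-mean Gaussian perturbation of a given percentage of pixels stands in for the noise levels described above, and a simple mean filter of a given radius stands in for the blurring; the noise sigma is an assumed parameter.

```python
import numpy as np

def add_gaussian_noise(img, percent, sigma=0.1, rng=None):
    """Perturb `percent`% of the pixels with zero-mean Gaussian noise
    (illustrative stand-in for the 0-25% noise levels described above)."""
    rng = rng or np.random.default_rng()
    out = img.astype(float).copy()
    n = int(out.size * percent / 100.0)
    idx = rng.choice(out.size, size=n, replace=False)
    flat = out.ravel()  # view into `out`: in-place writes affect the copy
    flat[idx] = np.clip(flat[idx] + rng.normal(0.0, sigma, n), 0.0, 1.0)
    return out

def box_blur(img, radius):
    """Simple mean filter of the given radius (radius 0 = no blurring)."""
    if radius == 0:
        return img.astype(float)
    k = 2 * radius + 1
    padded = np.pad(img.astype(float), radius, mode='edge')
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):          # accumulate every shifted window and average
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

# One degraded variant of a (toy) 256x256 image: 25% noise, blur radius 9.
rng = np.random.default_rng(1)
img = rng.random((256, 256))
degraded = box_blur(add_gaussian_noise(img, 25, rng=rng), 9)
```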
The experimental results of this classification problem are presented in Table 7 and the mean training curves for each one of the classification devices are illustrated in Figures 15a and 15b.
Figure 15. Training curves of (a) the proposed FCMpers and (b) the neural classifier, in the case of classification by using the Zernike Moments
Table 7. Classification Statistics in the case of the Zernike Moments experiment.

            Neural     FCM      AFCM     FCM      AFCM     FCM      AFCM
            Network    per1     per1     per2     per2     per3     per3
Structure   10-10-9    10-9     10-9     9-9      9-9      19-9     19-9
MSE         0.0849     0.2027   0.2415   0.0955   0.1976   0.1863   0.0739
PMSE        0.0741     0.2009   0.2390   0.0956   0.3395   0.1861   0.0723
CR          87%        44.76%   56.2%    87.43%   58.14%   64.8%    89.33%
Stdv(MSE)   0.0863     0.0756   0.0150   0.0745   0.1309   0.0961   0.0894
As can be seen from Table 7, the superiority of the AFCMper3 is preserved in this real problem as well.
4.5.1 A 10-fold Cross-Validation Analysis

In order to analyse the performance and the behaviour of the classifiers compared in the previous experiments, a 10-fold cross-validation analysis [1], [40] is carried out in this section. This statistical analysis is performed on the Iris dataset of section 4.4, since this dataset is large enough to permit that kind of processing. According to the 10-fold cross-validation procedure, the dataset of 150 entries is randomly divided into 10 subsets of 15 data patterns each. Each subset is then used to test the performance of the classifiers trained on the union of the remaining 9 subsets. This procedure is repeated 10 times, choosing a different part for testing each time. The results of this analysis are presented in the following Table 8.
Table 8. Classification rates (%) for a 10-fold cross-validation analysis, in the case of the IRIS dataset.

Fold   Neural     FCM      AFCM     FCM      AFCM     FCM      AFCM
       Network    per1     per1     per2     per2     per3     per3
1      95.93      63.33    92.59    95.96    72.60    96.30    96.30
2      69.63      91.85    94.81    94.44    66.37    95.55    99.26
3      87.00      83.33    75.18    94.07    63.33    98.88    98.88
4      65.55      76.30    93.33    95.92    75.77    95.93    99.63
5      88.30      60.74    91.11    60.74    51.11    90.74    88.52
6      66.66      75.55    88.15    98.89    78.96    97.77    99.26
7      95.55      88.52    95.18    95.55    64.88    96.30    96.30
8      98.88      78.88    94.81    98.88    63.70    99.26    99.26
9      98.88      93.33    99.26    98.88    63.70    98.88    99.26
10     98.88      72.60    93.33    98.88    98.52    98.88    98.52
Mean   86.53      78.44    91.77    93.22    69.89    96.85    97.52
Stdv   13.95      11.14    6.50     11.57    12.71    2.57     3.39
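The 10-fold partitioning described above can be sketched as index bookkeeping; the classifier training itself is elided. The seed is arbitrary.

```python
import numpy as np

# 10-fold cross-validation bookkeeping for the 150-pattern IRIS set:
# the data are shuffled and split into 10 disjoint subsets of 15 patterns;
# each subset in turn tests classifiers trained on the remaining 9 subsets.
rng = np.random.default_rng(7)  # arbitrary seed
order = rng.permutation(150)
folds = [order[i * 15:(i + 1) * 15] for i in range(10)]

for k, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
    # train each classifier on the 135 training patterns and record the
    # classification rate on the 15 test patterns (training elided here)
    assert len(train_idx) == 135 and len(test_idx) == 15
```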
Based on the results of the above Table 8, it can be concluded that the proposed structures perform better than the neural network in all cases, while in most cases the hybrid classifier AFCMper3 clearly outperforms its competitors. This detailed statistical analysis verifies the previous experiments and shows that the performance of a single classifier (in our case a typical neural network) can effectively be improved by using an appropriately connected FCM model in a second stage. Also, the usage of adaptive nodes can provide the models with more flexibility in learning the internal relations of the patterns.
5. Conclusion
The introduction of Fuzzy Cognitive Maps in pattern classification problems has been presented in this paper. Three possible classification models were proposed and new terminology was defined to address the pattern recognition problem in terms of Fuzzy Cognitive Maps. The new concept of FCMpers was introduced and their operational heuristics were defined. Through appropriate experiments the performance of the proposed structures was compared with each other and with that of a typical neural classifier.
The performance of the first FCMper is somewhat poor, although nodes with adaptive activation functions were used. This is due to the weak internal structure of this model, something which could be improved by the addition of some hidden nodes. The second FCMper behaved like the first one, although in some cases it seems to corrupt the decisions of the neural classifier. The most efficient structure overall was the Hybrid Classifier-2, which has the AFCMper3 in the second stage. This structure performs better than all of the other structures, at least on the set of problems on which they were tested. It clearly demonstrates that an FCM structure can significantly improve the classification performance of a classifier if this structure is used as a second-stage complementary classifier. Moreover, the experiments demonstrate that the adoption of adaptive node activation functions brings better results only when it is combined with the appropriate FCM structure. This is clearly the case of AFCMper3. While this preliminary study demonstrates the efficacy of some FCM structures in pattern classification tasks, further research has to be done to explore new structures or even new adaptive approaches, which will further improve their performance.
Appendix

In this section a detailed description of the learning process is presented for solving the XOR benchmark problem using the Hybrid Classifier-2 (AFCMper3), which is the most efficient among the structures proposed in this paper. As noted in Table 4, the structure of the Neural Network of the first stage is 2-2-2, while the FCM consists of 6 concept nodes, 4 of them being steady or input nodes and 2 being output nodes. The Hybrid Classifier-2 in the case of the XOR problem takes the following form,
Figure 16. Hybrid Classifier-2 (AFCMper3) for the case of the XOR problem.
The main goal of the experiment is to teach the above hybrid structure the following XOR mapping,
Table 9. XOR mapping.

Feature #1   Feature #2   Desired Output
0            0            0
0            1            1
1            0            1
1            1            0
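The training of the first-stage network on the mapping of Table 9 can be sketched as a plain NumPy backpropagation loop. This is an illustrative fragment under assumed settings (a 2-2-2 sigmoid MLP, one-hot targets for the two classes, gradient descent on the squared error with an assumed learning rate); the paper's actual training algorithm and parameters are not specified here and may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))

# XOR patterns and one-hot targets (class "0" / class "1"), matching the
# 2-2-2 first-stage network of the Hybrid Classifier-2.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[1, 0], [0, 1], [0, 1], [1, 0]], dtype=float)

W1 = rng.normal(0, 1, (2, 2)); b1 = np.zeros(2)   # input -> hidden
W2 = rng.normal(0, 1, (2, 2)); b2 = np.zeros(2)   # hidden -> output

losses = []
for _ in range(5000):
    H = sigmoid(X @ W1 + b1)            # hidden activations
    Y = sigmoid(H @ W2 + b2)            # network outputs
    losses.append(float(np.mean((Y - T) ** 2)))
    # backpropagate the squared error through both sigmoid layers
    dY = (Y - T) * Y * (1 - Y)
    dH = (dY @ W2.T) * H * (1 - H)
    W2 -= 0.5 * H.T @ dY; b2 -= 0.5 * dY.sum(0)
    W1 -= 0.5 * X.T @ dH; b1 -= 0.5 * dH.sum(0)
```

After training, the network weights are frozen and its outputs feed the FCM stage, as described below.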
After the neural network is trained to recognise the XOR mapping of the above table, the learning process of the FCM begins, with the NN classifier working in generalization mode, keeping its weights constant. In one pass of the SGA algorithm, and for each candidate set of FCM weights in the population, the performance of the weight set in learning the mapping of the above Table 9 is calculated.
In computing the fitness of each FCM weight set, Features #1 and #2 are inserted into the trained neural classifier, and the corresponding NN outputs, along with the inputs (Features #1 and #2), are entered as the initial values of the input concepts, while the output concepts are initialized to zero. Next, the FCM rule of Eq.(1) is applied until the map equilibrates at a certain point. The values of the output concepts at this point correspond to the classification result. The fitness value of the weight set is calculated by using Eq.(3). This process is repeated until the maximum number of generations is reached. The following Table 10 shows the initial values and the final values (iteration k), at the equilibrium point, of the FCM’s concepts.
Table 10. Concepts’ values during execution of the inference formula of Eq.(1).

Iteration   IC1              IC2              IC3             IC4             OC1            OC2
1           1st NN output    2nd NN output    1st NN input    2nd NN input    0              0
            Values [0-1]     Values [0-1]     Values [0,1]    Values [0,1]
...         ...              ...              ...             ...             ...            ...
k           1st NN output    2nd NN output    1st NN input    2nd NN input    Values [0-1]   Values [0-1]
            Values [0-1]     Values [0-1]     Values [0,1]    Values [0,1]
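The repeated application of the FCM inference rule until equilibrium, as described above, can be sketched as follows. This is a minimal illustration: it assumes the common FCM update in which each concept value becomes a sigmoid of the weighted sum of all concepts, with the four input concepts held steady; the weight matrix below is purely hypothetical and does not reproduce the weights learned by the SGA or the exact form of Eq.(1).

```python
import numpy as np

def fcm_infer(W, x0, steady_idx, f=lambda s: 1.0 / (1.0 + np.exp(-s)),
              tol=1e-5, max_iter=100):
    """Iterate the FCM update x <- f(W @ x) until the concept vector
    stabilises (equilibrium) or max_iter is reached. Concepts listed in
    steady_idx keep their initial (input) values at every step."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_new = f(W @ x)
        x_new[steady_idx] = x[steady_idx]   # input concepts stay fixed
        if np.max(np.abs(x_new - x)) < tol:
            return x_new
        x = x_new
    return x

# Hypothetical 6-concept map (4 steady input concepts, 2 output concepts),
# mirroring the layout of the XOR example; the weights are illustrative.
rng = np.random.default_rng(0)
W = rng.uniform(-1.0, 1.0, size=(6, 6))
x0 = np.array([0.9, 0.1, 1.0, 0.0, 0.0, 0.0])  # NN outputs, NN inputs, OC1, OC2
x_eq = fcm_infer(W, x0, steady_idx=[0, 1, 2, 3])
print(x_eq[4:])  # equilibrium values of the two output concepts
```

In the SGA loop, the equilibrium output-concept values would be compared against the desired outputs to compute each candidate weight set's fitness.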
References
[1] S. Haykin, “Neural Networks – A Comprehensive Foundation”, Prentice Hall, 2nd Edition, 1999.
[2] C.G. Looney, “Pattern Recognition Using Neural Networks, Theory and Algorithms for Engineers and Scientists”, Oxford University Press, 1997.
[3] G.A. Papakostas, D.A. Karras, B.G. Mertzios and Y.S. Boutalis, “An Efficient Feature Extraction Methodology for Computer Vision Applications using Wavelet Compressed Zernike Moments”, ICGST International Journal on Graphics, Vision and Image Processing, Special Issue: Wavelets and Their Applications, Vol. SI1, pp. 5-15, 2005.
[4] D.A. Mitzias and B.G. Mertzios, “A Neural Multiclassifier System for Object Recognition in Robotic Vision Applications”, Measurement, Special Issue on Imaging Measurement Systems, Vol. 36, No. 3-4, pp. 315-330, 2004.
[5] J. Aguilar, “A Survey about Fuzzy Cognitive Maps Papers (Invited Paper)”, International Journal of Computational Cognition, Vol. 3, No. 2, 2005.
[6] D.E. Koulouriotis, I.E. Diakoulakis, D.M. Emiris and C.D. Zopounidis, “Development of Dynamic Cognitive Networks as Complex Systems Approximators: Validation in Financial Time Series”, Applied Soft Computing, Vol. 5, pp. 157-179, 2005.
[7] D.E. Koulouriotis, I.E. Diakoulakis, D.M. Emiris, E.N. Antonidakis and I.A. Kaliakatsos, “Efficiently Modeling and Controlling Complex Dynamic Systems Using Evolutionary Fuzzy Cognitive Maps (Invited Paper)”, International Journal of Computational Cognition, Vol. 1, No. 2, pp. 41-65, 2003.
[8] D.E. Koulouriotis, I.E. Diakoulakis and D.M. Emiris, “A Fuzzy Cognitive Map-based Stock Market Model: Synthesis, Analysis and Experimental Results”, 10th IEEE International Conference on Fuzzy Systems (FUZZ-2001), December 2-5, Melbourne, Australia, pp. 465-468, 2001.
[9] D.E. Koulouriotis, I.E. Diakoulakis and D.M. Emiris, “Fuzzy Cognitive Maps in Stock Market”, Fuzzy Sets and Systems in Management and Economy (C. Zopounidis, P.M. Pardalos and G. Baourakis, eds), pp. 165-179, World Scientific Press, 2001.
[10] D.E. Koulouriotis, I.E. Diakoulakis and D.M. Emiris, “Modeling Investor Reasoning using Fuzzy Cognitive Maps”, European Symposium on Intelligent Technologies, Hybrid Systems and their Implementation on Smart Adaptive Systems (EUNITE 2001), December 13-14, Tenerife, Spain, Proceedings pp. 386-391, 2001.
[11] D.E. Koulouriotis, “Investment Analysis and Decision Making in Markets using Adaptive Fuzzy Causal Relationships”, Operational Research: an International Journal, Vol. 4, No. 2, pp. 213-233, 2004.
[12] L.I. Kuncheva, “On combining multiple classifiers”, 7th International Conference on Information Processing and Management of Uncertainty (IPMU’98), pp. 1890-1891, Paris, France, 1998.
[13] L.I. Kuncheva, J.C. Bezdek and R.P.W. Duin, “Decision templates for multiple classifier fusion: an experimental comparison”, Pattern Recognition, Vol. 34, No. 2, pp. 299-314, 2001.
[14] J. Kittler, M. Hatef, R.P.W. Duin and J. Matas, “On combining classifiers”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 3, pp. 226-239, 1998.
[15] L.I. Kuncheva, “Using measures of similarity and inclusion for multiple classifier fusion by decision templates”, Fuzzy Sets and Systems, Vol. 122, No. 3, pp. 401-407, 2001.
[16] L.I. Kuncheva, “Switching between selection and fusion in combining classifiers: An experiment”, IEEE Transactions on Systems, Man, and Cybernetics, Vol. 32, No. 2, pp. 146-156, 2002.
[17] L. Lam, “Classifier combinations: implementations and theoretical issues”, in J. Kittler and F. Roli (eds), Multiple Classifier Systems, Lecture Notes in Computer Science, Vol. 1857, Springer, pp. 78-86, 2000.
[18] E. Caillault and C.V. Gaudin, “Mixed Discriminant Training of Hybrid ANN/HMM Systems for Online Handwritten Word Recognition”, International Journal of Pattern Recognition and Artificial Intelligence, Vol. 21, No. 1, pp. 117-134, 2007.
[19] F. Han, X.Q. Li, M.R. Lyu and T.M. Lok, “A Modified Learning Algorithm Incorporating Additional Functional Constraints into Neural Networks”, International Journal of Pattern Recognition and Artificial Intelligence, Vol. 20, No. 2, pp. 129-142, 2006.
[20] J.K. Min and S.B. Cho, “Multiple Decision Templates with Adaptive Features for Fingerprint Classification”, International Journal of Pattern Recognition and Artificial Intelligence, Vol. 21, No. 8, pp. 1323-1338, 2007.
[21] B. Kosko, “Hidden Patterns in Combined and Adaptive Knowledge Networks”, International Journal of Approximate Reasoning, Vol. 2, pp. 377-393, 1988.
[22] B. Kosko, “Neural Networks and Fuzzy Systems: A Dynamical Systems Approach to Machine Intelligence”, Prentice-Hall, New Jersey, USA, 1992.
[23] D.E. Koulouriotis, I.E. Diakoulakis and D.M. Emiris, “Anamorphosis of Fuzzy Cognitive Maps for Operation in Ambiguous and Multi-stimulus Real World Environments”, 10th IEEE International Conference on Fuzzy Systems (FUZZ-2001), December 2-5, Melbourne, Australia, pp. 1156-1159, 2001.
[24] D.E. Koulouriotis, I.E. Diakoulakis and D.M. Emiris, “Realism in Fuzzy Cognitive Maps: Incorporating Synergies and Conditional Effects”, 10th IEEE International Conference on Fuzzy Systems (FUZZ-2001), December 2-5, Melbourne, Australia, pp. 1179-1182, 2001.
[25] B. Kosko, “Fuzzy Cognitive Maps”, International Journal of Man-Machine Studies, Vol. 24, pp. 65-75, 1986.
[26] C.D. Stylios and P.P. Groumpos, “The Challenge of Modelling Supervisory Systems Using Fuzzy Cognitive Maps”, Journal of Intelligent Manufacturing, Vol. 9, pp. 339-345, 1998.
[27] T.L. Kottas, Y.S. Boutalis and M.A. Christodoulou, “A New Method for Weight Updating in Fuzzy Cognitive Maps Using System Feedback”, 2nd International Conference on Informatics in Control, Automation and Robotics (ICINCO’05), pp. 202-209, Barcelona, Spain, 2005.
[28] Y.S. Boutalis, T.L. Kottas, B. Mertzios and M.A. Christodoulou, “A Fuzzy Rule Based Approach for Storing the Knowledge Acquired from Dynamical FCMs”, 5th International Conference on Technology and Automation (ICTA05), pp. 119-124, Thessaloniki, Greece, 15-16 October, 2005.
[29] E.I. Papageorgiou, C.D. Stylios and P.P. Groumpos, “Fuzzy Cognitive Map Learning Based on Nonlinear Hebbian Rule”, Australian Conference on Artificial Intelligence, pp. 256-268, 2003.
[30] D.E. Koulouriotis, I.E. Diakoulakis and D.M. Emiris, “Learning Fuzzy Cognitive Maps Using Evolution Strategies: A Novel Schema for Modeling and Simulating High-Level Behaviour”, IEEE Congr. Evolutionary Computation, pp. 364-371, 2001.
[31] M. Ghazanfari, S. Alizadeh, M. Fathian and D.E. Koulouriotis, “Comparing Simulated Annealing and Genetic Algorithm in Learning FCM”, Applied Mathematics and Computation, Vol. 192, pp. 56-68, 2007.
[32] W. Stach, L. Kurgan, W. Pedrycz and M. Reformat, “Genetic Learning of Fuzzy Cognitive Maps”, Fuzzy Sets and Systems, Vol. 153, pp. 371-401, 2005.
[33] W. Stach, L. Kurgan, W. Pedrycz and M. Reformat, “Evolutionary Development of Fuzzy Cognitive Maps”, Proceedings of IEEE Conference on Fuzzy Systems, pp. 619-624, 2005.
[34] E.I. Papageorgiou, K.E. Parsopoulos, C.S. Stylios, P.P. Groumpos and M.N. Vrahatis, “Fuzzy Cognitive Maps Learning Using Particle Swarm Intelligence”, Journal of Intelligent Information Systems, Vol. 25, No. 1, pp. 95-121, 2005.
[35] K.E. Parsopoulos, E.I. Papageorgiou, P.P. Groumpos and M.N. Vrahatis, “A First Study of Fuzzy Cognitive Maps Learning Using Particle Swarm Optimization”, Proceedings of IEEE Congr. on Evolutionary Computation, pp. 1440-1447, 2003.
[36] M.S. Khan, S. Khor and A. Chong, “Fuzzy Cognitive Maps With Genetic Algorithm For Goal-Oriented Decision Support”, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, Vol. 12, pp. 31-42, 2004.
[37] D.A. Coley, “An Introduction to Genetic Algorithms for Scientists and Engineers”, World Scientific Publishing, 2001.
[38] F. Langlet, H. Abdulkader, D. Roviras, A. Mallet and F. Castanie, “Comparison of Neural Network Adaptive Predistortion Techniques for Satellite Down Links”, Proceedings of International Joint Conference on Neural Networks, pp. 709-714, 2001.
[39] A. Asuncion and D.J. Newman, “UCI Machine Learning Repository”, [http://www.ics.uci.edu/~mlearn/MLRepository.html], Irvine, CA: University of California, School of Information and Computer Science, 2007.
[40] L.I. Kuncheva, “Combining Pattern Classifiers: Methods and Algorithms”, Wiley-Interscience Publishing, 2004.