Neurocomputing 149 (2015) 198–206
A Fully Complex-valued Fast Learning Classifier (FC-FLC) for real-valued classification problems

M. Sivachitra a,*, R. Savitha b, S. Suresh b, S. Vijayachitra c

a Department of Electrical and Electronics Engineering, Kongu Engineering College, Perundurai, India
b School of Computer Engineering, Nanyang Technological University, Singapore
c Department of Electronics and Instrumentation Engineering, Kongu Engineering College, Perundurai, India
Article history: Received 5 September 2013; received in revised form 21 April 2014; accepted 22 April 2014; available online 16 September 2014.

Abstract
This paper presents a Fully Complex-valued Fast Learning Classifier (FC-FLC) to solve real-valued classification problems. FC-FLC is a single hidden layer network with a nonlinear input and hidden layer, and a linear output layer. The neurons at the input layer of the FC-FLC employ the circular transformation to convert the real-valued input features to the Complex domain. At the hidden layer, the complex-valued input features are projected onto a hyper-dimensional Complex plane (C^m → C^K) using the K hidden neurons employing the Gudermannian (Gd) activation function. To investigate the suitability of the Gd as an activation function for a fully complex-valued network, we formulate the activation function in two forms. The output layer is linear. The input weights of the FC-FLC are chosen randomly and the output weights are estimated analytically. The input weights corresponding to the best generalization performance of the FC-FLC are obtained by k-fold cross-validation. The performance of the proposed classifier is evaluated, in comparison with other complex-valued and a few best performing real-valued classifiers, on a set of benchmark classification problems from the UCI machine learning repository and on a practical human emotion recognition problem. Statistical analysis is carried out to validate the performance of the classifier. Performance results show that FC-FLC has better classification ability than the other classifiers in the literature.

© 2014 Elsevier B.V. All rights reserved.
Keywords: Complex-valued neural network; Classification; Gudermannian function; Circular transformation; Extreme learning machine
1. Introduction

Signals in applications like communication [1], signal processing [2–5], image processing [6,7], etc., are inherently complex-valued and change with time. In order to preserve their physical characteristics, these signals and their nonlinear transformations have to be represented in the Complex domain. It is therefore imperative that adaptive learning algorithms be developed in the Complex domain, and researchers have focused on the development of neural networks and their learning algorithms in the Complex domain. The main challenge in the development of a complex-valued neural network is the lack of an entire and bounded activation function, as Liouville's theorem states that a function that is both entire and bounded in the Complex domain is a constant [8]. Initially, this constraint was overcome with the use of split complex-valued networks, where the complex-valued signals were divided into their real and imaginary (or magnitude and phase) components and real-valued networks were used to operate on these signals [9].
* Corresponding author. E-mail address: [email protected] (M. Sivachitra).
http://dx.doi.org/10.1016/j.neucom.2014.04.075
However, splitting complex-valued signals into their real components is not an efficient representation and results in the loss of significant complex-valued information [10]. Therefore, Kim et al. relaxed the desired properties of a complex-valued activation function to being analytic and bounded almost everywhere [11], and suggested a set of elementary transcendental functions with singularities [12] as activation functions for fully complex-valued multi-layer perceptron networks. The convergence of these activation functions and the effect of their singularities were experimentally studied in detail in [13]. On the other hand, the Gaussian function, which maps the complex-valued input features to the Real domain (C → R), was employed in the Complex-valued Radial Basis Function (CRBF) network [14]. Thus, although the centers of the Gaussian activation function and the output weights of the CRBF network were complex-valued, the responses of the neurons in the hidden layer are real-valued. Moreover, during the backward computation for parameter updates, the correlation between the real and imaginary parts of the error is lost. Hence, the CRBF is not capable of approximating the phase of a complex-valued signal accurately. To overcome this, a Fully Complex-valued Radial Basis Function network (FC-RBF) using the fully complex-valued 'sech' activation function, together with its learning algorithm, was developed in [15].
It was also shown that the FC-RBF network with the fully complex-valued 'sech' activation function has better phase approximation abilities than the CRBF network. However, the 'sech' activation function has its singularities in the finite region of the Complex domain. This imposes strict restrictions on the operating region of the network, so as to avoid hitting these singular points. Hence, there is a need to identify better activation functions that do not restrict the operating region of the complex-valued neural network.

It has recently been shown that complex-valued networks have better computational power than real-valued networks [16], and that their orthogonal decision boundaries [17] render them better decision makers than their real-valued counterparts. This decision making ability of complex-valued networks has motivated researchers to develop efficient classifiers in the Complex domain. Classifiers developed in the Complex domain include the Multi Layered Network with Multi-Valued Neurons (MLMVN) [18], the single layered complex-valued classifier referred to as the 'Phase Encoded Complex-valued Neural Network (PE-CVNN)' [19], the Bilinear Branch-cut Complex-valued Extreme Learning Machine (BB-CELM), the Phase Encoded Complex-valued Extreme Learning Machine (PE-CELM) [20], the Fully Complex-valued Radial Basis Function network (FC-RBF) classifier [21], and the Circular Complex-valued Extreme Learning Machine (CC-ELM) [22]. A brief summary of these classifiers is given in Section 2. A complete survey of supervised learning algorithms for complex-valued neural networks and of recent advancements in complex-valued neural networks can be found in [23,24].

In this paper, we present a Fully Complex-valued Fast Learning Classifier (FC-FLC) to solve real-valued classification problems. The classifier is a single hidden layer network with a nonlinear input and hidden layer and a linear output layer. The neurons in the input layer of the FC-FLC employ a circular transformation [22] to map the real-valued input features to the Complex domain. The neurons at the hidden layer of FC-FLC employ the fully complex-valued Gudermannian function ('Gd') as the activation function, as it has been shown to be a good discriminator in [25]. This function quantitatively characterizes the Hubble sequence of spiral galaxies, including grand design and barred spirals; special shapes such as ring galaxies with inward and outward arms are also described by the analytic continuation of the same Gd function. Hence, we investigate the suitability of this function as a complex-valued activation function for solving real-valued classification problems. We formulate two activation functions using the Gd function to develop the FC-FLC. The input weights of the FC-FLC are chosen randomly and the output weights are estimated as a solution to an unconstrained linear programming problem. Therefore, the FC-FLC requires minimal computational effort during the training process. It should be noted here that the optimal input weights of FC-FLC are chosen using a k-fold cross-validation approach, presented in [26].

The performance of the FC-FLC with the proposed Gudermannian activation functions at the hidden layer is evaluated using a set of benchmark real-valued problems with balanced and unbalanced data sets from the UCI machine learning repository [27]. In all these problems, the performance of FC-FLC is compared against other complex-valued classifiers and a few best performing real-valued classifiers available in the literature. Performance results show that the FC-FLC with the Gd activation function performs better than the best performing real-valued classifiers and is comparable with other complex-valued classifiers available in the literature.

The paper is organized as follows: a brief review of the existing complex-valued classifiers is presented in Section 2. In Section 3, the real-valued classification problem is defined in the Complex domain, and
the network architecture, the Gd function, and the learning algorithm of the FC-FLC are described in detail. Section 4 presents the performance study results of the proposed classifier using a set of multi-category and binary benchmark problems from the UCI machine learning repository. The performance of the FC-FLC classifier is further analyzed using a practical human emotion recognition problem, and statistical tests are performed to validate the performance of the classifier. Finally, in Section 5, the summary of the work and the observations from the study are presented.
2. A literature survey on complex-valued classifiers

In 2003, Nitta showed that complex-valued networks have better computational power than real-valued networks [16]. He also showed that the XOR problem, which cannot be solved with a single real-valued neuron, can be solved with a single complex-valued neuron [28]. It was later shown by him that complex-valued neural networks with split-type activation functions have two decision boundaries and that these decision boundaries are orthogonal to each other [17]. Thereafter, researchers focused on developing efficient complex-valued classifiers to solve real-valued classification problems. It must be noted that, in order to solve real-valued classification problems in the Complex domain, the input features have to be transformed to the Complex domain.

The multi-layered network with multi-valued neurons (MLMVN) [18] was the first classifier developed in the Complex domain. The MLMVN employs multi-valued neurons that use a piecewise continuous activation function to map the complex-valued inputs to C discrete class labels (i.e., C-valued threshold logic, where C is the number of classes). In MLMVN, the normalized real-valued input features (x) are mapped onto the unit circle using the transformation exp(i2πx), and the class labels are encoded by the roots of unity in the Complex plane. However, as the input features are mapped onto the unit circle, this mapping yields the same complex-valued feature for the real-valued feature values 0 and 1 (i.e., the transformation is not unique). This affects the classification performance of the MLMVN. Moreover, as the number of classes increases, the sector of the unit circle assigned to each output neuron shrinks, which might further affect the classification performance of MLMVN in multi-category classification problems. Further, the learning algorithm in MLMVN uses a derivative-free error correcting rule to estimate the parameters of the network. Therefore, MLMVN requires significant computational effort during the training process.

In the single layer phase encoded complex-valued neural network [19,29], the real-valued input features (x) are phase encoded in [0, π] using the transformation exp(iπx) to obtain the complex-valued input features. The transformation used by the PE-CVNN to convert the real-valued input features to the Complex domain therefore performs a one-to-one mapping of features from R → C and is, hence, unique. However, the activation functions used in the PE-CVNN are similar to those used in split complex-valued neural networks. Therefore, the gradients used in the parameter update of the PE-CVNN are not fully complex-valued, which might result in a significant loss of complex-valued information [11], leading to misclassification. Moreover, the PE-CVNN uses a gradient descent based learning algorithm to estimate the parameters of the network, and hence requires huge computational effort during training.

A multi-layer classifier, namely the 'Fully Complex-valued Radial Basis Function classifier (FC-RBF)', was introduced in [21]. An FC-RBF network is the basic building block of the FC-RBF classifier, with the phase encoded transformation introduced in [19,29] at the input layer to convert the real-valued features to the Complex domain. However, the FC-RBF classifier uses a gradient descent
based batch learning algorithm to estimate the parameters of the network and requires huge computational effort during training. Hence, two fast learning three-layered classifiers, referred to as the 'Bilinear Branch-cut Complex-valued Extreme Learning Machine (BB-CELM)' and the 'Phase Encoded Complex-valued Extreme Learning Machine (PE-CELM)', were developed in [20]. These classifiers are single hidden layer complex-valued neural networks. In BB-CELM, a bilinear transformation with a branch cut around 2π is employed at the input layer to transform the real-valued input features to the Complex domain, while in PE-CELM, the neurons in the input layer employ the phase encoded transformation. The fully complex-valued 'sech' function is used at the hidden layer of these classifiers, and the neurons in the output layer are linear. Although these classifiers are fast and use a fully complex-valued activation function at the hidden layer, the BB-CELM and PE-CELM classifiers inherit the bilinear transformation and the phase encoded transformation used in the MLMVN and PE-CVNN classifiers, respectively. Hence, the transformations used in BB-CELM and PE-CELM do not completely exploit the orthogonal decision boundaries of fully complex-valued classifiers. Therefore, a circular transformation that uniquely maps the real-valued input features to all four quadrants of the Complex domain was proposed and used to develop the 'Circular Complex-valued Extreme Learning Machine (CC-ELM)' in [22]; this transformation has also been used in [30,31]. For complete details on the complex-valued neural networks in the literature, one may refer to [32].

The sech(·) activation function used in the FC-RBF, PE-CELM, BB-CELM, and CC-ELM classifiers has its singularities in the finite region of the Complex domain. These singularities interfere with the operating region of the network and affect the convergence of the network during the learning process. Hence, there is a need to identify an analytic and almost everywhere bounded complex-valued activation function whose singularities do not interfere with the convergence of the network. To overcome these issues, in the next section, we introduce a Fully Complex-valued Fast Learning Classifier to solve real-valued classification problems.
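To make the uniqueness issue discussed above concrete, the following minimal Python sketch (our own illustration; the feature values are arbitrary) contrasts the unit-circle mapping exp(i2πx) used in the MLMVN with the phase encoding exp(iπx) used in the PE-CVNN:

```python
import numpy as np

x0, x1 = 0.0, 1.0  # two distinct normalized real-valued features

# MLMVN-style unit-circle mapping: exp(i*2*pi*x) is NOT unique on [0, 1].
print(np.exp(1j * 2 * np.pi * x0))  # (1+0j)
print(np.exp(1j * 2 * np.pi * x1))  # also ~(1+0j): same point on the circle

# PE-CVNN-style phase encoding: exp(i*pi*x) maps [0, 1] one-to-one,
# but only onto the upper half of the unit circle (two quadrants).
print(np.exp(1j * np.pi * x0))      # (1+0j)
print(np.exp(1j * np.pi * x1))      # (-1+0j): distinct images
```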
3. Description of the Fully Complex-valued Fast Learning Classifier (FC-FLC)

In this section, we first define the real-valued classification problem in the Complex domain. Next, we describe the architecture of the FC-FLC, and finally, we present the learning algorithm of FC-FLC in detail.
3.1. Definition of the real-valued classification problem in the Complex domain

Let {(x^1, c^1), …, (x^t, c^t), …, (x^N, c^N)} be the training data set with N samples, where x^t = [x_1^t, …, x_m^t]^T ∈ R^m is the m-dimensional real-valued input feature vector of the tth sample, and c^t ∈ {1, 2, …, C} is its class label. The coded class label of the tth sample, y^t = [y_1^t, …, y_C^t]^T ∈ C^C, in the Complex domain is given by

    y_l^t = 1 + i1    if c^t = l, l = 1, …, C
    y_l^t = −1 − i1   otherwise,    (1)

where i = √(−1) is the Complex operator. The classification problem in the Complex domain is defined as follows: given the training data set {(x^1, y^1), …, (x^t, y^t), …, (x^N, y^N)}, estimate the decision function DF: x^m → y^C, such that the network is capable of predicting the class labels of new, unseen samples as accurately as possible.

3.2. Architecture of FC-FLC

The FC-FLC is a single hidden layer fully complex-valued network with a nonlinear input and hidden layer, and a linear output layer, as shown in Fig. 1. In order to solve the real-valued classification problem in the Complex domain, the real-valued input features must be transformed to the Complex space (R^m → C^m). The neurons in the input layer of FC-FLC perform this transformation by employing the circular transformation proposed in [22], and the complex-valued input features are given by z^t = [z_1^t ⋯ z_j^t ⋯ z_m^t]^T. The circular transformation for the jth feature of the tth sample is given by

    z_j^t = sin(a x_j^t + i b x_j^t + α_j),    j = 1, …, m,    (2)

where a, b ∈ [0, 1] are randomly chosen scaling constants and α_j ∈ [0, 2π] is used to shift the origin to enable effective usage of the four quadrants of the Complex plane. As the circular transformation performs a one-to-one mapping of the real-valued input features to the Complex domain, it uses all four quadrants of the Complex domain effectively and overcomes the issues due to the transformations used in the existing complex-valued classifiers.

The neurons at the hidden layer of the FC-FLC use the Gd function, whose magnitude and phase responses have an inverted Gaussian characteristic, to map the complex-valued input features to a hyper-dimensional Complex domain. It has been shown in [25] that the Gd function is a good discriminator. In this paper, we use the Gd function to formulate two fully complex-valued activation functions. This function has the following properties:

1. The Gd function directly relates hyperbolic angles with circular angles (provides inner transformations).
2. The magnitude response plot of the Gd function has inverse Gaussian characteristics.
3. The Gd function provides a smooth transition between extremities.
4. It is continuously differentiable.

This function satisfies the desirable properties of a complex-valued activation function.

Fig. 1. The architecture of the FC-FLC. The inset shows the circular transformation used at the input layer. [Figure: input neurons x_1, …, x_m; circular transformation z_j^t = sin(a x_j^t + i b x_j^t + α_j); hidden neurons h_1, …, h_K with h_j = atan(sinh(p_j^T z)) or h_j = atan(sinh(u_j^T (z − v_j))); output weights w_lj; linear outputs ŷ_1, …, ŷ_C.]
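As an illustration, the input-layer operations of Eqs. (1) and (2), the complex-coded class labels and the circular transformation, could be implemented as in the following minimal Python sketch (the function names, the random seed, and the use of NumPy are our own choices, not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def circular_transform(X, a, b, alpha):
    """Eq. (2): map real features X (N x m) to C^m, one sample per row.

    a, b are scalars drawn from [0, 1]; alpha is an m-vector in [0, 2*pi].
    """
    return np.sin(a * X + 1j * b * X + alpha)

def coded_labels(c, C):
    """Eq. (1): class labels c in {1, ..., C} -> complex-coded targets in C^C."""
    Y = np.full((len(c), C), -1.0 - 1.0j)
    Y[np.arange(len(c)), np.asarray(c) - 1] = 1.0 + 1.0j
    return Y

# Illustrative usage on random data (m = 3 features, C = 2 classes)
X = rng.random((5, 3))
a, b = rng.random(), rng.random()
alpha = rng.uniform(0.0, 2.0 * np.pi, size=3)
Z = circular_transform(X, a, b, alpha)   # complex-valued inputs z^t
Y = coded_labels([1, 2, 1, 1, 2], C=2)   # complex-coded class labels y^t
```

The per-feature shift α_j is what spreads the images of the different features over all four quadrants of the Complex plane.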
The magnitude response plot for the Gd function is shown in Fig. 2(a), and the phase response plot is shown in Fig. 2(b). The magnitude response plot follows the inverse Gaussian characteristics. At the origin, the magnitude is zero, and as the input increases, the magnitude of the Gd function increases. Moreover, we can observe that the magnitude of the Gd function does not reach infinity and is bounded. Similar characteristics can be observed in the phase response plot.

Fig. 2. Magnitude and phase response plots of the Gd function. (a) Magnitude response plot for the Gd of the FC-FLC network; (b) phase response plot for the Gd of the FC-FLC network.

As the singularities of the Gd function do not interfere with the operating region of a complex-valued neural network, we investigate the suitability of the Gd function in performing real-valued classification by formulating two activation functions. The two activation functions, referred to as AF1 and AF2 henceforth, are defined as

    AF1(p_j, z^t) = atan(sinh(p_j^T z^t)),    j = 1, …, K,    (3)

    AF2(u_j, v_j, z^t) = atan(sinh(u_j^T (z^t − v_j))),    j = 1, …, K,    (4)

where K represents the total number of hidden neurons, p_j ∈ C^m are the K × m input weights, u_j ∈ C^m is the complex-valued scaling factor, and v_j ∈ C^m is the complex-valued center of the jth hidden neuron. Hence, the response of the hidden layer (h_j^t) of the FC-FLC is given by

    h_j^t = AF1(p_j, z^t),    j = 1, …, K,    (5)

or

    h_j^t = AF2(u_j, v_j, z^t),    j = 1, …, K.    (6)

The output neurons are linear, and the predicted output at the lth output neuron of the FC-FLC is given by

    ŷ_l^t = Σ_{j=1}^{K} w_lj h_j^t,    l = 1, …, C,    (7)

where w_lj ∈ C is the output weight connecting the jth hidden neuron and the lth output neuron. Hence, W ∈ C^{C×K} are the C × K output weights of the network. The predicted class label is obtained using

    ĉ^t = arg max_{l = 1, …, C} real(ŷ_l^t).    (8)

In the next section, we present the learning algorithm of FC-FLC in detail.

3.3. Learning algorithm of FC-FLC

Similar to the C-ELM [33], the input weights of the FC-FLC are chosen randomly and the output weights are estimated as a solution to a set of linear equations. The response of the neurons in the hidden layer (H ∈ C^{K×N}) is given by

    H(P, Z) = [ AF1(p_1, z^1)  ⋯  AF1(p_1, z^N)
                      ⋮        ⋱        ⋮
                AF1(p_K, z^1)  ⋯  AF1(p_K, z^N) ]    (9)

or

    H(U, V, Z) = [ AF2(u_1, v_1, z^1)  ⋯  AF2(u_1, v_1, z^N)
                          ⋮            ⋱          ⋮
                   AF2(u_K, v_K, z^1)  ⋯  AF2(u_K, v_K, z^N) ].    (11)

Here H is a K × N hidden layer output matrix. The jth row of the matrix H represents the response of the hidden neuron h_j to the inputs (z^t, t = 1, …, N). Hence, the output weights can be obtained as a least-squares solution to the set of linear equations at the output layer, and are given by

    W = Y H^†,    (12)

where H^† is the Moore–Penrose pseudo-inverse of the hidden layer output matrix, and Y is the matrix of complex-valued coded class labels of all the N samples.

The number of hidden neurons and the choice of input parameters influence the performance of the FC-FLC. Therefore, a k-fold cross-validation, similar to that presented in [26], is used to choose the number of hidden neurons and the input weights corresponding to the best generalization performance of the FC-FLC. In such a k-fold cross-validation, the training data set is divided into k equal subsets. The FC-FLC algorithm is then called k times, each time leaving one of the subsets out of training. The generalization accuracy is calculated on the omitted subset. The input parameters corresponding to the best generalization accuracy are chosen as the optimal parameters of the FC-FLC.
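A compact Python sketch of the complete training and prediction procedure, assuming the AF1 form of the hidden layer, is given below (the function names and the sampling range of the random input weights are our assumptions; the paper only states that the input weights are chosen randomly):

```python
import numpy as np

rng = np.random.default_rng(0)

def fc_flc_train(Z, Y, K):
    """Fit an FC-FLC with AF1 hidden neurons.

    Z : (N, m) complex array of circularly transformed inputs, Eq. (2).
    Y : (N, C) complex array of coded class labels, Eq. (1).
    K : number of hidden neurons.
    """
    N, m = Z.shape
    # Random complex input weights p_j in C^m; the [-1, 1] sampling range
    # is an assumption, the paper only says they are chosen randomly.
    P = rng.uniform(-1, 1, (K, m)) + 1j * rng.uniform(-1, 1, (K, m))
    # Hidden layer output matrix H (K x N), Eqs. (3), (5), (9):
    # h_j^t = atan(sinh(p_j^T z^t)); NumPy's arctan/sinh accept complex input.
    H = np.arctan(np.sinh(P @ Z.T))
    # Output weights, Eq. (12): W = Y H^dagger (Moore-Penrose pseudo-inverse).
    W = Y.T @ np.linalg.pinv(H)          # (C x K)
    return P, W

def fc_flc_predict(Z, P, W):
    """Eqs. (7)-(8): linear outputs; class = index of the largest real part."""
    H = np.arctan(np.sinh(P @ Z.T))
    Y_hat = W @ H                        # (C x N) predicted outputs
    return Y_hat.real.argmax(axis=0) + 1  # class labels in {1, ..., C}
```

In practice, the number of hidden neurons K and the random draw of P would be selected by the k-fold cross-validation described above: retrain with several draws of P and keep the draw giving the best validation accuracy.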
4. Performance evaluation of the FC-FLC

In this section, we evaluate the performance of the FC-FLC on a set of benchmark classification problems from the UCI machine learning repository and on a practical human emotion recognition problem. The number of classes, the number of features, and the number of samples in the training and testing data sets for the various benchmark real-valued classification problems are presented in Table 1.
Table 1. Description of the benchmark real-valued classification data sets selected from the UCI machine learning repository for the performance study.

Type of data set  Problem                 No. of features  No. of classes  No. of samples (Training / Testing)  I.F.
Multi-category    Image segmentation      19               7               210 / 2100                           0
                  Vehicle classification  18               4               424 / 422                            0.1
                  Glass identification     9               6               109 / 105                            0.68
                  Acoustic emission        5               4                62 / 137                            0.1
Binary            Liver disorder           6               2               200 / 145                            0.17
                  PIMA data                8               2               400 / 368                            0.225
                  Breast cancer            9               2               300 / 383                            0.26
                  Ionosphere              34               2               100 / 251                            0.28
The availability of only a small number of samples, the sampling bias, and the overlap between classes may affect the classification performance of the classifier [34]. Hence, we also consider the Imbalance Factor (I.F.) of the training data sets to study the effect of these factors. The I.F. is defined as

    I.F. = 1 − (C / N) · min_{l = 1, …, C} N_l,    (13)

where N_l is the number of samples belonging to class l and N = Σ_{l=1}^{C} N_l. The imbalance factor gives a measure of the sample imbalance across the various classes of the training data set, and the imbalance factor of each data set considered in this study is also presented in Table 1. It can be observed from the table that the imbalance factors of the unbalanced data sets vary widely, from 0.1 to 0.68, while the image segmentation problem, with a balanced data set, has an imbalance factor of 0.

4.1. Performance measures

The average (η_a, Eq. (14)) and over-all (η_o, Eq. (15)) classification efficiencies [35] of the classifiers, derived from their confusion matrices, are used as the performance measures for comparison in this study:

    η_a = (1/C) Σ_{i=1}^{C} (q_ii / N_i) × 100%,    (14)

    η_o = (Σ_{i=1}^{C} q_ii / Σ_{i=1}^{C} N_i) × 100%,    (15)

where q_ii is the total number of correctly classified samples in class c_i and N_i is the total number of samples belonging to class c_i in the data set. In this paper, an appropriate number of hidden neurons is selected by the addition/deletion of neurons to obtain an optimal performance, as discussed in [36] for real-valued networks.
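As a concrete illustration, the following Python sketch (our own helper names; the confusion matrix values are made up) computes the imbalance factor of Eq. (13) and the efficiencies of Eqs. (14) and (15):

```python
import numpy as np

def imbalance_factor(labels, C):
    """Eq. (13): I.F. = 1 - C * min_l(N_l) / N, for labels in {1, ..., C}."""
    counts = np.bincount(np.asarray(labels) - 1, minlength=C)
    return 1.0 - C * counts.min() / counts.sum()

def efficiencies(conf):
    """Eqs. (14)-(15): average and overall efficiency from a C x C
    confusion matrix `conf`, rows = true class, conf[i, i] = q_ii."""
    q = np.diag(conf)
    N_i = conf.sum(axis=1)
    eta_a = 100.0 * np.mean(q / N_i)
    eta_o = 100.0 * q.sum() / N_i.sum()
    return eta_a, eta_o

# Illustrative 2-class confusion matrix (values are made up)
conf = np.array([[45, 5],
                 [10, 40]])
print(efficiencies(conf))  # (85.0, 85.0)
```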
4.2. Performance study on multi-category benchmark data sets

First, we present the results of the FC-FLC on the multi-category benchmark classification problems, viz., the image segmentation, vehicle classification, glass identification, and acoustic emission problems. In these problems, the performance results of the FC-FLC are compared with other complex-valued classifiers, viz., the CSRAN [37], FC-RBF [21], and CC-ELM [22] classifiers. To highlight the advantages of using a complex-valued classifier, the performance of the FC-FLC is also compared with a few best performing real-valued classifiers, viz., the Support Vector Machine (SVM) [38], the Self-regulatory Resource Allocation Network (SRAN) [39], and the Extreme Learning Machine (ELM) [40]. The number of hidden neurons, the training time, and the testing performance results of the FC-FLC, in comparison with the other complex-valued and real-valued classifiers for the four multi-category benchmark problems, are presented in Table 2. The results for the real-valued and complex-valued classifiers are reproduced from [30].

From the table, it can be seen that the performance of the FC-FLC (AF2) is better than that of the real-valued classifiers chosen for comparison, especially in the vehicle classification and glass identification problems with unbalanced data sets. In the vehicle classification problem, both the overall and average efficiencies are better than those of the other classifiers considered, whereas for the glass identification problem, the average efficiency is better than that of all the other classifiers taken for comparison. Comparing the performance of the FC-FLC (AF2) with the complex-valued classifiers, its performance is better than that of the CSRAN and FC-RBF classifiers and is comparable with that of the CC-ELM classifier. Moreover, the FC-FLC requires fewer hidden neurons and minimal computational effort compared to the other classifiers.

4.3. Performance study on binary benchmark classification problems

In this section, we present the performance results of the FC-FLC on the binary benchmark data sets presented in Table 1. The performance of these classifiers in comparison with the other complex-valued and best performing real-valued classifiers is presented in Table 3. From the table, it can be observed that the FC-FLC outperforms all the real-valued classifiers and the CSRAN and FC-RBF classifiers, and its performance is comparable to that of the CC-ELM classifier. On the PIMA data set, the performance of FC-FLC is slightly better than that of the CC-ELM classifier. Hence, it can be inferred that the proposed classifier has better decision making capabilities than the other real-valued and complex-valued classifiers.

Next, we perform statistical tests to validate the performance of the FC-FLC. First, the Friedman test [41] is conducted based on the overall and average efficiencies of the classifiers ranked in Tables 2 and 3. Based on the overall efficiencies of the classifiers, the average ranks of the SVM, ELM, SRAN, CSRAN, FC-RBF, CC-ELM, FC-FLC (AF1), and FC-FLC (AF2) classifiers are 6.25, 6.5, 4.437, 6.5, 4.937, 1.5, 3.68, and 2.1875, respectively. Under the null hypothesis, which states that all the classifiers are equivalent and their ranks must therefore be equal, the Friedman statistic (χ²_F) is 34.6192. A better statistic, derived by Iman and Davenport [42], follows the F-distribution; its value here is 11.33. The modified statistic follows the F-distribution with 7 and 49 degrees of freedom, and the critical value for rejecting the null hypothesis at a significance level of 0.05 is 2.20.
Table 2. Performance results for the multi-category benchmark data sets.

Problem                 Domain   Classifier    No. of parameters  Time (s)  Testing η_o (rank)  Testing η_a (rank)
Image segmentation      Real     SVM^a         127                721       91.38 (6)           91.38 (6)
                                 ELM           1323               0.25      90.23 (7)           90.23 (7)
                                 SRAN          1296               22        93 (3)              93 (3)
                        Complex  CSRAN         4860               339       88 (8)              88 (8)
                                 FC-RBF        3420               421       92.33 (4)           92.33 (4)
                                 CC-ELM        3120               0.03      93.52 (1)           93.52 (1)
                                 FC-FLC (AF1)  3640               0.2       93.3 (2)            93.3 (2)
                                 FC-FLC (AF2)  2600               0.18      92.1 (5)            92.1 (5)
Vehicle classification  Real     SVM^a         340                550       70.62 (8)           68.51 (8)
                                 ELM           3450               0.4       77.01 (5.5)         77.59 (5)
                                 SRAN          2599               55        75.12 (7)           76.86 (7)
                        Complex  CSRAN         6400               352       79.15 (4)           79.16 (4)
                                 FC-RBF        5600               678       77.01 (5.5)         77.46 (6)
                                 CC-ELM        3740               0.1084    82.23 (2)           82.52 (2)
                                 FC-FLC (AF1)  4400               1.09      81.5 (3)            81.95 (3)
                                 FC-FLC (AF2)  3080               0.67      83.18 (1)           83.29 (1)
Glass identification    Real     SVM^a         183                320       70.47 (8)           75.61 (8)
                                 ELM           1280               0.05      81.31 (7)           87.43 (3)
                                 SRAN          944                28        86.2 (2)            80.95 (5.5)
                        Complex  CSRAN         3840               452       83.5 (5)            78.09 (7)
                                 FC-RBF        4320               452       83.76 (4)           84.52 (4)
                                 CC-ELM        3000               0.08      94.44 (1)           80.95 (5.5)
                                 FC-FLC (AF1)  2400               0.34      82.85 (6)           90.65 (2)
                                 FC-FLC (AF2)  2400               0.1716    84.76 (3)           92.756 (1)
Acoustic emission       Real     SVM^a         22                 –         98.54 (6)           97.95 (6.5)
                                 ELM           100                0.05      98.54 (6)           97.95 (6.5)
                                 SRAN          100                22        99.27 (3)           98.91 (3.5)
                        Complex  CSRAN         224                32.6      98.54 (6)           98.08 (5)
                                 FC-RBF        280                129.6     96.35 (8)           95.2 (8)
                                 CC-ELM        180                0.031     99.27 (3)           99.17 (2)
                                 FC-FLC (AF1)  180                0.03      99.27 (3)           98.91 (3.5)
                                 FC-FLC (AF2)  180                0.03      100 (1)             100 (1)

^a Support vectors.
Since the modified Friedman statistic (F_F) is greater than the critical value (11.33 > 2.20), we can reject the null hypothesis, and it can be inferred that the classifiers used in this paper are not similar. Next, we conduct the post hoc Bonferroni–Dunn test [43] to emphasize the significance of the FC-FLC classifier. The test assumes that the performances of two classifiers are significantly different if their corresponding average ranks differ by at least a critical difference. As the FC-FLC (AF2) performs better than the FC-FLC (AF1), the FC-FLC (AF2) is used as the key classifier for the statistical study. The critical difference for the classifiers and data sets used in this study is calculated as 3.0 for a significance level of 0.10. The differences in average rank between the FC-FLC (AF2) classifier and the SVM, ELM, SRAN, CSRAN, FC-RBF, CC-ELM, and FC-FLC (AF1) classifiers are 4.0625, 4.3125, 2.2495, 4.3125, 2.7495, 0.6875, and 1.4925, respectively. Thus, out of the seven classifiers taken for comparison, the differences in average ranks between the FC-FLC (AF2) classifier and the SVM classifier (4.0625), the ELM classifier (4.3125), and the CSRAN classifier (4.3125) are greater than the critical difference (3.0). Although the differences in average ranks between the FC-FLC (AF2) classifier and the SRAN classifier (2.2495), the FC-RBF classifier (2.7495), and the CC-ELM classifier (0.6875) are less than the critical difference, it can be observed from Tables 2 and 3 that the FC-FLC (AF2) classifier performs better than the SRAN and FC-RBF classifiers and is comparable with the CC-ELM and FC-FLC (AF1) classifiers.
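These statistics can be reproduced, up to the rounding of the reported average ranks, with the short Python sketch below (the Bonferroni–Dunn critical value q_0.10 ≈ 2.450 for eight classifiers is taken from the table in Demšar [42] and is our assumption here):

```python
import numpy as np
from scipy import stats

# Average ranks (overall efficiency) reported above for the 8 classifiers
R = np.array([6.25, 6.5, 4.437, 6.5, 4.937, 1.5, 3.68, 2.1875])
N, k = 8, 8  # 8 data sets, 8 classifiers

# Friedman statistic and the Iman-Davenport F correction
chi2_F = 12 * N / (k * (k + 1)) * (np.sum(R ** 2) - k * (k + 1) ** 2 / 4)
F_F = (N - 1) * chi2_F / (N * (k - 1) - chi2_F)
crit = stats.f.ppf(0.95, dfn=k - 1, dfd=(k - 1) * (N - 1))
print(chi2_F, F_F, crit)  # ~34.9, ~11.6, ~2.20 (paper: 34.6192, 11.33, 2.20)

# Bonferroni-Dunn critical difference at the 0.10 level
q_010 = 2.450  # assumed from Demsar's table for k = 8 classifiers
CD = q_010 * np.sqrt(k * (k + 1) / (6 * N))
print(CD)  # ~3.0, matching the critical difference used above
```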
4.4. Performance study on a practical human emotion recognition problem

In this section, we study the decision making ability of the FC-FLC on a practical human emotion recognition problem. Emotion recognition is a challenging problem due to the similarity of the features for various facial expressions. In the literature, there are many benchmark facial expression data sets available for performance evaluation, such as the JAFFE, MMI, and Cohn–Kanade databases. In order to evaluate the performance of FC-FLC for emotion recognition, we extract LBP-based features from the JAFFE database with eight sampling points and a radius of four, as given in [44]. Table 4 presents the 10-fold cross-validation performance comparison results of the FC-FLC on the JAFFE data set. The results for SVM and Mc-FIS are reproduced from [44]. It can be inferred that the FC-FLC performs slightly better than the SVM and Mc-FIS classifiers.
Table 3. Performance comparison on the benchmark binary classification problems.

Problem          Domain   Classifier    No. of parameters  Training time (s)  Testing η_o (rank)  Testing η_a (rank)
Liver disorders  Real     SVM^a         141                0.0972             71.03 (6)           70.21 (6)
                          ELM           900                0.1685             72.41 (5)           71.41 (5)
                          SRAN          819                3.38               66.9 (8)            65.8 (8)
                 Complex  CSRAN         560                38                 67.59 (7)           69.24 (7)
                          FC-RBF        560                133                74.46 (4)           75.41 (1)
                          CC-ELM        160                0.059              75.5 (1)            74.83 (2)
                          FC-FLC (AF1)  800                0.18               75.17 (2.5)         74.23 (3)
                          FC-FLC (AF2)  768                0.078              75.17 (2.5)         74.03 (4)
PIMA data        Real     SVM^a         221                0.205              77.45 (7)           76.33 (5)
                          ELM           1100               0.2942             76.63 (8)           75.25 (6)
                          SRAN          2067               12.24              78.53 (4.5)         74.9 (7)
                 Complex  CSRAN         720                64                 77.99 (6)           76.73 (4)
                          FC-RBF        720                130.3              78.53 (4.5)         68.49 (8)
                          CC-ELM        400                0.073              81.25 (2)           76.86 (3)
                          FC-FLC (AF1)  300                0.09               80.97 (3)           77.4 (2)
                          FC-FLC (AF2)  300                0.0624             81.5 (1)            77.79 (1)
Breast cancer    Real     SVM^a         24                 0.1118             96.6 (6)            97.06 (5.5)
                          ELM           792                0.1442             96.35 (7)           96.5 (8)
                          SRAN          84                 0.17               96.87 (4)           97.3 (4)
                 Complex  CSRAN         800                60                 96.08 (8)           96.86 (7)
                          FC-RBF        400                158.3              97.12 (3)           97.45 (2.5)
                          CC-ELM        330                0.0811             97.39 (1)           97.84 (1)
                          FC-FLC (AF1)  660                0.29               96.86 (5)           97.06 (5.5)
                          FC-FLC (AF2)  330                0.0156             97.13 (2)           97.45 (2.5)
Ionosphere       Real     SVM^a         43                 0.0218             91.24 (3)           88.51 (4)
                          ELM           1184               0.0396             89.64 (6.5)         87.52 (7)
                          SRAN          777                3.7                90.84 (4)           91.88 (1)
                 Complex  CSRAN         420                86                 88.05 (8)           85.78 (8)
                          FC-RBF        1400               186.2              89.64 (6.5)         88.01 (5)
                          CC-ELM        1080               0.0312             92.43 (1)           89.93 (2)
                          FC-FLC (AF1)  1800               0.09               90.83 (5)           87.95 (6)
                          FC-FLC (AF2)  1800               0.078              91.63 (2)           89.31 (3)

^a Support vectors.
Table 4. Performance comparison results for the human emotion recognition problem (JAFFE data set).

Classifier domain  Classifier  No. of parameters  Testing η_o
Real-valued        SVM         50,730             88.09 ± 7.18
Real-valued        Mc-FIS      23,763             89.05 ± 3.214
Complex-valued     FC-FLC      37,380             89.43 ± 0.7314
5. Conclusion

In this paper, we have presented a Fully Complex-valued Fast Learning Classifier (FC-FLC) to solve real-valued classification problems. The FC-FLC is a single hidden layer network with a nonlinear input layer that transforms the real-valued input features to the Complex domain. The neurons in the input layer of FC-FLC employ a circular transformation to convert the real-valued input features to the Complex domain. At the hidden layer, the complex-valued input features are projected onto a hyper-dimensional Complex plane (C^m → C^K) using the K hidden neurons employing the Gudermannian (Gd) activation function. To investigate the suitability of the Gd as an activation function for a fully complex-valued network, we have formulated the activation function in two forms, AF1 and AF2. The neurons at the output layer are linear. The input weights of the FC-FLC are chosen randomly and the output weights are estimated as the solution to an unconstrained linear programming problem. The optimal input weights are chosen using a k-fold cross-validation. The performance of the FC-FLC is evaluated on a set of benchmark classification problems from the UCI machine learning repository and a practical human emotion recognition problem. The experimental and statistical studies on the various benchmark data sets show that the FC-FLC with the Gd activation function of the form AF2 outperforms the other best performing classifiers in the literature and the FC-FLC with the Gd activation function of the form AF1. The performance study in comparison with other complex-valued and the best performing real-valued classifiers clearly indicates that the proposed classifier has better classification ability.

References

[1] M.B. Li, G.B. Huang, P. Saratchandran, N. Sundararajan, Complex-valued growing and pruning RBF neural networks for communication channel equalisation, IEE Proc.—Vis. Image Signal Process. 153 (4) (2006) 411–418.
[2] J.C. Bregains, F. Ares, Analysis, synthesis and diagnosis of antenna arrays through complex-valued neural networks, Microw. Opt. Technol. Lett. 48 (8) (2006) 1512–1515.
[3] R. Savitha, S. Vigneshwaran, S. Suresh, N. Sundararajan, Adaptive beamforming using complex-valued radial basis function neural networks, in: IEEE Region 10 Conference (TENCON 2009), November 23–26, 2009, pp. 1–6.
[4] C. Shen, H. Lajos, S. Tan, Symmetric complex-valued RBF receiver for multiple-antenna-aided wireless systems, IEEE Trans. Neural Netw. 19 (9) (2008) 1659–1665.
[5] D. Jianping, N. Sundararajan, P. Saratchandran, Complex-valued minimal resource allocation network for nonlinear signal processing, Int. J. Neural Syst. 10 (2) (2000) 95–106.
[6] I. Aizenberg, D.V. Paliy, J.M. Zurada, J.T. Astola, Blur identification by multilayer neural network based on multivalued neurons, IEEE Trans. Neural Netw. 19 (5) (2008) 883–898.
[7] N. Sinha, M. Saranathan, K.R. Ramakrishna, S. Suresh, Parallel magnetic resonance imaging using neural networks, in: IEEE International Conference on Image Processing (ICIP 2007), vol. 3, 2007, pp. 149–152.
[8] R. Remmert, Theory of Complex Functions, Springer-Verlag, New York, USA, 1991.
[9] H. Leung, S. Haykin, The complex backpropagation algorithm, IEEE Trans. Signal Process. 39 (September) (1991) 2101–2104.
[10] A.J. Noest, Phasor neural networks, Neural Inf. Process. Syst. 2 (1989) 584–591. Online: 〈http://books.nips.cc/nips02.html〉.
[11] T. Kim, T. Adali, Fully complex multi-layer perceptron network for nonlinear signal processing, J. VLSI Signal Process. Syst. Signal Image Video Technol. 32 (1/2) (2002) 29–43.
[12] T. Kim, T. Adali, Approximation by fully complex multi-layer perceptrons, Neural Comput. 15 (7) (2003) 1641–1666.
[13] R. Savitha, S. Suresh, N. Sundararajan, P. Saratchandran, A new learning algorithm with logarithmic performance index for complex-valued neural networks, Neurocomputing 72 (16–18) (2009) 3771–3781.
[14] S. Chen, S. McLaughlin, B. Mulgrew, Complex valued radial basis function network. Part I: network architecture and learning algorithms, EURASIP Signal Process. J. 35 (1) (1994) 19–31.
[15] R. Savitha, S. Suresh, N. Sundararajan, A fully complex-valued radial basis function network and its learning algorithm, Int. J. Neural Syst. 19 (4) (2009) 253–267.
[16] T. Nitta, The computational power of complex-valued neuron, in: Artificial Neural Networks and Neural Information Processing (ICANN/ICONIP), Lecture Notes in Computer Science, vol. 2714, 2003, pp. 993–1000.
[17] T. Nitta, Orthogonality of decision boundaries of complex-valued neural networks, Neural Comput. 16 (1) (2004) 73–97.
[18] I. Aizenberg, C. Moraga, Multilayer feedforward neural network based on multi-valued neurons (MLMVN) and a backpropagation learning algorithm, Soft Comput. 11 (2) (2007) 169–183.
[19] M.F. Amin, K. Murase, Single-layered complex-valued neural network for real-valued classification problems, Neurocomputing 72 (4–6) (2009) 945–955.
[20] R. Savitha, S. Suresh, N. Sundararajan, H.J. Kim, Fast learning fully complex-valued classifiers for real-valued classification problems, in: D. Liu, et al. (Eds.), ISNN 2011, Part I, Lecture Notes in Computer Science, vol. 6675, 2011, pp. 602–609.
[21] R. Savitha, S. Suresh, N. Sundararajan, H.J. Kim, A fully complex-valued radial basis function classifier for real-valued classification, Neurocomputing 78 (1) (2012) 104–110.
[22] R. Savitha, S. Suresh, N. Sundararajan, Fast learning circular complex-valued extreme learning machine (CC-ELM) for real-valued classification problems, Inf. Sci. 187 (1) (2012) 277–290.
[23] S. Suresh, N. Sundararajan, R. Savitha, Supervised Learning with Complex-Valued Neural Networks, Studies in Computational Intelligence, vol. 421, Springer-Verlag, Berlin, Heidelberg, 2013.
[24] A. Hirose, Complex-valued Neural Networks, Studies in Computational Intelligence, vol. 400, Springer-Verlag, Berlin, Heidelberg, 2006.
[25] H.I. Ringermacher, L.R. Mead, A new formula describing the scaffold structure of spiral galaxies, Mon. Not. R. Astron. Soc. 397 (2009) 164–171.
[26] S. Suresh, R.V. Babu, H.J. Kim, No-reference image quality assessment using modified extreme learning machine classifier, Appl. Soft Comput. 9 (2) (2009) 541–552.
[27] C. Blake, C. Merz, UCI Repository of Machine Learning Databases, Department of Information and Computer Sciences, University of California, Irvine, URL: 〈http://archive.ics.uci.edu/ml/〉, 1998.
[28] T. Nitta, Solving the XOR problem and the detection of symmetry using a single complex-valued neuron, Neural Netw. 16 (8) (2003) 1101–1105.
[29] M.F. Amin, M.M. Islam, K. Murase, Ensemble of single-layered complex-valued neural networks for classification tasks, Neurocomputing 72 (10–12) (2009) 2227–2234.
[30] R. Savitha, S. Suresh, N. Sundararajan, Projection-based fast learning fully complex-valued relaxation neural network, IEEE Trans. Neural Netw. Learn. Syst. 24 (2013) 529–541.
[31] R. Savitha, S. Suresh, N. Sundararajan, A meta-cognitive learning algorithm for a fully complex-valued relaxation network, Neural Netw. 32 (2012) 209–218.
[32] K. Subramanian, R. Savitha, S. Suresh, A metacognitive complex-valued interval type-2 fuzzy inference system, IEEE Trans. Neural Netw. Learn. Syst. 25 (9) (2014) 1659–1672.
[33] M.B. Li, G.-B. Huang, P. Saratchandran, N. Sundararajan, Fully complex extreme learning machine, Neurocomputing 68 (1–4) (2005) 306–314.
[34] S. Suresh, N. Sundararajan, P. Saratchandran, Risk-sensitive loss functions for sparse multi-category classification problems, Inf. Sci. 178 (12) (2008) 2621–2638.
[35] S. Suresh, N. Sundararajan, P. Saratchandran, A sequential multi-category classifier using radial basis function networks, Neurocomputing 71 (7–9) (2008) 1345–1358.
[36] S. Suresh, S.N. Omkar, V. Mani, T.N.G. Prakash, Lift coefficient prediction at high angle of attack using recurrent neural network, Aerosp. Sci. Technol. 7 (8) (2003) 595–602.
[37] S. Suresh, R. Savitha, N. Sundararajan, A sequential learning algorithm for complex-valued self-regulating resource allocation network (CSRAN), IEEE Trans. Neural Netw. 22 (2011) 1061–1072.
[38] N. Cristianini, J.S. Taylor, An Introduction to Support Vector Machines, Cambridge University Press, Cambridge, UK, 2000.
[39] S. Suresh, K. Dong, H.J. Kim, A sequential learning algorithm for self-adaptive resource allocation network classifier, Neurocomputing 73 (16–18) (2010) 3012–3019.
[40] G.-B. Huang, D.H. Wang, Y. Lan, Extreme learning machines: a survey, Int. J. Mach. Learn. Cybern. 2 (2011) 107–122.
[41] M. Friedman, A comparison of alternative tests of significance for the problem of m rankings, Ann. Math. Stat. 11 (1940) 86–92.
[42] J. Demšar, Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res. 7 (2006) 1–30.
[43] O.J. Dunn, Multiple comparisons among means, J. Am. Stat. Assoc. 56 (1961) 52–64.
[44] K. Subramanian, S. Suresh, R.V. Babu, Meta-cognitive neuro-fuzzy inference system for human emotion recognition, in: International Joint Conference on Neural Networks (IJCNN), 2012, pp. 1–7.
M. Sivachitra received the B.E. degree in Electrical and Electronics Engineering from Bharathiyar University in 2000 and the M.E. degree (Applied Electronics) from Anna University, Chennai, in 2006. She is currently pursuing research at Anna University, Chennai, in the area of complex-valued neural networks. She has a total teaching experience of 14 years. At present she is working as an Assistant Professor (Selection Grade) at Kongu Engineering College, an autonomous institution affiliated to Anna University, Chennai. She has attended international conferences and published papers in international journals, and has conducted workshops and seminars in her research area.

Ramasamy Savitha (M'11) received the B.E. degree in electrical and electronics engineering from Manonmaniam Sundaranar University, Thirunelveli, India, the M.E. degree in control and instrumentation engineering from Anna University, Chennai, India, and the Ph.D. degree from the School of Electrical and Electronics Engineering, Nanyang Technological University, Singapore, in 2000, 2002, and 2011, respectively. She was a lecturer at MSEC, affiliated to Anna University, from 2002 to 2005. She is currently a Post-Doctoral Research Fellow with the School of Computer Engineering, Nanyang Technological University. Her current research interests include neural networks, metacognition, and bioinformatics.

Sundaram Suresh received the B.E. degree in Electrical and Electronics Engineering from Bharathiyar University in 1999, and the M.E. (2001) and Ph.D. (2005) degrees in Aerospace Engineering from the Indian Institute of Science, India. He was a post-doctoral researcher in the School of Electrical Engineering, Nanyang Technological University, from 2005 to 2007. From 2007 to 2008, he was at INRIA Sophia Antipolis, France, as an ERCIM research fellow. He was at Korea University for a short period as a visiting faculty member in Industrial Engineering. From January 2009 to December 2009, he was at the Indian Institute of Technology Delhi as an Assistant Professor in the Department of Electrical Engineering. He has been an Assistant Professor in the School of Computer Engineering, Nanyang Technological University, Singapore, since 2010. His research interests include flight control, unmanned aerial vehicle design, machine learning, optimization, and computer vision.
S. Vijayachitra received the B.E. degree in Electronics and Communication Engineering from Mookambigai College of Engineering, Pudukkottai, the M.E. degree from Annamalai University, Annamalai Nagar, Chidambaram, and the Ph.D. degree from Anna University, Chennai, in 1994, 2001, and 2009, respectively. In 1994, she started her teaching career at M.P. Nachimuthu M. Jaganathan Polytechnic, Chennimalai, as a Lecturer in the Department of Electronics and Communication Engineering, and served there for about six years. Since February 2002, she has been with Kongu Engineering College, Perundurai, Erode District, Tamil Nadu, India, where she is now a Professor in the Department of Electronics and Instrumentation Engineering. She has conducted many seminars, conferences, and short-term courses funded by reputed agencies such as ICMR, DBT, CSIR, BRNS, and AICTE-ISTE. She has published 30 papers in international journals and 53 papers in conference proceedings.