An improved regularized extreme learning machine based on Symbiotic Organisms Search

Boyang Zhang1, Lingjie Sun1,2, Haiwen Yuan1, and Jianxun Lv1

Zhao Ma

1. School of Automation Science and Electrical Engineering, Beihang University, Beijing, 100191, China; [email protected]
2. Taizhou Vocational and Technical College, Taizhou, 318000, China

China Electric Power Research Institute, Beijing, 100192, China; [email protected]

Abstract— In this paper, a novel data classification approach is proposed based on the integration of the regularized extreme learning machine and Symbiotic Organisms Search (SOS). To simplify the description, the new method is named Sos-RELM; it mainly contains two phases. Compared with traditional classification approaches such as SVM, LS-SVM, and BP, the extreme learning machine shows excellent ability in terms of accuracy and computing time. Hence, in the first phase, we utilize the regularized extreme learning machine so that the output weights can be rapidly calculated. Symbiotic Organisms Search is a recent metaheuristic algorithm with various operations to update the individuals, and it has been reported to outperform DE, GA, and PSO. Building on this effective and efficient optimization approach, in the second phase, the set of input weights, hidden biases, and the regularization parameter is optimized using Symbiotic Organisms Search. Experimental results indicate that Sos-RELM attains good comprehensive performance.

Index Terms— Classification. Regularized extreme learning machine. Symbiotic Organisms Search.

I. INTRODUCTION

Recently, data classification has become a fundamental issue attracting more and more attention; it is widely used in text recognition, disease diagnosis, automatic license plate recognition, etc. Generally speaking, the main approaches can be divided into supervised learning, unsupervised clustering, and semi-supervised learning. Among these three types, supervised learning is the most common. In the past decades, researchers have developed various methods aiming at lower test error with higher computing efficiency. For instance, the support vector machine (SVM) [1] separates different data sets with the maximum margin in the input space or a kernel space. Moreover, artificial neural networks, especially single hidden layer feedforward neural networks (SLFNs), have proven function approximation capability [2]. Besides the test error, the computing cost also plays an important role when selecting a hypothesis for classification tasks. The traditional way of using gradients to


calculate the weighting coefficients, such as back-propagation (BP) [3], leads to a heavy computational burden. To overcome this defect, Huang [4] proposed a novel SLFN method named the extreme learning machine (ELM).

A. Regularized extreme learning machine

Unlike the BP algorithm, the weights from the input layer to the hidden layer and the hidden thresholds are randomly generated, and the weights from the hidden layer to the output layer are obtained as the minimum-norm solution that fits the actual outputs to the desired outputs. Hence, to train the entire network, we only need to calculate the Moore-Penrose generalized inverse of the corresponding matrix. This operation saves a lot of time. However, as in most types of networks, the overfitting problem still exists for ELM. To avoid this defect, we adopt two typical techniques in this paper simultaneously: a regularization factor [5], [6] and validation.

B. Meta-heuristic optimization algorithm

Optimization problems in which suitable parameters must be found are often encountered in practical projects. Traditional solution methods, such as convex optimization, need additional information about the objective functions or tasks. However, such information is often unavailable because an analytical form or related feedback is lacking. Metaheuristic algorithms inspired by biological behavior or physical phenomena help us deal with these troublesome issues without any a priori knowledge; the genetic algorithm (GA) [7] and particle swarm optimization (PSO) [8] are two classical paradigms. Recently, a novel algorithm framework named symbiotic organisms search (SOS), proposed by Cheng [9], has shown competitive performance in terms of robustness and efficiency, and simulation results on benchmark functions and complex numerical problems confirm this.

C. Hybrid framework

Although randomly selecting the input weights and hidden biases saves computing


time, it also yields a set of sub-optimal solutions [10]. Therefore, additional metaheuristic algorithms are needed to optimize these parameters. So far, a series of achievements based on this hybrid framework have been proposed, such as the evolutionary extreme learning machine (E-ELM) [10], the self-adaptive evolutionary extreme learning machine (SaE-ELM) [11], the improved cuckoo search based extreme learning machine (ICSELM) [12], and the self-adaptive extreme learning machine (SaELM) [13]. This paper considers a novel framework, an improved regularized extreme learning machine in which SOS searches for the optimal parameters. More details can be found in the rest of this paper.

II. PRELIMINARY WORKS

A. RELM

A typical classification task in machine learning can be simplified as learning the relationship between an input data set and an output data set; supervised learning is the common setting. Given a training data set $\mathcal{X} := \{\vec{x}_1, \cdots, \vec{x}_n, \cdots, \vec{x}_N\}$, where $N$ is the number of samples and $\vec{x}_n := [x_n^1, \cdots, x_n^m, \cdots, x_n^M]$ denotes one sample with $M$ dimensions, and an output data set $\mathcal{Y} := \{\vec{y}_1, \cdots, \vec{y}_n, \cdots, \vec{y}_N\}$, where $\vec{y}_n$ is an output sample. Generally speaking, the output value is a real number or a text label. For computational convenience, we convert $\vec{y}_n$ into a pattern vector $\vec{y}_n := [y_n^1, \cdots, y_n^l, \cdots, y_n^L]$ whose entries take the binary values $-1$ and $1$.

A typical extreme learning machine has a three-layer structure: an input layer, a hidden layer, and an output layer. The transfer from the input layer to the hidden layer can be described as

$$
\mathbf{H} =
\begin{bmatrix}
g(\langle \vec{a}_1, \vec{x}_1 \rangle + b_1) & \cdots & g(\langle \vec{a}_{\tilde{N}}, \vec{x}_1 \rangle + b_{\tilde{N}}) \\
\vdots & \ddots & \vdots \\
g(\langle \vec{a}_1, \vec{x}_N \rangle + b_1) & \cdots & g(\langle \vec{a}_{\tilde{N}}, \vec{x}_N \rangle + b_{\tilde{N}})
\end{bmatrix}_{N \times \tilde{N}}
\quad (1)
$$

with the weights from the input layer to the hidden layer $\vec{a}_{\tilde{n}} = [a_1^{\tilde{n}}, \cdots, a_m^{\tilde{n}}, \cdots, a_M^{\tilde{n}}]^T$, $1 \le \tilde{n} \le \tilde{N}$, and the bias parameters $b_{\tilde{n}}$, $1 \le \tilde{n} \le \tilde{N}$, where $\tilde{N}$ is the number of hidden nodes. The activation function $g$ can be any infinitely differentiable function [15] and is chosen according to demands. The key point of RELM, which differs from traditional SLFNs, is that the input weights $\vec{a}_{\tilde{n}}$ and biases $b_{\tilde{n}}$ are assigned randomly, while the output weight matrix $\mathbf{B}_{\tilde{N} \times L}$ is calculated by solving the convex optimization problem [6]

$$
\min_{\mathbf{B}} \left( \| \mathbf{H}\mathbf{B} - \mathbf{Y} \|_F^2 + c \, \| \mathbf{B} \|_F^2 \right)
\quad (2)
$$

Here, $c$ denotes the penalty factor and $\mathbf{Y}_{N \times L}$ is the output matrix in binary form. Setting the derivative with respect to $\mathbf{B}$ to zero, the optimal estimate of $\mathbf{B}$ is easily found as

$$
\hat{\mathbf{B}} = (\mathbf{H}^T \mathbf{H} + c\mathbf{I})^{-1} \mathbf{H}^T \mathbf{Y}
\quad (3)
$$
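For concreteness, Eqs. (1) and (3) reduce to a few lines of linear algebra. The following minimal sketch (ours, in Python with NumPy; the function and variable names are illustrative and not from the original implementation) computes the output weights given random input weights and biases:

```python
import numpy as np

def relm_output_weights(X, Y, A, b, c, g=np.tanh):
    """Compute the RELM output weights of Eq. (3).

    X : (N, M) training inputs        Y : (N, L) targets coded in {-1, 1}
    A : (M, N_tilde) random weights   b : (N_tilde,) random biases
    c : regularization factor         g : activation function
    """
    H = g(X @ A + b)  # hidden-layer output matrix of Eq. (1)
    n_tilde = H.shape[1]
    # B_hat = (H^T H + c I)^{-1} H^T Y; solve() avoids forming an explicit inverse
    return np.linalg.solve(H.T @ H + c * np.eye(n_tilde), H.T @ Y)
```

A new sample $\vec{x}$ is then classified by taking the index of the largest entry of $g(\vec{x}^T \mathbf{A} + \vec{b}) \, \hat{\mathbf{B}}$.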

Algorithm 1 Pseudo code of RELM
Require: Training set; validation set
Ensure: Accuracy
1: Receive the input weights, biases, and regularization factor
2: Calculate $\mathbf{H}_{train}$ using Eq. (1)
3: Calculate $\hat{\mathbf{B}}_{train}$ using Eq. (3)
4: Predict the accuracy on the validation set

B. SOS

Symbiotic Organisms Search (SOS) is a novel, simple, and powerful metaheuristic algorithm imitating the various symbiotic relationships between organisms struggling to survive. In SOS, each organism in the ecosystem corresponds to a possible solution in the search space, and the interaction effects define three different phases which are activated in sequence. By virtue of these merits, SOS provides an effective and efficient approach to engineering tasks, including task scheduling in cloud computing [16], multiple-resources leveling optimization [17], and the capacitated vehicle routing problem (CVRP) [18]. Since the process used to update the individuals has a remarkable influence on the final results, its structure is delicately designed; see Algorithm 2, where the pseudo code is given [9].

Algorithm 2 Pseudo code of SOS
Require: Target function $f(\vec{x})$ with $\vec{x}_{min} \le \vec{x} \le \vec{x}_{max}$
Ensure: Optimal solution $\min f(\vec{x}^*)$
1: Initialize
2: while loop criterion do
3:   Mutualism phase
4:   Commensalism phase
5:   Parasitism phase
6: end while
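To make the three phases concrete, the sketch below paraphrases the update rules of [9] in Python for a maximization problem, as used for the fitness in Section III. This is a sketch under our reading of the original paper; `pop` holds the `n` organisms as rows, `fit` their fitness values, and `lo`, `hi` are length-`d` bound arrays.

```python
import numpy as np

def sos_step(pop, fit, f, lo, hi, rng):
    """One SOS generation (maximization), following the update rules in [9]."""
    n, d = pop.shape
    for i in range(n):
        best = pop[np.argmax(fit)]
        # --- Mutualism: organisms i and j both benefit from their relationship
        j = rng.choice([k for k in range(n) if k != i])
        mutual = (pop[i] + pop[j]) / 2.0
        bf1, bf2 = rng.integers(1, 3, size=2)           # benefit factors in {1, 2}
        xi = np.clip(pop[i] + rng.random(d) * (best - mutual * bf1), lo, hi)
        xj = np.clip(pop[j] + rng.random(d) * (best - mutual * bf2), lo, hi)
        for k, x in ((i, xi), (j, xj)):                 # greedy acceptance
            fx = f(x)
            if fx > fit[k]:
                pop[k], fit[k] = x, fx
        # --- Commensalism: i benefits from j, while j is unaffected
        j = rng.choice([k for k in range(n) if k != i])
        xi = np.clip(pop[i] + rng.uniform(-1, 1, d) * (best - pop[j]), lo, hi)
        fx = f(xi)
        if fx > fit[i]:
            pop[i], fit[i] = xi, fx
        # --- Parasitism: a mutated copy of i tries to displace a random j
        j = rng.choice([k for k in range(n) if k != i])
        parasite = pop[i].copy()
        mask = rng.random(d) < rng.random()             # random subset of dimensions
        parasite[mask] = lo[mask] + rng.random(mask.sum()) * (hi - lo)[mask]
        fp = f(parasite)
        if fp > fit[j]:
            pop[j], fit[j] = parasite, fp
    return pop, fit
```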

In Algorithm 2, the sequential phases correspond to different manipulations used to update the individuals in the group. As the iterations proceed, the region around the optimal solution is located. As an important issue, the trade-off between exploration and exploitation determines how thoroughly the search space is explored and how accurately the optimal region is exploited. See Cheng's original paper [9] for more details about the operations modeled by the equations.

III. FRAMEWORK OF SOS-RELM

Although randomly selecting the input weights and biases provides a powerful way to avoid a heavy computational burden, the endeavor to obtain a satisfactory learning effect will increase the number of hidden nodes because of the inefficient use


of the generalization ability. On the other hand, it is inadvisable to design the input weights and biases by approaches such as mesh generation or inverse inference, or the effort will be in vain. Hence SOS, a metaheuristic algorithm with flexibility and powerful optimization ability, tends to be a good choice. The framework of Sos-RELM mainly consists of two phases: SOS searches for the optimized parameters, including the input weights, the biases, and the regularization factor, while RELM determines the output weights using the pseudo-inverse. Consider the hybrid vector

$$
\vec{\upsilon} = [a_1^1, \cdots, a_M^1, b_1, \cdots, a_1^{\tilde{n}}, \cdots, a_M^{\tilde{n}}, b_{\tilde{n}}, \cdots, a_1^{\tilde{N}}, \cdots, a_M^{\tilde{N}}, b_{\tilde{N}}, c]
$$

whose dimension is $1 \times (\tilde{N}(M+1)+1)$; that is, $d$, the number of variables in this optimization, equals $\tilde{N}(M+1)+1$.

The purpose of classification is to attain high accuracy on the test set, which is regarded as real data that cannot be acquired in advance. What we do have in the supervised case is a priori data set used to train the machine learning hypothesis. Most approaches, such as BP, SVM, and ELM, tend to find the optimized parameters that ensure the smallest training error rate. However, in light of statistical learning theory [19], excessive pursuit of fitting the training set leads to the overfitting problem. Hence, as discussed above, the regularization factor is used. Besides, K-fold cross-validation [20] also demonstrates good performance in avoiding overfitting. During the cross-validation process, $K-1$ subsets of the known data are used to train the hypothesis, and the remaining subset is retained for validation. This paper chooses the mean value of the validation results as the objective function, with the goal of obtaining a small classification error.

Algorithm 3 Pseudo code of calculating the fitness
Require: Priori data set
Ensure: Average accuracy $\xi$
1: Initialize: receive the parameters
2: Randomly divide the priori data into $K$ subsets
3: for $1 \le i \le K$ do
4:   Training := $K-1$ subsets
5:   Validation := the remaining subset
6:   Accuracy[i] := RELM(training, validation, parameters)
7: end for
8: $\xi$ := average(Accuracy)

It can be seen that such a hybrid approach is made up of two stages. In the first stage, SOS searches for the optimal parameter vector using its three different update strategies; subsequently, the fitness values are calculated by Algorithm 3. One possible concern is that the iterations add computing time; however, the careful tuning of the randomly selected parameters reduces the required number of hidden nodes and hence the computational burden. Algorithm 4 depicts the details.
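Before turning to Algorithm 4, note that Algorithm 3 amounts to decoding the hybrid vector and averaging $K$ validation accuracies. The sketch below assumes the `relm_output_weights` helper given earlier and a `tanh` activation; the decoding layout follows the definition of $\vec{\upsilon}$ above, while all names are ours:

```python
import numpy as np

def decode(v, M, N_tilde):
    """Split the hybrid vector [a^1.., b_1, ..., a^N.., b_N, c] of Section III."""
    blocks = v[:-1].reshape(N_tilde, M + 1)        # one (weights, bias) row per hidden node
    return blocks[:, :M].T, blocks[:, M], v[-1]    # A (M, N_tilde), b (N_tilde,), c

def fitness(v, X, Y, M, N_tilde, K=2, rng=np.random.default_rng(0)):
    """Algorithm 3: average K-fold validation accuracy of one hybrid vector."""
    A, b, c = decode(v, M, N_tilde)
    folds = np.array_split(rng.permutation(len(X)), K)
    acc = []
    for i in range(K):
        val = folds[i]
        tr = np.concatenate([folds[k] for k in range(K) if k != i])
        B_hat = relm_output_weights(X[tr], Y[tr], A, b, c)        # Eq. (3)
        pred = np.argmax(np.tanh(X[val] @ A + b) @ B_hat, axis=1)
        acc.append(np.mean(pred == np.argmax(Y[val], axis=1)))
    return float(np.mean(acc))
```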

Algorithm 4 Pseudo code of Sos-RELM
Require: Priori data set
Ensure: Optimal parameters
1: Initialize: given the population size $n$ and the maximum number of iterations $Iter_{max}$, select the group bounds $\vec{v}_{min}$, $\vec{v}_{max}$ and randomly generate the $\vec{v}_i$
2: for $1 \le j \le Iter_{max}$ do
3:   Find $\vec{v}_{best}$, $f_{best}$ in the group
4:   Calculate the new position $\vec{v}_{new}$ using the mutualism, commensalism, and parasitism operations
5:   Calculate $f_{new}$ using Algorithm 3
6:   if $f_{new} > f_{current}$ then
7:     $\vec{v}_{current} = \vec{v}_{new}$
8:     $f_{current} = f_{new}$
9:   end if
10: end for

IV. EXPERIMENT AND ANALYSIS

In this section, several real data sets from the University of California UCI benchmark repository (http://archive.ics.uci.edu/ml/) are used, including Iris, Wine, Seeds, Glass, and Statlog (Landsat Satellite). Table I summarizes the details of the experimental data sets. To evaluate the effects, the original ELM (http://www.ntu.edu.sg/home/egbhuang/) is used in the comparative trials.

TABLE I
EXPERIMENTAL DATA SETS

Dataset               Number of Instances   Number of Attributes   Number of Classes
Iris                  150                   4                      3
Wine                  178                   13                     3
Seeds                 210                   7                      3
Glass                 214                   10                     6
Statlog (Satellite)   6435                  36                     6
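For completeness, Algorithm 4 reduces to a short driver around the `fitness` and `sos_step` sketches above. The bounds and default parameters below match the experimental settings reported next; again, this is a hedged sketch rather than the authors' implementation:

```python
import numpy as np

def sos_relm(X, Y, M, N_tilde=50, n=30, iter_max=100, c_max=20.0, seed=0):
    """Algorithm 4: SOS searches the hybrid vector; RELM scores it via Algorithm 3."""
    rng = np.random.default_rng(seed)
    d = N_tilde * (M + 1) + 1                        # dimension of the hybrid vector
    lo = np.r_[-np.ones(d - 1), 0.0]                 # weights/biases in [-1, 1], c in [0, c_max]
    hi = np.r_[np.ones(d - 1), c_max]
    f = lambda v: fitness(v, X, Y, M, N_tilde)       # Algorithm 3 as the objective
    pop = lo + rng.random((n, d)) * (hi - lo)        # random initial ecosystem
    fit = np.array([f(v) for v in pop])
    for _ in range(iter_max):
        pop, fit = sos_step(pop, fit, f, lo, hi, rng)    # the three phases of Algorithm 2
    return pop[np.argmax(fit)], float(fit.max())     # best hybrid vector and its accuracy
```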

In the validation calculation, we choose $K = 2$ (see Algorithm 3); the proportion of the priori set to the test set is 7 : 3; the lower bound of the initial hybrid vector is set to $\vec{v}_{min} = [-1, \cdots, -1, 0]$ and the upper bound to $\vec{v}_{max} = [1, \cdots, 1, 20]$; the population size is $n = 30$; the maximum number of iterations is $Iter_{max} = 100$; and the number of hidden nodes is $\tilde{N} = 50$. The software and hardware resources used in the experiments are: manufacturer, Dell; RAM, 4 GB; processor, Intel(R) Core(TM) i3-4150 3.5 GHz. To improve statistical robustness, each algorithm is repeated 5 times. Table II reports the average correct classification rates; it is evident that the new approach obtains better accuracy. Despite its extra computing burden, compared with BP it needs no derivative information during the process, which indicates the value of the metaheuristic optimization algorithm.


TABLE II
RESULTS ON CLASSIFICATION

                      Sos-RELM                                 ELM
Dataset               Nodes   Test Accuracy   Computing Time   Nodes   Test Accuracy   Computing Time
Iris                  50      0.9737          197.0            50      0.9474          20.72
Wine                  50      0.8622          215.5            50      0.6711          19.67
Seeds                 50      0.9660          256.1            50      0.9509          19.76
Glass                 50      0.9481          207.1            50      0.8815          20.49
Statlog (Satellite)   50      0.7507          2690             50      0.5858          20.43

V. CONCLUSIONS

In this paper, we develop a novel machine learning framework on the foundation of ELM and SOS. In this framework, SOS is used to optimize the parameters consisting of the input weights, biases, and regularization factor, while RELM is used to calculate the output weights, which is a good way to save computing time. The two parts are integrated and exchange information with each other. By combining the two algorithms, the advantages of both sides are retained, and the resulting method improves performance. Finally, real data sets from UCI indicate that Sos-RELM performs better in terms of classification accuracy and is thus a powerful method in practice.

ACKNOWLEDGMENT

The research was supported by the SGCC (State Grid Corporation of China) Thousand Talents program special support project (EPRIPDKJ (2014)2863). The authors would like to thank Cheng and Huang for generously sharing the SOS and ELM source codes at http://cn.mathworks.com/matlabcentral/fileexchange/47465-sos and http://www.ntu.edu.sg/home/egbhuang/.

Fig. 1. Convergence curve for Statlog (Satellite) based on Sos-RELM (accuracy, 0.71 to 0.79, versus iteration, 0 to 100; figure not reproduced here).

REFERENCES

[1] Cortes C, Vapnik V. Support-vector networks. Machine Learning, 1995, 20(3): 273-297.
[2] Huang G B, Babri H. Upper bounds on the number of hidden neurons in feedforward networks with arbitrary bounded nonlinear activation functions. IEEE Transactions on Neural Networks, 1998, 9(1): 224-229.
[3] Rumelhart D E, Hinton G E, Williams R J. Learning representations by back-propagating errors. Nature, 1986, 323(6088): 533-536.
[4] Huang G B, Zhu Q Y, Siew C K. Extreme learning machine: a new learning scheme of feedforward neural networks. Proceedings of the 2004 IEEE International Joint Conference on Neural Networks. IEEE, 2004, 2: 985-990.
[5] Deng W, Zheng Q, Chen L. Regularized extreme learning machine. 2009 IEEE Symposium on Computational Intelligence and Data Mining (CIDM'09). IEEE, 2009: 389-395.
[6] Zheng W, Qian Y, Lu H. Text categorization based on regularization extreme learning machine. Neural Computing and Applications, 2013, 22(3-4): 447-456.
[7] Holland J H. Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence. MIT Press, 1992.
[8] Kennedy J. Particle swarm optimization. Encyclopedia of Machine Learning. Springer US, 2010: 760-766.
[9] Cheng M Y, Prayogo D. Symbiotic Organisms Search: a new metaheuristic optimization algorithm. Computers and Structures, 2014, 139: 98-112.
[10] Zhu Q Y, Qin A K, Suganthan P N, et al. Evolutionary extreme learning machine. Pattern Recognition, 2005, 38(10): 1759-1763.
[11] Cao J, Lin Z, Huang G B. Self-adaptive evolutionary extreme learning machine. Neural Processing Letters, 2012, 36(3): 285-305.
[12] Mohapatra P, Chakravarty S, Dash P K. An improved cuckoo search based extreme learning machine for medical data classification. Swarm and Evolutionary Computation, 2015, 24: 25-49.
[13] Wang G G, Lu M, Dong Y Q, et al. Self-adaptive extreme learning machine. Neural Computing and Applications, 2015: 1-13, doi:10.1007/s00521-015-1874-3.
[14] Huang G B, Zhu Q Y, Siew C K. Extreme learning machine: theory and applications. Neurocomputing, 2006, 70(1): 489-501.
[15] Huang G B, Zhu Q Y, Siew C K. Extreme learning machine: theory and applications. Neurocomputing, 2006, 70(1): 489-501.
[16] Abdullahi M, Ngadi M A. Symbiotic Organism Search optimization based task scheduling in cloud computing environment. Future Generation Computer Systems, 2016, 56: 640-650.
[17] Cheng M Y, Prayogo D, Tran D H. Optimizing multiple-resources leveling in multiple projects using discrete Symbiotic Organisms Search. Journal of Computing in Civil Engineering, 2015: 04015036.
[18] Ruskartina E. Symbiotic Organism Search (SOS) for solving the capacitated vehicle routing problem (CVRP). 2015.
[19] Vapnik V N. Statistical learning theory. New York: Wiley, 1998.
[20] Rogers S, Girolami M. A first course in machine learning. Boca Raton, Florida: CRC Press, 2011.
