IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 39, NO. 4, AUGUST 2009
Correspondence

Online Sequential Fuzzy Extreme Learning Machine for Function Approximation and Classification Problems

Hai-Jun Rong, Guang-Bin Huang, Senior Member, IEEE, N. Sundararajan, Fellow, IEEE, and P. Saratchandran, Senior Member, IEEE

Abstract—In this correspondence, an online sequential fuzzy extreme learning machine (OS-Fuzzy-ELM) has been developed for function approximation and classification problems. The equivalence of a Takagi–Sugeno–Kang (TSK) fuzzy inference system (FIS) to a generalized single hidden-layer feedforward network is shown first, which is then used to develop the OS-Fuzzy-ELM algorithm. This results in a FIS that can handle any bounded nonconstant piecewise continuous membership function. Furthermore, the learning in OS-Fuzzy-ELM can be done with the input data arriving in a one-by-one mode or in a chunk-by-chunk (a block of data) mode with fixed or varying chunk size. In OS-Fuzzy-ELM, all the antecedent parameters of the membership functions are randomly assigned first, and then the corresponding consequent parameters are determined analytically. Performance comparisons of OS-Fuzzy-ELM with other existing algorithms are presented using real-world benchmark problems in the areas of nonlinear system identification, regression, and classification. The results show that the proposed OS-Fuzzy-ELM produces similar or better accuracies with at least an order-of-magnitude reduction in the training time.

Index Terms—Extreme learning machine (ELM), fuzzy inference system (FIS), online sequential ELM (OS-ELM), online sequential fuzzy ELM (OS-Fuzzy-ELM).
I. INTRODUCTION

Fuzzy inference systems (FISs) have been increasingly used for function approximation and classification problems [1]. In a FIS, the methods used to update the parameters of the membership functions can be broadly divided into batch learning schemes and sequential learning schemes. In batch learning, it is assumed that the complete training data are available before the training commences, and the training usually involves cycling the data over a number of epochs. In sequential learning, the data arrive one by one or chunk by chunk and are discarded once their learning is completed; the notion of an epoch does not exist. In practical applications, new training data may arrive sequentially. To handle this with batch learning algorithms, one has to retrain the network all over again, resulting in a large training time. Hence, in these cases, sequential learning algorithms are generally preferred over batch learning algorithms, as they do not require retraining whenever new data are received. Examples of batch learning FISs include the adaptive-network-based fuzzy inference system (ANFIS) of Jang [2] and the fuzzy model of Chiu [3]. These learning algorithms require cycling the whole training data over a number of learning cycles (epochs).

Manuscript received April 2, 2008; revised July 31, 2008 and November 26, 2008. First published March 24, 2009; current version published July 17, 2009. This paper was recommended by Associate Editor H. Gao.

H.-J. Rong was with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798. She is now with the School of Aerospace, Xi'an Jiaotong University, Xi'an 710049, China (e-mail: [email protected]).

G.-B. Huang, N. Sundararajan, and P. Saratchandran are with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798 (e-mail: [email protected]).

Digital Object Identifier 10.1109/TSMCB.2008.2010506
In contrast, some truly sequential learning fuzzy algorithms have recently been proposed. They include the dynamic evolving neurofuzzy inference system (DENFIS) [4], the evolving Takagi–Sugeno (eTS) model [5], the simplified eTS (Simpl_eTS) algorithm [6], and the sequential adaptive fuzzy inference system (SAFIS) [7]. Generally speaking, these algorithms are mainly designed to handle training data on a one-by-one basis, and they may face limitations in applications where the data arrive on a chunk-by-chunk basis. In addition, these algorithms can usually handle only specific membership function types. For example, SAFIS and eTS use only the Gaussian membership function, while Simpl_eTS uses only the Cauchy membership function.

Liang et al. [8] have recently developed an online sequential learning version of the batch extreme learning machine (ELM) [9]–[12], called OS-ELM. In OS-ELM with additive nodes, the input weights (of the connections linking the input nodes to the hidden nodes) and the hidden-node biases are randomly generated. Similarly, in OS-ELM with radial basis function (RBF) nodes, the centers and widths of the nodes are randomly generated and fixed. Based on this, the output weights are analytically determined. In both ELM and OS-ELM, all the hidden-node parameters are independent not only of the training data but also of each other. OS-ELM can learn the training data not only one by one but also chunk by chunk (with fixed or varying chunk size) and discards the data for which the training has already been done.

By combining the advantages of neural networks (learning) and fuzzy inference systems (approximate reasoning), one can develop a neuro-fuzzy system that exhibits the characteristics of both, and many researchers have developed such neuro-fuzzy systems for solving real-world problems effectively. The functional equivalence between a Gaussian RBF neural network and a FIS with Gaussian membership functions has been shown under some mild conditions by Jang and Sun [13]. Expanding this concept further, in this correspondence, we show the functional equivalence between a generalized single hidden-layer feedforward network (SLFN) and a FIS with any membership function; the FIS with Gaussian membership functions thus becomes a special case of SLFNs. Using this functional equivalence together with the recently developed OS-ELM, we have developed an online sequential fuzzy ELM (OS-Fuzzy-ELM) that can handle any bounded nonconstant piecewise continuous membership function.^1 This correspondence extends the OS-ELM developed in the domain of neural networks to FISs. In the proposed OS-Fuzzy-ELM, all the parameters of the membership functions are randomly generated independent of the training data, and then the consequent parameters are analytically determined. This significantly increases the learning speed by avoiding the iterative learning steps (and, perhaps, the human intervention) of traditional FISs (and neural networks). The basic idea of OS-Fuzzy-ELM for one-by-one learning was presented by us earlier in [15]. Here, we present a comprehensive evaluation of OS-Fuzzy-ELM by comparing it with other well-known learning algorithms such as ANFIS, DENFIS, SAFIS, eTS, and Simpl_eTS. We have also extended the algorithm to a chunk-by-chunk mode. Study results based on benchmark problems from the function approximation and classification areas indicate that the proposed OS-Fuzzy-ELM produces similar or better generalization performance with at least an order-of-magnitude reduction in the training time.

^1 Recently, Sun et al. [14] proposed an ELM-based fuzzy inference system called E-TSK that uses K-means clustering to preprocess the data. ELM is used to obtain the membership of each fuzzy rule, and the consequent part is then determined using multiple ELMs. It should be noted that E-TSK is a batch processing method, as it requires all the training data to be available a priori.
II. PROPOSED OS-FUZZY-ELM

For simplicity, this section proposes the OS-Fuzzy-ELM based on the Takagi–Sugeno–Kang (TSK) FIS [16]; a similar analysis can be applied to other FISs (e.g., the Mamdani type).

A. SLFN Equivalence of FISs

The TSK fuzzy model is given by the following rules [16]:

Rule $i$: IF ($x_1$ is $A_{1i}$) AND ($x_2$ is $A_{2i}$) AND $\cdots$ AND ($x_n$ is $A_{ni}$), THEN ($y_1$ is $\beta_{i1}$) $\cdots$ ($y_m$ is $\beta_{im}$)

where $A_{ji}$ ($j = 1, 2, \ldots, n$; $i = 1, 2, \ldots, L$) are the fuzzy sets of the $j$th input variable $x_j$ in rule $i$, $n$ is the dimension of the input vector $\mathbf{x} = [x_1, \ldots, x_n]^T$, $m$ is the dimension of the output vector $\mathbf{y} = [y_1, \ldots, y_m]^T$, and $L$ is the number of fuzzy rules. Each $\beta_{ik}$ ($k = 1, 2, \ldots, m$; $i = 1, 2, \ldots, L$) is a crisp value given by a linear combination of the input variables, i.e., $\beta_{ik} = q_{ik,0} + q_{ik,1} x_1 + \cdots + q_{ik,n} x_n$.

In the TSK fuzzy model described above, the degree to which the given $j$th input variable $x_j$ satisfies the quantifier $A_{ji}$ in rule $i$ is specified by its membership function $\mu_{A_{ji}}(x_j)$. In this correspondence, a general nonconstant piecewise continuous membership function $g(\mathbf{c}, a)$ is considered,^2 which includes the commonly used membership functions [1] and almost any other practical membership function. The membership value $\mu_{A_{ji}}(x_j)$ of the $j$th input variable $x_j$ in the $i$th rule can be obtained from any of the bounded nonconstant piecewise continuous membership functions

$$\mu_{A_{ji}}(x_j; c_{ji}, a_i) = g(x_j; c_{ji}, a_i) \tag{1}$$

where $c_{ji}$ and $a_i$ are the parameters of the membership function $g$ corresponding to the $j$th input variable $x_j$ and the $i$th rule. Similar to Ying [17], we use the symbol $\otimes$ to represent any type of fuzzy logic AND operation (as many as used). The firing strength (IF part) of the $i$th rule is given by

$$R_i(\mathbf{x}; \mathbf{c}_i, a_i) = \mu_{A_{1i}}(x_1; c_{1i}, a_i) \otimes \mu_{A_{2i}}(x_2; c_{2i}, a_i) \otimes \cdots \otimes \mu_{A_{ni}}(x_n; c_{ni}, a_i). \tag{2}$$

Furthermore, each rule can be normalized as

$$G(\mathbf{x}; \mathbf{c}_i, a_i) = \frac{R_i(\mathbf{x}; \mathbf{c}_i, a_i)}{\sum_{i=1}^{L} R_i(\mathbf{x}; \mathbf{c}_i, a_i)}. \tag{3}$$

^2 Any function $g(\mathbf{c})$ with parameter vector $\mathbf{c} \in \mathbb{R}^n$ can also be denoted as $g(\mathbf{c}, a)$ with parameters $(\mathbf{c}, a) \in \mathbb{R}^{n-1} \times \mathbb{R}$.
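To make (1)–(3) concrete, the following minimal sketch (our own illustration, not part of the original algorithm description) computes the normalized firing strengths for a batch of inputs, assuming a Gaussian membership function for $g$ and the product as the fuzzy AND operation $\otimes$; all function and variable names are ours.

```python
import numpy as np

def gaussian_mf(x, c, a):
    """One possible choice of g in (1): a Gaussian membership function
    with center c and width a (a Cauchy function would work equally)."""
    return np.exp(-((x - c) ** 2) / (a ** 2))

def firing_strengths(X, C, A):
    """Normalized firing strengths G(x; c_i, a_i) of (3).

    X: (N, n) input samples; C: (L, n) centers c_ji; A: (L,) widths a_i.
    Returns an (N, L) matrix whose rows sum to 1.
    """
    # R[j, i] = g(x_j1; c_i1, a_i) * ... * g(x_jn; c_in, a_i), cf. (2)
    R = np.stack([np.prod(gaussian_mf(X, C[i], A[i]), axis=1)
                  for i in range(C.shape[0])], axis=1)
    return R / R.sum(axis=1, keepdims=True)  # normalization of (3)
```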
Similar to the work of Zeng and Singh [18], $G$ can be called a fuzzy basis function. The system output of the TSK fuzzy model is the weighted sum of the outputs of the normalized rules. As such, the system output $\hat{\mathbf{y}}$ for a given input $\mathbf{x}$ is calculated as

$$\hat{\mathbf{y}} = \frac{\sum_{i=1}^{L} \boldsymbol{\beta}_i R_i(\mathbf{x}; \mathbf{c}_i, a_i)}{\sum_{i=1}^{L} R_i(\mathbf{x}; \mathbf{c}_i, a_i)} = \sum_{i=1}^{L} \boldsymbol{\beta}_i G(\mathbf{x}; \mathbf{c}_i, a_i) \tag{4}$$

where $\boldsymbol{\beta}_i = (\beta_{i1}, \beta_{i2}, \ldots, \beta_{im})$.

Equation (4) shows that the FIS is equivalent to a generalized SLFN (in a generalized formulation), as presented in [12], where $G(\cdot)$ represents the output function of the hidden nodes and $\boldsymbol{\beta}$ represents the output weight vector. The output functions of the hidden nodes in the SLFN are thus given by the membership functions of the FIS. According to Huang and Chen [12], it is reasonable to infer that SLFNs with activation function $G(\cdot)$ [cf. (3)] can approximate any continuous target function as long as the parameters of the membership function $g$ are randomly generated and $g$ is bounded, nonconstant, and piecewise continuous. For brevity, the detailed proof of the universal approximation capability of such an equivalent SLFN for the TSK model is not addressed in this correspondence but will be provided in future work.

B. Proposed OS-Fuzzy-ELM Algorithm

Since a FIS is equivalent to an SLFN, ELM can be directly applied to a FIS. In such a scheme, the parameters of the membership functions ($\mathbf{c}$ and $a$) are randomly generated, and based on this, the consequent parameters ($\boldsymbol{\beta}$) are analytically determined. In many real applications, the training data arrive chunk by chunk or one by one (a special case of chunk), and hence, a FIS algorithm that can handle such sequential applications is a necessity. The OS-Fuzzy-ELM is described hereinafter.

For $N$ arbitrary distinct training samples $(\mathbf{x}_i, \mathbf{t}_i)$, where $\mathbf{x}_i = [x_{i1}, x_{i2}, \ldots, x_{in}]^T \in \mathbb{R}^n$ and $\mathbf{t}_i = [t_{i1}, t_{i2}, \ldots, t_{im}]^T \in \mathbb{R}^m$, according to the equivalent SLFN structure of the FIS, the mathematical model of the FIS with $L$ fuzzy rules and the corresponding parameters $\boldsymbol{\beta}_i$, $\mathbf{c}_i$, and $a_i$ is given by

$$f_L(\mathbf{x}_j) = \sum_{i=1}^{L} \boldsymbol{\beta}_i G(\mathbf{x}_j; \mathbf{c}_i, a_i) = \mathbf{t}_j, \quad j = 1, \ldots, N. \tag{5}$$

For the TSK fuzzy model, the consequent parameters are linear functions of the input variables and are given by

$$\boldsymbol{\beta}_i = \mathbf{x}_{je}^T \mathbf{q}_i \tag{6}$$

where $\mathbf{x}_{je} = [1, \mathbf{x}_j^T]^T$ is the input vector extended by appending 1, and $\mathbf{q}_i$ is the parameter matrix of the TSK model for the $i$th fuzzy rule, given by

$$\mathbf{q}_i = \begin{bmatrix} q_{i1,0} & \cdots & q_{im,0} \\ \vdots & \ddots & \vdots \\ q_{i1,n} & \cdots & q_{im,n} \end{bmatrix}_{(n+1) \times m}. \tag{7}$$

Thus, the output (5) of the TSK model becomes

$$f_L(\mathbf{x}_j) = \sum_{i=1}^{L} \mathbf{x}_{je}^T \mathbf{q}_i \, G(\mathbf{x}_j; \mathbf{c}_i, a_i) = \mathbf{t}_j, \quad j = 1, \ldots, N. \tag{8}$$

This equation can further be written in the compact form

$$\mathbf{H}\mathbf{Q} = \mathbf{T} \tag{9}$$

where $\mathbf{H}$ is the hidden matrix weighted by the normalized firing strengths of the fuzzy rules and $\mathbf{Q}$ is the parameter matrix of the TSK model, respectively,

$$\mathbf{H}(\mathbf{c}_1, \ldots, \mathbf{c}_L, a_1, \ldots, a_L; \mathbf{x}_1, \ldots, \mathbf{x}_N) = \left[\, \mathbf{x}_{je}^T G(\mathbf{x}_j; \mathbf{c}_1, a_1), \; \ldots, \; \mathbf{x}_{je}^T G(\mathbf{x}_j; \mathbf{c}_L, a_L) \,\right]_{j=1,\ldots,N} \tag{10}$$

$$\mathbf{Q} = \begin{bmatrix} \mathbf{q}_1 \\ \vdots \\ \mathbf{q}_L \end{bmatrix}. \tag{11}$$
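Since (9) is linear in $\mathbf{Q}$ once the antecedent parameters are fixed, the batch (Fuzzy-ELM) solution reduces to a least-squares problem. The following sketch builds $\mathbf{H}$ as in (10) and solves (9); it reuses firing_strengths from the sketch above, and the sampling ranges and all names are our own assumptions.

```python
def build_H(X, C, A):
    """Hidden matrix H of (10): row j stacks x_je^T G(x_j; c_i, a_i)
    over the L rules, with x_je = [1, x_j^T]^T."""
    G = firing_strengths(X, C, A)                   # (N, L)
    Xe = np.hstack([np.ones((X.shape[0], 1)), X])   # extended inputs, (N, n+1)
    # Concatenate the L blocks Xe * G[:, i] -> H of shape (N, L*(n+1))
    return np.hstack([Xe * G[:, [i]] for i in range(G.shape[1])])

def fuzzy_elm_fit(X, T, L, rng):
    """Batch Fuzzy-ELM: random antecedents, least-squares consequents of (9)."""
    n = X.shape[1]
    C = rng.uniform(X.min(), X.max(), size=(L, n))  # random centers in the input range
    A = rng.uniform(0.01, 9.01, size=L)             # random widths (range used in Sec. III)
    H = build_H(X, C, A)
    Q, *_ = np.linalg.lstsq(H, T, rcond=None)       # least-squares solution of HQ = T
    return C, A, Q
```

The stacked Q returned here corresponds to (11), and predictions for new inputs are simply build_H(X_new, C, A) @ Q.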
According to the functional equivalence between SLFNs and FISs, the online sequential ELM (OS-ELM) for SLFNs with additive or RBF hidden nodes [8] can be directly extended to FISs; the resulting algorithm is called OS-Fuzzy-ELM. All the membership function parameters $\mathbf{c}_1, \ldots, \mathbf{c}_L, a_1, \ldots, a_L$ are randomly generated in OS-Fuzzy-ELM. We have the following.

1) Proposed OS-Fuzzy-ELM Algorithm: Given a membership function $g$ and a rule number $L$ for a specific application, the training data $\aleph = \{(\mathbf{x}_i, \mathbf{t}_i) \mid \mathbf{x}_i \in \mathbb{R}^n, \mathbf{t}_i \in \mathbb{R}^m, i = 1, \ldots\}$ arrive sequentially.

Step 1) Initialization Phase: Initialize the learning using a small chunk of initial training data $\aleph_0 = \{(\mathbf{x}_i, \mathbf{t}_i)\}_{i=1}^{N_0}$ from the given training set $\aleph$, with $N_0 \geq L$.

a) Assign random membership function parameters $(\mathbf{c}_i, a_i)$, $i = 1, \ldots, L$.

b) Calculate the initial matrix $\mathbf{H}_0$ for the TSK model

$$\mathbf{H}_0 = \mathbf{H}(\mathbf{c}_1, \ldots, \mathbf{c}_L, a_1, \ldots, a_L; \mathbf{x}_1, \ldots, \mathbf{x}_{N_0}) \tag{12}$$

where $\mathbf{H}$ is defined as in (10).

c) Estimate the initial parameter matrix $\mathbf{Q}^{(0)} = \mathbf{P}_0 \mathbf{H}_0^T \mathbf{T}_0$, where $\mathbf{P}_0 = (\mathbf{H}_0^T \mathbf{H}_0)^{-1}$ and $\mathbf{T}_0 = [\mathbf{t}_1, \ldots, \mathbf{t}_{N_0}]^T$.

d) Set $k = 0$.

Step 2) Sequential Learning Phase: Present the $(k+1)$th chunk of new observations

$$\aleph_{k+1} = \{(\mathbf{x}_i, \mathbf{t}_i)\}_{i=(\sum_{j=0}^{k} N_j)+1}^{\sum_{j=0}^{k+1} N_j}$$

where $N_{k+1}$ denotes the number of observations in the $(k+1)$th chunk, and do the following.

a) Calculate the partial matrix $\mathbf{H}_{k+1}$ for the $(k+1)$th chunk of data $\aleph_{k+1}$ for the TSK model

$$\mathbf{H}_{k+1} = \mathbf{H}\Big(\mathbf{c}_1, \ldots, \mathbf{c}_L, a_1, \ldots, a_L; \mathbf{x}_{(\sum_{j=0}^{k} N_j)+1}, \ldots, \mathbf{x}_{\sum_{j=0}^{k+1} N_j}\Big) \tag{13}$$

where $\mathbf{H}$ is defined as in (10). Set $\mathbf{T}_{k+1} = [\mathbf{t}_{(\sum_{j=0}^{k} N_j)+1}, \ldots, \mathbf{t}_{\sum_{j=0}^{k+1} N_j}]^T$.

b) Calculate the parameter matrix $\mathbf{Q}^{(k+1)}$:

$$\mathbf{P}_{k+1} = \mathbf{P}_k - \mathbf{P}_k \mathbf{H}_{k+1}^T \left(\mathbf{I} + \mathbf{H}_{k+1} \mathbf{P}_k \mathbf{H}_{k+1}^T\right)^{-1} \mathbf{H}_{k+1} \mathbf{P}_k$$
$$\mathbf{Q}^{(k+1)} = \mathbf{Q}^{(k)} + \mathbf{P}_{k+1} \mathbf{H}_{k+1}^T \left(\mathbf{T}_{k+1} - \mathbf{H}_{k+1} \mathbf{Q}^{(k)}\right). \tag{14}$$

c) Set $k = k + 1$. Then, go to Step 2).

OS-Fuzzy-ELM thus consists of two main phases. The initialization phase trains the FIS using the Fuzzy-ELM^3 method on a small batch of training data, and these initialization data are discarded as soon as the initialization phase is completed; the required number of training data is very small and can be as small as the number of rules (e.g., with ten rules, ten training samples may suffice to initialize the learning). After the initialization phase, OS-Fuzzy-ELM learns the training data one by one^4 or chunk by chunk (with fixed or varying chunk size), and all the training data are discarded once the learning procedure on these data is completed; a minimal code sketch of both phases is given below. In the next section, the performance of the proposed OS-Fuzzy-ELM is evaluated in detail on a number of benchmark problems.

^3 Fuzzy-ELM is the specific version of OS-Fuzzy-ELM in which all the training data are learned in a batch mode.
^4 The preliminary version of OS-Fuzzy-ELM [15] handles the case where the data come one by one instead of chunk by chunk.
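The two phases above translate directly into the recursive least-squares updates (12)–(14). The following sketch (our illustration, reusing build_H from Section II; all names are ours) implements both phases; chunks of size 1 give the one-by-one mode, and a varying chunk size is handled without change.

```python
def osfuzzy_elm_init(X0, T0, C, A):
    """Step 1: P0 = (H0^T H0)^{-1} and Q0 = P0 H0^T T0, cf. (12);
    requires N0 >= L so that H0^T H0 is invertible."""
    H0 = build_H(X0, C, A)
    P = np.linalg.inv(H0.T @ H0)
    return P, P @ H0.T @ T0

def osfuzzy_elm_update(P, Q, Xk, Tk, C, A):
    """Step 2: recursive update (13)-(14) for one chunk (Xk, Tk);
    the chunk can be discarded after this call."""
    H = build_H(Xk, C, A)                        # partial matrix H_{k+1}, cf. (13)
    S = np.linalg.inv(np.eye(H.shape[0]) + H @ P @ H.T)
    P = P - P @ H.T @ S @ H @ P                  # first line of (14)
    Q = Q + P @ H.T @ (Tk - H @ Q)               # second line of (14)
    return P, Q
```

A typical run would call osfuzzy_elm_init on the first N0 >= L samples and then loop osfuzzy_elm_update over the remaining chunks, whatever their sizes.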
III. PERFORMANCE EVALUATION OF OS-FUZZY-ELM

Performance evaluation of OS-Fuzzy-ELM has been carried out on benchmark problems in the areas of nonlinear system identification, regression, and classification. For the nonlinear system identification problem, OS-Fuzzy-ELM has been compared with other popular learning algorithms such as ANFIS [2], DENFIS [4], SAFIS [7], eTS [19], and Simpl_eTS [6]. For the regression problems, performance comparisons have been made with ANFIS and Simpl_eTS, and for the classification problems, with the Simpl_eTS and eTS algorithms. Furthermore, the performance of OS-Fuzzy-ELM is evaluated in a chunk-by-chunk learning mode for the regression and classification problems [20].

For OS-Fuzzy-ELM, the centers of the membership functions for the regression and classification problems are uniformly randomly chosen from the ranges [0, 1] and [−1, 1], respectively, and for the nonlinear system identification problem, from the range [−2, 2]; in effect, for simplicity, the centers are randomly generated in the range of the (possibly normalized) input data. The impact widths of the membership functions for all the considered problems are randomly chosen from the range [0.01, 9.01]. All the simulations have been conducted in the MATLAB 6.5 environment on an ordinary PC with a 1.7-GHz CPU.

The optimal number of rules is selected based on the best average validation accuracy of OS-Fuzzy-ELM over different numbers of rules, $L \in \{1, 2, 3, \ldots, 100\}$. For each trial, 75% and 25% of the training data set are randomly chosen for training and validation purposes, respectively. The average cross-validation results over 25 trials are obtained for each fixed number of rules, and the number of rules is increased one by one in order to obtain the best cross-validation result. The OS-Fuzzy-ELM with the best cross-validation result is finally chosen, and the average testing accuracy of this chosen OS-Fuzzy-ELM over 50 trials is reported in this correspondence.

A. Nonlinear System Identification

The nonlinear dynamic system to be identified is described by [21]

$$y(n) = \frac{y(n-1)\,y(n-2)\,\big(y(n-1) + 2.5\big)}{1 + y^2(n-1) + y^2(n-2)} + u(n-1). \tag{15}$$

The equilibrium state of the unforced system given by (15) is (0, 0). The training input $u(n)$ is uniformly selected in the range [−2, 2], and the testing input is $u(n) = \sin(2\pi n/25)$. A total of 5000 and 200 observations are produced for training and testing purposes, respectively. Uniformly distributed noise in the range [−0.2, 0.2] is added to all the training samples, while the testing data remain noise free. Selecting $[y(n-1), y(n-2), u(n-1)]$ and $y(n)$ as the input and output of OS-Fuzzy-ELM, the identified model is given by

$$\hat{y}(n) = \hat{f}\big(y(n-1), y(n-2), u(n-1)\big) \tag{16}$$

where $\hat{f}$ is the OS-Fuzzy-ELM approximation and $\hat{y}(n)$ is the output of OS-Fuzzy-ELM.
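For reproducibility, one way to generate the data described above is sketched next (our own reconstruction; the zero initial conditions and the random seed are assumptions not stated in the text).

```python
def simulate_plant(u, y1=0.0, y2=0.0):
    """Iterate the plant of (15) driven by the input sequence u,
    starting from the (assumed) equilibrium y(-1) = y(-2) = 0."""
    ys = []
    for un in u:                                      # un plays u(n-1)
        y = y1 * y2 * (y1 + 2.5) / (1.0 + y1**2 + y2**2) + un
        ys.append(y)
        y1, y2 = y, y1
    return np.array(ys)

rng = np.random.default_rng(0)
u_train = rng.uniform(-2.0, 2.0, 5000)                # training input
y_noisy = simulate_plant(u_train) + rng.uniform(-0.2, 0.2, 5000)  # noisy training data
u_test = np.sin(2 * np.pi * np.arange(1, 201) / 25)   # testing input, noise free
y_test = simulate_plant(u_test)
# Regressors [y(n-1), y(n-2), u(n-1)] and targets y(n) for the model (16)
X = np.column_stack([y_noisy[1:-1], y_noisy[:-2], u_train[2:]])
T = y_noisy[2:, None]
```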
Table I gives a performance comparison of OS-Fuzzy-ELM, ANFIS, SAFIS, eTS, Simpl_eTS, and DENFIS in terms of average training and testing accuracy (with standard deviations), number of fuzzy rules, and average training time. For comparison purposes, OS-Fuzzy-ELM uses the corresponding membership functions used by the other algorithms; for this study, ANFIS training uses 50 epochs for each trial. From the table, it can be seen that OS-Fuzzy-ELM obtains similar or better testing accuracy than the other algorithms; moreover, OS-Fuzzy-ELM requires fewer fuzzy rules than eTS, Simpl_eTS, and DENFIS and less training time than all the other algorithms.

[TABLE I: Performance comparison for nonlinear system identification.]

[Fig. 1: Training time comparison for the nonlinear system identification problem.]

[Fig. 2: Testing error history for the nonlinear identification problem.]
It can also be seen from Table I that OS-Fuzzy-ELM produces smaller standard deviations of the testing accuracies, indicating that the random selection of the antecedent parameters of the membership functions does not significantly affect the testing accuracy. Similar behavior is also seen in the other applications. The training time comparison is shown in Fig. 1, from which it can be clearly seen that OS-Fuzzy-ELM takes the lowest training time, whereas ANFIS takes the highest. Fig. 2 shows the testing error history for the OS-Fuzzy-ELM, SAFIS, and eTS algorithms; the rms error curves for OS-Fuzzy-ELM are quite smooth compared with those for SAFIS and eTS.

B. Regression Problems

In this section, six real-world regression problems^5 are considered to evaluate the performance of OS-Fuzzy-ELM. Details of the problems are listed in Table II. The input and output attributes are normalized into the range [0, 1].

[TABLE II: Details of real-world regression benchmark problems.]

^5 http://www.liaad.up.pt/~ltorgo/Regression/DataSets.html
The performance comparison between OS-Fuzzy-ELM, ANFIS, and Simpl_eTS is given in Table III; all algorithms use the Cauchy membership function. Here, ANFIS is run in two different versions: one with all the input attributes and the other using only a limited number of significant input features. If the number of input variables for a problem is large, the ANFIS algorithm produces a large number of fuzzy rules through its grid-type partition method, resulting in a large training time. By selecting some significant input variables as the input of ANFIS, the number of fuzzy rules can be decreased, making the training faster without a significant loss of accuracy; the smallest number of input variables is selected by trial and error so as to achieve the best performance. ANFIS uses one epoch for training, since it was found that a larger number of epochs does not improve the accuracy significantly but only increases the training time. From the table, it can be found that OS-Fuzzy-ELM obtains similar or better testing accuracy than ANFIS and Simpl_eTS with much less training time; the training time reported for ANFIS includes the time used to select the significant input variables. For the 2-D plane problem, Simpl_eTS and ANFIS with all the input variables produce a large number of fuzzy rules, which results in system memory overflow. In OS-Fuzzy-ELM, in contrast, the parameters of the membership functions in the fuzzy rules are assigned randomly.

[TABLE III: Performance comparison for benchmark regression problems.]

C. Classification Problems

In this section, six real-world classification problems [20], given in Table IV, are considered for the classification performance evaluation of OS-Fuzzy-ELM. In classification applications, the input attributes are normalized into the range [−1, 1].

[TABLE IV: Details of real-world classification benchmark problems.]
The Cauchy membership function is used for the evaluation. The performance comparison has been made with Simpl_eTS and eTS; ANFIS is not included in this comparison because of the large training time it needs for multiclass classification problems. Table V gives the classification results achieved by OS-Fuzzy-ELM, Simpl_eTS, and eTS for these real-world classification problems.

[TABLE V: Performance comparison for classification problems.]
From the table, one can note that OS-Fuzzy-ELM achieves better training and testing accuracies than Simpl_eTS and eTS with much less training time. For the DNA problem, Simpl_eTS and eTS result in a large number of rules and hence lead to system memory overflow. Thus, for the DNA problem, we compare the results of OS-Fuzzy-ELM with two other algorithms, namely, the support-vector-based fuzzy neural network (SVFNN) [22] and the fuzzy neural network (FNN) [22]. Table VI gives the classification accuracy comparison between OS-Fuzzy-ELM, SVFNN, and FNN; since SVFNN and FNN use the Gaussian membership function, the same has been used for OS-Fuzzy-ELM. From the table, it can be seen that OS-Fuzzy-ELM produces similar training and testing accuracies to SVFNN. However, OS-Fuzzy-ELM requires only five rules, whereas SVFNN requires 334 rules, which indicates that the training time for SVFNN will be large. Compared with FNN, OS-Fuzzy-ELM obtains better testing classification accuracy.

[TABLE VI: Performance comparison between different algorithms for DNA classification.]

Recently, Toh et al. [23], [24] made an extensive evaluation of their classification algorithm on a number of benchmark problems that overlap with the problems considered here. It should be noted that the method proposed by Toh et al. is not a sequential learning scheme and is also not based on fuzzy or neural approaches, so a direct comparison with our method may not be appropriate. However, we found that the classification accuracies of our method are similar to theirs.
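The correspondence does not spell out how class labels enter the linear system (9). A common convention with (OS-)ELM-style classifiers, assumed here purely for illustration, is to encode an m-class label as an m-dimensional target and decode by the largest model output; the sketch below follows that assumption and reuses build_H from Section II.

```python
def encode_labels(labels, m):
    """Map integer labels 0..m-1 to (N, m) targets in {-1, +1},
    matching the [-1, 1] normalization used for the inputs."""
    T = -np.ones((len(labels), m))
    T[np.arange(len(labels)), labels] = 1.0
    return T

def classify(Xnew, C, A, Q):
    """Predicted class = index of the largest output of the fuzzy model."""
    return np.argmax(build_H(Xnew, C, A) @ Q, axis=1)
```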
D. Performance Evaluation of OS-Fuzzy-ELM: Chunk-by-Chunk Mode

We have also evaluated the performance of OS-Fuzzy-ELM in the chunk-by-chunk mode for the aforementioned regression and classification problems. OS-Fuzzy-ELM has been tested with fixed chunk sizes and with a varying chunk size (chosen randomly between 10 and 30). For brevity, Table VII shows the results for one regression problem and one classification problem. As seen from these results, the training and testing accuracies obtained with the different chunk sizes are similar, but OS-Fuzzy-ELM in the chunk-by-chunk mode and in the batch mode (when a single chunk contains the entire training data) usually requires much less training time than OS-Fuzzy-ELM in the one-by-one mode. The results for the other problems show the same trend.

[TABLE VII: Performance comparison of OS-Fuzzy-ELM in chunk-by-chunk mode.]

IV. CONCLUSION

In this correspondence, an OS-Fuzzy-ELM has been developed for the TSK fuzzy model with any bounded nonconstant piecewise continuous membership function. OS-Fuzzy-ELM can learn the data sequentially in a one-by-one or chunk-by-chunk mode. A performance comparison of OS-Fuzzy-ELM with other well-known learning algorithms has been carried out on benchmark problems in the areas of nonlinear system identification, regression, and classification. This correspondence shows that the proposed OS-Fuzzy-ELM produces similar or better accuracies than the other algorithms with a significantly lower training time. Results from the chunk-by-chunk learning mode indicate that the training and testing accuracies are similar for the different chunk sizes; however, the training time for the one-by-one learning mode is higher than that for the chunk-by-chunk mode. Although this correspondence has presented results for the TSK system, OS-Fuzzy-ELM can also be applied to the Mamdani type of fuzzy models; a detailed study of OS-Fuzzy-ELM on Mamdani models has led to conclusions similar to those reported here.
REFERENCES

[1] J.-S. R. Jang, C.-T. Sun, and E. Mizutani, Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. Englewood Cliffs, NJ: Prentice-Hall, 1997.
[2] J.-S. R. Jang, "ANFIS: Adaptive-network-based fuzzy inference system," IEEE Trans. Syst., Man, Cybern., vol. 23, no. 3, pp. 665–685, May/Jun. 1993.
[3] S. L. Chiu, "Selecting input variables for fuzzy models," J. Intell. Fuzzy Syst., vol. 4, pp. 243–256, 1996.
[4] N. K. Kasabov and Q. Song, "DENFIS: Dynamic evolving neural-fuzzy inference system and its application for time-series prediction," IEEE Trans. Fuzzy Syst., vol. 10, no. 2, pp. 144–154, Apr. 2002.
[5] P. P. Angelov and D. P. Filev, "An approach to online identification of Takagi–Sugeno fuzzy models," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 34, no. 1, pp. 484–498, Feb. 2004.
[6] P. Angelov and D. Filev, "Simpl_eTS: A simplified method for learning evolving Takagi–Sugeno fuzzy models," in Proc. 14th IEEE Int. Conf. Fuzzy Syst., 2005, pp. 1068–1073.
[7] H.-J. Rong, N. Sundararajan, G.-B. Huang, and P. Saratchandran, "Sequential adaptive fuzzy inference system (SAFIS) for nonlinear system identification and prediction," Fuzzy Sets Syst., vol. 157, no. 9, pp. 1260–1275, May 2006.
[8] N.-Y. Liang, G.-B. Huang, P. Saratchandran, and N. Sundararajan, "A fast and accurate online sequential learning algorithm for feedforward networks," IEEE Trans. Neural Netw., vol. 17, no. 6, pp. 1411–1423, Nov. 2006.
[9] G.-B. Huang, Q.-Y. Zhu, K. Z. Mao, C.-K. Siew, P. Saratchandran, and N. Sundararajan, "Can threshold networks be trained directly?" IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 53, no. 3, pp. 187–191, Mar. 2006.
[10] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme learning machine: Theory and applications," Neurocomputing, vol. 70, no. 1–3, pp. 489–501, Dec. 2006.
[11] G.-B. Huang, L. Chen, and C.-K. Siew, "Universal approximation using incremental constructive feedforward networks with random hidden nodes," IEEE Trans. Neural Netw., vol. 17, no. 4, pp. 879–892, Jul. 2006.
[12] G.-B. Huang and L. Chen, "Convex incremental extreme learning machine," Neurocomputing, vol. 70, no. 16–18, pp. 3056–3062, Oct. 2007.
[13] J.-S. R. Jang and C.-T. Sun, "Functional equivalence between radial basis function networks and fuzzy inference systems," IEEE Trans. Neural Netw., vol. 4, no. 1, pp. 156–159, Jan. 1993.
[14] Z.-L. Sun, K.-F. Au, and T.-M. Choi, "A neuro-fuzzy inference system through integration of fuzzy logic and extreme learning machines," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 37, no. 5, pp. 1321–1331, Oct. 2007.
[15] G.-B. Huang, N.-Y. Liang, H.-J. Rong, P. Saratchandran, and N. Sundararajan, "On-line sequential extreme learning machine," in Proc. IASTED Int. Conf. Comput. Intell., Calgary, AB, Canada, Jul. 4–6, 2005.
[16] T. Takagi and M. Sugeno, "Fuzzy identification of systems and its applications to modeling and control," IEEE Trans. Syst., Man, Cybern., vol. SMC-15, no. 1, pp. 116–132, Feb. 1985.
[17] H. Ying, "Sufficient conditions on uniform approximation of multivariate functions by general Takagi–Sugeno fuzzy systems with linear rule consequent," IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 28, no. 4, pp. 515–520, Jul. 1998.
[18] X.-J. Zeng and M. G. Singh, "Approximation theory of fuzzy systems—MIMO case," IEEE Trans. Fuzzy Syst., vol. 3, no. 2, pp. 219–235, May 1995.
[19] P. Angelov, J. Victor, A. Dourado, and D. Filev, "On-line evolution of Takagi–Sugeno fuzzy models," in Proc. 2nd IFAC Workshop Adv. Fuzzy/Neural Control, Oulu, Finland, 2004, pp. 67–72.
[20] C. Blake and C. Merz, "UCI repository of machine learning databases," Dept. Inf. Comput. Sci., Univ. California, Irvine, CA, 1998. [Online]. Available: http://www.ics.uci.edu/~mlearn/MLRepository.html
[21] K. S. Narendra and K. Parthasarathy, "Identification and control of dynamical systems using neural networks," IEEE Trans. Neural Netw., vol. 1, no. 1, pp. 4–27, Mar. 1990.
[22] C.-T. Lin, C.-M. Yeh, S.-F. Liang, J.-F. Chung, and N. Kumar, "Support-vector-based fuzzy neural network for pattern classification," IEEE Trans. Fuzzy Syst., vol. 14, no. 1, pp. 31–41, Feb. 2006.
[23] Q.-L. Tran, K.-A. Toh, D. Srinivasan, K.-L. Wong, and S. Q.-C. Low, "An empirical comparison of nine pattern classifiers," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 35, no. 5, pp. 1079–1091, Oct. 2005.
[24] K.-A. Toh, Q.-L. Tran, and D. Srinivasan, "Benchmarking a reduced multivariate polynomial pattern classifier," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 6, pp. 740–755, Jun. 2004.