IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 9, NO. 6, DECEMBER 2001
Robust TSK Fuzzy Modeling for Function Approximation With Outliers

Chen-Chia Chuang, Shun-Feng Su, and Song-Shyong Chen
Abstract—The Takagi–Sugeno–Kang (TSK) type of fuzzy models has attracted great attention from the fuzzy modeling community due to its good performance in various applications. Various approaches for modeling TSK fuzzy rules have been proposed in the literature. Most of them define their fuzzy subspaces based on the idea of training data being close enough instead of having similar functions. Besides, in real-world applications, training data sets often contain outliers. When outliers exist, traditional clustering and learning algorithms based on the principle of least-squares error minimization may be seriously affected by them. Various robust approaches have been proposed to solve this problem in the neural network and pattern recognition communities. In this paper, a novel robust TSK fuzzy modeling approach is presented. In the approach, a clustering algorithm termed robust fuzzy regression agglomeration (RFRA) is proposed to define fuzzy subspaces in a fuzzy regression manner with robust capability against outliers. To obtain a more precise model, a robust fine-tuning algorithm is then employed. Various examples are used to verify the effectiveness of the proposed approach. From the simulation results, the proposed robust TSK fuzzy modeling indeed shows superior performance over other approaches.

Index Terms—Fuzzy c-regression (FCR) model clustering, robust fuzzy regression agglomeration (RFRA) clustering, robust learning algorithms, Takagi–Sugeno–Kang (TSK) fuzzy model.
I. INTRODUCTION
RECENTLY, fuzzy modeling techniques have been successfully applied to modeling complex systems, where traditional approaches can hardly reach satisfactory results due to the lack of sufficient domain knowledge. The Takagi–Sugeno–Kang (TSK) type of fuzzy models proposed in [1], [2] has attracted great attention from the fuzzy modeling community due to its good performance in various applications. In TSK fuzzy models, fuzzy rules are equipped with functional-type consequences instead of the fuzzy terms used in traditional Mamdani fuzzy models [2]. A great advantage of TSK fuzzy models is their representative power; such a model is capable of describing a nonlinear system using sufficient rules and training data. Various fuzzy modeling approaches have been proposed in the literature and can be found in textbooks such as [3]–[6]. In this paper, we propose a novel way of constructing TSK fuzzy models in a robust sense.

Manuscript received February 26, 2001; revised June 12, 2001. This work was supported in part by the National Science Council of Taiwan under Grants NSC 89-2623-D-011-007 and NSC 89-2218-E-009-091. C.-C. Chuang is with the Department of Electronic Engineering, Hwa-Hsia College of Technology and Commerce, Taipei, Taiwan, R.O.C. S.-F. Su and S.-S. Chen are with the Department of Electrical Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan, R.O.C. (e-mail:
[email protected]). Publisher Item Identifier S 1063-6706(01)09619-9.
To construct a TSK fuzzy model, the fuzzy subspaces required for defining fuzzy partitions in the premise parts and the parameters required for defining functions in the consequent parts must both be obtained. In the original TSK modeling approach [1], users must define fuzzy subspaces in advance, and then the parameters in the consequences are obtained through a Kalman filter type of estimation. However, because that approach has no tuning capability for the premise parts, its modeling accuracy may be limited. Various alternative approaches for modeling TSK fuzzy rules have been proposed in the literature [3]–[10]. Traditionally, the input space is first divided into fuzzy subspaces through unsupervised clustering algorithms according to only the input portion of the training data [7]–[9]. After the fuzzy subspaces are defined, the system is approximated in each subspace by a simple function, mostly a linear function, through supervised learning algorithms such as backpropagation or least-squares learning algorithms [3]–[6]. In the meantime, the fuzzy subspaces may also be tuned, usually through backpropagation learning algorithms. Each subspace together with its associated linear function is used to characterize a corresponding fuzzy rule, which is supposed to have a simple geometry in the input-output space, normally the shape of an ellipsoid [7]. Thus, traditional fuzzy clustering algorithms, such as fuzzy c-means (FCM) [8], [9], are suitable for defining fuzzy subspaces for TSK fuzzy modeling. Notice that the obtained fuzzy subspaces may not cover the entire input space. If an approach can take advantage of this phenomenon, it can achieve more accurate modeling when the same number of rules is used [6]. In general, the above-mentioned approaches partition fuzzy subspaces based only on clustering in the input space of the training data and do not consider whether the output portion of the training data supports such clustering.
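As background for the clustering step described above, a minimal fuzzy c-means sketch can make the idea concrete. This is a standard textbook formulation, not the paper's own algorithm; the initialization and parameter choices here are illustrative only.

```python
import numpy as np

def fcm(X, c, m=2.0, iters=100):
    """Minimal fuzzy c-means: alternate membership and center updates.

    X : (N, n) data; c : number of clusters; m : fuzziness exponent.
    Initialization (evenly spaced data points) is illustrative only.
    """
    centers = X[np.linspace(0, len(X) - 1, c).astype(int)].astype(float)
    for _ in range(iters):
        # Distance from every point to every cluster center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)          # fuzzy memberships
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]    # weighted means
    return centers, U
```

Each row of `U` sums to one, so every training pattern distributes its membership over the clusters, which is exactly the property the fuzzy-subspace definitions above rely on.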
In other words, such approaches do not account for the interaction between input and output variables. In order to detect this interaction, the authors in [7], [9] considered the product space of input and output variables instead of only the input space in classical clustering algorithms for fuzzy modeling. This type of approach includes output behaviors in the fuzzy partition to avoid improper clustering. However, those approaches still define fuzzy subspaces in a clustering manner and do not take into account the functional properties of TSK fuzzy models. In other words, in those approaches, training data that are close enough, instead of having similar function behavior, are said to be in the same fuzzy subspace. As a result, the number of fuzzy subspaces may tend to be more than enough. Recently, a novel approach was proposed in [10]. In that approach, two phases of learning are employed: the coarse
1063–6706/01$10.00 © 2001 IEEE
learning phase and the fine-tuning phase. In the coarse learning phase, fuzzy subspaces and the functions in the consequent parts are simultaneously identified through the use of the fuzzy c-regression model (FCRM) clustering algorithm. The idea is to find a set of training data whose input-output relationship is somehow linear; those training data can then be clustered into one fuzzy subspace. However, the clustering behavior does not incorporate the optimization process in modeling, and the modeling performance is not good enough. Thus, the obtained fuzzy rules can be considered a rough approximation to the desired TSK fuzzy model. The model is further adjusted by supervised learning algorithms to improve the modeling accuracy. This stage is referred to as the fine-tuning process. However, in the FCRM clustering algorithm, users must assign the cluster number, which is supposed to be unknown. Besides, the FCRM clustering algorithm is based on the principle of least-squares error minimization and is easily affected by outliers, whose effects should instead be suppressed in the clustering process [10]–[14]. In the fine-tuning process, classical supervised learning algorithms, such as gradient descent approaches, are used. When training data are corrupted by large noise, such as outliers, traditional backpropagation learning schemes usually cannot come up with acceptable performance [15], [16]. Based on the principles of robust statistics, various robust learning algorithms have been proposed in the neural network community [17]–[21]. Most of them replace the squared-error term in the cost function with a so-called loss function. For the FCRM approach, it is difficult to adopt such a robust learning concept.
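The M-estimator idea, replacing the squared-error term with a loss that grows slowly for large residuals, can be illustrated as follows. Huber's loss is used here only as a familiar example; the paper itself uses Tukey's biweight and the tanh estimator.

```python
import numpy as np

def squared_loss(e):
    """Classical least-squares cost: every error contributes quadratically."""
    e = np.asarray(e, dtype=float)
    return float(np.sum(e ** 2))

def huber_loss(e, c=1.0):
    """Huber M-estimator cost: quadratic for |e| <= c, linear beyond,
    so a single outlier cannot dominate the total cost."""
    e = np.asarray(e, dtype=float)
    a = np.abs(e)
    return float(np.sum(np.where(a <= c, 0.5 * e ** 2, c * (a - 0.5 * c))))
```

With residuals [0.1, 0.1, 10.0], the outlier contributes 100 to the squared cost but only 9.5 to the Huber cost, which is why gradient steps under a robust loss are far less distorted by outliers.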
In this paper, we propose a novel approach termed the robust fuzzy regression agglomeration (RFRA) clustering algorithm, which can not only simultaneously identify fuzzy subspaces and the functions in consequent parts without knowing the cluster number, but also has robust learning effects against outliers. The RFRA clustering algorithm is modified from a robust algorithm called the robust competitive agglomeration (RCA) clustering algorithm used in computer vision and pattern recognition [14]. The original RCA clustering algorithm adopts the idea of robust statistics to reduce the effects of outliers and the concept of competitive agglomeration to determine a proper number of clusters. Notice that the RCA clustering algorithm can properly determine the number of clusters by itself. Since the RCA algorithm is only a clustering algorithm, it does not have the regression ability to determine the function parameters in the consequent parts of TSK fuzzy models. The proposed RFRA clustering algorithm, on the other hand, can simultaneously define fuzzy subspaces for the premise parts and determine the function parameters for the consequent parts. While the RFRA clustering algorithm determines the parameters in both premise and consequent parts, our approach also employs a robust learning algorithm to fine-tune the obtained fuzzy model. The simulation results have indeed shown the superior performance of the proposed algorithm.

The remaining part of the paper is outlined as follows. Section II describes the concept of robust TSK fuzzy modeling. The architecture of the proposed robust TSK modeling approach is then presented. Two phases of learning are used in the approach. In Section III, the coarse learning phase is discussed. In this phase, the RFRA clustering algorithm is proposed to meaningfully define a rough TSK fuzzy model. Afterward, a robust learning algorithm is employed in the fine-tuning phase. The detailed robust learning algorithms are presented in Section IV. Section V reports the use of our approach for modeling five examples. Several existing learning algorithms are also employed to model those examples for comparison. The simulation results all show the superiority of the proposed approach over other existing approaches. Concluding remarks are presented in Section VI.

II. ROBUST TSK FUZZY MODELING CONCEPT

The considered problem is to obtain a model from a set of observations $\{(\mathbf{x}_j, y_j),\ j = 1, \ldots, N\}$, where $N$ is the number of training data, $\mathbf{x}_j = [x_{1j}, x_{2j}, \ldots, x_{nj}]^T$ is the $j$th input vector, and $y_j$ is the desired output for the input $\mathbf{x}_j$. Suppose that those observations are obtained from an unknown function $f$. Ideally, we want to construct a model $\hat{f}$ that can accurately represent $f$ in terms of input-output relationships. In this paper, we intend to construct a TSK fuzzy model for $f$; this is called TSK fuzzy modeling. Typically, a TSK fuzzy model consists of IF-THEN rules of the form

Rule $i$: IF $x_1$ is $A_1^i$ and $\cdots$ and $x_n$ is $A_n^i$, THEN $y^i = a_0^i + a_1^i x_1 + \cdots + a_n^i x_n$ (1)

for $i = 1, 2, \ldots, c$, where $c$ is the number of rules, $A_k^i$ is the fuzzy set of the $i$th rule for $x_k$ with an adjustable parameter set, and $\{a_0^i, a_1^i, \ldots, a_n^i\}$ is the parameter set in the consequent part. The predicted output of the fuzzy model is inferred as

$\hat{y} = \sum_{i=1}^{c} \mu^i y^i \Big/ \sum_{i=1}^{c} \mu^i$ (2)

where $y^i$ is the output of the $i$th rule and $\mu^i$ is the $i$th rule's firing strength, which is obtained as the minimum of the fuzzy membership degrees of all fuzzy variables. In (1), both the parameters of the premise parts and of the consequent parts of a TSK fuzzy model must be identified. Moreover, the number of rules $c$ must be specified.

For any real-world application, the obtained training data are always subject to noise and may contain outliers. When noise becomes large or outliers exist, traditional modeling approaches may try to fit those improper data in the training process and, thus, the learned systems are corrupted. In other words, if there exist outliers and the training process lasts long enough, the obtained systems may exhibit overfitting phenomena. Various robust learning algorithms [15], [17]–[21] have been proposed to overcome this problem. Basically, those robust learning algorithms make use of so-called M-estimators in the training process. The basic idea of M-estimators is to replace the squared-error term ($L_2$ norm) in the cost function with a loss function [16] so that the effects of outliers can be suppressed. When TSK fuzzy models are constructed from training data, the corresponding learning
Since the RCA algorithm is only a clustering algorithm, it does not have the regression ability to determine the function parameters in the consequent parts of TSK fuzzy models. The idea of the RFRA clustering algorithm is to find fuzzy regressions instead of clusters from a given set of training data. Thus, the cost function in our RFRA clustering algorithm is a function of the errors between the desired output and the output of the corresponding rule, instead of the distance between input data and some prototype of the considered cluster [13], [22]. Let $e_{ij}$ be the error between the $j$th desired output of the modeled system and the output of the $i$th rule with the $j$th input data

$e_{ij} = y_j - y_j^i$ (3)

where $y_j$ is the $j$th desired output and $c$ and $N$ are the numbers of fuzzy rules and of the training data, respectively. The cost function of the RFRA algorithm is defined as
$E = \sum_{i=1}^{c} \sum_{j=1}^{N} (u_{ij})^2 \rho_i(e_{ij}) - \alpha \sum_{i=1}^{c} \Big[ \sum_{j=1}^{N} w_{ij} u_{ij} \Big]^2$ (4)

subject to the constraint in (5).

Fig. 1. The constructing process of the proposed robust TSK fuzzy model.
algorithms must also be equipped with such robust capability. In this paper, we propose a novel approach that can not only simultaneously identify fuzzy subspaces and the functions in consequent parts without knowing the cluster number, but also has robust learning effects against outliers. The process of the proposed approach for constructing a TSK fuzzy model is shown in Fig. 1. There are two phases of learning in the approach: the coarse learning phase and the fine-tuning phase. In the coarse learning phase, the RFRA clustering algorithm, which is modified from the RCA clustering algorithm, is employed. After a rough fuzzy model is obtained, a robust learning algorithm is employed to refine the fuzzy model in the fine-tuning process. The detailed algorithms are introduced and discussed in the following sections.
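As a concrete illustration of the rule form (1) and the inference (2), the following sketch assumes Gaussian membership functions and the minimum operator for the firing strength. All names are illustrative; this is a sketch of the inference scheme, not the paper's implementation.

```python
import numpy as np

def tsk_predict(x, centers, sigmas, coeffs):
    """Infer the output of a TSK fuzzy model as in (1)-(2).

    x       : (n,)     input vector
    centers : (c, n)   Gaussian membership centers
    sigmas  : (c, n)   Gaussian membership widths
    coeffs  : (c, n+1) consequent parameters [a_0^i, a_1^i, ..., a_n^i]
    """
    # Membership degree of each input variable in each rule.
    mu_k = np.exp(-((x - centers) ** 2) / (2.0 * sigmas ** 2))
    # Firing strength (1): minimum membership degree over the variables.
    mu = mu_k.min(axis=1)
    # Rule outputs: linear functions of the input.
    y_rule = coeffs[:, 0] + coeffs[:, 1:] @ x
    # Weighted average of rule outputs (2).
    return float(np.sum(mu * y_rule) / np.sum(mu))
```

For an input at a rule's center the firing strength is one, so a single-rule model simply evaluates that rule's linear consequent.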
III. RFRA CLUSTERING ALGORITHM

In the coarse learning phase, the goal of the learning algorithm is to find a suitable fuzzy model for a given set of training data. Instead of using the clustering concept to partition the fuzzy subspaces, we attempt to find fuzzy regressions for the fuzzy rules. With the proposed approach, not only can the interaction between input and output variables be detected in a more proper manner, but the robust characteristics of the proposed TSK modeling can also be assured. The proposed coarse learning algorithm is called the RFRA clustering algorithm, which is modified from the robust competitive agglomeration (RCA) clustering algorithm. The original RCA clustering algorithm adopts the idea of robust statistics to reduce the effects of outliers and the concept of competitive agglomeration to determine the proper number of clusters.
$\sum_{i=1}^{c} u_{ij} = 1, \quad \text{for } j = 1, 2, \ldots, N$ (5)

where $u_{ij}$ is the firing strength of the $i$th rule for the $j$th training pattern; $\rho_i(\cdot)$ is the robust loss function associated with cluster $i$; $w_{ij}$ is the weight function obtained from $\rho_i$ as in (8); and $\alpha$ is a parameter usually called the agglomeration parameter. By properly selecting $\alpha$, the cost function can be used to find compact clusters of various types while partitioning the data set into a minimal number of clusters. Ideally, the agglomeration process controlled by $\alpha$ should be slow in the beginning to encourage the formation of small clusters. Then, $\alpha$ should be increased gradually to promote agglomeration. When the number of clusters becomes close to an appropriate value, $\alpha$ should decay slowly again to allow the algorithm to converge. A suitable $\alpha$ can be chosen as
(6)

where $\alpha_0$ is the initial value, $\tau$ is the time constant, and $k$ is the iteration index. Such an agglomeration parameter is also used in [14]. Fig. 2 shows one example of the curve of $\alpha(k)$, which is used in the first example in our simulation. The other examples all possess similar curves. Notice that the values used in (6) are all known at iteration $k$. In other words, $\alpha(k)$ is a fixed value, which is viewed as a constant in the later optimization derivation.
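The exact expression of (6) is not recoverable from this text. Purely as an assumed illustration, a schedule with the described rise-then-decay shape, peaking at iteration $k = \tau$, can be written as follows; the functional form and default values are illustrative, not the paper's.

```python
import math

def alpha_schedule(k, alpha0=5.0, tau=50.0):
    """Illustrative agglomeration-parameter schedule with the shape
    described for (6): small at first, rising to promote agglomeration,
    then decaying so the algorithm can converge.

    NOTE: alpha0 * (k / tau) * exp(1 - k / tau) is an assumed form chosen
    only to reproduce the rise-then-decay curve of Fig. 2; it is not the
    paper's exact expression.
    """
    return alpha0 * (k / tau) * math.exp(1.0 - k / tau)
```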
Fig. 2. One example of the curve of the agglomeration parameter.
Various robust loss functions and weight functions have been used for robust learning algorithms in the literature [13]. In this paper, Tukey’s biweight function [14] is used and is defined as
$\rho_i(z_{ij}) = \begin{cases} \frac{1}{6}\big[1 - (1 - z_{ij}^2)^3\big], & |z_{ij}| \le 1 \\ \frac{1}{6}, & |z_{ij}| > 1 \end{cases}$ (7)

then its weight function is obtained as

$w_{ij} = \begin{cases} (1 - z_{ij}^2)^2, & |z_{ij}| \le 1 \\ 0, & |z_{ij}| > 1 \end{cases}$ (8)

where $z_{ij}$ stands for the normalized error and is calculated from

(9)

Fig. 3. (a) The curve of Tukey's biweight function. (b) Its weight function.
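Under a cutoff-1 normalization of the error, the loss (7) and weight (8) can be sketched as below. This is the standard statement of Tukey's biweight; the exact scaling used in the paper's normalization (9) may differ.

```python
def tukey_rho(z):
    """Tukey's biweight loss (7) on the normalized error z (cutoff 1)."""
    if abs(z) <= 1.0:
        return (1.0 - (1.0 - z * z) ** 3) / 6.0
    return 1.0 / 6.0

def tukey_weight(z):
    """The associated weight function (8): w(z) = rho'(z) / z.
    Errors beyond the cutoff get weight zero, i.e., outliers are ignored."""
    if abs(z) <= 1.0:
        return (1.0 - z * z) ** 2
    return 0.0
```

Note the consistency between (7) and (8): differentiating `tukey_rho` gives the influence function $\psi(z) = z(1 - z^2)^2$, and `tukey_weight` is $\psi(z)/z$.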
Here, the median of the errors of cluster $i$ [23], [24] and the median of their absolute deviations are used for normalization, together with a tuning parameter. The tuning parameter is a function of the iteration index and gradually decreases as the iteration index increases. This procedure is similar to the annealing schedule in an annealing M-estimator [25], [26]. In this paper, the tuning parameter is chosen as

(10)

For illustration, Tukey's biweight function and its weight function are shown in Fig. 3. To minimize $E$ in (4) subject to (5), the Lagrange multiplier method is applied. The Lagrange function is defined as

(11)

Note that $\alpha(k)$ in (11) is viewed as a constant in the optimization process. The necessary conditions for minimizing the Lagrange function are

(12)

(13)

(14)

The error $e_{ij}$ in (12) can be obtained by using (3) as

$e_{ij} = y_j - (a_0^i + a_1^i x_{1j} + \cdots + a_n^i x_{nj}).$ (15)

Substituting (15) into (12), we then have

(16)

(17)
Define $\mathbf{X}$ as a matrix with $[1, x_{1j}, \ldots, x_{nj}]$ as its $j$th row (entries in the first column of $\mathbf{X}$ are all 1), $\mathbf{y}$ as a vector with $y_j$ as its $j$th element, and $\mathbf{D}_i$ as a diagonal matrix with $(u_{ij})^2 w_{ij}$ as its $j$th diagonal element. Then, (16) can be rewritten in matrix form as

$\mathbf{X}^T \mathbf{D}_i \mathbf{X}\, \mathbf{a}^i = \mathbf{X}^T \mathbf{D}_i \mathbf{y}.$ (18)

Hence, the parameter vector $\mathbf{a}^i$ for the consequent part of the $i$th rule is obtained as

$\mathbf{a}^i = (\mathbf{X}^T \mathbf{D}_i \mathbf{X})^{-1} \mathbf{X}^T \mathbf{D}_i \mathbf{y}.$ (19)
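Equation (19) is a weighted least-squares solve with per-pattern weights $(u_{ij})^2 w_{ij}$. A sketch, with illustrative function and variable names:

```python
import numpy as np

def consequent_params(X_raw, y, u_i, w_i):
    """Solve (19): weighted least squares for the i-th rule's consequent
    parameters a^i, with per-pattern weights (u_ij)^2 * w_ij.

    X_raw : (N, n) input patterns
    y     : (N,)   desired outputs
    u_i   : (N,)   firing strengths (memberships) of rule i
    w_i   : (N,)   robust weights of rule i
    """
    N = X_raw.shape[0]
    X = np.hstack([np.ones((N, 1)), X_raw])   # prepend the bias column of 1s
    d = (u_i ** 2) * w_i                      # diagonal of D_i
    # Solve (X^T D X) a = X^T D y without forming D explicitly.
    A = X.T @ (d[:, None] * X)
    b = X.T @ (d * y)
    return np.linalg.solve(A, b)
```

Because outlier patterns receive small robust weights $w_{ij}$, they contribute little to $\mathbf{D}_i$ and hence barely influence the fitted consequent line.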
To solve (13), it is assumed that the membership values do not change significantly within two consecutive iterations. Hence, the agglomeration term in (13) is computed by using the membership values of the previous iteration, and (13) becomes

(20)

where $N_i = \sum_{j=1}^{N} w_{ij} u_{ij}$ is called the robust cardinality of cluster $i$. The robust cardinality is a measure of whether the considered cluster can be merged into an adjacent cluster, i.e., of the agglomeration process. When the robust cardinality is less than a prespecified constant, cluster $i$ is discarded. The Lagrange multiplier in (20) can be obtained by substituting (20) into (5) as

(21)

Now, the Lagrange multiplier can be eliminated by using (21), and then (20) is rewritten as

(22)

where the last term is referred to as the weighted average of cardinalities.

The proposed RFRA clustering algorithm is described in the following. In the algorithm, $c(k)$ is the number of clusters at the $k$th iteration; the initial value of $c$ is set to a large value.

Step 1) Set the stop criterion (i.e., the acceptable error or the maximum epoch number).
Step 2) Compute the consequent parameter sets by using (19) and the errors $e_{ij}$ by using (3) for $i = 1, \ldots, c$ and $j = 1, \ldots, N$.
Step 3) Update the weights $w_{ij}$ by (8), $\alpha$ by (6), and the memberships $u_{ij}$ by (22).
Step 4) Compute the robust cardinality $N_i$. If $N_i$ is less than the prespecified constant, cluster $i$ is discarded and the number of clusters is updated.
Step 5) Update the tuning parameter by (10).
Step 6) If the stop criterion is satisfied, then stop; otherwise, go to Step 2.

Following this procedure, the number of clusters and the parameters of the consequent parts are obtained in the process. Assume that Gaussian membership functions are used in the premise parts; i.e., each membership function of each fuzzy rule has a center and a width as its two adjustable parameters. Then, they can easily be obtained from

(23)

(24)

It is worth noting that the proposed RFRA clustering algorithm may need more computation time due to the complicated formulas. Nevertheless, as shown in the later simulations, the proposed algorithm achieves better learning performance than other approaches in all examples. Since the modeling process is offline learning, the computation time is not so important. We believe that the proposed approach is, in general, better than the others.

IV. THE ROBUST LEARNING ALGORITHM
The fuzzy rule model obtained by the RFRA clustering algorithm is only a rough approximation to the desired model, because there is no fine-tuning process in the algorithm. After the fuzzy rules are obtained, they can be adjusted by supervised learning algorithms to improve the modeling accuracy. This stage is referred to as the fine-tuning process. In our approach, a robust learning algorithm [17]–[21] is employed to adjust the parameters of the TSK fuzzy rules in the fine-tuning process. An important feature of robust learning algorithms is the use of a loss function in place of the quadratic form of the cost function in the backpropagation (BP) algorithm. Based on this idea, a robust cost function is defined as

(25)

where the loss function directly stems from robust statistics theory [23], [24]. Various loss functions are used in the literature [15], [16]. In this paper, the tanh estimator [17], [18], [21] is used and is defined as (26), shown at the bottom of the page, where $a(k)$ and $b(k)$ are time-dependent cutoff points

(26)
and the two constants in (26) are selected as 1.73 and 0.93, respectively. The tanh estimator and its derivative, called the influence function, are shown in Fig. 4 for illustration.

Fig. 4. (a) The curve of the tanh estimator and (b) its influence function with cutoff points a = 1.5 and b = 3.

The shape of the influence function depends on the probabilistic distribution of the obtained errors and on the cutoff points $a(k)$ and $b(k)$. Basically, $a(k)$ and $b(k)$ depend on the outliers. The CUTOFF algorithm used in [17] is also adopted in this paper and is stated as follows. Let the upper bound of the percentage of outliers in the training data set be given; then $a(k)$ and $b(k)$ can be defined by the following steps:

Step 1) Compute the errors of all training patterns.
Step 2) Sort the errors in increasing order and define $a(k)$ and $b(k)$ from the sorted errors and the given upper bound.

Since TSK fuzzy models are described by (1) and the output is obtained by (2), for the $j$th training pattern, the parameters of the premise parts are updated as

(27)

(28)

where the derivative of the loss function (i.e., the influence function) and the learning constant are used, together with the desired output, the output of the TSK fuzzy model, and the $i$th rule output of the TSK fuzzy model. The parameters of the consequent parts are updated as

(29)

where the same learning constant is used. Finally, the procedure of the used robust learning algorithm is stated as follows.

Step 1) The initial cutoff points $a(0)$ and $b(0)$ are chosen as 4 and 8, respectively. Set the upper bound of the percentage of outliers in the training data set.
Step 2) For a training pattern, compute the estimated result by (1) and (2) and its error.
Step 3) Update the premise and consequent parameters using (27), (28), and (29).
Step 4) Determine the cutoff points $a(k)$ and $b(k)$ by the CUTOFF algorithm.
Step 5) Compute the robust cost function defined by (25).
Step 6) If the stop criterion is not satisfied, then go to Step 2; otherwise, terminate the tuning process.

V. SIMULATION RESULTS

In this section, five examples are tested to verify the validity of the proposed algorithm. In the first example, a simple function is considered and is defined as

(30)

201 training data are generated, and the gross error model is used for modeling the outliers. The gross error model is defined as

$F(x) = (1 - \delta)G(x) + \delta H(x)$ (31)

where $F$ is the added noise distribution and $G$ and $H$ are probability distributions that occur with probabilities $1 - \delta$ and $\delta$, respectively. In this example, $G$ and $H$ are Gaussian distributions. The function and the training data generated by the gross error model are shown in Fig. 5(a). In all examples, the learning constants of the learning algorithm are all selected as 0.01, and in the robust learning algorithm, the upper bound of the percentage of outliers in the training data is 0.05. After training, another 402 data pairs are used for evaluating the performance of the learned model. The index used for evaluating
Fig. 5. (a) The training data generated by (30) with the gross error model and the true function ("-"). (b) The approximated results of example 1 using the proposed robust TSK fuzzy modeling ("-1"), the FCRM with the BP learning algorithm ("– –"), the SONFIN with the BP learning algorithm ("1"), and the true function ("-").
the performance is the root mean square error (RMSE), defined as

$\mathrm{RMSE} = \sqrt{\frac{1}{N_t} \sum_{j=1}^{N_t} (y_j - \hat{y}_j)^2}$ (32)

where $N_t$ is the number of the test data, $y_j$ is the actual output, and $\hat{y}_j$ is the predicted output of the learned TSK fuzzy model for the $j$th pattern. In the second example, a function used in [19] is considered and is defined as
Fig. 6. (a) The training data generated by (33) with the gross error model and the true function ("-"). (b) The approximated results of example 2 using the proposed robust TSK fuzzy modeling ("-1"), the FCRM with the BP learning algorithm ("– –"), the SONFIN with the BP learning algorithm ("1"), and the true function ("-").
is 402. In the third example, the two-variable sinc function is considered and is defined as

(34)

196 input-output data, corrupted by the gross error model, are used. The sinc function and the used training data are shown in Fig. 7(a). The number of the used test data is 392. The fourth example is a nonlinear function taken from [10] and defined as

(35)
(33)

201 input-output data, corrupted by the gross error model, are used. The function and the training data generated by the gross error model are shown in Fig. 6(a). The number of the used test data
289 input-output data, corrupted by the gross error model, are used. The nonlinear system and the used training data are shown in Fig. 8(a). The number of the used test data is 578. Finally, a simple nonlinear autoregressive (NAR) model time series with
Fig. 7. (a) The training data generated by (34) with the gross error model. (b) The approximated result of example 3 using the proposed robust TSK fuzzy modeling. (c) The approximated result of example 3 using the FCRM with the BP learning algorithm. (d) The approximated result of example 3 using the SONFIN with the BP learning algorithm.
the gross error model is considered. This example has also been used in [18] and is defined as

(36)

(37)

where $t$ is the time index and the additive disturbance is generated by a gross error model. Fig. 9(a) shows the time series generated from (36) and (37). The scatter plot of $y(t)$ versus $y(t-1)$ is shown in Fig. 9(b). The number of the used test data is 400. First, the modeling approach proposed in this paper is employed. In this approach, the RFRA clustering algorithm is first applied to obtain a rough approximation of the TSK fuzzy
model. In the algorithm, the required parameters are set to the same values for all examples, except in the fifth example. Afterward, the fine-tuning process is applied. To verify the robustness of our approach, both the BP learning algorithm and the robust learning algorithm are used in the fine-tuning process. For comparison, two algorithms for constructing TSK fuzzy models are also implemented in our study. One is the FCRM clustering algorithm with the BP learning algorithm, and the other is the self-constructing neural fuzzy inference network (SONFIN) [27]. These two algorithms are selected for comparison because they are both TSK fuzzy modeling approaches and possess structure learning capability, which means their rules are dynamically generated in the training process. As mentioned, the FCRM algorithm is a TSK modeling approach using a regression type of clustering scheme but
Fig. 8. (a) The training data generated by (35) with the gross error model. (b) The approximated result of example 4 using the proposed robust TSK fuzzy modeling. (c) The approximated result of example 4 using the FCRM with the BP learning algorithm. (d) The approximated result of example 4 using the SONFIN with the BP learning algorithm.
without the capability of suppressing outliers. In the coarse learning process, similar to our approach, the initial values of the parameters of the premise and consequent parts can be simultaneously determined by the FCRM clustering algorithm. A weighting exponent parameter is used in the FCRM algorithm and is set to 2 for all examples. Since the cluster number must be defined in advance for the FCRM algorithm, for a fair comparison, the cluster number obtained by our RFRA clustering algorithm is used for the FCRM algorithm. The stopping criterion is the acceptable error or the maximum epoch number. Notice that for the FCRM approach, the obtained model is further tuned by the BP learning algorithm in the fine-tuning process.
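For concreteness, the gross error model (31) used to corrupt the training data and the RMSE index (32) used for evaluation throughout these experiments can be sketched as follows. The distribution parameters here are illustrative stand-ins, not the paper's values.

```python
import numpy as np

def gross_error_noise(n, delta=0.05, sigma_g=0.1, sigma_h=1.0, seed=0):
    """Additive noise from the gross error model (31):
    F = (1 - delta) * G + delta * H, with G the nominal Gaussian noise
    and H a wide Gaussian that produces outliers (widths illustrative)."""
    rng = np.random.default_rng(seed)
    outlier = rng.random(n) < delta          # pick H with probability delta
    return np.where(outlier,
                    rng.normal(0.0, sigma_h, n),
                    rng.normal(0.0, sigma_g, n))

def rmse(y_true, y_pred):
    """Root mean square error (32) over the test set."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```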
SONFIN is basically a fuzzy modeling approach equipped with structure learning capability. In the approach, fuzzy subspaces are defined by clustering the input portion of the training data. We have implemented this approach for various applications [28], [29], and its results have all demonstrated good performance. Thus, in this study, SONFIN is also implemented for comparison. Since after one epoch of training the SONFIN algorithm has already defined the rule number and rough fuzzy rules, it is natural to consider the first epoch as the coarse learning phase. For the fine-tuning process, 3000 epochs of training are performed for all examples. In the SONFIN algorithm, the rule threshold, the similarity threshold of input fuzzy sets, and the similarity threshold of output fuzzy
Fig. 9. (a) The time series generated by (36) and (37). (b) The scatter plot of y(t) versus y(t-1). (c) The approximated results of example 5 using the proposed robust TSK fuzzy modeling ("1"), the FCRM with the BP learning algorithm ("2"), the SONFIN with the BP learning algorithm ("3"), and the true model predictor ("4").
sets are selected to generate the same number of rules as in our approach for a fair comparison. The detailed definitions of the thresholds can be found in [27]. The simulation results for the five examples under different learning algorithms are tabulated in Tables I–V, respectively. The obtained rule numbers for the five examples are 3, 5, 9, 11, and 5, respectively. The rule thresholds used in SONFIN for the five examples are chosen as 0.01, 0.035, 0.025, 0.045, and 0.2, respectively. The similarity thresholds of input fuzzy sets and of output fuzzy sets in SONFIN are both selected as 0.85 for all examples. From the results, it is clear that the testing RMSEs of the proposed robust TSK fuzzy modeling (i.e., the RFRA clustering algorithm with the robust learning algorithm) are lower than those of the others. In the coarse learning process, both the FCRM and SONFIN learning algorithms adopt the least-squares criterion and are therefore sensitive to outliers. Thus, it can be found that the FCRM clustering algorithm and the SONFIN learning algorithm are not robust against outliers when compared to our RFRA clustering algorithm. However, the RFRA clustering algorithm needs a longer computation time for the better initialization of TSK fuzzy models. For the fine-tuning process, the BP learning algorithm also adopts the least-squares criterion. Hence, the testing RMSEs of the RFRA with the BP learning algorithm are worse than those with the robust learning algorithm. From the above results, it is clearly evident that the proposed approach indeed has very
TABLE I THE TESTING RMSEs OF THE TSK FUZZY MODELS THAT ARE OBTAINED BY DIFFERENT LEARNING ALGORITHMS FOR EXAMPLE 1
TABLE II THE TESTING RMSEs OF THE TSK FUZZY MODELS THAT ARE OBTAINED BY DIFFERENT LEARNING ALGORITHMS FOR EXAMPLE 2
TABLE III THE TESTING RMSEs OF THE TSK FUZZY MODELS THAT ARE OBTAINED BY DIFFERENT LEARNING ALGORITHMS FOR EXAMPLE 3
TABLE IV THE TESTING RMSEs OF THE TSK FUZZY MODELS THAT ARE OBTAINED BY DIFFERENT LEARNING ALGORITHMS FOR EXAMPLE 4
TABLE V THE TESTING RMSEs OF THE TSK FUZZY MODELS THAT ARE OBTAINED BY DIFFERENT LEARNING ALGORITHMS FOR EXAMPLE 5
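The RFRA algorithm and the annealing robust loss of [26] are not reproduced here; the following is only a minimal sketch (synthetic data, a generic Huber-type IRLS fit, all names and values hypothetical) of the point made in the discussion above: a least-squares fit is pulled toward gross outliers because every residual is squared, while a criterion that down-weights large residuals largely ignores them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D regression data: y = 2x + 1 plus small noise.
x = np.linspace(0.0, 1.0, 60)
y = 2.0 * x + 1.0 + 0.05 * rng.standard_normal(x.size)

# Corrupt a few training targets with gross outliers.
y_train = y.copy()
y_train[[5, 20, 40]] += 4.0

A = np.column_stack([x, np.ones_like(x)])

# Ordinary least squares: the squared residuals of the
# outliers dominate the fitted coefficients.
w_ls, *_ = np.linalg.lstsq(A, y_train, rcond=None)

def huber_irls(A, y, delta=0.2, iters=30):
    """Robust fit via iteratively reweighted least squares with a
    Huber-type weight: residuals beyond delta are down-weighted
    instead of being squared."""
    w = np.linalg.lstsq(A, y, rcond=None)[0]
    for _ in range(iters):
        r = y - A @ w
        wt = np.where(np.abs(r) <= delta, 1.0, delta / np.abs(r))
        # Solve the weighted normal equations  A^T W A w = A^T W y.
        w = np.linalg.solve(A.T @ (A * wt[:, None]), A.T @ (wt * y))
    return w

w_rob = huber_irls(A, y_train)

# Evaluate against the clean (outlier-free) targets, in the same
# spirit as a testing RMSE computed on uncorrupted data.
rmse = lambda w: np.sqrt(np.mean((y - A @ w) ** 2))
print(f"LS RMSE: {rmse(w_ls):.3f}, robust RMSE: {rmse(w_rob):.3f}")
```

Under these assumptions the robust fit recovers the underlying line and its clean-data RMSE is substantially lower than that of the least-squares fit, mirroring the pattern reported in Tables I–V.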
excellent performance when outliers exist. Finally, for illustration, the approximated results of the examples using the proposed robust TSK fuzzy model, the FCRM with the BP learning algorithm, and the SONFIN with the BP learning algorithm are shown in Figs. 5(b), 6(b), 7(b)–(d), 8(b)–(d), and 9(c) for the five examples, respectively. It can also be found that the function used in Example 1 consists of two line segments. It may then be reasonable to set the number of rules as two in the FCRM algorithm. After the coarse learning process, the testing RMSE of the first approximation of the TSK fuzzy model obtained by the FCRM algorithm is 0.0529. After the fine-tuning process, the testing RMSE using the BP learning algorithm after 3000 epochs is 0.0587. These results are better than those obtained with three rules, but are still worse than that of our approach (0.0272). It is evident that even though the number of clusters (rules) is known, the FCRM clustering algorithm with the classical learning algorithm still cannot obtain satisfactory results when outliers exist.

VI. CONCLUSION
Various TSK modeling approaches have been proposed in the literature. Most of them define their fuzzy subspaces by clustering training data based on either the input portion of the training data only or the entire input–output space. However, those approaches do not take the functional properties of TSK fuzzy models into account. Even though the FCRM algorithm has adopted the fuzzy regression concept into the modeling scheme, it still needs users to specify the cluster number, which is supposed to be unknown. Besides, the FCRM algorithm cannot protect itself from corruption by outliers. Even though various robust learning algorithms have been proposed in the neural network and pattern recognition communities, those approaches cannot be directly applied to the construction of TSK fuzzy models. In this paper, a robust TSK fuzzy modeling approach is proposed. In this approach, a novel clustering algorithm termed RFRA is proposed to simultaneously define fuzzy subspaces and find the parameters in the consequent parts of TSK rules. This clustering algorithm not only performs regression, rather than mere clustering, to form rules, but also has robust capability against outliers. Besides, in our approach, a robust fine-tuning algorithm is further employed to obtain a more precise model. The proposed robust TSK fuzzy
modeling approach is tested on various examples and indeed showed superior performance in our simulations.

REFERENCES
[1] T. Takagi and M. Sugeno, "Fuzzy identification of systems and its applications to modeling and control," IEEE Trans. Syst., Man, Cybern., vol. 15, pp. 116–132, 1985.
[2] M. Sugeno and T. Yasukawa, "A fuzzy-logic-based approach to qualitative modeling," IEEE Trans. Fuzzy Syst., vol. 1, pp. 7–31, 1993.
[3] L. X. Wang, Adaptive Fuzzy Systems and Control: Design and Stability Analysis. Englewood Cliffs, NJ: Prentice-Hall, 1994.
[4] R. R. Yager and D. P. Filev, Essentials of Fuzzy Modeling and Control. New York: Wiley, 1994.
[5] J. S. R. Jang, C. T. Sun, and E. Mizutani, Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. Englewood Cliffs, NJ: Prentice-Hall, 1997.
[6] C. T. Lin and C. S. G. Lee, Neural Fuzzy Systems: A Neural Fuzzy Synergism to Intelligent Systems. Englewood Cliffs, NJ: Prentice-Hall, 1996.
[7] J. A. Dickerson and B. Kosko, "Fuzzy function approximation with ellipsoidal rules," IEEE Trans. Syst., Man, Cybern., vol. 24, pp. 542–560, 1996.
[8] A. Kroll, "Identification of functional fuzzy models using multidimensional reference fuzzy sets," Fuzzy Sets Syst., vol. 80, pp. 149–158, 1996.
[9] F. Klawonn and R. Kruse, "Constructing a fuzzy controller from data," Fuzzy Sets Syst., vol. 85, pp. 177–193, 1997.
[10] E. Kim, M. Park, S. Ji, and M. Park, "A new approach to fuzzy modeling," IEEE Trans. Fuzzy Syst., vol. 5, pp. 328–337, 1997.
[11] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum, 1981.
[12] A. K. Jain and R. C. Dubes, Algorithms for Clustering Data. Englewood Cliffs, NJ: Prentice-Hall, 1988.
[13] R. N. Dave and R. Krishnapuram, "Robust clustering methods: A unified view," IEEE Trans. Fuzzy Syst., vol. 5, pp. 270–293, 1997.
[14] H. Frigui and R. Krishnapuram, "A robust competitive clustering algorithm with applications in computer vision," IEEE Trans. Pattern Anal. Mach. Intell., vol. 21, pp. 450–465, 1999.
[15] A. Cichocki and R. Unbehauen, Neural Networks for Optimization and Signal Processing. New York: Wiley, 1993.
[16] P. J. Rousseeuw and A. M. Leroy, Robust Regression and Outlier Detection. New York: Wiley, 1987.
[17] D. S. Chen and R. C. Jain, "A robust back-propagation learning algorithm for function approximation," IEEE Trans. Neural Networks, vol. 5, pp. 467–479, 1994.
[18] J. T. Connor, R. D. Martin, and L. E. Atlas, "Recurrent neural networks and robust time series prediction," IEEE Trans. Neural Networks, vol. 5, pp. 240–254, 1994.
[19] K. Liano, "Robust error measure for supervised neural network learning with outliers," IEEE Trans. Neural Networks, vol. 7, pp. 246–250, 1996.
[20] V. David and A. Sanchez, "Robustization of a learning method for RBF networks," Neurocomputing, vol. 9, pp. 85–94, 1995.
[21] L. Huang, B. L. Zhang, and Q. Huang, "Robust interval regression analysis using neural networks," Fuzzy Sets Syst., vol. 97, no. 3, pp. 337–347, 1998.
[22] A. K. Jain and R. C. Dubes, Algorithms for Clustering Data. Englewood Cliffs, NJ: Prentice-Hall, 1988.
[23] D. M. Hawkins, Identification of Outliers. London, U.K.: Chapman & Hall, 1980.
[24] F. R. Hampel, E. M. Ronchetti, P. J. Rousseeuw, and W. A. Stahel, Robust Statistics: The Approach Based on Influence Functions. New York: Wiley, 1986.
[25] S. Z. Li, Markov Random Field Modeling in Computer Vision. New York: Springer-Verlag, 1995.
[26] C.-C. Chuang, S.-F. Su, and C.-C. Hsiao, "The annealing robust backpropagation (ARBP) learning algorithm," IEEE Trans. Neural Networks, vol. 11, pp. 1067–1077, 2000.
[27] C. F. Juang and C. T. Lin, "An on-line self-constructing neural fuzzy inference network and its applications," IEEE Trans. Fuzzy Syst., vol. 6, pp. 12–32, 1998.
[28] G. Y. Chen, "On the study of the learning performance for neural networks and neural fuzzy networks," M.S. thesis, National Taiwan University of Science and Technology, 1998.
[29] S. R. Huang, "Analysis of model-free estimators—Applications on stock market with the use of technical indices," M.S. thesis, Department of Electrical Engineering, National Taiwan University of Science and Technology, 1999.
Chen-Chia Chuang received the B.S. and M.S. degrees from the National Taiwan Institute of Technology, Taipei, Taiwan, in 1991 and 1993, respectively, and the Ph.D. degree in electrical engineering from the National Taiwan University of Science and Technology, Taipei, Taiwan, in 2000. He is currently an Assistant Professor with the Department of Electronic Engineering, Hwa-Hsia College of Technology and Commerce, Taiwan, R.O.C. His current research interests include neural networks, data mining, neural fuzzy systems, and signal processing.
Shun-Feng Su received the B.S. degree from National Taiwan University, Taiwan, R.O.C., in 1983, and the M.S. and Ph.D. degrees from Purdue University, West Lafayette, IN, in 1989 and 1991, respectively, all in electrical engineering. He is currently an Associate Professor in the Department of Electrical Engineering, National Taiwan University of Science and Technology, Taiwan, R.O.C. His current research interests include neural networks, fuzzy modeling, machine learning, virtual reality simulation, data mining, and intelligent control.
Song-Shyong Chen received the B.S. and M.S. degrees in electrical engineering from the National Taiwan Institute of Technology, Taipei, Taiwan, in 1992 and 1994, respectively. He is currently working toward the Ph.D. degree in the Department of Electrical Engineering at the National Taiwan University of Science and Technology, Taipei, Taiwan. His current research interests include nonlinear control, fuzzy modeling and control, and robust adaptive control.