
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 40, NO. 5, OCTOBER 2010


A Gradient-Descent-Based Approach for Transparent Linguistic Interface Generation in Fuzzy Models Long Chen, C. L. Philip Chen, Fellow, IEEE, and Witold Pedrycz, Fellow, IEEE

Abstract—A linguistic interface is a group of linguistic terms or fuzzy descriptions that describe the variables in a system through corresponding membership functions. Its transparency completely or partly decides the interpretability of fuzzy models. This paper proposes a GRadiEnt-descEnt-based Transparent lInguistic iNterface Generation (GREETING) approach to overcome the limitation of traditional linguistic interface generation methods, which consider only a few interpretability aspects of the interface. In GREETING, the widely used interpretability criteria of linguistic interfaces are considered and optimized. Numeric experiments on data sets from the University of California, Irvine (UCI) machine learning databases demonstrate the feasibility and superiority of the proposed GREETING method. The GREETING method is also applied to fuzzy decision tree generation; it is shown that GREETING generates more transparent fuzzy decision trees with better classification rates and comparable tree sizes.

Index Terms—Fuzzy clustering, fuzzy decision tree, fuzzy systems, linguistic interface, membership functions, transparency and interpretability.

I. INTRODUCTION

LINGUISTIC interface is a group of linguistic terms or fuzzy descriptions that describe variables in a system utilizing corresponding membership functions. It maps data between linguistic and numeric values. It has been shown that the linguistic interface is the foundation of many computing research areas such as computing with words [1], granular computing [2], and linguistic dynamic systems [3]. In addition, linguistic interfaces are usually generated first in many fuzzy models such as fuzzy decision trees [4]–[6], [47], fuzzy multiplexers [7], fuzzy association rules [39], [45], and fuzzy neural networks [8], [9], [23], [46], [48].

The first step in generating a linguistic interface for a system variable is to generate the membership functions for the corresponding linguistic terms. Usually, the membership functions are defined by experts; thus, the success of a linguistic interface relies on whether the experts comprehend the underlying fuzzy sets well enough to construct the membership functions appropriately. The horizontal and vertical approaches [10] belong to this category.

Manuscript received March 11, 2009; revised June 22, 2009 and October 22, 2009; accepted October 26, 2009. Date of publication December 4, 2009; date of current version September 15, 2010. This work was supported by NASA under Grant NNC04GB35G. This paper was recommended by Associate Editor S.-F. Su. L. Chen and C. L. P. Chen are with the Department of Electrical and Computer Engineering, University of Texas at San Antonio, San Antonio, TX 78249-0669 USA (e-mail: [email protected]; [email protected]). W. Pedrycz is with the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB T6G 2V4, Canada, and also with the Systems Research Institute, Polish Academy of Sciences, 01-447 Warsaw, Poland (e-mail: [email protected]). Digital Object Identifier 10.1109/TSMCB.2009.2036443

More desirable methods are the ones that automatically generate linguistic interfaces and the corresponding membership functions from historical data, in particular, by building membership functions through the clustering of the historical data [11]–[16], [30], [41], [45]. In recent numeric-data-based fuzzy modeling techniques, the transparency or interpretability of fuzzy models has been receiving more and more interest [17], [28], [29]. Here, we use transparency and interpretability interchangeably, as they refer to the same notion. As the first step of fuzzy modeling, the transparency of linguistic interfaces is of great importance because they completely or partly decide the interpretability of fuzzy models. As a result, in [17], the interpretability of the linguistic interfaces is also referred to as the "low-level interpretability" of fuzzy models. In short, a transparent linguistic interface is one that satisfies the interpretability criteria desired for the interface. Although some attempts have been made to generate linguistic interfaces that consider one or two interpretability criteria [13]–[16], [30], many interpretability criteria are available, and it is difficult to cover all possible considerations. Therefore, it remains a challenging problem to generate linguistic interfaces that effectively include most of the interpretability criteria [17]. As an attempt to solve this problem, this paper proposes a GRadiEnt-descEnt-based Transparent lInguistic iNterface Generation (GREETING) method that considers most of the important interpretability criteria.

This paper is organized as follows. Section II discusses the linguistic interface generation methods in the literature and the most desirable interpretability criteria; advantages and disadvantages of the known approaches are reviewed. In Section III, a new method that generates transparent linguistic interfaces is proposed. The new method takes into account all the important interpretability criteria and tries to balance those criteria that are in conflict. A gradient descent approach is used in the proposed learning algorithm. To demonstrate the feasibility and superiority of the proposed method, Section IV compares different linguistic interface generation methods on the famous Iris data set from the University of California, Irvine (UCI) machine learning databases [26]. In the same section, the proposed GREETING method is applied to fuzzy decision tree induction; the interface generated by GREETING exhibits a better classification rate and a comparable tree size. Finally, the conclusion is given in Section V.

II. TRANSPARENT INDICES AND PREVIOUS METHODS

A. Desired Transparent Properties and Their Indices

For a system variable x in the universe of domain U, if there are N historical data [x1, x2, . . . , xN], a linguistic



Fig. 1. Nonconvex fuzzy set.


Fig. 2. Linguistic interface (top) without leftmost/rightmost fuzzy sets and (bottom) with leftmost/rightmost fuzzy sets.

interface F = [F1, F2, . . . , Fc] is anticipated to be generated from the historical data. Here, Fi is a fuzzy set representing a linguistic term, and Fi(x) denotes the corresponding membership function. To make the linguistic interface F interpretable, desirable properties and widely used criteria are introduced in the following [17], [18], [22].

1) Normality: A single fuzzy set Fi representing a linguistic term in the interface should be normal. Normality of a fuzzy set means that at least one datum, called the prototype, in the universe of domain U has full membership in this fuzzy set, or in the concept semantically represented by the fuzzy set (the linguistic term). Formally, normality is defined as

∀Fi, ∃x ∈ U : Fi(x) = 1.    (1)

2) Convexity: The convexity of a fuzzy set is defined as

∀a, b, c ∈ U : a ≤ c ≤ b → Fi(c) ≥ min{Fi(a), Fi(b)}.    (2)
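As an illustration, normality (1) and convexity (2) can be checked numerically for a sampled membership function. The grid and the triangular fuzzy set below are illustrative examples, not taken from the paper:

```python
import numpy as np

# Illustrative sampled membership function: a triangular fuzzy set on U = [0, 10]
# with its prototype at x = 4 (grid and shape are examples only).
u = np.linspace(0.0, 10.0, 101)
F = np.maximum(0.0, 1.0 - np.abs(u - 4.0) / 2.0)

def is_normal(mu, tol=1e-9):
    # Normality (1): at least one point of U has full membership.
    return bool(np.max(mu) >= 1.0 - tol)

def is_convex(mu, tol=1e-9):
    # Convexity (2) on a grid: membership is unimodal, i.e., it never dips
    # and rises again as we move away from the prototype.
    peak = int(np.argmax(mu))
    rising = bool(np.all(np.diff(mu[: peak + 1]) >= -tol))
    falling = bool(np.all(np.diff(mu[peak:]) <= tol))
    return rising and falling

print(is_normal(F), is_convex(F))  # True True
```

A nonconvex set like the one in Fig. 1 would fail the second check because its membership dips and rises again.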

Convexity means that other data in the universe of domain have decreasing membership grades as their similarity to the prototype decreases. Fig. 1 shows a fuzzy set without convexity: in this fuzzy set, a ≤ c ≤ b, but Fi(c) < min{Fi(a), Fi(b)}. Thus, datum b is less similar to the prototype a than c is, yet it holds a higher membership grade than c. This conflicts with interpretability.

3) Leftmost and Rightmost Fuzzy Sets: This criterion requires that the leftmost membership function take the value one at the left limit of the universe of domain and that the rightmost membership function take the value one at the right limit of the universe of domain. Formally

∃F1, Fc ∈ F : F1(min U) = 1, Fc(max U) = 1.    (3)

Leftmost/rightmost fuzzy sets are of great importance in depicting linguistic terms like "low" and "high" on the boundary of the universe of domain. Fig. 2 shows linguistic interfaces with and without the leftmost/rightmost fuzzy sets.

4) α-Completeness/Coverage: α-completeness is defined as

∀x ∈ U, ∃Fi ∈ F : Fi(x) ≥ α.    (4)

Fig. 3. Two neighboring fuzzy sets with 0.5 and 0.25 overlap.

This criterion makes sure that the fuzzy sets in the linguistic interface totally cover the universe of domain U. Moreover, a coverage level α ≥ 0.5 is usually suggested because it assures that each datum in the domain is well represented by at least one fuzzy set (or linguistic term) in the interface [18], [42], [43].

5) Distinguishability: Distinguishability makes sure that two neighboring fuzzy sets in the linguistic interface do not overlap too much, so that neighboring fuzzy sets (or linguistic terms) have distinct semantic meanings. Formally, we want

∀Fi, Fi+1 ∈ F : sup_{x∈U} min{Fi(x), Fi+1(x)} ≤ ϑ.    (5)
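Criteria (4) and (5) can likewise be checked empirically on a sampled interface. The three triangular fuzzy sets below, with prototypes 0, 5, and 10 on U = [0, 10], are illustrative:

```python
import numpy as np

# Empirical check of α-completeness (4) and distinguishability (5) for a
# sampled interface of three illustrative triangular fuzzy sets
# (0.5-overlapped neighbors).
u = np.linspace(0.0, 10.0, 1001)
F = np.stack([np.clip(1.0 - np.abs(u - o) / 5.0, 0.0, 1.0)
              for o in (0.0, 5.0, 10.0)])  # shape (c, N)

# (4): every point of U is covered by some fuzzy set at level >= alpha.
alpha = 0.5
coverage = F.max(axis=0).min()          # worst-case best membership over U
print(coverage >= alpha)                # True

# (5): neighboring sets overlap at level at most ϑ.
theta = 0.5
overlap = max(np.minimum(F[i], F[i + 1]).max() for i in range(len(F) - 1))
print(overlap <= theta + 1e-9)          # True: sup min{Fi, Fi+1} = 0.5 here
```

With 0.5-overlapped neighbors, both checks pass at exactly the boundary values discussed below.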

Based on Sections II-A4 and II-A5, assume that Fi and Fi+1 overlap at point d; then we get α ≤ Fi(d) = Fi+1(d) = min{Fi(d), Fi+1(d)} ≤ sup_{x∈U} min{Fi(x), Fi+1(x)} ≤ ϑ. In other words, the distinguishability level satisfies ϑ ≥ α. As mentioned in Section II-A4, α is suggested to be ≥ 0.5; therefore, ϑ ≥ 0.5. However, to give neighboring fuzzy sets distinct semantic meanings, a smaller ϑ is better. Under the constraint ϑ ≥ 0.5, a sensible value for ϑ is 0.5. As a result, Fi(d) = Fi+1(d) = 0.5; in other words, the neighboring fuzzy sets are 0.5 overlapped. For example, Fig. 3 shows two neighboring fuzzy sets with 0.5 and 0.25 overlap. Although the 0.25 overlap yields two more distinguishable fuzzy sets, the overlapping point d is covered by Fi and Fi+1 only with 0.25-completeness; in other words, point d is not well represented by either Fi or Fi+1. This is not what we desire.

6) Complementarity: Complementarity of a linguistic interface F is defined as

Σ_{k=1}^{c} Fk(x) = 1, ∀x ∈ U.    (6)

This criterion assures that the universe of domain is divided into a Bezdek partition [19] and is widely used in fuzzy clustering; for example, in the fuzzy c-means (FCM) clustering algorithm, (6) is a mandatory requirement. The contribution of complementarity to a linguistic interface is discussed in [20]. One natural advantage of adopting this criterion is that, if the membership of one linguistic term decreases from one, the membership of a neighboring linguistic term must increase from zero. Therefore, when one linguistic term does not represent a value in the universe of domain very well, there will be another linguistic term to represent that value.

7) Information Loss: Converting numeric data into linguistic data is an encoding process; defuzzification is the corresponding decoding procedure. Therefore, the information loss incurred by encoding and decoding should be minimized.


Given historical data [x1, x2, . . . , xN] in the universe of domain, the original data xi are mapped into c linguistic terms [F1, F2, . . . , Fc] with prototypes [o1, o2, . . . , oc]. This can be regarded as an encoding process. To rebuild a datum xi from the memberships [F1(xi), F2(xi), . . . , Fc(xi)] and the prototypes, the decoding process calculates the weighted average of the prototypes as the decoded value of xi (i.e., the defuzzified value x̂i [21])

x̂i = Σ_{k=1}^{c} Fk(xi) · ok / Σ_{k=1}^{c} Fk(xi).    (7)

In order to minimize the information loss from encoding to decoding, the representation error V, i.e., the difference between all the original data and the decoded data, should be minimized [21]

V = Σ_{i=1}^{N} (xi − x̂i)² = Σ_{i=1}^{N} (xi − Σ_{k=1}^{c} Fk(xi) · ok / Σ_{k=1}^{c} Fk(xi))².    (8)

The representation error V is also used in [22] as a constraint ensuring the linguistic interface's coverage.

8) Reconcile With Data's Distribution: Because the linguistic interface is generated from historical data, the linguistic interface, or the family of fuzzy sets in the interface, should reconcile with the distribution of the historical data: the linguistic terms or fuzzy sets should be placed at the data-intensive places. This rationale is supported by the fact that the generation of fuzzy sets from numeric data is mostly done by clustering algorithms. The most widely used index to measure whether the resulting linguistic interface reconciles with the data's distribution is the one used by the FCM algorithm and its variants

Q = Σ_{k=1}^{c} Σ_{j=1}^{N} Fk(xj)^m (xj − ok)².    (9)
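Both indices can be evaluated directly from data, prototypes, and memberships. The sketch below uses illustrative values and FCM-style memberships (all hypothetical, not the paper's data):

```python
import numpy as np

# Computing the dissimilarity index Q (9) and the representation error V,
# via the decoding of (7) and the error of (8). All values are illustrative.
x = np.array([0.1, 0.4, 0.5, 2.0, 2.2, 2.4, 4.0, 4.1])  # historical data
o = np.array([0.3, 2.2, 4.05])                          # prototypes o_k
m = 2.0                                                 # fuzzification coefficient

d2 = (x[None, :] - o[:, None]) ** 2 + 1e-12             # squared distances, (c, N)
F = (1.0 / d2) ** (1.0 / (m - 1.0))                     # FCM-style memberships
F /= F.sum(axis=0, keepdims=True)

Q = np.sum(F ** m * d2)                                 # (9): weighted distances

x_hat = (F * o[:, None]).sum(axis=0) / F.sum(axis=0)    # (7): decoded values
V = np.sum((x - x_hat) ** 2)                            # (8): reconstruction error
print(Q, V)
```

Note that the decoded values always lie between the smallest and largest prototypes, which is why boundary data contribute strongly to V.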

Q represents the sum of the dissimilarities (distances) of the individual data to the prototypes [o1, o2, . . . , oc]; m is the fuzzification coefficient, which usually takes the value of two.

B. Previous Linguistic Interface Generation Methods

Many methods have been proposed to generate a group of fuzzy sets representing a linguistic interface from historical data.

1) FCM Method [11], [31], [45]: Given a number of historical examples of a numeric variable [x1, x2, . . . , xN] and the anticipated number of linguistic terms c, the FCM algorithm partitions these examples into c clusters by minimizing the objective function, i.e., the dissimilarity index Q introduced in Section II-A8. The complementarity property of Section II-A6 is also imposed in the FCM algorithm. FCM iteratively updates the prototypes [o1, o2, . . . , oc] and memberships Fk(xi) through the following equations [19]:

ok = Σ_{i=1}^{N} Fk(xi)^m · xi / Σ_{i=1}^{N} Fk(xi)^m    (10)

Fk(xi) = 1 / Σ_{j=1}^{c} (‖xi − ok‖ / ‖xi − oj‖)^{2/(m−1)}.    (11)


Fig. 4. Linguistic interface generated by the FCM algorithm.

This iteration stops when max_{i,k}(|Fk(xi)^new − Fk(xi)^old|) < e, where e is a termination criterion. This alternating optimization procedure finds values of the prototypes and memberships that achieve a saddle point or a local minimum of the objective function Q [19]. After obtaining the prototypes and the membership matrix (Fk(xi)), the c fuzzy sets F1, F2, . . . , Fc are defined as

Fk(x) = 1 / Σ_{j=1}^{c} (‖x − ok‖ / ‖x − oj‖)^{2/(m−1)}.    (12)
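The alternating updates (10)–(12) can be sketched compactly for a one-dimensional variable. The data and settings below are illustrative:

```python
import numpy as np

# Minimal FCM sketch implementing the alternating updates (10)-(11);
# data, c, and the termination criterion are illustrative.
def fcm_1d(x, c=3, m=2.0, e=1e-3, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    F = rng.random((c, len(x)))
    F /= F.sum(axis=0, keepdims=True)              # random initial fuzzy partition
    for _ in range(max_iter):
        o = (F ** m @ x) / (F ** m).sum(axis=1)    # (10): fuzzy-weighted means
        d2 = (x[None, :] - o[:, None]) ** 2 + 1e-12
        F_new = (1.0 / d2) ** (1.0 / (m - 1.0))    # (11): relative inverse distances
        F_new /= F_new.sum(axis=0, keepdims=True)
        done = np.max(np.abs(F_new - F)) < e       # termination criterion
        F = F_new
        if done:
            break
    return o, F

x = np.array([0.1, 0.2, 0.3, 2.0, 2.1, 2.2, 4.0, 4.2, 4.4])
o, F = fcm_1d(x)
print(np.sort(o))  # prototypes settle near the three data-dense regions
```

Evaluating (12) on arbitrary x rather than only on the historical data is exactly what produces the nonconvex membership shapes visible in Fig. 4.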

Although the FCM method considers the distribution of the historical data and the complementarity of the resulting linguistic interface, most of the other interpretability criteria are violated. For example, Fig. 4 shows a typical result of the FCM method: clearly, the convexity of the fuzzy sets and the leftmost and rightmost fuzzy sets are missing. Observing these disadvantages of the FCM method, several improvements have been proposed, as follows.

2) FCM+Approximation Method [15]: To avoid nonconvex fuzzy sets in the resulting linguistic interface, some researchers suggested using shaped membership functions with adjustable parameters to approximate the membership matrix (Fk(xi)) produced by FCM [15], [58]. We summarize these methods as the FCM+Approximation approach. For example, the study in [15] used Gaussian, triangular, and trigonometric-shaped fuzzy sets to approximate the membership matrix; the trigonometric-shaped fuzzy sets were shown to yield the best approximation rate. The disadvantages of the FCM+Approximation approach are the following: first, it spends too much time on the approximation; second, the inevitable approximation error deteriorates the performance of the linguistic interface on the dissimilarity index Q and sometimes violates the complementarity and distinguishability criteria.

3) ACE Method [14], [32], [40]: Observing that nonconvex fuzzy sets are generated by FCM, alternating clustering estimation (ACE) was proposed to generate more interpretable linguistic interfaces [32], [40]. ACE adopts an alternating process to update the prototypes [o1, o2, . . . , oc] and memberships Fk(xi). However, at the stage of generating memberships, it directly calculates the memberships Fk(xi) using shaped fuzzy sets such as triangular or trigonometric-shaped ones. This method assures that the resultant fuzzy sets in the linguistic interface follow the predefined shaped functions; therefore, convexity and distinguishability are established. However, ACE "may or may not optimize a particular objective function" [32]. That is, the property of achieving a local minimum of the objective function Q, as in the FCM algorithm, is no longer guaranteed.


Fig. 5. Preshaped triangular fuzzy sets generated over prototypes.

Similar to the ACE method, one-step clustering methods that discard the optimization of Q have also been widely used [13], [23], [33]–[35], [44].

4) One-Step Clustering Method: In this method, the prototypes are derived from clustering algorithms like FCM and hard k-means. Then, shaped fuzzy sets like triangular or Gaussian ones are directly built over the prototypes [13], [23], [33]–[35]. For example, Fig. 5 shows triangular fuzzy sets generated over prototypes derived from clustering algorithms. Some research, like [36], does not even generate the prototypes via clustering algorithms; the prototypes are simply placed uniformly on the universe of domain, and shaped fuzzy sets are then built on them.

Table I summarizes the different linguistic interface generation methods according to the desirable interpretability criteria. Clearly, no previous method considers the information loss problem in the linguistic interface. Moreover, all the former methods consider only part of the desirable criteria. The table also shows that the proposed GREETING method meets all the desired interpretability criteria, as will be discussed next.

III. PROPOSED GREETING METHOD

A. Proposed Linguistic Interface and Its Satisfaction of the Desired Interpretability Criteria

Because of their simplicity, trigonometric-shaped fuzzy sets and their simplified versions, the triangular and trapezoidal fuzzy sets, are widely used in the linguistic interfaces of fuzzy modeling [15], [37], [38]. Reference [15] has also demonstrated the similarity of trigonometric-shaped fuzzy sets to FCM's resultant membership matrix. Mathematically, for the system variable x, given the c centers/prototypes (o1, o2, . . . , oc), the c generalized trigonometric-shaped fuzzy sets F1, F2, . . . , Fc in the linguistic interface are defined as follows. The leftmost membership function Fj(x) (j = 1) is a Z-shaped function that decreases its membership from 1 at the center oj to 0.5 at χj (the overlapping point of the current fuzzy set Fj and the next fuzzy set Fj+1) and, finally, to 0 at the next fuzzy set center oj+1:

Fj(x) = Z(x, oj, χj, oj+1, K_{2j−1}, K_{2j})
      = 1,                                              if x ≤ oj
        1 − (1/2)((x − oj)/(χj − oj))^{K_{2j−1}},       if oj < x ≤ χj
        (1/2)((oj+1 − x)/(oj+1 − χj))^{K_{2j}},         if χj < x < oj+1
        0,                                              if x ≥ oj+1.    (13)

The rightmost membership function Fj(x) (j = c) is an S-shaped, or inverted Z-shaped, function that increases its membership from 0 at the previous fuzzy set center oj−1 to 0.5 at the overlapping point χj−1 and, finally, to 1 at the current fuzzy set center oj:

Fj(x) = 1 − Z(x, oj−1, χj−1, oj, K_{2j−3}, K_{2j−2}).    (14)

In the middle, Fj(x) (j = 2, 3, . . . , c − 1) is a combination of S- and Z-shaped functions (i.e., a π-shaped fuzzy set) that increases its membership from 0 at the previous fuzzy set center oj−1 to 0.5 at the cross point χj−1 and continues to 1 at the current fuzzy set center oj; the membership then decreases from 1 at oj to 0.5 at the cross point χj and, finally, to 0 at the fuzzy set center oj+1:

Fj(x) = Z(x, oj, χj, oj+1, K_{2j−1}, K_{2j}) · (1 − Z(x, oj−1, χj−1, oj, K_{2j−3}, K_{2j−2})).    (15)

In (13)–(15), the χj (j = 1, 2, . . . , c − 1) are the overlapping points of two neighboring fuzzy sets, and K_{2j} and K_{2j−1} (j = 1, 2, . . . , c − 1) are fuzziness rates of the membership functions: as their values increase, the fuzziness of the membership function decreases. Both the overlapping points and the fuzziness rates are adjustable; this makes the generalized trigonometric-shaped fuzzy sets very flexible. An illustrative group of generalized trigonometric-shaped fuzzy sets (for c = 3) with tagged overlapping points and fuzziness rates is shown in Fig. 6. The trigonometric-shaped fuzzy sets used in [15] are special cases of the generalized trigonometric fuzzy sets with χj = (oj + oj+1)/2 and Kj = K for all j. An illustrative group of trigonometric-shaped fuzzy sets (for c = 3) is shown in Fig. 7. More specifically, when K = 1, they reduce to the most widely used triangular and trapezoidal fuzzy sets, also shown in Fig. 7.

Beyond simplicity and flexibility, the interpretability of the (generalized) trigonometric-shaped fuzzy sets is the most important reason why they are used in the proposed linguistic interface. By choosing Z-, S-, and π-shaped membership functions, the normality and convexity of the fuzzy sets in the linguistic interface are satisfied by their shapes. Furthermore, the Z and S shapes are assigned to the leftmost and rightmost fuzzy sets, respectively.
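The construction of (13)–(15) can be sketched concretely. The prototypes o, overlap points chi, and fuzziness rates K below are illustrative (c = 3, triangular case K = 1):

```python
import numpy as np

# Sketch of the generalized trigonometric-shaped interface (13)-(15): a Z-shaped
# leftmost set, an inverted-Z (S-shaped) rightmost set, and π-shaped middle sets.
def Z(x, o_l, chi, o_r, K_l, K_r):
    """Z-shaped function of (13): 1 left of o_l, 0.5 at chi, 0 right of o_r."""
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    y[x <= o_l] = 1.0
    m1 = (x > o_l) & (x <= chi)
    y[m1] = 1.0 - 0.5 * ((x[m1] - o_l) / (chi - o_l)) ** K_l
    m2 = (x > chi) & (x < o_r)
    y[m2] = 0.5 * ((o_r - x[m2]) / (o_r - chi)) ** K_r
    return y

def interface(x, o, chi, K):
    """Memberships of (13)-(15): F_1 is Z, F_c is 1 - Z, middles are products."""
    c = len(o)
    Zs = [Z(x, o[j], chi[j], o[j + 1], K[2 * j], K[2 * j + 1]) for j in range(c - 1)]
    F = [Zs[0]]                              # leftmost set, (13)
    for j in range(1, c - 1):                # middle π-shaped sets, (15)
        F.append(Zs[j] * (1.0 - Zs[j - 1]))
    F.append(1.0 - Zs[-1])                   # rightmost set, (14)
    return np.stack(F)

o, chi, K = [0.0, 5.0, 10.0], [2.5, 7.5], [1.0, 1.0, 1.0, 1.0]
x = np.linspace(0.0, 10.0, 201)
F = interface(x, o, chi, K)
print(np.allclose(F.sum(axis=0), 1.0))  # True: complementarity by construction
```

Because each middle set is the product of a Z shape starting at its own center and the complement of the previous Z shape, the memberships sum to one at every point, which is exactly the complementarity argument made below.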
The complementarity of the proposed linguistic interface is guaranteed because Σ_{k=1}^{c} Fk(x) = 1 follows from the definitions in (13)–(15). Because the neighboring fuzzy sets overlap at χj (j = 1, 2, . . . , c − 1) and Fj(χj) = Fj+1(χj) = 0.5, the neighboring fuzzy sets are 0.5 overlapped; the distinguishability and coverage properties are thus guaranteed as well.

With the desired transparent properties of Sections II-A1–II-A6 all satisfied, only two more properties remain to be achieved: ensuring that the resultant linguistic interface reconciles with the historical data's distribution and that it incurs a small information loss. We need to design the parameters of the membership functions such that both the dissimilarity index Q and the information loss (i.e., the representation error) V are minimized.

B. Learning Algorithm to Optimize Both Q and V

To minimize both the dissimilarity index Q and the representation error V, the new objective function P = Q + αV


TABLE I. COMPARISON OF LINGUISTIC INTERFACE GENERATION METHODS

Fig. 6. Generalized trigonometric-shaped fuzzy sets with tagged overlapping points and fuzziness rates.

Fig. 7. Trigonometric-shaped fuzzy sets with fuzziness rates K = 1, 2, and 3.

is proposed, where α is a parameter used to balance the importance of Q and V. One method for selecting α is suggested in the next section on simulations. Indeed, the representation error V defined in (8) measures the information loss before and after the encoding and decoding processes of fuzzy models; therefore, it is actually an accuracy measure for fuzzy models. On the other hand, the dissimilarity index Q defined in (9) stands for the fuzzy sets' reconciliation with the historical data's distribution. When the fuzzy sets reconcile with the distribution of the historical data, different fuzzy sets are placed at different data-intensive places, and their distinguishability is improved. Therefore, P = Q + αV can be regarded as a tradeoff between accuracy and interpretability (distinguishability). If we take α as a regularization coefficient, P = Q + αV can also be treated as a regularization problem, like the one in [49]. Formally, the learning problem is

min P = Q + αV = Σ_{k=1}^{c} Σ_{j=1}^{N} Fk(xj)^m (xj − ok)² + α Σ_{i=1}^{N} (xi − Σ_{k=1}^{c} Fk(xi) · ok / Σ_{k=1}^{c} Fk(xi))²    (16)

where Fj(x) (j = 1, 2, . . . , c) is defined in (13)–(15) and [x1, x2, . . . , xN] are the historical data.

In (16), the adjustable variables are the prototypes (o1, o2, . . . , oc), the overlapping points χj (j = 1, . . . , c − 1), and the fuzziness rates Kj (j = 1, . . . , 2c − 2). The gradient descent learning algorithm is applied to find a locally optimal setting of the adjustable variables. Because the Fj(x) are continuous piecewise functions, the objective function derived from them is continuous piecewise as well. Consequently, at some points, the left-sided derivatives of P with respect to oj, χj, and Kj differ from the right-sided derivatives. More specifically, the objective function P may not be differentiable at points χj = xi or oj = xi because the function Z(x, oj, χj, oj+1, K_{2j−1}, K_{2j}) and the Fj(x) defined from Z may not be differentiable at these points. However, just as in numerous gradient-descent-based algorithms for nondifferentiable min–max activation functions and/or triangular membership functions in neural network and fuzzy system learning [51]–[55], nondifferentiability at several points does not significantly impact the efficacy of the gradient descent algorithm. From a pragmatic point of view, the objective function is still differentiable almost everywhere, and the probability of landing on a point of nondifferentiability is zero; the gradient descent algorithm still converges with probability one [51]–[55]. The simulation experiments in the next section also demonstrate this property. A more formal study of this kind of nondifferentiability problem and more complex nonsmooth optimization algorithms can be found in [51], [55], and [56].

The gradient-descent-based optimization of P in (16) is listed below.

Step 1) Initialize the centers oj (j = 1, . . . , c); let them be uniformly distributed over the domain of the considered system variable x. Set χj = (oj + oj+1)/2 (j = 1, . . . , c − 1) and Kj = 1 (j = 1, . . . , 2c − 2). Set the learning rate β to a small positive number.

Step 2) Update oj, χj, and Kj:
a) Calculate the gradient ∇P = (∂P/∂o1, . . . , ∂P/∂oc, ∂P/∂χ1, . . . , ∂P/∂χ_{c−1}, ∂P/∂K1, . . . , ∂P/∂K_{2c−2}).
b) Update the parameters along the inverse direction of the gradient: oj^new = oj − β · ∂P/∂oj, χj^new = χj − β · ∂P/∂χj, and Kj^new = Kj − β · ∂P/∂Kj.
c) Because the χj are overlapping points of the fuzzy sets Fj and Fj+1, where oj < χj < oj+1, we need o1 < χ1 < o2 < χ2 < · · · < χ_{c−1} < oc. Denote [o1, χ1, o2, χ2, . . . , χ_{c−1}, oc] as [r1, r2, . . . , r_{2c−1}]; what we want is rk < rk+1 for all k. Because lim_{β→0}(rk^new − r_{k+1}^new) = lim_{β→0}(rk − r_{k+1} − β(∂P/∂rk − ∂P/∂r_{k+1})) = rk − r_{k+1} < 0, given a small enough β, we can guarantee


that rk^new < r_{k+1}^new. Then, if the current β makes rk^new ≥ r_{k+1}^new, by scaling down its current value, we can finally obtain rk^new < r_{k+1}^new.
d) Update oj = oj^new, χj = χj^new, and Kj = Kj^new.

Step 3) Calculate Fj(xi) according to the updated oj, χj, and Kj and (13)–(15). If max_{i,k}(|Fk(xi)^new − Fk(xi)^old|) < e or the iteration number I > Imax, STOP; otherwise, return to Step 2). Here, e is a termination criterion, and Imax is the maximum number of iterations.
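The three steps above can be sketched compactly, with finite-difference gradients standing in for the paper's analytic derivatives. The data, c = 3, the learning rate, and the iteration count below are illustrative:

```python
import numpy as np

# Compact sketch of the GREETING loop (Steps 1-3) with numerical gradients.
def Z(x, ol, ch, orr, Kl, Kr):
    t1 = np.clip((x - ol) / (ch - ol), 1e-9, 1.0)   # clipped shape arguments
    t2 = np.clip((orr - x) / (orr - ch), 1e-9, 1.0)
    y = np.where(x <= ol, 1.0, 0.0)
    y = np.where((x > ol) & (x <= ch), 1.0 - 0.5 * t1 ** Kl, y)
    y = np.where((x > ch) & (x < orr), 0.5 * t2 ** Kr, y)
    return y

def memberships(x, p, c):
    o, chi, K = p[:c], p[c:2 * c - 1], p[2 * c - 1:]
    Zs = [Z(x, o[j], chi[j], o[j + 1], K[2 * j], K[2 * j + 1]) for j in range(c - 1)]
    F = [Zs[0]] + [Zs[j] * (1 - Zs[j - 1]) for j in range(1, c - 1)] + [1 - Zs[-1]]
    return np.stack(F), o

def P(x, p, c, m=2.0, alpha=1.0):
    F, o = memberships(x, p, c)
    Q = np.sum(F ** m * (x[None, :] - o[:, None]) ** 2)           # (9)
    x_hat = (F * o[:, None]).sum(0) / np.maximum(F.sum(0), 1e-12) # (7)
    V = np.sum((x - x_hat) ** 2)                                  # (8)
    return Q + alpha * V                                          # (16)

x = np.array([0.1, 0.3, 0.5, 2.0, 2.1, 2.3, 4.0, 4.2, 4.5])
c, beta, eps = 3, 1e-3, 1e-5
o0 = np.linspace(x.min(), x.max(), c)          # Step 1: uniform centers,
p0 = np.concatenate([o0, (o0[:-1] + o0[1:]) / 2, np.ones(2 * c - 2)])  # K = 1
p = p0.copy()
for _ in range(200):                           # Step 2: descend on P
    g = np.array([(P(x, p + eps * e, c) - P(x, p - eps * e, c)) / (2 * eps)
                  for e in np.eye(len(p))])    # numerical gradient of (16)
    step = beta
    while True:                                # Step 2c: preserve the ordering
        q = p - step * g
        r = np.empty(2 * c - 1)
        r[0::2], r[1::2] = q[:c], q[c:2 * c - 1]  # interleave o's and chi's
        if np.all(np.diff(r) > 0):
            break
        step /= 2                              # scale the step down
    p = q
print(float(P(x, p0, c)), float(P(x, p, c)))   # initial vs. learned objective
```

The step-halving in the inner loop is the "scale down β" device of Step 2c): since the parameters start strictly ordered, a small enough step always restores the ordering.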

Just like the classical gradient descent algorithm, the learning rate β plays an important role in the algorithm [50]. The selection of the learning rate is discussed in Section IV, the simulation experiment part of this paper; as suggested by Step 2c), we finally chose quite a small learning rate in our experiments.

To analyze the computational complexity of the algorithm, assume that the algorithm stops after l ≤ Imax iterations. At each iteration, the major computation is the calculation of the gradient ∇P, which can be finished in linear time O(n), with n being the size of the historical data. Therefore, the total computational complexity of the proposed gradient-descent-based algorithm is O(l · n).

The proposed gradient-based algorithm updates all the adjustable parameters of the generalized trigonometric-shaped fuzzy sets, namely, the prototypes (o1, o2, . . . , oc), the overlapping points χj (j = 1, . . . , c − 1), and the fuzziness rates Kj (j = 1, . . . , 2c − 2), to minimize the balanced objective function P = Q + αV. If we preset χj = (oj + oj+1)/2 and Kj = K for all j and do not adjust them in the gradient descent process, the algorithm is tailored to minimize the balanced objective function with the (non-generalized) trigonometric-shaped fuzzy sets used in the linguistic interface.

In summary, this section introduced a group of membership functions for the linguistic interface that satisfy most of the desired transparency properties discussed in the previous section. To optimize the remaining transparency indices, namely the dissimilarity index Q and the representation error V, a gradient-descent-based learning algorithm was proposed. The proposed linguistic interface, together with its learning algorithm, is named the GREETING method.

IV. SIMULATION EXPERIMENTS
A. Proposed GREETING Versus Previous Methods

This section applies the proposed method and the previous linguistic interface generation methods to the famous Iris data set in the UCI machine learning databases [26]. Four variables are measured in the data set to determine the type of Iris plant, and each variable holds 150 historical records. The four variables are "sepal length," "sepal width," "petal length," and "petal width." The GREETING result is compared with FCM, FCM+Approximation, and ACE. Because the study in [15] has demonstrated the similarity of trigonometric-shaped fuzzy sets to FCM's resultant membership matrix, the trigonometric-shaped fuzzy sets are used in the FCM+Approximation method. Similarly, in the ACE method, the prototypes are updated using (10), and the memberships are updated using the trigonometric-shaped fuzzy sets (i.e., (13)–(15) with

Fig. 8. Convergence of GREETING based on different learning rates.

χj = (oj + oj+1 )/2 and Kj = K for all j). The one-step method applies the hard c-means algorithm [13] in generating prototypes of clusters first. Afterward, the membership functions are generated based on the prototypes with given trigonometric-shaped fuzzy sets. For the purpose of comparison, in this section, the fuzzy sets used in GREETING are also the trigonometric-shaped fuzzy sets but not the generalized ones. The balance rate α for index Q and V is selected as zero. In this way, the GREETING method also only tries to optimize Q like other methods. All the methods in the experiments use the same termination criterion—0.001. The maximum iteration number of GREETING is 1000. The fuzzification coefficient m is selected as two. There is an additional parameter, which is the learning rate β. The convergence of the objective function Q is shown in Fig. 8 for four learning rates with values of 0.001, 0.005, 0.01, and 0.1 on the variable of the “petal width.” It is clear that the algorithm cannot converge with a too large value of the learning rate (such as 0.1 in this experiment). Without loss of generality, we set the learning rate as a small value, which is 0.001 in all the simulations. Fig. 9 shows the linguistic interfaces for the variable “petal width” generated by different methods. K is selected from one to two to represent different fuzziness of the generated fuzzy sets. Three linguistic terms are used to describe the variable “petal width” (i.e., “small,” “average,” and “large”). In Fig. 9, it is clear that the FCM method generates fuzzy sets that are not always convex, and the leftmost/rightmost fuzzy sets are not achieved. All the other four methods generate convex fuzzy sets and assign the leftmost/rightmost fuzzy sets. As a result, although FCM is worse than the remaining four other methods in considering transparent resulting linguistic terms, it is difficult to identify the best one in the remaining four methods from Fig. 9. 
However, this also demonstrates, from one angle, the advantage of using the interpretable trigonometric-shaped fuzzy sets in the linguistic interface. Table II shows the dissimilarity index Q for different fuzziness values K on the variable "petal width." Owing to the gradient descent optimization, the proposed GREETING method obtains a better Q index than all the other algorithms except FCM. Of course, FCM reaches the best Q index at the price of losing the


Fig. 9. Generated membership functions using different methods and settings of K.

TABLE II. Q Indices of Different Methods

TABLE III. V Indices of Different Methods

Fig. 10. Comparison between GREETING with different balance rates α and other methods with different fuzziness rates K.

B. GREETING Method With Balancing Q and V

interpretability properties of the generated linguistic interface. ACE performs better than the FCM+Approximation and one-step methods when the fuzziness factor K is large (in this case, when K ≥ 1.5). Based on the data in Table II, the best value of K is 2.5.

Table III lists the information loss index V for the proposed GREETING method and the previous methods. The GREETING method performs better on V than the FCM+Approximation and one-step methods. Interestingly, however, ACE has a better V index than GREETING when K ≤ 2.0. This demonstrates the importance of the balance rate α: without it, the GREETING method only searches for the best Q, and its performance on V deteriorates. The data in Table III also show that, as the fuzziness factor K increases from one, all the methods perform worse on V. When K = 1, the trigonometric-shaped fuzzy sets are in fact triangular/trapezoidal ones [as shown in Fig. 9(a)]; such fuzzy sets achieve the best performance on V, which is consistent with the study in [37] demonstrating that linguistic interfaces with triangular membership functions offer the least information loss.

This section compares the GREETING method with the other methods on both the Q and V indices. Because the gradient descent algorithm in GREETING can optimize all the adjustable parameters of the flexible generalized trigonometric-shaped fuzzy sets, the generalized trigonometric-shaped fuzzy sets are used in the linguistic interfaces to demonstrate GREETING's capability. The balance rate α is selected from 0 to 1 with a step of 0.1. ACE, FCM+Approximation, and the one-step method cannot optimize as many parameters as GREETING, so for them, trigonometric-shaped fuzzy sets with different values of K (from one to three) are selected.

Fig. 10 shows the performance of all the approaches in the 2-D space of Q and V. The benchmark data are the variable "petal width" in the Iris data set. The single "x" point on the left side of Fig. 10 is the result of FCM; although it attains a good Q index, the nontransparency of its fuzzy sets makes it unattractive. Apart from FCM's result, the results of the GREETING method clearly form the Pareto front over the two objectives Q and V; in other words, for any solution of the other methods, there is always a GREETING solution that dominates it (both Q and V are smaller). This observation also offers a way to select among the linguistic interfaces generated by GREETING with different α's, namely, selecting the linguistic interface that dominates the reference result from ACE, FCM+Approximation, or the one-step method. For example, we can apply the FCM+Approximation



TABLE IV. Comparison of Methods on Q and V Indices

TABLE V. GREETING and FCM With Different m's

Fig. 11. (a) Histogram of “petal width.” (b) Three linguistic terms generated by FCM. (c) Three linguistic terms generated by GREETING with α = 0. (d) Three linguistic terms generated by GREETING with α = 2.

method to obtain a reference result first. Then, we set α from 0 to 1 with a small step, such as 0.1, and apply GREETING for each α. Because GREETING generates a Pareto front, we can select a GREETING result that dominates FCM+Approximation's result as the final selection. By the same procedure, in the following section on fuzzy decision trees, the linguistic interface generated by GREETING is selected as the one that dominates the original one-step method [4].

In summary, Table IV lists the best testing results for all four variables in the Iris data set for the ACE, FCM, and GREETING linguistic interface generation methods. The one-step and FCM+Approximation methods are not listed because their performance is worse than ACE's. In Table IV, the proposed GREETING method has the best V index, and its Q is better than ACE's; FCM attains the best Q index at the cost of interpretability. In conclusion, the proposed GREETING method is superior to the other linguistic interface generation methods.

Table V lists GREETING's and FCM's performance on the variable "petal width" with different fuzzification coefficients m. Because the Q index in (9) is a monotonically decreasing function of m, both methods obtain a better Q with larger m. The V index is not directly related to m; nevertheless, in Table V, GREETING obtains a better V as m increases. Because α balances V's impact on the final optimization objective, it is expected that the larger α is (i.e., the more V weighs in the final objective), the better the V index GREETING obtains.

Fig. 11 shows the histogram of the variable "petal width" and the three linguistic terms "low," "average," and "high" generated by GREETING and FCM. When α = 0, GREETING only optimizes the Q index, and its output is similar to the result of FCM; however, it preserves the interpretability of the resulting linguistic terms.
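The α-sweep and dominance-based selection described above can be sketched as follows; the (Q, V) values are illustrative stand-ins, not results from the paper:

```python
# Sketch of the alpha-selection procedure: sweep the balance rate
# alpha from 0 to 1 in steps of 0.1 and keep a GREETING result that
# dominates the reference (both Q and V no larger). The (Q, V)
# numbers below are made up for illustration.

def dominates(qv, ref):
    """(Q, V) pair qv dominates ref if both indices are <= ref's."""
    return qv[0] <= ref[0] and qv[1] <= ref[1]

def select_interface(sweep_results, reference):
    """Return the first (alpha, (Q, V)) on the sweep that dominates."""
    for alpha, qv in sweep_results:
        if dominates(qv, reference):
            return alpha, qv
    return None                  # nothing on the sweep dominates

# Hypothetical sweep forming a Pareto front: Q grows, V shrinks.
sweep = [(round(0.1 * k, 1), (0.30 + 0.01 * k, 0.50 - 0.03 * k))
         for k in range(11)]
reference = (0.40, 0.30)         # e.g., an FCM+Approximation result

choice = select_interface(sweep, reference)
```

On these stand-in numbers, the first dominating point is reached at α = 0.7; on real data, any dominating point on the sweep is an acceptable final selection.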
When α = 2, GREETING focuses more on minimizing the information loss V, and its output is a linguistic interface with almost

Fig. 12. Q and V indices of linguistic interfaces with different numbers of linguistic terms.

triangular membership functions. This is consistent with [37], which demonstrated that linguistic interfaces with triangular membership functions offer the least information loss.

This paper does not discuss how to decide the number of linguistic terms in the linguistic interface because that choice is application driven and user centric, and it is beyond the scope of the discussion; it is similar to choosing the number of clusters in clustering algorithms. Fig. 12 shows the Q and V indices for the variable "sepal width" in the Iris data. Different numbers of linguistic terms are selected, and the balance rate α is selected from 0 to 1 with a step of 0.1. Clearly, both Q and V tend to decrease as the number of linguistic terms increases. To illustrate how a suitable number of linguistic terms could be chosen, Fig. 13 lists the well-known Bezdek index [24], B = (1/N) Σ_{i=1}^{N} Σ_{j=1}^{c} (F_j(x_i))^2, and the Xie–Beni index [25], S = Q/(N · min_{i≠j} ||o_i − o_j||^2), for the variable "sepal width." By the Bezdek index, two linguistic terms are best, but by the Xie–Beni index, three linguistic terms are best.
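The two validity indices quoted above can be computed directly from a fuzzy partition. The toy data below are our own, and the Q inside the Xie–Beni index is taken as the usual FCM-style weighted dissimilarity (an assumption about the paper's Q, which is defined earlier in the paper):

```python
# Bezdek partition coefficient: B = (1/N) sum_i sum_j F_ij^2.
# Xie-Beni index: S = Q / (N * min_{i!=j} ||o_i - o_j||^2), with Q
# assumed to be the FCM-style sum_i sum_j F_ij^m * ||x_i - o_j||^2.

def bezdek_index(F):
    """Partition coefficient of an N x c membership matrix F."""
    return sum(u * u for row in F for u in row) / len(F)

def xie_beni_index(F, X, O, m=2):
    """Xie-Beni index for 1-D data X and prototypes O."""
    N = len(X)
    Q = sum(F[i][j] ** m * (X[i] - O[j]) ** 2
            for i in range(N) for j in range(len(O)))
    sep = min((O[i] - O[j]) ** 2
              for i in range(len(O)) for j in range(len(O)) if i != j)
    return Q / (N * sep)

# Toy 1-D example: four points, two prototypes, crisp-ish memberships.
X = [0.0, 0.1, 0.9, 1.0]
O = [0.05, 0.95]
F = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]

B = bezdek_index(F)        # closer to 1 means a crisper partition
S = xie_beni_index(F, X, O)  # smaller means a better-separated partition
```

A larger B and a smaller S both indicate a better partition, which is why, as noted above, the two indices can disagree on the best number of terms.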



Fig. 13. Bezdek and Xie–Beni indices.

Fig. 15. Comparison of GREETING and original Yuan and Shaw’s method on synthetic data. (a) Synthetic data. (b) Yuan and Shaw. (c) GREETING.

Fig. 14. Example of fuzzy decision tree.
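The min-based firing-strength computation used in fuzzy decision trees such as the one in Fig. 14 (described in Section C below) can be sketched as follows; the membership values are illustrative assumptions, not taken from the paper:

```python
# A leaf's firing strength is the minimum of the memberships along
# its root-to-leaf path. The membership values below are assumed.

def firing_strength(path_memberships):
    """Minimum t-norm over the branch memberships on one path."""
    return min(path_memberships)

# Path to the leaf "play" in Fig. 14: "temperature is mild" AND
# "wind is mild".
mu_temp_mild = 0.7   # assumed membership of the observed temperature
mu_wind_mild = 0.4   # assumed membership of the observed wind speed

strength = firing_strength([mu_temp_mild, mu_wind_mild])
```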

C. Fuzzy Decision Tree Based on Different Linguistic Interfaces

Fuzzy decision tree is a natural extension of the traditional decision tree. The major difference is that, at each node of a fuzzy decision tree, the considered attribute is compared to a group of linguistic terms depicted by fuzzy sets. For example, in the fuzzy decision tree shown in Fig. 14, the attribute considered at the root node is temperature; the temperature value is compared to the three linguistic terms "hot," "mild," and "cold" in the linguistic interface. The temperature value has a membership in each term according to that term's membership function; therefore, each branch of the root node has a firing strength. Considering all the branches from the root node to a leaf node, the leaf's firing strength is the minimum of the firing strengths of those branches. For example, the firing strength of the leaf node "play" (tennis) in Fig. 14 is the minimum of the firing strength of "temperature is mild" and the firing strength of "wind is mild."

Many fuzzy decision tree induction algorithms have been proposed in previous research [5], [6]. The first step of most of them is to fuzzify the data according to some linguistic interface; given the fuzzified data, different heuristics, such as the minimum fuzzy entropy or the minimum classification ambiguity [5], are applied to induce the fuzzy decision trees. More recently, advanced fuzzy decision tree induction and optimization techniques, such as rough set techniques [6] and axiomatic fuzzy set (logic) theory [57], have been reported. This section applies only the traditional Yuan and Shaw induction method [4], and all parameter settings for fuzzy decision tree induction are adopted from [5] in the comparison study.

Traditional fuzzy decision tree induction methods use the one-step clustering method to determine the linguistic interface

(i.e., generating the prototypes by some clustering algorithm and then defining preshaped membership functions, such as triangular ones, directly over the cluster prototypes [4]). This section compares the performance of fuzzy decision trees built on different linguistic interfaces on classification problems.

Yuan and Shaw's fuzzy decision tree induction algorithm [4] is first applied to a synthetic data set, shown in Fig. 15(a), using different linguistic interfaces. One hundred data points are uniformly sampled in the unit box [0, 1] × [0, 1]; a point [x, y] is labeled class 0 if x^2 + y^2 < 0.8 and class 1 otherwise. The GREETING-based linguistic interface is generated by trying different balance rates α and choosing the interface that dominates Yuan and Shaw's one-step-based linguistic interface. The classification results are shown in Fig. 15(b) and (c). Because of the flexibility of the GREETING-based linguistic interface, the GREETING-based decision tree achieves a much better classification rate.

More comparisons have been performed on several machine learning benchmark data sets. Yuan and Shaw's original interface (one-step type), FCM, and the GREETING interface are compared. The GREETING-based linguistic interface is again generated by trying different balance rates α and choosing the one that dominates Yuan and Shaw's one-step-based linguistic interface. Each data set is split into 90% training and 10% testing data; the fuzzy decision tree is induced from the training data and evaluated on the testing data. A tenfold cross validation is carried out, and the average performance of the fuzzy decision trees on the training and testing data, together with the average node and leaf counts of the trees, is listed in Table VI with standard deviations. From Table VI, it is clear that the GREETING-based fuzzy decision trees have the best performance on the testing data.
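The synthetic benchmark described above can be generated as in this sketch (the seed and function name are our own):

```python
# Synthetic benchmark: 100 points sampled uniformly in the unit box,
# labeled class 0 when x^2 + y^2 < 0.8 and class 1 otherwise.
import random

def make_synthetic(n=100, seed=0):
    rng = random.Random(seed)    # fixed seed for reproducibility
    data = []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        label = 0 if x * x + y * y < 0.8 else 1
        data.append((x, y, label))
    return data

data = make_synthetic()
```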
The tree sizes (node or leaf counts) are generally smaller. One exception is the diabetes data, on which the GREETING-based decision tree achieves higher training and testing rates but a larger tree size than the original Yuan and Shaw decision tree. Because the difference between the GREETING and Yuan



TABLE VI. Comparison of Methods on Decision Trees

Fig. 17. (a) Original Yuan and Shaw’s linguistic-interface-based tree. (b) GREETING linguistic-interface-based tree for wine data.

Fig. 16. (a) Original Yuan and Shaw’s linguistic-interface-based tree. (b) GREETING linguistic-interface-based tree for diabetes data.

and Shaw’s tree sizes is smaller than 0.5, it is still more preferable to use GREETING to have a better classification rate. For example, Fig. 16 lists two typical trees generated by the Yuan and Shaw’s original interface and the GREETING-generated linguistic interface. The Yuan and Shaw’s tree [Fig. 16(a)] is smaller than the GREETING-based tree [Fig. 16(b)]. However, with the additional node, “body mass index,” the GREETING’s tree offers much better classification rate than Yuan and Shaw’s original tree (0.812 versus 0.753). To demonstrate the GREETING-based linguistic interface’s capability of deriving a better tree, Fig. 17 shows two decision trees generated for wine data. The linguistic interfaces generated by Yuan and Shaw’s original method and GREETING are also plotted on trees. It can be figured out that the two trees are the same except for the different linguistic interfaces. However, the testing rate for the GREETING-based tree is 0.833, which is much better than 0.778—the performance of Yuan and Shaw’s original-method-based tree. For the thyroid gland data, the tree size of FCM is smaller than that of GREETINGs, but the noninterpretable fuzzy sets in the FCM-generated linguistic interface and the better classifica-

tion rate of GREETING make GREETING still the preferable choice for linguistic interface generation. In summary, because of better interpretability of generated linguistic interfaces and better classification rates with comparable tree sizes, the proposed GREETING method is the best choice for linguistic interface generation in transparent fuzzy decision tree induction.
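The evaluation protocol used above, tenfold cross validation over 90%/10% splits, can be sketched as:

```python
# Sketch of tenfold cross validation: split the data into 10 folds,
# train on 9 folds (90%) and test on 1 fold (10%), and average the
# performance over the 10 runs.

def kfold_indices(n, k=10):
    """Yield (train, test) index lists for k-fold cross validation."""
    fold = n // k
    idx = list(range(n))
    for i in range(k):
        # the last fold absorbs any remainder when n % k != 0
        test = idx[i * fold:(i + 1) * fold] if i < k - 1 else idx[i * fold:]
        test_set = set(test)
        train = [j for j in idx if j not in test_set]
        yield train, test

splits = list(kfold_indices(100))
```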

V. CONCLUSION

This paper has proposed the GREETING approach to overcome the disadvantage of traditional linguistic interface generation methods, in which the interpretability aspects of the linguistic interface receive only limited consideration. First, a group of fuzzy sets is applied in the linguistic interfaces to assure most of the desired transparency properties. Then, through a gradient descent method, the two conflicting transparency indices, the dissimilarity index Q and the information loss index V, are optimized. The numeric experiments on data sets from the UCI machine learning databases demonstrate the feasibility and superiority of the proposed GREETING method. The GREETING approach is also applied to fuzzy decision tree generation, where it is shown to generate more transparent fuzzy decision trees with better testing rates and comparable tree sizes.


The proposed GREETING method can be widely used in places where the linguistic terms or the fuzzy sets in linguistic interface are generated from historical data and the transparency (or interpretability) of the linguistic interface is desired.

REFERENCES

[1] L. A. Zadeh, “Fuzzy logic equals computing with words,” IEEE Trans. Fuzzy Syst., vol. 4, no. 2, pp. 103–111, Apr. 1996. [2] W. Pedrycz, Granular Computing: An Emerging Paradigm. New York: Springer-Verlag, 2001. [3] F.-Y. Wang, “On the abstraction of conventional dynamic systems: From numerical analysis to linguistic analysis,” Inf. Sci., vol. 171, no. 1–3, pp. 233–259, Mar. 2005. [4] Y. F. Yuan and M. J. Shaw, “Induction of fuzzy decision tree,” Fuzzy Sets Syst., vol. 69, no. 2, pp. 125–139, Jan. 1995. [5] X.-Z. Wang, E. C. C. Tsang, and D. S. Yeung, “A comparative study on heuristic algorithms for generating fuzzy decision trees,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 31, no. 2, pp. 215–226, Apr. 2001. [6] X.-Z. Wang, J.-H. Zhai, and S.-X. Lu, “Induction of multiple fuzzy decision trees based on rough set technique,” Inf. Sci., vol. 178, no. 16, pp. 3188–3202, Aug. 2008. [7] W. Pedrycz, “Logic-driven fuzzy modeling with fuzzy multiplexers,” Eng. Appl. Artif. Intell., vol. 17, no. 4, pp. 383–391, Jun. 2004. [8] X.-Z. Wang and C. R. Dong, “Improving generalization of fuzzy IF–THEN rules by maximizing fuzzy entropy,” IEEE Trans. Fuzzy Syst., vol. 17, no. 3, pp. 556–567, Jun. 2009. [9] W. Pedrycz, “Logic-oriented fuzzy neural networks,” Int. J. Hybrid Intell. Syst., vol. 1, no. 1/2, pp. 3–11, Apr. 2004. [10] W. Pedrycz and F. Gomide, An Introduction to Fuzzy Sets: Analysis and Design. Boston, MA: MIT Press, 1998. [11] S. Medasani, J. Kim, and R. Krishnapuram, “An overview of membership function generation techniques for pattern recognition,” Int. J. Approx. Reason., vol. 19, no. 3/4, pp. 391–417, Oct./Nov. 1998. [12] P. Guo, C. L. P. Chen, and M. R.
Lyu, “Cluster number selection for a small set of samples using the Bayesian Ying-Yang model,” IEEE Trans. Neural Netw., vol. 13, no. 3, pp. 757–763, May 2002. [13] S.-K. Oh, W. Pedrycz, and H.-S. Park, “Hybrid identification in fuzzyneural networks,” Fuzzy Sets Syst., vol. 138, no. 2, pp. 399–426, Sep. 2003. [14] L. Chen and C. L. P. Chen, “Pre-shaped fuzzy c-means algorithm (PFCM) for transparent membership function generation,” in Proc. IEEE Int. Conf. SMC, Montreal, QC, Canada, Oct. 2007, pp. 789–794. [15] T. W. Liao, A. K. Celmins, and R. J. Hammell, II, “A fuzzy c-means variant for the generation of fuzzy term sets,” Fuzzy Sets Syst., vol. 135, no. 2, pp. 241–257, Apr. 2003. [16] R. Krishnapuram, “Generation of membership functions via possibilistic clustering,” in Proc. 3rd IEEE Conf. Fuzzy Syst., Orlando, FL, Jun. 26–29, 1994, pp. 902–908. [17] S. M. Zhou and J. Q. Gan, “Low-level interpretability and high-level interpretability: A unified view of data-driven interpretable fuzzy system modelling,” Fuzzy Sets Syst., vol. 159, no. 23, pp. 3091–3131, Dec. 2008. [18] C. Mencar, “Theory of fuzzy information granulation: Contributions to interpretability issues,” Ph.D. dissertation, Univ. Bari, Bari, Italy, 2005. [19] J. C. Bezdek, Pattern Recognition With Fuzzy Objective Function Algorithms. New York: Plenum, 1981. [20] C. S. Herrman, “Symbolical reasoning about numerical data: A hybrid approach,” Appl. Intell., vol. 7, no. 4, pp. 339–354, Nov. 1997. [21] W. Pedrycz and J. V. de Oliveira, “A development of fuzzy encoding and decoding through fuzzy clustering,” IEEE Trans. Instrum. Meas., vol. 57, no. 4, pp. 829–837, Apr. 2008. [22] J. V. de Oliveira, “Semantic constraints for membership function optimization,” IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 29, no. 1, pp. 128–138, Jan. 1999. [23] M. Dissanayake, L. Chen, W. Pedrycz, A. Fayek, and A. 
Russell, “Fuzzy logic modeling of causal relationships-case study: Reasoning about construction performance,” in Proc. NAFIPS Annu. Meeting North Amer. Fuzzy Inf. Process. Soc.: Fuzzy Sets Heart Can. Rockies, Banff, AB, Canada, Jun. 27–30, 2004, pp. 605–610. [24] J. C. Bezdek, “Numerical taxonomy with fuzzy sets,” J. Math. Biol., vol. 1, no. 1, pp. 57–71, May 1974. [25] X. L. L. Xie and G. Beni, “A validity measure for fuzzy clustering,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 13, no. 8, pp. 841–847, Aug. 1991.


[26] C. Merz and P. Murphy, UCI Repository of Machine Learning Databases, 1996. [Online]. Available: ftp://ftp.ics.uci.edu/pub/machinelearning-databases [27] K. Nozaki, H. Ishibuchi, and H. Tanaka, “A simple but powerful heuristic method for generating fuzzy rules from numerical data,” Fuzzy Sets Syst., vol. 86, no. 3, pp. 251–270, Mar. 16, 1997. [28] A. Riid and E. Rustern, “Transparent fuzzy systems and modelling with transparency protection,” in Proc. IFAC Symp. AIRTC, Kidlington, U.K., Oct. 2–4, 2000, pp. 225–30. [29] S. Guillaume, “Designing fuzzy inference systems from data: An interpretability-oriented review,” IEEE Trans. Fuzzy Syst., vol. 9, no. 3, pp. 426–443, Jun. 2001. [30] M. S. Chen and S. W. Wang, “Fuzzy clustering analysis for optimizing fuzzy membership functions,” Fuzzy Sets Syst., vol. 103, no. 2, pp. 239– 254, Apr. 16, 1999. [31] F. Hoppner, Fuzzy Cluster Analysis: Methods for Classification, Data Analysis, and Image Recognition. New York: Wiley, 1999. [32] T. A. Runkler and J. C. Bezdek, “Alternating cluster estimation: A new tool for clustering and function approximation,” IEEE Trans. Fuzzy Syst., vol. 7, no. 4, pp. 377–393, Aug. 1999. [33] S. Mitra, K. M. Konwar, and S. K. Pal, “Fuzzy decision tree, linguistic rules and fuzzy knowledge-based network: Generation and evaluation,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 32, no. 4, pp. 328– 339, Nov. 2002. [34] J. Eggermont, “Evolving fuzzy decision trees with genetic programming and clustering,” in Proc. 5th EuroGP, Berlin, Germany, Apr. 3–5, 2002, pp. 71–82. [35] L. Chen, W. Pedrycz, and C. L. P. Chen, “Computational intelligence techniques for building transparent construction performance models,” in Proc. IEEE Int. Conf. SMC, Taipei, Taiwan, Oct. 2006, pp. 1166–1171. [36] H. Ishibuchi, T. Nakashima, and M. Nii, Classification and Modeling With Linguistic Information Granules: Advanced Approaches to Linguistic Data Mining. New York: Springer-Verlag, 2005. [37] W. 
Pedrycz, “Why triangular membership functions,” Fuzzy Sets Syst., vol. 64, no. 1, pp. 21–30, May 25, 1994. [38] L. Rondeau, E. Levrat, and J. Bremont, “An analytical formulation of the influence of membership functions shape,” in Proc. IEEE 5th Int. Fuzzy Syst., New York, Sep. 8–11, 1996, pp. 1314–1319. [39] Y. C. Hu, R. S. Chen, and G. H. Tzeng, “Discovering fuzzy association rules using fuzzy partition methods,” Knowl.-Based Syst., vol. 16, no. 2, pp. 137–147, Apr. 2003. [40] T. A. Runkler and J. C. Bezdek, “Function approximation with polynomial membership functions and alternating cluster estimation,” Fuzzy Sets Syst., vol. 101, no. 2, pp. 207–218, Jan. 16, 1999. [41] T. P. Hong and C. Y. Lee, “Induction of fuzzy rules and membership functions from training examples,” Fuzzy Sets Syst., vol. 84, no. 1, pp. 33– 47, Nov. 1996. [42] Y. W. Teng and W. J. Wang, “Constructing a user-friendly GA-based fuzzy system directly from numerical data,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 34, no. 5, pp. 2060–2070, Oct. 2004. [43] J. Espinosa and J. Vandewalle, “Constructing fuzzy models with linguistic integrity from numerical data—AFRELI algorithm,” IEEE Trans. Fuzzy Syst., vol. 8, no. 5, pp. 591–600, Oct. 2000. [44] B.-J. Park, W. Pedrycz, and S.-K. Oh, “Identification of fuzzy models with the aid of evolutionary data granulation,” Proc. Inst. Elect. Eng.—Control Theory Appl., vol. 148, no. 5, pp. 406–418, Sep. 2001. [45] H. Verlinde, M. D. Cock, and R. Boute, “Fuzzy versus quantitative association rules: A fair data-driven comparison,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 36, no. 3, pp. 679–684, Jun. 2006. [46] W. Pedrycz, “Logic-based fuzzy neurocomputing with unineurons,” IEEE Trans. Fuzzy Syst., vol. 14, no. 6, pp. 860–873, Dec. 2006. [47] L. Chen and C. L. P. Chen, “Transparent linguistic interface generation and its application in fuzzy decision trees,” in Proc. IEEE Int. Conf. SMC, Singapore, Oct. 2008, pp. 1337–1342. [48] W. Pedrycz, M. 
Reformat, and K. Li, “OR/AND neurons and the development of interpretable logic models,” IEEE Trans. Neural Netw., vol. 17, no. 3, pp. 636–658, May 2006. [49] S. M. Zhou and J. Q. Gan, “Extracting Takagi–Sugeno fuzzy rules with interpretable submodels via regularization of linguistic modifiers,” IEEE Trans. Knowl. Data Eng., vol. 21, no. 8, pp. 1191–1204, Aug. 2009. [50] M. S. Bazaraa, H. D. Sherali, and C. M. Shetty, Nonlinear Programming: Theory and Algorithms., 2nd ed. New York: Wiley, 1993. [51] P. Dadone and H. F. VanLandingham, “On the non-differentiability of fuzzy logic systems,” in Proc. IEEE Int. Conf. SMC, Nashville, TN, Oct. 2000, pp. 2703–2708.



[52] H. Nomura, I. Hayashi, and N. Wakami, “A learning method of fuzzy inference rules by descent method,” in Proc. IEEE Int. Conf. Fuzzy Syst. FUZZ-IEEE, San Diego, CA, Mar. 8–12, 1992, pp. 203–210. [53] P. S. Vishnupad and Y. C. Shin, “Adaptive tuning of fuzzy membership functions for non-linear optimization using gradient descent method,” J. Intell. Fuzzy Syst., vol. 7, no. 1, pp. 13–25, Jan. 1999. [54] Y. Shi and M. Mizumoto, “Some considerations on conventional neurofuzzy learning algorithms by gradient descent method,” Fuzzy Sets Syst., vol. 112, no. 1, pp. 51–63, May 2000. [55] X. Zhang, C. C. Hang, S. Tan, and P. Z. Wang, “The min–max function differentiation and training of fuzzy neural networks,” IEEE Trans. Neural Netw., vol. 7, no. 5, pp. 1139–1150, Sep. 1996. [56] N. J. Redding and T. Downs, “Learning in feedforward networks with nonsmooth functions: I∞ example,” in Proc. IEEE IJCNN, Singapore, Nov. 18–21, 1991, pp. 947–952. [57] X. D. Liu and W. Pedrycz, “The development of fuzzy decision trees in the framework of axiomatic fuzzy set logic,” Appl. Soft Comput., vol. 7, no. 1, pp. 325–342, Jan. 2007. [58] G. Castellano, A. M. Fanelli, and C. Mencar, “A compact Gaussian representation of fuzzy information granules,” in Proc. Joint 1st Int. Conf. SCIS, 3rd ISIS, Tsukuba, Japan, 2002, pp. 21–25.

Long Chen received the B.S. degree in information sciences from Peking University, Beijing, China, in 2000, the M.S.E. degree from Institute of Automation, Chinese Academy of Sciences, Beijing, in 2003, and the M.S. degree in computer engineering from the University of Alberta, Edmonton, AB, Canada, in 2005. He is currently working toward the Ph.D. degree in the Department of Electrical and Computer Engineering, University of Texas at San Antonio, San Antonio. His current research interests include computational intelligence, Bayesian methods, and other machine learning techniques and their applications. Mr. Chen has been working in publication matters for many IEEE conferences and is the Publications Cochair of the IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2009.

C. L. Philip Chen (S’88–M’88–SM’94–F’07) received the M.S. degree from the University of Michigan, Ann Arbor, in 1985 and the Ph.D. degree from Purdue University, West Lafayette, IN, in 1988. He is currently a Professor and the Chair of the Department of Electrical and Computer Engineering, University of Texas at San Antonio, San Antonio, where he has been the Associate Dean for Research and Graduate Studies of the College of Engineering. He was a Visiting Research Scientist with the Materials Directorate, U.S. Air Force Wright Laboratory, OH. He was also a Senior Research Fellow sponsored by the U.S. National Research Council and a Research Faculty Fellow with the National Aeronautics and Space Administration Glenn Research Center for several years. His current research interests include theoretic development in computational intelligence, intelligent systems, robotics and manufacturing automation, networking, diagnosis and prognosis, and life prediction and life-extending control. Dr. Chen has been involved in IEEE professional service for 20 years. He is the Vice President on Conferences and Meetings of the IEEE Systems, Man and Cybernetics Society (SMCS), 2010–2011. He has been the Vice President for Tech Activities on Systems Science and Engineering, the Treasurer of the SMCS Conference and Management Committee, a founding Cochair of three IEEE SMCS Technical Committees, an Associate Editor of IEEE SMC-C and the IEEE Systems Journal, and the General Chair of the IEEE International Conference on SMC 2009. He was the General Cochair of the IEEE 2007 Secure System Integration and Reliability conference, the Program Cochair of the International Conference on Machine Learning and Cybernetics 2008, the Program Chair of IEEE SMC-SoSE 2006, and the Conference Cochair of the International Conference on Artificial Neural Networks in Engineering in 1995 and 1996. He received an Outstanding Contribution Award from IEEE SMCS in 2008.
He is a member of the Tau Beta Pi and Eta Kappa Nu honorary societies. He is the founding faculty advisor of an IEEE Computer Society Student Chapter and the faculty advisor of the Tau Beta Pi engineering honor society in his university.

Witold Pedrycz (M’88–SM’90–F’99) received the M.Sc., Ph.D., and D.Sci. degrees from the Silesian University of Technology, Gliwice, Poland. He is currently a Professor and the Canada Research Chair in Computational Intelligence with the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada. He is also with the Systems Research Institute of the Polish Academy of Sciences, Warsaw, Poland. He is the author of 13 research monographs covering various aspects of computational intelligence and software engineering. He is the Editor-in-Chief of Information Sciences. He has edited a number of volumes; the most recent one is Handbook of Granular Computing (J. Wiley, 2008). His research interests include computational intelligence, fuzzy modeling and granular computing, knowledge discovery and data mining, fuzzy control, pattern recognition, knowledge-based neural networks, relational computing, and software engineering. He has published numerous papers in this area. Dr. Pedrycz has been a member of numerous program committees of IEEE conferences in the area of fuzzy sets and neurocomputing. He is intensively involved in editorial activities. He serves as the Editor-in-Chief of the IEEE Transactions on Systems, Man, and Cybernetics—Part A. He currently serves as an Associate Editor of the IEEE Transactions on Fuzzy Systems and a number of other international journals. In 2007, he received a prestigious Norbert Wiener Award from the IEEE Systems, Man, and Cybernetics Society. He was a recipient of the IEEE Canada Computer Engineering Medal in 2008. In 2009, he received a Cajastur Prize for Soft Computing from the European Centre for Soft Computing for “pioneering and multifaceted contributions to granular computing.”