Evolving Fuzzy Rule-Based Classifier Based on GENEFIS

Mahardhika Pratama, Sreenatha G. Anavatti
School of Engineering and Information Technology
The University of New South Wales
Canberra, ACT-2600, Australia
email: [email protected], [email protected]

Edwin Lughofer
Department of Knowledge-Based Mathematical Systems
Johannes Kepler University
Linz, A-4040, Austria
email: [email protected]

Abstract—This paper presents a novel evolving fuzzy rule-based classifier stemming from our recently developed algorithm for regression problems termed generic evolving neuro-fuzzy system (GENEFIS). The novel classifier, namely GENEFIS-class, comes in two different architectures, zero order and first order, depending on the type of consequent used, and refurbishes the GENEFIS algorithm as its main learning engine to conform to classification requirements. The appealing properties of GENEFIS are its fully flexible rule base and its computationally efficient algorithm. GENEFIS can initiate its learning process from scratch with an empty rule base and very limited expert knowledge. Fuzzy rules are then grown based on the novelty of streaming data, measured via their statistical contribution. Conversely, fuzzy rules which contribute little during their lifespan can be pruned by virtue of their contributions up to the current point of the training process. Meanwhile, redundant fuzzy rules and fuzzy sets can be merged to promote a transparent rule base. Online feature selection, coupled with the training process, copes with the possible combinatorial rule explosion. All of these components grant a significant reduction of the rule base load while retaining classification accuracy, which is in line with online real-time requirements. The efficacy of GENEFIS-class was numerically validated on real-world and synthetic problems and compared with state-of-the-art algorithms, which it generally outperforms in terms of classification performance and rule base complexity.

Keywords—GENEFIS, Evolving Fuzzy Classifier, Fuzzy System, Neural Network.

I. INTRODUCTION

Classification, where the main task is to assign data to particular groups/classes, is a basic foundation of many data-mining applications, for instance fault diagnosis [1], pattern recognition [2], and image processing [3]. Data grouped in the same class usually share typical characteristics, whereas inter-class data are usually distinct. The classification field has undergone rapid progress, signified by many variants of classifiers involving neural networks (NN) [4] and, later on, support vector machines (SVM) [5]. All of them adopt a black-box approach which is not interpretable to non-expert users. One may note that, in most industrial applications, the behind-the-scenes process plays a crucial role in allowing the user to grasp the rationale of the decision being made. The fuzzy rule-based (FRB) system [6] is a powerful impetus to the progress of computational intelligence research, as it is capable of realizing approximate reasoning to cope with imprecision and uncertainty in the decision-making process [7].

The third author acknowledges financial support of the Austrian fund for promoting scientific research (FWF, contract number I328-N23, acronym IREFS), and the research programme at the Austrian Center of Competence in Mechatronics (ACCM), which is a part of the COMET K2 program of the Austrian government.

Furthermore, its working principle is described by a set of human-like linguistic rules which is tractable to users. The major drawback of the traditional fuzzy system is its static rule base, which cannot be adapted after its initial setting. This bottleneck has led researchers to propose more advanced techniques for learning and constructing fuzzy rules. The approaches in [8-11], using genetic algorithms (GA) or simple heuristic methods, have been put forward as solutions to the inflexible rule base of the conventional fuzzy system. Nevertheless, these algorithms engage time-consuming or offline learning phases, usually entailing considerable computational resources. One may envisage that, in the contemporary industrial setting, vast amounts of data stream from sensors at a rapid rate over time. All of them have to be processed quickly to warrant process safety. It is inefficient to utilize offline learning algorithms, as they require a complete training dataset at hand before the training process is executed. If data not covered in the previous training cycle are captured, a retraining phase from scratch using an up-to-date dataset has to be carried out. The retraining phase can provoke system memory overflow, as the number of data may grow exponentially to massive amounts. To remedy this issue, evolving fuzzy systems (EFS) are an immense breakthrough [12]. A principal cornerstone of EFS is an open structure where the rule base is adaptable and evolvable on demand according to the underlying characteristics of the training samples. An EFS can initiate its learning process from an empty rule base, and fuzzy rules are added afterward from streaming data based on a particular measure of data quality (potential [12], distance [13,14], etc.). Another salient aspect is that an EFS only exploits a small snapshot of the overall training patterns or, in the ideal case, only the newest datum while discarding all other data (so-called single-pass learning). Automatic knowledge building in EFS also circumvents catastrophic forgetting of previously valid knowledge, thus allowing a more interpretable rule base, since the rule base amounts to an abstraction of the data history; this is fruitful when encountering non-stationary streaming data. Since an EFS requires only minimal expert knowledge in its design phase, it can be perceived as one step towards a plug-and-play algorithm. Most research works on EFS are devoted to regression cases [15-17]; nonetheless, several works focusing on the classification case have been published. [18] explores architectures of evolving fuzzy classifiers, namely single-model (SM) and multi-model (MM) architectures, where eTS [12] and FLEXFIS [14] are employed as base learners. [19] amalgamates a feature weighting strategy into FLEXFIS-class MM in order to perform a soft dimensionality reduction.

A new architecture of eClass1 was outlined in [20]; its prominent construct is to regress over the features. More specifically, the consequent part maps the degree of confidence of one rule to the class labels. Simp_eClass was proposed by Baruah et al. [21]; it adopts the same notion as [20], but utilizes the simp_eTS+ [22] algorithm, asserting the concept of density increment in lieu of data potential. An evolving fuzzy classifier emanating from evolving multivariate Gaussian functions (eMG) was proposed in [23]; eMG itself stems from the construct of participatory learning [24]. This classifier utilizes a classical classifier architecture where the consequent consists of class labels, and it is applied to the fault-diagnosis problem. [25] devises a technique to obviate the conflict and ignorance bottlenecks of the conventional evolving fuzzy classifier. All-pairs evolving fuzzy classifiers have been proposed in [26]. This architecture differs from the classical evolving classifier architecture as it transforms the multi-class classification problem into several binary classification subproblems based on all pairs of class labels. The confidence degrees of all pairs of class labels are stored in a preference relation matrix, based on which the classification task is settled.

An evolving fuzzy classifier assembled on top of the GENEFIS algorithm [27-29], namely GENEFIS-class, is developed in this paper. GENEFIS adopts a generalized form of the Takagi-Sugeno-Kang (TSK) fuzzy system where the premise part is composed of multivariate Gaussian functions. This type of fuzzy system triggers ellipsoidal clusters in arbitrary position whose axes are not necessarily parallel to the input feature axes. It features a more flexible representation of a cluster in the feature space than the univariate Gaussian kernel, i.e., a Gaussian function with diagonal covariance matrix generating axis-parallel ellipsoids. This approach has been explored in [23]; our approach is, however, also capable of yielding classically interpretable rules: it models local correlations between input variables, and the centers and radii of the Gaussian fuzzy sets can be extracted from the multidimensional Gaussian kernels.

GENEFIS-class features a holistic concept of an evolving system where the rule base is fully dynamic and evolving. Fuzzy rule growing and adaptation are orchestrated by the so-called datum significance (DS) method and generalized adaptive resonance theory+ (GART+). Superfluous fuzzy rules, which are either obsolete or inactive, can be detected over time via the extended rule significance (ERS) technique. Redundant fuzzy rules or fuzzy sets are identified by a kernel-based metric approach and are in turn merged, underpinning intelligible rule semantics. The Fuzzily Weighted Generalized Recursive Least Squares (FWGRLS) method, coupling a weight decay term to confine the updated weight vectors to a small range, is consolidated. Online feature selection can be performed during the training process via the so-called input significance (IS) method. More importantly, these learning modules comply with the online life-long learning landscape, in contrast to one-shot (batch) training, which is computationally prohibitive. GENEFIS-class is built on two different architectures by virtue of the consequent types.

The remainder of this paper is organized as follows: Section 2 discloses the two different architectures explored by GENEFIS-class, Section 3 elaborates the algorithmic development of

GENEFIS, Section 4 outlines the empirical study on numerous artificial and real-life datasets, and the last section concludes this paper.

II. ARCHITECTURE OF FUZZY RULE-BASED CLASSIFIER

A. GENEFIS-class0

This architecture [17] is the classical architecture of a fuzzy classifier whose consequent is defined by a class label; it maps the data samples directly to the class label. It benefits from the winner-takes-all principle, where the winner is chosen as the rule best matching the training sample, usually computed via its membership or compatibility degree. The membership degree is elicited by the multidimensional Gaussian function as follows:

$$\varphi_i = \exp\left(-(X_t - C_i)\,\Sigma_i^{-1}(X_t - C_i)^T\right) \qquad (1)$$

where $X_t = [x_1^t, x_2^t, \ldots, x_p^t] \in \Re^{1 \times p}$ is the input vector of interest at the t-th training episode and $C_i \in \Re^{1 \times p}$ is the center of the Gaussian function. Meanwhile, $\Sigma_i^{-1} \in \Re^{p \times p}$ is an inverse covariance matrix whose elements represent the spreads of the Gaussian function in every direction or dimension $\sigma_{ij}$, and $i = [1, C]$. The notations $p$ and $C$ constitute the number of input dimensions and the number of fuzzy rules, respectively.

A ubiquitous technique to point out the winning rule is by means of the membership grade. This method may convey a misleading prediction, especially when the distances of the training sample to several rules are almost tantamount. A winning-rule choice based on the Bayesian concept is deemed more sensible, as it entangles the prior probability, taking into account the population of the cluster. The posterior probability of the i-th category can be portrayed as follows:

$$\hat{P}(\varphi_i \mid X) = \frac{\hat{p}(X \mid \varphi_i)\,\hat{P}(\varphi_i)}{\sum_{i=1}^{C}\hat{p}(X \mid \varphi_i)\,\hat{P}(\varphi_i)} \qquad (2)$$

where $i$ indexes the existing categories, whereas $\hat{p}(X \mid \varphi_i)$ and $\hat{P}(\varphi_i)$ define the likelihood and the prior probability, respectively. Furthermore, the likelihood and the prior probability can be represented as follows:

$$\hat{p}(X \mid \varphi_i) = \frac{\exp\left(-(X_t - C_i)\,\Sigma_i^{-1}(X_t - C_i)^T\right)}{(2\pi)^{\frac{p}{2}}\, V_i^{\frac{1}{2}}} \qquad (3)$$

$$\hat{P}(\varphi_i) = \frac{N_i}{\sum_{i=1}^{C} N_i} \qquad (4)$$

where $N_i$ labels the number of times that the i-th fuzzy rule or category wins the competition. Meanwhile, $V_i$ is the estimated hyper-volume of the feature space covered by the i-th category or fuzzy rule, defined as follows:

$$V_i = \det(\Sigma_i) \qquad (5)$$

In summary, if a category or rule complies with $win = \arg\max_i(\hat{P}(\varphi_i \mid X))$, then this category is the candidate to undergo a resonance. The classifier output is not taken directly from the consequent of the winning rule. Instead, we infer it based on the maximum relative proportion of all samples occupying the winning cluster that fall into the l-th class, as follows:

$$L = \max_{1 \le l \le k}(w_{il}) \qquad (6)$$

where $w_{il}$ denotes the relative proportion and $k$ the number of classes, with

$$w_{il} = \frac{n_{il}}{n_i} \qquad (7)$$

where $n_{il}$ counts the number of samples forming the i-th cluster that fall into the l-th class and $n_i$ labels the population of the i-th cluster. This strategy is plausible due to the fact that a cluster not only accommodates samples belonging to one class but may also be contaminated by data points of several classes. A small sketch of this zero-order decision procedure follows.
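To make the zero-order decision procedure concrete, the following minimal sketch implements Eqs. (1)-(7) for a single sample. It is an illustration only, not the authors' implementation: the `Rule` container, `predict_class0`, and the externally tracked win counters `wins` are hypothetical names.

```python
import numpy as np

class Rule:
    """One zero-order GENEFIS-class rule: a multivariate Gaussian cluster."""
    def __init__(self, center, inv_cov, n_per_class):
        self.center = np.asarray(center, dtype=float)       # C_i, shape (p,)
        self.inv_cov = np.asarray(inv_cov, dtype=float)     # Sigma_i^{-1}, (p, p)
        self.n_per_class = np.asarray(n_per_class, float)   # n_il, shape (k,)

    @property
    def n_total(self):                                      # n_i
        return self.n_per_class.sum()

def predict_class0(rules, x, wins):
    """Bayesian winner selection (Eqs. 2-5) plus class inference (Eqs. 6-7).
    wins[i] is N_i, the number of times rule i has won so far."""
    x = np.asarray(x, dtype=float)
    p = x.shape[0]
    likelihoods, priors = [], []
    for i, r in enumerate(rules):
        d = x - r.center
        phi = np.exp(-d @ r.inv_cov @ d)                    # Eq. (1)
        V = np.linalg.det(np.linalg.inv(r.inv_cov))         # Eq. (5): det(Sigma_i)
        likelihoods.append(phi / ((2 * np.pi) ** (p / 2) * np.sqrt(V)))  # Eq. (3)
        priors.append(wins[i] / sum(wins))                  # Eq. (4)
    posterior = np.array(likelihoods) * np.array(priors)
    posterior /= posterior.sum()                            # Eq. (2)
    win = int(np.argmax(posterior))
    w = rules[win].n_per_class / rules[win].n_total         # Eq. (7)
    return win, int(np.argmax(w))                           # winning rule, label (Eq. 6)
```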

B. GENEFIS-class1

This type is subsumed as a regression-based classifier. The consequent part represents a mapping from the input features to the class labels [21]. This rationale leads to a better classification accuracy, as the classifier resorts to predicting the classification surface instead of the direct class label as in the zero-order classifier. The classifier output is inferred in a similar way to the regression case, where the main pillar is first-order polynomial consequents delineating hyper-planes snuggling along the real trend of the approximation curve. That is, every fuzzy region produced by the non-linear mapping (1) is incorporated with a local linear subsystem as follows:

$$\hat{y}_l^t = \frac{\sum_{i=1}^{C}\varphi_i\, y_{il}}{\sum_{i=1}^{C}\varphi_i} = \frac{\sum_{i=1}^{C}\exp\left(-(X_t - C_i)\,\Sigma_i^{-1}(X_t - C_i)^T\right) y_{il}}{\sum_{i=1}^{C}\exp\left(-(X_t - C_i)\,\Sigma_i^{-1}(X_t - C_i)^T\right)} \qquad (8)$$

where $y_{il}$ is the local sub-model of the i-th rule for the l-th class, $y_{il} = a_{0l}^i + a_{1l}^i x_1 + \cdots + a_{pl}^i x_p$, and $y_i = [y_{i1}, y_{i2}, \ldots, y_{ik}]$ is the i-th local sub-system, which can be written mathematically as follows:

$$y_i = x_{in} a_i, \quad x_{in} = [1, x_1, \ldots, x_p], \quad a_i = \begin{bmatrix} a_{01}^i & a_{02}^i & \cdots & a_{0k}^i \\ a_{11}^i & a_{12}^i & \cdots & a_{1k}^i \\ \vdots & \vdots & & \vdots \\ a_{p1}^i & a_{p2}^i & \cdots & a_{pk}^i \end{bmatrix} \qquad (9)$$

This implies that a Multi-Input-Multi-Output (MIMO) structure is used herein. In what follows, the true class labels are converted into 0/1 indicator vectors. For example, in a binary classification problem where the first class is the true class label, the target is expressed as $T_t = [1, 0]$. If the class dimension is 3 and the sample falls into the second class, the class label is encoded as $T_t = [0, 1, 0]$.

The merit of this mechanism over direct regression on the class label (using a Multi-Input-Single-Output (MISO) system to predict the class label) is notable for multi-class classification problems. The direct regression strategy intrinsically fits the target value in a smooth way, which is impractical in the case of abruptly changing class labels. If the true class label is three, the approximation surface will lie in the transition between the previous class label and the new one (class 1 to class 3). A mismatched class prediction is then imposed as a reciprocal impact and, in this case, class 2 would be granted as the classification result.

This classifier structure can also be expanded to K MISO classifiers, so that every class label is handled by a standalone classifier. The final output is given by the classifier with the highest output. This classifier type is dubbed the one-versus-rest classification strategy. Although this structure is deemed efficacious in delivering a better classification rate, it nonetheless consumes costlier computations and a larger rule base burden: the total rule base burden is the sum of the rule base burdens of all classifiers, and the computational cost behaves likewise. A sketch of the first-order inference is given below.
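The following minimal sketch, under the same hypothetical naming as the previous listing, illustrates the first-order (MIMO) inference of Eq. (8), with `consequents[i]` storing the matrix $a_i$ of Eq. (9) and the prediction decoded by argmax over the k class outputs.

```python
import numpy as np

def predict_class1(centers, inv_covs, consequents, x):
    """First-order MIMO inference (Eq. 8): a fuzzy mixture of local linear
    sub-models. centers: list of (p,) arrays; inv_covs: list of (p, p)
    arrays; consequents: list of (p + 1, k) arrays holding a_i of Eq. (9)."""
    x = np.asarray(x, dtype=float)
    x_ext = np.concatenate(([1.0], x))              # x_in = [1, x_1, ..., x_p]
    phis, locals_ = [], []
    for c, s_inv, a in zip(centers, inv_covs, consequents):
        d = x - c
        phis.append(np.exp(-d @ s_inv @ d))         # firing strength, Eq. (1)
        locals_.append(x_ext @ a)                   # y_i = x_in a_i, Eq. (9)
    phis, locals_ = np.array(phis), np.array(locals_)
    y_hat = phis @ locals_ / phis.sum()             # Eq. (8): one output per class
    return int(np.argmax(y_hat)), y_hat             # predicted class, raw surface

def one_hot(label, k):
    """0/1 indicator target, e.g. class 2 of 3 -> T_t = [0, 1, 0]."""
    t = np.zeros(k)
    t[label] = 1.0
    return t
```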

III. GENEFIS-CLASS BASED ON GENEFIS ALGORITHM

The GENEFIS algorithm has its roots in [27,28]; it was enhanced from the so-called parsimonious network based on fuzzy inference system (PANFIS), whose algorithmic backbone is anchored in [29]. A brief summary of the GENEFIS algorithm is elaborated in the sequel.

• Rule base proliferation and adaptation: The crux of rule growing in GENEFIS is steered by the datum significance (DS) concept, which oversees the statistical contribution of a training datum for a possible input space partition via fuzzy rule recruitment. Generalized adaptive resonance theory+ (GART+) is another criterion to expand the rule base size in addition to the DS method; GART+ is solicited to confine the fuzzy region sizes so as to eradicate the cluster delamination effect [30]. The DS method is mathematically epitomized as follows:

$$D_t = \frac{\det(\Sigma_{C+1})^{p}}{\sum_{i=1}^{C+1}\det(\Sigma_i)^{p}} \qquad (10)$$

If $D_t \ge \rho_1$, where $\rho_1$ is a predefined threshold designated in the range [0.01, 0.1], the datum is a candidate for a new rule. On the one hand, GART+ triggers a resonance when the datum is in close proximity to the rule base, as follows:

$$\rho_2 \ge \varphi_{winner} \qquad (11)$$

where $\rho_2$ is a predefined constant set based on a critical value of the $\chi^2$ distribution with $p$ degrees of freedom and a significance level $\alpha$: $\rho_2 = \exp(-\chi_p^2(\alpha))$.

On the other hand, the winning cluster should be confirmed to be limited in size; its volume is examined via the following condition:

$$V_{winner} \le V_{max}, \quad V_{max} = \rho_3 \sum_{i=1}^{C} V_i$$

where $\rho_3$ is an allowable percentage of the total cluster volume allocated a priori; we assign $\rho_3 = 10\%$. If this holds, the resonance takes place. This vigilance test is undertaken by virtue of these two conditions to strike our dual goals. Referring to these two conditions, four possible circumstances occur in the training process:

(1) $D_n \ge \rho_1$, $V_{winner} \le V_{max}$: this condition ascertains that the winning cluster is close to the injected datum and its volume is limited in size, inferring that the cluster fully covers the datum. A resonance should be carried out for this datum as follows:

$$C_{win}^{new} = \frac{N_{win}^{old}}{N_{win}^{old}+1}\, C_{win}^{old} + \frac{X - C_{win}^{old}}{N_{win}^{old}+1} \qquad (12)$$

$$\Sigma_{win}^{-1}(new) = \frac{\Sigma_{win}^{-1}(old)}{1-\alpha} - \frac{\alpha}{1-\alpha}\; \frac{\left(\Sigma_{win}^{-1}(old)(X - C_{win}^{new})\right)\left(\Sigma_{win}^{-1}(old)(X - C_{win}^{new})\right)^T}{1 + \alpha\,(X - C_{win}^{old})\,\Sigma_{win}^{-1}(old)\,(X - C_{win}^{old})^T} \qquad (13)$$

$$N_{win}^{new} = N_{win}^{old} + 1 \qquad (14)$$

$$n_{win_l} = n_{win_l} + 1, \quad n_{win} = n_{win} + 1 \qquad (15)$$

It is worth noting that the adaptation via equation (13) is appealing, as no re-inversion is necessitated when the adaptation takes place, navigating to a prompter model update.

(2) $D_n < \rho_1$, $V_{winner} > V_{max}$: this condition means the datum is situated in a region which is not yet traversed. A new fuzzy rule should be crafted to capture future data points near this region as follows:

$$C_{C+1} = X_t \qquad (16)$$

$$\mathrm{diag}(\Sigma_{C+1}) = \frac{\max\left((C_i - C_{i-1}),\, (C_i - C_{i+1})\right)}{\sqrt{\ln(1/\varepsilon)}} \qquad (17)$$

Equation (17) can easily be proven to satisfy the ε-completeness criterion; we, however, omit the mathematical proof to save space. If GENEFIS-class0 is explored, the consequent parameter indicates the class label and is allocated as the true class label of the training datum as follows:

$$y_{C+1} = T_t \qquad (18)$$

$$n_{C+1,l} = 1, \quad n_{C+1} = 1 \qquad (19)$$

Meanwhile, if GENEFIS-class1 is demanded, the newly created consequent parameter is designed as the consequent of the winning rule, and the covariance matrix is allocated as a diagonal matrix whose main diagonal is a positive constant:

$$y_{C+1} = y_{winner} \qquad (20)$$

$$P_{C+1} = \varpi I \qquad (21)$$

where $\varpi$ is a positive constant, usually allocated as $\varpi = 750$.

(3) $D_n \ge \rho_1$, $V_{winner} > V_{max}$: the cluster can well capture the datum, but the size of the winning cluster is deemed oversized. This datum could induce redundancy if it were tailored as a new rule, and the cluster may also bear the cluster delamination effect. To this end, we adopt the concept of cluster replacement and diminish the size of the cluster as follows:

$$C_{win} = X_t \qquad (22)$$

$$\Sigma_{win}^{-1}(new) = \frac{1}{k_{win}}\,\Sigma_{win}^{-1}(old) \qquad (23)$$

where $k_{win}$ is an a priori fixed constant; we choose $k_{win} = 0.98$.

(4) $D_n < \rho_1$, $V_{winner} < V_{max}$: the datum is uncovered by the rule base, but its size is still manageable. A resonance using equations (12)-(15) is executed to trigger a better coverage of the winning cluster.

In the case of GENEFIS-class1, the rule growing and adaptation modules are executed only for data which have the potential to stimulate better predictive performance. The concept of the rate of change in the global mean error is required; this component is not only noteworthy for expediting the training process but also pivotal for avoiding over-fitting. The rate of change is derived from the mean and variance of the global error as follows:

$$\bar{e}_t = \frac{t-1}{t}\,\bar{e}_{t-1} + \frac{1}{t}\, e_t \qquad (24)$$

$$\sigma_t^2 = \frac{t-1}{t}\,\sigma_{t-1}^2 + \frac{1}{t}\,(e_t - \bar{e}_{t-1})^2 \qquad (25)$$

where $\bar{e}_t$, $e_t$, and $\sigma_t$ are the mean of the global error, the global error, and the variance of the global error, respectively. If $\bar{e}_t + \sigma_t^2 - (\bar{e}_{t-1} + \sigma_{t-1}^2) > 0$, the DS criterion and GART+ are switched on; they are otherwise deactivated and GENEFIS performs the next learning scenarios. A compact sketch of this growing logic follows.
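The sketch below illustrates the four-branch dispatch of Eqs. (10)-(23). It is a hedged simplification, not the authors' code: `grow_or_adapt` and the rule container are hypothetical, the weighting $\alpha = 1/(N+1)$ in Eq. (13) is an assumption (the paper does not spell it out), and the new-rule spread uses the nearest-center distance in place of the neighbor-based term of Eq. (17).

```python
import numpy as np
from types import SimpleNamespace

def membership(r, x):
    """Eq. (1): Gaussian compatibility of rule r with datum x."""
    d = x - r.center
    return float(np.exp(-d @ r.inv_cov @ d))

def new_rule(rules, x, eps=0.5):
    """Eqs. (16)-(17), simplified: center at the datum, isotropic spread
    from the distance to the nearest existing center (epsilon-completeness)."""
    dist = min(np.linalg.norm(x - r.center) for r in rules)
    sigma = max(dist, 1e-6) / np.sqrt(np.log(1.0 / eps))
    return SimpleNamespace(center=x.copy(),
                           inv_cov=np.eye(x.size) / sigma**2, N=1)

def update_winner(r, x):
    """Resonance (Eqs. 12-14): rank-one update of the inverse covariance,
    so no matrix re-inversion is needed (Eq. 13); alpha = 1/(N+1) assumed."""
    alpha = 1.0 / (r.N + 1)
    c_new = r.N / (r.N + 1) * r.center + (x - r.center) / (r.N + 1)  # Eq. (12)
    u = r.inv_cov @ (x - c_new)
    denom = 1 + alpha * (x - r.center) @ r.inv_cov @ (x - r.center)
    r.inv_cov = r.inv_cov / (1 - alpha) \
        - (alpha / (1 - alpha)) * np.outer(u, u) / denom             # Eq. (13)
    r.center, r.N = c_new, r.N + 1                                   # Eq. (14)

def grow_or_adapt(rules, x, D_t, rho1, rho3=0.10, k_win=0.98):
    """Four-branch dispatch on datum significance D_t (Eq. 10) and the
    vigilance test V_winner <= V_max = rho3 * sum(V_i)."""
    vols = [np.linalg.det(np.linalg.inv(r.inv_cov)) for r in rules]  # V_i, Eq. (5)
    win = max(range(len(rules)), key=lambda i: membership(rules[i], x))
    covered = vols[win] <= rho3 * sum(vols)
    if D_t >= rho1 and covered:          # (1) resonance
        update_winner(rules[win], x)
    elif D_t < rho1 and not covered:     # (2) craft a new rule, Eqs. (16)-(21)
        rules.append(new_rule(rules, x))
    elif D_t >= rho1 and not covered:    # (3) replace center, shrink, Eqs. (22)-(23)
        rules[win].center = x.copy()
        rules[win].inv_cov = rules[win].inv_cov / k_win
    else:                                # (4) resonance for better coverage
        update_winner(rules[win], x)
```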

• Rule base simplification: a rule pruning strategy is one of the paramount facets of an EFS to assure rule base compactness. The rule pruning technology of the so-called extended rule significance (ERS) concept is mounted in GENEFIS, where the landmark is an approximation of the statistical contribution of a fuzzy rule. If we apply GENEFIS-class1, the statistical contribution of a fuzzy rule can be defined as follows:

$$E_{inf}(i) = \sum_{l=1}^{K} \frac{\sum_{j=1}^{p+1} y_{i,j}^{l}}{K}\; \frac{\det(\Sigma_i)^{p}}{\sum_{i=1}^{C}\det(\Sigma_i)^{p}} \qquad (26)$$

If $E_{inf}(i) \le k_{err}$, where $k_{err}$ is a preset constant and we usually assign $k_{err} = 10\%\,\rho_1$, the selected rules can be evicted outright without significant loss of classification accuracy. Apart from that, we can modify equation (26) if we employ GENEFIS-class0, engaging the relative proportion of the l-th class in lieu of the l-th local sub-model, as follows:

$$E_{inf}(i) = \sum_{l=1}^{K} \frac{w_{il}}{K}\; \frac{\det(\Sigma_i)^{p}}{\sum_{i=1}^{C}\det(\Sigma_i)^{p}} \qquad (27)$$

A sketch of this pruning test is given below.
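A minimal sketch of the ERS pruning test of Eqs. (26)-(27), assuming the same hypothetical rule container as in the earlier listings (`consequent` is the matrix of Eq. (9), `n_per_class` the class counts); `k_err` defaults to 10% of ρ1 as stated in the text.

```python
import numpy as np

def prune_rules(rules, rho1, first_order=True):
    """Evict rules whose statistical contribution E_inf (Eqs. 26-27)
    falls below k_err = 10% * rho1; mutates `rules` in place."""
    k_err = 0.10 * rho1
    dets = np.array([np.linalg.det(np.linalg.inv(r.inv_cov)) for r in rules])
    p = rules[0].center.size
    shares = dets**p / np.sum(dets**p)          # det(Sigma_i)^p volume share
    kept = []
    for i, r in enumerate(rules):
        if first_order:
            # class1 (Eq. 26): average over classes of summed consequent rows
            contrib = np.mean([np.sum(r.consequent[:, l])
                               for l in range(r.consequent.shape[1])])
        else:
            # class0 (Eq. 27): average relative class proportion w_il
            contrib = np.mean(r.n_per_class / r.n_per_class.sum())
        if contrib * shares[i] > k_err:
            kept.append(r)
    rules[:] = kept
```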

• Input pruning: massive numbers of input features complicate the system being classified, as they invoke the curse-of-dimensionality bottleneck in an EFS. This drawback retards the training scenario, albeit an EFS in essence does not partition the input space with the grid partitioning method, which obviously owns this technical flaw. An online feature selection mechanism is therefore integrated in GENEFIS to cope with this drawback. It is devised based on the extended rule significance (ERS) method, as in the rule pruning module, geared to probe the sensitivity of the input features as follows:

$$I_{IS}(h) = \frac{\det(\Sigma_h^*)^{p}}{\sum_{h=1}^{p}\det(\Sigma_h^*)^{p}}\; \frac{\sum_{i=1}^{C}\eta_{hi}}{\sum_{j=1}^{p}\sum_{i=1}^{C}\eta_{ji}} \qquad (28)$$

where $\eta_{ji}$ is composed of the j-th input parameters in every rule, $\eta_{ji} = [a_{j1}, a_{j2}, \ldots, a_{jC}]$, and $\Sigma_h^*$ is a matrix containing the elements $\sigma_j$ from the non-diagonal covariance matrix $\Sigma_i$. If $I_j \le k_{in}$, where $k_{in}$ is a predefined threshold fixed as $k_{in} = 10^{-2}$, the input attribute can be discarded without detrimental effect on classification accuracy, so as to mitigate the rule base complexity and land on a frugal rule base. Note that this method is distinguishable from the input pruning method in [38], as our method appends the first term of equation (28). A sketch of this test is given below.
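The sketch below gives one simplified reading of the input significance test of Eq. (28). The names are hypothetical: `consequents[i]` is assumed to be the (p+1) x k matrix of Eq. (9), and `sigma_diag[i]` the per-feature spreads extracted from rule i's covariance via Eq. (38).

```python
import numpy as np

def input_significance(consequents, sigma_diag):
    """Per-feature significance I_IS (Eq. 28): a per-feature volume share
    combined with the share of consequent weight mass tied to that feature.
    consequents: list of (p + 1, k) arrays; sigma_diag: (C, p) spreads."""
    sigma_diag = np.asarray(sigma_diag, dtype=float)
    C, p = sigma_diag.shape
    vol = sigma_diag.prod(axis=0)**p            # first factor of Eq. (28)
    vol_share = vol / vol.sum()
    # eta_ji: weight mass of feature j over all rules (skip the bias row)
    eta = np.array([[np.abs(a[1 + j, :]).sum() for a in consequents]
                    for j in range(p)])         # shape (p, C)
    weight_share = eta.sum(axis=1) / eta.sum()  # second factor of Eq. (28)
    return vol_share * weight_share

def prune_inputs(consequents, sigma_diag, k_in=1e-2):
    """Indices of features to keep: I_IS strictly above the threshold k_in."""
    I = input_significance(consequents, sigma_diag)
    return np.where(I > k_in)[0]
```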

• Consequent parameter adaptation: consequent parameters are often adjusted by the Fuzzily Weighted Recursive Least Squares (FWRLS) method in an EFS. The weight decay effect, which plays a substantial role in boosting the generalization ability, nevertheless lessens over time. We hence place a weight decay term in the cost function of the FWRLS method. This strategy was proposed in [31]; there, however, it is scrutinized in a global learning framework, whereas our point of departure is local learning, which mostly yields a better interpretability of the resultant rule base, as the consequents can be depicted as hyper-planes snuggling along the real trend of the approximation curve. We finally arrive at the following mathematical expressions:

$$K(n) = P_i(n-1)\,x_{in}\left(\frac{1}{\Lambda_i(n)} + x_{in}^T P_i(n-1)\,x_{in}\right)^{-1} \qquad (29)$$

$$P_i(n) = P_i(n-1) - K(n)\,x_{in}^T P_i(n-1) \qquad (30)$$

$$y_i(n) = y_i(n-1) - \alpha P_i(n)\nabla\psi(y_i(n-1)) + K(n)\left(t(n) - x_{in}^T y_i(n-1)\right) \qquad (31)$$

where $K(n)$ is the Kalman gain and $P_i(n)$ is the covariance matrix, whereas $x_{in}$ is the extended input vector $x_{in} = [1, x_{1n}, \ldots, x_{pn}]^T$. Meanwhile, $y_i(n)$ is the local sub-system of the i-th rule, and we prescribe the regularization parameter $\alpha$ as a small value near zero, $\alpha \approx 10^{-15}$. On the one side, the update of each rule is separately effected in the rubric of local learning; hence, the covariance matrix differs per rule, $P_i(n) \in \Re^{(p+1)\times(p+1)}$. A peculiar setting of the covariance matrix is not entailed, and it can be truncated immediately if the rule pruning strategy is activated. On the other side, we exploit the most popular weight decay function, the quadratic type, written as follows:

$$\psi(y_i(n-1)) = \frac{1}{2}\left(y_i(n-1)\right)^2 \qquad (32)$$

with gradient

$$\nabla\psi(y_i(n-1)) = y_i(n-1) \qquad (33)$$

The use of this function stimulates the adapted weight to decrease by a factor proportional to its current value. Therefore, the magnitudes of the output parameters hover around small values, thus intensifying the generalization ability. A sketch of one FWGRLS step follows.
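A minimal sketch of one FWGRLS step (Eqs. 29-33) for a single rule; `lam` is assumed to be the fuzzy weight $\Lambda_i(n)$ of the rule (e.g., its normalized firing strength), and all names are illustrative rather than the authors' API.

```python
import numpy as np

def fwgrls_step(P, W, x, target, lam, alpha=1e-15):
    """One FWGRLS update for one rule, local learning version.
    P: (p+1, p+1) covariance; W: (p+1, k) consequents; x: (p,) input;
    target: (k,) one-hot vector; lam: fuzzy weight Lambda_i(n)."""
    x_ext = np.concatenate(([1.0], x))               # extended input [1, x]
    Px = P @ x_ext
    K = Px / (1.0 / lam + x_ext @ Px)                # Eq. (29), Kalman gain
    P_new = P - np.outer(K, x_ext @ P)               # Eq. (30)
    # Eq. (31) with quadratic decay: grad psi(W) = W (Eqs. 32-33)
    W_new = W - alpha * (P_new @ W) + np.outer(K, target - x_ext @ W)
    return P_new, W_new
```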

• Suppressing the redundancy of fuzzy sets and fuzzy rules: redundancy can become apparent at both the fuzzy set and the fuzzy rule level. In an online situation, the data can dwell anywhere in the input space, which can inflict redundancy if they fill the gap between two clusters. The projection concept can likewise lead to significantly overlapping fuzzy sets, again a redundancy aspect. Fuzzy rule redundancy, in essence, does not coincide with the spirit of a fuzzy system emulating human-like linguistic rules. GENEFIS solicits the aid of the kernel-based metric method [32] as a rule and set merging constituent. This method offers a fast similarity measure between two Gaussian kernels in comparison with other techniques. The level of interaction between two Gaussian functions is given by the following equation:

$$S_{ker} = e^{-|C_i - C_j| - |\sigma_i - \sigma_j|} \qquad (34)$$

If $S_{ker} \ge \varepsilon$, where the constant $\varepsilon$ is set a priori with a default value of 0.8, the fuzzy sets are subject to the fuzzy set merging mechanism, thus obviating the redundancy effect. The merging of two fuzzy sets is carried out in retrospect of the α-cuts of the two overlapping fuzzy sets; α itself is prescribed as α = 0.6 to arrive at a higher coverage span:

$$c_{new} = \left(\max(U) + \min(U)\right)/2 \qquad (35)$$

$$\sigma_{new} = \left(\max(U) - \min(U)\right)/2 \qquad (36)$$

where $U = \{c_A \pm \sigma_A, c_B \pm \sigma_B\}$. As an ellipsoidal cluster in arbitrary directions is portrayed by the premise part of a GENEFIS rule, a fuzzy set representation has to be formed first. The key is to extract the fuzzy set radii from the non-diagonal covariance matrix, whereas the center of a fuzzy set can be acquired directly from the multivariate Gaussian kernel. The radii are elicited from the distance from the center to the axis-parallel intersection with the ellipsoid, while the center of the ellipsoid is regarded as the center of the fuzzy set:

$$\upsilon_i = c_i \qquad (37)$$

$$\sigma_i = \frac{r}{\sqrt{\Sigma_{ii}}} \qquad (38)$$

where $\Sigma_{ii}$ are the diagonal elements of the inverse covariance matrix. This method can prevail in the online learning scenario, in contrast to its counterpart in [27-29], as it accentuates rapid computation. The similarities of the fuzzy sets are encapsulated and later lay out the similarity of fuzzy rules with the aid of the min operator as follows:

$$S_{rule}(i, h) = \min_{j=1,\ldots,p}\left(S_{ker}(A_{ij}, B_{hj})\right) \qquad (39)$$

If $S_{rule} > \varepsilon_1$, where we specify $\varepsilon_1$ the same as $\varepsilon$, two fuzzy rules are coalesced to foster rule base transparency. With hindsight of the antecedent parameter adaptation, two fuzzy rules are fused as follows:

$$C_{dom}^{new} = \frac{C_{dom}^{old}\, N_{dom}^{old} + C_{dom+1}\, N_{dom+1}^{old}}{N_{dom}^{old} + N_{dom+1}^{old}} \qquad (40)$$

$$\Sigma_{dom}^{-1}(new) = \frac{\Sigma_{dom}^{-1}}{1-\alpha} - \frac{\alpha}{1-\alpha}\; \frac{\left(\Sigma_{dom}^{-1}(C_{dom}^{old} - C_{dom}^{new})\right)\left(\Sigma_{dom}^{-1}(C_{dom}^{old} - C_{dom}^{new})\right)^T}{1 + \alpha\,(C_{dom}^{old} - C_{dom}^{new})\,\Sigma_{dom}^{-1}\,(C_{dom}^{old} - C_{dom}^{new})^T} \qquad (41)$$

$$N_{dom}^{new} = N_{dom}^{old} + N_{dom+1}^{old} \qquad (42)$$

where dom denotes the rule abided by more data samples than dom+1, i.e., $N_{dom} > N_{dom+1}$. The contradictory degree of the fuzzy rules can be measured via the similarity of the input and output parameters; this is fruitful to manifest an ideal state of consequent merging. The similarity of the output parameters is obtained from the angle between them, as the consequent parameters can obviously be regarded as hyper-planes snuggling into the real trend of the classification surface [32]:

$$\delta = \begin{cases} 1, & S_{rule} \le S_{out} \\ 0, & S_{rule} > S_{out} \end{cases} \qquad (43)$$

$$\phi = \arccos\left(\frac{w_i^T w_{i+1}}{\|w_i\|\,\|w_{i+1}\|}\right) \qquad (44)$$

where $\phi$ is in the range $[0, \pi]$, $w_i = [a_{1i}, a_{2i}, \ldots, a_{ki}]$ and $w_{i+1} = [a_{1,i+1}, a_{2,i+1}, \ldots, a_{k,i+1}]$. According to [32], the similarity between two hyper-planes is defined as follows:

$$S_{out}(y_i, y_{i+1}) = \begin{cases} 1 - \dfrac{2}{\pi}\,\phi, & \phi \in \left[0, \dfrac{\pi}{2}\right] \\[4pt] \dfrac{2}{\pi}\left(\phi - \dfrac{\pi}{2}\right), & \phi \in \left[\dfrac{\pi}{2}, \pi\right] \end{cases} \qquad (45)$$

The merging of the consequent parameters is not necessarily executed whenever two fuzzy rules are identical; they are merged only if the fuzzy rules are deemed contradictory. Inspired by the construct of Yager in participatory learning [33], the merging of consequents is denoted as follows:

$$w_{dom}^{new} = w_{dom}^{old} + \gamma\delta\left(w_{dom}^{old} - w_{dom+1}^{old}\right) \qquad (46)$$

$$\gamma = \frac{N_{dom}^{old}}{N_{dom}^{old} + N_{dom+1}^{old}} \qquad (47)$$

The rule merging mechanism could retard the training process if undertaken in every training episode. We therefore switch on this leverage only whenever the input parameters are adjusted, as that is a key constituent of rule redundancy. A sketch of the merging test appears below.
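The sketch below illustrates the kernel-based similarity and merging decision (Eqs. 34-36, 39-40, and 42) for one pair of rules. It is a hedged simplification: per-dimension centers and spreads are assumed extracted via Eqs. (37)-(38) with r = 1, the spreads are fused with the α-cut rule of Eqs. (35)-(36) instead of the full inverse-covariance update of Eq. (41), and all names are illustrative.

```python
import numpy as np

def kernel_similarity(c_a, s_a, c_b, s_b):
    """Eq. (34): similarity of two univariate Gaussian fuzzy sets."""
    return np.exp(-abs(c_a - c_b) - abs(s_a - s_b))

def rule_similarity(centers_a, spreads_a, centers_b, spreads_b):
    """Eq. (39): min over the per-dimension fuzzy set similarities."""
    return min(kernel_similarity(ca, sa, cb, sb)
               for ca, sa, cb, sb in zip(centers_a, spreads_a,
                                         centers_b, spreads_b))

def merge_rules(r_dom, r_other, eps1=0.8):
    """Merge r_other into the dominant rule if Eq. (39) exceeds eps1.
    Centers via Eq. (40), support via Eq. (42), spreads via Eqs. (35)-(36)."""
    s_dom = 1.0 / np.sqrt(np.diag(r_dom.inv_cov))    # Eq. (38), r = 1
    s_oth = 1.0 / np.sqrt(np.diag(r_other.inv_cov))
    if rule_similarity(r_dom.center, s_dom,
                       r_other.center, s_oth) <= eps1:
        return False                                 # not redundant, keep both
    c_d, c_o = r_dom.center, r_other.center
    n_d, n_o = r_dom.N, r_other.N
    U_hi = np.maximum(c_d + s_dom, c_o + s_oth)      # alpha-cut envelope, Eq. (35)
    U_lo = np.minimum(c_d - s_dom, c_o - s_oth)
    r_dom.center = (c_d * n_d + c_o * n_o) / (n_d + n_o)      # Eq. (40)
    r_dom.inv_cov = np.diag(1.0 / ((U_hi - U_lo) / 2.0)**2)   # Eq. (36)
    r_dom.N = n_d + n_o                                       # Eq. (42)
    return True
```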

IV. EXPERIMENTATION

Table 1. Dataset description

Dataset      Number of attributes   Number of classes   Number of samples
iris         4                      3                   150
wine         13                     3                   178
thyroid      21                     3                   7,200
hyperplane   4                      2                   1.2 million

A. Experimental Setup

GENEFIS-class is numerically validated on three small- to medium-scale datasets emanating from the UCI machine learning repository [34] and one large-scale synthetic dataset, the hyperplane dataset, originating from Massive Online Analysis (MOA) [35]. We refer the reader to [29] for more comprehensive information on this dataset. The datasets iris, wine, and thyroid are well-known benchmark problems, thus allowing convenient comparisons with state-of-the-art algorithms, and reflect real-life problems, albeit of an offline nature. The hyperplane dataset, meanwhile, emphasizes a massive dataset size, containing 1.2 million samples, thus representing the so-called digital obesity phenomenon of real-world problems. This dataset, apart from that, features a regime drifting property, which makes it challenging to extract the underlying learning context into a particular rule base. The dataset characteristics are encapsulated in Table 1.

Table 2. Consolidated results of the benchmarked systems on three datasets

Algorithm        Metric               iris     wine     thyroid         average
GENEFIS-class0   classification rate  0.89     0.88     0.889           0.886
                 std                  0.026    0.0787   0.065           0.057
                 rules                4.3      2.2      1.8             2.77
                 time                 0.55     0.101    7.01            2.55
GENEFIS-class1   classification rate  0.953    0.911    0.9363          0.933
                 std                  0.0653   0.0835   0.0041          0.05
                 rules                1.9      2        2               1.97
                 time                 0.0794   0.1246   7.2301          2.48
GENEFIS-classMM  classification rate  0.983    0.9444   0.9414          0.956
                 std                  0.0628   0.0642   0.0064          0.045
                 rules                3.8-3.8-3.8  13-13-13  20.8-20.8-20.8  12.53
                 time                 0.228    0.3027   19.53           n/a
eClass0          classification rate  0.88     0.87     0.87            0.873
                 std                  0.02     0.2      0.33            0.18
                 rules                11       7.17     2.11            6.76
                 time                 0.045    0.056    5.56            1.89
eClass1          classification rate  0.8933   0.7925   0.9337          0.8731
                 std                  0.09     0.0783   0.0028          0.057
                 rules                3.2      3.7      6.5             4.47
                 time                 0.0626   0.0882   23.067          7.72
BARTFIS          classification rate  0.735    0.745    0.78            0.753
                 std                  0.11     0.098    0.1708          0.13
                 rules                10.7     3.2      14.8            9.57
                 time                 0.066    0.0752   6.02            2.05

On the one hand, the empirical study is undertaken with the use of the cross-validation method [36] on the small- to medium-scale datasets. Firstly, the dataset is partitioned into 10 mutually exclusive bins. Then, the first bin is exploited as the testing dataset while the rest of the data are fed as training data points. Next, the second data bin is injected to evaluate the GENEFIS-class generalization while the other data are utilized as training samples, and so on. This mechanism points out the GENEFIS-class performance against data order dependencies. It is worth stressing that, even though this mechanism can be categorized as an offline procedure, the training samples are converted into pseudo-streaming data: the training samples are discarded once learned; in other words, GENEFIS-class still adheres to a single-pass learning context.

A periodic hold-out process is, on the other hand, carried out on the large-scale dataset, where the first 12K samples are exploited: 10K samples constitute training data, while the other 2K samples act as testing data. The next 12K samples, and so on, then embark on the next periodic hold-out processes, where the training and testing data proportions are as in the previous periodic hold-out process. This experimental procedure is fruitful for simulating an online real-time environment where the process moves forward without looking back at past learning cycles; a sketch of this protocol is given after Table 3. Aside from that, GENEFIS-class is compared with its counterparts eClass [20] and BARTFIS [37]. In what follows, three different architectures are explored: GENEFIS-class0, GENEFIS-class1, and GENEFIS-classMM. Consolidated experimental results on the small- to medium-scale datasets are displayed in Table 2, whereas the large-scale dataset is summarized in Table 3.

Table 3. Consolidated results of the benchmarked systems on the hyperplane dataset

Algorithm        classification rate  std    rules         time
GENEFIS-class0   0.901                0.026  4.4           3.1
GENEFIS-class1   0.911                0.01   3             3.39
GENEFIS-classMM  0.914                0.02   3.2-3.2-3.2   4
eClass0          0.89                 0.02   6.6           3.6
eClass1          0.912                0.03   3.5           0.0626
BARTFIS          0.87                 0.2    8.8           0.03
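For clarity, a minimal sketch of the periodic hold-out protocol described above; `model.fit_sample` and `model.predict` are hypothetical single-pass interfaces, not part of any published GENEFIS code.

```python
import numpy as np

def periodic_holdout(model, X, y, block=12_000, train_part=10_000):
    """Slide over the stream in 12K blocks: train single-pass on the first
    10K samples of each block, test on the remaining 2K, never revisit."""
    accuracies = []
    for start in range(0, len(X) - block + 1, block):
        for i in range(start, start + train_part):      # single-pass training
            model.fit_sample(X[i], y[i])
        test = slice(start + train_part, start + block)
        preds = np.array([model.predict(x) for x in X[test]])
        accuracies.append(np.mean(preds == np.asarray(y[test])))
    return float(np.mean(accuracies))
```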

B. Experimental Results

Referring to Table 2, GENEFIS-class with its various classifier architectures outperforms its counterparts in both classification accuracy and rule base burden. The same situation ensues on the hyperplane dataset, where the best results in terms of complexity and accuracy are achieved by GENEFIS-class, as shown in Table 3. One may note that the eClass family and BARTFIS experience slightly faster training episodes than GENEFIS-class: they are not endowed with the rule-set merging, rule pruning, and input pruning strategies that GENEFIS-class incorporates, and the execution of these learning modules indeed draws extra computational cost. In summary, GENEFIS-classMM produces the most encouraging result in terms of classification rate, as it tailors a standalone classifier to every class label; this, nonetheless, goes along with an undesirable increase in computational complexity. As it resorts to predicting the class label directly, the classification rate of GENEFIS-class0 deteriorates. It is more appealing to predict the classification surface rather than the class label, as the classification surface usually offers greater flexibility in distinguishing data points.

V. CONCLUSION

This paper presents a novel evolving fuzzy rule-based classifier termed generic evolving neuro-fuzzy inference system classifier (GENEFIS-class), engaging three different classifier architectures. The general design standpoint is a fully open-structure classifier mimicking autonomous memory development in the human brain, an interpretable rule base, and a plausible tradeoff between predictive accuracy and computational effectiveness. The efficacy of GENEFIS-class has been numerically validated by means of three real-life problems and one artificial dataset, where GENEFIS-class delivers more favorable empirical results than its predecessors among evolving fuzzy rule-based classifiers. The application of GENEFIS to cancer genomic classification problems is a subject of future investigation.

REFERENCES

[1] J. C. Bezdek, J. Keller, R. Krishnapuram, and N. R. Pal, Fuzzy Models and Algorithms for Pattern Recognition and Image Processing. Norwell, MA: Kluwer, 1999.
[2] D. Nauck and R. Kruse, "A neuro-fuzzy method to learn fuzzy classification rules from data," Fuzzy Sets and Systems, vol. 89, pp. 277–288, 1997.
[3] L. Kuncheva, Fuzzy Classifiers. Heidelberg, Germany: Physica-Verlag, 2000.
[4] S. Haykin, Neural Networks: A Comprehensive Foundation (2nd Edition). Upper Saddle River, NJ: Prentice Hall, 1999.
[5] V. N. Vapnik, The Statistical Learning Theory. New York: Springer-Verlag, 1998.
[6] L. A. Zadeh, "Soft computing and fuzzy logic," IEEE Software, vol. 11, pp. 48–56, 1994.
[7] L. I. Kuncheva, "How good are fuzzy if–then classifiers?" IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics, vol. 30(4), pp. 501–509, 2000.
[8] F. Hopner and F. Klawonn, "Obtaining interpretable fuzzy models from fuzzy clustering and fuzzy regression," in Proc. 4th Int. Conf. Knowledge-Based Intelligent Engineering Systems (KES), Brighton, U.K., 2000, pp. 162–165.
[9] O. Cordon, F. Gomide, F. Herrera, F. Hoffmann, and L. Magdalena, "Ten years of genetic fuzzy systems: Current framework and new trends," Fuzzy Sets and Systems, vol. 141(1), pp. 5–31, 2004.
[10] H. Ishibuchi, T. Nakashima, and M. Nii, Classification and Modeling With Linguistic Granules: Advanced Information Processing. Berlin, Germany: Springer-Verlag, 2004.
[11] H. Ishibuchi, K. Nozaki, N. Yamamoto, and H. Tanaka, "Selecting fuzzy if–then rules for classification problems using genetic algorithms," IEEE Transactions on Fuzzy Systems, vol. 3(3), pp. 260–270, 1995.
[12] P. Angelov and D. Filev, "An approach to online identification of Takagi-Sugeno fuzzy models," IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 34, pp. 484–498, 2004.
[13] N. Kasabov and Q. Song, "DENFIS: dynamic evolving neural-fuzzy inference system and its application for time series prediction," IEEE Transactions on Fuzzy Systems, vol. 10(2), pp. 144–154, 2002.
[14] E. Lughofer, "FLEXFIS: A robust incremental learning approach for evolving Takagi–Sugeno fuzzy models," IEEE Transactions on Fuzzy Systems, vol. 16, pp. 1393–1410, 2008.
[15] L. T. Whye and Q. Chai, "eFSM: A novel online neural fuzzy semantic memory model," IEEE Transactions on Neural Networks, vol. 21(1), pp. 136–157, 2010.
[16] G. Leng, G. Prasad, and T. M. McGinnity, "An on-line algorithm for creating self-organizing fuzzy neural networks," Neural Networks, vol. 17, pp. 1477–1493, 2004.
[17] N. Kasabov, "Evolving fuzzy neural networks for supervised/unsupervised online knowledge-based learning," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 31(6), pp. 902–918, 2001.
[18] P. Angelov, E. Lughofer, and X. Zhou, "Evolving fuzzy classifiers using different model architectures," Fuzzy Sets and Systems, vol. 159(23), pp. 3160–3182, 2008.
[19] E. Lughofer, "On-line incremental feature weighting in evolving fuzzy classifiers," Fuzzy Sets and Systems, vol. 163(1), pp. 1–23, 2011.
[20] P. Angelov and X. Zhou, "Evolving fuzzy-rule-based classifiers from data streams," IEEE Transactions on Fuzzy Systems, vol. 16(6), pp. 1462–1475, 2008.
[21] R. D. Baruah, P. Angelov, and J. Andreu, "Simpl_eClass: Simplified potential-free evolving fuzzy rule-based classifiers," in Proceedings of the 2011 IEEE International Conference on Systems, Man and Cybernetics (SMC 2011), Anchorage, Alaska, USA, 7-9 Oct. 2011, pp. 2249–2254.
[22] P. Angelov, "Fuzzily connected multimodel systems evolving autonomously from data streams," IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics, vol. 40(2), 2010.
[23] A. Lemos, W. Caminhas, and F. Gomide, "Fuzzy multivariate Gaussian evolving approach for fault detection and diagnosis," in Proc. of the 13th International Conference on Information Processing and Management of Uncertainty, Part II (Applications), ser. CCIS, E. Hüllermeier, R. Kruse, and F. Hoffmann, Eds. Dortmund, Germany: Springer, 2010, vol. 81, pp. 360–369.
[24] E. Lima, M. Hell, R. Ballini, and F. Gomide, "Evolving fuzzy modelling using participatory learning," in Evolving Intelligent Systems: Methodology and Applications, P. Angelov, D. Filev, and N. Kasabov, Eds. New York: John Wiley & Sons, 2010, pp. 67–86.
[25] E. Lughofer, "Single-pass active learning with conflict and ignorance," Evolving Systems, vol. 3(4), pp. 251–271, 2012.
[26] E. Lughofer and O. Buchtala, "Reliable all-pairs evolving fuzzy classifiers," IEEE Transactions on Fuzzy Systems, on-line and in press, 2013.
[27] M. Pratama, S. Anavatti, M. Garratt, and E. Lughofer, "Online identification of a complex multi-input-multi-output system based on GENEFIS," in Proceedings of the IEEE SSCI Conference (EAIS 2013 workshop), Singapore, 2013.
[28] M. Pratama, S. Anavatti, and E. Lughofer, "GENEFIS: Towards an effective localist network," IEEE Transactions on Fuzzy Systems, to appear.
[29] M. Pratama, S. Anavatti, P. Angelov, and E. Lughofer, "PANFIS: A novel incremental learning machine," submitted to IEEE Transactions on Neural Networks and Learning Systems, 18 Oct. 2012 (in revision, minor corrections).
[30] E. Lughofer, "A dynamic split-and-merge approach for evolving cluster models," Evolving Systems, vol. 3(3), pp. 135–151, 2012.
[31] Y. Xu, K. W. Wong, and C. S. Leung, "Generalized recursive least square to the training of neural network," IEEE Transactions on Neural Networks, vol. 17(1), pp. 19–34, 2006.
[32] E. Lughofer, J.-L. Bouchot, and A. Shaker, "On-line elimination of local redundancies in evolving fuzzy systems," Evolving Systems, vol. 2(3), pp. 380–387, 2011.
[33] R. R. Yager, "A model of participatory learning," IEEE Transactions on Systems, Man and Cybernetics, vol. 20(5), pp. 1229–1234, 1990.
[34] A. Asuncion and D. J. Newman, UCI Machine Learning Repository [Online]. School of Information and Computer Science, University of California, Irvine, 2007. Available: http://www.ics.uci.edu/~mlearn/MLRepository.html
[35] A. Bifet, G. Holmes, R. Kirkby, and B. Pfahringer, "MOA: Massive online analysis," Journal of Machine Learning Research, vol. 11, pp. 1601–1604, 2010.
[36] M. Stone, "Cross-validatory choice and assessment of statistical predictions," Journal of the Royal Statistical Society, vol. 36, pp. 111–147, 1974.
[37] R. J. Oentaryo, M. J. Er, L. San, L.-Y. Zhai, and X. Li, "Bayesian ART-based fuzzy inference system: A new approach to prognosis of machining process," in Proceedings of the IEEE Annual Conference of the Prognostics and Health Management Society, 2011.