
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 13, NO. 3, MAY 2002

Subsethood-Product Fuzzy Neural Inference System (SuPFuNIS) Sandeep Paul and Satish Kumar, Member, IEEE

Abstract—A new subsethood-product fuzzy neural inference system (SuPFuNIS) is presented in this paper. It has the flexibility to handle both numeric and linguistic inputs simultaneously. Numeric inputs are fuzzified by input nodes which act as tunable feature fuzzifiers. Rule based knowledge is easily translated directly into a network architecture. Connections in the network are represented by Gaussian fuzzy sets. The novelty of the model lies in a combination of tunable input feature fuzzifiers; fuzzy mutual subsethood-based activation spread in the network; use of the product operator to compute the extent of firing of a rule; and a volume-defuzzification process to produce a numeric output. Supervised gradient descent is employed to train the centers and spreads of individual fuzzy connections. A subsethood-based method for rule generation from the trained network is also suggested. SuPFuNIS can be applied in a variety of application domains. The model has been tested on Mackey–Glass time series prediction, Iris data classification, hepatitis medical diagnosis, and function approximation benchmark problems. We also use a standard truck backer-upper control problem to demonstrate how expert knowledge can be used to augment the network. The performance of SuPFuNIS compares excellently with various existing models.

Index Terms—Fuzzy mutual subsethood, fuzzy neural network, gradient descent learning, product conjunction, volume defuzzification.

I. INTRODUCTION

Manuscript received January 10, 2001; revised July 11, 2001. This work was supported by the Department of Science and Technology, Ministry of Science and Technology, New Delhi, under Research Grant III.5(142)-ET. S. Paul is with the Department of Electrical Engineering, D.E.I. Technical College, Dayalbagh Educational Institute, Dayalbagh, Agra 282005, India (e-mail: [email protected]). S. Kumar is with the Department of Physics and Computer Science, Faculty of Science, Dayalbagh Educational Institute, Dayalbagh, Agra 282005, India (e-mail: [email protected]). Publisher Item Identifier S 1045-9227(02)04432-6.

Integrated fuzzy neural models exploit parallel computation and demonstrate the ability to operate and adapt in both numeric and linguistic environments. Numerous examples of such synergistic models have been proposed in the literature [1]–[5]. These include models for approximate reasoning, inferencing, and control [6]–[15]; classification [16]–[18]; diagnosis [14], [19]; rule extraction from numerical training data [20]–[23]; and rule simplification and pruning [24]–[26]. Other fuzzy-neural networks that fuzzify standard neural network architectures include the fuzzy multilayer perceptron [27], [28]; models that utilize fuzzy teaching inputs with fuzzy weights in neural networks [29]–[32]; and evolvable neuro-fuzzy systems [33]–[37]. The development of fuzzy neural models has a common thread that derives from the desire to:

1) embed data-driven knowledge into a network architecture to facilitate fast learning;
2) design an appropriate composition and evidence aggregation mechanism that can simultaneously handle numeric and linguistic features in order to generate outputs or derive conclusions;
3) incorporate a mechanism for fine tuning rules by learning from numeric data;
4) extract and interpret the learned knowledge as a rule base.

Let us consider each of these points in greater detail.

Embedding Data Driven Knowledge: Most hybrid models embed data-driven or expert-derived knowledge in the form of fuzzy if–then rules, which are ultimately represented in a neural network framework [7], [9], [14]. This embedding of knowledge is often done by assuming that antecedent and consequent labels of standard fuzzy if–then rules are represented as connection weights of the network, as in [14], [29], [30]. It has been shown formally that knowledge-based networks require a relatively smaller training set size for better generalization [38]. When such rule based knowledge is extracted from numeric data, a common approach is to use either clustering or partitioning to derive the rules. Using clustering, the centers of fuzzy rules are initialized as cluster vectors extracted from the input data set [11], [16], [39], [40]. Subsequently, a learning algorithm fine tunes these rules based on the available training data that describes the problem. Partitioning techniques recursively divide the input–output cross space into finer regions depending upon a local mean-squared error estimate; each partition leads to an if–then rule [21, ch. 5], [41]. In each of these techniques, the selection of the number of rules to solve a problem is still more or less based on a heuristic approach.

Composition and Evidence Aggregation: The issue of composition of input information with the embedded rule base depends on whether the input feature is numeric or linguistic. With numeric inputs the usual way is to work with membership values computed from fuzzy membership functions that represent network weights [16], [22]. In order to handle fuzzy inputs, a given universe of discourse is generally quantized into prespecified fuzzy sets, and a fuzzy input is then simply one of these prespecified fuzzy sets [10], [27], [28]. Alternatively, there are models where interval arithmetic has been employed to handle such situations [29], [30], [42].

Learning: The third issue is commonly dealt with by using supervised gradient descent and its variants [9], [13], [16], [27]; unsupervised learning; reinforcement learning [8], [34], [43], [44]; heuristic methods [22]; or genetic algorithm based search [23], [35], [36].


Rule Interpretation: The final issue of extracting and interpreting the tuned fuzzy weights is dealt with by assigning each fuzzy weight a linguistic label, chosen on the basis of comparison with a set of fixed fuzzy sets using a similarity measure [24], [26], [45]. This helps in generating a rule base that is easily comprehensible.

In this paper we present the design of a fuzzy-neural network model that specifically addresses the following objectives:
1) to incorporate a mechanism that can handle numeric and linguistic inputs seamlessly;
2) to stress the economy of the number of parameters that a model employs to solve a particular problem;
3) to easily incorporate data-driven as well as expert knowledge in the generation of an initial set of if–then rules;
4) to have the system learn from numeric data in order to fine tune the set of if–then rules;
5) to be able to interpret a trained fuzzy-neural system.

The resulting Subsethood-Product Fuzzy Neural Inference System (SuPFuNIS) adequately addresses each of these issues. SuPFuNIS uses a standard fuzzy-neural network architecture that embeds fuzzy if–then rules as hidden nodes, rule antecedents as input-to-hidden connections, and rule consequents as hidden-to-output connections [16], [22]. Knowledge in the form of if–then rules derived from clustering numeric data is used to initialize the rules embedded in the network [11], [12], [39], [46]. However, SuPFuNIS differs from other fuzzy-neural network models on various counts.
1) It uses a tunable input fuzzifier that is responsible for fuzzification of numeric data. In other words, numeric inputs are fuzzified using a feature-specific Gaussian spread.
2) All information that propagates from the input layer is fuzzy. The model therefore uses a composition mechanism that employs a fuzzy mutual subsethood measure to define the activation that propagates to a rule node along a fuzzy connection.
3) The model aggregates activities at a rule node using a fuzzy inner product: a product of mutual subsethoods. This is different from the more common approach of using a fuzzy conjunction operator for activity aggregation.
4) Outputs are generated using volume defuzzification, which is a variant of the commonly employed centroidal defuzzification procedure.

As demonstrated in this paper, it is the combination of the above four mechanisms that lends the model its uniformly high performance and its high level of parameter economy.

Earlier variants of the proposed model, with applications in function approximation, inference, and classification, have been presented elsewhere [47]–[49]. In [47] a combination of weighted subsethood and a soft-minimum conjunction operator was employed. The model used a triangular approximation instead of Gaussian fuzzy weights for subsethood computation, and addressed applications in function approximation and inference. In [48], which extended [47] by increasing the number of free parameters, a simple heuristic to derive the number of rules using clustering was introduced. A combination of


mutual subsethood and a product conjunction operator with a nontunable feature fuzzifier was presented in [49]. The network in [49] uses Gaussian fuzzy weights and targets the classification problem domain.

SuPFuNIS also has a diversity of application domains. In support of our claims, SuPFuNIS is tested on five different applications: approximation of the nonlinear Mackey–Glass time series; Iris data classification; hepatitis medical diagnosis; function approximation; and a truck backer-upper control problem. For the Mackey–Glass time series approximation problem we also show the efficacy of employing cluster-based initialization, and subsethood-based interpretation of the learnt knowledge in the form of rules. The idea of seamlessly presenting mixed linguistic–numeric inputs to the model is exemplified in the hepatitis diagnostic problem. The ease with which expert knowledge can be incorporated into SuPFuNIS is demonstrated on the truck backer-upper application, where we show the effect of using a network trained using only numeric data and compare it with a network trained on a reduced training set but augmented with expert knowledge. All the applications demonstrate the high performance of the SuPFuNIS model.

The organization of the paper is as follows: Section II provides the operational details of SuPFuNIS; Section III details the supervised learning of the model; Section IV presents four applications of SuPFuNIS: time series prediction, classification, diagnosis, and function approximation. In Section V we discuss the issue of rule interpretation, and Section VI shows the effectiveness of SuPFuNIS in working with a numerically trained network augmented with linguistic knowledge. Finally, Section VII concludes the paper.

II. ARCHITECTURE AND OPERATIONAL DETAILS

The proposed SuPFuNIS model directly embeds fuzzy rules of the form

If $x_1$ is LOW and $x_2$ is HIGH then $y$ is MEDIUM $\qquad$ (1)

where LOW, MEDIUM, and HIGH are fuzzy sets defined, respectively, on input or output universes of discourse (UODs). Input nodes represent domain variables or features, and output nodes represent target variables or classes. Each hidden node represents a rule, and input–hidden node connections represent fuzzy rule antecedents. Each hidden–output node connection represents a fuzzy rule consequent. Fuzzy sets corresponding to linguistic labels of fuzzy if–then rules (such as LOW, MEDIUM, and HIGH) are defined on input and output UODs and are represented by symmetric Gaussian membership functions specified by a center and a spread. A fuzzy weight from input node $i$ to rule node $j$ is thus modeled by the center $c_{ij}$ and spread $\sigma_{ij}$ of a Gaussian fuzzy set and denoted $w_{ij} = (c_{ij}, \sigma_{ij})$. In a similar fashion, a consequent fuzzy weight from rule node $j$ to output node $k$ is denoted $v_{jk} = (c_{jk}, \sigma_{jk})$. Data-driven knowledge in the form of fuzzy if–then rules is translated directly into a network architecture as shown in Fig. 1.

SuPFuNIS can simultaneously admit numeric as well as fuzzy inputs. Numeric inputs are first fuzzified so that all inputs to the network are uniformly fuzzy. Since the antecedent weights are also fuzzy, this requires the adoption of


Fig. 1. Architecture of the SuPFuNIS model.


Fig. 2. (a) An example of prespecified fuzzy sets for fuzzy inputs. (b) An example of fuzzification of numeric input by a tunable fuzzy set.

a method to transmit a fuzzy signal along a fuzzy weight. In conventional neural networks numeric inputs are scaled by the weight directly and these scaled values are aggregated as the activation of a node using simple summation. In the SuPFuNIS model signal transmission along the fuzzy weight is handled by calculating the mutual subsethood (detailed in Section II-B). We now proceed to discuss these issues in detail.

A. Signal Transmission at Input Nodes

Since the input feature vector $x = (x_1, \ldots, x_n)$ can comprise either numeric or linguistic values, there are two kinds of nodes in the input layer. Linguistic nodes accept a linguistic input represented by a fuzzy set with a Gaussian membership function modeled by a center $c_i$ and spread $\sigma_i$. These linguistic inputs can be drawn from prespecified fuzzy sets, as shown in Fig. 2(a), where three Gaussian fuzzy sets have been defined on the UOD [-1, 1]. Thus, a linguistic input is represented by the pair $(c_i, \sigma_i)$. This is also the signal transmitted out of the linguistic node, since no transformation of inputs takes place at these nodes in the input layer.

Numeric nodes are tunable feature-specific fuzzifiers. They accept numeric inputs and fuzzify them using Gaussian fuzzy sets: the numeric input $x_i$ is fuzzified by treating it as the center of a Gaussian membership function with a tunable spread $\sigma_i$. This is shown in Fig. 2(b), where a numeric feature value of 0.25 has been fuzzified into a Gaussian membership function centered at 0.25 with spread 0.5. The Gaussian shape is chosen to match the Gaussian shape of the weight fuzzy sets, since this facilitates the subsethood calculations detailed in Section II-B. Therefore, the signal transmitted from a numeric node of the input layer is also represented by a center-spread pair $(x_i, \sigma_i)$. These fuzzy signals, from numeric or linguistic inputs alike, are transmitted to hidden rule nodes through fuzzy weights $w_{ij} = (c_{ij}, \sigma_{ij})$ that correspond to rule antecedents.
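To make the fuzzifier concrete, the following sketch (Python; the function and variable names are ours, purely illustrative) reproduces the example of Fig. 2(b): the numeric value 0.25 becomes the center of a Gaussian fuzzy set whose feature-specific spread, here 0.5, remains tunable during training.

```python
import numpy as np

def gaussian_mf(x, center, spread):
    """Symmetric Gaussian membership function exp(-((x - c) / sigma)^2)."""
    return np.exp(-((x - center) / spread) ** 2)

# Fuzzify the numeric input of Fig. 2(b): the value becomes the center of a
# Gaussian fuzzy set; the feature-specific spread is tunable during training.
numeric_input = 0.25
feature_spread = 0.5
xs = np.linspace(-1.0, 1.0, 201)
membership = gaussian_mf(xs, numeric_input, feature_spread)
```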

B. Mutual Subsethood

Since both the signal and the weight are fuzzy sets represented by Gaussian membership functions, we intuitively seek to quantify the net value of the signal transmitted along the weight by the extent of overlap between the two fuzzy sets. This is measured by their mutual subsethood, which is introduced next.

Consider two fuzzy sets $A$ and $B$ described by Gaussian membership functions with centers $c_1$, $c_2$ and spreads $\sigma_1$, $\sigma_2$, respectively:

$$a(x) = e^{-((x - c_1)/\sigma_1)^2} \qquad (2)$$

$$b(x) = e^{-((x - c_2)/\sigma_2)^2} \qquad (3)$$

The cardinality $C(A)$ of fuzzy set $A$ is then defined by

$$C(A) = \int_{-\infty}^{\infty} a(x)\,dx = \int_{-\infty}^{\infty} e^{-((x - c_1)/\sigma_1)^2}\,dx \qquad (4)$$


Fig. 3. Four cases of overlap depending upon the relative values of $c_1$, $c_2$, $\sigma_1$, and $\sigma_2$. Case 1: (a) $c_1 = c_2$ and $\sigma_1 > \sigma_2$; (b) $c_1 = c_2$ and $\sigma_1 = \sigma_2$. Case 2: (c) $c_1 > c_2$ and $\sigma_1 = \sigma_2$; (d) $c_1 < c_2$ and $\sigma_1 = \sigma_2$. Case 3: (e) $c_1 > c_2$ and $\sigma_1 > \sigma_2$; (f) $c_1 < c_2$ and $\sigma_1 > \sigma_2$. Case 4: (g) $c_1 > c_2$ and $\sigma_1 < \sigma_2$; (h) $c_1 < c_2$ and $\sigma_1 < \sigma_2$.

Then the mutual subsethood $\mathcal{E}(A, B)$ [21, ch. 13] measures the degree to which fuzzy set $A$ equals fuzzy set $B$:

$$\mathcal{E}(A, B) = \text{Degree}(A \subseteq B \text{ and } B \subseteq A) \qquad (5)$$

and can be formulated as

$$\mathcal{E}(A, B) = \frac{C(A \cap B)}{C(A) + C(B) - C(A \cap B)} \qquad (6)$$

The mutual subsethood measure has values in the interval [0, 1] that depend on the relative values of the centers and spreads of fuzzy sets $A$ and $B$. Four different cases of overlap can arise:
• Case 1: $c_1 = c_2$, with any values of $\sigma_1$ and $\sigma_2$.
• Case 2: $c_1 \ne c_2$ and $\sigma_1 = \sigma_2$.
• Case 3: $c_1 \ne c_2$ and $\sigma_1 > \sigma_2$.
• Case 4: $c_1 \ne c_2$ and $\sigma_1 < \sigma_2$.

These four cases are portrayed in Fig. 3. Notice that in Case 1 the two fuzzy sets do not cross over: either one fuzzy set belongs completely to the other, or the two fuzzy sets are identical. In Case 2 there is exactly one crossover point, whereas in Cases 3 and 4 there are exactly two crossover points. To calculate the crossover points we set $a(x) = b(x)$ to obtain the equal-valued points. This yields the two crossover points

$$h_1 = \frac{c_1 \sigma_2 + c_2 \sigma_1}{\sigma_1 + \sigma_2} \qquad (7)$$

$$h_2 = \frac{c_1 \sigma_2 - c_2 \sigma_1}{\sigma_2 - \sigma_1} \qquad (8)$$

The cardinality $C(A \cap B)$ can then be evaluated in terms of $h_1$ and $h_2$ and used in (6). In subsequent sections, to facilitate the evaluation of cardinalities, we express them in terms of the standard error function

$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt \qquad (9)$$

which has limiting values $\operatorname{erf}(\infty) = 1$ and $\operatorname{erf}(-\infty) = -1$.
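Since the case-wise erf expressions derived next can be tedious to check by hand, a brute-force numerical rendering of (6) is useful as a cross-check. The sketch below (Python; our own helper, not part of the original model) integrates the Gaussian membership functions directly.

```python
import numpy as np

def mutual_subsethood(c1, s1, c2, s2, lo=-20.0, hi=20.0, n=40001):
    """Mutual subsethood of eq. (6): C(A ∩ B) / (C(A) + C(B) - C(A ∩ B)),
    evaluated by numerical integration instead of the case-wise erf forms."""
    x = np.linspace(lo, hi, n)
    a = np.exp(-((x - c1) / s1) ** 2)
    b = np.exp(-((x - c2) / s2) ** 2)
    card_a = np.trapz(a, x)                   # analytically s1 * sqrt(pi)
    card_b = np.trapz(b, x)                   # analytically s2 * sqrt(pi)
    card_ab = np.trapz(np.minimum(a, b), x)   # cardinality of the intersection
    return card_ab / (card_a + card_b - card_ab)

print(mutual_subsethood(0.0, 1.0, 0.0, 1.0))  # identical sets -> 1.0
print(mutual_subsethood(0.0, 1.0, 3.0, 0.5))  # weak overlap  -> close to 0
```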


Fig. 4. Fuzzy signal transmission.

C. Mutual Subsethood Based Signal Transmission

As shown schematically in Fig. 4, SuPFuNIS transmits a fuzzy signal from an input node along a fuzzy weight that represents an antecedent connection. The transmitted signal is quantified by $E_{ij}$, which denotes the mutual subsethood between the fuzzy signal $s_i = (c_s, \sigma_s)$ and the fuzzy weight $w_{ij} = (c_w, \sigma_w)$, and is computed using (6). Symbolically, for a signal $s_i$ (generated from either a numeric or a linguistic input node) and a fuzzy weight $w_{ij}$, the mutual subsethood is defined as

$$E_{ij} = \mathcal{E}(s_i, w_{ij}) = \frac{C(s_i \cap w_{ij})}{C(s_i) + C(w_{ij}) - C(s_i \cap w_{ij})} \qquad (10)$$

The derivations of the expressions for $C(s_i \cap w_{ij})$ for each of the four cases identified above are given below.

Case 1 ($c_s = c_w$): If $\sigma_s < \sigma_w$, the signal fuzzy set completely belongs to the weight fuzzy set [as portrayed in Fig. 3(a)] and the cardinality is

$$C(s_i \cap w_{ij}) = \int_{-\infty}^{\infty} e^{-((x - c_s)/\sigma_s)^2}\,dx = \sigma_s \sqrt{\pi} \qquad (11)$$

Similarly, if $\sigma_s > \sigma_w$, then $C(s_i \cap w_{ij}) = \sigma_w \sqrt{\pi}$. If $\sigma_s = \sigma_w$, the two fuzzy sets are identical [as portrayed in Fig. 3(b)]. Summarizing these three subcases,

$$C(s_i \cap w_{ij}) = \begin{cases} \sigma_s \sqrt{\pi} & \text{if } \sigma_s < \sigma_w \\ \sigma_w \sqrt{\pi} & \text{if } \sigma_s > \sigma_w \\ \sigma_s \sqrt{\pi} = \sigma_w \sqrt{\pi} & \text{if } \sigma_s = \sigma_w \end{cases} \qquad (12)$$

Case 2 ($c_s \ne c_w$ and $\sigma_s = \sigma_w = \sigma$): In this case there will be exactly one crossover point $h_1 = (c_s + c_w)/2$, as shown in Fig. 3(c) and (d). Assuming $c_s > c_w$ [Fig. 3(c)], the cardinality can be evaluated as

$$C(s_i \cap w_{ij}) = \int_{-\infty}^{h_1} e^{-((x - c_s)/\sigma)^2}\,dx + \int_{h_1}^{\infty} e^{-((x - c_w)/\sigma)^2}\,dx = \frac{\sigma\sqrt{\pi}}{2}\left[1 + \operatorname{erf}\!\left(\frac{h_1 - c_s}{\sigma}\right)\right] + \frac{\sigma\sqrt{\pi}}{2}\left[1 - \operatorname{erf}\!\left(\frac{h_1 - c_w}{\sigma}\right)\right] \qquad (13)$$

If $c_s < c_w$ [Fig. 3(d)], the expression for the cardinality is

$$C(s_i \cap w_{ij}) = \frac{\sigma\sqrt{\pi}}{2}\left[1 + \operatorname{erf}\!\left(\frac{h_1 - c_w}{\sigma}\right)\right] + \frac{\sigma\sqrt{\pi}}{2}\left[1 - \operatorname{erf}\!\left(\frac{h_1 - c_s}{\sigma}\right)\right] \qquad (14)$$

Case 3 ($c_s \ne c_w$ and $\sigma_s > \sigma_w$): In this case there will be two crossover points $h_1$ and $h_2$, as calculated in (7) and (8) [see Fig. 3(e) and (f)]. Assuming $c_s > c_w$, so that $h_2 < h_1$ [Fig. 3(e)], the narrower weight set lies below the signal set in the tails and above it between the crossover points, and the cardinality can be evaluated as

$$C(s_i \cap w_{ij}) = \int_{-\infty}^{h_2} e^{-((x - c_w)/\sigma_w)^2}\,dx + \int_{h_2}^{h_1} e^{-((x - c_s)/\sigma_s)^2}\,dx + \int_{h_1}^{\infty} e^{-((x - c_w)/\sigma_w)^2}\,dx$$
$$= \frac{\sigma_w\sqrt{\pi}}{2}\left[1 + \operatorname{erf}\!\left(\frac{h_2 - c_w}{\sigma_w}\right)\right] + \frac{\sigma_s\sqrt{\pi}}{2}\left[\operatorname{erf}\!\left(\frac{h_1 - c_s}{\sigma_s}\right) - \operatorname{erf}\!\left(\frac{h_2 - c_s}{\sigma_s}\right)\right] + \frac{\sigma_w\sqrt{\pi}}{2}\left[1 - \operatorname{erf}\!\left(\frac{h_1 - c_w}{\sigma_w}\right)\right] \qquad (15)$$

If $c_s < c_w$ [Fig. 3(f)], the expression for the cardinality takes the same form with the roles of $h_1$ and $h_2$ interchanged.

Case 4 ($c_s \ne c_w$ and $\sigma_s < \sigma_w$): This case is similar to Case 3, and once again there will be two crossover points $h_1$ and $h_2$ as calculated in (7) and (8). Assuming $c_s > c_w$, so that $h_1 < h_2$ [Fig. 3(g)], the cardinality can be evaluated as

$$C(s_i \cap w_{ij}) = \int_{-\infty}^{h_1} e^{-((x - c_s)/\sigma_s)^2}\,dx + \int_{h_1}^{h_2} e^{-((x - c_w)/\sigma_w)^2}\,dx + \int_{h_2}^{\infty} e^{-((x - c_s)/\sigma_s)^2}\,dx$$
$$= \frac{\sigma_s\sqrt{\pi}}{2}\left[1 + \operatorname{erf}\!\left(\frac{h_1 - c_s}{\sigma_s}\right)\right] + \frac{\sigma_w\sqrt{\pi}}{2}\left[\operatorname{erf}\!\left(\frac{h_2 - c_w}{\sigma_w}\right) - \operatorname{erf}\!\left(\frac{h_1 - c_w}{\sigma_w}\right)\right] + \frac{\sigma_s\sqrt{\pi}}{2}\left[1 - \operatorname{erf}\!\left(\frac{h_2 - c_s}{\sigma_s}\right)\right] \qquad (16)$$


If $c_s < c_w$ [Fig. 3(h)], the expression for the cardinality takes the same form as (16) with the roles of $h_1$ and $h_2$ interchanged. The corresponding expressions for $E_{ij}$ are obtained by substituting for $C(s_i \cap w_{ij})$ from (12)–(16) into (10).

D. Activity Aggregation at Rule Nodes With a Fuzzy Inner Product

By measuring all the values of mutual subsethood for a rule node, we are in essence assessing the compatibility between the linguistic signal vector (transmitted from the input layer) and the fuzzy weight vector that fans in to rule node $j$. Each rule node is expected to aggregate this vector in such a way that the resulting node activation reflects this compatibility. In other words, the extent of rule firing, as represented by the rule node activation, measures the extent to which the corresponding linguistic input matches the antecedent of the rule in question. We replace the min operator commonly used in fuzzy systems with the standard product operator to aggregate activities at a rule node. The activation $z_j$ of rule node $j$ is thus a mutual subsethood based product, the differentiability of which allows the model to employ gradient descent based learning in a straightforward way.

In summary, the net activation $z_j$ of rule node $j$ is a product of all mutual subsethoods, the fuzzy inner product

$$z_j = \prod_{i=1}^{n} E_{ij} \qquad (17)$$

Notice that the inner product in (17) (and thus the rule node activation function) exhibits the following properties: it is bounded between zero and one, monotonic increasing, continuous, symmetric, and nonidempotent. The use of such a fuzzy inner product of subsethoods lends novelty to SuPFuNIS. The behavior of the product aggregation operator has been discussed at length in [49], where we pointed out that the product operator does not ignore information regarding the dimension of the input, as the min operator does [21]. It provides a better estimate of the joint strength of the various inputs. Also, over a wide range of spreads, the product operator is able to clearly differentiate between inputs that are similar to the weight vector and inputs that are dissimilar from it. In other words, the product operator is capable of better discrimination than the min operator. We believe that this is an important contributing factor to the high performance and economy of SuPFuNIS networks.

The signal function for a rule node is linear,

$$S(z_j) = z_j \qquad (18)$$

and numeric activation values are transmitted unchanged to consequent connections.

E. Output Layer Signal Computation

The signal of each output node is determined using standard volume based centroid defuzzification [21]. The term volume is used in a general sense, so as to include multidimensional functions; for two-dimensional functions the volume reduces to area. If $z_j$ is the activation of rule node $j$, the $V_{jk}$ denote consequent set volumes, and the $a_{jk}$ are weights that scale the $V_{jk}$, then the general expression of defuzzification for output node $k$ is

$$y_k = \frac{\sum_{j=1}^{q} z_j a_{jk} V_{jk} c_{jk}}{\sum_{j=1}^{q} z_j a_{jk} V_{jk}} \qquad (19)$$

where $q$ is the number of rule nodes. In our case the volume is simply the area of the consequent weights, which are represented by Gaussian fuzzy sets: $V_{jk} = \sigma_{jk}\sqrt{\pi}$. If the weights $a_{jk}$ are considered to be unity, as we do in this paper, then the signal of output node $k$ is

$$y_k = \frac{\sum_{j=1}^{q} z_j \sigma_{jk} c_{jk}}{\sum_{j=1}^{q} z_j \sigma_{jk}} \qquad (20)$$

Note that with the substitutions

$$h_{jk} = \frac{z_j \sigma_{jk}}{\sum_{l=1}^{q} z_l \sigma_{lk}} \qquad (21)$$

(20) can be simplified to

$$y_k = \sum_{j=1}^{q} h_{jk} c_{jk} \qquad (22)$$

where the coefficients $h_{jk}$ are normalized and sum to one. The defuzzifier (20) thus essentially computes a convex sum of consequent set centers. This completes our discussion of how inputs are mapped to outputs in SuPFuNIS.
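A minimal numerical sketch of the complete forward pass, combining (10), (17), and (20), is given below. It is illustrative Python with invented array shapes and names; the intersection cardinality is computed by brute-force integration rather than the case-wise expressions of Section II-C.

```python
import numpy as np

SQRT_PI = np.sqrt(np.pi)

def forward(sig_c, sig_s, w_c, w_s, v_c, v_s):
    """Map fuzzified inputs to a numeric output for one output node.

    sig_c, sig_s : (n,)   centers/spreads of the n fuzzy input signals
    w_c, w_s     : (n, q) antecedent weight centers/spreads for q rules
    v_c, v_s     : (q,)   consequent weight centers/spreads
    """
    x = np.linspace(-20.0, 20.0, 8001)
    E = np.empty(w_c.shape)
    for i in range(w_c.shape[0]):
        s = np.exp(-((x - sig_c[i]) / sig_s[i]) ** 2)
        for j in range(w_c.shape[1]):
            w = np.exp(-((x - w_c[i, j]) / w_s[i, j]) ** 2)
            c_sw = np.trapz(np.minimum(s, w), x)           # C(s ∩ w)
            E[i, j] = c_sw / ((sig_s[i] + w_s[i, j]) * SQRT_PI - c_sw)  # eq. (10)
    z = E.prod(axis=0)                                     # eq. (17): fuzzy inner product
    return float((z * v_s * v_c).sum() / (z * v_s).sum())  # eq. (20): volume defuzzification

# Example: two inputs, three rules, randomly chosen parameters.
rng = np.random.default_rng(0)
y = forward(np.array([0.2, 0.8]), np.array([0.5, 0.5]),
            rng.uniform(0, 1, (2, 3)), rng.uniform(0.2, 0.9, (2, 3)),
            rng.uniform(0, 1, 3), rng.uniform(0.2, 0.9, 3))
```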

III. SUPERVISED LEARNING

The SuPFuNIS network is trained by supervised learning. This involves repeated presentation of a set of input patterns drawn from the training set. The output of the network is compared with the desired value to obtain the error, and network weights are changed on the basis of an error-minimizing criterion. Once the network is trained to the desired level of error, it is tested by presenting a new set of input patterns drawn from the test set.

A. Iterative Update Equations

Learning is incorporated into the SuPFuNIS model using the gradient descent method. A squared error criterion is used as a training performance parameter. The squared error $e(t)$ at iteration $t$ is computed in the standard way:

$$e(t) = \frac{1}{2} \sum_{k} \left(d_k(t) - y_k(t)\right)^2 \qquad (23)$$


where $d_k$ is the desired value at output node $k$, and the error is evaluated over all outputs for a specific pattern. For a one-of-$c$ class classification the desired outputs will be zero or one. Both the centers $c_{ij}$, $c_{jk}$ and spreads $\sigma_{ij}$, $\sigma_{jk}$ of antecedent and consequent connections, and the spreads $\sigma_i$ of the input features, are modified on the basis of update equations that take on the form

$$c_{ij}(t+1) = c_{ij}(t) + \Delta c_{ij}(t) \qquad (24)$$
$$\sigma_{ij}(t+1) = \sigma_{ij}(t) + \Delta \sigma_{ij}(t) \qquad (25)$$
$$c_{jk}(t+1) = c_{jk}(t) + \Delta c_{jk}(t) \qquad (26)$$
$$\sigma_{jk}(t+1) = \sigma_{jk}(t) + \Delta \sigma_{jk}(t) \qquad (27)$$
$$\sigma_{i}(t+1) = \sigma_{i}(t) + \Delta \sigma_{i}(t) \qquad (28)$$

where the weight changes follow gradient descent with momentum

$$\Delta c_{ij}(t) = -\eta\,\frac{\partial e}{\partial c_{ij}} + \alpha\,\Delta c_{ij}(t-1) \qquad (29)$$
$$\Delta \sigma_{ij}(t) = -\eta\,\frac{\partial e}{\partial \sigma_{ij}} + \alpha\,\Delta \sigma_{ij}(t-1) \qquad (30)$$
$$\Delta c_{jk}(t) = -\eta\,\frac{\partial e}{\partial c_{jk}} + \alpha\,\Delta c_{jk}(t-1) \qquad (31)$$
$$\Delta \sigma_{jk}(t) = -\eta\,\frac{\partial e}{\partial \sigma_{jk}} + \alpha\,\Delta \sigma_{jk}(t-1) \qquad (32)$$
$$\Delta \sigma_{i}(t) = -\eta\,\frac{\partial e}{\partial \sigma_{i}} + \alpha\,\Delta \sigma_{i}(t-1) \qquad (33)$$

where $\eta$ is the learning rate and $\alpha$ is the momentum parameter.
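The overall shape of one training iteration can be sketched as follows. This is a simplified illustration, not the paper's implementation: forward-difference gradients stand in for the closed-form derivatives (34)–(59) derived next, momentum is omitted, and `model` is assumed to be any callable implementing the forward pass of Section II.

```python
import numpy as np

def train_step(params, patterns, targets, model, lr=0.1, eps=1e-5):
    """One gradient-descent iteration on the squared error of eq. (23).

    `params` is a flat vector of all centers and spreads; `model(params, x)`
    is assumed to return the network output for input pattern x.
    """
    def sq_error(p):
        return 0.5 * sum((d - model(p, x)) ** 2
                         for x, d in zip(patterns, targets))   # eq. (23)

    base = sq_error(params)
    grad = np.zeros_like(params)
    for k in range(params.size):
        bumped = params.copy()
        bumped[k] += eps
        grad[k] = (sq_error(bumped) - base) / eps   # numeric stand-in gradient
    return params - lr * grad
```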

B. Evaluation of Partial Derivatives

The expressions for the partial derivatives required in these update equations are derived as follows. For the error derivative with respect to the consequent centers,

$$\frac{\partial e}{\partial c_{jk}} = -(d_k - y_k)\frac{\partial y_k}{\partial c_{jk}} = -(d_k - y_k)\,\frac{z_j \sigma_{jk}}{\sum_{l=1}^{q} z_l \sigma_{lk}} \qquad (34)$$

and the error derivative with respect to the consequent spreads is

$$\frac{\partial e}{\partial \sigma_{jk}} = -(d_k - y_k)\frac{\partial y_k}{\partial \sigma_{jk}} = -(d_k - y_k)\,\frac{z_j (c_{jk} - y_k)}{\sum_{l=1}^{q} z_l \sigma_{lk}} \qquad (35)$$

The error derivatives with respect to antecedent centers and spreads involve subsethood derivatives in the chain and are somewhat more involved to evaluate. Specifically, the error derivative chains with respect to antecedent centers and spreads are, respectively,

$$\frac{\partial e}{\partial c_{ij}} = \sum_{k} \frac{\partial e}{\partial y_k}\,\frac{\partial y_k}{\partial z_j}\,\frac{\partial z_j}{\partial E_{ij}}\,\frac{\partial E_{ij}}{\partial c_{ij}} \qquad (36)$$

$$\frac{\partial e}{\partial \sigma_{ij}} = \sum_{k} \frac{\partial e}{\partial y_k}\,\frac{\partial y_k}{\partial z_j}\,\frac{\partial z_j}{\partial E_{ij}}\,\frac{\partial E_{ij}}{\partial \sigma_{ij}} \qquad (37)$$

and the error derivative chain with respect to the input feature spreads is

$$\frac{\partial e}{\partial \sigma_{i}} = \sum_{j} \sum_{k} \frac{\partial e}{\partial y_k}\,\frac{\partial y_k}{\partial z_j}\,\frac{\partial z_j}{\partial E_{ij}}\,\frac{\partial E_{ij}}{\partial \sigma_{i}} \qquad (38)$$

where $\partial e / \partial y_k = -(d_k - y_k)$,

$$\frac{\partial y_k}{\partial z_j} = \frac{\sigma_{jk}(c_{jk} - y_k)}{\sum_{l=1}^{q} z_l \sigma_{lk}} \qquad (39)$$

and

$$\frac{\partial z_j}{\partial E_{ij}} = \prod_{l \ne i} E_{lj} \qquad (40)$$

The expressions for the antecedent connection mutual subsethood partial derivatives $\partial E_{ij}/\partial c_{ij}$, $\partial E_{ij}/\partial \sigma_{ij}$, and $\partial E_{ij}/\partial \sigma_i$ are obtained by differentiating (10) with respect to $c_{ij}$, $\sigma_{ij}$, and $\sigma_i$, as shown in (41)–(43). In (41)–(43), the derivatives $\partial C(s_i \cap w_{ij})/\partial c_{ij}$, $\partial C(s_i \cap w_{ij})/\partial \sigma_{ij}$, and $\partial C(s_i \cap w_{ij})/\partial \sigma_i$ depend on the nature of the overlap of the input feature fuzzy set and the weight fuzzy set, i.e., upon the values of $c_s$, $c_w$, $\sigma_s$, and $\sigma_w$. Case-wise expressions therefore need to be derived as follows.

Case 1 ($c_s = c_w$): As is evident from (12), $C(s_i \cap w_{ij})$ is independent of $c_{ij}$, and therefore

$$\frac{\partial C(s_i \cap w_{ij})}{\partial c_{ij}} = 0 \qquad (44)$$

Differentiating (12) with respect to $\sigma_{ij}$ we have

$$\frac{\partial C(s_i \cap w_{ij})}{\partial \sigma_{ij}} = \begin{cases} 0 & \text{if } \sigma_s < \sigma_w \\ \sqrt{\pi} & \text{if } \sigma_s > \sigma_w \end{cases} \qquad (45)$$

and differentiating (12) with respect to $\sigma_i$ we have

$$\frac{\partial C(s_i \cap w_{ij})}{\partial \sigma_i} = \begin{cases} \sqrt{\pi} & \text{if } \sigma_s < \sigma_w \\ 0 & \text{if } \sigma_s > \sigma_w \end{cases} \qquad (46)$$


Case 2 ($c_s \ne c_w$ and $\sigma_s = \sigma_w$): When $c_s > c_w$, the derivatives $\partial C(s_i \cap w_{ij})/\partial c_{ij}$, $\partial C(s_i \cap w_{ij})/\partial \sigma_{ij}$, and $\partial C(s_i \cap w_{ij})/\partial \sigma_i$ are derived by differentiating (13), yielding (47)–(49). When $c_s < c_w$, they are derived by differentiating (14), yielding (50)–(52).

Case 3 ($c_s \ne c_w$ and $\sigma_s > \sigma_w$): Once again two subcases arise, as in Case 2. When $c_s > c_w$, the derivatives are obtained by differentiating (15), as shown in (53)–(55).

The mutual subsethood derivatives (41)–(43) referred to above follow from the quotient rule applied to (10). Since $C(s_i) = \sigma_s\sqrt{\pi}$ and $C(w_{ij}) = \sigma_w\sqrt{\pi}$,

$$\frac{\partial E_{ij}}{\partial c_{ij}} = \frac{(\sigma_s + \sigma_w)\sqrt{\pi}\;\dfrac{\partial C(s_i \cap w_{ij})}{\partial c_{ij}}}{\left[(\sigma_s + \sigma_w)\sqrt{\pi} - C(s_i \cap w_{ij})\right]^2} \qquad (41)$$

$$\frac{\partial E_{ij}}{\partial \sigma_{ij}} = \frac{(\sigma_s + \sigma_w)\sqrt{\pi}\;\dfrac{\partial C(s_i \cap w_{ij})}{\partial \sigma_{ij}} - \sqrt{\pi}\,C(s_i \cap w_{ij})}{\left[(\sigma_s + \sigma_w)\sqrt{\pi} - C(s_i \cap w_{ij})\right]^2} \qquad (42)$$

and

$$\frac{\partial E_{ij}}{\partial \sigma_{i}} = \frac{(\sigma_s + \sigma_w)\sqrt{\pi}\;\dfrac{\partial C(s_i \cap w_{ij})}{\partial \sigma_{i}} - \sqrt{\pi}\,C(s_i \cap w_{ij})}{\left[(\sigma_s + \sigma_w)\sqrt{\pi} - C(s_i \cap w_{ij})\right]^2} \qquad (43)$$


Whether $c_s > c_w$ or $c_s < c_w$, identical expressions for $\partial C(s_i \cap w_{ij})/\partial c_{ij}$ are obtained in Case 3 (56), and the expressions for $\partial C(s_i \cap w_{ij})/\partial \sigma_{ij}$ and $\partial C(s_i \cap w_{ij})/\partial \sigma_i$ remain the same as (54) and (55), respectively.

Case 4 ($c_s \ne c_w$ and $\sigma_s < \sigma_w$): When $c_s > c_w$, the derivatives $\partial C(s_i \cap w_{ij})/\partial c_{ij}$, $\partial C(s_i \cap w_{ij})/\partial \sigma_{ij}$, and $\partial C(s_i \cap w_{ij})/\partial \sigma_i$ are derived by differentiating (16), yielding (57)–(59). Similarly, if $c_s < c_w$, the expressions for the three derivatives are the same as (57)–(59), respectively. Thus, whether $c_s > c_w$ or $c_s < c_w$, Case 4 yields identical derivative expressions.

IV. APPLICATIONS

The SuPFuNIS model finds application in a variety of domains. In this section, we compare and contrast the performance of SuPFuNIS with other models on four applications: Mackey–Glass time series approximation, Iris data classification, hepatitis disease diagnosis, and a function approximation problem. We deal with the Mackey–Glass series in greater detail than the others, to highlight important behavioral properties of the model that carry over to the other problems, for which we report only final results.

A. Mackey–Glass Time Series Prediction

Nonlinear dynamical time series modeling is a central problem in different disciplines, such as economics, forecasting, planning, and control. In this paper we consider a benchmark chaotic time series first investigated by Mackey and Glass [50], which is a widely investigated problem in the fuzzy-neural


domain [9], [13], [36], [39], [51]–[53]. The series is generated by the following delay differential equation:


TABLE I rmse OF SuPFuNIS FOR MACKEY–GLASS TIME SERIES OBTAINED AFTER 500 EPOCHS FOR DIFFERENT RULE COUNTS

$$\frac{dx(t)}{dt} = \frac{0.2\,x(t-\tau)}{1 + x^{10}(t-\tau)} - 0.1\,x(t) \qquad (60)$$

As $\tau$ in (60) is varied, the system can exhibit either fixed point, limit cycle, or chaotic behavior. For $\tau = 17$ the system exhibits chaotic behavior, and we attempt the problem of approximating the time series function of (60) for this value of $\tau$.

In the present context, the time series prediction problem involves predicting a future value $x(t+P)$ ($P$ being the prediction time step) based on a set of values of $x$ at times less than $t$. The standard method for this type of prediction is to create a mapping from $m$ points of the time series, spaced $\Delta$ apart, to a predicted future value $x(t+P)$. To facilitate comparison with earlier work we use $m = 4$, $\Delta = 6$, and $P = 6$. The goal then is to use a fuzzy neural model to construct a function $f$ as follows:

$$x(t+6) = f\left(x(t), x(t-6), x(t-12), x(t-18)\right) \qquad (61)$$

For the purpose of training and testing the generalization ability of the model, a data set was generated using the Runge–Kutta procedure applied to (60) with time step 0.1 and an initial condition $x(0) = 1.2$. From the Mackey–Glass time series $x(t)$ generated by the above procedure, we extracted 1000 input–output data pairs of the following format from $t = 118$ to 1117:

$$\left[x(t-18),\; x(t-12),\; x(t-6),\; x(t);\; x(t+6)\right] \qquad (62)$$

where the first four values are the inputs to the system and the last value is the desired output. The first 500 pairs were used as training data, and the second 500 pairs were employed as the test set. Training involves sequential presentation of data pairs and standard application of the batch mode gradient descent learning procedure.

The number of free parameters that SuPFuNIS employs is straightforward to calculate: one spread for each numeric input, plus a center and a spread for each antecedent and consequent connection of a rule. For the Mackey–Glass time series application SuPFuNIS employs a 4-$q$-1 network architecture, where $q$ is the number of rule nodes. Since each rule has four antecedents and one consequent, a $q$-rule SuPFuNIS system will have $10q + 4$ free parameters.

Experiments were conducted on SuPFuNIS to test its performance with and without data driven initialization, steadily increasing the number of rules from three to ten. For the simulation results of Table I, the rule base was initialized in two ways:
1) randomizing weights (centers in the interval [0, 1.5] and spreads in the interval [0.2, 0.9]);
2) using fuzzy $c$-means clustering in conjunction with the Xie–Beni cluster validity measure. Here, given $q$ rules, 1000 randomly generated sets of $q$ clusters each were evaluated using the Xie–Beni cluster validity measure, and the best cluster structure was then selected for initialization. Details of this procedure are provided in Appendix I.
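A sketch of the data-generation step is given below (Python; our own rendering). It integrates (60) by fourth-order Runge–Kutta with the delayed term held fixed within each step, a simplification that is adequate for reproducing this benchmark, and then assembles the 1000 pattern pairs of (62).

```python
import numpy as np

def mackey_glass(n_samples=1200, tau=17, dt=0.1, x0=1.2):
    """Integrate eq. (60) by fourth-order Runge-Kutta with step dt,
    taking x(t) = x0 for t <= 0 and reading the delayed term from
    stored history (held fixed within a step)."""
    per, delay = int(round(1 / dt)), int(round(tau / dt))
    x = np.empty(n_samples * per + 1)
    x[0] = x0
    f = lambda xt, xd: 0.2 * xd / (1.0 + xd ** 10) - 0.1 * xt
    for t in range(n_samples * per):
        xd = x[t - delay] if t >= delay else x0
        k1 = f(x[t], xd)
        k2 = f(x[t] + 0.5 * dt * k1, xd)
        k3 = f(x[t] + 0.5 * dt * k2, xd)
        k4 = f(x[t] + dt * k3, xd)
        x[t + 1] = x[t] + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return x[::per]                              # x(t) sampled at integer t

series = mackey_glass()
t = np.arange(118, 1118)                         # the 1000 pairs of eq. (62)
inputs = np.stack([series[t - 18], series[t - 12],
                   series[t - 6], series[t]], axis=1)
target = series[t + 6]
```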

Fig. 5. Error-epoch trajectories for training of SuPFuNIS on Mackey–Glass time series data.

The training and testing root mean square errors (rmses) after 500 epochs of training, for different numbers of rules and for both randomized initial weight values and weight values initialized using the FCM-based procedure (see Appendix I), are shown in Table I. During training the learning rate and momentum were initialized to 0.1 and decayed linearly to 0.01 in 500 steps. Notice that the final results obtained after cluster based initialization are better than those obtained by simple randomization, and this difference is more pronounced at lower rule counts. The difference reduces as the number of rules increases, a result that is intuitively expected, since at a higher rule count the system has many more rules to cover the data and the initial placement of these rules is not as critical as it is when the rule count is low. Finally, as the number of rules increases the rmse decreases. As is to be expected, if the SuPFuNIS model is trained for a larger number of epochs, a lower rmse is obtained at the cost of computation time. For example, after 5000 epochs using ten rules, the training rmse is 0.00370 and the testing rmse is 0.00374. These values are to be compared with those reported for 500 epochs in Table I.

An important aspect of the training behavior of SuPFuNIS is that most learning is complete during the first few tens of epochs. Error-epoch training plots for different numbers of rules with FCM based initialization are shown in Fig. 5(a). It is clear that most of the learning is complete within 50 epochs, after


TABLE II COMPARISON OF SuPFuNIS NrmseS WITH OTHER MODELS FOR MACKEY–GLASS TIME SERIES APPLICATION. †: RESULTS ADAPTED FROM [52]

Fig. 6. Approximation performance of SuPFuNIS for Mackey–Glass time series using three and ten rules, respectively.

which the network goes through a fine-tuning phase. Fig. 5(b) compares error-epoch training plots for random initialization and FCM-based initialization. Clearly, correct initialization not only improves the final performance but accelerates learning as well.

The approximation performance for the cases of three rules (34 parameters) and ten rules (104 parameters) after 500 epochs of training with FCM-based initialization is shown in Fig. 6(a) and (b), respectively. The plots show a zoomed portion from data points 1 to 100 of the time series so as to visualize the difference in approximation quality. Solid lines indicate the desired function, and dash-dot lines indicate the predicted function. The prediction error plots for three rules and ten rules are portrayed graphically in Fig. 6(c) and (d), respectively.

We now summarize the above observations:
1) As the number of rules increases the approximation tends to improve. Notice from Table I that random initialization does not necessarily yield this improvement.
2) For low rule counts, considerably lower rmse values are obtained in fewer epochs when FCM is employed to initialize the weights, as against randomization. This indicates that initial knowledge can help improve the performance of the model.
3) The use of the Xie–Beni index with FCM for initialization can give gains in the speed of learning at the cost of an increased preprocessing computation time. The decreasing trend of the error-epoch trajectories in Fig. 5(b) justifies this statement. Specifically, for the case of ten rules with random initialization of fuzzy weights, the training and testing rmse after 30 epochs are 0.011608 and 0.011691, respectively, while the FCM-based initialization method yields the values 0.009838 and 0.009792, respectively.

The Mackey–Glass time series prediction problem has been attempted with various classical models, neural networks, and fuzzy neural networks in the literature. A comparison of SuPFuNIS with a selection of these models, based on the normalized root mean square error (Nrmse), is shown in Table II. The Nrmse is defined as the rmse divided by the standard deviation of the target series [13]. The other results in Table II are adapted for comparison from [9], [37], [52], [54], [55]. Both ANFIS [9] and GEFREX [37] outperform other models in terms of Nrmse. However, ANFIS has the drawback of less interpretability in terms of learned information, and the implementation of GEFREX is difficult. As can be seen, excluding GEFREX and ANFIS, SuPFuNIS performs the best, with an Nrmse of 0.016 with just ten rules or 104 parameters. By way of example, note that ANFIS employs 104 parameters, cascade correlation learning uses 693 connections, the backpropagation trained neural network uses 540 connections, and EPNet employs 103 (average) parameters. Clearly SuPFuNIS combines architectural economy with high performance. Importantly, as we discuss in Section V, SuPFuNIS also provides easy interpretability of learned information, since the rule base structure remains intact after learning completes. We report numeric values of rules for a ten-rule SuPFuNIS in Appendix II.

B. Iris Data Classification

Iris data involves classification of three subspecies of the Iris flower, namely Iris setosa, Iris versicolor, and Iris virginica, on the basis of four feature measurements of the Iris flower: sepal length, sepal width, petal length, and petal width [56]. There are 50 patterns (of four features) for each of the three subspecies of Iris flower. The input pattern set thus comprises 150 four-dimensional patterns. This data can be obtained from the UCI repository of machine learning databases at http://www.ics.uci.edu/~mlearn/MLRepository.html.

For this classification problem, SuPFuNIS employs a 4-$q$-3 network architecture: the input layer consists of four numeric nodes, the output layer comprises three class nodes, and there are $q$ rule nodes in the hidden layer. To train the network, initially the centers of antecedent weight fuzzy sets were randomized in the range of the minimum and maximum values of the respective input features of Iris data. Feature-wise, these ranges are (4.3, 7.9), (2.0, 4.4), (1.0, 6.9), and (0.1, 2.5). The centers of hidden-output weight fuzzy sets were randomized in the range (0, 1), and the spreads of all fuzzy weights and feature spreads were randomized in the range (0.2, 0.9). All 150 patterns of the Iris data were presented sequentially to the input layer of the network for training. The learning rate and momentum were both taken


TABLE III NUMBER OF RESUBSTITUTION ERRORS FOR IRIS DATA FOR STANDARD ALGORITHMS WITH DIFFERENT NUMBERS OF PROTOTYPES/RULES. †: RESULTS ADAPTED FROM [58]

TABLE IV BEST RESUBSTITUTION ACCURACY FOR IRIS DATA FOR DIFFERENT SOFT COMPUTING ALGORITHMS

as 0.0001 and kept constant during the training period. Once the network was trained, the test patterns (which again comprised all 150 patterns of Iris data) were presented to the trained network and the resubstitution error computed. Simulation experiments were conducted with different numbers of rule nodes to illustrate the performance of the classifier with a variation in the number of rules. Notice that for $q$ rules, the number of connections in the 4-$q$-3 architecture for Iris data will be $7q$. Once again, since the representation of a fuzzy weight requires two parameters (center and spread), the total number of free parameters to be trained will be $14q + 4$, counting the four input feature spreads.

Attempts to solve the same problem using other techniques, like the genetic algorithm (GA), learning vector quantization (LVQ) and its family of generalized fuzzy algorithms (GLVQ-F) [57], [58], and random search (RS), have been reported in the literature. Table III compares the resubstitution error with these techniques. The results obtained from GA, RS, LVQ, and GLVQ-F have been adapted from [58] for the purpose of comparison. In the GA and random search techniques, two resubstitution errors for three prototypes are reported. For four prototypes the GA performed poorly, with four errors in comparison to two errors in random search. Results from Table III show that the SuPFuNIS model has only one resubstitution error for three and four rules, and zero resubstitution error for any number of rules greater than or equal to five. Note that for three rules, 46 parameters require specification in our model. The 84th pattern (6.0 2.7 5.1 1.6), which belongs to Iris versicolor, is misclassified as belonging to Iris virginica.

In Table IV, SuPFuNIS is compared with other soft computing models [16], [35], [36], [59], [60] in terms of the number of rules and percent resubstitution accuracy. The performance of the SuPFuNIS model is better than all other techniques and is at par with FuGeNeSys [36]. The fuzzy weights of the trained network with five rules that produce zero resubstitution error are illustrated in the scatter plot of Iris data in Fig. 7, and the


corresponding numeric values of weight parameters are given in Appendix II. Apart from the above techniques, we mention that the Iris classification problem has also been solved using a multilayer perceptron (MLP) with a 4-6-6-3 architecture (four nodes in the input layer and three nodes in the output layer, with two hidden layers consisting of six nodes each) [4]. The MLP can achieve a zero resubstitution error with 93 connections (parameters). Zero resubstitution error was obtained with SuPFuNIS using five rules or 74 free parameters. The SuPFuNIS model clearly performs very well as a classifier.

C. Medical Diagnosis

The next benchmark problem deals with hepatitis diagnosis, which requires classifying patients into two classes, Die or Live, on the basis of features which are both numeric and linguistic (symbolic). The data can be obtained from http://www.ics.uci.edu/~mlearn/MLRepository.html. The purpose of including this example is to show how easily SuPFuNIS can handle both numeric and symbolic data. In addition, we show that SuPFuNIS is robust against variations in training data.

The hepatitis data set has 155 patterns of 19 input features, with a number of missing values. An example pattern having all 19 feature values defined, belonging to class Live, is given in Table V. There are six numeric features, namely Age, Bilirubin, Alk Phosphate, SGOT, Albumin, and Protime; the remaining 13 features are linguistic in nature. As there are a number of missing values, preprocessing of the data is required. The data set contains 75 patterns that have one or more features unspecified. A new set of data was formed by filling in some of the missing numeric values. Twenty patterns, which had either a missing symbolic feature value or more than two missing numeric feature values, were first discarded. The missing numeric values in the remaining 55 incomplete cases were filled with the average value of the missing feature, calculated on a class-wise basis from the 80 originally complete patterns [61]. This way we were able to reconstruct a data set of 135 patterns.

The numeric features of these 135 patterns were normalized feature-wise in the range [0, 1]. Symbolic features (yes/no or male/female) were represented by constructing two fuzzy sets: the symbolic value "no" represented by a fuzzy set with Gaussian membership function having center zero and spread 0.5, and "yes" represented by a Gaussian membership function centered at one with spread 0.5. The spreads were assumed to be trainable during the learning procedure.

Experiments were conducted using two data sets: Data Set 1, comprising only the 80 of 155 patterns that were originally complete in all respects; and Data Set 2, comprising 135 patterns (80 originally complete and 55 reconstructed). For training, 70% of the patterns were randomly chosen and the remaining 30% were used for testing. Five such 70%(train)-30%(test) combinations were randomly generated separately for Data Set 1 and Data Set 2. Experiments were then conducted on each of these individual data set combinations using a 19-3-2 SuPFuNIS architecture. During training both the learning rate and momentum were kept constant at 0.0001. These results are reported in Table VI. In all the experiments SuPFuNIS has a


Fig. 7. SuPFuNIS rule patches for Iris data for the case of five rules.

TABLE V A CASE FOR HEPATITIS DATA

high classification accuracy, ranging from 87.5% to 100% with only three rules. In addition to its architectural economy, this experiment demonstrates an important aspect of the model: it is robust against random variations in data sets.

Table VII shows the average classification accuracy obtained using SuPFuNIS for both data sets, compared with other approaches applied to the same problem, like CN2 [63], Bayes [64], Assistant-86 [64], k-NN, LVQ, the multilayer perceptron, and the GA based technique of Wang and Tseng [23]. The results of the other approaches are adapted from [23] and [62]. Once again, we stress the economy of the number of rules that is able to yield a high classification accuracy with SuPFuNIS. At the same time, we have shown the model to be robust against data set variations. Above all, mutual subsethood products allow seamless integration of numeric and linguistic information while remaining amenable to gradient descent learning. Although in this example linguistic inputs were only of the yes/no type, truly graded linguistic inputs would only prove the worth of the model to a greater extent.
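The seamless mixing of numeric and symbolic inputs described above amounts to a very simple encoding step, sketched here (Python; the function name and example values are ours, purely illustrative):

```python
def encode_symbolic(value, spread=0.5):
    """Encode a yes/no symbolic feature as a Gaussian fuzzy input:
    "no" -> center 0.0, "yes" -> center 1.0, initial spread 0.5 (trainable)."""
    return (1.0 if value == "yes" else 0.0), spread

# A mixed pattern simply becomes a list of (center, spread) pairs:
pattern = [(0.44, 0.1),              # a normalized numeric feature, fuzzified
           encode_symbolic("yes"),   # a symbolic feature
           encode_symbolic("no")]    # another symbolic feature
```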

TABLE VI TESTING ACCURACY IN % USING THREE RULES FOR HEPATITIS DATA

TABLE VII COMPARISON OF SuPFuNIS WITH OTHER METHODS FOR HEPATITIS DATA

D. Function Approximation

A single-input benchmark function, given in (63) and frequently used in the literature [36], [37], [65], [66] to test the learning capacity of proposed models, was used to test the performance of SuPFuNIS. Twenty-one training patterns were generated at intervals of 0.05; thus the training patterns are of the form $(x_i, y(x_i))$ with $x_i = 0.05(i-1)$, $i = 1, \ldots, 21$. The evaluation was done using 101 test data taken at intervals of 0.01. For the purpose of comparison, the performance indexes defined in [65] were also used in this paper. Table VIII compares, for different models, the test accuracy performance index along with the number of rules and tunable parameters used in achieving it. With three rules SuPFuNIS obtains performance indexes better than all the other models [36], [65], [66] except GEFREX [37]; with five rules SuPFuNIS obtains performance indexes comparable to those of GEFREX.
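For reference, a sketch of the benchmark data generation follows. The closed form of (63) was not recoverable from this copy, so the code assumes the form in which the Narazaki–Ralescu function is usually quoted in the literature; treat that expression as an assumption.

```python
import numpy as np

def benchmark_fn(x):
    # Assumed closed form of eq. (63): the Narazaki-Ralescu single-input
    # benchmark is usually quoted as y = 0.2 + 0.8 (x + 0.7 sin(2*pi*x)) on [0, 1].
    return 0.2 + 0.8 * (x + 0.7 * np.sin(2 * np.pi * x))

x_train = np.arange(21) * 0.05     # 21 training points at 0.05 intervals
x_test = np.arange(101) * 0.01     # 101 test points at 0.01 intervals
y_train, y_test = benchmark_fn(x_train), benchmark_fn(x_test)
```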


TABLE VIII COMPARISON OF SuPFuNIS WITH OTHER METHODS FOR NARAZAKI–RALESCU’S FUNCTION

Fig. 8. Fuzzy interpretation sets.

V. INFERRING RULES FROM A TRAINED NETWORK

In this section we expand upon the ease of interpretation of the knowledge embedded in SuPFuNIS, which we demonstrate for the case of the Mackey–Glass time series prediction problem. A similar approach can be applied to other applications as well.

The trained rules obtained after 500 epochs from the experiment using ten rules with FCM-based initialization are shown in Fig. 9; their numeric values are given in Appendix II. To interpret these rules, we consider a fuzzy interpretation set, which provides an exhaustive linguistic interpretation of the interval representing a UOD. In the present example, the linguistic labels of a fuzzy interpretation set are represented by normalized symmetric Gaussian membership functions with identical spreads and centers fixed at equal intervals. Exemplar fuzzy interpretation sets of three and five linguistic labels defined on a UOD of [0, 1.5] are shown in Fig. 8.

In order to interpret a trained rule in terms of the linguistic labels of a selected fuzzy interpretation set, the fuzzy subsethood is measured between each antecedent set and every fuzzy set of the fuzzy interpretation set, and between the consequent set and every fuzzy set of the fuzzy interpretation set. A rule antecedent or consequent is then associated with the label of the fuzzy interpretation set for which the maximum fuzzy subsethood measure is obtained.

With three linguistic labels SMALL (S), MEDIUM (M), and LARGE (L) in a fuzzy interpretation set [Fig. 8(a)], the if–then rules generated from the ten-rule model (obtained after 500 epochs of training, as shown in Fig. 9) are summarized in Table IX. It can be observed from Table IX that rules 1, 2, 4, 5, and 10 are identical. Thus, instead of ten rules, the system can be represented broadly by six rules using three linguistic labels. However, observe that at this level of set-granularity rules 1, 2, 4, 5, and 10 are inconsistent with rule 3. This calls for an increase in the number of sets in the fuzzy interpretation set employed

for interpretation. Specifically, if the rule interpretation procedure is carried out using five linguistic labels [Fig. 8(b)], namely VERY SMALL (VS), SMALL (S), MEDIUM (M), LARGE (L), and VERY LARGE (VL), defined on the same UOD [0, 1.5], the system can be represented by ten distinct rules, as shown in Table X. Note that the aforesaid inconsistency is now eliminated. The minimum level of fuzzy interpretation set granularity at which no inconsistencies exist is the appropriate level at which to interpret the embedded knowledge of SuPFuNIS. Thus, in SuPFuNIS, rule interpretation and pruning are facilitated in a straightforward fashion by employing fuzzy subsethood in conjunction with a fuzzy interpretation set with a specified number of labels.

VI. AUGMENTING SuPFuNIS WITH EXPERT KNOWLEDGE

Finally, we show that the SuPFuNIS model is also suitable in situations where a small set of numeric data is to be augmented by expert linguistic knowledge. This is demonstrated in the case of the truck backer-upper control problem. By employing this application example we also show that the model can be applied to a control problem with excellent results.

The problem at hand deals with backing up a truck to a loading dock. The truck corresponds to the cab part of the truck in the Nguyen–Widrow neural truck backer-upper system [67]. The truck position is exactly determined by three state variables $x$, $y$, and $\phi$, where $\phi$ is the angle of the truck with the horizontal and $(x, y)$ are the coordinates in the plane, as depicted in Fig. 10. The control of the truck is the steering angle $\theta$. The truck moves backward by a fixed unit distance every stage. We also assume enough clearance between the truck and the loading dock such that the coordinate $y$ does not have to be considered as an input. (For validation of this assumption refer to [68].) We design a control system


Fig. 9. Plots of antecedent and consequent sets for Mackey–Glass time series for 10 rule SuPFuNIS. Numeric values are given in Appendix II.

TABLE IX FUZZY RULES GENERATED WITH THREE LABELS FOR THE MACKEY–GLASS TIME SERIES

TABLE X FUZZY RULES GENERATED WITH FIVE LABELS FOR THE MACKEY–GLASS TIME SERIES


Fig. 10. Diagram of simulated truck and loading zone.

Fig. 11. Truck trajectories from three testing points: (a) using three rules and (b) using five rules obtained from the complete set of numeric data.

whose inputs are $x$ and $\phi$, and whose output is $\theta$, such that the final state will be the desired docking state $(x_{des}, \phi_{des})$. The following kinematic equations are used to simulate the control system [20]:

$$x(t+1) = x(t) + \cos[\phi(t) + \theta(t)] + \sin[\theta(t)]\sin[\phi(t)] \qquad (64)$$

$$y(t+1) = y(t) + \sin[\phi(t) + \theta(t)] - \sin[\theta(t)]\cos[\phi(t)] \qquad (65)$$

$$\phi(t+1) = \phi(t) - \arcsin\!\left[\frac{2\sin(\theta(t))}{b}\right] \qquad (66)$$

where $b$ is the length of the truck and is assumed to be four for the present simulation.

TABLE XI DOCKING ERRORS WHEN ONLY NUMERIC DATA IS USED

We used a normalized variant of the docking error, which essentially measures the Euclidean distance from the actual final position $(x_f, \phi_f)$ to the desired final position $(x_{des}, \phi_{des})$, as well as the trajectory error, the ratio of the actual length of the trajectory to the straight-line distance from the initial point to the loading dock, as performance measures (derived from [68]):

$$\text{Normalized Docking Error} \propto \sqrt{(x_f - x_{des})^2 + (\phi_f - \phi_{des})^2} \qquad (67)$$

$$\text{Trajectory Error} = \frac{\text{length of truck trajectory}}{\text{distance}(\text{initial position, desired final position})} \qquad (68)$$
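A sketch of the simulation loop implied by (64)–(66) is given below (Python; `controller` is assumed to be any map from the state $(x, \phi)$ to a steering angle $\theta$, such as a trained SuPFuNIS network).

```python
import numpy as np

def truck_step(x, y, phi_deg, theta_deg, b=4.0):
    """One backing-up stage of eqs. (64)-(66); angles handled in degrees."""
    phi, theta = np.radians(phi_deg), np.radians(theta_deg)
    x_next = x + np.cos(phi + theta) + np.sin(theta) * np.sin(phi)
    y_next = y + np.sin(phi + theta) - np.sin(theta) * np.cos(phi)
    phi_next_deg = phi_deg - np.degrees(np.arcsin(2.0 * np.sin(theta) / b))
    return x_next, y_next, phi_next_deg

def run_trajectory(controller, x, y, phi_deg, n_stages=200):
    """Roll the truck backward under a controller mapping (x, phi) -> theta."""
    states = [(x, y, phi_deg)]
    for _ in range(n_stages):
        theta = controller(x, phi_deg)
        x, y, phi_deg = truck_step(x, y, phi_deg, theta)
        states.append((x, y, phi_deg))
    return states
```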

A. Simulation Results Using Only Numeric Data

The training data (adapted from [20]) comprise 238 $(x, \phi; \theta)$ pairs, accumulated from 14 sequences of desired values. The data was linearly normalized in the range [0, 1] and used to train SuPFuNIS for different numbers of rules. The learning rate and momentum were kept at 0.0001 throughout the training period. The number of free parameters for this application is $6q + 2$ for $q$ rules. Three initial states, (3, -30), (10, 220), and (13, 30), were used to test the performance of the controller. The


Fig. 12. Fuzzy sets for linguistic labels of $x$, $\phi$, and $\theta$.

Fig. 13. Truck trajectories from three test points: (a) using five rules obtained from 42 numeric data pairs; (b) using five rules obtained from reduced numeric data and five linguistic rules; (c) using five rules obtained from reduced numeric data and nine linguistic rules.

TABLE XII THE FUZZY SET LABELS FOR THE MEMBERSHIP FUNCTIONS OF FIG. 12

docking errors for the three test points, for three and five rules, are reported in Table XI. The results show that SuPFuNIS is able to perform very well (high docking accuracy) with just five rules. This compares favorably with Kosko and Kong's fuzzy controller for backing up the truck to the dock, which uses 35 linguistic rules [68], and the Wang–Mendel controller [20], which uses 27 rules that are either linguistic or a mixture of linguistic rules and rules obtained from numeric data. The truck trajectories from the three initial states are shown in Fig. 11.

B. Simulation Results Using Numeric and Linguistic Data

Next we trained SuPFuNIS with 42 data pairs obtained by considering the first three pairs of data from each of the

14 sequences. These pairs train the system for initial path control of the truck. The finer control of the trajectory toward the dock was implemented using nine linguistic rules constructed from expert knowledge [20]. In the present simulation the controller thus consists of five rules obtained by learning from numeric data, and nine linguistic rules. The nine linguistic rules are derived from [20] by suitably modifying the membership functions to be Gaussian:

Rule 1: If $x$ is … and $\phi$ is VE then $\theta$ is ZE
Rule 2: If $x$ is … and $\phi$ is LV then $\theta$ is PM
Rule 3: If $x$ is … and $\phi$ is RV then $\theta$ is NM
Rule 4: If $x$ is … and $\phi$ is VE then $\theta$ is PM
Rule 5: If $x$ is … and $\phi$ is VE then $\theta$ is NM
Rule 6: If $x$ is … and $\phi$ is RV then $\theta$ is NS
Rule 7: If $x$ is … and $\phi$ is RV then $\theta$ is NB
Rule 8: If $x$ is … and $\phi$ is LV then $\theta$ is PB
Rule 9: If $x$ is … and $\phi$ is LV then $\theta$ is PS

TABLE XIII DOCKING ERRORS FOR 5 NUMERIC RULES (OBTAINED FROM 42 NUMERIC PAIRS) AND NINE LINGUISTIC RULES

The linguistic labels are defined as in Table XII and are represented by fuzzy sets with Gaussian membership functions, as shown in Fig. 12. Truck trajectory simulation results for the 14-rule hybrid controller are reported in Table XIII, and truck trajectories are shown in Fig. 13, illustrating the effect of the linguistic rules. Clearly, SuPFuNIS is able to successfully generate low error trajectories from each of the initial test points. From Table XIII we observe that the overall average normalized docking errors are lower with the incorporation of expert knowledge than in the case with five rules trained directly on the entire numeric data. In addition, we were able to incorporate expert knowledge easily and seamlessly into the network. Once again, notice the economy of the rule base. This kind of economy has been consistently observed in all the applications presented.

VII. CONCLUSION

In this paper we proposed SuPFuNIS, which employs a novel combination of tunable feature fuzzifiers that convert numeric inputs to Gaussian fuzzy sets; mutual subsethood based activation spread; a fuzzy inner product conjunction operator; and a volume defuzzification technique. SuPFuNIS embeds rule based knowledge directly into its architecture. This facilitates not only easy data-driven cluster-based initialization of the network, but also the read-back of rules from a trained network. The mutual subsethood measures the similarity between a fuzzy input and a fuzzy weight to decide the extent of activation transfer from input nodes to rule nodes. The extent of rule firing is computed by a product operator, which lends good discriminatory power to the model. The network generates outputs using a volume defuzzification technique. Gradient descent learning is used to adjust the centers and spreads of the various weights of the network, as well as the spreads of the input fuzzifiers.

The application potential of SuPFuNIS is demonstrated on various benchmark problems, each bringing out different strengths of the model. In the Mackey–Glass time series prediction problem, high performance is achieved with an economical network architecture. In addition, the significance of data-driven FCM cluster based weight initialization is justified by simulation results. The Iris data classification problem highlights the network economy further: SuPFuNIS achieves a zero resubstitution error with a mere five rules. In the hepatitis medical diagnosis problem we demonstrate the ease with which both numeric and linguistic input features can be seamlessly integrated by the network to achieve high performance. This application also highlights the robustness of SuPFuNIS against data set variations. SuPFuNIS also compares well with other models on a function approximation application. Finally, in the truck backer-upper control problem we not only show the capability of SuPFuNIS to deal with control problems, but also demonstrate how expert rule-based knowledge can be easily integrated into networks that are trained on partial numeric data. The paper compares the performance of the model with various other classical and soft-computing techniques.

The mutual subsethood measure used for activation spread in the network also provides a natural measure for identifying the minimal number of rules that can help characterize the knowledge embedded in a trained network. This makes the interpretation of embedded knowledge quite straightforward, as demonstrated for the Mackey–Glass time series prediction problem. We reiterate that the major strengths of the model are its consistently high performance on a wide variety of applications, its economy of parameters, fast learning, ease of integration of expert knowledge, and transparency of fine tuned knowledge. However, the model suffers from various drawbacks. These include the use of a heuristic approach to select the number of rule nodes to solve a particular problem. Also, in the present version of the model, rule formats that use disjunctions of conjunctive antecedents cannot be accommodated. These limitations are presently being investigated, and the network is currently being extended to a genetic algorithm based evolvable SuPFuNIS. This will be reported as part of future work.

APPENDIX I
INITIALIZATION OF THE RULE BASE

One of the ways to extract initial knowledge from the training data set is to cluster the data using a clustering technique [11], [16], [39], [40]. Cluster-based initialization is known to improve both the rate of learning and the performance of the model. The number of clusters decides the number of rules. If the clustering is done in the input–output cross space, then the centroids and boundaries of the clusters can be employed to initialize the values of the centers and spreads of the fuzzy


weights that fan in and out of a rule node. In this paper we employ the fuzzy c-means (FCM) clustering algorithm [69] in conjunction with the Xie–Beni index: FCM clusters the given data, and the index is used to choose the best cluster structure.

FCM Clustering: FCM minimizes the following objective function:

    J_m(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^m \|x_k - v_i\|^2    (69)

where X = \{x_1, \ldots, x_n\} represents the training data, V = \{v_1, \ldots, v_c\} represents the cluster centroids, and U = [u_{ik}] is a c \times n matrix whose entry u_{ik} is the membership of data point x_k in cluster i. The centroids v_i and memberships u_{ik} are given by

    v_i = \frac{\sum_{k=1}^{n} u_{ik}^m x_k}{\sum_{k=1}^{n} u_{ik}^m}    (70)

and

    u_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{\|x_k - v_i\|}{\|x_k - v_j\|} \right)^{2/(m-1)} \right]^{-1}    (71)

where m > 1 is a constant, set here to the value 2. The approximate solution of (69) is obtained by an alternating optimization (AO) approach in which V is initialized and U is calculated from (71). The new value of U so obtained is used to calculate V from (70). U and V are alternately updated, and the process is iterated until a termination criterion is met.
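The AO loop of (69)–(71) can be sketched compactly as follows. This is an illustrative implementation, not the authors' code: the function name fcm and the choice of initializing V from randomly selected data points are our assumptions.

import numpy as np

def fcm(X, c, m=2.0, max_iter=100, tol=1e-5, rng=None):
    """Alternating optimization for FCM per (69)-(71).
    X: (n, d) data; c: number of clusters; m: fuzzifier exponent."""
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[0]
    # initialize centroids V by picking c distinct data points at random
    V = X[rng.choice(n, size=c, replace=False)].copy()
    for _ in range(max_iter):
        # distances d[i, k] = ||x_k - v_i||, guarded against division by zero
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)
        # membership update, eq. (71): u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))
        p = 2.0 / (m - 1.0)
        U = 1.0 / np.sum((d[:, None, :] / d[None, :, :]) ** p, axis=1)
        # centroid update, eq. (70): weighted mean of the data points
        Um = U ** m
        V_new = (Um @ X) / Um.sum(axis=1, keepdims=True)
        if np.linalg.norm(V_new - V) < tol:  # termination criterion
            V = V_new
            break
        V = V_new
    return U, V

# usage sketch: U, V = fcm(train_data, c=10)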

Xie–Beni Cluster Validity Index: Given the number of clusters desired, an issue that arises is how to ensure that the cluster structure used for initialization is good enough. Since FCM itself depends on the initial values of the cluster centers V, we employ a cluster validity index to identify the optimal clustering from those obtained using different initial values. The Xie–Beni index [70] is one such cluster validity measure. The cluster structure having the least value of this index is considered the best and is used to initialize the free parameters (centers and spreads) of SuPFuNIS. The Xie–Beni index, XB, is computed as follows:

    XB = \frac{\sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^2 \|x_k - v_i\|^2}{n \cdot \min_{i \neq j} \|v_i - v_j\|^2}    (72)

Since the best clustering is obtained for the least value of this index, it follows that for an optimal clustering the numerator of (72) (which indicates the compactness of the clusters) should be as small as possible, while the denominator of (72) (which measures the separation between clusters) should be large. The XB index thus searches for compact and well-separated clusters. Initialization is performed as follows (a sketch of this selection loop is given below):
1) run FCM on a set of randomly chosen centroids;
2) compute the Xie–Beni index for the resulting cluster structure;
3) repeat 1) and 2) a large number of times (say 1000);
4) select the cluster structure with the minimum Xie–Beni index.
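The following sketch computes (72) and implements the multistart selection of steps 1)–4); xie_beni and best_fcm are illustrative names, and best_fcm reuses the fcm sketch given earlier.

import numpy as np

def xie_beni(X, U, V):
    """Xie-Beni index, eq. (72): compactness divided by
    (n times the minimum squared separation between centroids)."""
    d2 = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) ** 2
    compact = np.sum((U ** 2) * d2)
    sep = np.linalg.norm(V[:, None, :] - V[None, :, :], axis=2) ** 2
    np.fill_diagonal(sep, np.inf)  # exclude i == j pairs from the minimum
    return compact / (X.shape[0] * sep.min())

def best_fcm(X, c, restarts=1000, seed=0):
    """Multistart FCM: keep the run with the smallest Xie-Beni index."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(restarts):
        U, V = fcm(X, c, rng=rng)   # fcm() from the sketch above
        xb = xie_beni(X, U, V)
        if best is None or xb < best[0]:
            best = (xb, U, V)
    return best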

In this way, FCM coupled with the cluster validity index provides a convenient and reasonably optimal initialization of the centers of the fuzzy weights of SuPFuNIS.

Projection of Cluster Centers and Boundaries: Given a set of clusters of the training data, we next derive the initial rule base in two steps. First, since the FCM procedure outlined above yields cluster centroids, these are used to directly initialize the centers of the Gaussian membership functions defined on the input and output universes of discourse. Second, to derive the spreads of each of the sets, we use the covariance matrices of the individual clusters, since the covariance matrix of a data cluster defines an ellipsoidal patch centered at the centroid [21, ch. 5], which in our case is obtained from the FCM algorithm. The ellipsoid corresponding to the ith cluster is defined by

    (x - v_i)^T C_i^{-1} (x - v_i) = r    (73)

where r is a positive real number, v_i is the center of the ith ellipsoid, and C_i is the covariance matrix of the ith cluster. C_i is computed from the data points that belong to the cluster as

    C_i = E[(x - v_i)(x - v_i)^T]    (74)

where E[\cdot] is the standard expectation operator. In the present case, C_i is computed by averaging over the data points that are members of the ith cluster. For ease of calculation, the ellipsoids described by (74) are inscribed in rectangles [21] which are projected onto the axes of the input–output space to derive the fuzzy sets. Although this means that some correlation information is lost, we believe this does not present a serious problem, since each of the spreads is subsequently fine-tuned during training. The projected length l_{ij} of the ith rectangle onto the jth dimension is defined as

    l_{ij} = 2r \sqrt{\sum_{k} \lambda_k \cos^2 \theta_{jk}}    (75)

where \lambda_k are the eigenvalues of C_i and \theta_{jk} is the angle between the kth eigenvector and the jth dimension for the ith ellipsoid. A triangular fuzzy set can thus be generated having unity height and l_{ij} as base length. The area of this triangular fuzzy set is l_{ij}/2. The area of the Gaussian fuzzy sets used in SuPFuNIS is \sigma\sqrt{2\pi}, where \sigma is the spread. Therefore, if we consider a triangular fuzzy set and a Gaussian fuzzy set having equal areas, the resulting spread is

    \sigma_{ij} = \frac{l_{ij}}{2\sqrt{2\pi}}    (76)

Thus, given a cluster centroid v_i, the centers and spreads of all weights that fan in and out of the ith rule node are initialized. Note that the overlap of the projected fuzzy sets is proportional to r. In the simulations presented in the text we used a value of r since we found that this value gives an overlap ranging from 30–45% for cluster counts ranging from three to ten for the Mackey–Glass time series data. Note that this middle-of-the-ground value is chosen solely for the purpose of initialization; supervised learning finally tunes the Gaussian set spreads. In a similar fashion, the value of r for a data set in general can be selected in a way that ensures an overlap of approximately 30–40% averaged over all input features. This computation is straightforward and is facilitated by the subsethood measure.
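A sketch of this projection step follows. It assumes the bounding-rectangle form of (75) reconstructed above (the original equation is garbled in the source), and it uses a fuzzy-membership-weighted covariance estimate as a stand-in for the crisp member average described in the text; the function name spreads_from_cluster is ours.

import numpy as np

def spreads_from_cluster(X, U, V, i, r=1.0, m=2.0):
    """Project the i-th cluster's covariance ellipsoid, eqs. (73)-(75),
    onto each axis and convert base lengths to Gaussian spreads, eq. (76).
    r scales the ellipsoid; its numeric value is a tuning choice."""
    w = U[i] ** m                               # fuzzy memberships as weights
    diff = X - V[i]
    C = (w[:, None] * diff).T @ diff / w.sum()  # covariance estimate, eq. (74)
    lam, E = np.linalg.eigh(C)                  # eigenvalues and eigenvectors
    # base length on dimension j, eq. (75):
    # l_j = 2*r*sqrt(sum_k lambda_k * cos^2(theta_jk)), cos(theta_jk) = E[j, k]
    l = 2.0 * r * np.sqrt((E ** 2) @ np.fmax(lam, 0.0))
    # equal-area conversion: triangle area l/2 = Gaussian area sigma*sqrt(2*pi)
    return l / (2.0 * np.sqrt(2.0 * np.pi))     # spreads sigma_ij, eq. (76)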


APPENDIX II

TABLE XIV
THE FINAL VALUES OF FUZZY WEIGHTS w AND v FOR THE MACKEY–GLASS TIME SERIES APPLICATION USING 10 RULES, PLOTTED IN FIG. 9

TABLE XV
THE FINAL VALUES OF THE TRAINED FEATURE SPREADS (σ^x) OF THE INPUT FUZZIFIER FOR THE MACKEY–GLASS TIME SERIES APPLICATION FOR THE CASE OF TEN RULES AND 500 EPOCHS

TABLE XVI
THE FINAL VALUES OF FUZZY WEIGHTS w AND v FOR IRIS DATA CLASSIFICATION

TABLE XVII
THE FINAL VALUES OF THE TRAINED FEATURE SPREADS (σ^x) OF THE INPUT FUZZIFIER FOR IRIS DATA CLASSIFICATION

ACKNOWLEDGMENT

The authors wish to thank the Editor, Associate Editor, and referees for their detailed comments, suggestions, and encouragement that helped strengthen this paper and mould it into its present form.


REFERENCES

[1] C. T. Lin and C. S. G. Lee, Neural Fuzzy Systems: A Neuro-Fuzzy Synergism to Intelligent Systems. Upper Saddle River, NJ: Prentice-Hall, 1996.
[2] S. Pal and S. Mitra, Neuro-Fuzzy Pattern Recognition: Methods in Soft Computing. New York: Wiley, 1999.
[3] J. Buckley and T. Feuring, Fuzzy and Neural: Interactions and Applications, ser. Studies in Fuzziness and Soft Computing. Heidelberg, Germany: Physica-Verlag, 1999.
[4] J. C. Bezdek, J. Keller, R. Krishnapuram, and N. R. Pal, Fuzzy Models and Algorithms for Pattern Recognition and Image Processing. Boston, MA: Kluwer, 1999.
[5] S. Mitra and Y. Hayashi, "Neuro-fuzzy rule generation: Survey in soft computing framework," IEEE Trans. Neural Networks, vol. 11, pp. 748–768, May 2000.
[6] H. Takagi and I. Hayashi, "Artificial neural network driven fuzzy reasoning," Int. J. Approximate Reasoning, vol. 5, pp. 191–212, 1991.
[7] C. T. Lin and C. S. G. Lee, "Neural-network-based fuzzy logic control and decision system," IEEE Trans. Comput., vol. 40, pp. 1320–1336, Dec. 1991.
[8] H. Berenji and P. Khedkar, "Learning and tuning fuzzy logic controllers through reinforcements," IEEE Trans. Neural Networks, vol. 3, pp. 724–740, 1992.
[9] J.-S. R. Jang, "ANFIS: Adaptive-network-based fuzzy inference system," IEEE Trans. Syst., Man, Cybern., vol. 23, pp. 665–685, May 1993.
[10] S. Mitra and S. Pal, "Fuzzy multi-layer perceptron, inferencing and rule generation," IEEE Trans. Neural Networks, vol. 6, pp. 51–63, Jan. 1995.
[11] J. Chen and Y. Xi, "Nonlinear system modeling by competitive learning and adaptive fuzzy inference system," IEEE Trans. Syst., Man, Cybern., vol. 28, pp. 231–238, May 1998.
[12] C. Juang and C. Lin, "An on-line self-constructing neural fuzzy inference network and its applications," IEEE Trans. Fuzzy Syst., vol. 6, pp. 12–32, 1998.
[13] J. Kim and N. Kasabov, "HyFIS: Adaptive neuro-fuzzy inference systems and their application to nonlinear dynamical systems," Neural Networks, vol. 12, no. 9, pp. 1301–1321, 1999.
[14] D. Nauck and R. Kruse, "Obtaining interpretable fuzzy classification rules from data," Artificial Intell. Med., vol. 16, no. 2, pp. 149–169, 1999.
[15] A. Wu and P. K. S. Tam, "A fuzzy neural network based on fuzzy hierarchy error approach," IEEE Trans. Fuzzy Syst., vol. 8, pp. 808–816, Dec. 2000.
[16] D. Nauck and R. Kruse, "A neuro-fuzzy method to learn fuzzy classification rules from data," Fuzzy Sets Syst., vol. 89, pp. 277–288, 1997.


[17] L. Cai and H. Kwan, "Fuzzy classifications using fuzzy inference networks," IEEE Trans. Syst., Man, Cybern. B, vol. 28, pp. 334–347, June 1998.
[18] H. Ishibuchi, K. Nozaki, N. Yamamoto, and H. Tanaka, "Selecting fuzzy if–then rules for classification problems using genetic algorithms," IEEE Trans. Fuzzy Syst., vol. 3, pp. 260–270, Aug. 1995.
[19] S. Mitra and S. Pal, "Logical operation based MLP for classification and rule generation," Neural Networks, vol. 7, no. 2, pp. 353–373, 1994.
[20] L. X. Wang and J. M. Mendel, "Generating fuzzy rules from numerical data, with applications," Univ. Southern California, Los Angeles, Tech. Rep. 169, USC SIPI, Jan. 1991.
[21] B. Kosko, Fuzzy Engineering. Englewood Cliffs, NJ: Prentice-Hall, 1997.
[22] D. Nauck and R. Kruse, "A neuro-fuzzy approach to obtain interpretable fuzzy systems for function approximation," in Proc. IEEE Int. Conf. Fuzzy Syst. (FUZZ-IEEE'98), May 1998, pp. 1106–1111.
[23] C.-H. Wang, T.-P. Hong, and S.-S. Tseng, "Integrating fuzzy knowledge by genetic algorithms," IEEE Trans. Evol. Comput., vol. 2, pp. 138–148, Nov. 1998.
[24] C. Chao, Y. Chen, and C. Teng, "Simplification of fuzzy-neural systems using similarity analysis," IEEE Trans. Syst., Man, Cybern. B, vol. 26, pp. 344–354, Apr. 1996.
[25] N. R. Pal and T. Pal, "On rule pruning using fuzzy neural networks," Fuzzy Sets Syst., vol. 106, pp. 335–347, 1999.
[26] Y. Jin, "Fuzzy modeling of high-dimensional systems: Complexity reduction and interpretability improvement," IEEE Trans. Fuzzy Syst., vol. 8, pp. 212–221, Apr. 2000.
[27] S. Pal and S. Mitra, "Multilayer perceptron, fuzzy sets, and classification," IEEE Trans. Neural Networks, vol. 3, pp. 683–697, Sept. 1992.
[28] S. Mitra, R. K. De, and S. K. Pal, "Knowledge-based fuzzy MLP for classification and rule generation," IEEE Trans. Neural Networks, vol. 8, pp. 1338–1350, Nov. 1997.
[29] H. Ishibuchi, "Neural networks that learn from fuzzy if–then rules," IEEE Trans. Fuzzy Syst., vol. 1, pp. 85–97, May 1993.
[30] Y. Hayashi, J. J. Buckley, and E. Czogala, "Fuzzy neural network with fuzzy signals and weights," Int. J. Intell. Syst., vol. 8, no. 4, pp. 527–537, 1993.
[31] H. Ishibuchi and Y. Hayashi, "A learning algorithm of fuzzy neural networks with triangular fuzzy weights," Fuzzy Sets Syst., vol. 71, pp. 277–293, 1995.
[32] T. Feuring, J. Buckley, and Y. Hayashi, "A gradient descent learning algorithm for fuzzy neural networks," in Proc. IEEE Int. Conf. Fuzzy Syst. (FUZZ-IEEE'98), Anchorage, AK, May 1998, pp. 1136–1141.
[33] L. Wang and J. Yen, "Extracting fuzzy rules for system modeling using a hybrid of genetic algorithms and Kalman filter," Fuzzy Sets Syst., vol. 101, pp. 353–362, 1999.
[34] N. Kasabov, Neuro-Fuzzy Techniques for Intelligent Information Processing, ser. Studies in Fuzziness and Soft Computing, vol. 30. Heidelberg, Germany: Physica-Verlag, 1999.
[35] N. Kasabov and B. Woodford, "Rule insertion and rule extraction from evolving fuzzy neural networks: Algorithms and applications for building adaptive, intelligent expert systems," in Proc. IEEE Int. Conf. Fuzzy Syst. (FUZZ-IEEE'99), vol. 3, Seoul, Korea, Aug. 1999, pp. 1406–1411.
[36] M. Russo, "FuGeNeSys—A fuzzy genetic neural system for fuzzy modeling," IEEE Trans. Fuzzy Syst., vol. 6, pp. 373–388, Aug. 1998.
[37] M. Russo, "Genetic fuzzy learning," IEEE Trans. Evol. Comput., vol. 4, pp. 259–273, Sept. 2000.
[38] L. M. Fu, "Learning capacity and sample complexity on expert networks," IEEE Trans. Neural Networks, vol. 7, pp. 1517–1520, 1996.
[39] J. Leski and E. Czogala, "A new artificial neural network based fuzzy inference system with moving consequents in if–then rules and selected applications," Fuzzy Sets Syst., vol. 108, pp. 289–297, 1999.
[40] N. R. Pal, K. Pal, and J. C. Bezdek, "Some issues in system identification using clustering," in Proc. IEEE Int. Conf. Neural Networks, Piscataway, NJ, 1997, pp. 2524–2529.
[41] Y. Lin, G. A. Cunningham, III, and S. V. Coggeshall, "Using fuzzy partitions to create fuzzy systems from input–output data and set the initial weights in a fuzzy neural network," IEEE Trans. Fuzzy Syst., vol. 5, pp. 614–621, Nov. 1997.
[42] J.-L. Chen and J. Y. Chang, "Fuzzy perceptron neural networks for classifiers with numerical data and linguistic rules as inputs," IEEE Trans. Fuzzy Syst., vol. 8, pp. 730–745, Dec. 2000.

[43] D. Nauck and R. Kruse, “NEFCON-I: An X-window based simulator for neural fuzzy controllers,” in Proc. IEEE Int. Conf. Neural Networks, Orlando, FL, June 1994, pp. 1638–1643. [44] S. Mitra and S. Pal, “Fuzzy self organization, inferencing and rule generation,” IEEE Trans. Syst. Man, Cybern., vol. 26, pp. 608–620, 1996. [45] Y. Jin, W. von Seelen, and B. Sendhoff, “On generating F C fuzzy rule systems from data using evolution strategies,” IEEE Trans. Syst., Man, Cybern. B, vol. 29, pp. 829–845, Dec. 1999. [46] F. Klawonn and R. Kruse, “Constructing a fuzzy controller from data,” Fuzzy Sets Syst., vol. 85, pp. 177–193, 1997. [47] S. Paul and S. Kumar, “Rule based neuro-fuzzy linguistic networks for inference and function approximation,” in Knowledge Based Computer Systems, M. Sasikumar, D. D. Rao, P. R. Prakash, and S. Ramani, Eds. Mumbai, India: NCST, Dec. 1998, pp. 287–298. , “Adaptive rule-based linguistic networks for function approxima[48] tion,” in Advances in Pattern Recognition and Digital Techniques, N. R. Pal, A. K. De, and J. Das, Eds. Calcutta, India: Narosa, Dec. 1999, pp. 246–250. , “Subsethood based adaptive linguistic networks for pattern clas[49] sification,” IEEE Trans. Syst., Man, Cybern. C, 2002, to be published. [50] M. Mackey and L. Glass, “Oscillation and chaos in physiological control systems,” Science, vol. 197, pp. 287–289, 1977. [51] L.-X. Wang and J. M. Mendel, “Generating fuzzy rules by learning from examples,” IEEE Trans. Syst., Man, Cybern., vol. 22, pp. 1414–1427, Nov./Dec. 1992. [52] D. Kim and C. Kim, “Forecasting time series with genetic fuzzy predictor ensemble,” IEEE Trans. Fuzzy Syst., vol. 5, pp. 523–535, Nov. 1997. [53] S. Wu and M. J. Er, “Dynamic fuzzy neural networks—A novel approach to function approximation,” IEEE Trans. Syst., Man, Cybern. B, vol. 30, pp. 358–364, Apr. 2000. [54] N. Kasabov and Q. Song, “Dynamic evolving fuzzy neural networks with ‘m-out-of-n’ activation nodes for on-line adaptive systems,” Dep. Inform. Sci., Univ. Otago, Dunedin, New Zealand, Tech. Rep. TR99-04, 1999. [55] X. Yao and Y. Lin, “A new evolutionary system for evolving artificial neural networks,” IEEE Trans. Neural Networks, vol. 8, pp. 694–713, May 1997. [56] R. A. Fisher, “The use of multiple measurements in taxonomic problems,” Ann. Eugenics, vol. 7, no. 2, pp. 179–188, 1936. [57] N. B. Karayiannis, J. C. Bezdek, N. R. Pal, R. J. Hathaway, and P.-I. Pai, “Repairs to GLVQ: A new family of competitive learning schemes,” IEEE Trans. Neural Networks, vol. 7, pp. 1062–1071, Sept. 1996. [58] L. I. Kuncheva and J. C. Bezdek, “Nearest prototype classification: Clustering, genetic algorithms, or random search?,” IEEE Trans. Syst., Man, Cybern. C, vol. 28, pp. 160–164, Feb. 1998. [59] N. Kasabov, “Learning fuzzy rules and approximate reasoning in fuzzy neural networks and hybrid systems,” Fuzzy Sets Syst., vol. 82, pp. 135–149, 1996. [60] S. Halgamuge and M. Glesner, “Neural networks in designing fuzzy systems for real world applications,” Fuzzy Sets Syst., vol. 65, pp. 1–12, 1994. [61] C. Bishop, Neural Networks for Pattern Recognition. Oxford, U.K.: Clarendon, 1995. [62] W. Duch, R. Adamezak, and K. Grabezewski, “A new methodology of extraction, optimization and application of crisp and fuzzy logical rules,” IEEE Trans. Neural Networks, vol. 12, pp. 277–306, Mar. 2001. [63] P. Clark and T. Niblett, “The CN2 induction algorithm,” Machine Learning, vol. 3, pp. 261–283, 1989. [64] G. Cestnik, I. Konenenko, and I. 
Bratko, “Assistant-86: A knowledge elicitation tool for sophisticated users,” in Machine Learning, Bratko and Lavrac, Eds. South Bound Brook, NJ: Sigma, 1987, pp. 31–45. [65] H. Narazaki and A. Ralescu, “An improved synthesis method for multilayered neural networks using qualitative knowledge,” IEEE Trans. Fuzzy Syst., vol. 1, pp. 125–137, May 1993. [66] Y. Lin and G. A. Cunningham, III, “A new approach to fuzzy-neural system modeling,” IEEE Trans. Fuzzy Syst., vol. 3, pp. 190–198, May 1995. [67] D. Nguyen and B. Widrow, “The truck backer-upper: An example of self-learning in neural network,” IEEE Contr. Syst. Mag., vol. 10, pp. 18–23, 1990.


[68] S.-G. Kong and B. Kosko, "Adaptive fuzzy systems for backing up a truck-and-trailer," IEEE Trans. Neural Networks, vol. 3, pp. 211–223, Mar. 1992.
[69] J. C. Bezdek, Pattern Recognition With Fuzzy Objective Function Algorithms. New York: Plenum, 1981.
[70] X. Xie and G. Beni, "A validity measure for fuzzy clustering," IEEE Trans. Pattern Anal. Machine Intell., vol. 13, pp. 841–847, Aug. 1991.

Sandeep Paul received the B.Sc. degree in electrical engineering from the Aligarh Muslim University, Aligarh, India, in 1992 and the M.Tech. degree in engineering systems from the Dayalbagh Educational Institute, Agra, India, in December 1994. He is currently pursuing the Ph.D. degree in fuzzy-neural systems at the Dayalbagh Educational Institute. He is presently working as a Lecturer in the Department of Electrical Engineering, D.E.I. Technical College, Dayalbagh Educational Institute, Dayalbagh, Agra. His current interests include hybrid fuzzy neural systems and their applications to pattern recognition, classification, and function approximation. Mr. Paul was a recipient of the Director's Medal for achieving the highest marks in the M.Tech. course.


Satish Kumar (M'87) received the B.Sc. degree in electrical engineering from the Dayalbagh Educational Institute, Dayalbagh, Agra, India, in 1985, the M.Tech. degree in integrated electronics and circuits from the Indian Institute of Technology, Delhi, in December 1986, and the Ph.D. degree in physics and computer science from the Dayalbagh Educational Institute in 1992. In the course of his doctoral studies, he worked on structured models for software engineering, system dynamics, and neural networks. He was a Senior Research Assistant at the Center for Applied Research in Electronics, Indian Institute of Technology, Delhi, where he was involved in research on CAD applications from January to July 1987. He then joined the Department of Physics and Computer Science at the Dayalbagh Educational Institute as a Lecturer, where he has been teaching ever since. He is presently a Reader in computer science and applications in the department. His current research interests are in the areas of fuzzy-neural systems, evolvable systems, theoretical aspects of neural networks, and pulsed neuron models.
