
Genetically Evolved Fuzzy Rule-Based Classifiers and Application to Automotive Classification

TeckWee Chua and WoeiWan Tan

Department of Electrical and Computer Engineering, National University of Singapore, 4, Engineering Drive 3, Singapore 117576
{cteckwee,wwtan}@nus.edu.sg

Abstract. Type-2 fuzzy logic systems (FLSs) have been treated as a magic black box that can better handle uncertainties thanks to the footprint of uncertainty (FOU). Although the results in control applications are promising, the advantages of the type-2 framework in fuzzy pattern classification are still unclear because the two kinds of system produce different forms of output. This paper investigates whether a type-2 fuzzy classifier can deliver better performance when an imprecise decision boundary is caused by an improper feature extraction method. A genetic algorithm (GA) is used to tune the fuzzy classifiers under the Pittsburgh scheme. The proposed fuzzy classifiers have been successfully applied to an automotive application in which the classifier needs to detect the presence of a human in a vehicle. Results reveal that the type-2 classifier has the edge over the type-1 classifier when the decision boundaries are imprecise and the fuzzy classifier itself does not have enough degrees of freedom to construct a suitable boundary. Conversely, when the decision boundaries are clear, the advantage of the type-2 framework may no longer be significant. In any case, the performance of a type-2 fuzzy classifier is at least comparable with that of a type-1 fuzzy classifier. For real-world classification problems, where the uncertainty is usually difficult to estimate, a type-2 fuzzy classifier can be the more rational choice.

Key words: Type-2 Fuzzy, Fuzzy Rule-Based Classifier, Genetic-Fuzzy.

1 Introduction

The idea of incorporating type-2 fuzzy sets into an FLS framework stems from the need to model uncertainty in the description of antecedents and consequents in the system rule base. The type-2 FLS was long regarded as computationally expensive because of the overhead associated with type reduction and the iterative Karnik-Mendel procedure [1]. Therefore, an important question arises: is it worth using a type-2 FLS instead of a type-1 FLS at the cost of added complexity? Starczewski [2] shows that under certain conditions the outputs of type-2 and type-1 FLSs are equivalent, which might invalidate the type-2 approach in the majority of real application tasks. However, in most other circumstances the output differences between both FLSs still exist. The subtle output differences might be critical in some applications such as precision control engineering. Type-2 FLSs have been applied successfully in control engineering, where type-2 fuzzy logic controllers (FLCs) are known to deliver better performance in the face of uncertainties and imprecision.

Although considerable effort has gone into type-2 fuzzy controllers, less research has been performed on the application of type-2 fuzzy logic to pattern classification. In [3], type-2 fuzzy rule-based classifiers (FRBCs) with non-hierarchical and hierarchical architectures were applied to the classification of battlefield ground vehicles. The input to the system is a set of acoustic features. The input is inherently noisy due to the variation of the vehicle traveling speed, along with environmental variations (e.g., wind and terrain). To further model the input uncertainties, the input is modeled as an interval type-2 fuzzy set whose membership function (MF) is a Gaussian function centered at the measured value but with an uncertain standard deviation. Given the noisy acoustic inputs, it was observed that the interval type-2 FRBC gives only marginal improvements over the type-1 FRBC. The authors raised a few important questions with regard to fuzzy pattern classification. These include in what way type-2 FRBCs can be considered to outperform their type-1 counterparts (e.g., in terms of classification error rate, generalizability or robustness), and how much uncertainty must be present in a problem for the better performance of a type-2 FRBC to be worth its added complexity. The performance of a classifier depends heavily on the sometimes questionable choices that the designer of the classifier makes based on his/her insight into the problem [3]. One of them is the feature extraction method.
An effective set of features can ease the design of a classifier tremendously, especially for a fuzzy classifier, whose number of rules increases exponentially with the number of feature dimensions [4]. In contrast, if the feature selection is not optimal then the classification performance will be degraded. Unfortunately, it is difficult, and generally an open problem, to select an optimal set of features for different applications. An experienced designer may incorporate his/her knowledge about the classification problem. For example, knowing that the ECG amplitude is a better feature for differentiating between ventricular tachycardia and ventricular fibrillation, the designer can use this feature to improve the classifier performance [5]. On the other hand, a designer with a statistical background may try to use a statistical tool such as Principal Component Analysis (PCA) or Linear Discriminant Analysis (LDA) to select a compact set of projected features. Therefore, one source of uncertainty in pattern classification is the ambiguity in feature selection.

The objective of this paper is to investigate whether a type-2 FRBC can better handle the uncertainty associated with feature selection. The motivation is to find out whether the extra degrees of freedom provided by the FOU enable a type-2 FRBC to outperform its type-1 counterpart. In the worst case, if all FOUs of a type-2 FRBC disappear, the type-2 FRBC reduces to a type-1 FRBC and there is no difference between the final outputs of the two classifiers.

This paper is organized as follows. Section 2 outlines the interval type-2 fuzzy rule-based classifier (FRBC) and section 3 explains the training procedure with GA. The automobile classification problem and feature extraction method are explained in section 4 and sub-section 4.1 respectively. The experimental results are presented in sub-section 4.2 and finally section 5 offers concluding remarks.

2 Interval Type-2 Fuzzy Rule-Based Classifiers

Fig. 1. Structure of type-2 classifier.

This section introduces the interval type-2 FRBC. Fig. 1 shows the general structure of the proposed type-2 fuzzy rule-based classifier. There are six components in the architecture. The rule base consists of $M$ rules, where each rule relates the domain $X_1 \times \cdots \times X_p \subseteq \mathbb{R}^p$ to the range $Y \subseteq \mathbb{R}$ and can be expressed as the following intuitive IF-THEN statement:

$$R^j: \text{IF } x_1 \text{ is } \tilde{A}_1^j \text{ and } \cdots \text{ and } x_p \text{ is } \tilde{A}_p^j, \text{ THEN } y \text{ is } C^j$$

where $R^j$ denotes the $j$th rule, $\tilde{A}_k^j$ is an interval type-2 antecedent set associated with the $k$th input variable $x_k$ ($k = 1, \ldots, p$), and $C^j$ represents the consequent set associated with the output variable $y$.

The role of the fuzzifier in a fuzzy system is to map each element $x'_k$ of the input vector $\mathbf{x}' = (x'_1, \ldots, x'_p)^T$ into the fuzzy set $\tilde{X}'_k$. This process provides a natural framework for handling uncertain input information. There is a variety of methods for performing fuzzification. The most common approach is singleton fuzzification, which maps a crisp input into the following MF:

$$\mu_{\tilde{X}'}(x) = \begin{cases} 1 & x = x' \\ 0 & x \neq x' \end{cases} \quad \text{for } \forall x \in X.$$

Next, the inference engine computes the firing strength of each rule, which expresses how well the fuzzified input $\tilde{X}'$ matches the antecedents $\tilde{A}_k^j$. For a type-2 FRBC, the inference engine produces two firing strengths for each rule. The lower and upper firing strengths of the $j$th rule, $\underline{f}^j(\mathbf{x}')$ and $\bar{f}^j(\mathbf{x}')$, are computed as:

$$\underline{f}^j(\mathbf{x}') = \prod_{k=1}^{p} \sup_{x_k}\left[\mu_{\tilde{X}'_k}(x_k),\ \underline{\mu}_{\tilde{A}_k^j}(x_k)\right] \tag{1}$$

$$\bar{f}^j(\mathbf{x}') = \prod_{k=1}^{p} \sup_{x_k}\left[\mu_{\tilde{X}'_k}(x_k),\ \bar{\mu}_{\tilde{A}_k^j}(x_k)\right] \tag{2}$$
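The firing-strength computation in Eqs. (1) and (2) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the bracket is read as a t-norm of the two grades, so that with singleton fuzzification each supremum reduces to the antecedent membership grade at the measured value; the antecedents are Gaussian; and the lower MF is modeled as the upper MF scaled by a height ratio in (0, 1], following the chromosome description in section 3. The names `gaussian` and `firing_interval` are hypothetical.

```python
import math


def gaussian(x, mean, sigma):
    """Gaussian membership grade of x (assumed antecedent shape)."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)


def firing_interval(x, antecedents):
    """Return (lower, upper) firing strengths of one rule.

    x           -- crisp input vector (singleton fuzzification assumed)
    antecedents -- one (mean, sigma, lmf_ratio) tuple per input, where
                   lmf_ratio in (0, 1] scales the upper MF down to the
                   lower MF (an assumption of this sketch)
    """
    lower, upper = 1.0, 1.0
    for xk, (mean, sigma, lmf_ratio) in zip(x, antecedents):
        mu_upper = gaussian(xk, mean, sigma)
        mu_lower = lmf_ratio * mu_upper
        lower *= mu_lower   # product over inputs, Eq. (1)
        upper *= mu_upper   # product over inputs, Eq. (2)
    return lower, upper
```

When every `lmf_ratio` equals 1 the two strengths coincide, which corresponds to the type-1 special case in which the FOU disappears.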

where sup[.] denotes the supremum operation [1]. Before the final crisp output can be obtained, the output of the inference engine and the consequent must be processed. In the more general case where the consequent fuzzy sets $\tilde{C}^j$ are interval type-2 sets, the type-reduced set $Y_{\cos}$ can be computed with center-of-sets type reduction:

$$Y_{\cos} = [y_l, y_r] = \int_{y^1 \in [y_l^1, y_r^1]} \cdots \int_{y^M \in [y_l^M, y_r^M]} \int_{f^1 \in [\underline{f}^1, \bar{f}^1]} \cdots \int_{f^M \in [\underline{f}^M, \bar{f}^M]} 1 \Bigg/ \frac{\sum_{j=1}^{M} f^j y^j}{\sum_{j=1}^{M} f^j} \tag{3}$$

where $[y_l^j, y_r^j]$ denotes the centroid of the set $\tilde{C}^j$, which can be obtained with the various methods defined in [1]. However, since the consequent fuzzy sets in our classification problem correspond to the class labels and are represented by crisp numbers (singletons), the center-of-sets type reduction above simplifies to height type reduction by setting $y_l^j = y_r^j$. The type-reduced set, which is an interval output $[y_l(\mathbf{x}'), y_r(\mathbf{x}')]$, can be obtained via the Karnik-Mendel iterative algorithm [6]. The type-reduced set is then defuzzified to the crisp output $y$ by taking the average of $y_l$ and $y_r$, i.e.:

$$y(\mathbf{x}') = \frac{y_l(\mathbf{x}') + y_r(\mathbf{x}')}{2} \tag{4}$$
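As an illustration of height type reduction followed by Eq. (4), the sketch below computes the interval endpoints by exhaustively trying every switch point over the rules sorted by consequent value. This is a deliberate substitution: it is not the iterative Karnik-Mendel procedure cited in the text, but for a small rule base it reaches the same endpoints, because the extreme weighted averages are always attained by a configuration that switches once between lower and upper firing strengths. The function names are hypothetical.

```python
def _switch_point_averages(ys, first, second):
    """Yield weighted averages using `first` weights up to a switch
    point k and `second` weights after it, for every k."""
    for k in range(len(ys) + 1):
        weights = first[:k] + second[k:]
        total = sum(weights)
        if total > 0:  # skip configurations where no rule fires
            yield sum(w * y for w, y in zip(weights, ys)) / total


def type_reduce(y, f_lower, f_upper):
    """Return (y_l, y_r) for singleton consequents y and interval
    firing strengths [f_lower[j], f_upper[j]]."""
    order = sorted(range(len(y)), key=lambda j: y[j])
    ys = [y[j] for j in order]
    lo = [f_lower[j] for j in order]
    hi = [f_upper[j] for j in order]
    # Left endpoint: heavy weights on small y, light weights on large y.
    y_l = min(_switch_point_averages(ys, hi, lo))
    # Right endpoint: light weights on small y, heavy weights on large y.
    y_r = max(_switch_point_averages(ys, lo, hi))
    return y_l, y_r


def defuzzify(y_l, y_r):
    """Eq. (4): midpoint of the type-reduced interval."""
    return (y_l + y_r) / 2.0
```

The search evaluates M + 1 configurations per endpoint, so it is only sensible for modest M; the Karnik-Mendel iteration finds the same switch points with fewer evaluations.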

Finally, the decision maker determines the class label by comparing the crisp output against the threshold.

Since the type-2 FRBC is compared against the type-1 FRBC in this work, it is appropriate to briefly highlight the differences between the two classifiers. The structure of a type-1 FRBC is similar to that of a type-2 FRBC except for a few aspects. Firstly, the inference engine produces a single firing strength $f^j$ for the $j$th rule rather than an interval value. Secondly, the type reducer does not exist since no type-2 number is involved. In other words, the output processing consists only of defuzzification. For height defuzzification, the crisp output $y$ can be computed as:

$$y(\mathbf{x}') = \frac{\sum_{j=1}^{T} y^j f^j}{\sum_{j=1}^{T} f^j} \tag{5}$$
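The type-1 output path and the final decision step can be sketched as follows. The snippet assumes class labels 1, ..., K and an ascending list of K − 1 thresholds, as described for the chromosome in section 3; the rule that an output strictly above a threshold moves to the next class is an assumption of this sketch, since the text does not state how ties are handled. Both function names are hypothetical.

```python
def height_defuzzify(y, f):
    """Eq. (5): firing-strength-weighted average of the consequents."""
    total = sum(f)
    if total == 0:
        raise ValueError("no rule fired for this input")
    return sum(yj * fj for yj, fj in zip(y, f)) / total


def decide(output, thresholds):
    """Map a crisp output to a class label in 1..K.

    thresholds -- ascending list of K - 1 cut points; an output strictly
                  above a cut point moves to the next class (assumed).
    """
    label = 1
    for cut in thresholds:
        if output > cut:
            label += 1
    return label
```

For a two-class problem a single threshold between 1 and 2 is enough, for example `decide(height_defuzzify([1, 2], [0.75, 0.25]), [1.5])` yields class 1.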

3 Training of Fuzzy Classifier

The tuning of fuzzy classifiers with GAs was pioneered by Valenzuela-Rendón [7], where GAs are used to select fuzzy rules. Since then, GAs have been successfully applied to the tuning of FRBCs. One of the most interesting problems in GAs is coding the solution space. In this work, the Pittsburgh approach is used, where each chromosome encodes the whole rule base and the best chromosome at the end of the evolution determines the winning FRBC. The classification accuracy of the classifier with the respective rule base is used as the fitness function.

The chromosome encoding is straightforward. A complete chromosome that represents a type-2 FRBC rule base has five parts, as shown in Fig. 2. The first part encodes the upper membership function (UMF) parameters. A Gaussian MF needs two parameters (mean and standard deviation), while a triangular MF needs three parameters (left, apex and right points). Assuming that global fuzzy rules (non rule-specific MFs) are adopted and each feature is partitioned into $q$ fuzzy sets, the number of genes in this part is equal to $p \times q \times z$, where $z$ is the number of parameters required for an MF (i.e., 2 for a Gaussian MF and 3 for a triangular MF). The second part represents the consequent labels, $L \in \{1, \ldots, K\}$, where $K$ denotes the total number of classes. The length of this part is equal to the number of rules, $M = q^p$. The third part defines the ratio of the lower membership function (LMF) height to the UMF height, which is in the range $(0, 1]$. This part has $p \times q$ genes. The fourth part comprises $M$ genes that act as rule flags, controlling whether a rule is ignored (flag = ‘0’) or included in the rule base (flag = ‘1’). This is known as rule pruning. In section 4.2, a chromosome that represents a full rule base fuzzy classifier has non-evolvable flags, every one of which is preset to ‘1’. Finally, the last part represents a set of thresholds that divide the crisp outputs into discrete classes. It has a length of $(K - 1)$. For example, a two-class problem requires one threshold in the interval $[1, 2]$, while a three-class problem requires two thresholds: one in the interval $[1, 2]$ and another in the interval $(2, 3]$. As such, a complete type-2 FRBC chromosome has $(2M + pq(1 + z) + K - 1)$ genes. All genes are binary coded in the current framework.
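The five-part layout and the gene count above can be checked with a short sketch. The helper name `chromosome_layout` and the example values of p, q, z and K are illustrative assumptions, not settings reported in the paper.

```python
def chromosome_layout(p, q, z, K):
    """Return the number of genes in each part of a type-2 FRBC chromosome.

    p -- number of input features
    q -- number of fuzzy sets per feature
    z -- parameters per MF (2 for Gaussian, 3 for triangular)
    K -- number of classes
    """
    M = q ** p  # full rule base: one rule per combination of fuzzy sets
    return {
        "umf_parameters": p * q * z,
        "consequent_labels": M,
        "lmf_height_ratios": p * q,
        "rule_flags": M,
        "thresholds": K - 1,
    }


# Illustrative values only: 2 features, 3 Gaussian sets each, 2 classes.
layout = chromosome_layout(p=2, q=3, z=2, K=2)
total_genes = sum(layout.values())
```

With these example values the parts hold 12, 9, 6, 9 and 1 genes, and the total of 37 agrees with 2M + pq(1 + z) + K − 1 for M = 9.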
For the parts “MF parameters” and “threshold”, where continuous values are required in the phenotype space, each gene is encoded as an 8-bit string. During the fitness evaluation, the parameters are decoded into real numbers using the linear mapping shown below:

$$g_i = G_i^{\min} + \left(G_i^{\max} - G_i^{\min}\right) \times \frac{A_i}{2^8 - 1} \tag{6}$$

where $g_i$ denotes the actual value of the $i$th parameter, $A_i$ denotes the integer represented by the 8-bit string gene, and $G_i^{\max}$ and $G_i^{\min}$ denote the user-defined upper and lower limits of the gene respectively. For the remaining parts of the chromosome, the binary encoded genes are directly decoded into integer numbers.

The selection method is tournament selection with a size of two. The elitist strategy is used to ensure that the best chromosome (the one with the highest fitness) always survives into the next generation. According to [8], simple GA models should be tried first and be dismissed only if they do not provide a satisfactory result. Thus, the genetic operators in this work, bitwise flipping mutation and single-point crossover, are kept as simple as possible while still achieving good solutions. The mutation rate is kept relatively low (0.03) while the crossover rate is set moderately high (0.8) to keep a good balance between exploration and exploitation and thus avoid premature convergence. In section 4, the population size is set at 50 and the maximum number of generations is fixed at 200. The optimization process stops if the fitness has not improved over the past 30 generations.
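A minimal sketch of the decoding in Eq. (6) and of one GA generation with the operators named above is given here. It assumes chromosomes are lists of 0/1 integers, that the mutation rate applies per bit, and that crossover is attempted once per selected pair; none of these details is spelled out in the text, and all function names are hypothetical.

```python
import random


def decode_gene(bits, g_min, g_max):
    """Eq. (6): linearly map a binary gene onto [g_min, g_max]."""
    a = int("".join(str(b) for b in bits), 2)
    return g_min + (g_max - g_min) * a / (2 ** len(bits) - 1)


def tournament(population, fitness, rng, size=2):
    """Pick the fitter of `size` randomly drawn chromosomes."""
    contenders = rng.sample(range(len(population)), size)
    best = max(contenders, key=lambda i: fitness[i])
    return population[best]


def single_point_crossover(a, b, rng, rate=0.8):
    """Swap the tails of two parents after a random cut point."""
    if len(a) > 1 and rng.random() < rate:
        cut = rng.randrange(1, len(a))
        return a[:cut] + b[cut:], b[:cut] + a[cut:]
    return a[:], b[:]


def bit_flip(chromosome, rng, rate=0.03):
    """Flip each bit independently with probability `rate` (assumed per bit)."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]


def next_generation(population, fitness, rng):
    """Build the next population, copying the best chromosome unchanged."""
    elite = max(range(len(population)), key=lambda i: fitness[i])
    children = [population[elite][:]]  # elitism
    while len(children) < len(population):
        parent_a = tournament(population, fitness, rng)
        parent_b = tournament(population, fitness, rng)
        for child in single_point_crossover(parent_a, parent_b, rng):
            if len(children) < len(population):
                children.append(bit_flip(child, rng))
    return children
```

In a full training loop, `next_generation` would be called up to 200 times on a population of 50, with the loop ending early once the best fitness has stalled for 30 generations, as described in the text.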

Fig. 2. The structure of a chromosome.

4 Application to Ford Automotive Dataset

The proposed fuzzy rule-based classifiers have been applied to the Ford automotive dataset [9]. In this real-world application, the classifier needs to detect the presence of a human in a vehicle. One possible scenario is that when a driver returns to his or her vehicle at night, particularly in a deserted location, the knowledge that no one is hiding inside the vehicle can provide peace of mind. Raw analog signals were collected from a vibration sensor located at the vehicle's suspension system. The signals are then filtered by a low pass filter (LPF) and converted into digital signals [10]. Each diagnostic session has 500 sample points. The length of the sequences reflects the time available for making the classification decision. Presumably, the task would be easier if the sequence length were increased, but this would violate the requirements of the application. The beginning of the sampling process is not aligned with any external circumstance or any aspect of the observed pattern. The training data (3306 samples) were collected under typical operating conditions with minimum noise, whereas the testing data (810 samples) were collected under noisy conditions such as wind disturbances.

4.1 Feature Extraction Method

[Figure 3 panels: (a) Amplitude versus Samples; (b) Power Density / Frequency (dB/rad/s) versus Normalized Frequency (rad/s). Legend: Empty, Occupied.]

Fig. 3. (a) Vibration signals, (b) average periodogram of the training samples.

The problem does not appear to have a simple solution that emerges from visual inspection of the data sequences shown in Fig. 3(a). A signal processing tool such as the periodogram may be useful to reveal interesting features. In this application, the periodograms are computed with a 512-point FFT and a triangular window. Thus, each periodogram is a coefficient vector of length 257. The average periodograms of the training samples are shown in Fig. 3(b). The figure shows that the discriminative features are mostly located in the low frequency region. If each periodogram coefficient is regarded as an input dimension, the total number of feature dimensions is 257, which is impractical for most classifiers. Therefore, it is necessary to reduce the number of feature dimensions.

PCA is one of the most popular dimensionality reduction techniques. It searches for the directions in the data that have the largest variance and subsequently projects the data onto them. However, it is completely unsupervised: it knows only about variance and nothing about the different classes of data. In light of this, LDA may reveal class structure better. This technique maximizes the ratio of between-class variance to within-class variance in any particular data set, thereby guaranteeing maximal separability. Fig. 4 and 5 show the two-dimensional scatter plots where the features are extracted with PCA and LDA respectively. It is clear that the variance within classes is smaller and the variance between classes is bigger in the LDA projection. The feature space produced by LDA is more linearly separable, while PCA gives a less optimal separation between the two classes, especially on the noisy test data. As a result, the data produced by PCA require a more sophisticated classifier in order to handle the blurred decision boundary. In contrast, LDA projected data impose a less stringent requirement on the classifiers. Thus, it would be interesting to investigate whether a more advanced classifier like the type-2 FRBC can perform better than the type-1 FRBC when ambiguity in the feature extraction process becomes a source of uncertainty.
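To make the periodogram step concrete, the sketch below computes a one-sided periodogram of a 500-sample sequence with a triangular window and a 512-point transform, giving 512/2 + 1 = 257 coefficients. It is written with the standard library only and uses a direct DFT for clarity, where the paper uses an FFT. The exact window definition (Bartlett form here) and the normalization by the window energy are assumptions of this sketch. The PCA and LDA projections are not reproduced.

```python
import cmath
import math


def triangular_window(n):
    """Bartlett-style triangular window of length n (assumed definition)."""
    return [1.0 - abs(2.0 * i / (n - 1) - 1.0) for i in range(n)]


def periodogram(signal, nfft=512):
    """One-sided periodogram with nfft // 2 + 1 coefficients.

    The signal (at most nfft samples) is tapered, zero-padded to nfft
    points, transformed with a direct DFT, and normalized by the window
    energy (an assumed scaling).
    """
    window = triangular_window(len(signal))
    tapered = [s * w for s, w in zip(signal, window)]
    energy = sum(w * w for w in window)
    coefficients = []
    for k in range(nfft // 2 + 1):
        step = -2j * math.pi * k / nfft
        acc = sum(x * cmath.exp(step * n) for n, x in enumerate(tapered))
        coefficients.append(abs(acc) ** 2 / energy)
    return coefficients


# Synthetic 500-sample tone at 0.1 cycles per sample, for illustration only.
tone = [math.sin(2.0 * math.pi * 0.1 * n) for n in range(500)]
spectrum = periodogram(tone)
peak_bin = max(range(len(spectrum)), key=lambda k: spectrum[k])
```

For the synthetic tone the peak falls near bin 0.1 × 512 ≈ 51, and `len(spectrum)` is 257, matching the feature dimension quoted in the text.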

[Figure 4 panels: (a) and (b) plot x2 versus x1. Legend: Occupied, Empty.]

Fig. 4. 2-D scatter plots of PCA projected (a) train data, (b) test data.

[Figure 5 panels: (a) and (b) plot x2 versus x1. Legend: Occupied, Empty.]

Fig. 5. 2-D scatter plots of LDA projected (a) train data, (b) test data.

4.2 Performance Evaluation

Four FRBCs are proposed to examine whether a type-2 FRBC can outperform its type-1 counterpart when the decision boundary is imprecise. They are type-1 FRBCs with full and pruned rule bases (T1-FRBC(F), T1-FRBC(P)) and type-2 FRBCs with full and pruned rule bases (T2-FRBC(F), T2-FRBC(P)). Each kind of classifier consists of 10 different designs, each evolved separately with the GA. The performance metrics used in the evaluation are the accuracy of the classifier and the false positive rate (FPR). A false positive occurs when the classifier reports that the vehicle is occupied when no one is actually in it. In particular, the systems can be sensitive to false positives in windy conditions.

Table 1 shows that FRBCs with a pruned rule base generally perform better. This is consistent with the findings in [11], where it was demonstrated that the decision boundary of a winner-takes-all based T1-FRBC with a complete rule base is a rectangle or hyperrectangle. Although the proposed T1-FRBC(F) and T2-FRBC(F) in this paper do not have completely rectangular boundaries, the boundaries near the edge of the feature space are still parallel to the feature axes. Conversely, in the middle region the decision boundary can be non-linear due to the rule aggregation effect of defuzzification and type reduction. As such, the proposed FRBCs with a complete rule base may not classify the PCA projected data (see Fig. 4(a)) very well near the edges, because these regions require a boundary that is not parallel to the feature axes. This shows the limitation of the full rule base FRBCs themselves. On the other hand, an FRBC with a pruned rule base does not suffer from this issue. In addition, the results show that T1-FRBC(F) has the worst performance, whereas T2-FRBC(P) attains the highest training accuracy. It is interesting to see how well T2-FRBC(F) fares against the FRBCs with a pruned rule base although it has the aforementioned limitation. In fact, it achieves the highest test accuracy and the lowest test FPR.
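The two metrics can be computed as in the sketch below. It assumes the usual definition of the false positive rate, namely false positives divided by all truly empty cases, which the text does not state explicitly; the label strings and the function name are hypothetical.

```python
def accuracy_and_fpr(y_true, y_pred, positive="occupied"):
    """Return (accuracy %, false positive rate %).

    A false positive is an empty vehicle reported as occupied. The FPR
    denominator is the number of truly empty cases (assumed definition).
    """
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    negatives = sum(t != positive for t, _ in pairs)
    false_positives = sum(t != positive and p == positive for t, p in pairs)
    accuracy = 100.0 * correct / len(pairs)
    fpr = 100.0 * false_positives / negatives if negatives else 0.0
    return accuracy, fpr


# Toy labels for illustration only, not data from the paper.
truth = ["empty", "empty", "occupied", "occupied"]
guess = ["empty", "occupied", "occupied", "empty"]
acc, fpr = accuracy_and_fpr(truth, guess)
```

In the toy example two of four predictions are correct and one of the two empty cases is flagged as occupied, so both metrics equal 50%.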
From Table 2, the results show an overall improvement in both classification accuracy and false positive rate. This is not surprising, as LDA does a better job than PCA in partitioning the data from both classes. Fig. 5(a) indicates that the data can be classified well with a boundary that is parallel to the feature axes. Thus, all FRBCs, regardless of full or pruned rule base, are free from the limitation above, and both type-1 and type-2 FRBCs perform equally well.

Table 1. Average and Standard Deviation of Classification Accuracy and False Positive Rate Across 10 Iterations with PCA Based Feature Extraction.

| Classifier | Dataset | ACC Mean (%) | ACC Stdv (%) | FPR Mean (%) | FPR Stdv (%) |
|---|---|---|---|---|---|
| T1-FRBC(F) | Train | 90.90 | 1.83 | 6.87 | 2.94 |
| T1-FRBC(F) | Test | 72.27 | 3.06 | 25.02 | 5.10 |
| T1-FRBC(P) | Train | 92.63 | 0.70 | 4.85 | 1.23 |
| T1-FRBC(P) | Test | 73.99 | 1.43 | 23.66 | 2.47 |
| T2-FRBC(F) | Train | 92.08 | 0.64 | 5.44 | 1.09 |
| T2-FRBC(F) | Test | 74.19 | 1.16 | 22.75 | 1.69 |
| T2-FRBC(P) | Train | 92.96 | 0.47 | 4.97 | 1.22 |
| T2-FRBC(P) | Test | 74.14 | 1.28 | 23.98 | 2.68 |

Table 2. Average and Standard Deviation of Classification Accuracy and False Positive Rate Across 10 Iterations with LDA Based Feature Extraction.

| Classifier | Dataset | ACC Mean (%) | ACC Stdv (%) | FPR Mean (%) | FPR Stdv (%) |
|---|---|---|---|---|---|
| T1-FRBC(F) | Train | 95.49 | 0.00 | 3.10 | 0.00 |
| T1-FRBC(F) | Test | 81.04 | 0.06 | 7.14 | 0.17 |
| T1-FRBC(P) | Train | 95.47 | 0.05 | 3.27 | 0.41 |
| T1-FRBC(P) | Test | 81.05 | 0.47 | 7.56 | 0.88 |
| T2-FRBC(F) | Train | 95.50 | 0.02 | 3.09 | 0.02 |
| T2-FRBC(F) | Test | 81.01 | 0.05 | 7.21 | 0.13 |
| T2-FRBC(P) | Train | 95.48 | 0.04 | 3.35 | 0.57 |
| T2-FRBC(P) | Test | 81.25 | 0.49 | 7.50 | 0.85 |

5 Conclusion

In this paper, genetically evolved FRBCs are used to analyse whether the type-2 framework can help in the case of poor feature selection through its FOU. The observations from the automotive application above show that a type-2 FRBC may excel when the decision boundary is imprecise and the fuzzy classifier itself does not have enough degrees of freedom to construct a suitable boundary. When the classification task becomes easier or the FRBCs are given enough degrees of freedom, the advantage of the type-2 framework may no longer be significant. This probably explains why the type-1 and type-2 FRBCs in [3] have close performances. Nevertheless, using a type-2 FRBC in real-world classification problems can be a better choice than a type-1 FRBC, since the amount of uncertainty in a real problem is difficult to estimate most of the time. As a rule of thumb, the performance of a type-2 FRBC is at least comparable to, if not better than, that of a type-1 FRBC. Hence, for applications where computation speed is not a major consideration, the type-2 framework should be adopted.

References

1. Mendel, J.: Uncertain Rule-Based Fuzzy Logic Systems: Introduction and New Directions. Prentice Hall, Upper Saddle River, NJ (2001)
2. Starczewski, J.: What Differs Interval Type-2 FLS from Type-1 FLS? In: Artificial Intelligence and Soft Computing - ICAISC 2004. LNAI, vol. 3070, pp. 381–387. Springer, Heidelberg (2004)
3. Wu, H., Mendel, J.: Classification of Battlefield Ground Vehicles Using Acoustic Features and Fuzzy Logic Rule-Based Classifiers. IEEE Trans on Fuzzy Systems, 15(1), 56–72 (2007)
4. Ravi, V., Reddy, P.J., Zimmermann, H.-J.: Pattern Classification with Principal Component Analysis and Fuzzy Rule Bases. European Journal of Operational Research, 126, 526–533 (2000)
5. Chua, T.W., Tan, W.W.: GA Optimisation of Non-Singleton Fuzzy Logic System for ECG Classification. In: Proceeding of IEEE Congress of Evolutionary Computation, pp. 1677–1684 (2007)
6. Mendel, J., Hagras, H., John, R.I.: Standard Background Material About Interval Type-2 Fuzzy Logic Systems That Can Be Used By All Authors. IEEE Computational Intelligence Society
7. Valenzuela-Rendón, M.: The Fuzzy Classifier System: Motivations and First Results. LNCS, vol. 496, pp. 338–342 (1991)
8. Kuncheva, L.I.: Fuzzy Classifier Design. Physica-Verlag, New York (2000)
9. Ford Classification Challenge, http://home.comcast.net/~nn_classification/
10. Eagen et al.: System and Method for Detecting Presence of a Human In a Vehicle. U.S. Patent No. 7,353,088 B2 (2008)
11. Ishibuchi, H., Nakashima, T.: Effect of Rule Weights in Fuzzy Rule-Based Classification Systems. IEEE Trans on Fuzzy Systems, 9(4), 506–515 (2001)
