AISC 337 - An Efficient Framework for Building Fuzzy ... - Springer Link

An Efficient Framework for Building Fuzzy Associative Classifier Using High-Dimensional Dataset

S. Naresh1, M. Vijaya Bharathi2, and Sireesha Rodda3

1,2 GMR Institute of Technology, Rajam, Srikakulam, A.P., India
3 GITAM University, Visakhapatnam, A.P., India
1 PG – M.Tech, Dept. of CSE
[email protected]

Abstract. Fuzzy Association Rule Mining (ARM) is used to support further data mining tasks such as classification and clustering. Traditional fuzzy ARM algorithms fail to mine rules efficiently from high-dimensional data, since they are designed for datasets with relatively few attributes or dimensions. Fuzzy ARM on high-dimensional data therefore remains a challenging problem. This paper uses FAR-HD, a fast and efficient fuzzy ARM algorithm that processes frequent itemsets using a two-phase, multiple-partition approach designed for large high-dimensional datasets. The proposed algorithm extends the FAR-HD process: it builds a framework for a fuzzy associative classifier that leverages fuzzy association rules and improves accuracy in terms of associative soft class labels. Because fuzzy association rules represent latent and dominant patterns in the given dataset, such a classifier is expected to provide high accuracy, particularly in terms of fuzzy support.

Keywords: Fuzzy Associative Classifier, Fuzzy Clustering, High Dimensional Data, SURF Vectors, Fuzzy Association Rule Mining.

1 Introduction

Association Rule Mining extracts frequent patterns from a dataset; the patterns are generated based on their frequencies. Applying crisp association rule mining to a high-dimensional dataset is inefficient, and the results lose information and accuracy because of the sharp boundary problem. Fuzzy logic reduces this loss of information. Fuzzy ARM has been extensively employed on relational/transactional [3] datasets with a small to medium number of attributes or dimensions. Rules can also be derived from high-dimensional datasets, such as those in the image domain, in order to train a fuzzy associative classifier. Traditional approaches fail to mine association rules in high-dimensional datasets [17], [18], but FAR-HD [1], [2] performs better than traditional fuzzy association rule mining algorithms. FAR-HD is an algorithm specially designed for high-dimensional data with large datasets. Building classifiers on this existing efficient approach, with association rules and fuzzy logic, therefore delivers more accurate results.

© Springer International Publishing Switzerland 2015. S.C. Satapathy et al. (eds.), Emerging ICT for Bridging the Future – Volume 1, Advances in Intelligent Systems and Computing 337, DOI: 10.1007/978-3-319-13728-5_72

2 Related Work

Classification based on fuzzy Association Rule Mining (ARM) gives good results compared with using a training dataset directly. Newer fuzzy ARM techniques maintain good performance as datasets such as images grow. Mangalampalli and Pudi [1], [2] proposed a fuzzy ARM algorithm specifically for large high-dimensional datasets. FAR-HD [1] and FAR-Miner [2] have been compared experimentally with Fuzzy Apriori. FAR-Miner is designed to work efficiently on large datasets, and both approaches are faster than Fuzzy Apriori; FAR-HD performs better than FAR-Miner. F-ARMOR [4] is another approach that is fast and efficient on large datasets, but not on high-dimensional ones. Rajendran and Madheswaran [19] proposed a Novel Fuzzy Association Rule Image Mining Algorithm for medical decision support systems. Novel Fuzzy Association Rule Mining (NFARM) provides diagnosis keywords to physicians to support better diagnosis. Usha and Rameshkumar [20] provide a complete survey on the application of association rule mining to crime pattern mining, in which an analyst can use association rules to analyze criminal tactics based on behavioral data such as crime reports, arrest reports, and other statistics. Wang and Yang [21] explain that, with the rapid growth of computational biology and e-commerce applications, high-dimensional data have become very common. The emergence of new application domains such as bioinformatics and e-commerce underscores the need for analyzing high-dimensional data. They present several techniques for analyzing high-dimensional data, e.g., frequent pattern mining, clustering, and classification. Many existing works have shown experimentally that fuzzification of association rules provides better performance.

3 Fuzzy Preprocessing

This preprocessing approach consists of two phases. The first phase generates numerical feature vectors from the input dataset, and the second phase converts these numerical feature vectors into a fuzzy cluster representation. In fuzzy preprocessing [1], [2], the feature vectors take the form of SURF vectors. FAR-HD uses SURF vectors together with the Fuzzy C-Means clustering algorithm to find fuzzy association rules efficiently.

3.1 SURF (Speeded-Up Robust Features)

Here the input dataset consists of images, and the features of the images are represented as SURF values. SURF [5] is an advanced image processing approach for finding matching points between two similar scenes or objects in images, and it is part of many computer vision applications. Each SURF descriptor vector has 64 dimensions. SURF is applied in biometrics, for example in face, palm, finger, iris, and knuckle recognition. SURF is reported to be about three times faster than SIFT (Scale Invariant Feature Transform). It first detects interest points in each image using the Fast-Hessian detector and then generates descriptor vectors for them; these vectors are used to find the matching points between two similar scenes. The matching is often done based on the distance between the feature vectors. The SURF values are then passed to the FCM algorithm to find the fuzzy clusters.

3.2 Fuzzy C-Means Clustering (FCM)

Fuzzy C-Means (FCM) [8] is an algorithm for fuzzy clustering [9] that allows one piece of data to belong to two or more clusters; it is an extension of the K-Means algorithm [13], [14]. Dunn developed the FCM algorithm in 1973 and Bezdek improved it in 1981. In the 1970s, mathematicians also introduced a spatial term into the FCM algorithm to improve the accuracy of fuzzy clustering under noise. FCM is used especially in pattern recognition and is an important tool for clustering objects in image processing. It is based on minimizing the objective function given in Eq. (1). Choosing the number of clusters is a critical task, because each feature vector belongs to every cluster to some degree instead of belonging wholly to a single cluster. FCM reduces the problems of polysemy (lexical ambiguity) and synonymy (semantic relation) that occur in crisp clustering.

$$ J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \, \lVert x_i - c_j \rVert^{2}, \qquad 1 \le m < \infty \qquad (1) $$

where $x_i$ is the $i$-th feature vector, $c_j$ is the center of cluster $j$, $u_{ij}$ is the degree of membership of $x_i$ in cluster $j$, and $m$ is the fuzzification parameter.

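The support and confidence of a fuzzy association rule A => B are:

$$ \text{Support}(A \Rightarrow B) = \frac{\sum_{x \in X} (A \cap B)(x)}{|X|} \qquad (2) $$

$$ \text{Confidence}(A \Rightarrow B) = \frac{\sum_{x \in X} (A \cap B)(x)}{\sum_{x \in X} A(x)} \qquad (3) $$

where $X$ is the set of records and $A(x)$ is the membership degree of record $x$ in the fuzzy itemset $A$.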
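The support and confidence measures above can be illustrated with a short Python sketch. It assumes that each record is a mapping from fuzzy items to membership degrees and that the membership of a record in an itemset is the minimum over its items (the min t-norm); both choices, the function names, and the sample records are assumptions made for illustration and are not taken from the paper.

```python
def itemset_membership(record, itemset):
    """Membership of one record in a fuzzy itemset (min t-norm, assumed)."""
    return min(record.get(item, 0.0) for item in itemset)


def fuzzy_support(records, antecedent, consequent):
    """Sum of (A ∩ B)(x) over all records, divided by the number of records."""
    both = set(antecedent) | set(consequent)
    return sum(itemset_membership(r, both) for r in records) / len(records)


def fuzzy_confidence(records, antecedent, consequent):
    """Sum of (A ∩ B)(x) divided by the sum of A(x) over all records."""
    both = set(antecedent) | set(consequent)
    numerator = sum(itemset_membership(r, both) for r in records)
    denominator = sum(itemset_membership(r, antecedent) for r in records)
    return numerator / denominator if denominator else 0.0


# Hypothetical fuzzy records: item -> membership degree in [0, 1].
records = [
    {"a": 0.8, "b": 0.6},
    {"a": 0.5, "b": 0.9},
    {"a": 0.2},
]
support = fuzzy_support(records, {"a"}, {"b"})
confidence = fuzzy_confidence(records, {"a"}, {"b"})
print(support, confidence)
```

With the min t-norm, the three records contribute 0.6, 0.5, and 0.0 to the numerator, so the rule a => b has support 1.1/3 and confidence 1.1/1.5. A product t-norm would be an equally plausible choice and would give different values.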

5 Fuzzy Associative Classifier Using High-Dimensional Data

The major difference between a non-fuzzy classifier and a fuzzy classifier is that, instead of assigning a single class label to each pattern, a soft class label with a degree of membership in each class is attached to each pattern. The Fuzzy Associative Classifier [5], [12] takes fuzzy association rules as input and generates class labels based on measures such as support and confidence; associative soft class labels are then derived from these class labels. The associative soft class labels are derived from the previously mined fuzzy association rules with respect to a fuzzy support value, which gives more accurate results than traditional classifier approaches.

5.1 Applications of FAC Framework

By extending it, the Fuzzy Associative Classifier framework [15], [19] can be used for face, iris, knuckle, finger, and palm authentication with a simple fuzzy value, and also for faster weather, temperature, and disease prediction with a fuzzy support value. The proposed Fuzzy Associative Classifier, using fuzzy support [15], gives more accurate results by building on the fast and efficient FAR-HD algorithm.

5.2 Proposed Algorithm

The algorithm searches for rules in the images. It uses SURF, an advanced image processing technique, to obtain feature vectors, and applies fuzzy clustering with the Fuzzy C-Means algorithm to represent the feature vectors as fuzzy clusters. The fuzzy rules are then derived by applying the efficient two-phase process of [1]. The output rules with confidence between 70% and 100% form one class group; the support of the rules in this range is calculated and assigned as the fuzzy support. Similarly, rules with confidence between 40% and 69% form a second group, and rules with confidence between 0% and 39% form a third. Experts prefer to work with fuzzified data (i.e., linguistic variables) rather than with exact numbers. Fuzzification is, in a way, a quantization of the numerical attributes into categories directly understandable by the experts. This closes the gap between the data analyst and the domain expert. The classifier is thus able to produce accurate classifications from high-dimensional (image) data, which can be used to make well-informed decisions.

Algorithm: Fuzzy Associative Classifier for High-Dimensional Dataset
Inputs: Image dataset (D), supp
Outputs: Classifiers (C)
    Initialize FI to denote frequent itemsets
    FI = GetFI(D, supp)
    For each item it in FI
        Generate association rule and add to R
    END FOR
    For each rule r in R
        If confidence is in range 70-100% then
            assign class label C1
            assign supp
        else if confidence is in range 40-69% then
            assign class label C2
            assign supp
        else
            assign class label C3
            assign supp
    END FOR
    RETURN C

Procedure GetFI(Image dataset, supp)
    Initialize FI
    Extract SURF values (S)
    For each unique item in S
        Compute support of the item
        If support of the item > supp
            Add item to FI
        END IF
    END FOR
    return FI
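The classification step of the algorithm above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' Java implementation: item supports and rule confidences are taken as already computed, the class-level fuzzy support is assumed to be the sum of the supports of the rules in a class divided by the total number of rules (mirroring the sample calculation in Section 6.1.1), and all function names and sample values are hypothetical.

```python
def get_fi(item_supports, supp):
    """GetFI step: keep the items whose support exceeds the threshold."""
    return [item for item, s in item_supports.items() if s > supp]


def class_label(confidence_pct):
    """Map a rule confidence (in percent) to one of the three class groups."""
    if confidence_pct >= 70:
        return "C1"
    if confidence_pct >= 40:
        return "C2"
    return "C3"


def build_classifier(rules):
    """Group rules by confidence band and attach a fuzzy support per class.

    rules: list of (name, confidence_pct, support) tuples.
    The class-level fuzzy support is the sum of member-rule supports
    divided by the total number of rules (an assumption).
    """
    groups = {}
    for name, confidence_pct, support in rules:
        groups.setdefault(class_label(confidence_pct), []).append((name, support))
    total_rules = len(rules)
    return {
        label: {
            "rules": [name for name, _ in members],
            "fuzzy_support": sum(s for _, s in members) / total_rules,
        }
        for label, members in groups.items()
    }


# Hypothetical inputs, not taken from the paper's experiments.
frequent_items = get_fi({"cluster_1": 0.5, "cluster_2": 0.1}, supp=0.2)
rules = [("R1", 95, 0.9), ("R2", 72, 0.7), ("R3", 55, 0.4), ("R4", 10, 0.2)]
classifier = build_classifier(rules)
print(frequent_items)
print(classifier)
```

Keeping the confidence bands in one small function makes the thresholds (70% and 40%) easy to adjust if a different partition of the confidence range is wanted.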

6 Results

This section presents the experimental results of the fuzzy associative classifier framework [18], which was implemented on the Java platform; the results are shown in the figures below. The results of the fuzzy associative classifier are shown in Fig. 8 and can be compared with the fuzzy association rules with class labels in Fig. 7. Fig. 1 shows the GUI for the input images, and SURF is applied to calculate the interest points of each image, as shown in Fig. 2. Fig. 3 displays the matching points between the two images as graphical lines, and Fig. 4 shows the matching points between the images as values. Clicking the fuzzy clustering button then prompts for a file name with the extension .num in which to save the clusters, as shown in Fig. 5. The final step is to load the clustered file for association rule mining. The tool asks for the support measure in the interval [0.0, 1.0] and the confidence as 0-100%, and it processes the frequent itemsets as described in Section 5.1; the resulting itemsets are shown in Fig. 6. Fig. 7 displays all the rules with their respective confidence and class labels. Finally, the proposed algorithm (see Section 5.2) is applied to generate simple class labels with a fuzzy support value. Fig. 8 thus shows the soft class labels with their fuzzy values obtained using association rules.

Fig. 1. GUI - Input of two similar scenes of images

Fig. 2. Interesting points of each image

Fig. 3. GUI-Matching points between the input images

Fig. 4. SURF matching vectors in the text box

Fig. 5. Fuzzy Clustering for SURF vectors

Fig. 6. Fuzzy Measures and Frequent Itemsets

Fig. 7. Rules with confidence and class labels

Table 1. Fuzzy Association Rules with Class Labels

Rules    Support    Class Label
R1       1.0        C1
R2       1.0        C1
R3       0.9        C1
R4       0.8        C1
R5       0.72       C1
R6       0.75       C1
R7       0.66       C2
R8       0.58       C2

6.1 Measures

Table 1 lists sample rules with their support and class labels. From this table, the confidence and support measures can be calculated using Eqs. (2) and (3). The actual results are shown in Fig. 7; the table only illustrates how support and confidence are calculated.

6.1.1 Sample Calculation

Confidence for C1 = (1.0+1.0+0.8+0.9+0.72+0.75) / (1.0+1.0+0.8+0.9+0.72+0.75+0.6+0.58) = 81.41%
Confidence for C2 = (0.6+0.58) / (1.0+1.0+0.8+0.9+0.72+0.75+0.6+0.58) = 18.6%
Support for C1 = (1.0+1.0+0.8+0.9+0.72+0.75) / 8 = 0.65
Support for C2 = (0.6+0.58) / 8 = 0.15
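The arithmetic of the sample calculation can be checked with a few lines of Python. The sketch uses the values exactly as they appear in the calculation, which takes 0.6 for R7 whereas Table 1 lists 0.66; the variable names are chosen for illustration only.

```python
# Rule values as used in the sample calculation (R7 taken as 0.6).
c1_values = [1.0, 1.0, 0.8, 0.9, 0.72, 0.75]  # rules R1-R6, class C1
c2_values = [0.6, 0.58]                       # rules R7-R8, class C2

total = sum(c1_values) + sum(c2_values)
rule_count = len(c1_values) + len(c2_values)

# Class confidence: share of the class in the total, as a percentage.
confidence_c1 = 100 * sum(c1_values) / total
confidence_c2 = 100 * sum(c2_values) / total

# Class support: class sum divided by the number of rules.
support_c1 = sum(c1_values) / rule_count
support_c2 = sum(c2_values) / rule_count

print(confidence_c1, confidence_c2, support_c1, support_c2)
```

The results come out at roughly 81.4% and 18.6% for the confidences and roughly 0.65 and 0.15 for the supports, in line with the values given in the sample calculation.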

Fig. 8. Fuzzy Associative Classifier with soft labels

Fig. 9. Membership Graph

7 Conclusions and Future Work

In this paper, association rule mining was studied for very large high-dimensional data in the image domain. The framework, named "Fuzzy Associative Classifier using images", was implemented on the Java platform. The algorithm is capable of modeling fuzzy association rules extracted from an image dataset and is meant for classifying future datasets with the model that has been built. The underlying fuzzy association rules, from which the classifier is trained, reveal hidden patterns or relationships among the attributes of the dataset. The framework has applications in biometrics, where it helps with authentication by means of categorical values. Experts prefer to work with fuzzified data (i.e., linguistic variables) rather than with exact numbers. Fuzzification is, in a way, a quantization of the numerical attributes into categories directly understandable by the experts, and it closes the gap between the data analyst and the domain expert. The classifier is thus able to produce accurate classifications from high-dimensional (image) data, which can be used to make well-informed decisions. The empirical results are encouraging. As future work, we would like to extend the applications of fuzzy associative classifiers to other domains. Another direction is to explore the classifier with Big Data, which is characterized by volume, velocity, and variety. The application of this framework to investigating the abnormal behavior of criminals can also be explored.

References

[1] Mangalampalli, A., Pudi, V.: FAR-HD: A Fast And Efficient Algorithm For Mining Fuzzy Association Rule In Large High-Dimensional Datasets. In: 2013 IEEE Fuzzy Systems (FUZZ) (2013)
[2] Mangalampalli, A., Pudi, V.: FAR-miner: a fast and efficient algorithm for fuzzy association rule mining. IJBIDM 7(4), 288–317 (2012)
[3] Rajput, D.S., Thakur, R.S., Thakur, G.S.: Fuzzy Association Rule Mining based Frequent Pattern Extraction from Uncertain Data. 2012 IEEE Department of Computer Applications (2012) 978-1-4673-4805-8/12
[4] Mangalampalli, A., Pudi, V.: Fuzzy association rule mining algorithm for fast and efficient performance on very large datasets. In: FUZZ-IEEE, pp. 1163–1168 (2009)
[5] Bay, H., Ess, A., Tuytelaars, T., Gool, L.J.V.: Speeded-up robust features (SURF). Computer Vision and Image Understanding 110(3), 346–359 (2008)
[6] Thabtah, F.A.: A review of associative classification mining. Knowledge Eng. Review 22(1), 37–65 (2007)
[7] Agrawal, R., Imielinski, T., Swami, A.N.: Mining association rules between sets of items in large databases. In: SIGMOD Conference, pp. 207–216 (1993)
[8] Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases. In: VLDB, pp. 487–499 (1994)
[9] Bezdek, J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms. Kluwer Academic Publishers, Norwell (1981)
[10] Fung, B.C.M., Wang, K., Ester, M.: Hierarchical document clustering using frequent itemsets. In: SDM (2003)
[11] Malik, H.H., Kender, J.R.: High quality, efficient hierarchical document clustering using closed interesting itemsets. In: ICDM, pp. 991–996 (2006)
[12] Delgado, M., Marín, N., Sánchez, D., Miranda, M.A.V.: Fuzzy association rules: General model and applications. IEEE Transactions on Fuzzy Systems 11, 214–225 (2003)
[13] Veloso, A., Meira Jr., W., Zaki, M.J.: Lazy Associative Classification. In: ICDM 2006: International Conference on Data Mining, pp. 645–654. IEEE Computer Society (2006)
[14] Fung, B.C.M., Wang, K., Ester, M.: Hierarchical document clustering using frequent itemsets. In: SDM (2003)
[15] Zhuang, L., Dai, H.: A maximal frequent itemset approach for web document clustering. In: 2004 IEEE International Conference on Computer and Information Technology, CIT (2004) 0-7695-2216-5/04
[16] Mangalampalli, A., Chaoji, V., Sanyal, S.: I-FAC: Efficient Fuzzy Associative Classifier for Object Classes in Images. In: 2010 IEEE Fuzzy Systems (FUZZ) (2010)
[17] Alcalá-Fdez, J., Alcalá, R., Herrera, F.: A Fuzzy Association Rule-Based Classification Model for High-Dimensional Problems with Genetic Rule Selection and Lateral Tuning. IEEE Transactions on Fuzzy Systems 9(5), 857–872 (2011)
[18] Guven, E., Buczak, A.L.: An OpenCL Framework for Fuzzy Associative Classification and Its Application to Disease Prediction. Conference by Missouri University 2013, Baltimore, MD
[19] Rajendran, P., Madheswaran, M.: Novel Fuzzy Association Rule Image Mining Algorithm for Medical Decision Support System. International Journal of Computer Applications (0975 - 8887) 1(20) (2010)
[20] Usha, D., Rameshkumar, K.: A Complete Survey on application of Frequent Pattern Mining and Association Rule Mining on Crime Pattern Mining. International Journal of Advances in Computer Science and Technology 3(4) (April 2014)
[21] Wang, W., Yang, J.: Mining High-Dimensional Data. In: Data Mining and Knowledge Discovery Handbook, pp. 793–799 (2005)
