Mining Patterns with Attribute Oriented Induction

Proceedings of the International Conference on Database, Data Warehouse, Data Mining and Big Data (DDDMBD2015), Jakarta, Indonesia 2015

Spits Warnars
Database, Datawarehouse & Data Mining Research Center, Surya University
Jl. Boulevard Gading Serpong Blok O/1, Tangerang 15810, Indonesia
[email protected]

ABSTRACT
Mining data from human activities such as business, education, engineering and health is important because it supports the human decision-making process. Attribute Oriented Induction (AOI) has been used to mine significantly different patterns since it was first proposed in 1989, and it has been combined with, and used as a complement to, other data mining techniques. AOI has proved powerful and offers future opportunities for helping to find data patterns. AOI is chosen because it can reduce many patterns by summarizing (rolling up) many low-level patterns into high-level ones in a concept tree/hierarchy. Conversely, the non-summarized patterns at the low levels of the concept tree/hierarchy can be used to sharpen the mined knowledge patterns, just like roll-up and drill-down in a data warehouse. Mapping implementations of AOI in areas of human life such as business, education, engineering and health is useful for conveying the value of AOI mining patterns, particularly for those interested in AOI as a data mining technique that can summarize many patterns into simple ones.

KEYWORDS
Data Mining, Attribute Oriented Induction, AOI, pattern, rule.

1. INTRODUCTION

The Attribute Oriented Induction (AOI) method, first proposed in 1989, integrates a machine learning paradigm, in particular learning-from-examples techniques, with database operations: it extracts generalized rules from an interesting set of data and discovers high-level data regularities [39]. AOI provides an efficient and effective mechanism for discovering various kinds of knowledge rules from datasets or databases. The AOI approach was developed for learning different kinds of knowledge rules, such as characteristic rules, discrimination rules, classification rules, data evolution regularities [1], association rules and cluster description rules [2].
1) A characteristic rule is an assertion which characterizes the concepts satisfied by all of the data stored in the database. This rule provides generalized concepts about a property which can help people recognize the common features of the data in a class, for example the symptoms of a specific disease [9].
2) A discriminant rule is an assertion which discriminates the concepts of one (target) class from another (contrasting) class. This rule gives a discriminant criterion which can be used to predict the class membership of new data, for example to distinguish one disease from another [9].
3) A classification rule is a set of rules which classifies the set of relevant data according to one or more specific attributes, for example classifying diseases into classes and providing the symptoms of each [40].
4) An association rule describes association relationships among the set of relevant data, for example discovering a set of symptoms that frequently occur together [12,35].
5) A data evolution regularity rule describes the general evolution behaviour of a set of relevant data (valid only for time-related/temporal data), for example describing the major factors that influence the fluctuations of stock values through time [3,37]. Data evolution regularities can then be classified into characteristic rules and discrimination rules [3].

6) A cluster description rule is used to cluster data according to data semantics [12], for example clustering university students based on different attribute(s).

2. QUANTITATIVE AND QUALITATIVE RULES IN AOI

Rules in AOI can be represented as quantitative or qualitative rules:
1) A quantitative rule is a rule associated with quantitative information, such as statistical information, which assesses the representativeness of the rule in the database [1]. There are three types of quantitative rules, i.e. the quantitative characteristic rule, the quantitative discriminative rule, and the combined quantitative characteristic and discriminative rule.
a. A quantitative characteristic rule carries the quantitative information of a characteristic rule; each rule in the final generalization can be measured with the t-weight in formula 1.

t-weight = Votes(qa) / Σ(i=1..N) Votes(qi)    (1)

where:
• t-weight = percentage of each rule in the final generalized relation.
• Votes(qa) = number of tuples in each rule in the final generalized relation, with qa in {q1,...,qN}.
• N = number of rules in the final generalized relation.

A quantitative characteristic rule is represented with the symbol ⇒ and should be in the form:

∀(x) target_class(x) ⇒ condition1(x) [t:w1] ∨ ... ∨ conditionn(x) [t:wn]

where:
• x is any member of the target class, and the rules are numbered 1..n.
• n is the number of rules in the final generalized relation.
• [t:w1] is the t-weight (formula 1) for rule 1, through [t:wn], the t-weight (formula 1) for rule n.

Example:
∀(x) graduate(x) ⇒ (Birthplace(x) Є Canada Λ GPA(x) Є excellent) [t:75%] ∨ (Major(x) Є science Λ Birthplace(x) Є Foreign Λ GPA(x) Є good) [t:25%]

b. A quantitative discriminative rule is a discrimination rule that uses quantitative information. Each rule in the target class is discriminated against the rule in the contrasting class and is measured with the d-weight in formula 2.

d-weight = Votes(qa ∈ Cj) / Σ(i=1..K) Votes(qa ∈ Ci)    (2)

where:
• d-weight = percentage ratio of each rule in the target class to the total number of tuples in the target class and the contrasting classes for the same rule.
• Votes(qa ∈ Cj) = number of tuples of rule qa in the target class Cj, with Cj in {C1,...,CK}.
• K = total number of the target and contrasting classes for the same rule.

A quantitative discriminative rule is shown with the symbol ⇐ and should be in the form:

∀(x) target_class(x) ⇐ condition1(x) [d:w1] ∨ ... ∨ conditionn(x) [d:wn]

where:
• x is any member of the target class, and the rules are numbered 1..n.
• n is the number of rules in the target class.
• [d:w1] is the d-weight (formula 2) for rule 1 in the target class, through [d:wn], the d-weight (formula 2) for rule n of the target class.

Example:
∀(x) graduate(x) ⇐ (Birthplace(x) Є Foreign Λ GPA(x) Є good) [d:100%] ∨ (Major(x) Є social Λ GPA(x) Є good) [d:25%]

c. A quantitative characteristic and discriminative rule uses the quantitative information of both the characteristic and the discriminative rule, carrying both the t-weight and the d-weight for the same rules. Each rule is measured with the t-weight in formula 1 for its characteristic part and the d-weight in formula 2 for its discriminative part. A quantitative characteristic and discriminative rule is shown with the symbol ⇔ and should be in the form:

∀(x) target_class(x) ⇔ condition1(x) [t:w1, d:w1] ∨ ... ∨ conditionn(x) [t:wn, d:wn]

where:
• x is any member of the target class, and the rules are numbered 1..n.
• n is the number of rules in the target class.
• [t:w1] is the t-weight in formula 1 and [d:w1] is the d-weight in formula 2.

Example:
∀(x) professor(x) ⇔ (Birthplace(x) Є Foreign Λ GPA(x) Є good) [t:20%, d:100%] ∨ (Major(x) Є social Λ GPA(x) Є good) [t:10%, d:25%]
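To make formulas 1 and 2 concrete, here is a minimal Python sketch (the vote counts are hypothetical, not from the paper) that computes the t-weight of each rule in a final generalized relation and the d-weight of a rule across the target and contrasting classes:

# t-weight (formula 1): a rule's votes as a share of the votes of
# all N rules in the final generalized relation.
def t_weight(votes, a):
    return votes[a] / sum(votes)

# d-weight (formula 2): a rule's votes in the target class Cj as a
# share of its votes across all K target and contrasting classes.
def d_weight(votes_per_class, j):
    return votes_per_class[j] / sum(votes_per_class)

votes = [75, 25]                 # two rules with 75 and 25 tuples
print(t_weight(votes, 0))        # 0.75 -> [t:75%]
print(t_weight(votes, 1))        # 0.25 -> [t:25%]
print(d_weight([20, 0], 0))      # 1.0  -> [d:100%], rule absent from the contrasting class

The 75%/25% output matches the t-weights of the graduate example above.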

2) A qualitative rule can be obtained by using the same learning process as its quantitative counterpart, without associating the quantitative information with the generalized relations [1]. A qualitative characteristic rule uses the symbol ⇒ and a qualitative discriminative rule uses the symbol ⇐. A qualitative rule, either characteristic or discriminative, should be in the form:

∀(x) target_class(x) [⇒|⇐] condition1(x) ∨ ... ∨ conditionn(x)

Example:
∀(x) graduate(x) ⇒ (Birthplace(x) Є Canada Λ GPA(x) Є excellent) ∨ (Major(x) Є science Λ Birthplace(x) Є Foreign Λ GPA(x) Є good)

3. Concept Hierarchies

One advantage of AOI is that it uses a concept hierarchy as background knowledge, which can be provided by knowledge engineers or domain experts [2,3,4]. A concept hierarchy, stored as a relation in the database, provides essential background knowledge for data generalization and multiple-level data mining. It represents a taxonomy of concepts over the attribute domain values. A concept hierarchy can be specified based on the relationships among database attributes or by set groupings, and can be stored in the form of relations in the same database [7]. It can also be adjusted dynamically based on the distribution of the set of data relevant to the data mining task. Hierarchies for numerical attributes can be constructed automatically based on data distribution analysis [7], and numeric attributes are treated differently for the sake of efficiency [20,21,22,23,26]. For example, a range of values between 0.00 and 1.99 contains 200 distinct values; for efficiency, a single record with 3 fields (e.g. lower bound, upper bound and concept) is created rather than 200 records with 2 fields.
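A sketch of this compact numeric encoding, assuming a hypothetical (low, high, concept) layout for the three fields:

# One record with 3 fields (low, high, concept) covers a whole numeric
# range, instead of one record per distinct value.
NUMERIC_HIERARCHY = [
    (0.00, 1.99, "poor"),      # hypothetical GPA ranges and labels
    (2.00, 2.99, "good"),
    (3.00, 4.00, "excellent"),
]

def numeric_concept(value):
    for low, high, concept in NUMERIC_HIERARCHY:
        if low <= value <= high:
            return concept
    return "ANY"

print(numeric_concept(1.25))   # poor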

In a concept hierarchy, concepts are ordered by levels, from specific (low-level) concepts to general (high-level) ones. Generalization is achieved by ascending to the next higher-level concepts along the paths of the concept hierarchy. The most general concept is the null description, described as ANY, while the most specific concepts correspond to the specific attribute values in the database. A concept hierarchy can be balanced or unbalanced; an unbalanced hierarchy must first be converted into a balanced one.

Figure 1 shows the concept hierarchy tree for the attribute workclass in the adult dataset [18], which has three levels. The first (low) level has 8 concepts: without-pay, never-worked, private, self-emp-not-inc, self-emp-inc, federal-gov, state-gov and local-gov. The second level has 5 concepts: charity, unemployed, entrepreneur, centre and territory. The third (high) level has 2 concepts: non government and government. For example, the concept non government at the high level has 3 sub-concepts at the second level: charity, unemployed and entrepreneur. The concept entrepreneur at the second level has 3 sub-concepts at the low level: private, self-emp-not-inc and self-emp-inc.

Figure 1. A concept hierarchy tree for the attribute workclass in the adult dataset [18]

The concept hierarchy in figure 1 can be represented as:

Without-pay → Charity
Never-worked → Unemployed
{Private, self-emp-not-inc, self-emp-inc} → Entrepreneur
{Federal-gov, state-gov} → Centre
Local-gov → Territory
{Charity, Unemployed, Entrepreneur} → Non government
{Centre, Territory} → Government
{Non government, Government} → ANY(workclass)

where the symbol → indicates generalization; for example, Without-pay → Charity indicates that the Charity concept is a generalization of the Without-pay concept.
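This mapping lends itself to a simple lookup-table encoding; the sketch below (illustrative only, not the DBMiner implementation) stores the workclass hierarchy of figure 1 and ascends one level per call:

# Each concept maps to its minimal generalized (parent) concept;
# the root concepts generalize to ANY.
WORKCLASS_HIERARCHY = {
    "without-pay": "charity",
    "never-worked": "unemployed",
    "private": "entrepreneur",
    "self-emp-not-inc": "entrepreneur",
    "self-emp-inc": "entrepreneur",
    "federal-gov": "centre",
    "state-gov": "centre",
    "local-gov": "territory",
    "charity": "non government",
    "unemployed": "non government",
    "entrepreneur": "non government",
    "centre": "government",
    "territory": "government",
    "non government": "ANY",
    "government": "ANY",
}

def generalize(concept):
    # Concept tree ascension: one step up the hierarchy.
    return WORKCLASS_HIERARCHY.get(concept, "ANY")

print(generalize("private"))               # entrepreneur
print(generalize(generalize("private")))   # non government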

There are four types of concept generalization in the concept hierarchy [6]:
1) Unconditional concept generalization: the rule is an unconditional IS-A type rule. A concept is generalized to a higher-level concept because of the subsumption relationship indicated in the concept hierarchy.
2) Conditional/deductive rule generalization: the rule is associated with a generalization path as a deduction rule; the rule is conditional and can only be applied to generalize a concept if the corresponding condition is satisfied. For example, the form A(x) Λ B(x) → C(x) means that for a tuple x, the concept (attribute value) A can be generalized to concept C if condition B is satisfied by x.
3) Computational rule generalization: each rule is represented by a value-based condition which can be evaluated against an attribute, a tuple or the database by performing some computation. The truth value of the condition then determines whether a concept can be generalized via the path.
4) Hybrid rule-based concept generalization: a hierarchy can have paths associated with all three of the above types of rules. It has a powerful representation capability and is suitable for many kinds of applications.
Types 2-4 are the three types of rule-based concept hierarchies [5,34], while type 1 is non-rule-based. A rule-based concept hierarchy is a concept hierarchy whose paths have associated generalization rules.
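A conditional (type 2) path of the form A(x) Λ B(x) → C(x) can be sketched as a guarded lookup; the guard below is hypothetical and only illustrates the mechanism:

# Hypothetical conditional generalization path: concept A generalizes
# to concept C only when the tuple x also satisfies condition B.
CONDITIONAL_PATHS = [
    # (source concept, guard on the tuple, target concept)
    ("self-emp-inc", lambda x: x["hours_per_week"] >= 40, "entrepreneur"),
]

def conditional_generalize(x, concept):
    for source, guard, target in CONDITIONAL_PATHS:
        if concept == source and guard(x):
            return target
    return concept   # condition not satisfied: no generalization

print(conditional_generalize({"hours_per_week": 45}, "self-emp-inc"))  # entrepreneur
print(conditional_generalize({"hours_per_week": 10}, "self-emp-inc"))  # self-emp-inc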

In rule-based induction, the data cube (hypercube) of a multidimensional data warehouse is the favourable data structure [6]. To perform rule-based induction on the data in a large warehouse, the path relation algorithm is an excellent choice, because the data warehouse is already structured as a cube/hypercube [6]. Rule-based concept hierarchies suffer from the induction anomaly problem, which affects efficiency and is caused by:
1) A rule may depend on an attribute which has been removed.
2) A rule may depend on an attribute whose concept level in the prime relation has been generalized too high to match the condition of the rule.
3) A rule may depend on a condition which can only be evaluated against the initial relation, e.g. the number of tuples in the relation.
There are three ways to solve the induction anomaly problem [6]:
1) Reapplying the deduction rules all over again on the initial relation, which is costly and wasteful.
2) Repetitive generalization as required by roll-up and drill-down, which can be done in an efficient way without the induction anomaly problem.
3) Using the path relation method, based on a backtracking algorithm [5,6].

4. AOI prototype

The AOI method was implemented in a data mining system prototype called DBMiner [5,7,17,28,29], previously called DBLearn, and has been tested successfully against large relational databases. DBLearn [24,25,27,38] is a prototype data mining system which was developed at Simon Fraser University. DBMiner was developed by integrating database, OLAP and data mining technologies [17,36] and has the following features:
1) Incorporates several data mining techniques such as attribute oriented induction, statistical analysis, progressive deepening for mining multiple-level rules and meta-rule guided knowledge mining [7], together with data cube and OLAP technology [17].
2) Mines new kinds of rules from large databases, including multiple-level association rules, classification rules, cluster description rules and prediction.
3) Automatic generation of numeric hierarchies and refinement of concept hierarchies.
4) High-level SQL-like and graphical data mining interfaces.
5) Client-server architecture and performance improvements for larger applications.
6) The SQL-like data mining query language DMQL and the graphical user interfaces have been enhanced for interactive knowledge mining.
7) Performs roll-up and drill-down at multiple concept levels with multidimensional data cubes.

5. AOI algorithms

AOI can be implemented with the architecture design shown in figure 2, where characteristic rules (LCHR) and classification rules (LCLR) can be learned directly from the transactional database (OLTP) or the data warehouse (OLAP) [6,8], with the concept hierarchy serving as the generalization knowledge. The concept hierarchy can be created from the OLTP database as a direct resource.

Figure 2. AOI architecture

From a database we can identify two types of learning:
1) Positive learning, as the target class, where the data are tuples in the database which are consistent with the learning concepts. The positive learning/target class is built when learning a characteristic rule.
2) Negative learning, as the contrasting class, in which the data do not belong to the target class. The negative learning/contrasting class is built when learning a discrimination or classification rule.

The characteristic rule is used by AOI to recognize, learn and mine a specific character for each attribute as its specific mining characterization. Characteristic-rule learning processes the generalization with the help of the concept hierarchy, as the standard stored background knowledge, to find the target class as positive learning. Mining cannot be limited to only one rule: the more rules can be created, the more mining can be done. This has been demonstrated in intelligent systems, which help humans build systems with the ability to think like a human [3]. Rules can often be discovered by generalization in several possible directions [9].

A relational database, as a resource for data mining with AOI, can be read with the SQL SELECT data manipulation statement [13,14,15,16]. Using a query for building rules gives an efficient mechanism for understanding the mined rules [11,12]. In current AOI, a query is processed with the SQL-like data mining query language DMQL at the beginning of the process. It collects the relevant sets of data by processing a transformed relational query, generalizes the data by AOI and then presents the outputs in different forms [7].

AOI generalizes and reduces the prime relation further until the final relation satisfies the user expectation based on the set threshold. One or two thresholds can be applied: one threshold is used to control both the number of distinct attribute values and the number of tuples in the generalization process, whereas two thresholds control the number of distinct attribute values and the number of tuples separately. The threshold that controls the maximum number of tuples of the target class in the final generalized relation can be replaced with the GROUP BY operator in an SQL SELECT statement, which will limit the final result of the generalization. Setting different thresholds generates different generalized tuples, and obtaining the global picture of the induction by repeatedly trying thresholds is time-consuming and tedious work [10]. All interesting generalized tuples, as multiple rules, can be generated as the global picture of the induction by using the GROUP BY operator or the DISTINCT function in the SQL SELECT statement.

AOI can perform data warehouse techniques by carrying out the generalization process repetitively in order to generate rules at different concept levels in a concept hierarchy, enabling the user to find the most suitable discovery levels and rules. This performs roll-up (progressive generalization [6]) and drill-down (progressive specialization [6]) operations [2,7], which are recognized as data warehouse techniques. Finding the most suitable discovery levels and rules adds multidimensional views to a database by applying the generalization process repetitively at different concept levels.

Building a logical formula as the representation of the final result of AOI cannot be done with the SQL SELECT statement alone. However, the SQL statement can be combined with other applications such as Java or Visual Basic, or with server-side programming such as ASP, JSP or PHP; the data returned by the SQL statement can then be used to create a logical formula in one of those applications.
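The effect of GROUP BY described above can be seen in a small self-contained example (the table and values are hypothetical):

import sqlite3

# After every attribute value has been substituted by its higher-level
# concept, GROUP BY merges identical tuples and COUNT(*) accumulates
# the vote of each resulting rule.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (birthplace TEXT, gpa TEXT)")
con.executemany("INSERT INTO student VALUES (?, ?)",
                [("Canada", "excellent"), ("Canada", "excellent"),
                 ("Canada", "excellent"), ("Foreign", "good")])
for row in con.execute("SELECT birthplace, gpa, COUNT(*) AS vote "
                       "FROM student GROUP BY birthplace, gpa"):
    print(row)   # e.g. ('Canada', 'excellent', 3) and ('Foreign', 'good', 1)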

There are 8 strategy steps [3] in the generalization process; steps one to seven are for the characteristic rule, and steps one to eight are for the classification/discriminant rule.
1) Generalization on the smallest decomposable components: generalization should be performed on the smallest decomposable components of a data relation.
2) Attribute removal: if there is a large set of distinct values for an attribute but no higher-level concept is provided for the attribute, the attribute should be removed during generalization.
3) Concept tree ascension: if there exists a higher-level concept in the concept hierarchy for an attribute value of a tuple, substituting the value by its higher-level concept generalizes the tuple.
4) Vote propagation: the value of the vote is the number of accumulated tuples; votes are accumulated when merging identical tuples during generalization.
5) Threshold control on each attribute: if the number of distinct values of an attribute in the resulting relation is larger than the specified threshold value, further generalization on this attribute should be performed.
6) Threshold control on generalized relations: if the number of tuples is larger than the specified threshold value, further generalization is performed on selected attributes and the identical tuples are merged.
7) Rule transformation: the final generalization is transformed into a quantitative rule and a qualitative rule, from one tuple (conjunctive) or multiple tuples (disjunctive).
8) Handling overlapping tuples: if there are tuples that overlap both the target and the contrasting classes, these tuples should be marked and eliminated from the final generalized relation.

The AOI characteristic rule algorithm [3] is given as follows:

For each attribute Ai (1 ≤ i ≤ n, where n = # of attributes) in the generalized relation GR
{
  While #_of_distinct_values_in_attribute_Ai > threshold
  {
    If no higher level concept in concept hierarchy for attribute Ai
    Then remove attribute Ai
    Else substitute the value of Ai by its corresponding minimal generalized concept
    Merge identical tuples
  }
}
While #_of_tuples in GR > threshold
{
  Selectively generalize attributes
  Merge identical tuples
}

This AOI characteristic rule algorithm implements steps one to seven of the generalization strategy. The algorithm consists of two sub-processes: controlling the number of distinct attribute values and controlling the number of tuples.
1) Controlling the number of distinct attribute values is a vertical process which checks each attribute vertically. It ensures that every attribute in the learning result of a dataset has a number of distinct values less than or equal to the threshold, and is applied only to attributes whose number of distinct values is greater than the threshold. Each such attribute is checked for a higher-level concept in the concept hierarchy: if it has no higher-level concept, the attribute is removed; otherwise the attribute value is substituted with the value of the higher-level concept. Identical tuples are then merged in order to summarize the generalization and accumulate the vote values of the identical tuples, eliminating the redundant tuples. Eventually, after this first sub-process, all the attributes in the generalization have a number of distinct values less than or equal to the threshold. This first sub-process implements steps one to five of the generalization strategy.
2) Controlling the number of tuples is a horizontal process which checks each rule horizontally. It is carried out on the attributes that passed the first sub-process, each of which now has a number of distinct values less than or equal to the threshold, and runs only while the number of rules is greater than the threshold. Selective generalization of attributes and merging of identical tuples reduce the number of rules. A candidate attribute for further generalization can be selected by preference, e.g. by the ratio of the number of tuples to the number of distinct attribute values, or it can be chosen by the user based on what is uninteresting, either an uninteresting attribute or an uninteresting rule. As in the first sub-process, identical tuples are merged to summarize the generalization and accumulate the vote values of identical tuples, eliminating the redundant tuples. Eventually, after this second sub-process, the number of rules is less than or equal to the threshold. This second sub-process implements steps three, four and six of the generalization strategy.
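Combining the two sub-processes, a compact executable sketch of the characteristic-rule algorithm could look as follows. It is a simplification under stated assumptions: attribute removal is approximated by climbing to ANY, and selective generalization simply picks the attribute with the most distinct values.

from collections import Counter

def climb(hierarchy, value):
    # Concept tree ascension; values without a parent climb to ANY
    # (a simplification of attribute removal).
    return hierarchy.get(value, "ANY")

def aoi_characteristic(rows, hierarchy, attr_threshold, tuple_threshold):
    n = len(rows[0])
    # Sub-process 1 (vertical): control distinct values per attribute.
    for i in range(n):
        while len({r[i] for r in rows}) > attr_threshold:
            rows = [r[:i] + (climb(hierarchy, r[i]),) + r[i + 1:] for r in rows]
    # Vote propagation: merge identical tuples, accumulating votes.
    votes = Counter(rows)
    # Sub-process 2 (horizontal): control the number of rules.
    while len(votes) > tuple_threshold:
        i = max(range(n), key=lambda k: len({r[k] for r in votes}))
        merged = Counter()
        for r, v in votes.items():
            merged[r[:i] + (climb(hierarchy, r[i]),) + r[i + 1:]] += v
        votes = merged
    return votes   # each key is a rule; each value is its vote

hierarchy = {"Toronto": "Canada", "Vancouver": "Canada",
             "Bombay": "Foreign", "3.9": "excellent",
             "3.5": "excellent", "2.8": "good"}
rows = [("Toronto", "3.9"), ("Vancouver", "3.5"), ("Bombay", "2.8")]
print(aoi_characteristic(rows, hierarchy, attr_threshold=2, tuple_threshold=2))
# Counter({('Canada', 'excellent'): 2, ('Foreign', 'good'): 1})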

The AOI discriminant rule algorithm [1] is shown below:

For each attribute Ai (1 ≤ i ≤ n, where n = # of attributes) in the generalized relation GR
{
  Mark the overlapping tuples
  While #_of_distinct_values_in_attribute_Ai > threshold
  {
    If no higher level concept in concept hierarchy for attribute Ai
    Then remove attribute Ai
    Else substitute the value of Ai by its corresponding minimal generalized concept
    Mark the overlapping tuples
    Merge identical tuples
  }
}
While #_of_tuples in GR > threshold
{
  Selectively generalize attributes
  Mark the overlapping tuples
  Merge identical tuples
}

The AOI discriminant rule algorithm implements steps one to eight of the generalization strategy. Since the AOI discriminant rule and AOI characteristic rule algorithms share generalization strategy steps one to seven, they have essentially the same process; the difference lies only in step eight. They also have the same sub-processes, i.e. controlling the number of distinct attribute values as the first sub-process and controlling the number of tuples as the second. The handling of overlapping tuples, as the eighth generalization strategy step, is processed at the beginning, before the first sub-process, and in both sub-processes before merging identical tuples.
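Step eight can be sketched on top of the previous characteristic-rule sketch with a small helper (illustrative only; a full implementation would re-mark overlaps after every generalization step):

def eliminate_overlaps(target_votes, contrasting_votes):
    # A tuple appearing in both the target and the contrasting class is
    # overlapping and is dropped from the final generalized relation.
    overlapping = set(target_votes) & set(contrasting_votes)
    return {r: v for r, v in target_votes.items() if r not in overlapping}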

6. AOI Advantages and disadvantages

AOI provides a simple and efficient way to learn knowledge rules from a large database and has many advantages [9], such as:
1) AOI provides additional flexibility over many machine learning algorithms.
2) AOI can learn knowledge rules in different conjunctive and disjunctive forms, providing more choices for experts and users.
3) AOI can use the facilities of a traditional relational database, such as selection, join and projection, whereas most learning algorithms suffer from inefficiency problems in a large database environment.
4) AOI can learn qualitative rules with quantitative information, while many machine learning algorithms can only learn qualitative rules.
5) AOI can handle noisy data and exceptional cases elegantly by incorporating statistical techniques in the learning process, whereas some learning systems can only work in a 'noise free' environment.

However, AOI also has disadvantages [10], such as:
1) AOI provides only a snapshot of the generalized knowledge, not a global picture. The global picture can be revealed by trying different thresholds repeatedly.
2) Adjusting the threshold results in different sets of generalized tuples, and trying different thresholds repeatedly is time-consuming and tedious work.
3) There is a problem in selecting the best generalized rules between a large and a small threshold: a large threshold value leads to a relatively complex rule with many disjuncts whose results may not be fully generalized, while a small threshold value leads to a simple rule with few disjuncts that may over-generalize, with the risk of losing some valuable information.

7. AOI Current Studies

There are a number of recent studies on AOI. Chen et al. proposed a global AOI method employing a multiple-level mining technique with multiple minimum supports in order to generalize all interesting general knowledge [30]. Wu et al. proposed a Global Negative AOI (GNAOI) approach that can generate comprehensive, multiple-level negative generalized knowledge at the same time [31]. Muyeba et al. proposed clusterAOI, a hybrid interestingness heuristic algorithm which uses attribute features such as concept hierarchies and distinct domain attribute values to dynamically recalculate new attribute thresholds for each less significant attribute [32]. Huang et al. introduced the Modified AOI (MAOI) method to deal with multi-valued attribute tables and sort readers into different clusters; instead of using concept hierarchies and concept trees, MAOI implements concept climbing and the generalization of multi-valued attribute tables with Boolean algebra and a modified Karnaugh map, and then describes the clusters with concept descriptions [33]. The over-generalization problem in AOI was reduced with entropy measurement, extending the AOI algorithm with feature selection so that the generalization process depends on feature entropy [41]. AOI has been combined with EP (Emerging Patterns) to become AOI-HEP (Attribute Oriented Induction High Emerging Pattern), used to mine frequent and similar patterns [42,43,44], with future research directions such as inverse discovery learning, learning from more than two datasets and learning other knowledge rules [45]. An MAOI (Modified AOI) algorithm was also proposed to deal with multi-valued attributes by converting the data to Boolean bits and using a Karnaugh map to converge the attributes [46]. AOI was modified into Frequency Count AOI (FC-AOI) and used to mine network data [47]. AOI was extended as Extended Attribute Oriented Induction (EAOI) for clustering mixed data types, where EAOI handles major values and numeric attributes [48,49]. AOI was also chosen as the second of the 5 steps of a proposed algorithm that produces AOI characteristic rules for parallel machine scheduling [50]. Another approach performs classification using decision tree induction, improving the C4.5 classifier with 4 steps of which the first is generalization by AOI [51]. Finally, CFAOI (Concept-Free AOI) was proposed to free AOI from the constraint of the concept tree on multi-valued attributes by combining simplified binary digits with a Karnaugh map [52].

8. Conclusion

In the 26 years since 1989, AOI has proved that it remains relevant for finding patterns and has been combined with, and used as a complement to, other data mining techniques. AOI can mine many different patterns, and other possible patterns in the future. AOI has proved to be a powerful mining technique, since many patterns can be mined into simple pattern results. Its power lies in rolling up/summarizing data from low to high levels in the concept tree/hierarchy, which produces simple patterns. Implementations of AOI show that it is useful and recognized for mining and summarizing patterns from huge numbers of different kinds of patterns. Using AOI in many kinds of fields, such as business, education, engineering and health, should be mapped in order to increase the reliability of AOI as a proven and powerful data mining technique.

Acknowledgement

This research is supported under the Program of research incentive of the national innovation system (SINAS) from the Ministry of Research, Technology and Higher Education of the Republic of Indonesia, decree number 147/M/Kp/IV/2015, Research code: RD-20150020.

REFERENCES

[1] Han, J., Cai, Y. and Cercone, N. 1993. Data-driven discovery of quantitative rules in relational databases. IEEE Transactions on Knowledge and Data Engineering, 5(1), 29-40.
[2] Han, J. and Fu, Y. 1995. Exploration of the power of attribute-oriented induction in data mining. In U. Fayyad, G. Piatetsky-Shapiro, P. Smyth and R. Uthurusamy, eds. Advances in Knowledge Discovery and Data Mining, 399-421.
[3] Han, J., Cai, Y. and Cercone, N. 1992. Knowledge discovery in databases: An attribute-oriented approach. In Proceedings of the 18th Int. Conf. Very Large Data Bases, 547-559.
[4] Han, J. 1994. Towards efficient induction mechanisms in database systems. Theoretical Computer Science, 133(2), 361-385.
[5] Cheung, D.W., Fu, A.W. and Han, J. 1994. Knowledge discovery in databases: A rule-based attribute-oriented approach. In Proceedings of Intl Symp on Methodologies for Intelligent Systems, 164-173.
[6] Cheung, D.W., Hwang, H.Y., Fu, A.W. and Han, J. 2000. Efficient rule-based attribute-oriented induction for data mining. Journal of Intelligent Information Systems, 15(2), 175-200.
[7] Han, J., Fu, Y., Wang, W., Chiang, J., Gong, W., Koperski, K., Li, D., Lu, Y., Rajan, A., Stefanovic, N., Xia, B. and Zaiane, O.R. 1996. DBMiner: A system for mining knowledge in large relational databases. In Proceedings of Int'l Conf. on Data Mining and Knowledge Discovery, 250-255.
[8] Han, J., Lakshmanan, L.V.S. and Ng, R.T. 1999. Constraint-based, multidimensional data mining. IEEE Computer, 32(5), 46-50.
[9] Cai, Y. 1989. Attribute-oriented induction in relational databases. Master thesis, Simon Fraser University.
[10] Wu, Y., Chen, Y. and Chang, R. 2009. Generalized Knowledge Discovery from Relational Databases. International Journal of Computer Science and Network, 9(6), 148-153.
[11] Imielinski, T. and Virmani, A. 1999. MSQL: A Query Language for Database Mining. Data Mining and Knowledge Discovery, 3, 373-408.
[12] Muyeba, M. 2005. On Post-Rule Mining of Inductive Rules using a Query Operator. In Proceedings of Artificial Intelligence and Soft Computing.
[13] Meo, R., Psaila, G. and Ceri, S. 1998. An Extension to SQL for Mining Association Rules. Data Mining and Knowledge Discovery, 2, 195-224.
[14] Muyeba, M.K. and Keane, J.A. 1999. Extending attribute-oriented induction as a key-preserving data mining method. In Proceedings 3rd European Conference on Principles of Data Mining and Knowledge Discovery, Lecture Notes in Computer Science, 1704, 448-455.
[15] Muyeba, M. and Marnadapali, R. 2005. A framework for Post-Rule Mining of Distributed Rules Bases. In Proceedings of Intelligent Systems and Control.
[16] Zaiane, O.R. 2001. Building Virtual Web Views. Data and Knowledge Engineering, 39, 143-163.
[17] Han, J., Chiang, J.Y., Chee, S., Chen, J., Chen, Q., Cheng, S., Gong, W., Kamber, M., Koperski, K., Liu, G., Lu, Y., Stefanovic, N., Winstone, L., Xia, B.B., Zaiane, O.R., Zhang, S. and Zhu, H. 1997. DBMiner: a system for data mining in relational databases and data warehouses. In Proceedings of the 1997 Conference of the Centre For Advanced Studies on Collaborative Research, 8-.
[18] Frank, A. and Asuncion, A. 2010. UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
[19] Elfeky, M.G., Saad, A.A. and Fouad, S.A. 2000. ODMQL: Object Data Mining Query Language. In Proceedings of the International Symposium on Objects and Databases, 128-140.
[20] Han, J. and Fu, Y. 1994. Dynamic Generation and Refinement of Concept Hierarchies for Knowledge Discovery in Databases. In Proceedings of AAAI Workshop on Knowledge Discovery in Databases, 157-168.
[21] Huang, Y. and Lin, S. 1996. An Efficient Inductive Learning Method for Object-Oriented Database Using Attribute Entropy. IEEE Transactions on Knowledge and Data Engineering, 8(6), 946-951.
[22] Hu, X. 2003. DB-HReduction: A Data Preprocessing Algorithm for Data Mining Applications. Applied Mathematics Letters, 16(6), 889-895.
[23] Hsu, C. 2004. Extending attribute-oriented induction algorithm for major values and numeric values. Expert Systems with Applications, 27, 187-202.
[24] Han, J., Fu, Y., Huang, Y., Cai, Y. and Cercone, N. 1994. DBLearn: a system prototype for knowledge discovery in relational databases. ACM SIGMOD Record, 23(2), 516.
[25] Han, J., Fu, Y. and Tang, S. 1995. Advances of the DBLearn system for knowledge discovery in large databases. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, 2049-2050.
[26] Beneditto, M.E.M.D. and Barros, L.N.D. 2004. Using Concept Hierarchies in Knowledge Discovery. Lecture Notes in Computer Science, 3171, 255-265.
[27] Fudger, D. and Hamilton, H.J. 1993. A Heuristic for Evaluating Databases for Knowledge Discovery with DBLEARN. In Proceedings of the International Workshop on Rough Sets and Knowledge Discovery: Rough Sets, Fuzzy Sets and Knowledge Discovery (RSKD '93), 44-51.
[28] Han, J. 1997. OLAP Mining: An Integration of OLAP with Data Mining. In Proceedings of the 7th IFIP 2.6 Working Conference on Database Semantics (DS-7), 1-9.
[29] Han, J., Fu, Y., Koperski, K., Melli, G., Wang, W. and Zaïane, O.R. 1996. Knowledge Mining in Databases: An Integration of Machine Learning Methodologies with Database Technologies. Canadian Artificial Intelligence, (38), 4-8.
[30] Chen, Y.L., Wu, Y.Y. and Chang, R. 2012. From data to global generalized knowledge. Decision Support Systems, 52(2), 295-307.
[31] Wu, Y.Y., Chen, Y.L. and Chang, R. 2011. Mining negative generalized knowledge from relational databases. Knowledge-Based Systems, 24(1), 134-145.
[32] Muyeba, M.K., Crockett, K. and Keane, J.A. 2011. A hybrid interestingness heuristic approach for attribute-oriented mining. In Proceedings of the 5th KES International Conference on Agent and Multi-Agent Systems: Technologies and Applications (KES-AMSTA'11), 414-424.
[33] Huang, S., Wang, L. and Wang, W. 2011. Adopting data mining techniques on the recommendations of the library collections. In Proceedings of the 11th International Conference on Information and Knowledge Engineering, 46-52.
[34] Thanh, N.D., Phong, N.T. and Anh, N.K. 2010. Rule-Based Attribute-Oriented Induction for Knowledge Discovery. In Proceedings of the 2010 2nd International Conference on Knowledge and Systems Engineering (KSE '10), 55-62.
[35] Han, J. and Fu, Y. 1995. Discovery of Multiple-Level Association Rules from Large Databases. In Proceedings of the 21st International Conference on Very Large Data Bases (VLDB '95), 420-431.
[36] Han, J. 1998. Towards on-line analytical mining in large databases. SIGMOD Record, 27(1), 97-107.
[37] Han, J., Cai, Y., Cercone, N. and Huang, Y. 1995. Discovery of Data Evolution Regularities in Large Databases. Journal of Computer and Software Engineering, 3(1), 41-69.
[38] Cercone, N., Han, J., McFetridge, P., Popowich, F., Cai, Y., Fass, D., Groeneboer, C., Hall, G. and Huang, Y. 1994. System X and DBLearn: How to Get More from Your Relational Database, Easily. Integrated Computer-Aided Engineering, 1(4), 311-339.
[39] Cai, Y., Cercone, N. and Han, J. 1991. Learning in relational databases: an attribute-oriented approach. Computational Intelligence, 7(3), 119-132.
[40] Cai, Y., Cercone, N. and Han, J. 1990. An attribute-oriented approach for learning classification rules from relational databases. In Proceedings of the 6th International Conference on Data Engineering, 281-288.
[41] Al-Mamory, S.O., Hasson, S.T. and Hammid, M.K. 2013. Enhancing Attribute Oriented Induction of Data Mining. Journal of Babylon University, 7(21), 2286-2295.
[42] Warnars, S. 2015. Mining Frequent and Similar Patterns with Attribute Oriented Induction High Level Emerging Pattern (AOI-HEP) Data Mining Technique. International Journal of Emerging Technologies in Computational and Applied Sciences (IJETCAS), 3(11), 266-276.
[43] Warnars, S. 2014. Mining Frequent Pattern with Attribute Oriented Induction High Level Emerging Pattern (AOI-HEP). In Proceedings of the 2nd International Conference on Information and Communication Technology (ICoICT), 144-149.
[44] Warnars, S. 2012. Attribute Oriented Induction High Level Emerging Pattern. In Proceedings of the International Conference on Granular Computing (GrC).
[45] Warnars, S. 2014. Attribute Oriented Induction High Level Emerging Pattern (AOI-HEP) Future Research. In Proceedings of the 8th International Conference on Information & Communication Technology and Systems (ICTS), 13-18.
[46] Huang, S., Hsu, P. and Lam, H.N.N. 2013. An attribute oriented induction approach for knowledge discovery from relational databases. Advances in Information Sciences and Service Sciences (AISS), 5(3), 511-519.
[47] Tanutama, L. 2013. Frequency Count Attribute Oriented Induction of Corporate Network Data for Mapping Business Activity. International Conference on Advances Science and Contemporary Engineering (ICASCE), 149-152.
[48] Prasad, D.H. and Punithavalli, M. 2012. An Integrated GHSOM-MLP with Modified LM Algorithm for Mixed Data Clustering. ARPN Journal of Engineering and Applied Sciences, 7(9), 1162-1169.
[49] Prasad, D.H. and Punithavalli, M. 2013. A Novel Approach for Mixed Data Clustering using Dynamic Growing Hierarchical Self-Organizing Map and Extended Attribute-Oriented Induction. Life Science Journal, 10(1), 3259-3266.
[50] Kaviani, M., Aminnayeri, M., Rafienejad, S.N. and Jolai, F. 2012. An appropriate pattern to solving a parallel machine scheduling by combination of meta-heuristic and data mining. Journal of American Science, 8(1), 160-167.
[51] Ali, M.M., Qaseem, M.S., Rajamani, L. and Govardhan, A. 2013. Extracting Useful Rules Through Improved Decision Tree Induction using Information Entropy. International Journal of Information Sciences and Techniques (IJIST), 3(1), 27-41.
[52] Huang, S. 2013. CFAOI: Concept-Free AOI on Multi Value Attributes. Life Science Journal, 10(4), 2341-2348.