2012 IEEE International Conference on Systems, Man, and Cybernetics October 14-17, 2012, COEX, Seoul, Korea
Association Rules Based Algorithm for Identifying Outlier Transactions in Data Stream
Li-Jen Kao
Yo-Ping Huang*
Department of Computer Science and Information Engineering Hwa Hsia Institute of Technology Taipei, 23568 Taiwan
[email protected]
Department of Electrical Engineering National Taipei University of Technology Taipei, 10608 Taiwan
[email protected] *corresponding author
Abstract—Most outlier detection algorithms are designed to discover outlier patterns in static databases. Such algorithms are infeasible for the instant identification of outlier patterns in data streams, where continuously arriving and unbounded data serve as the data source in applications such as sensor data feeds. In this paper, an association rules based method is proposed to find outlier patterns in data streams. The presented work segments transactions from the data stream and then finds approximate frequent itemsets with a single data scan instead of requiring multiple scans. Based on the derived association rules, transactions are identified as outliers if their outlier degrees are higher than a predefined threshold. The proposed method not only finds the outlier patterns but also identifies the items most likely to induce the abnormal transactions in the data stream. Efficiency comparisons with a frequent itemset-based work are also conducted to verify the effectiveness of the proposed framework.
Keywords-outlier detection; data stream; frequent itemsets; association rules

I. INTRODUCTION
Outlier detection is an important data mining technique commonly used to detect deviant data in a data set [1, 3, 5-7, 10, 15-18, 20-21, 23-24, 26-27, 34]. Credit card fraud detection and network intrusion detection are examples that require efficient algorithms to resolve the problem instantly. In fact, plenty of algorithms have been developed in the outlier detection community [33]. For example, distance-based methods define an outlier in a set D as a data point p such that a certain percentage of the other points in D are farther from p than a predefined distance [20]. Clustering methods are also used to identify outliers as the points that are not included in any of the formed clusters [16, 19, 32]. Other studies include density-based methods [5] and support vector methods [30].
The concept of a data stream is widely used in a variety of applications, such as stock tickers, network traffic measurements, and data feeds from sensors. The characteristic of a data stream is that the data arrive continuously and in timely order [11, 33]. Since data stream applications have received much attention [13], a variety of data analysis studies have been presented to find valuable information or to predict future trends. For example, Manku et al. proposed a framework for frequent pattern mining in data streams [22]. Halatchev et al. proposed an association rule mining method to estimate missing data in sensor data streams [12]. Although detecting the outlier patterns in a data stream that are incurred by abnormal activities or signals is a critical issue in many applications, outlier detection in a real-time manner is still challenging due to the characteristics of data streams.
Most of the existing outlier detection algorithms work on static data sets [31]. They repeatedly scan the entire data set to find global outliers. In data stream applications, the data set is unbounded; therefore, it is impractical to scan the whole data stream to discover outliers. Besides, many data stream applications need to respond to requests online and quickly, so data scans must complete within a limited time to provide good efficiency. These are two of the reasons why current outlier detection methods are infeasible for data stream outlier detection. Very few studies [2, 4, 9, 25, 29] have tackled outlier detection in data streams. They adopted compact data structures to avoid scanning the entire data stream. For example, a sliding window is used to define the current data set inside which local outliers can be detected [2, 4, 29]. Auto-regression based methods are also widely found in stream outlier detection [9, 25]. He et al. [33] employed frequent itemsets to find outliers in data streams.
In this paper, an algorithm is proposed that employs association rules to check the data stream and find the embedded outliers. The algorithm defines a bounded transaction data set in the data stream and then derives association rules from those transactions. The outlier transactions are detected according to the mined association rules. For example, assume an association rule {A, B} → {C} with a high confidence value of 80% is derived from the data stream DS. A transaction is treated as abnormal if item C is supposed to appear in it but is actually not seen there. That is, an outlier is a transaction that is expected to contain some items that do not actually appear. The proposed framework also aims at exploring the reasons in the data stream that cause transactions to become outliers. This gives data mining users better knowledge to further understand the mining results.
The proposed framework is divided into two parts. One part extends the work of Shin and Lee [28] to get association rules and then identify outlier transactions in the data stream. In [28], the estDec algorithm [8] is used to find frequent itemsets, and then a traversal stack is employed to generate association rules. A predefined outlier degree is set to validate whether a transaction is an outlier or not. After finding the outliers, the other part of this study extracts the relationship between outliers and infrequent items. We approach this by mapping each original outlier transaction to its corresponding unobserved frequent itemsets and infrequent itemsets. Then, the newly transformed outlier transactions can be mined again by an association rule mining algorithm to find the relationship between outliers and items.
The remaining parts of this paper are organized as follows. Section II introduces related works on outlier detection. Section III describes the proposed method and Section IV provides experimental results. Section V concludes the paper.
II. RELATED WORKS
This section describes works related to outlier detection from transactional datasets.
A. Counting Frequent Itemsets in Data Stream
Compared with traditional databases, a data stream is an unbounded data set, and it is hard to define a finite number of transactions for the underlying data mining. To resolve this problem, two data models have been proposed for data mining algorithms. The accumulative data model considers the transactions from the beginning to the current time; any new transactions are accumulated into the current database as time goes by. The other model uses a sliding window to form a bounded transaction data set; the data mining algorithms only consider the transactions inside the sliding window, and as the window drifts, the mining results are updated immediately. In this paper, the estDec method [8] is adopted to track frequent itemsets in the current sliding window. In the beginning, the estDec method checks the transactions in the sliding window and a prefix-tree P is built to keep track of the frequent itemsets. When the window moves to the next bucket, the estDec method can find new frequent itemsets to be inserted into P or delete an existing itemset in P that is no longer frequent. Briefly speaking, the estDec method alternates between count updating and itemset insertion.
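As a rough illustration of the sliding-window model, the following Python sketch re-counts itemsets over only the most recent transactions. It is not the estDec algorithm itself, which maintains decayed counts incrementally in a prefix-tree; the window size, support threshold and sample stream are illustrative assumptions.

```python
from collections import Counter, deque
from itertools import combinations

def frequent_itemsets_in_window(stream, window_size=4, min_support=0.5, max_len=3):
    """Naive sliding-window itemset counting (an illustration only, not estDec).

    The last `window_size` transactions are kept and all itemsets of up to
    `max_len` items are re-counted from scratch; estDec instead maintains the
    counts incrementally in a prefix-tree with decayed supports.
    """
    window = deque(maxlen=window_size)
    for transaction in stream:
        window.append(frozenset(transaction))
        counts = Counter()
        for trans in window:
            for k in range(1, max_len + 1):
                for itemset in combinations(sorted(trans), k):
                    counts[itemset] += 1
        threshold = min_support * len(window)
        frequent = {iset: c for iset, c in counts.items() if c >= threshold}
        yield list(window), frequent

# A tiny example stream; every result yielded reflects only the current window.
stream = [{"Bread", "Milk"}, {"Bread", "Jam", "Milk"},
          {"Bacon", "Egg", "Milk"}, {"Bread", "Jam", "Milk"}]
for window, freq in frequent_itemsets_in_window(stream, window_size=3):
    print(len(window), sorted(freq))
```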
B. Mining Association Rules in Data Stream
The estDec method employs a prefix-tree to continuously monitor frequent itemsets. In order to get association rules, it is necessary to traverse the prefix-tree to obtain the subsets of a frequent itemset. Shin and Lee proposed a traversal stack to derive the association rules [28].
A traversal stack stores a specific frequent itemset in the prefix-tree and is then used to enumerate all possible association rules [28]. Fig. 1 shows an example of a traversal stack t_stack for the itemset {i1, i2, i3, ..., in} in the prefix tree P. Next, the traversal stack is scanned twice to obtain all possible association rules. In order to enhance the enumeration efficiency, the antecedent part of the rules is divided into three parts: precedetors, alternators and combinators. Table I shows all the possible association rules derived from t_stack. Shin and Lee also proposed another algorithm to get the real association rules. Please refer to [28] for more detailed definitions of the traversal stack and the enumerated association rules, and for the algorithm that gets the final association rules.
Figure 1. An example of a traversal stack.
TABLE I. ALL ASSOCIATION RULES DERIVED FROM TRAVERSAL STACK
C. Definitions of Outlier Transactions
Narita and Kitagawa gave definitions of outlier transactions [23]. They were interested in probing whether a transaction is likely to be an outlier if some items were supposed to appear in it but are actually not found there [23]. Based on this concept, an outlier degree is defined to evaluate whether a single transaction is an outlier or not. The following example illustrates what outlier transactions are [23].
Example 1. Table II is a transaction dataset. In order to derive association rules, the minimum support and minimum confidence are set to 50% and 80%, respectively. Table III gives partial association rules generated from Table II. Since all the rules in Table III have high confidence, one can find that transaction 2 is abnormal: by checking against rule 2, this transaction does not include the item {Bread} that is supposed to appear in it. In fact, transaction 2 is an outlier according to the study in [23], and the evaluation procedure will be introduced in the following sections.
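For intuition about how rules are read off a frequent itemset, the sketch below enumerates every rule X → Y with X ∪ Y equal to the itemset and confidence supp(X ∪ Y)/supp(X). Plain subset enumeration stands in here for the traversal-stack procedure of Shin and Lee [28]; the support counts are those of Table II below. Run on {Bread, Jam, Milk}, it returns rules 2 and 3 of Table III together with {Bread, Jam} → {Milk}, which the partial listing in Table III omits.

```python
from itertools import combinations

def rules_from_itemset(itemset, support, min_conf=0.8):
    """Enumerate rules X -> Y with X ∪ Y = itemset and conf = supp(itemset)/supp(X).

    `support` maps frozensets to their support counts; simple subset enumeration
    replaces the traversal-stack enumeration of Shin and Lee [28].
    """
    items = frozenset(itemset)
    rules = []
    for k in range(1, len(items)):
        for antecedent in combinations(sorted(items), k):
            x = frozenset(antecedent)
            conf = support[items] / support[x]
            if conf >= min_conf:
                rules.append((x, items - x, conf))
    return rules

# Support counts computed from the 10 transactions of Table II.
support = {
    frozenset({"Bread"}): 8, frozenset({"Jam"}): 7, frozenset({"Milk"}): 10,
    frozenset({"Bread", "Jam"}): 6, frozenset({"Bread", "Milk"}): 8,
    frozenset({"Jam", "Milk"}): 7, frozenset({"Bread", "Jam", "Milk"}): 6,
}
print(rules_from_itemset({"Bread", "Jam", "Milk"}, support))
```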
TABLE II. A SAMPLE TRANSACTION DATASET
TID   Items
1     Bread, Jam, Milk
2     Bacon, Corn, Jam, Milk
3     Bread, Jam, Milk
4     Bacon, Bread, Corn, Egg, Milk
5     Bacon, Bread, Corn, Egg, Jam, Milk
6     Bread, Corn, Jam, Milk
7     Bacon, Bread, Egg, Milk
8     Bacon, Bread, Egg, Jam, Milk
9     Bread, Jam, Milk
10    Bacon, Egg, Milk

TABLE III. ASSOCIATION RULES DERIVED FROM TABLE II
Rule ID   Rule
1         {Jam} → {Bread}
2         {Jam, Milk} → {Bread}
3         {Jam} → {Bread, Milk}
4         {Bacon} → {Egg}
5         {Bacon, Milk} → {Egg}
6         {Bacon} → {Egg, Milk}
7         {Milk} → {Bread}
D. Outlier Transaction Detection
Narita and Kitagawa introduced the outlier degree to evaluate whether a transaction is an outlier or not [23]. The following gives two definitions related to the outlier degree.
Definition 1. Let t be a transaction and R be the set of high-confidence association rules. t's associative closure t^+ is defined as follows:
  t^0 = t
  t^(i+1) = t^i ∪ { e | e ∈ Y ∧ X ⊆ t^i ∧ (X → Y) ∈ R }
  t^+ = t^∞
The associative closure t^+ is an ideal form of t and does not violate any association rule.
Definition 2. Let t be a transaction, R be the set of high-confidence association rules, and t^+ be the associative closure of t. The outlier degree of t is defined as od(t):
  od(t) = |t^+ − t| / |t^+|.   (3)
The outlier degree ranges between 0 and 1: when t^+ = t, od(t) equals 0, and od(t) is always less than 1. In Table II, the associative closure t^+ of transaction 2 is {Bacon, Bread, Corn, Egg, Jam, Milk}, so the outlier degree of transaction 2 is 2/6 = 0.33.
Definition 3. An outlier transaction is a transaction whose outlier degree od(t) is greater than or equal to min_od, a predefined outlier degree threshold.
If min_od is set to 0.3, both transactions 2 and 10 are outliers.

III. OUTLIER DETECTION AND ANALYSIS
In this section, we provide evidence that the outlier degree is affected not only by the frequent itemsets; the rare items should also be taken into consideration. Otherwise, the judgment on outlier detection may be insufficient to reflect the facts inside the database. Example 2 illustrates this new observation. In addition, a new algorithm is proposed to discover which items may cause transactions to become outliers.
Definition 4. Rare items, or infrequent items, are items that are not contained in any frequent itemset. The infrequent itemset is the set that includes all the rare items.
Example 2. Table IV gives another example of a transaction set. The minimum support and minimum confidence are set as in Example 1. Compared with transaction 2 in Table II, transaction 2 here has only one additional rare item, Battery. The associative closure of transaction 2 is {Battery, Bacon, Bread, Corn, Egg, Jam, Milk}, and its outlier degree is 2/7 ≈ 0.29. If the minimum outlier degree is set to 0.3, this transaction is no longer an outlier. But according to the work in [23], an outlier is a transaction with some items that were expected to appear but were not found there. Transaction 2 in Table IV is essentially the same as transaction 2 in Table II, and it should still be an outlier. The problem arises from the definition of the associative closure: the more rare items a transaction has, the more normal it appears.
TABLE IV. ANOTHER SAMPLE TRANSACTION DATASET
TID   Items
1     Bread, Jam, Milk
2     Battery, Bacon, Corn, Jam, Milk
3     Bread, Jam, Milk
4     Bacon, Bread, Corn, Egg, Milk
5     Bacon, Bread, Corn, Egg, Jam, Milk
6     Bread, Corn, Jam, Milk
7     Bacon, Bread, Egg, Milk
8     Bacon, Bread, Egg, Jam, Milk
9     Bread, Jam, Milk
10    Battery, Bacon, Egg, Milk
Based on this observation, one should first remove the rare items from a transaction whenever deriving its outlier degree. The following gives the new definition of a transaction's associative closure and its maximal associative closure.
Definition 5. Let t be a transaction, AR be the set of high-confidence association rules, and RIS be the set of all infrequent items. t^- denotes t's frequent itemset, obtained by removing all the infrequent items from t. t's associative closure t^+ is redefined as follows:
  t^- = t − { e | e ∈ RIS }
  t^0 = t^-
  t^(i+1) = t^i ∪ { e | e ∈ Y ∧ X ⊆ t^i ∧ (X → Y) ∈ AR }
  t^+ = t^∞
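To make Definitions 1, 2 and 5 concrete, the following sketch computes the associative closure as a fixpoint and the resulting outlier degree, with an optional rare-item removal step. The rule list is the one in Table III, and the two test transactions are transaction 2 of Tables II and IV.

```python
def associative_closure(t, rules, rare_items=frozenset()):
    """Definitions 1 and 5: expand t with consequents of applicable rules.

    `rules` is a list of (antecedent, consequent) frozenset pairs from the
    high-confidence rule set; `rare_items` is removed first (Definition 5).
    """
    closure = set(t) - set(rare_items)          # t^- (equals t when rare_items is empty)
    changed = True
    while changed:                              # iterate until the fixpoint t^∞
        changed = False
        for antecedent, consequent in rules:
            if antecedent <= closure and not consequent <= closure:
                closure |= consequent
                changed = True
    return closure

def outlier_degree(t, rules, rare_items=frozenset()):
    """Definition 2: od(t) = |t^+ - t| / |t^+|."""
    plus = associative_closure(t, rules, rare_items)
    return len(plus - set(t)) / len(plus)

# Rules 1-7 of Table III.
R = [(frozenset({"Jam"}), frozenset({"Bread"})),
     (frozenset({"Jam", "Milk"}), frozenset({"Bread"})),
     (frozenset({"Jam"}), frozenset({"Bread", "Milk"})),
     (frozenset({"Bacon"}), frozenset({"Egg"})),
     (frozenset({"Bacon", "Milk"}), frozenset({"Egg"})),
     (frozenset({"Bacon"}), frozenset({"Egg", "Milk"})),
     (frozenset({"Milk"}), frozenset({"Bread"}))]

t2 = {"Bacon", "Corn", "Jam", "Milk"}                 # transaction 2 of Table II
print(round(outlier_degree(t2, R), 2))                # 0.33, as in the text

t2_iv = {"Battery", "Bacon", "Corn", "Jam", "Milk"}   # transaction 2 of Table IV
print(round(outlier_degree(t2_iv, R, rare_items={"Battery"}), 2))
# 0.33: after removing Battery (Definition 5), it is again an outlier at min_od = 0.3
```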
Next, we discuss how to derive the relationships between an outlier transaction and its items. In fact, this task is similar to finding associations between items; that is, the concept of association rule mining can be applied to relate an outlier and its items. The first step is to transform each outlier transaction into two parts, its unobserved frequent itemsets and its infrequent itemset.
Definition 6. Let t be a transaction and R be the set of association rules. An unobserved frequent itemset is defined as follows:
  UNF(t) = { e | e ∈ X ∪ Y ∧ X ⊆ t ∧ Y ⊄ t ∧ (X → Y) ∈ R }
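A minimal sketch of Definition 6, shown on transaction 2 of Table II with two of the rules from Table III. The starred marker for unobserved items is the notation later used in Table VIII, and treating each rule's X ∪ Y as one output element is an interpretation of the definition that matches Example 3.

```python
def unobserved_frequent_itemsets(t, rules):
    """Definition 6, per rule: if X ⊆ t but Y ⊄ t, report X ∪ Y with the
    unobserved items of Y marked by '*' (the starred notation of Table VIII)."""
    t = set(t)
    result = []
    for x, y in rules:
        if x <= t and not y <= t:
            result.append(tuple(sorted(x)) + tuple(sorted(i + "*" for i in y - t)))
    return result

# Transaction 2 of Table II against rules 1 and 4 of Table III.
rules = [(frozenset({"Jam"}), frozenset({"Bread"})),
         (frozenset({"Bacon"}), frozenset({"Egg"}))]
print(unobserved_frequent_itemsets({"Bacon", "Corn", "Jam", "Milk"}, rules))
# [('Jam', 'Bread*'), ('Bacon', 'Egg*')]
```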
The complete algorithm, which covers both outlier detection and finding the relationship between rare items and outliers, is shown in Fig. 2. The following example demonstrates in more detail the procedure of finding the relationship between rare items and outlier transactions.
1. Employ a prefix tree to keep track of frequent itemsets within a sliding window.
2. Derive association rules from the prefix tree.
3. Get the outlier transaction set OT from the transaction set T within the sliding window.
4. Transform OT to OTtrans: transform each t in OT to ttrans by dividing t into two parts, unobserved frequent itemsets and infrequent items.
5. Apply FP-growth to get the association rule set Rtrans on OTtrans.
Figure 2. The complete algorithm that identifies outlier transactions and finds which items cause a transaction to become abnormal.
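The five steps of Fig. 2 can be read as the following skeleton. The helper functions are hypothetical stand-ins for the components sketched earlier (sliding-window counting, rule enumeration, outlier degree, and the outlier transformation), and fp_growth_rules is a placeholder for any FP-growth implementation.

```python
def detect_and_explain_outliers(window, min_sup, min_conf, min_od,
                                count_frequent_itemsets, enumerate_rules,
                                outlier_degree, transform_outlier, fp_growth_rules):
    """Skeleton of the five steps in Fig. 2; every helper is injected so this
    sketch stays independent of any particular implementation."""
    # Step 1: track frequent itemsets within the sliding window.
    frequent = count_frequent_itemsets(window, min_sup)
    # Step 2: derive high-confidence association rules from them.
    rules = enumerate_rules(frequent, min_conf)
    # Step 3: collect the outlier transaction set OT by outlier degree.
    rare = {item for t in window for item in t
            if not any(item in itemset for itemset in frequent)}
    outliers = [t for t in window if outlier_degree(t, rules, rare) >= min_od]
    # Step 4: transform each outlier into unobserved frequent itemsets + infrequent items.
    transformed = [transform_outlier(t, rules, rare) for t in outliers]
    # Step 5: mine the transformed set to relate infrequent items to unobserved itemsets.
    return outliers, fp_growth_rules(transformed, min_sup, min_conf)
```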
Example 3. A synthetic dataset is constructed to verify that the proposed method can find the relationship between an outlier and its rare items. Table V gives a synthetic transaction dataset within a sliding window. The items of the synthetic transaction set are a, b, c, d, e, f, g, h and i. Table VI gives partial frequent itemsets, and Table VII shows the association rules derived from Table VI when the minimum support is set to 50% and the minimum confidence to 80%. According to Table VI, the rare items are a, e, g, h, and i. If the minimum outlier degree is set to 0.5, then transactions 5, 8, and 14 are outliers. The first step is to transform each found outlier into two parts, unobserved frequent itemsets and infrequent items. Transaction 5 has two unobserved frequent itemsets, {dc*} and {dfc*} (according to association rules 2 and 4 in Table VII), and its infrequent items are e and h. Transaction 8 has two unobserved frequent itemsets, {cd*} and {cfd*} (according to association rules 1 and 3), and its infrequent items are e and i. Table VIII shows the final transformation result. Table VIII can be treated as a new transaction set in which each unobserved frequent itemset is treated as an item. By applying the FP-growth algorithm [14], with the minimum support and minimum confidence set to 50% and 80% respectively, to the transformed data set in Table VIII, we find, for example, that item h is one that induces abnormal transactions, since the rules h → {dfc*} and h → {dc*} can be derived from Table VIII. This means that item c should appear, but because of the infrequent item h, item c is not observed in the transaction. That is, item h may cause the transaction to become abnormal.
TABLE V. A SYNTHETIC TRANSACTION DATASET WITHIN A SLIDING WINDOW
TID   Items
1     c, d, f, g
2     a, b, c, d, e, g
3     a, c, d, f
4     c, d, h, i
5     d, e, f, h
6     a, c, d, f, e, g
7     b, c, d, e, f
8     b, c, f, e, i
9     c, d, e, f, g, i
10    b, c, d, f
11    a, b, c, d
12    b, g
13    c, d, f, h
14    b, d, f, h
15    b, c, d, f
16    c, d, f, g

TABLE VI. (a) A PREFIX TREE IS USED TO TRACK FREQUENT ITEMSETS (THE MINIMUM SUPPORT IS SET TO 50%). (b) THE FREQUENT ITEMSETS DERIVED FROM (a)

TABLE VII. ASSOCIATION RULES DERIVED FROM THE PREFIX TREE. THE MINIMUM CONFIDENCE IS SET TO 80%
Rule ID   Rule (confidence)
1         c → d (92.3%)
2         d → c (85.7%)
3         c, f → d (90.0%)
4         d, f → c (81.8%)

TABLE VIII. THE TRANSFORMED TRANSACTION DATASET FROM TABLE V
TID   Items
5     dc*, dfc*, e, h
8     cd*, cfd*, e, i
14    dc*, dfc*, h
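To reproduce the gist of Example 3, the sketch below transforms the three outlier transactions of Table V using the rules of Table VII and then mines single-antecedent rules over the transformed set. The exhaustive pair counting is only a stand-in for FP-growth (the transformed set has just three transactions), and the starred notation follows Table VIII; the printed rules include ('h', 'dc*', 1.0) and ('h', 'dfc*', 1.0), mirroring Example 3's conclusion about item h.

```python
RULES = [({"c"}, {"d"}), ({"d"}, {"c"}),
         ({"c", "f"}, {"d"}), ({"d", "f"}, {"c"})]          # Table VII
RARE = {"a", "e", "g", "h", "i"}                             # rare items in Example 3
OUTLIERS = {5: {"d", "e", "f", "h"},                         # outlier transactions of Table V
            8: {"b", "c", "f", "e", "i"},
            14: {"b", "d", "f", "h"}}

def transform(t):
    """Step 4: unobserved frequent itemsets (starred) plus the infrequent items of t."""
    items = set()
    for x, y in RULES:
        if x <= t and not y <= t:
            items.add("".join(sorted(x)) + "".join(sorted(i + "*" for i in y - t)))
    return items | (t & RARE)

transformed = {tid: transform(t) for tid, t in OUTLIERS.items()}   # matches Table VIII

def single_antecedent_rules(transactions, min_sup=0.5, min_conf=0.8):
    """Enumerate rules {a} -> {b} over the transformed set (a stand-in for FP-growth)."""
    n = len(transactions)
    items = {i for t in transactions for i in t}
    rules = []
    for a in items:
        for b in items:
            if a == b:
                continue
            cover_a = sum(1 for t in transactions if a in t)
            cover_ab = sum(1 for t in transactions if a in t and b in t)
            if cover_ab / n >= min_sup and cover_ab / cover_a >= min_conf:
                rules.append((a, b, cover_ab / cover_a))
    return rules

print(single_antecedent_rules(list(transformed.values())))
# includes ('h', 'dc*', 1.0) and ('h', 'dfc*', 1.0)
```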
IV. EXPERIMENTAL RESULTS AND DISCUSSION
One data set is used to evaluate the effectiveness of our algorithm against the work in [33]. The experiment was run on a workstation with an Intel 2.5 GHz processor and 2 GB of memory. The algorithm proposed in [33] and the algorithm in this research were implemented in Dev C++.
The experiment uses a synthetic data set, generated by the IBM synthetic data generator, as the input data source.
The two generated data sets include one with 500 transactions that are viewed as outliers and another with 9,500 transactions that are treated as normal. The two sets are then merged into one dataset with 10,000 transactions. In order to check the efficiency, both the accuracy and precision rates are defined as follows:
  accuracy = (no. of detected outliers that are positive) / (no. of all outliers)   (4)
  precision = (no. of detected outliers that are positive) / (no. of detected outliers)   (5)
The proposed method is compared with the work in [33]. The transactions with the top-k highest outlier degrees are chosen as outliers, and Table IX compares the efficiency of the two methods. The reason behind the better performance is that the infrequent items' effect on the outlier calculation has been removed, so the outliers can be evaluated by considering only those candidate items that should be seen but were actually not inside a transaction. Next, each outlier is transformed into two parts, unobserved frequent itemsets and infrequent items. The 500 transformed outliers are treated as a new transaction set. By applying the FP-growth algorithm [14] to the newly transformed data set, the association between an outlier and its infrequent itemset can be found. That is, the reasons that cause the transactions to become abnormal can be discovered.
The above experiment shows that the proposed outlier detection method is more practical since it can not only identify the outlier transactions but also discover the associations between an outlier and its items. This is quite useful since the mining results can help users figure out what causes the transactions to become abnormal without consulting expert knowledge. In fact, such consultation is time consuming and biased, and it reduces the efficiency of the overall data mining process. Our proposed framework cultivates a new research direction of knowledge mining that helps people understand data mining results easily and make action plans without further inquiries.
TABLE IX. OUTLIER DETECTION UNDER DIFFERENT k VALUES
top-k     proposed algorithm: true outliers detected (accuracy, precision)    He et al.'s algorithm [33]: true outliers detected (accuracy, precision)
top-300   215 (43%, 72%)     210 (42%, 70%)
top-350   327 (65%, 93%)     318 (64%, 91%)
top-500   494 (99%, 99%)     494 (99%, 99%)
top-600   500 (100%, 83%)    500 (100%, 83%)
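As a quick check of equations (4) and (5) against Table IX, this snippet recomputes the accuracy and precision of the proposed algorithm from the reported detection counts, assuming the 500 injected outlier transactions are the positives and all top-k picks count as detected outliers.

```python
# Recompute the (accuracy, precision) pairs of Table IX for the proposed algorithm.
total_true_outliers = 500
rows = {300: 215, 350: 327, 500: 494, 600: 500}   # top-k -> true outliers detected
for k, detected_true in rows.items():
    accuracy = detected_true / total_true_outliers     # equation (4)
    precision = detected_true / k                      # equation (5)
    print(k, f"{accuracy:.0%}", f"{precision:.0%}")    # e.g. 300 43% 72%
```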
V. CONCLUSIONS
Knowledge mining on data streams has attracted more attention in recent years due to the wide demand for data stream technologies. For example, sensors emit data continuously, and people are interested in extracting useful information from them immediately. Data mining on data streams serves as a bridge to obtain knowledge from analyzing outlier transactions. Outlier detection can even provide knowledge on which items cause the transaction patterns to become abnormal.
In this paper, an improved algorithm based on both [23] and [28] is proposed to derive association rules and then identify outlier transactions in data streams. Due to the characteristics of data streams, traditional Apriori-based methods are not appropriate for finding association rules in a data stream. The proposed algorithm employs a prefix-tree to monitor frequent itemsets in a sliding window and a traversal stack to generate association rules in the data stream. The experimental results provide evidence that the proposed algorithm performs better in both accuracy and precision rates. Another contribution is that the relationship between an outlier and its items can be found by the proposed algorithm, so the reasons that cause transactions to become outliers can be identified; in other words, the mining results become understandable. In the future, we will apply our framework to identify outliers and unveil the facts behind abnormal transactions. Since the proposed work of discovering outliers has not been experimented on real datasets, we have only validated it by checking the accuracy and precision rates. This will initiate another direction towards knowledge mining.
ACKNOWLEDGMENTS
This work is supported by the National Science Council, Taiwan under Grant NSC100-2221-E-027-110- and by a joint project between National Taipei University of Technology and Mackay Memorial Hospital under Grant NTUT-MMH-100-09.
REFERENCES
[1] C.C. Aggarwal and P.S. Yu, "Outlier detection for high dimensional data," in Proc. of ACM SIGMOD Int. Conf. on Management of Data, Santa Barbara, CA, USA, pp.37-46, May 2001.
[2] F. Anguiulli and F. Fassetti, "Detecting distance-based outliers in streams of data," in Proc. of the 16th ACM Conf. on Information and Knowledge Management, Lisbon, Portugal, pp.811-820, November 2007.
[3] A. Arning, R. Agrawal and P. Raghavan, "A linear method for deviation detection in large databases," in Proc. of the Second Int. Conf. on Knowledge Discovery and Data Mining, Portland, Oregon, USA, pp.164-169, August 1996.
[4] S. Basu and M. Meckesheimer, "Automatic outlier detection for time series: an application to sensor data," Knowledge and Information Systems, vol. 11, no. 2, pp.137-154, February 2007.
[5] M.M. Breunig, H.-P. Kriegel, R.T. Ng and J. Sander, "LOF: Identifying density-based local outliers," in Proc. of ACM SIGMOD Int. Conf. on Management of Data, Dallas, Texas, USA, pp.93-104, May 2000.
[6] D. Burdick, M. Calimlim and J. Gehrke, "MAFIA: A maximal frequent itemset algorithm for transactional databases," in Proc. of the 17th Int. Conf. on Data Engineering, Heidelberg, Germany, pp.443-452, April 2001.
[7] P.K. Chan, M.V. Mahoney and M.H. Arshad, "A machine learning approach to anomaly detection," Technical Report, Department of Computer Science, Florida Institute of Technology, June 2003.
[8] J.H. Chang and W.S. Lee, "Finding recent frequent itemsets adaptively over online data streams," in Proc. of the 9th ACM SIGKDD, Washington, DC, USA, pp.487-492, August 2003.
[9] D. Curiac, O. Banias, F. Dragan, C. Volosencu, and O. Dranga, "Malicious node detection in wireless sensor networks using an autoregression technique," in Proc. of the 3rd Int. Conf. on Networking and Services, Athens, Greece, pp.83-88, June 2007.
[10] K. Das and J.G. Schneider, "Detecting anomalous records in categorical datasets," in Proc. of the Second Int. Conf. on Knowledge Discovery and Data Mining, San Jose, California, USA, pp.220-229, August 2007.
[11] S. Guha, N. Koudas, and K. Shim, "Data streams and histograms," in Proc. of the 33rd ACM Symposium on Theory of Computing, Heraklion, Greece, pp.471-475, July 2001.
[12] M. Halatchev and L. Gruenwald, "Estimating missing values in related sensor data streams," in Proc. of the 11th Int. Conf. on Management of Data, Goa, India, pp.83-94, January 2005.
[13] F. Han, Y.-M. Wang, and H.-P. Wang, "ODABK: An effective approach to detecting outlier in data stream," in Proc. of the 5th Int. Conf. on Machine Learning and Cybernetics, Dalian, China, pp.1036-1041, August 2006.
[14] J. Han, J. Pei and Y. Yin, "Mining frequent patterns without candidate generation," in Proc. of ACM SIGMOD Int. Conf. on Management of Data, Dallas, Texas, USA, pp.1-12, May 2000.
[15] Z. He, X. Xu and S. Deng, "FP-outlier: Frequent pattern based outlier detection," Computer Science and Information System, vol. 2, no. 1, pp.103-118, June 2005.
[16] Z. He, X. Xu, and S. Deng, "Discovering cluster-based local outliers," Pattern Recognition Letters, vol. 24, no. 9-10, pp.1641-1650, June 2003.
[17] Z. He, S. Deng and X. Xu, "An optimization model for outlier detection in categorical data," in Proc. of IEEE Int. Conf. on Intelligent Computing, Hefei, China, pp.400-409, August 2005.
[18] Z. He, S. Deng and X. Xu, "A fast greedy algorithm for outlier mining," in Proc. of the 10th Pacific-Asia Conf. on Knowledge Discovery and Data Mining, Singapore, pp.567-576, April 2006.
[19] M.F. Jiang, S.S. Tseng, and C.M. Su, "Two-phase clustering process for outliers detection," Pattern Recognition Letters, vol. 22, no. 6-7, pp.691-700, May 2001.
[20] M.E. Knorr, R.T. Ng and V. Tucakov, "Distance-based outliers: Algorithms and applications," VLDB Journal, vol. 8, no. 3-4, pp.237-253, 2000.
[21] A. Koufakou, M. Georgiopoulos, G.C. Anagnostopoulos and K.M. Reynolds, "A scalable and efficient outlier detection strategy for categorical data," in Proc. of IEEE Int. Conf. on Tools with Artificial Intelligence, Patras, Greece, pp.210-217, October 2007.
[22] G.S. Manku and R. Motwani, "Approximate frequency counts over data streams," in Proc. of the 28th Int. Conf. on Very Large Data Bases, Hong Kong, China, pp.346-357, August 2002.
[23] K. Narita and H. Kitagawa, "Outlier detection for transaction databases using association rules," in Proc. of the 9th Int. Conf. on Web-Age Information Management, Zhangjiajie, Hunan, China, pp.373-380, July 2008.
[24] M.E. Otey, A. Ghoting and A. Parthasarathy, "Fast distributed outlier detection in mixed-attribute data sets," Data Mining and Knowledge Discovery, vol. 12, no. 2-3, pp.203-228, May 2006.
[25] V. Puttagunta and K. Kalpakis, "Adaptive methods for activity monitoring of streaming data," in Proc. of the 2002 Int. Conf. on Machine Learning and Applications, Las Vegas, Nevada, USA, pp.197-203, June 2002.
[26] S. Papadimitriou, H. Kitagawa, P.B. Gibbons and C. Faloutsos, "LOCI: Fast outlier detection using the local correlation integral," in Proc. of the 19th Int. Conf. on Data Engineering, Bangalore, India, pp.315-315, May 2003.
[27] S. Ramaswamy, R. Rastogi and K. Shim, "Efficient algorithms for mining outliers from large data sets," in Proc. of ACM SIGMOD Int. Conf. on Management of Data, Dallas, Texas, USA, pp.427-438, May 2000.
[28] S.J. Shin and W.S. Lee, "Enumerating association rules of an online data stream," in Proc. of the Int. Conf. on Applied Computing, Salamanca, Spain, pp.320-327, February 2007.
[29] S. Subramaniam, T. Palpanas, D. Papadopoulos, V. Kalogeraki, and D. Gunopulos, "Online outlier detection in sensor data using nonparametric models," in Proc. of the 32nd Int. Conf. on Very Large Data Bases, Seoul, Korea, pp.187-198, September 2006.
[30] D. Tax and R. Duin, "Support vector data description," Machine Learning, vol. 54, no. 1, pp.45-66, January 2004.
[31] D. Toshniwal and S. Yadav, "Adaptive outlier detection in streaming time series," in Proc. of the 1st Int. Conf. on Asia Agriculture and Animal, Hong Kong, China, pp.186-191, July 2011.
[32] D. Yu, G. Sheikholeslami, and A. Zhang, "FindOut: Finding out outliers in large datasets," Knowledge and Information Systems, vol. 4, no. 4, pp.387-412, September 2002.
[33] Z. He, X. Xu, and S. Deng, "Outlier detection over data streams," in Proc. of the 7th Int. Conf. for Young Computer Scientists, Harbin, China, August 2003.
[34] C. Zhu, H. Kitagawa and C. Faloutsos, "Example-based robust outlier detection in high dimensional datasets," in Proc. of the 5th IEEE Int. Conf. on Data Mining, Houston, Texas, USA, pp.829-832, November 2005.