Progressive Sampling for Association Rules Based ... - Semantic Scholar
Recommend Documents
(IJCSE) International Journal on Computer Science and Engineering. Vol. ... Dept of Computer Science, .... sample size while retaining any desired degree of.
Information Sciences Research Center. 600 Mountain Avenue ... The second algorithm, which we call the interleaved algorithm, uses cycle pruning and.
Action rules describe possible transitions of objects from one state to another with respect to a distinguished at- tribute. Previous research on action rule ...
Sep 12, 2012 - with over 500 sampled individuals (hairy wood ant, Formica lugubris, and black-browed ...... markers in the ant Formica exsecta. Mol. Ecol.
agents (with unicycle dynamics) in the air at one time. Furthermore, the framework ..... of different heuristics on the behavior of RRTs, we conducted trials of RRT.
Jul 8, 2016 - patients and non-RBBB (i.e., left bundle branch block, RBBB with left ... ment of non-RBBB was associated with the primary endpoint (HR 3.02; ...
that of the incoming and outgoing message roles on the lifeline role. Figure 1 shows an IPS for the MAC pattern where roles are denoted by the symbol â|â, and ...
due to Alexander [11] who stressed that architectures should be âtimelessâ and prove to be stable ...... In F. Moller and G.M. Birtwistle, editors, Logics for Concur-.
MpRi Ris the passive gravitational mass related to agent i, MiRiR is the inertia mass related to agent i. In (7), G(t) is gravitational constant at time t such as G(G0 ...
together (e.g., 70% of costumers who buy chips and drinks buy also pizza with ... domain knowledge in the form of taxonomy to mine negative associations. How- ever, it is ... generalized disjunction-free literal set representation (GDFLR). In [9] ...
4Graduate School of Engineering' Kyoto University' Kyoto Japan. Abstract ... Keywords: Data Mining; Data Pre-processing; Association Rules; Decision Tree; At-.
Email: [email protected], [email protected] ... Channel coding, such as forward error correc- ... coder knows the error concealment algorithm applied.
performance for the low bit rate coding. Experimental results ... Later on, with the popularity of the Internet, the progressive compression and transmission has.
Keywords: intelligent agents, GIS, association rules, clustering, data mining. 1. ... framework creates an intelligent agent that will handle this request until it will be ...
[3] X. Li, F. Zhu, âApplication of the zero-crossing rate,. LOFAR spectrum and wavelet to the feature extraction of passive sonar signalsâ, Proc. 3rd World Conf.
says that 30% (support) of customers purchase both Desktop PC and Ink-jet .... of items I ={Laser printer, Ink-jet printer, Dot matrix printer, Desktop PC, Note-.
Notebook. Laser Ink-jet. Figure 1. An example of taxonomy T. It is likely to happen that the association rule. Desktop â Ink-jet (Support=30%, Confidence=60%).
Association rule discovery, a successful and important mining task, aims at ... We present the association rule problem statement in Section 2. A ... Jane Austen.
Nov 26, 2008 - Generate a set of 20 random attribute orderings named SAOi, with i = 1, .., 20. 2. ... Learn the Bayesian network BNLa,i by using La on SAOi.
where data mining algorithms are analyzed for the side effects they incur in data privacy. For example, through data mining, one is able to infer sensitive ...
Keywords: Data Mining, Association Rules, Random Sampling, Cherno bounds. ... si cation { nding rules that partition the database into disjoint classes; ...
AbstractâMost outlier detection algorithms are proposed to discover ... The credit card fraud ... mined again by employing association rules mining algorithm.
Progressive Sampling for Association Rules Based ... - Semantic Scholar
appropriate sample size without the need of executing association rule mining ..... normalized self-similarity curve which is generated by algorithm RC-S. Note.
Progressive Sampling for Association Rules Based on Sampling Error Estimation Kun-Ta Chuang, Ming-Syan Chen, and Wen-Chieh Yang Graduate Institute of Communication Engineering, National Taiwan University, Intelligent Business Technology Inc., Taipei, Taiwan, ROC [email protected][email protected][email protected]
Abstract. We explore in this paper a progressive sampling algorithm, called Sampling Error Estimation (SEE ), which aims to identify an appropriate sample size for mining association rules. SEE has two advantages over previous works in the literature. First, SEE is highly efficient because an appropriate sample size can be determined without the need of executing association rules. Second, the identified sample size of SEE is very accurate, meaning that association rules can be highly efficiently executed on a sample of this size to obtain a sufficiently accurate result. This is attributed to the merit of SEE for being able to significantly reduce the influence of randomness by examining several samples with the same size in one database scan. As validated by experiments on various real data and synthetic data, SEE can achieve very prominent improvement in efficiency and also the resulting accuracy over previous works.
1
Introduction
As the growth of information explodes nowadays, reducing the computational cost of data mining tasks has emerged as an important issue. Specifically, the computational cost reduction for association rule mining is elaborated upon by the research community [6]. Among research efforts to improve the efficiency of mining association rules, sampling is an important technique due to its capability of reducing the amount of analyzed data [2][5][7]. However, using sampling will inevitably result in the generation of incorrect association rules, which are not valid with respect to the entire database. In such situations, how to identify an appropriate sample size is key to the success of the sampling technique. Progressive sampling is the well-known approach to determine the appropriate sample size in the literature. Progressive sampling methods are based on the observation that when the sample size exceeds a size Ns , the model accuracy λ obtained by mining on a sample will no longer be prominently increased. The sample size Ns can therefore be suggested as the appropriate sample size. In general, the sample size Ns can be identified from the ”model accuracy curve”, which is in essence the model accuracy versus the sample size T.B. Ho, D. Cheung, and H. Liu (Eds.): PAKDD 2005, LNAI 3518, pp. 505–515, 2005. c Springer-Verlag Berlin Heidelberg 2005
506
K.-T. Chuang, M.-S. Chen, and W.-C. Yang Knee of the model accuracy curve Knee of the curve