Data Streams: Models and Algorithms
edited by
Charu C. Aggarwal IBM, T.J. Watson Research Center Yorktown Heights, NY, USA
Springer
Contents
List of Figures
List of Tables
Preface

1 An Introduction to Data Streams
Charu C. Aggarwal
  1. Introduction
  2. Stream Mining Algorithms
  3. Conclusions and Summary
  References

2 On Clustering Massive Data Streams: A Summarization Paradigm
Charu C. Aggarwal, Jiawei Han, Jianyong Wang and Philip S. Yu
  1. Introduction
  2. The Micro-clustering Based Stream Mining Framework
  3. Clustering Evolving Data Streams: A Micro-clustering Approach
    3.1 Micro-clustering Challenges
    3.2 Online Micro-cluster Maintenance: The CluStream Algorithm
    3.3 High Dimensional Projected Stream Clustering
  4. Classification of Data Streams: A Micro-clustering Approach
    4.1 On-Demand Stream Classification
  5. Other Applications of Micro-clustering and Research Directions
  6. Performance Study and Experimental Results
  7. Discussion
  References

3 A Survey of Classification Methods in Data Streams
Mohamed Medhat Gaber, Arkady Zaslavsky and Shonali Krishnaswamy
  1. Introduction
  2. Research Issues
  3. Solution Approaches
  4. Classification Techniques
    4.1 Ensemble Based Classification
    4.2 Very Fast Decision Trees (VFDT)
    4.3 On Demand Classification
    4.4 Online Information Network (OLIN)
    4.5 LWClass Algorithm
    4.6 ANNCAD Algorithm
    4.7 SCALLOP Algorithm
  5. Summary
  References

4 Frequent Pattern Mining in Data Streams
Ruoming Jin and Gagan Agrawal
  1. Introduction
  2. Overview
  3. New Algorithm
  4. Work on Other Related Problems
  5. Conclusions and Future Directions
  References

5 A Survey of Change Diagnosis Algorithms in Evolving Data Streams
Charu C. Aggarwal
  1. Introduction
  2. The Velocity Density Method
    2.1 Spatial Velocity Profiles
    2.2 Evolution Computations in High Dimensional Case
    2.3 On the use of clustering for characterizing stream evolution
  3. On the Effect of Evolution in Data Mining Algorithms
  4. Conclusions
  References

6 Multi-Dimensional Analysis of Data Streams Using Stream Cubes
Jiawei Han, Y. Dora Cai, Yixin Chen, Guozhu Dong, Jian Pei, Benjamin W. Wah, and Jianyong Wang
  1. Introduction
  2. Problem Definition
  3. Architecture for On-line Analysis of Data Streams
    3.1 Tilted time frame
    3.2 Critical layers
    3.3 Partial materialization of stream cube
  4. Stream Data Cube Computation
    4.1 Algorithms for cube computation
  5. Performance Study
  6. Related Work
  7. Possible Extensions
  8. Conclusions
  References

7 Load Shedding in Data Stream Systems
Brian Babcock, Mayur Datar and Rajeev Motwani
  1. Load Shedding for Aggregation Queries
    1.1 Problem Formulation
    1.2 Load Shedding Algorithm
    1.3 Extensions
  2. Load Shedding in Aurora
  3. Load Shedding for Sliding Window Joins
  4. Load Shedding for Classification Queries
  5. Summary
  References

8 The Sliding-Window Computation Model and Results
Mayur Datar and Rajeev Motwani
    0.1 Motivation and Road Map
  1. A Solution to the BASICCOUNTING Problem
    1.1 The Approximation Scheme
  2. Space Lower Bound for BASICCOUNTING Problem
  3. Beyond 0's and 1's
  4. References and Related Work
  5. Conclusion
  References

9 A Survey of Synopsis Construction in Data Streams
Charu C. Aggarwal, Philip S. Yu
  1. Introduction
  2. Sampling Methods
    2.1 Random Sampling with a Reservoir
    2.2 Concise Sampling
  3. Wavelets
    3.1 Recent Research on Wavelet Decomposition in Data Streams
  4. Sketches
    4.1 Fixed Window Sketches for Massive Time Series
    4.2 Variable Window Sketches of Massive Time Series
    4.3 Sketches and their applications in Data Streams
    4.4 Sketches with p-stable distributions
    4.5 The Count-Min Sketch
    4.6 Related Counting Methods: Hash Functions for Determining Distinct Elements
    4.7 Advantages and Limitations of Sketch Based Methods
  5. Histograms
    5.1 One Pass Construction of Equi-depth Histograms
    5.2 Constructing V-Optimal Histograms
    5.3 Wavelet Based Histograms for Query Answering
    5.4 Sketch Based Methods for Multi-dimensional Histograms
  6. Discussion and Challenges

10 A Survey of Join Processing in Data Streams
Junyi Xie and Jun Yang
  1. Introduction
  2. Model and Semantics
  3. State Management for Stream Joins
    3.1 Exploiting Constraints
    3.2 Exploiting Statistical Properties
  4. Fundamental Algorithms for Stream Join Processing
  5. Optimizing Stream Joins
  6. Conclusion
  Acknowledgments
  References

11 Indexing and Querying Data Streams
Ahmet Bulut, Ambuj K. Singh
  1. Introduction
  2. Indexing Streams
    2.1 Preliminaries and definitions
    2.2 Feature extraction
    2.3 Index maintenance
    2.4 Discrete Wavelet Transform
  3. Querying Streams
    3.1 Monitoring an aggregate query
    3.2 Monitoring a pattern query
    3.3 Monitoring a correlation query
  4. Related Work
  5. Future Directions
    5.1 Distributed monitoring systems
    5.2 Probabilistic modeling of sensor networks
    5.3 Content distribution networks
  6. Chapter Summary
  References

12 Dimensionality Reduction and Forecasting on Streams
Spiros Papadimitriou, Jimeng Sun, and Christos Faloutsos
  1. Related work
  2. Principal component analysis (PCA)
  3. Auto-regressive models and recursive least squares
  4. MUSCLES
  5. Tracking correlations and hidden variables: SPIRIT
  6. Putting SPIRIT to work
  7. Experimental case studies

13 A Survey of Distributed Mining of Data Streams
Srinivasan Parthasarathy, Amol Ghoting and Matthew Eric Otey
  1. Introduction
  2. Outlier and Anomaly Detection
  3. Clustering
  4. Frequent itemset mining
  5. Classification
  6. Summarization
  7. Mining Distributed Data Streams in Resource Constrained Environments
  8. Systems Support
  References

14 Algorithms for Distributed Data Stream Mining
Kanishka Bhaduri, Kamalika Das, Krishnamoorthy Sivakumar, Hillol Kargupta, Wolff and Rong Chen
  1. Introduction
  2. Motivation: Why Distributed Data Stream Mining?
  3. Existing Distributed Data Stream Mining Algorithms
  4. A local algorithm for distributed data stream mining
    4.1 Local Algorithms: definition
    4.2 Algorithm details
    4.3 Experimental results
    4.4 Modifications and extensions
  5. Bayesian Network Learning from Distributed Data Streams
    5.1 Distributed Bayesian Network Learning Algorithm
    5.2 Selection of samples for transmission to global site
    5.3 Online Distributed Bayesian Network Learning
    5.4 Experimental Results
  6. Conclusion
  References

15 A Survey of Stream Processing Problems and Techniques in Sensor Networks
Sharmila Subramaniam, Dimitrios Gunopulos
  1. Challenges
  2. The Data Collection Model
  3. Data Communication
  4. Query Processing
    4.1 Aggregate Queries
    4.2 Join Queries
    4.3 Top-k Monitoring
    4.4 Continuous Queries
  5. Compression and Modeling
    5.1 Data Distribution Modeling
    5.2 Outlier Detection
  6. Application: Tracking of Objects using Sensor Networks
  7. Summary
  References