Proceedings of the IASTED International Conference Biomedical Engineering (Biomed 2011) February 16 - 18, 2011 Innsbruck, Austria
CASE-BASED AND STREAM-BASED CLASSIFICATION IN BIOMEDICAL APPLICATION
Yang Hang
Simon Fong
Andy Ip
Sabah Mohammed
Department of Computer and Information Science University of Macau Macau
Department of Computer and Information Science University of Macau Macau
Faculty of Science and Technology University of Macau Macau
Department of Computer Science Lakehead University Thunder Bay, Canada
methods still support mining biomedical data streams effectively? Although many biomedical signal processing methods can detect anomalous patterns in incoming streams, it is desirable to have a decision-support technique that offers accurate prediction based on the latest updates of the incoming signal streams. Traditional data mining, for instance induction-based decision tree classification, works by scanning a finite and structured set of data multiple times in order to build a classifier model. We refer to this approach as case-based data mining; it has been widely applied in biomedical fields such as chronic disease prognosis and diagnosis [1], traditional Chinese medicine [2], children's healthcare [3], heart-lung transplantation [4], and diabetes diagnosis [5]. The main problem with this traditional mining method is that the training dataset must be finite, and it takes a relatively long time to construct or even to refresh a classifier model. Recently a new breed of mining algorithms, generally called data stream mining, has been designed for handling high-speed and potentially infinite data that stream in continuously. It was argued in [6] that case-based data mining is not suitable for very high data rates in real-time decision-support applications. Stream-based data mining may address the challenges of processing high-volume, real-time biomedical data or signals. To deliver timely decisions, the mining model must process the data at least as fast as the data streams arrive [7]. The other distinctive requirement is that a full set of data cannot be assumed to be available: a data stream mining algorithm processes the data in a single pass, and a decision can be made instantly with a certain accuracy. These requirements fit biomedical applications well, especially those that involve monitoring and instant analysis.
Our previous research [28, 29] evaluated the differences between traditional and real-time data mining applications, but only with artificially generated data and a finance-related dataset. To the best of the authors' knowledge, this is the first attempt to experiment with data stream mining techniques on bio-medical datasets. The prime objective of this paper is to investigate how well data stream mining, especially real-time classification, performs over bio-medical datasets, and what prediction accuracy it achieves.
ABSTRACT
Two major families of biomedical data exist in bioinformatics: case-based data, which are archived historical records, and stream-based data, which are dynamic signals usually collected from sensors or monitors. Traditional decision tree classification has proven useful in data mining over static case-based data for revealing interesting patterns. However, data mining over biomedical data streams has not been explored by previous researchers; biomedical signal processing techniques have existed for decades, but they are mainly for pattern detection rather than prediction or classification. In this paper, we shed light on the impact of data stream mining techniques on biomedical data streams. We illustrate the two different workflows of case-based and stream-based data mining for bio-medical classification. To compare the two mining techniques, a simulation is programmed for conducting experiments over these two types of biomedical data. From the results we observe that case-based classification has a higher accuracy but is slower in running time because of the multiple scans over a database, whereas stream-based classification is fast but achieves a relatively lower accuracy until the data size becomes sufficiently large. As a novel contribution of this paper, we propose a method to solve the problem of the long booting step in stream mining.
KEY WORDS
Decision Tree, Stream Mining, Biomedical Classification.
1. Introduction
Biomedical data pose certain challenges to bioinformatics because of their inherently high dimensionality, huge volume, and demand for extremely high accuracy (as they often involve life-and-death decisions). Recent advances in biomedical sensing and monitoring technologies further step up the challenges, as data are generated in real time as streams of time series; one example is fetal cardiotocograms, where the respective diagnostic features are automatically and continuously measured from the streaming signals, processed and displayed. Therefore a fundamental question arises: Can traditional data mining
DOI: 10.2316/P.2011.723-153
This paper is structured as follows: Section 2 reviews relevant background information on classification techniques and data mining in bio-medical applications. Section 3 illustrates the two different types of applications, case-based and stream-based. Section 4 discusses the experimental results on the performance of the two types of classification. Discussion and conclusions follow in the final section.
2.2 Decision Tree in Stream Mining
Maron and Moore [11] first highlighted in 1993 that a small amount of available data may be sufficient as a sample at any given node for picking the split attribute when building a decision tree. The data must, however, keep arriving continuously at a relatively fast speed for the tree-building process to complete. Exactly how many streaming samples are needed is governed by a statistical result called the Hoeffding bound, or additive Chernoff bound. The bound decides how many samples are statistically required at each node before it is split, so that a decision tree can be built 'on the fly', similar to the TDT. The key difference is that the arriving data come in streams, which can potentially sum to infinity. The following equations are taken from our earlier paper [29] and form the backbone of the stream mining model that uses the Hoeffding bound. They define a heuristic evaluation function that is used to select the split attributes by which leaves are converted into decision nodes. A decision tree is 'tested-and-trained' recursively in this manner: when sufficient data that bias a decision at a particular node have arrived, the leaves are replaced with the relevant decision nodes to reflect the current conditions (or rules) as fed by the data. G(.) denotes the heuristic evaluation function for building a decision tree, based on the information gain of an attribute, info(Aj). The info(Aj) function measures the amount of information needed to classify a sample at a node, following the theory of information gain. The merit of a discrete attribute is estimated from the sufficient statistics nijk, the number of samples of class k that reach the leaf where attribute j takes the value i. In Equation 4, Pi is the probability of observing the value i of the attribute. In Equation 3, Pik is the probability of observing the value i of the attribute given the class k.
2. Background Knowledge
Background information about traditional decision tree classification can be found in [8, 9, 10]. Recent literature on decision trees in stream mining, especially those using the Hoeffding bound, can be found in [11-17]. Readers may refer to [28] for a concise review of the relevant stream mining technology.
2.1 Decision Tree Classification
In this paper we refer to the traditional classifiers collectively as the "traditional decision tree (TDT) model". They are typically implemented with classical algorithms based on induction and information-gain theory, such as ID3 [8], C4.5 [9], and CART [10], which have been very widely used in the past decades. What they have in common is that they need to scan through all the data in a database multiple times in order to construct a tree-like structure; one example is given in Figure 1.
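(1) $G(A_j) = \mathrm{info}(\mathrm{samples}) - \mathrm{info}(A_j)$
(2) $\mathrm{info}(A_j) = \sum_i P_i \left( \sum_k -P_{ik} \log(P_{ik}) \right)$
(3) $P_{ik} = n_{ijk} / \sum_a n_{ajk}$
(4) $P_i = \sum_a n_{ija} / \sum_a \sum_b n_{ajb}$
(5) $\varepsilon = \sqrt{\frac{R^2 \ln(1/\delta)}{2N}}$
Figure 1: A typical decision tree graph layout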
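As an illustration of Equations 1, 2 and 4, the following Python sketch computes G(Aj) for one discrete attribute from a table of counts n[i][k] (samples at the leaf with attribute value i and class k). It is a minimal sketch rather than the simulator's code: the function names are hypothetical, logarithms are taken to base 2, and the inner entropy uses the class distribution within each attribute value (the standard information-gain form), which is an assumption about how Equation 3 is meant to be read.

```python
import math


def entropy(counts):
    """Shannon entropy (base 2) of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


def information_gain(n):
    """G(A_j) = info(samples) - info(A_j) for one discrete attribute.

    n[i][k] is the number of samples at the leaf whose attribute value
    is i and whose class is k (the sufficient statistics n_ijk with j fixed).
    """
    num_classes = len(n[0])
    class_totals = [sum(row[k] for row in n) for k in range(num_classes)]
    total = sum(class_totals)
    if total == 0:
        return 0.0
    info_samples = entropy(class_totals)
    info_attr = 0.0
    for row in n:
        p_i = sum(row) / total            # Equation 4: share of samples with value i
        info_attr += p_i * entropy(row)   # assumed form: class entropy within value i
    return info_samples - info_attr
```

An attribute that separates two equally frequent classes perfectly yields a gain of 1 bit, while an attribute whose values carry no class information yields a gain of 0.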
When a substantial amount of fresh data arrives, the TDT model has to be rebuilt by reading through the whole database again so that the decision tree can be updated. In the past, and even at present, many TDT-related applications work on relatively static data that do not need frequent refreshing. TDT was not initially designed to accept high-speed data streams or to refresh the model frequently in near real time. Nevertheless, the TDT model has been used extensively in biomedical data mining because the data are usually historical: a decision from a TDT model is derived from a large overall population of medical records, and the temporal properties of those records are usually not a stringent concern.
Assume we have a real-valued random variable r with a bounded range R, of which N independent observations have arrived. Equation 5 shows how the Hoeffding bound ε is computed: with confidence 1 − δ, the true mean of r is at least $\bar{r} - \varepsilon$, where $\bar{r}$ is the observed mean of the samples. For a probability the range R is 1, and for information gain R is log2(#Class), where #Class is the number of classes. The core of the algorithm is the use of the Hoeffding bound for choosing a split attribute as the decision node. Let xa be the attribute with the highest G(.) and xb the attribute with the second-highest G(.). The difference between this pair of top-quality attributes is defined as ∆G = G(xa) − G(xb).
3. Case- and Stream-based Applications
If ∆G > ε is observed after N samples at a leaf, the Hoeffding bound guarantees with probability 1 − δ that xa is the attribute with the highest value of G(.), so the leaf can be changed to a decision node that splits on xa.
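The split rule built on Equation 5 can be sketched in a few lines of Python. This is an illustrative sketch only, not the MOA implementation used in the experiments: the function names and the dictionary of per-attribute gains are hypothetical, and the range R is taken as log2 of the number of classes, as stated above for information gain.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Equation 5: epsilon = sqrt(R^2 * ln(1/delta) / (2N))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def choose_split(gains, delta, n, num_classes):
    """Return the attribute to split on, or None if more samples are needed.

    gains maps each candidate attribute name to its G(.) value, estimated
    from the n samples seen at the leaf so far.
    """
    if len(gains) < 2:
        return None
    ranked = sorted(gains.items(), key=lambda item: item[1], reverse=True)
    (best_attr, best_gain), (_, second_gain) = ranked[0], ranked[1]
    epsilon = hoeffding_bound(math.log2(num_classes), delta, n)
    # Split only when the lead of the best attribute exceeds the bound.
    if best_gain - second_gain > epsilon:
        return best_attr
    return None
```

With δ = 1e-7 and two classes (R = 1), the bound is roughly 0.90 after 10 samples and roughly 0.09 after 1,000 samples, so a gain difference of 0.3 justifies a split only in the second case.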
3.1 Case-based Bio-medical Application
The accuracy of a traditional decision tree (TDT) classification algorithm depends on the abundance of samples in the training dataset: in general, the more samples there are, the better the accuracy that can be achieved. Nowadays, many bio-medical institutes and centers have established their own data warehouses for medical use. Creating a data warehouse is a long-term and complicated process. The case documents of each observation, together with the diagnosis based on experts' analyses, are written into bio-medical records and stored in a database. There are usually thousands of such records. Traditional knowledge discovery uses this constant and bounded amount of data to explore the potential patterns in the data, as presented in Figure 2. In the example shown in Figure 2, bio-medical centers maintain their own knowledge system as a reference for diagnosis. The knowledge is mined from historical cases of diagnosis, which are composed into formatted patterns through KDD classification. As time goes by, the model gradually becomes outdated because new and fresh information keeps being added to the data. In a case-based process, updating the model requires reloading all the data from the database to reconstruct the decision tree model. This update may be time-consuming, depending on the data volume and the number of attributes. For this reason, large health-care centers or hospitals with large data repositories cannot practically update the model at short intervals, and the usefulness of a model fades as time elapses.
2.3 Analysis Methods in Bio-Medical Application
On-line analytical processing (OLAP) is a useful tool for database analysis. A number of commercial products have been built to support this functionality, for example Cognos' Enterprise's OLAP, PowerPlay, Business Objects Inc.'s Business Objects, Informix's MetaCube, Platinum's InfoBeacon, MicroStrategy's DSS Agent, and Oracle's Express. For this reason, many healthcare centers chose OLAP as the method for analyzing data so as to find business and medical insights [19]. Human medical data are at once the most rewarding and the most difficult of all biological data to mine and analyze. The natural history of disease affects statistical hypotheses in an unknown way. Medical data therefore have a special status based upon their applicability to all people, their urgency (including life-or-death), and a moral obligation to use them for beneficial purposes [20]. A study [21] demonstrates how a decision tree can be used in developing "continuous quality improvement" (CQI) strategies [22]. The CHAID algorithm [23] provided cumulative statistics to find inpatient mortality cases by taking the best segments of the healthcare sample. A data mining method combining the CHAID decision tree algorithm, logistic regression and artificial neural networks has been applied to large, complex public-use Medicare insurance claims files to reveal insights such as geographic variation in healthcare delivery practice patterns for lung cancer [24]. Combined with propensity scoring methods, it can be used to improve the predictive models. Another study [25] examines the healthcare coverage of individuals by applying data mining techniques, namely artificial neural networks and the CART decision tree algorithm [26], to a wide variety of predictive factors. Sensitivity analysis and variable importance measures are calculated to analyze the importance of the predictive factors.
It uses a multi-layer perceptron type artificial neural network model and finds that the most predictive factors are income, employment status, education, and marital status. However, the experimental data in both studies are historical (from 1999 and 2004). On the other hand, a data-mining framework has been proposed that uses the concept of clinical pathways to facilitate automatic and systematic construction of an adaptable and extensible detection model [27]. This framework is used to detect fraudulent and abusive behavior by health care service providers. However, as far as the authors know, there is little research on real-time applications that provide a knowledge discovery process for bio-medical use.
3.2 Stream-based Bio-medical Application
The stream-based bio-medical applications described in this section are derived from previous research in data mining and real-time business intelligence. Unlike case-based mining, the proposed architecture concentrates on constructing a stream mining system that can output prediction results to the end user in real time. To meet the requirements of a real-time application, the model must be kept up to date whenever new data arrive. Stream-based data mining establishes a rule-based model in the form of a decision tree (similar to C4.5). The key difference is that the model updates itself while fresh data are streaming in. The data are read in a single pass, in a read-train-and-forget manner, so no database is needed for storing historical data. The stream mining model can be extended by coupling it with a detection algorithm, such as a bio-medical signal processing method, for detecting anomalous patterns in the incoming streams.
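The read-train-and-forget cycle can be sketched as an interleaved test-then-train loop over the stream. The Python sketch below is illustrative only: `MajorityClassLearner` is a hypothetical stand-in for an incremental learner such as a Hoeffding tree, and the `predict` and `learn_one` method names are assumptions made for this example, not the MOA API.

```python
from collections import Counter


class MajorityClassLearner:
    """Hypothetical stand-in for an incremental learner (e.g. a Hoeffding tree)."""

    def __init__(self):
        self.class_counts = Counter()

    def predict(self, x):
        # Predict the most frequent class seen so far; None before any training.
        if not self.class_counts:
            return None
        return self.class_counts.most_common(1)[0][0]

    def learn_one(self, x, y):
        # Update the model with one labelled instance, which is then discarded.
        self.class_counts[y] += 1


def prequential_accuracy(stream, learner):
    """Test on each arriving instance first, then train on it, in one pass."""
    correct = 0
    seen = 0
    for x, y in stream:
        if learner.predict(x) == y:
            correct += 1
        learner.learn_one(x, y)
        seen += 1
    return correct / seen if seen else 0.0
```

Because each instance is consumed once and never stored, `stream` can be any iterator, for example a generator that reads sensor records as they arrive.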
Figure 2: Case-based application flowchart
Figure 3: Stream-based application flowchart
Figure 3 shows an overview of a hospital process as a stream-based example. As in case-based applications, the patterns found by data mining are used as references for diagnosis. A unique feature is that the whole operation can run in real time: feeding in the data streams, processing them, and using them to train or refresh the decision tree, so that detection and classification (or prediction) can be done simultaneously. Bio-medical experts can therefore use such a tool for detection and classification in real time while data are continuously arriving (possibly from sensors and monitors).
classification. Those datasets can be downloaded from the UCI Data Repository (http://archive.ics.uci.edu/ml), which is a popular data archive for data mining benchmarking.
Table 1: UCI Experimental Raw Dataset
Name           Attribute#  Instances#  Type     AddIns#
Abalone        8           4177        Numeric  104400
Breast-Cancer  10          699         Nominal  83880
Thyroid        21          7200        Nominal  100881
CTG            23          2126        Mix      106300
PAS            169         4418        Numeric  92778
Ecoli          8           336         Numeric  84000
Mammographic   6           961         Mix      96099
The simulation system adopts the WEKA J48 (C4.5) decision tree classifier to simulate case-based classification, and the MOA Hoeffding Tree algorithm to simulate stream-based classification. The data stream in the experiment is stored in ARFF file format. The development environment is Java JDK 1.5 and WEKA 3.6; the system runs on a 64-bit Windows 7 workstation with an Intel Quad 2.83 GHz CPU and 8 GB of RAM.
4. Case- and Stream-based Simulations
To further illustrate the differences between case-based and stream-based biomedical systems, we programmed a Java-based simulation system. It uses real-world life-science datasets to simulate the two types of knowledge discovery processes based on
accuracy in the Breast-cancer and CTG datasets, both of which have a moderate number of attributes. It is also noticeable that the computation time required is directly related to the number of instances: the more instances there are, the longer the time spent on data mining. Amongst these datasets, the PAS data has the greatest number of attributes. As a result, running the C4.5 TDT algorithm over the PAS dataset gives a relatively poor accuracy. The results are shown in Table 2 and Table 3 below, which give the case-based and stream-based classification results, respectively.
4.1 Case-based Classification
The first experiment simulates the case-based TDT using the WEKA J48 (C4.5) classifier to build the decision tree model. To simulate the condition of a large data volume, each raw dataset is also enlarged to nearly 100,000 instances by a random variable simulator. In general, the accuracy depends on the quality of the available dataset. If the data contents are complete and consistent, the classification generally yields a satisfactory level of prediction accuracy. In analyzing those datasets, we observe that the C4.5 TDT algorithm has the best
Table 2: Case-based classification results
Dataset        Instances#  Accuracy (%)  Time (sec)  Size (node#)
Abalone        4177        52.44         0.15        711
Abalone        104400      98.33         17.32       2579
Breast Cancer  699         94.42         0.02        31
Breast Cancer  83880       99.67         0.31        240
Thyroid        7200        66.71         0.14        49
Thyroid        100881      66.42         6.82        203
CTG            2126        98.78         0.09        27
CTG            106300      99.96         33.01       63
PAS            4418        66.16         2.45        897
PAS            92778       67.42         6.82        203
Ecoli          336         84.23         0.08        43
Ecoli          84000       98.81         7.46        105
Mammographic   961         82.31         0.02        18
Mammographic   96099       93.71         4.32        515

Table 3: Stream-based classification results
Dataset        Instances#  Accuracy (%)  Time (sec)  Size (node#)
Abalone        104400      53.20         0.41        27
Breast Cancer  83880       99.67         0.31        240
Thyroid        100881      46.55         1872        12
CTG            106300      98.24         1.15        25
PAS            92778       69.36         4.80        73
Ecoli          84000       72.15         0.66        15
Mammographic   96099       84.13         0.52        198
Figure 4: Stream-based classification accuracy (plot titled "Stream-based Application: HTA Accuracy"; x-axis: Number of Instances; y-axis: Accuracy (Correct %); one curve each for PAS, Abalone, Thyroid, CTG, Breast Cancer, Ecoli and Mammographic)
different from the previous experiments, in which the instances are input to the data mining programs all at once (at each testing point along the x-axis); the division of the data for model training, cross-validation and testing was done automatically by the programs according to the default settings.
4.2 Stream-based Classification
In contrast to the case-based application, the stream-based application uses decision tree classification by the Hoeffding tree algorithm (HTA). As Figure 4 shows, the accuracy of HTA is cumulative: it increases as more and more instances are processed. However, the same experiment run by C4.5 consumes more time. Comparing Table 2 with Table 3, we can see that one of the disadvantages of TDT is the long computation time required. In other words, as more instances arrive, the time spent on rebuilding the TDT model becomes longer and longer. In a real-time scenario where the model must be updated every second, case-based classification fails to construct the decision tree model in time. In our previous research [30], we carried out a general experimental comparison between the C4.5 TDT algorithm and HTA. The experiments are summarized as follows: (1) A medium-sized synthetic dataset was used to compare the traditional C4.5 decision tree algorithm with the Hoeffding tree stream mining algorithm. The result shows that C4.5 achieves a higher accuracy than HTA on the medium dataset, but HTA runs faster and produces a smaller tree than C4.5. (2) A small real-world dataset was used to compare C4.5 and HTA stream mining. The result is similar to that for the medium synthetic dataset. (3) C4.5 reveals its limits in handling a huge dataset, so both nominal and numeric synthetic datasets of huge size were used with HTA. The simulation finds that HTA accuracy is sensitive to noisy data, and that the tree size increases linearly as more instances arrive. A numeric dataset results in a more complex tree model, but it is more accurate than nominal data in HTA. From the comparison of the tables, we find that case-based classification using C4.5 has a higher accuracy but a slower running time because of the multiple scans over the database.
Stream-based classification using HTA is very fast but achieves a relatively lower accuracy when the data size is small. The booting (start-up) step of HTA is long, since the algorithm starts from a very low accuracy.
Figure 5: Case-based model usefulness
For the breast-cancer dataset, C4.5 is used to construct the decision tree model by updating it at recurring intervals, each time 400 new instances have arrived. The experiment presents the first three periods during which a model update took place. In each period, the first 400 instances are collected for rule-building, while the following 1,200 instances are used for prediction by the just-updated decision-making model. The simulated result for C4.5 is shown in Figure 5. Clearly, the rules established from aged data fall short of accuracy when making predictions on newly arriving data. This is reflected by similar declining trends over the three periods. For comparison, we applied the same dataset to HTA in another experiment.
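The rebuild-then-predict schedule of this experiment can be sketched as follows. This is a hedged illustration, not the authors' simulator: `train_fn` and `majority_trainer` are hypothetical names, the trivial majority-class trainer merely stands in for C4.5, and the sketch assumes each rebuild uses only the 400 instances of its own period, which the text does not state explicitly.

```python
from collections import Counter


def periodic_rebuild_accuracy(instances, train_fn, train_size=400, predict_size=1200):
    """Return one accuracy value per period of a rebuild-then-predict schedule.

    instances is a list of (features, label) pairs in arrival order.
    train_fn takes a list of such pairs and returns a callable model(x) -> label.
    """
    period = train_size + predict_size
    accuracies = []
    for start in range(0, len(instances) - period + 1, period):
        train = instances[start:start + train_size]
        later = instances[start + train_size:start + period]
        model = train_fn(train)  # batch rebuild at the start of the period
        hits = sum(1 for x, y in later if model(x) == y)
        accuracies.append(hits / len(later))
    return accuracies


def majority_trainer(train):
    """Hypothetical batch trainer: always predicts the most frequent training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label
```

Splitting the prediction part of each period into smaller windows and recording the accuracy per window would expose the within-period decline that the text describes for Figure 5.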
4.3 Classification Model Usefulness Comparison
In this set of experiments, we attempt to illustrate the usefulness of case-based and stream-based classification. In a real-life environment, for example a healthcare center where prediction is a major ingredient of the decision support system, the sequence of operation usually starts by building a decision-making model (a decision tree) and then putting it to use on the incoming data (which is similar to testing for accuracy in our experiment). Decisions are made in real time by the case-based system, and the model is assumed to remain good until some time later, when it needs to be updated (the decision tree model is rebuilt) with the inclusion of the new data. This process was already explained with Figure 2. This experiment sets out to verify the usefulness of the data mining algorithms under such a working sequence. That is
Figure 6: Stream-based model usefulness
The result in Figure 6 shows that the performance curve is rather steady (in contrast to the repeatedly declining, broken lines of C4.5) and that the overall accuracy keeps improving as the decision tree model is updated by the incremental mining mechanism of HTA each time new data are fed in. However, HTA is a cumulative algorithm whose accuracy increases as more and more instances arrive. In other words, the booting step of HTA may be longer than that of C4.5 to some extent, which is a disadvantage of HTA compared with TDT algorithms.
monitoring applications nowadays. TDT algorithms no longer seem suitable for such data, so stream-based classification may be applied in bio-medical applications. In this paper, we illustrated the different workflows of case-based and stream-based data mining with bio-medical examples. Using life-science datasets from real-world applications, the experiments show the advantages and disadvantages of case-based and stream-based data mining. We built a Java simulator for case-based classification using the C4.5 decision tree and for stream-based classification using HTA. To make stream-based applications more practical (with increased accuracy during start-up), we proposed a new method to solve the problem of the long booting step of HTA. In future, we will continue studying other intelligent methods to improve the accuracy of stream-based classification, and probe deeper into their integration into biomedical applications, possibly with some real-life case studies.
4.4 Method to Improve Stream-based Classification
From the experimental results in Sections 4.2 and 4.3, we find that although stream-based classification can handle very large data volumes, it needs a relatively long booting time before the model reaches a satisfactory level of accuracy. VFDT [12], the first system to introduce HTA, was reported with a comparison amongst C4.5, VFDT, and VFDT-boot. VFDT-boot is VFDT bootstrapped with an over-pruned version of the tree produced by C4.5. The comparison shows that VFDT-boot has a higher classification accuracy than VFDT. Following that study, the accuracy of a stream-based application using HTA may be improved with an HTA-boot method. The experimental dataset is again breast cancer. The earlier result in Figure 4 shows that the accuracy of HTA increases from 48% to 89%, and that it stays below 70% for the first 19,000 instances. To improve the accuracy, we use the first 19,000 instances to build a pruned C4.5 decision tree model. HTA is then run on top of that pre-trained model. As shown in Figure 7, stream-based mining using HTA-C45Boot obtains a better accuracy than HTA alone.
References
[1] Mu-Jung Huang, Mu-Yen Chen, & Show-Chin Lee, Integrating data mining with case-based reasoning for chronic diseases prognosis and diagnosis, Expert Systems with Applications, Vol 32, Issue 3, 2007, 856-867.
[2] Suryani Lukman, Yulan He, & Siu-Cheung Hui, Computational methods for Traditional Chinese Medicine: A survey, Computer Methods and Programs in Biomedicine, Vol 88, Issue 3, 2007, 283-294.
[3] Chun-Lang Chang, A study of applying data mining to early intervention for developmentally-delayed children, Expert Systems with Applications, Vol 33, Issue 2, 2007, 407-412.
[4] Asil Oztekin, Dursun Delen, & Zhenyu (James) Kong, Predicting the graft survival for heart-lung transplantation patients: An integrated data mining methodology, International Journal of Medical Informatics, In Press, Corrected Proof, Available online 3 June 2009.
Figure 7: HTA-C45Boot accuracy
5. Conclusion
Decision tree classification is one of the most important methods and has been widely used in many knowledge discovery applications, including those in the biomedical field. It builds a tree-like graph that presents the relationships amongst attributes and classes in the data. The tree structure is easily understood by both humans and computers. For this reason, decision tree algorithms have been widely applied in various bio-medical KDD tasks. Case-based classification in bio-medical applications is implemented by traditional decision tree classification, which requires multiple scans of the data to construct a decision tree. However, with the advances of biomedical technologies, the respective data can be recorded or generated at high speed; the data are streaming both in large volume and potentially unlimited, in most
[5] Chad-Ton Su, Chien-Hsin Yang, Kuang-Hung Hsu, & Wen-Ko Chiu, Data mining for the diagnosis of type II diabetes from three-dimensional body surface anthropometrical scanning data, Computers & Mathematics with Applications, Vol 51, Issues 6-7, 2006, 1075-1092.
[6] Gaber, M. M., Zaslavsky, A., & Krishnaswamy, S. Mining data streams: a review. SIGMOD Rec. 34, 2, Jun. 2005, 18-26.
[7] Gaber, M.M., Data Stream Mining Using Granularity-Based Approach, Studies in Computational Intelligence, Vol 206, 2009, 47-66.
[8] Quinlan, J.R. Induction of decision trees. Machine Learning, 1, 1986, 81-106.
[9] Quinlan, J.R. C4.5: Programs for machine learning. Morgan Kaufmann series in machine learning. Kluwer Academic Publishers, 1993.
[10] Breiman, L., Friedman, J.H., Olshen, R.A., and Stone, C.J., Classification and regression trees, California, USA, Wadsworth, 1984.
[11] Maron, O., and Moore, A.W. Hoeffding races: Accelerating Model Selection Search for Classification and Function Approximation. NIPS, 1993, 59-66.
[12] Domingos, P. and Hulten, G. Mining high-speed data streams. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2000, 71-80.
[13] Hulten, G., Spencer, L., and Domingos, P. Mining time-changing data streams. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD '01. ACM, 2001, 97-106.
[14] Gama, J., Medas, P., and Rodrigues, P. Learning decision trees from dynamic data streams. In Proceedings of the 2005 ACM Symposium on Applied Computing, ACM, New York, 2005, 573-577.
[15] Tao Wang, Zhoujun Li, Xiaohua Hu, Yuejin Yan, and Huowang Chen. A New Decision Tree Classification Method for Mining High-Speed Data Streams Based on Threaded Binary Search Trees. Emerging Technologies in Knowledge Discovery and Data Mining. Springer. 2009, 256-267.
[16] Bernhard Pfahringer, Geoffrey Holmes, and Richard Kirkby. New Options for Hoeffding Trees, Advances in Artificial Intelligence, Springer, 2007, 90-99.
[17] Nishimura, S., Terabe, M., Hashimoto, K., and Mihara, K. Learning Higher Accuracy Decision Trees from Concept Drifting Data Streams. In Proceedings of the 21st International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems: vol. 5027. Springer-Verlag, 2008, 179-188.
[18] Gama, J., Rocha, R., and Medas, P. Accurate decision trees for mining high-speed data streams. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD '03. ACM, New York, NY, 2003, 523-528.
[19] Michael Silver, Taiki Sakata, Hua-Ching Su, Charles Herman, Steven B. Dolin, Michael J. O'Shea, Case Study: How to Apply Data Mining Techniques in a Healthcare Data Warehouse, Journal of Healthcare Information Management, vol. 15, no. 2, 2001.
[20] Krzysztof J. Cios, G. William Moore, Uniqueness of medical data mining, Artificial Intelligence in Medicine, Vol 26, Issues 1-2, Medical Data Mining and Knowledge Discovery, 2002, 1-24.
[21] Young M. Chae, Hye S. Kim, Kwan C. Tark, Hyun J. Park, & Seung H. Ho, Analysis of healthcare quality indicator using data mining and decision support system, Expert Systems with Applications, Vol 24, Issue 2, 2003, 167-172.
[22] Juran, J.M., Editor, Juran's quality control handbook (4th ed.), McGraw-Hill, New York, 1988.
[23] Y. Chae, S. Ho and K. Cho et al., Data mining approach to policy analysis in a health insurance domain, International Journal of Medical Informatics 62, 2006, 103-111.
[24] Gloria Phillips-Wren, Phoebe Sharkey, Sydney Morss Dy, Mining lung cancer patient data to assess healthcare resource utilization, Expert Systems with Applications, Vol 35, Issue 4, 2008, 1611-1619.
[25] Dursun Delen, Christie Fuller, Charles McCann, Deepa Ray, Analysis of healthcare coverage: A data mining approach, Expert Systems with Applications, Vol 36, Issue 2, Part 1, 2009, 995-1003.
[26] L. Breiman, J.H. Friedman, R.A. Olshen and C.J. Stone, Classification and regression trees, Wadsworth & Brooks/Cole Advanced Books & Software, Monterey, CA, 1984.
[27] Wan-Shiou Yang, San-Yih Hwang, A process-mining framework for the detection of healthcare fraud and abuse, Expert Systems with Applications, Vol 31, Issue 1, 2006, 56-68.
[28] Hang Y. and Fong S., Investigating the Impact of Bursty Traffic on Hoeffding Tree Algorithm in Stream Mining over Internet, In Proceedings of the 2nd International Conference on Evolving Internet (INTERNET), Spain, 2010, 147-152.
[29] Hang Y. and Fong S., The Impacts of Data Stream Mining on Real-Time Business Intelligence, 2nd International Conference on IT & Business Intelligence, November 12-14, India, 2010, accepted to be published.
[30] Hang Y. and Simon Fong, An Experimental Comparison of Decision Trees in Traditional Data Mining and Data Stream Mining. 2nd International Conference on Data Mining and Intelligent Information Technology Applications, Korea, 2010, to appear.