Data Mining Approaches for Predicting Demand for ...

Data Mining Approaches for Predicting Demand for Healthcare Services in Abu Dhabi

Noura Al Nuaimi
College of Information and Technology
United Arab Emirates University, Al Ain 17551, UAE
[email protected]

Abstract—The application of data mining techniques provides a powerful approach to manipulate and extract useful information from existing data. It allows hidden patterns in the data to be learned and used for future predictions. Accurate demand forecasting for any service is a challenging research problem, and data mining offers several techniques that can support demand prediction. Because of rapid population growth, estimating the demand for healthcare services has become essential for the strategic planning of Abu Dhabi Emirate. Future plans, whether short- or long-term, should focus on the districts that have needs. The objective of this paper is to develop a data mining model that supports healthcare services demand planning. We use different data mining techniques to build four models that assist decision makers in predicting the demand for healthcare services in Abu Dhabi Emirate.

Keywords—Data mining; healthcare services demand; service’s supply; K-Nearest-Neighbor; sequential minimal optimization (SMO).

I. INTRODUCTION

Abu Dhabi is the capital and the second-most populous city in the United Arab Emirates (UAE). In mid-2012, the resident population of Abu Dhabi Emirate was 2.33 million, comprising an estimated 1.42 million in the Abu Dhabi region, 0.63 million in the Al Ain region, and 0.29 million in the Western (“Al Gharbia”) region [1]. As of 2013, population growth in the UAE was among the highest in the world, mostly due to immigration [2]. The significant increase in the UAE’s population is expected to remain the fundamental driver of increased demand for healthcare services. Among Gulf Cooperation Council (GCC) countries, the World Health Organization (WHO) ranked the UAE first in life expectancy in 2009 (78 years), with the lowest infant mortality rate (seven per 1,000 births) and the third-lowest adult mortality rate (79 per 1,000) [3].

The Abu Dhabi government has recently announced its Vision 2030, a comprehensive plan for the development of the city that will guide planning decisions for the next quarter of a century. A well-founded study of the Emirate’s demand for healthcare services will support this vision. Effective access to healthcare resources is an important issue that affects both the strategic planning of the government and the citizens’ welfare. The rapid population growth in the Emirate of Abu Dhabi requires a more accurate estimation of healthcare services demand to ensure the availability of suitable healthcare services. Future development plans should concentrate on areas with new residential and commercial development and planned population growth.

Current plans show both potential oversupply and undersupply of healthcare services. For example, Khalifa City A currently has four hospital projects under construction, despite a projected 2030 population of only 80,000. Similarly, Al Ain city has 15 provisional hospital projects with a potential demand for only three hospitals. On the other hand, there are severe capacity shortfalls in rural areas.

Data mining and knowledge discovery in databases are interdisciplinary subjects that involve methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. Massive amounts of data are collected every day, and analyzing such data is an important challenge. Data mining learns interesting patterns from the data and captures them in models, which are then used to predict and classify new data. This research work uses data mining techniques to predict the demand for healthcare services in Abu Dhabi Emirate. Thus, when a new healthcare resource is planned, the Health Authority of Abu Dhabi (HAAD) and the Abu Dhabi government should have an accurate prediction of the specific needs of the region in order to cover them.

In this paper, our objective is to obtain an accurate picture of the balance between healthcare demand and supply in Abu Dhabi Emirate. It is also our intention to improve the process of assigning healthcare resources to urban and rural communities based on the potential demand. Finally, we hope to better understand the most influential factors that lead to high demand.
The paper is organized as follows: Section II presents related work on predicting demand for healthcare services; Section III describes the proposed method; Section IV presents the experimental work and discusses the results; and Section V concludes the paper.

II. RELATED WORK

This section reviews work related to predicting healthcare services demand and highlights the similarities and differences between that work and the proposed work.

Lavrač et al. [4] presented an innovative use of data mining and visualization techniques for decision support in planning and regional-level management of Slovenian public healthcare. The authors used data mining and statistical techniques to analyze and predict public health resources in the Celje region of Slovenia. Their objective was to identify typical areas in terms of availability and accessibility of public health services for the population. The results were applicable to healthcare planning and decision support for local and regional healthcare authorities in Celje. The authors concluded that visualization is the easiest way to facilitate knowledge management and decision-making processes. The case of Celje is quite similar to our case study: the Abu Dhabi Emirate is divided into three regions (Abu Dhabi, the capital; the Al Ain region; and the Western region), and each region consists of several districts.

The Health and Social Care Modelling Group in the United Kingdom [5] developed a web-based forecasting system that generates graphs and reports of future spending and demand across six categories of care and under different provisional types and future scenarios. Different forecasting methodologies were used, such as exponential smoothing, Holt-Winters exponential smoothing, linear regression analysis, single-layer artificial neural networks, and grey systems prediction. “Continuing healthcare” is the name given to a package of care arranged and funded solely by the National Health Service (NHS) for individuals who are not in the hospital but who have complex ongoing healthcare needs [6]. Consequently, their forecasts cover only NHS demand from this specific group, which seeks home-based services.

Li Fang et al. [7] analyzed the major complications revealed in a group of professional health technicians in the Heilongjiang province in China, designed a grey dynamic prediction model according to grey system theory, and forecasted the demand for Heilongjiang professional health technicians from 2011 to 2020. However, this work was restricted to predicting the need for healthcare professionals. Other healthcare resources, such as hospitals and clinics, should also be considered, and areas that could be undersupplied or potentially oversupplied should be identified.

Skordis-Worrall et al. [8] estimated the demand for health services in four poor districts in Cape Town, South Africa. The paper presented two models of healthcare demand: one estimating the probability of using any service and the other modeling the number of visits among users. The paper also presented a careful method of data collection and preprocessing based on a multi-stage cluster design. However, the work is concerned only with estimating the current needs of four poor districts.

Eren Demir et al. [9] described a case of shifting outpatient services from hospitals, which raised a number of concerns for Hounslow Primary Care Trust (PCT): for example, which of the outpatient services should be shifted, and what are the current and expected future demands for these services? The project has two phases. The first phase explores the set of specialties that were frequently visited in a sequence (using sequential association rules). The second phase computes the current and expected future demands for the selected specialties. However, this work predicts only specialty needs and relies on an Excel-based spreadsheet tool.

III. METHOD

Data mining is about the “extraction of interesting (nontrivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amount of data” [10]. It is also known by several alternative names, including knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, and business intelligence. The knowledge discovery process of the proposed method is shown in Figure 1 as an iterative sequence of the following steps [10]:

Fig. 1. The knowledge discovery process.

1) Data cleaning: this is used to remove noise and inconsistent data.
2) Data integration: where data might be collected from multiple data sources. A popular trend in the information industry is to perform data cleaning and data integration as a pre-processing step.
3) Data selection: where data relevant to the analysis task is retrieved from the database. In our case, we will select all of the attributes because they are few.
4) Data transformation: where data is transformed into an appropriate format, such as the Attribute-Relation File Format (ARFF) or Comma Separated Values (CSV), that is suitable for use with WEKA [11].
5) Data mining: mining from input data to patterns by using intelligent methods.
6) Pattern evaluation: this is used to identify the truly interesting patterns representing knowledge based on interestingness measures.
7) Knowledge presentation: where visualization and knowledge representation techniques are used to present mined knowledge to users.
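As an illustration of the data transformation step, the following is a minimal sketch of how tabular records could be rendered as ARFF text for WEKA. The function name `to_arff`, the relation name, and the two sample rows are assumptions made for this example; the attribute names are borrowed from Table 1, and the row values are invented rather than taken from the HAAD data.

```python
def to_arff(relation, attributes, rows):
    """Render rows as ARFF text.

    attributes: list of (name, spec); spec is "numeric" or a list of nominal labels.
    rows: list of lists in attribute order; None marks a missing value.
    """
    def quote(token):
        text = str(token)
        # Names and labels containing spaces or special characters are quoted.
        if any(ch in text for ch in " ,{}%'\""):
            return "'" + text.replace("'", "\\'") + "'"
        return text

    lines = ["@relation " + quote(relation), ""]
    for name, spec in attributes:
        if spec == "numeric":
            lines.append(f"@attribute {quote(name)} numeric")
        else:
            labels = ",".join(quote(label) for label in spec)
            lines.append(f"@attribute {quote(name)} {{{labels}}}")
    lines += ["", "@data"]
    for row in rows:
        # ARFF uses "?" for missing values.
        lines.append(",".join("?" if v is None else quote(v) for v in row))
    return "\n".join(lines) + "\n"


# Hypothetical subset of the Table 1 attributes with invented values.
attributes = [
    ("Region", ["Abu Dhabi", "Al Ain", "Western"]),
    ("Type", ["Rural", "Urban"]),
    ("Current Total Population", "numeric"),
    ("Future Supply", ["Undersupply", "Potential oversupply"]),
]
rows = [
    ["Abu Dhabi", "Urban", 1000, "Undersupply"],
    ["Western", "Rural", None, "Potential oversupply"],
]
print(to_arff("healthcare_demand", attributes, rows))
```

Writing the header from an explicit attribute list keeps the nominal label sets in one place, so the training and test files can share exactly the same declarations.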

A. Data Source and Dataset

Health Authority – Abu Dhabi (HAAD), which is the regulatory body of the healthcare sector in the Emirate of Abu Dhabi, releases the health statistics for Abu Dhabi Emirate. The health statistics communication, published annually since 2008, includes population trends, public health highlights, and investor developments. Three key areas of interest are the performance of the Emirate in managing access to, cost of, and quality of healthcare. In 2012, HAAD published its Capacity Master Plan, which defined the way that health services should be organized in the future: “In 2012 the Abu Dhabi’s Health insurance system covered 2.58 million people. There were 12.8 million clinical episodes (~12% more than in 2011) in which 67 million clinical activities were performed. Health services were provided by 5,528 physicians, 969 dentists, 12,375 nurses, 4,319 allied health professionals and 1,993 pharmacists in 1,508 licensed healthcare facilities. The number of healthcare facilities has been steadily growing since 2007, the highest growth being among Clinics (10.9%) and Centers (4.9%)” [12]. The Health Statistics 2012 states that aggressive growth in demand is expected for services relating to lifestyle. In terms of supply, there has been 13% growth in the number of physicians and dentists and 11% growth in facilities [12].

The dataset is retrieved from HAAD Statistics 2012, and it consists of 57 instances and 18 attributes, as described in Table 1.

TABLE 1: DATASET ATTRIBUTES

Attribute                          Description
Region                             {'Abu Dhabi', 'Al Ain', Western}
Cap Now                            {Severe, None, Moderate}
Type                               {Rural, Urban}
Current Total Population           Numeric
Current Total Facilities           Numeric
Current Total Hospital             Numeric
Current Total Clinics & Centers    Numeric
Current Nearby Hospital            Numeric
Doctors /1000                      Numeric
Population Growth                  Numeric
Hospital Need                      {0,1}
Hospital Underway                  Numeric
Hospital Under Construction        Numeric
Clinic Need                        {S, M, L, none} (a)
Clinics Underway                   {0,1}
Ambulance Station                  Numeric
2030 Population                    Numeric
Future Supply                      {Undersupply, 'Potential oversupply'}

a. S: small; M: medium; L: large; none: no need. The number refers to a quantity.

B. Data Mining Tool

There are many current tools for data mining and knowledge discovery, such as the Waikato Environment for Knowledge Analysis (WEKA) [11], Orange [13], RapidMiner [14], and IBM SPSS Modeler [15]. These tools provide a set of methods and algorithms that support data analysis. The study in [16] concluded that no single tool is better than the others; however, it ranked WEKA as the best tool in terms of the ability to run the selected classifiers: the Naïve Bayes (NB) algorithm, the K Nearest Neighbor (KNN) algorithm, the Support Vector Machine (SVM) algorithm, and the C4.5 algorithm. WEKA provides a uniform interface for many different data mining algorithms, along with methods for pre- and post-processing and for evaluating the results of mining the dataset.

C. Preprocessing Step

Following [8], the main tasks in data preprocessing are:

• Data cleaning, where missing values are handled, noisy data is smoothed, outliers are identified or removed, and inconsistencies are resolved. In our case, the dataset has several missing entries that must be handled.
• Data integration, which is needed if there is more than one source of data. In our case, the dataset is constructed from one source, which is the HAAD Statistics 2012.
• Data reduction, which is used when the data is huge in terms of attributes and instances; dimensionality reduction, numerosity reduction, or data compression can then be applied. In our case, the attributes are few, and there is no need for data reduction.
• Data transformation and data discretization, which are achieved through normalization and concept hierarchy generation.

The success of the data mining process depends on the quality of the prepared dataset. Many factors measure data quality, including accuracy, completeness, consistency, timeliness, believability, and interpretability [10]:

• Accuracy (correct or wrong, accurate or not): because we use official data sourced from HAAD, we assume that the data is accurate.
• Completeness (not recorded, unavailable, etc.): the data has empty entries, and it is not known whether they were left blank for formatting purposes or are genuinely missing values.
• Consistency: the data is consistent.
• Timeliness: the data reflects the healthcare resource status in Abu Dhabi in 2012. It also includes projected values for the 2030 population.
• Believability (how trustable the data is): because the data is extracted from a government agency, we assume it is trustable.
• Interpretability (how easily the data can be understood): the data is represented in a flat file (see Table 1), and only the missing values need to be well understood in order to be replaced correctly.
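The completeness issue noted above is handled later in the paper by replacing missing facility entries, and entries recorded as “none”, with zero. A minimal sketch of that kind of check and cleaning is given below. It assumes each district is held as a dictionary of raw strings; the helper names (`is_missing`, `count_missing`, `clean_count`) and the sample rows are hypothetical and are not taken from the HAAD data.

```python
def is_missing(value):
    """True for None, empty strings, and the ARFF missing marker '?'."""
    return value is None or (isinstance(value, str) and value.strip() in ("", "?"))


def count_missing(rows):
    """Count missing entries per attribute across a list of dict rows."""
    counts = {}
    for row in rows:
        for name, value in row.items():
            counts[name] = counts.get(name, 0) + (1 if is_missing(value) else 0)
    return counts


def clean_count(value):
    """Convert a raw facility count to int; missing or 'none' becomes 0."""
    if is_missing(value):
        return 0
    text = str(value).strip()
    return 0 if text.lower() == "none" else int(text)


# Invented rows for illustration only; not values from the HAAD dataset.
rows = [
    {"Current Total Hospital": "3", "Hospital Underway": ""},
    {"Current Total Hospital": None, "Hospital Underway": "none"},
]
print(count_missing(rows))
cleaned = [{name: clean_count(value) for name, value in row.items()} for row in rows]
print(cleaned)
```

Counting the gaps before filling them makes it visible how much of each attribute is imputed, which matters here because a zero is read as "no facility available".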

D. Classification Algorithms

Classification is a pattern recognition task that has applications in a broad range of fields. It requires the construction of a model that approximates the relationship between input features and output categories [17].

Three different types of classification techniques are used: Support Vector Machines (SVM), K-Nearest-Neighbor (KNN), and Naïve Bayes (NB). These techniques were chosen because of their popularity in the recently published literature as well as their ranking among the top data mining algorithms [18].

SVM is one of the most popular algorithms for large-margin classification [18] [19] [20] [21]. The idea of the SVM algorithm is to map the given training set into a possibly high-dimensional feature space and to locate in that space a hyperplane that maximizes the margin separating the positive from the negative examples. Having found such a hyperplane, the SVM can then predict the classification of an unlabeled example [22].

KNN [18] [23] is one of the simplest classifiers: it stores the entire training set and assigns a test object the majority class among its k closest training examples, rather than requiring an exact attribute match as a rote classifier would. The KNN algorithm can estimate complex target concepts locally and differently for each new instance to be classified; it provides good generalization accuracy on many domains, trains very quickly because training amounts to storing the data, and is easy to understand. On the other hand, the KNN algorithm has large storage requirements because it has to store all of the data, and it is slow with large datasets because all of the training instances have to be visited. The accuracy of the KNN algorithm degrades as noise and irrelevant attributes in the training data increase.

Naïve Bayes classifiers [18] are a family of simple probabilistic classifiers based on applying Bayes’ theorem with strong independence assumptions between the features. Naïve Bayes is easy to implement and requires only a small amount of training data to estimate its parameters, and it gives good results in many cases. However, it assumes class-conditional independence, which can cost accuracy when the attributes are in fact dependent.

E. Post-processing Step

Processing the patterns after the mining step includes pattern evaluation, pattern selection, and pattern interpretation. For pattern evaluation, we report the classification performance in terms of precision, recall, and accuracy. The F-measure is also calculated to summarize the tradeoff between precision and recall [24] [25]. For attribute selection, we use weka.attributeSelection.CfsSubsetEval [26], which evaluates the worth of a subset of attributes and favors subsets that are highly correlated with the class.

IV. EXPERIMENTAL WORK AND RESULTS

At the data cleaning step, we fill in missing values, as with the facilities data, where there are many missing values.

F. Data Pre-processing

1) Replacing Missing Values
• Each missing value is replaced by zero, which means that no facility is available for a specific district.
• Future needs for each district: the need for facilities (e.g., hospitals and clinics). For example, the Hospital Need attribute takes the values 0, 1, 2, or 3, and the rest of the instances have no data entries. Thus, the type of this attribute is changed to Boolean: 0 for no need or 1 for an existing need.

2) Handling Attributes’ Types
• Some attributes mix numeric and nominal values; the entries are one of none, 1, 2, 3, 6, 7, and so on. We replace every “none” entry with zero to resolve this issue.
• We add a new attribute for hospital needs: {undersupply, potential oversupply}.
• In the source data, the healthcare demand gap was represented by three graphics. We replace these graphics with three classes: {severe, moderate, none}.
• The Clinic Need data is recorded as a count together with a clinic size {S: small, M: medium, L: large}. We keep only the clinic size.

3) Removing Non-informative Data
Because the dataset attributes are limited, there is no need to reduce the dimensionality. However, we noticed that the district name attribute is a unique identifier; thus, we removed it.

G. Experiment 1

The goal of the first experiment is to build a model to predict which districts are undersupplied or potentially oversupplied. At present, there are 32 hospitals to be built in the capital region, 16 hospitals planned for the Al Ain region, and five hospitals planned for the Western region. All of these 53 hospitals are planned to be built in different districts of Abu Dhabi Emirate by 2030. Of the 57 districts, 23 have their hospital needs classified as either undersupplied or potentially oversupplied, while the needs of the remaining 34 districts are unknown. Thus, the first objective here is to build a model that can predict which districts have a need for hospitals by 2030.

This model is then used to predict the needs of the districts with no stated needs. The main dataset is split into two sets. TrainingSet_Classification.arff contains 23 instances (40%), and TestSet.arff contains 34 instances (60%). The first set has predefined classes, which are either undersupplied or potentially oversupplied. In the second set, no instance has a predefined class. Initially, the experiment uses the “Use training set” test option, which evaluates how well the classifier predicts the class of the instances it was trained on. The models are built using three algorithms: Naïve Bayes, KNN, and SVM. To measure performance, the confusion matrix is obtained to estimate four measures: accuracy, precision, recall, and F-measure. As a result, 1-NN had the highest accuracy of 100%, followed by SVM with 91.30%, Naïve Bayes with 82.61%, and 2-NN with 78.26%, as shown in Table 2. The following are the selected attributes that are highly correlated with the class: current total clinics and centers, population growth, and hospital underway.
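To make the “Use training set” evaluation concrete, the following is a small sketch of a k-nearest-neighbor classifier scored on the same instances it stores. It is not the WEKA implementation used in the paper; the function names, the two-feature toy points, and their labels are invented for illustration. The sketch shows that with k = 1 and distinct training points every instance is its own nearest neighbor, so accuracy on the training set is perfect by construction, which suggests the 100% figure for 1-NN should be read as a property of this test option rather than as evidence of generalization.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority class among the k training points closest to query (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def training_set_accuracy(train, k):
    """Fraction of training instances classified correctly by the model built on them."""
    hits = sum(knn_predict(train, features, k) == label for features, label in train)
    return hits / len(train)


# Invented toy points (two numeric features each); not HAAD districts.
train = [
    ((0.0, 0.0), "Undersupply"),
    ((0.2, 0.1), "Undersupply"),
    ((1.0, 1.0), "Potential oversupply"),
    ((0.9, 1.2), "Potential oversupply"),
    ((0.3, 0.2), "Potential oversupply"),  # sits among the other class
]

print(training_set_accuracy(train, k=1))  # each point matches itself
print(training_set_accuracy(train, k=3))  # the isolated point is outvoted
```

A held-out split or cross-validation would give a less optimistic estimate, at the cost of training on fewer than the 23 labeled districts.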

TABLE 2: HOSPITAL NEED MODELS

Algorithm      Accuracy    Precision   Recall   F-Measure
Naïve Bayes    82.6087%    0.870       0.826    0.819
SVM            91.3043%    0.913       0.913    0.913
1-NN           100%        1           1        1
2-NN           78.2609%    0.847       0.783    0.769

The generated models were used to predict the unknown hospital needs of Abu Dhabi districts. As mentioned previously, there are 34 districts whose needs are unknown, and our objective here is to predict their classes {undersupply, potential oversupply}. The experiment uses the 1-NN model, which has the highest results on all four measures (detailed results are available in supplementary materials, Table A).

H. Experiment 2

The goal of the second experiment is to investigate whether the current hospital needs are correctly estimated. The HAAD statistics show that 12 districts in the Abu Dhabi capital region have a need for hospitals, four districts in the Al Ain region have a need for hospitals, and one district in the Western region needs a hospital. Following the preprocessing step described previously, this attribute is encoded as {0, 1}, where one means correctly planned and zero means that more investigation is required. The experiment uses the “Percentage split” test option at 70% with the three classifiers: Naïve Bayes, KNN, and SVM. These classifiers are configured in the same way as in experiment one. As a result, Naïve Bayes and SVM give equal accuracy of 82.4%, followed by 1-NN and 2-NN with 76.5%, as shown in Table 3. The following are the selected attributes that are highly correlated with the class: population growth and clinics underway.

TABLE 3: INVESTIGATING HOSPITAL NEED

Algorithm      Accuracy    Precision   Recall   F-Measure
Naïve Bayes    82.3529%    0.812       0.824    0.814
1-NN           76.4706%    0.765       0.765    0.765
2-NN           76.4706%    0.585       0.765    0.663
SMO            82.3529%    0.812       0.824    0.814

I. Experiment 3

The goal of the third experiment is to investigate whether the current clinic needs are correctly estimated. The current status shows needs for clinics of different sizes: small, medium, or large. For instance, the Alreem district needs two large clinics, and the Al Ain city district requires two medium clinics. As previously mentioned, because of the preprocessing step, this attribute uses only the clinic size, without the number of clinics. The experiment is carried out using the same configuration as experiment two, and the results are reported in Table 4. The following are the selected attributes that are highly correlated with the class: region, population growth, hospital under construction, ambulance station, and 2030 population.

TABLE 4: INVESTIGATING CLINICS NEED

Algorithm      Accuracy    Precision   Recall   F-Measure
Naïve Bayes    64.7059%    0.597       0.647    0.620
SMO            41.1765%    0.422       0.412    0.400
1-NN           64.7059%    0.765       0.647    0.644
2-NN           64.7059%    0.782       0.647    0.664

J. Experiment 4

The goal of experiment four is to investigate whether clinics under development are correctly estimated. The current status shows that there are four clinics planned for the Abu Dhabi region, five clinics underway for the Al Ain region, and two clinics for the Western region. The experiment is carried out using the same configuration as experiment two, and the results are reported in Table 5. The following are the selected attributes that are highly correlated with the class: capacity now, hospital need, and clinics need.

TABLE 5: INVESTIGATING CLINICS UNDERWAY

Algorithm      Accuracy    Precision   Recall   F-Measure
Naïve Bayes    76.4706%    0.874       0.765    0.816
SMO            82.3529%    0.878       0.824    0.850
1-NN           82.3529%    0.878       0.824    0.850
2-NN           94.1176%    0.886       0.941    0.913
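The accuracy, precision, recall, and F-measure columns in Tables 2 to 5 can all be derived from a confusion matrix. The sketch below shows one way to do this, assuming the tables report support-weighted averages over the classes, as the match between the Accuracy and Recall columns suggests. The function name and the confusion counts are invented for illustration and do not reproduce any table in the paper.

```python
def weighted_metrics(confusion):
    """Compute accuracy and support-weighted precision, recall, and F-measure.

    confusion[actual][predicted] holds the number of instances.
    """
    classes = list(confusion)
    total = sum(sum(row.values()) for row in confusion.values())
    correct = sum(confusion[c].get(c, 0) for c in classes)
    precision = recall = f_measure = 0.0
    for c in classes:
        tp = confusion[c].get(c, 0)
        support = sum(confusion[c].values())                      # actual members of c
        predicted = sum(confusion[a].get(c, 0) for a in classes)  # predicted as c
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total
        precision += weight * p
        recall += weight * r
        f_measure += weight * f
    return {
        "accuracy": correct / total,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Invented counts for a two-class problem; not results from the paper.
confusion = {
    "Undersupply": {"Undersupply": 8, "Potential oversupply": 2},
    "Potential oversupply": {"Undersupply": 1, "Potential oversupply": 9},
}
metrics = weighted_metrics(confusion)
print({name: round(value, 3) for name, value in metrics.items()})
```

Under this weighting the averaged recall always equals the accuracy, whereas the averaged F-measure is generally not the harmonic mean of the averaged precision and recall.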

V. CONCLUSION AND FUTURE WORK

In this paper, we presented different data mining techniques in order to build four models for predicting the demand for healthcare services in Abu Dhabi. Within the proposed work, we introduced four models to assist decision makers in the Health Authority of Abu Dhabi (HAAD) and the Abu Dhabi government to plan which district may need healthcare services, either in the form of a hospital or a clinic.

• Model 1: predicts the district’s current needs for hospitals
• Model 2: predicts the district’s future needs for hospitals {potentially oversupplied, undersupplied}
• Model 3: predicts the district’s current needs for clinics
• Model 4: predicts the district’s future needs for clinics

The experimental results presented within the scope of this paper show that there is real demand for healthcare services in some districts, and it requires more investigation. In future work, having access to more descriptive attributes will improve the accuracy of the work and the recommendations.

REFERENCES

[1] H. M. Kumar, “Gulfnews.com,” Al Nisr Publishing LLC, 13 Oct 2013. [Online]. Available: http://gulfnews.com/news/gulf/uae/general/abudhabi-s-population-at-2-33m-with-475-000-emiratis-1.1240863. [Accessed 14 Jun 2014].
[2] UN - OECD, “World Migration in Figures,” OECD-UNDESA, 3 Oct 2013. [Online]. Available: http://www.oecd.org/els/mig/WorldMigration-in-Figures.pdf. [Accessed 8 Jun 2014].
[3] Deloitte, “2011 Survey of the UAE healthcare sector Opportunities and challenges for private providers,” Dec 2011. [Online]. Available: https://www.deloitte.com/assets/DcomLebanon/Local%20Assets/Documents/Consulting/Consulting%20Healthcare%20publication%20FV2.pdf. [Accessed 30 May 2014].
[4] Nada Lavrač et al., “Data mining and visualization for decision support and modeling of public health-care resources,” Journal of Biomedical Informatics, vol. 40, no. 4, pp. 438-447, 2007.
[5] Health & Social Care Modelling Group HSCMG, “Forecasting Patient Demand for NHS Continuing Healthcare,” 2013. [Online]. Available: http://www.healthcareanalytics.co.uk/wp-content/uploads/LPPDemand-Planning-Tool-Branded-v2.pdf. [Accessed 30 May 2014].
[6] NHS. [Online]. Available: http://www.nhs.uk/.
[7] Li Fang et al., “Demand prediction of Heilongjiang professional health technicians: Based on gray dynamic model,” in 2012 International Conference on Management Science & Engineering (19th), Dallas, TX, 20-22 Sept. 2012.
[8] Jolene Skordis-Worrall, Kara Hanson and Anne Mills, “Estimating the demand for health services in four poor districts of Cape Town, South Africa,” International Health, vol. 3, no. 1, pp. 44-49, 2011.
[9] E. Demir et al., “A decision support tool for health service re-design,” Journal of Medical Systems, vol. 36, no. 2, pp. 621-630, 2012.
[10] Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 3rd ed., Boston: Elsevier, 2011.
[11] “Weka,” [Online]. Available: http://www.cs.waikato.ac.nz/ml/weka/. [Accessed 14 Jun 2014].
[12] Health Authority – Abu Dhabi (HAAD), “HAAD releases 2012 Health Statistics and Capacity Master Plan,” HAAD, 10 December 2013. [Online]. Available: https://www.haad.ae/haad/tabid/58/ctl/Details/Mid/417/ItemID/379/Default.aspx. [Accessed 31 May 2014].
[13] “Orange,” [Online]. Available: http://orange.biolab.si/. [Accessed 12 Jun 2014].
[14] RapidMiner. [Online]. Available: http://rapidi.com/content/view/181/190/lang,en/. [Accessed 12 Jun 2014].
[15] “IBM,” [Online]. Available: http://www01.ibm.com/software/analytics/spss/products/modeler/. [Accessed 12 Jun 2014].
[16] A. H. Wahbeh et al., “A comparison study between data mining tools over some classification methods,” International Journal of Advanced Computer Science and Applications, no. Special Issue, pp. 18-26, 2011.
[17] S. Bouktif et al., “Ant Colony Optimization Algorithm for Interpretable Bayesian Classifiers Combination: Application to Medical Predictions,” PLoS ONE, vol. 9, no. 2, 2014.
[18] X. Wu et al., “Top 10 algorithms in data mining,” Knowledge and Information Systems, vol. 14, no. 1, pp. 1-37, January 2008.
[19] Thorsten Joachims, Learning to Classify Text Using Support Vector Machines: Methods, Theory, and Algorithms, Kluwer: Springer, 2002.
[20] Pai-Hsuen Chen et al., “A tutorial on ν-support vector machines,” Applied Stochastic Models in Business and Industry, vol. 21, no. 2, pp. 111-136, 2005.
[21] “Class SMO,” [Online]. Available: http://weka.sourceforge.net/doc.dev/weka/classifiers/functions/SMO.html.
[22] N. M. Zaki, S. Deris, and R. M. Illias, “Feature Extraction for Protein Homology Detection using Hidden Markov Model combining Scores,” International J. of Computational Intelligence and Applications, vol. 4, no. 1, pp. 1-12, 2004.
[23] Yu Kai et al., “Kernel nearest-neighbor algorithm,” Neural Processing Letters, vol. 15, no. 2, pp. 147-156, 2002.
[24] N. M. Zaki, Fadi Sibai and Piers Campbell, “Conotoxin Protein Classification Using Pairwise Comparison and Amino Acid Composition,” in ACM Genetic and Evolutionary Computation Conference (GECCO2011), Dublin, Ireland, July 12-16.
[25] N. M. Zaki et al., “Conotoxin Protein Classification Using Free Scores of Words and Support Vector Machines,” BMC Bioinformatics, vol. 12, no. 1, p. 217, 2011.
[26] “CfsSubsetEval,” [Online]. Available: http://wiki.pentaho.com/display/DATAMINING/CfsSubsetEval.
