The AAAI-17 Joint Workshop on Health Intelligence WS-17-09
Clustering-Aided Approach for Predicting Patient Outcomes with Application to Elderly Healthcare in Ireland
Mahmoud Elbattah, Owen Molloy
National University of Ireland Galway
[email protected],
[email protected]
Abstract
Predictive analytics has shown promising capabilities across many aspects of healthcare practice. Data-driven insights can form an important part of the solution for curbing rising costs and improving care quality. This paper applies machine learning techniques to support decision making in relation to elderly healthcare in Ireland, with a particular focus on hip fracture care. We adopt a combination of unsupervised and supervised learning for predicting patient outcomes. Initially, elderly patients are grouped based on the similarity of age, length of stay (LOS) and elapsed time to surgery. Using the K-Means algorithm, our clustering experiments suggest the presence of three coherent clusters of patients. The discovered clusters are subsequently utilised to train prediction models that address each cluster of patients individually. In particular, two machine learning models are trained for every cluster in order to predict the inpatient LOS and the discharge destination. The developed models make predictions with relatively high accuracy. Furthermore, the potential usefulness of the clustering-guided approach to prediction is discussed in general.

Keywords: Machine Learning; Unsupervised Learning; Supervised Learning; Clustering; K-Means; Elderly Healthcare.

1. Introduction
Elderly-driven care continues to be a central issue for healthcare systems owing to the growing trend of population ageing worldwide, especially in developed countries. In Ireland likewise, the population has been ageing at a rapid pace. As reported by Ireland's Health Service Executive (HSE), the number of people aged over 65 is increasing by close to 20,000 per year (HSE 2014). As a result, there is an imperative need for insightful planning in order to keep abreast of the prospective demographic shift. In this respect, machine learning presents a practical method for building prediction models that can support the planning of healthcare services.

The focus of the study is centred on the hip fracture care scheme for elderly patients in Ireland. In particular, the study aimed to build machine learning models that can make predictions on patient outcomes in terms of inpatient length of stay (LOS) and discharge destination. Patient outcomes have been described as the end result of care, or a measurable change in the health status or behaviour of patients (Harris 1991). According to (Nolan and Mock 2000), there are four categories of patient outcomes: i) clinical outcomes, ii) functional outcomes, iii) financial outcomes, and iv) perceptual outcomes. Clinical outcomes mainly represent the patient's response to medical interventions. Functional outcomes measure the improvement of a patient's physical functioning. Financial outcomes are used to assess the efficient usage of resources. Perceptual outcomes may be the most intangible set of outcomes, representing a patient's satisfaction with the care received and its providers. The patient outcomes addressed by this paper fall within the clinical and financial categories.

The study initially set out to group patients from a data-driven perspective. In this regard, unsupervised clustering using the K-Means algorithm was applied. The discovered clusters were then utilised to train the prediction models, as delineated in detail in the following sections.
2. Background: Elderly Care and Hip Fractures in Ireland
This section aims to deliver a concise review of the application domain addressed by the study. As alluded to earlier, the study focused on the hip fracture care scheme in Ireland. In particular, we were concerned with elderly patients (aged 60 and over) who undergo hip fracture care. Hip fractures were considered for a set of plausible reasons. First, hip fracture care can represent an adequate exemplar of elderly healthcare schemes. In this regard, plentiful studies (e.g. (Cooper, Campion, and Melton 1992), (Melton 1996), and (Gullberg, Johnell, and Kanis 1997)) identified hip fractures as increasing exponentially with age, despite the variation of rates from one country to another. In Ireland, around 3,000 people sustain hip fractures annually (Ellanti et al. 2014), a number that will unavoidably increase owing to population ageing. Second, the delivery of hip fracture care is of considerable importance in Ireland, as the HSE identified hip fractures as one of the most serious injuries, resulting in lengthy hospital admissions and high costs (HSE 2008). Typically, the cost of treating a hip fracture was estimated at around €12,600 (HSE 2008). Equally important, more than two-thirds of patients can spend prolonged periods in long-stay care facilities after hospital discharge (Ellanti et al. 2014). For all these reasons, hip fractures represent a major concern for healthcare in Ireland. Predictive analytics can therefore serve as a key enabler for healthcare executives by providing evidence-based directions for future strategies.

3. Significance of the Study
Generally, the inpatient LOS and discharge destination are both of pivotal importance within healthcare schemes. According to numerous studies (e.g. (O'Keefe, Jurkovich, and Maier 1999), (Englert, Davis, and Koch 2001), and (Guru et al. 2005)), the LOS is a significant measure of patient outcomes. From a strategic perspective, the LOS has been recognised as a valid proxy for the consumption of hospital resources (e.g. (Millard and McClean 1994), (Faddy and McClean 1999), (Marshall and McClean 2003), and (Fackrell 2009)). Further, the LOS was reported as the main component of the overall cost of hip fracture care (Johansen et al. 2013). Similarly, predicting discharge destination has a strategic importance for estimating the needed capacity of long-stay care facilities such as nursing homes. (Brasel et al. 2007) also demonstrated an evident relationship between the inpatient LOS and discharge destination.

In light of that, the study carries potential benefits as follows. From a practical standpoint, the developed machine learning models are capable of predicting LOS and discharge destination with relatively high accuracy. Moreover, the discovery of coherent clusters of patients can extend opportunities to answer questions, or raise further unposed questions, such as:
• How do patient clusters vary with respect to specific characteristics such as age, gender or fragility history?
• How do patient clusters vary with respect to patient outcomes in terms of LOS and discharge destination?
• Is there a possible correlation between patient outcomes and other care-related factors, such as "Time to Surgery"?
In a broader context, we believe that the study can be placed alongside other comparable machine learning problems in healthcare. Specifically, the combined unsupervised/supervised approach adopted by the study can arguably draw further attention to the significance of patient clustering when dealing with prediction problems. In this respect, the study demonstrated that the weight of the features used for training our models varied remarkably from one cluster of patients to another.

4. Approach Overview
The study adopted an approach comprising a combination of unsupervised and supervised learning conducted over two stages. Initially, the study aimed to explore the potential presence of coherent clusters of patients. The K-Means algorithm was used to discover patient groups from a purely data-driven perspective. This stage realised the discovery of three well-separated patient clusters, as explained in Sections 5.3–5.5. The second stage included the development of machine learning models that can make predictions on patient outcomes. The prediction models were developed for every cluster of patients separately. Specifically, a regression model was trained for predicting the LOS, and a binary classifier for predicting discharge destination. We availed of Microsoft Azure Machine Learning (Barga, Fontama, and Tok 2015) to train and test the prediction models. Figure 1 illustrates the adopted approach.

Figure 1. Approach overview. The approach is described based on the existence of three patient clusters, which is explained in detail in Section 5.5.
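The two-stage approach above can be sketched in code. The following is an illustrative outline only, not the authors' implementation (the study used Microsoft Azure Machine Learning); the synthetic records, feature layout and model settings are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
# Hypothetical stand-ins for IHFD fields: [age, LOS (days), time to surgery (hours)]
patients = np.column_stack([
    rng.uniform(60, 95, 300),   # age
    rng.uniform(1, 60, 300),    # inpatient length of stay
    rng.uniform(0, 72, 300),    # time to surgery
])

# Stage 1: discover patient groups (K=3, the number suggested by the paper's experiments)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(patients)

# Stage 2: train one predictive model per discovered cluster
los_models = {}
for k in range(3):
    members = patients[kmeans.labels_ == k]
    X, y = members[:, [0, 2]], members[:, 1]   # predict LOS from age and time to surgery
    los_models[k] = RandomForestRegressor(n_estimators=8, random_state=0).fit(X, y)

# At prediction time, a patient is first assigned to a cluster,
# then routed to that cluster's dedicated model.
```

The key design point is the routing in Stage 2: instead of one global model, each cluster gets a model fitted only to its own members.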
5. Methodology
5.1 Data Description
The main source of the data used by the study is the Irish Hip Fracture Database (IHFD) (NOCA 2012). The IHFD is the national clinical audit developed to capture care standards and outcomes for hip-fracture patients in Ireland. Decisions to grant access to the IHFD data are made by the National Office of Clinical Audit (NOCA). We acquired a dataset extracted from the IHFD repository, including records of elderly patients aged 60 and over. The data comprised 4,773 records spanning two years, from January 2013 to December 2014. Figure 2 plots a histogram of the age distribution within the dataset, while Figure 3 shows the percentages of male and female patients. It is worth mentioning that a particular patient may be related to more than one record in the case of recurrent fractures. However, we were unable to ascertain the proportion of recurrent cases because patients had no unique identifiers, and records were completely anonymised for privacy purposes.

Figure 2. The distribution of patients' age within the dataset.

Figure 3. The proportions of male and female patients within the dataset.

The dataset records contained ample information about a patient's journey from admission to discharge. Specifically, a typical patient record included 38 data fields such as gender, age, type of fracture, date of admission and LOS. A thorough explanation of the data fields is available via the official data dictionary (HPO 2012). Initially, we explored the variables that could serve as features for training our machine learning models. Based on our intuition, many irrelevant variables could be obviously excluded (e.g. admission/discharge time). Table 1 lists the variables that were initially considered as possible features for the clustering and prediction models.

Table 1. Variables explored as possible features.

5.2 Data Preprocessing
This section describes the data preprocessing phase conducted prior to training our machine learning models, including the clustering and prediction models. The preprocessing procedures included: i) removing outliers, ii) feature scaling, and iii) extraction of features that are considered indicators of care quality. The preprocessing was implemented using the R language.

The existence of data anomalies is largely unavoidable. A data anomaly has been defined as an observation that appears to be inconsistent with the remainder of the dataset (Hodge and Austin 2004), or more generally as any data that is unsuitable for the intended use (Sarsfield 2009). In our case, data anomalies were mainly outliers and data imbalances, as explained in the following sections.

A. Outlier Removal
According to (HSE-NOCA 2014), the mean and median LOS for hip-fracture patients were reported as 19 and 12.5 days respectively. Therefore, we considered only the samples whose LOS was no longer than 60 days in order to prevent the undue influence of outliers. The excluded outliers represented approximately 5% of the overall dataset. Figure 4 plots a histogram of the LOS used to identify outliers.

Figure 4. Histogram and probability density of the LOS variable. The outliers can be clearly observed where the inpatient stay exceeds 60 days.
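The outlier rule above amounts to a simple threshold filter. The sketch below illustrates it in Python (the paper's preprocessing was implemented in R); the simulated LOS values are a hypothetical stand-in for the IHFD data.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical LOS sample (days): a right-skewed distribution loosely mimicking
# the reported mean of 19 days and median of 12.5 days
los = rng.gamma(shape=2.0, scale=9.0, size=5000)

# Keep only stays of 60 days or less, per the paper's outlier rule
kept = los[los <= 60]
removed_fraction = 1 - kept.size / los.size
print(f"removed {removed_fraction:.1%} of records as outliers")
```

The exact fraction removed depends on the underlying distribution; the paper reports approximately 5% for the IHFD data.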
B. Feature Scaling
Feature scaling is a central preprocessing step in machine learning when the ranges of feature values vary widely. Several studies (e.g. (Visalakshi and Thangavel 2009) and (Patel and Mehta 2011)) argued that large variations within the range of feature values can affect the quality of computed clusters. In this regard, the features of the raw data were rescaled to constrain dataset values to a standard range. The min-max normalisation method was used, and every feature was linearly rescaled to the [0, 1] interval. The values were transformed using the formula below:

x′ = (x − min(x)) / (max(x) − min(x))    (1)

C. Feature Extraction
In a report (BOA 2007) published by the British Orthopaedic Association, six standards for hip fracture care were emphasised. Those standards generally reflect good practice at key stages of hip fracture care, including:
• All patients with hip fracture should be admitted to an acute orthopaedic ward within 4 hours of presentation.
• All patients with hip fracture who are medically fit should have surgery within 48 hours of admission, and during normal working hours.
The raw data did not include fields that explicitly captured such standards. However, they could be derived based on the date/time values of patient arrival, admission and surgery. Thus, two new features were extracted, named "Time to Admission (TTA)" and "Time to Surgery (TTS)". Eventually, only the TTS was included, as the TTA contained a significant number of missing values.

5.3 Clustering Approach
According to (Jain 2010), clustering approaches can be broadly divided into two categories: i) hierarchical clustering, and ii) partitional clustering. On one hand, the family of hierarchical clustering algorithms attempts to build a hierarchy of clusters representing a nested grouping of objects, with similarity levels that change the grouping scope. In this way, clusters can be computed either in an agglomerative (bottom-up) fashion or a divisive (top-down) fashion. On the other hand, partitional clustering algorithms decompose data into a set of disjoint clusters (Sammut and Webb 2011). Data is divided into K clusters satisfying: i) each cluster contains at least one point, and ii) each point belongs to exactly one cluster. The K-Means algorithm is an example.

In this paper, we embraced the partitional clustering approach using the K-Means algorithm. K-Means has constantly been considered one of the simplest and most widely used clustering algorithms. The reasons behind its popularity are well recognised in the literature; examples are its ease of implementation, simplicity, efficiency, and empirical success (Jain 2010) and (Rokach and Maimon 2005). K-Means uses a simple iterative technique to group points in a dataset into clusters of similar characteristics. Initially, a number of clusters (K) is decided. The algorithm iteratively places data points into clusters by minimising the within-cluster sum of squares, as in Equation (2) below. The algorithm converges on a solution when one or more of these conditions is met:
• The cluster assignments no longer change.
• The specified number of iterations is completed.

J = Σ_{k=1}^{K} Σ_{x_i ∈ C_k} ‖x_i − μ_k‖²    (2) (Jain 2010)

where μ_k is the mean of cluster C_k, and ‖x_i − μ_k‖² is the squared error between μ_k and the points in cluster C_k.

5.4 Selected Features
The patients were clustered based on: i) age, ii) LOS, and iii) time to surgery (TTS). We considered numeric features only, since the K-Means algorithm is originally applicable only to numeric features, such that a distance metric (e.g. Euclidean distance) can be used for measuring the similarity between data points. However, it is worth mentioning that some K-Means extensions have attempted to incorporate categorical features, such as the K-Modes algorithm (Huang 1998).

5.5 Discovering Patient Clusters
The question "How many clusters are present in the data?" is a central issue when approaching a clustering task. The clustering experiments were therefore examined using a number of clusters ranging from 2 to 7, and the appropriate number was decided based on the experimental results. Table 2 summarises the parameters used during the clustering experiments.

Table 2. Parameters of the K-Means algorithm.
Parameter                  Value
Number of Clusters (K)     2–7
Centroid Initialisation    Random
Similarity Metric          Euclidean Distance
Number of Iterations       100

With the aid of Principal Component Analysis (PCA), the computed clusters were projected into two dimensions, as shown in Figure 5. Each sub-figure in Figure 5 represents the output of a clustering experiment using a different number of clusters (K). Initially, with K=2, a promising tendency towards clustering was indicated, with the data space clearly separated into two large clusters. Similarly, for K=3, the clusters remained well separated. However, the quality of the clusters started to decline from K=4 upwards. Furthermore, Figure 6 assesses the appropriate number of clusters by plotting the sum of squared distances given in Equation (2). In light of that, it turned out that three clusters of patients best separated the dataset.
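Equations (1) and (2) above can be made concrete with a short, self-contained sketch. This is not the study's code (the clustering was run in Azure ML); the implementation below is a plain NumPy version of min-max scaling and Lloyd's K-Means, and the synthetic three-group data is an assumption for the example.

```python
import numpy as np

def min_max_scale(X):
    """Equation (1): linearly rescale every feature to the [0, 1] interval."""
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / (mx - mn)

def kmeans(X, k, iters=100, seed=0):
    """Plain K-Means (Lloyd's algorithm) with random centroid initialisation."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid (Euclidean distance)
        labels = np.argmin(((X[:, None, :] - centroids) ** 2).sum(axis=2), axis=1)
        # Recompute centroids; keep the old centroid if a cluster goes empty
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break  # cluster assignments no longer change
        centroids = new
    wcss = ((X - centroids[labels]) ** 2).sum()  # Equation (2)
    return labels, wcss

rng = np.random.default_rng(0)
# Synthetic data with three latent groups, rescaled as in Section 5.2-B
X = min_max_scale(rng.normal(size=(200, 3)) + rng.choice([0.0, 4.0, 8.0], size=(200, 1)))

# Elbow-style check over K = 2..7, mirroring Table 2 and Figure 6
wcss_by_k = {k: kmeans(X, k)[1] for k in range(2, 8)}
```

Plotting `wcss_by_k` against K produces the kind of curve used in Figure 6 to justify choosing K=3.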
Figure 5. Clustering experiments with the number of clusters (K) ranging from 2 to 7: (a) K=2, (b) K=3, (c) K=4, (d) K=5, (e) K=6, (f) K=7. The clusters are projected onto the two principal components; an inconsistency of colours indicates weaker separation of clusters. The visualisations were produced using the R package ggplot2 (Wickham 2009).

Figure 6. The sum of squared distances within clusters, plotted against the number of clusters.

5.6 Preprocessing: Tackling Data Imbalances
This section represents the beginning of the second stage of our methodology, in which supervised learning is utilised to train models that predict the intended patient outcomes. The training dataset was divided into three partitions representing the discovered clusters. The clusters inherently suffered from imbalanced distributions, where a particular set of values predominated. The imbalance issue was pronounced for both the LOS and discharge destination labels. Figure 7 provides an example of the imbalance within the LOS values.

Figure 7. The LOS histograms. The sub-figures represent the LOS distributions within the three discovered clusters of patients: (a) Cluster 1, (b) Cluster 2, (c) Cluster 3. The imbalance of data samples can be observed in all three clusters.

(Galar et al. 2012) identified two different strategies for dealing with the imbalance problem: i) algorithm-level approaches, and ii) data-level approaches such as under-sampling or over-sampling. The over-sampling technique (Japkowicz 2000) was adopted in this paper. The underrepresented samples were resampled at random until they contained a relatively sufficient number of examples.

5.7 Learning Algorithm: Random Forests
The study adopted a unified approach using Random Forests for both regression and classification. Introduced by Leo Breiman (Breiman 2001), Random Forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. During the training phase, all trees are trained independently. At test time, predictions are made through weighted voting on the most confidently predicted class: trees with a higher prediction confidence have a greater weight in the final decision of the ensemble. The aggregation of votes can be done by a simple averaging operation, as in Equation (3):

p(c|v) = (1/T) Σ_{t=1}^{T} p_t(c|v)    (3)

where p_t(c|v) denotes the posterior distribution obtained by the t-th tree, and T is the number of trees.

Random Forests have been shown to provide high accuracy, robustness to noise, and stability (Breiman 2001) and (Liaw and Wiener 2002). In addition, their efficiency has been demonstrated in a diversity of applications (e.g. bioinformatics (Jiang et al. 2004), ecology (Prasad, Iverson, and Liaw 2006)).

Using Random Forests, two models were developed for every cluster of patients. Specifically, the prediction models included a regression model for predicting the LOS, and a binary classifier for predicting discharge destination. The predicted discharge destination represented either home or a long-stay care facility such as a nursing home. Table 3 presents the parameters used for training the random forests.
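The over-sampling step of Section 5.6 and the per-cluster Random Forest setup can be sketched as below. This is a hedged illustration, not the study's Azure ML pipeline: the data are synthetic, and scikit-learn's parameters only approximate Table 3 (it has no direct equivalent of the "random splits per node" setting).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils import resample

rng = np.random.default_rng(0)
# Hypothetical cluster with an imbalanced discharge label:
# 0 = discharged home (majority), 1 = long-stay care facility (minority)
X = rng.normal(size=(200, 4))
y = np.zeros(200, dtype=int)
y[:20] = 1

# Random over-sampling (Japkowicz 2000): replicate minority samples
# at random until the classes are balanced
minority, majority = X[y == 1], X[y == 0]
upsampled = resample(minority, replace=True, n_samples=len(majority), random_state=0)
X_bal = np.vstack([majority, upsampled])
y_bal = np.array([0] * len(majority) + [1] * len(upsampled))

# Random Forest in the spirit of Table 3 (8 trees, maximum depth 32)
clf = RandomForestClassifier(n_estimators=8, max_depth=32, random_state=0).fit(X_bal, y_bal)
```

Note that over-sampling is applied to the training partition only; repeating minority rows before a train/test split would leak duplicates into the test set.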
Table 3. Parameters of the random forest models.
Number of Decision Trees             8
Maximum Depth of Decision Trees      32
Number of Random Splits per Node     128

5.8 Feature Selection
As mentioned in Section 5.1, the dataset initially contained 38 features; however, not all of them were relevant. Intuitively irrelevant features were simply excluded. In addition, the most important features were decided based on the technique of permutation feature importance (Altmann, Toloşi, Sander, and Lengauer 2010). Table 4 presents the set of features used by both models.

Table 4. Selected features for the prediction models.
Prediction Model                   Selected Features
LOS Regression Model               Age, Patient Gender, Fracture Type, Hospital Admitted To, ICD-10 Diagnosis, Fragility History, Time to Surgery
Discharge Destination Classifier   Age, Patient Gender, Fracture Type, Hospital Admitted To, ICD-10 Diagnosis, Fragility History, Time to Surgery, LOS

It was observed that the importance of features varied remarkably from one cluster to another. For instance, patient age scored the highest importance within the first cluster when training the LOS regression model, whereas the "Time to Surgery" feature had the highest importance in the third cluster. This translates into a relative disparity between patient clusters, which clearly reflected on the importance of the features used for model training. Table 5 lists the importance scores of the features obtained while training the LOS prediction model for every cluster.

Table 5. LOS prediction model: importance of selected features with respect to the three clusters (approximate importance scores).
Feature                 Cluster 1   Cluster 2   Cluster 3
Age                     0.84        0.49        0.61
Patient Gender          0.14        0.23        0.18
Fracture Type           0.38        0.44        0.21
Hospital Admitted To    0.78        0.93        0.56
ICD-10 Diagnosis        0.48        0.52        0.29
Fragility History       0.44        0.10        0.09
Time to Surgery         0.15        0.27        0.64

6. Experimental Results
The predictive models were tested using a subset of the dataset described in Section 5.1. The randomly sampled test data represented approximately 40% of the overall dataset. The prediction error of each model was estimated by applying 10-fold cross-validation. Table 6 provides evaluation metrics for the LOS regression models, while Figure 8 shows the ROC curves of the discharge destination classifiers. Furthermore, Table 7 presents the precision, recall and accuracy of the classifiers.

Table 6. The average 10-fold cross-validation accuracy of the LOS predictors within the three clusters of patients (approximate values).
LOS Predictors                     Cluster 1   Cluster 2   Cluster 3
Relative Absolute Error            0.24        0.30        0.28
Relative Squared Error             0.14        0.19        0.18
Coefficient of Determination       0.86        0.81        0.82

Figure 8. ROC curves showing the average 10-fold cross-validation accuracy of the discharge destination classifiers. Each sub-figure represents a binary classifier trained for a specific cluster of patients: (a) AUC=0.851, (b) AUC=0.739, (c) AUC=0.863.

Table 7. The average 10-fold cross-validation accuracy of the discharge destination classifiers for the three clusters of patients (approximate values).
Discharge Destination Classifier   Cluster 1   Cluster 2   Cluster 3
Precision                          0.821       0.733       0.809
Recall                             0.876       0.767       0.857
Accuracy                           0.810       0.739       0.811

7. Comparative Analysis
It was important to evaluate the clustering-aided approach against simpler alternatives. In this respect, this section presents a comparative analysis to assess the magnitude of improvement in prediction accuracy achieved by our approach over non-clustering-based predictions. To achieve this, we developed two additional models: a regression model for predicting the LOS, and a binary classifier for predicting discharge destination. Both models were trained using the same learning algorithm (i.e. Random Forests) and the parameters mentioned earlier in Section 5.7. However, the new models were trained and tested on the whole dataset, without considering patient clusters. The comparative analysis clearly showed that the clustering-aided prediction is more accurate for both predicting LOS and discharge destination. Figure 9 and Figure 10 demonstrate the comparative analysis results.

Figure 9. LOS prediction accuracy: clustering-aided predictions against non-clustering, compared in terms of (a) relative absolute error, (b) relative squared error, and (c) coefficient of determination. The clustering-aided accuracy is computed as the median of the three models trained for the patient clusters.

Figure 10. Discharge destination prediction accuracy: clustering-aided predictions against non-clustering, compared in terms of (a) precision, (b) recall, and (c) accuracy. The clustering-aided accuracy is computed as the median of the three models trained for the patient clusters.

8. Related Work
Machine learning techniques have been widely applied within the context of healthcare. The literature contains plentiful studies that utilised machine learning-based methods to predict the LOS or similar patient outcomes (e.g. (Liu et al. 2006), (Hachesu, Ahmadi, Alizadeh, and Sadoughi 2013), (Azari, Janeja, and Mohseni 2012), and (Morton et al. 2014)). For instance, decision trees and Naive Bayes classifiers were trained on a geriatric hospital dataset in order to predict the LOS (Liu et al. 2006). Likewise, (Morton et al. 2014) conducted a comparison between supervised machine learning techniques for predicting short-term LOS for diabetic patients. In pertinent respects, a number of recent studies addressed hip fracture care. For instance, some studies (e.g. (Marufu et al. 2016), (Nijmeijer et al. 2016), and (Karres, Heesakkers, Ultee, and Vrouenraets 2015)) focused on predicting patient outcomes in terms of mortality. (Marufu et al. 2016) developed a logistic regression model aiming to predict 30-day mortality after hip fracture surgery; the study utilised a dataset obtained from the UK's National Hip Fracture Database (Currie et al. 2012).

9. Study Limitations
We acknowledge the limitations of the study as follows:
• The patients were clustered from a purely data-driven standpoint. It should be taken into account that adding a clinical perspective (e.g. diagnosis, procedures) may group patients differently.
• The IHFD dataset included patients admitted to public hospitals only.

10. Conclusions
The combined unsupervised/supervised approach adopted by the study can draw more attention to the significance of patient clustering when dealing with prediction problems. Specifically, the clustering-aided approach yielded further insights that can contribute to improving the accuracy of predicting patient outcomes. First, the study demonstrated a pronounced variance in the importance of the selected features from one cluster of patients to another. Accordingly, the clustering-aided approach can arguably make finer-grained predictions for patients who naturally share a common set of characteristics or experienced similar outcomes. In this regard, our approach also proved to provide more accurate predictions compared to non-clustering-based predictions. In a broader context, we believe that the study can be placed alongside other comparable machine learning problems in healthcare.
References
Harris, M.D., 1991. Clinical and financial outcomes in patient care in a home health care agency. Journal of Nursing Care Quality, 5(2), pp.41-49.
Nolan, M.T. and Mock, V., 2000. Measuring patient outcomes. Sage Publications.
O'Keefe, G.E., Jurkovich, G.J. and Maier, R.V., 1999. Defining excess resource utilization and identifying associated factors for trauma victims. Journal of Trauma and Acute Care Surgery, 46(3), pp.473-478.
Barga, R., Fontama, V., Tok, W.H. and Cabrera-Cordon, L., 2015. Predictive analytics with Microsoft Azure machine learning. Apress.
Englert, J., Davis, K.M. and Koch, K.E., 2001. Using clinical practice analysis to improve care. Joint Commission Journal on Quality and Patient Safety, 27(6), pp.291-301.
NOCA (2012) IHFD. Available at: https://www.noca.ie/irishhip-fracture-database (Accessed: 16 December 2016).
Guru, V., Anderson, G.M., Fremes, S.E., O’Connor, G.T., Grover, F.L., Tu, J.V. and Consensus, C.C.S.Q.I., 2005. The identification and development of Canadian coronary artery bypass graft surgery quality indicators. The Journal of thoracic and cardiovascular surgery, 130(5), pp.1257-e1.
Hodge, V.J. and Austin, J., 2004. A survey of outlier detection methodologies. Artificial Intelligence Review, 22(2), pp.85-126. Sarsfield, S., 2009. The data governance imperative. IT Governance Publishing.
Brasel, K.J., Lim, H.J., Nirula, R. and Weigelt, J.A., 2007. Length of stay: an appropriate quality measure?. Archives of Surgery, 142(5), pp.461-466.
HSE-NOCA (2014) Available at: https://www.noca.ie/wpcontent/uploads/2015/11/IHFD-National-Report-2014-OnlineVersion.pdf (Accessed: 16 December 2016).
Millard, P.H. and McClean, S.I. eds., 1994. Modelling hospital resource use: a different approach to the planning and control of health care systems. Royal Society of Medicine Press.
Visalakshi, N.K. and Thangavel, K., 2009. Impact of normalization in distributed k-means clustering. International Journal of Soft Computing, 4(4), pp.168-172.
Faddy, M.J. and McClean, S.I., 1999. Analysing data on lengths of stay of hospital patients using phase-type distributions. Applied Stochastic Models in Business and Industry, 15(4), pp.311-317.
Patel, V.R. and Mehta, R.G., 2011. Impact of outlier removal and normalization approach in modified k-means clustering algorithm. IJCSI International Journal of Computer Science Issues, 8(5).
Marshall, A.H. and McClean, S.I., 2003. Conditional phase-type distributions for modelling patient length of stay in hospital. International Transactions in Operational Research, 10(6), pp.565-576.
British Orthopaedic Association (BOA), 2007. The care of patients with fragility fracture. London: British Orthopaedic Association, pp.8-11.
Fackrell, M., 2009. Modelling healthcare systems with phase-type distributions. Health Care Management Science, 12(1), p.11.
Jain, A.K., 2010. Data clustering: 50 years beyond K-means. Pattern recognition letters, 31(8), pp.651-666. Sammut, C. and Webb, G.I. eds., 2011. Encyclopedia of machine learning. Springer Science & Business Media.
Johansen, A., Wakeman, R., Boulton, C., Plant, F., Roberts, J. and Williams, A., 2013. National Hip Fracture Database: National Report 2013. Clinical Effectiveness and Evaluation Unit at the Royal College of Physicians.
Rokach, L. and Maimon, O., 2005. Clustering methods. In Data mining and knowledge discovery handbook (pp. 321-352). Springer US.
Wickham, H., 2009. ggplot2: elegant graphics for data analysis. Springer Science & Business Media.
Galar, M., Fernandez, A., Barrenechea, E., Bustince, H. and Herrera, F., 2012. A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 42(4), pp.463-484.
Cooper, C., Campion, G. and Melton Iii, L.J., 1992. Hip fractures in the elderly: a world-wide projection. Osteoporosis international, 2(6), pp.285-289.
Japkowicz, N., 2000, July. Learning from imbalanced data sets: a comparison of various strategies. In AAAI workshop on learning from imbalanced data sets (Vol. 68, pp. 10-15).
Melton, L.J., 1996. Epidemiology of hip fractures: implications of the exponential increase with age. Bone, 18(3), pp.S121-S125.
Breiman, L., 2001. Random forests. Machine Learning, 45(1), pp.5-32.
Huang, Z., 1998. Extensions to the k-means algorithm for clustering large data sets with categorical values. Data mining and knowledge discovery, 2(3), pp.283-304.
Liaw, A. and Wiener, M., 2002. Classification and regression by randomForest. R news, 2(3), pp.18-22.
Gullberg, B., Johnell, O. and Kanis, J.A., 1997. World-wide projections for hip fracture. Osteoporosis international, 7(5), pp.407-413.
Jiang, H., Deng, Y., Chen, H.S., Tao, L., Sha, Q., Chen, J., Tsai, C.J. and Zhang, S., 2004. Joint analysis of two microarray geneexpression data sets to select lung adenocarcinoma marker genes. BMC bioinformatics, 5(1), p.81.
Ellanti, P., Cushen, B., Galbraith, A., Brent, L., Hurson, C. and Ahern, E., 2014. Improving Hip Fracture Care in Ireland: A Preliminary Report of the Irish Hip Fracture Database. Journal of osteoporosis, 2014.
Prasad, A.M., Iverson, L.R. and Liaw, A., 2006. Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems, 9(2), pp.181-199.
HSE (2008) Available at: http://www.hse.ie/eng/services/publications/olderpeople/Executive_Summary__Strategy_to_Prevent_Falls_and_Fractures_in_Ireland%E2%80%99s_Ageing_Population.pdf (Accessed: 16 January 2016).
Altmann, A., Toloşi, L., Sander, O. and Lengauer, T., 2010. Permutation importance: a corrected feature importance measure. Bioinformatics, 26(10), pp.1340-1347.
Liu, P., Lei, L., Yin, J., Zhang, W., Naijun, W. and El-Darzi, E., 2006, September. Healthcare data mining: Prediction inpatient length of stay. In Intelligent Systems, 2006 3rd International IEEE Conference on (pp. 832-837). IEEE.
Hachesu, P.R., Ahmadi, M., Alizadeh, S. and Sadoughi, F., 2013. Use of data mining techniques to determine and predict length of stay of cardiac patients. Healthcare Informatics Research, 19(2), pp.121-129.
Azari, A., Janeja, V.P. and Mohseni, A., 2012, December. Predicting hospital length of stay (PHLOS): A multi-tiered data mining approach. In Data Mining Workshops (ICDMW), 2012 IEEE 12th International Conference on (pp. 17-24). IEEE.
Morton, A., Marzban, E., Giannoulis, G., Patel, A., Aparasu, R. and Kakadiaris, I.A., 2014, December. A comparison of supervised machine learning techniques for predicting short-term in-hospital length of stay among diabetic patients. In Machine Learning and Applications (ICMLA), 2014 13th International Conference on (pp. 428-431). IEEE.
Health Service Executive (HSE), 2014. Annual Report and Financial Statements 2014.
Marufu, T.C., White, S.M., Griffiths, R., Moonesinghe, S.R. and Moppett, I.K., 2016. Prediction of 30-day mortality after hip fracture surgery by the Nottingham Hip Fracture Score and the Surgical Outcome Risk Tool. Anaesthesia.
Nijmeijer, W.S., Folbert, E.C., Vermeer, M., Slaets, J.P. and Hegeman, J.H., 2016. Prediction of early mortality following hip fracture surgery in frail elderly: The Almelo Hip Fracture Score (AHFS). Injury, 47(10), pp.2138-2143.
Karres, J., Heesakkers, N.A., Ultee, J.M. and Vrouenraets, B.C., 2015. Predicting 30-day mortality following hip fracture surgery: evaluation of six risk prediction models. Injury, 46(2), pp.371-377.
Currie, C., Partridge, M., Plant, F., Roberts, J., Wakeman, R.
and Williams, A., 2012. The national hip fracture database national report 2012. London, British Geriatrics Society.