9th IFAC Symposium on Biological and Medical Systems, Aug. 31 - Sept. 2, 2015. Berlin, Germany
Available online at www.sciencedirect.com
ScienceDirect IFAC-PapersOnLine 48-20 (2015) 469–474
Machine Learning for Predictive Modelling based on Small Data in Biomedical Engineering

Torgyn Shaikhina1, Dave Lowe2, Sunil Daga3,4, David Briggs3, Robert Higgins4 and Natasha Khovanova1
1 School of Engineering, University of Warwick, Coventry, CV47AL UK (tel.: +44(0)2476528242; e-mail: [email protected])
2 NHS Blood and Transplant Birmingham
3 Warwick Medical School
4 University Hospitals Coventry and Warwickshire NHS Trust

Abstract: Experimental datasets in bioengineering are commonly limited in size, thus rendering Machine Learning (ML) impractical for predictive modelling. Novel techniques of multiple runs for model development and surrogate data analysis for model validation are suggested for the prediction of biomedical outcomes based on small datasets for classification and regression tasks. The proposed framework was applied to designing a Neural Network model for osteoarthritic bone fracture risk stratification, and a Decision Tree model for prediction of antibody-mediated kidney transplant rejection. Despite the small datasets (35 bone specimens and 80 kidney transplants), the two models achieved high accuracy of 98.3% and 85%, respectively.

© 2015, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

Keywords: Machine Learning, Small Data, Biomedical Systems, Decision Tree, Neural Network

1. INTRODUCTION

Machine Learning (ML) enables data-driven models to "learn" information about a system directly from observed data without predetermining the mechanistic relationships that govern the system. Due to the ability of ML algorithms to adaptively improve their performance with each new data sample, ML has become the core technology for numerous real-world applications: from weather forecasting and DNA sequencing, to Internet search engines and stock market predictions. Nevertheless, ML systems are rarely viewed in the context of small data, where an insufficient number of training samples can compromise the learning success (Forman & Cohen 2004; Lanouette et al. 1999).

Small dataset conditions (less than 10 occurrences per predictor variable) are characteristic of the biomedical engineering domain, where complexity and the high cost of experiments restrain the number of available samples (Hudson & Cohen 2000). It has been argued that ML can offer an indispensable tool for biomedical problems involving complex heterogeneous data when conventional statistical tools fail (Inza et al. 2010; Campbell 2014; Grossi 2011). In applications such as gene selection (Hoff et al. 2008), screening heart murmurs in children (DeGroff et al. 2001), and predicting breast cancer relapse (Faradmal et al. 2014), ML-based models were able to map highly non-linear input and output patterns even when mechanistic relationships between model variables could not be determined due to pathologies or complexity. Nonetheless, the vast potential of ML for predictive modelling in bioengineering remains largely unexplored. To extend the benefits of ML to a wider range of bioengineering models, it is essential to develop methods that cope with the limited data size.

In their early work, the authors have been successful in applying Neural Networks (NNs) to a correlation analysis of a small dataset of 35 samples in hard tissue engineering (Khovanova et al. 2014; Shaikhina et al. 2014). The current paper develops an ML framework for predictive modelling based on small biomedical datasets for classification and regression tasks. Specifically, the paper considers two biomedical applications:

a. NNs for prediction of Compressive Strength (CS) of human trabecular bone in severe osteoarthritis (regression model).

b. Decision Trees (DTs) for prediction of acute antibody-mediated rejection (ABMR) of kidney transplants based on pre-operative clinical indicators (classification model).

2. METHODOLOGY

2.1 Comparing ML model designs using multiple runs

ML algorithms commonly contain a deliberate degree of randomness in their training and initialization routines. Random starting points are often necessary to improve the algorithm's convergence to the global minimum (Forman & Cohen 2004; Hudson & Cohen 2000). This comes with a negative side effect on the algorithm's stability and generalisation, which becomes more pronounced when only a small number of training samples is available. In other words, an ML algorithm trained on a small dataset may produce dissimilar output patterns depending on the random initial conditions. Subsequently, various instances of the same small-data ML model would often exhibit erratic fluctuations in performance. This prevents effective comparison between ML models and hinders the possibility of their optimisation.

We introduce a method of multiple runs in order to provide means for consistent comparisons between various ML models, which enables their subsequent optimisation. First, for

_____________________________
This work was supported by the EPSRC UK Grant EP/K02504X/1.
Copyright © 2015 IFAC
2405-8963 © 2015, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
Peer review under responsibility of International Federation of Automatic Control.
10.1016/j.ifacol.2015.10.185
a given ML model, a large number of instances with various initial conditions are generated and trained in parallel. Consequently, the performance of the ML model is assessed not on a single instance, but repeatedly on a set of a few thousand instances of the same model (hereafter referred to as a run). The optimal design is then determined by comparing the average performance between runs of various ML models, even when individual instances cannot be compared. Once the optimal model design is selected, the single best performing instance of that design is used as the final model.
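The multiple-runs procedure can be illustrated with a short Python sketch. The `train` function below is only a toy stand-in for training an actual NN/DT instance from a random initial condition (the paper does not show the authors' implementation); all names and the scoring rule are illustrative:

```python
import random
import statistics

def run_of_instances(train_fn, n_instances, seed=0):
    """Train n_instances of one model design, each from a different
    random initial condition, and return their performance scores.
    The whole collection constitutes a single 'run'."""
    rng = random.Random(seed)
    return [train_fn(rng.random()) for _ in range(n_instances)]

def make_design(quality):
    """Toy stand-in for an ML model design: 'training' returns a
    performance score that fluctuates with the random init."""
    def train(init):
        return quality - abs(init - 0.5)
    return train

designs = {"A": make_design(0.9), "B": make_design(0.7)}
runs = {name: run_of_instances(fn, 2000) for name, fn in designs.items()}

# designs are compared by average run performance, not by single instances
means = {name: statistics.mean(scores) for name, scores in runs.items()}
best_design = max(means, key=means.get)
# the single best instance of the winning design becomes the final model
final_score = max(runs[best_design])
```

Because each design is judged by the mean of thousands of instances, a single lucky or unlucky initialization cannot dominate the design comparison, which is the point of the method.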
When applied to NNs and DTs, this strategy principally differs from ensemble-NNs or Random Forests in that only the output of the best performing NN/DT instance is ultimately selected as the final predictive model.

The choice of run size, i.e. how many NN/DT model instances each run contains, was driven by the need to balance the desired precision of the performance measures against computational efficiency, as larger runs require more memory and time to simulate. For the two applications in this study, we found that the minimum run sizes that kept the performance measures consistent to 3 decimal places were 2000 for NNs and 600 for DTs.

2.2 Validating ML models for regression tasks using surrogate data

Small dataset conditions and the associated random effects make validation of ML models for regression tasks impractical. Conventional methods, such as cross-validation, may become unreliable when the number of independent test samples is limited. This necessitates an alternative approach for validating regression ML models in the presence of random effects due to small data.

Inspired by the success of the surrogate data approach (Theiler et al. 1992; Hirata et al. 2008) for biomedical and nonlinear physics applications and neural coding, we propose to use surrogate data for validation of regression ML models built on small data. The surrogates were generated from random numbers to mimic the distribution of the original dataset independently for each component of the input vector. While resembling the original data statistically in terms of their mean, standard deviation and range, the surrogates do not retain the intricate interrelationships between the variables of the real dataset. Hence successful real-data models are expected to perform significantly better than the surrogate data models.

In our proposed framework, validation with surrogate data was considered in the context of multiple runs and was used for comparison of the real-data NN model of the optimal design with the surrogate-data NN of the same design on a run of 2000 NN instances. To improve robustness, the experiment was replicated in 10 runs involving 20000 NNs in total.

We demonstrate on the example of NNs that an ML regression model trained and tested on surrogates can be used as a benchmark for validating real data models by setting a performance threshold for the random effects due to small data. Defined as the highest performance achieved by surrogate models, this threshold indicates the lower performance boundary expected of the real data models.

3. DATA MODELS

3.1. Regression NN for osteoarthritic bone CS prediction

The NN model was designed to predict the CS of an osteoarthritic trabecular bone from micro-CT indications of its morphology, level of interconnectivity and porosity, as well as the patient's gender and age. A detailed description of the dataset, comprising 35 human femora, can be found in the original study by Perilli et al. (2007). The samples were divided into training (22 samples) and validation (6 samples) sets using a random permutation, while the remaining samples were reserved for test (7 samples) and fixed for every NN.

Considering the size and the nature of the available data, a two-layer feedforward backpropagation NN was chosen as the base for the CS model, with 5 input features and 1 output (Fig.1). The heterogeneous 1x5 input vector, x̄, was stacked in the following order: x1 = Structure Model Index (SMI), x2 = trabecular thickness (Tb.Th), x3 = bone volume density (BV/TV), x4 = age and x5 = gender. The 5x4 input weights matrix, IW, the 4x1 layer weights column vector, lw′, and the corresponding biases b(1) and b(2) for each layer were initialized according to the Nguyen-Widrow method (Nguyen & Widrow 1990) in order to distribute the active region of each neuron in the layer approximately evenly across the layer's input space. Neurons in the hidden layer implemented a hyperbolic tangent sigmoid transfer function (Yonaba et al. 2010), while the output neuron computed the CS output from the input using a simple linear transfer function.

NNs were trained using the Levenberg-Marquardt backpropagation algorithm (More 1978). The cost function was evaluated by the mean squared error between the output and actual CS values. The NN performance was measured by the regression factor, R, between the actual CS values and the values predicted by the NN. The techniques of early stopping and cross-validation were implemented in order to avoid NN overtraining and hence ensure better generalisation (Fushiki 2009). The resulting NN model mapped the output, y (in MPa), to the input vector, x̄:

y = tanh(x̄ ⋅ IW + b(1)) ⋅ lw′ + b(2)        (1)

Fig.1. NN model topology and layer configuration: a 5D input vector, a hidden layer with 4 neurons and a single output neuron.

The method of multiple runs was used to determine the optimal NN topology in terms of the hidden layer size and the early-stopping cut-off. Hidden layer sizes from 1 to 13 neurons and early-stopping cut-off factors from 1 to 10 were considered. Each of these 130 topology configurations was evaluated in a run of 2000 NNs. The optimal NN configuration
(Fig.1) comprised 4 neurons in the hidden layer and utilised a cut-off of 9 for early stopping. This iterative process using multiple runs involved 260000 individual NNs in total, but only the best performing NN from the 2000 with the optimal topology configuration was selected as the final model for predicting CS in this application.
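For illustration, the two-layer mapping of Eq. (1) can be written out as a minimal Python sketch with the 5-input, 4-hidden-neuron topology. The weight values below are placeholders only; the trained values are not reported in the paper:

```python
import math

def nn_forward(x, IW, b1, lw, b2):
    """Two-layer feedforward pass of Eq. (1): tanh hidden layer,
    linear output neuron.  x: 5 inputs, IW: 5x4 input weight matrix,
    b1: 4 hidden biases b(1), lw: 4 layer weights lw', b2: bias b(2)."""
    hidden = [math.tanh(sum(xi * IW[i][j] for i, xi in enumerate(x)) + b1[j])
              for j in range(len(b1))]
    return sum(h * w for h, w in zip(hidden, lw)) + b2

# illustrative weights only -- real values come from Levenberg-Marquardt training
IW = [[0.1, -0.2, 0.3, 0.05] for _ in range(5)]   # 5x4 input weights
b1 = [0.0, 0.1, -0.1, 0.2]                        # hidden biases b(1)
lw = [0.5, -0.4, 0.3, 0.2]                        # layer weights lw'
b2 = 1.0                                          # output bias b(2)
# inputs stacked as (SMI, Tb.Th, BV/TV, age, gender), values illustrative
cs = nn_forward([0.5, 0.2, 0.3, 60.0, 1.0], IW, b1, lw, b2)
```

Since tanh is bounded in [-1, 1], the predicted CS is confined to b(2) plus/minus the sum of the absolute layer weights, which makes the linear output neuron's role explicit.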
3.2. Classification DT for prediction of early kidney rejection

A classification model was developed for the prediction of acute/early antibody-mediated rejection within the first 30 days after operation. The clinical focus of the model was to investigate how pre-treatment donor specific antibody (DSA) Immunoglobulin G (IgG) subclass levels, measured using Median Fluorescence Intensity (MFI) cytometry techniques, affect the early outcome of transplantation when accounting for multiple baseline characteristics (Lowe et al. 2013). The following 15 parameters, measured before operation, were considered as input variables for the predictive model:

7 continuous: highest IgG DSA MFI level, patient's age, years on dialysis, and 4 total IgG subclass MFI levels;

4 categorical: cytometry cross-match (bead, flow or CDC), total number of HLA mismatches between donor and recipient (0-6), number of class II HLA-DR mismatches (0-2), and number of previous transplants (0-2);

4 binary: gender (male/female), delayed graft function (yes/no), live/deceased donor, and presence of Class I and Class II HLA DSA (yes/no).

Tree growth was restricted by the minimal node size: a minimum of 10 observations for a node to become a branch node, and at least 1 observation per leaf node. Notably, for a DT classifier, finding an optimal binary split for a continuous predictor is far less computationally intensive than for a categorical predictor with multiple levels. In the former case, the DT can split between any two adjacent values of a continuous vector, but for a categorical predictor with i levels, all of the 2^(i-1) - 1 splits need to be considered to find the optimal one. As an example, to identify the optimal split for the total number of HLA mismatches (i = 7), the DT had to consider 63 possibilities.

To avoid overfitting, 10-fold cross-validation was implemented in the DT design (Fushiki 2009). Using the method of multiple runs presented in section 2.1, 600 individual DTs were generated and the best performing model was selected.

4. RESULTS

4.1. NN model

The optimal NN predicted CS with a root-mean-square error (rmse) of 0.85 MPa. The linear regression factors, R, between the actual and predicted CS were 99.9% across the entire dataset and 98.3% on the test set (Table I). The surrogate data approach described in section 2.2 revealed significant differences in performance between the real and surrogate NNs (Fig.2). Evaluated across 10 runs of 2000 NN instances, the real data NNs consistently outperformed the surrogate NNs, with the corresponding increase in µ(R_all) from 0.33 to 0.68 (Fig.2). The Wilcoxon rank sum test confirmed the statistical difference between the medians of the regression factors achieved by the two models with p
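As an aside, the surrogate-generation step of section 2.2 can be sketched in Python. The paper does not specify the exact sampling distribution, so the sketch below assumes a Gaussian matched to each column's mean and standard deviation, clipped to the column's range; this is one plausible reading, not the authors' exact procedure, and the toy data are illustrative:

```python
import random
import statistics

def make_surrogate(data, seed=0):
    """Generate a surrogate dataset: each input column is replaced by
    random numbers matching that column's mean, standard deviation and
    range, independently of all other columns, so that cross-column
    relationships present in the real data are destroyed."""
    rng = random.Random(seed)
    columns = list(zip(*data))               # column-wise view of the samples
    surrogate_cols = []
    for col in columns:
        mu, sigma = statistics.mean(col), statistics.pstdev(col)
        lo, hi = min(col), max(col)
        # matched normal distribution, clipped to the observed range
        surrogate_cols.append(
            [min(max(rng.gauss(mu, sigma), lo), hi) for _ in col])
    return [list(row) for row in zip(*surrogate_cols)]

# toy example: 2 strongly correlated input features, 6 samples
real = [[1.0, 2.0], [2.0, 4.1], [3.0, 6.2],
        [4.0, 7.9], [5.0, 9.8], [6.0, 12.0]]
surr = make_surrogate(real)
```

A model trained on `surr` cannot exploit the inter-variable relationships of `real`, so its best achievable performance estimates the contribution of random small-data effects, which is exactly the validation threshold the paper proposes.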