CLINICAL REVIEW

European Heart Journal (2018) 0, 1–14 doi:10.1093/eurheartj/ehy404

Clinical applications of machine learning in cardiovascular disease and its relevance to cardiac imaging

Subhi J. Al’Aref1†, Khalil Anchouche1†, Gurpreet Singh1, Piotr J. Slomka2, Kranthi K. Kolli1, Amit Kumar1, Mohit Pandey1, Gabriel Maliakal1, Alexander R. van Rosendael1, Ashley N. Beecy1, Daniel S. Berman2, Jonathan Leipsic3, Koen Nieman4, Daniele Andreini5, Gianluca Pontone5, U. Joseph Schoepf6, Leslee J. Shaw1, Hyuk-Jae Chang7, Jagat Narula8, Jeroen J. Bax9, Yuanfang Guan10, and James K. Min1*

1Department of Radiology, NewYork-Presbyterian Hospital and Weill Cornell Medicine, New York, NY, USA; 2Departments of Imaging and Medicine and Biomedical Sciences, Cedars-Sinai Medical Center, Los Angeles, CA, USA; 3Departments of Medicine and Radiology, University of British Columbia, Vancouver, BC, Canada; 4Departments of Cardiology and Radiology, Stanford University School of Medicine and Cardiovascular Institute, Stanford, CA, USA; 5Centro Cardiologico Monzino, IRCCS, Milan, Italy; 6Division of Cardiovascular Imaging, Department of Radiology and Radiological Science and Division of Cardiology, Department of Medicine, Medical University of South Carolina, Charleston, SC, USA; 7Division of Cardiology, Severance Cardiovascular Hospital and Severance Biomedical Science Institute, Yonsei University College of Medicine, Yonsei University Health System, Seoul, South Korea; 8Zena and Michael A. Wiener Cardiovascular Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA; 9Department of Cardiology, Heart Lung Center, Leiden University Medical Center, Leiden, The Netherlands; and 10Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA

Received 21 March 2018; revised 29 May 2018; editorial decision 22 June 2018; accepted 6 July 2018

Artificial intelligence (AI) has transformed key aspects of human life. Machine learning (ML), which is a subset of AI wherein machines autonomously acquire information by extracting patterns from large databases, has been increasingly used within the medical community, and specifically within the domain of cardiovascular diseases. In this review, we present a brief overview of ML methodologies that are used for the construction of inferential and predictive data-driven models. We highlight several domains of ML application such as echocardiography, electrocardiography, and recently developed non-invasive imaging modalities such as coronary artery calcium scoring and coronary computed tomography angiography. We conclude by reviewing the limitations associated with contemporary application of ML algorithms within the cardiovascular disease field.

Keywords: Machine learning • Cardiovascular disease • Echocardiography • Coronary computed tomography angiography

Introduction

Machine learning (ML), an extension of the century-long quest for artificial intelligence (AI), has altered our collective conception of information and its seemingly boundless potential for guiding change. Machine learning is broadly defined as the ability of a system to autonomously acquire knowledge by extracting patterns from large data sets.1 This field has sparked tremendous innovation in all sectors of the technology industry, from speech recognition and sentiment analysis to spam filters, chat-bots, and autonomous driving. While the adoption of ML in the information technology sector is nearly ubiquitous, its introduction into the medical field has been much more subdued. The landscape, however, is rapidly changing. Equipped with novel ML frameworks, increasing computational power, and the availability of big data, the ML community is now concentrating its efforts squarely on complex tasks in the healthcare sector. These efforts have borne fruit, for example, in radiology, where an ML platform has been demonstrated to be as effective as a human radiologist in validating presumptive diagnoses.2 In pathology, ML has uncovered entirely new

* Corresponding author. Tel: +1-212-746-6192, Fax: +1-212-746-0129, Email: [email protected]

† The first two authors contributed equally to the content of the manuscript.

© The Author(s) 2018. Published on behalf of the European Society of Cardiology. All rights reserved. For permissions, please email: [email protected].

Downloaded from https://academic.oup.com/eurheartj/advance-article-abstract/doi/10.1093/eurheartj/ehy404/5060564 by Weill Cornell Medical Library user on 02 August 2018

prognostic histological features in breast cancer.3 More recently, in clinical cardiology, ML has been shown to be more adept at predicting cardiovascular or all-cause mortality than clinical or imaging parameters used separately.4,5 Altogether, the potential for ML to fundamentally change the way we practice medicine is now well-appreciated.6 In this review, the goals are three-fold: (i) to outline in detail the general methodology employed in ML endeavours for a clinical audience; (ii) to highlight some of the avenues in which ML has found application in cardiology; and (iii) to note a few limitations to its expanded use in healthcare.

Overview of machine learning

Machine learning, a broad discipline with foundations in mathematics and computer science, proposes a set of novel algorithms and methodologies for the construction of inferential and predictive data-driven models. It is important, however, to first clearly delineate what ML can and cannot do. Machine learning does not constitute general intelligence. Rather, it is used to tackle classes of well-defined problems which were historically too difficult, if not altogether impossible, to solve with rule-based paradigms (‘if . . . then . . .’). A representative and often-cited early application of ML is handwritten digit recognition.7 Early research in the field focused on the development of models capable of correctly assigning labels (0–9) to a set of images containing handwritten digits. Rather than delineating explicit criteria for the classification of each of the ten digits, such as characteristic symmetries and geometric configurations, ML proposes using vast troves of data (in this case, other images of handwritten digits along with their correct labels) to ‘learn’ mathematical representations of these characters. Such representations can then be extended to make predictions on previously unseen images. This is an example of supervised learning, a framework in which a computer learns directly from large quantities of correctly labelled examples.8 In this context, it is vital to keep in mind that state-of-the-art ML algorithms still perform discriminative learning rather than generative modelling: they cannot recover the true underlying probability distribution of the data or draw inferences the way humans do. Unsupervised learning, on the other hand, does not necessitate any pre-defined human input; in this framework, an algorithm attempts to autonomously derive conclusions from unlabelled data. A common example of unsupervised learning is cluster analysis, where a dataset, without a priori knowledge of its true labels, is partitioned into clusters of ‘similar’ objects. The third type, reinforcement learning, is reward-based learning typically used in robotics and gaming applications. We focus in this review primarily on supervised ML, which has found the widest applicability to date.
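To make the cluster-analysis idea concrete, the following is a minimal sketch using k-means from scikit-learn. The two simulated patient groups, and the features they are described by, are hypothetical and chosen only so that the clusters are easy to recover without any labels.

```python
# Minimal sketch of unsupervised cluster analysis with k-means.
# Hypothetical data: two groups of patients described by two features
# (e.g. age and a biomarker level), with no outcome labels supplied.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
group_a = rng.normal(loc=[55.0, 1.0], scale=0.5, size=(50, 2))
group_b = rng.normal(loc=[70.0, 4.0], scale=0.5, size=(50, 2))
X = np.vstack([group_a, group_b])

# KMeans partitions the unlabelled data into k clusters of 'similar' objects.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_

# With well-separated groups, each simulated group falls into one cluster.
print(len(set(labels[:50])), len(set(labels[50:])))
```

Note that the algorithm never sees which group a patient came from; the partition emerges purely from similarity between feature vectors.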

Data

In order to employ the tools and techniques of supervised ML, data must first be framed appropriately. Datasets generally consist of a set of objects characterized by a number of features and their associated labels. For instance, a dataset might consist of patients represented by clinical and medical imaging parameters, with their associated


clinical outcomes, e.g. experienced a stroke/did not experience a stroke. Such information can be most compactly represented using matrix notation, whereby each row of a design matrix X corresponds to an object and each column to a feature, with its matching label contained in a vector y. Once raw data have been gathered and stored in a matrix, a number of pre-processing steps are performed before the data can be further manipulated. Typically, categorical features are represented numerically using a technique known as one-hot encoding, which binarizes categorical values, transforming them into a format that has been observed to work well for classification and regression algorithms. Additionally, given the numerous imperfections which can plague imported datasets, such as sparsity, the presence of outliers, and inter-variable differences in scale, further transforms such as imputation of missing data and normalization are employed to assist in standardization. Many of these transforms are necessary prerequisites for ML algorithms that rely on gradient descent, a frequently used numerical optimization algorithm. Once a dataset has been pre-processed, numerous strategies can be employed to determine the optimal feature representation of its objects, spanning the full gamut from feature selection (only the most ‘predictive’ features are used in the construction of a model) to feature extraction (multidimensional feature vectors are projected to a lower-dimensional subspace, allowing for a more succinct representation of each object). Beyond this, feature engineering can also be performed, a labour-intensive process whereby handcrafted features are added to a dataset in the hope of achieving enhanced classification. More novel ML frameworks, such as deep learning, automate this difficult task.
These methodologies have been well characterized in the ML literature and are typically adapted empirically to the problem at hand.9

Algorithms

Supervised ML algorithms can be employed to tackle both classification and regression problems. The goal in the former task is to correctly assign a binary or multi-class label, while in the latter, it is to correctly predict a real-valued output. Conveniently, many ML algorithms are sufficiently flexible to accomplish both types of analysis with only minor adaptations, although constraints such as interpretability, computational cost, and type of available data need to be considered in tailoring the choice of algorithm. We focus in this review primarily on classification tasks. Machine learning algorithms frequently employed in practice include linear and logistic regression, artificial neural networks (ANN), support vector machines (SVM), and tree-based methods. These individual models can then be combined with one another using ensemble learning, a methodology which leverages the power of multiple weak classifiers to achieve optimal overall performance. Datasets used in ML projects are typically partitioned into training, validation, and test subsets: the training set, which encompasses the bulk of all available data, is used for the primary development of the model; the validation set is used to estimate overall model performance or to fine-tune its hyperparameters. Hyperparameters, as opposed to standard model parameters, are configuration settings chosen before the learning process begins rather than learned from the data. Thereafter, repeated multi-fold training/validation (cross-validation) is


Figure 1 Schematic diagram of splitting a dataset for training a machine learning model. The dataset is typically split into training and test subsets. The training set is used to develop the model in question, while the test set is used to assess its generalizability. During cross-validation, the training set is divided into two subsets; one is used to train the model and the second to validate it at each iteration. Shown in the figure is an example of four-fold cross-validation, wherein 75% of the training dataset was used for training and the remaining 25% for validating the results at each iteration.
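The four-fold scheme of Figure 1 can be sketched in a few lines with scikit-learn. The dataset, model, and split proportions are illustrative; the point is the mechanics of holding out a test set first and then rotating the validation fold.

```python
# Sketch of the four-fold cross-validation of Figure 1: at each iteration,
# 75% of the training data trains the model and the remaining 25% validates it.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, train_test_split

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Hold out an independent test set first; it stays untouched during tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

scores = []
for train_idx, val_idx in KFold(n_splits=4, shuffle=True,
                                random_state=0).split(X_train):
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train[train_idx], y_train[train_idx])
    scores.append(model.score(X_train[val_idx], y_train[val_idx]))

print(len(scores))  # four validation scores, one per fold
```

Averaging the four fold scores gives a more stable performance estimate than any single train/validation split, which is exactly the variance-reduction argument made in the text.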

Figure 2 Simplified representation of the decision boundary for a model (A) underfitting, (B) optimally fitting, and (C) overfitting. An underfitted model is incapable of representing the actual data distribution. Usually in such cases, one should choose a more complex model, for example by increasing the degree of the polynomial in regression. However, increasing the complexity beyond a certain limit can lead to an overfitted model. Such models perform well on the training dataset but may fail on unseen datasets. Therefore, a trade-off between model complexity and data representation is desirable to obtain an optimally fitted model.

performed, which removes the variance introduced into the created model by relying on a single random split into one training and one validation set. Finally, once the model is completely optimized, an independent (external) test set can be used to assess its generalizability (Figure 1). Once an algorithm and training framework is settled upon, the algorithm then


‘learns’ by iteratively adapting its internal parameters to fit the training data via optimization of an objective function (Table 1). Ultimately, the challenge of ML is to efficiently perform these steps, while avoiding over-fitting (Figure 2). That is, the constructed model should not be so specific to the noise in the data on which it was initially trained so

Downloaded from https://academic.oup.com/eurheartj/advance-article-abstract/doi/10.1093/eurheartj/ehy404/5060564 by Weill Cornell Medical Library user on 02 August 2018

4

S.J. Al’Aref et al.

Figure 3 Machine learning model capacity and error on a bias-variance spectrum. Bias refers to the error introduced by constructing an excessively simple model with poor prediction accuracy. Variance, on the other hand, refers to the error incurred by building an excessively complex model attuned to noise in the training data. Here, we plot training error vs. model capacity to show the bias-variance spectrum. Two distinct zones exist, namely the underfitting and overfitting zones. During model training, there may exist a hinge point that represents the achievable optimal capacity of the model. Prior to this point, the model has high bias and is underfitted. Beyond the hinge point, if training continues, the model will eventually become overfitted and have high variance.

as to perform well in both training and validation, only to fail in reproducing those results on an independent test set. In ML parlance, such models are often said to be characterized by ‘high variance’ (Figure 3). Convolutional neural networks (CNNs) are a class of ML-based approaches aptly suited to image analysis. Convolutional neural networks require minimal pre-processing and are generally composed of multiple convolution layers, in which kernels/filters of shared weights are used to find local patterns in organized data such as images. Convolutional neural networks consist of an input and an output layer, as well as intermediary hidden layers. Each layer is connected in an end-to-end manner, and weights are trained using back-propagation. Convolutional neural networks have become the ‘go-to’ approach for feature extraction in cardiovascular imaging sets; examples of such networks include U-Net, VGG, and Faster R-CNN.10 A significant limitation of CNNs is highlighted by adversarial examples, wherein a CNN confidently misclassifies a completely unrecognizable image as a recognizable object (for instance, labelling pixelated noise as an animal).11 To address this, a class of unsupervised-learning-based CNNs called generative adversarial networks (GANs) has been proposed and remains an active area of research for data augmentation. Another recent step towards mitigating such pitfalls is the inception of capsule networks (CapsNets), which, instead of stacking ever-deeper layers, group CNN layers into capsules.12,13
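The core convolution operation, a small kernel of shared weights sliding over an image to detect a local pattern, can be written out directly in NumPy. This is a single hand-coded layer for illustration only; real CNNs stack many such layers and learn the kernel weights by back-propagation.

```python
# A single 2-D convolution, the building block of a CNN, written in NumPy.
# The kernel here is fixed (a vertical-edge detector); in a trained CNN its
# weights would be learned by back-propagation.
import numpy as np

def convolve2d(image, kernel):
    """Valid-mode 2-D convolution (no padding, stride 1)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Shared weights: the same kernel is applied at every position.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 6x6 'image': dark on the left half, bright on the right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# A kernel that responds where intensity jumps from left to right.
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0]])

feature_map = convolve2d(image, kernel)
print(feature_map.shape)  # (5, 5)
```

The resulting feature map is non-zero only along the column where the edge sits, which is exactly the 'local pattern' the shared-weight kernel was built to find.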

Performance metrics and model refinement

Two crucial decisions must be made early in the construction of an ML pipeline: namely, the selection of a performance metric and the


designation of appropriate benchmarks. These measures are of paramount importance, as they quantify the degree to which the constructed model is achieving its designated task and will guide all subsequent adjustments (such as gathering more data, fine-tuning hyperparameters, employing regularization, or using an entirely different learning algorithm) (Figure 4). The choice of a performance metric is largely problem-dependent. For example, while accuracy is often an intuitive and convenient metric for a binary classification problem, it may mask poor model performance when the task is to predict an exceedingly infrequent event, such as the occurrence of a rare disease. In such a scenario, a classifier which indiscriminately labelled all patients as ‘normal’ might achieve near-perfect accuracy while misclassifying every occurrence of the disease. In the medical literature, the performance of a classifier is typically reported using a receiver-operating characteristic curve along with its corresponding area under the curve (AUC or C-statistic), which allows for the quantification of both sensitivity and specificity at every classifier threshold. Other performance metrics also exist, such as the mean squared error. Importantly, on datasets of real-life complexity, ML may achieve very high accuracy without guaranteeing correct predictions. Given any task, it is then merely a matter of experimentation to determine the optimal model with the greatest potential for generalizability. Theoretically, the lowest attainable error is known as the Bayes error rate. This ceiling on performance exists because most phenomena studied in the natural world are permeated by noise.
Consider, for example, two patients characterized by identical clinical parameters; while it may be reasonable to assume that such individuals will, on average, experience similar clinical outcomes, it is impossible to make such a claim with absolute certainty because the system being approximated is probabilistic rather than deterministic.
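The rare-disease scenario above is easy to reproduce numerically: on a 1% prevalence outcome, a classifier that labels every patient 'normal' scores 99% accuracy while detecting no cases, and the AUC exposes its complete lack of discrimination. The prevalence and sample size below are arbitrary, for illustration.

```python
# Accuracy can mask poor performance on a rare outcome; AUC does not.
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(0)
y_true = np.zeros(1000, dtype=int)
y_true[:10] = 1                      # a rare disease: 1% prevalence
rng.shuffle(y_true)

y_pred = np.zeros(1000, dtype=int)   # indiscriminate classifier: all 'normal'
y_score = np.full(1000, 0.5)         # constant score: no discrimination at all

print(accuracy_score(y_true, y_pred))   # 0.99 despite missing every case
print(roc_auc_score(y_true, y_score))   # 0.5, i.e. chance level
```

This is why class-imbalanced clinical problems are usually reported with AUC (or sensitivity/specificity) rather than raw accuracy.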


Table 1  An overview of algorithms commonly used in machine learning

Artificial neural networks (ANN): An ANN consists of a set of nodes (often referred to as ‘neurons’) configured in layers (input, hidden, and output), connected to one another via weighted edges. Input feature vectors are processed sequentially by every layer in the net via non-linear transformations, before an output (e.g. a class label) is generated upon reaching the final layer. During the training process, if the output of the ANN is incorrect, the edge weights are incrementally adjusted to account for the error via an algorithm known as back-propagation. The ANN is foundational to deep learning.

Support vector machine (SVM): The SVM classifier is constructed by projecting training data into a higher dimensional space via mappings known as kernels, and devising in this new space a boundary (formally known as a hyperplane) which maximizes separation between the classes. New examples are then projected into this higher dimensional space, where the previously learned boundary is used to assign labels.

Decision tree: The decision tree is the simplest tree-based machine learning model. The aim is to recursively construct a tree structure which can accurately assign labels given an input feature vector by creating the appropriate ‘splits’, a process known as recursive partitioning. Importantly, trees can be combined using ensemble learning to yield potent classifiers such as random forests and boosted trees.

k-Nearest neighbours (KNN): In KNN, every object being classified is compared to its k nearest training examples via a distance function, where k is an integer; its label is then assigned by majority vote.
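The four algorithm families of Table 1 share a common fit/predict interface in scikit-learn, so they can be compared on the same data in a few lines. The dataset and all model settings below are arbitrary, chosen purely for demonstration.

```python
# Illustrative sketch: the four classifier families of Table 1 applied to
# the same toy dataset through scikit-learn's common fit/score interface.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)

models = {
    "ANN": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                         random_state=0),
    "SVM": SVC(kernel="rbf"),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}
accuracies = {name: m.fit(X_tr, y_tr).score(X_te, y_te)
              for name, m in models.items()}
for name, acc in accuracies.items():
    print(f"{name}: {acc:.2f}")
```

In practice the choice between these families is driven by the constraints discussed earlier: interpretability (trees), training cost (ANNs), and the kind and amount of available data.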

Domains of application

Machine learning has yet to find a prominent role in clinical cardiology. While a plethora of studies have been published during the past decade examining its potential utility in various clinical contexts, there is currently no consensus regarding the manner in which to construct and apply such models or objectively evaluate their results. In light of this, we highlight in this review the numerous domains within cardiovascular medicine where ML algorithms have been employed, ranging from coronary artery disease evaluation to heart failure phenotyping (Table 2).

Electrocardiography

Automated electrocardiogram (ECG) interpretation, an enterprise initially undertaken in the 1960s with the advent of digital ECG machines, is now almost universal.28 It was the first instance in which rudimentary AI (likely a rule-based expert system) effectively


streamlined hospital care and cut costs.29,30 Modern ML models are now able to identify different wave morphologies (QRS complexes, P and T waves) with high precision; using this information, clinically significant parameters such as heart rate, axis deviation, and interval lengths can then be calculated. Models have also been proposed for the high-fidelity detection of both ST-changes and common rhythm disturbances, such as atrial fibrillation and interventricular conduction delay (more complex arrhythmias, however, still often necessitate human validation). At the root of this automation are storied ML algorithms, including ANNs, SVMs, and hidden Markov models.31 In recent years, numerous ECG pre-processing, feature extraction, and classifier algorithms have been detailed in an ever-expanding literature, using tools borrowed primarily from the fields of signal processing and wavelet analysis.32–37 The accuracy of these models has historically been evaluated using annotated public-domain ECG repositories, such as the Massachusetts Institute of Technology—Beth Israel Hospital (MIT-BIH) arrhythmia and European ST-T databases.38,39 For example, Zhao et al. proposed a simple but highly predictive framework in which ECG tracings were


Figure 4 A general outline of the step-by-step approach to machine learning.

characterized using both a wavelet transform and autoregressive modelling. These feature vectors were then used to classify the tracing as one of five common arrhythmias using an SVM ML algorithm with a Gaussian kernel. This methodology achieved test set classification accuracies of 100%, 98.66%, 100%, 99.66%, and 100% for sinus rhythm, left bundle branch block, right bundle branch block, premature ventricular contraction, and premature atrial contraction, respectively, on the MIT-BIH dataset.33 Similarly, a number of pre-processing algorithms and classifiers have been proposed for the detection of ischaemic changes. For example, a neural network for the detection of ST-changes proposed by Afsar et al.,40 using wavelet-transformed ECG signals as input, achieved a sensitivity of 90.75% and a positive predictive value of 89.2% on the European Long-Term ST-T Database. More recently, the Stanford Machine Learning Group used a 34-layer convolutional neural network (a form of deep learning) to detect a broad range of arrhythmias and found that the model exceeded average board-certified cardiologist performance in both recall and precision.41
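The classification stage of such a framework, an SVM with a Gaussian (RBF) kernel assigning one of several rhythm classes to pre-extracted feature vectors, can be sketched as follows. The feature vectors here are synthetic stand-ins generated around arbitrary class centres; in the cited work they came from wavelet transforms and autoregressive modelling of real ECG tracings.

```python
# Sketch of multi-class rhythm classification with an RBF-kernel SVM.
# Features are synthetic stand-ins for wavelet/autoregressive ECG features.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_classes, per_class, n_features = 5, 60, 6

# One synthetic cluster of feature vectors per rhythm class.
centres = rng.normal(scale=5.0, size=(n_classes, n_features))
X = np.vstack([c + rng.normal(scale=0.5, size=(per_class, n_features))
               for c in centres])
y = np.repeat(np.arange(n_classes), per_class)   # class labels 0..4

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Gaussian (RBF) kernel SVM, as in the framework described above.
clf = SVC(kernel="rbf", gamma="scale").fit(X_tr, y_tr)
print(clf.score(X_te, y_te))
```

On well-separated synthetic classes the test accuracy is near-perfect, which mirrors (but does not reproduce) the high per-class accuracies reported on the MIT-BIH dataset.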

Echocardiography

Two-dimensional echocardiography (2DE) is ubiquitously used to guide the diagnosis and downstream management of numerous cardiac pathologies. Detailed echocardiography, however, is resource-intensive, often resulting in visual estimation rather than precise calculation.42 Machine learning proposes to automate many of these processes. There are already a number of widely adopted


commercial software packages developed for the functional analysis of 2DE data (e.g. EchoPAC by GE Healthcare, QLAB by Philips, etc.), which can perform an assortment of tasks ranging from segmentation and detection of anatomic landmarks to blood tracking. All results from such analyses, however, are wholly dependent upon careful annotation of input data, which is implicitly deferred to the clinician. That is, in order to use these tools, the physician evaluating the imaging study must manually select the correct study, choose an appropriate window, and specify which parameters to compute. These steps, while trivial in the evaluation of an individual echocardiogram, become limitations when analysing significantly larger datasets. Khamis et al.14 demonstrated that apical two-chamber (A2C), apical four-chamber (A4C), and apical long-axis (ALX) images could be correctly classified using novel spatio-temporal feature extraction and supervised dictionary learning. This methodology achieved diagnostic accuracies of 97%, 91%, and 97% for A2C, A4C, and ALX images, respectively. In 2015, Knackstedt et al.15 demonstrated that the ejection fraction and longitudinal strain could be reliably and reproducibly computed from echocardiographic data using commercially available proprietary ML software. The automated values generated were compared with those obtained via visual estimation and manual tracing, demonstrating significantly improved speed (automated analysis was completed within an average of 8 s) without any loss in accuracy. More recently, a number of deep learning architectures have been proposed for left ventricular volume estimation and segmentation, as well as viewpoint classification.43,44 Finally, ML has also been utilized to detect and characterize valvular and anatomic pathology, as well as to enhance the quality of existing echocardiograms16–18,45–50 (Figure 5 and Table 2).


Figure 5 Spectrum of applications for advanced machine learning algorithms in clinical echocardiography.

Coronary artery calcium scoring and coronary computed tomography angiography

Non-invasive imaging has become instrumental in establishing the presence of coronary artery disease (CAD) and the subsequent determination of downstream prognosis. Coronary artery calcium (CAC) scoring and/or coronary computed tomography angiography (CCTA) provide the ability to assess atherosclerosis both qualitatively and quantitatively, while CCTA can further provide information on the stenosis severity of a particular atherosclerotic lesion. Machine learning algorithms have been extensively used to optimize information extraction from such imaging modalities. For example, Takx et al.51 used an ML-based approach employing supervised classification systems, including direct classification with a nearest neighbour classifier and two-stage classification with nearest neighbour and SVM classifiers, to automate CAC scoring in low-dose, non-contrast-enhanced, non-ECG-gated chest CT. They found that such an approach resulted in acceptable reliability and agreement when compared to a manually determined reference standard for CAC scoring. Similar results were obtained by Isgum et al.52,53 when comparing the accuracy of an automated CAC scoring system with that of manually annotated scans. Furthermore, Kang et al.54 used a two-step ML algorithm, utilizing an SVM algorithm as one of the base classifiers, to automate the process of coronary stenosis evaluation on CCTA. They found that such an approach resulted in an accuracy of 94% and an AUC of 0.94 for the automated detection of non-obstructive and obstructive CAD.

.. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ... .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..

For the assessment of the functional significance of atherosclerotic lesions, an automated ML-based algorithm segmented the coronary tree with and without accounting for the partial volume effect (PVE). The PVE can lead to overestimation of the lumen area in coronary vessels with a small diameter, an important parameter in determining the haemodynamic significance of a coronary lesion. As such, incorporation of the PVE improved the AUC for the detection of haemodynamically significant CAD from 0.76 to 0.80 when compared with invasively assessed fractional flow reserve (FFR).55 Machine learning has also been used to compute fractional flow reserve (FFRCT) directly from CT angiographic features without the traditional computational fluid dynamics approach, and showed incremental prognostic value for the determination of the risk of future adverse events.56–58 In related work, ML (the LogitBoost algorithm) was applied by Dey et al.20 in a multicentre study of 254 patients with CCTA to predict the probability of a low value of invasive FFR by considering a number of CCTA image features derived by vessel extraction and plaque characterization software. Machine learning exhibited a higher AUC (0.84) than any individual CCTA image measure, including stenosis severity (0.76), low-density non-calcified plaque (0.77), or total plaque volume (0.74). Such techniques may allow clinical implementation of a risk score for the identification of functionally significant lesions (low invasive FFR) by rapid non-invasive CCTA image analysis. CT-based imaging has also been used for the determination and quantification of myocardial perfusion, in an effort to increase CCTA’s accuracy for the detection of physiologically significant coronary stenosis. A trained AdaBoost classifier that incorporates three

Downloaded from https://academic.oup.com/eurheartj/advance-article-abstract/doi/10.1093/eurheartj/ehy404/5060564 by Weill Cornell Medical Library user on 02 August 2018


Table 2  Summary of contemporary data on published literature using machine learning algorithms in cardiovascular medicine

| Study | Objectives and key findings | Algorithm/tool | Sample size |
| --- | --- | --- | --- |
| Echocardiography | | | |
| Khamis et al.14 | Automated view classification of three standard echocardiographic views. Accuracy 97% for apical two-chamber, 91% for apical four-chamber, and 97% for apical long axis; overall accuracy 95% | Spatio-temporal feature extraction and dictionary learning based classification | 309 clips |
| Knackstedt et al.15 | Echocardiographic left ventricular ejection fraction determination compared with reference manual tracking. ICC 0.83; mean bias -0.3, 95% CI -1.5 to 0.9 | AutoLV (TomTec-Arena 1.2, TomTec Imaging Systems, Unterschleissheim, Germany) | 255 patients |
| Moghaddasi et al.16 | Automated echocardiographic assessment of mitral regurgitation. Accuracy of 99.5%, 99.4%, 99.3%, and 99.6% to detect none, mild, moderate, and severe mitral regurgitation, respectively | Support vector machine | 102 patients |
| Sengupta et al.17 | Differentiation between constrictive pericarditis and restrictive cardiomyopathy. AUC 0.962, accuracy 93.7% (speckle tracking) | Associative memory classifier-based machine learning algorithm | 94 patients |
| Narula et al.18 | Discrimination of hypertrophic cardiomyopathy from physiological hypertrophy seen in athletes. AUC 0.795; sensitivity 87% and specificity 82% at the optimal cut-off points of the model | Ensemble model building (artificial neural network, support vector machines, and random forest) | 143 patients |
| Computed tomography | | | |
| Motwani et al.4 | Prediction of 5-year all-cause mortality among patients undergoing CCTA for suspected CAD. AUC 0.79 using clinical and CCTA variables; AUC 0.64 for segment stenosis score | LogitBoost | 10 030 patients |
| Han et al.19 | Prediction of abnormal FFR among patients undergoing CCTA (with resting CT myocardial perfusion analysis derived using ML). AUC 0.75 for model including ML-derived CT perfusion analysis vs. AUC 0.68 without CT perfusion | Gradient boosting classifier for CT perfusion analysis | 252 patients |
| Dey et al.20 | Prediction of FFR using semi-automated quantitative CCTA to derive plaque information. AUC 0.84 for the integrated ML model, compared with 0.77 for low-density non-calcified plaque volume | Boosted ensemble algorithm | 254 patients |
| van Rosendael et al.21 | Prediction of major cardiovascular events among patients undergoing CCTA for suspected CAD; only CCTA variables were used by the ML model. AUC 0.771 for ML model; AUC 0.701 for segment stenosis score | Extreme gradient boosting (XGBoost) | 8844 patients |
| Myocardial perfusion imaging | | | |
| Arsanjani et al.22 | Prediction of obstructive CAD (≥70% stenosis) on ICA from automated single photon MPI analysis. AUC 0.94 for ML model (clinical and quantitative MPI variables); significantly higher than expert MPI reading | LogitBoost | 1181 patients |
| Arsanjani et al.23 | Prediction of early coronary revascularization by quantitative MPI analysis. AUC 0.81 for ML model including MPI and clinical variables; AUC 0.77 for standalone perfusion measures | LogitBoost | 713 patients |
| Betancur et al.24 | Prediction of MACE using clinical information combined with MPI data, integrated by ML. AUC 0.81 for ML model; significantly higher than an ML model including only MPI data (AUC 0.78), five-point scale visual diagnosis by a physician (AUC 0.65), or standard image quantification (AUC 0.71) | Boosted ensemble algorithm | 2619 patients |
| Dey et al.25 | Prediction of impaired myocardial flow reserve on PET using quantitative plaque features from CCTA. AUC 0.83 for composite score including all quantitative CCTA features; AUC 0.66 for quantitative stenosis severity | Boosted ensemble algorithm | 51 patients |
| Heart failure | | | |
| Frizzell et al.26 | Prediction of 30-day readmission of patients with heart failure. AUC for five ML models using different algorithms ranging from 0.607 to 0.624; AUC 0.589 for a previously validated electronic health records model | Tree-augmented naive Bayesian network, random forest algorithm, and gradient-boosted model | 56 477 patients |
| Mortazavi et al.27 | Prediction of 30-day hospital readmission rates among patients with heart failure. AUC 0.628 for ML model predicting all-cause readmission and 0.678 for heart failure readmission vs. 0.533 and 0.543 for logistic regression, respectively | Random forests, boosting, and random forests combined hierarchically with support vector machines | 977 patients |

AUC, area under the receiver operating characteristic curve; CAD, coronary artery disease; CCTA, coronary computed tomography angiography; CI, confidence interval; FFR, fractional flow reserve; ICA, invasive coronary angiography; ICC, intraclass correlation coefficient; MACE, major adverse cardiovascular events; ML, machine learning; MPI, myocardial perfusion imaging; PET, positron emission tomography; SPECT, single photon emission computed tomography.
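Many of the algorithms in Table 2 (AdaBoost, LogitBoost, gradient boosting, XGBoost) are boosted ensembles: weighted sums of weak learners fitted iteratively, with misclassified samples up-weighted at each round. The mechanism can be illustrated with a toy AdaBoost over single-feature threshold stumps (pure Python; the two features and the labels are synthetic and hypothetical, not drawn from any cited study):

```python
# Toy AdaBoost with decision stumps, sketching the boosting principle behind
# the ensemble methods in Table 2. Synthetic data; real studies used
# dedicated libraries on large sets of clinical and imaging features.
import math

def stump_predict(x, feat, thresh, sign):
    # Weak learner: threshold one feature, predict +1/-1.
    return sign if x[feat] > thresh else -sign

def fit_adaboost(X, y, rounds=5):
    n = len(X)
    w = [1.0 / n] * n              # uniform sample weights to start
    ensemble = []
    for _ in range(rounds):
        best = None                # (weighted error, feat, thresh, sign)
        for feat in range(len(X[0])):
            for thresh in sorted({x[feat] for x in X}):
                for sign in (1, -1):
                    err = sum(wi for xi, yi, wi in zip(X, y, w)
                              if stump_predict(xi, feat, thresh, sign) != yi)
                    if best is None or err < best[0]:
                        best = (err, feat, thresh, sign)
        err, feat, thresh, sign = best
        err = max(err, 1e-10)      # avoid log(0) for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, feat, thresh, sign))
        # Up-weight misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, feat, thresh, sign))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(a * stump_predict(x, f, t, s) for a, f, t, s in ensemble)
    return 1 if score > 0 else -1

# Hypothetical per-patient features (e.g. a perfusion measure, a plaque
# measure) with synthetic labels: 1 = obstructive CAD, -1 = none.
X = [(0.2, 0.9), (0.3, 0.8), (0.8, 0.3), (0.9, 0.2), (0.4, 0.7), (0.7, 0.4)]
y = [-1, -1, 1, 1, -1, 1]
model = fit_adaboost(X, y)
print([predict(model, x) for x in X])
```

Each round selects the stump with the lowest weighted error, weights it by alpha = 0.5 ln((1 - err)/err), and re-weights samples so the next stump concentrates on previous mistakes; the clinical studies applied the same principle at scale.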

A trained AdaBoost classifier incorporating three myocardial features obtained from rest CCTA images (normalized perfusion intensity, transmural perfusion ratio, and myocardial wall thickness) showed a sensitivity of 0.79 and a specificity of 0.64, with an overall accuracy of 0.70, for establishing the presence of obstructive CAD.59 Additionally, a supervised approach using a gradient boosting classifier to analyse resting CT perfusion images improved the ability to detect ischaemia (defined as invasive FFR