Applied Sciences | Article

Visualization and Interpretation of Convolutional Neural Network Predictions in Detecting Pneumonia in Pediatric Chest Radiographs

Sivaramakrishnan Rajaraman *, Sema Candemir, Incheol Kim, George Thoma and Sameer Antani

Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, MD 20894, USA; [email protected] (S.C.); [email protected] (I.K.); [email protected] (G.T.); [email protected] (S.A.)
* Correspondence: [email protected]; Tel.: +1-301-827-2383

Received: 25 August 2018; Accepted: 18 September 2018; Published: 20 September 2018
Abstract: Pneumonia affects 7% of the global population, resulting in 2 million pediatric deaths every year. Chest X-ray (CXR) analysis is routinely performed to diagnose the disease. Computer-aided diagnostic (CADx) tools aim to supplement decision-making. These tools process handcrafted and/or convolutional neural network (CNN)-extracted image features for visual recognition. However, CNNs are perceived as black boxes since their performance lacks explanation. This is a serious bottleneck in applications involving medical screening/diagnosis since poorly interpreted model behavior could adversely affect the clinical decision. In this study, we evaluate, visualize, and explain the performance of customized CNNs to detect pneumonia and further differentiate between bacterial and viral types in pediatric CXRs. We present a novel visualization strategy to localize the region of interest (ROI) that is considered relevant for model predictions across all the inputs that belong to an expected class. We statistically validate the models' performance toward the underlying tasks. We observe that the customized VGG16 model achieves 96.2% and 93.6% accuracy in detecting the disease and distinguishing between bacterial and viral pneumonia, respectively. The model outperforms the state-of-the-art in all performance metrics and demonstrates reduced bias and improved generalization.

Keywords: computer vision; computer-aided diagnosis; convolutional neural networks; pediatric; pneumonia; visualization; explanation; chest X-rays; clinical decision
1. Introduction

Pneumonia is a significant cause of mortality in children across the world. According to the World Health Organization (WHO), around 2 million pneumonia-related deaths are reported every year in children under 5 years of age, making it the most significant cause of pediatric death [1]. Bacterial and viral pathogens are the two leading causes of the disease and require different forms of management [2]. Bacterial pneumonia is immediately treated with antibiotics, while viral pneumonia requires supportive care, making timely and accurate diagnosis important. Chest X-ray (CXR) analysis is the most commonly performed radiographic examination for diagnosing and differentiating the types of pneumonia [3]. However, rapid radiographic diagnosis and treatment are adversely impacted by the lack of expert radiologists in resource-constrained regions where pediatric pneumonia is highly endemic with alarming mortality rates. Figure 1 shows sample instances of normal and infected pediatric CXRs.
Figure 1. Pediatric CXRs: (a) Normal CXR showing clear lungs with no abnormal opacification; (b) Bacterial pneumonia exhibiting focal lobar consolidation in the right upper lobe; (c) Viral pneumonia manifesting with diffuse interstitial patterns in both lungs.
Computer-aided diagnostic (CADx) tools aim to supplement clinical decision-making. They combine elements of computer vision and artificial intelligence with radiological image processing for recognizing patterns [4]. Much of the published literature describes machine learning (ML) algorithms that use handcrafted feature descriptors [5] that are optimized for individual datasets and trained for specific variability in size, orientation, and position of the region of interest (ROI) [6]. In recent years, data-driven deep learning (DL) methods have been shown to avoid the issues with handcrafted features through end-to-end feature extraction and classification.

Convolutional neural networks (CNNs) belong to a class of DL models that are prominently used in computer vision [7]. These models have multiple processing layers to learn hierarchical feature representations from the input pixel data. The features in the early network layers are abstracted through the mechanisms of local receptive fields, weight sharing, and pooling to form rich feature representations toward learning and classifying the inputs to their respective classes. Due to the lack of sufficiently extensive medical image data, CNNs trained on large-scale data collections such as ImageNet [8] are used to transfer the knowledge of learned representations, in the form of generic image features, to the current task. CNNs are also shown to deliver promising results in object detection and localization tasks [9].

The astounding success of deep CNNs coupled with the lack of explainable decision-making has resulted in a perception of doubt. This poorly understood model behavior has limited their use in routine clinical practice [10]. There are not enough studies pertaining to the visualization and interpretation of CNNs in medical image analysis/understanding applications. In this article, we (i) detect and distinguish pneumonia types in pediatric CXRs, and (ii) explain the internal operations and predictions of CNNs applied to this challenge.

In this study, we evaluate, visualize, and explain the predictions of CNN models in classifying pediatric CXRs to detect pneumonia and furthermore to differentiate between bacterial and viral pneumonia to facilitate swift referrals that require urgent medical intervention. We propose a novel method to visualize the class-specific ROI that is considered significant for correct predictions across all the inputs that belong to an expected class. We evaluate and statistically validate the performance of different customized CNNs that are trained end-to-end on the dataset under study to provide an accurate and timely diagnosis of the pathology. The work is organized as follows: Section 2 discusses the related work, Section 3 elaborates on the materials and methods, Section 4 discusses the results, and Section 5 concludes the study.

2. Related Work

A study of the literature reveals several works pertaining to the use of handcrafted features for detecting pneumonia in chest radiographs [11–14]. However, few studies reported the performance of DL methods applied to pneumonia detection in pediatric CXRs. Relatively few researchers attempted to
offer a qualitative explanation of their model's learned behavior, internal computations, and predictions. The authors of [15] used a pretrained InceptionV3 model as a fixed feature extractor to classify normal and pneumonia-infected pediatric CXRs and further distinguish between bacterial and viral pneumonia with an area under the curve (AUC) of 0.968 and 0.940, respectively. In another study [4], the authors used a gradient-based ROI localization algorithm to detect and spatially locate pneumonia in CXRs. They released the largest collection of the National Institutes of Health (NIH) CXR dataset, which contains 112,120 frontal CXRs; the associated labels are text-mined from radiological reports using natural language processing tools. The authors reported an AUC of 0.633 toward detecting the disease. The authors of [16] used a gradient-based visualization method to localize the ROI with heat maps toward pneumonia detection. They used a 121-layer densely connected neural network toward estimating the disease probability and obtained an AUC of 0.768 toward detecting pneumonia. The authors of [17] used an attention-guided mask inference algorithm to locate salient image regions that are indicative of pneumonia. The features of local and global network branches in the proposed model are concatenated to estimate the probability of the disease. An AUC of 0.776 is reported for pneumonia detection.

3. Materials and Methods

3.1. Data Collection and Preprocessing

We used a set of pediatric CXRs that have been made publicly available by the authors of [15]. The authors have obtained approvals from the Institutional Review Board (IRB) and Ethics Committee toward data collection and experimentation. The dataset includes anteroposterior CXRs of children from 1 to 5 years of age collected from Guangzhou Women and Children's Medical Center in Guangzhou, China. The characteristics of the data and its distribution are shown in Table 1. The dataset is screened for quality control to remove unreadable and low-quality radiographs and curated by experts to avoid grading errors.
Table 1. Dataset and its characteristics.

| Category  | Training Samples | Test Samples | File Type |
|-----------|------------------|--------------|-----------|
| Normal    | 1349             | 234          | JPG       |
| Bacterial | 2538             | 242          | JPG       |
| Viral     | 1345             | 148          | JPG       |
The CXRs contain regions other than the lungs that do not contribute to diagnosing pneumonia. Under these circumstances, the model may learn irrelevant feature representations from the underlying data. Using an algorithm based on anatomical atlases [18] to automatically detect the lung ROI avoids this. A reference set of patient CXRs with expert-delineated lung masks is used as models [19] to register with the objective pediatric CXR. When presented with an objective chest radiograph, the algorithm uses the Bhattacharyya distance measure to select the most similar model CXRs. The correspondence between the model CXRs and the objective CXR is computed by modeling the objective CXR with local image feature representations and identifying similar locations by applying the SIFT-flow algorithm [20]. This map defines the transformation applied to the model lung masks to transform them into the approximate lung model for the objective chest radiograph. The lung boundaries are cropped to the size of a bounding box that includes all the lung pixels constituting the ROI for the current task. The baseline data (whole CXRs) and the cropped bounding boxes are resampled to 1024 × 1024 pixel dimensions and mean-normalized to assist the models in faster convergence. The detected lung boundaries for sample pediatric CXRs are shown in Figure 2.
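To make the final preprocessing step concrete, the following is a minimal Python sketch of cropping a CXR to the lung bounding box, resampling to 1024 × 1024 pixels, and mean-normalizing. The binary lung mask is assumed to come from the atlas-based detection algorithm described above, and the function name is our own:

```python
import cv2
import numpy as np

def crop_resize_normalize(cxr, lung_mask, size=1024):
    """Crop to the bounding box enclosing all lung pixels, resample to a
    fixed resolution, and mean-normalize (as described in Section 3.1)."""
    ys, xs = np.nonzero(lung_mask)
    cropped = cxr[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    resized = cv2.resize(cropped, (size, size),
                         interpolation=cv2.INTER_AREA).astype(np.float32)
    return resized - resized.mean()  # mean normalization
```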
Figure 2. Detected boundaries in sample pediatric CXRs.
3.2. Configuring CNNs for Pneumonia Detection

We evaluated the performance of different customized CNNs and a VGG16 model in detecting pneumonia and furthermore distinguishing between bacterial and viral types to facilitate timely and accurate disease diagnosis. We evaluated three different customized CNN architectures: (i) sequential CNN; (ii) CNN with residual connections (residual CNN); and, (iii) CNN with Inception modules (Inception CNN).

3.2.1. Sequential CNN

A sequential CNN model belongs to the class of deep, feed-forward artificial neural networks that are commonly applied to visual recognition [7]. It is a linear stack of convolutional, nonlinear, pooling, and dense layers. We optimized the sequential CNN architecture and its hyperparameters for the datasets under study through Bayesian learning [21,22]. The procedure uses a Gaussian process model of an objective function and its evaluation to optimize the network depth, learning rate, momentum, and L2-regularization. These parameters are passed as arguments in the form of optimization variables to evaluate the objective function. We initialized the search ranges to [1, 10], [1 × 10−7, 1 × 10−1], [0.7, 0.99], and [1 × 10−10, 1 × 10−2] for the network depth, learning rate, momentum, and L2-regularization, respectively. The objective function takes these variables as input, trains, validates, and saves the optimal network that gives the minimum classification error on the test data. Figure 3 illustrates the steps involved in optimization.
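A sketch of this procedure is given below using scikit-optimize's gp_minimize, which fits a Gaussian process to the objective. The search ranges mirror those stated above, while train_and_evaluate is a hypothetical helper standing in for building, training, and scoring the candidate network (the paper's Matlab implementation differs in detail):

```python
from skopt import gp_minimize
from skopt.space import Integer, Real

# Optimization variables and search ranges, as stated above.
space = [
    Integer(1, 10),                          # network depth
    Real(1e-7, 1e-1, prior="log-uniform"),   # learning rate
    Real(0.7, 0.99),                         # momentum
    Real(1e-10, 1e-2, prior="log-uniform"),  # L2-regularization
]

def objective(params):
    depth, lr, momentum, l2 = params
    # Hypothetical helper: builds the sequential CNN with these
    # hyperparameters, trains it, and returns the test error.
    return train_and_evaluate(depth, lr, momentum, l2)

# Gaussian process model of the objective; the paper reports
# 100 objective function evaluations.
result = gp_minimize(objective, space, n_calls=100, random_state=0)
best_depth, best_lr, best_momentum, best_l2 = result.x
```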
3.2.2. Residual CNN

In a sequential CNN, the succeeding network layer learns the feature representations from only the preceding layer. These networks are constrained by the level of information they can process. Residual networks, proposed by [23], won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2015. These networks tackle the issue of representational bottlenecks by injecting the information from the earlier network layers downstream to prevent loss of information. They also prevent the gradients from vanishing by introducing a linear information carry track to propagate gradients through deep network layers. In this study, we propose a customized CNN that is made up of six residual blocks, as shown in Figure 4.

3.2.3. Inception CNN

The Inception architecture, proposed by [24], consists of independent modules having parallel branches that are concatenated to form the resultant feature map that is fed into the succeeding modules. Unlike a sequential CNN, this method of stacking modules helps in separately learning the spatial and channel-wise feature representations. The 1 × 1 convolution filters used in these modules factor out the channel and spatial feature learning by computing features from the channels without mixing spatial information, looking at one input tile at a given point in time. We construct a customized Inception CNN by stacking six InceptionV3 modules [23], as shown in Figure 5.
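To make the two building blocks concrete, here is a minimal Keras sketch of a residual block and a simplified Inception-style module. The filter counts and the exact branch configurations of the paper's six-block models are not specified in the text, so the values below are illustrative:

```python
from tensorflow.keras import layers

def residual_block(x, filters):
    """One residual block: two 3x3 convolutions plus a shortcut that
    carries information (and gradients) past the block."""
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    if shortcut.shape[-1] != filters:  # match channel count on the shortcut
        shortcut = layers.Conv2D(filters, 1, padding="same")(shortcut)
    return layers.Activation("relu")(layers.Add()([y, shortcut]))

def inception_module(x, filters):
    """Inception-style module: parallel 1x1 / 3x3 / 5x5 / pooling branches,
    concatenated; 1x1 convolutions factor channel-wise from spatial learning."""
    b1 = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)
    b2 = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)
    b2 = layers.Conv2D(filters, 3, padding="same", activation="relu")(b2)
    b3 = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(filters, 5, padding="same", activation="relu")(b3)
    b4 = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    b4 = layers.Conv2D(filters, 1, padding="same", activation="relu")(b4)
    return layers.Concatenate()([b1, b2, b3, b4])

# Example: stack six blocks, doubling the filters in succeeding blocks.
inputs = layers.Input(shape=(1024, 1024, 3))
x = inputs
for i in range(6):
    x = residual_block(x, 16 * 2 ** i)  # or inception_module(x, 16 * 2 ** i)
    x = layers.MaxPooling2D(2)(x)
```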
Figure 3. Flowchart describing the optimization procedure.
Figure 4. The architecture of the customized residual CNN: (a) Residual block; (b) Customized residual CNN stacked with six residual blocks.
Figure 5. The architecture of customized InceptionV3 CNN: (a) InceptionV3 module; (b) Customized Inception CNN stacked with six InceptionV3 modules.
3.2.4. Customized VGG16

VGG16 is proposed and trained by Oxford's Visual Geometry Group (VGG) [25] for object recognition. The model scored first in the ILSVRC image localization task and second in the image classification task. We customized the architecture of the VGG16 model and evaluated its performance toward the tasks of interest. The model is truncated at the deepest convolutional layer, and a global average pooling (GAP) layer and a dense layer are added, as shown in Figure 6. We refer to this model as customized VGG16 in this study.

The hyperparameters of the customized residual, Inception, and VGG16 models are optimized through a randomized grid search [26] that searches and optimizes the values of hyperparameters including the learning rate, momentum, and L2-regularization. The search ranges are initialized to [1 × 10−6, 1 × 10−1], [0.7, 0.99], and [1 × 10−10, 1 × 10−1] for the learning rate, momentum, and L2-regularization, respectively. Callbacks are used to view the internal states during training and retain the best performing model for analysis. We performed hold-out testing with the test data after every step. The performance of the customized CNNs is evaluated in terms of the following performance metrics: (i) accuracy; (ii) AUC; (iii) precision; (iv) recall; (v) specificity; (vi) F-score; and, (vii) Matthews correlation coefficient (MCC). We used the NIH Biowulf Linux cluster (https://hpc.nih.gov/) and the high performance computing facility at the National Library of Medicine (NLM) for computational analyses. Software frameworks include Matlab R2017b, used to configure and evaluate the sequential CNN, and Keras with a Tensorflow backend for the other customized models used in this study.
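A minimal Keras sketch of the customized VGG16 follows: the ImageNet-pretrained convolutional base is kept up to its deepest convolutional layer, and a GAP and dense layer are appended. The single-unit sigmoid head and the SGD settings (taken from the optimized values in Table 2) are assumptions for a binary task; the multi-class variant would use a three-way softmax:

```python
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.applications import VGG16

# VGG16 without its fully connected head, i.e., truncated at the
# deepest convolutional layer, initialized with ImageNet weights.
base = VGG16(weights="imagenet", include_top=False,
             input_shape=(1024, 1024, 3))

x = layers.GlobalAveragePooling2D(name="gap")(base.output)
output = layers.Dense(1, activation="sigmoid")(x)  # binary head (assumed)

model = models.Model(base.input, output)
model.compile(optimizer=optimizers.SGD(learning_rate=1e-4, momentum=0.99),
              loss="binary_crossentropy", metrics=["accuracy"])
```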
Figure 6. VGG16 model truncated at the deepest convolutional layer and added with a GAP and dense layer.
3.3. Visualization Studies

The interpretation and understanding of CNNs is a hotly debated topic in ML, particularly in the context of clinical decision-making [4]. CNNs are perceived as black boxes and it is imperative to explain their working to build trust in their predictions [9]. This helps to understand their working principles, assist in hyperparameter tuning and optimization, identify and get an intuition of the reason behind model failures, and explain the predictions to the end-user prior to possible deployment. The methods of visualizing CNNs are broadly categorized into (i) preliminary methods that help to visualize the overall structure of the model; and, (ii) gradient-based methods that manipulate the gradients from the forward and backward pass during training [27]. We demonstrated the overall structure of the CNNs, as shown in Figures 4–6.

3.3.1. Visual Explanation through Discriminative Localization

The trained model focuses on discriminative parts of the image to arrive at the predictions. Class activation maps (CAM) help in visualizing and debugging model predictions, particularly in case of a prediction error when the model predicts based on the surrounding context [27]. The output of the GAP layer is fed to the dense layer to identify the discriminative ROI localized to classify the inputs to their respective classes. Let G_m denote the GAP that spatially averages the m-th feature map from the deepest convolutional layer, and w_m^p denote the weight connecting the m-th feature map to the output neuron corresponding to the expected class p. A prediction score S_p at the output neuron is expressed as a weighted sum of GAP as shown in Equation (1).

$$ S_p = \sum_{m} w_m^p \sum_{x,y} g_m(x, y) = \sum_{x,y} \sum_{m} w_m^p\, g_m(x, y) \qquad (1) $$

The value g_m(x, y) denotes the m-th feature map activation at the spatial location (x, y). The CAM for the class p, denoted by CAM_p, is expressed as the weighted sum of the activations from all the feature maps with respect to the expected class p at the spatial location (x, y) as shown in Equation (2).

$$ CAM_p(x, y) = \sum_{m} w_m^p\, g_m(x, y) \qquad (2) $$

CAM gives information pertaining to the importance of the activations at each spatial grid (x, y) in classifying an input image to its expected class p. It is rescaled to the size of the input image to locate the discriminative ROI used to classify the image to its expected class. This helps to answer queries pertaining to the ability of the model in predicting and localizing the ROI specific to its category. We propose a novel visualization method called average-CAM to represent the class-level ROI that is most commonly considered significant for correct prediction across all the inputs that belong to a given class. The average-CAM for the class p is computed by averaging the CAM outputs over the N_p images of that class, as shown in Equation (3).

$$ average\text{-}CAM_p(x, y) = \frac{1}{N_p} \sum_{a=1}^{N_p} CAM_p^a(x, y) \qquad (3) $$

CAM_p^a(x, y) denotes the CAM for the a-th image in the expected class p. This helps to identify the ROI specific to the expected class, improve the interpretability of the internal representations, and explain the model predictions.
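The computation in Equations (2) and (3) can be sketched in a few lines of Python for a Keras model like the customized VGG16 above; the convolutional layer name and the min-max rescaling for display are assumptions:

```python
import numpy as np
import tensorflow as tf

def cam(model, image, class_idx, conv_layer="block5_conv3"):
    """CAM (Equation (2)): dense-layer weights for class p applied as a
    weighted sum over the deepest convolutional feature maps."""
    conv_model = tf.keras.Model(model.input,
                                model.get_layer(conv_layer).output)
    fmaps = conv_model(image[np.newaxis], training=False)[0].numpy()  # (h, w, m)
    w_p = model.layers[-1].get_weights()[0][:, class_idx]             # w_m^p
    heatmap = np.tensordot(fmaps, w_p, axes=([2], [0]))               # (h, w)
    # Min-max rescale for display; upsample to input size before overlay.
    return (heatmap - heatmap.min()) / (heatmap.ptp() + 1e-8)

def average_cam(model, images, class_idx):
    """Average-CAM (Equation (3)): class-level ROI averaged over all
    images of the expected class."""
    return np.mean([cam(model, im, class_idx) for im in images], axis=0)
```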
CAM visualization can only be applied to networks with a GAP layer. Gradient-weighted CAM (grad-CAM) is a strict generalization of CAM that can be applied to all existing CNNs [28]. It uses the gradient information of the expected class, flowing back into the deepest convolutional layer, to generate explanations. Grad-CAM produces the weighted sum of all the feature maps in the deepest convolutional layer for the expected class p as shown in Equation (4). A ReLU nonlinearity is applied to avoid the negative weights from influencing the class p. This is based on the consideration that pixels with negative weights are likely to belong to other classes.

$$ grad\text{-}CAM_p(x, y) = ReLU\left(\sum_{m} \beta_m^p\, g_m(x, y)\right) \qquad (4) $$

The value β_m^p is obtained by computing the gradient of the prediction score S_p with respect to the m-th feature map as shown in Equation (5).

$$ \beta_m^p = \sum_{x,y} \frac{\partial S_p}{\partial g_m(x, y)} \qquad (5) $$

According to Equations (1) and (4), β_m^p is precisely the same as w_m^p for networks with a CAM-compatible architecture. The difference lies in applying the ReLU non-linearity to exclude the influence of negative weights that are likely to belong to other classes. The average-grad-CAM for the class p is computed by averaging the grad-CAM outputs, as shown in Equation (6). The value grad-CAM_p^a(x, y) denotes the grad-CAM for the a-th image in the expected class p.

$$ average\text{-}grad\text{-}CAM_p(x, y) = \frac{1}{N_p} \sum_{a=1}^{N_p} grad\text{-}CAM_p^a(x, y) \qquad (6) $$
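A grad-CAM sketch under the same assumptions follows. The pooled gradients implement Equation (5) (here averaged rather than summed over the spatial grid, which changes the map only by a constant factor), and the ReLU-weighted sum implements Equation (4):

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image, class_idx, conv_layer="block5_conv3"):
    """Grad-CAM (Equations (4)-(5)): deepest conv feature maps weighted
    by the pooled gradients of the class score, passed through a ReLU."""
    grad_model = tf.keras.Model(
        model.input,
        [model.get_layer(conv_layer).output, model.output])
    with tf.GradientTape() as tape:
        fmaps, preds = grad_model(image[np.newaxis])
        score = preds[:, class_idx]                 # prediction score S_p
    grads = tape.gradient(score, fmaps)             # dS_p / dg_m(x, y)
    betas = tf.reduce_mean(grads, axis=(0, 1, 2))   # beta_m^p (Equation (5))
    heatmap = tf.nn.relu(tf.reduce_sum(fmaps[0] * betas, axis=-1))  # Eq. (4)
    return (heatmap / (tf.reduce_max(heatmap) + 1e-8)).numpy()
```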
3.3.2. Model-Agnostic Visual Explanations

Local interpretable model-agnostic explanations (LIME) is a visualization tool proposed by [29]. It helps to provide a qualitative interpretation of the relationship between perturbed input instances and the model predictions. The input image is divided into contiguous superpixels, and a dataset of perturbed input instances is constructed by turning on/off these interpretable components. The perturbed instances are weighted by their similarity to the explained instance. The algorithm approximates the CNN by a sparse, linear model that is weighted only in the neighborhood of the explained predictions. An explanation is generated in the form of the superpixels with the highest positive weights that demonstrate the discriminative ROI localized by the model to classify the image to its expected class. Let k ∈ ℝ^d be the explained instance, and k′ ∈ {0, 1}^d′ the binary vector that denotes the presence/absence of a superpixel. Let g ∈ G denote the explanation, where G is a class of interpretable linear models. Let γ(g) denote the complexity measure associated with the explanation g ∈ G; the value γ(g) is the number of non-zero coefficients of the linear model. Let m: ℝ^d → ℝ denote the explained model and m(k) the probability that k belongs to a given class. Let Π_k(x) denote the measure of proximity between an instance x and k, and P(m, g, Π_k) denote the loss of g toward approximating m in the neighborhood defined by Π_k. The value P(m, g, Π_k) is minimized while the value of γ(g) remains low enough for interpretability. Equation (7) gives the explanations produced by LIME.

$$ \beta(k) = \underset{g \in G}{\operatorname{argmin}}\; P(m, g, \Pi_k) + \gamma(g) \qquad (7) $$

The value P(m, g, Π_k) is approximated by drawing samples weighted by Π_k. Equation (8) shows an exponential kernel defined on the L2-distance function J with width ε. For a given perturbed input sample b′ ∈ {0, 1}^d′ containing a fraction of the non-zero elements, the label for the explanation model, m(b), is obtained by recovering the sample in the original representation b ∈ ℝ^d, as shown in Equation (9).

$$ \Pi_k(b) = \exp\left(\frac{-J(k, b)^2}{\epsilon^2}\right) \qquad (8) $$

$$ P(m, g, \Pi_k) = \sum_{b, b' \in B} \Pi_k(b)\,\big(m(b) - g(b')\big)^2 \qquad (9) $$

LIME provides explanations that help to make an informed decision about the trustworthiness of the predictions and gain crucial insights into the model behavior.
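Generating such an explanation with the publicly available lime package might look as follows; model and cxr_image are assumed to be the trained Keras classifier and an RGB test image from the earlier sections, and the parameter values are illustrative:

```python
import numpy as np
from lime import lime_image
from skimage.segmentation import mark_boundaries

def classifier_fn(batch):
    """Map a batch of perturbed images to class probabilities."""
    p = model.predict(batch)          # trained Keras model (assumed)
    return np.hstack([1.0 - p, p])    # two columns: [normal, pneumonia]

explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(
    cxr_image,            # explained instance k (assumed RGB array)
    classifier_fn,
    top_labels=2,
    hide_color=0,         # superpixels are "turned off" to black
    num_samples=1000)     # perturbed samples drawn around k

# Superpixels with the highest positive weights form the explanation.
overlay, mask = explanation.get_image_and_mask(
    explanation.top_labels[0], positive_only=True,
    num_features=5, hide_rest=False)
result = mark_boundaries(overlay / 255.0, mask)
```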
4. Results and Discussion

4.1. Performance Evaluation of Customized CNNs

Figure 7 shows the optimized architecture and parameters of the sequential CNN obtained through Bayesian learning. We performed 100 objective function evaluations toward optimizing the model parameters. The optimized values are found to be 6, 1 × 10−3, 0.9, and 1 × 10−6 for the network depth, learning rate, momentum, and L2-regularization parameters, respectively. The number of convolutional layer filters is increased by a factor of 2 each time a max-pooling layer is used, in order to ensure roughly the same number of computations in the network layers, as sketched below. Rectified Linear Unit (ReLU) layers are added to introduce non-linearity and prevent vanishing gradients during backpropagation [7].
Figure 7. The optimized architecture of the customized sequential CNN.
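The filter-doubling scheme can be written down directly. The following Keras sketch mirrors the optimized depth-6 architecture in spirit; the starting filter count and exact layer ordering are assumptions, since Figure 7 rather than the text specifies them:

```python
from tensorflow.keras import layers, models, regularizers

def build_sequential_cnn(depth=6, l2=1e-6, num_classes=2):
    """Sequential CNN sketch: filters double after each max-pooling layer
    so per-layer computation stays roughly constant; ReLU follows each
    convolution to avoid vanishing gradients."""
    model = models.Sequential()
    model.add(layers.Input(shape=(1024, 1024, 3)))
    filters = 16  # assumed starting width
    for _ in range(depth):
        model.add(layers.Conv2D(filters, 3, padding="same",
                                activation="relu",
                                kernel_regularizer=regularizers.l2(l2)))
        model.add(layers.MaxPooling2D(2))
        filters *= 2
    model.add(layers.GlobalAveragePooling2D())
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model
```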
Our analysis shows an increase in the performance of the residual and Inception CNNs when the number of filters in the convolutional layers of the succeeding blocks is increased by a factor of 2. We found the optimal hyperparameter values for the residual, Inception, and VGG16 models through a randomized grid search. The values are tabulated in Table 2.

The customized CNNs are evaluated with the baseline and cropped ROI data. The results are tabulated in Table 3. We observed that the performance of the models with the cropped ROI is relatively promising in comparison to the baseline in classifying normal and pneumonia-infected CXRs. This is expected because the models trained with the cropped ROI learn feature representations relevant to the task of interest.
Table 2. Optimal values for the hyperparameters of the customized residual, Inception, and VGG16 CNNs obtained through a randomized grid search.

| Model            | Learning Rate | Momentum | L2-Regularization |
|------------------|---------------|----------|-------------------|
| Residual CNN     | 1 × 10−3      | 0.9      | 1 × 10−6          |
| Inception CNN    | 1 × 10−2      | 0.95     | 1 × 10−4          |
| Customized VGG16 | 1 × 10−4      | 0.99     | 1 × 10−6          |
Table 3. Performance of customized CNNs with baseline and cropped ROI data.

| Task | Data | Model | Accuracy | AUC | Precision | Recall | Specificity | F-Score | MCC |
|---|---|---|---|---|---|---|---|---|---|
| Normal vs. Pneumonia | Baseline | Customized VGG16 | 0.957 | 0.990 | 0.951 | 0.983 | 0.915 | 0.967 | 0.908 |
| | | Sequential | 0.943 | 0.983 | 0.920 | 0.980 | 0.855 | 0.957 | 0.878 |
| | | Residual | 0.910 | 0.967 | 0.908 | 0.954 | 0.838 | 0.931 | 0.806 |
| | | Inception | 0.886 | 0.922 | 0.887 | 0.939 | 0.800 | 0.913 | 0.755 |
| | Cropped ROI | Customized VGG16 | 0.962 | 0.993 | 0.977 | 0.962 | 0.962 | 0.970 | 0.918 |
| | | Sequential | 0.941 | 0.984 | 0.930 | 0.995 | 0.877 | 0.955 | 0.873 |
| | | Residual | 0.917 | 0.971 | 0.913 | 0.959 | 0.847 | 0.936 | 0.820 |
| | | Inception | 0.897 | 0.932 | 0.896 | 0.947 | 0.817 | 0.921 | 0.778 |
| Bacterial vs. Viral Pneumonia | Baseline | Customized VGG16 | 0.936 | 0.962 | 0.920 | 0.984 | 0.860 | 0.951 | 0.862 |
| | | Sequential | 0.928 | 0.954 | 0.909 | 0.984 | 0.838 | 0.946 | 0.848 |
| | | Residual | 0.897 | 0.921 | 0.880 | 0.967 | 0.784 | 0.922 | 0.780 |
| | | Inception | 0.854 | 0.901 | 0.841 | 0.934 | 0.714 | 0.886 | 0.675 |
| | Cropped ROI | Customized VGG16 | 0.936 | 0.962 | 0.920 | 0.984 | 0.860 | 0.951 | 0.862 |
| | | Sequential | 0.928 | 0.956 | 0.909 | 0.984 | 0.838 | 0.946 | 0.848 |
| | | Residual | 0.908 | 0.933 | 0.888 | 0.976 | 0.798 | 0.930 | 0.802 |
| | | Inception | 0.872 | 0.919 | 0.853 | 0.959 | 0.730 | 0.903 | 0.725 |
| Normal vs. Bacterial vs. Viral Pneumonia | Baseline | Customized VGG16 | 0.917 | 0.938 | 0.917 | 0.905 | 0.958 | 0.911 | 0.873 |
| | | Sequential | 0.896 | 0.922 | 0.888 | 0.885 | 0.948 | 0.887 | 0.841 |
| | | Residual | 0.861 | 0.887 | 0.868 | 0.882 | 0.933 | 0.875 | 0.809 |
| | | Inception | 0.809 | 0.846 | 0.753 | 0.848 | 0.861 | 0.798 | 0.688 |
| | Cropped ROI | Customized VGG16 | 0.918 | 0.939 | 0.920 | 0.900 | 0.960 | 0.910 | 0.876 |
| | | Sequential | 0.897 | 0.923 | 0.898 | 0.898 | 0.949 | 0.898 | 0.844 |
| | | Residual | 0.879 | 0.909 | 0.883 | 0.890 | 0.941 | 0.887 | 0.825 |
| | | Inception | 0.821 | 0.865 | 0.778 | 0.855 | 0.878 | 0.815 | 0.714 |

* Bold numbers in the original indicate superior performance.
The customized VGG16 model demonstrates more promising performance than the other CNNs under study. The model learned generic image features from ImageNet that served as a better initialization than random weights, and it was trained end-to-end on the current tasks to learn task-specific features. This results in faster convergence with reduced bias, reduced overfitting, and improved generalization. In classifying bacterial and viral pneumonia, no significant difference in performance is observed for the customized VGG16 model between the baseline and cropped ROI. In the multi-class classification task, the cropped ROI gave better results than the baseline data. However, we observed that the differences in performance are not significant. This may be because the dataset under study already appeared cropped, and the boundary detection algorithm resulted in a few under-segmented regions near the costophrenic angle. The customized sequential, residual, and Inception CNNs with random weight initializations did not have the opportunity to learn discriminative features, owing to the sparse availability and imbalanced distribution of training data across the expected classes. We observed that the sequential CNN outperformed the residual and Inception counterparts across the classification tasks. The usage of residual connections is beneficial in resolving the issue of representational bottlenecks and vanishing gradients in deep models. However, the CNNs used in this study have a shallow architecture, and the residual connections did not introduce significant gains in performance for the tasks of interest. Unlike ImageNet, the variability in the pediatric CXR data is several orders of magnitude smaller. The architectures of the residual and Inception CNNs are progressively more complex and did not seem to be a fitting tool for the tasks of interest.
The confusion matrices and AUC achieved with the customized VGG16 model are shown in Figures 8–10. We observed that the training metrics are poor compared to the test accuracy. This is due to the fact that noisy images are included in the training data to reduce bias and overfitting and to improve model generalization.

We compared the performance of the customized VGG16 model trained with the cropped ROI to the state-of-the-art. The results are tabulated in Table 4. We observed that our model outperforms the current literature in all performance metrics across the classification tasks. The customized sequential CNN demonstrates (i) higher values for recall in classifying normal and pneumonia, and (ii) recall measures identical to the customized VGG16 model in classifying bacterial and viral pneumonia. However, considering the balance between precision and recall as demonstrated by the F-score and MCC, the customized VGG16 model outperforms the other CNNs and the state-of-the-art across the classification tasks.
Figure 8. Confusion matrices for the performance of the customized VGG16 model: (a) Normal v. Pneumonia; (b) Bacterial v. Viral Pneumonia.
Figure 9. ROC curves demonstrating the performance of the customized VGG16 model: (a) Normal v. Pneumonia; (b) Bacterial v. Viral Pneumonia.
Table 4. Comparing the performance of the customized VGG16 model with the state-of-the-art.

| Task | Model | Accuracy | AUC | Precision | Recall | Specificity | F-Score | MCC |
|---|---|---|---|---|---|---|---|---|
| Normal v. Pneumonia | Customized VGG16 | 0.962 | 0.993 | 0.977 | 0.962 | 0.962 | 0.970 | 0.918 |
| | Kermany et al. | 0.928 | 0.968 | - | 0.932 | 0.901 | - | - |
| Bacterial v. Viral Pneumonia | Customized VGG16 | 0.936 | 0.962 | 0.920 | 0.984 | 0.860 | 0.951 | 0.862 |
| | Kermany et al. | 0.907 | 0.940 | - | 0.886 | 0.909 | - | - |
| Normal v. Bacterial v. Viral Pneumonia | Customized VGG16 | 0.918 | 0.939 | 0.920 | 0.900 | 0.960 | 0.910 | 0.876 |
| | Kermany et al. | - | - | - | - | - | - | - |

* Bold numbers in the original indicate superior performance.
Figure 10. Performance of the customized VGG16 model in multiclass classification: (a) Confusion matrix; (b) ROC curves.
4.2. Visualization through Discriminative Localization

The customized VGG16 model has a CAM-compatible architecture owing to the presence of the GAP layer. This helps in visualizing the model predictions using both CAM and grad-CAM visualization tools. Figures 11 and 12 demonstrate the results of applying these visualizations to localize the discriminative ROI in pneumonia-infected CXRs.
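As a concrete illustration of the CAM computation for such a GAP-based model, the following sketch weights the final convolutional feature maps by the dense-layer weights of the predicted class. It assumes a Keras model; the layer names "last_conv" and "dense" are hypothetical placeholders, and this is not the exact code used in this study.

import numpy as np
from tensorflow.keras.models import Model

def compute_cam(model, image, conv_layer="last_conv", dense_layer="dense"):
    # Expose the final convolutional feature maps alongside the predictions
    feature_model = Model(model.input,
                          [model.get_layer(conv_layer).output, model.output])
    feature_maps, preds = feature_model.predict(image[np.newaxis, ...])
    class_idx = int(np.argmax(preds[0]))
    # Weights connecting the GAP output to the predicted class
    class_weights = model.get_layer(dense_layer).get_weights()[0][:, class_idx]
    # CAM: class-weighted sum of the feature maps, rectified and normalized
    cam = np.maximum(feature_maps[0] @ class_weights, 0)
    return cam / (cam.max() + 1e-8)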
Figure 11. Visual explanations through gradient-based localization using CAM: (a) Input CXRs; (b) Bounding boxes localizing regions of activations; (c) CAM showing heat maps superimposed on the original CXRs; (d) Automatically segmented lung masks; (e) CAM showing heat maps superimposed on the cropped lungs.
Figure 12. Visual explanations through gradient-based localization using grad-CAM: (a) Input CXRs; (b) Bounding boxes localizing regions of activations; (c) Grad-CAM showing heat maps superimposed on the original CXRs; (d) Automatically segmented lung masks; (e) Grad-CAM showing heat maps superimposed on the cropped lungs.
CXRs are fed to the trained model and the predictions are decoded. The heat maps are generated as a two-dimensional score grid, computed for each input pixel location. Pixels carrying high importance with respect to the expected class appear bright red, with distinct color transitions for varying ranges. The generated heat maps are superimposed on the original input to localize image-specific ROI. The lung masks generated with the boundary detection algorithm are applied to extract the localized ROI relevant to the lung regions. We observed that CAM and grad-CAM visualizations generated heat maps for the pneumonia class that highlight the visual differences in the "pneumonia-like" regions of the image.

We applied our novel method of average-CAM and average-grad-CAM to visualize the class-specific ROI, as shown in Figures 13 and 14. Lung masks are applied to the generated heat maps to localize only the ROI specific to the lung regions. We observed that the class-specific ROI localized by the average-CAM and average-grad-CAM for the viral pneumonia class follows a diffuse pattern. This is expected, since viral pneumonia manifests with diffuse interstitial patterns in both lungs [30]. For the bacterial pneumonia class, we observed that the model layers are activated on both sides of the lungs, predominantly on the upper and middle right lung lobes. This is because bacterial pneumonia manifests as lobar consolidations [30]. The pneumonia dataset under study has more pediatric patients with right lobar consolidations.
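A minimal sketch of this class-averaged grad-CAM idea follows. It assumes a TensorFlow/Keras model; "last_conv" is a hypothetical layer name, the lung mask is assumed to be resized to the heat-map resolution, and the code is illustrative rather than the exact implementation used in this study.

import numpy as np
import tensorflow as tf

def grad_cam(model, image, conv_layer="last_conv"):
    # Grad-CAM: weight the final conv feature maps by the spatially
    # averaged gradients of the predicted class score
    grad_model = tf.keras.models.Model(
        model.input, [model.get_layer(conv_layer).output, model.output])
    with tf.GradientTape() as tape:
        maps, preds = grad_model(image[np.newaxis, ...])
        class_idx = int(tf.argmax(preds[0]))
        score = preds[:, class_idx]
    grads = tape.gradient(score, maps)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))  # GAP over the gradients
    cam = tf.nn.relu(tf.reduce_sum(maps[0] * weights, axis=-1)).numpy()
    return cam / (cam.max() + 1e-8)

def average_grad_cam(model, class_images, lung_mask):
    # Average the per-image heat maps across all inputs of an expected
    # class, then keep only activations inside the segmented lung regions
    mean_cam = np.mean([grad_cam(model, img) for img in class_images], axis=0)
    return mean_cam * lung_mask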
Figure 13. Visual explanations through average-CAM: (a) Bacterial and viral CXR (top and bottom); (b) Average-CAM localizing class-specific ROI with bounding boxes highlighting the regions of maximum activation; (c) Automatically segmented lung masks; (d) Average-CAM localizing class-specific ROI with the extracted lung regions.
Figure 14. Visual explanations through average-grad-CAM: (a) Bacterial and viral CXR (top and bottom); (b) Average-grad-CAM localizing class-specific ROI with bounding boxes highlighting the regions of maximum activation; (c) Automatically segmented lung masks; (d) Average-grad-CAM localizing class-specific ROI with the extracted lung regions.
4.3. Visual Explanations with LIME

Figure 15 shows the explanations generated with LIME for sample instances of pediatric chest radiographs. Lung masks are applied to the explanations to localize only the ROI specific to the lung regions. The explanations are shown as follows: (i) superpixels with the highest positive weights with the rest greyed out; and, (ii) superpixels superimposed on the extracted lung regions. We observed that the explainer focused on the regions with high opacity. The model differentiates bacterial and viral pneumonia by (i) showing superpixels with the highest positive activations in the regions of lobar
consolidations for bacterial pneumonia; and, (ii) diffuse interstitial patterns across the lungs for viral pneumonia. We also observed that a number of false positive superpixels are reported. The reason is that the current LIME implementation uses a sparse linear model to approximate the model behavior in the neighborhood of the explained predictions. However, these explanations result from a random sampling process and are not faithful if the underlying model is highly non-linear in the locality of predictions.
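For reference, a minimal usage sketch with the publicly available lime package is shown below; cxr_image (an RGB array) and predict_fn (a function mapping a batch of images to class probabilities) are hypothetical placeholders, not artifacts of this study.

from lime import lime_image
from skimage.segmentation import mark_boundaries

def explain_with_lime(cxr_image, predict_fn):
    # Fit a sparse linear surrogate around the prediction by randomly
    # perturbing superpixels of the input image
    explainer = lime_image.LimeImageExplainer()
    explanation = explainer.explain_instance(
        cxr_image, predict_fn, top_labels=2, hide_color=0, num_samples=1000)
    # Keep the superpixels with the highest positive weights; grey out the rest
    image, mask = explanation.get_image_and_mask(
        explanation.top_labels[0], positive_only=True,
        num_features=5, hide_rest=True)
    # Outline the retained superpixels (assumes a 0-255 image)
    return mark_boundaries(image / 255.0, mask)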
Figure 15. Visual explanations through LIME: (a) Input CXRs; (b) Automatically segmented lung masks; (c) Cropped lung regions; (d) Superpixels with the highest positive weights with the others greyed out; (e) Superpixels with the highest positive weights superimposed on the cropped lungs.
5. Conclusions

We proposed a CNN-based decision support system to detect pneumonia in pediatric CXRs to expedite accurate diagnosis of the pathology. We applied novel and state-of-the-art visualization strategies to explain model predictions, which is considered highly significant for clinical decision-making. The study presents a universal approach that can be applied to an extensive range of visual recognition tasks. Classifying pneumonia in chest radiographs is a demanding task due to the presence of a high degree of variability in the input data. The promising performance of the customized VGG16 model trained on the current tasks suggests that it effectively learns from a sparse collection of complex data with reduced bias and improved generalization. We hope that our results are useful for developing clinically useful solutions to detect and distinguish pneumonia types in chest radiographs.

Author Contributions: Conceptualization, S.R. and S.A.; Methodology, S.R.; Software, S.R., S.C., and I.K.; Validation, S.R., S.C., and I.K.; Formal Analysis, S.R.; Investigation, S.R. and S.A.; Resources, S.R. and S.C.; Data Curation, S.R.; Writing-Original Draft Preparation, S.R.; Writing-Review & Editing, S.A. and G.T.; Visualization, S.R.; Supervision, S.A. and G.T.; Project Administration, S.A. and G.T.; Funding Acquisition, S.A. and G.T.
Funding: This work was supported by the Intramural Research Program of the Lister Hill National Center for Biomedical Communications (LHNCBC), the National Library of Medicine (NLM), and the U.S. National Institutes of Health (NIH).

Conflicts of Interest: The authors declare no conflict of interest.
References

1. Le Roux, D.M.; Myer, L.; Nicol, M.P. Incidence and severity of childhood pneumonia in the first year of life in a South African birth cohort: The Drakenstein Child Health Study. Lancet Glob. Health 2015, 3, e95–e103. [CrossRef]
2. Mcluckie, A. Respiratory Disease and Its Management, 1st ed.; Springer: London, UK, 2009; pp. 51–59. ISBN 978-1-84882-094-4.
3. Cherian, T.; Mulholland, E.K.; Carlin, J.B.; Ostensen, H.; Amin, R.; De Campo, M.; Greenberg, D.; Lagos, R.; Lucero, M.; Madhi, S.A.; et al. Standardized interpretation of paediatric chest radiographs for the diagnosis of pneumonia in epidemiological studies. Bull. World Health Organ. 2005, 83, 353–359. [PubMed]
4. Wang, X.; Peng, Y.; Lu, L.; Lu, Z.; Bagheri, M.; Summers, R.M. ChestX-Ray8: Hospital-Scale Chest X-Ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 3462–3471.
5. Karargyris, A.; Siegelman, J.; Tzortzis, D.; Jaeger, S.; Candemir, S.; Xue, Z.; KC, S.; Vajda, S.; Antani, S.K.; Folio, L.; et al. Combination of texture and shape features to detect pulmonary abnormalities in digital chest X-rays. Int. J. Comput. Assist. Radiol. Surg. 2016, 11, 99–106. [CrossRef] [PubMed]
6. Neuman, M.I.; Lee, E.Y.; Bixby, S.; Diperna, S.; Hellinger, J.; Markowitz, R.; Servaes, S.; Monuteaux, M.C.; Shah, S.S. Variability in the Interpretation of Chest Radiographs for the Diagnosis of Pneumonia in Children. J. Hosp. Med. 2012, 7, 294–298. [CrossRef] [PubMed]
7. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105.
8. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Li, F.F. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, 20–25 June 2009; pp. 248–255.
9. Zeiler, M.D.; Fergus, R. Visualizing and understanding convolutional networks. In Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland, 6–12 September 2014; Springer: Cham, Switzerland, 2014; pp. 818–833.
10. Bar, Y.; Diamant, I.; Wolf, L.; Lieberman, S.; Konen, E.; Greenspan, H. Chest Pathology Detection Using Deep Learning with Non-Medical Training. In Proceedings of the IEEE International Symposium on Biomedical Imaging (ISBI), Brooklyn, NY, USA, 16–19 April 2015; pp. 294–297.
11. Oliveira, L.L.G.; Silva, S.A.E.; Ribeiro, L.H.V.; De Oliveira, R.M.; Coelho, C.J.; Ana Lúcia, A.L.S. Computer-Aided Diagnosis in Chest Radiography for Detection of Childhood Pneumonia. Int. J. Med. Inform. 2008, 77, 555–564. [CrossRef] [PubMed]
12. Abe, H.; Macmahon, H.; Shiraishi, J.; Li, Q.; Engelmann, R.; Doi, K. Computer-aided diagnosis in chest radiology. Semin. Ultrasound CT MR 2004, 25, 432–437. [CrossRef] [PubMed]
13. Giger, M.; MacMahon, H. Image processing and computer-aided diagnosis. Radiol. Clin. N. Am. 1996, 34, 565–596. [PubMed]
14. Monnier-Cholley, L.; MacMahon, H.; Katsuragawa, S.; Morishita, J.; Ishida, T.; Doi, K. Computer-aided diagnosis for detection of interstitial opacities on chest radiographs. AJR Am. J. Roentgenol. 1998, 171, 1651–1656. [CrossRef] [PubMed]
15. Kermany, D.S.; Goldbaum, M.; Cai, W.; Valentim, C.C.S.; Liang, H.; Baxter, S.L.; McKeown, A.; Yang, G.; Wu, X.; Yan, F.; et al. Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning. Cell 2018, 172, 1122–1131. [CrossRef] [PubMed]
16. Rajpurkar, P.; Irvin, J.; Zhu, K.; Yang, B.; Mehta, H.; Duan, T.; Ding, D.; Bagul, A.; Langlotz, C.; Shpanskaya, K.; et al. CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning. arXiv 2018. Available online: https://arxiv.org/abs/1711.05225 (accessed on 23 January 2018).
17. Guan, Q.; Huang, Y.; Zhong, Z.; Zheng, Z.; Zheng, L.; Yang, Y. Diagnose like a Radiologist: Attention Guided Convolutional Neural Network for Thorax Disease Classification. arXiv 2018. Available online: https://arxiv.org/abs/1801.09927v1 (accessed on 17 June 2018).
18. Candemir, S.; Jaeger, S.; Palaniappan, K.; Musco, J.P.; Singh, R.K.; Xue, Z.; Karargyris, A.; Antani, S.; Thoma, G.; McDonald, C.J. Lung Segmentation in Chest Radiographs Using Anatomical Atlases with Nonrigid Registration. IEEE Trans. Med. Imaging 2014, 33, 577–590. [CrossRef] [PubMed]
19. Candemir, S.; Antani, S.; Jaeger, S.; Browning, R.; Thoma, G. Lung Boundary Detection in Pediatric Chest X-Rays. In Proceedings of the SPIE Medical Imaging, Orlando, FL, USA, 21–26 February 2015; Volume 9418, p. 94180Q.
20. Liu, C.; Yuen, J.; Torralba, A. SIFT Flow: Dense Correspondence across Scenes and Its Applications. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 978–994. [CrossRef] [PubMed]
21. Snoek, J.; Rippel, O.; Adams, R.P. Scalable Bayesian Optimization Using Deep Neural Networks. In Proceedings of the International Conference on Machine Learning (ICML), Lille, France, 6–11 July 2015; pp. 2171–2180.
22. Deep Learning Using Bayesian Optimization. Available online: https://www.mathworks.com/help/nnet/examples/deep-learning-using-bayesian-optimization.html (accessed on 14 January 2018).
23. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
24. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826.
25. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the 3rd International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015; pp. 1–32.
26. Bergstra, J.; Bengio, Y. Random Search for Hyper-Parameter Optimization. JMLR 2012, 13, 281–305.
27. Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Learning Deep Features for Discriminative Localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2921–2929.
28. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 618–626.
29. Ribeiro, M.T.; Singh, S.; Guestrin, C. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144.
30. Sharma, S.; Maycher, B.; Eschun, G. Radiological imaging in pneumonia: Recent innovations. Curr. Opin. Pulm. Med. 2007, 13, 159–169. [CrossRef] [PubMed]

© 2018 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).