2018 IEEE 4th Middle East Conference on Biomedical Engineering (MECBME)
Automatic Segmentation of Colorectal Cancer in 3D MRI by Combining Deep Learning and 3D Level-Set Algorithm – A Preliminary Study

Mumtaz Hussain Soomro*, Gianluca De Cola, Silvia Conforto, Maurizio Schmid, and Gaetano Giunta*
Department of Engineering, University of Roma Tre, Via Vito Volterra 62, 00146 Rome, Italy

Elisa Guidi and Emanuele Neri
Department of Radiological Sciences, AOUP, Via Savi 10, 56126 Pisa, Italy

Damiano Caruso, Maria Ciolina, and Andrea Laghi
Dept. of Radiological Sciences, Oncology and Pathology, "Sapienza" University of Rome, I.C.O.T. Hospital, Via Franco Faggiana 1668, 04100 Latina, Italy

*Correspondence: {mumtazhussain.soomro, gaetano.giunta}@uniroma3.it
Abstract— In this paper, a novel method is proposed to automatically segment colorectal cancer from 3D MR images by combining 3D fully convolutional neural networks (3D-FCNNs) with a 3D level-set algorithm. The 3D level-set is incorporated into the 3D-FCNNs with two aims: i) fine-tuning the training phase; ii) refining the outputs during the testing phase by integrating a smoothing function and prior information in a post-processing step. The proposed method is assessed and compared with 3D-FCNNs without the 3D level-set (3D-FCNNs alone) in terms of the Dice Similarity Coefficient (DSC) as a performance metric. The proposed method showed a higher DSC than 3D-FCNNs alone on both the training and testing data sets (0.91813 vs 0.8568 and 0.9378 vs 0.86238, respectively). Our results on 3D colorectal MRI data demonstrate that the proposed method gives more accurate segmentation results than 3D-FCNNs alone.

Keywords— Colorectal Cancer, 3D MRI Segmentation, 3D-FCNNs, 3D level-set.
I. INTRODUCTION

The colorectal tract (colon and rectum) plays a vital role in food digestion by breaking large molecules into ions and nutrients to be absorbed in the body along with water [1-2]. Colorectal (rectum/bowel) cancer is the third most commonly diagnosed cancer in the US, after lung and breast cancer, and the second leading cause of cancer death worldwide; about half a million people die from this type of cancer, with women and men equally affected [3-4]. Early diagnosis of this type of cancer is of paramount interest for deciding on a patient's pretreatment or surgical operation. Recently, MRI has become one of the most prominent imaging modalities for the diagnosis of colorectal cancer [2, 5-6]. In medical image processing, segmentation of the desired shape of an organ, or of a region of interest inside the organ (e.g. a tumor), is a crucial step. Generally, radiologists or medical experts perform the segmentation manually by slicing the
978-1-5386-1462-4/18/$31.00 ©2018 IEEE
desired part. This requires a lot of time and labor and can be affected by human errors. Moreover, medical images are more chaotic than natural images, since the structure of the same organ in the same patient differs in each slice. Thus, an automatic and intelligent segmentation method is required to segment such chaotic medical images with high accuracy. Lately, level-set based segmentation algorithms have been widely used and have become the preferred algorithms for medical image segmentation [4], [7]. Level-set approaches perform segmentation as an energy minimization problem, integrating different types of regularization (smoothing terms) and priors [8]. Level-set based segmentation methods are preferable because they provide a segmentation function that can change its topological properties, even though they require an appropriate contour initialization to obtain effective segmentation results. In addition, level-set based approaches are progressively deficient due to their simple appearance model [8]. Recently, convolutional neural network (CNN) methods based on deep learning have been successfully employed in medical imaging, especially for segmentation and detection purposes [9-10]. In deep learning based methods, features of complex structures and patterns are learned from well-defined large training data sets; these trained features are then used for prediction. Unlike level-set based methods, deep learning can learn appearance models automatically from large training data. Nevertheless, deep learning based approaches do not provide an explicit way to integrate prior shape information and regularization [11]. Additionally, medical image segmentation is a more chaotic task than natural image segmentation. Firstly, patient data are extremely diversified; in other words, the pattern of the same pathology varies among patients.
Secondly, small and incomplete medical data sets make CNN training more prone to overfitting [9]. In spite of this, recently proposed CNN architectures have demonstrated better performance than other
machine learning algorithms for medical image segmentation [12]. Thirdly, medical images such as MRI (Magnetic Resonance Imaging) or CT (Computed Tomography) scans are often in 3D volumetric form, while most existing CNNs are 2D in nature. These 2D CNNs are applied slice-by-slice sequentially [13], thereby disregarding spatial information in the third dimension. An alternative solution was proposed in [14], where spatial information is enhanced by aggregating the axial, sagittal and coronal planes in one-to-one association. In [14], input slices are treated independently, considering each orthogonal plane separately, so that a convolutional kernel used for two orthogonal planes is not shared with the third one. Nonetheless, this solution cannot exploit the volumetric spatial information completely. Furthermore, 3D CNNs have been discouraged due to their computational complexity and memory requirements [10]. Considering the above problems, 3D fully convolutional neural networks (3D-FCNNs) were recently proposed to detect cerebral microbleeds in MRI, where the whole volumetric data are used as input to obtain a 3D volumetric output as a 3D prediction score directly within a single forward propagation [15], thus reducing computational complexity. Unlike 2D CNN based methods, they use 3D kernels that share spatial information across all three dimensions. In [9], 3D-FCNNs provide efficient performance in segmenting brain lesions. In [10], 3D-FCNNs based on a baseline CNN architecture are proposed for subcortical segmentation, where the maximum pooling layer (used by some FCNN based methods [15]) is avoided to further reduce computational complexity and redundancy. Considering the above problems of level-set based algorithms and deep learning based methods, we propose an automatic method combining deep learning based 3D-FCNNs and a 3D level-set. In this work, the 3D-FCNNs architecture, shown in Figure 1(b), is related to [10].
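Networks of this kind stack valid (unpadded) 3D convolutions, so each layer shrinks the feature volume (the details are given in Section III). A minimal sketch of this bookkeeping; the input side length below is a hypothetical example, not a value from the paper:

```python
# Sketch: how a stack of valid (unpadded) 3x3x3 convolutions shrinks a volume.
# The per-layer filter counts follow the architecture described in Section III;
# the input size is an arbitrary illustration.
def output_size(input_size, kernel_size=3, num_layers=8):
    """Each valid convolution reduces every spatial dimension by (kernel_size - 1)."""
    size = input_size
    for _ in range(num_layers):
        size -= kernel_size - 1
    return size

filters_per_layer = [16, 16, 32, 32, 64, 64, 128, 128]  # eight conv layers
side = 27  # hypothetical cubic patch side length
print(output_size(side, 3, len(filters_per_layer)))  # 27 - 8*2 = 11
```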
Specifically, we have designed a deeper 3D-FCNNs architecture with 8 convolutional layers, 2 fully connected (FC) layers and one softmax layer. First, a probability map detecting the surface of the tumor is learned by the trained 3D-FCNNs model. Thereafter, the learned probability map is used as input to initialize the 3D level-set, which further refines the initial segmentation output of the 3D-FCNNs. Here, the 3D level-set is incorporated in the training phase to enhance the performance of the 3D-FCNNs, whereas in the testing phase it is used as a post-processor to refine the output of the 3D-FCNNs. In our previous work [4], we showed that a geodesic active contour based method [16] performed best in segmenting the colorectal region from 2D MR images. Therefore, in this study, we employ a 3D level-set method based on geodesic active contours. The rest of the paper is organized as follows: Section II provides information about the patients and the data sets used in the training and testing phases, Section III presents the details of the methods, results are discussed in Section IV, and concluding remarks are presented in Section V.

II. DATA

a) Patients

The approval of this study was given by the institutional ethical committee of the Department of Radiological Sciences, AOUP, Pisa, Italy. Informed written consent was attained
from all patients after providing detailed information about the study purpose and all possible risks of the experiment. In this study, 12 patients with histologically determined colorectal adenocarcinoma and tumor stages T2 and T3 were enrolled. Patients were excluded when matching any of the following criteria: cardiovascular disease (e.g. implanted pacemakers), synchronous tumors, previous history of psychiatric or neurological disorders, incomplete MRI examination, previous pelvic RT, or legal incapacity. T2-weighted MRI data were collected from 8 patients aged 45-60 years.

b) Dataset

In this study, seventy 3D abdominal MRI volumes are used. The MRI volume dimensions are 512×512×(69–122) voxels, with voxel spacing varying from 0.46×0.46×0.5 to 0.6×0.6×1.2 mm/voxel. Among the seventy MRI volumes, fifty, together with their manually segmented ground-truth labels, were used for training, while 20 were used as test data. The ground-truth labels were manually segmented by two expert radiologists.

III. METHODS

A. 3D-FCNNs

In this study we have used 3D-FCNNs based on a baseline CNN model. The architecture of the proposed 3D-FCNNs is shown in Figure 1(b). The proposed architecture consists of eight convolutional layers. The first convolutional layer takes an MRI volume as input and convolves it with 3D convolutional filters (i.e. kernels) to yield feature volumes. The successive layers take as input the feature volumes of the previous layers. Let $F_k^{l-1}$ be the kth feature volume of the (l−1)th layer; then the mth output feature volume of the lth layer is given as

$$F_m^l(x, y, z) = \chi\left(\sum_{k}\sum_{m,n,t} F_k^{l-1}(x-m,\, y-n,\, z-t) \otimes W_k^l(m,n,t) + b_k^l\right), \qquad (1)$$
where $W_k^l$, which denotes the 3D filters with kernel size (m×n×t), is convolved element-wise over the feature volumes of the preceding layer, $F_k^{l-1}$; $\otimes$ represents the 3D convolution operation; $b_k^l$ denotes the bias and $\chi(\cdot)$ is the non-linear activation function. Here, a recently introduced non-linear activation function, namely the Parametric Rectified Linear Unit (PReLU) [17], is used as a replacement for the well-known Rectified Linear Unit (ReLU) [10]. The PReLU function is given as
$$\chi(F_i) = \max(0, F_i) + \alpha_i \cdot \min(0, F_i), \qquad (2)$$
where $F_i$ represents the input, $\chi(F_i)$ is the output, and $\alpha_i$ is a trainable parameter that learns to control the negative part of $F_i$ (in ReLU, $\alpha_i$ is fixed at zero). Consequently, PReLU can adapt the rectifiers to their input, improving the network's accuracy with nearly zero additional
Fig. 1. (a) An overview of the proposed methodology; (b) the proposed 3D-FCNNs architecture.

computational cost, while also reducing the risk of overfitting. The PReLU is applied at each layer except the last one (i.e. the softmax layer). The proposed network contains two fully connected layers to further retain spatial information and learn the complex patterns extracted in the preceding layers. Finally, all the neurons are gathered into n class feature maps, from which the normalized probability values are computed by the softmax function, such that
$$\mathrm{probability\_map}_n = \frac{\exp(F_l^n)}{\sum_{i=1}^{n} \exp(F_l^i)}. \qquad (3)$$

In this work, we present a deeper architecture with a small kernel size of 3×3×3, where each convolutional layer is repeated once with the same kernel size. The size of the feature volume depends on the kernel size (each valid convolution reduces the volume size by kernel size − 1). Therefore, the feature volume produced by each layer is smaller by 2 voxels than its input volume, as shown in Figure 1(b). The proposed 3D-FCNNs has 8 convolutional layers, whose numbers of filters (feature volumes) are 16, 16, 32, 32, 64, 64, 128 and 128, respectively, each with a small kernel size of 3×3×3. The 2 fully connected layers, with kernel size 1×1×1, contain 200 and 150 hidden neurons, respectively. These FC layers are
followed by a final softmax (classification) layer that produces the probability maps.

3D-FCNNs Training

The proposed network was trained using the training data set and the segmented ground truth (i.e. tumor labels). Let $\Phi$ be the network parameters, comprising the convolutional filters, biases and PReLU coefficients. If $P_{l_m^v}$ is the predicted probability of the ground-truth label $l_m^v$ of voxel v in the mth training image (out of M training images, each with V voxels), then the network parameters can be optimized by minimizing the following cost function, the cross-entropy loss:

$$L(\Phi) = -\frac{1}{M}\sum_{m=1}^{M}\sum_{v=1}^{V}\log P_{l_m^v}(F_v). \qquad (4)$$
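As an illustration of Eq. (4), a toy pure-Python computation of the loss over a flattened list of voxels; the two-voxel, two-class example below is hypothetical:

```python
import math

def cross_entropy_loss(prob_maps, labels):
    """Mean negative log-probability of the ground-truth label at each voxel,
    in the spirit of Eq. (4). prob_maps[v] is a dict {label: probability} for
    voxel v; labels[v] is the ground-truth label of voxel v (flattened toy form)."""
    total = 0.0
    for probs, true_label in zip(prob_maps, labels):
        total += -math.log(probs[true_label])
    return total / len(labels)

# Hypothetical example: two voxels, two classes (background 0, tumor 1).
probs = [{0: 0.9, 1: 0.1}, {0: 0.2, 1: 0.8}]
truth = [0, 1]
print(round(cross_entropy_loss(probs, truth), 4))  # -(ln 0.9 + ln 0.8)/2 = 0.1643
```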
The filter weights in each layer were initialized randomly from a zero-mean Gaussian distribution N(0, 1×10⁻⁶), and a stochastic gradient descent algorithm with meta-parameters initial learning rate 0.001 and momentum 0.6 was used to update the weights. The PReLU coefficients and biases in each convolutional layer were initialized to 0.27 and 0, respectively. The total number of epochs during training was 30. The training of the proposed 3D-FCNNs was performed on a Windows desktop with an i7-4790 processor (3.6 GHz) and an NVIDIA GeForce GTX 1060 with 6 GB of memory, using MATLAB 2017a. The training process lasted approximately 14 hours.

B. 3D Level-Set

In our proposed method, a 3D level-set based on the 3D geodesic active contour algorithm [16] is incorporated to refine the initial segmentation obtained by the 3D-FCNNs. The 3D active contour delineates the tumor boundaries more precisely. The mathematical derivation of the geodesic active contour is explained in detail in [16] and [4]. This algorithm establishes an association between active contours and the computation of geodesic (minimal distance) curves, which yields stable boundary detection even in the presence of large gaps and variations in the gradients. Let φ(P_l, t = 0) be the level-set function with a given initial surface at t = 0. Here, P_l is the probability map obtained from the 3D-FCNNs, and it is used as the initial surface to initialize the 3D level-set. The level-set function is evolved to refine the tumor boundaries by the following partial differential equation [16]:
$$\frac{\partial \varphi}{\partial t} = \alpha X(P_l)\cdot\nabla\varphi - \beta Y(P_l)\,|\nabla\varphi| + \gamma Z(P_l)\,\kappa\,|\nabla\varphi|, \qquad (5)$$

where X(·) is a convection function, Y(·) represents an expansion or propagation function, and Z(·) is a spatial modifier or smoothing function; α, β and γ are constant scalars that trade off the convection, propagation and spatial modifier terms. At the beginning, a zero-level surface is required by the level-set algorithm; the initial surface is then propagated in a particular direction (inward or outward) with
ALGORITHM 1: Proposed method for tumor segmentation
Input: image I, mean value (μ) and variance (std), trained 3D-FCNNs model, values of α, β, γ used in the 3D level-set.
Output: refined final segmented tumor.
1: Normalize the voxel intensities in I using μ and std, obtaining I_norm.
2: Feed I_norm to the trained 3D-FCNNs and obtain the output probability map P from the softmax layer.
3: Use P as input to initialize the 3D level-set algorithm.
4: for iteration = 1:100 do refine the output with the 3D level-set using parameters α, β, γ.
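The steps of ALGORITHM 1 can be sketched in Python as follows; `run_3d_fcnn` and `level_set_step` are hypothetical placeholders standing in for the trained network and one update of the level-set evolution of Eq. (5), not the authors' MATLAB implementation:

```python
# Sketch of ALGORITHM 1 on a flattened toy volume. `run_3d_fcnn` and
# `level_set_step` are hypothetical callables supplied by the caller.

def normalize(volume, mu, std):
    """Step 1: normalize voxel intensities with the training mean/std."""
    return [(v - mu) / std for v in volume]

def segment_tumor(volume, mu, std, run_3d_fcnn, level_set_step,
                  alpha=1.0, beta=1.0, gamma=1.0, iterations=100):
    """Steps 1-4 of ALGORITHM 1 (toy, flattened version)."""
    volume_norm = normalize(volume, mu, std)      # step 1
    prob_map = run_3d_fcnn(volume_norm)           # step 2: softmax probability map
    phi = prob_map                                # step 3: initialize the level-set
    for _ in range(iterations):                   # step 4: iterative refinement
        phi = level_set_step(phi, alpha, beta, gamma)
    return phi
```

With a thresholding stand-in for the network and an identity refinement step, `segment_tumor([5.0, -5.0], 0.0, 1.0, net, step)` returns a binary map for the two toy voxels.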
speed. This speed is controlled by the propagation function, while the smoothness of the regions with respect to the mean curvature κ is controlled by the spatial modifier function. The process terminates on a convergence criterion or after a maximum number of iterations; here, we used the latter, setting the maximum number of iterations to 100.

C. Proposed Methodology

Figure 1(a) shows an overview of the proposed methodology, which consists of three stages: 1) patch extraction and preprocessing, 2) 3D-FCNNs, and 3) 3D level-set. In the patch extraction and preprocessing step, the MR image is divided randomly into patches of size M×M for the training phase. In this study, the 3D-FCNNs are trained with randomly extracted patches instead of full-sized images, since the tumor is generally small and training with full-sized images could produce false positive results. Hence, patches related to tumor regions are picked so as to sample the tumor region more frequently. In this regard, 60 patches per slice, together with the given ground truth (i.e. the labels shown in Figure 1(a)), are extracted; approximately 140,000 patches extracted from the 12 patients are used as the training set. After patch extraction, all extracted patches go through a preprocessing step in which voxel intensities are normalized by removing the mean and scaling by the variance. The 3D-FCNNs take volumetric data as input and produce a 3D volumetric output; each value of the output indicates the probability that the corresponding voxel of the input volume belongs to the colorectal cancerous tumor. Using this probability map as input, the 3D level-set is initialized for further refinement of the output yielded by the 3D-FCNNs. The proposed method is also summarized in ALGORITHM 1.

D.
Performance Metrics for Evaluation

In this study, three performance metrics are used to evaluate the segmentation results, namely the Dice similarity coefficient (DSC) [18], the positive predictive value (PPV), and
recall (sensitivity). The DSC computes the overlap between the ground-truth label and the segmented result. It is expressed as
$$DSC = \frac{2TP}{FP + 2TP + FN}, \qquad (6)$$

where TP, FP and FN denote true positive, false positive and false negative region detections, respectively. The PPV relates the true positives to the false positives of the tumor segmentation; a higher value indicates that the segmentation covers less non-tumor region. PPV is defined as

$$PPV = \frac{TP}{TP + FP}. \qquad (7)$$

The last metric is the sensitivity, which evaluates TP and FN in the detection of a tumor. It is expressed as

$$Sensitivity = \frac{TP}{TP + FN}. \qquad (8)$$

TABLE 1. Comparison of 3D-FCNNs and 3D-FCNNs + 3D Level-Set performances on training data

Methods | DSC | PPV | Sensitivity (%) | Total parameters
3D-FCNNs (alone) | 0.8568 | 0.8012 | 91.249 | 1×10⁶
3D-FCNNs + 3D Level-Set | 0.91813 | 0.8734 | 84.7 | 1×10⁶
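Under the definitions in Eqs. (6)-(8), these metrics reduce to a few lines of Python; the voxel counts in the example are hypothetical, not taken from the paper:

```python
def dsc(tp, fp, fn):
    """Dice similarity coefficient, Eq. (6)."""
    return 2 * tp / (fp + 2 * tp + fn)

def ppv(tp, fp):
    """Positive predictive value, Eq. (7)."""
    return tp / (tp + fp)

def sensitivity(tp, fn):
    """Sensitivity (recall), Eq. (8)."""
    return tp / (tp + fn)

# Hypothetical voxel counts for illustration.
tp, fp, fn = 90, 10, 10
print(dsc(tp, fp, fn))       # 180 / 200 = 0.9
print(ppv(tp, fp))           # 90 / 100 = 0.9
print(sensitivity(tp, fn))   # 90 / 100 = 0.9
```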
IV. RESULTS AND DISCUSSION

In this study, two different models were learned, and their colorectal tumor segmentation performance was compared on the same training and testing data sets. The models were learned using the following methods:
1. 3D-FCNNs (alone);
2. 3D-FCNNs + 3D Level-Set (proposed method).

During training, the same total number of network parameters, 1×10⁶, was chosen for both methods. TABLE 1 reports the performance of both methods on the training data set, and TABLE 2 compares the segmentation results produced by each method on the testing data set.

TABLE 2. Comparison of 3D-FCNNs and 3D-FCNNs + 3D Level-Set performances on testing data

Methods | DSC | PPV | Sensitivity (%)
3D-FCNNs (alone) | 0.86238 | 0.8387 | 93.045
3D-FCNNs + 3D Level-Set | 0.9378 | 0.9012 | 88.37

Figure 2 shows segmentation results on the 3D testing data set, represented individually in each plane: axial (Figure 2(a)), sagittal (Figure 2(b)) and coronal (Figure 2(c)). In Figure 2, the tumor manually delineated by the expert radiologist/reader is shown in red, the yellow outline is the result yielded by 3D-FCNNs alone, and the results produced by our method are represented by the blue outline.

Fig. 2. Segmentation result of a 3D MR image represented individually in each 2D plane, axial (a), sagittal (b) and coronal (c), with ground truth (red), segmentation by 3D-FCNNs (yellow), and segmentation by 3D-FCNNs + 3D Level-Set (blue).

From TABLE 1 and TABLE 2, it can be quantitatively assessed that the proposed method achieves better DSC and PPV scores than 3D-FCNNs alone on both the training and testing data sets. Also, the proposed method is less prone to missing
tumor regions than 3D-FCNNs alone. This is because 3D-FCNNs alone may learn shapes that are nearly homogeneous with the tumor and cannot regularize the shape in an explicit way, as shown in Figure 2 (yellow outline). On the contrary, our proposed method incorporates the 3D level-set algorithm to explicitly regularize the output of the 3D-FCNNs and obtain a refined result, as shown in Figure 2 (blue outline).

V. CONCLUSION

In this work, a deep learning based 3D-FCNN with 11 layers, including 8 convolutional layers, 2 fully connected layers and one softmax (classification) layer, is studied to segment colorectal tumors from 3D MRI. 3D-FCNNs alone were first explored to segment the colorectal tumor. We found that 3D-FCNNs alone were unable to regularize the tumor shape by considering contours and edges, although they showed high sensitivity values. We then incorporated the 3D level-set algorithm into the 3D-FCNNs, improving their performance during training and giving rise to better segmentation results. In this study, the proposed method has been assessed with simple metrics; for a further and deeper assessment, we plan to test the proposed method with different numbers of convolutional layers, different kernel and patch sizes, and larger data sets.

REFERENCES

[1] Ashiya, "Notes on the Structure and Functions of Large Intestine of Human Body," http://www.preservearticles.com/201105216897/notes-on-the-structure-and-functions-of-large-intestine-of-human-body.html, Feb. 2013.
[2] M. H. Soomro, G. Giunta, A. Laghi, D. Caruso, M. Ciolina, C. De Marchis, S. Conforto, M. Schmid, "Haralick's Texture Analysis applied to colorectal T2-weighted MRI: A preliminary study of significance for cancer evolution," in Proc. of 13th IASTED (BioMed 2017), pp. 16-19, 2017.
[3] J. Scholefield et al., eds., Challenges in Colorectal Cancer, 2nd ed. Wiley-Blackwell, 2006.
[4] M. H. Soomro, G. Giunta, A. Laghi, D. Caruso, M. Ciolina, C. De Marchis, S. Conforto, M. Schmid, "Segmenting MR Images by Level-Set Algorithms for Perspective Colorectal Cancer Diagnosis," VipIMAGE 2017, ECCOMAS 2017, Lecture Notes in Computational Vision and Biomechanics, vol. 27, Springer, 2018.
[5] H. Kaur, H. Choi, Y. Nancy You, G. M. Rauch, C. T. Jensen, P. Hou, G. J. Chang, J. M. Skibber, and R. D. Ernst, "MR Imaging for Preoperative Evaluation of Primary Rectal Cancer: Practical Considerations," RadioGraphics, vol. 32, no. 2, pp. 389-409, 2012.
[6] Ü. Tapan, M. Özbayrak, S. Tatlı, "MRI in local staging of rectal cancer: an update," Diagnostic and Interventional Radiology, vol. 20, no. 5, pp. 390-398, 2014.
[7] Y. Chen, "A novel approach to segmentation and measurement of medical image using level set methods," Magnetic Resonance Imaging, vol. 39, pp. 175-193, 2017.
[8] D. Cremers, M. Rousson, and R. Deriche, "A review of statistical approaches to level set segmentation: integrating color, texture, motion and shape," International Journal of Computer Vision, vol. 72, pp. 195-215, April 2007.
[9] K. Kamnitsas, C. Ledig, V. F. Newcombe, J. P. Simpson, A. D. Kane, D. K. Menon, D. Rueckert, and B. Glocker, "Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation," Medical Image Analysis, vol. 36, pp. 61-78, 2017.
[10] J. Dolz, C. Desrosiers, I. B. Ayed, "3D fully convolutional networks for subcortical segmentation in MRI: A large-scale study," NeuroImage, 2017. https://doi.org/10.1016/j.neuroimage.2017.04.039
[11] M. Tang, S. Valipour, Z. V. Zhang, D. Cobzas, M. Jagersand, "A deep level set method for image segmentation," arXiv:1705.06260, 2017.
[12] B. H. Menze et al., "The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS)," IEEE Transactions on Medical Imaging, vol. 34, no. 10, pp. 1993-2024, 2015.
[13] H. R. Roth et al., "DeepOrgan: Multi-level deep convolutional networks for automated pancreas segmentation," arXiv:1506.06448, 2015.
[14] A. Prasoon, K. Petersen, C. Igel, F. Lauze, E. Dam, and M. Nielsen, "Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network," in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013, New York: Springer, 2013, pp. 246-253.
[15] Q. Dou, H. Chen, L. Yu, L. Zhao, J. Qin, D. Wang, V. C. Mok, L. Shi, and P.-A. Heng, "Automatic detection of cerebral microbleeds from MR images via 3D convolutional neural networks," IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1182-1195, 2016.
[16] V. Caselles, R. Kimmel, G. Sapiro, "Geodesic active contours," International Journal of Computer Vision, vol. 22, pp. 61-79, 1997.
[17] K. He, X. Zhang, S. Ren, J. Sun, "Delving deep into rectifiers: surpassing human-level performance on ImageNet classification," in Proc. of the IEEE International Conference on Computer Vision, 2015, pp. 1026-1034.
[18] L. R. Dice, "Measures of the amount of ecologic association between species," Ecology, vol. 26, no. 3, pp. 297-302, 1945.