Fully Automated Organ Segmentation in Male Pelvic CT Images Anjali Balagopal§, Samaneh Kazemifar§, Dan Nguyen, Mu-Han Lin, Raquibul Hannan, Amir Owrangi, Steve Jiang (§ Co-first authors) Medical Artificial Intelligence and Automation Laboratory, Department of Radiation Oncology, University of Texas Southwestern, Dallas, Texas, USA Email:
[email protected]
ABSTRACT
Accurate segmentation of the prostate and surrounding organs at risk is important for prostate cancer radiotherapy treatment planning. We present a fully automated workflow for male pelvic CT image segmentation using deep learning. The architecture consists of a 2D localization network followed by a 3D segmentation network for volumetric segmentation of the prostate, bladder, rectum, and femoral heads. We used a multi-channel 2D U-Net followed by a 3D U-Net whose encoding arm is modified with aggregated residual (ResNeXt) blocks. The models were trained and tested on a pelvic CT image dataset comprising 136 patients. Test results show that the proposed fully automated 3D U-Net based segmentation achieves mean (±SD) Dice coefficient values of 90 (±2.0)%, 96 (±3.0)%, 95 (±1.3)%, 95 (±1.5)%, and 84 (±3.7)% for the prostate, left femoral head, right femoral head, bladder, and rectum, respectively.
INTRODUCTION
Accurate segmentation of the prostate and surrounding organs at risk (OARs) is important for radiotherapy treatment planning. Manual segmentation by physicians, as currently used in clinical practice, is time consuming and highly dependent on the physicians' skill and experience, leading to large inter- and intra-observer variation1-6.
Inter-observer variability (standard deviation) for the manually segmented prostate volumes was observed
to range from 10% to 18%7, indicating the high demand for automated segmentation methods in clinical practice. Automated segmentation of the prostate and surrounding organs from CT images is challenging for two reasons: 1) the boundary between the prostate and the background is usually unclear because of low soft-tissue contrast in CT images; and 2) variation in the shape, size, and intensity of the prostate, bladder, and rectum among patients of different ages and at different times is large. Pelvic CT images are also highly diverse, for several reasons: 1) the use of contrast agents in some patients may partially or fully brighten the bladder; 2) because the images are often acquired with different fields of view, organ position and volume dimensions vary substantially; and 3) fiducial markers or catheters may be
implanted into patients, changing the texture of major pelvic organs. This diversity further complicates the segmentation of male pelvic organs from CT images. Segmentation algorithms have been developed to reduce these variabilities and improve accuracy and efficiency8-30. Most existing methods fall into the categories of deformable model-based, atlas-based, and learning-based segmentation. In recent years, methods using deformable models based on statistical information extracted from both organ shape and image appearance have been studied, showing good performance for pelvic organs. Pizer et al.31 proposed a medial shape model to segment the bladder, prostate, and rectum from CT images, and Broadhurst et al.32 used a deformable model based on a histogram-statistics appearance model for prostate segmentation. The performance of these classification-based deformable models highly depends on the position, size, and shape of the initialized model: good initialization is required for good segmentation accuracy. However, for pelvic organ segmentation, which is characterized by large inter-subject anatomical variation, good initialization is often difficult to obtain33. Regression-based deformable models overcome this drawback. Shi Y et al.19 proposed a spatially-constrained transductive lasso (SCOTO) to jointly select discriminative features and generate a prostate likelihood map, followed by multi-atlas-based label fusion of planning and previous treatment images. Gao Y et al.20 proposed using a multi-task random forest to learn a displacement regressor jointly with an organ classifier, followed by an auto-context model to iteratively enforce structural information during voxel-wise prediction. Shape models based on prior knowledge and machine learning have also been widely used for automatic segmentation of CT images. These methods use a training dataset as prior knowledge, and the training data fall into two categories: population-based and patient-specific. Methods based on patient-specific models often achieve higher segmentation accuracy because the patient-specific dataset consists of several CT images from the same patient, whereas shape variation between different patients is large. Li W et al.22 suggested a patient-specific classification model that uses two sets of location-adaptive classifiers placed along the two coordinate directions of the planning image space of the patient. The model is further trained with the planning image and previously segmented images of the same patient to jointly perform prostate segmentation for a new image. Ma et al.23 proposed a combination of population-based and patient-specific learning methods for segmenting the prostate on CT images: a population model was trained on a group of prostate patients, a patient-specific model was trained on individual patient data, and user interaction was used to mark information to be incorporated into the segmentation. The similarity between the two models was computed to estimate the likelihood of a pixel belonging to
prostate tissue. Shao Y et al.24 presented a boundary detector based on a regression forest and used it as a shape prior to guide accurate prostate segmentation. Lay N et al.25 proposed a discriminative classifier that employs landmarks detected through joint global and local texture information. Even though these methods perform remarkably well, they require training samples from previous treatment images of the same patient to learn patient-specific information for enhancing the prostate segmentation result. These methods also rely on hand-crafted features for accurate segmentation. Despite all this progress, an obvious gap between automated segmentation results and manual annotations remains to be addressed. Deep learning, and specifically convolutional neural networks (CNNs), has revolutionized the field of natural image processing. Instead of extracting features in a hand-designed manner, deep learning discovers informative representations in a self-taught manner from a set of data with only minor preprocessing34,35. Deep learning can effectively describe the large variations in the training data, simultaneously learning local and global information at different resolutions, and achieving satisfactory results without additional steps to refine the segmentation. Although deep learning networks have long training times, their inference time is minimal. Many researchers have used deep learning methods for medical image segmentation with remarkable success. The U-Net36 is a well-known CNN architecture for medical image segmentation. Its architecture includes so-called skip connections between feature maps at the same depth level of the contracting and expanding paths. The concatenation of the feature maps gives U-Net a significant advantage compared to patch-wise approaches, preserving local features while propagating global features to higher resolution layers. For this reason, skip connections are found in many other segmentation works, in which the feature maps are combined in different ways. The U-Net architecture was extended to a 3D form by Cicek et al.38 The 3D network is trained with 2D annotated slices, generating dense volumetric segmentations. This is an interesting development of the original U-Net because a high percentage of medical data is 3D, with a similar size in each dimension, and a slice-by-slice application of 2D convolutional operations can be inefficient and inaccurate. Inspired by U-Net, another volumetric, fully convolutional neural network for 3D image segmentation was proposed by Milletari et al.39 This architecture, called V-Net, performs 3D convolutions with volumetric kernels and uses residual blocks to manage the vanishing gradient problem. Another major change introduced into U-Net by Milletari et al.39 is the objective function: to cope with the unbalanced class problem, an objective function based on the Dice coefficient was proposed instead of cross-entropy. Maximizing the Dice similarity coefficient does not require any hyperparameters (for example, a weight map), which is a major advantage of this approach.
Recently, deep learning has been used for prostate segmentation in MRI40-43, achieving good results. MRI provides excellent soft-tissue contrast, which helps to better delineate organ boundaries. However, because the current radiotherapy treatment planning workflow uses CT images for contouring and dose calculation, automated segmentation of the prostate and OARs in CT images is highly desirable, despite being more challenging than segmentation in MR images. Ma et al.28 used a combination of deep learning and multi-atlas fusion for automatic segmentation of the prostate on CT images: a 2D fully convolutional network (FCN) was used for initial segmentation, followed by multi-atlas label fusion to refine the segments. The model was trained on a dataset of 92 patients and achieved a Dice similarity coefficient (DSC) value of 86.80% compared to manual segmentation. In earlier work, we used a 2D U-Net for semi-automated segmentation of the prostate and OARs in pelvic CT and achieved a DSC value of 88% compared to manual segmentation44. Here we present a fully automated workflow for pelvic CT multi-organ segmentation that uses a multi-channel 2D U-Net for organ localization, followed by a 3D U-Net with ResNeXt blocks for accurate segmentation of the prostate and organs at risk in pelvic CT. The significance of this work is three-fold: 1) this is the first fully automated workflow for multi-organ pelvic CT segmentation using deep learning; 2) the proposed 3D network provides better segmentation accuracy than existing methods; and 3) the work shows that existing deep learning architectures, with some modifications, can be highly effective for localizing and segmenting organs without pre- and post-processing of the raw images.
METHODS
In general, an organ segmentation problem consists of two related tasks: organ localization and organ delineation. Organ localization determines where the target object is located in the image, whereas organ delineation defines the object's spatial extent and composition. A fully automated network should be capable of completing both tasks with good accuracy. Inter-patient variation in organ shape, size, and location hinders organ detection in CT images, and errors in localization can completely mislead segmentation. The most common measure taken by automated segmentation algorithms to overcome this issue is manual detection of organ locations24, which is time consuming and unsuitable for a fully automated segmentation algorithm. To overcome this issue, we include a 2D U-Net in our workflow to detect organ locations in pelvic CT scans before segmentation.
Automation workflow
The proposed framework for the automated detection and segmentation of the prostate, bladder, rectum, and femoral heads in pelvic CT scans is shown in Figure 1. The upper and lower 15% of each image contains no organs or other useful information but increases memory consumption, so it was cropped out. The cropping margin was determined for the patients scanned at UT Southwestern Medical Center and should be determined according to the dataset at hand. This step only increases computing speed and could be omitted without affecting accuracy. The images are then downsized before being fed into the localization network, to speed up the network and to avoid memory issues. The entire CT volume is fed into the localization network to detect organ locations, and the detected organ volumes are then extracted and fed into the final segmentation network.
Figure 1: Automation workflow that requires minimal data preprocessing for organ localization and segmentation
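As an illustration, the preprocessing just described can be implemented in a few lines. This is only a sketch: the crop fraction, the assumption that the uninformative region lies at the top and bottom of each axial slice, and the downsampled size are placeholders rather than the exact values used in our pipeline.

```python
import numpy as np
from scipy.ndimage import zoom

def preprocess_for_localization(ct_volume, crop_frac=0.15, target_xy=(128, 128)):
    """Crop uninformative rows and downsize slices before localization.

    ct_volume: array of shape (num_slices, 512, 512). crop_frac and
    target_xy are illustrative values, not the exact settings used here.
    """
    n_rows = ct_volume.shape[1]
    margin = int(n_rows * crop_frac)
    cropped = ct_volume[:, margin:n_rows - margin, :]  # remove upper/lower 15%

    # In-plane downsampling so the 2D localization network fits in GPU memory.
    zoom_y = target_xy[0] / cropped.shape[1]
    zoom_x = target_xy[1] / cropped.shape[2]
    return zoom(cropped, (1.0, zoom_y, zoom_x), order=1)
```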
Localization network
The localization network is a 5-channel 2D U-Net architecture (Figure 2) in which each channel corresponds to one of the organs. The U-Net architecture36 consists of an encoding path followed by and connected to a decoding path, which enables pixel-wise predictions even for small datasets. The architecture uses up-convolutions to remap the lower resolution feature maps within the network to the denser space of the input images. In each iteration, a small batch of images is used to train the network. Convolution layers with a (3×3) kernel size and a stride of one are used in the encoding arm. The feature map dimension is reduced at each layer of the encoding arm using the pooling operator. Upsampling is performed in the decoding branch by setting the stride of the convolution kernel at the start of each layer to 2, to better incorporate neighboring features when reconstructing the missing ones. The CT images and the corresponding masks for each organ are fed into each channel for training. The final predicted output has 5 channels, each corresponding to the prediction for one organ. The network was trained by minimizing a custom weighted Dice loss function, DSCw (described in the section on loss functions), using the Adam optimizer45.
Figure 2: Localization network: a 5-channel 2D U-Net.
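A condensed sketch of such a multi-channel 2D U-Net, written with the Keras functional API, is shown below. It follows the encoder filter progression of Appendix A (8-16-32-64-128) but uses only two convolutions per level and an assumed input size, so it is illustrative rather than the exact network.

```python
from tensorflow.keras import layers, Model

def build_localization_unet(input_shape=(256, 256, 1), n_organs=5, base_filters=8):
    """2D U-Net with one sigmoid output channel per organ (5 channels)."""
    def conv_block(x, filters):
        for _ in range(2):  # Appendix A uses three convolutions per level
            x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
            x = layers.BatchNormalization()(x)
        return x

    inputs = layers.Input(input_shape)
    skips, x = [], inputs
    for depth in range(4):                        # encoding arm: 8, 16, 32, 64 filters
        x = conv_block(x, base_filters * 2 ** depth)
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)
    x = conv_block(x, base_filters * 16)          # bottleneck: 128 filters
    for depth in reversed(range(4)):              # decoding arm with skip connections
        x = layers.UpSampling2D(2)(x)
        x = layers.concatenate([x, skips[depth]])
        x = conv_block(x, base_filters * 2 ** depth)
    outputs = layers.Conv2D(n_organs, 1, activation="sigmoid")(x)
    return Model(inputs, outputs)
```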
Segmentation network
For most organs, the first and last few slices do not contain sufficient shape information and are therefore difficult to segment using 2D networks. Applying 3D kernels generates discriminative feature maps that capture the 3D shape and surface of the organ rather than only 2D curves. A three-dimensional network can also exploit the continuity of sequential slices and better segment small parts of the organs than a 2D network can. We used a U-Net architecture with a modified encoding arm as our segmentation network (Figure 3). The employed U-Net architecture is the 3D extension described by Cicek et al.38, developed from the U-Net proposed by Ronneberger O et al.36 The encoding
arm of this architecture consists of ResNeXt blocks adapted from S. Xie et al.46 at each layer. Xie et al.46 introduced the concept of "cardinality", an additional dimension to depth and width, and showed that aggregating residual blocks with the same topology and hyperparameters is more effective in gaining accuracy than going deeper or wider. The ResNeXt block used in our network consists of n repetitions of two 3D convolutions (filter size 3×3×3) and is represented in Figure 3. The outputs of these n paths are concatenated, and the resulting feature map is then concatenated with the input to the ResNeXt block. Max pooling with stride (2,2,2) is performed on the ResNeXt output before it is passed on to the next layer in the encoding arm. The decoding arm consists of (3×3×3) convolutions with upsampling (stride of [2,2,2]). All layers use the ReLU activation function except for the output layer, which uses sigmoid.
Figure 3: Segmentation architecture: a 3D U-Net with ResNeXt blocks in the encoding arm (Left). Architecture of the ResNeXt block used (Right).
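A minimal Keras sketch of the ResNeXt block described above is given below. The block returns both the concatenated feature map (which can serve as the skip connection to the decoding arm) and its max-pooled version (passed to the next encoding level). The values of filters and n_paths are organ-specific hyperparameters (Appendix B), so the arguments here are placeholders.

```python
from tensorflow.keras import layers

def resnext_block_3d(x, filters, n_paths):
    """ResNeXt-style block: n parallel paths of two 3x3x3 convolutions,
    concatenated with each other and with the block input."""
    paths = []
    for _ in range(n_paths):
        p = layers.Conv3D(filters, 3, padding="same", activation="relu")(x)
        p = layers.Conv3D(filters, 3, padding="same", activation="relu")(p)
        paths.append(p)
    skip = layers.concatenate(paths + [x])                       # aggregate paths + input
    pooled = layers.MaxPooling3D(pool_size=2, strides=2)(skip)   # stride (2,2,2)
    return skip, pooled  # skip feeds the decoder; pooled feeds the next encoder level
```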
The output of the localization network is used to predict organ volumes, which are input to the segmentation network. Since the organ volumes vary from patient to patient and the 3D network requires a common volume size, a fixed number of slices (32 for the prostate and 64 each for the OARs) is selected around the predicted volume. The 3D network was trained by minimizing the boundary weighted Dice loss function, DSCbw (described in the next section), using the Adam optimizer45. The number of layers, the filter sizes, the value of n, and other hyperparameters varied between organs.
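For illustration, the fixed-size volume selection could be implemented as below. Centring the slab on the predicted mask is an assumption made for this sketch; the text only states that a fixed number of slices is selected around the predicted volume.

```python
import numpy as np

def crop_fixed_volume(ct_volume, predicted_mask, n_slices):
    """Select a fixed number of axial slices around the predicted organ.

    n_slices = 32 for the prostate and 64 for each OAR.
    """
    organ_slices = np.where(predicted_mask.any(axis=(1, 2)))[0]
    center = int(organ_slices.mean())
    start = max(0, min(center - n_slices // 2, ct_volume.shape[0] - n_slices))
    return ct_volume[start:start + n_slices]
```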
Loss functions
The Dice similarity coefficient (DSC), also called the overlap index, is the most commonly used metric for validating medical volume segmentations. DSC measures the agreement between two image regions and is widely used to evaluate segmentation performance against a given ground truth in medical images. DSC is defined as
DSC = 2|True ∩ Pred| / (|True| + |Pred|)      (1)
We use | | to indicate the number of foreground voxels in the ground truth (True) and predicted (Pred) segmentations. DSC is a useful loss function for segmentation networks because of its efficiency in addressing the class imbalance issue.
Organ weighted Dice function for localization network
Since organs vary in size, we must ensure that all organs contribute equally to the Dice loss. To achieve this, a custom weighted Dice function, defined in (2), was used for the localization network. All pixels corresponding to annotated areas were assigned a weight inversely proportional to the number of samples of their class in the specific set.
DSCw = Σ {wx} · 2|True ∩ Pred| / (|True| + |Pred|)      (2)
Here {wx} is the weight vector for organ x, determined during training from the number of samples belonging to each organ; True and Pred are the gold standard and predicted segmentation results, respectively.
Boundary weighted Dice function for segmentation network
To mitigate the issue of fuzzy boundaries, we used a boundary weighted Dice function, defined in (3), for the segmentation network. More importance is given to pixels near the organ boundary: the closer a pixel is to the boundary, the higher its weight and hence the larger the loss (1 - DSCbw).
DSCbw = 2|wb ∗ (True ∩ Pred)| / (|wb ∗ True| + |wb ∗ Pred|)      (3)
Here wb is the border weight matrix for the organ, determined during training.
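The two loss functions can be sketched as follows. The per-organ weights wx and the construction of the border weight matrix wb are only described qualitatively above, so the inverse-frequency weighting and the distance-transform-based weighting used in this sketch are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage
from tensorflow.keras import backend as K

def organ_weighted_dice_loss(organ_weights, smooth=1.0):
    """Loss 1 - DSC_w of Eq. (2): one Dice term per organ channel,
    weighted by organ_weights (e.g., inverse of each organ's voxel count)."""
    w = K.constant(organ_weights)

    def loss(y_true, y_pred):
        axes = (0, 1, 2)  # sum over batch and spatial dimensions, keep channels
        intersection = K.sum(y_true * y_pred, axis=axes)
        denom = K.sum(y_true, axis=axes) + K.sum(y_pred, axis=axes)
        dice_per_organ = (2.0 * intersection + smooth) / (denom + smooth)
        return 1.0 - K.sum(w * dice_per_organ)
    return loss

def boundary_weight_map(mask, w_max=5.0, sigma=3.0):
    """Illustrative border weight matrix w_b: weights peak at the organ
    boundary and decay with distance from it (w_max and sigma are assumed)."""
    inside = ndimage.distance_transform_edt(mask)
    outside = ndimage.distance_transform_edt(1 - mask)
    dist = np.minimum(inside, outside)
    return 1.0 + (w_max - 1.0) * np.exp(-(dist ** 2) / (2.0 * sigma ** 2))

def boundary_weighted_dice(y_true, y_pred, w_b, smooth=1.0):
    """DSC_bw of Eq. (3); the training loss is 1 - DSC_bw."""
    intersection = np.sum(w_b * y_true * y_pred)
    return (2.0 * intersection + smooth) / (np.sum(w_b * y_true) + np.sum(w_b * y_pred) + smooth)
```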
EXPERIMENTS & RESULTS
Patient Data
The dataset consisted of raw CT scans of 136 prostate cancer patients collected at The University of Texas Southwestern Medical Center (UTSW). All CT images were acquired using a 16-slice CT scanner (Royal Philips Electronics, Eindhoven, The Netherlands). The target organ (prostate) and OARs (bladder, rectum, and femoral heads) were contoured by radiation oncologists. All images were acquired with a 512×512 matrix and 2 mm slice thickness (voxel size 1.17 mm × 1.17 mm × 2 mm).
Training Data
Eighty percent of the patient data were used for training and the remaining 20% for testing. The training data were split into four groups, and a leave-one-group-out cross validation strategy was used: in each fold, one group was held out as the validation set and training was guided by the loss on that validation set. The average of the predictions of the four resulting models was used for the test set.
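A schematic version of this training and ensembling procedure is shown below; build_model stands for any function returning a compiled Keras model, and the epoch and batch-size values are placeholders rather than the settings used in this work.

```python
import numpy as np
from sklearn.model_selection import KFold

def cross_validation_predict(x_train, y_train, x_test, build_model, n_folds=4):
    """Train one model per fold (one group held out for validation) and
    average the fold models' predictions on the test set."""
    fold_predictions = []
    for train_idx, val_idx in KFold(n_splits=n_folds, shuffle=True,
                                    random_state=0).split(x_train):
        model = build_model()
        model.fit(x_train[train_idx], y_train[train_idx],
                  validation_data=(x_train[val_idx], y_train[val_idx]),
                  epochs=200, batch_size=1)          # placeholder settings
        fold_predictions.append(model.predict(x_test))
    return np.mean(fold_predictions, axis=0)         # averaged model prediction
```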
Figure 4: Leave-one-out cross validation training strategy.
Training strategies
Initially, we tried a kernel size of 5×5×5 for the convolutional layers, but this led to output segmentations that were inaccurate around edges and small contours for the validation data. Intuitively this made sense: large regions were being grouped together, and when up-sampling was performed they did not map back to precise points. After trying kernel sizes of 5×5×5, 3×3×3, and 1×1×1, we settled on 3×3×3 exclusively. The model's accuracy and training speed changed drastically with different learning rates. A high learning rate of 1e-2, with a decay factor of 0.1 when the loss plateaus, led to the best validation set Dice coefficient. Details of the 2D and 3D architectures can be found in the Appendix. Even though the architecture we employed is efficient enough for accurate prediction, overfitting still occurs. To reduce overfitting, we used early stopping and dropout48. Batch normalization49 was added at each layer to increase network stability. For efficient learning, the learning rate was decayed when the validation loss plateaued. The deep learning models were implemented using the open source Keras package47, and computations were performed on an NVIDIA Tesla K80 dual-GPU graphics card.
Results
Examples of the segmentations predicted by the model vs. the ground truth segmentations are shown in Figures 5 and 6. The segmentations predicted by the automated network (blue) closely resemble the manual segmentations (red). Examples of ground truth vs. predicted segmentation for base, middle, and apex slices of the prostate are illustrated in Figure 7.
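For reference, the training configuration described above (Adam with an initial learning rate of 1e-2, learning-rate decay by a factor of 0.1 when the validation loss plateaus, and early stopping) corresponds to a Keras setup along these lines; the patience values are assumptions, as they are not reported here.

```python
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.optimizers import Adam

optimizer = Adam(learning_rate=1e-2)
callbacks = [
    # Decay the learning rate by a factor of 0.1 when the validation loss plateaus.
    ReduceLROnPlateau(monitor="val_loss", factor=0.1, patience=10),
    # Early stopping to limit overfitting (patience is an assumed value).
    EarlyStopping(monitor="val_loss", patience=25, restore_best_weights=True),
]
# model.compile(optimizer=optimizer, loss=dice_loss)
# model.fit(x_train, y_train, validation_data=(x_val, y_val), callbacks=callbacks)
```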
Figure 5: Predicted segmentation (Blue) vs ground truth (Red) result for left and right femoral head.
Figure 6: Predicted segmentation (Blue) vs ground truth (Red) result for prostate, bladder and rectum, from left to right.
Figure 7: Predicted segmentation vs. ground truth results for prostate at base, middle, and apex slices, from left to right.
The calculated DSC values for each of the four cross validation models and the averaged model for prostate segmentation are plotted in Figure 8. With the averaged model, the maximum DSC was 93% and the minimum DSC was 85.9%.
Figure 8: Plot showing the prostate segmentation DSC values for the cross validation model and the final averaged model.
The results of cross validation training for the bladder, prostate, and rectum are summarized in Table 1. The final test accuracy was obtained by averaging the predictions of the four cross validation models on the test dataset (Table 2). A box-and-whisker plot showing the distribution of Dice coefficients for all segmented organs across all patients is shown in Figure 9. Table 1: Cross validation result for bladder, prostate, and rectum.
Organ      Set 1    Set 2    Set 3    Set 4
Bladder    95.6%    92.2%    96.0%    95.4%
Prostate   91.2%    90.5%    88.5%    91.2%
Rectum     86.5%    85.4%    80.2%    82.4%
Figure 9: Box whisker plot showing the distribution of dice coefficient for all the organs segmented. Table 2: Mean DSC accuracy for the averaged model.
Organ                 Average accuracy (SD)
Bladder               95.0 (±1.5)%
Prostate              90.2 (±2.0)%
Rectum                84.3 (±3.7)%
Right femoral head    95.0 (±1.3)%
Left femoral head     96.0 (±3.0)%
DISCUSSION
Accurate segmentation of the prostate and OARs is important for radiotherapy treatment planning. The low soft-tissue contrast in CT images and the variation in organ size, shape, and intensity among patients hinder automated segmentation. The proposed fully automated method provides accurate and fast segmentation of the prostate and OARs. Once training is completed and the network weights are saved, final segmentation for the test set (27 patients) takes only a few seconds, and the entire workflow, which includes 2D localization, volume cropping, and 3D segmentation, takes only a few minutes per patient. The mean DSC values obtained for the prostate, bladder, rectum, left femoral head, and right femoral head were 0.90, 0.95, 0.84, 0.95, and 0.96, respectively. These results indicate high similarity between the automated and manual segmentations of the organs. The lower mean DSC value and the larger variation for the rectum are likely due to the large variation in the size and shape of this organ among patients. The dataset included patients with rectal balloon inserts; even though these cases were predicted well (Figure 10), they introduce large variation into the data. This, and the fact that most manual rectum segmentations are conducted with the help of MRI, may explain the suboptimal results. In the future, we hope that larger datasets will help improve accuracy.
Figure 10: Predicted segmentation (Blue) vs ground truth (Red) result for a rectum with rectal balloon insert
Several deformable model-based, multi-atlas-based, and learning-based methods have been developed for the same purpose. A direct comparison is not possible; however, Table 3 compares the DSC values of our proposed method with the values reported in the corresponding publications. Table 3: Comparison of proposed segmentation result with other published results
Method                Prostate          Bladder         Rectum
Shao Y et al.24       88 (±2)%          86 (±8)%        85 (±5)%
Gao Y et al.21        86 (±5)%          91 (±10)%       79 (±20)%
Gao Y et al.20        87 (±4)%          92 (±5)%        88 (±5)%
Feng Q et al.26       89 (±5)%          -               -
Martinez et al.27     87 (±7)%          89 (±8)%        82 (±6)%
Ma L et al.23         86.8 (±6.4)%      -               -
Our method            90 (±2.0)%        95 (±1.5)%      84 (±3.7)%
We also compared the performance of our network with several variations of the U-Net model. A comparison of prostate segmentation accuracy and computation time for the different architectures is shown in Table 4. These network configurations were trained and tested on the same patient dataset. Using ResNet51 improved the results, but the improvement was not significant and came with higher computation times, while DenseNet50 performed even worse. Our proposed 3D network performed best, with a cross validation accuracy of 90.2% for the prostate and the lowest computation time. The prediction time for all networks was 6-8 s for the test set of 27 patients. All of these models performed poorly without the boundary weighted Dice loss function (1 - DSCbw). Figure 11 shows an example of the predicted segmentation when the network was trained with the boundary weighted Dice loss (1 - DSCbw) versus the normal Dice loss (1 - DSC).
Table 4: Comparison of the result of different U-Net architectures on the same dataset
Model                              Computation time (hours for training)    Prostate segmentation accuracy
2D U-Net                           2.0                                      87.0%
3D U-Net                           2.2                                      89.0%
3D U-Net with 1×1 convolutions     2.2                                      88.4%
3D U-Net with DenseNet             2.4                                      88.5%
3D U-Net with ResNet               2.8                                      89.1%
3D U-Net with ResNeXt blocks       1.9                                      90.2%
Figure 11: Predicted segmentation with custom weighted dice loss (blue), with normal dice loss (green), vs ground truth segmentation (red).
Although it is considered the "gold standard," manual segmentation is not a perfect ground truth because of inter-observer variability. In future work we will ask multiple observers to contour pelvic CT scans and take the average of their contours as ground truth. We will also create an ensemble network to effectively combine the segmentations of all the organs in an image without any overlap.
CONCLUSION
The proposed 2D-3D hybrid network requires only CT images as input, demonstrates a unique ability to deal with irregular prostates and rectums, and exhibits superior segmentation performance (i.e., higher DSC values) compared with state-of-the-art methods, including our own 2D model. Our deep-learning based method is more effective than other automated segmentation methods and does not require any manual intervention. The proposed workflow could also be easily adapted for segmenting other organs.
ACKNOWLEDGEMENTS
AB and SJ would like to thank the Cancer Prevention and Research Institute of Texas (CPRIT) for their financial support through grant IIRA RP150485. The authors thank Dr. Damiana Chiavolini for proofreading and editing the manuscript.
REFERENCES 1. Weiss E and Hess C F 2003 The impact of gross tumor volume (GTV) and clinical target volume (CTV) definition on the total accuracy in radiotherapy Strahlentherapie und Onkologie 179 21-30 2. Weiss E, Richter S, Krauss T, Metzelthin S I, Hille A, Pradier O, Siekmeyer B, Vorwerk H and Hess C F 2003 Conformal radiotherapy planning of cervix carcinoma: differences in the delineation of the clinical target volume: A comparison between gynaecologic and radiation oncologists Radiotherapy and Oncology 67 87-95 3. Van Herk M 2004 Errors and margins in radiotherapy Seminars in radiation oncology 14 52-64 4. van Mourik A M, Elkhuizen P H, Minkema D, Duppen J C, Dutch Young Boost Study G and van Vliet-Vroegindeweij C 2010 Multiinstitutional study on target volume delineation variation in breast radiotherapy in the presence of guidelines Radiotherapy and oncology : journal of the European Society for Therapeutic Radiology and Oncology 94 286-91 5. Vorwerk H, Beckmann G, Bremer M, Degen M, Dietl B, Fietkau R, Gsänger T, Hermann R M, Alfred Herrmann M K, Höller U, van Kampen M, Körber W, Maier B, Martin T, Metz M, Richter R, Siekmeyer B, Steder M, Wagner D, Hess C F, Weiss E and Christiansen H 2009 The delineation of target volumes for radiotherapy of lung cancer patients Radiotherapy and Oncology 91 455-60 6. Rasch C, Steenbakkers R and van Herk M 2005 Target Definition in Prostate, Head, and Neck Seminars in Radiation Oncology 15 136-45 7. C. Fiorino, M. Reni, A. Bolognesi, G. M. Cattaneo, and R. Calandrino, "Intra- and inter-observer variability in contouring prostate and seminal vesicles: implications for conformal treatment planning," Radiotherapy and oncology: journal of the European Society for Therapeutic Radiology and Oncology, vol. 47, pp. 285-292, 1998. 8. Acosta O, Dowling J, Cazoulat G, et al. Atlas based segmentation and mapping of organs at risk from planning CT for the development of voxel-wise predictive models of toxicity in prostate radiotherapy. In: Madabhushi A, Dowling J, Yan P, Fenster A, Abolmaesumi P, Hata N, eds. International Workshop on Prostate Cancer Imaging. Berlin: Springer; 2010:42–51. 9. Acosta O, Simon A, Monge F, et al. Evaluation of multi-atlas-based segmentation of CT scans in prostate cancer radiotherapy. In: Wright S, ed. 2011 IEEE International Symposium on Biomedical Imaging: From Nano to Macro. Chicago, IL: IEEE; 2011:1966–1969. 10. Bueno G, Deniz O, Salido J, Carrascosa C, Delgado JM. A geodesic deformable model for automatic segmentation of image sequences applied to radiation therapy. Int J Comput Assist Radiol Surg. 2011;6:341–350. 11. Chai X, Xing L. SU-E-J-92: multi-atlas based prostate segmentation guided by partial active shape model for CT image with gold markers. Med Phys. 2013;40:171–171. 12. Chen S, Radke RJ. Level set segmentation with both shape and intensity priors. In: Matsuyama T, ed. 2009 IEEE 12th International Conference on Computer Vision. Kyoto: IEEE; 2009:763–770. 13. Chen T, Kim S, Zhou J, Metaxas D, Rajagopal G, Yue N. 3D meshless prostate segmentation and registration in image guided radiotherapy. Med Image Comput Comput Assist Interv. 2009;2009:43–50.
14. Cosio FA, Flores JAM, Castaneda MAP. Use of simplex search in active shape models for improved boundary segmentation. Pattern Recognit Lett. 2010;31:806–817. 15. Costa MJ, Delingette H, Novellas S, Ayache N. Automatic segmentation of bladder and prostate using coupled 3D deformable models. Med Image Comput Comput Assist Interv. 2007;2007:252–260. 16. Ghosh P, Mitchell M. Prostate segmentation on pelvic CT images using a genetic algorithm. In: Reinhardt JM, Pluim JPW, eds. Medical Imaging, International Society for Optics and Photonics. San Diego, CA: SPIE; 2008:691442–691442. 17. Rousson M, Khamene A, Diallo M, Celi JC, Sauer F. Constrained surface evolutions for prostate and bladder segmentation in CT images. In: Liu Y, Jiang T, Zhang C, eds. International Workshop on Computer Vision for Biomedical Image Applications. Berlin: Springer; 2005:251–260. 18. Tang X, Jeong Y, Radke RJ, Chen GT. Geometric-model-based segmentation of the prostate and surrounding structures for image-guided radiotherapy. In: Panchanathan S, Vasudev B, eds. Electronic Imaging, International Society for Optics and Photonics. San Jose, CA: SPIE; 2004:168–176. 19. Shi Y, Liao S, Gao Y, Zhang D, Gao Y, Shen D. Prostate Segmentation in CT Images via Spatial-Constrained Transductive Lasso. Proceedings / CVPR, IEEE Computer Society Conference on Computer Vision and Pattern Recognition
2013:10.1109/CVPR.2013.289. doi:10.1109/CVPR.2013.289. 20. Gao Y, Shao Y, Lian J, Wang AZ, Chen RC, Shen D. Accurate Segmentation of CT Male Pelvic Organs via Regression-based Deformable Models and Multi-task Random Forests. IEEE transactions on medical imaging. 2016;35(6):1532-1543. doi:10.1109/TMI.2016.2519264. 21. Gao, Y., Liao, S., & Shen, D. 2012. Prostate Segmentation by Sparse Representation Based Classification. Medical Image Computing and Computer-Assisted Intervention : MICCAI ... International Conference on Medical Image Computing and Computer-Assisted Intervention, 15(Pt 3), 451–458. 22. Li, W., Liao, S., Feng, Q., Chen, W., Shen, D.: Learning image context for segmentation of prostate in ct-guided radiotherapy. In: MICCAI. Volume 6893. (2011) 570–578 23. Ma L, Guo R, Tian Z, et al. Combining Population and Patient-Specific Characteristics for Prostate Segmentation on 3D CT Images. Proceedings of SPIE--the International Society for Optical Engineering. 2016;9784:978427. doi:10.1117/12.2216255. 24. Shao, Y., Gao, Y., Yang, X., & Shen, D. (2014). CT prostate deformable segmentation by boundary regression. In Medical Computer Vision: Algorithms for Big Data - International Workshop, MCV 2014 held in Conjunction with MICCAI 2014, Revised Selected Papers (Vol. 8848, pp. 127-136). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8848). Springer Verlag. DOI: 10.1007/978-3-319-13972-2_12 25. Lay N, Birkbeck N, Zhang J, Zhou SK. Rapid Multi-organ Segmentation Using Context Integration and Discriminative Models. 2013; Berlin, Heidelberg. 26. Feng Q., Foskey M., Chen W., and Shen D. 2010 “Segmenting CT prostate images using population and patientspecific statistics for radiotherapy,” Med. Phys. , 4121–4132 .10.1118/1.3464799
27. Martinez F, Romero E, Drean G, Simon A, Haigron P, de Crevoisier R and Acosta O 2014 Segmentation of pelvic structures for planning CT using a geometrical shape model tuned by a multi-scale edge detector Phys Med Biol 59 1471-84 28. Ma, Ling, et al. 2017 “Automatic Segmentation of the Prostate on CT Images Using Deep Learning and Multi-Atlas Fusion.”
Optical Engineering, International Society for Optics and Photonics, 24 Feb. 2017,
www.spiedigitallibrary.org/conference-proceedings-of-spie/10133/1/Automatic-segmentation-of-the-prostate-onCT-images-using-deep/10.1117/12.2255755.full. 29. Li, D., Zang, P., Chai, X., Cui, Y., Li, R., & Xing, L. (2016). Automatic multiorgan segmentation in CT images of the male pelvis using region-specific hierarchical appearance cluster models. Medical Physics, 43(10), 5426–5436. http://doi.org/10.1118/1.4962468 30. Yang, X., Rossi, P., Ogunleye, T., Jani, A. B., Curran, W. J., & Liu, T. (2014). A New CT Prostate Segmentation for CT-Based HDR Brachytherapy. Proceedings of SPIE--the International Society for Optical Engineering, 9036, 90362K–. http://doi.org/10.1117/12.2043695 31. Pizer S. M., Fletcher P. T., Joshi S., Gash A. G., Stough J., Thall A., Tracton G., and Chaney E. L., 2005 “A method and software for segmentation of anatomic object ensembles by deformable m-reps,” Med. Phys. , 1335– 1345 .10.1118/1.1869872 32. Broadhurst R. E., Stough J., Pizer S. M., and Chaney E. L., 2005 “Histogram statistics of local model-relative image regions,” in Deep Structure, Singularities, and Computer Vision (Springer, Berlin, Heidelberg), Vol. 3753, pp. 72– 83. 33. Gao Y 2016, “Accurate Segmentation Of Ct Pelvic Organs Via Incremental Cascade Learning And RegressionBased Deformable Models “, Thesis, Department of Computer Science, University of North Carolina at Chapel Hill 34. Bengio Y. Learning deep architectures for ai. Foundations and Trends in Machine Learning. 2009;2:1–127. 35. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521:436–444 36. Ronneberger
O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation. Proceedings of Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2015:234–241 37. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image
recognition. CoRR. 2014 abs/1409.1556. 38. Çiçek, Özgün et al. “3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation.” MICCAI (2016). 39. Milletari, Fausto et al. “V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation.” 2016 Fourth International Conference on 3D Vision (3DV) (2016): 565-571. 40. Zhu, Qikui & Du, Bo & Turkbey, Baris & L. Choyke, Peter & Yan, Pingkun. 2017. Deeply-supervised CNN for prostate segmentation. 178-184. 10.1109/IJCNN.2017.7965852. 41. Tian, Zhiqiang & Liu, Lizhi & Zhang, Zhenfeng & Fei, Baowei. 2018 . PSNet: Prostate segmentation on MRI based on a convolutional neural network. Journal of Medical Imaging. 5. 1. 10.1117/1.JMI.5.2.021208. 42. Ishioka, Junichiro & Matsuoka, Yoh & Uehara, Sho & Yasuda, Yosuke & Kijima, Toshiki & Yoshida, Soichiro & Yokoyama, Minato & Saito, Kazutaka & Kihara, Kazunori & Numao, Noboru & Kimura, Tomo & Kudo, Kosei & Kumazawa, Itsuo & Fujii, Yasuhisa. 2018. Computer-aided diagnosis of prostate cancer on magnetic resonance imaging using a convolutional neural network algorithm. BJU International. 10.1111/bju.14397
43. Yu, Lequan, and Xin Yang. 2017 “Volumetric ConvNets with Mixed Residual Connections for Automated Prostate Segmentation from 3D MR Images.” AAAI Publications, Thirty-First AAAI Conference on Artificial Intelligence, aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14719. 44. Kazemifar, Samaneh, et al. 2018 “Segmentation of the Prostate and Organs at Risk in Male Pelvic CT Images Using Deep Learning.” ArXiv.org , 26 Feb. 2018, arxiv.org/abs/1802.09587. 45. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” CoRR, vol. abs/1412.6980, 2014. 46. S. Xie, R. B. Girshick, P. Doll´ar, Z. Tu, and K. He. Aggre-gated residual transformations for deep neural networks. InCVPR, 2017. 47. Chollet, F. 2015 keras, GitHub. https://github.com/fchollet/keras 48. Srivastava, Nitish, Hinton, Geoffrey E, Krizhevsky, Alex, Sutskever, Ilya and Salakhutdinov, Ruslan. "Dropout: a simple way to prevent neural networks from overfitting.." Journal of machine learning research 15 , no. 1 (2014): 1929--1958. 49. Sergey Ioffe and Christian Szegedy. 2015. Batch normalization: accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37 (ICML'15), Francis Bach and David Blei (Eds.), Vol. 37. JMLR.org 448-456. 50. Huang, et al. “Densely Connected Convolutional Networks.” [1608.06993] Densely Connected Convolutional Networks, 28 Jan. 2018, arxiv.org/abs/1608.06993. 51. Zhang, et al. “Deep Residual Learning for Image Recognition.” [1512.03385] Deep Residual Learning for Image Recognition, 10 Dec. 2015, arxiv.org/abs/1512.03385.
Appendix A. Details on the 5-channel 2D U-Net architecture used for organ localization

5-channel 2D U-Net architecture
Layer number   Layer type       Number of features
1              Input            5
2              Conv             8
3              Conv             8
4              Conv             8
5              Max Pooling      8
6              Conv             16
7              Conv             16
8              Conv             16
9              Max Pooling      16
10             Conv             32
11             Conv             32
12             Conv             32
13             Max Pooling      32
14             Conv             64
15             Conv             64
16             Conv             64
17             Conv             128
18             Conv             128
19             U-Net Upsample   128
20             Conv             64
21             Conv             64
22             U-Net Upsample   64
23             Conv             32
24             Conv             32
25             U-Net Upsample   32
26             Conv             16
27             Conv             16
28             U-Net Upsample   16
29             Conv             8
30             Conv             8
31             Conv             5
Appendix B. Details on the modified 3D U-Net architectures used for organ segmentation
Modified 3D U-Net architecture: approximate number of training epochs per organ

Organ            Epochs
Prostate         ~250
Bladder          ~120
Rectum           ~200
Femoral heads    ~130

Each organ-specific network follows the 3D U-Net with ResNeXt blocks shown in Figure 3 (Input, Conv, Max Pooling, ResNeXt repetitions n, and 3D Upsample layers); the number of layers, the number of features per layer, and the value of n were tuned separately for each organ.