Neural Network Convolution (NNC) for Converting Ultra-Low-Dose to "Virtual" High-Dose CT Images

Kenji Suzuki¹, Junchi Liu¹, Amin Zarshenas¹, Toru Higaki², Wataru Fukumoto², and Kazuo Awai²

¹ Department of Electrical and Computer Engineering & Medical Imaging Research Center, Illinois Institute of Technology, Chicago, IL, USA
[email protected]
² Department of Diagnostic Radiology, Institute of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Japan
Abstract. To reduce radiation dose in CT, we developed a novel deep-learning technique, neural network convolution (NNC), for converting ultra-low-dose (ULD) CT images to "virtual" high-dose (HD) CT images with less noise and fewer artifacts. NNC is a supervised image-based machine-learning (ML) technique consisting of a neural network regression model. Unlike typical deep-learning models, NNC can learn and thus output desired images, as opposed to class labels. We trained our NNC with ULDCT (0.1 mSv) and corresponding "teaching" HDCT (5.7 mSv) images of an anthropomorphic chest phantom. Once trained, our NNC no longer requires HDCT, and it provides "virtual" HDCT in which noise and artifacts are substantially reduced. To test our NNC, we collected ULDCT (0.1 mSv) of 12 patients with 3 different vendor CT scanners. To determine the dose-reduction rate of our NNC, we acquired 6 CT scans of the anthropomorphic chest phantom at 6 different radiation doses (0.1–3.0 mSv). Our NNC substantially reduced noise and streak artifacts in ULDCT while maintaining anatomic structures and pathologies such as vessels and nodules. With our NNC, the image quality of ULDCT (0.1 mSv) images was improved to a level equivalent to that of 1.1 mSv CT images, which corresponds to a 91% dose reduction.

Keywords: Deep learning · Radiation dose reduction · Image quality improvement · Image-based machine learning · Virtual imaging
1 Introduction

CT has proven useful for detecting and diagnosing various diseases. For example, CT was shown to be effective in screening for lung cancer [1]. Recent studies [2], however, suggested that CT scans might be responsible for up to 2% of cancers in the U.S., and that the CT scans performed each year could cause 29,000 new cancer cases in the future owing to ionizing radiation exposure to patients. To address this serious issue, researchers and CT vendors have developed various iterative reconstruction (IR) techniques [3–5] that enable dose reduction through reconstruction from the scan raw data. A recent survey study with more
than 1,000 hospitals in Australia [6], however, revealed that IR reduced radiation dose by only 17–44%, which is not sufficient for a screening population. Furthermore, full IR is computationally very expensive; for example, GE's Veo took 30–45 min per scan on a specialized massively parallel computer with 112 CPU cores [4]. Therefore, it is crucial to develop a technique that can reduce radiation dose in CT substantially within a reasonable time. We invented and developed a novel machine-learning (ML) approach to radiation dose reduction in CT in 2010 and 2012, respectively [7], and presented our initial results in abstracts at clinical conferences [8–10]. At that time, no ML approach to radiation dose reduction in CT had been developed; some investigators later developed radiation dose reduction in CT by means of ML in 2017 [11, 12]. Our technology employs our original deep-learning technique called neural network convolution (NNC), which is an extension of our original ML techniques with image input [13–17] dating back to 1994. Unlike most other deep-learning models, such as AlexNet, LeNet, and most other convolutional neural networks (CNNs), which learn output class labels (e.g., cancer or non-cancer), our NNC learns directly from "teaching" (desired) images and thus outputs images, which we believe is an extremely important concept. The high computational efficiency of our NNC would solve the problem of the high computational cost of IR techniques. By means of our NNC, we aimed to accomplish substantial dose reduction for a screening population with computation efficient enough for routine clinical use. We evaluated our dose-reduction technology with anthropomorphic-phantom and patient images acquired with CT scanners from different vendors to assess its robustness and performance and to determine its dose-reduction rate quantitatively.
2 Radiation Dose Reduction Technology Based on NNC

2.1 Basic Principles of Our Radiation Dose Reduction Technology
The schematic diagrams of our proposed radiation dose reduction technology based on NNC are illustrated in Fig. 1. Our technology has a training step and an application step. In the training step, we acquire pairs of lower-dose (LD) CT images (e.g., 0.1 mSv) reconstructed by the filtered back-projection (FBP) algorithm and corresponding "teaching" (or desired) higher-dose (HD) CT images (e.g., 5.7 mSv), also reconstructed by FBP, of an anthropomorphic chest phantom. Through the training, the NNC learns the relationship between the input LDCT and teaching HDCT images so as to convert LDCT images, with their noise and artifacts due to low radiation exposure, into HDCT-like images. Once trained, the NNC no longer requires HDCT images. In the application step, the trained NNC model is applied to unseen LDCT images of a patient to produce "high-dose-like" CT images in which noise and artifacts are substantially reduced while lesions and anatomic structures, such as lung nodules and lung vessels, are maintained; hence we term the output "virtual" HD (VHD) CT.
Fig. 1. Schematic diagrams of our proposed radiation dose reduction technology based on our original deep-learning model, NNC.
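For concreteness, the training pairs of Fig. 1 could be assembled as in the following sketch, which pairs each low-dose patch with the center pixel of the registered high-dose "teaching" slice. This is our illustration, not the authors' released code; the patch size and sampling stride are assumptions, as the paper does not specify them at this point.

```python
import numpy as np

def extract_training_pairs(ld_slice, hd_slice, patch_size=9, stride=2):
    """Pair each low-dose patch with the center pixel of the registered
    high-dose 'teaching' slice (the phantom scans are motion-free, so
    the LD and HD slices are aligned by acquisition).
    patch_size and stride are illustrative assumptions."""
    half = patch_size // 2
    inputs, targets = [], []
    h, w = ld_slice.shape
    for y in range(half, h - half, stride):
        for x in range(half, w - half, stride):
            patch = ld_slice[y - half:y + half + 1, x - half:x + half + 1]
            inputs.append(patch.ravel())       # input: LD image patch
            targets.append(hd_slice[y, x])     # target: HD center pixel
    return (np.asarray(inputs, dtype=np.float32),
            np.asarray(targets, dtype=np.float32))
```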
2.2 Architecture and Training of NNC
In the field of image processing, Suzuki et al. developed supervised nonlinear filters and edge enhancers based on a neural network (NN), called neural filters [15, 18] and neural edge enhancers [17, 19], respectively. In the field of computer-aided diagnosis (CAD), Suzuki et al. invented the massive-training artificial NN (MTANN) by extending neural filters and edge enhancers to accommodate pattern-recognition and classification tasks [16, 20–23]. In this study, we extended MTANNs and developed a general framework for supervised image processing, which we call machine-learning convolution. An NNC consists of a linear-output-layer neural network regression (LNNR) model [17] that can have deep layers. The NNC can be considered a supervised nonlinear filter that is trained with input images and corresponding "teaching" images. The LNNR model, which can operate on image data directly, employs a linear function instead of a sigmoid function in the output layer, because the characteristics of an NN improve significantly with a linear output layer when the NN is applied to the continuous mapping of values in image processing [17]. The NNC model consists of an input layer, a convolutional layer, multiple fully connected hidden layers, and an output layer. The input layer of the LNNR receives the pixel values in a 2D subregion (or kernel, image patch), R, extracted from input ULDCT images. The output $\hat{g}(x, y)$ of
the NNC is a continuous value that corresponds to the center pixel of the subregion (or image patch), represented by

$$\hat{g}(x, y) = NN\left( \{ f(x - i, y - j) \mid (i, j) \in R \} \right), \quad (1)$$
where $NN(\cdot)$ is the output of the LNNR model, and $f(x, y)$ is a pixel value of an input CT image. Note that only one unit is employed in the output layer. Other ML regression models, such as support vector regression and nonlinear Gaussian process regression [22], can be used instead of the LNNR model, each forming an ML convolution. The entire output CT image is obtained by scanning the LNNR model over the input image in a convolutional manner, hence the term NNC. For training to convert input LDCT images with noise and artifacts into the desired HDCT images, we define the error function to be minimized as

$$E = \frac{1}{P} \sum_{(x, y) \in R_T} \left\{ g(x, y) - \hat{g}(x, y) \right\}^2, \quad (2)$$
where $g(x, y)$ is a pixel value of the teaching HDCT image, $R_T$ is a training region, and $P$ is the number of training pixels in $R_T$. The NNC is trained by a linear-output-layer back-propagation (BP) algorithm [17], derived for the LNNR model in the same way as the original BP algorithm [24]. After training, the NNC is expected to output values close to the desired pixel values in the teaching images. Thus, the trained NNC outputs high-dose-like CT images with less noise and fewer artifacts when new LDCT images are entered.
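To make Eqs. (1) and (2) concrete, the following is a minimal PyTorch sketch of an LNNR-style regressor: sigmoid hidden layers, a single linear output unit, MSE training by backpropagation, and convolutional scanning over the image. The hidden-layer sizes, optimizer settings, and 9 × 9 patch size are our illustrative assumptions, not values from the paper, whose training uses the linear-output-layer BP algorithm of [17].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LNNR(nn.Module):
    """Linear-output-layer NN regression: sigmoid hidden layers and a
    single LINEAR output unit (no output activation), as in Eq. (1)."""
    def __init__(self, patch_size=9, hidden=(64, 32)):
        super().__init__()
        layers, n_in = [], patch_size * patch_size
        for n_out in hidden:
            layers += [nn.Linear(n_in, n_out), nn.Sigmoid()]
            n_in = n_out
        layers.append(nn.Linear(n_in, 1))    # linear output layer
        self.net = nn.Sequential(*layers)

    def forward(self, patches):              # patches: (N, patch_size**2)
        return self.net(patches).squeeze(-1)

def train_step(model, optimizer, patches, teaching_pixels):
    """One BP step minimizing the mean squared error of Eq. (2)."""
    optimizer.zero_grad()
    loss = F.mse_loss(model(patches), teaching_pixels)
    loss.backward()                          # backpropagation
    optimizer.step()
    return loss.item()

def apply_nnc(model, ld_image, patch_size=9):
    """Scan the trained model over every pixel location (Eq. (1))
    to produce the 'virtual' HD image."""
    half = patch_size // 2
    img = ld_image[None, None]               # (1, 1, H, W)
    cols = F.unfold(F.pad(img, (half,) * 4, mode='replicate'),
                    kernel_size=patch_size)  # (1, p*p, H*W)
    with torch.no_grad():
        out = model(cols[0].T)               # one regression output per pixel
    return out.reshape(ld_image.shape)
```

In the anatomy-specific scheme of Sect. 2.3 below, one such model would be trained per anatomic segment.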
2.3 Anatomy-Specific NNC with Soft Gating Layers
Noise and artifact properties differ among anatomies in reconstructed CT images. Although a single NNC can reduce noise and artifacts across entire CT images, it may not sufficiently reduce some noise or artifacts in specific anatomic segments, because the capability of a single NNC to suppress a wide variety of patterns is limited. To improve the overall performance, we extended the single NNC and developed an anatomy-specific multiple-NNC scheme that consists of multiple NNC models together with gating layers, as shown in Fig. 2. In the training step, gating layers control which of the multiple NNCs is trained on which anatomic segment. Each anatomy-specific NNC, denoted NNC_s, is trained independently with training samples extracted from the corresponding anatomic segment by a pair of gating layers. After training, each anatomy-specific NNC becomes an expert for its anatomic segment. In the application step, three anatomic segments are segmented automatically by thresholding followed by the morphological dilation and erosion operations. Gating layers control the application of the trained anatomy-specific NNCs to the corresponding anatomic segments and compose their outputs into an entire output CT image by using the segmented anatomic segments. To avoid unnatural sudden changes near the boundaries between anatomic segments, "soft" gating layers, as opposed to "hard" gating layers, blend the outputs from the anatomy-specific NNCs near the boundaries by using a weighting Gaussian function $f_{w_s}(x, y)$.
Fig. 2. The architecture of our anatomy-specific multiple-NNC scheme, which consists of multiple NNCs arranged in parallel together with gating layers.
The entire output image is composed of the three output anatomic segments from the three trained anatomy-specific NNCs, represented by

$$\hat{g}(x, y) = \sum_{s=1}^{3} \hat{g}_s(x, y) \cdot f_{w_s}(x, y), \quad (3)$$

where $\hat{g}_s(x, y)$ is the output pixel of the s-th trained anatomy-specific NNC.
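A minimal sketch of the soft gating of Eq. (3) follows, assuming the weighting functions $f_{w_s}(x, y)$ are formed by Gaussian smoothing of the binary segment masks from the thresholding-and-morphology step; the smoothing width and the normalization are our assumptions, not values from the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def soft_gate(segment_outputs, segment_masks, sigma=3.0):
    """Blend the three anatomy-specific NNC outputs as in Eq. (3).
    segment_outputs: list of three images, one per anatomy-specific NNC.
    segment_masks: list of three binary masks from the thresholding and
    morphology step; Gaussian smoothing softens their boundaries so that
    segment transitions are gradual rather than abrupt."""
    weights = [gaussian_filter(m.astype(np.float32), sigma)
               for m in segment_masks]
    total = np.sum(weights, axis=0) + 1e-8   # normalize: weights sum to 1
    return sum(o * w / total for o, w in zip(segment_outputs, weights))
```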
2.4 Experiments to Validate and Test Our "Virtual" HD Technology
2.4.1 Training with an Anthropomorphic Chest Phantom
To train our NNC, we acquired CT scans of an anthropomorphic chest phantom (Lungman; Kyoto Kagaku, Kyoto, Japan) without motion at LD (0.1 mSv) and HD (5.7 mSv) levels with a CT scanner (64-slice LightSpeed VCT; GE, Milwaukee, WI) at Hiroshima University Hospital, Hiroshima, Japan. We varied the radiation dose by changing the x-ray tube current while fixing the x-ray tube voltage at 120 kVp. Tube current, tube current–time product, and CTDIvol for the LDCT and HDCT were 4 and 570 mA, 4 and 230 mAs, and 0.24 and 13.57 mGy, respectively. CT slices were reconstructed by FBP with the "lung" kernel. The reconstructed CT size was 512 × 512 pixels, and the reconstructed slice thickness was 5 mm; the 5 mm thickness follows the Japanese lung cancer screening guideline. We trained the NNC with pairs of ULDCT (0.1 mSv) slices and the corresponding "teaching" (desired) HDCT (5.7 mSv) slices.

2.4.2 Phantom Validation
To validate our radiation dose reduction technology based on our NNC, we acquired CT scans of the anthropomorphic chest phantom at five additional dose levels: 0.25, 0.5, 1.0, 1.5, and 3.0 mSv. Tube current; tube current–time product; and
CTDIvol for the CT scans were 25, 50, 100, 150, and 300 mA; 10, 20, 40, 60, and 120 mAs; and 0.60, 1.19, 2.38, 3.57, and 7.14 mGy, respectively. We applied the trained anatomy-specific NNC scheme to the reconstructed CT slices. To evaluate the performance of our virtual HD technology, we used the structural similarity (SSIM) index [25], which overcomes the limitation of the conventional contrast-to-noise ratio (CNR), namely its lack of spatial (structural) information, in measuring the image quality of CT. We used the highest-dose CT scan (5.7 mSv) as the "gold standard" in calculating the SSIM, and we used two-fold cross-validation.

2.4.3 Clinical Case Evaluation
To evaluate our technology, we acquired ULD scans of 12 patients with three different vendor CT scanners as a robustness test. Tube current, tube current–time product, CTDIvol, and effective dose for the ULDCT were 10–20 mA, 6.0 ± 3.5 mAs, 0.37 ± 0.22 mGy, and 0.14 ± 0.08 mSv, respectively. X-ray tube voltages were 120–135 kVp, and the reconstructed slice thickness was 5 mm. We applied the trained anatomy-specific NNC scheme to the ULDCT studies and used the CNR to evaluate the image quality quantitatively; we could not use the SSIM here because it requires ideal (reference) images for its calculation.
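The two image-quality measures used above can be computed, for example, as in the following sketch using scikit-image's SSIM implementation; the data_range value and the ROI-based CNR definition are our assumptions, since CNR definitions vary and the paper does not give its exact formula.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def mean_ssim(test_slices, gold_slices, data_range=2000.0):
    """Average SSIM of a reconstructed scan against the 5.7 mSv
    'gold standard' scan, computed slice by slice."""
    scores = [ssim(t, g, data_range=data_range)
              for t, g in zip(test_slices, gold_slices)]
    return float(np.mean(scores))

def cnr(signal_roi, background_roi):
    """One common contrast-to-noise-ratio definition: absolute mean
    difference between a signal ROI and a background ROI, divided by
    the background noise (standard deviation)."""
    return (abs(signal_roi.mean() - background_roi.mean())
            / background_roi.std())
```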
3 Results

3.1 Phantom Experiments
Training each NNC took 46.7 h on a PC (Intel i7-4790K CPU, 4.5 GHz) or 1.73 h on a GPU (GeForce GTX TITAN Z; Nvidia, CA). In our VHD images, the heavy noise, streaks, and other artifacts in the input ULDCT images (0.1 mSv) were reduced substantially, while anatomic structures such as pulmonary vessels were maintained. We examined the relationship between the image quality in terms of the SSIM and the effective dose. The average SSIM of our VHD images, 0.94, was equivalent to the image quality of 1.1 mSv CT images, which corresponds to a 91% dose reduction, as shown in Fig. 3.
Fig. 3. Estimation of the equivalent effective dose from the SSIM for our virtual HDCT of CT scans of an anthropomorphic chest phantom.
3.2 Clinical Case Evaluation
To evaluate the performance of our VHD technology, we applied our phantom-trained NNC (trained with the GE scanner) to the 12 patient cases. Figure 4 compares our VHDCT image for one of the 12 patients, with a lung nodule (enlarged to show details) acquired with the Toshiba CT scanner, with the corresponding "gold-standard" real HDCT image. Noise and artifacts in the input ULDCT image are substantially reduced in our VHDCT image, while the conspicuity of the lung nodule and of anatomic structures such as lung vessels is improved. We compared our VHD image with a state-of-the-art IR product (AIDR-3D, strong setting; Toshiba, Tokyo, Japan) and with two of the best-known recent denoising algorithms, K-SVD [26] and BM3D [27]. The image quality of our VHDCT image is superior to that of the IR image and the BM3D-denoised image (the K-SVD result is not shown owing to its much inferior performance) in terms of the conspicuity of the nodule and the level of artifacts, as shown in Fig. 4. In the IR image, the nodule appears with lower contrast and a fuzzy boundary against a background containing emphysema-like artifacts; the fuzzy chest wall is another issue with the IR image. In the BM3D-denoised image, diagnostic information (e.g., the nodule and vessels) disappears. The processing time for each CT scan was 70.9 s on a PC (Intel i7-4790K at 4.5 GHz). With our preliminary VHD technology, the average CNR of the patients' 0.1 mSv ULDCT images improved from 6.1 ± 2.1 to 13.5 ± 1.9 (two-tailed t-test; P