Enhanced Detection of Referable Diabetic Retinopathy via DCNNs and Transfer Learning

Michelle Yuen Ting Yip1,2, Zhan Wei Lim4, Gilbert Lim4, Nguyen Duc Quang2, Haslina Hamzah3, Jinyi Ho3, Valentina Bellemo2, Yuchen Xie2, Xin Qi Lee2, Mong Li Lee4, Wynne Hsu4, Tien Yin Wong1,2,3, and Daniel Shu Wei Ting1,2,3

1 Duke-NUS Medical School, National University of Singapore, Singapore
2 Singapore Eye Research Institute, Singapore
3 Singapore National Eye Centre, Singapore
4 National University of Singapore, School of Computing, Singapore
Abstract. A clinically acceptable deep learning system (DLS) for the detection of diabetic retinopathy has been developed by the Singapore Eye Research Institute. To improve its utility in a national screening programme, further enhancement was needed. As newer deep convolutional neural networks (DCNNs) have been introduced and techniques such as transfer learning have gained recognition for improving performance, this paper compares VGGNet, the DCNN used in the original DLS, with two newer DCNNs, ResNet and Ensemble, trained with transfer learning. With the adoption of the newer DCNNs and transfer learning, the DLS achieved higher AUC, sensitivity and specificity.

Keywords: Deep learning · Convolutional Neural Networks · Diabetic Retinopathy.
1 Introduction

1.1 Diabetic Retinopathy: a Global Problem
It is projected that by 2040, 600 million people worldwide will have diabetes [16]. Of these, one third are estimated to have some form of diabetic retinopathy (DR), although 80% of these individuals may be unaware of their condition [22]. Diagnosis is often late, with one third of patients with DR presenting with vision-threatening DR. Early detection, referral and treatment reduce the risk of blindness. Screening for DR would benefit from an automated system in terms of efficiency and reproducibility [8].

1.2 Prior Work of AI in Medicine
The medical arena has in recent years gradually welcomed the advantages of artificial intelligence (AI) systems [25]. One such application is in medical imaging,
with specialities ranging from dermatology to radiology investing in this area of research [5, 15, 26]. In particular, ophthalmology has been at the forefront of innovation, with far-reaching implications for systemic diseases beyond the eye [24]. Automated systems for detecting a range of ophthalmological pathologies have been developed and, with the adoption of DLSs, reported to perform comparably to human assessment [23]. Detection of age-related macular degeneration (AMD) with a DLS was evaluated by Burlina and colleagues, with a high AUC ranging between 0.94 and 0.96 and accuracy between 88.4% and 91.6% [3]. Grassmann et al. further explored the use of multiple DCNNs in the detection of AMD with similarly good results [7]. DLSs for identifying glaucoma have also been attempted. Although various methods have been explored by multiple groups, Li et al. generated acceptable results, with AUC, sensitivity and specificity for glaucoma detection all exceeding 0.9. Retinopathy of prematurity is another area that has been targeted by groups such as Brown et al. [2] and Worrall et al. [27] in the effort to prevent childhood blindness.

However, DR remains the main interest of many research groups due to its public health significance and high prevalence. Abràmoff et al. developed and evaluated a DLS for the diagnosis of DR and were successful in obtaining approval from the US Food and Drug Administration (FDA) [1]. A group at Google AI Healthcare built upon this work with the same publicly available datasets to evaluate the Inception-v3 network, which demonstrated consistency with a panel of board-certified ophthalmologists [8]. Although the majority of the literature focuses on fundus photographs, other ophthalmic imaging modalities such as optical coherence tomography (OCT) [19], OCT angiography [17] and slit lamp images [6] have also attracted interest for the development of DLSs.
1.3 Preliminary Study
In 2017, Ting et al. published a DLS developed and tested using real-world DR screening cohorts with close to 500,000 retinal images, showing clinically acceptable performance (AUC, sensitivity and specificity of more than 90%) for simultaneous detection of common ophthalmological diseases, namely AMD, glaucoma and diabetic retinopathy [21]. This DLS was developed using the DCNN VGGNet as its neural architecture and serves as the preliminary study for the present work. Since the release of VGGNet, newer DCNNs with more layers have been published that show better feature extraction and yield better results. ResNet in particular has been garnering attention for the accuracy it achieves through its extreme depth, whilst still being easy to train [12]. An ensemble of the outputs of both ResNet and VGGNet has been found to achieve higher accuracy by reducing false negative results [9, 11]. In addition, transfer learning has been shown to improve the performance of DLSs and has been especially useful in medical image classification [13, 18].
1.4 Study Aim
The primary aim of this study is to enhance this DLS with the use of newer DCNNs, with transfer learning, to improve its diagnostic performance in detecting diabetic retinopathy.
2 Methods

2.1 Training Dataset and Testing Dataset
Both the training and testing datasets were obtained from the Singapore National Integrated Diabetic Retinopathy Screening Programme (SiDRP). The SiDRP was established in 2010, screening half of the diabetic population in Singapore by 2015. SiDRP uses digital retinal photography and a tele-ophthalmology platform for assessment of DR by a team of trained professional graders (> 5 years of experience). For this study, two graders analysed each image and, for discordant findings, a retinal specialist (PhD-trained with > 5 years of experience in conducting DR assessment) generated the final grading. The training dataset was taken from the SiDRP between 2010 and 2013. Once trained, the DCNNs were tested on images obtained from SiDRP between 2014 and 2015. The International Clinical Diabetic Retinopathy Severity Scale (ICDRS) was used for DR classification. For the purpose of this paper, referable DR is defined as moderate non-proliferative DR or worse, including diabetic macular edema and ungradable images. Vision-threatening DR is defined as severe non-proliferative DR or proliferative DR.

2.2 Different Convolutional Neural Networks of the DLS
This study explored three different DCNNs, namely VGGNet, ResNet and Ensemble.

(a) VGGNet. VGGNet is a 16-layer network designed by the Visual Geometry Group in Oxford in 2014; it won second place for image classification in the well-known ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [20]. This competition evaluates many DCNNs on their speed and accuracy in object detection and image classification. An adapted VGGNet was the initial choice for the previous study by Ting et al., as it had been demonstrated to have excellent performance on the classification of retinal images [21]. VGGNet was trained using the Caffe framework, with no layers frozen.

(b) ResNet. ResNet-50 was introduced in 2015 by the Microsoft group; it comprises 50 layers and won first place in ILSVRC 2015 [12]. It has been popular because it allows the depth of the network architecture, and thus accuracy, to be increased whilst remaining easy to train and optimise. This is because it employs ‘skip’ residual connections that perform identity mappings. For this study,
PyTorch was the framework for ResNet. Because the last linear layer of the pre-trained model is tied to the 1,000 classes in ImageNet, it was discarded and replaced for the purposes of DR classification.

(c) Ensemble. Ensemble combines the probability output scores of the two networks, VGGNet and ResNet; its performance is expected to match or exceed that of the single DCNNs [10].

2.3 Transfer Learning
Using transfer learning, ResNet was pre-trained on the ImageNet database for weight initialisation and general-purpose features. ImageNet is one of the largest annotated image databases, consisting of more than 1.2 million natural images across 1,000 categories. It has been credited with advancing image recognition in the deep learning field and has formed the basis of the benchmark used to evaluate new DCNNs for image recognition in the annual ILSVRC [13].

2.4 Statistical Analysis
The primary outcome measures were the area under the receiver operating characteristic curve (AUC), sensitivity and specificity of the DLSs developed using ResNet and Ensemble. The operating thresholds for the DLSs were selected to enable comparison with the human graders’ past performance (sensitivity of 90%) and with criteria set forth by Singapore’s Ministry of Health. The reference standard was the grading finalised by professional graders and retinal specialists.
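The outcome measures described here can be illustrated with a minimal sketch. This is not the study's analysis code: the function names (`auc`, `sensitivity_specificity`, `threshold_for_sensitivity`) and the toy labels and scores are assumptions made for illustration only. The AUC is computed as the probability that a referable eye receives a higher score than a non-referable eye (ties counted as one half), and the operating threshold is taken as the highest score cut-off that still reaches a target sensitivity of 90%.

```python
def auc(labels, scores):
    """AUC as the probability that a positive case outranks a negative case."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called referable."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def threshold_for_sensitivity(labels, scores, target=0.90):
    """Highest observed score cut-off whose sensitivity is at least `target`."""
    for t in sorted(set(scores), reverse=True):
        sens, _ = sensitivity_specificity(labels, scores, t)
        if sens >= target:
            return t
    return min(scores)


# Toy data (hypothetical): 1 = referable DR by the reference standard, 0 = non-referable.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.70, 0.40, 0.60, 0.30, 0.20, 0.10, 0.05, 0.02]

t = threshold_for_sensitivity(labels, scores, target=0.90)
sens, spec = sensitivity_specificity(labels, scores, t)
print(f"AUC={auc(labels, scores):.3f} threshold={t} sens={sens:.2f} spec={spec:.2f}")
```

Fixing the threshold by a sensitivity target first and then reading off specificity mirrors the idea of comparing a system against graders at a matched sensitivity; in practice the threshold would be chosen on data separate from the test set.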
3 Results
Details of the training and validation datasets used for the development and testing of the DLS in detection of referable DR and vision-threatening DR are outlined in Table 1. A total of 148,266 images were used to train and validate the three different DLSs. Of these, 76,370 images (38,185 eyes) from SiDRP data from 2010 to 2013 were used in the training dataset, and a separate, similarly sized dataset of 71,896 images (35,948 eyes) from SiDRP 2014 to 2015 was allocated to evaluate the DLS. In the former, 8.4% of eyes (n=3,192) were referable and 1.4% of eyes (n=548) displayed vision-threatening DR in accordance with the finalised reference standard grading. In the validation dataset, 3.8% of eyes (n=1,373) had referable DR and 0.5% of eyes (n=194) had vision-threatening DR. The remaining eyes were non-referable. The performance of the DLSs in detection of referable DR and vision-threatening DR is shown in Table 2.
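As a quick consistency check, the percentages quoted in this paragraph can be recomputed from the eye counts. The dictionary layout, key names and the `percent` helper below are illustrative assumptions; the counts themselves are those reported in the text.

```python
# Counts as reported in the text; the keys are illustrative labels.
datasets = {
    "training (SiDRP 2010-13)": {"images": 76370, "eyes": 38185, "referable": 3192, "vtdr": 548},
    "validation (SiDRP 2014-15)": {"images": 71896, "eyes": 35948, "referable": 1373, "vtdr": 194},
}


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


summary = {}
for name, d in datasets.items():
    summary[name] = {
        "non_referable": d["eyes"] - d["referable"],
        "referable_pct": percent(d["referable"], d["eyes"]),
        "vtdr_pct": percent(d["vtdr"], d["eyes"]),
    }
    print(name, summary[name])

total_images = sum(d["images"] for d in datasets.values())
print("total images:", total_images)
```

The recomputed values agree with the reported proportions and with the total of 148,266 images.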
Table 1. Training and Validation Datasets of Diabetic Retinopathy used to evaluate the DLSs.

Dataset                              No. of patients   Images   Eyes     Non-referable (eyes)   Referable DR (eyes)   Vision-Threatening DR (eyes)
Training Dataset (SiDRP 2010-13)     13,099            76,370   38,185   34,993                 3,192                 548
Validation Dataset (SiDRP 2014-15)   14,880            71,896   35,948   34,575                 1,373                 194
Table 2. Results of DLS performance of three different DCNNs (VGGNet, ResNet and Ensemble).

                          VGGNet   ResNet   Ensemble
Referable DR
  - AUC                   0.936    0.969    0.970
  - Sensitivity           90.5%    91.7%    92.2%
  - Specificity           91.6%    93.1%    92.5%
Vision-Threatening DR
  - AUC                   0.958    0.994    0.987
  - Sensitivity           100%     96.2%    96.2%
  - Specificity           91.1%    98.5%    98.9%
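The Ensemble results in Table 2 come from combining the VGGNet and ResNet probability output scores. The text does not state how the two scores were combined, so the sketch below assumes a simple unweighted mean purely for illustration; `ensemble_scores`, the example probabilities and the 0.5 cut-off are all hypothetical.

```python
def ensemble_scores(vgg_probs, resnet_probs):
    """Combine per-image probabilities from two models by unweighted averaging.

    Averaging is an assumption made for illustration; other rules, such as a
    weighted mean or the maximum, would fit the same description.
    """
    if len(vgg_probs) != len(resnet_probs):
        raise ValueError("both models must score the same images")
    return [(v + r) / 2.0 for v, r in zip(vgg_probs, resnet_probs)]


# Hypothetical referable-DR probabilities for four images.
vgg = [0.90, 0.20, 0.55, 0.10]
resnet = [0.80, 0.40, 0.35, 0.10]

combined = ensemble_scores(vgg, resnet)
referable = [p >= 0.5 for p in combined]  # 0.5 is an arbitrary example threshold
print(combined, referable)
```

In this toy case the third image is flagged by one model alone (0.55) but not by the averaged score (0.45), which shows how combining outputs can change individual decisions in either direction.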
4 Conclusion
Better performance of the DLS was reported with the use of newer DCNNs (ResNet and Ensemble) and transfer learning, with higher AUC, sensitivity and specificity for detection of both referable DR and vision-threatening DR. In the rapidly evolving field of deep learning and AI, newer neural architectures and techniques that advance the performance of DCNNs are continuously being developed. For image recognition and classification, the aforementioned annual ILSVRC, and subsequently Kaggle, serve as platforms to showcase novel developments such as DenseNet, Inception-v4 and Dual Path Networks (DPN). DenseNet has been regarded by some as an extension of ResNet, making deeper networks feasible by connecting layers to one another in a feed-forward fashion [14]. The new DPN was designed to combine the advantages of both ResNet and DenseNet and showed success at the ILSVRC competition in 2017 [4]. The deep learning field will undoubtedly continue to grow, and future work in medical image analysis will be needed to keep pace. However, the translation of these new techniques into real-world applications, especially in the medical sector, often lags behind. Thus, further research is essential to evaluate the use of these models in prospective clinical trials.
References

1. Abràmoff, M.D., Lou, Y., Erginay, A., Clarida, W., Amelon, R., Folk, J.C., Niemeijer, M.: Improved automated detection of diabetic retinopathy on a publicly available dataset through integration of deep learning. Investigative ophthalmology & visual science 57(13), 5200–5206 (2016)
2. Brown, J.M., Campbell, J.P., Beers, A., Chang, K., Ostmo, S., Chan, R.P., Dy, J., Erdogmus, D., Ioannidis, S., Kalpathy-Cramer, J., et al.: Automated diagnosis of plus disease in retinopathy of prematurity using deep convolutional neural networks. JAMA ophthalmology (2018)
3. Burlina, P.M., Joshi, N., Pekala, M., Pacheco, K.D., Freund, D.E., Bressler, N.M.: Automated grading of age-related macular degeneration from color fundus images using deep convolutional neural networks. JAMA ophthalmology 135(11), 1170–1176 (2017)
4. Chen, Y., Li, J., Xiao, H., Jin, X., Yan, S., Feng, J.: Dual path networks. In: Advances in Neural Information Processing Systems. pp. 4467–4475 (2017)
5. Esteva, A., Kuprel, B., Novoa, R.A., Ko, J., Swetter, S.M., Blau, H.M., Thrun, S.: Dermatologist-level classification of skin cancer with deep neural networks. Nature 542(7639), 115 (2017)
6. Gao, X., Lin, S., Wong, T.Y.: Automatic feature learning to grade nuclear cataracts based on deep learning. IEEE Transactions on Biomedical Engineering 62(11), 2693–2701 (2015)
7. Grassmann, F., Mengelkamp, J., Brandl, C., Harsch, S., Zimmermann, M.E., Linkohr, B., Peters, A., Heid, I.M., Palm, C., Weber, B.H.: A deep learning algorithm for prediction of age-related eye disease study severity scale for age-related macular degeneration from color fundus photography. Ophthalmology (2018)
8. Gulshan, V., Peng, L., Coram, M., Stumpe, M.C., Wu, D., Narayanaswamy, A., Venugopalan, S., Widner, K., Madams, T., Cuadros, J., et al.: Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 316(22), 2402–2410 (2016)
9. Han, S.S., Park, G.H., Lim, W., Kim, M.S., Im Na, J., Park, I., Chang, S.E.: Deep neural networks show an equivalent and often superior performance to dermatologists in onychomycosis diagnosis: Automatic construction of onychomycosis datasets by region-based convolutional deep neural network. PloS one 13(1), e0191493 (2018)
10. Hansen, L.K., Salamon, P.: Neural network ensembles. IEEE transactions on pattern analysis and machine intelligence 12(10), 993–1001 (1990)
11. Harangi, B.: Skin lesion detection based on an ensemble of deep convolutional neural network. arXiv preprint arXiv:1705.03360 (2017)
12. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 770–778 (2016)
13. Hoo-Chang, S., Roth, H.R., Gao, M., Lu, L., Xu, Z., Nogues, I., Yao, J., Mollura, D., Summers, R.M.: Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE transactions on medical imaging 35(5), 1285 (2016)
14. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: CVPR. vol. 1, p. 3 (2017)
15. Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., van der Laak, J.A., Van Ginneken, B., Sánchez, C.I.: A survey on deep learning in medical image analysis. Medical image analysis 42, 60–88 (2017)
16. Moss, S.E., Klein, R., Klein, B.E.: The 14-year incidence of visual loss in a diabetic population. Ophthalmology 105(6), 998–1003 (1998)
17. Prentašić, P., Heisler, M., Mammo, Z., Lee, S., Merkur, A., Navajas, E., Beg, M.F., Šarunic, M., Lončarić, S.: Segmentation of the foveal microvasculature using deep learning networks. Journal of biomedical optics 21(7), 075008 (2016)
18. Rampasek, L., Goldenberg, A.: Learning from everyday images enables expert-like diagnosis of retinal diseases. Cell 172(5), 893–895 (2018)
19. Schlegl, T., Waldstein, S.M., Vogl, W.D., Schmidt-Erfurth, U., Langs, G.: Predicting semantic descriptions from medical images with convolutional neural networks. In: International Conference on Information Processing in Medical Imaging. pp. 437–448. Springer (2015)
20. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
21. Ting, D.S.W., Cheung, C.Y.L., Lim, G., Tan, G.S.W., Quang, N.D., Gan, A., Hamzah, H., Garcia-Franco, R., San Yeo, I.Y., Lee, S.Y., et al.: Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. JAMA 318(22), 2211–2223 (2017)
22. Ting, D.S.W., Cheung, G.C.M., Wong, T.Y.: Diabetic retinopathy: global prevalence, major risk factors, screening practices and public health challenges: a review. Clinical & experimental ophthalmology 44(4), 260–277 (2016)
23. Ting, D.S.W., Pasquale, L.R., Peng, L., Campbell, J.P., Lee, A.Y., Raman, R., Tan, G.S.W., Schmetterer, L., Keane, P.A., Wong, T.Y.: Artificial intelligence and deep learning in ophthalmology. British Journal of Ophthalmology pp. bjophthalmol–2018 (2018)
24. Ting, D.S.W., Wong, T.Y.: Eyeing cardiovascular risk factors. Nature Biomedical Engineering 2(3), 140 (2018)
25. Ting, D.S., Liu, Y., Burlina, P., Xu, X., Bressler, N.M., Wong, T.Y.: AI for medical imaging goes deep. Nature medicine 24(5), 539 (2018)
26. Ting, D.S., Yi, P.H., Hui, F.: Clinical applicability of deep learning system in detecting tuberculosis with chest radiography. Radiology 286(2), 729 (2018)
27. Worrall, D.E., Wilson, C.M., Brostow, G.J.: Automated retinopathy of prematurity case detection with convolutional neural networks. In: Deep Learning and Data Labeling for Medical Applications, pp. 68–76. Springer (2016)