Korean Journal of Vision Science 20(2): 151~159, June 2018 https://doi.org/10.17337/JMBI.2018.20.2.151

ISSN 1229-6457(Print) ISSN 2466-040X(Online)

Retinal Vessel Detection Using Deep Learning: A novel DirectNet Architecture

Hyeongsuk Ryu⋅Hyeongjun Moon⋅Björn Browatzki⋅Christian Wallraven*

Dept. of Brain and Cognitive Engineering, Korea University

(Received May 10, 2018; Revised June 19, 2018; Accepted June 19, 2018)

Abstract

Purpose: The aim of this study is to develop a novel deep learning system for vessel segmentation of retinal images. We present a recurrent Convolutional Neural Network (CNN) architecture and compare its performance with existing CNN approaches, showing greatly reduced processing time at excellent performance.

Methods: The proposed DirectNet architecture is composed of blocks, with each block containing a collection of convolutional layers. Blocks are stacked up in a pyramid, such that the number of blocks is increased by one at each level. Data are repeatedly processed by each block and combined with the outputs of other blocks. This recurrent structure, combined with the use of large kernels, avoids the need for up- or downsampling layers, thus creating a direct pixel-to-pixel mapping from the input image to the segmentation output.

Results: DirectNet provides higher accuracy, sensitivity, specificity, and precision values compared to a state-of-the-art, patch-based CNN approach (0.9538 vs 0.9327, 0.7851 vs 0.7346, 0.9782 vs 0.9730, 0.8458 vs 0.7987). Training time on a standard dataset is reduced from 8 hours to 1 hour for DirectNet, and testing time per image is greatly reduced from 1 hour for the patch-based method to 6 seconds for our method.

Conclusion: The proposed deep-learning architecture is eight times faster for training and 600 times faster for testing at slightly higher accuracy values than a state-of-the-art method. Segmentation successfully highlights retinal blood vessels from large down to small sizes.

Key words: Machine learning, Deep learning, Retinal vessel detection

Address reprint request to Christian Wallraven Korea University, Anam-Dong, Sungbuk-Ku, Seoul 136-701, Korea TEL: +82-2-3290-5925, E-mail: [email protected] Copyright ⓒ 2018 by Korean Society of Vision Science All rights reserved.


Ⅰ. Introduction

Vessel analysis on retinal pictures can help to identify health-related problems at early stages.1) More specifically, visual impairments such as diabetic retinopathy or age-related macular degeneration (AMD) can be deduced from retinopathy analysis of the macular vessels, as it may show occlusion or hemorrhage of the vessels.2,3) Even though this method is generally reliable, finding occlusions of blood vessels in patients with diabetes or hypertension from fundus photographs is difficult – similarly, there are limitations in confirming the morphology of micro-vessels such as neovascularization as compared with general blood vessels. Accurate analysis of retinal vessel structure with the aim of early diagnosis of retinal disease is therefore an important area of research with a wide range of applications in practice.4)

One of the first automatic processing algorithms was developed in 2002 by Walter et al., who implemented an algorithm that collected common features from patients with diabetic retinopathy to extract exudates in retinal images. The algorithm extracted morphological characteristics of the exudate image that are common in diabetic retinopathy.5) Nearest-neighbor classifiers were then developed to distinguish the blood vessels from the retinal image, which further improved the accuracy by subdividing and analyzing the features of the blood vessel image.6-8)

For the purpose of vessel segmentation, Wang et al. implemented computer vision techniques in 2000.9) These showed reasonably high accuracy on test retinal images; however, reliable extraction of branch vessels such as microvascular or neovascular vessels required more advanced algorithm architectures that were also able to distinguish between normal vessels and abnormal vessels.10,11) Recently, Artificial Neural Networks (ANNs) have experienced a renaissance in image processing and analysis – ANNs are a class of algorithms that simulate the way the human brain processes patterns and are able to robustly learn categories from large amounts of data. Convolutional Neural Networks (CNNs), in particular, are a special class of ANNs that have received a lot of attention for their performance in image interpretation tasks; Fu et al. in 2016 developed such a model for the task of vessel segmentation, showing impressive segmentation accuracy.12)

Current CNN architectures usually operate in a patch-based fashion: small patches are extracted from the image and each pixel in each patch is classified as to whether it belongs to a blood vessel or not. Since the patch size needs to be small in order for robust learning, such an architecture takes a long time both during training and, more importantly, also during testing. In the present paper, we try to overcome this problem and develop a novel, deep-learning-based architecture for vessel segmentation called DirectNet.

Ⅱ. Methods

DirectNet is a fully convolutional neural network that approaches retinal image segmentation as an image-to-image translation task. In a traditional feedforward CNN, data flows through the network continuously in one direction from the top to the bottom layer. In contrast, we propose the use of recurrent structures to build a compact, yet sufficiently complex model. An architecture that allows for very fast analysis while maintaining accuracy is developed in this study.

Our network consists of a pyramid-shaped stack of recurring blocks of convolutional layers as depicted in Figure 1b. Data flows through the network in 4 stages, being processed repeatedly by 4 distinct blocks. Each block consists of a set of convolutional layers.


Fig 1. Comparison of the standard, patch-based CNN approach (a) with our proposed DirectNet architecture (b).
(a) Convolutional Neural Network for vessel segmentation.
(b) Proposed DirectNet for vessel segmentation. Blocks depicted in the same color denote identical network components. Each block consists of a collection of layers.

At stage one, the input image is processed by block 1 (depicted as the red block in Figure 1b). Outputs are then passed on to block 2 (green) but also fed back into block 1. At the next stage, outputs of block 1 are again given directly to block 1 and outputs of block 2 are given to block 3 (blue). Outputs of block 1 and 2 are then merged and passed to block 2. This process continues in the same fashion through stages 3 and 4. Finally, the results of all individual blocks at stage 4 are combined into a joint prediction. The final output is a vessel probability map of the same size as the input image.

Most current CNN models use the combination of small 3×3 convolutional kernels and pooling layers to reduce model parameters.13) Since the DirectNet architecture presented here relies on a rather shallow network design with fewer layers, it does not contain any pooling layers. Instead, the required receptive field size is achieved by employing larger kernel sizes (5×5, 7×7, 15×15) combined with the aforementioned recurrent structure of the network. This structure allows the storage and propagation of information across the image. The increase in computational complexity introduced by larger kernels can be mitigated by the use of depthwise separable convolutions as proposed by Chollet.14) This method can reduce computation time by more than 30% compared to general convolutions by separating the spatial convolution from the convolution across image channels.
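To make the wiring concrete, the following is a minimal Keras sketch of a DirectNet-style network. It follows our reading of Table 1 and Figure 1b – plain convolutions in block 1, 15×15 depthwise separable convolutions with 1×1 residual projections in the later blocks, and a small separable head – but it simplifies the cross-stage feedback of Figure 1b to residual additions within each stage; it is a sketch under these assumptions, not the authors' released code.

```python
# Minimal sketch of a DirectNet-style network in Keras (our reading of
# Table 1 / Fig. 1b, not the authors' released implementation).
# Bias-free convolutions are assumed, consistent with the counts in Table 1.
from tensorflow.keras import layers, models


def residual_block(x, filters=64, kernel_size=15):
    """1x1 projection plus two large-kernel separable convolutions,
    merged by an Add layer (the repeating motif in Table 1)."""
    res = layers.Conv2D(filters, 1, padding="same", use_bias=False)(x)
    res = layers.BatchNormalization()(res)
    y = layers.SeparableConv2D(filters, kernel_size, padding="same",
                               use_bias=False)(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.SeparableConv2D(filters, kernel_size, padding="same",
                               use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    return layers.Activation("relu")(layers.Add()([res, y]))


def build_directnet_like(input_shape=(584, 565, 3)):
    inputs = layers.Input(shape=input_shape)

    # Block 1: plain convolutions with moderately large kernels.
    x = layers.Conv2D(32, 3, padding="same", use_bias=False)(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.Conv2D(64, 7, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)

    # Four stages of 15x15 separable-convolution blocks. No pooling or
    # upsampling is used, so the spatial resolution never changes and the
    # output maps directly onto the input pixels.
    for _ in range(4):
        x = residual_block(x, filters=64, kernel_size=15)

    # Final block: reduce to a per-pixel vessel probability.
    x = layers.SeparableConv2D(16, 7, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.SeparableConv2D(1, 5, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    outputs = layers.Activation("sigmoid")(x)
    return models.Model(inputs, outputs)


model = build_directnet_like()
model.summary()  # total parameters: 273,668, as listed in Table 1
```

With bias-free layers, the layer inventory and the 273,668-parameter total of Table 1 are reproduced by this sketch, even though the exact cross-block merge points remain our interpretation of Figure 1b.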


Ⅲ. Results

The proposed DirectNet model for retinal vessel segmentation was implemented using the Keras library with a TensorFlow backend and evaluated on the DRIVE dataset, which is the most commonly used dataset for vessel segmentation.6) The testing procedure followed the common methodology of selecting the annotations of the first human observer as ground truth (annotations of the second observer are usually only used to study human performance). The DRIVE dataset contains 40 fundus images, split into 20 images for training and 20 images for testing. All images were cropped to an input size of 584 × 565 pixels.

To compare the DirectNet architecture to a standard, state-of-the-art method, another patch-based CNN was trained on DRIVE to serve as a baseline. For this, a publicly available implementation* based on the U-Net architecture15) was used. In the DirectNet architecture, the total number of parameters was 273,668 (see Table 1), whereas the U-Net implementation had 517,666 parameters. All experiments were run on an Intel Core i7 processor with 16 GB RAM and a GeForce GTX 1080Ti graphics card.

Performance of the vessel segmentation can be described with several metrics including the F1-score, accuracy, sensitivity, specificity, precision, and the area under the ROC curve. A definition of these metrics is given in (1).

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \text{Sensitivity} = \frac{TP}{TP + FN}, \quad \text{Specificity} = \frac{TN}{TN + FP},$$
$$\text{Precision} = \frac{TP}{TP + FP}, \quad F1 = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} \qquad (1)$$

where TP, TN, FP, and FN stand for true positive, true negative, false positive, and false negative classifications, respectively. The area under the ROC curve was calculated using the standard implementation provided in the Python library.
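As a reference for how these quantities relate to a thresholded probability map, the sketch below computes them with NumPy and scikit-learn (one common "standard implementation" for the ROC AUC); the function and array names are illustrative and not taken from the paper's code.

```python
# Illustrative computation of the metrics in Eq. (1) for one image.
# `pred_prob` and `truth` are assumed to be flattened arrays holding the
# vessel probability map and the binary ground-truth annotation.
import numpy as np
from sklearn.metrics import roc_auc_score


def segmentation_metrics(pred_prob, truth, threshold=0.5):
    pred = (pred_prob >= threshold).astype(int)
    tp = np.sum((pred == 1) & (truth == 1))
    tn = np.sum((pred == 0) & (truth == 0))
    fp = np.sum((pred == 1) & (truth == 0))
    fn = np.sum((pred == 0) & (truth == 1))
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)            # recall / true positive rate
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    auc = roc_auc_score(truth, pred_prob)   # uses the continuous probabilities
    return dict(accuracy=accuracy, sensitivity=sensitivity,
                specificity=specificity, precision=precision,
                f1=f1, auc=auc)
```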

Fig 2. ROC curves of DirectNet (blue) and patch-based CNN (red).

* https://github.com/orobix/retina-unet


Table 1. DirectNet architecture

| Block | Layer | Batch norm. & Activation | Kernel size | Num. of parameters |
|---|---|---|---|---|
| Block 1 | Input Layer | – | – | – |
| | Conv 2D | BatchNorm & ReLU | 3 × 3 | 864 + 128 (BN) |
| | Conv 2D | BatchNorm & ReLU | 7 × 7 | 100352 + 256 (BN) |
| | Conv 2D (res) | BatchNorm | 1 × 1 | 4096 + 256 (BN) |
| | Separable Conv 2D | BatchNorm & ReLU | 15 × 15 | 18496 + 256 (BN) |
| | Separable Conv 2D (+res) | BatchNorm & ReLU | 15 × 15 | 18496 + 256 (BN) |
| Block 2 | Conv 2D (res) | BatchNorm | 1 × 1 | 4096 + 256 (BN) |
| | Separable Conv 2D | BatchNorm & ReLU | 15 × 15 | 18496 + 256 (BN) |
| | Separable Conv 2D | BatchNorm & ReLU | 15 × 15 | 18496 + 256 (BN) |
| Bl. 1 + 2 | Add | ReLU | – | – |
| Block 3 | Conv 2D (res) | BatchNorm | 1 × 1 | 4096 + 256 (BN) |
| | Separable Conv 2D | BatchNorm & ReLU | 15 × 15 | 18496 + 256 (BN) |
| | Separable Conv 2D (+res) | BatchNorm & ReLU | 15 × 15 | 18496 + 256 (BN) |
| Bl. 1+2+3 | Add | ReLU | – | – |
| Block 4 | Conv 2D (res) | BatchNorm | 1 × 1 | 4096 + 256 (BN) |
| | Separable Conv 2D | BatchNorm & ReLU | 15 × 15 | 18496 + 256 (BN) |
| | Separable Conv 2D (+res) | BatchNorm & ReLU | 15 × 15 | 18496 + 256 (BN) |
| Block 1+2+3+4 | Add | ReLU | – | – |
| Final Block | Separable Conv 2D | BatchNorm & ReLU | 7 × 7 | 4160 + 64 (BN) |
| | Separable Conv 2D | BatchNorm & Sigmoid | 5 × 5 | 416 + 4 (BN) |
| Total | | | | 273,668 |
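As a quick sanity check on these counts (our arithmetic, assuming bias-free convolutions, which is consistent with the numbers above): a depthwise separable convolution splits a k × k kernel into one k × k depthwise filter per input channel plus a 1 × 1 pointwise projection.

```python
# Parameter counts for a plain vs. a depthwise separable convolution
# (bias-free), reproducing the 15x15 entries of Table 1.
def conv_params(k, c_in, c_out):
    return k * k * c_in * c_out

def sepconv_params(k, c_in, c_out):
    return k * k * c_in + c_in * c_out   # depthwise + 1x1 pointwise

print(conv_params(15, 64, 64))     # 921600 for a plain 15x15 convolution
print(sepconv_params(15, 64, 64))  # 18496, as listed in Table 1
```

The same formula yields the 4160 and 416 parameters of the final 7 × 7 and 5 × 5 separable layers, which illustrates why the large kernels remain affordable.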

Table 2. Performance comparison of the patch-based CNN, Liskowski et al.17), and our method

| | Patch-based CNN | Liskowski et al.17) | DirectNet |
|---|---|---|---|
| Fundus images | 20 | 20 | 20 |
| Fundus training patches | 400,000 (20×20,000) | 400,000 (20×20,000) | 20 |
| F1 score | 0.7653 | – | 0.8124 |
| Accuracy | 0.9427 | 0.9535 | 0.9538 |
| Sensitivity | 0.7346 | 0.7811 | 0.7851 |
| Specificity | 0.9730 | 0.9807 | 0.9782 |
| Precision | 0.7987 | – | 0.8458 |
| AUC ROC curve | 0.9640 | 0.9790 | 0.9733 |
| Jaccard similarity score | 0.9426 | – | 0.9490 |
| Training time | 8 h | 8 h | 1 h |
| Test time per image | 1 h | 92 sec | 6 sec |

Segmentation results in the form of probability maps were converted to binary images by a fixed threshold of 0.5. Since threshold values may change the results, the automatic Otsu threshold selection method16) was also tried, but did not obtain better results.
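A minimal sketch of both binarization variants is given below; scikit-image's threshold_otsu is assumed as the implementation of Otsu's method16), and prob_map is an illustrative stand-in for a DirectNet output, not data from the paper.

```python
# Binarize a vessel probability map: fixed 0.5 cut-off vs. Otsu's method.
import numpy as np
from skimage.filters import threshold_otsu

prob_map = np.random.rand(584, 565)                  # placeholder for a network output
binary_fixed = prob_map >= 0.5                       # threshold used for the reported results
binary_otsu = prob_map >= threshold_otsu(prob_map)   # data-driven alternative (ref. 16)
```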

As shown in Table 2, all metrics obtained for DirectNet were higher than for the patch-based CNN method. In particular, the F1 score of 0.8124, compared to 0.7653, showed a strong increase in performance (note that the training and testing paradigm used in the field so far prohibits the use of multiple dataset splits, such that statistical tests or Bland-Altman plots cannot be run). This was mostly driven by an increase in sensitivity, that is, the algorithm's ability to accurately detect vessel pixels. Additionally, DirectNet showed a significant speed-up compared to the U-Net architecture: training of the patch-based CNN took 8 hours, whereas DirectNet took only 1 hour. Similarly, during testing, U-Net took 1 hour to process all patches of one retinal image, whereas DirectNet finished the same task in 6 seconds – a strong speed-up.


Figure 3 shows a qualitative evaluation of segmentation results on the first 4 images in DRIVE. The original retinal photos are shown in the first column. The second and third columns show the human annotations and the vessel segmentation maps produced by DirectNet, respectively.

Fig 3. Segmentation results for the first 4 fundus images in DRIVE. Left: original fundus images; middle: ground-truths; right: segmentation results produced by DirectNet.


Finally, DirectNet was also compared to another recent deep-learning method suggested by Liskowski et al.17) This method required 8 hours of training time on 400,000 sample patches extracted from the 20 training images in DRIVE. Both methods achieved comparable accuracy (0.9535 vs 0.9538) and virtually identical ROC performance (an AUC of 0.9733 for DirectNet compared to 0.9790 for Liskowski et al.). Importantly, at test time DirectNet was still more than 15 times faster than the other approach, which took 92 seconds per image (see Table 2 for all results).

Ⅳ. Conclusion

This study presents a novel method for retinal blood vessel segmentation that is time- and memory-efficient while providing high segmentation accuracy. The proposed recurrent DirectNet architecture provides a compact network architecture (low parameter count) that does not require patch-based scanning techniques or any post-processing steps. It is able to predict a segmentation image by operating directly on the image, without the prior up- or downsampling steps that are necessary in other approaches. DirectNet was benchmarked against two other state-of-the-art methods on the DRIVE dataset, matching or surpassing state-of-the-art performance in terms of accuracy, sensitivity, and specificity. Importantly for practical implementations, however, the proposed DirectNet architecture is at least one order of magnitude faster than traditional patch-based CNNs.

Vessel segmentation is only the first step in an automatic analysis pipeline that can be implemented in clinical practice. In the future, our goal is to derive features based on the segmented vessels that can be helpful in diagnosing certain types of retinopathies such as edema, or early signs of age-related macular degeneration (AMD). However, for specialized diagnostic tasks based on retinal images a large amount of training data will be required, going far beyond currently available datasets such as DRIVE. Especially in these cases, efficient architectures like the proposed DirectNet will be necessary for training on large datasets and for clinical application use cases.

Acknowledgement

This work was supported by an Institute for Information & Communications Technology Promotion (IITP) grant funded by the Korean government (No. 2017-0-00451).

References

1. Lee CH, Woo JM et al.: Clinical Characteristics of Retinal Arterial Macroaneurysms. J Korean Ophthalmol Soc. 43(9), 1612-1620, 2002.
2. Higgins RD, Yan Y et al.: Regression of retinopathy by squalamine in a mouse model. Pediatr Res. 56(1), 144-149, 2004.
3. Schmidt-Erfurth UM, Pruente C: Management of neovascular age-related macular degeneration. Prog Retin Eye Res. 26(4), 437-451, 2007.
4. Ferris FL 3rd, Davis MD et al.: Treatment of diabetic retinopathy. N Engl J Med. 341(1), 667-678, 1999.
5. Walter T, Klein JC et al.: A contribution of image processing to the diagnosis of diabetic retinopathy – detection of exudates in color fundus images of the human retina. IEEE Trans Med Imaging 21(10), 1236-1243, 2002.
6. Staal J, Abràmoff MD et al.: Ridge-based vessel segmentation in color images of the retina. IEEE Trans Med Imaging 23(4), 501-509, 2004.
7. Jiang X, Mojon D: Adaptive local thresholding by verification-based multithreshold probing with application to vessel detection in retinal images. IEEE TPAMI 25(1), 131-137, 2003.
8. Hoover A, Kouznetsova V et al.: Locating blood vessels in retinal images by piecewise threshold probing of a matched filter response. IEEE Trans Med Imaging 19(3), 203-210, 2000.
9. Wang H, Hsu W et al.: An effective approach to detect lesions in color retinal images. In: Proc IEEE Conference Computer Vis Pattern Recognition 2, 181-186, 2000.
10. Zuluaga MA, Magnin IE et al.: Automatic detection of abnormal vascular cross-sections based on density level detection and support vector machines. Int J Comput Assist Radiol Surg. 6(2), 163-174, 2011.
11. Sopharak A, Dailey MN et al.: Machine learning approach to automatic exudate detection in retinal images from diabetic patients. J Mod Opt. 57(2), 124-135, 2010.
12. Fu H, Xu Y et al.: Retinal vessel segmentation via deep learning network and fully-connected conditional random fields. Proceedings IEEE Int Symposium Biomed Imag. 2016, 698-701, 2016.
13. Simonyan K, Zisserman A: Very deep convolutional networks for large-scale image recognition. ICLR 2015, 1-14, 2015.
14. Chollet F: Xception: Deep learning with depthwise separable convolutions. arXiv preprint, arXiv:1610.02357v2, 1251-1258, 2016.
15. Ronneberger O, Fischer P et al.: U-net: Convolutional networks for biomedical image segmentation. Int Conference Med Image Computing Computer-assisted Intervention 9351, 234-241, 2015.
16. Otsu N: A threshold selection method from gray-level histograms. IEEE Trans Systems Man Cybernet 9(1), 62-66, 1979.
17. Liskowski P, Krawiec K: Segmenting retinal blood vessels with deep neural networks. IEEE Trans Medical Imaging 35(11), 2369-2380, 2016.



Retinal Vessel Extraction Using Deep Learning: A Novel DirectNet Architecture (Korean abstract)

Hyeongsuk Ryu⋅Hyeongjun Moon⋅Björn Browatzki⋅Christian Wallraven*

Dept. of Brain and Cognitive Engineering, Graduate School, Korea University

(Received May 10, 2018; Revised June 19, 2018; Accepted June 19, 2018)

Abstract

Purpose: The aim of this study is to build a novel deep learning system for vessel segmentation of retinal images. Starting from existing CNN (Convolutional Neural Network) architectures, we design DirectNet, which improves on the CNN approach and, in particular, greatly shortens processing time.

Methods: The proposed DirectNet architecture is composed of pyramid-shaped blocks, each containing a collection of convolutional layers. A block is the unit that preserves (stores) what has been learned. Blocks are added one by one and stacked into a pyramid, so that early learning results are not lost and can be used in the final analysis. DirectNet learns from the image without patch extraction or pooling, so the representation keeps the size of the original image throughout the layers. It also uses a variety of kernel sizes together with depthwise separable convolutions (DSC) to recognize and detect vessel shapes from the RGB (red, green, blue) pixels of the image.

Results: DirectNet provided higher accuracy, sensitivity, specificity, and precision values than a state-of-the-art patch-based CNN approach (0.9538 vs 0.9327, 0.7851 vs 0.7346, 0.9782 vs 0.9730, 0.8458 vs 0.7987). Training time for DirectNet was reduced from 8 hours to 1 hour, and testing time from 1 hour to 6 seconds per image.

Conclusion: The proposed deep learning architecture delivers results eight times faster in training and 600 times faster in testing than the conventional CNN approach. DirectNet showed a slightly higher accuracy (by 2.11%) than the CNN and equal or better results on the other metrics, greatly improving the time efficiency of the analysis.

Key words: Machine learning, Deep learning, Retinal vessel detection

