Korean Journal of Vision Science 20(2): 151~159, June 2018 https://doi.org/10.17337/JMBI.2018.20.2.151
ISSN 1229-6457(Print) ISSN 2466-040X(Online)
Retinal Vessel Detection Using Deep Learning: A Novel DirectNet Architecture
Hyeongsuk Ryu⋅Hyeongjun Moon⋅Björn Browatzki⋅Christian Wallraven*
Dept. of Brain and Cognitive Engineering, Korea University
Received May 10, 2018; Revised June 19, 2018; Accepted June 19, 2018
Abstract

Purpose: The aim of this study is to develop a novel deep learning system for vessel segmentation of retinal images. We present a recurrent Convolutional Neural Network (CNN) architecture and compare its performance with existing CNN approaches, showing greatly reduced processing time with excellent performance.

Methods: The proposed DirectNet architecture is composed of blocks, with each block containing a collection of convolutional layers. Blocks are stacked up in a pyramid, such that the number of blocks is increased by one at each level. Data are repeatedly processed by each block and combined with the outputs of other blocks. This recurrent structure, combined with the use of large kernels, avoids the need for up- or downsampling layers, thus creating a direct pixel-to-pixel mapping from the input image to the segmentation output.

Results: DirectNet provides higher accuracy, sensitivity, specificity, and precision values than a state-of-the-art, patch-based CNN approach (0.9538 vs 0.9327, 0.7851 vs 0.7346, 0.9782 vs 0.9730, 0.8458 vs 0.7987). Training time on a standard dataset for DirectNet is reduced from 8 hours to 1 hour, and testing time per image is greatly reduced from 1 hour for the patch-based method to 6 seconds for our method.

Conclusion: The proposed deep-learning architecture is eight times faster for training and 600 times faster for testing at slightly higher accuracy values than a state-of-the-art method. Segmentation successfully highlights retinal blood vessels from large down to small sizes.

Key words: Machine learning, Deep learning, Retinal vessel detection
Address reprint request to Christian Wallraven Korea University, Anam-Dong, Sungbuk-Ku, Seoul 136-701, Korea TEL: +82-2-3290-5925, E-mail:
[email protected] Copyright ⓒ 2018 by Korean Society of Vision Science All rights reserved.
Ⅰ. Introduction
Vessel analysis on retinal pictures can help to identify health-related problems at early stages.1) More specifically, visual impairments such as diabetic retinopathy or age-related macular degeneration (AMD) can be deduced from retinopathy analysis of the macular vessels, as it may show occlusion or hemorrhage of the vessels.2,3) Even though this method is generally reliable, finding occlusions of blood vessels in patients with diabetes or hypertension from fundus photographs is difficult – similarly, there are limitations in confirming the morphology of micro-vessels such as neovascularization as compared with general blood vessels. Accurate analysis of retinal vessel structure with the aim of early diagnosis of retinal disease is therefore an important area of research with a wide range of applications in practice.4)

One of the first automatic processing algorithms was developed in 2002 by Walter et al., who implemented an algorithm that collected common features from patients with diabetic retinopathy to extract exudates in retinal images. The algorithm extracted morphological characteristics of the exudate image that are common in diabetic retinopathy.5) Nearest-neighbor classifiers were then developed to distinguish the blood vessels from the retinal image, which further improved the accuracy by subdividing and analyzing the features of the blood vessel image.6-8)

For the purpose of vessel segmentation, Wang et al. implemented computer vision techniques in 2000.9) These showed reasonably high accuracy on test retinal images; however, reliable extraction of branch vessels such as microvascular or neovascular vessels required more advanced algorithm architectures that were also able to distinguish between normal vessels and abnormal vessels.10,11)

Recently, Artificial Neural Networks (ANNs) have experienced a renaissance in image processing and analysis – ANNs are a class of algorithms that simulate the way a human brain processes patterns and are able to robustly learn categories from large amounts of data. Convolutional Neural Networks (CNNs), in particular, are a special class of ANNs that have received a lot of attention for their performance in image interpretation tasks. Fu et al. in 2016 developed such a model for the task of vessel segmentation, showing impressive segmentation accuracy.12)

Current CNN architectures usually operate in a patch-based fashion: small patches are extracted from the image and each pixel in each patch is classified as to whether it belongs to a blood vessel or not. Since the patch size needs to be small in order to achieve robust learning, such an architecture takes a long time both during training and, more importantly, also during testing. In the present paper, we try to overcome this problem and develop a novel, deep-learning-based architecture for vessel segmentation called DirectNet.

Ⅱ. Methods

DirectNet is a fully convolutional neural network that approaches retinal image segmentation as an image-to-image translation task. In a traditional feedforward CNN, data flows through the network continuously in one direction from the top to the bottom layer. In contrast, we propose the use of recurrent structures to build a compact, yet sufficiently complex model. An architecture that allows for very fast analysis while maintaining accuracy is developed in this study.

Our network consists of a pyramid-shaped stack of recurring blocks of convolutional layers, as depicted in Figure 1b. Data flows through the network in four stages, being processed repeatedly by four distinct blocks. Each block consists of a set of convolutional layers.
Fig 1. Comparison of the standard, patch-based CNN approach (a) with our proposed DirectNet architecture (b). (a) Convolutional neural network for vessel segmentation. (b) Proposed DirectNet for vessel segmentation. Blocks depicted in the same color denote identical network components; each block consists of a collection of layers.
At stage one, the input image is processed by block 1 (depicted as the red block in Figure 1b). Outputs are then passed on to block 2 (green) but also fed back into block 1. At the next stage, outputs of block 1 are again given directly to block 1 and outputs of block 2 are given to block 3 (blue). Outputs of blocks 1 and 2 are then merged and passed to block 2. This process continues in the same fashion through stages 3 and 4. Finally, the results of all individual blocks at stage 4 are combined into a joint prediction. The final output is a vessel probability map of the same size as the input image.

Most current CNN models use a combination of small 3×3 convolutional kernels and pooling layers to reduce model parameters.13) Since the DirectNet architecture presented here relies on a rather shallow network design with fewer layers, it does not contain any pooling layers. Instead, the required receptive field size is achieved by employing larger kernel sizes (5×5, 7×7, 15×15) combined with the aforementioned recurrent structure of the network. This structure allows the storage and propagation of information across the image. The increase in computational complexity introduced by larger kernels can be mitigated by the use of depthwise separable convolutions as proposed by Chollet.14) This method can reduce computation time by more than 30% compared to general convolutions by separating the spatial convolution from the convolution across image channels.
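To make this data flow concrete, the following is a minimal Keras/TensorFlow sketch of a DirectNet-style pyramid. It is an illustrative reconstruction from the description above and Table 1, not the authors' released code: the filter counts, the residual/merge wiring, and the output head are assumptions, and the helper names (make_block, build_directnet) are hypothetical.

```python
from tensorflow.keras import layers, models


def make_block(filters, kernel, name):
    # Layers are created once; re-applying the block at later pyramid
    # stages therefore re-uses (shares) the same weights.
    return models.Sequential([
        layers.SeparableConv2D(filters, kernel, padding="same", use_bias=False),
        layers.BatchNormalization(),
        layers.ReLU(),
        layers.SeparableConv2D(filters, kernel, padding="same", use_bias=False),
        layers.BatchNormalization(),
        layers.ReLU(),
    ], name=name)


def build_directnet(input_shape=(584, 565, 3), filters=64):
    inputs = layers.Input(shape=input_shape)

    # Stem: plain convolutions with 3x3 and 7x7 kernels (cf. Table 1).
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
    x = layers.Conv2D(filters, 7, padding="same", activation="relu")(x)

    # One reusable block per pyramid level; large separable kernels
    # take the place of pooling for receptive-field growth.
    b = [make_block(filters, 15, f"block{i + 1}") for i in range(4)]

    # Stage 1: only block 1 processes the data.
    o1 = b[0](x)

    # Stage 2: the block-1 output goes forward to block 2 and is also
    # fed back into block 1.
    o2 = b[1](o1)
    o1 = b[0](o1)

    # Stage 3: merge blocks 1 and 2, push one level deeper (block 3).
    o3 = b[2](o2)
    o2 = b[1](layers.add([o1, o2]))
    o1 = b[0](o1)

    # Stage 4: same pattern, now reaching block 4.
    o4 = b[3](o3)
    o3 = b[2](layers.add([o2, o3]))
    o2 = b[1](layers.add([o1, o2]))
    o1 = b[0](o1)

    # Combine all stage-4 outputs into a joint vessel probability map.
    merged = layers.add([o1, o2, o3, o4])
    y = layers.SeparableConv2D(16, 7, padding="same", activation="relu")(merged)
    outputs = layers.SeparableConv2D(1, 5, padding="same", activation="sigmoid")(y)
    return models.Model(inputs, outputs, name="directnet_sketch")


model = build_directnet()
model.summary()  # output keeps the input resolution: a direct pixel-to-pixel mapping
```

Because each block's layers are created only once and re-applied at every later stage, weights are shared across stages; this is what keeps the parameter count low despite the repeated processing.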
Ⅲ. Results
The proposed DirectNet model for retinal vessel segmentation was implemented using the Keras library with a TensorFlow backend and evaluated on the DRIVE dataset, which is the most commonly used dataset for vessel segmentation.6) The testing procedure followed the common methodology of selecting the annotations of the first human observer as ground truth (annotations of the second observer are usually only used to study human performance). The DRIVE dataset contains 40 fundus images, split into 20 images for training and 20 images for testing. All images were cropped to an input size of 584 × 565 pixels.

To compare the DirectNet architecture to a standard, state-of-the-art method, another patch-based CNN was trained on DRIVE to serve as a baseline. For this, a publicly available implementation* based on the U-Net architecture15) was used. In the DirectNet architecture, the total number of parameters was 273,668 (see Table 1), whereas the U-Net implementation had 517,666 parameters. All experiments were run on an Intel Core i7 processor with 16 GB RAM and a GeForce GTX 1080Ti graphics card.

Performance of the vessel segmentation can be described with several metrics, including the F1-score, accuracy, sensitivity, specificity, precision, and the area under the ROC curve. These metrics are defined in (1):

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)        (1)
Precision = TP / (TP + FP)
F1-score = 2 × (Precision × Sensitivity) / (Precision + Sensitivity)

where TP, TN, FP, and FN stand for true positive, true negative, false positive, and false negative classifications, respectively. The area under the ROC curve was calculated using a standard Python library implementation.
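As an illustration of (1), the metrics can be computed from a predicted probability map and the corresponding ground-truth annotation with a few lines of NumPy, using scikit-learn only for the AUC. The helper below is a sketch under our own naming conventions, not code from the paper.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def segmentation_metrics(prob_map, ground_truth, threshold=0.5):
    """Compute the metrics of Eq. (1) for a single image.

    prob_map:     float array of vessel probabilities in [0, 1]
    ground_truth: binary array of the same shape (1 = vessel pixel)
    """
    y_true = ground_truth.astype(bool).ravel()
    y_pred = prob_map.ravel() >= threshold

    tp = np.sum(y_pred & y_true)
    tn = np.sum(~y_pred & ~y_true)
    fp = np.sum(y_pred & ~y_true)
    fn = np.sum(~y_pred & y_true)

    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)       # recall on vessel pixels
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    auc = roc_auc_score(y_true, prob_map.ravel())

    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision,
            "f1": f1, "auc": auc}
```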
Fig 2. ROC curves of DirectNet (blue) and patch-based CNN (red).
* https://github.com/orobix/retina-unet
Table 1. DirectNet architecture

| Block | Layer | Batch norm. & activation | Kernel size | Num. of parameters |
| --- | --- | --- | --- | --- |
| Block 1 | Input Layer | – | – | – |
| | Conv 2D | BatchNorm & ReLU | 3 × 3 | 864 + 128 (BN) |
| | Conv 2D | BatchNorm & ReLU | 7 × 7 | 100352 + 256 (BN) |
| | Conv 2D (res) | BatchNorm | 1 × 1 | 4096 + 256 (BN) |
| | Separable Conv 2D | BatchNorm & ReLU | 15 × 15 | 18496 + 256 (BN) |
| | Separable Conv 2D (+res) | BatchNorm & ReLU | 15 × 15 | 18496 + 256 (BN) |
| Block 2 | Conv 2D (res) | BatchNorm | 1 × 1 | 4096 + 256 (BN) |
| | Separable Conv 2D | BatchNorm & ReLU | 15 × 15 | 18496 + 256 (BN) |
| | Separable Conv 2D | BatchNorm & ReLU | 15 × 15 | 18496 + 256 (BN) |
| Bl. 1+2 | Add | ReLU | – | – |
| Block 3 | Conv 2D (res) | BatchNorm | 1 × 1 | 4096 + 256 (BN) |
| | Separable Conv 2D | BatchNorm & ReLU | 15 × 15 | 18496 + 256 (BN) |
| | Separable Conv 2D (+res) | BatchNorm & ReLU | 15 × 15 | 18496 + 256 (BN) |
| Bl. 1+2+3 | Add | ReLU | – | – |
| Block 4 | Conv 2D (res) | BatchNorm | 1 × 1 | 4096 + 256 (BN) |
| | Separable Conv 2D | BatchNorm & ReLU | 15 × 15 | 18496 + 256 (BN) |
| | Separable Conv 2D (+res) | BatchNorm & ReLU | 15 × 15 | 18496 + 256 (BN) |
| Block 1+2+3+4 | Add | ReLU | – | – |
| Final Block | Separable Conv 2D | BatchNorm & ReLU | 7 × 7 | 4160 + 64 (BN) |
| | Separable Conv 2D | BatchNorm & Sigmoid | 5 × 5 | 416 + 4 (BN) |
| Total | | | | 273,668 |
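As a consistency check on Table 1, the per-layer counts follow directly from the kernel sizes and the channel widths implied by the numbers (a 3-channel input, 32- and 64-channel standard convolutions, 64-channel separable convolutions, and a 16-channel penultimate layer; each BatchNorm adds 4 parameters per channel, e.g. 128 for 32 channels). These channel widths are inferred from the counts rather than stated explicitly in the paper. A short sketch of the arithmetic:

```python
def conv2d_params(k, c_in, c_out):
    # Standard convolution (no bias): k*k*c_in weights per output channel.
    return k * k * c_in * c_out


def separable_conv2d_params(k, c_in, c_out):
    # Depthwise k*k filter per input channel plus a 1x1 pointwise projection.
    return k * k * c_in + c_in * c_out


print(conv2d_params(3, 3, 32))              # 864    (3x3 input conv)
print(conv2d_params(7, 32, 64))             # 100352 (7x7 conv)
print(conv2d_params(1, 64, 64))             # 4096   (1x1 residual projection)
print(separable_conv2d_params(15, 64, 64))  # 18496  (15x15 separable conv)
print(separable_conv2d_params(7, 64, 16))   # 4160   (final 7x7 separable conv)
print(separable_conv2d_params(5, 16, 1))    # 416    (final 5x5 separable conv)
```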
Table 2. Performance comparison of the patch-based CNN, Liskowski et al.17), and our method

| | Patch-based CNN | Liskowski et al.17) | DirectNet |
| --- | --- | --- | --- |
| Fundus images | 20 | 20 | 20 |
| Fundus training patches | 400,000 (20×20,000) | 400,000 (20×20,000) | 20 |
| F1 score | 0.7653 | – | 0.8124 |
| Accuracy | 0.9427 | 0.9535 | 0.9538 |
| Sensitivity | 0.7346 | 0.7811 | 0.7851 |
| Specificity | 0.9730 | 0.9807 | 0.9782 |
| Precision | 0.7987 | – | 0.8458 |
| AUC (ROC curve) | 0.9640 | 0.9790 | 0.9733 |
| Jaccard similarity score | 0.9426 | – | 0.9490 |
| Training time | 8h | 8h | 1h |
| Test time per image | 1h | 92sec | 6sec |
Segmentation results in the form of probability maps were converted to binary images using a fixed threshold of 0.5. Since threshold values may change the results, the automatic Otsu threshold selection method16) was also tried, but did not yield better results.

As shown in Table 2, all metrics obtained for DirectNet were higher than for the patch-based CNN method. In particular, the F1 score of 0.8124, compared to 0.7653, showed a strong increase in performance (note that the training and testing paradigm used in the field so far prohibits the use of multiple dataset splits, such that statistical tests or Bland-Altman plots cannot be run). This was mostly driven by an increase in sensitivity, that is, the algorithm's ability to correctly detect vessel pixels. Additionally, DirectNet showed a significant speed-up compared to the U-Net architecture: training of the patch-based CNN took 8 hours, whereas DirectNet took only 1 hour. Similarly, during testing, U-Net took 1 hour to process all patches of one retinal image, whereas DirectNet finished the same task in 6 seconds – a strong speed-up.
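For the binarization step described above, both the fixed 0.5 cut-off and Otsu's method16) can be applied to a probability map in a few lines. The sketch below uses threshold_otsu from scikit-image; the helper name binarize is ours.

```python
import numpy as np
from skimage.filters import threshold_otsu


def binarize(prob_map, method="fixed"):
    """Turn a vessel probability map into a binary segmentation mask."""
    if method == "fixed":
        t = 0.5                        # fixed threshold used above
    elif method == "otsu":
        t = threshold_otsu(prob_map)   # data-driven threshold (Otsu, 1979)
    else:
        raise ValueError(f"unknown method: {method}")
    return (prob_map >= t).astype(np.uint8)
```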
Figure 3 shows a qualitative evaluation of the segmentation results on the first four images in DRIVE. The original retinal photographs are shown in the first column; the second and third columns show the human annotations and the vessel segmentation maps produced by DirectNet, respectively.

Finally, DirectNet was also compared to another recent deep-learning method suggested by Liskowski et al.17) This method required 8 hours of training time on 400,000 sample patches extracted from the 20 training images in DRIVE. Both methods achieved comparable accuracy (0.9535 vs 0.9538) and virtually identical ROC performance (an AUC of 0.9790 for Liskowski et al. compared to 0.9733 for DirectNet). Importantly, at test time DirectNet was still more than 15 times faster than the other approach, which took 92 seconds per image (see Table 2 for all results).
Fig 3. Segmentation results for the first 4 fundus images in DRIVE. Left: original fundus images; middle: ground-truths; right: segmentation results produced by DirectNet.
Ⅳ. Conclusion

This study presents a novel method for retinal blood vessel segmentation that is time- and memory-efficient while providing high segmentation accuracy. The proposed recurrent DirectNet architecture provides a compact network (low parameter count) that does not require patch-based scanning techniques or any post-processing steps. It is able to predict a segmentation image by operating directly on the input, without the up- or downsampling steps that are necessary in other approaches. DirectNet was benchmarked against two other state-of-the-art methods on the DRIVE dataset, matching or surpassing state-of-the-art performance in terms of accuracy, sensitivity, and specificity. Importantly for practical implementations, the proposed DirectNet architecture is at least one order of magnitude faster than traditional patch-based CNNs.

Vessel segmentation is only the first step in an automatic analysis pipeline that can be implemented in clinical practice. In the future, our goal is to derive features based on the segmented vessels that can be helpful in diagnosing certain types of retinopathies, such as edema, or early signs of age-related macular degeneration (AMD). However, for specialized diagnostic tasks based on retinal images, a large amount of training data will be required, going far beyond currently available datasets such as DRIVE. Especially in these cases, efficient architectures like the proposed DirectNet will be necessary for training on large datasets and in clinical application use cases.

Acknowledgement

This work was supported by an Institute for Information & Communications Technology Promotion (IITP) grant funded by the Korean government (No. 2017-0-00451).

References

1. Lee CH, Woo JM et al.: Clinical characteristics of retinal arterial macroaneurysms. J Korean Ophthalmol Soc. 43(9), 1612-1620, 2002.
2. Higgins RD, Yan Y et al.: Regression of retinopathy by squalamine in a mouse model. Pediatr Res. 56(1), 144-149, 2004.
3. Schmidt-Erfurth UM, Pruente C: Management of neovascular age-related macular degeneration. Prog Retin Eye Res. 26(4), 437-451, 2007.
4. Ferris FL 3rd, Davis MD et al.: Treatment of diabetic retinopathy. N Engl J Med. 341(1), 667-678, 1999.
5. Walter T, Klein JC et al.: A contribution of image processing to the diagnosis of diabetic retinopathy-detection of exudates in color fundus images of the human retina. IEEE Trans Med Imaging 21(10), 1236-1243, 2002.
6. Staal J, Abràmoff MD et al.: Ridge-based vessel segmentation in color images of the retina. IEEE Trans Med Imaging 23(4), 501-509, 2004.
7. Jiang X, Mojon D: Adaptive local thresholding by verification-based multithreshold probing with application to vessel detection in retinal images. IEEE TPAMI 25(1), 131-137, 2003.
8. Hoover A, Kouznetsova V et al.: Locating blood vessels in retinal images by piecewise threshold probing of a matched filter response. IEEE Trans Med Imaging 19(3), 203-210, 2000.
9. Wang H, Hsu W et al.: An effective approach to detect lesions in color retinal images. In: Proc IEEE Conference Computer Vis Pattern Recognition 2, 181-186, 2000.
10. Zuluaga MA, Magnin IE et al.: Automatic detection of abnormal vascular cross-sections based on density level detection and support vector machines. Int J Comput Assist Radiol Surg. 6(2), 163-174, 2011.
11. Sopharak A, Dailey MN et al.: Machine learning approach to automatic exudate detection in retinal images from diabetic patients. J Mod Opt. 57(2), 124-135, 2010.
12. Fu H, Xu Y et al.: Retinal vessel segmentation via deep learning network and fully-connected conditional random fields. Proceedings IEEE Int Symposium Biomed Imaging 2016, 698-701, 2016.
13. Simonyan K, Zisserman A: Very deep convolutional networks for large-scale image recognition. ICLR 2015, 1-14, 2015.
14. Chollet F: Xception: Deep learning with depthwise separable convolutions. arXiv preprint arXiv:1610.02357v2, 1251-1258, 2016.
15. Ronneberger O, Fischer P et al.: U-Net: Convolutional networks for biomedical image segmentation. Int Conference Med Image Computing Computer-assisted Intervention 9351, 234-241, 2015.
16. Otsu N: A threshold selection method from gray-level histograms. IEEE Trans Systems Man Cybernet 9(1), 62-66, 1979.
17. Liskowski P, Krawiec K: Segmenting retinal blood vessels with deep neural networks. IEEE Trans Medical Imaging 35(11), 2369-2380, 2016.
Retinal Vessel Detection Using Deep Learning: A Novel DirectNet Architecture
Hyeongsuk Ryu⋅Hyeongjun Moon⋅Björn Browatzki⋅Christian Wallraven*
Dept. of Brain and Cognitive Engineering, Graduate School, Korea University
Received May 10, 2018; Revised June 19, 2018; Accepted June 19, 2018

Summary

Purpose: The aim of this study is to build a novel deep learning system for vessel segmentation of retinal images. We present the existing CNN (Convolutional Neural Network) architecture and, building on the CNN approach, devise DirectNet with improved performance. In particular, the goal is to greatly reduce processing time compared to existing CNNs.

Methods: The proposed DirectNet architecture consists of blocks arranged in a pyramid, with each block containing a collection of convolutional layers. A block is a unit that preserves (stores) learned results. Blocks are stacked up one by one in a pyramid so that early learned results are not lost and can be used in the final analysis. DirectNet learns from images without patch extraction or pooling, so the representation stays the same size as the original image through the layers. It also uses a variety of kernel sizes together with depthwise separable convolutions (DSC) to recognize and detect vessel shapes from the RGB (red, green, blue) pixels of the image.

Results: DirectNet provided higher accuracy, sensitivity, specificity, and precision values than a state-of-the-art patch-based CNN approach (0.9538 vs 0.9327, 0.7851 vs 0.7346, 0.9782 vs 0.9730, 0.8458 vs 0.7987). Training time was reduced from 8 hours to 1 hour, and testing time from 1 hour to 6 seconds per image.

Conclusion: The proposed deep-learning architecture delivers results eight times faster in training and 600 times faster in testing than the conventional CNN approach. DirectNet showed a slightly higher accuracy (by 2.11%) than the CNN and equal or better results on the other metrics, greatly improving the time efficiency of the analysis.

Key words: Machine learning, Deep learning, Retinal vessel detection