Autofocus Layer for Semantic Segmentation
Yao Qin1,2, Konstantinos Kamnitsas1,3, Siddharth Ancha1,4, Jay Nanavati1, Garrison Cottrell2, Antonio Criminisi1, and Aditya Nori1
1Microsoft Research Cambridge, UK; 2University of California San Diego, USA; 3Imperial College London, UK; 4Carnegie Mellon University, USA
GitHub: https://github.com/yaq007/Autofocus-Layer · e-mail: [email protected]
Improving multi-scale processing for segmentation with DNNs
➢ Develop a better automatic segmentation method.
➢ Improve the multi-scale processing capabilities of neural nets.
➢ Context facilitates recognition systems [Galleguillos '10].
Foveal mechanism for biological multi-scale context aggregation.
Zoom in / out on different objects.
Multi-scale processing in SOTA methods:
• DeepMedic [Kamnitsas '15, '16, '17]
• FCN [Long '15]
• HighResNet [Li '17]
• U-Net [Ronneberger '15]
Why Autofocus?
Limitation of current models: fusion of multi-scale information is static.
• They learn "best overall" weights for combining features.
• They apply these weights at inference regardless of the input.
✗ But different scales may be optimal for different content.
Autofocus layer:
✓ Data-driven combination of features
✓ Modular block
Attention module
Earlier attention to scale [Chen '16]:
• Tied to a specific architecture
• Results fall behind SOTA
Dilated Convolutions and ASPP
Dilated convolution [Chen '17, Yu '16]: the kernel is applied with gaps between its elements set by the dilation rate r.
[Figure: dilated convolution with dilation rates r = 1, 2, 3]
Atrous Spatial Pyramid Pooling (ASPP) module [Chen '17]:
[Figure: parallel Conv → BN → ReLU → Pool branches with different dilation rates r]
• Parallel convs with different dilation rates.
• Static & naive fusion via sum or concatenation.
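A minimal PyTorch sketch of such a block (not the authors' implementation; channel counts and dilation rates are illustrative assumptions):

```python
import torch
import torch.nn as nn

class ASPPBlock(nn.Module):
    """ASPP-style block: parallel dilated convolutions over the same
    input, fused statically by summation (no attention)."""
    def __init__(self, in_ch, out_ch, rates=(2, 6, 10, 14)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv3d(in_ch, out_ch, kernel_size=3,
                      padding=r, dilation=r, bias=False)
            for r in rates])
        self.bn = nn.BatchNorm3d(out_ch)

    def forward(self, x):
        # The learned branch weights are fixed after training and are
        # applied regardless of the input content.
        out = sum(b(x) for b in self.branches)
        return torch.relu(self.bn(out))
```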
Autofocus Layer
Soft-attention network [Bahdanau '15]:
$\Lambda_l = \mathrm{Softmax}\big(\mathrm{Conv}_{l,2}(\mathrm{ReLU}(\mathrm{Conv}_{l,1}(F_{l-1})))\big)$
[Figure: Autofocus layer on a FLAIR scan, here with K = 4 parallel scales; attention maps $\Lambda_l^1$ and $\Lambda_l^2$, and the predicted segmentation]
✓ Fully convolutional
✓ Trainable end-to-end
✗ Number of parameters ×K
✗ Memory requirements ×K
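A PyTorch sketch of the layer following the formula above (a sketch under assumptions: the kernel sizes, attention-module width, and dilation rates are illustrative, not the authors' exact settings):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AutofocusConv(nn.Module):
    """Autofocus convolution sketch: K parallel dilated convolutions
    sharing one kernel, fused per voxel by soft attention
    Lambda_l = Softmax(Conv_{l,2}(ReLU(Conv_{l,1}(F_{l-1}))))."""
    def __init__(self, in_ch, out_ch, rates=(2, 6, 10, 14)):
        super().__init__()
        self.rates = rates
        # One shared 3x3x3 kernel, applied with K dilation rates.
        self.weight = nn.Parameter(torch.empty(out_ch, in_ch, 3, 3, 3))
        nn.init.kaiming_normal_(self.weight)
        # Attention module: two convs, then softmax over the K scales.
        self.att = nn.Sequential(
            nn.Conv3d(in_ch, in_ch // 2, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(in_ch // 2, len(rates), kernel_size=1))

    def forward(self, x):
        # lam: (N, K, D, H, W) -- one attention map per scale.
        lam = F.softmax(self.att(x), dim=1)
        out = 0.0
        for k, r in enumerate(self.rates):
            feat = F.conv3d(x, self.weight, padding=r, dilation=r)
            out = out + lam[:, k:k + 1] * feat  # per-voxel weighting
        return out
```

For example, `AutofocusConv(64, 64)(torch.randn(1, 64, 24, 24, 24))` returns a tensor of the same spatial size, with each voxel's scale mixture chosen by the attention maps.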
Weight sharing & scale invariance
Reduce parameters by sharing weights between the parallel dilated kernels, so the same patterns are detected at different scales (a quick count follows below). But does this enforce scale invariance?
Good for natural images [Farabet '13]:
✓ A small and a big car are both cars.
Scale invariance in medical imaging?
✓ A small and a big bone are both bones.
✗ A smaller and a bigger abnormality are not always the same type.
The attention module is not invariant: it can adapt its behavior according to task & context.
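To make the saving concrete, a quick count for the sketch above (channel numbers are illustrative):

```python
# Kernel parameters with vs. without sharing across K = 4 scales,
# for a 64 -> 64 channel 3x3x3 convolution (illustrative numbers).
K, in_ch, out_ch = 4, 64, 64
per_kernel = out_ch * in_ch * 3 * 3 * 3   # 110,592 weights
print("unshared:", K * per_kernel)        # 442,368 -- grows with K
print("shared:  ", per_kernel)            # 110,592 -- independent of K
```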
Evaluation - Networks
• Basic: dilated network (Conv → BN → ReLU blocks, dilated convs with r = 2).
• AutoFocusNets AFN-k, k ∈ {1, …, 6}: Basic with its last k convolutions replaced by Autofocus layers (see the helper sketched below).
• ASPP: same as AFN-1, but with ASPP instead of Autofocus (i.e., no attention).
• DeepMedic (DM): baseline. Matched receptive fields: DM = AFN-1 = ASPP.
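Modularity in practice: a hypothetical helper (the name `to_afn` and the `nn.Sequential` layout are our assumptions, not from the repo) that swaps the last k 3D convolutions of a Basic-style network for Autofocus layers:

```python
import torch.nn as nn

def to_afn(basic: nn.Sequential, k: int) -> nn.Sequential:
    """Replace the last k Conv3d modules of a Basic-style network
    with AutofocusConv layers (sketched above)."""
    blocks = list(basic)
    conv_idx = [i for i, m in enumerate(blocks)
                if isinstance(m, nn.Conv3d)]
    for i in conv_idx[-k:]:
        c = blocks[i]
        blocks[i] = AutofocusConv(c.in_channels, c.out_channels)
    return nn.Sequential(*blocks)
```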
Multi-organ segmentation in pelvic CT
Two databases: ADD (86 scans) and UW (34 scans), from two different clinical centers and different populations.
• 512×512 slices, 1 mm inter-slice spacing
• Evaluate generalization: train on ADD, test on UW
Multi-organ segmentation - Results
➢ Clear benefits from attention.
➢ Clear positive trend with more Autofocus layers.
Brain tumor segmentation (val)
Cross-validation on the BRATS'15 training database [Menze '15]:
• Total 274 cases (HGG + LGG)
• Train on 193, evaluate on 54 (23 held out for parameter configuration)
➢ Smaller benefit from attention (less size variation?).
➢ Clear positive trend with more Autofocus layers.
➢ The architecture developed on multi-organ data also works well on BRATS.
BRATS'15 (blind test)
Blind evaluation on the BRATS'15 online platform:
• Train on 274 cases (HGG + LGG)
• Test on 110 blind cases (unknown grade)
• Single model: AFN-6
Difficult to compare fairly with entries that use:
• CNN ensembles
• Extensive augmentation
• Deep supervision
(Extra) Evaluation on the PASCAL VOC '12 validation set
Take-home messages
Autofocus layer:
• Data-driven choice of the optimal scale for context.
• Modular: just replace a convolution with an Autofocus layer!
• Works: good performance out of the box on diverse tasks.
Limitation:
• Memory usage increases linearly with the number of parallel scales.
Future work:
• Develop GPUs with more memory.
• Incorporate attention in other modules and architectures.
References
Galleguillos and Belongie: Context based object categorization: A critical survey. CVIU (2010).
Kamnitsas et al.: Multi-scale 3D convolutional neural networks for lesion segmentation in brain MRI. ISLES-MICCAI (2015).
Kamnitsas et al.: DeepMedic for brain tumor segmentation. MICCAI BraTS Challenge (2016).
Kamnitsas et al.: Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. MedIA (2017).
Long et al.: Fully convolutional networks for semantic segmentation. CVPR (2015).
Ronneberger et al.: U-Net: Convolutional networks for biomedical image segmentation. MICCAI (2015).
Yu and Koltun: Multi-scale context aggregation by dilated convolutions. ICLR (2016).
Chen et al.: DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. TPAMI (2017).
Chen et al.: Attention to scale: Scale-aware semantic image segmentation. CVPR (2016).
Bahdanau et al.: Neural machine translation by jointly learning to align and translate. ICLR (2015).
Farabet et al.: Learning hierarchical features for scene labeling. IEEE PAMI (2013).
Pereira et al.: Brain tumor segmentation using convolutional neural networks in MRI images. IEEE TMI (2016).
Bakas et al.: GLISTRboost: Combining multimodal MRI segmentation, registration, and biophysical tumor growth modeling with gradient boosting machines for glioma segmentation. MICCAI BraTS Challenge (2015).
Kayalibay et al.: CNN-based segmentation of medical imaging data. arXiv (2017).
Isensee et al.: Brain tumor segmentation and radiomics survival prediction: Contribution to the BRATS 2017 challenge. MICCAI BraTS Challenge (2017).
Menze et al.: The multimodal brain tumor image segmentation benchmark (BRATS). IEEE TMI (2015).
Thank you for funding my MICCAI trip!