
Machine Learning in Biomedical Applications: Case Study Breast Cancer

Nooshin Osmany, Department of Medical Informatics, Tarbiat Modares University

Subject

Cancer is the uncontrolled growth of cells in some area of the body; this is common to all cancers, regardless of what causes the disease and the damage to the cells. Normal healthy cells grow, divide and die. Cells grow rapidly during the embryonic period, and this growth continues until the person reaches full maturity. After that, in most parts of the body, cells grow and divide only as needed to replace worn-out, dead or dying cells. When tissue is damaged, healthy cells rapidly grow and divide to repair it. Cancer cells, unlike normal cells, do not die but continue to grow, divide and produce more abnormal cells [1]. If the cancer is not diagnosed, it can spread to other tissues and eventually lead to death. Cancers are distinguished by the location in which the cancer cells arise, for example lung, prostate, skin and breast cancer. They share similar characteristics: 1) abnormal growth, 2) invasion of other tissues, and 3) spread through the blood vessels and, finally, metastasis. Suspicious areas are usually identified by a pathologist, oncologist or radiologist [2]. There are different approaches to diagnosing cancer; the common ones are biopsy, endoscopy, blood tests and imaging.

According to the NCI dictionary [3], breast cancer is cancer that forms in the tissues of the breast. The most common type is ductal carcinoma, which begins in the lining of the milk ducts. Another common type is lobular carcinoma, which begins in the lobules of the breast. According to the World Cancer Research Fund International report [4], breast cancer is the most common cancer in women. Imaging is used to detect it.

Imaging data can show the human body in 2D and 3D, so they are one of the most important tools available to clinicians. The raw data are usually not directly usable; as a result, substantial processing is applied to them. Imaging data are “Big Data” because they carry a huge amount of information in their pixels, which are easy to acquire. They therefore need to be analyzed and interpreted in order to diagnose and treat cancer. The diagnostic procedure is twofold: visualization and interpretation. Errors in the interpretation of imaging data occur and cannot be neglected. Factors such as variation in the imaging data can affect decision making, and this complexity can mislead it. Errors can dramatically affect patient treatment, leading to delayed examination and incorrect diagnosis. The main source of error is human error in interpreting the visualized imaging data [5]. Because of these errors, computational analysis is a great alternative. Quantitative image analysis can aid early diagnosis [6]. Diagnosing a disease at an early stage is crucial: it can improve the treatment result, save lives and reduce expenses. Computational image analysis can also improve reproducibility and objectivity [7,8]. For more than 40 years, image processing has been studied as a way to diagnose cancer automatically [9], but challenges remain because of the complexity of imaging data [10].

The goal in many breast cancer classification problems is to determine whether a lesion is benign or malignant. Machine learning techniques have recently been applied to this problem with success. For example, a study from Turkey reports about 99% accuracy in classifying breast cancer with Support Vector Machines [11]. Machine learning is a subfield of artificial intelligence that focuses on algorithms that learn from observed data by optimizing a target function or loss function. Statistical machine learning is of broad interest in the medical sciences because it can significantly improve the sensitivity of disease detection while also increasing the objectivity of the decision-making process.

Breast cancer can be detected through tumor classification. There are two types of tumor: benign and malignant. Doctors need a reliable diagnostic process to distinguish between them, but in general this is very difficult, even for experts. This is why computer systems are needed for tumor detection. Researchers have also applied machine learning algorithms to predicting cancer survival and have reported that these algorithms can improve on earlier approaches. Machine learning algorithms are either supervised or unsupervised. Among them, the artificial neural network (ANN) is a learning algorithm whose structure and function are inspired by biological neural networks.
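As a minimal sketch of learning by minimizing a loss function, the following trains a single artificial neuron (a logistic unit) with gradient descent to separate two classes. The two features, their values and the labels are invented for illustration and do not come from any dataset discussed here; the learning rate and number of epochs are likewise arbitrary assumptions.

```python
import math

# Toy data (hypothetical): [mean nucleus radius, texture score] per lesion.
# Label 1 = malignant, 0 = benign. All values are made up for illustration.
X = [[1.0, 0.8], [1.2, 1.1], [0.9, 1.0], [2.8, 2.5], [3.1, 2.9], [2.6, 3.2]]
y = [0, 0, 0, 1, 1, 1]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Output of one artificial neuron: sigmoid of a weighted sum."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def log_loss(w, b):
    """Mean cross-entropy loss over the toy dataset."""
    eps = 1e-12  # avoids log(0)
    total = 0.0
    for x, t in zip(X, y):
        p = predict_proba(w, b, x)
        total -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
    return total / len(X)


def train(epochs=2000, lr=0.1):
    """Batch gradient descent on the cross-entropy loss."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, t in zip(X, y):
            err = predict_proba(w, b, x) - t  # gradient of the loss w.r.t. the weighted sum
            for j in range(len(w)):
                grad_w[j] += err * x[j] / len(X)
            grad_b += err / len(X)
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


w, b = train()
predictions = [1 if predict_proba(w, b, x) >= 0.5 else 0 for x in X]
print("loss before training:", round(log_loss([0.0, 0.0], 0.0), 3))
print("loss after training: ", round(log_loss(w, b), 3))
print("predictions:", predictions)
```

The loss printed after training is lower than the loss at the zero initialization, which is the sense in which the model "learns" from the observed data. A real study would evaluate on held-out cases rather than on the training data.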
An ANN is modeled on the way the nervous system, including the brain, processes information. ANNs are used for tasks such as forecasting, clustering and classification [12].

Radiomics [13] is a new and emerging set of techniques that extract features from imaging data, mine them and then use them for clinical analysis. To extract the imaging features, we use methods rooted in data science, such as image processing and computer vision. To obtain meaningful and discriminative variables from an image, we use data mining methods to reduce the dimension of the data and remove redundant and irrelevant features. Finally, we apply machine learning methods to carry out the classification or clustering required by the clinical task. Radiomics has already shown promising performance in studies of lung [14], breast [15, 16], head and neck [17], sarcoma [18, 19] and brain cancer [20].

Deep learning is another powerful tool for analyzing medical data. The idea of deep learning has existed for many years, but constraints such as the lack of suitable hardware prevented its use. Recently, as these restrictions have been removed, much work has been carried out in this area. The deep learning concept originated in artificial neural network research. Deep neural networks are often feedforward neural networks with more than one hidden layer (multilayer perceptrons). In recent years, deep learning has been the most widely studied concept in machine learning. Deep learning approaches vary; in general, they can be divided into four categories [21]:

- Convolutional neural networks (CNNs)
- Restricted Boltzmann machines (RBMs)
- Sparse coding
- Autoencoders
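The three radiomics steps described earlier in this section (feature extraction, removal of uninformative features, classification) can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the tiny pixel patches, the choice of first-order intensity features, the zero-variance filter and the nearest-centroid classifier are stand-ins, not the methods of the cited studies. Real pipelines would also standardize the features before computing distances.

```python
from statistics import mean, pvariance

# Hypothetical regions of interest: tiny grayscale patches (lists of rows).
# Pixel values and labels are invented for illustration only.
patches = {
    "benign_1": [[10, 11, 10], [11, 10, 11], [10, 11, 10]],
    "benign_2": [[12, 12, 13], [12, 13, 12], [13, 12, 12]],
    "malignant_1": [[10, 60, 15], [70, 20, 65], [12, 58, 18]],
    "malignant_2": [[20, 75, 25], [80, 22, 70], [18, 66, 24]],
}
labels = {"benign_1": 0, "benign_2": 0, "malignant_1": 1, "malignant_2": 1}


def extract_features(patch):
    """Step 1: first-order intensity features of one patch."""
    pixels = [p for row in patch for p in row]
    return {
        "mean": mean(pixels),
        "variance": pvariance(pixels),
        "range": max(pixels) - min(pixels),
        "n_pixels": len(pixels),  # identical for every patch, so uninformative
    }


def select_features(table, min_variance=1e-9):
    """Step 2: drop features that do not vary across the samples."""
    names = list(next(iter(table.values())))
    return [
        n for n in names
        if pvariance([row[n] for row in table.values()]) > min_variance
    ]


def nearest_centroid(train, train_labels, names, sample):
    """Step 3: assign the class whose mean feature vector is closest."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_labels.values())):
        rows = [train[k] for k in train if train_labels[k] == label]
        centroid = [mean(r[n] for r in rows) for n in names]
        dist = sum((sample[n] - c) ** 2 for n, c in zip(names, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


table = {name: extract_features(p) for name, p in patches.items()}
kept = select_features(table)
new_patch = [[15, 68, 20], [72, 19, 61], [16, 63, 21]]
predicted = nearest_centroid(table, labels, kept, extract_features(new_patch))
print("kept features:", kept)
print("predicted label:", predicted)
```

The constant feature `n_pixels` is removed in the selection step, and the new patch, whose intensities vary strongly, is assigned to the class of the high-variance training patches.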

In this study, we focus on CNNs, a subset of deep learning methods. The convolutional neural network is one of the best neural architectures developed for image data. Its structure models the functionality of the human visual system and its relationship to the visual area of the brain: it automatically extracts key features, learns images within a complex neural structure and removes possible redundancy. CNNs were the first truly successful deep learning approach in which hierarchies of layers were trained robustly. The CNN architecture, or topology, is chosen so as to reduce the number of parameters that must be learned, which improves learning. To the best of our knowledge, CNNs need the least pre-processing of any deep learning framework.

Learning Strategies

One of the benefits of deep learning over shallow learning is that a deep architecture can learn more abstract representations. However, the large number of parameters can cause overfitting. Methods for dealing with overfitting, such as DropConnect, Dropout, data augmentation, pre-training and fine-tuning, improve network performance. These methods apply to CNNs as well as to traditional neural networks. Pre-training means initializing the network with previously trained parameters rather than random ones, which speeds up learning and improves generalization. Fine-tuning is also an important step: it refines the model to suit a specific task and dataset, and it requires a labeled training dataset in order to calculate the error function. Deep learning is widely used in computer vision; the most important applications are image classification, image recognition, object detection, segmentation and semantic image retrieval. Image classification means labeling the input image with the most probable object class [14]. Accordingly, after the initial studies, we use a pretrained network that learns from breast cancer images and maps them to a binary output: the network must recognize whether a given image shows a benign or a malignant breast tumor.
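To make the idea of reusing a pretrained network for a binary output concrete, here is a minimal sketch under strong simplifying assumptions. Two hand-written edge kernels stand in for pretrained convolutional weights and are kept frozen; only a new logistic output unit is trained. The 5x5 stripe patterns are invented toy inputs standing in for the two classes (0 and 1), not histology images, and this sketch does not use MatConvNet or any real pretrained model.

```python
import math

# Stand-in for pretrained convolution kernels (hypothetical values). In practice
# these weights would come from a network trained on a large image dataset.
FROZEN_KERNELS = [
    [[1, 0, -1], [1, 0, -1], [1, 0, -1]],  # responds to vertical edges
    [[1, 1, 1], [0, 0, 0], [-1, -1, -1]],  # responds to horizontal edges
]


def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (no padding, stride 1)."""
    k = len(kernel)
    out = []
    for i in range(len(image) - k + 1):
        row = []
        for j in range(len(image[0]) - k + 1):
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(k) for b in range(k)))
        out.append(row)
    return out


def frozen_features(image):
    """Convolution, ReLU, then global average pooling; weights are never updated."""
    feats = []
    for kernel in FROZEN_KERNELS:
        fmap = conv2d_valid(image, kernel)
        activations = [max(0.0, v) for row in fmap for v in row]
        feats.append(sum(activations) / len(activations))
    return feats


def train_head(features, targets, epochs=500, lr=0.5):
    """Train only the new binary output unit (logistic regression, SGD)."""
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(features, targets):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            w = [wi - lr * (p - t) * xi for wi, xi in zip(w, x)]
            b -= lr * (p - t)
    return w, b


def predict(image, w, b):
    x = frozen_features(image)
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0


def vertical(n_ones):
    """5x5 toy image whose left n_ones columns are 1 (class 0 pattern)."""
    return [[1] * n_ones + [0] * (5 - n_ones) for _ in range(5)]


def horizontal(n_ones):
    """5x5 toy image whose top n_ones rows are 1 (class 1 pattern)."""
    return [[1] * 5 if r < n_ones else [0] * 5 for r in range(5)]


train_images = [vertical(2), vertical(3), horizontal(2), horizontal(3)]
train_targets = [0, 0, 1, 1]
w, b = train_head([frozen_features(im) for im in train_images], train_targets)

new_vertical, new_horizontal = vertical(4), horizontal(4)
print(predict(new_vertical, w, b), predict(new_horizontal, w, b))
```

Because each 3x3 kernel is shared across all image positions, the feature extractor has 18 weights regardless of image size, which illustrates how the convolutional topology keeps the number of learned parameters small. Full fine-tuning would additionally update the convolutional weights, usually with a small learning rate.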

Purpose

Major Goal

Use convolutional neural networks to recognize breast cancer and obtain better results than state-of-the-art methods.

Minor Goals

1. Learn how to use convolutional neural networks
2. Reduce medical errors in the analysis of images
3. Reduce costs for patients
4. Help physicians save time

Challenges

1. Does the convolutional neural network model improve the diagnosis of medical images?
2. Does the use of convolutional neural networks for diagnosis increase the reliability of the diagnosis?
3. Does the use of convolutional neural networks enhance the accuracy of disease detection?

Review

“Kowal et al. [22] compare and test different algorithms for nuclei segmentation, where the cases are classified as either benign or malignant on a dataset of 500 images, and report accuracies ranging from 96% to 100%. Filipczuk et al. [23] present a BC diagnosis system based on the analysis of cytological images of fine needle biopsies, to discriminate the images as either benign or malignant. Using four different classifiers trained with a 25-dimensional feature vector, they report a performance of 98% on 737 images. Similarly to [22] and [23], George et al. [24] propose a diagnosis system for BC based on the nuclei segmentation of cytological images. Using different machine learning models, such as neural networks and support vector machines, they report accuracy rates ranging from 76% to 94% on a dataset of 92 images. Zhang et al. [25] propose a cascade approach with rejection option. In the first level of the cascade, authors expect to solve the easy cases while the hard ones are sent to a second level where a more complex pattern classification system is used. They assess the proposed method on a database proposed by the Israel Institute of Technology, which is composed of 361 images and report results of 97% of reliability.” [10].

Fabio et al. [10] applied LeNet [26] to the BreaKHis dataset, but it performed poorly, reaching 72% accuracy. They then used AlexNet [27], a more complex architecture, which gave better results. Anthimopoulos [28] used a CNN for interstitial lung diseases, with 120 CT images and 141696 patches, and obtained 85.5% accuracy for classification.
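The studies above are compared mainly by accuracy. As a small sketch of how such figures are computed for a benign/malignant classifier, the following derives accuracy, sensitivity and specificity from predicted and true labels. The label vectors are invented for illustration and are unrelated to the results quoted above.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn


def summarize(y_true, y_pred):
    """Accuracy, sensitivity (recall on malignant) and specificity."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Hypothetical labels: 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = summarize(y_true, y_pred)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy matters when benign and malignant cases are imbalanced, because accuracy alone can hide a high rate of missed malignant cases.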

Assumptions:

1. A convolutional neural network architecture can improve the quality of diagnosis from medical images and reduce the cost of treatment.
2. The use of convolutional neural networks increases the reliability of disease detection.
3. The use of convolutional neural networks for diagnosis improves the accuracy of disease detection.

Methods and Novelty:

In this study, we want to use different CNN architectures with a breast cancer dataset for different clinical classification tasks. We want to use pre-trained networks through a transfer learning approach, both to extract convolved features and for fine-tuning. We start with MatConvNet, a CNN toolbox for MATLAB. Then we modify the layers to try to improve the results.

References

1. A. C. Society. (2016). What is cancer? Available: http://www.cancer.gov/about-cancer/understanding/what-iscancer
2. A. Mandal. What is Cancer? Available: http://www.news-medical.net/health/What-is-Cancer.aspx
3. I. NC. NCI Dictionary of Cancer Terms. Available: http://www.cancer.gov/publications/dictionaries/cancerterms?cdrid=444971
4. I. WCRF. Breast cancer statistics. Available: http://www.wcrf.org/int/cancer-facts-figures/data-specificcancers/breast-cancer-statistics
5. E. A. Krupinski, "Current perspectives in medical image perception," Attention Perception & Psychophysics, vol. 72, pp. 1205-1217, Jul 2010.
6. F. Y. Xing, Y. P. Xie, and L. Yang, "An Automatic Learning-Based Framework for Robust Nucleus Segmentation," IEEE Transactions on Medical Imaging, vol. 35, pp. 550-566, Feb 2016.
7. E. Meijering, "Cell Segmentation: 50 Years Down the Road," IEEE Signal Processing Magazine, vol. 29, pp. 140-145, Sep 2012.
8. M. N. Gurcan, L. E. Boucheron, A. Can, A. Madabhushi, N. M. Rajpoot, and B. Yener, "Histopathological image analysis: A review," IEEE Reviews in Biomedical Engineering, vol. 2, pp. 147-171, 2009.
9. B. Stenkvist, S. Westmannaeser, J. Holmquist, B. Nordin, E. Bengtsson, J. Vegelius, O. Eriksson, and C. H. Fox, "Computerized Nuclear Morphometry as an Objective Method for Characterizing Human Cancer Cell Populations," Cancer Research, vol. 38, pp. 4688-4697, 1978.
10. N. Hatipoglu and G. Bilgin, "Classification of Histopathological Images Using Convolutional Neural Network," 2014 4th International Conference on Image Processing Theory, Tools and Applications (IPTA), pp. 295-300, 2014.
11. M. F. Akay, "Support vector machines combined with feature selection for breast cancer diagnosis," Expert Systems with Applications, vol. 36, pp. 3240-3247, Mar 2009.
12. B. Gayathri, C. Sumathi, and T. Santhanam, "Breast Cancer Diagnosis using Machine Learning Algorithms-A Survey," International Journal of Distributed and Parallel Systems, vol. 4, p. 105, 2013.
13. P. Lambin, E. Rios-Velazquez, R. Leijenaar, S. Carvalho, R. G. van Stiphout, P. Granton, C. M. Zegers, R. Gillies, R. Boellard, and A. Dekker, "Radiomics: extracting more information from medical images using advanced feature analysis," European Journal of Cancer, vol. 48, pp. 441-446, 2012.
14. T. P. Coroller, P. Grossmann, Y. Hou, E. R. Velazquez, R. T. Leijenaar, G. Hermann, P. Lambin, B. Haibe-Kains, R. H. Mak, and H. J. Aerts, "CT-based radiomic signature predicts distant metastasis in lung adenocarcinoma," Radiotherapy and Oncology, vol. 114, pp. 345-350, 2015.
15. B. Chaudhury, M. Zhou, H. Farhidzadeh, D. B. Goldgof, L. O. Hall, R. A. Gatenby, R. J. Gillies, R. J. Weinfurtner, and J. S. Drukteinis, "Predicting Ki67% expression from DCE-MR images of breast tumors using textural kinetic features in tumor habitats," in SPIE Medical Imaging, 2016, pp. 97850T-97850T-7.
16. R. R. Colen and D. Piwnica-Worms, "Radiomics and Radiogenomics in Breast Cancer," Breast Diseases: a YB Quarterly, vol. 27, pp. 23-24, 2016.
17. H. Farhidzadeh, J. Y. Kim, J. G. Scott, D. B. Goldgof, L. O. Hall, and L. B. Harrison, "Classification of progression free survival with nasopharyngeal carcinoma tumors," in Proc. SPIE, Medical Imaging 2016: Computer-Aided Diagnosis, 2016, pp. 97851I-97851I-7.
18. H. Farhidzadeh, D. B. Goldgof, L. O. Hall, J. G. Scott, R. A. Gatenby, R. J. Gillies, and M. Raghavan, "A Quantitative Histogram-based Approach to Predict Treatment Outcome for Soft Tissue Sarcomas Using Pre- and Post-treatment MRIs," (In Press) in Systems, Man, and Cybernetics (SMC), 2016 IEEE International Conference on, 2016.
19. H. Farhidzadeh, D. B. Goldgof, L. O. Hall, R. A. Gatenby, R. J. Gillies, and M. Raghavan, "Texture feature analysis to predict metastatic and necrotic soft tissue sarcomas," in Systems, Man, and Cybernetics (SMC), 2015 IEEE International Conference on, 2015, pp. 2798-2802.
20. H. J. Aerts, E. R. Velazquez, R. T. Leijenaar, C. Parmar, P. Grossmann, S. Carvalho, J. Bussink, R. Monshouwer, B. Haibe-Kains, and D. Rietveld, "Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach," Nature Communications, vol. 5, 2014.
21. Y. Guo, Y. Liu, A. Oerlemans, S. Lao, S. Wu, and M. S. Lew, "Deep learning for visual understanding: A review," Neurocomputing, vol. 187, pp. 27-48, 2016.
22. M. Kowal, P. Filipczuk, A. Obuchowicz, J. Korbicz, and R. Monczak, "Computer-aided diagnosis of breast cancer based on fine needle biopsy microscopic images," Computers in Biology and Medicine, vol. 43, pp. 1563-1572, Oct 1 2013.
23. P. Filipczuk, T. Fevens, A. Krzyzak, and R. Monczak, "Computer-Aided Breast Cancer Diagnosis Based on the Analysis of Cytological Images of Fine Needle Biopsies," IEEE Transactions on Medical Imaging, vol. 32, pp. 2169-2178, Dec 2013.
24. Y. M. George, H. H. Zayed, M. I. Roushdy, and B. M. Elbagoury, "Remote Computer-Aided Breast Cancer Detection and Diagnosis System Based on Cytological Images," IEEE Systems Journal, vol. 8, pp. 949-964, Sep 2014.
25. Y. G. Zhang, B. L. Zhang, F. Coenen, and W. J. Lu, "Breast cancer diagnosis from biopsy images with highly reliable random subspace classifier ensembles," Machine Vision and Applications, vol. 24, pp. 1405-1420, Oct 2013.
26. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, pp. 2278-2324, 1998.
27. A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097-1105.
28. H. Wang, A. Cruz-Roa, A. Basavanhally, H. Gilmore, N. Shih, M. Feldman, J. Tomaszewski, F. Gonzalez, and A. Madabhushi, "Mitosis detection in breast cancer pathology images by combining handcrafted and convolutional neural network features," Journal of Medical Imaging, vol. 1, pp. 034003-034003, 2014.
