Content-Based Image Retrieval System for Differential Diagnosis of Lung Cancer

Ashis Kumar Dhara1, Chanukya Krishna Chama1, Sudipta Mukhopadhyay1, Niranjan Khandelwal2
1 Indian Institute of Technology, Kharagpur 721302, INDIA. Email: [email protected], [email protected], [email protected]
2 Dept. of Radiodiagnosis, PGIMER, Chandigarh, INDIA. Email: [email protected]

Abstract
This paper describes a framework for a content-based image retrieval (CBIR) system for solitary pulmonary nodules in lung CT images. The objective of the CBIR system is to retrieve, for a given query nodule, similar nodules from a large CT image database. The system is intended to support differential diagnosis of lung cancer and to serve as a radiological self-learning tool. In modern hospitals, radiological images and reports are stored in PACS and RIS respectively. Finding similar images based on diagnostic features will assist radiologists in differential diagnosis during daily clinical practice. Budding radiologists can enrich their perception by viewing the nodules retrieved for a given query nodule and can learn from the corresponding reports without the assistance of expert radiologists. The proposed standalone CBIR system is validated on 45 solid pulmonary nodules from the LIDC data set and 40 solid pulmonary nodules from the PGIMER data set. Considering the top five retrieved nodules, the average precision achieved is 72.18% for the LIDC data set and 78.29% for the PGIMER data set.

Keywords: Content-based image retrieval, Lung nodule, Computed tomography (CT), Retrieval, Shape and texture feature, Precision.
1 Introduction
According to statistics from the American Cancer Society, lung cancer is the primary cause of cancer-related death in the United States [1]. A pulmonary nodule is defined as an approximately round opacity with a maximum diameter of less than 3 cm [2]. Pulmonary nodules can be an early indication of lung cancer. The five-year survival rate of a patient diagnosed with lung cancer can be increased from 10%-15% to 65%-80% if nodules are detected at an early stage [3]. A good understanding of the different diagnostic features of a nodule is required for efficient prognosis. The proposed CBIR system for lung nodules will serve as a self-learning tool for differential diagnosis of lung nodules.

A large number of images are generated by hospitals and clinics every day. These images play a very important role in the diagnosis of diseases, medical research and education. CBIR systems have been identified as an important research topic in radiology because they can provide diagnostic decision support for medical image interpretation using the steadily growing volume of clinical data [5]. Li et al. [4] present an example of a CBIR-based tool to aid radiological diagnosis. Kawata et al. [6] developed a CBIR system for lung nodules in 2004 that used shape descriptors and density histograms to retrieve 3-D lung nodules, but the precision and recall of this system were not reported. Lam et al. [7] developed an open source pulmonary nodule image retrieval framework in 2007 using Haralick features from the grey-level co-occurrence matrix. They achieved a precision of about 88% when one item is retrieved. Their work did not report shape features of the nodule or class-level retrieval accuracy for different categories of nodules.

Lung nodules vary widely in shape, size and internal structure. Depending on margin sharpness, a nodule is classified as well defined or ill defined. Depending on surface geometry, it is classified as lobular or spicular. A nodule with an internal cavity is categorized as excavated. The objective of the proposed CBIR system is to assist budding radiologists in identifying the right description of a nodule. The block diagram of the proposed CBIR system is shown in Figure 1. The steps are: query formation (in the form of a volume of interest); visual content extraction (in the form of a feature vector); measurement of similarity between the feature vector of the query image and those of the other nodule images in the database; retrieval of similar nodules; and display and comparative study of the query and retrieved nodules in terms of a few pathological indices provided by radiologists.
Figure 1: Framework of CBIR system for SPN
2 Materials and Method
2.1 Nodule Segmentation
A volume of interest (VOI) of size 30 mm × 30 mm × 30 mm is selected around the nodule so that the centroid of the VOI coincides with the centroid of the nodule. A hybrid preprocessing method [8] is applied within the VOI prior to nodule segmentation to improve segmentation accuracy. This preprocessing method consists of geometry-based diffusion, which preserves the nodule boundary, followed by selective enhancement filtering, which highlights blob-like structures and suppresses vessel-like structures. Because nodule types are diverse, a robust region-growing method is needed for lung nodule segmentation. Contrast-based region growing is applied using a 26-voxel neighborhood. The seed point may be any point on the nodule, and the threshold parameter is set using adaptive thresholding.
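The region-growing step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the acceptance rule (a voxel joins the region when its intensity is within a fixed `threshold` of the seed intensity) is an assumption standing in for the contrast criterion and the adaptive threshold, which are not specified here. The function name `region_grow` and the toy volume are hypothetical.

```python
from collections import deque
from itertools import product

# All 26 offsets of a 3-D neighbourhood (every combination except the centre).
NEIGHBOURS_26 = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def region_grow(volume, seed, threshold):
    """Grow a region from `seed` over 26-connected voxels.

    `volume` is a nested list indexed as volume[z][y][x].
    Assumed criterion: a voxel is accepted when its intensity differs
    from the seed intensity by at most `threshold`.
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    seed_value = volume[seed[0]][seed[1]][seed[2]]
    region = {seed}
    queue = deque([seed])
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in NEIGHBOURS_26:
            nz, ny, nx = z + dz, y + dy, x + dx
            inside = 0 <= nz < depth and 0 <= ny < height and 0 <= nx < width
            if not inside or (nz, ny, nx) in region:
                continue
            if abs(volume[nz][ny][nx] - seed_value) <= threshold:
                region.add((nz, ny, nx))
                queue.append((nz, ny, nx))
    return region


# Toy 3 x 3 x 5 volume: two bright voxels that touch only diagonally,
# plus one bright voxel that is not connected to them.
volume = [[[0] * 5 for _ in range(3)] for _ in range(3)]
volume[0][0][0] = 100
volume[1][1][1] = 95
volume[0][0][4] = 100

region = region_grow(volume, (0, 0, 0), threshold=10)
print(sorted(region))  # [(0, 0, 0), (1, 1, 1)]
```

The diagonal pair is joined only because all 26 neighbours are examined; a 6-neighbour rule would leave the seed voxel isolated in this toy case.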
2.2 Feature Extraction
To design a CBIR system for pulmonary nodules, a large number of visual and low-level image features, such as geometrical and texture features, are included to describe the segmented nodule in a multi-dimensional feature space as a feature vector. This feature vector is linked with the corresponding subject in the CT image database. The number of features extracted from an image is usually very large; some of them are useful for representing image characteristics and some are not. A feature vector of lower dimension also reduces the cost of the similarity calculation in every search. We have used logistic regression to find a more discriminating feature subset that satisfies the maximum relevance and minimum redundancy criteria.
2.2.1 Shape and Texture Feature
Shape features such as sphericity, lobulation index, spiculation index, mean radial distance, calcification index and 3-D acutance on the nodule surface are calculated to characterize the geometrical properties of the nodule in feature space. Sphericity is measured by dividing the nodule surface area by the nodule volume. The spiculation index is calculated from the standard deviation of the radial distance of each surface point on the nodule. The mean radial distance indicates the average size of the nodule and corresponds to the equivalent diameter annotated by radiologists. Calcium content in the nodule is represented by the calcification index, which is calculated using intensity-based thresholding. Sharpness of the nodule margin is represented by 3-D acutance, which is the normalized average gradient computed on the surface points of the nodule. The internal structure of a nodule consists of soft tissue, fat, fluid and partly solid tissue. Fat content, soft tissue content and fluid content are extracted by intensity-based thresholding. Homogeneity of the nodule structure is represented by entropy and cluster tendency calculated from the 3-D gray level co-occurrence matrix (GLCM). The correspondence between the visual features of a nodule annotated by radiologists and the features extracted by computational techniques is described in Table 1.
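Two of the features above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' code: radial distance is measured from the centroid of the surface points (the reference point is not specified here), and the co-occurrence matrix is built for a single voxel offset without symmetrization or grey-level quantization. The names `radial_distance_stats` and `glcm_entropy` are hypothetical.

```python
import math
from collections import Counter


def radial_distance_stats(surface_points):
    """Return (mean, standard deviation) of distances to the centroid.

    The mean stands in for the mean radial distance and the standard
    deviation for the spiculation index. Assumption: the centroid of the
    surface points is used as the reference point.
    """
    n = len(surface_points)
    centroid = tuple(sum(p[i] for p in surface_points) / n for i in range(3))
    distances = [math.dist(p, centroid) for p in surface_points]
    mean = sum(distances) / n
    variance = sum((d - mean) ** 2 for d in distances) / n
    return mean, math.sqrt(variance)


def glcm_entropy(volume, offset=(0, 0, 1)):
    """Entropy of the grey-level co-occurrence distribution for one offset.

    `volume` is a nested list indexed as volume[z][y][x]; `offset` is the
    (dz, dy, dx) displacement between the two voxels of each pair.
    """
    dz, dy, dx = offset
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    pairs = Counter()
    for z in range(depth):
        for y in range(height):
            for x in range(width):
                nz, ny, nx = z + dz, y + dy, x + dx
                if 0 <= nz < depth and 0 <= ny < height and 0 <= nx < width:
                    pairs[(volume[z][y][x], volume[nz][ny][nx])] += 1
    total = sum(pairs.values())
    return -sum((c / total) * math.log2(c / total) for c in pairs.values())


# A round shape (all surface points equidistant) versus a spiky one.
round_points = [(2, 0, 0), (-2, 0, 0), (0, 2, 0), (0, -2, 0), (0, 0, 2), (0, 0, -2)]
spiky_points = [(2, 0, 0), (-2, 0, 0), (0, 2, 0), (0, -2, 0), (0, 0, 5), (0, 0, -5)]
round_mean, round_sd = radial_distance_stats(round_points)
spiky_mean, spiky_sd = radial_distance_stats(spiky_points)

# A homogeneous volume versus one with alternating grey levels.
uniform = [[[7, 7, 7, 7]]]
textured = [[[0, 1, 0, 1]]]
print(round_sd, spiky_sd, glcm_entropy(uniform), glcm_entropy(textured))
```

In this toy data the round shape has zero spread in radial distance while the spiky one does not, and the homogeneous volume has zero entropy while the alternating one does not, which is the behaviour the two descriptors are meant to capture.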
Table 1: Correspondence between diagnostic features and machine-level features of solid nodules

Annotated by radiologists    Machine level feature
Sphericity                   Compactness in 3-D
Spiculation                  SD of radial distance
Equivalent diameter          Mean radial distance
Calcification                Calcification index
Sharpness of margin          Acutance in 3-D
Subtlety                     Contrast from GLCM
Internal structure           Entropy and cluster tendency from GLCM
Texture                      Fat, soft tissue content and homogeneity from GLCM

2.3 Similarity Measure
The machine-level features are extracted from the segmented nodule and stored in a feature database. Similar nodules for a given query nodule are determined by comparing the feature vector of the query nodule with all feature vectors in the feature database. Euclidean distance between feature vectors is used as the similarity measure in feature space. The nodules retrieved for a particular query are ranked by their Euclidean distance from the query.
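The ranking step can be sketched as follows. This is a minimal illustration, assuming the feature database is an in-memory mapping from a nodule identifier to its feature vector; the identifiers, the two-dimensional vectors and the function name `rank_by_distance` are hypothetical, and no feature scaling is applied because none is described here.

```python
import math


def rank_by_distance(query, database, top_k=5):
    """Return the `top_k` (nodule_id, distance) pairs closest to `query`.

    `database` maps a nodule identifier to its feature vector. Distances
    are Euclidean; ties are broken by identifier order.
    """
    scored = sorted(
        (math.dist(query, vector), nodule_id)
        for nodule_id, vector in database.items()
    )
    return [(nodule_id, distance) for distance, nodule_id in scored[:top_k]]


# Toy feature database with two-dimensional feature vectors.
database = {
    "n1": [1.0, 0.0],
    "n2": [0.0, 0.0],
    "n3": [3.0, 4.0],
}
ranking = rank_by_distance([0.0, 0.0], database, top_k=2)
print(ranking)  # [('n2', 0.0), ('n1', 1.0)]
```

A linear scan like this is adequate for a database of a few dozen nodules; a larger database would likely call for an index structure, which is outside the scope of this sketch.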
2.4 Database Preparation
For evaluating the performance of the nodule CBIR system, our database consists of 30 subjects from the public Lung Image Database Consortium (LIDC) database [9], containing 45 solid pulmonary nodules, and 25 subjects from PGIMER, Chandigarh, containing 40 solid pulmonary nodules. The LIDC protocol does not enforce consensus among the radiologists; instead, it allows each radiologist to review the outlines and ratings drawn by the other three radiologists. Radiologists considered nodules with diameters from 3 mm to 30 mm and rated each nodule on nine diagnostic characteristics: texture, subtlety, spiculation, lobulation, sphericity, margin, malignancy and internal structure on a scale of 1-5, and calcification on a scale of 1-6. On the scale of 1 to 5, 5 indicates a high value. We have applied the same diagnostic feature annotation protocol to the data set collected from PGIMER, Chandigarh. The calcification types and internal structure categories of a nodule are given in Table 2 and Table 3.
Table 2: Calcification type

Calcification    Index
Popcorn          1
Laminated        2
Solid            3
Non Central      4
Central          5
Absent           6

Table 3: Internal structure

Structure      Index
Soft tissue    1
Fluid          2
Fat            3
Air            4

3 Simulation and Results
The proposed standalone CBIR system is validated on 45 solid pulmonary nodules from the LIDC data set and 40 solid pulmonary nodules from the PGIMER data set. The performance of the CBIR system is reported in terms of precision, which is the ratio of relevant retrieved nodules to all retrieved nodules. Considering the top 5 retrieved nodules, the average precision achieved is 72.18% for the LIDC data set and 78.29% for the PGIMER data set. For differential diagnosis, radiologists are interested in spiculation, lobulation, boundary sharpness, internal calcification, internal fat content and homogeneity of the nodule. For clarity, the clinical feature indices provided by radiologists in the ground truth are listed for the query and retrieved nodules. The retrieved results for different types of query nodules are shown in Figure 2.
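The precision measure can be made concrete with a short Python sketch. It is illustrative only: it assumes a retrieved nodule counts as relevant when it carries the same category label as the query, which is one plausible reading of relevance, and the labels below are toy values rather than results from either data set. The function names are hypothetical.

```python
def precision_at_k(retrieved_labels, query_label, k=5):
    """Fraction of the top-k retrieved nodules that are relevant.

    Assumption: a retrieved nodule is relevant when its category label
    equals the label of the query nodule.
    """
    top = retrieved_labels[:k]
    if not top:
        return 0.0
    relevant = sum(1 for label in top if label == query_label)
    return relevant / len(top)


def average_precision_over_queries(results, k=5):
    """Mean of precision_at_k over (query_label, retrieved_labels) pairs."""
    scores = [precision_at_k(labels, query, k) for query, labels in results]
    return sum(scores) / len(scores)


# Toy retrieval results for two queries (not taken from the paper).
results = [
    ("spicular", ["spicular", "spicular", "lobular", "spicular", "excavated"]),
    ("lobular", ["lobular", "lobular", "lobular", "lobular", "spicular"]),
]
print(precision_at_k(results[0][1], results[0][0]))
print(average_precision_over_queries(results))
```

With five retrieved items per query, each query's precision is a multiple of 0.2, so the reported averages reflect how often each of the top five results matched the query category across all queries.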
4 Conclusion
A CBIR system is presented that treats the solid pulmonary nodule as the pathology-bearing region, for differential diagnosis and for self-learning by budding radiologists. Considering the top 5 retrieved nodules, the average precision achieved is 72.18% for the LIDC data set and 78.29% for the PGIMER data set. Work is in progress to increase precision by improving the feature extraction and feature selection strategy. Further research on nodule characterization should focus on integrating multiple sources of information, such as patient history and histopathological information, with detailed imaging information from the CT scan.
Figure 2: Retrieved nodules for spicular, well defined, lobular and excavated query nodules. The diagnostic features of query and retrieved nodule are given below the respective images.
5 Acknowledgments
This work has been supported by the Department of Information Technology (DIT), Govt. of India, Grant number 1(3)2009 - ME & TMD. The authors are grateful to Dr. Anup Sadhu, Calcutta Medical College, Kolkata, for providing valuable advice on our research work.
References
[1] Cancer Facts and Figure 2009 by American Cancer Society, http://www.cancer.org
[2] Diederich S., Wormanns S., Semik M., Thomas M., Lenzen M., Roos N., and Heindel W., “Screening for early lung cancer with low-dose spiral CT: prevalence in 817 asymptomatic smokers”, Radiology, vol. 222, no. 3, pp. 773-781, 2002.
[3] Austin J. H. M., Muller N. L., Friedman P. J., Hansell D. M., Naidich D. P., Jardin M. R., Webb W. R. and Zerhouni E. A., “Glossary of terms for CT of lungs; recommendations of the Nomenclature Committee of the Fleischner Society”, Thoracic Radiology, vol. 200, pp. 327-331, 1996.
[4] Li Q., Li F., Shiraishi J., Katsuragawa S., Sone S. and Doi K., “Investigation of new psychophysical measures for evaluation of similar images on thoracic computed tomography for distinction between benign and malignant nodules”, Medical Physics, vol. 30, no. 30, pp. 2584-2593, 2003.
[5] Muller H., Michoux N., Bandond D. and Geissbuhler A., “A review of content-based image retrieval systems in medical applications-clinical benefits and future directions”, Int J Med Informatics, vol. 73, pp. 1-23, 2004.
[6] Kawata Y., Niki N., Ohmatsu H., Kusumoto M., Kakinuma R., Yamada K., Mori K., Nishiyama H., Eguchi E., Kaneko M., and Moriyama N., “Pulmonary nodule classification based on nodule retrieval from 3-D thoracic CT image database”, Medical Image Computing and Computer-Assisted Intervention (MICCAI 2004).
[7] Lam M., Disney T., Raicu D. S., Furst J. and Channin D. S., “BRISC-An open source pulmonary nodule image retrieval framework”, Journal of Digital Imaging, 2007.
[8] Dhara A. K. and Mukhopadhyay S., “Hybrid Preprocessing Method Using Geometry Based Diffusion and Selective Enhancement Filtering for Pulmonary Nodule Detection”, Proc. of SPIE Medical Imaging 2012, Vol. 8315.
[9] Armato S. G. et al., “The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): A Completed Reference Database of Lung Nodules on CT Scans”, Medical Physics, Vol. 38, No. 2, February 2011.