2015 International Conference on Industrial Instrumentation and Control (ICIC), College of Engineering Pune, India. May 28-30, 2015
A New Features Extraction Method based on Polynomial Regression for the Assessment of Breast Lesion Contours
Patri Shivakshit
EEE dept., Bits-Pilani, Hyderabad Campus, Hyderabad, India
[email protected]
Spandana Paramkusham
EEE dept., Research Scholar, Bits-Pilani, Hyderabad Campus, Hyderabad, India
[email protected]
Dr. K.M.M. Rao
EEE dept., Adjunct Faculty, Life Senior Member IEEE, Bits-Pilani, Hyderabad Campus, Hyderabad, India
B.V.V.S.N. Prabhakar Rao
EEE dept., Faculty, Assistant Professor, Bits-Pilani, Hyderabad Campus, Hyderabad, India
Abstract-
The shapes of breast lesion contours are prominent signs for determining malignancy in mammograms. A new algorithm for feature extraction is proposed, based on polynomial regression on the signatures of benign and malignant contours. Two features, mean absolute error and correlation coefficient, were extracted for 57 mammograms, of which 32 images were malignant contours and 25 images were benign contours. Three different pattern classifiers, a support vector machine with a radial basis function kernel and sigma = 0.7, linear discriminant analysis, and a Bayes linear classifier, were used for calculation of performance evaluation measures. Our new feature extraction method attained a remarkable recognition accuracy and area under the curve (AUC) of above 89% in all three pattern classifier techniques. Among the three classifiers, the Bayes linear classifier gave a good recognition accuracy of 96.29% and an AUC of 0.9833.
Keywords-Benign, Malignant, Signature, Classifier.
I. INTRODUCTION
Breast cancer is the most common cancer among women in India, where it accounts for 25% to 31% of all cancers in women in Indian cities [1]. According to the WHO, in 2012 an estimated 70218 women died in India due to breast cancer, more than in any other country in the world. Early detection and diagnosis are very important to prevent fatality. Digital mammography is an important tool for detecting breast cancer in its early stages, although some cases require additional diagnosis to reach a decision. Combining mammograms with computer algorithms helps to prevent incorrect conclusions that directly affect the results and the life span of the patient. It also helps radiologists to detect breast cancer more easily and in less time, and to avoid unnecessary biopsies. Image processing algorithms help to assess the malignancy of a tumour.
978-1-4799-7165-7/15/$31.00 ©2015 IEEE
The boundaries of masses are prominent signatures of malignancy in breast mammograms. According to the Breast Imaging Reporting and Data System (BIRADS), a mass is an area of the mammogram that has high intensity and looks abnormal compared with other regions of the mammogram. Masses are distinguished by their shape (round, oval, lobular, irregular) and margins (circumscribed, microlobulated, obscured, indistinct, spiculated). Past work on mass segmentation has aimed to determine the spread of spiculation in the breast tissue. Segmentation of the mass and extraction of its boundary play a vital role in extracting quantitative parameters that delineate benign and malignant masses. Automatic segmentation of masses is an open problem because mammograms have low contrast and are noisy, and the lesions overlap with the breast tissue in the mammogram. Many algorithms have been developed by researchers for automatic segmentation and classification of masses. The mean shift algorithm, fuzzy C-means and active contour models are used in [2] for the detection of masses. In [3], suspected regions were identified based on the iris filter output; grey-level, texture, contour-related and morphological features were then extracted and classified using a back-propagation neural network. Wavelet packet energy and Tsallis entropy parameterization were used in [4] for feature extraction, and a support vector machine and a multilayer perceptron model were used for classification of mammographic regions. Shape features such as elongatedness, eccentricity, Euler number, maximum radius and minimum radius were used in [5] to distinguish four different mass shapes (round, oval, lobular, irregular) with the C5.0 decision tree algorithm. The fractal dimension of the mass contour, computed from the 2-dimensional (2D) contour or from a 1-dimensional (1D) signature, was derived for classifying masses in [6]. Yin-Yin Liao et al. [7] used a best-fit ellipse as a baseline and used the Nakagami parameter and standard deviation as features for classification of benign and malignant tumors. A region-based measure of image edge-profile acutance obtained by polygonal approximation, together with shape features such as compactness, Fourier descriptors, central invariant moments and chord-length statistics, was implemented in [8] to distinguish between circumscribed and spiculated tumours.
B. K-Means Clustering
This algorithm classifies objects into K clusters based on their characteristics, where K is a positive integer. We segment the mass region using the k-means algorithm. The inputs are the image pixels, and their features are their grey-level (intensity) values. The algorithm aims to minimize the sum of the distances from each pixel to its cluster centroid; we chose the Euclidean distance as the distance measure. Fig. 2b shows the output after k-means clustering.
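As an illustration of the clustering step described above, the following is a minimal NumPy sketch of k-means on grey levels, not the authors' implementation. The function name `kmeans_intensity`, the random initialisation from distinct grey levels, the default `k=2` and the iteration limit are all assumptions made for the example.

```python
import numpy as np

def kmeans_intensity(image, k=2, n_iter=100, seed=0):
    """Cluster pixels by grey level using k-means with Euclidean distance.

    Needs at least k distinct grey levels. Returns (label image, centroids).
    """
    pixels = np.asarray(image, dtype=float).reshape(-1)
    rng = np.random.default_rng(seed)
    # Assumed initialisation: k distinct grey levels drawn at random.
    centroids = rng.choice(np.unique(pixels), size=k, replace=False)
    for _ in range(n_iter):
        # Assignment step: nearest centroid (1-D Euclidean distance).
        labels = np.argmin(np.abs(pixels[:, None] - centroids[None, :]), axis=1)
        # Update step: mean grey level of each non-empty cluster.
        updated = np.array([pixels[labels == j].mean() if np.any(labels == j)
                            else centroids[j] for j in range(k)])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    # Final assignment against the returned centroids.
    labels = np.argmin(np.abs(pixels[:, None] - centroids[None, :]), axis=1)
    return labels.reshape(np.shape(image)), centroids
```

In a sketch like this, a candidate mass region could be taken as the cluster with the highest centroid, since masses are described as high-intensity regions; that selection rule is also an assumption.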
II. DATASET
The mammogram images used in our experiments were taken from the Digital Database for Screening Mammography (DDSM). The database contains 2,500 studies [9]. Each study includes two views of the mammogram, i.e., the craniocaudal view and the mediolateral oblique view, along with patient information. The DDSM images were digitized by four different scanners (HOWTEK-A, HOWTEK-D, LUMISYS, and DBA) at 12 or 16 bits per pixel. The database also includes software for accessing the ground truth information and for calculating performance measures for computer-aided detection.
III. METHODS
The automatic lesion segmentation and boundary extraction is organized into three main steps: 1) K-means clustering, 2) morphological operations, and 3) morphological gradient. A flow chart of the automatic segmentation and boundary extraction is shown in Fig. 1. In this paper we have included feature extraction using polynomial regression and classification. For the details pertaining to automatic mass detection and boundary extraction, we refer to [10].
The opening of image $f$ by structuring element $b$ is
$$f \circ b = (f \ominus b) \oplus b \qquad (1)$$
That is, it is the erosion of $f$ by $b$ followed by a dilation of the result by $b$. Fig. 2c shows the output after the binary morphology operations.
D. Morphological gradient
The morphological gradient is a combination of dilation, erosion and image subtraction. Dilation thickens regions in an image and erosion shrinks them [11]. The subtraction tends to remove areas of constant intensity, so the edges are enhanced. For an input image $f$,
Morphological gradient = Dilation($f$) - Erosion($f$).
Fig. 2d shows the boundary of the lesion.
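The opening in (1) and the morphological gradient can be sketched for binary masks in plain NumPy as follows. This is only an illustrative sketch: the square structuring element, its default size of 3, the zero padding at the image border and all function names are assumptions, not details taken from the paper.

```python
import numpy as np

def _neighbourhood_stack(mask, size=3):
    """Stack every size x size shift of a binary mask (zero padded at the border)."""
    pad = size // 2
    padded = np.pad(np.asarray(mask, dtype=bool), pad,
                    mode="constant", constant_values=False)
    h, w = np.shape(mask)
    return np.stack([padded[i:i + h, j:j + w]
                     for i in range(size) for j in range(size)])

def erode(mask, size=3):
    # A pixel survives only if its whole neighbourhood is foreground.
    return _neighbourhood_stack(mask, size).all(axis=0)

def dilate(mask, size=3):
    # A pixel is set if any pixel in its neighbourhood is foreground.
    return _neighbourhood_stack(mask, size).any(axis=0)

def opening(mask, size=3):
    # Equation (1): erosion followed by dilation with the same element.
    return dilate(erode(mask, size), size)

def morphological_gradient(mask, size=3):
    # Dilation minus erosion; for binary masks this is a set difference.
    return dilate(mask, size) & ~erode(mask, size)
```

With a 3 x 3 element the gradient is a band about two pixels wide around the object, which would typically be thinned or traced to obtain a one-pixel boundary.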
A. Automatic mass segmentation and boundary detection
C. Binary Morphology
The opening operation is used to remove small objects and border objects from the image. Opening breaks narrow isthmuses and eliminates thin protrusions [11] without disturbing the overall pixel intensity values or large bright objects. The opening of image $f$ with structuring element $b$ is given by (1).
[Figure 1 flow chart, first stages: Image → K-means clustering → Thresholding → Removal of small objects with binary morphology → Removal of border objects with binary morphology]
[Figure panel label: c) Malignant lesion]
[Figure 1 flow chart, remaining stages: Morphological gradient → Border extraction → 1D signature]
[Figure panel label: d) Boundary]
Figure 1. Flow chart for automatic segmentation and boundary extraction
E. Signature
A signature is a 1D functional representation of a boundary. It is a plot of the distance from the centroid to the boundary as a function of angle [11]. Fig. 3a and Fig. 3b show the borders of a benign mass and a malignant mass. Fig. 3c and Fig. 3d show the signatures of these borders.
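The signature defined in Section E (centroid-to-boundary distance as a function of angle) could be computed along the lines of the sketch below. The function name `signature`, the use of the mean of the boundary points as the centroid, and the `arctan2` angle convention are assumptions for illustration; the paper measures the angle from the vertical direction, which differs from this convention only by a fixed offset and orientation.

```python
import numpy as np

def signature(boundary_rc):
    """Return (theta, r) for boundary points given as (row, col) pairs.

    theta lies in [-pi, pi] and the samples are sorted by increasing theta.
    """
    pts = np.asarray(boundary_rc, dtype=float)
    # Assumption: centroid estimated as the mean of the boundary points.
    centroid = pts.mean(axis=0)
    d_row = pts[:, 0] - centroid[0]
    d_col = pts[:, 1] - centroid[1]
    r = np.hypot(d_row, d_col)          # radial distance to the centroid
    theta = np.arctan2(d_row, d_col)    # angle of each boundary point
    order = np.argsort(theta)
    return theta[order], r[order]
```

For a perfectly circular boundary the resulting signature is constant, while an irregular or spiculated boundary yields a signature with rapid variations.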
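Figure 3. a) Benign b) Malignant c) Signature of benign d) Signature of malignant
The correlation coefficient of data sets $A$ and $B$ is given by
$$cc = \frac{\sum_{i}(A_i - \bar{A})(B_i - \bar{B})}{\sqrt{\sum_{i}(A_i - \bar{A})^2 \, \sum_{i}(B_i - \bar{B})^2}} \qquad (5)$$
where $\bar{A}$ and $\bar{B}$ are the means of $A$ and $B$.
$$\text{mean absolute error} = \frac{1}{m}\sum_{i=1}^{m}\left|H(\theta^{(i)}) - r(\theta^{(i)})\right| \qquad (6)$$
Equations (5) and (6) give the two features: the correlation coefficient and the mean absolute error between the signature $r(\theta)$ (green curve in Fig. 4) and its corresponding polynomial fit $H(\theta)$ (red curve in Fig. 4).
F. Polynomial fit by gradient descent algorithm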
The polynomial hypothesis, with randomly initialised coefficients, for the signature is given in (2).
$$H(\theta) = a_0 + \sum_{k=1}^{n} a_k \theta^{k} \qquad (2)$$
where $\theta$ is the angle measured from the vertical direction in the anti-clockwise direction and $a_0, a_1, \ldots, a_n$ are the coefficients of the $n$-th order polynomial hypothesis. The cost function $J$ shown in (3) measures how different the hypothesis is from the actual signature of the contour.
$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(H(\theta^{(i)}) - r(\theta^{(i)})\right)^2 \qquad (3)$$
In (3), $r(\theta)$ is the actual radial distance of the boundary from the centroid at an angle $\theta$ and $m$ is the number of samples. $\theta^{(i)}$ is a sample angle from the signature which is used for training the hypothesis. $\theta$ varies from $-\pi$ to $\pi$. All the coefficients $a_0, a_1, a_2, a_3, \ldots, a_n$ are updated repeatedly to optimize the hypothesis so that the cost function reaches its minimum value:
$$a_k := a_k - \alpha \frac{\partial}{\partial a_k} J(\theta), \qquad k = 0, 1, \ldots, n \qquad (4)$$
where $\alpha$ is known as the learning rate and $n$ is the degree of the polynomial hypothesis. If $\alpha$ is too low, a huge number of iterations is needed for $J(\theta)$ to converge. If $\alpha$ is too high, $J(\theta)$ might not converge at all; therefore a moderate value must be chosen. This procedure is continued until $J(\theta)$ converges. We observed that the polynomial hypothesis $H(\theta)$ fits the benign signature better than the malignant signature, as shown in Fig. 4a and Fig. 4b. The degree of the polynomial is selected empirically such that the polynomial hypothesis closely fits the signatures of benign tumors and underfits the signatures of malignant tumors. The mean absolute error and the correlation between the polynomial hypothesis $H(\theta)$ and the signature $r(\theta)$ are extracted to delineate benign and malignant boundaries.
Figure 4. a) Malignant signature and its 15-degree polynomial hypothesis b) Benign signature and its 15-degree polynomial hypothesis
G. Extraction of features
[Figure 5 flow chart: Extraction of signature coordinates $r(\theta)$ and $\theta$ → Fit polynomial hypothesis $H(\theta)$ of different degrees → Calculate mean absolute error and correlation between $H(\theta)$ and $r(\theta)$]
Figure 5. Flow chart for the extraction of features
In this framework we applied a polynomial fit with the gradient descent algorithm, using polynomials of different degrees to fit the signatures of the lesions. The correlation coefficient and the mean absolute error were taken as features for classification. The flow chart for the extraction of features is shown in Fig. 5.
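The fitting and feature extraction steps can be sketched as follows. This is an illustrative sketch rather than the authors' code: it assumes the angle is rescaled to $\theta/\pi$ so that gradient descent stays numerically stable, starts from zero coefficients instead of random ones to keep the result deterministic, and uses a cost of the form mean squared residual divided by two. The function names, the learning rate and the iteration count are likewise assumptions.

```python
import numpy as np

def fit_polynomial_gd(theta, r, degree, alpha=0.3, n_iter=20000):
    """Fit a polynomial hypothesis to a signature by batch gradient descent.

    Returns (coefficients, fitted values). Coefficients refer to x = theta / pi.
    """
    x = np.asarray(theta, dtype=float) / np.pi    # assumed rescaling to [-1, 1]
    r = np.asarray(r, dtype=float)
    m = x.size
    X = np.vander(x, degree + 1, increasing=True)  # columns x^0 ... x^degree
    a = np.zeros(degree + 1)                       # assumed zero initialisation
    for _ in range(n_iter):
        residual = X @ a - r                       # H - r at every sample
        a -= alpha * (X.T @ residual) / m          # gradient step on every a_k
    return a, X @ a

def contour_features(r, fitted):
    """Mean absolute error and Pearson correlation between signature and fit."""
    r = np.asarray(r, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    mae = float(np.mean(np.abs(fitted - r)))
    cc = float(np.corrcoef(fitted, r)[0, 1])
    return mae, cc
```

Plain gradient descent on monomial features converges slowly for high degrees such as 15, so a sketch like this may need many more iterations there; comparing against a closed-form least-squares fit (for example `np.polyfit`) is one way to check convergence.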
H. Classification
We calculated the mean absolute error and the correlation coefficient for different polynomial degrees (n = 7, 10, 15, 25, 60). Performance evaluation measures were calculated using three different classifiers: 1) an SVM classifier using a radial basis function kernel with sigma = 0.7, 2) linear discriminant analysis, and 3) a Bayes linear classifier.
IV. RESULTS
Three pattern classification techniques are applied to obtain different evaluation parameters on 57 mammograms from the DDSM database, of which 32 images were malignant contours and 25 were benign contours. These were randomly selected and separated into two sets, 29 samples for training and 28 samples for testing. The performance of our mass detection algorithm is evaluated by applying a hold-out methodology. The proposed new feature extraction method for delineating malignant and benign masses is evaluated by accuracy (A), sensitivity (Se), specificity (Sp), confusion matrix, positive predictive value (PPV), negative predictive value (NPV) and AUC (area under the ROC curve). With TP, TN, FP and FN denoting the numbers of true positives, true negatives, false positives and false negatives,
$$\text{Sensitivity} = \frac{TP}{TP+FN}, \qquad \text{Specificity} = \frac{TN}{TN+FP}$$
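The evaluation measures listed above follow directly from the confusion-matrix counts. The short sketch below shows one way to compute them; the function name and the example counts are hypothetical and are not results from this paper.

```python
def evaluation_measures(tp, tn, fp, fn):
    """Compute standard measures from confusion-matrix counts.

    Assumes every denominator is non-zero (no empty class or prediction group).
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "ppv": tp / (tp + fp),   # positive predictive value
        "npv": tn / (tn + fn),   # negative predictive value
    }

# Hypothetical counts, for illustration only.
measures = evaluation_measures(tp=9, tn=8, fp=2, fn=1)
```

Multiplying each value by 100 gives percentages in the form used in the result tables.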
TABLE III
Evaluation Parameters With Bayes Linear Classifier
D     A       Se      Sp      PPV      NPV      AUC
7     92.59   100     84.16   87.5     100      0.9505
10    88.88   85.71   92.31   92.307   85.714   0.9560
15    96.29   91.67   100     100      93.75    0.9833
25    85.18   82.35   90      93.333   75       0.9353
60    51.85   100     7.14    50       100      0.6703
Table I, Table II and Table III give the performance evaluation parameters of the three different classifiers. Among the degrees tested, the polynomial hypothesis of degree 15 gives a good recognition accuracy, greater than 90%, with all three classifiers. The best recognition accuracy (96.42%) is attained by the Bayes classifier with the polynomial hypothesis of degree 15. We have observed that if the degree of the polynomial increases, the recognition accuracy decreases. Fig. 7 illustrates the performance evaluation parameters of the three different classifiers and Fig. 8 gives the ROC analysis of the Bayes classifier and the SVM classifier.
$$\text{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN}, \qquad \text{Positive Predictive Value} = \frac{TP}{TP+FP}, \qquad \text{Negative Predictive Value} = \frac{TN}{FN+TN}$$
[Figure: plot comparing the SVM, LDA and Bayes classifiers; vertical axis from 86 to 102]