A New Features Extraction Method based on Polynomial Regression ...

2015 International Conference on Industrial Instrumentation and Control (ICIC), College of Engineering Pune, India. May 28-30, 2015

A New Features Extraction Method based on Polynomial Regression for the Assessment of Breast Lesion Contours

Patri Shivakshit
EEE dept., BITS-Pilani, Hyderabad Campus, Hyderabad, India
[email protected]

Spandana Paramkusham
EEE dept., Research Scholar, BITS-Pilani, Hyderabad Campus, Hyderabad, India
[email protected]

Dr. K.M.M. Rao
EEE dept., Adjunct Faculty, Life Senior Member IEEE, BITS-Pilani, Hyderabad Campus, Hyderabad, India

B.V.V.S.N. Prabhakar Rao
EEE dept., Faculty, Assistant Professor, BITS-Pilani, Hyderabad Campus, Hyderabad, India

Abstract-

The shape of breast lesion contours is a prominent sign for determining malignancy in mammograms. A new algorithm for feature extraction is proposed, based on polynomial regression on the signatures of benign and malignant contours. Two features, mean absolute error and correlation coefficient, were extracted for 57 mammograms, of which 32 images were malignant contours and 25 images were benign contours. Three different pattern classifiers (support vector machine with a radial basis function kernel and sigma=0.7, linear discriminant analysis, and Bayes linear classifier) were used for calculation of performance evaluation measures. Our new feature extraction method attained a remarkable recognition accuracy and area under curve (AUC) of above 89% in all three pattern classifier techniques. Among the three classifiers, the Bayes linear classifier gave a good recognition accuracy of 96.29% and an AUC of 0.9833.

Keywords-Benign, Malignant, Signature, Classifier.

I. INTRODUCTION

Breast cancer is the most common cancer in women, and in India it accounts for 25% to 31% of all cancers in women in Indian cities [1]. According to WHO, an estimated 70218 women died in India due to breast cancer in 2012, more than in any other country in the world. Early detection and diagnosis are very important to prevent fatality. Digital mammography is an important tool for detecting breast cancer at an early stage, although some cases require additional diagnosis for decision making. Mammograms combined with computer algorithms help to prevent incorrect conclusions that directly affect the outcome and life span of the patient. They also help radiologists to detect breast cancer more quickly and to avoid unnecessary biopsies. Image processing algorithms help to assess the malignancy of a tumour.

978-1-4799-7165-7/15/$31.00 ©2015 IEEE


Boundaries of masses are prominent signatures of malignancy in breast mammograms. According to the Breast Imaging Reporting and Data System (BIRADS), a mass is a high-intensity region of a mammogram that looks abnormal compared with other regions. Masses are distinguished by their shape (round, oval, lobular, irregular) and margins (circumscribed, microlobulated, obscured, indistinct, spiculated). Work has been done in the past on mass segmentation to determine the spread of spiculation in the breast tissue. Segmentation of a mass and extraction of its boundary play a vital role in extracting quantitative parameters that delineate benign and malignant masses. Automatic segmentation of masses is an open problem, as mammograms are noisy and have low contrast, and lesions overlap with the breast tissue.

Many algorithms have been developed by researchers for automatic segmentation and classification of masses. The mean shift algorithm, fuzzy C-means and active contour models are used in [2] for the detection of masses. In [3], suspected regions were identified based on the iris filter output; grey level, texture, contour-related and morphological features were then extracted and classified using a back propagation neural network. Wavelet packet energy and Tsallis entropy parameterization were used in [4] for feature extraction, and a support vector machine and a multilayer perceptron model were used for classification of mammographic regions. Shape features such as elongatedness, eccentricity, Euler number, maximum radius and minimum radius were used in [5] to distinguish four different mass shapes (round, oval, lobular, irregular) with the C5.0 decision tree algorithm. The fractal dimension of the mass contour, computed from the 2-dimensional (2D) contour or from a 1-dimensional (1D) signature, was used for classifying masses in [6]. Yin-Yin Liao et al. [7] used a best-fit ellipse as a baseline and used the Nakagami parameter and standard deviation as features for classification of benign and malignant tumors. A region-based measure of image edge profile acutance by polygonal approximation, together with shape features such as compactness, Fourier descriptors, central invariant moments and chord-length statistics, was implemented in [8] to distinguish between circumscribed and spiculated tumours.

B. K-Means Clustering

This algorithm classifies objects into K clusters based on their characteristics, where K is a positive integer. We segment the mass region using the k-means algorithm. The inputs are the image pixels, and their features are their grey-level (intensity) values. The algorithm aims at minimizing the sum of distances from each pixel to its cluster centroid; we have chosen the Euclidean distance as the distance measure. Fig. 2b shows the output after k-means clustering.
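As an illustration of the clustering step, the following is a minimal sketch of k-means on grey-level values using NumPy only. The function name `kmeans_intensity`, the quantile-based initialization, the iteration limit and the toy image are assumptions made for this example; the paper does not state the value of K or the initialization it used.

```python
import numpy as np

def kmeans_intensity(image, k=2, n_iter=50):
    """Cluster pixels by grey level; returns (label image, centroids)."""
    pixels = np.asarray(image, dtype=float).reshape(-1)
    # Assumption: start from evenly spaced intensity quantiles.
    centroids = np.quantile(pixels, np.linspace(0.0, 1.0, k))
    for _ in range(n_iter):
        # Assign each pixel to the nearest centroid (1-D Euclidean distance).
        dist = np.abs(pixels[:, None] - centroids[None, :])
        labels = np.argmin(dist, axis=1)
        # Recompute each centroid as the mean intensity of its cluster;
        # an empty cluster keeps its previous centroid.
        updated = np.array([
            pixels[labels == j].mean() if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels.reshape(np.shape(image)), centroids

# Toy image (hypothetical): dark background on the left, bright region on the right.
toy = np.full((4, 4), 10.0)
toy[:, 2:] = 200.0
toy_labels, toy_centroids = kmeans_intensity(toy, k=2)
```

Thresholding the label image on the brightest cluster would give a binary mask for the later morphology steps.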

II. DATASET

The mammogram images used in our experiments were taken from the DDSM database. The database contains 2,500 studies [9]. Each study includes two views of each breast, the craniocaudal view and the mediolateral oblique view, along with patient information. The Digital Database for Screening Mammography (DDSM) images were digitized by four different scanners (HOWTEK-A, HOWTEK-D, LUMISYS, and DBA) with a resolution of 12 or 16 bits per pixel. The database also includes software for accessing the ground truth information and for calculating performance measures for computer aided detection.

III. METHODS

The automatic lesion segmentation and boundary extraction is organized into three main steps: 1) K-means clustering, 2) morphological operations, 3) morphological gradient. A flow chart of the automatic segmentation and boundary extraction is shown in Fig. 1. In this paper we have included feature extraction using polynomial regression and classification. For the details pertaining to automatic mass detection and boundary extraction, we refer to [10].

The opening of image f by structuring element b is

$$f \circ b = (f \ominus b) \oplus b \qquad (1)$$

It is the erosion of f by b followed by a dilation of the result with b. Fig. 2c shows the output after binary morphology operations.

D. Morphological gradient

The combination of dilation, erosion and image subtraction gives the morphological gradient. Dilation thickens regions in an image and erosion shrinks them [11]. The subtraction removes areas of constant intensity and enhances edges. Fig. 2d shows the boundary of the lesion. For an input image f:

Morphological gradient = Dilation(f) - Erosion(f).
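The gradient can be sketched with NumPy alone by taking, for each pixel, the maximum (dilation) and minimum (erosion) over its neighbourhood and subtracting. The 3x3 square structuring element and the edge padding are assumptions for this example; the paper does not specify the structuring element.

```python
import numpy as np

def morphological_gradient(f):
    """Dilation(f) - Erosion(f) with a 3x3 square structuring element (assumed)."""
    f = np.asarray(f, dtype=float)
    h, w = f.shape
    # Replicate border values so every pixel has a full 3x3 neighbourhood.
    padded = np.pad(f, 1, mode="edge")
    # Stack the nine shifted copies of the image, one per neighbourhood offset.
    windows = np.stack([padded[i:i + h, j:j + w]
                        for i in range(3) for j in range(3)])
    dilation = windows.max(axis=0)   # thickens bright regions
    erosion = windows.min(axis=0)    # shrinks bright regions
    return dilation - erosion

# Toy mask (hypothetical): a 3x3 bright square on a dark background.
square = np.zeros((7, 7))
square[2:5, 2:5] = 1.0
edge = morphological_gradient(square)
```

For this toy mask the result is zero in the flat background and at the centre of the square, and non-zero in a band around the square's border.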

A. Automatic mass segmentation and boundary detection


C. Binary Morphology

The opening operation is used to remove small objects and border objects in the image. Opening breaks narrow isthmuses and eliminates thin protrusions [11] without disturbing the overall pixel intensity values and large bright objects. The opening of image f with structuring element b is given by (1).
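A minimal NumPy sketch of binary opening, erosion followed by dilation as in (1), is given here. The 3x3 square structuring element, the helper names and the toy mask are assumptions for illustration only.

```python
import numpy as np

def _neighbourhoods(mask):
    """Nine shifted copies of a binary mask, one per offset of a 3x3 square."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    # Pixels outside the image are treated as background (False).
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    return [padded[i:i + h, j:j + w] for i in range(3) for j in range(3)]

def erode(mask):
    # A pixel survives only if its whole 3x3 neighbourhood is foreground.
    return np.logical_and.reduce(_neighbourhoods(mask))

def dilate(mask):
    # A pixel is set if any pixel in its 3x3 neighbourhood is foreground.
    return np.logical_or.reduce(_neighbourhoods(mask))

def opening(mask):
    """Opening: erosion of the mask followed by dilation of the result."""
    return dilate(erode(mask))

# Toy mask (hypothetical): one 3x3 object plus an isolated noise pixel.
noisy = np.zeros((7, 7), dtype=bool)
noisy[1:4, 1:4] = True
noisy[5, 5] = True
opened = opening(noisy)
```

In this toy case the isolated pixel is removed while the 3x3 object is restored to its original extent.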

[Figure 1 flow chart boxes: K-means clustering; Thresholding; Removal of small objects with binary morphology; Removal of border objects with binary morphology]

[Figure 2 panel: c) Malignant lesion]

E. Signature

A signature is a 1D functional representation of a boundary. It is a plot of the distance from the centroid to the boundary as a function of angle [11]. Fig. 3a and 3b show the borders of a benign mass and a malignant mass. Fig. 3c and 3d show the signatures of these borders.
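The signature computation can be sketched as follows, assuming the boundary is available as a list of (x, y) coordinates. Two simplifications are assumptions of this example: the centroid is taken as the mean of the boundary points, and the angle is measured with `arctan2` from the positive x-axis rather than from the vertical direction used in the paper.

```python
import numpy as np

def boundary_signature(boundary_xy):
    """Return (theta, r): angle and centroid-to-boundary distance, sorted by angle."""
    pts = np.asarray(boundary_xy, dtype=float)
    centroid = pts.mean(axis=0)              # assumption: mean of boundary points
    offsets = pts - centroid
    r = np.hypot(offsets[:, 0], offsets[:, 1])
    theta = np.arctan2(offsets[:, 1], offsets[:, 0])   # values in [-pi, pi]
    order = np.argsort(theta)
    return theta[order], r[order]

# Toy boundary (hypothetical): a circle of radius 5 centred at (3, 4).
angles = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
circle = np.column_stack([3.0 + 5.0 * np.cos(angles), 4.0 + 5.0 * np.sin(angles)])
theta_c, r_c = boundary_signature(circle)
```

A circular boundary gives a flat signature; irregular or spiculated boundaries give signatures with rapid oscillations, which is what the polynomial fit is later meant to expose.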


[Figure 1 flow chart boxes: Morphological gradient; Border extraction; 1D signature]

[Figure 2 panel: d) Boundary]

Figure 1. Flow chart for automatic segmentation and boundary extraction


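The correlation coefficient of data sets A and B is given by

$$cc = \frac{\sum_{i}(A_i - \bar{A})(B_i - \bar{B})}{\sqrt{\sum_{i}(A_i - \bar{A})^2 \, \sum_{i}(B_i - \bar{B})^2}} \qquad (5)$$

where $\bar{A}$ and $\bar{B}$ are the means of A and B.

$$\text{mean absolute error} = \frac{1}{m}\sum_{i=1}^{m} \left| H(\theta^{(i)}) - r(\theta^{(i)}) \right| \qquad (6)$$

Equations (5) and (6) give the two features: the correlation coefficient and the mean absolute error between the signature r(θ) (green curve in Fig. 4) and its corresponding polynomial fit H(θ) (red curve in Fig. 4).

Figure 3. a) Benign b) Malignant c) Signature of benign d) Signature of malignant

F. Polynomial fit by gradient descent algorithm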


The polynomial hypothesis for the signature, initialized with random coefficients, is given in (2).

$$H(\theta) = a_0 + \sum_{k=1}^{n} a_k \theta^k \qquad (2)$$

where θ is the angle measured from the vertical direction in the anti-clockwise direction and a_0, a_1, ..., a_n are the coefficients of the n-th order polynomial hypothesis. The cost function J shown in (3) measures how different the hypothesis is from the actual signature of the contour.

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m} \left( H(\theta^{(i)}) - r(\theta^{(i)}) \right)^2 \qquad (3)$$

Figure 4. a) Malignant signature and its 15-degree polynomial hypothesis b) Benign signature and its 15-degree polynomial hypothesis

The coefficients are updated by gradient descent:

$$a_k := a_k - \alpha \frac{\partial}{\partial a_k} J(\theta), \qquad k = 0, 1, \ldots, n \qquad (4)$$

In (4), α is the learning rate and n is the degree of the polynomial hypothesis. If α is too low, J(θ) takes a huge number of iterations to converge. If α is too high, J(θ) might not converge at all, so an intermediate value must be chosen. The update is repeated until J(θ) converges. We observed that the polynomial hypothesis H(θ) fits the benign signature better than the malignant signature, as shown in Fig. 4a and Fig. 4b. The degree of the polynomial is selected empirically such that the hypothesis fits the signatures of benign tumors closely and underfits the signatures of malignant tumors. The mean absolute error and the correlation between the polynomial hypothesis H(θ) and the signature r(θ) are extracted to delineate benign and malignant boundaries.
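A minimal sketch of the gradient descent fit is shown here, using a squared-error cost with a 1/(2m) factor (a different constant factor would only rescale the learning rate). Several choices are assumptions of this example rather than details from the paper: the angles are divided by π so that high powers stay bounded, the learning rate is 0.1, the iteration count is fixed instead of testing for convergence, and the signature is synthetic.

```python
import numpy as np

def design_matrix(theta, degree):
    # Assumption: rescale angles from [-pi, pi] to [-1, 1] for numerical stability.
    x = np.asarray(theta, dtype=float) / np.pi
    # Columns are x**0, x**1, ..., x**degree.
    return np.vander(x, degree + 1, increasing=True)

def fit_polynomial_gd(theta, r, degree, alpha=0.1, n_iter=2000, seed=0):
    """Batch gradient descent on J = 1/(2m) * sum((H - r)**2)."""
    X = design_matrix(theta, degree)
    r = np.asarray(r, dtype=float)
    m = r.size
    rng = np.random.default_rng(seed)
    a = rng.normal(scale=0.01, size=degree + 1)   # random initial coefficients
    costs = np.empty(n_iter)
    for it in range(n_iter):
        residual = X @ a - r                      # H(theta_i) - r(theta_i)
        costs[it] = residual @ residual / (2 * m)
        a -= alpha * (X.T @ residual) / m         # simultaneous update of all a_k
    return a, costs

def hypothesis(a, theta):
    """Evaluate the fitted polynomial at the given angles."""
    return design_matrix(theta, len(a) - 1) @ a

# Synthetic signature (hypothetical), sampled on [-pi, pi].
theta_s = np.linspace(-np.pi, np.pi, 200)
r_s = 10.0 + 2.0 * np.cos(3.0 * theta_s)
coeffs, costs = fit_polynomial_gd(theta_s, r_s, degree=15)
fitted = hypothesis(coeffs, theta_s)
```

Plotting `costs` against the iteration index is a simple way to check whether the chosen learning rate is too low (slow decrease) or too high (divergence).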


In the cost function (3), r(θ) is the actual radial distance of the boundary from the centroid at an angle θ, and m is the number of samples. θ^(i) is a sample angle from the signature that is used for training the hypothesis; θ varies from -π to π. All the coefficients a_0, a_1, a_2, ..., a_n are updated iteratively to optimize the hypothesis so that the cost function reaches its minimum value.

G. Extraction of features

Figure 5. Flow chart for the extraction of features: extraction of signature coordinates r(θ) and θ; fit polynomial hypothesis H(θ) of different degrees; calculate mean absolute error and correlation between H(θ) and r(θ).

In this framework we have applied a polynomial fit with the gradient descent algorithm, using polynomials of different degrees to fit the signatures of lesions. The correlation coefficient and mean absolute error were taken as features for classification. The flow chart for the extraction of features is shown in Fig. 5.
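The two features can be sketched as below. For brevity, this example obtains the polynomial fit with NumPy's least-squares `polyfit` instead of gradient descent; that substitution, the degree of 7, the rescaling of angles by π and both synthetic signatures are assumptions for illustration, not data from the paper.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def signature_features(r, h):
    """Mean absolute error and correlation coefficient between signature r and fit h."""
    r = np.asarray(r, dtype=float)
    h = np.asarray(h, dtype=float)
    mae = np.mean(np.abs(h - r))
    cc = np.corrcoef(h, r)[0, 1]
    return mae, cc

def fit_and_extract(theta, r, degree):
    x = np.asarray(theta, dtype=float) / np.pi    # assumption: rescaled angles
    coeffs = P.polyfit(x, r, degree)              # least-squares stand-in for the fit
    return signature_features(r, P.polyval(x, coeffs))

theta_f = np.linspace(-np.pi, np.pi, 360)
# Hypothetical signatures: a smooth one and one with fast oscillations.
smooth = 20.0 + np.sin(theta_f)
jagged = 20.0 + np.sin(theta_f) + 3.0 * np.sin(9.0 * theta_f)
mae_smooth, cc_smooth = fit_and_extract(theta_f, smooth, degree=7)
mae_jagged, cc_jagged = fit_and_extract(theta_f, jagged, degree=7)
```

The smooth signature is fitted closely, giving a small error and a correlation near 1, while the oscillating signature is underfitted, giving a larger error and a lower correlation. This is the contrast the two features are meant to capture.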

H. Classification

We calculated the mean absolute error and correlation coefficient for different polynomial degrees (n = 7, 10, 15, 25, 60). Performance evaluation measures were calculated using three different classifiers: 1) an SVM classifier using a radial basis function kernel with sigma = 0.7, 2) linear discriminant analysis, 3) a Bayes linear classifier.

IV. RESULTS

Three pattern classification techniques are applied to obtain different evaluation parameters on 57 mammograms from the DDSM database, of which 32 images were malignant contours and 25 were benign contours. These were randomly selected and separated into two sets, 29 samples for training and 28 samples for testing. The performance of our mass detection algorithm is evaluated by applying a hold-out methodology. The proposed new feature extraction method for delineating malignant and benign masses is evaluated by accuracy (A), sensitivity (Se), specificity (Sp), confusion matrix, positive predictive value (PPV), negative predictive value (NPV) and AUC (area under the ROC curve).

$$\text{Sensitivity} = \frac{TP}{TP+FN}, \qquad \text{Specificity} = \frac{TN}{TN+FP}$$

TABLE III
Evaluation Parameters With Bayes Linear Classifier

Degree   A (%)    Se (%)   Sp (%)   PPV (%)   NPV (%)   AUC
7        92.59    100      84.16    87.5      100       0.9505
10       88.88    85.71    92.31    92.307    85.714    0.9560
15       96.29    91.67    100      100       93.75     0.9833
25       85.18    82.35    90       93.333    75        0.9353
60       51.85    100      7.14     50        100       0.6703

Table I, Table II and Table III give the performance evaluation parameters of the three different classifiers. Among the tested degrees, the polynomial hypothesis of degree 15 gives a good recognition accuracy, greater than 90% for all three classifiers. The best recognition accuracy (96.29%) is attained by the Bayes classifier with a polynomial hypothesis of degree 15. We have observed that as the degree of the polynomial increases beyond 15, the recognition accuracy decreases. Fig. 7 illustrates the performance evaluation parameters of the three classifiers, and Fig. 8 gives the ROC analysis of the Bayes classifier and the SVM classifier.

$$\text{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN}$$

$$\text{Positive Predictive Value} = \frac{TP}{TP+FP}$$

$$\text{Negative Predictive Value} = \frac{TN}{FN+TN}$$
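The evaluation measures can be computed directly from confusion-matrix counts, as in the sketch below. The function name and the counts are hypothetical: the paper does not list its confusion matrices, and the counts here were chosen only because they give percentages in line with those reported for the Bayes classifier at degree 15 (Se 91.67, Sp 100, PPV 100, NPV 93.75, A 96.29, up to rounding). The sketch does not guard against empty classes, which would cause a division by zero.

```python
def evaluation_measures(tp, fn, tn, fp):
    """Evaluation measures, as fractions, from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Hypothetical counts for illustration (not taken from the paper).
measures = evaluation_measures(tp=11, fn=1, tn=15, fp=0)
percentages = {name: 100.0 * value for name, value in measures.items()}
```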

[Figure 7: performance evaluation parameters of the three classifiers (legend: SVM, LDA, Bayes)]
