Cost Optimization using Normal Linear Regression

International Conference on Electronics, Communication and Aerospace Technology ICECA 2017

Cost Optimization using Normal Linear Regression Method for Breast Cancer Type I Skin

Chandrasegar Thirumalai, IEEE Member
School of Information Technology and Engineering, VIT University, Vellore, India. [email protected]

Rashad Manzoor
School of Information Technology and Engineering, VIT University, Vellore, India. [email protected]

Abstract—There are six main categories of breast cancer. In this paper, we take the Type 1 carcinoma cancer to support decision making. For this, a novel machine learning based cost optimization is applied to make an efficient decision from the samples. Moreover, we apply our methodology to real datasets to predict cancer with appropriate parameters using Pearson correlation. This work can be used on lightweight devices such as smartphones or tablets to decide more precisely using the primary factors.

Keywords: Pearson Correlation Coefficient, Cost Optimization, Linear Regression.

I. INTRODUCTION

Although the chance of breast cancer in young women is low and few of them die from it, everyone should be aware of breast cancer prevention and its impact [36]. To be on the safe side, once a woman reaches the age of 25, she should get checked with a mammogram test once a year [7], [26], [28]. Breast cancer can be predicted or decided with biomedical applications [8] by conducting various tests and applying early prediction models [37], [41], [45], [48]. Such prediction models include heuristic fuzzy models [10], [13], [18], the Analytical Hierarchy Process (AHP) [20], Genetic Programming (GP) [25], machine learning based classification [46], [49], and agent-based models [23]. Breast cancer detection [17], [30], [39], [43] can be supported by single-attribute [2], [4], [6] and multi-attribute [1], [3], [5], [9], [14], [47] decision models, which help the doctor recommend appropriate treatment or surgery [21], [32]. In general, before prediction or decision sampling, data analysis methods such as box plots and control charts [50] are applied to find the optimized result. Nowadays, breast cancer prediction and detection models are supported by secure cloud models [16], [33], [44]. Medical images [11], [15], [19], [34] are very large and are handled by big data models.

Initially, the patient undergoes various breast tests conducted by physicians on the recommendation of a doctor. The patient's test results, for example mammogram results, are kept confidential. Here the doctor resides at a remote location and relies on the pharmacist to send the patient's detailed report. In this scenario, the sensitive and decision-making attributes of breast cancer are clustered with decisional attributes using AHP, intuitionistic fuzzy, and machine learning based prediction models. These sensitive attributes are transferred securely over common media through secured [12], [27], [29], authenticated [22], [24], [31], and multi-user support [35], [38], [40], [42] methods. Some earlier techniques evaluate the choices based on their relationship to the attributes, for example the Spearman method. Heuristic prediction and machine learning models play a major role in making an effective group decision from the varied histories of the patients.

Decision making is a cognitive process for reaching a decision on an important matter. The result of decision making should be the final choice, i.e. the result should be exact or the most suitable value. Breast cancer is the most common type of cancer diagnosed in women, and statistics indicate that even children can be affected. Newly diagnosed breast cancer patients face difficulty in decision making because of the complexity of the data obtained from different types of tests such as MRI (magnetic resonance imaging), mammogram, and breast exam. Various procedures are adopted to stage breast cancer. The usual methods include the blood test (blood count), mammography, breast MRI, bone scan, CT scan (computerized tomography), and PET scan (positron emission tomography). The breast cancer stages range from 0 to IV. At present, many decision-making systems do not meet the quality required of an efficient decision-making system. Because the data are sensitive, different methodologies are needed to improve the quality of decision making.

Here, we have taken the real data set of breast cancer. The features of the data set include I0 – impedivity (ohm) at zero frequency; PA500 – phase angle at 500 kHz; HFS – high-frequency slope of phase angle; DA – impedance distance between spectral ends; AREA – area under spectrum; A/DA – area normalized by DA; MAX IP – maximum of the spectrum; DR – distance between I0 and the real part of the maximum frequency point; P – length of the spectral curve. There are two types of tumor, namely benign and malignant. A benign tumor is a non-cancerous condition that is very common, whereas a malignant tumor is a cancerous condition that starts in the cells of the breast tissue and may grow into surrounding tissues. There are six categories of breast cancer: Carcinoma – develops from the epithelial cells of the lining. Fibroadenoma – a benign tumor whose common symptom is a lump. Mastopathy – caused by hormonal imbalance in the breast. Glandular – affects the breast glands.

978-1-5090-5686-6/17/$31.00 ©2017 IEEE



Connective – affects the connective tissue, which provides support and shape. Adipose – affects the adipose tissue, which is made up of fat, and is one of the most dangerous.

TABLE I. CARCINOMA CANCER DATASET.

No.   I0    PA500   HFS    DA    Area    A/DA   Max IP   DR    P
1     524   0.18    0.03   228   6843    29.9   60.2     220   556
2     330   0.22    0.26   121   3163    26.1   69.7     99    400
3     551   0.23    0.06   264   11888   44.8   77.7     253   656
4     380   0.24    0.28   137   5402    39.2   88.7     105   493
5     362   0.20    0.24   124   3290    26.3   69.3     103   424
6     389   0.15    0.09   118   2475    20.8   49.7     107   429
7     290   0.14    0.05   74    1189    15.9   35.7     65    330
8     275   0.15    0.18   91    1756    19.1   39.3     82    331
9     470   0.21    0.22   184   8185    44.3   84.4     164   603
10    423   0.21    0.26   172   6108    35.4   79.0     153   558
11    410   0.31    0.29   255   10622   41.5   67.5     246   508
12    500   0.22    0.05   219   9819    44.7   76.8     207   602
13    438   0.21    0.06   120   4879    40.3   80.7     89    525
14    366   0.28    0.25   172   7064    40.8   75.6     155   471
15    485   0.23    0.13   253   8135    32.0   64.8     245   541
16    390   0.35    0.20   245   10055   40.9   70.3     236   477
17    269   0.20    0.03   80    1963    24.4   44.7     66    329
18    300   0.19    0.16   97    3039    31.3   51.3     82    387
19    325   0.22    0.28   229   5705    24.8   35.6     227   462
20    294   0.20    0.46   194   5541    28.4   36.7     191   445
21    500   0.19    0.19   144   3055    21.1   96.5     107   542

II. EXISTING SYSTEM

A. Linear Regression
1. Take two data sets with n examples each, denoted $x_i$ and $y_i$.
2. Find the slope m using the equation
   $$m = \frac{\left(\sum_i x_i y_i\right) - n\bar{x}\bar{y}}{\left(\sum_i x_i^2\right) - n\bar{x}^2}.$$
3. Then find the intercept b using the equation $b = \bar{y} - m\bar{x}$.
4. Finally, substitute the values of b and m into the linear regression equation $f(x) = mx + b$; any value of y can then be predicted for an input x.

B. Nearest Neighbor
1. Start with two points $(x_{i1}, x_{i2}, \ldots, x_{in})$ and $(x_{j1}, x_{j2}, \ldots, x_{jn})$.
2. Measure how similar the two points are using the Euclidean distance function
   $$\sqrt{(x_{i1} - x_{j1})^2 + \cdots + (x_{in} - x_{jn})^2}.$$
3. Scale each attribute by the maximum difference of its $x_{ij}$ values,
   $$\hat{x}_{ij} = \frac{x_{ij}}{\max_k x_{kj} - \min_k x_{kj}}.$$

C. ID3
1. Begin with the root node S of the discrete set.
2. Iterate through every unused attribute of the discrete data set.
3. Calculate the entropy $H(S)$.
4. Split the discrete set by the selected attribute and produce subsets of the current discrete set.
5. Repeat on each subset until no further split is possible, and obtain the tree structure.

III. PROPOSED METHODOLOGIES

Figure 1. Proposed Decision Making Model.
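Looking back at the nearest-neighbor baseline of Section II-B, the scaling and distance steps can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (`range_scale`, `euclidean`, `nearest_neighbor`) are hypothetical, the handling of a constant attribute is an added assumption, and the three columns used (I0, DA, P for samples 1 to 4 of Table I) were picked purely for demonstration.

```python
import math


def range_scale(rows):
    """Divide every attribute by (max - min) of that attribute over all rows."""
    spans = [max(col) - min(col) for col in zip(*rows)]
    # Assumption: a constant attribute (zero range) is mapped to 0.0
    # so that the division is always defined.
    return [[v / s if s else 0.0 for v, s in zip(row, spans)] for row in rows]


def euclidean(p, q):
    """Euclidean distance between two points of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest_neighbor(rows, query_index):
    """Index of the row closest to rows[query_index] after range scaling."""
    scaled = range_scale(rows)
    query = scaled[query_index]
    candidates = [i for i in range(len(rows)) if i != query_index]
    return min(candidates, key=lambda i: euclidean(query, scaled[i]))


# (I0, DA, P) for samples 1-4 of Table I; the column choice is illustrative.
samples = [(524, 228, 556), (330, 121, 400), (551, 264, 656), (380, 137, 493)]
print(nearest_neighbor(samples, 0))
```

Scaling before measuring distance keeps an attribute with a large numeric range from dominating the comparison.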



Different kinds of methodologies are available for analyzing the dataset and obtaining the best alternative solutions to the problem, and each of them works in a different way. By choosing the best methodologies, we can obtain a more efficient decision and alternative solutions for the decision-making process.

A. Pearson Correlation Coefficient Method

The Pearson correlation coefficient, or Pearson product-moment correlation (PPMC), is a measure of the linear correlation between two variables x and y, i.e. it measures how the variables x and y are related to each other. The relationship between x and y is denoted by r and obtained from the equation

$$r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} \qquad \{A\}$$

If the value obtained from the above equation lies
- between 0 and 1, the correlation is positive;
- at 0, there is no correlation;
- between -1 and 0, the correlation is negative.

From these correlation values, we can find out which variables are highly correlated and which are least correlated.

B. Machine Learning

Machine learning is the ability of computers to learn without being explicitly programmed. There are two types of machine learning, unsupervised and supervised. Unsupervised machine learning infers a function from unlabeled data, whereas supervised machine learning infers the function from known datasets.

1. Input the image.
2. Select the attributes $a_1, \ldots, a_n$ of carcinoma breast tissue from the input.
3. Choose the primary attributes by applying Pearson correlation among the attributes.
4. Create the similarity index r for every attribute pair $(a_i, a_j)$ using the Pearson method.
5. Select the attribute pair with high correlation.
6. Compute the mean for all attributes.
7. Compute the variance for the highly correlated attributes.
8. Compute the standard deviation for step 6.
9. For every pair, compute the linear variable $b = r \cdot \frac{S.D.(y)}{S.D.(x)}$.
10. Compute $a = \bar{y} - b\bar{x}$, where $\bar{y}$ and $\bar{x}$ are the corresponding means.
11. Now predict y from x using the form $a + bx$.
12. Correlate the image with respect to the skin disease class.

IV. NUMERICAL RESULTS

TABLE II. CARCINOMA ATTRIBUTE VALUES BY PEARSON

      A1     A2     A3     A4     A5     A6     A7     A8     A9
A1    1.0    0.2    -0.3   0.6    0.7    0.5    0.7    0.6    0.9
A2    0.2    1.0    0.3    0.6    0.7    0.7    0.4    0.6    0.3
A3    -0.3   0.3    1.0    0.1    0.1    0.4    0.0    0.1    -0.8
A4    0.6    0.6    0.1    1.0    0.9    0.6    0.2    1.0    0.7
A5    0.7    0.7    0.1    0.9    1.0    0.8    0.4    0.9    0.8
A6    0.5    0.7    0.4    0.6    0.8    1.0    0.6    0.5    0.7
A7    0.7    0.4    0.0    0.2    0.4    0.6    1.0    0.1    0.7
A8    0.6    0.6    0.1    1.0    0.9    0.5    0.1    1.0    0.7
A9    0.9    0.3    -0.8   0.7    0.8    0.7    0.7    0.7    1.0

TABLE III. PEARSON, MACHINE LEARNING, AND LINEAR REGRESSION

Attributes                     A1          A2          A3          A4
                               P vs I0     P vs DA     P vs AREA   P vs A/DA
Pearson (r)                    0.922541    0.73183     0.78832     0.72586
Cost by machine learning       12974.3     3445.731    15708603    554.495
Optimized θ                    1.5         1.5         2           0
b = r · Std(y) / Std(x)        0.861747    0.43715     26.83599    0.072288
a = ȳ - b x̄                    -19.6385    -68.9831    -7158.27    -2.69839

TABLE IV. DERIVED LINEAR REGRESSION EQUATION FOR CARCINOMA

Attribute      y = a + bx
P vs I0        y = -19.6385 + 0.861747x
P vs DA        y = -68.9831 + 0.43715x
P vs AREA      y = -7158.27 + 26.83599x
P vs A/DA      y = -2.69839 + 0.072288x
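The Pearson and regression steps behind Tables III and IV can be traced on the P and I0 columns of Table I with a short Python sketch. It is a sketch under stated assumptions rather than the authors' implementation: the helper names (`pearson_r`, `std`, `fit_line`) are hypothetical, and taking P as x and I0 as y is an assumption, chosen because it appears consistent with the negative intercept tabulated for "P vs I0". Since Table I lists rounded values, the result should land close to the first column of Table III without matching every digit.

```python
import math

# Columns P and I0 copied from Table I (21 carcinoma samples).
P = [556, 400, 656, 493, 424, 429, 330, 331, 603, 558, 508,
     602, 525, 471, 541, 477, 329, 387, 462, 445, 542]
I0 = [524, 330, 551, 380, 362, 389, 290, 275, 470, 423, 410,
      500, 438, 366, 485, 390, 269, 300, 325, 294, 500]


def mean(values):
    return sum(values) / len(values)


def pearson_r(x, y):
    """Pearson correlation from raw sums, following equation {A}."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt(
        (n * sxx - sx ** 2) * (n * syy - sy ** 2))


def std(values):
    """Sample standard deviation; the ratio std(y)/std(x) is the same
    for the population form, so the slope does not depend on this choice."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def fit_line(x, y):
    """Return (r, a, b) for the line y = a + b*x with b = r*std(y)/std(x)."""
    r = pearson_r(x, y)
    b = r * std(y) / std(x)
    a = mean(y) - b * mean(x)
    return r, a, b


# Assumption: x = P and y = I0 for the "P vs I0" column.
r, a, b = fit_line(P, I0)
print(f"r = {r:.4f}, a = {a:.4f}, b = {b:.4f}")
```

Computing the slope as r multiplied by the ratio of standard deviations gives the same line as the least-squares slope of Section II-A, so either form can be used for the prediction step.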

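The "Cost by machine learning" and "Optimized θ" rows of Table III refer to a cost search whose exact form is not spelled out in the text. The Python sketch below shows one common way such a search is set up, purely as an assumption: a one-parameter hypothesis h(x) = θx, a half mean squared error cost, and a small hypothetical grid of θ values. None of these choices is confirmed by the source, so the output is not expected to reproduce the tabulated costs or θ values.

```python
# Columns P and I0 copied from Table I (21 carcinoma samples).
P = [556, 400, 656, 493, 424, 429, 330, 331, 603, 558, 508,
     602, 525, 471, 541, 477, 329, 387, 462, 445, 542]
I0 = [524, 330, 551, 380, 362, 389, 290, 275, 470, 423, 410,
      500, 438, 366, 485, 390, 269, 300, 325, 294, 500]


def cost(theta, x, y):
    """Assumed cost: half mean squared error of h(x) = theta * x."""
    n = len(x)
    return sum((theta * xi - yi) ** 2 for xi, yi in zip(x, y)) / (2 * n)


def best_theta(x, y, grid):
    """Return (theta, cost) for the grid value with the lowest cost."""
    return min(((t, cost(t, x, y)) for t in grid), key=lambda pair: pair[1])


# Hypothetical candidate values; the grid used in the paper is not stated.
grid = [0.0, 0.5, 1.0, 1.5, 2.0]
theta, j = best_theta(P, I0, grid)
print(f"theta = {theta}, cost = {j:.1f}")
```

A coarse grid keeps the search cheap enough for a handheld device; a finer grid or gradient descent would trade more computation for a cost closer to the true minimum.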


V. CONCLUSION

Here, we have taken the real dataset of breast cancer skin type I (carcinoma). For each attribute, we analyzed its correlation coefficient with the other attributes using the Pearson method. From these coefficients, the attributes with values greater than 0.55 are taken into consideration. Further, a machine learning method is used to find the least cost function and its corresponding theta value. By applying the simple linear regression method to the reduced attributes, we obtained the linear equations presented in Table IV. From these equations, the required attribute can be found by supplying the appropriate inputs. Moreover, with our proposed model one can take wise decisions with the help of handy electronic devices such as smartphones and tablets.

REFERENCES

[1] Zhang, Xiaolu, and Zeshui Xu. "Hesitant Fuzzy Methods for Multiple Criteria Decision Analysis." Studies in Fuzziness and Soft Computing, 2016.
[2] Kwait, Rebecca M., et al. "Influential Forces in Breast Cancer Surgical Decision Making and the Impact on Body Image and Sexual Function." Annals of Surgical Oncology 23.10 (2016): 3403-3411.
[3] Van Wersch, Anna, and Katherine Swainston. "Women’s experiences of treatment decision-making in breast cancer." Praeger, 2014.
[4] Levine, Mark Norman, et al. "Population-based evaluation of 21-gene assay in treatment decision making for early breast cancer in Ontario." (2014): 583-583.
[5] Augustovski, Federico, et al. "Decision-making impact on adjuvant chemotherapy allocation in early node-negative breast cancer with a 21-gene assay: systematic review and meta-analysis." Breast Cancer Research and Treatment 152.3 (2015): 611-625.
[6] O’Brien, Mary Ann, et al. "Physician-related facilitators and barriers to patient involvement in treatment decision making in early stage breast cancer: perspectives of physicians and patients." Health Expectations 16.4 (2013): 373-384.
[7] Vikhe, P.S., Thool, V.R. Contrast enhancement in mammograms using homomorphic filter technique, International Conference on Signal and Information Processing, IConSIP 2016.
[8] Rama Devi, K., Rani, A.J., Prasad, A.M. Design of microstrip antenna with improved bandwidth for biomedical application (2017) Advances in Intelligent Systems and Computing, 468, pp. 201-215.
[9] Zhang, G. Q., & Lu, J. (2003). An integrated group decision-making method dealing with fuzzy preferences for alternatives and individual judgments for selection criteria. Group Decision and Negotiation, 12, 501–515.
[10] Kalaiarassan G, Krishan, Somanadh M, Chandrasegar Thirumalai, Senthilkumar M, "One-Dimension Force Balance System for Hypersonic Vehicle an experimental and Fuzzy Prediction Approach," Elsevier, ICMMM - 2017.
[11] Viswanathan, P. "Fusion of cryptographic watermarking medical image system with reversible property." Computer Networks and Intelligent Computing. Springer Berlin Heidelberg, 2011. 533-540.
[12] Chandrasegar Thirumalai, Viswanathan P, "Diophantine based Asymmetric Cryptomata for Cloud Confidentiality and Blind Signature applications," JISA, Elsevier, 2017.
[13] Ponnurangam, Dhavachelavan, and G. V. Uma. "Fuzzy complexity assessment model for resource negotiation and allocation in agent-based software testing framework." Expert Systems with Applications 29.1 (2005): 105-119.
[14] Zhang, X. L., Xu, Z. S., & Wang, H. (2015). Heterogeneous multiple criteria group decision making with incomplete weight information: A deviation modeling approach. Information Fusion, 25, 25–62.
[15] Viswanathan, P., and P. Venkata Krishna. "Text fusion watermarking in medical image with semi-reversible for secure transfer and authentication." Advances in Recent Technologies in Communication and Computing, 2009. ARTCom'09. International Conference on. IEEE, 2009.
[16] Behera, S., Rani, R. Comparative analysis of density based outlier detection techniques on breast cancer data using hadoop and map reduce, Proceedings of the International Conference on Inventive Computation Technologies, ICICT 2016.
[17] Velusamy, P.D., Karandharaj, P., Prabakar, S. Morphological analysis for breast cancer detection (2017) Advances in Intelligent Systems and Computing, 467, pp. 197-208.
[18] Chandrasegar Thirumalai, Senthilkumar M, "An Assessment Framework of Intuitionistic Fuzzy Network for C2B Decision Making", International Conference on Electronics and Communication Systems (ICECS), Feb. 2017.
[19] Mhala, N.C., Bhandari, S.H. Improved approach towards classification of histopathology images using bag-of-features, International Conference on Signal and Information Processing, IConSIP 2016.
[20] Vaishnavi B, Karthikeyan J, Kiran Yarrakula, Chandrasegar Thirumalai, "An Assessment Framework for Precipitation Decision Making Using AHP", International Conference on Electronics and Communication Systems IEEE – ICECS, pp. 418–421, Feb. 2016.
[21] Lam, Wendy WT, et al. "Does the use of shared decision-making consultation behaviors increase treatment decision-making satisfaction among Chinese women facing decision for breast cancer surgery?." Patient Education and Counseling 94.2 (2014): 243-249.
[22] Chandrasegar Thirumalai, Viswanathan P, "Hybrid IT architecture with Gene based Cryptomata (HITAGC) for mutual authentication and privacy preserving security services," International Journal of Advanced Intelligence Paradigms, 2017.
[23] Vengattarman, T., and Ponnurangam Dhavachelvan. "An agent-based personalized e-learning environment: Effort prediction perspective." Intelligent Agent & Multi-Agent Systems, 2009. IAMA 2009. International Conference on. IEEE, 2009.
[24] Viswanathan, P., P. Venkata Krishna, and S. Hariharan. "Multimodal biometric invariant moment fusion authentication system." Information Processing and Management, Springer Berlin Heidelberg, 2010. 136-143.
[25] M. Senthilkumar, T. Chandrasegar, M.K. Nallakaruppan, S. Prasanna, "A Modified and Efficient Genetic Algorithm to Address a Travelling Salesman Problem," in International Journal of Applied Engineering Research, Vol. 9 No. 10, 2014, pp. 1279-1288.
[26] Morrow, Monica, et al. "Access to breast reconstruction after mastectomy and patient perspectives on reconstruction decision making." JAMA Surgery 149.10 (2014): 1015-1021.
[27] Chandrasegar Thirumalai, "Physicians Drug encoding system using an Efficient and Secured Linear Public Key Cryptosystem (ESLPKC)," International Journal of Pharmacy and Technology, Vol. 8 Issue 3, Sep. 2016, pp. 16296-16303.
[28] Saubhagya, V.K., Rani, A., Singh, V. ANN based detection of Breast Cancer in mammograph images, 1st IEEE International Conference on Power Electronics, Intelligent Control and Energy Systems, ICPEICES 2016.
[29] Chandrasegar Thirumalai, Senthilkumar M, Vaishnavi B, "Physicians Medicament using Linear Public Key Crypto System," in International Conference on Electrical, Electronics, and Optimization Techniques ICEEOT, IEEE, pp. 1937–1939, March 2016.
[30] Chen, Bang-Bin, et al. "A pilot study to determine the timing and effect of bevacizumab on vascular normalization of metastatic brain tumors in breast cancer." BMC Cancer 16.1 (2016): 466.
[31] Chandrasegar Thirumalai, Senthilkumar M, "Secured E-Mail System using Base 128 Encoding Scheme," International Journal of Pharmacy and Technology, Vol. 8 Issue 4, Dec. 2016, pp. 21797-21806.
[32] Hopko, Derek R., et al. "Pretreatment depression severity in breast cancer patients and its relation to treatment response to behavior therapy." Health Psychology 35.1 (2016): 10.



[33] Chandrasegar Thirumalai, Senthilkumar M, Silambarasan R, Carlos Becker Westphall, "Analyzing the strength of Pell’s RSA," IJPT, Vol. 8 Issue 4, Dec. 2016, pp. 21869-21874.
[34] Bronson, Mackenzie. "Early Stage Breast Cancer and Preoperative Breast Magnetic Resonance Imaging (MRI) Use: Estimated Long-Term Outcomes and Cost-Effectiveness." 38th Annual North American Meeting of the Society for Medical Decision Making. SMDM, 2016.
[35] Chandrasegar Thirumalai, "Review on the memory efficient RSA variants," International Journal of Pharmacy and Technology, Vol. 8 Issue 4, Dec. 2016, pp. 4907-4916.
[36] Bouskill, Kathryn, and Michael Kramer. "The impact of cancer and quality of life among long-term survivors of breast cancer in Austria." Supportive Care in Cancer 24.11 (2016): 4705-4712.
[37] Chakraborty, S., Bhowmik, M.K., Ghosh, A.K., Pal, T. Automated edge detection of breast masses on mammograms (2017) IEEE Region 10 Annual International Conference, Proceedings/TENCON, pp. 1241-1245.
[38] T Chandra Segar, R Vijayaragavan, "Pell's RSA key generation and its security analysis," in Computing, Communications and Networking Technologies (ICCCNT) 2013, pp. 1-5.
[39] Thivya, K.S., Sakthivel, P. Analysis of framelets for the microcalcification (2017) Advances in Intelligent Systems and Computing, 459 AISC, pp. 11-21.
[40] Chandrasegar Thirumalai, Himanshu Kar, "Memory Efficient Multi Key (MEMK) generation scheme for secure transportation of sensitive data over Cloud and IoT devices," IEEE IPACT 2017.
[41] Balachandran, D., Lavanya, R. Mass characterization in mammograms using an optimal ensemble classifier (2017) IEEE Region 10 Annual International Conference, Proceedings/TENCON, pp. 2567-2570.
[42] Chandrasegar Thirumalai, Sathish Shanmugam, "Multi-key distribution scheme using Diophantine form for secure IoT communications," IEEE IPACT 2017.
[43] Singh, I., Sanwal, K., Praveen, S. Breast cancer detection using two-fold genetic evolution of neural network ensembles, Proceedings of the 2016 International Conference on Data Science and Engineering, ICDSE 2016.
[44] Chandrasegar Thirumalai, Senthilkumar M, "Spanning Tree approach for Error Detection and Correction," IJPT, Vol. 8, Issue No. 4, Dec. 2016, pp. 5009-5020.
[45] Appukuttan, A., Sindhu, L. Curvelet and PNN classifier based approach for early detection and classification of breast cancer in digital mammograms, Proceedings of the International Conference on Inventive Computation Technologies, ICICT 2016.
[46] E Malathy, Chandra Segar Thirumalai, "Review on non-linear set associative cache design," IJPT, Dec. 2016, Vol. 8, Issue No. 4, pp. 5320-5330.
[47] Yu, P. L. (1973). A class of solutions for group decision problems. Management Science, 19, 936–946.
[48] Vidya, V.K., Mathew, S. An accurate method of breast cancer detection from ultra sound images using probabilistic fuzzy clustering algorithm, (2017) 2016 International Conference on Communication Systems and Networks, ComNet 2016, pp. 231-235.
[49] Kompalli, V.S., Kuruba, U.R. Combined effect of soft computing methods in classification (2017) Advances in Intelligent Systems and Computing, pp. 501-509.
[50] Software metric Numerical Data analysis using Box plot and control chart methods, DOI:10.13140/RG.2.2.27422.95041
