International Journal of Applied Engineering Research ISSN 0973-4562 Volume 10, Number 9 (2015) © Research India Publications ::: http://www.ripublication.com

Applying Decision tree algorithm to predict Lupus using Rapid Miner

S. Gomathi1, Dr. V. Narayani2

1 Information and Computer Technologies, Research and Development Centre, Bharathiar University, Coimbatore – Sri Krishna Arts and Science College.
2 Director i/c, Department of MCA, Karpagam College of Engineering, Coimbatore, India

TABLE I SLEDAI

Abstract: Lupus, an inflammatory Connective Tissue Disease (CTD), can affect all parts of the body. Detailed study shows that lupus affects more women than men, and that Black Europeans are affected more often than White Americans. Lupus is a chronic disease that cannot be cured, but the lifetime of patients can be extended. To extend that lifetime, a technique for predicting the disease is important, and this can readily be done with a data mining classification algorithm. This paper gives a detailed view of the data mining decision tree algorithm for predicting lupus and presents the output obtained from the Rapid Miner tool by applying the decision tree algorithm to the data set.

Keywords: Data mining, Lupus, autoimmune, decision tree, Rapid Miner, classification, ACR.

INTRODUCTION

Lupus, also called Systemic Lupus Erythematosus (SLE), is common among American, African, Caucasian, Asian and Hispanic populations. SLE patients have increased comorbidities, including cardiovascular disease, osteoporosis, infection risk and depression [14]. The prevalence of SLE in India is estimated at 30 per million people. Lupus symptoms range from flares to severe disease. Because of a lack of awareness about the disease, most patients learn that they have lupus only at an intermediate stage [1]. Data mining plays an important role in predicting the disease effectively. Since the disease cannot be cured and only the lifetime of patients can be extended, a data mining decision tree is used in this paper for prediction.

The American College of Rheumatology (ACR) derived 11 criteria to diagnose the disease: malar rash, photosensitivity, discoid rash, non-erosive arthritis, oral ulcers, pericarditis or pleuritis, neurologic disorder, renal disorder, hematologic disorder, positive antinuclear antibody and immunologic disorder [2, 15]. The Systemic Lupus Erythematosus Disease Activity Index (SLEDAI) assigns a score to each descriptor to quantify the severity of the disease, as shown in Table I.

The Decision Tree is a popular classification technique that is simple to implement [16]. It requires no domain knowledge or parameter setting and can handle high-dimensional data [9]. The results obtained from Decision Trees are easier to read and understand. The drill-through feature for accessing detailed patient profiles is available only in Decision Trees [17].

Descriptor                  Score
Seizure                     8
Psychosis                   8
Organic brain syndrome      8
Visual disturbances         8
Cranial nerve disorder      8
Lupus Headache              8
CVA                         8
Vasculitis                  8
Arthritis                   4
Myositis                    4
Urinary Casts               4
Haematuria                  4
Proteinuria                 4
Pyuria                      4
Rash                        2
Alopecia                    2
Pleurisy                    2
Pericarditis                2
Low Complement              2
Increased DNA binding       2
Fever                       1
Thrombocytopenia            1
Leukopenia                  1
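To make the scoring concrete, the following is a minimal sketch of how a total score could be computed from the weights in Table I. The function name `sledai_score` and the input format (a list of descriptor names observed in a patient) are assumptions made for illustration; the paper does not describe an implementation.

```python
# Weights as listed in Table I. The dictionary keys, the function name and
# the input format are illustrative assumptions, not part of the paper.
SLEDAI_WEIGHTS = {
    "seizure": 8, "psychosis": 8, "organic brain syndrome": 8,
    "visual disturbances": 8, "cranial nerve disorder": 8,
    "lupus headache": 8, "cva": 8, "vasculitis": 8,
    "arthritis": 4, "myositis": 4, "urinary casts": 4,
    "haematuria": 4, "proteinuria": 4, "pyuria": 4,
    "rash": 2, "alopecia": 2, "pleurisy": 2, "pericarditis": 2,
    "low complement": 2, "increased dna binding": 2,
    "fever": 1, "thrombocytopenia": 1, "leukopenia": 1,
}


def sledai_score(present):
    """Sum the Table I weights of the descriptors present in a patient."""
    # Normalise names and drop duplicates so each descriptor counts once.
    keys = {name.strip().lower() for name in present}
    unknown = keys - SLEDAI_WEIGHTS.keys()
    if unknown:
        raise KeyError(f"unknown SLEDAI descriptor(s): {sorted(unknown)}")
    return sum(SLEDAI_WEIGHTS[key] for key in keys)


# Hypothetical patient with arthritis, rash and fever: 4 + 2 + 1.
print(sledai_score(["Arthritis", "Rash", "Fever"]))
```

A higher total corresponds to more active disease under this weighting; any thresholds for severity bands would have to come from the clinical literature and are not assumed here.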

I. LITERATURE REVIEW

In a study by Lopez et al. of 150 lupus patients, higher disease activity, older age, and pre-existing organ damage were all independently associated with premature death. In addition, renal disease (identified either at biopsy or by measurement of serum creatinine) and thrombocytopenia were associated with increased mortality [1].

The British Isles Lupus Assessment Group (BILAG) index offers a more comprehensive approach to the assessment of lupus disease activity. In assessing overall activity, the BILAG-2004 index analyzes and classifies activity over a 4-week period separately for each of 9 organ systems [2].


Gladman et al. noted that non-organ-specific symptoms in SLE are common and include weight loss, fever, myalgia, marked fatigue, arthralgia and mood changes (often depression and low mood). These symptoms can be severe enough to significantly impair a patient's quality of life [3].

DECISION TREE ALGORITHM

Input: TraD // Training data
Output: DeciT // Decision Tree
Attributes: DB - Database, tp - tuple, Cl - Classes

DeciT Algorithm:
Step 1: Initialize DeciT as an empty tree;
Step 2: Create the root node and label it with the splitting attribute in DeciT;
Step 3: Add an arc to the root node for each split predicate and label it in DeciT;
Step 4: For each arc do
    Step 4(a): Create a database by applying the splitting predicate to TraD; do Step 5 if the stopping point is reached for this path.
    Step 4(b): else
    Step 4(c): Assign DeciT(TraD) to DeciT'
    Step 4(d): Add DeciT' to DeciT // to arc
Step 5: Create a leaf node, label it with the appropriate class, and assign it to DeciT'
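The recursive structure of these steps can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Rapid Miner operator used in the paper: the functions `build_tree` and `classify`, the dictionary-based tree representation, the choice of the first remaining attribute as the splitting attribute, and the toy rows are all invented for the example.

```python
from collections import Counter


def majority_class(rows, target):
    """Most frequent class label among the rows."""
    return Counter(r[target] for r in rows).most_common(1)[0][0]


def build_tree(rows, attributes, target):
    """Recursively build a tree from a list of dict rows (the training data)."""
    classes = {r[target] for r in rows}
    # Step 5: stopping point reached, so create a leaf labelled with a class.
    if len(classes) == 1 or not attributes:
        return {"leaf": majority_class(rows, target)}
    # Step 2: label the node with a splitting attribute.
    # Assumption: take the first attribute; a real learner would rank
    # attributes with a criterion such as information gain.
    split = attributes[0]
    node = {"split": split, "arcs": {}}
    # Steps 3-4: one arc per observed value, each leading to a subtree built
    # from the rows that satisfy that split predicate.
    for value in sorted({r[split] for r in rows}):
        subset = [r for r in rows if r[split] == value]
        node["arcs"][value] = build_tree(subset, attributes[1:], target)
    return node


def classify(tree, row, default=None):
    """Follow arcs from the root until a leaf is reached."""
    while "leaf" not in tree:
        child = tree["arcs"].get(row[tree["split"]])
        if child is None:  # attribute value never seen during training
            return default
        tree = child
    return tree["leaf"]


# Toy rows invented for illustration only; they are not patient data.
rows = [
    {"ana_positive": "yes", "malar_rash": "yes", "label": "lupus"},
    {"ana_positive": "yes", "malar_rash": "no", "label": "no lupus"},
    {"ana_positive": "no", "malar_rash": "yes", "label": "no lupus"},
    {"ana_positive": "no", "malar_rash": "no", "label": "no lupus"},
]
tree = build_tree(rows, ["ana_positive", "malar_rash"], "label")
print(classify(tree, {"ana_positive": "yes", "malar_rash": "yes"}))
```

The sketch stops splitting when a subset contains a single class or when no attributes remain, which is one possible reading of the "stopping point" in Step 4(a).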

Hian Chye Koh and Gerald Tan discuss data mining and its applications in major areas such as treatment effectiveness, detection of fraud and abuse, hospital management, and customer relationship management [4].

Jayanthi Ranjan presents how data mining discovers and extracts useful patterns from large data sets. The paper demonstrates the ability of data mining to improve the quality of the decision-making process in the pharma industry [5].

Tsokos noted that the clinical manifestations of SLE are diverse, ranging from fatigue and oral ulcerations to life-threatening neurologic and renal disease. Disease activity varies, with periods of remission and flares [6].

Looney et al. [9] reported a series of 17 patients with refractory lupus nephritis who were treated (in an open uncontrolled study) with increasing doses of rituximab (RTX). Clinical improvement was observed in 11 patients who had B cell depletion. In the later study, the clinical research was not managed and sustained anti-dsDNA autoantibody [10].

The prognosis of SLE has improved dramatically in the last 4 decades, but mortality remains a major concern [7]. Survival rates are ~80% at 10 years after diagnosis and ~65% at 20 years. Deaths early in the course of SLE are usually attributed to active disease and infection, whereas deaths later in the disease course are often due to atherosclerotic vascular disease [6].

Sellappan Palaniappan et al. [12] developed a prototype Intelligent Heart Disease Prediction System (IHDPS) using data mining techniques such as Naïve Bayes, Neural Networks and Decision Trees. IHDPS can answer critical "what if" queries that traditional decision support systems cannot. Using medical attributes such as sex, age, blood sugar and blood pressure, it can predict the likelihood of a patient getting heart disease [14].

Arvind Sharma and P.C. Gupta discussed how data mining can bring important benefits to blood bank labs. The WEKA tool and the J48 algorithm were used for the full study. Classification rules worked well in classifying blood donors, reaching an accuracy of 89.9% [8].

The major factor in the performance of the decision tree algorithm is the size of the training set. The main issues in building a decision tree are:

A. Selecting Splitting Attributes: Which attributes are used for splitting affects performance. Ordering of splitting attributes: the order in which attributes are chosen. Splits: related to the ordering of the attributes is the number of splits to take, which depends on the domain.
B. Tree Structure: A balanced tree with the fewest levels is desirable. Multi-way branching is applicable.
C. Stopping criteria: Tree construction stops when the training data are classified perfectly. This is the point at which performance and accuracy are calculated.
D. Training Data: The structure of the decision tree is based on the training data. The training set must be large enough to generate the decision tree properly and to measure the throughput effectively.
E. Pruning: Modifying the tree to improve its performance during the classification phase. This phase removes redundant and unwanted subtrees.
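As an illustration of issue A, the sketch below ranks candidate splitting attributes by information gain, one commonly used criterion. The paper does not state which criterion was applied, so the choice of information gain, the function names and the toy rows are assumptions made for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus its weighted entropy after the split."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def best_split(rows, attributes, target):
    """Attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Toy rows invented for illustration: attribute "a" separates the classes
# perfectly, attribute "b" carries no information about the label.
rows = [
    {"a": "x", "b": "p", "label": "pos"},
    {"a": "x", "b": "q", "label": "pos"},
    {"a": "y", "b": "p", "label": "neg"},
    {"a": "y", "b": "q", "label": "neg"},
]
print(best_split(rows, ["a", "b"], "label"))
```

Choosing the attribute with the highest gain at each node also fixes the ordering of splitting attributes, since the most informative attribute ends up nearest the root.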


TABLE II CLINICAL PROFILE OF 15 DATASET IMPORTANT ATTRIBUTE TO PREDICT LUPUS

ID   Attributes name
1    Age (in years)
2    Gender
3    Sample type
4    Disease activity
5    Ethnicity
6    Organs involved
7    Tests
8    SLEDAI Score

Domain Values    No of Cases

A. Age: 11-20, 21-30, 70-100
B. Gender: Male, Female
C. Mucocutaneous Manifestation: Photosensitivity, Malar rash, Alopecia, Oral Ulcers, Raynaud's symptom, Vasculitic rash

1: >=11 and =21 and 70 and