
Developing an Intelligent System for Diagnosis of Asthma Based on Artificial Neural Network

Published online: 30/07/2015
Published print: 08/2015

doi: 10.5455/aim.2015.23.220-223
ACTA INFORM MED. 2015 AUG 23(4): 220-223
Received: 11 June 2015 • Accepted: 05 July 2015

© 2015 Behruz Alizade, Reza Safdari, Azadeh Bashiri This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

ORIGINAL PAPER

Behrouz Alizadeh1, Reza Safdari2, Maryam Zolnoori3, Azadeh Bashiri4

1 School of Allied-Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
2 Health Information Management Department, School of Allied-Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
3 School of Informatics and Computing, Indiana University, Indiana, USA
4 School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran

Corresponding author: Azadeh Bashiri. School of Allied-Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran, E-mail: [email protected]

ABSTRACT
Introduction: Lack of proper diagnosis and inadequate treatment of asthma lead to physical and financial complications. This study aimed to apply data mining techniques and to create an intelligent neural network system for the diagnosis of asthma. Methods: The study population comprised patients who had visited one of the lung clinics in Tehran. Data were analyzed using the SPSS statistical tool, and the Pearson chi-square coefficient was the basis of decision making for data ranking. The neural network was trained using the back-propagation learning technique. Results: According to the SPSS analysis performed to select the top factors, 13 effective factors were selected. The data were shuffled in different runs, so that different combinations of training and testing data were produced, and in all of these combinations the network was able to correctly predict 100% of cases. Conclusion: Using data mining methods before designing the structure of the system, in order to reduce the data dimension and choose the optimal data, leads to a more accurate system. Considering data mining approaches is therefore necessary, given the nature of medical data.
Key words: Asthma, Data mining, Artificial Neural Network.

1. INTRODUCTION

Asthma is one of the most common diseases throughout the world, with a prevalence of 1.4 to 27.1 percent in different parts of the world (1, 2). Asthma is a chronic, reversible airway disease that usually responds well to treatment (3). Diagnosis of this disease is based on GINA criteria and clinical findings, including coughing, shortness of breath, heaviness and wheezing (4). Asthma has many similarities with other respiratory diseases (5), and is also related to other diseases such as systemic amyloidosis and relapsing polychondritis (6). Some obstructive diseases of the upper respiratory tract, such as vocal cord dysfunction, have symptoms similar to asthma (7). Many physicians treat wheezing and shortness of breath, along with cough, as equivalent to a diagnosis of asthma. These symptoms are not specific to asthma, and a wrong diagnosis and inappropriate treatment can lead to physical and financial complications (5).

The rapid development of computer technology has led to the automation of many processes that were previously carried out by professionals. The development of techniques based on artificial intelligence (AI) is an important requirement for solving complex problems. Medical diagnosis, owing to the complexity of the human body and mind and the vagueness of knowledge in this field, is a good example of a class of complex problems; moreover, the existing knowledge differs between people with different experience. The role of the physician's experience in the diagnostic process is undeniable, and to avoid wasting time in diagnosis, techniques based on artificial intelligence are required so that detection is rapid, reliable, accurate and knowledge-based (8).

In particular, data mining is a branch of artificial intelligence that uses automated processes to find information and patterns in data. Data mining is used for a wide range of purposes, but one of its most obvious applications is in medical diagnosis based on clinical findings, where it is used to explore the relationship between clinical findings and diagnostic results. The design of an optimized intelligent system depends on the input data as well as on the structure of the system. The phrase "garbage in, garbage out" emphasizes that accurate information and the best use of the relevant data are necessary to achieve an optimal system (8, 9). Data mining problems often involve many variables that are all potential predictors of the output, but in practice only a few of them influence the output, and to different degrees. There are several ways to reduce the input dimension and select the optimal alternatives, but the stages common to these methods are as follows:
* Delete the unimportant data,
* Rank the remaining data,
* Select among the ranked data (9).

One artificial intelligence technique for solving real-world problems uses the concept of a neural network. The main idea of a neural network is its ability to learn from past data and derive new solutions to problems (10). The purpose of this study is to perform data mining on a database of clinical results in order to select the most effective features for the diagnosis of asthma, and to build an intelligent system using the neural network technique, which is very useful for reducing the financial costs and the costs of repeated examinations, as well as wrong diagnoses and their consequences.

Table 1. Clinical characteristics used by physicians for asthma diagnosis

Symbol | Concept | Explanation
cough | Cough | Based on type, frequency and intensity, placed in one of the groups 0 to 5.
phlegm | Phlegm | Classified according to the presence or absence of sputum.
Wheeze | Wheeze | Wheezing while breathing (0-3); based on type and frequency of wheezing, placed in one of the groups 0 to 3.
DYSP | Respiratory distress |
CT | Pursiness |
Exe.s | Exercise-induced symptoms |
AR | Nasal mucosal swelling caused by allergies |
BMI | Body Mass Index | Used to distinguish overweight patients from other patients.
GER | Gastroesophageal reflux |
Par2 | Allergies / allergic catarrh in both parents |
Par1 | Allergies / allergic catarrh in one of the parents |
Rel | Allergies / allergic catarrh in one of the relatives |
R.AL | Response to allergens |
R.Ir | Response to irritants |
R.Em | Emotional reactions |
Ecz | Background of eczema or catarrh |
Hosp | Hospitalization before the age of 3 years due to viral infections |
Food | Food allergy before the age of 3 years |
AP | Air pollution at the place of residence |
P.smoke | Smoking, now or in the past |
damp | Humidity levels in the home or humid environment |
cold | Colds |

Figure 1. Clinical results used in detecting asthma

2. METHODOLOGY

The study population comprised patients who had visited one of the lung clinics in Tehran; 254 patients were selected because of the completeness of the information recorded in their case records. This information, comprising the clinical findings related to asthma and the diagnosis result (presence or absence of asthma), was stored in a database. The data were analyzed using SPSS statistical software. The basis of decision making for ranking the data was the Pearson chi-square coefficient, which aims to identify the characteristics that affect the diagnostic decision. Table 1 shows the clinical characteristics used by physicians to diagnose asthma. Given the importance of patient characteristics as input to an optimal neural network, the data in the database were investigated in terms of their optimality and their effect on the output, using a statistical method based on the Pearson chi-square coefficient. Because the data were divided into categories before being entered into the database, and under the conditions outlined in Figure 1, the chi-square distribution is a reasonably practical criterion for ranking the data.

The basis of this test is a comparison between the expected and the observed values. That is, we want to know whether there is a significant difference between the observed and expected frequencies, or whether the difference is negligible and the result of chance. In effect, we want to know whether there is a relationship between two variables or whether they are independent. In the formulas below, Fo is the observed frequency and Fe is the expected frequency, which is calculated as in Equation 1, where n_{i.} and n_{.j} are the row and column totals of the contingency table and n is the total number of observations. The feature selection criterion is calculated as in Equation 2.

$$Fe_i = \frac{n_{i.} \times n_{.j}}{n}$$

Equation 1. Expected frequency

$$\chi^2 = \sum_{i=1}^{k} \frac{(Fo_i - Fe_i)^2}{Fe_i}$$

Equation 2. Chi-square value computation for observed and expected values
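As an illustration of Equations 1 and 2, the following minimal Python sketch computes the expected frequencies and the chi-square value for one categorical feature against the diagnosis, and then ranks features by that value. It is not the SPSS procedure used in the study: the function names `chi_square` and `rank_features` and the toy values are hypothetical, chosen only to show the arithmetic.

```python
from typing import Dict, List, Sequence


def chi_square(feature: Sequence[int], label: Sequence[int]) -> float:
    """Pearson chi-square value for two categorical variables."""
    rows = sorted(set(feature))
    cols = sorted(set(label))
    n = len(feature)
    # Observed frequencies Fo: one count per (feature value, label) cell.
    observed = {(r, c): 0 for r in rows for c in cols}
    for f, y in zip(feature, label):
        observed[(f, y)] += 1
    row_total = {r: sum(observed[(r, c)] for c in cols) for r in rows}
    col_total = {c: sum(observed[(r, c)] for r in rows) for c in cols}
    value = 0.0
    for r in rows:
        for c in cols:
            expected = row_total[r] * col_total[c] / n  # Equation 1
            value += (observed[(r, c)] - expected) ** 2 / expected  # Equation 2
    return value


def rank_features(data: Dict[str, List[int]], label: List[int]) -> List[str]:
    """Return feature names ordered from highest to lowest chi-square value."""
    scores = {name: chi_square(values, label) for name, values in data.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)


# Hypothetical toy data: 8 patients, diagnosis 1 = asthma, 0 = no asthma.
diagnosis = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "wheeze": [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly associated with diagnosis
    "cold": [1, 0, 1, 0, 1, 0, 1, 0],    # independent of diagnosis
}
ranking = rank_features(features, diagnosis)  # "wheeze" ranks above "cold"
```

In this toy case the associated feature receives a large chi-square value and the independent one receives zero, which is the property the ranking step relies on when it keeps the top-ranked items and discards the rest.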

According to Table 2, unimportant data were excluded from the total results and the first 13 items were selected as the factors that affect the detection of asthma, so the database was reduced to 13 items for each patient. 70% of the data were randomly assigned as training data and the rest as testing data for the proposed system. The neural network was implemented in the C# programming language (fourth version, Visual Studio 2010), using NeuronDotNet, a free library for implementing neural networks in .NET. As shown in Figure 2, the network was implemented with input, hidden and output layers. The input layer is linear and is only responsible for transferring data into the internal structure of the network; the hidden and output layers are of sigmoid type and perform the calculations of the network. The network structure was designed as 13-26-1, with 13 nodes in the input layer, 26 nodes in the hidden layer and one node in the output layer. The output classification criterion is a conditional function according to Equation 3, in which 1 means affected by the disease and 0 means absence of the disease. The neural network was trained using the back-propagation learning technique.

Figure 2. An example of neural network with input, hidden and output layers

$$Y = \begin{cases} 1 & \text{output} > 0.5 \\ 0 & \text{output} \le 0.5 \end{cases}$$

Equation 3. Converting the network's outputs to binary mode

3. RESULTS

Of the 254 recorded cases in this research, 169 patients were suffering from asthma and 85 were not affected by the disease. In the records of the asthmatic patients, 22 factors were recorded in relation to the diagnosis. According to the analysis performed with SPSS to select the top factors, 13 factors were effective.

Table 2. Ranking and selection of the top clinical findings in asthma detecting

The designed neural networks were trained and evaluated with different numbers of epochs. The results obtained for the different numbers of epochs are shown in Table 3. Eventually, after training for 100000 epochs, the mean square error was reduced to less than 0.00002. The data for training and testing the neural network were selected by random shuffling (mixing) in different runs. Because of the shuffling, different combinations of training and testing data were created, and in all cases the system was able to predict 100% correctly. The minimum and maximum accuracy of the network over the different runs, which reflect the impact of the shuffle function, are reported. At higher numbers of epochs, the lowest and highest levels of decision accuracy remained constant, so the system did not need further training or more epochs. In Figure 3, the vertical axis represents the mean square error and the horizontal axis indicates the training epochs of the network, up to 100000 epochs.

Number of epochs | Minimum accuracy | Maximum accuracy
1000 | 93.5% | 100%
10000 | 96.5% | 100%
20000 | 97.4% | 100%
100000 | 98.6% | 100%

Table 3. EPOCH number and accuracy of the designed neural network

Figure 3. The neural network error rate

4. DISCUSSION AND CONCLUSION

The aim of this study was to develop an intelligent system for detecting the presence or absence of asthma, in which the clinical features are investigated and ranked in terms of their influence on the output, and only the effective features are used as the input of the neural network. 70% of the data were used to train the network in order to achieve the desired result and reduce errors, and the rest of the data were used to test the system. As a result, the system is able to accurately predict the disease based on clinical findings in 100% of cases.

Previous studies of neural networks and intelligent systems in medical science include the following. In a study conducted in 2012 by Zolnoori and colleagues on a computer-aided intelligent system for diagnosing pediatric asthma, a fuzzy intelligent system was designed to predict asthma and was able to correctly predict 100% of sick and healthy cases (2). The research of Biglarian and colleagues, which compared an ANN model with Cox regression in predicting the survival of patients with gastric cancer, was carried out in 2011 and achieved an accuracy rate of 81.511% (12). In research carried out in 2010 by Sedhi and colleagues on designing artificial neural networks to predict the metabolic syndrome and the insulin resistance index, neural networks were used to predict the syndrome and reached an accuracy rate of 75.67% (13). In the research conducted by Zanganeh on the analysis of IHD by data mining methods, the proposed system was able to distinguish patients from healthy cases with 96.4% accuracy (14). In a study by Bollschweiler and colleagues, conducted in 2004, on neural networks to predict lymph node metastasis in gastric cancer, the proposed system was able to predict accurately in 93% of cases (15).

Since one of the most important factors in developing a neural network is selecting an appropriate structure and optimal input data, using poor-quality data will lead to incorrect training of the neural network and put it on the wrong path. Selecting the optimal data as input by statistical methods identifies the data that have the greatest effect on the outputs. This leads to proper training of the network and increases the accuracy of its decision making. Finally, the neural network was able to correctly identify 96.5% to 100% of cases. Achieving this accuracy rate depends on different factors, including the nature of the disease, its risk factors and the clinical features, but one of the most important steps in achieving this accuracy is to measure the effect of each finding on the detection results, and to rank and select the priority items. According to the results obtained in the selection of effective characteristics, among the 22 items recorded in the patients' records, 13 were selected as the most effective factors for detecting asthma. It seems that practitioners' focus on the selected items when diagnosing asthma increases accuracy and certainty in decision making. The success of any intelligent system, regardless of its internal structure and the techniques used, depends largely on the optimality of the input data. Therefore, using data mining methods before designing the structure of the system, aimed at reducing the data dimension and making the optimal choice, will generally lead to a more precise system. Using data mining approaches is therefore essential, given the nature of medical data. Given the low error rate of the proposed system in detecting asthma based on clinical findings, the system can be used as a decision aid in detecting the disease.
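The pipeline described in the Methodology (a random 70/30 split, a 13-26-1 network with sigmoid hidden and output layers, back-propagation training and the 0.5 threshold of Equation 3) can be sketched as follows. The study itself used C# with the NeuronDotNet library; this is only a plain-Python approximation, and the class name `TinyMLP`, the weight initialization range, the learning rate and the seeds are assumptions that do not come from the paper.

```python
import math
import random
from typing import List, Sequence, Tuple


def shuffled_split(records: Sequence, train_fraction: float = 0.7,
                   seed: int = 0) -> Tuple[list, list]:
    """Shuffle the records and split them into training and testing parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyMLP:
    """Linear input layer, sigmoid hidden layer, single sigmoid output node."""

    def __init__(self, n_in: int = 13, n_hidden: int = 26, seed: int = 0) -> None:
        rng = random.Random(seed)
        # One weight row per neuron; the last entry of each row is the bias.
        # The initialization range is an assumption, not taken from the paper.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x: List[float]) -> Tuple[List[float], float]:
        # zip stops at the shorter sequence, so the bias is added separately.
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1])
                  for row in self.w_hidden]
        out = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden))
                      + self.w_out[-1])
        return hidden, out

    def train_step(self, x: List[float], target: float, lr: float = 0.1) -> float:
        """One back-propagation update on a single record; returns squared error."""
        hidden, out = self.forward(x)
        error = out - target
        delta_out = error * out * (1.0 - out)
        # Hidden deltas use the output weights before they are updated.
        delta_hidden = [delta_out * self.w_out[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        for j, h in enumerate(hidden):
            self.w_out[j] -= lr * delta_out * h
        self.w_out[-1] -= lr * delta_out
        for j, row in enumerate(self.w_hidden):
            for i, v in enumerate(x):
                row[i] -= lr * delta_hidden[j] * v
            row[-1] -= lr * delta_hidden[j]
        return error ** 2


def classify(output: float) -> int:
    """Equation 3: 1 (asthma) if the output exceeds 0.5, otherwise 0."""
    return 1 if output > 0.5 else 0
```

A typical use would call `shuffled_split` on the list of patient records, loop `train_step` over the training part for a chosen number of epochs while tracking the mean of the returned squared errors, and then compare `classify(net.forward(x)[1])` with the recorded diagnosis on the testing part. Repeating this with different seeds gives the minimum and maximum accuracy of the kind reported in Table 3.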

CONFLICT OF INTEREST: NONE DECLARED.

REFERENCES
1. Fauci A, Braunwald E, Kasper D, Hauser S, Longo D, Jameson J, et al. Harrison's Principles of Internal Medicine, 17th Edition: McGraw-Hill Companies, Incorporated, 2008.
2. Zolnoori M, Fazel Zarandi MH, Moin M, Heidarnezhad H, Kazemnejad A. Computer-aided intelligent system for diagnosing pediatric asthma. Journal of medical systems. 2012; 36(2): 809-22. http://dx.doi.org/10.1007/s10916-010-9545-5
3. Murray JF, Mason RJ. Murray and Nadel's textbook of respiratory medicine. Philadelphia, PA: Saunders/Elsevier, 2010.
4. Mehrabi SSM, Ghauomi SMA. Determining the Most Suitable Spirometric Parameters to Differentiate Chronic Obstructive pulmonary Disease (COPD) from Asthma. Armaghan Danesh. 2010; 14(4): 76-86.
5. Tilles SA. Differential diagnosis of adult asthma. The Medical clinics of North America. 2006; 90(1): 61-76. http://dx.doi.org/10.1016/j.jacr.2009.03.005
6. Sharma SK, Ahluwalia G, Ahluwalia A, Mukhopadhyay S. Tracheobronchial amyloidosis masquerading as bronchial asthma. The Indian journal of chest diseases & allied sciences. 2004; 46(2): 117-119. http://dx.doi.org/10.1097/LBR.0b013e318166d233
7. Suri JC, Sen MK, Chakrabarti S, Mehta C. Vocal cord dysfunction presenting as refractory asthma. The Indian journal of chest diseases & allied sciences. 2002; 44(1): 49-52.
8. Innocent PR, John RI, Garibaldi JM. Fuzzy methods for medical diagnosis. Applied Artificial Intelligence. 2004; 19(1): 69-98.
9. Agrawal R, Gehrke J, Gunopulos D, Raghavan P. Automatic subspace clustering of high dimensional data for data mining applications: ACM, 1998.
10. Grochowski M, Duch W. Constructive neural network algorithms that solve highly non-separable problems. Constructive neural networks: Springer, 2009: 49-70.
11. Azar A, Momeni M. Statistics and its Application on Management. Tehran: SAMT, 2012.
12. Biglarian A, Hajizadeh E, Kazemnejad A. Comparison of artificial neural network and Cox regression models in survival prediction of gastric cancer patients. Koomesh. 2010: 215-220.
13. MSO et al. Designing artificial neural networks for predicting the metabolic syndrome associated with insulin resistance index: Tehran Lipid and Glucose Study. Daneshvar. 2009; 17(85): 29-38.
14. Zanganeh S. Data mining techniques to analyze cardiac ischemia: Science and Research Branch, Tehran, 2012.
15. Bollschweiler EH, Monig SP, Hensler K, Baldus SE, Maruyama K, Holscher AH. Artificial neural network for prediction of lymph node metastases in gastric cancer: a phase II diagnostic study. Annals of surgical oncology. 2004; 11(5): 506-511. http://dx.doi.org/10.1245/aso.2004.04.018
