Int J Biol Med Res. 2011; 2(4): 929-932
www.biomedscidirect.com
Volume 2, Issue 3, July 2011
Contents lists available at BioMedSciDirect Publications
International Journal of Biological & Medical Research BioMedSciDirect Publications
Journal homepage: www.biomedscidirect.com
Original Article
Risk Factors In Adolescent On Arteriosclerosis: An Analysis Using Data Mining And Docking Techniques
Kaladhar SVGK Dowluru*, N V Rama Rao, Govinda Rao Duddukuri, A. Krishna Chaitanya
Department of Biochemistry/Bioinformatics, GIS, GITAM University, Visakhapatnam-530045, India.
ARTICLE INFO
ABSTRACT
Keywords: Arteriosclerosis; Data analysis; Data mining; Docking
Aims: Data related to arteriosclerosis were collected from different hospitals in Visakhapatnam, India. The data were analyzed to guide future experiments on the mechanisms by which metabolites act in aging diseases such as arteriosclerosis, with a view to treatment.
Methods: Data related to arteriosclerosis were collected from different hospitals in Visakhapatnam during January and February 2011. The data were analyzed using machine learning approaches. Docking studies of the prescribed drugs were conducted using docking software.
Results: Arteriosclerosis occurred mostly in older people (60-69 years). Analysis with data mining techniques showed 92.9% accuracy using BayesNet. The drugs recorded in the collected data were docked with a disease-related protein implicated in atherosclerosis, which indicated that Betaloc can be an effective anti-arteriosclerotic drug. Arteriosclerosis is a metabolic syndrome and genetic disease, occurring more in persons who prefer tea rather than coffee. Eighteen percent of the patients were predicted to have a family background of the disease.
Conclusion: The present results showed that age, height, gender, weight, family history, tea, coffee, number of times tea/coffee is consumed, blood test, cholesterol, diabetes, obesity, drinking and smoking are the factors responsible for arteriosclerosis, based on data analysis and machine learning approaches. Further clinical research has to be conducted in this direction.
© Copyright 2010 BioMedSciDirect Publications. IJBMR - ISSN: 0976:6685. All rights reserved.
* Corresponding Author: Kaladhar SVGK Dowluru, Department of Biochemistry/Bioinformatics, GIS, GITAM University, Visakhapatnam-530045, India. Phone: +91-9885827025. Email: [email protected]

1. Introduction
Cardiovascular diseases are primarily due to arteriosclerosis, a multifactorial inflammatory disease of large and medium-sized muscular arteries that is characterized by plaque formation, acute and chronic luminal obstruction, remodeling of the vascular system, diminished oxygen supply, and abnormal blood flow to target organs. Complications of cardiovascular disease are the major cause of morbidity and mortality in renal disease patients [1], leading to myocardial infarction and cerebrovascular events [2]. Arteriosclerosis is a hardening of the arteries that causes stiffness and loss of elasticity and threatens to cause a heart attack.
Atherosclerosis is the most common type of arteriosclerosis and is caused by plaque building up in the vessel. Conventional risk factors such as smoking, hypertension, diabetes and raised cholesterol, novel risk factors such as markers of a prothrombotic state, and as yet undiscovered risk factors all contribute to cardiovascular events [3]. Classification of complex human diseases has become an attractive research topic in bioinformatics, which requires evaluating the usefulness of the cases studied [4]. Advances in data mining and knowledge discovery bring together the latest research in statistics, databases, machine learning, and artificial intelligence to transform such data into useful knowledge [5]. The key idea is to use data mining techniques and machine learning approaches to discover consistent and useful patterns [6], assuming the data on arteriosclerosis fit in main memory. The set of relevant attributes is then used to compute (inductively learned) classifiers that can recognize variances and known patterns [7]. Two general data mining algorithms have been applied: the association rules algorithm and the classification algorithm.
In silico drug design consists of a collection of tools that help in designing new drug candidates by computational methods, at lower cost and in less time [8]. Drug design is an emerging area of science that uses computational approaches to answer biological problems [9, 10].

2. Materials and Methods

2.1. Data Collection
Data were collected from patients suffering from arteriosclerosis, drawn from various groups of people at various hospitals in Visakhapatnam, from 1st January to 28th February 2011.

2.2. Questionnaire on Arteriosclerosis
1. Name: __________
2. Age: __________
3. Gender: M/F __________
4. Height: __________
5. Weight: __________
6. Family background: __________
7. Is the person having smoking habit? __________
8. Is the person having drinking habit? __________
9. Does the person take coffee or tea? __________
10. How many times taking tea/coffee? __________
11. Any other problems present? Related to Diabetes
12. Any other problems present? Related to Obesity
13. Blood test Normal/Abnormal: __________
14. Cholesterol: __________
15. Medicines prescribed: __________
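The questionnaire above maps naturally onto one record per patient. The following is a minimal sketch, assuming one possible encoding of the 14 analyzed attributes (the name and the prescribed medicines are left out); the class name, field names, value types and the example values are illustrative assumptions and are not taken from the study.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class PatientRecord:
    """One questionnaire response. Field names are hypothetical, not from the study."""
    age: int
    gender: str                 # "M" or "F"
    height_ft: float            # the paper reports heights in feet
    weight_kg: float            # the paper reports weights in kg
    family_history: bool
    smoking: bool
    drinking: bool
    tea: bool
    coffee: bool
    cups_per_day: int           # number of times tea/coffee is taken daily
    diabetes: bool
    obesity: bool
    blood_test_normal: bool
    cholesterol: Optional[str]  # free-text entry on the form; None if left blank


def to_row(record: PatientRecord) -> dict:
    """Flatten a record into a plain dict, e.g. for export to a CSV or ARFF file."""
    return asdict(record)


# Invented example values for illustration only (not study data).
example = PatientRecord(
    age=64, gender="M", height_ft=5.5, weight_kg=68.0,
    family_history=False, smoking=False, drinking=True,
    tea=True, coffee=False, cups_per_day=3,
    diabetes=False, obesity=False, blood_test_normal=True,
    cholesterol=None,
)
print(to_row(example))
```

Keeping tea and coffee as separate fields mirrors the way the results report them as separate attributes.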
The results showed that the disease occurs mostly in older people (60-69 years). Forsdahl (1977) reported a significant positive correlation involving county age-adjusted mortality from arteriosclerotic heart disease in people aged 40 to 69 years [11]. Another study reported the highest incidence of atherosclerosis at 60-69 years (44%), although it reported a higher figure, 87.3%, for the lower age group (41-50 years) [12]. Family history showed that 18.5% of the patients had a hereditary background of arteriosclerosis, while nearly 81.5% showed no transfer of arteriosclerosis-related genes from previous generations. Hence arteriosclerosis may be related to both metabolic syndrome and genetic disease. Serum lipids in adolescence are primarily related to age and sex, but also to early determinants such as family history of cardiovascular disease, infant feeding, and early physical growth [13].
Figure 1: Data analysis related to arteriosclerosis
2.3. Materials
In this work, protein analysis and drug design were carried out with the following infrastructure.
1. System used: Intel Pentium 4 GHz, 2 GB RAM
2. Operating platform: Microsoft Windows XP Pro 2002 service pack
3. Software packages: HEX 5.1, Weka v3.6.3
4. Protein: 3SN6.PDB (the beta2 adrenergic receptor-Gs protein complex, used as the receptor)
Figure 2: Comparison of arteriosclerosis with different age groups
Hex is an interactive protein docking and molecular superposition program for calculating and displaying feasible docking modes of pairs of protein and DNA molecules. Hex can also calculate protein-ligand docking, assuming the ligand is rigid, and it can superpose pairs of molecules using only knowledge of their 3D shapes. The Waikato Environment for Knowledge Analysis (Weka) is a comprehensive suite of Java class libraries that has evolved substantially into a workbench for machine learning and now accompanies a text on data mining.

3. Results
Data were collected from patients suffering from arteriosclerosis in Visakhapatnam. The analysis of the data is presented in Figures 1 and 2.
Most of the patients with arteriosclerosis took tea rather than coffee. The data (Figure 1) showed that 93% of the people suffering from arteriosclerosis take tea, while 27.14% take coffee. Hence, the present study predicts that intake of the catechins, theanine and caffeine extracted from the leaves of Camellia sinensis promotes arteriosclerosis at older ages (60-69 years). An earlier study reported a slight risk reduction for coronary heart disease (CHD) mortality with moderate coffee consumption and strengthened the evidence for a lower risk of CHD with coffee and tea consumption [14]. Ryuichiro et al. (2004) demonstrated that green tea polyphenols are capable of preventing the development of proliferative diseases such as cancer and arteriosclerosis [15]. The present data suggest that tea can nevertheless be a risk factor at older ages. Coffee is a complex mixture of chemical and biological compounds that may have either beneficial or harmful effects on the cardiovascular system. Diterpenes present in unfiltered coffee, and caffeine, have each been shown to increase the risk of coronary heart disease [16].
Figure 3: Data representation from Weka, a machine learning software
Arteriosclerosis is also associated with type 2 diabetes mellitus (D2M) and obesity: about 21.4% and 20% of the patients had D2M and obesity, respectively. Interleukin-6, as a central mediator of cardiovascular risk, is associated with chronic inflammation, smoking, diabetes, and visceral obesity, and can be down-regulated by essential fatty acids, ethanol and pentoxifylline [17]. Figure 3 shows that age is also a major criterion for the development of arteriosclerosis. People aged 50.8-61.2 years are more prone to arteriosclerosis than younger people (below 40 years). People taller than 5.7 feet are less prone to arteriosclerosis; the greatest number of patients fall in the 5.4-5.6 feet height range. More males than females are prone to arteriosclerosis. A weight of 63.2-71.6 kg is associated with the disease at normal heights (5.4-5.6 feet), while a weight below 54.6 kg shows a lesser effect. Family history plays a lesser role in this disease. Most of the patients consumed tea/coffee 1 to 4 times daily; consuming tea or coffee 7 to 15 times daily showed a lesser effect. Tea and coffee consumption is also predicted to be bad for health at older ages. The blood test is not a major criterion in the diagnosis of arteriosclerosis: most of the patients showed normal blood levels. Cholesterol, diabetes, obesity, drinking, and smoking are also linked with arteriosclerosis. Based on the collected data, consuming alcohol was found to be more dangerous than smoking. FilteredAssociator predicted that all 14 attributes are associated with arteriosclerosis (age, height, gender, weight, family history, tea, coffee, number of times tea/coffee is consumed, blood test, cholesterol, diabetes, obesity, drinking and smoking).
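Association rule mining of the kind referred to above rests on two standard quantities, the support of an itemset and the confidence of a rule. The following is a minimal sketch of both, assuming each patient is represented as a set of attribute tags; the function names and the toy records are hypothetical and are not the study data, and the sketch does not reproduce Weka's FilteredAssociator.

```python
from typing import FrozenSet, Sequence

Transaction = FrozenSet[str]


def support(transactions: Sequence[Transaction], itemset: Transaction) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions: Sequence[Transaction],
               antecedent: Transaction,
               consequent: Transaction) -> float:
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0.0:
        return 0.0
    return support(transactions, antecedent | consequent) / base


# Invented toy records for illustration only (not study data).
toy = [
    frozenset({"tea", "age>=60", "male"}),
    frozenset({"tea", "age>=60"}),
    frozenset({"coffee", "male"}),
    frozenset({"tea", "male"}),
]

print(support(toy, frozenset({"tea"})))
print(confidence(toy, frozenset({"tea"}), frozenset({"age>=60"})))
```

A rule is normally reported only when both its support and its confidence exceed chosen thresholds.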
Based on CfsSubsetEval with the GeneticSearch method, the best selected attributes are height, blood test, obesity and drinking. PrincipalComponents with Ranker produced 11 attribute groups, ranked from highest to lowest as follows: weight + gender; times of tea/coffee consumption + family; tea + age; diabetes + obesity; cholesterol + blood test; coffee + cholesterol + obesity; drinking + height + obesity; diabetes + obesity + times of tea/coffee consumption; tea + family + coffee + age; blood test + age + height + obesity; gender + time + drinking. The data analyzed using a machine learning technique (BayesNet) showed 93% accuracy (Figure 4). The drugs recorded in the collected data were docked with a protein implicated in atherosclerosis, and this showed 100% predictions. The docking results indicate that the drugs prescribed to the patients are effective and anti-arteriosclerotic (Table 1). Betaloc produced good results compared with Ecosprin, Diclofenac, Dolo250 and Augmentin, based on Emin and R-values.
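Correlation-based feature subset selection scores a candidate subset by how strongly its attributes correlate with the class and how weakly they correlate with each other. The following is a minimal sketch of the commonly cited merit heuristic, merit = k * r_cf / sqrt(k + k(k-1) * r_ff), where k is the subset size, r_cf the mean attribute-class correlation and r_ff the mean attribute-attribute correlation. The function name and the correlation values are hypothetical, and the sketch is not Weka's CfsSubsetEval implementation.

```python
import math
from typing import Sequence


def cfs_merit(feature_class_corr: Sequence[float],
              mean_feature_feature_corr: float) -> float:
    """Merit of a subset: k * r_cf / sqrt(k + k * (k - 1) * r_ff)."""
    k = len(feature_class_corr)
    if k == 0:
        raise ValueError("the subset must contain at least one attribute")
    mean_rcf = sum(feature_class_corr) / k
    denom_sq = k + k * (k - 1) * mean_feature_feature_corr
    if denom_sq <= 0:
        raise ValueError("mean inter-attribute correlation is too negative")
    return k * mean_rcf / math.sqrt(denom_sq)


# Invented correlations for a four-attribute subset (not study data).
uncorrelated = cfs_merit([0.4, 0.4, 0.4, 0.4], mean_feature_feature_corr=0.0)
redundant = cfs_merit([0.4, 0.4, 0.4, 0.4], mean_feature_feature_corr=1.0)
print(uncorrelated, redundant)
```

The two calls show the design intent: four mutually uncorrelated attributes score higher than four fully redundant ones with the same attribute-class correlation.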
Figure 4: Data prediction and accuracy using BayesNet
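The accuracy figure reported for BayesNet is the fraction of records whose class is predicted correctly. The following is a minimal sketch of how such a figure can be obtained, using a categorical naive Bayes classifier (the simplest Bayesian network structure, swapped in here for Weka's BayesNet) and leave-one-out evaluation. The evaluation scheme, the class labels, the function names and the toy records are all assumptions made for illustration; the source does not state them.

```python
import math
from collections import Counter, defaultdict
from typing import List, Tuple

Row = Tuple[str, ...]


def train(rows: List[Row], labels: List[str]):
    """Count class frequencies and per-attribute value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (attribute index, label) -> value counts
    values_seen = defaultdict(set)        # attribute index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def predict(model, row: Row) -> str:
    """Return the label with the highest log-posterior under naive Bayes."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = "", -math.inf
    for label, n in sorted(class_counts.items()):
        score = math.log(n / total)
        for i, value in enumerate(row):
            # Laplace smoothing; an unseen value counts as one extra category.
            n_values = len(values_seen[i] | {value})
            score += math.log((value_counts[(i, label)][value] + 1) / (n + n_values))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def leave_one_out_accuracy(rows: List[Row], labels: List[str]) -> float:
    """Train on all records but one, test on the held-out record, and repeat."""
    correct = 0
    for i in range(len(rows)):
        model = train(rows[:i] + rows[i + 1:], labels[:i] + labels[i + 1:])
        if predict(model, rows[i]) == labels[i]:
            correct += 1
    return correct / len(rows)


# Invented toy records (beverage, age group) with hypothetical labels; not study data.
rows = [("tea", "old"), ("tea", "old"), ("coffee", "old"), ("tea", "old"),
        ("coffee", "young"), ("tea", "young"), ("coffee", "young"), ("coffee", "young")]
labels = ["case"] * 4 + ["control"] * 4
print(leave_one_out_accuracy(rows, labels))
```

Holding out the test record each time avoids scoring the classifier on data it was trained on.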
Table 1: Docking results

RECEPTOR                                           LIGAND       R VALUE   E.MAX     E.MIN
-------------------------------------------------  -----------  --------  --------  --------
The beta2 adrenergic receptor-Gs protein complex   BETALOC      11.2      -164.28   222.51
The beta2 adrenergic receptor-Gs protein complex   ECOSPRIN     22.4      -146.38   -254.40
The beta2 adrenergic receptor-Gs protein complex   DICLOFENAC   21.6      -152.42   -217.99
The beta2 adrenergic receptor-Gs protein complex   DOLO250      20.8      -136.02   -217.59
The beta2 adrenergic receptor-Gs protein complex   AUGMENTIN    22.4      -131.41   -185.5
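The ligands in Table 1 can be compared by sorting on any one of the reported columns. The following is a minimal sketch that holds the values exactly as printed in Table 1 and ranks them in ascending order. The class and function names are hypothetical, and because the source does not state how the Hex output maps onto these columns, the ranking column is left as a parameter.

```python
from typing import List, NamedTuple


class DockingResult(NamedTuple):
    ligand: str
    r_value: float
    e_max: float
    e_min: float


# Values copied as printed in Table 1.
TABLE_1 = [
    DockingResult("Betaloc", 11.2, -164.28, 222.51),
    DockingResult("Ecosprin", 22.4, -146.38, -254.40),
    DockingResult("Diclofenac", 21.6, -152.42, -217.99),
    DockingResult("Dolo250", 20.8, -136.02, -217.59),
    DockingResult("Augmentin", 22.4, -131.41, -185.5),
]


def rank_by(results: List[DockingResult], column: str) -> List[DockingResult]:
    """Sort ligands by one numeric column, lowest value first."""
    if column not in ("r_value", "e_max", "e_min"):
        raise ValueError(f"unknown numeric column: {column}")
    return sorted(results, key=lambda r: getattr(r, column))


for column in ("r_value", "e_max", "e_min"):
    print(column, [r.ligand for r in rank_by(TABLE_1, column)])
```

Printing all three orderings side by side makes it easy to see which column a given ranking depends on.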
4. Conclusions
Life is mortal, running from birth to death. Most complex cellular reactions depend on hereditary and metabolic factors. The present results showed that age, height, gender, weight, family history, tea, coffee, number of times tea/coffee is consumed, blood test, cholesterol, diabetes, obesity, drinking and smoking are the factors responsible for arteriosclerosis, based on data analysis and machine learning approaches. Betaloc produced good results based on Emin and R-values after docking (the best binding affinity between molecules).

5. Acknowledgement
The authors would like to thank GITAM University for providing lab facilities and access to e-journals to carry out the research.
6. References
[1] Alexander G, David DL, David M. Large-scale bayesian logistic regression for text categorization. Technometrics. 2007;49(3):291-304.
[2] Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X, Powell JI, Yang L, Marti GE, Moore T, Hudson J Jr, Lu L, Lewis DB, Tibshirani R, Sherlock G, Chan WC, Greiner TC, Weisenburger DD, Armitage JO, Warnke R, Levy R, Wilson W, Grever MR, Byrd JC, Botstein D, Brown PO, Staudt LM. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature. 2000;403(6769):503-511.
[3] Sonia SA, Salim Y, Vladmir V, Sudarshan D, Koon KT, Patricia AM, Linda K, Cheelong Y, Eva L, Hertzel G, Robert AH, Matthew M; for the SHARE Investigators. Differences in risk factors, atherosclerosis, and cardiovascular disease between ethnic groups in Canada: the Study of Health Assessment and Risk in Ethnic groups (SHARE). The Lancet. 2000;356(9226):279-284.
[4] Kaladhar DSVGK, Chandana B. Data mining, inference and prediction of Cancer datasets using learning algorithms. IJSAT. 2011;1(3):68-77.
[5] Usama F, Gregory PS, Padhraic S. From Data Mining to Knowledge Discovery in Databases. AI Magazine. 1996;17(3):37-53.
[6] Paul L, Douglas WO, Ronald NK. Textual Data Mining to Support Science and Technology Management. JIIS. 2000;15:99–119.
[7] Wenke L, Salvatore JS, Kui WM. Adaptive Intrusion Detection: A Data Mining Approach. Artificial Intelligence Review. 2000;14(6):533-567.
[8] Vincent Z, Aurélien G, Olivier M. Docking, virtual high throughput screening and in silico fragment-based drug design. J Cell Mol Med. 2009;13(2):238–248.
[9] Ravarapu KA, Dowluru SVGKK. Genomics, Proteomics and Drug designing approaches on Avian Leukemia Virus. JABAR. 2011;2(2):149-154.
[10] Kaladhar DSVGK. Computational studies of Sirtuins in the treatment of type II Diabetes mellitus. JPRHC. 2011;3(2):38-42.
[11] Forsdahl A. Are poor living conditions in childhood and adolescence an important risk factor for arteriosclerotic heart disease? Br J Prev Soc Med. 1977;31:91-95.
[12] Seyed ATY, Alireza R, Jafar BA, Aria H, Mohammad TS, Mahdi KS. Prevalence of Atherosclerotic Plaques in Autopsy Cases with Noncardiac Death. Iranian Journal of Pathology. 2009;4(3):101-104.
[13] Bergström E, Hernell O, Persson LÅ, Vessby B. Serum lipid values in adolescents are related to family history, infant feeding, and physical growth. Atherosclerosis. 1995;117(1):1-13.
[14] Margot de KGJ, Cuno SPMU, Yvonne TVS, Jolanda MAB, Diederick EG, Monique VWM, Joline WJB. Tea and Coffee Consumption and Cardiovascular Morbidity and Mortality. ATVB. 2010;30:1665-1671.
[15] Ryuichiro S, Takato U, Toru N, Masaharu S, Takuji T, Michio S. Green tea polyphenol epigallocatechin-3-gallate inhibits platelet-derived growth factor-induced proliferation of human hepatic stellate cell line LI90. Journal of Hepatology. 2004;40(1):52-59.
[16] Cornelis MC, El-Sohemy A. Coffee, caffeine, and coronary heart disease. Current Opinion in Lipidology. 2007;18(1):13-19.
[17] McCarty MF. Interleukin-6 as a central mediator of cardiovascular risk associated with chronic inflammation, smoking, diabetes, and visceral obesity: down-regulation with essential fatty acids, ethanol and pentoxifylline. Medical Hypotheses. 1999;52(5):465-477.