Hindawi Publishing Corporation BioMed Research International Volume 2014, Article ID 831751, 15 pages http://dx.doi.org/10.1155/2014/831751
Research Article
Supervised Clustering Based on DPClusO: Prediction of Plant-Disease Relations Using Jamu Formulas of KNApSAcK Database

Sony Hartono Wijaya,1,2 Husnawati Husnawati,3 Farit Mochamad Afendi,4 Irmanida Batubara,5 Latifah K. Darusman,5 Md. Altaf-Ul-Amin,1 Tetsuo Sato,1 Naoaki Ono,1 Tadao Sugiura,1 and Shigehiko Kanaya1

1 Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0192, Japan
2 Department of Computer Science, Bogor Agricultural University, Kampus IPB Dramaga, Jl. Meranti, Bogor 16680, Indonesia
3 Department of Biochemistry, Bogor Agricultural University, Kampus IPB Dramaga, Jl. Meranti, Bogor 16680, Indonesia
4 Department of Statistics, Bogor Agricultural University, Kampus IPB Dramaga, Jl. Meranti, Bogor 16680, Indonesia
5 Biopharmaca Research Center, Bogor Agricultural University, Kampus IPB Taman Kencana, Jl. Taman Kencana No. 3, Bogor 16151, Indonesia
Correspondence should be addressed to Shigehiko Kanaya; [email protected]

Received 30 November 2013; Accepted 18 February 2014; Published 7 April 2014

Academic Editor: Samuel Kuria Kiboi

Copyright © 2014 Sony Hartono Wijaya et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Indonesia has the largest medicinal plant species in the world and these plants are used as Jamu medicines. Jamu medicines are popular traditional medicines from Indonesia, and the formulation of Jamu needs to be systemized and its basic scientific principles developed to meet the requirements of the Indonesian Healthcare System. We propose a new approach to predict the relation between plant and disease using network analysis and supervised clustering. As a preliminary step, we assigned 3138 Jamu formulas to 116 diseases of the International Classification of Diseases (ver. 10), which belong to 18 classes of disease from the National Center for Biotechnology Information. The correlation measures between Jamu pairs were determined based on their ingredient similarity. Networks were constructed and analyzed by selecting highly correlated Jamu pairs. Clusters were then generated using the network clustering algorithm DPClusO. Using the matching score of a cluster, the dominant disease and the high-frequency plant associated with the cluster are determined. The plant-to-disease relations predicted by our method were evaluated in the context of previously published results and were found to produce around 90% successful predictions.
1. Introduction

Big data biology, which is a discipline of data-intensive science, has emerged because of the rapid increase of data in omics fields such as genomics, transcriptomics, proteomics, and metabolomics as well as in several other fields such as ethnomedicinal surveys. The number of medicinal plants is estimated to be 40,000 to 70,000 around the world [1] and many countries utilize these plants as blended herbal medicines, for example, China (traditional Chinese medicine), Japan (Kampo medicine), India (Ayurveda, Siddha, and Unani), and Indonesia (Jamu). Nowadays, the use
of traditional medicines is rapidly increasing [2, 3]. These medicines consist of ingredients made from plants, animals, minerals, or combinations of them. Traditional medicines have been used for generations to treat diseases or maintain health, and the most popular form of traditional medicine is herbal medicine. Blended herbal medicines as well as single herb medicines include a large number of constituent substances which exert effects on human physiology through a variety of biological pathways. The KNApSAcK Family database systems can be used to comprehensively understand the medicinal usage of plants based upon traditional and modern knowledge [4, 5]. This database has information about the selected herbal ingredients, that is, the formulas of Kampo and Jamu, omics information of plants and humans, and physiological activities in humans.

Table 1: List of diseases using International Classification of Diseases ver. 10 (class of disease IDs correspond to Table 2).

| ID | Disease | Class of disease |
|---|---|---|
| 1 | Abdominal pain | 3 |
| 2 | Abdominal pain, diarrhea | 3 |
| 3 | Acne | 16 |
| 4 | Acne, skin problems (cosmetics) | 16 |
| 5 | Amenorrhoea, dysmenorrhea | 6 |
| 6 | Amenorrhoea, irregular menstruation | 6 |
| 7 | Anaemia | 1 |
| 8 | Appendicitis, urinary tract infection, tonsillitis | 3 |
| 9 | Arthralgia | 11 |
| 10 | Arthralgia, arthritis | 11 |
| 11 | Asthma | 15 |
| 12 | Benign prostatic hyperplasia (Bph) | 10 |
| 13 | Breast disorder | 6 |
| 14 | Bromhidrosis | 16 |
| 15 | Bronchitis | 15 |
| 16 | Cancer | 2 |
| 17 | Cancer pain | 2 |
| 18 | Cancer, inflammation | 2 |
| 19 | Colic abdomen, bloating (in infant) | 3 |
| 20 | Common cold | 15 |
| 21 | Common cold, dyspepsia, insect bites | 15, 3, 16 |
| 22 | Common cold, influenza | 15 |
| 23 | Cough | 15 |
| 24 | Degenerative disease | 14 |
| 25 | Dermatitis, urticaria, erythema | 16 |
| 26 | Diabetes | 14 |
| 27 | Diabetic gangrene | 16 |
| 28 | Diarrhea | 3 |
| 29 | Diarrhea, abdominal pain | 3 |
| 30 | Diseases of the eye | 5 |
| 31 | Disorders in pregnancy | 6 |
| 32 | Dysmenorrhea | 6 |
| 33 | Dysmenorrhea, irregular menstruation | 6 |
| 34 | Dysmenorrhea, menstrual syndrome | 6 |
| 35 | Dyspepsia | 3 |
| 36 | Dyspnoea | 15 |
| 37 | Dyspnoea, cough, orthopnoea | 15 |
| 38 | Fatigue | 11 |
| 39 | Fatigue, anaemia, loss appetite | 1 |
| 40 | Fatigue, lack of sexual function | 6 |
| 41 | Fatigue, low back pain | 11 |
| 42 | Fatigue, myalgia, arthralgia | 11 |
| 43 | Fatigue, osteoarthritis | 11 |
| 44 | Fertility problem | 6, 10 |
| 45 | Fever | 0 |
| 46 | Gastritis, gastric ulcer | 3 |
| 47 | Haemorrhoids | 1 |
| 48 | Headache | 13 |
| 49 | Heart diseases | 8 |
| 50 | Heartburn | 3, 8 |
| 51 | Hepatitis, other diseases of liver | 3 |
| 52 | Hypercholesterolaemia | 14 |
| 53 | Hypertension | 8 |
| 54 | Hypertension, diabetes | 14 |
| 55 | Hypertension, hypercholesterolaemia | 14 |
| 56 | Hyperuricemia | 1 |
| 57 | Immunodefficiency | 9 |
| 58 | Indigestion (K.30) | 3 |
| 59 | Indigestion, lose appetite | 3 |
| 60 | Infertility | 6, 10 |
| 61 | Irregular menstruation, menstruation syndrome | 6 |
| 62 | Kidney diseases | 17 |
| 63 | Lactation problems | 6 |
| 64 | Leukorrhoea (Vaginalis) | 6 |
| 65 | Leukorrhoea (Vaginalis), dysmenorrhoea | 6 |
| 66 | Lose appetite | 3 |
| 67 | Lose appetite, underweight | 14 |
| 68 | Low back pain, myalgia, arthralgia | 11 |
| 69 | Low back pain, myalgia, constipation | 11 |
| 70 | Low back pain, urinary tract infection | 17 |
| 71 | Lung diseases | 15 |
| 72 | Malaise and Fatigue | 11 |
| 73 | Malaise and Fatigue, Constipation | 11 |
| 74 | Malaise and Fatigue, Fertility Problems | 10, 11 |
| 75 | Malaise and Fatigue, Low Back Pain | 11 |
| 76 | Malaise and Fatigue, Sexual Dysfunction | 11, 6, 10 |
| 77 | Malaise and Fatigue, Skin Problems (Cosmetics) | 16 |
| 78 | Malaria, anaemia | 1 |
| 79 | Meno-metrorrhagia | 6 |
| 80 | Menopausal syndrome | 6 |
| 81 | Menopause/menstrual syndrome, leukorrhoea (vaginalis) | 6 |
| 82 | Menstrual syndrome | 6 |
| 83 | Menstrual syndrome, fatigue | 6 |
| 84 | Migraine | 13 |
| 85 | Mood disorder | 18 |
| 86 | Myalgia, arthralgia | 11 |
| 87 | Nausea/vomiting of pregnancy | 6 |
| 88 | Osteoarthritis | 11 |
| 89 | Osteoarthritis, fatigue | 11 |
| 90 | Overweight, obesity | 14 |
| 91 | Paralysis | 13 |
| 92 | Post partum syndrome | 6 |
| 93 | Prevent from overweight | 14 |
| 94 | Respiratory infection due to smoking | 15 |
| 95 | Respiratory tract infection | 15 |
| 96 | Rheumatoid arthritis, gout | 11 |
| 97 | Secondary amenorrhea | 6 |
| 98 | Secondary amenorrhea, irregular menstruation | 6 |
| 99 | Sexual dysfunction, fatigue | 6, 10 |
| 100 | Skin diseases | 16 |
| 101 | Skin problems (cosmetics) | 16 |
| 102 | Sleeping and Mood Disorders | 18 |
| 103 | Sleeping disorders | 18 |
| 104 | Stomatitis | 3 |
| 105 | Stomatitis, gingivitis, tonsilitis | 3 |
| 106 | Stone in kidney (N20.0) | 17 |
| 107 | Stone in kidney (N20.0), urinary bladder stone (N21.0) | 17 |
| 108 | Tonsilitis | 4 |
| 109 | Tonsilofaringitis | 4 |
| 110 | Toothache | 13 |
| 111 | Typhoid, dyspepsia | 3 |
| 112 | Ulcer of anus and rectum | 3 |
| 113 | Underweight, lose appetite | 3 |
| 114 | Urinary tract infection (urethritis) | 17 |
| 115 | Vaginal discharges | 6 |
| 116 | Vaginal diseases | 6 |

Jamu is generally composed based on the experience of its users over decades or even hundreds of years. However, thorough scientific analyses are needed to support its efficacy and safety. Attaining this objective is in accordance with the 2010 policy of the Ministry of Health of the Indonesian Government on the scientification of Jamu. Thus, the formulations need to be systemized and basic scientific principles of Jamu developed to meet the requirements of the Indonesian Healthcare System. Afendi et al. initiated and conducted scientific analysis of Jamu to find the correlation between plants, Jamu, and their efficacy using statistical methods [6–8]. They used Biplot, partial least squares (PLS), and bootstrapping methods to summarize the data and also focused on prediction of Jamu formulations. These methods give a good understanding of the relationships between plants, Jamu, and their efficacy. Among 465 plants used in 3138 Jamu, 190 plants were shown to be effective for at least one efficacy and these plants were considered
to be the main ingredients of Jamu. The other 275 plants are considered to be supporting ingredients in Jamu because their efficacy has not been established yet.

Network biology can be defined as the study of the network representations of molecular interactions, both to analyze such networks and to use them as a tool to make biological predictions [9]. This study includes modelling, analysis, and visualization, which play an important role in life science today [10]. Network analysis has been increasingly utilized in interpreting high-throughput data on omics information, including transcriptional regulatory networks [11], coexpression networks [12], and protein-protein interactions [13]. A network makes it easy to describe relationships between entities and to concentrate on the part of the network consisting of important nodes or edges. These advantages can be adopted for analyzing the medicinal usage of plants in Jamu and diseases. Network analysis provides information about groups of Jamu that are closely related to each other in terms of ingredient similarity and thus allows precise investigation to relate plants to diseases. On the other hand, multivariate statistical methods such as PLS can assign plants to efficacy by global linear modeling of the Jamu ingredients and efficacy. However, there is still a lack of appropriate network-based methods to learn how and why many plants are grouped in a certain Jamu formula and the combination rules embedded in numerous Jamu formulas. It is therefore necessary to explore the relationship between Indonesian herbal plants used in Jamu medicines and the diseases which are treated using Jamu medicines. When the effectiveness of a plant against a disease is firmly established, further analysis of that plant can proceed to the molecular level to pinpoint the drug targets.

The present study developed a network-based approach for prediction of plant-disease relations. We utilized the Jamu data from the KNApSAcK database. 
A Jamu network was constructed based on the similarity of their ingredients and then Jamu clusters were generated using the network clustering algorithm DPClusO [14, 15]. Plant-disease relations were then predicted by determining the dominant diseases and plants associated with selected Jamu clusters.
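The network construction summarized above can be sketched in a few lines. This is only an illustrative sketch under stated assumptions: the formula names, plant names, and the `THRESHOLD` value are hypothetical toy inputs, not data or parameters from the study, and each formula is assumed to be a plain set of plants (presence or absence only, as Section 2.1 specifies). The `phi` helper uses the fourfold point correlation of Section 2.1, and the block also checks that it agrees with the ordinary Pearson correlation on the corresponding 0/1 vectors.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data: each Jamu formula is a set of plant names.
formulas = {
    "J1": {"ginger", "turmeric", "tamarind"},
    "J2": {"ginger", "turmeric"},
    "J3": {"fennel", "cinnamon"},
    "J4": {"fennel", "cinnamon", "clove"},
}
plants = sorted(set().union(*formulas.values()))


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def phi(set_x, set_y, universe):
    """Fourfold point correlation from the 2x2 presence/absence counts.

    Undefined (division by zero) if a formula contains no plant or every
    plant of the universe; the toy data avoids that case.
    """
    a = len(set_x & set_y)               # plants in both formulas
    b = len(set_x - set_y)               # plants only in X
    c = len(set_y - set_x)               # plants only in Y
    d = len(universe) - a - b - c        # plants in neither
    return (a * d - b * c) / sqrt((a + b) * (a + c) * (b + d) * (c + d))


def binary_vector(plant_set):
    """0/1 presence vector of a formula over the sorted plant list."""
    return [1 if p in plant_set else 0 for p in plants]


THRESHOLD = 0.5  # illustrative cutoff, not the value used in the paper

edges = []
for u, v in combinations(sorted(formulas), 2):
    r = phi(formulas[u], formulas[v], plants)
    # For 0/1 vectors the two formulas agree up to rounding error.
    r_pearson = pearson(binary_vector(formulas[u]), binary_vector(formulas[v]))
    assert abs(r - r_pearson) < 1e-12
    if r > THRESHOLD:
        edges.append((u, v, round(r, 3)))
```

With this toy input only the pairs J1/J2 and J3/J4 pass the cutoff, so the resulting network has two disconnected edges. Clustering such a network with DPClusO is outside the scope of this sketch.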
2. Methods

2.1. Concept of the Methodology. Jamu medicines consist of combinations of medicinal plants and are used to treat various diseases. In this work we exploit the ingredient similarity between Jamu medicines to predict plant-disease relations. The concept of the proposed method is depicted in Figure 1. In step 1 a network is constructed where a node is a Jamu medicine and an edge represents high ingredient similarity between the corresponding Jamu pair. In Figure 1, the nodes of the same color indicate the Jamu medicines used for the same disease. The similarity is represented by the Pearson correlation coefficient [16, 17]; that is,

$$\operatorname{corr}(X, Y) = \frac{\sum_{i=1}^{l} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{l} (x_i - \bar{x})^2 \sum_{i=1}^{l} (y_i - \bar{y})^2}}, \tag{1}$$
Table 2: Distribution of Jamu formulas according to 18 classes of disease (classes of diseases are determined by NCBI in ID1 to ID16 and by the present study in ID17 and ID18, represented by asterisks in the Ref. column).

| ID | Class of disease (NCBI) | Ref. | Number of Jamu | Percentage |
|---|---|---|---|---|
| 1 | Blood and lymph diseases | NCBI | 201 | 6.41 |
| 2 | Cancers | NCBI | 32 | 1.02 |
| 3 | The digestive system | NCBI | 457 | 14.56 |
| 4 | Ear, nose, and throat | NCBI | 2 | 0.06 |
| 5 | Diseases of the eye | NCBI | 1 | 0.03 |
| 6 | Female-specific diseases | NCBI | 382 | 12.17 |
| 7 | Glands and hormones | NCBI | 0 | — |
| 8 | The heart and blood vessels | NCBI | 57 | 1.82 |
| 9 | Diseases of the immune system | NCBI | 22 | 0.70 |
| 10 | Male-specific diseases | NCBI | 17 | 0.54 |
| 11 | Muscle and bone | NCBI | 649 | 20.68 |
| 12 | Neonatal diseases | NCBI | 0 | — |
| 13 | The nervous system | NCBI | 32 | 1.02 |
| 14 | Nutritional and metabolic diseases | NCBI | 576 | 18.36 |
| 15 | Respiratory diseases | NCBI | 313 | 9.97 |
| 16 | Skin and connective tissue | NCBI | 163 | 5.19 |
| 17 | The urinary system | * | 90 | 2.87 |
| 18 | Mental and behavioral disorders | * | 21 | 0.67 |
| | The number of Jamu classified into multiple disease classes | | 119 | 3.79 |
| | The number of Jamu unclassified | | 4 | 0.13 |
| | Total Jamu formulas | | 3138 | 100.00 |
where $x_i$ is the weight of plant $i$ in Jamu $X$, $y_i$ is the weight of plant $i$ in Jamu $Y$, $\bar{x}$ is the mean of Jamu $X$, and $\bar{y}$ is the mean of Jamu $Y$. The higher the similarity between a Jamu pair, the higher the correlation value. In the present study, $x_i$ and $y_i$ are set to 1 or 0 when the $i$th plant is, respectively, included or not included in the formula. Under this condition, the Pearson correlation corresponds to the fourfold point correlation coefficient; that is,

$$\operatorname{corr}(X, Y) = \frac{ad - bc}{\sqrt{(a + b)(a + c)(b + d)(c + d)}}, \tag{2}$$

where $a$, $b$, $c$, and $d$ represent the numbers of plants included in both $X$ and $Y$, in only $X$, in only $Y$, and in neither $X$ nor $Y$, respectively.

In step 2 the Jamu clusters are generated using the network clustering algorithm DPClusO. DPClusO can generate clusters characterized by high density and identified by periphery; that is, the Jamu medicines belonging to a cluster are highly cohesive and separated by a natural boundary. Such clusters contain potential information about plant-disease relations.

In step 3 we assess disease-dominant clusters based on the matching score represented by the following equation:

$$\text{matching score} = \frac{\text{number of Jamu belonging to the same disease}}{\text{total number of Jamu in the cluster}}. \tag{3}$$
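As a minimal sketch of Equation (3), the matching score of one cluster can be computed by counting disease labels. The cluster contents and the `THRESHOLD` value below are hypothetical, and the sketch assumes each Jamu carries exactly one disease label; Jamu assigned to several diseases would need extra handling that is not shown here.

```python
from collections import Counter


def matching_score(cluster_diseases):
    """Return (dominant disease, matching score) for one cluster.

    cluster_diseases: the disease label of each Jamu in the cluster.
    On ties, Counter.most_common keeps the label that was seen first.
    """
    counts = Counter(cluster_diseases)
    disease, top = counts.most_common(1)[0]
    return disease, top / len(cluster_diseases)


# Hypothetical cluster of five Jamu labelled with Table 1-style names.
cluster = ["Cough", "Cough", "Cough", "Asthma", "Fever"]
disease, score = matching_score(cluster)

THRESHOLD = 0.5  # illustrative cutoff, not the value used in the paper
assigned = disease if score > THRESHOLD else None
```

Here three of the five Jamu share the label "Cough", so the matching score is 3/5 and the cluster would be assigned to that disease under the illustrative cutoff.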
The matching score of a cluster is the ratio of the highest number of Jamu associated with a single disease to the total number of Jamu in the cluster. We assign a disease to a cluster when its matching score is greater than a threshold value.

In step 4, we determine the frequency of plants associated with a cluster if and only if a disease was assigned to it in the previous step. The highest-frequency plant associated with a cluster is considered to be related to the disease assigned to that cluster. The true positive rate (TPR), or sensitivity, was used to evaluate the resulting plants. TPR is the proportion of true positive predictions out of all actual positives, defined by the following formula [18]:

$$\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \tag{4}$$

where true positive (TP) is the number of correctly classified entities and false negative (FN) is the number of incorrectly rejected entities. We refer to the proposed method as supervised clustering because, after generation of the clusters, we narrow down the candidate clusters for further analysis based on supervised learning and thus improve the prediction accuracy of the proposed method.
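Step 4 and Equation (4) can be illustrated with a short sketch. The cluster contents and the TP/FN counts below are hypothetical placeholders chosen only to make the arithmetic visible; they are not results from the study, and the helper names `top_plant` and `true_positive_rate` are introduced here for illustration.

```python
from collections import Counter


def top_plant(cluster_formulas):
    """Most frequent plant across the Jamu formulas of one cluster.

    cluster_formulas: iterable of sets of plant names, one set per Jamu.
    Returns a (plant, frequency) pair.
    """
    counts = Counter(p for plant_set in cluster_formulas for p in plant_set)
    return counts.most_common(1)[0]


def true_positive_rate(tp, fn):
    """Equation (4): TP / (TP + FN)."""
    return tp / (tp + fn)


# Hypothetical cluster that was already assigned a disease in step 3.
cluster_formulas = [
    {"ginger", "turmeric"},
    {"ginger", "tamarind"},
    {"ginger", "turmeric", "clove"},
]
plant, freq = top_plant(cluster_formulas)

# Hypothetical evaluation counts, for illustration only.
tpr = true_positive_rate(tp=8, fn=2)
```

In this toy cluster "ginger" occurs in all three formulas and would be the plant related to the assigned disease; with 8 true positives and 2 false negatives the TPR is 0.8.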
3. Result and Discussion

3.1. Construction and Comparison of Jamu and Random Networks. We used the same number of Jamu formulas as in previous research [6], 3138 Jamu formulas, and the set union
[Figure 1 schematic. Input: Jamu formulas. Step 1: constructing ingredient correlation network. Step 2: extracting highly connected Jamu (clusters A, B, C, D). Step 3: supervised analysis for voting utilization. Step 4: listing ingredients. Output: plant-disease relations.]

Figure 1: Concept of the methodology: network construction based on ingredient similarity between individual Jamu medicines, network clustering, and classification of medicinal plants to dominant disease.
J00527 J02370J03060 J02872J02841 J01881 J03056 J01300 J01646 J00526 J02724 J00571 J01021 J00766 J02727 J01630 J01439 J02021 J01101 J02579 J00904
J02721
J00422
J02629
J02638
J02834
J01061
J03102
J02958
J00177 J02017
J00516 J02553
J01175J01453
J01844 J02311
J01935
J01932
J00040
J01652 J01728
J00932
J01941 J00523 J01660
J00769
J01437
J01119
J02675 J01872
J01981 J02745
J01758 J01982
J02709 J01759
J01996 J02719
J02023
J01082
Figure 2: The network consisting of 0.7% Jamu pairs (correlation value above or equal to 0.596).
J02096
J02667
J00615
J00697
Table 3: Statistics of three datasets.

| Parameters | 0.7% | 0.5% | 0.3% |
|---|---|---|---|
| Network statistics | | | |
| Total pairs | 34,454 | 24,610 | 14,766 |
| Minimum correlation | 0.596 | 0.665 | 0.718 |
| Number of Jamu formulas | 2,779 | 2,496 | 2,085 |
| Average degree | 24.8 | 19.7 | 14.2 |
| (Random network: ER) | (24.8 ± 0.0) | (19.7 ± 0.0) | (14.2 ± 0.0) |
| (Random network: BA) | (24.7 ± 0.1) | (19.7 ± 0.1) | (14.1 ± 0.1) |
| (Random network: CNN) | (24.7 ± 0.4) | (19.7 ± 0.4) | (14.0 ± 0.4) |
| Clustering coefficient | 0.521 | 0.520 | 0.540 |
| (Random network: ER) | (0.009 ± 0.000) | (0.008 ± 0.000) | (0.007 ± 0.000) |
| (Random network: BA) | (0.030 ± 0.001) | (0.028 ± 0.001) | (0.026 ± 0.001) |
| (Random network: CNN) | (0.246 ± 0.008) | (0.239 ± 0.008) | (0.233 ± 0.010) |
| Number of connected components | 69 | 119 | 254 |
| (Random networks: ER, BA, CNN) | (1) | (1) | (1) |
| Network diameter | 15 | 17 | 20 |
| (Random network: ER) | (4.0 ± 0.0) | (4.0 ± 0.0) | (5.0 ± 0.0) |
| (Random network: BA) | (10.8 ± 0.8) | (11.2 ± 1.5) | (10.8 ± 0.9) |
| (Random network: CNN) | (14.6 ± 1.9) | (14.1 ± 1.4) | (14.7 ± 1.3) |
| Network density | 0.008 | 0.008 | 0.007 |
| (Random network: ER) | (0.009 ± 0.000) | (0.008 ± 0.000) | (0.007 ± 0.000) |
| (Random network: BA) | (0.009 ± 0.000) | (0.008 ± 0.000) | (0.007 ± 0.000) |
| (Random network: CNN) | (0.009 ± 0.000) | (0.008 ± 0.000) | (0.007 ± 0.000) |
| DPClusO | | | |
| Total number of clusters | 1,746 | 1,411 | 938 |
| Number of clusters with more than 2 Jamu (%) | 1,296 (74.2) | 873 (61.9) | 453 (48.3) |
| Number of Jamu formulas in the biggest cluster | 118 | 104 | 89 |

of all formulas consists of 465 plants. We assigned 3138 Jamu formulas to 116 diseases of the International Classification of Diseases (ICD) version 10 from the World Health Organization (WHO, Table 1) [19]. Those 116 diseases are mapped to 18 classes of disease, which comprise 16 classes of disease from the National Center for Biotechnology Information (NCBI) [20] and 2 additional classes. Table 2 shows the distribution of the 3138 Jamu formulas over the 18 classes of disease. According to this classification, most Jamu formulas are used for relieving muscle and bone, nutritional and metabolic, and digestive system diseases. Furthermore, no Jamu formula is classified into the glands and hormones or the neonatal diseases classes. We excluded 4 Jamu formulas that are used to treat fever from the evaluation process because this symptom is very general and appears in almost all disease classes. Jamu-plant-disease relations can be represented using 2 matrices: the first is the Jamu-plant relation matrix with dimension 3138 × 465 and the second is the Jamu-disease relation matrix with dimension 3138 × 18. After completing the data acquisition process, we calculated the similarity between Jamu pairs using a correlation measure based on their ingredients. Corresponding to $K$ (3138 in the present case) Jamu formulas, there can be a maximum of $K(K-1)/2 = 3138 \times 3137/2 = 4{,}921{,}953$ Jamu
pairs. We sorted the Jamu pairs by correlation value in descending order and selected the top-$n$ (0.7%, 0.5%, and 0.3%) pairs of Jamu formulas to create 3 sets of Jamu pairs. The numbers of Jamu pairs for the 0.7%, 0.5%, and 0.3% datasets are 34,454, 24,610, and 14,766, and the corresponding minimum correlation values are 0.596, 0.665, and 0.718, respectively. The three datasets of Jamu pairs can be regarded as three undirected networks (step 1 in Figure 1) consisting of 2779, 2496, and 2085 Jamu formulas, respectively (Table 3). Figure 2 shows a visualization of the 0.7% Jamu network using the Cytoscape Spring Embedded layout. We verified that the degree distributions of the Jamu networks are somewhat close to those of scale-free networks, that is, they roughly follow a power law. However, in the high-degree region the power law structure is broken (Figure 3). A nearly exact power law relation between medicinal herbs and the number of formulas utilizing them was observed in the Jamu system but not in Kampo (the Japanese crude drug system) [4]. The difference in formulas between Jamu and Kampo can be explained by herb selection by medicinal researchers based on an optimization process of selection [4]. Thus, the broken power law structure of the Jamu networks is associated with the fact that selecting Jamu pairs based on ingredient correlation leads to nonrandom selection. We also constructed random networks according
Figure 3: Degree distributions of the three Jamu networks (panels: 0.7%, 0.5%, and 0.3%) roughly follow a power law. The x-axis (Deg.) corresponds to the log of the degree of a node in the Jamu network and the y-axis (Frequency) corresponds to the log of the number of Jamu.
to the Erdős-Rényi (ER) model [21], the Barabási-Albert (BA) model [22], and Vázquez's Connecting Nearest Neighbor (CNN) model [23], each of the same size as the corresponding real Jamu network. We used the Cytoscape NetworkAnalyzer plugin [24] and R software to analyze the characteristics of both the Jamu and the random networks. We determined five statistical indexes, that is, average degree, clustering coefficient, number of connected components, network diameter, and network density, for each Jamu network and for each random network. The clustering coefficient $C_n$ of a node $n$ is defined as $C_n = 2e_n/(k_n(k_n - 1))$, where $k_n$ is the number of neighbors of $n$ and $e_n$ is the number of connected pairs between all neighbors of $n$. The network diameter is the largest distance between any two nodes. If
a network is disconnected, its diameter is the maximum of all diameters of its connected components. A network's density is the ratio of the number of edges in the network to the total number of possible edges between all pairs of nodes (which is $n(n-1)/2$ for an undirected graph, where $n$ is the number of vertices). The average number of neighbors and the network density are the same for the real and random networks of the same size, as shown in Table 3. For the 0.7% and 0.5% real networks the clustering coefficient is roughly the same, and for the 0.3% network it is somewhat larger. The number of connected components and the diameter of the Jamu networks gradually decrease as the network grows bigger by the addition of more nodes and edges.
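The network construction and two of the indexes defined above can be illustrated with a minimal sketch. It assumes that the correlation measure is the Pearson correlation between binary ingredient vectors and that the top fraction is obtained by rounding; the function names (`pearson`, `top_pairs`, `adjacency`, `density`, `clustering_coefficient`) and the toy vectors are hypothetical and are not taken from the study, which used Cytoscape and R.

```python
from itertools import combinations
from math import sqrt


def pearson(u, v):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_pairs(vectors, fraction):
    """Score all K(K-1)/2 pairs and keep the top `fraction` by correlation."""
    scored = sorted(
        ((pearson(vectors[a], vectors[b]), a, b)
         for a, b in combinations(sorted(vectors), 2)),
        reverse=True,
    )
    keep = max(1, round(fraction * len(scored)))  # rounding rule is an assumption
    return [(a, b) for _, a, b in scored[:keep]]


def adjacency(edges):
    """Undirected adjacency sets built from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def density(adj):
    """Number of edges divided by the n(n-1)/2 possible edges."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return m / (n * (n - 1) / 2) if n > 1 else 0.0


def clustering_coefficient(adj, node):
    """C_n = 2 e_n / (k_n (k_n - 1)); taken as 0.0 when k_n < 2."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    e = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * e / (k * (k - 1))


# Toy ingredient vectors (rows: hypothetical formulas, columns: plants).
toy = {
    "J1": [1, 1, 0, 0, 0],
    "J2": [1, 1, 0, 0, 0],
    "J3": [1, 1, 1, 0, 0],
    "J4": [0, 0, 0, 1, 1],
}
edges = top_pairs(toy, 0.5)  # keeps 3 of the 6 possible pairs
network = adjacency(edges)
print(sorted(network), density(network), clustering_coefficient(network, "J1"))
```

With this rounding rule, 0.7%, 0.5%, and 0.3% of the 4,921,953 possible pairs give 34,454, 24,610, and 14,766 pairs, which agrees with Table 3. The diameter and the number of connected components are left out to keep the sketch short.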
Figure 4: Distribution of clusters based on matching score for the (a) 0.7%, (b) 0.5%, and (c) 0.3% datasets. The x-axis is the matching score and the y-axis is the number of clusters.
Figure 5: (a) Success rate (ratio of number of clusters to total clusters) and (b) number of predicted plants with respect to matching score thresholds, for the 0.7%, 0.5%, and 0.3% datasets.
The very different values of the clustering coefficient, number of connected components, and network diameter imply that the Jamu networks are quite different from all 3 types of random networks. The differences between the Jamu networks and the ER random networks are the largest, but random networks constructed with the other two models are also substantially different from the Jamu networks. Because the random networks from all three models differ from the Jamu networks, it can be concluded that the structure of the Jamu networks is reasonably biased and thus might contain certain information about plant-disease relations. In particular, the much higher clustering coefficient indicates that there are clusters in the networks worth investigating. To extract clusters from the Jamu networks (step 2 in Figure 1) we applied the DPClusO network clustering algorithm [14], which generates overlapping clusters based on density and periphery tracking.
3.2. Supervised Clustering Based on DPClusO. DPClusO is a general-purpose clustering algorithm that is useful for finding overlapping cohesive groups in an undirected simple graph
Table 4: List of plants assigned to each disease.

| Number | Plants name | Hit-miss status |
|---|---|---|
| A. Disease: blood and lymph diseases | | |
| 1 | Tamarindus indica | Hit |
| 2 | Allium sativum | Hit |
| 3 | Tinospora tuberculata | Hit |
| 4 | Piper retrofractum | Hit |
| 5 | Syzygium aromaticum | Hit |
| 6 | Bupleurum falcatum | Hit |
| 7 | Graptophyllum pictum | Hit |
| 8 | Plantago major | Hit |
| 9 | Zingiber officinale | Hit |
| 10 | Cinnamomum burmannii | Hit |
| 11 | Soya max | Miss |
| 12 | Kaempferia galanga | Hit |
| 13 | Curcuma longa | Hit |
| 14 | Piper nigrum | Hit |
| 15 | Zingiber aromaticum | Hit |
| 16 | Phyllanthus urinaria | Hit |
| 17 | Oryza sativa | Hit |
| 18 | Myristica fragrans | Hit |
| 19 | Alstonia scholaris | Hit |
| 20 | Syzygium polyanthum | Miss |
| 21 | Andrographis paniculata | Hit |
| 22 | Sida rhombifolia | Miss |
| 23 | Cyperus rotundus | Hit |
| 24 | Sonchus arvensis | Miss |
| 25 | Curcuma aeruginosa | Hit |
| 26 | Curcuma xanthorrhiza | Hit |
| B. Disease: cancers | | |
| 1 | Catharanthus roseus | Hit |
| C. Disease: the digestive system | | |
| 1 | Foeniculum vulgare | Hit |
| 2 | Glycyrrhiza uralensis | Hit |
| 3 | Imperata cylindrica | Hit |
| 4 | Zingiber purpureum | Hit |
| 5 | Physalis peruviana | Hit |
| 6 | Punica granatum | Hit |
| 7 | Echinacea purpurea | Hit |
| 8 | Zingiber officinale | Hit |
| 9 | Psidium guajava | Hit |
| 10 | Baeckea frutescens | Hit |
| 11 | Amomum compactum | Hit |
| 12 | Cinnamomum burmannii | Hit |
| 13 | Melaleuca leucadendra | Hit |
| 14 | Caesalpinia sappan | Hit |
| 15 | Parkia roxburghii | Hit |
| 16 | Rheum tanguticum | Hit |
| 17 | Kaempferia galanga | Hit |
| 18 | Coriandrum sativum | Hit |
| 19 | Curcuma longa | Hit |
| 20 | Zingiber aromaticum | Hit |
| 21 | Phyllanthus urinaria | Hit |
| 22 | Myristica fragrans | Hit |
| 23 | Hydrocotyle asiatica | Hit |
| 24 | Carica papaya | Hit |
| 25 | Mentha arvensis | Hit |
| 26 | Lepiniopsis ternatensis | Hit |
| 27 | Helicteres isora | Hit |
| 28 | Andrographis paniculata | Hit |
| 29 | Symplocos odoratissima | Hit |
| 30 | Schisandra chinensis | Hit |
| 31 | Blumea balsamifera | Hit |
| 32 | Silybum marianum | Hit |
| 33 | Cinnamomum sintoc | Hit |
| 34 | Elephantopus scaber | Hit |
| 35 | Curcuma aeruginosa | Hit |
| 36 | Kaempferia pandurata | Hit |
| 37 | Curcuma xanthorrhiza | Hit |
| 38 | Curcuma mangga | Hit |
| 39 | Curcuma zedoaria | Hit |
| 40 | Daucus carota | Hit |
| 41 | Matricaria chamomilla | Hit |
| 42 | Cymbopogon nardus | Hit |
| D. Disease: female-specific diseases | | |
| 1 | Foeniculum vulgare | Hit |
| 2 | Imperata cylindrica | Hit |
| 3 | Tamarindus indica | Hit |
| 4 | Pluchea indica | Hit |
| 5 | Piper retrofractum | Hit |
| 6 | Punica granatum | Hit |
| 7 | Uncaria rhynchophylla | Hit |
| 8 | Zingiber officinale | Hit |
| 9 | Guazuma ulmifolia | Hit |
| 10 | Nigella sativa | Hit |
| 11 | Terminalia bellirica | Hit |
| 12 | Baeckea frutescens | Hit |
| 13 | Phaseolus radiatus | Hit |
| 14 | Amomum compactum | Hit |
| 15 | Sauropus androgynus | Hit |
| 16 | Usnea misaminensis | Hit |
| 17 | Cinnamomum burmannii | Hit |
| 18 | Melaleuca leucadendra | Hit |
| 19 | Parameria laevigata | Hit |
| 20 | Parkia roxburghii | Hit |
| 21 | Piper cubeba | Hit |
| 22 | Kaempferia galanga | Hit |
| 23 | Coriandrum sativum | Hit |
| 24 | Kaempferia angustifolia | Hit |
| 25 | Curcuma longa | Hit |
| 26 | Zingiber aromaticum | Hit |
| 27 | Languas galanga | Hit |
| 28 | Galla lusitania | Hit |
| 29 | Quercus lusitanica | Hit |
| 30 | Hydrocotyle asiatica | Hit |
| 31 | Areca catechu | Hit |
| 32 | Lepiniopsis ternatensis | Hit |
| 33 | Helicteres isora | Hit |
| 34 | Piper betle | Hit |
| 35 | Elephantopus scaber | Hit |
| 36 | Kaempferia pandurata | Hit |
| 37 | Curcuma xanthorrhiza | Hit |
| 38 | Sesbania grandiflora | Hit |
| E. Disease: the heart and blood vessels | | |
| 1 | Allium sativum | Hit |
| 2 | Curcuma longa | Hit |
| 3 | Morinda citrifolia | Hit |
| 4 | Homalomena occulta | Hit |
| 5 | Hydrocotyle asiatica | Hit |
| 6 | Alstonia scholaris | Hit |
| 7 | Syzygium polyanthum | Miss |
| 8 | Andrographis paniculata | Hit |
| 9 | Apium graveolens | Miss |
| 10 | Imperata cylindrica | Hit |
| F. Disease: male-specific diseases | | |
| 1 | Cucurbita pepo | Miss |
| 2 | Serenoa repens | Miss |
| 3 | Baeckea frutescens | Hit |
| 4 | Phaseolus radiatus | Hit |
| 5 | Curcuma longa | Hit |
| 6 | Elephantopus scaber | Hit |
| G. Disease: muscle and bone | | |
| 1 | Foeniculum vulgare | Hit |
| 2 | Clausena anisum-olens | Hit |
| 3 | Zingiber purpureum | Hit |
| 4 | Allium sativum | Hit |
| 5 | Strychnos ligustrina | Hit |
| 6 | Tinospora tuberculata | Hit |
| 7 | Piper retrofractum | Hit |
| 8 | Syzygium aromaticum | Hit |
| 9 | Cola nitida | Hit |
| 10 | Ginkgo biloba | Hit |
| 11 | Panax ginseng | Hit |
| 12 | Equisetum debile | Hit |
| 13 | Zingiber officinale | Hit |
| 14 | Ganoderma lucidum | Hit |
| 15 | Nigella sativa | Hit |
| 16 | Terminalia bellirica | Hit |
| 17 | Baeckea frutescens | Hit |
| 18 | Amomum compactum | Hit |
| 19 | Cinnamomum burmannii | Hit |
| 20 | Melaleuca leucadendra | Hit |
| 21 | Parameria laevigata | Hit |
| 22 | Psophocarpus tetragonolobus | Hit |
| 23 | Parkia roxburghii | Hit |
| 24 | Piper cubeba | Hit |
| 25 | Kaempferia galanga | Hit |
| 26 | Coriandrum sativum | Hit |
| 27 | Cola acuminata | Hit |
| 28 | Coffea arabica | Hit |
| 29 | Orthosiphon stamineus | Hit |
| 30 | Curcuma longa | Hit |
| 31 | Piper nigrum | Hit |
| 32 | Alpinia galanga | Hit |
| 33 | Vitex trifolia | Hit |
| 34 | Zingiber amaricans | Hit |
| 35 | Zingiber zerumbet | Hit |
| 36 | Zingiber aromaticum | Hit |
| 37 | Languas galanga | Hit |
| 38 | Massoia aromatica | Hit |
| 39 | Morinda citrifolia | Hit |
| 40 | Carum copticum | Hit |
| 41 | Panax pseudoginseng | Hit |
| 42 | Oryza sativa | Hit |
| 43 | Myristica fragrans | Hit |
| 44 | Pandanus amaryllifolius | Hit |
| 45 | Eurycoma longifolia | Hit |
| 46 | Hydrocotyle asiatica | Hit |
| 47 | Areca catechu | Hit |
| 48 | Mentha arvensis | Hit |
| 49 | Lepiniopsis ternatensis | Hit |
| 50 | Pimpinella pruatjan | Hit |
| 51 | Andrographis paniculata | Hit |
| 52 | Blumea balsamifera | Hit |
| 53 | Cymbopogon nardus | Hit |
| 54 | Sida rhombifolia | Hit |
| 55 | Cinnamomum sintoc | Hit |
| 56 | Piper betle | Hit |
| 57 | Talinum paniculatum | Hit |
| 58 | Elephantopus scaber | Hit |
| 59 | Cyperus rotundus | Hit |
| 60 | Curcuma aeruginosa | Hit |
| 61 | Kaempferia pandurata | Hit |
| 62 | Curcuma xanthorrhiza | Hit |
| 63 | Tribulus terrestris | Hit |
| 64 | Corydalis yanhusuo | Hit |
| 65 | Pausinystalia yohimbe | Hit |
| H. Disease: nutritional and metabolic diseases | | |
| 1 | Foeniculum vulgare | Hit |
| 2 | Glycyrrhiza uralensis | Hit |
| 3 | Zingiber purpureum | Hit |
| 4 | Allium sativum | Hit |
| 5 | Tinospora tuberculata | Hit |
| 6 | Pandanus conoideus | Hit |
| 7 | Syzygium aromaticum | Hit |
| 8 | Punica granatum | Hit |
| 9 | Zingiber officinale | Hit |
| 10 | Guazuma ulmifolia | Hit |
| 11 | Nigella sativa | Hit |
| 12 | Amomum compactum | Hit |
| 13 | Cinnamomum burmannii | Hit |
| 14 | Parameria laevigata | Hit |
| 15 | Caesalpinia sappan | Hit |
| 16 | Soya max | Hit |
| 17 | Cocos nucifera | Hit |
| 18 | Rheum tanguticum | Hit |
| 19 | Piper cubeba | Hit |
| 20 | Murraya paniculata | Hit |
| 21 | Kaempferia galanga | Hit |
| 22 | Coffea arabica | Hit |
| 23 | Orthosiphon stamineus | Hit |
| 24 | Curcuma longa | Hit |
| 25 | Piper nigrum | Hit |
| 26 | Zingiber aromaticum | Hit |
| 27 | Aloe vera | Hit |
| 28 | Phaleria papuana | Hit |
| 29 | Galla lusitania | Hit |
| 30 | Quercus lusitanica | Hit |
| 31 | Morinda citrifolia | Hit |
| 32 | Myristica fragrans | Hit |
| 33 | Momordica charantia | Hit |
| 34 | Areca catechu | Hit |
| 35 | Lepiniopsis ternatensis | Hit |
| 36 | Alstonia scholaris | Hit |
| 37 | Hibiscus sabdariffa | Hit |
| 38 | Laminaria japonica | Hit |
| 39 | Syzygium polyanthum | Hit |
| 40 | Andrographis paniculata | Hit |
| 41 | Sindora sumatrana | Hit |
| 42 | Cassia angustifolia | Hit |
| 43 | Woodfordia floribunda | Hit |
| 44 | Piper betle | Hit |
| 45 | Spirulina | Hit |
| 46 | Stevia rebaudiana | Hit |
| 47 | Theae sinensis | Hit |
| 48 | Sonchus arvensis | Hit |
| 49 | Curcuma heyneana | Hit |
| 50 | Curcuma aeruginosa | Hit |
| 51 | Kaempferia pandurata | Hit |
| 52 | Curcuma xanthorrhiza | Hit |
| 53 | Curcuma zedoaria | Hit |
| 54 | Olea europaea | Hit |
| I. Disease: respiratory diseases | | |
| 1 | Foeniculum vulgare | Hit |
| 2 | Clausena anisum-olens | Hit |
| 3 | Glycyrrhiza uralensis | Hit |
| 4 | Zingiber purpureum | Hit |
| 5 | Piper retrofractum | Hit |
| 6 | Syzygium aromaticum | Hit |
| 7 | Gaultheria punctata | Hit |
| 8 | Panax ginseng | Hit |
| 9 | Equisetum debile | Hit |
| 10 | Zingiber officinale | Hit |
| 11 | Citrus aurantium | Hit |
| 12 | Nigella sativa | Hit |
| 13 | Amomum compactum | Hit |
| 14 | Cinnamomum burmannii | Hit |
| 15 | Melaleuca leucadendra | Hit |
| 16 | Parkia roxburghii | Hit |
| 17 | Cocos nucifera | Hit |
| 18 | Piper cubeba | Hit |
| 19 | Kaempferia galanga | Hit |
| 20 | Coriandrum sativum | Hit |
| 21 | Curcuma longa | Hit |
| 22 | Piper nigrum | Hit |
| 23 | Zingiber aromaticum | Hit |
| 24 | Languas galanga | Hit |
| 25 | Mentha piperita | Hit |
| 26 | Oryza sativa | Hit |
| 27 | Myristica fragrans | Hit |
| 28 | Pandanus amaryllifolius | Hit |
| 29 | Hydrocotyle asiatica | Hit |
| 30 | Mentha arvensis | Hit |
| 31 | Lepiniopsis ternatensis | Hit |
| 32 | Helicteres isora | Hit |
| 33 | Blumea balsamifera | Hit |
| 34 | Cymbopogon nardus | Hit |
| 35 | Piper betle | Hit |
| 36 | Curcuma xanthorrhiza | Hit |
| 37 | Salix alba | Hit |
| 38 | Matricaria chamomilla | Miss |
| J. Disease: skin and connective tissue | | |
| 1 | Strychnos ligustrina | Hit |
| 2 | Merremia mammosa | Hit |
| 3 | Piper retrofractum | Hit |
| 4 | Santalum album | Hit |
| 5 | Zingiber officinale | Hit |
| 6 | Citrus aurantium | Hit |
| 7 | Citrus hystrix | Hit |
| 8 | Cassia siamea | Hit |
| 9 | Cocos nucifera | Hit |
| 10 | Trigonella foenum-graecum | Hit |
| 11 | Orthosiphon stamineus | Hit |
| 12 | Curcuma longa | Hit |
| 13 | Vetiveria zizanioides | Hit |
| 14 | Aloe vera | Hit |
| 15 | Rosa chinensis | Hit |
| 16 | Jasminum sambac | Hit |
| 17 | Phyllanthus urinaria | Hit |
| 18 | Mentha piperita | Hit |
| 19 | Oryza sativa | Hit |
| 20 | Myristica fragrans | Hit |
| 21 | Hydrocotyle asiatica | Hit |
| 22 | Lepiniopsis ternatensis | Hit |
| 23 | Alstonia scholaris | Hit |
| 24 | Andrographis paniculata | Hit |
| 25 | Cymbopogon nardus | Hit |
| 26 | Piper betle | Hit |
| 27 | Theae sinensis | Hit |
| 28 | Curcuma heyneana | Hit |
| 29 | Kaempferia pandurata | Hit |
| 30 | Curcuma xanthorrhiza | Hit |
| 31 | Melaleuca leucadendra | Hit |
| 32 | Matricaria chamomilla | Miss |
| K. Disease: the urinary system | | |
| 1 | Foeniculum vulgare | Hit |
| 2 | Imperata cylindrica | Hit |
| 3 | Strychnos ligustrina | Hit |
| 4 | Plantago major | Hit |
| 5 | Zingiber officinale | Hit |
| 6 | Cinnamomum burmannii | Hit |
| 7 | Strobilanthes crispus | Hit |
| 8 | Kaempferia galanga | Hit |
| 9 | Orthosiphon stamineus | Hit |
| 10 | Phyllanthus urinaria | Hit |
| 11 | Blumea balsamifera | Hit |
| 12 | Sonchus arvensis | Hit |
| 13 | Curcuma xanthorrhiza | Hit |

∗ indicates that the plant will not be assigned if we use matching score >0.7; the positions of the ∗ markers within the table are not recoverable from the source layout.

Figure 6: Distribution of 135 plants assigned based on 0.7% dataset with respect to the number of diseases they are assigned to. The x-axis is the number of diseases (1 to 9) and the y-axis is the number of plants.
for any type of application. It ensures coverage and performs robustly in case of random addition, removal, and rearrangement of edges in protein-protein interaction (PPI) networks [14]. When applying DPClusO, the parameter values of density and cluster property that we used in this experiment are 0.9 and 0.5, respectively [15]. Table 3 shows a summary of the clustering results by DPClusO. Because clusters consisting of two Jamu formulas are trivial, for the next steps we only use clusters that consist of 3 or more Jamu formulas. The total number of clusters increases with the larger dataset, although the threshold correlation between Jamu pairs decreases. We evaluated the clustering result using the matching score to determine the dominant disease for every cluster (step 3 in Figure 1). The matching score of a cluster is the ratio of the highest number of Jamu associated with the same disease to the total number of Jamu in the cluster. Thus the matching score indicates how strongly a disease is associated with a cluster. Figure 4 shows the distribution of the clusters with respect to matching score for the three datasets. All datasets have the highest frequency of clusters at matching score >0.9, and overall most of the clusters have a high matching score, which means most of the DPClusO-generated clusters can be confidently related to a dominant disease. Furthermore, in the 0.3% dataset the number of clusters with matching score >0.9 is remarkably larger than in the other ranges of matching score (Figure 4(c)). If we compare the ratio of clusters with matching score >0.9 across datasets, the 0.3% dataset has the highest ratio with 40.84% (of 453), compared to 29.67% (of 873) and 21.91% (of 1296) for the 0.5% and 0.7% datasets, respectively. Thus, the most reliable species-to-disease relations can be predicted at matching score >0.9 from the clusters generated from the 0.3% dataset.
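The matching score defined in Section 3.2 can be illustrated with a minimal sketch. It assumes that each Jamu formula carries exactly one disease-class label; the names `matching_score`, `score_clusters`, and `disease_of`, as well as the toy labels, are hypothetical and do not come from the study.

```python
from collections import Counter


def matching_score(cluster, disease_of):
    """Return (dominant disease, matching score) for one cluster of Jamu IDs."""
    counts = Counter(disease_of[j] for j in cluster)
    disease, top = counts.most_common(1)[0]
    return disease, top / len(cluster)


def score_clusters(clusters, disease_of, min_size=3):
    """Score every cluster that has at least `min_size` formulas."""
    return [matching_score(c, disease_of) for c in clusters if len(c) >= min_size]


# Hypothetical labels: three formulas of one disease class and one of another.
disease_of = {"J1": "D11", "J2": "D11", "J3": "D11", "J4": "D14"}
clusters = [["J1", "J2", "J3", "J4"], ["J1", "J4"]]
print(score_clusters(clusters, disease_of))  # the two-formula cluster is skipped
```

In the toy example, three of the four formulas in the first cluster share a disease class, so its matching score is 0.75; the second cluster is ignored because clusters of two formulas are treated as trivial.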
Figure 5(a) shows the success rate for all 3 datasets with respect to threshold matching scores. Success rate is defined as the ratio of the number of clusters with matching score larger than the threshold to the total number of clusters. As expected, the success rate tends to be lower when we decrease the correlation value used to create the datasets. However, more clusters are generated and more information can be extracted when we lower the threshold correlation value. The success rate increases rapidly as the matching score decreases
from 0.9 to 0.6, and after that the slope of the increase becomes smaller. Therefore, in this study we empirically chose 0.6 as the threshold matching score to predict plant-disease relations.
3.3. Assignment of Plants to Disease. Using the clusters produced by DPClusO, we assigned plants to classes of disease. Based on a threshold matching score we assigned a dominant disease to a cluster. Then we assigned a plant to the cluster by analyzing the ingredients of the Jamu formulas belonging to that cluster and determining the highest frequency plant, that is, the plant that is used in the maximum number of Jamu belonging to that cluster (step 4 in Figure 1). Thus we assign a disease and a plant to each cluster having matching score greater than a threshold. Our hypothesis is that the disease and the plant assigned to the same cluster are related. The total number of assigned plants depends on the matching score value. Figure 5(b) shows the number of predicted plants that can be assigned to diseases as a function of the matching score. With a higher matching score value, the number of predicted plants assigned to classes of disease is expected to remain similar or decrease, but the reliability of prediction increases. In Figure 5(b) a sudden change in the number of predicted plants is seen at matching score 0.6, which we use as the empirical threshold in this work. Based on the 0.7% dataset, the largest number of plants (135 plants, Table 4) was assigned to diseases. There are 63 plants assigned to only one class of disease, whereas the other 72 plants are assigned to two or more classes of disease (Figure 6).

Table 5: Relation between disease classes in NCBI and efficacy classes reported by Afendi et al. [6].

| Class of disease | Ref. | Efficacy class |
|---|---|---|
| D1 Blood and lymph diseases | NCBI | E7 Pain/inflammation (PIN) |
| D2 Cancers | NCBI | E7 Pain/inflammation (PIN) |
| D3 The digestive system | NCBI | E4 Gastrointestinal disorders (GST); E7 Pain/inflammation (PIN) |
| D4 Ear, nose, and throat | NCBI | E7 Pain/inflammation (PIN) |
| D5 Diseases of the eye | NCBI | E7 Pain/inflammation (PIN) |
| D6 Female-specific diseases | NCBI | E5 Female reproductive organ problems (FML) |
| D7 Glands and hormones | NCBI | E7 Pain/inflammation (PIN) |
| D8 The heart and blood vessels | NCBI | E7 Pain/inflammation (PIN) |
| D9 Diseases of the immune system | NCBI | E7 Pain/inflammation (PIN) |
| D10 Male-specific diseases | NCBI | E6 Musculoskeletal and connective tissue disorders (MSC) |
| D11 Muscle and bone | NCBI | E6 Musculoskeletal and connective tissue disorders (MSC) |
| D12 Neonatal diseases | NCBI | E7 Pain/inflammation (PIN) |
| D13 The nervous system | NCBI | E7 Pain/inflammation (PIN) |
| D14 Nutritional and metabolic diseases | NCBI | E2 Disorders of appetite (DOA); E4 Gastrointestinal disorders (GST) |
| D15 Respiratory diseases | NCBI | E8 Respiratory disease (RSP); E7 Pain/inflammation (PIN) |
| D16 Skin and connective tissue | NCBI | E9 Wounds and skin infections (WND) |
| D17 The urinary system | ∗ | E1 Urinary related problems (URI) |
| D18 Mental and behavioural disorders | ∗ | E3 Disorders of mood and behavior (DMB) |
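The success rate and the plant assignment step of Section 3.3 can be sketched as follows. This is only an illustration under stated assumptions: each formula has one disease label, the highest frequency plant is the one used by the most formulas in a cluster, and ties between plants are broken alphabetically (the source does not say how ties are handled). The names `success_rate`, `assign_plants`, `disease_of`, `plants_of`, and the toy data are hypothetical.

```python
from collections import Counter


def success_rate(scores, threshold):
    """Fraction of clusters whose matching score exceeds `threshold`."""
    return sum(s > threshold for s in scores) / len(scores)


def assign_plants(clusters, disease_of, plants_of, threshold=0.6):
    """Collect, per dominant disease, the top plant of each qualifying cluster."""
    relations = {}
    for cluster in clusters:
        counts = Counter(disease_of[j] for j in cluster)
        disease, top = counts.most_common(1)[0]
        if top / len(cluster) <= threshold:
            continue  # cluster is not confidently tied to one disease
        # Count each plant once per formula that uses it.
        usage = Counter(p for j in cluster for p in set(plants_of[j]))
        # Highest count wins; ties are broken alphabetically (an assumption).
        plant = min(usage, key=lambda p: (-usage[p], p))
        relations.setdefault(disease, set()).add(plant)
    return relations


# Made-up formulas, disease labels, and ingredient lists.
disease_of = {"J1": "D11", "J2": "D11", "J3": "D11",
              "J4": "D14", "J5": "D3", "J6": "D16"}
plants_of = {"J1": ["plant_a", "plant_b"], "J2": ["plant_a"],
             "J3": ["plant_a", "plant_c"], "J4": ["plant_b"],
             "J5": ["plant_d"], "J6": ["plant_e"]}
clusters = [["J1", "J2", "J3", "J4"], ["J4", "J5", "J6"]]
print(assign_plants(clusters, disease_of, plants_of))
print(success_rate([0.75, 1 / 3], 0.6))
```

Only the first toy cluster passes the 0.6 threshold, so a single relation (D11, plant_a) is produced and the success rate over the two clusters is 0.5.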
3.4. Evaluation of the Supervised Clustering Based on DPClusO. We used previously published results [6] as the gold standard to evaluate our results. The previous study assigned plants to 9 kinds of efficacy, whereas we assigned the plants to 18 disease classes (16 from NCBI and 2 additional classes). For the sake of evaluation, a professional doctor mapped the 18 disease classes to the 9 efficacy classes, as shown in Table 5. Table 6 shows the prediction results of plant-disease relations for all 3 datasets, corresponding to clusters with matching score greater than 0.6. Table 6 also shows the corresponding efficacy, the number of assigned plants, the number of correctly predicted plants, and the true positive rate (TPR). We determined the TPR corresponding to a disease/efficacy class by calculating the ratio of the number of correct predictions to the number of all predictions. When a disease corresponds to more than one kind of efficacy, the highest TPR is taken as the TPR for the corresponding disease. For all 3 datasets the TPR corresponding to each disease is roughly 90% or more. The 0.3% dataset consists of Jamu pairs with higher correlation values, and based on this dataset 117 plants are assigned to 14 disease classes. The 0.7% dataset contains more Jamu pairs and assigns plants to 11 disease classes, one fewer than the 0.5% dataset. The two disease classes covered by the 0.3% dataset but not by the 0.5% and 0.7% datasets are the nervous system (D13) and diseases of the immune system (D9). The only disease class covered by the 0.3% and 0.5% datasets but not by the 0.7% dataset is mental and behavioural disorders (D18). The larger dataset network tends to have
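Table 6: The prediction result of plant-disease relations using matching score >0.6. For each dataset the columns give the number of assigned plants, the number of correct predictions, and the true positive rate (TPR).

| Class of disease | Corresponding efficacy | 0.7%: assigned | 0.7%: correct | 0.7%: TPR | 0.5%: assigned | 0.5%: correct | 0.5%: TPR | 0.3%: assigned | 0.3%: correct | 0.3%: TPR |
|---|---|---|---|---|---|---|---|---|---|---|
| D1 | E7 | 26 | 22 | 0.85 | 24 | 20 | 0.83 | 24 | 20 | 0.83 |
| D2 | E7 | 1 | 1 | 1.00 | 5 | 5 | 1.00 | 1 | 1 | 1.00 |
| D3 | E4 | 42 | 42 | 1.00 | 33 | 33 | 1.00 | 28 | 28 | 1.00 |
| D3 | E7 | 42 | 38 | 0.90 | 33 | 30 | 0.91 | 28 | 25 | 0.89 |
| D4 | E7 | 0 | 0 | - | 0 | 0 | - | 0 | 0 | - |
| D5 | E7 | 0 | 0 | - | 0 | 0 | - | 0 | 0 | - |
| D6 | E5 | 38 | 38 | 1.00 | 37 | 37 | 1.00 | 32 | 32 | 1.00 |
| D7 | E7 | 0 | 0 | - | 0 | 0 | - | 0 | 0 | - |
| D8 | E7 | 10 | 8 | 0.80 | 8 | 7 | 0.88 | 6 | 5 | 0.83 |
| D9 | E7 | 0 | 0 | - | 0 | 0 | - | 1 | 1 | 1.00 |
| D10 | E6 | 6 | 4 | 0.67 | 2 | 0 | - | 3 | 1 | 0.33 |
| D11 | E6 | 65 | 65 | 1.00 | 71 | 71 | 1.00 | 60 | 60 | 1.00 |
| D12 | E7 | 0 | 0 | - | 0 | 0 | - | 0 | 0 | - |
| D13 | E7 | 0 | 0 | - | 0 | 0 | - | 5 | 5 | 1.00 |
| D14 | E2 | 54 | 44 | 0.81 | 45 | 36 | 0.80 | 35 | 26 | 0.74 |
| D14 | E4 | 54 | 54 | 1.00 | 45 | 45 | 1.00 | 35 | 35 | 1.00 |
| D15 | E7 | 38 | 37 | 0.97 | 34 | 34 | 1.00 | 33 | 33 | 1.00 |
| D15 | E8 | 38 | 31 | 0.82 | 34 | 30 | 0.88 | 33 | 29 | 0.88 |
| D16 | E9 | 32 | 31 | 0.97 | 32 | 32 | 1.00 | 27 | 27 | 1.00 |
| D17 | E1 | 13 | 13 | 1.00 | 9 | 9 | 1.00 | 8 | 8 | 1.00 |
| D18 | E3 | 0 | 0 | - | 5 | 5 | 1.00 | 4 | 4 | 1.00 |
| Total assigned plants | | 135 | | | 129 | | | 117 | | |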
lower coverage of disease classes. The number of Jamu pairs, that is, the number of edges in the network, affects the number of clusters produced by DPClusO and the number of Jamu formulas per cluster. As a consequence, for the larger dataset networks, the success rate and the coverage of disease classes are lower, but more plant-disease relations can be predicted.
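The TPR calculation of Section 3.4 can be sketched as below. It follows the definition given in the text (correct predictions divided by all predictions) and the rule that the highest TPR is taken when a disease class maps to several efficacy classes. The names `true_positive_rate`, `disease_tpr`, and `gold_by_efficacy`, and the toy plant sets, are hypothetical.

```python
def true_positive_rate(predicted, gold):
    """Share of predicted plants that also appear in the gold-standard set."""
    if not predicted:
        return None  # no prediction, so the rate is undefined
    return len(predicted & gold) / len(predicted)


def disease_tpr(predicted, efficacy_classes, gold_by_efficacy):
    """Highest TPR over the efficacy classes that a disease class is mapped to."""
    rates = [true_positive_rate(predicted, gold_by_efficacy.get(e, set()))
             for e in efficacy_classes]
    rates = [r for r in rates if r is not None]
    return max(rates) if rates else None


# Made-up prediction for a disease class mapped to two efficacy classes.
predicted = {"plant_a", "plant_b", "plant_c", "plant_d"}
gold_by_efficacy = {
    "E2": {"plant_a", "plant_b", "plant_c"},
    "E4": {"plant_a", "plant_b", "plant_c", "plant_d", "plant_e"},
}
print(disease_tpr(predicted, ["E2", "E4"], gold_by_efficacy))
```

In the toy case the rates are 0.75 against E2 and 1.0 against E4, so the disease-level value is 1.0. The same arithmetic reproduces the D14 entries for the 0.7% dataset in Table 6: 44/54 rounds to 0.81 and 54/54 is 1.00.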
4. Conclusions
This paper introduces a novel method called supervised clustering for analyzing big biological data by integrating network clustering and selection of clusters based on supervised learning. In the present work we applied the method to data mining of Jamu formulas accumulated in the KNApSAcK database. Jamu networks were constructed based on correlation similarities between Jamu formulas, and then the network clustering algorithm DPClusO was applied to generate high-density Jamu modules. For the subsequent analysis, potential clusters were selected by supervised learning. The successful clusters containing several Jamu related to the same disease might be useful for finding the main ingredient plant for that disease, and the clusters with lower matching score will be associated with varying plants, which might be supporting ingredients. By applying the proposed method, important plants from Jamu formulas for every class of disease were determined. The plant-to-disease relations predicted by the proposed network-based method were evaluated in the context of previously published results and were found to produce a TPR of 90%. For the larger dataset networks, the success rate and the coverage of disease classes become lower, but more plant-disease relations can be predicted.
Conflict of Interests
The authors declare that there is no financial interest or conflict of interests regarding the publication of this paper.
Acknowledgments
This work was supported by the National Bioscience Database Center in Japan and the Ministry of Education, Culture, Sports, Science, and Technology of Japan (Grant-in-Aid for Scientific Research on Innovation Areas “Biosynthetic Machinery. Deciphering and Regulating the System for Creating Structural Diversity of Bioactivity Metabolites (2007)”).
References
[1] R. Verporte, H. K. Kim, and Y. H. Choi, “Plants as source of medicines,” in Medicinal and Aromatic Plants, R. J. Boger, L. E. Craker, and D. Lange, Eds., chapter 19, pp. 261–273, 2006.
[2] A. Furnharm, “Why do people choose and use complementary therapies?” in Complementary Medicine: An Objective Appraisal, E. Ernst, Ed., pp. 71–88, Butterworth-Heinemann, Oxford, UK, 1996.
[3] E. Ernst, “Herbal medicines put into context,” British Medical Journal, vol. 327, no. 7420, pp. 881–882, 2003.
[4] F. M. Afendi, T. Okada, M. Yamazaki et al., “KNApSAcK family databases: integrated metabolite–plant species databases for multifaceted plant research,” Plant and Cell Physiology, vol. 53, no. 2, p. e1, 2012.
[5] F. M. Afendi, N. Ono, Y. Nakamura et al., “Data mining methods for omics and knowledge of crude medicinal plants toward big data biology,” Computational and Structural Biotechnology Journal, vol. 4, no. 5, Article ID e201301010, 2013.
[6] F. M. Afendi, L. K. Darusman, A. Hirai et al., “System biology approach for elucidating the relationship between Indonesian herbal plants and the efficacy of Jamu,” in Proceedings of the 10th IEEE International Conference on Data Mining Workshops (ICDMW ’10), pp. 661–668, Sydney, Australia, December 2010.
[7] F. M. Afendi, L. K. Darusman, A. H. Morita et al., “Efficacy of Jamu formulations by PLS modeling,” Current Computer-Aided Drug Design, vol. 9, pp. 46–59, 2013.
[8] F. M. Afendi, L. K. Darusman, M. Fukuyama, M. Altaf-Ul-Amin, and S. Kanaya, “A bootstrapping approach for investigating the consistency of assignment of plants to Jamu efficacy by PLS-DA model,” Malaysian Journal of Mathematical Sciences, vol. 6, no. 2, pp. 147–164, 2012.
[9] W. Winterbach, P. V. Mieghem, M. Reinders, H. Wang, and D. de Ridder, “Topology of molecular interaction networks,” BMC Systems Biology, vol. 7, article 90, 2013.
[10] C. Bachmaier, U. Brandes, and F. Schreiber, “Biological network,” in Handbook of Graph Drawing and Visualization, pp. 621–651, CRC Press, 2013.
[11] X. Chen, M. Chen, and K. Ning, “BNArray: an R package for constructing gene regulatory networks from microarray data by using Bayesian network,” Bioinformatics, vol. 22, no. 23, pp. 2952–2954, 2006.
[12] P. Langfelder and S. Horvath, “WGCNA: an R package for weighted correlation network analysis,” BMC Bioinformatics, vol. 9, article 559, 2008.
[13] A. Martin, M. E. Ochagavia, L. C. Rabasa, J. Miranda, J. Fernandez-de-Cossio, and R. Bringas, “BisoGenet: a new tool for gene network building, visualization and analysis,” BMC Bioinformatics, vol. 11, article 91, 2010.
[14] M. Altaf-Ul-Amin, M. Wada, and S. Kanaya, “Partitioning a PPI network into overlapping modules constrained by high-density and periphery tracking,” ISRN Biomathematics, vol. 2012, Article ID 726429, 11 pages, 2012.
[15] M. Altaf-Ul-Amin, H. Tsuji, K. Kurokawa, H. Asahi, Y. Shinbo, and S. Kanaya, “DPClus: a density-periphery based graph clustering software mainly focused on detection of protein complexes in interaction networks,” Journal of Computer Aided Chemistry, vol. 7, pp. 150–156, 2006.
[16] S. K. Kachigan, Multivariate Statistical Analysis: A Conceptual Introduction, Radius Press, New York, NY, USA, 1991.
[17] J. L. Rodgers and W. A. Nicewander, “Thirteen ways to look at the correlation coefficient,” The American Statistician, vol. 42, pp. 59–66, 1995.
[18] M. Li, J.-E. Chen, J.-X. Wang, B. Hu, and G. Chen, “Modifying the DPClus algorithm for identifying protein complexes based on new topological structures,” BMC Bioinformatics, vol. 9, article 398, 2008.
[19] World Health Organization, “International Classification of Diseases (ICD) 10,” 2010, http://www.who.int/classifications/icd/en/.
[20] National Center for Biotechnology Information, Genes and Disease, NCBI, Bethesda, Md, USA, 1998.
[21] P. Erdős and A. Rényi, “On the evolution of random graph,” Publicationes Mathematicae Debrecen, vol. 6, pp. 290–297, 1959.
[22] A.-L. Barabási and R. Albert, “Emergence of scaling in random networks,” Science, vol. 286, no. 5439, pp. 509–512, 1999.
[23] A. Vázquez, “Growing network with local rules: preferential attachment, clustering hierarchy, and degree correlations,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 67, no. 5, Article ID 056104, 15 pages, 2003.
[24] Max Planck Institut Informatik, “NetworkAnalyzer,” 2013, http://med.bioinf.mpi-inf.mpg.de/netanalyzer/index.php.