
Artificial Adaptive Systems in Medicine, 42-47

CHAPTER 5

Auto Contractive Maps, H Function and Maximally Regular Graph – Application

Cathy Helgason, Massimo Buscema#, Enzo Grossi§

Cathy Helgason, Department of Neurology, University of Illinois, College of Medicine, 912 South Wood Street, Room 855 N, Chicago, IL 60612 (USA).
# Massimo Buscema, Semeion Research Center, Via Sersale 117, 00128 Rome, Italy.
§ Enzo Grossi, Bracco Medical Department, Via 25 Aprile 4, 20097 S. Donato Milanese, Milan, Italy.

Massimo Buscema / Enzo Grossi (Eds.) All rights reserved - © 2009 Bentham Science Publishers Ltd.

Abstract: In this chapter we describe a new mapping method that traces connectivity among variables by means of an artificial adaptive system, the Auto-Contractive Map (Auto-CM), which defines the strength of association of each variable with all the others in a dataset. After the training phase, the weights matrix of the Auto-CM represents the map of the main connections between the variables. We apply this approach to explore possible associations among multiple variables within two different clinical studies: the African American Antiplatelet Stroke Prevention Study (AAASPS), a large clinical trial comparing the preventive effect of two antiplatelet agents against recurrent stroke, myocardial infarction and death, and a smaller study, the Aspirin Response Study (ARS), in which the genetic predisposition to aspirin response, measured as ex vivo inhibition of platelet aggregation, was determined in patients taking aspirin for the prevention of thrombotic vascular occlusion.

Keywords: Artificial Adaptive Systems, Artificial Neural Networks, Connectivity Map, Non-linearity, Auto-CM.

Introduction

Information from large clinical trials is provided in statistical form based on probability theory. The importance of data mining of large databases coming from double-blind, randomized, controlled, multicenter clinical trials has not been appreciated. Instead, data analysis has taken a reductionist approach limited to probability-based statistics. This has especially been the case for studies concerning the diagnosis and treatment of stroke, and it has meant that, when procedures or drugs have been studied, the scientific focus has been restricted to the numerical difference in outcome between two treatment groups. Because the reductionist approach assumes certain relationships among variables, it precludes the discovery of hidden associations among them. This may have enormous importance when it is recognized that, in spite of multiple studies of various agents to treat acute ischemic stroke, none has proven to have a statistically significant beneficial effect on clinical outcome. Stroke is a complex multi-factorial disease, and it is unlikely that a single-variable approach will lead to progress in understanding it. To approach this complex situation, variables of clinical interest should be processed through novel non-linear data mining algorithms that are able to extract hidden information dynamically.

In this study we applied a novel data mining process to explore possible associations among multiple variables within two different clinical studies: the African American Antiplatelet Stroke Prevention Study (AAASPS), a large clinical trial comparing the preventive effect of two antiplatelet agents against recurrent stroke, myocardial infarction and death, and a smaller study, the Aspirin Response Study (ARS), in which the genetic predisposition to aspirin response, measured as ex vivo inhibition of platelet aggregation, was determined in patients taking aspirin for the prevention of thrombotic vascular occlusion (Gorelick et al., 2003; Momary et al., 2008).

Data mining techniques are analytical processes designed to search a database for consistent patterns and/or systematic relationships between variables, with the ultimate goal of discovering hidden trends and associations among those variables. The more common algorithms for linear projection of variables are principal component analysis (PCA) and independent component analysis (ICA); the former assumes a Gaussian distribution of the data, while the latter does not require any specific distribution. These classical statistical techniques have limited power when the relationships between variables are non-linear. Moreover, PCA and ICA are not able to preserve the geometrical structure of the original space. Applying these methods may lose important information, because it is difficult to establish the precise association among variables when their contiguity is the only known element.

Another limitation of currently used statistical methods is that mapping is generally based on a specific kind of “distance” among variables (e.g. Euclidean, City block, correlation, etc.). This gives rise to a “static” projection of possible associations. In other words, the intrinsic dynamics due to active interactions of variables in real-world living systems, which could be captured by means of Artificial Adaptive Systems, is completely lost.

A connection scheme able to hypothesize links among variables, such as the Minimum Spanning Tree (MST) algorithm described by Kruskal (1956), could increase the information obtained by mapping variable connections. Kruskal's algorithm finds a Minimum Spanning Tree for a connected weighted graph: a subset of the edges that forms a tree including every vertex, such that the total weight of the edges in the tree is minimized. This approach has recently been applied in the medical field, especially in biology and medical imaging. However, use of the MST algorithm is still rare in clinical medicine (Frimmel et al., 2004; Lee et al., 2006).

In this paper, we describe a new paradigm for mapping different and multiple variables that creates a semantic connectivity map in which: a) non-linear associations are preserved, b) connection schemes are explicit, and c) the complex dynamics of adaptive interactions is captured. As a result of this mapping, biological hubs of variables are detected by the analysis. Related dependent variables converge to these hubs, which in turn are considered to be relevant biological variables in the connectivity map.
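The Kruskal procedure and the idea of a hub described above can be sketched in a few lines of Python. This is a minimal illustration, not the Semeion software used in the chapter: it assumes the pairwise distances between variables are already available (in the chapter they derive from the trained Auto-CM weights matrix, whose computation is not shown here), and the variable labels, distances, and function names `kruskal_mst` and `node_degrees` are hypothetical.

```python
def kruskal_mst(n_nodes, edges):
    """Return the MST as a list of (u, v, weight) using Kruskal's algorithm.

    edges is a list of (weight, u, v) tuples over nodes 0..n_nodes-1.
    """
    parent = list(range(n_nodes))  # union-find forest, one set per node

    def find(x):
        # Follow parent links to the set representative, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(edges):  # cheapest edges first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:            # edge joins two components: keep it
            parent[root_u] = root_v
            tree.append((u, v, weight))
            if len(tree) == n_nodes - 1:
                break                   # a spanning tree has n - 1 edges
    return tree


def node_degrees(n_nodes, tree):
    """Count how many tree edges touch each node."""
    degree = [0] * n_nodes
    for u, v, _ in tree:
        degree[u] += 1
        degree[v] += 1
    return degree


# Toy input: five hypothetical variables with made-up distances.
labels = ["A", "B", "C", "D", "E"]
edges = [
    (1.0, 0, 1), (2.0, 0, 2), (2.5, 1, 2), (1.5, 0, 3),
    (3.0, 2, 3), (2.2, 3, 4), (4.0, 1, 4),
]

tree = kruskal_mst(len(labels), edges)
degrees = node_degrees(len(labels), tree)
# Treat the variable with the most connections in the tree as the hub.
hub = labels[max(range(len(labels)), key=lambda i: degrees[i])]

for u, v, weight in tree:
    print(f"{labels[u]} - {labels[v]}: {weight}")
print("hub:", hub)
```

Counting connections per node is only the simplest reading of a hub as a variable with many links in the tree; the chapter's own hubness measures are not reproduced here.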
Materials and Methods

Data base: The databases for the AAASPS and the smaller ARS study were provided to the investigators by Dr Philip Gorelick and Dejuran Richardson, Ph.D. (AAASPS), and by Larissa Cavallari, Ph.D., and Kathryn Momary, Ph.D. (ARS), respectively. Though both deal with the broader subject of antiplatelet agents and their clinical efficacy, the two studies are independent and clinically unrelated, and both had been completed and statistically analyzed by the time of the Auto-Contractive Mapping analysis. Both studies were completed with IRB approval. No patient-identifying information was communicated.

Statistical analysis: Our analyses of these two studies aimed to increase our understanding of the biological pathway leading to recurrent stroke and death in stroke patients treated with antiplatelet agents. This goal was pursued through a new mathematical approach able to point out the relative relevance of each variable in representing a major biological hub. This new paradigm of variable processing aims to create a semantic connectivity map in which: a) non-linear associations are preserved, b) connection schemes are explicit, and c) the complex dynamics of adaptive interactions is captured. The method is based on an artificial adaptive system, named the Auto-Contractive Map (Auto-CM), that defines the association strength of each variable with all the others in any dataset. For the data processing, only Semeion Research Software packages and MATLAB were used:

1. to train Auto-CM: Buscema (2000-2008), Buscema (2002) and Massini (2007);
2. to calculate the Maximally Regular Graph: Buscema (2006-2008);
3. to visualize MST and MRG: Massini (2006-2008);
4. to process PCA: MATLAB version 7.1, 2005, from the math of Hotelling (1933).
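For readers unfamiliar with the PCA 1-2 projection used as the comparison method, the following Python sketch shows one way to obtain the first two principal components of a small data matrix. It is an illustration under stated assumptions, not the MATLAB routine used in the study: it finds the eigenvectors of the sample covariance matrix by power iteration with deflation (a technique chosen here for brevity), and the data points and function names are made up.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def covariance(rows):
    """Return the sample covariance matrix and the mean-centered rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, centered


def leading_eigenpair(matrix, iterations=200):
    """Approximate the largest eigenvalue and its eigenvector by power iteration."""
    d = len(matrix)
    v = [float(i + 1) for i in range(d)]  # arbitrary non-symmetric start vector
    norm = math.sqrt(dot(v, v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [dot(row, v) for row in matrix]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:                  # matrix maps v to (almost) zero
            break
        v = [x / norm for x in w]
    eigenvalue = dot(v, [dot(row, v) for row in matrix])  # Rayleigh quotient
    return eigenvalue, v


def first_two_components(rows):
    """Return (variance, axis) for PC1 and PC2 plus the 2-D scores of each row."""
    cov, centered = covariance(rows)
    l1, v1 = leading_eigenpair(cov)
    d = len(cov)
    # Deflation: remove the first component so the next one becomes dominant.
    deflated = [[cov[i][j] - l1 * v1[i] * v1[j] for j in range(d)]
                for i in range(d)]
    l2, v2 = leading_eigenpair(deflated)
    scores = [(dot(c, v1), dot(c, v2)) for c in centered]
    return (l1, v1), (l2, v2), scores


# Made-up data: four points spread more widely along x than along y.
rows = [(-2.0, 0.0), (2.0, 0.0), (0.0, -1.0), (0.0, 1.0)]
(l1, v1), (l2, v2), scores = first_two_components(rows)
print("PC1 variance:", l1, "axis:", v1)
print("PC2 variance:", l2, "axis:", v2)
```

Because the projection is a fixed linear change of axes, it can only express linear structure in the data, which is the limitation the chapter contrasts with the Auto-CM mapping.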

Results

The mapping in Figure 1 represents the AAASPS data variables. A Minimum Spanning Tree, the most economical way to represent the distances between variables, was created for the data set. Connectivity, clustering strength, degree of protection, topological entropy, Delta Hubness, and Maximally Regular Graph were calculated. Strong links were found between hypertension (HTN) and stroke, HTN and small vessel disease, HTN and death, and HTN and diabetes mellitus (DM). HTN related equally to aspirin and ticlopidine, but ticlopidine had greater relevance in terms of its interaction with other variables. Ticlopidine related more closely to small vessel disease.

Fig. 1. Mapping of multiple variables in the AAASPS study.

Fig. 2. Principal Component Analysis 1-2 of AAASPS study.


Stroke had a strong connection to male gender, and HTN to female gender. Both ticlopidine and aspirin had a strong connection to HTN. Death itself related to age over 75 years, income under 5000-9999 U.S. dollars, education under 8 years, body mass index up to 20, myocardial infarction, and thoracic or abdominal aortic surgery. These relationships were not detected by Principal Component Analysis. In other words, clinically plausible interactions between variables collected in those patients suffering end point events in the AAASPS study were found by the dynamic, non-linear mapping method of Auto-Contractive Maps but not by PCA.

Fig. 3. Auto-Contractive Mapping of Aspirin Response Study

Figures 3 and 4 represent the new mapping information for the Aspirin Response Study. Here again the Auto-CM reveals information missed by PCA. In this mapping, clinically useful information is found in that complete inhibition of platelet aggregation, i.e. complete response to aspirin, is related directly to the PTGS1-P17L-(PL) genotype, whereas the PTGS1-P17L-(PP) genotype is related directly to partial response. The ITGB3 (A1A1) and PTGS1-A-707G-(AA) genotypes were equally related to partial response to aspirin, but the former was also directly and strongly related to complete response in terms of inhibition of platelet aggregation caused by collagen. A cluster of interrelationships was found between normal response (i.e. no response to aspirin with regard to inhibition of platelet aggregation to collagen, epinephrine and ADP) and diabetes. In addition, the PTGS1-A-707G (AG) genotype is related to being African American and to stroke. Importantly, this latter relation was not detected by PCA.

Discussion

The implications of the findings of this study are potentially profound for the scientific stroke community. First, the limitation of data analysis to the reductionist approach of probability-based statistics may have led to missed information regarding the results of clinical trials, including the efficacy of drugs and other treatments. Second, the role of antiplatelet agents in recurrent stroke prevention in African Americans and the cause of aspirin resistance have been seen through a different medium, and conclusions different from those of traditional data analysis have been reached. Third, the new method of data analysis used in this study, Auto-Contractive Maps, has found information that was previously overlooked by traditional statistical analysis and, in addition, yields an interpretation different from that previously drawn from these two studies. These facts lead one to conclude that the data from previous clinical, and perhaps basic science, studies in the field of stroke should be revisited for the purpose of data mining, and that the analysis of future studies should not be limited to the past reductionist approach of probability-based statistics but should be open to new methodologies.

Fig. 4. Principal Component Analysis 1-2 for Aspirin Response Study

Final considerations on the Auto-CM system lead one to conclude that this approach highlights affinities among variables as related to their dynamical interaction rather than to their simple contingent spatial position. This method allows for the description of a context typical of living systems, in which variable values undergo continuous, time-dependent, complex change. After the training phase, the matrix of the Auto-CM represents the warped landscape of the dataset. We apply a simple filter (the Minimum Spanning Tree by Kruskal) to the matrix of the Auto-CM system to show the map of the main connections between and among variables and the principal hubs of the system. These hubs can also be defined as the variables with the maximum number of connections in the map. The information gained through this new approach to detecting variable associations suggests that hypertension, small vessel disease, diabetes, and genotype may be related to response to antiplatelet agents, specifically aspirin, and to stroke in the African American patient.

References

[1] Buscema M., Constraints Satisfaction Networks. Software for programming Non Linear Auto-Associative Networks, Semeion Software #14, ver 9.0, Rome, 2000-2008.
[2] Buscema M., Contractive Maps. Software for programming Auto-Contractive Maps, Semeion Software #15, ver 1.0, Rome, 2002.
[3] Buscema M., MST. Software for programming trees from artificial networks weights matrix, Semeion Software #38, ver 5.1, Rome, 2006-2008.
[4] Buscema M., and Grossi E., A novel adapting mapping method for emergent properties discovery in data bases: experience in medical field. Proceeding International Conference on Systems, Man and Cybernetics (SMC 2007), 2007, IEEE. Montreal, Canada.
[5] Buscema M., and Grossi E., The semantic connectivity map: an adapting self-organising knowledge discovery method in data bases. Experience in gastro-oesophageal reflux disease. International Journal of Data Mining and Bioinformatics (IJDMB), Vol. 2, N. 4, pp. 362-404, 2008.
[6] Buscema M., and Sacco P.L., Squashing Theory and Auto-Contractive Map Network. Semeion Technical Paper #32, Rome, 2008.
[7] Comon P., Independent component analysis – a new concept? Signal Processing, 36, pp. 287-314, 1994.
[8] Frimmel H., Nappi J., and Yoshida H., Fast and robust computation of colon centerline in CT colonography. Med Phys, 31(11), pp. 3046-3056, 2004 Nov.
[9] Gorelick P., Richardson D., Kelly M., Ruland S., Hung E. et al. for the African American Antiplatelet Stroke Prevention Study (AAASPS) Investigators, Aspirin and Ticlopidine for Prevention of Recurrent Stroke in Black Patients. A Randomized Trial. JAMA 289, pp. 2947-2957, 2003.
[10] Hotelling H., Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24, 1933.
[11] Kruskal J.B., On the shortest spanning subtree of a graph and the traveling salesman problem. Proceedings of the American Mathematical Society 7, pp. 48–50, 1956.
[12] Lee U., Kim S., and Jung K.Y., Classification of epilepsy types through global network analysis of scalp electroencephalograms. Phys Rev E Stat Nonlin Soft Matter Phys, 2006 Apr.
[13] Massini G., Tree Visualizer. Software to draw and manipulate tree graph, Semeion Software #40, ver 1.0, Rome, 2006-2008.
[14] Massini G., Semantic Connection Map, Software to train Auto-CM system and manipulate tree graph. Semeion Software #46, ver. 1.0, Rome, 2007.
[15] Momary K.M., Shapiro N.L., Brace L.D., Shord S.S., Grossi E., Viana M.A., Helgason C.M., and Cavallari L.H., (2008), Influence of cyclooxygenase-1 genotype on ex vivo aspirin response in patients at risk for stroke. Cerebrovasc Dis. 2009;27(6):585-593. Epub 2009 Apr 24.
