Mobile Classification in Microarray Experiments
I. M. Dozmorov, M. Centola, N. Knowlton & Y. Tang
Department of Arthritis and Immunology, Oklahoma Medical Research Foundation, Oklahoma City, OK, USA
Received 26 November 2004; Accepted in revised form 15 January 2005
Correspondence to: Dr I. M. Dozmorov, Oklahoma Medical Research Foundation, Department of Arthritis and Immunology, 825 NE 13th Street, Oklahoma City, OK 73104, USA.
Abstract
In a homogeneous group of samples, some genes show expression variations that can be attributed to factors other than experimental error, such as natural biological oscillations or metabolic processes. These genes are rarely classified as 'interesting' on the basis of their variability profile. However, their dynamic behaviour can provide important clues about naturally occurring biological processes in the organism under study, and it can be used for group classification. We developed dynamical discriminant function analysis on the premise that stable classification parameters (roots) can be derived from highly variable gene-expression data. The stability of these combinations implies a strongly compensatory relationship among their members that may reveal functional interconnections.
Introduction
Differentially expressed genes can be selected for discrimination and/or classification. However, even statistically significant differences in gene expression between groups do not guarantee that several groups can be separated, because the expression values of a given set of genes may overlap between sample groups, although such overlap may be rare. To increase discriminatory power, different subsets of genes need to be combined to provide a clear distinction among sample groups. This can be done with several statistical approaches, such as discriminant function analysis (DFA). DFA creates a linear combination of variables that maximizes the spatial (Euclidean) separation between groups of samples, e.g. 'control' versus 'diseased'. The individual discriminatory capability of a gene critically depends on the stability of its expression. In microarray experiments, besides the majority of stably expressed genes, there is also a group of genes exhibiting extremely high variability, much higher than would be expected by chance [1]. We have called these genes 'hypervariable' (HV genes) and presented evidence that their extreme variability may result from their involvement in dynamical biological processes that are not synchronized within a homogeneous group of samples [2]. The biological significance of HV genes is supported by their coexpression: they form common clusters of highly correlated profiles. Such similarities are statistically improbable by chance, even in an array experiment with more than 10,000 expressed genes (P < 0.0001). In this work, we demonstrate a new approach to group classification using microarray data. The classification is based on HV genes, whose variability is transformed into a highly stable discriminatory parameter, the root, which is a linear combination of the expression values of HV genes. We further postulate that the stability of these linear combinations could result from strong functional interconnections among the genes in each root: the variation of any single gene in a root is compensated by changes in the expression of some or all of the other members, keeping the root stable. We show that these functional interconnections between hypervariable discriminatory genes can be represented as functional networks using a partial correlation procedure.
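To make the idea of a stable root concrete, the following minimal Python sketch uses purely hypothetical data: two HV genes, `gene_a` and `gene_b`, sampled at different phases of one shared oscillation, with root coefficients chosen by hand rather than estimated by DFA. Each gene varies strongly across samples, yet the weighted sum stays constant because the rise of one gene is offset by the fall of the other.

```python
import math
import statistics


def root_values(coefficients, *genes):
    """Value of a root (a linear combination of gene expressions) in each sample."""
    return [sum(c * g for c, g in zip(coefficients, sample)) for sample in zip(*genes)]


# Hypothetical HV genes: each of 12 unsynchronized samples catches the same
# oscillation at a different phase, so each gene on its own varies strongly.
phases = [2 * math.pi * k / 12 for k in range(12)]
gene_a = [10 + 3.0 * math.sin(p) for p in phases]
gene_b = [8 - 1.5 * math.sin(p) for p in phases]  # moves against gene_a

# Hand-picked coefficients (an assumption, not DFA output): 0.5 * 3.0 cancels 1.0 * 1.5.
root = root_values([0.5, 1.0], gene_a, gene_b)

print("SD gene_a:", round(statistics.stdev(gene_a), 3))
print("SD gene_b:", round(statistics.stdev(gene_b), 3))
print("SD root:  ", round(statistics.stdev(root), 3))
```

In real data the cancellation would only be partial and the coefficients would come from the discriminant analysis itself; the sketch only shows why high single-gene variability does not preclude a stable combination.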
© 2005 Blackwell Publishing Ltd. Scandinavian Journal of Immunology 62 (Suppl. 1), 84–91
Materials and methods
Sample population. An inflammatory bowel disease (IBD) patient population was used as the biological model. IBD is a general name for diseases that cause inflammation of the intestine. Ulcerative colitis (UC), one form of IBD, is centred primarily in the large intestine, whereas Crohn's disease (CD), another form of IBD, occurs principally in the small intestine. Eight normal controls (NC), nine UC patients and eight CD patients were included in this project. Four types of tissue samples (neutrophils, peripheral blood mononuclear cells (PBMC), and affected and unaffected colonic biopsies) were collected from all patients.
Minimum information about microarray experiments (MIAME). (a) Experimental design: Four types of tissue (neutrophils, PBMC, and affected and unaffected biopsies) were collected from three different sample populations to study the different mechanisms involved in the pathological development of the IBD subtypes and to identify a potential biological model for treatment design. (b) Array design: The human oligonucleotide microarray (Qiagen, Hilden, Germany) contains 21,329 human genes based on RefSeq and UniGene annotations, as well as 12 housekeeping genes and 12 negative controls. The gene list is available at http://omrf.ouhsc.edu/~frank/human-library.txt. (c) Source of samples, extract preparation and labelling: The following samples were used for each tissue in the array experiments:
• PBMC: eight NC, seven UC and five CD.
• Neutrophils: eight NC, four UC and seven CD.
• Biopsy of unaffected area: eight UC and six CD.
• Biopsy of affected area: eight UC and seven CD.
PBMC were isolated by Histopaque-1077 (Sigma-Aldrich, St. Louis, MO, USA) density centrifugation, and RNA from each tissue was extracted with Qiagen columns. RNA (5 µg) was reverse-transcribed to cDNA using oligo(dT)12–18 primer (Invitrogen, Carlsbad, CA, USA) and PowerScript reverse transcriptase (Clontech, Palo Alto, CA, USA) and labelled with Cy3-dUTP (Amersham, Buckinghamshire, UK). Labelled cDNA was purified by vacuum filtration using Montage PCR 96-well plates according to the manufacturer's directions (Millipore, Billerica, MA, USA).
(d) Hybridization procedure and parameters: cDNA was resuspended in 37.5 µl water and diluted in 100 µl ChipHyb (Ventana Medical Systems, Tucson, AZ, USA), 8 µl 50× Denhardt's solution, 20 µl deionized formamide and 10.5 µl of a solution containing 10 µg human Cot-1 DNA, 8 µg poly(dA)40–60 and 4 µg yeast tRNA. Samples were heat-denatured at 95 °C and cooled briefly to 58 °C before being applied to the microarrays in a Ventana Discovery hybridization chamber. Hybridization was carried out at 65 °C overnight, followed by high-stringency washes.
(e) Measurement data and specifications: Microarrays were scanned using a 48-position carousel, dual-colour laser scanner at 5 µm resolution, and Cy3 fluorescence intensity was measured using ImaGene software (BioDiscovery, Marina del Rey, CA, USA).
Data analysis procedure. (a) Selection of HV genes: Our methods of data normalization and analysis rest on the use of internal standards to remove some aspects of system behaviour, such as technical variability and baseline biological fluctuation, thereby increasing statistical power. Normalization for differences among experiments was conducted using a procedure described in detail elsewhere (see Ref. [3] for the method and Ref. [4] for the toolbox). First, an internal standard is constructed by identifying a set of normally distributed genes whose expression is indistinguishable from technological noise [5].
This background cohort enables the statistical selection of genes expressed above or below background with higher power than a simple paired Student's t-test. Second, genes that are expressed above background and belong to the equally expressed cohort form our second internal standard; they are used to adjust biased expression levels across arrays by robust regression. After normalization and adjustment of the gene-expression profiles to a common log-transformed standard, the expression values are used to calculate residuals, i.e. deviations of the expression values from the averaged control profile. The log-transformed residuals are independent of expression level and approximately normally distributed according to the Kolmogorov–Smirnov criterion. From these residuals, a 'reference group' is constructed by identifying genes that are expressed above background and have inherently low variability, as determined by an F-test. This reference group is the third internal standard and consists of biologically stable expressed genes (BSG); it provides a baseline measurement of technical variation and biological fluctuation. Genes whose expression varied significantly within the sample population when compared with the reference group were denoted HV genes. The selection threshold for HV genes is P < 1/n, where n is the number of genes expressed above background, usually more than half of all genes on an array. Notably, HV genes exist even within a homogeneous sample population such as the control group. Because the variability of HV genes statistically exceeds that of the BSG, it may reflect nonsynchronized gene dynamics. Hypervariation caused by experimental errors such as 'dirty spots' was filtered out statistically by recomputing the variability of each gene's residuals in a replicated group with one sample excluded at a time.
A statistically significant decrease in variability after excluding one replicate indicates a possible error in that replicate, and such genes were excluded from the family of HV genes. The expression values of HV genes in a sample population can be regarded as snapshots of the dynamical biological processes in which they participate, and correlations among these values may reflect functional interconnections within those processes. In this article, the HV genes are used to create markers for DFA.
(b) Discriminant function analysis: DFA is used to determine which variables discriminate between two or more naturally occurring groups. The DFA function used in our analysis is supplied with Statistica (StatSoft, Tulsa, OK, USA). HV genes were used to select a set of genes discriminating between NC and the two subgroups of patients. We used a variant of DFA, forward stepwise analysis, which has been successfully applied to clinical classification [1] and to experimental microarray data [6–8].
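Forward stepwise DFA adds, at each step, the variable that most improves group separation. The sketch below is not Statistica's implementation; it is a minimal pure-Python illustration assuming the common Wilks' lambda criterion (det(W)/det(T), with W the within-group and T the total sums-of-squares-and-cross-products matrix) and a partial F-to-enter rule with a threshold `f_to_enter=1.0` chosen for illustration. Statistica's actual defaults and stopping rules may differ, and all names and data here are hypothetical.

```python
def det(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, d = len(a), 1.0
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(a[r][i]))
        if abs(a[piv][i]) < 1e-12:
            return 0.0
        if piv != i:
            a[i], a[piv] = a[piv], a[i]
            d = -d
        d *= a[i][i]
        for r in range(i + 1, n):
            f = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= f * a[i][c]
    return d


def col_means(rows):
    return [sum(col) / len(col) for col in zip(*rows)]


def sscp(rows, cols, center):
    """Sums of squares and cross-products of the chosen columns about `center`."""
    return [[sum((r[i] - center[i]) * (r[j] - center[j]) for r in rows)
             for j in cols] for i in cols]


def wilks_lambda(X, y, cols):
    """Wilks' lambda = det(W) / det(T) for the variables in `cols`."""
    total = sscp(X, cols, col_means(X))
    k = len(cols)
    within = [[0.0] * k for _ in range(k)]
    for g in set(y):
        rows = [x for x, lab in zip(X, y) if lab == g]
        s = sscp(rows, cols, col_means(rows))
        within = [[w + v for w, v in zip(wr, sr)] for wr, sr in zip(within, s)]
    dt = det(total)
    return det(within) / dt if dt > 0 else 1.0  # constant variable: no separation


def forward_stepwise(X, y, f_to_enter=1.0):
    """Greedy forward selection using the partial F-to-enter statistic."""
    n, g, p_total = len(X), len(set(y)), len(X[0])
    selected, lam = [], 1.0
    while len(selected) < p_total:
        df2 = n - g - len(selected)  # error degrees of freedom for the next entry
        if df2 <= 0:
            break
        best = None
        for c in range(p_total):
            if c in selected:
                continue
            lam_new = wilks_lambda(X, y, selected + [c])
            if lam_new <= 0.0:  # singular within-group matrix; skip this variable
                continue
            f = (df2 / (g - 1)) * (lam / lam_new - 1.0)
            if best is None or f > best[0]:
                best = (f, c, lam_new)
        if best is None or best[0] < f_to_enter:
            break
        selected.append(best[1])
        lam = best[2]
    return selected, lam


# Hypothetical data: column 0 separates three groups, column 1 is uninformative.
X = [[0.0, 1.0], [0.5, 2.0], [-0.5, 3.0],
     [5.0, 1.0], [5.5, 2.0], [4.5, 3.0],
     [10.0, 1.0], [10.5, 2.0], [9.5, 3.0]]
y = ["NC"] * 3 + ["UC"] * 3 + ["CD"] * 3
selected, lam = forward_stepwise(X, y)
print(selected, round(lam, 5))
```

Choosing the candidate with the largest F-to-enter is equivalent to choosing the one that gives the smallest Wilks' lambda at that step; the F form is used because it provides a natural stopping rule.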
(c) Networking of genes based on partial correlation coefficients: Functional interconnections among genes sharing common characteristics are revealed through networking. Correlation coefficients can characterize gene coexpression within clusters; however, not every pair of highly correlated genes is functionally interconnected, because their correlation could be mediated by a third gene (such as a common regulator) that confounds the pair's relationship. Functional interconnections are better characterized by the partial correlation coefficient, which removes the influence of the 'external' gene on the correlated pair [6, 9]. This analysis was performed pairwise for all genes related by a common characteristic. A Monte-Carlo simulation was used to define the threshold above which significant partial correlations appear: expression profiles were modelled with random values that preserved the mean and SD of each gene in the sample population, and the threshold was the maximal partial correlation coefficient appearing between these simulated profiles. Gene pairs whose correlation did not fall below this threshold r after the influence of a third gene was removed were considered truly connected [10].
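The networking step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: only first-order partial correlations (one third gene at a time) are used, simulated profiles are drawn from a normal distribution with each gene's observed mean and SD, and the Monte-Carlo threshold is taken as the largest edge statistic seen across simulations. Function names, the simulation count and the toy profiles are hypothetical.

```python
import math
import random
import statistics
from itertools import combinations


def pearson(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(rxy, rxz, ryz):
    """First-order partial correlation of x and y with z held fixed."""
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def edge_strength(profiles):
    """Per gene pair: (smallest |r| over raw and all first-order partials, raw r)."""
    genes = list(profiles)
    r = {}
    for a, b in combinations(genes, 2):
        r[a, b] = r[b, a] = pearson(profiles[a], profiles[b])
    strength = {}
    for a, b in combinations(genes, 2):
        values = [abs(r[a, b])]
        values += [abs(partial_corr(r[a, b], r[a, z], r[b, z]))
                   for z in genes if z not in (a, b)]
        strength[a, b] = (min(values), r[a, b])
    return strength


def monte_carlo_threshold(profiles, n_sim=200, seed=0):
    """Largest edge strength between random profiles keeping each gene's mean and SD."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(n_sim):
        sim = {g: [rng.gauss(statistics.fmean(v), statistics.stdev(v)) for _ in v]
               for g, v in profiles.items()}
        best = max(best, max(s for s, _ in edge_strength(sim).values()))
    return best


def network(profiles, threshold):
    """Signed edges whose correlation stays above `threshold` after removing any third gene."""
    return {pair: ("+" if r > 0 else "-")
            for pair, (s, r) in edge_strength(profiles).items() if s >= threshold}


# Hypothetical toy profiles over eight samples.
profiles = {
    "A": [1, 2, 3, 4, 5, 6, 7, 8],
    "B": [2, 4, 6, 8, 10, 12, 14, 16.5],
    "C": [3, 1, 4, 1, 5, 9, 2, 6],
}
thr = monte_carlo_threshold(profiles)
print(round(thr, 3), network(profiles, thr))
```

Higher-order partial correlations (conditioning on several genes at once) or graphical Gaussian models are common alternatives; the first-order version is used here only because it matches the "third gene" wording above.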
Results
HV gene selection
After data normalization, approximately 9500 genes per sample were expressed 3 SD above background, out of a total of 20,000 genes. These genes were used to remove experimental bias between samples by robust regression analysis. HV genes were first selected in the IBD patient and control samples using PBMC data, with a stringent threshold of P < 0.0001. This P-value is derived from the expression 1/n, where n (approximately 10,000) is the number of genes expressed distinguishably from background; at this threshold, at most about one gene (n × 1/n = 1) would be expected to be selected by chance alone. Both the level of variability of HV genes and the proportion of genes that are hypervariable significantly exceed the statistically expected ranges, indicating a biological rather than technical origin of hypervariability (Fig. 1A). Further evidence for the biological significance of hypervariability is that some HV genes showed very similar coexpression in PBMC data from all three sample populations (Fig. 1B); six of them are shown: TMSB4X, ACTB, S100A8, HLA-G, MAPK3 and CD74.
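The HV selection rule can be sketched as a one-sided variance-ratio (F) test of each gene's residual variance against the pooled residual variance of the reference (BSG) genes, with the cutoff P < 1/n. This minimal pure-Python sketch rests on assumptions not spelled out in the text: residuals are already computed, the reference variance is pooled across BSG genes, and the F tail probability is evaluated through the regularized incomplete beta function using a standard continued-fraction (modified Lentz) evaluation. The names `select_hv`, `f_sf` and the toy residuals are hypothetical.

```python
import math

_TINY = 1e-300


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > _TINY else _TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > _TINY else _TINY)
            c = 1.0 + aa / c
            c = c if abs(c) > _TINY else _TINY
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    ln_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(ln_front) * _betacf(a, b, x) / a
    return 1.0 - math.exp(ln_front) * _betacf(b, a, 1.0 - x) / b


def f_sf(f, d1, d2):
    """Upper-tail probability P(F > f) for an F(d1, d2) variable."""
    return reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def variance(v):
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / (len(v) - 1)


def select_hv(residuals, reference, n_expressed):
    """Genes whose residual variance exceeds the pooled reference variance at P < 1/n."""
    ref_df = sum(len(residuals[g]) - 1 for g in reference)
    ref_var = sum(variance(residuals[g]) * (len(residuals[g]) - 1)
                  for g in reference) / ref_df
    alpha = 1.0 / n_expressed
    hv = {}
    for gene, res in residuals.items():
        p = f_sf(variance(res) / ref_var, len(res) - 1, ref_df)
        if p < alpha:
            hv[gene] = p
    return hv


# Hypothetical log-residuals over eight samples: three stable reference genes, one HV gene.
residuals = {
    "ref1": [0.1, -0.1] * 4,
    "ref2": [0.1, -0.1] * 4,
    "ref3": [0.1, -0.1] * 4,
    "hv": [2.0, -2.0] * 4,
}
hv = select_hv(residuals, reference=["ref1", "ref2", "ref3"], n_expressed=10_000)
print(hv)
```

With n = 10,000 expressed genes, `alpha` is 0.0001, matching the threshold quoted above. In practice a statistics library's F distribution would replace the hand-written incomplete beta; it is spelled out here only to keep the example dependency-free.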
Figure 1 (A) Among the homogeneous genes of low variability, some genes show extremely high variation: the hypervariable (HV) genes (marked with asterisks). The panel presents a box-and-whisker plot (fragment) of gene-expression residuals calculated from log-transformed data as deviations from the averaged gene-expression profile. Data were obtained in an x-replicated experiment with samples from the NC group. The HV genes have significantly higher variation than the majority of stable genes by F-test (P < 0.0001). (B) Coordinated variation of a group of HV genes across samples from the NC, UC and CD groups. Gene-expression profiles in these 18 samples are presented in normalized form with mean = 0 and SD = 1. Lines connect the expression values of the same gene in different samples. CD, Crohn's disease; NC, normal control; UC, ulcerative colitis.
Table 1 Hypervariable (HV) genes with higher discriminatory capability in discriminant function analysis (DFA)
GenBank | Gene name | Description | RI (%)
NM_003467 | CXCR4 | Chemokine (C-X-C motif) receptor 4 (fusin). Overexpressed in IBD [11] | 100
U07802 | ZFP36L2 | Zinc finger protein 36, C3H type-like 2. Plays a crucial role in proliferation and activation of smooth muscle cells in the case of endothelial cell damage [12] | 71
NM_000660 | TGF-β1 | Transforming growth factor-β1. Different production in CD vs UC [13] | 62
M90391 | IL-16 | Interleukin-16 (lymphocyte chemoattractant factor). Increased expression of IL-16 in inflammatory bowel disease [14] | 57
NM_001953 | ECGF1 | Endothelial cell growth factor 1 (platelet-derived). Critical role in IBD [15] | 52
NM_004355 | CD74 | CD74 antigen. Different expression in UC vs CD [16] | 52
NM_007314 | ABL2 | V-abl Abelson murine leukaemia viral oncogene homolog 2. Implicated in cell differentiation, cell division, cell adhesion and stress response | 48
NM_002985 | CCL5 | Small inducible cytokine A5 (RANTES). Level significantly elevated in IBD [17] | 43
NM_052942 | GBP5 | Guanylate-binding protein 5. Interferon induced. Potent antiangiogenic activity in endothelial cells | 38
NM_033296 | PGR1 | T-cell-activation protein. Defective suppressor/regulatory T-cell activation in IBD [18] | 38
NM_002111 | HD | Huntingtin (Huntington disease). Patients may also experience bowel or bladder problems such as difficulty in passing urine, incontinence and constipation [19] | 38
NM_006498 | LGALS2 | Lectin, galactoside-binding, soluble, 2 (galectin 2). Might play a key role in the inflammation process | 33
BC000687 | TRAM | Translocating chain-associating membrane protein. Upregulated in inflammation [20] | 33
AK001703 | AMMECR1 | Alport syndrome. Blood in urine is a frequent symptom of this disease | 33
X60188 | MAPK3 | Mitogen-activated protein kinase 3. Inflammatory reactions involving NF-κB, which is activated in IBD | 29
NM_002727 | PRG1 | Proteoglycan 1, secretory granule. Reduced expression in IBD [21] | 29
NM_021257 | NGB | Neuroglobin. Upregulated in inflammation [22] | 29
NM_016553 | NUP62 | Nucleoporin 62 kDa | 29
BG675206 | PRG5 | P53-responsive gene 5 | 29
NM_002619 | PF4 | Platelet factor 4. Abnormalities of platelet number and function in IBD [15] | 24
AK055976 | TMSB4X | Thymosin, beta 4, X chromosome. Elevated serum level in IBD [23] | 19
NM_004202 | TMSB4Y | Thymosin, beta 4, Y chromosome. Elevated serum level in IBD [23] | 19
NM_002127 | HLA-G | HLA-G histocompatibility antigen, class I, G. Potential distinguisher between ulcerative colitis and Crohn's disease [24] | 19
NM_002778 | PSAP | Prosaposin (Gaucher disease and metachromatic leukodystrophy) | 19
NM_002923 | RGS2 | Regulator of G-protein signalling 2, 24 kDa [25] | 14
NM_000239 | LYZ | Lysozyme (renal amyloidosis). Increased expression in UC [26] | 14
NM_005252 | FOS | V-fos FBJ murine osteosarcoma viral oncogene homolog. Overexpressed in inflammation. Stimulates fibroblast invasion independently of the presence of growth factors [27] | 10
NM_000691 | ALDH3A1 | Aldehyde dehydrogenase 3 family, member A1. Plays a role in inflammatory processes [28, 29] | 5
NM_001619 | ADRBK1 | Adrenergic, beta, receptor kinase 1. Increased expression in chronic bladder inflammation [30] | 5
NM_002704 | PPBP | Proplatelet basic protein (includes platelet basic protein, beta-thromboglobulin, connective tissue). Abnormalities of platelet number and function in IBD [15] | 5
NM_016388 | TRIM | T-cell receptor interacting molecule. Defective suppressor/regulatory T-cell activation in IBD [18] | 5
NM_021128 | POLR2L | Polymerase (RNA) II (DNA-directed) polypeptide L (7.6 kDa). Role in inflammation [31] | 5
AF119897 | FTL | Ferritin, light polypeptide. Iron deficiency in IBD [32] | 5
NM_002620 | PF4V1 | Platelet factor 4 variant 1. Abnormalities of platelet number and function in IBD [15] | 5
NM_002964 | S100A8 | S100 calcium-binding protein A8 (calgranulin A). A clinical marker of inflammation throughout the intestinal tract [33] | 5
IBD, inflammatory bowel disease; RI, reproducibility index: the percentage of DFA runs, each with one sample removed, in which this gene was included in the root. The 10 genes highlighted in bold were used in the final discrimination of the three groups.
All of these genes show highly correlated behaviour (r > 0.9) in all three groups of PBMC samples; moreover, in patients for whom data for other tissues are available, these genes behave similarly in other tissues such as neutrophils and colon biopsies (data not shown). Such marked correlation and reproducibility of HV gene behaviour indicate their involvement in a common biological process. From the HV analysis of PBMC data, 192 genes were hypervariable in all three sample groups (NC, UC and CD). Of these 192 HV genes, 92 have functional annotation, and only these were used in the subsequent DFA.
DFA-based selection of genes of reproducible discriminatory potential
Normalized expression values for the 92 HV genes were converted to their antilogs to reduce the influence of weak expression values, which are highly negative in log-transformed form. The data were then fed into the program, and its instructions were followed exactly, letting the program pick the list of discriminatory genes. This analysis was repeated many times, removing one sample each time until all possibilities were exhausted. The discriminatory power of each gene was measured by the number of times it was included in the root combination. The genes were sorted by reproducibility index (RI), the percentage of runs in which a given gene was discriminatory, and the 35 genes with an RI of at least 5% are listed in Table 1, together with their functional relevance to IBD pathology as obtained through a literature search. Among the top discriminators, chemokine (C-X-C motif) receptor 4 (fusin, CXCR4) has the highest discriminatory power, with an RI of 100%. The genes ZFP36L2, TGF-β1, IL-16, ECGF1 and CD74 have RIs between 50 and 75%. Of the 92 HV genes, 19 have an RI > 25%, and 10 of these (highlighted in bold in Table 1) were used to discriminate the three groups in Fig. 2. Plotting the DFA roots of each sample as coordinates in two dimensions produces the three distinct clusters seen in Fig. 2.
Figure 2 Discriminant function analysis of three groups of samples (horizontal axis, Root 1; vertical axis, Root 2). NC, normal controls; CD, Crohn's disease; UC, ulcerative colitis patients. The roots, linear combinations of the 10 genes highlighted in bold in Table 1, are used as coordinates.
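The leave-one-out computation of the reproducibility index described above can be sketched in Python as follows. Any gene selector can be plugged in; `toy_selector`, a simple group-mean-difference rule, is a hypothetical stand-in for the Statistica stepwise DFA and is not the procedure used in the paper. RI is expressed as the percentage of leave-one-out runs in which a variable was selected.

```python
import statistics
from collections import Counter


def reproducibility_index(X, y, select):
    """RI (%) per variable: share of leave-one-out runs in which `select` picks it."""
    counts = Counter()
    n_runs = len(X)
    for i in range(n_runs):
        X_kept = X[:i] + X[i + 1:]  # drop sample i
        y_kept = y[:i] + y[i + 1:]
        counts.update(set(select(X_kept, y_kept)))
    return {var: 100.0 * c / n_runs for var, c in counts.most_common()}


def toy_selector(X, y, cutoff=1.0):
    """Hypothetical stand-in for stepwise DFA: columns whose group means differ by > cutoff."""
    groups = sorted(set(y))
    means = {g: [statistics.fmean(col)
                 for col in zip(*[x for x, lab in zip(X, y) if lab == g])]
             for g in groups}
    return [j for j in range(len(X[0]))
            if max(means[g][j] for g in groups) - min(means[g][j] for g in groups) > cutoff]


# Hypothetical data: column 0 separates the groups, column 1 does not.
X = [[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]]
y = ["NC", "NC", "UC", "UC"]
print(reproducibility_index(X, y, toy_selector))
```

Passing the selector as a function keeps the resampling loop independent of how genes are chosen, so the same loop could wrap a stepwise DFA routine.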
Networking of the HV genes based on partial correlations for the characterization of functional interconnections
The selection of variable genes into discriminatory linear combinations suggests that these genes could be functionally related and compensate for one another. We explored their relationships by building networks from partial correlations. Networking was performed on the 35 genes in Table 1 by searching for statistically reproducible functional interconnections. Statistical reproducibility was defined as a connection between nodes whose correlation exceeded the threshold generated by Monte-Carlo simulation; the maximum correlation among the simulated data was 0.7, and this value was used as the threshold. To increase the accuracy of the interconnections, a larger sample size was needed for the partial correlation calculations. We achieved this by combining the data for PBMC, neutrophils and biopsies, a total of 25 samples, for the patient groups. That a number of genes remained correlated even after data from different sample types were merged strongly indicates systematic, specific functional interconnections between some genes. Of the 35 genes, 27 establish full or partial reproducible interconnections in NC (Fig. 3A), UC (Fig. 3B) and CD (Fig. 3C). The genes form several tightly interconnected clusters. Connections common to all three sample groups link MAPK3, TMSB4, CD74, HLA-G, S100A8, PRG1 and RGS2. Some interconnections among these clusters are similar between UC and CD but missing from the NC network. One notable difference in NC is the presence of two negative relationships, between POLR2L and IL-16 and between LGALS2 and PF4. The similarities between UC and CD start from the inflammation genes MAPK3 (ERK), FOS and LYZ, spread to the chain of platelet-related genes and ferritin, and then to the closely interconnected inflammatory cytokines, lymphokines and T-cell-activation genes. Another cluster shared by CD and UC comprises the connections among ZFP36L2, ADRBK1, CXCR4, IL-16, TGFB1, ECGF1 and PGR1. Differences between UC and CD appear where RANTES (CCL5) joins the network, and in the disconnection of the ALDH3A1-LGALS2-POLR2L cluster from the rest of the network in UC.
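Comparisons of the kind described above (connections common to all groups, connections shared by UC and CD but absent from NC, and group-specific connections) reduce to set operations on undirected edge lists. The minimal Python sketch below illustrates this; the gene names and edges are placeholders, not the networks of Fig. 3, and `compare_networks` is a hypothetical helper.

```python
def as_edges(pairs):
    """Undirected edges: (a, b) and (b, a) count as the same connection."""
    return {frozenset(p) for p in pairs}


def compare_networks(nc, uc, cd):
    """Partition edges of three group networks into shared and group-specific sets."""
    nc, uc, cd = as_edges(nc), as_edges(uc), as_edges(cd)
    return {
        "all_groups": nc & uc & cd,
        "ibd_only": (uc & cd) - nc,  # shared by both patient groups, absent in controls
        "nc_only": nc - uc - cd,
        "uc_only": uc - cd - nc,
        "cd_only": cd - uc - nc,
    }


# Placeholder networks with hypothetical genes g1..g5.
nc = [("g1", "g2"), ("g3", "g4")]
uc = [("g2", "g1"), ("g2", "g5"), ("g4", "g5")]
cd = [("g1", "g2"), ("g5", "g2")]
for label, edges in compare_networks(nc, uc, cd).items():
    print(label, sorted(tuple(sorted(e)) for e in edges))
```

Signed edges (such as the negative NC connections) could be handled the same way by storing the sign alongside each frozenset.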
Discussion
The main idea of this work is that stable discrimination of groups in microarray experiments can be achieved using variably expressed genes. The functional relevance of HV genes has been demonstrated, here and previously [2], by statistical evidence: families of such genes show highly similar variability profiles. We regard them as snapshots of dynamical processes that are not synchronized among samples. Their interconnection was further shown by their functional gene networks. The idea is illustrated with data generated for two forms of IBD, UC and CD. The first surprise comes from the genes that are hypervariably expressed in all three groups. Although they are only a subgroup of the 192 genes, almost all of the 35 selected genes are related directly to IBD pathology or to inflammation, an essential component of IBD pathology. Increased expression of the HIV-1 coreceptors CCR5 and CXCR4, of the beta-chemokine RANTES and of macrophage inflammatory protein (MIP)-1α and MIP-1β has been reported in both IBD patients and HIV-1-infected patients, suggesting increased inflammation in the colon of HIV-1-infected patients [11]. Chemokines, together with key cytokines that promote their release, are elevated in mucosal tissues from patients with IBD. Several other cytokines are presented in Table 1, including CCL5, IL-16 and TGF-β. CCL5 was also reported to be overexpressed in IBD patients [17]; IL-16 expression was twofold higher in the colonic mucosa of IBD patients [14], which could lead to increased secretion of other proinflammatory cytokines in IBD; and TGF-β1 is an inhibitory cytokine recognized as a key regulator of immunological homeostasis and inflammatory responses [13]. Some genes playing regulatory (MAPK3 and FOS) or downstream housekeeping (ALDH3A1, NGB, TRAM and POLR2L) roles in inflammatory processes are also presented in Table 1.
Figure 3 Networks of 27 of the 35 genes presented in Table 1, obtained by the partial correlation procedure in each of the three sample groups: (A) normal controls (NC), (B) ulcerative colitis patients (UC) and (C) Crohn's disease patients (CD). Only reproducible, nonrandom connections among the 27 nodes are shown. Blue dashed lines in the normal control network indicate strong negative correlations; purple lines indicate positive interconnections.
IBD patients have inappropriate T-cell responses to antigenic components of their own intestinal microflora, suggesting a disorder in the normal mucosal immune mechanism [13]. Several genes with important roles in immunoreactivity are presented in Table 1. Among them, T-cell-activation protein (PGR1), together with CD4 T lymphocytes [18], synergistically promotes infiltration of inflamed mucosal regions. Both CD and UC are associated with abnormalities of platelet number and function, and platelets can actively contribute to mucosal inflammation in IBD. In the peripheral circulation, platelet activation is typically increased, and IBD-involved mucosa frequently contains platelet aggregates within mucosal microthrombi [15]. Several genes in Table 1 are related to platelet function: proplatelet basic protein, platelet factor 4 in two variants and platelet-derived endothelial cell growth factor [15].
Figure 4 The mobile as a model of compensatory equilibrium among variable discriminatory parameters. When one parameter (red butterfly) occasionally takes a semivertical position, the others compensate for its movement in different ways, so that groups are discriminated by differences in their compensatory reactions.
It is worth pointing out that one gene in Table 1, S100A8 (calgranulin A), has been used as a clinical marker in screening tests of IBD patients [33]. Iron deficiency, a common downstream effect of IBD, explains the presence of ferritin in Table 1 [32]. In addition to genes involved in the general pathology of IBD, another group of genes is specific to the IBD subtype and may potentially distinguish between UC and CD. Mice with targeted deletion of the G protein Gαi2 develop an IBD closely resembling UC [25], and RGS2, a regulator of G-protein signalling, is among the selected HV genes. Other genes in Table 1 whose differential expression has been associated with UC or CD are as follows: lysozyme (LYZ), with increased expression in UC [26]; HLA-G, whose presence on the surface of intestinal epithelial cells in patients with UC supports the notion that this molecule may regulate mucosal immune responses to antigens of undefined origin, so that different patterns of HLA-G expression may help to distinguish the immunopathogenesis of CD and UC [24]; and TGF-β1, a key factor of immunoregulation, whose production differs significantly between CD and UC, probably as a consequence of different T-helper polarization [13]. The idea of using variable immunological parameters to diagnose diseases with similar pathological phenotypes was first presented in an early publication by Petrov et al. [34], who proposed a mechanical model for such classification in the form of a mobile (Fig. 4). Here, we incorporate the same idea into the classification of IBD using microarray data. As Fig. 4 illustrates, the overall stability of a mechanical mobile results from strong interconnections between the elements of the system.
A similar position of any one butterfly in two different mobiles can be achieved through very different compensatory reactions of the other members. We propose that the stability of the linear combinations of HV genes, the roots in DFA, not only provides an instrument for group discrimination but also reveals dynamical compensatory interrelationships between these genes. Because the genes themselves are functionally relevant to the pathology, differences in their functional interrelations could be extremely important for understanding pathological alterations of dynamical biological processes. The networks of these genes were obtained exclusively from gene transcription data using partial correlation. The next surprise came from comparing these networks: some connections are very similar among all three groups, NC and the two disease groups. In addition, several tightly clustered groups of genes are especially similar between the two patient groups, UC and CD, only. Differences in the gene networks are also seen between UC and CD. Notably, a couple of negative interconnections seen only in NC suggest that some inhibitory regulation present in the normal state is disrupted in the patients. The systematic differences seen between UC and CD, and between IBD and NC, could be used to better understand the mechanisms of IBD pathology and to develop new therapeutic strategies. The DFA and network data highlight new dynamical differences in gene expression and offer a new opportunity to discriminate similar pathologies using tightly, functionally interconnected variable genes.
Acknowledgments The author (ID) thanks Ivan Lefkovits for fruitful discussions of the basic idea of using internal standards for powerful microarray data analysis. The author also thanks Prof. K. A. Lebedev for instilling the idea of using dynamic parameters to create stable, mobile-like discriminatory constructions. The work was supported by grants P20 RR16478-04, P20 RR020143, P20 RR017703, P20 RR15577 and NIH 01700172.
References
1 Jarvis JN, Dozmorov I, Jiang K et al. Novel approaches to gene expression analysis of active polyarticular juvenile rheumatoid arthritis. Arthritis Res Ther 2004;6:R15–32.
2 Dozmorov I, Knowlton N, Tang Y et al. Hypervariable genes – experimental error or hidden dynamics. Nucleic Acids Res 2004;32:147.
3 Dozmorov I, Centola M. An associative analysis of gene expression array data. Bioinformatics 2003;19:204–11.
© 2005 Blackwell Publishing Ltd. Scandinavian Journal of Immunology 62 (Suppl. 1), 84–91
4 Knowlton N, Dozmorov IM, Centola M. Microarray data analysis toolbox (MDAT): for normalization, adjustment and analysis of gene expression data. Bioinformatics 2004;20:3687–90.
5 Dozmorov I, Knowlton N, Tang Y, Centola M. Statistical monitoring of weak spots for improvement of normalization and ratio estimates in microarrays. BMC Bioinformatics 2004;5:53.
6 Dozmorov I, Saban MR, Knowlton N, Centola M, Saban R. Connective molecular pathways of experimental bladder inflammation. Physiol Genomics 2003;15:209–22.
7 Jarvis JN, Dozmorov I, Jiang K et al. Gene expression arrays reveal a rapid return to normal homeostasis in immunologically-challenged trophoblast-like JAR cells. J Reprod Immunol 2004;61:99–113.
8 Jorgensen ED, Dozmorov I, Frank MB, Centola M, Albino AP. Global gene expression analysis of human bronchial epithelial cells treated with tobacco condensates. Cell Cycle 2004;3:1154–68.
9 Zimmerman RA, Dozmorov I, Nunlist EH et al. 5α-Androstane-3α,17β-diol activates pathway that resembles the epidermal growth factor responsive pathways in stimulating human prostate cancer LNCaP cell proliferation. Prostate Cancer Prostatic Dis 2004;7:364–74.
10 Toh H, Horimoto K. Inference of a genetic network by a combined approach of cluster analysis and graphical Gaussian modeling. Bioinformatics 2002;18:287–97.
11 Olsson J, Poles M, Spetz AL et al. Human immunodeficiency virus type 1 infection is associated with significant mucosal inflammation characterized by increased expression of CCR5, CXCR4, and beta-chemokines. J Infect Dis 2000;182:1625–35.
12 Matussek A, Lauber J, Bergau A et al. Molecular and functional analysis of Shiga toxin-induced response patterns in human vascular endothelial cells. Blood 2003;102:1323–32.
13 Del Zotto B, Mumolo G, Pronio AM, Montesani C, Tersigni R, Boirivant M. TGF-beta1 production in inflammatory bowel disease: differing production patterns in Crohn’s disease and ulcerative colitis. Clin Exp Immunol 2003;134:120–6 (Erratum in: Clin Exp Immunol 2003;134:365).
14 Seegert D, Rosenstiel P, Pfahler H, Pfefferkorn P, Nikolaus S, Schreiber S. Increased expression of IL-16 in inflammatory bowel disease. Gut 2001;48:326–32.
15 Danese S, Motte Cd Cde L, Fiocchi C. Platelets in inflammatory bowel disease: clinical, pathogenic, and therapeutic implications. Am J Gastroenterol 2004;99:938–45.
16 Lawrance IC, Fiocchi C, Chakravarti S. Ulcerative colitis and Crohn’s disease: distinctive gene expression profiles and novel susceptibility candidate genes. Hum Mol Genet 2001;10:445–56.
17 McCormack G, Moriarty D, O’Donoghue DP, McCormick PA, Sheahan K, Baird AW. Tissue cytokine and chemokine expression in inflammatory bowel disease. Inflamm Res 2001;50:491–5.
18 Kraus TA, Toy L, Chan L, Childs J, Mayer L. Failure to induce oral tolerance to a soluble protein in patients with inflammatory bowel disease. Gastroenterology 2004;126:1771–8.
19 Kirkwood SC, Su JL, Conneally P, Foroud T. Progression of symptoms in the early and middle stages of Huntington disease. Arch Neurol 2001;58:273–8.
20 Morgan RW, Sofer L, Anderson AS, Bernberg EL, Cui J, Burnside J. Induction of host gene expression following infection of chicken embryo fibroblasts with oncogenic Marek’s disease virus. J Virol 2001;75:533–9.
21 Day RM, Mitchell TJ, Knight SC, Forbes A. Regulation of epithelial syndecan-1 expression by inflammatory cytokines. Cytokine 2003;21:224–33.
22 Uno T, Ryu D, Tsutsumi H et al. Residues in the distal heme pocket of neuroglobin. Implications for the multiple ligand binding steps. J Biol Chem 2004;279:5886–93.
23 Mutchnick MG, Lee HH, Hollander DI, Haynes GD, Chua DC. Defective in vitro gamma interferon production and elevated serum immunoreactive thymosin beta 4 levels in patients with inflammatory bowel disease. Clin Immunol Immunopathol 1988;47:84–92.
24 Torres MI, Le Discorde M, Lorite P et al. Expression of HLA-G in inflammatory bowel disease provides a potential way to distinguish between ulcerative colitis and Crohn’s disease. Int Immunol 2004;16:579–83.
25 Hornquist CE, Lu X, Rogers-Fani PM et al. G(alpha)i2-deficient mice with colitis exhibit a local increase in memory CD4+ T cells and proinflammatory Th1-type cytokines. J Immunol 1997;158:1068–77.
26 Fahlgren A, Hammarstrom S, Danielsson A, Hammarstrom ML. Increased expression of antimicrobial peptides and lysozyme in colonic epithelial cells of patients with ulcerative colitis. Clin Exp Immunol 2003;131:90–101.
27 Scott LA, Vass JK, Parkinson EK, Gillespie DA, Winnie JN, Ozanne BW. Invasion of normal human fibroblasts induced by v-Fos is independent of proliferation, immortalization, and the tumor suppressors p16INK4a and p53. Mol Cell Biol 2004;24:1540–59.
28 Pappas P, Sotiropoulou M, Karamanakos P, Kostoula A, Levidiotou S, Marselos M. Acute-phase response to benzo[a]pyrene and induction of rat ALDH3A1. Chem Biol Interact 2003;143–144:55–62.
29 Pappas P, Stephanou P, Vasiliou V, Marselos M. Anti-inflammatory agents and inducibility of hepatic drug metabolism. Eur J Drug Metab Pharmacokinet 1998;23:457–60.
30 Saban MR, Nguyen NB, Hammond TG, Saban R. Gene expression profiling of mouse bladder inflammatory responses to LPS, substance P, and antigen-stimulation. Am J Pathol 2002;160:2095–110.
31 Adcock IM. Glucocorticoid-regulated transcription factors. Pulm Pharmacol Ther 2001;14:211–9.
32 de Silva AD, Mylonaki M, Rampton DS. Oral iron therapy in inflammatory bowel disease: usage, tolerance, and efficacy. Inflamm Bowel Dis 2003;9:316–20.
33 Aadland E, Fagerhol MK. Faecal calprotectin: a marker of inflammation throughout the intestinal tract. Eur J Gastroenterol Hepatol 2002;14:823–5.
34 Petrov RV, Lebedev KA, Poniakina ID, Petrukhin IS. Interrelation of the immunological parameters of healthy donors and of persons frequently ill with acute respiratory diseases and bronchitis in the remission stage (a new approach to assessing immune status). Zh Mikrobiol Epidemiol Immunobiol 1983;9:99–105.