An Integrated Computational Proteomics Method to Extract Protein Targets for Fanconi Anemia Studies

Jake Yue Chen
Indiana University School of Informatics and Dept. of Computer Science, Purdue School of Science, Indianapolis, IN 46202 USA

Sarah L. Pinkerton
Department of Biochemistry and Biophysics, Indiana University School of Medicine, Indianapolis, IN 46202 USA

Changyu Shen
Division of Biostatistics, Department of Medicine, Indiana University School of Medicine, Indianapolis, IN 46202 USA

Mu Wang
Department of Biochemistry and Biophysics, Indiana University School of Medicine, Indianapolis, IN 46202 USA

[email protected]

[email protected]

[email protected]

[email protected]

malformations [1]. Cells from FA patients exhibit a unique hypersensitivity to DNA interstrand cross-linking agents such as mitomycin C (MMC) and diepoxybutane [2]. To date, eleven FA complementation groups (FA-A, -B, -C, -D1, -D2, -E, -F, -G, -I, -J, and -L) have been defined by somatic cell fusion studies, with each complementation group corresponding to a distinct gene [3, 4]. Eight of the FA genes, FANCA, FANCC, FANCD1 (BRCA2), FANCD2, FANCG, FANCE, FANCF, and FANCL, have been cloned [4-10]. Despite the identification of FA genes, the functions of each FA protein remain largely unknown. In addition, FA proteins do not have significant sequence homology to each other or to proteins with known function. Several lines of evidence have suggested the involvement of FA proteins in modulation of cytokine-mediated signaling [11], responsiveness to oxidative stress [12], chromatin remodeling [13], and DNA interstrand cross-link repair [14]. At least two FA proteins, FANC-C and FANC-D2, have multiple functions [15]. Although different FA proteins are likely to have distinct cellular functions, the general phenotypic feature of bone marrow failure across all complementation groups indicates that FA proteins share important hematopoietic activities, either by acting together in a multi-protein complex or by playing their individual roles in sequential signaling pathways, or both. Specifically how the FA proteins function in response to various types of genomic stress has not yet been described.

ABSTRACT Fanconi Anemia (FA) is a rare autosomal genetic disease with multiple birth defects and severe childhood complications for its patients. The lack of sequence homology among the FA complementation group proteins, such as FANCC, FANCG, and FANCA, makes them extremely difficult to characterize using conventional bioinformatics methods. In this work, we describe how to use computational methods to extract protein targets for FA, using a protein interaction data set collected for the FANC group C protein (FANCC). We first generated an initial set of 130 FA-interacting proteins as “FANCC seed proteins” by merging an in-house experimental set of FANCC Tandem Affinity Purification (TAP) pulldown proteomics data, identified by mass spectrometry, with publicly available human FANCC-interacting proteins. Next, we expanded the FANCC seed proteins using a nearest-neighbor method to generate a FANCC protein interaction subnetwork of 948 proteins in 903 protein interactions. We show that this network is statistically significant, with high indices of aggregation and separation. We also show a visualization of the network, which supports the evidence that many well-connected proteins exist in the network. Further, we developed and applied an interaction network protein scoring algorithm, which allows us to calculate a ranked list of significant FA proteins. Our results have been supporting further biological investigations by disease biologists on our team. We believe our method can be generalized to other disease biology studies with similar problems.

A current model of how the FA proteins function in DNA repair is depicted in Fig. 1 [16]. Similar to many modular proteins, an FA protein contains protein domains that may allow it to bind to multiple protein partners and form FA multi-protein complexes, which can subsequently carry out cellular functions. Therefore, the identification and analysis of the components of protein complexes become critical in understanding their cellular functions. As shown in Fig. 1, FA proteins A, C, E, F, and G assemble into a nuclear complex that activates the monoubiquitination of FANC-D2, which is not part of the complex. Monoubiquitination of FANC-D2 is induced by DNA damage and is required for targeting of FANC-D2 to nuclei. Mutation in any of FANC-A, -C, -E, -F, or -G prevents formation of this nuclear complex and disrupts the normal repair response to DNA-damaging agents, such as MMC. Monoubiquitinated FANC-D2 co-localizes with BRCA1 during S-phase and after DNA damage [13, 16]. However, there is no evidence suggesting that this co-localization is through direct FANC-D2/BRCA1 interaction. The FANC-A/BRCA1 interaction is the only direct interaction between BRCA1 and FA proteins reported so far using the yeast two-hybrid system. Although protein-protein interactions among FA proteins have also been

Keywords Proteomics, Protein Interaction Network, Disease Target, Fanconi Anemia

1. INTRODUCTION

In this work, we investigate how to develop and use computational techniques to advance disease biology research frontiers for Fanconi Anemia (FA), a rare autosomal recessive genetic syndrome characterized by aplastic anemia, multiple birth defects, predisposition to cancer, and a wide range of congenital

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SAC’06, April 23-27, 2006, Dijon, France. Copyright 2006 ACM 1-59593-108-2/06/0004…$5.00.


and algorithm design and can be easily adapted to other disease biology studies.

previously reported (for example in [17]), some of those reports were conflicting. One problem associated with the yeast two-hybrid system is that interactions of proteins that depend on post-translational modifications occurring only in mammalian cells would not be detected. For example, monoubiquitination and serine phosphorylation of FANC-D2 are required for mediating cellular resistance to ionizing radiation [18]. Co-immunoprecipitation strategies may also be problematic due to the specificity of the antibodies used. To date it is still unclear whether an FA protein supercomplex is required to carry out DNA interstrand cross-link repair.

2. METHODS

2.1 Integration of FANCC Protein Interaction Data Sets

The protein interaction data for the study come from two sources: the MPC data set derived experimentally by our participating disease biologists, and a public human protein interaction data set collected through bioinformatics methods.


In the rest of the paper, we provide details of our analysis method and results. We also discuss how our results can help FA disease biologists generate new hypotheses.

Figure 1. Schematic presentation of the current model for the FA protein complex, ubiquitination of FANCD2, and interactions with BRCA1/BRCA2. [Diagram labels: FANCA, FANCC, FANCE, FANCF, FANCG; DNA damage; Ub-FANCD2 co-localizes with BRCA1 (indirect interaction?); BRCA2/FANCD1; BRCA1.]

We developed an integrated computational method to perform biological knowledge discovery, based on an initial collection of FA Multi-Protein Complex (MPC) data identified from Tandem Affinity Purification (TAP) protein pulldown and mass spectrometry experiments. The MPC protein pulldown experiment used the protein Fanconi Anemia Complementation Group C (gene symbol: FANCC) as “bait”, from which we use a “spoke model” to enumerate interacting proteins by counting only the bait-prey protein interactions between FANCC and the identified FANCC pulldown proteins. We first searched the Online Predicted Human Interaction Database (OPHID) [19] to retrieve and merge the FANCC MPC data set with known experimental/predicted human interacting protein pairs involving the FANCC protein. Second, we expanded the merged protein interactions with additional OPHID protein interactions and visualized the FA protein interaction sub-network using interaction confidence and types as parameters. Third, we performed statistical analysis to assess the significance of the information extracted. Fourth, we developed and applied a computational algorithm to rank-order proteins of high relevance to the FA disease sub-network seeded by FANCC.

A snapshot of the MPC data identified by the protein Tandem Affinity Purification (TAP) and mass spectrometry methods is shown in Table 1. The FANCC protein (the first record) served as the bait protein for the proteomics data set. Even though this MPC data set gives a list of proteins functionally related to FANCC, the list by itself is not very informative. In particular, the “XCorr Score” is simply a measure of confidence that an entry protein was detected in the MPC proteomics experiment. There are no indicators of how closely or how significantly a protein is related to the FANCC disease biology pathways/networks. The data in the table also show a nontrivial bioinformatics challenge: making protein identifiers compatible from one data set to another. For example, even though many protein identifiers from public databases are prefixed to the protein description field of each record, the SwissProt ID (immediately following “..|sp|”) in the protein description string is missing for protein “IPI00180305.2”, making it difficult to map to a protein entry in the SwissProt database.

Table 1. A FANCC TAP/MS data snapshot. It shows the top 4 of 145 proteins identified in the MPC with the highest XCorr Scores.

Protein ID       XCorr Score   Peptide Count   Description
IPI00023608.1    5.585         38              rs|NP_000127|sp|Q00597|Fanconi_anemia_group_C_protein|mass|63429|Human
IPI00296337.2    4.994         2               rs|NP_008835|sp|P78527-1|Splice_isoform1of P78527_DNA-dependent_protein_kinase_catalytic_subunit|mass|469089|Human
IPI00031801.1    4.938         1               rs|NP_003642|sp|P16989-1|Splice_isoform1of P16989_DNA-binding_protein_A|mass|40060|Human
IPI00180305.2    4.368         1               rs|NP_065816|sp||retinoblastoma-associated factor_600|mass|573939|Human

Our reported work is significant in two ways. First, our work is the first to demonstrate that it is possible to combine proteomics data sets from two distinctly different sources and make conclusive FA disease biology discoveries. The target proteins that we discovered in this research have given experimental disease biologists and clinicians specific guidance on how to design the next iterations of experiments. Second, our computational proteomics approach to classical bioinformatics problems is novel and integrated, which combines bioinformatics data integration, database development, information visualization,

The second source of data is the Online Predicted Human Interaction Database (OPHID) [19], a web-based database of human protein interactions with more than 40,000 interactions among ~9,000 proteins. It is a comprehensive, integrated repository of known human protein interactions, drawn both from curated literature publications and from high-throughput experiments, and of predicted interactions inferred from interaction evidence in model organisms, e.g., yeast, fly, worm, and mouse. Even though more than half of the interactions in OPHID are predicted by mapping interacting protein pairs in available organisms onto orthologous protein pairs in human, the statistical significance of these predicted human interactions was confirmed by evaluating domain co-occurrence, co-expression, and GO semantic distance evidence [19]. The entire collection of OPHID data was downloaded and loaded into our in-house Oracle 10g relational database system for this analysis.

interactions, therefore involve FANCC protein as one partner and a seed protein as another partner. Next, we search and retrieve protein interacting pairs in OPHID such that at least one member of the protein interaction pair belongs to the FANCC seed proteins. The set of interacting pairs retrieved is called the FANCC expanded interactions, and the new expanded set of proteins is called the FANCC expanded proteins (a superset of FANCC seed proteins). The FANCC expanded interactions can have either the “W” type (expansions taking place within seed proteins) or the “A” type (expansions taking place across seed and non-seed proteins). Note that since we do not expand FANCC-related interaction beyond FANCC’s immediate interaction partners, we should not expect interactions with both partners belonging to “non-seed proteins”.
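The nearest-neighbor expansion and the W/A labeling described above can be sketched in a few lines. This is a minimal illustration, not the authors' database implementation: the function name `expand_network`, the in-memory pair list, and the toy interactions are all assumptions made for the example.

```python
def expand_network(seed_proteins, interactions):
    """One-step nearest-neighbor expansion from a set of seed proteins.

    `interactions` is an iterable of (protein_a, protein_b) pairs, standing in
    for the OPHID interaction table. Returns the expanded protein set and a
    dict mapping each retained pair to "W" (both partners are seeds) or
    "A" (one seed, one non-seed partner).
    """
    seeds = set(seed_proteins)
    expanded = set(seeds)
    typed = {}
    for a, b in interactions:
        # Keep a pair only if at least one partner is a seed protein.
        if a in seeds or b in seeds:
            pair = frozenset((a, b))
            typed[pair] = "W" if (a in seeds and b in seeds) else "A"
            expanded.update(pair)
    return expanded, typed


# Hypothetical toy data, for illustration only.
seeds = {"FANCC", "FANCG", "PRKDC"}
toy_interactions = [("FANCC", "FANCG"), ("PRKDC", "WRN"), ("TP53", "MDM2")]
expanded, typed = expand_network(seeds, toy_interactions)
```

Because pairs with no seed partner are never retained, the result cannot contain an interaction between two non-seed proteins, which matches the note above.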

2.3 Visualization of Interaction Sub-networks

To perform interaction network visualization, we used an internally developed software platform, ProteoLens [21]. The tool has native built-in support for relational database access and manipulations. The tool allows expert users to browse database schemas and tables, filter and join relational data using SQL queries, and customize data fields to be visualized as graphical annotations in the visualized network. The visualization result is shown in the next section.

Because there is inherent noise in both MPC proteomics data sets and predicted protein interaction data sets, we explicitly model the reliability of the different data sources. In particular, we assign a confidence score to each protein interaction pair in our merged MPC and predicted human protein interaction data set, based on the following heuristic scoring rules:

1. MPC protein interactions whose prey proteins have an “XCorr Score” ≥ 2.5 are assigned a high confidence score of 0.91.

2. MPC protein interactions whose prey proteins have an “XCorr Score” between 1.95 and 2.5 are assigned a medium confidence score of 0.75.

3. OPHID protein interactions that are experimentally collected from human (non-predicted data set) are assigned a high confidence score of 0.9.

4. OPHID protein interactions that are inferred from high-quality protein interactions in mammalian organisms are assigned a medium confidence score of 0.5.

5. OPHID protein interactions that are inferred from low-quality or low-confidence interactions or non-mammalian organisms are assigned a low confidence score of 0.3.
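The five rules above amount to a small lookup. The sketch below encodes them under two stated assumptions: the lower bound 1.95 is treated as inclusive, and MPC preys scoring below 1.95 get no score (the text does not say how they are handled). The OPHID category labels are hypothetical names chosen for this example.

```python
from typing import Optional


def mpc_confidence(xcorr: float) -> Optional[float]:
    """Confidence for an MPC bait-prey interaction, from the prey's XCorr Score."""
    if xcorr >= 2.5:
        return 0.91  # rule 1: high confidence
    if xcorr >= 1.95:
        return 0.75  # rule 2: medium confidence (inclusive bound is an assumption)
    return None      # below 1.95: not covered by the stated rules


# Rules 3-5; the keys are illustrative labels, not OPHID field values.
OPHID_CONFIDENCE = {
    "human_experimental": 0.9,
    "mammalian_high_quality": 0.5,
    "low_quality_or_non_mammalian": 0.3,
}

top_prey_score = mpc_confidence(5.585)  # XCorr Score of the first record in Table 1
```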

2.4 Statistical Examination of the Network

Since all the FANCC expanded proteins interact with FANCC seed proteins, which in turn interact with the FANCC protein, one could conjecture that the network formed by the FANCC expanded proteins should be more “connected” than a randomly selected protein set of the same size. To gauge network “connectivity”, we first introduce several basic concepts. First, we define a path between two proteins A and B as a set of proteins P1, P2,…, Pn such that A interacts with P1, P1 interacts with P2, …, and Pn interacts with B. Note that if A directly interacts with B, then the path is the empty set. Then, we define the largest connected sub-network of a network as the largest subset of proteins and interactions such that there is at least one path between any pair of proteins in the subset. Finally, we define the index of aggregation of a network as the ratio of the size (by protein count) of the largest connected sub-network in this network to the size of the network. Therefore, the higher the index of aggregation, the more “connected” the network should be.
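As an illustration of the definition, the index of aggregation can be computed with a breadth-first search over the interaction graph. The following is a sketch with hypothetical function names and toy data; it assumes interactions are given as unordered pairs and ignores pairs with a partner outside the protein set.

```python
from collections import defaultdict, deque


def largest_component_size(proteins, interactions):
    """Protein count of the largest connected sub-network."""
    proteins = set(proteins)
    neighbors = defaultdict(set)
    for a, b in interactions:
        if a in proteins and b in proteins:
            neighbors[a].add(b)
            neighbors[b].add(a)
    seen, best = set(), 0
    for start in proteins:
        if start in seen:
            continue
        # Breadth-first search collects every protein reachable from `start`.
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for other in neighbors[node]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        best = max(best, size)
    return best


def index_of_aggregation(proteins, interactions):
    """Largest connected sub-network size divided by network size."""
    return largest_component_size(proteins, interactions) / len(set(proteins))


# Toy network: A-B-C form one component, D-E another.
toy_proteins = ["A", "B", "C", "D", "E"]
toy_edges = [("A", "B"), ("B", "C"), ("D", "E")]
toy_index = index_of_aggregation(toy_proteins, toy_edges)
```

In the toy network the largest component holds 3 of 5 proteins, so the index is 0.6.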

We also used the HUGO Gene Nomenclature Committee (HGNC) database [20], a repository of officially approved gene symbols, to resolve protein identifiers from multiple data sources and unofficial gene symbols. The HGNC database provides standard gene symbols and gene mappings to various gene/protein IDs in common public databases such as SwissProt, NCBI RefSeq, NCBI LocusLink, and KEGG enzyme. Using HGNC gene mappings, we were able to map the majority of protein entries from both the MPC data set and the OPHID database to SwissProt IDs and official gene symbols.
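One small part of this identifier clean-up, pulling the SwissProt accession out of an MPC description string, can be sketched as follows. The example assumes the pipe-delimited layout shown in Table 1, where the accession follows the `sp` tag. The function name `swissprot_id` and the decision to strip isoform suffixes such as `-1` are illustrative assumptions, not the authors' pipeline.

```python
from typing import Optional


def swissprot_id(description: str) -> Optional[str]:
    """Return the accession that follows the 'sp' tag, or None if it is empty or absent."""
    fields = description.split("|")
    for tag, value in zip(fields, fields[1:]):
        if tag == "sp":
            # Assumption: an isoform suffix ("P78527-1") maps to its parent entry.
            accession = value.split("-")[0].strip()
            return accession or None
    return None


records = {
    "IPI00023608.1": "rs|NP_000127|sp|Q00597|Fanconi_anemia_group_C_protein|mass|63429|Human",
    "IPI00180305.2": "rs|NP_065816|sp||retinoblastoma-associated_factor_600|mass|573939|Human",
}
mapped = {pid: swissprot_id(desc) for pid, desc in records.items()}
# Proteins with no SwissProt accession would need another mapping route.
unmapped = [pid for pid, accession in mapped.items() if accession is None]
```

Records that end up in `unmapped` correspond to the seed proteins that cannot be expanded because they lack a SwissProt entry.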

Another network property to gauge is the “index of separation”, a measure of the percentage of W-type interactions found in the entire FANCC expanded interactions. One can conjecture that a high index of separation found in a network represents extensive “re-discovery” of proteins after the protein interactions are expanded from the seed proteins. We developed a simulation method to examine the statistical significance of observed index of aggregation and index of separation in FANCC expanded protein networks. Specifically, we use the following resampling procedure to measure how likely what we have observed is distinctly different from random selections:

2.2 Expansion of the Interaction Network

Once we have a merged initial protein interaction data set, we derive the FA-related protein interaction sub-network using a nearest-neighbor expansion method, described as follows. First, we denote the initial list of FANCC-interacting proteins (merged from both the experimental TAP method and OPHID) as FANCC seed proteins (which include the FANCC protein). The set of protein interactions, called FANCC seed

1. Randomly select 100 proteins from OPHID (100 being the number of “effectively expandable” proteins in the FANCC seed protein set);

2. Build an expanded protein interaction sub-network by using the same nearest-neighbor expansion method described earlier;

3. Find the largest connected sub-network and the number of W-type interactions;

4. Compute the index of aggregation and index of separation for the expanded sub-network;

5. Repeat steps 1-4 1,000 times to obtain a distribution of the index of aggregation and index of separation under random selection conditions;

6. Compare the actually observed indexes of aggregation and separation with the distribution obtained in step 5 and calculate the p-value.
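The six steps above can be sketched as a short simulation. This is an illustrative reading, not the authors' code: the function names are hypothetical, the toy ring network stands in for OPHID, and the p-value is taken as the right-tail fraction of random draws at least as large as the observed value, which is one common convention.

```python
import random
from collections import defaultdict


def expand(seeds, interactions):
    """Step 2: keep interactions touching a seed; count W-type pairs (both partners seeds)."""
    seeds = set(seeds)
    nodes, edges, w_count = set(seeds), set(), 0
    for a, b in interactions:
        edge = frozenset((a, b))
        if (a in seeds or b in seeds) and edge not in edges:
            edges.add(edge)
            nodes.update(edge)
            w_count += int(a in seeds and b in seeds)
    return nodes, edges, w_count


def largest_component(nodes, edges):
    """Step 3: size of the largest connected sub-network (depth-first search)."""
    adjacent = defaultdict(set)
    for edge in edges:
        members = tuple(edge)
        a, b = members[0], members[-1]  # also handles a self-interaction
        adjacent[a].add(b)
        adjacent[b].add(a)
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for other in adjacent[node]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        best = max(best, size)
    return best


def indices(seeds, interactions):
    """Step 4: (index of aggregation, index of separation) for one seed set."""
    nodes, edges, w_count = expand(seeds, interactions)
    aggregation = largest_component(nodes, edges) / len(nodes)
    separation = w_count / len(edges) if edges else 0.0
    return aggregation, separation


def null_distribution(all_proteins, interactions, n_seeds, n_iter, rng):
    """Steps 1 and 5: repeat the random draw n_iter times (the paper uses 100 seeds, 1,000 runs)."""
    pool = sorted(all_proteins)
    return [indices(rng.sample(pool, n_seeds), interactions) for _ in range(n_iter)]


def empirical_p(observed, null_values):
    """Step 6: fraction of random draws with an index at least as large as observed."""
    return sum(value >= observed for value in null_values) / len(null_values)


# Hypothetical toy network: 20 proteins arranged in a ring.
proteins = [f"P{i}" for i in range(20)]
ring = [(proteins[i], proteins[(i + 1) % 20]) for i in range(20)]
null = null_distribution(proteins, ring, n_seeds=3, n_iter=50, rng=random.Random(0))
p_aggregation = empirical_p(0.9, [aggregation for aggregation, _ in null])
```

Passing an explicit `random.Random` instance keeps the simulation reproducible, which is useful when the reported p-value sits near a threshold.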

set as described in the Method Section. Except for 30 seed proteins that cannot be mapped to SwissProt entries (and are therefore not expandable), the remaining 100 FANCC seed proteins were queried against OPHID interactions. The FANCC expanded proteins include 948 members (130 FANCC seed proteins and 718 non-seed proteins). The FANCC expanded interactions include 903 interaction pairs, among which 32 are W-type interactions and 871 are A-type interactions. In Figure 2, we show a schematic drawing of the FANCC expansions and the total count of proteins in each category.

Figure 2. A schematic drawing of FANCC protein expansions beginning with an initial set of 130 merged seed proteins (the numbers shown in each category are total counts). [Diagram labels: FANCC seed proteins (130); FANCC non-seed proteins (718); W-type expanded interactions (32); E-type expanded interactions (871).]

2.5 Protein Target Ranking Algorithm

We assess the individual confidence for each protein in the FANCC expanded interaction set. We define a relevance score function s_i, similar to [22], for each protein i in this set as follows:

$$ s_i = k \ln\left( \sum_{j \in N(i) \cap A} p(i,j) \right) - \ln\left( \sum_{j \in N(i) \cap A} N(i,j) \right), $$

where i and j are indices for proteins in the network, k is an empirical constant (k > 1), N(i) is the set of interaction partners of protein i in the network, A is the set of FANCC expanded proteins, p(i,j) is the confidence score that we assigned to the interaction between proteins i and j (described earlier), and N(i,j) = 1 if protein j belongs to N(i) ∩ A (otherwise N(i,j) = 0). The second sum therefore counts the interaction partners of protein i within A. In principle, this empirical scoring function favors proteins with many high-confidence interactions among their neighbors over proteins with many low-confidence interactions or with only a few interactions. To avoid showing a negative score, in this work, we further convert s_i to the exponential scale as t_i = exp(s_i) and report t_i as the final score. In this study, we assign k = 2.
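A minimal sketch of this scoring function is given below, assuming every interaction passed in already lies within the expanded set A and has a positive confidence score. The function name `relevance_scores` and the toy confidences are hypothetical. With k = 2, the final score reduces to t_i = (sum of confidences)^2 / (number of partners).

```python
import math
from collections import defaultdict


def relevance_scores(confidence, k=2.0):
    """Compute t_i = exp(k * ln(sum_j p(i,j)) - ln(partner count)) for each protein.

    `confidence` maps (protein_i, protein_j) pairs to p(i, j); each undirected
    interaction is listed once, with distinct partners and p(i, j) > 0.
    """
    total = defaultdict(float)  # sum of p(i, j) over partners j
    count = defaultdict(int)    # number of partners, i.e. the sum of N(i, j)
    for (i, j), p in confidence.items():
        for protein in (i, j):
            total[protein] += p
            count[protein] += 1
    return {
        protein: math.exp(k * math.log(total[protein]) - math.log(count[protein]))
        for protein in total
    }


# Hypothetical confidences, chosen only to illustrate the ranking behavior.
toy_confidence = {
    ("FANCC", "FANCG"): 0.91,
    ("FANCC", "PRKDC"): 0.91,
    ("PRKDC", "WRN"): 0.5,
}
scores = relevance_scores(toy_confidence)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy example a protein with two high-confidence interactions outranks one with a single high-confidence interaction, which in turn outranks one with a single medium-confidence interaction.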

3.2 Statistical Examination of the Network

For the FA-related protein interaction network formed by the FANCC expanded proteins excluding the FANCC protein itself, the largest connected sub-network consists of 708 FA-related proteins among total 948 seed and non-seed proteins. However, since 30 FANCC seed proteins are not readily expandable because they do not map to known SwissProt records, the “effective” size of the connected network of interest is 948-30=818, from which we can calculate the index of aggregation (defined in the Method Section) as 708/818=86.55%. Also, we can determine the index of separation as the percentage of W-type interactions (32) in all the FANCC expanded interactions (903), i.e., 32/903=3.54%.

3. RESULTS

3.1 Building a FA Protein Interaction Sub-network

To help researchers comprehend the statistical significance of the FANCC expanded protein interaction network, we studied two histograms (figures not shown due to space limits), one for the random distribution of the index of aggregation and the other for the random distribution of the index of separation. Both distributions are derived from the random sampling method described in the Method Section. We found that the index of aggregation of 86.55% (approximately 0.87) for our FA-related protein interaction network falls at approximately the rightmost 5% of the tail area of the random distribution of the parameter; therefore, the index is significant at a p-value of approximately 0.05. We also found that the index of separation of 3.54% (approximately 0.035) for our FA-related protein interaction network falls at the extreme right end of the random distribution of the parameter; therefore, the index is very significant at p-value < 0.001.

As described in the Method section, we collected an initial list of FA-related protein interactions, the FANCC seed interactions, by merging two proteomics data sets. After cleaning the protein identifiers so that the data sets share SwissProt or NCBI RefSeq protein IDs, we have a total of 119 unique FANCC (SwissProt ID: Q00597) seed interactions from the TAP/mass spectrometry method, and a total of 12 unique FANCC seed interactions from the OPHID data set. Altogether, the two merged data sets contain 130 unique FANCC seed interactions, with only one protein interaction in common: the interaction between FANCC and FANCG (Fanconi Anemia Group G Protein). The abundance of interaction partners of FANCC and the lack of significant overlap between the two independent proteomics data sets suggest that FANCC is likely a “hub” protein [23].

A significant but not exceptionally high network index of aggregation suggests that the FA-related network has connectivity structures that are not random by nature. A significant and exceptionally high index of separation after FANCC expansions suggests that the FA-related proteins may be tightly related, consistent with the assumption that the majority of the connected proteins should participate in a few shared biological pathways.

From the 130 FANCC seed proteins, we performed nearest-neighbor expansions using OPHID human protein interaction data

4. DISCUSSION

The integrated approach to the analysis of human protein interaction data and Fanconi anemia complementation proteins has been guiding biologists on our team to validate existing protein-protein interactions, predict novel interactions that could lead to new hypothesis-driven projects, determine the pathways in which FA proteins are involved, and eventually elucidate the functions of FA proteins.

3.3 Visualization of the FA-related Protein Interaction Sub-network

From the figure, we can make the following observations. First, in the network, there are quite a few interactions that tend to “fan out” from a few protein hubs in the network. The FANCC protein (at the center of a cluster of seed proteins) appears to be such a protein hub. Second, there are quite a few “peripheral proteins” that do not significantly relate to the rest of the network. They represent proteins with lower likelihood than normal to be involved significantly in FA disease pathways. Third, there are some interesting “essential proteins” in the network. These proteins are generally well connected with the rest of the network through high-quality protein interactions.

Figure 3. A network of FANCC expanded protein interactions. Nodes colored in red are FANCC seed proteins and nodes in blue are non-seed expanded proteins from OPHID. Edges represent protein interactions: the thicker and redder the line, the higher our confidence in that interaction.

Table 2. Top 20 rank-ordered FA related protein targets.

Rank  Score  Gene Symbol  In TAP?  Gene Name
1     69.72  FANCC        Y        Fanconi anemia, complementation group C
2     59.83  ESR1         Y        estrogen receptor 1
3     32.33  CASP8        Y        caspase 8, apoptosis-related cysteine protease
4     31.94  STAT1        N        signal transducer and activator of transcription 1, 91kDa
5     30.59  HSPA1A       N        heat shock 70kDa protein 1A
6     30.59  HSPA1B       N        heat shock 70kDa protein 1B
7     30.27  PRKDC        Y        protein kinase, DNA activated catalytic polypeptide
8     23.69  CDC2         N        cell division cycle 2, G1 to S and G2 to M
9     20.21  THBS1        Y        thrombospondin 1
10    18.49  FANCG        Y        Fanconi anemia, complementation group G
11    16.4   SPTAN1       N        spectrin, alpha, nonerythrocytic 1 (alpha-fodrin)
12    15.88  BRCA1        N        breast cancer 1, early onset
13    15.8   FANCA        N        Fanconi anemia, complementation group A
14    14.97  CFTR         Y        cystic fibrosis transmembrane conductance regulator, ATP-binding cassette (sub-family C, member 7)
15    13.94  WRN          Y        Werner syndrome
16    10.63  FANCE        N        Fanconi anemia, complementation group E
17    9.92   FANCD2       N        Fanconi anemia, complementation group D2
18    8.4    TP53         N        tumor protein p53 (Li-Fraumeni syndrome)
19    8.13   GRIN2A       Y        glutamate receptor, ionotropic, N-methyl D-aspartate 2A
20    7.9    HSPA6        Y        heat shock 70kDa protein 6 (HSP70B')

In Figure 3, we show a visualization of all the FA expanded proteins and protein interactions in the entire network, using our recently developed software, “ProteoLens” [21]. ProteoLens is a biological network data mining and annotation platform, which supports standard GML files and relational data in the Oracle Database Management System. In the figure, all the FA seed proteins (shown as nodes) are colored red, while the FA non-seed expanded proteins (also shown as nodes) are colored blue. All the protein interactions (shown as edges) are also color-labeled, with interaction quality confidence shown as edges with varying degrees of “thickness” and “redness”.

For example, in Table 2, we show the top 20 rank-ordered FANCC-interacting proteins (including FANCC itself), from which we can generate interesting biological hypotheses. In the table, 10 (50%) of the proteins are experimentally identified by the TAP method. FANCG and FANCE are in a nuclear complex that activates the monoubiquitination of FANC-D2 [17]. This complex formation may be DNA-damage dependent. Without DNA double-strand breaks (DSBs), FANCA, FANCF, and BRCA1 may not form a complex with FANCC and FANCG. It is also known that both FANCA and FANCG are phosphoproteins; the fact that DNA-PKcs is found in the complex with FANCC may indicate that DNA-PK is responsible for the phosphorylation of these proteins. Since FANCC has been shown to interact directly with FANCA, it is likely that FANCC is also a phosphoprotein. In some cases phosphorylation is activating and in other cases it is inhibitory; therefore, if FANCC is a phosphoprotein, its activity could depend upon its phosphorylation status in the cell.

large proteins that contain pro domains in the inactive state. Once the pro domain is cleaved, the caspase is activated and begins to signal downstream to the effector caspases to initiate apoptosis within the cell. Caspase 8 is the key initiator caspase downstream of the apoptosis pathway induced by TNFR1 and Fas. Activated caspase 8 plays dual roles in the cell: it feeds directly into caspase 3 activation, and it stimulates the release of cytochrome C by the mitochondria. Caspase 3 activation leads to the degradation of cellular proteins necessary to maintain cell survival and integrity, which leads to cell death. Since FANCC has an anti-apoptotic function upstream of Caspase 3 activation, determining whether the anti-apoptotic function of FANCC in cells is through its interaction with procaspase 8 or caspase 8 could help us better understand the role FANCC plays in the apoptotic pathway. Several other interesting findings include estrogen receptor 1, DNA-PKcs, and Werner syndrome protein.

Interestingly, we also found Werner Syndrome protein (WRN) high on the protein target list. Both FA and Werner Syndrome (WS) have been classified as caretaker diseases, because they feature genomic instability as well as a strong predisposition to cancer [24]. Patients with WS, a premature aging disease, have features typical of normal aging, such as graying and loss of hair, as well as other disorders associated with aging, including atherosclerosis, osteoporosis, type II diabetes mellitus, and vascular disease, together with an unusually high incidence of tumors [25, 26]. The WRN protein is made of 1432 amino acids and contains a nuclear localization signal at its carboxyl-terminal end. The WRN protein possesses both exonuclease and helicase activities and is partially homologous to RecQ helicases. WRN has been found to bind to Ku70/80, a component of the DNA-PK complex, which suggests a role for WRN in DSB repair [26]. However, its interaction with FANCC may not be direct; rather, it is very likely mediated by DNA-PK.

We believe our results from this analysis to be very interesting to clinicians and FA researchers. Our integrated computational proteomics method can also be generalized to study other disease biology areas, such as Alzheimer’s Disease. We plan to make our complete results and computational method publicly available on our web site http://bio.informatics.iupui.edu/ soon.

5. ACKNOWLEDGMENT

This work is partially supported by a summer research grant awarded to Dr. Jake Chen by Purdue Research Foundation. It was also supported in part by systems obtained by Indiana University through its relationship with Sun Microsystems Inc. as a Sun Center of Excellence. We thank Stephanie Burks for maintaining the high-end Sun servers and Oracle 10g servers, on which the database computing in this study was conducted.

6. REFERENCES

In addition, Caspase 8 was also identified as one of the proteins highly relevant to FANCC. Previous studies have connected the apoptotic effects seen in hematopoietic progenitor cells from children lacking functional FANCC protein to the Fas pathway. Inhibitors of both the initiator Caspase 8 and the effector Caspase 3 were shown to blunt the apoptotic response seen in FA-C lymphoblast cells, which confirms that the apoptotic effect seen in FA-C cells goes through the Fas/caspase 8/caspase 3 pathway [27]. Exactly what role FANCC plays in this apoptotic pathway is unclear, but it is likely that FANCC functions upstream of Caspase 3, since inhibitors of Caspase 8 also blunted the response seen in FA-C cells. There are many apoptotic signaling pathways within the cell, some of which are independent while others exhibit feedback functions. Many apoptotic signals are mediated via receptors present on the surfaces of lymphocytes and other hemopoietic cells of the immune system. Of these receptors, the most prominent belong to a group of structurally related signaling proteins called the TNF receptor (TNFR) superfamily [28]. The TNFR family is separated into two classes. Members of the first class, which includes TNFR1, Fas, and death receptors three through six (DR3-DR6), have cytoplasmic tails that contain a “death domain (DD)”. These domains are important for protein-protein interactions; for example, the Fas receptor contains a DD that binds to both Fas-associated death domain protein (FADD) and receptor-interacting protein (RIP). There are two types of caspases, the initiator and effector caspases. Initiator caspases are

2. 3. 4. 5. 6. 7. 8. 9.

178

D'Andrea, A.D. and M. Grompe, Molecular biology of Fanconi anemia: implications for diagnosis and therapy. Blood, 1997. 90(5): p. 1725-36. Ishida, R. and M. Buchwald, Susceptibility of Fanconi's anemia lymphoblasts to DNA-cross-linking and alkylating agents. Cancer Res, 1982. 42(10): p. 4000-6. Joenje, H., et al., Complementation analysis in Fanconi anemia: assignment of the reference FA-H patient to group A. Am J Hum Genet, 2000. 67(3): p. 759-62. Timmers, C., et al., Positional cloning of a novel Fanconi anemia gene, FANCD2. Mol Cell, 2001. 7(2): p. 241-8. Strathdee, C.A., et al., Cloning of cDNAs for Fanconi's anaemia by functional complementation. Nature, 1992. 356(6372): p. 763-7. Lo Ten Foe, J.R., et al., Expression cloning of a cDNA for the major Fanconi anaemia gene, FAA. Nat Genet, 1996. 14(3): p. 320-3. de Winter, J.P., et al., The Fanconi anaemia group G gene FANCG is identical with XRCC9. Nat Genet, 1998. 20(3): p. 281-3. de Winter, J.P., et al., Isolation of a cDNA representing the Fanconi anemia complementation group E gene. Am J Hum Genet, 2000. 67(5): p. 1306-8. de Winter, J.P., et al., The Fanconi anaemia gene FANCF encodes a novel protein with homology to ROM. Nat Genet, 2000. 24(1): p. 15-6.

10. Howlett, N.G., et al., Biallelic inactivation of BRCA2 in Fanconi anemia. Science, 2002. 297(5581): p. 606-9.
11. Rathbun, R.K., et al., Inactivation of the Fanconi anemia group C gene augments interferon-gamma-induced apoptotic responses in hematopoietic cells. Blood, 1997. 90(3): p. 974-85.
12. Kruyt, F.A., et al., Abnormal microsomal detoxification implicated in Fanconi anemia group C by interaction of the FAC protein with NADPH cytochrome P450 reductase. Blood, 1998. 92(9): p. 3050-6.
13. Hoatlin, M.E., et al., A novel BTB/POZ transcriptional repressor protein interacts with the Fanconi anemia group C protein and PLZF. Blood, 1999. 94(11): p. 3737-47.
14. McMahon, L.W., et al., Human alpha spectrin II and the FANCA, FANCC, and FANCG proteins bind to DNA containing psoralen interstrand cross-links. Biochemistry, 2001. 40(24): p. 7025-34.
15. Pang, Q., et al., The Fanconi anemia complementation group C gene product: structural evidence of multifunctionality. Blood, 2001. 98(5): p. 1392-401.
16. Naf, D., et al., Functional activity of the Fanconi anemia protein FAA requires FAC binding and nuclear localization. Mol Cell Biol, 1998. 18(10): p. 5952-60.
17. Folias, A., et al., BRCA1 interacts directly with the Fanconi anemia protein FANCA. Hum Mol Genet, 2002. 11(21): p. 2591-7.
18. Taniguchi, T., et al., Convergence of the Fanconi anemia and ataxia telangiectasia signaling pathways. Cell, 2002. 109(4): p. 459-72.
19. Brown, K.R. and I. Jurisica, Online predicted human interaction database. Bioinformatics, 2005. 21(9): p. 2076-82.
20. Povey, S., et al., The HUGO Gene Nomenclature Committee (HGNC). Hum Genet, 2001. 109(6): p. 678-80.
21. Sivachenko, A., J. Chen, and C. Martin, ProteoLens: A Visual Data Mining Platform for Exploring Biological Networks (submitted). Bioinformatics, 2005.
22. Chen, J.Y., C. Shen, and A.Y. Sivachenko. Mining Alzheimer Disease Relevant Proteins from Integrated Protein Interactome Data. in Pacific Symposium on Biocomputing. 2006. Maui, Hawaii.
23. Hoffmann, R. and A. Valencia, Protein interaction: same network, different hubs. Trends Genet, 2003. 19(12): p. 681-3.
24. Joenje, H. and K.J. Patel, The emerging genetic and molecular basis of Fanconi anaemia. Nat Rev Genet, 2001. 2(6): p. 446-57.
25. Oshima, J., et al., Homozygous and compound heterozygous mutations at the Werner syndrome locus. Hum Mol Genet, 1996. 5(12): p. 1909-13.
26. Li, B. and L. Comai, Functional interaction between Ku and the Werner syndrome protein in DNA end processing. J Biol Chem, 2000. 275(50): p. 39800.
27. Rathbun, R.K., et al., Interferon-gamma-induced apoptotic responses of Fanconi anemia group C hematopoietic progenitor cells involve caspase 8-dependent activation of caspase 3 family members. Blood, 2000. 96(13): p. 4204-11.
28. Makela, T. and K. Porkka, [The "omics" are coming--one gene is not enough anymore]. Duodecim, 2002. 118(11): p. 1146-8.