Towards a systems biology understanding of human health: Interplay between genotype, environment and nutrition

Frank Desiere*

Nestlé Research Center, P.O. Box 44, 1000 Lausanne 26, Switzerland; Institute for Systems Biology, Seattle, Washington, USA

Abstract. Sequencing of the human genome has opened the door to an exciting new era of holistic, system-level description of human health. It is now possible to study the underlying mechanisms of human health in relation to diet and other environmental factors such as drugs and toxic pollutants. Technological advances make it feasible to envisage that, in the future, personalized drug treatment, dietary advice and possibly tailored food products can be used to promote optimal health on an individual basis, in relation to genotype and lifestyle. In the past, life-science research focused largely on diseases and on how to restore health after illness. Today, the role of food and nutrition in human health, and especially in the prevention of illness, is gaining recognition. Diseases of modern civilization, such as diabetes, heart disease and cancer, have been shown to be affected by dietary patterns. The risk of disease is often associated with genetic polymorphisms, but the effect depends on dietary intake and nutritional status. To understand the link between diet and health, nutritional research must cover a broad range of areas, from the molecular level to whole-body studies. It therefore provides an excellent example of integrative biology requiring a systems biology approach. The current state and implications of systems biology in the understanding of human health are reviewed. It becomes clear that a complete mechanistic description of the human organism is not yet possible. However, recent advances in systems biology provide a trajectory for future research aimed at improving the health of individuals and populations.
Disease prevention through personalized nutrition will become an increasingly important avenue of life-science research, and more attention will need to be paid to such natural means of disease prevention. In particular, the new discipline of nutrigenomics, which investigates how nutrients interact with the human body while taking predetermined genetic factors into account, is expected to provide new insights into human health that could ultimately have a significant positive impact on our quality of life.
Keywords: systems biology, genomics, transcriptomics, proteomics, metabolomics, nutrigenomics, pharmacogenetics, pharmacogenomics, health, nutrition, diet, disease, prevention, diagnostics, bioinformatics, molecular databases, metabolism, networks, cells, polymorphisms, SNPs, epigenetics.
*Corresponding author. E-mail: [email protected]
BIOTECHNOLOGY ANNUAL REVIEW VOLUME 10 ISSN: 1387-2656 DOI: 10.1016/S1387-2656(04)10003-3
© 2004 ELSEVIER B.V. ALL RIGHTS RESERVED

Introduction

Systems biology is gaining importance in today's life-science research. Interestingly, the first attempts at systems biology go back to the 1960s, when such work was described as the modeling of cellular processes by means of "systems theory and biology." Many mathematicians and engineers tried to develop approaches for analyzing biological systems in physical terms. They realized at the time that, when they tried to interact with an organism as a physical system, they found themselves interacting with it in many more ways than they had instruments for. It then became clear that a complex system such as a cell or an organism has many more capabilities, and many more modes of interaction, than a limited set of canonical rules can capture [1]. Until then, many scientists had focused on reducing life to its constituent parts, first focusing on the cell and then working their way down to the molecular level.

Today, two apparently opposing views are under discussion. The first holds that a cellular system can readily be described in all its parts and even simulated, perhaps using the tools of systems biology. The other casts serious doubt on whether this can be achieved, for fundamental reasons and limitations. This second view is well documented in the work of Robert Rosen (1934–1998), a theoretical biologist who strove to answer the question the Nobel physicist Erwin Schrödinger posed in 1943: "What is Life?" To this day, what makes an organism alive has remained unanswered by conventional biology, chemistry and physics. Rosen's work on complexity and biological systems argues that such systems cannot be decomposed or predicted because of their anticipatory nature, and that a biological system is not just a complex machine [2].

But let us now focus on the achievements that have catalyzed the massive advances in our understanding of biological systems, first through biotechnology and later through genomics, leading finally to the more holistic (and mysterious) term "systems biology." The high-throughput technologies for molecular biology developed in the 1980s and 1990s led, among other achievements, to the completion of the human genome sequence [3,4] and to quantitative data at the transcriptome [5], proteome [6] and metabolome [7] levels. This in turn triggered increasing interest in formal mathematical models of cellular activities such as gene expression and regulation.
It was realized that the vast amount of available data would require new concepts for understanding, and new tools for describing, life as a whole. Systems biology and systems theory, which study the organization and behavior of living systems, seemed a natural conceptual framework for such a task. Systems biology attempts to reconstruct living systems as a series of overlapping models. It exploits the theoretical and experimental advances of the various genome projects, combining them with computational, mathematical and engineering disciplines in an attempt to create predictive models of cells, organs, biochemical processes and complete organisms. Consequently, systems biology today has the potential to advance our understanding of complex biological systems, from simple cells to complete organisms and potentially to whole ecosystems. Understanding biological systems is not merely an academic exercise for the benefit of philosophy and theoretical science; there are real problems to be solved. Over the last decades, demographic change, mainly aging populations and increased life expectancy, has placed ever greater demands on medical treatment [8]. As individuals realize that they will live longer, disease prevention has become an important concern.
The quest for new treatments and for the prevention of illness has turned pharmaceutical companies into powerful, large corporations that, to this day, drive many areas of modern life science and a large proportion of the biotechnology industry. The discovery of new pharmaceutical treatments, especially those that bring large amounts of revenue back to the industry (the so-called "blockbuster drugs"), seems to be the ultimate goal of research and development in academic and public institutes, biotechnology companies and private medical research centers alike. In parallel, researchers have also promoted disease prevention through adequate nutrition, and it was realized that scientific breakthroughs in both areas would require massive investment in modern nutrition research through "systems biology." To accelerate this mission, research institutes and scientific groups dealing with systems biology have been created in recent years. Founded in 2000, the Institute for Systems Biology (www.systemsbiology.org) is regarded by many as the pioneer of the new field and has influenced the pace and direction of modern biology. Systems biology has since gained broad acceptance as a new scientific field, prompting the National Institutes of Health (NIH) in 2003 to identify systems biology and multidisciplinary research as key components of a new set of agency initiatives, the NIH Roadmap for Medical Research for the next decade. New projects under the theme "New Pathways to Discovery" would include Bioinformatics and Computational Biology, Structural Biology, Building Blocks and Pathways, Molecular Libraries and Molecular Imaging, and Nanomedicine.
Starting in 2004, the NIH will fund these topics, which will at the same time require an improved computational infrastructure for biomedical research, libraries of chemical molecules, new molecular and cellular imaging tools, and nanoscale devices for viewing and interacting with basic life processes. This policy clearly describes the challenges ahead in investigating biological systems, particularly in the context of human health, treatment of disease and prevention of illness.

Technologies of systems biology

Systems biology focuses on complex biological systems that are composed of molecular components. Understanding such systems requires the integration of experimental and computational research data [9]. Systems biology is the attempt to study systematically all the concurrent physiological processes in a cell or tissue by global measurement of differentially perturbed states. Its ultimate goal is to integrate the data from these observations into models that might eventually represent, and make it possible to simulate, the physiology of the cell [10,11]. Although biological systems are made up of their components, the essence of a system lies in its dynamics, and it cannot be described merely by enumerating the system's components. At the same time, it is inappropriate to believe that only system structures, such as network topologies, are important without paying sufficient attention to the diversity and functionality of the components (Table 1). Both the structure of the system and its components play indispensable roles in forming a holistic view of the state of the system.

The goals of systems biology are:
(1) Understanding the components of a biological system, such as genes, proteins and metabolites, as well as their physical structures;
(2) Understanding the dynamics of the system, through both quantitative and qualitative analysis, and constructing theories/models with powerful predictive capability;
(3) Understanding methods for controlling the system; and
(4) Understanding methods for designing the system.

Table 1. Web resources and databases for systems biology.
Institute for Systems Biology (www.systemsbiology.org)
MIT Computational and Systems Biology Initiative (CSBI) (csbi.mit.edu/)
Bauer Center for Genomics Research (CGR) at Harvard University (www.cgr.harvard.edu/)
Bio-X at Stanford University (biox.stanford.edu/)
Cell Systems Initiative at the University of Washington (csi.washington.edu/)
Genomes to Life program at the US Department of Energy (DOE) (doegenomestolife.org/)
Biomolecular Systems website at the Pacific Northwest National Laboratory (PNNL) (biomolecular.org)
Institute for Computational Biomedicine at the Weill Medical College of Cornell University (icb.med.cornell.edu/)
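The claim that a system's essence lies in its dynamics, rather than in its parts list, can be illustrated with a toy model. The sketch below is a minimal, hypothetical example and not a model taken from this review. It describes a single gene whose protein represses its own transcription (negative autoregulation), using dx/dt = β/(1 + (x/K)^n) − γx, and integrates the equation with the forward Euler method. The parameter values (β = 10, K = 1, n = 2, γ = 1) are arbitrary assumptions chosen for illustration.

```python
def simulate_autorepression(beta=10.0, K=1.0, n=2.0, gamma=1.0,
                            x0=0.0, dt=0.01, t_end=20.0):
    """Integrate dx/dt = beta / (1 + (x/K)**n) - gamma * x with forward Euler.

    x is the protein concentration (arbitrary units). All parameter values
    are illustrative assumptions, not measured constants.
    Returns the trajectory sampled every dt, starting at x0.
    """
    x = x0
    trajectory = [x]
    for _ in range(int(round(t_end / dt))):
        production = beta / (1.0 + (x / K) ** n)  # synthesis repressed by x itself
        degradation = gamma * x                   # first-order decay/dilution
        x += dt * (production - degradation)
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    traj = simulate_autorepression()
    # With these parameters the steady state solves x**3 + x - 10 = 0, i.e. x = 2.
    print(f"steady state ~ {traj[-1]:.4f}")
```

Without the feedback term, the same synthesis and decay rates would settle at β/γ = 10. The repression loop, which is an interaction rather than a component, pulls the steady state down to 2. A realistic model would be fitted to measured data and would normally be solved with an adaptive ODE solver rather than fixed-step Euler.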
The following sections give a more detailed overview of the sub-disciplines of systems biology that characterize the cellular components. Finally, these components have to be put into context, which is the focus towards the end of this review.

Genomics

The availability of completely sequenced genomes catalyzed the emergence of systems biology and has truly revolutionized biology. For the first time since the advent of molecular biology, biological questions are now addressed by studying the complete set of components of a system. Previously, the function(s) of individual genes and gene products were investigated one or a few at a time. Before high-throughput analytical instruments such as DNA sequencers or mass spectrometers for protein determination were invented, this reductionist approach proved extremely fruitful and led to the discovery of an impressive number of biological principles. However, it was quickly realized that in nature, cellular components function together with other components. As Henri Poincaré pointed out [12], "the aim of science is not things in themselves, but the relations between them; outside these relations there is no reality knowable." Indeed, biological processes should be considered as a complex network of interconnected components. In other words, for any biological process, one might consider a "modular approach" in which the behavior and function of the corresponding network are studied as a whole, in addition to studying some of its components individually. The first step towards that goal was the determination of complete genomes of organisms.

The significance of the finished human genome sequence [3,4] and of other genomes of model organisms for the field of systems biology cannot be overstated. Without these genomes, holistic studies would simply not be possible. Our knowledge is still steadily increasing, as underlined by the recent detailed analysis of human chromosome 6 [13], which revealed a great deal of biological information that had not been recognized in the draft of the human genome. Comparative genomics using the genomes of the mouse, rat, pufferfish and zebrafish allowed refined predictions of which stretches of DNA are actually genes, and a more sophisticated interpretation of the underlying genomic data. The power of comparative genomics is growing quickly as the genome sequences of other nematodes [14] become available, along with those of the chicken, chimpanzee, frog and cow, which are already in the production queue. Currently about 203 complete genomes of living organisms are in the public domain (www.ebi.ac.uk/genomes/, Fig. 1), and more than 800 are in progress. These numbers underline the growing importance of comparative genomics. However, gene prediction remains a significant challenge, and it can be anticipated that current data on the location and number of genes will constantly have to be updated [15]. The genome of an organism represents an ideal coordinate system for systems biology: a precisely definable digital core of information for an organism [16].
Genes are the "genetic parts list" to which all other biological information can be linked. Transcripts are directly related to genes; proteins are related to transcripts and, through them, to genes. All this information is hierarchical in nature: DNA, mRNA, protein, protein interactions, informational pathways, informational networks, cells, tissues or networks of cells, an organism, populations and whole ecologies. It is therefore tempting to construct a gene index in which every gene of an organism is listed and numbered, and to use it as a central core to which any kind of biological information can be linked. This concept has partially been applied in publicly accessible genome resources, e.g., Ensembl (www.ensembl.org) and RefSeq (www.ncbi.nlm.nih.gov/RefSeq). Genomic sequences also provide access to regulatory sequences, which are a vital component in solving the regulatory code [17]. They also open access to polymorphism studies; some of these variations are responsible for differences in physiology and disease predisposition. Together, these components make up the elements of the "periodic table of life." With these components in hand, the immediate challenge is to place them in the context of their informational pathways and networks.

Fig. 1. Number of completed genomes (http://www.ebi.ac.uk/genomes). [Chart: number of completed genomes (y-axis, 0–160) by year (x-axis, 1995–2003).]

Fig. 2. Regulatory gene network for endomesoderm specification: the view from the genome. The architecture of the network is based on perturbation and expression data, on data from cis-regulatory analyses for several genes, and on other experiments (reproduced with permission from Hamid Bolouri and Eric Davidson, http://sugp.caltech.edu/endomes/) [17].
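The gene-centred hierarchy maps naturally onto a simple data structure. The sketch below is a hypothetical illustration of the "gene index" idea, not how Ensembl or RefSeq are actually implemented. Each gene ID owns its transcripts, each transcript may point to a protein, and arbitrary annotations (SNPs, pathways, expression data) hang off the gene. All identifiers in the example are invented placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class GeneRecord:
    """One entry of a hypothetical gene index: a gene and everything linked to it."""
    gene_id: str
    transcripts: dict = field(default_factory=dict)  # transcript_id -> protein_id or None
    annotations: dict = field(default_factory=dict)  # source name -> list of values


class GeneIndex:
    """Gene-centred index: every other data type is attached to a gene ID."""

    def __init__(self):
        self._genes = {}
        self._protein_to_gene = {}  # reverse link so protein data can find its gene

    def add_gene(self, gene_id):
        return self._genes.setdefault(gene_id, GeneRecord(gene_id))

    def add_transcript(self, gene_id, transcript_id, protein_id=None):
        record = self.add_gene(gene_id)
        record.transcripts[transcript_id] = protein_id
        if protein_id is not None:
            self._protein_to_gene[protein_id] = gene_id

    def annotate(self, gene_id, source, value):
        self.add_gene(gene_id).annotations.setdefault(source, []).append(value)

    def gene_for_protein(self, protein_id):
        return self._protein_to_gene.get(protein_id)

    def record(self, gene_id):
        return self._genes[gene_id]


if __name__ == "__main__":
    index = GeneIndex()
    index.add_transcript("GENE_A", "TX_A1", "PROT_A1")
    index.add_transcript("GENE_A", "TX_A2", "PROT_A2")  # alternative transcript
    index.annotate("GENE_A", "snp", "SNP_0001")
    print(index.gene_for_protein("PROT_A2"))  # GENE_A
```

The reverse protein-to-gene map is one design choice among several. It lets proteomics or interaction data be attached back to the genome coordinate system without scanning every record.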
The logical extension of studying the genome is to determine inter-individual differences within the genomes of people. Only a small number of common polymorphisms explain the bulk of heterozygosity [18]. Human genetic diversity appears at the level of individual polymorphisms, known as single nucleotide polymorphisms (SNPs), and in the specific combinations of alleles (haplotypes) observed at closely linked sites. The goal of the International HapMap Project, for example, is to develop a haplotype map of the human genome that describes the common patterns of human DNA sequence variation. The HapMap is expected to be a key resource for researchers seeking genes that affect health, disease, and responses to diet and other environmental factors. SNPs are small genetic variations between people that can significantly alter the function of proteins. Most importantly, the altered function may significantly affect how an individual responds to drugs, develops allergies to environmental substances and digests foods [19]. The latest release of dbSNP (build 118) at NCBI contains 5,798,183 human SNPs, of which 2,359,534 are validated. These polymorphisms now have to be investigated for their significance in altering the biological function of proteins and pathways. Knowledge about SNPs is especially important for drug treatment. An altered protein might fail to carry a drug to its target cells or tissues, cripple the enzymes that activate a drug or aid its removal from the body, or alter the structure of the receptor to which a drug is supposed to bind. Variation in immune-system genes can also influence how well particular drugs are tolerated. Together, these subtle genetic variations mean that the dose at which a drug works may vary hugely from person to person.
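Heterozygosity at a single SNP can be made concrete with a short calculation. One can compare the observed fraction of heterozygous individuals with the heterozygosity expected under Hardy–Weinberg equilibrium, H_exp = 1 − Σ p_i², which for two alleles reduces to 2pq. The sketch assumes that genotypes are given as two-character strings such as "AG". The sample genotypes are invented for illustration.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies from diploid genotype strings such as 'AG'."""
    counts = Counter()
    for genotype in genotypes:
        counts.update(genotype)  # adds one count per allele character
    total = sum(counts.values())
    return {allele: c / total for allele, c in counts.items()}


def expected_heterozygosity(freqs):
    """Hardy-Weinberg expected heterozygosity: 1 - sum(p_i ** 2)."""
    return 1.0 - sum(p * p for p in freqs.values())


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles."""
    return sum(1 for g in genotypes if g[0] != g[1]) / len(genotypes)


if __name__ == "__main__":
    sample = ["AA", "AG", "AG", "GG", "AA", "AG", "AA", "AA"]  # invented data
    freqs = allele_frequencies(sample)
    print(freqs)                            # {'A': 0.6875, 'G': 0.3125}
    print(expected_heterozygosity(freqs))   # 0.4296875
    print(observed_heterozygosity(sample))  # 0.375
```

A common SNP (both alleles frequent) contributes far more heterozygosity than a rare one, because 2pq peaks at p = 0.5. This is one way to read the statement that a small number of common polymorphisms account for most heterozygosity.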
The widely used "one-size-fits-all" prescription can therefore lead to life-threatening adverse reactions and to drugs that completely fail to do their job. Well-documented examples of functional SNPs come from the cytochrome P450 protein family, enzymes in the liver that oxidize foreign chemicals. Three of these P450 genes, which are particularly important for the metabolism of commonly prescribed drugs, have been shown to be highly polymorphic, and some variants have already been linked to treatment failure in certain patients [20]. Other examples show that the efficacy of the painkiller codeine depends on a particular polymorphism [21] and that the anticoagulant warfarin can cause serious adverse drug reactions depending on the patient's genotype [22]. Another example is the base excision repair enzyme MED1, which is associated with nonpolyposis colorectal tumors, a very common form of hereditary cancer. MED1 normally helps cells repair potentially cancer-causing damage to genes. However, a defective MED1 enzyme not only prevented repairs in normal cells, allowing a cancer to start, but also interfered with the effectiveness of some types of chemotherapy [23]. Genomic polymorphisms can only be investigated comprehensively once many complete human genomes are available, an achievement that can be envisaged by the end of the first decade of the 21st century.

It is anticipated that within about 10 years, advances in nanotechnology and other methods will allow the fast and cheap sequencing of individuals' genomes, which in turn will lead to advances in predictive medicine. If scientists can examine 30,000 or more genes for each patient, doctors could use such genome sequences to predict which health problems an individual patient is likely to face. Genome shotgun sequencing and microarrays have given us the tools to identify SNPs in individuals [24]. Individuals can now be profiled with increasing efficiency, and these profiles can be used to highlight polymorphic genes that influence our response to specific drugs or foods. These developments have resulted in a completely new discipline, "pharmacogenetics": the study of the influence of genetic variation on drug responses [25]. Similarly, the science of nutrigenomics seeks to provide a molecular understanding of how common dietary chemicals (i.e., nutrition) affect health by altering the expression and/or structure of an individual's genetic makeup. Thus, the new field of nutrigenomics opens the way for "personalized nutrition." In other words, by understanding our nutritional needs, our nutritional status and our genotype, nutrigenomics should enable individuals to better manage their health and well-being by matching their diets precisely with their unique genetic makeup. The success of these methods will largely depend on the large-scale discovery and validation of SNPs and on the discovery of diet-related genes. To achieve this, more systems biology research in nutritional science will be needed to identify nutritionally relevant genes and to study their responses to nutrients systematically. Understanding of the human genome is not complete once the genome sequence has been established and the polymorphisms determined; new discoveries further complicate the picture.
Heritable changes in gene function can occur without a change in the DNA sequence. Epigenetic mechanisms such as DNA methylation, histone acetylation and RNA interference, and their effects on gene activation or inactivation, might be involved in parental imprinting, in which a gene's activity depends on whether it is inherited from the mother or the father [26]. There is evidence to suggest that factors such as lifestyle and diet leave a trail of epigenetic footprints across our genome, which is then inherited [27]. In a striking example, Duke University researchers recently demonstrated in mice how supplementing the mother's diet with extra vitamins during pregnancy changes the coat color of the pups [28]. This study was the first to find a clear mechanism for the effect of maternal nutrition on disease and phenotype. The nutrients used in the study (vitamin B12, folic acid, choline and betaine) silenced the gene that rendered the mice fat and yellow without altering its sequence. The gene was in fact methylated, and thus switched off, linking prenatal diet to diseases such as diabetes, obesity and cancer. Thus, knowledge about the genomic make-up of individuals will be crucial in health research (Table 2).
Table 2. Web resources and databases for genomics.
The National Human Genome Research Institute (www.nhgri.nih.gov)
Nature Genome Gateway (www.nature.com/genomics/human/)
Ensembl Human Genome browser (www.ensembl.org)
European Bioinformatics Institute EBI (www.ebi.ac.uk/)
National Center for Biotechnology Information NCBI (www.ncbi.nlm.nih.gov/genome/guide/human/)
International HapMap Project (www.hapmap.org)
The Human Epigenome Project HEP (www.epigenome.org)
Database of single nucleotide polymorphisms dbSNP (www.ncbi.nlm.nih.gov/SNP)
Transcriptomics

The genome describes the ultimate potential of an organism, and the transcriptome, the complete set of transcripts, describes the utilization or expression of that potential. Transcripts can readily be identified by expressed sequence tags (ESTs). EST sequencing remains an economical and fast way to characterize expressed genes and an essential resource for genome exploitation and annotation. This is particularly important given the increasing availability of draft genome sequences from different organisms and the mounting emphasis on gene function and regulation [29]. The expression of many genes can be analyzed simultaneously with DNA microarrays. In a microarray, known complementary DNA sequences are synthesized or immobilized at defined positions on a microscopic array, and RNA obtained from living cells (usually converted into labeled cDNA) is then hybridized to the array. Microarrays exploit the preferential binding of complementary single-stranded nucleic-acid sequences, and the underlying principle is the same for all microarrays: an unknown sample is hybridized to an array of immobilized DNA molecules whose sequences are known. Each array features thousands of different DNA probe sequences arranged in a defined matrix and can thus detect thousands of genes simultaneously, which means that genetic analysis can be done on a huge scale. Transcriptome profiling, using microarrays [30,31] or serial analysis of gene expression (SAGE) [32], can measure the relative abundance of transcripts for thousands of genes simultaneously under various experimental conditions. This technology has revolutionized the way researchers analyze gene expression in cells and tissues. It allows them to determine which genes are being expressed in a given cell type at a particular time and under particular conditions.
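The complementarity principle behind microarrays can be sketched in a few lines. A probe binds a target strand that contains the probe's reverse complement, because the two strands pair antiparallel. The sketch below assumes perfect Watson–Crick matching on the letters A, C, G and T only. Real hybridization tolerates some mismatches and depends on temperature, probe length and GC content, all of which this toy ignores. Probe and target sequences are invented.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def hybridizing_probes(target, probes):
    """IDs of probes whose perfect complement occurs in the target strand.

    Simplification: exact matching only; no mismatch tolerance and no
    thermodynamic model of binding strength.
    """
    target = target.upper()
    return [probe_id for probe_id, probe in probes.items()
            if reverse_complement(probe) in target]


if __name__ == "__main__":
    array = {"P1": "AAGGC", "P2": "GCCTT", "P3": "TTTTT"}  # hypothetical probes
    labelled_target = "ATGGCCTTAGC"                          # hypothetical cDNA
    print(hybridizing_probes(labelled_target, array))       # ['P1']
```

Probe P2 has the same letters as the matching stretch of the target but does not bind, because identity is not complementarity. On a real array, the signal intensity at each probe position, not a yes/no answer, is what gets quantified.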
Microarrays can be used to compare gene expression in two different cell types or tissue samples, for example healthy versus diseased tissue, and to examine changes in the gene-expression profile at different stages of the cell cycle or during embryonic development. Other uses of microarrays include comparative genomic hybridization studies [33], genotyping individuals for genetic differences that might be associated with disease [34], assigning probable functions to newly discovered genes by comparison with the expression patterns of known genes, identifying key players in signaling pathways and uncovering new categories of genes [35] (Fig. 3). Current applications also include the identification of new targets for therapeutic drugs, disease diagnosis and toxicogenomics [36], the study of the genetic basis of an individual's response to environmental factors such as drugs and pollutants. Transcription profiling is today applied in all major areas of biology. One of the most remarkable studies to date, and a great example of a systems biology approach, is the description of a gene co-expression network for the global discovery of conserved genetic modules [5].
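A heavily simplified version of the co-expression idea treats each gene's expression values across arrays as a vector. Two genes are then called co-expressed when the Pearson correlation of their profiles reaches a threshold. This is not the procedure of the cited study [5], which combined data across species with a more elaborate statistical method. The threshold of 0.9 and the gene names below are arbitrary assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no co-expression information
    return cov / (sd_x * sd_y)


def coexpressed_pairs(profiles, threshold=0.9):
    """Gene pairs whose profiles correlate at or above the threshold."""
    pairs = []
    for (gene_1, x), (gene_2, y) in combinations(profiles.items(), 2):
        r = pearson(x, y)
        if r >= threshold:
            pairs.append((gene_1, gene_2, r))
    return pairs


if __name__ == "__main__":
    # Hypothetical log-expression values for three genes on four arrays.
    profiles = {
        "geneA": [1.0, 2.0, 3.0, 4.0],
        "geneB": [2.1, 3.9, 6.2, 7.8],
        "geneC": [4.0, 3.0, 2.0, 1.0],
    }
    for g1, g2, r in coexpressed_pairs(profiles):
        print(g1, g2, round(r, 3))
```

The pairwise loop grows quadratically with the number of genes, which hints at why analyzing thousands of arrays and genes at once is computationally and statistically demanding. Treating accepted pairs as edges yields a simple co-expression network.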
[Figure schematic: workflow comparison for transcriptome, proteome and metabolome studies, organized by experimental design, experiment, data analysis and data storage.]
Transcriptome. Experimental design: determination of genome-wide transcript levels via DNA array (treated vs. non-treated; normal vs. abnormal tissue). Experiment: RT-PCR, labeling, pooling of samples, hybridization to array, scanning. Data analysis: image analysis, statistics, normalization, clustering, annotation. Data storage: MAGE-ML, MIAME; relational database; ArrayExpress, GEO, Stanford Microarray Database.
Proteome. Experimental design: determination of all proteins in a cell or body fluid via (quantitative) mass spectrometry (treated vs. non-treated; normal vs. abnormal tissue). Experiment: protein extraction, sample fractionation, separation (2D-GE, LC), digestion, ESI/MALDI/FT-MS. Data analysis: analysis of spectra, statistical evaluation, identification (database search), quantification, annotation. Data storage: mzXML, PEDRO; relational database; no repository yet; GenBank, EMBL, BIND, DIP, etc.
Metabolome. Experimental design: determination of all metabolites in a cell or body fluid (treated vs. non-treated; normal vs. abnormal tissue). Experiment: metabolite extraction, sample fractionation, separation (LC, GC), identification (MS, GC, NMR). Data analysis: analysis of spectra, database search, statistical evaluation, annotation, clustering, network modeling. Data storage: SBML; relational database; no repository yet; KEGG, E-cell, EMP, etc.
[Inset: example mass spectrum, intensity (counts) vs. m/z, approximately 600–2600.]
Fig. 3. Comparison of data analysis strategies for transcriptome, proteome and metabolome studies. Abbreviations: RT-PCR, reverse transcriptase polymerase chain reaction; MAGE-ML, MicroArray Gene Expression Markup Language; MIAME, Minimum Information About a Microarray Experiment; GEO, Gene Expression Omnibus; 2D-GE, Two-Dimensional Gel Electrophoresis; ESI, Electrospray Ionization; MALDI, Matrix-Assisted Laser Desorption/Ionization; FT, Fourier Transform; MS, Mass Spectrometry; mzXML, mass spectrometry eXtensible Markup Language [37]; PEDRO, software to support the capture, storage and dissemination of proteomics experimental data [37,38]; EMBL, European Molecular Biology Laboratory; BIND, Biomolecular Interaction Network Database; DIP, Database of Interacting Proteins; LC, Liquid Chromatography; GC, Gas Chromatography; NMR, Nuclear Magnetic Resonance spectrometry; SBML, Systems Biology Markup Language; KEGG, Kyoto Encyclopedia of Genes and Genomes; EMP, database of Enzymes and Molecular Pathways.
In a truly large-scale approach, co-expressed gene pairs were identified across 3182 DNA microarrays from humans, flies, worms and yeast. An estimated 22,163 conserved co-expression relationships were identified using statistical clustering algorithms, providing new evidence for the involvement of genes in core biological functions. The relative ease of producing such large amounts of data is offset by the difficulty of dealing with the results, owing to a lack of simple and accepted approaches for analyzing large-scale gene expression data. Visualizing and presenting such large gene expression datasets is not trivial [30]. Despite these difficulties, gene expression analysis is helping to devise strategies that can also distinguish the expression of alternatively spliced transcripts. It has been estimated that 30%–60% of all human genes encode more than one transcript. The impact of these alternative gene products on function and regulation has become a major research focus and has led to the establishment of various databases holding information on alternatively spliced transcripts [39,40]. Further investigation is required to determine the causes and effects of alternative splicing in a genome, transcriptome and proteome context. The impact of transcriptome studies on human health research has recently been shown in many fields. Applications include assessing the safety of food, drugs, vaccines, medical devices and other consumer products [41–46]. DNA arrays for the identification of food-borne bacterial pathogens and viruses [47] can be used to reduce the incidence of food poisoning, illness and death associated with bacterial or viral contamination of meat, seafood, dairy products and other foods. In clinical settings, too, arrays could be used to identify the organisms infecting patients admitted to hospital with systemic bacterial infections.
The capacity to type all the common bacteria unambiguously on a single chip within a few hours of sampling will allow high-speed testing in agricultural, manufacturing and clinical settings. Gene-expression patterns might also be used to simplify widely used diagnostic descriptions of cancers. Currently, as many as 7000 disease concepts with 42,000 names (and synonyms) are used worldwide to describe different cancers; as the number of validated gene-expression profiles for cancers grows, these profiles may offer a useful way to streamline this list and standardize cancer classification on a rational basis [48]. Another possible application is testing the efficacy and safety of pharmaceuticals, both in clinical trials and in treatment. Genotyping by DNA arrays could be used to stratify patients participating in clinical trials into responders and non-responders, enhancing the accuracy of drug-testing results and allowing drugs to be tailored to specific subsets of the population according to clearly identifiable markers [30]. DNA arrays could also be used to examine the physiological effects of a specific diet, allowing the analysis of pathways and the identification of reactions in which food and its components are involved [49].
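One simple way expression profiles could support classification is a nearest-centroid rule. The profiles of reference samples in each known class are averaged, and a new sample is assigned to the class whose average profile is closest. This is a generic illustrative technique, not a method described in this review. The class labels and expression values are invented, and a real classifier would also need gene selection, normalization and validation on independent samples.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def class_centroids(training):
    """Mean expression profile per class.

    training: dict mapping class label -> list of equal-length profiles.
    """
    centroids = {}
    for label, profiles in training.items():
        n = len(profiles)
        centroids[label] = [sum(values) / n for values in zip(*profiles)]
    return centroids


def classify(profile, centroids):
    """Label of the centroid nearest (Euclidean) to the given profile."""
    return min(centroids, key=lambda label: dist(profile, centroids[label]))


if __name__ == "__main__":
    # Hypothetical two-gene expression profiles for two tumour classes.
    training = {
        "class_1": [[5.0, 1.0], [6.0, 1.5], [5.5, 0.5]],
        "class_2": [[1.0, 5.0], [1.5, 6.0], [0.5, 5.5]],
    }
    centroids = class_centroids(training)
    print(classify([4.8, 1.2], centroids))  # class_1
```

In principle, the same rule could assign trial participants to "responder" or "non-responder" groups, provided that reference profiles for each group were available.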
Table 3. Web resources and databases for transcriptomics.
Microarray Informatics at the EBI (www.ebi.ac.uk/microarray)
Pat Brown's lab at Stanford University (cmgm.stanford.edu/pbrown/)
Stanford Microarray Database (genome-www5.stanford.edu/)
Microarray Gene Expression Data (MGED) Society (www.mged.org/)
Database of alternatively spliced proteins ASP at UCLA (www.bioinformatics.ucla.edu/HASDB/)
DNA array technology will be a valuable tool for identifying the mechanisms by which nutrients interact with the body and how individuals respond to food intake in a specific diet (Table 3). DNA microarrays are becoming useful analytical tools for disease profiling. However, there is a pressing need for profiling technologies that go beyond measuring RNA levels, particularly for disease-related investigations. DNA microarrays have limited utility for the analysis of biological fluids and for the discovery of markers directly in the fluid. To reach that goal, protein levels and activities must be assayed. Numerous alterations may occur in proteins that are not reflected in changes at the RNA level, providing a compelling rationale for additional, direct analysis of gene expression at the protein level. The next challenge is to integrate RNA data with protein data [50].
Proteomics
Proteomics technologies attempt the large-scale determination of gene and cellular function directly at the protein level. Mass spectrometry (MS) has increasingly become the method of choice for the analysis of complex protein samples. MS-based proteomics has been made possible by the availability of gene and genome sequence databases and by technical and conceptual advances in instrumentation [51]. Proteomics has also established itself as an indispensable technology for interpreting the information encoded in genomes. So far, protein analysis by MS has been most successful when applied to small sets of proteins isolated in specific functional contexts. The systematic analysis of the much larger number of proteins expressed in a cell is now also advancing rapidly, mainly owing to new experimental approaches. A single bacterial cell may produce 4000 proteins whose abundances and activities may vary throughout an experiment, while the number of proteins expressed in higher eukaryotes is likely to be at least 10-fold greater.
Attempts to catalogue, visualize and analyze proteomics experiments have therefore become a major challenge (Table 4). Beyond the identification of proteins, their quantification can now also be addressed. However, no single method or instrument is capable of both identifying and quantifying the components of a complex protein sample. Two approaches are popular: two-dimensional electrophoresis (2DE) followed by MS, or limited protein purification followed by automated peptide MS/MS. When accurate quantification is desired, proteins or peptides are tagged with stable isotopes. While 2DE has clear shortcomings [52–54], liquid chromatography combined with tandem MS (LC-MS/MS) appears to be an extremely promising technology. However, mass spectrometers are inherently poor quantitative devices, and the comprehensive data collected by this method require more sophisticated analysis tools than are presently available. A current challenge is the development of tools for the high-throughput analysis of MS-based data [55–57]. First steps in that direction have been taken. For example, PeptideProphet [58] uses statistical models to estimate the accuracy of peptide assignments, and ProteinProphet [59] computes the probability that a protein is present in a particular sample. To tackle the quantitative analysis of peptide LC-MS/MS experiments, stable-isotope tagging has been developed: because differently labeled peptides can be readily distinguished in a mass spectrometer by their mass difference, the ratio of their signals gives an accurate indication of the relative abundance of each peptide in the two samples. This technique has been applied successfully in several experiments [6,60–62].
One way to study the inner workings of a cellular system is to study its protein machines. Most proteins exert their function through protein–protein interactions, and such interactions often hold enzymes in tightly controlled regions of the cell. Protein–protein interactions therefore provide a wealth of information on fundamental aspects of cellular life. The first such screens used yeast two-hybrid (Y2H) technology [63].
Table 4. Steps involved in a typical proteomics experiment.
Protein isolation from a biological sample (e.g., a cell extract) following some experimental treatment.
Fractionation of the resulting proteins (or peptides, the products of proteome digestion) by methods such as two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) or liquid chromatography (LC).
Protein or peptide detection by MS.
Protein identification through manual interpretation or database correlation of mass spectra.
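The two computational steps mentioned above for LC-MS/MS data, combining peptide evidence into protein-level confidence and quantifying by stable-isotope ratios, can be sketched in a few lines under simplifying assumptions. Protein probabilities here use the basic rule that a protein is absent only if all of its peptide assignments are wrong, 1 - prod(1 - p_i); ProteinProphet refines this (for example, for peptides shared between proteins), so the function is an approximation rather than that tool's algorithm. Isotope ratios are taken as heavy/light signal intensities and summarized per protein by their median. All numbers are hypothetical.

```python
from statistics import median


def protein_probability(peptide_probabilities):
    """Combine peptide-level probabilities into a protein-level probability.

    Simplest form: the protein is absent only if every assigned peptide is wrong,
    treating the peptide assignments as independent evidence.
    """
    p_all_wrong = 1.0
    for p in peptide_probabilities:
        p_all_wrong *= 1.0 - p
    return 1.0 - p_all_wrong


def isotope_ratio(light_intensity, heavy_intensity):
    """Heavy/light signal ratio for one peptide pair (heavy-labeled vs light-labeled sample)."""
    if light_intensity <= 0:
        raise ValueError("light-channel intensity must be positive")
    return heavy_intensity / light_intensity


def protein_ratio(peptide_pairs):
    """Summarize a protein by the median of its peptide heavy/light ratios."""
    return median([isotope_ratio(light, heavy) for light, heavy in peptide_pairs])


# Hypothetical values: two peptide assignments, three quantified peptide pairs.
print(round(protein_probability([0.9, 0.5]), 3))
print(protein_ratio([(1000.0, 2000.0), (500.0, 1100.0), (800.0, 1500.0)]))
```

The median is used because a single mis-assigned or interfered peptide pair shifts it far less than it would shift the mean.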
Recently, two studies combined biochemical purification with mass spectrometry: one used high-throughput MS protein complex identification (HMS-PCI) [64], the other tandem affinity purification (TAP) followed by MS identification [65]. Some of the most biologically informative results have come from the analysis of large protein complexes, such as the spliceosome and the yeast nuclear pore complex [66,67]. Post-translational modifications add a further layer of complexity at the protein level; they modulate the activity of most eukaryotic proteins.
Post-translational modifications are now analyzed using mass spectrometric peptide sequencing and related technologies. Furthermore, stable-isotope labeling strategies in combination with mass spectrometry have been applied successfully to study the dynamics of modifications [68]. Proteomics is an essential component of systems biology research because proteins are responsible for many crucial processes in the cell. The technology has become extremely valuable for describing properties such as protein abundance and the links between proteins and other proteins or other types of biomolecules, including DNA and lipids. Proteomics can also address protein expression profiles, activities, modification states and subcellular location. Unfortunately, with the exception of quantitative protein expression profiles and protein–protein interactions, none of these properties can currently be measured systematically, quantitatively and with high throughput. However, rapid advances in technology suggest that these limitations may be only temporary. The few studies in which the same biological system was subjected to different types of systematic measurement already demonstrate the power of the approach. For instance, mRNA expression profiles and protein expression profiles seem to be largely complementary and therefore contribute to a more refined description of the system than either type of observation can provide alone [10]. Combining genomic and proteomic results obtained from the same biological system will substantially increase our understanding of complex biological processes. More specifically, systems biology studies based on diverse, high-quality proteomic data are already defining functional biological modules and revealing previously unrecognized connections between biochemical processes and modules.
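Because mRNA and protein profiles are only partly concordant, a first practical step in combining them is to join the two data sets on gene identifiers and flag where they disagree. The sketch below assumes each measurement is a (treated, control) pair of abundances and uses a hypothetical two-fold change (|log2 ratio| >= 1) as the threshold; gene names and values are invented for illustration.

```python
import math


def log2_ratio(treated, control):
    """log2 fold change of one measurement (treated vs control abundance)."""
    return math.log2(treated / control)


def integrate(mrna, protein, threshold=1.0):
    """Join mRNA and protein fold changes on gene identifiers and label each gene."""
    report = {}
    for gene in sorted(mrna.keys() & protein.keys()):  # genes measured at both levels
        m = log2_ratio(*mrna[gene])
        p = log2_ratio(*protein[gene])
        m_changed, p_changed = abs(m) >= threshold, abs(p) >= threshold
        if m_changed and p_changed:
            label = "both"
        elif p_changed:
            label = "protein only"
        elif m_changed:
            label = "mRNA only"
        else:
            label = "unchanged"
        report[gene] = (round(m, 2), round(p, 2), label)
    return report


# Hypothetical (treated, control) abundances in arbitrary units.
mrna = {"geneA": (400.0, 100.0), "geneB": (100.0, 100.0), "geneC": (50.0, 200.0)}
protein = {"geneA": (30.0, 10.0), "geneB": (90.0, 20.0), "geneC": (10.0, 10.0), "geneD": (5.0, 5.0)}

for gene, (m, p, label) in integrate(mrna, protein).items():
    print(gene, m, p, label)
```

Genes that change only at the protein level (geneB in the example) are exactly the cases that RNA measurements alone would miss.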
The new hypotheses generated by such integrative studies can be tested either by traditional methods or by the targeted generation of more genomic and proteomic data [10,69–71]. A promising method for the quantitative profiling of glycoproteins, based on stable-isotope tagging and automated tandem mass spectrometry (MS/MS), has recently been reported [72]. The method is based on the conjugation of glycoproteins to a solid support using hydrazide chemistry, stable-isotope labeling of glycopeptides, and the specific release of formerly N-linked glycopeptides by peptide-N-glycosidase F (PNGase F). Its application to plasma membrane proteins and human blood serum proteins holds great potential for the functional analysis of biological systems and for clinical diagnostics or prognostics. As a result, an individual global-health profile based on protein identifications could become feasible, revolutionizing disease diagnosis and health monitoring. In the future, a small blood sample might reveal the physiological and pathological state of every tissue in the body [73]. In conclusion, proteomics research represents one of the most promising technologies for the investigation of human health. Only two years ago, scientists reported a simple proteomics-based blood test that detects ovarian cancer, even in its early stages [74].
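The glycopeptide capture method described above recovers formerly N-linked glycopeptides. N-linked glycans are attached to Asn within the consensus sequon Asn-X-Ser/Thr, where X is any residue except Pro, and PNGase F converts the glycosylated Asn to Asp on release (a shift of about +0.984 Da), so a common first bioinformatic step is to scan candidate peptides for this motif. The sketch below does only that scan; the peptide sequences are hypothetical, and a sequon marks a potential, not a confirmed, glycosylation site.

```python
import re

# N-linked glycosylation sequon: Asn, then any residue except Pro, then Ser or Thr.
# The zero-width lookahead lets overlapping sequons (e.g. "NNST") all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(sequence):
    """1-based positions of Asn residues that start an N-X-S/T sequon (X != P)."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


def candidate_glycopeptides(peptides):
    """Map each peptide containing at least one sequon to its sequon positions."""
    found = {}
    for peptide in peptides:
        positions = sequon_positions(peptide)
        if positions:
            found[peptide] = positions
    return found


# Hypothetical tryptic peptides.
print(candidate_glycopeptides(["AANGTK", "LNPSR", "VNNSTEK", "GLDSAK"]))
```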
Table 5. Web resources and databases for proteomics.
ExPASy Proteomics tools (expasy.org/tools/)
A Research Pointer to the Applied Proteomics and Proteomics Technologies (http://proteomicssurf.com)
spectroscopyNOW (http://www.spectroscopynow.com)
Human Proteome Organization HUPO (http://www.hupo.org)
Institute for Systems Biology (http://www.systemsbiology.org/)
Clinical laboratories are now ready to employ this ovarian cancer test. The emerging field of clinical proteomics will provide early diagnostic methods, potentially leading the way to cures for such diseases (Table 5).
Metabolomics
Our metabolism is an expression of a transient steady state in the dynamics of cellular biosynthesis. Proteins function as enzymes, receptors, transporters, channels, hormones and other signaling molecules, or provide structural elements for cells, organs or the skeleton. Metabolites, in contrast, serve an extremely broad range of functions within the cell. They are usually rapidly converted in enzymatic and chemical reactions, serve as building blocks for macromolecules, or act as transient energy stores. The identification and quantification of metabolites and their reactions are therefore important in the context of systems biology. Metabolomics is the study of the entire set of metabolites in a cell, tissue or organ sample [75–78]. In many respects, metabolites are the final stage of cellular activity along the line from gene to mRNA to protein to function to phenotype (Fig. 4). Analytical approaches that take the chemical complexity and dynamic range of the metabolome into account usually extract metabolites from a cell by different techniques and then analyze the resulting subfractions in parallel. This strategy segregates the metabolome into more manageable subclasses with similar chemical properties, which also helps to minimize chemical side reactions between them. Analyzing the subclasses in parallel with different analytical techniques makes a greater portion of the metabolome visible.
In most cases the methods combine classical chromatographic separation with analytical techniques such as Fourier-transform infrared spectroscopy (FTIR), electrospray ionization mass spectrometry (ESI-MS) and nuclear magnetic resonance (NMR) spectroscopy. A promising route to the metabolome is comprehensive metabolic analysis coupled with statistical methods of cluster and phenotype analysis. Because an individual’s health status is rapidly reflected in his or her metabolic state, health-care and nutrition practitioners might one day be able to recommend a specific treatment or food tailored to an individual’s condition. To reach this goal, a suitable database based on a large number of accurate measurements of metabolite concentrations in healthy people is required, and the development of a public metabolite atlas might be necessary.
Fig. 4. A network of metabolic pathways illustrating the complexity of metabolism as it is known today (excerpt reproduced with permission from Roche Applied Science’s Biochemical Pathways; Michal: Biochemical Pathways, 1998 © Spektrum Akademischer Verlag, Heidelberg, Berlin).
Specific quantification of metabolites has been used to characterize metabolic processes in a multitude of studies focused on particular metabolic pathways. The methods developed have been optimized to produce high-quality data describing the compounds of interest, and today such data describe the metabolic states of individuals. However, this type of analysis is poorly suited to gathering information simultaneously on the multitude of metabolites that characterize an organism’s nutritional processes. Another technique, metabolic profiling, has been devised to monitor hundreds or even thousands of metabolites in parallel using high-throughput techniques. It screens for relative changes rather than absolute concentrations of compounds. Most analytical techniques for profiling small molecules consist of high-performance liquid chromatography (HPLC) or gas chromatography (GC) coupled to mass spectrometry. Mass spectrometers are generally more sensitive and more selective than other types of detectors. When coupled with the appropriate sample-introduction and ionization techniques, mass spectrometers can selectively analyze both organic and inorganic compounds. Nevertheless, the metabolites have to be separated prior to detection by chromatographic techniques coupled online to the mass detector. Gas chromatography separates compounds on the basis of their relative vapor pressures and affinities for the material in the chromatography column, but it is restricted to compounds that are volatile and heat stable. HPLC separations are better suited to the analysis of labile and high-molecular-weight compounds and of non-volatile polar compounds in their natural form.
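Metabolic profiling reports relative rather than absolute changes, so a typical summary is the log2 ratio of mean signal between two groups of samples. The sketch below computes this per metabolite and flags at least two-fold changes; the metabolite names, intensities (arbitrary units) and threshold are hypothetical, and a real study would add replicate statistics and correction for multiple testing.

```python
import math
from statistics import mean


def log2_fold_changes(control, treated):
    """Per-metabolite log2 ratio of mean treated intensity to mean control intensity."""
    return {
        name: math.log2(mean(treated[name]) / mean(control[name]))
        for name in sorted(control.keys() & treated.keys())
    }


def flag_changes(changes, min_abs_log2=1.0):
    """Names of metabolites whose absolute log2 change reaches the threshold."""
    return sorted(name for name, value in changes.items() if abs(value) >= min_abs_log2)


# Hypothetical replicate intensities (arbitrary units) for three metabolites.
control = {"citrate": [100, 120, 110], "lactate": [50, 55, 45], "alanine": [80, 82, 78]}
treated = {"citrate": [105, 115, 110], "lactate": [200, 190, 210], "alanine": [20, 22, 18]}

changes = log2_fold_changes(control, treated)
print(changes)
print(flag_changes(changes))
```

Working with ratios means that an unknown, metabolite-specific detector response cancels out, which is why profiling can screen many compounds without calibrating each one.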
The vast amount of information gathered by high-throughput screening with GC-MS and HPLC-MS requires advanced informatics technologies for its analysis. Proton nuclear magnetic resonance (1H-NMR) spectroscopy also lends itself to metabolite profiling, allowing information to be gathered on the flow of metabolites through biological processes and on the control of the pathways. High-resolution 1H-NMR spectroscopy, which can detect any proton-containing metabolite, appears set to become more important in metabolite profiling. NMR techniques have been used in the past mainly to analyze metabolite changes in mammalian body fluids and tissues, and the method may be extended by detecting other nuclei, for example 31P, or natural-abundance isotopes such as 13C. When metabolomics is applied in studies using substrates enriched in 13C, metabolite analysis can even be taken to a dynamic level, allowing fluxes to be determined quantitatively. Such automated biochemical profiling techniques will become an important component of multidisciplinary, integrated approaches in metabolic and functional genomics studies.
The genomics, transcriptomics and metabolomics technologies described above have produced a complete “parts catalog” of the molecular components of many organisms. The next challenge is to reconstruct and simulate overall cellular function. Recently, advances have been made in flux balance analysis and mathematical modeling [79], in which fundamental physicochemical laws and principles are used to describe the living cell systematically. However, this goal is seriously limited by the inability to analyze biochemical networks rationally and exhaustively and to take all relevant parameters accurately into account, e.g., conservation of mass, energy and redox potential as well as mass transfer. An attempt to derive a global model of cellular metabolism is presented in the E-Cell software for cell simulation. Given a set of reaction rules and initial values, users can run simulations and observe, through graphical user interfaces, dynamic changes in the quantities and concentrations of intra- and extracellular metabolites and other substances.
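The kind of simulation described for E-Cell, reaction rules plus initial values evolving over time, can be illustrated with a deliberately tiny model. The sketch below is not E-Cell’s algorithm or interface; it assumes a two-step pathway S → I → P in which each reaction follows Michaelis–Menten kinetics, v = Vmax[S]/(Km + [S]), and integrates the concentrations with a simple forward-Euler scheme. All parameter values are hypothetical.

```python
def michaelis_menten(vmax, km, substrate):
    """Reaction rate v = Vmax[S] / (Km + [S])."""
    return vmax * substrate / (km + substrate)


def simulate(initial, reactions, dt=0.01, steps=1000):
    """Forward-Euler integration of irreversible Michaelis-Menten reactions.

    initial:   dict mapping species name to starting concentration
    reactions: list of (substrate, product, vmax, km) tuples
    """
    state = dict(initial)
    for _ in range(steps):
        # Evaluate every rate from the same state before applying any update.
        rates = [(s, p, michaelis_menten(vmax, km, state[s])) for s, p, vmax, km in reactions]
        for substrate, product, v in rates:
            state[substrate] -= v * dt
            state[product] += v * dt
    return state


# Hypothetical two-step pathway S -> I -> P (concentrations and time in arbitrary units).
final = simulate({"S": 10.0, "I": 0.0, "P": 0.0},
                 [("S", "I", 1.0, 0.5), ("I", "P", 2.0, 1.0)])
print({name: round(value, 3) for name, value in final.items()})
```

Because each step only moves material from one species to another, the total amount is conserved, which is a useful sanity check; forward Euler also needs a time step that is small relative to Km/Vmax to stay stable.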
In E-Cell, the activities of biochemical reactions can be monitored, and the amounts of substances can be increased or decreased by the user at any time during the simulation. The E-Cell system thus makes it possible to conduct in silico metabolic experiments [80]. Furthermore, the availability of many annotated genomes paves the way for a systematic application of flux-balance methods to a large variety of organisms. However, such a high-throughput goal crucially depends on the capacity to build metabolic flux models in an automated fashion [81] (Table 6).
Table 6. Web resources and databases for metabolomics.
Metabolomics at University of Wales Aberystwyth (http://dbk.ch.umist.ac.uk/metabol.htm)
Biochemical pathways (ExPASy) (http://us.expasy.org/tools/pathways/)
Biopathways consortium (http://www.biopathways.org/)
BRENDA, the Comprehensive Enzyme Information System (http://www.brenda.uni-koeln.de)
EcoCyc and MetaCyc (http://www.ecocyc.org/)
GeneCards (http://bioinformatics.weizmann.ac.il/cards/)
KEGG – Kyoto encyclopedia of genes and genomes (http://www.genome.ad.jp/kegg/)
E-cell project (http://www.e-cell.org/)
Main metabolic pathways on Internet (http://home.wxs.nl/ pvsanten/mmp/main.htm)
Metabolic Control Analysis (MCA) (http://dbk.ch.umist.ac.uk/mca_home.htm)
MPI for Molecular Plant Physiology (http://www.mpimp-golm.mpg.de/fiehn/index-e.html)
PathDB Biochemical Pathways (http://www.ncgr.org/pathdb/)
Compugen’s Biocarta (http://www.biocarta.com/)
Interactive metabolic reconstruction on the web WIT (http://wit.mcs.anl.gov/WIT2/)
EMP Database of Enzymes and Metabolic pathways (http://emp.mcs.anl.gov)
Pulling it together
The availability of genome sequences, expressed protein repertoires and identified metabolites for several organisms, including humans, has allowed the transition from classic analytical biology to “systems biology.” In this new approach, biological processes of interest are studied as systems: complex networks of functionally interacting macromolecules and reactions. These functional genomics approaches can help to accelerate the identification of the genes and gene products involved in particular modules and to describe the functional relationships between them. However, the data emerging from individual “omic” approaches should be viewed with caution because of the occurrence of false-negative and false-positive results [82]. One problem biologists face is that the data sets are too large to comprehend in full. Novel and useful databases reflecting progress in different aspects of genomics have been developed in recent times [83], prompting the saying that we live in “the age of databases.” In the new age of computational biology, it is not enough to publish scientific results in the literature; the data also have to be stored in a structured way, both for retrieval and for connection to other resources on the web. Computer databases first rose to prominence in the life sciences as central repositories for nucleic acid and protein sequences. Biologists now frequently interrogate them with tools such as the BLAST sequence search tool [84]. Since the establishment of GenBank in 1982 [85], many other databases have been developed that will be important for systems biology (Table 7). Some of these databases, for example, contain searchable indices of known protein–protein interactions. The current limiting factor in these databases, however, is the quality of information. High-quality information on validated protein–protein interactions is so far available only for yeast [91] and the fruit fly [92]. Very few large-scale, high-quality data sets for mammalian systems are available in the public domain. TRANSFAC [93] and SCPD [94] catalog interactions between proteins and DNA (i.e., transcription factor interactions), and databases of metabolic pathways have also recently been established, e.g., EcoCyc [95], KEGG [96] and WIT [97]. A growing number of databases are being developed for storing gene-expression data sets, for example ArrayExpress [98], Gene Expression Omnibus [99] and the Stanford Microarray Database [100]. This recent explosion in both the variety and the volume of information poses challenges to database users and developers alike. The information must be maintained systematically in a format that is compatible with both single queries and global searches. Often, the desired information is present in a database but is not annotated consistently for all entries. We therefore need systems that integrate data globally [11]. Apart from computer-generated databases, high-quality databases very often require the manual work of curators. This time-intensive approach is well exemplified by the Human Protein Reference Database (HPRD) [101] (www.hprd.org/), which collects information relevant to the function of human proteins in health and disease, including protein–protein interactions, post-translational modifications, enzyme/substrate relationships, disease associations, tissue expression and subcellular localization. The data were collected from more than 300,000 published articles for a non-redundant set of 2750 human proteins. HPRD and other databases of its kind put existing information into computer-readable format. They represent bioinformatics platforms that are useful for cataloging and mining the large number of proteomic interactions and alterations that are about to be discovered with systems biology approaches. Storing existing knowledge in structured ways is the key challenge and the cornerstone of the new biology (Fig. 5).
Table 7. Useful databases for protein interactions.
Database of Interacting Proteins DIP [86] (dip.doe-mbi.ucla.edu/)
BIND [87] (http://bind.ca)
PathCalling Yeast Interaction Database [63] (portal.curagen.com/)
Mammalian protein–protein interaction database (PPI) [88] (fantom21.gsc.riken.go.jp/PPI/)
Molecular Interaction database MINT [89] (160.80.34.4/mint/)
General Repository Interaction Datasets GRID [90] (biodata.mshri.on.ca/grid/servlet/Index)
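Structured storage of this kind can be illustrated with a small in-memory model: one curated record per protein plus an undirected interaction graph that can be queried for all partners within a given number of steps, the kind of question behind a view such as Fig. 5. The class and field names are hypothetical and only loosely modeled on the categories HPRD lists; the TIP20–SEC20 and TIP20–YPR105C edges follow the Fig. 5 caption, while PARTNER_1 and PARTNER_2 are placeholders.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class ProteinRecord:
    """Hypothetical curated entry; fields loosely follow categories listed for HPRD."""
    name: str
    localization: str = "unknown"
    modifications: list = field(default_factory=list)


class InteractionNetwork:
    """Curated protein records plus an undirected protein-protein interaction graph."""

    def __init__(self):
        self.records = {}
        self.partners = defaultdict(set)

    def add_protein(self, record):
        self.records[record.name] = record

    def add_interaction(self, a, b):
        # Store each interaction in both directions (undirected edge).
        self.partners[a].add(b)
        self.partners[b].add(a)

    def within(self, start, max_steps):
        """Breadth-first search: proteins reachable from start in <= max_steps edges."""
        distance = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if distance[node] == max_steps:
                continue
            for partner in self.partners.get(node, ()):
                if partner not in distance:
                    distance[partner] = distance[node] + 1
                    queue.append(partner)
        del distance[start]
        return distance


net = InteractionNetwork()
for name in ("TIP20", "SEC20", "YPR105C", "PARTNER_1", "PARTNER_2"):
    net.add_protein(ProteinRecord(name))
net.add_interaction("TIP20", "SEC20")        # from the Fig. 5 caption
net.add_interaction("TIP20", "YPR105C")      # from the Fig. 5 caption
net.add_interaction("YPR105C", "PARTNER_1")  # hypothetical placeholder
net.add_interaction("YPR105C", "PARTNER_2")  # hypothetical placeholder
print(net.within("TIP20", 2))
```

Keeping curated annotations and interaction edges in separate structures mirrors the point made above: the records carry validated facts about each protein, while the graph supports network-level queries across them.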
Fig. 5. Visualization of protein interactions using the PathCalling resource. The TIP20 protein, a transport protein that interacts with Sec20p and is required for protein transport from the endoplasmic reticulum to the Golgi apparatus, interacts with protein YPR105C, which itself interacts with many other proteins. This information allows a rapid evaluation of the functionality of a protein within the context of the whole proteome.
Fig. 6. Visualization of a selection from the 331-gene network described in Ref. [11], using Cytoscape version 1.1.1. Proteins were selected from the full yeast genome on the basis of a significant expression change in at least 1 of 20 conditions: the wild-type (wt) strain and nine genetically altered yeast strains, each perturbed environmentally by growth in the presence (+gal) or absence (−gal) of 2% galactose. Each altered strain carries a complete deletion of one of the GAL genes, which encode proteins needed for the metabolism of galactose. Cytoscape is used to display all information regarding nodes (proteins) and edges (interactions). Here, nodes are represented by grey circles and interactions/edges by colored lines.
The most complex challenge we now face is to achieve a description of cellular biology. Current theories can capture and model only a small portion of the data at a time. General approaches to integrating, visualizing and modeling information about cells are needed to broaden biological understanding. To increase the reliability of gene function annotation, multiple independent data sets need to be integrated; such integration will be crucial if systems biology is to achieve its promise (Fig. 6). For databases to interact, and for researchers to exchange information about their observations of a biological system, a common representation language for storing biochemical models is required. The Systems Biology Markup Language (SBML) was created for that purpose [102]. It is a machine-readable format for the representation of computational models in systems biology. It is expressed in XML (www.w3.org/XML/) and contains structures for representing compartments, species and reactions, as well as optional unit definitions, parameters and rules. SBML will be crucial for the storage and exchange of data between databases. The rapidly expanding biological data sets of physical, genetic and functional interactions present a daunting task for data visualization and evaluation [103].
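To show what the SBML structures named above look like, the sketch below builds a minimal model with one compartment, two species and one reaction using Python’s standard xml.etree.ElementTree module. It is a hedged illustration: the namespace string is assumed to be the SBML Level 2 Version 1 namespace, kinetic laws, units and parameters are omitted, and the output has not been checked with an SBML validator, so consult the SBML specification before relying on it.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for SBML Level 2 Version 1; check it against the specification.
SBML_NS = "http://www.sbml.org/sbml/level2"


def build_sbml(model_id, compartment, species, reactions):
    """Serialize a minimal model.

    species:   dict mapping species id to initial concentration
    reactions: list of (reaction id, reactant ids, product ids)
    """
    sbml = ET.Element("sbml", xmlns=SBML_NS, level="2", version="1")
    model = ET.SubElement(sbml, "model", id=model_id)

    compartments = ET.SubElement(model, "listOfCompartments")
    ET.SubElement(compartments, "compartment", id=compartment, size="1")

    species_list = ET.SubElement(model, "listOfSpecies")
    for sid, concentration in species.items():
        ET.SubElement(species_list, "species", id=sid, compartment=compartment,
                      initialConcentration=str(concentration))

    reaction_list = ET.SubElement(model, "listOfReactions")
    for rid, reactants, products in reactions:
        reaction = ET.SubElement(reaction_list, "reaction", id=rid, reversible="false")
        for tag, members in (("listOfReactants", reactants), ("listOfProducts", products)):
            if members:  # empty lists are simply left out
                group = ET.SubElement(reaction, tag)
                for sid in members:
                    ET.SubElement(group, "speciesReference", species=sid)

    return ET.tostring(sbml, encoding="unicode")


print(build_sbml("toy_pathway", "cell", {"S": 10.0, "P": 0.0}, [("r1", ["S"], ["P"])]))
```

Because the model is plain XML, any tool that reads the same namespace can parse it back, which is the property that makes a shared markup language useful for exchanging models between databases.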
Completely new concepts are required to help scientists understand complex data. The Cytoscape software, for example, attempts to integrate biomolecular interaction networks with high-throughput expression data and other molecular states into a unified conceptual framework. It is applicable to any system of molecular components and interactions, and is most powerful when used in conjunction with large databases of protein–protein, protein–DNA and genetic interactions, which are increasingly available for humans and model organisms. The tools provide functionality to lay out and query interaction networks, to integrate the network visually with expression profiles, phenotypes and other molecular states, and to link to databases for functional annotations. An important facet of the tool is that it is extensible through plug-ins, allowing rapid development of additional computational analyses [104]. Another approach is presented in the Osprey software [90], which represents interactions in a flexible and expandable graphical format and provides options for functional comparisons between data sets. Systems biology involves an interplay between experiment and simulation, attempting to create ever more accurate models of processes such as the functioning of an organ over a period of time. Initially, a rough working model is created and used to design experiments that will verify or refute its predictions. The model is then modified to incorporate the results, and the new simulations in turn call for further experiments. In this way, the model and the experiments evolve together until a satisfactory simulation is achieved (Fig. 7).
Fig. 7. Systems biology iterative research. Data about the living cell will reside in structured databases that are used to test new hypotheses and to validate new models describing the system. In an iterative process, going back and forth between in vitro, in vivo and in silico experiments, new insight is created. [Figure panels: biological question; biological system (DNA, RNA, proteins, metabolites, cells); experiments and new data (e.g., a mass spectrum, intensity/counts versus m/z); databanks; models and networks, e.g., v = Vmax[S]/(Km + [S]); in silico experiment and simulation; new hypothesis; result: new insight.]
The databases mentioned above cover only the cellular level. The final goal of capturing information about individuals will, however, require databanks holding information about people, together with biopsies or body-fluid samples stored for genetic and biochemical analysis. The UK Biobank project (www.ukbiobank.ac.uk/) will pursue exactly this goal. Up to half a million participants aged between 45 and 69 years will be involved in the study. They will be asked to contribute a blood sample, lifestyle details and their medical histories to create a national database of unprecedented size. With such databases, fears and uncertainties regarding ethical issues have surfaced and are under continuous debate. The concerns associated with single-gene disorders, such as privacy, confidentiality, potential employment or insurance discrimination, and the rights of family members, are relevant. Additional factors include the nuanced meaning of genetic risk in complex diseases that result from genetic, environmental and lifestyle interactions. The blurred boundary between medicine and genetic enhancement, the social implications of predicting diseases in a large fraction of the population, and the gulf between identifying susceptibility and providing preventive treatment are also subjects of discussion. There is clearly a need to foster a public debate about customizing diets or medical treatment to match the genetic profiles of consumers in the interest of preventing or managing chronic health conditions, and this discussion needs to be initiated as soon as possible. To ease such fears, the US Senate recently passed a bill that would bar employers from using genetic information in making employment decisions and prohibit health insurers from using genetic information to deny coverage or set rates [105] (Table 8).
Health: the focus of systems biology
“Let food be your medicine and medicine be your food.” Hippocrates, the father of modern medicine, c. 400 BC
Biological research using molecular information at all cellular levels is addressing human health in completely new ways.
Table 8. Web resources and databases for data integration.
Physiome Project (http://www.physiome.org/)
Systems Biology Software at the Keck Graduate Institute (http://www.cds.caltech.edu/ hsauro/)
Virtual Cell Project of The National Resource for Cell Analysis and Modeling (http://www.nrcam.uchc.edu/)
E-Cell Project (http://www.e-cell.org/)
Microbial Cell Project (http://microbialcellproject.org/)
World Wide Web Instructional Committee Virtual Cell (http://www.ndsu.nodak.edu/instruct/ mcclean/vc/)
Cytoscape (www.cytoscape.org/)
GoMiner (http://discover.nci.nih.gov/gominer/)
Database of functional networks at EMBL (http://www.ebi.ac.uk/research/pfmp/)
Disease prevention through nutritional intervention and/or intelligent medical treatment is recognized as crucial for improving human quality of life. The combination of individual assessment of health status and the resulting personalized interventions can be envisaged within this decade. It has been estimated that by the year 2010 predictive genetic tests will be available for as many as a dozen common disorders [106]. Individuals who choose to learn about their susceptibility to these diseases will be able to use this information to take preventive measures. For example, a woman at increased risk of developing breast cancer may want to have more frequent mammograms, and a man susceptible to coronary heart disease may take medication to lower his cholesterol. Other people may reduce their risk of disease by changing their diet, getting more exercise and avoiding environmental agents that trigger disease. Genes are being identified that influence how a person responds to a given drug, and doctors will increasingly prescribe drugs based on the genetic profiles of their patients. Such individualized treatment will make it possible to use the drug most likely to relieve disease symptoms while minimizing adverse drug reactions, ushering in an era of personalized medicine. The tools of systems biology, by virtue of measuring all constituents of an organism, will have large implications for disease prevention via diet or other environmental factors such as lifestyle. Understanding human health will depend on a holistic view of our body’s biology and of the numerous environmental cues to which we are constantly exposed. These include pollutants, toxins, pathogens, commensals and radiation. Our gastrointestinal system, for example, is the organ with the greatest contact with our environment; it is inhabited by a large number of bacteria, termed the microbiome [107]. Molecular exploration of the human microbiome in different states of health has only partly begun.
These studies will lead to crucial insights into the relationship between the microcosm in our gut and ourselves. Such surveys are important for a number of reasons [108]. In adults, the microbial population, which resides mainly in the intestine, comprises 500 to 1000 species. The total number of these microbes is at least one order of magnitude greater than the number of our somatic and germ cells combined, and their total number of genes exceeds ours by a factor of 100. The microbiota residing in our body functions as a multifunctional organ with multiple implications for our health. Beyond the numerous but poorly characterized beneficial effects of the endogenous microflora on human health, a proper understanding of microbial abundance and its variation will be critical for recognizing patterns that are predictive of health or disease. We have virtually no information on the levels of microbial diversity and abundance that are optimal for maintaining human health, or on those that are associated with disease. With only a few gut microorganisms sequenced [109–112], we are just starting to learn about microbial partitioning within human micro-environments, and we still understand little about variability between individuals or over time. Gut bacteria have also been implicated in colon cancer development, but their role in tumor invasion, which is modulated by environmental factors, has been unclear. One report describes how a metalloprotease from Listeria monocytogenes, in combination with a host protease, produces a peptide that stimulates the motility and invasion of colon cancer cells [113]. The pro-invasive factor was identified as a peptide derived from bovine β-casein. This peptide could be generated in vitro by the combined actions of the L. monocytogenes metalloprotease Mpl and a trypsin-like serine protease present in the collagen used in the cell invasion assay. These data show convincingly that diet, bacteria and host elements can act together to impair health. Detailed knowledge about the somatic and germ cells that make up the human body therefore has to be extended with knowledge about the microbiome and its interactions with the host (Fig. 8).

Fig. 8. Composition of the human gastro-intestinal microbiota. The overall number of microorganisms in our body is estimated to be larger than the number of all our somatic and germ cells [108,114,115].
Stomach: 10⁰–10³ cfu ml⁻¹; Lactobacillus, Streptococcus, Staphylococcus, Enterobacteriaceae, yeasts
Duodenum and jejunum: 10²–10⁵ cfu ml⁻¹; Lactobacillus, Streptococcus, Bifidobacterium, Enterobacteriaceae, Staphylococcus, yeasts
Ileum and caecum: 10³–10⁹ cfu ml⁻¹; Bifidobacterium, Bacteroides, Lactobacillus, Streptococcus, Enterobacteriaceae, Staphylococcus, Clostridium, yeasts
Colon: 10¹⁰–10¹² cfu g⁻¹; Bacteroides, Eubacterium, Clostridium, Peptostreptococcus, Streptococcus, Bifidobacterium, Fusobacterium, Lactobacillus, Enterobacteriaceae, Staphylococcus, yeasts

In the past, intensive research has focused on protecting individuals from various stresses using food ingredients such as antioxidants [116]. However, a recent report underlines the significance of natural products, as opposed to purified ingredients, for human health and especially for disease prevention. Lycopene, a carotenoid found in tomato products, has long been known for its antioxidant effects [117] and is frequently used as a purified additive to foods. In this new study [118], tomato powder inhibited the development of prostate cancer compared with a control diet, whereas a diet containing a pure synthetic lycopene supplement did not. The authors also found that restricting energy intake under the experimental conditions reduced prostate cancer mortality, independently of the effect of tomato powder. This new study is important on several levels. Perhaps most important, it weighs heavily in the debate about whether cancer prevention is best achieved via whole foods or via single compounds. Another striking finding is that caloric restriction by itself can help prevent disease [119]. The study also makes clear that the ultimate biological activity of a given food or nutrient depends on a large number of variables, including the food processing and preparation method, gastrointestinal physiology, interactions between compounds in the food, and interactions between foods eaten together at the same meal. Clearly, we have barely begun to scratch the surface of understanding how the nutrient compounds in natural foods interact with our biological systems. The promise of systems biology is to capture the complexity of these effects in humans and to untangle these interactions in the laboratory (Fig. 9, Table 9).

Fig. 9. Interaction of the environment with the human body. Diagram labels: health, genome, physiology, body composition, microflora, soul, food, drugs, pollutants, exogenous bacteria, age, disease, stress, lifestyle, other factors. Environmental factors influence the health of an organism in interplay with its individual genetic factors. A genetic predisposition may lead to bodily dysfunction later in life that is modulated by nutrition and other environmental factors.
Conclusions and outlook

What has systems biology achieved towards a complete description of human biology? Despite all the advances, we have to recognize that getting from a gene to a human being is not as straightforward as some had hoped. A genomic approach to health research starts by identifying how components interact in the healthy state and which genes are associated with disease, but the sheer complexity of our biology pushes this goal far into the future. Increasing evidence suggests that genetic makeup may partially explain why people of different ancestry experience disease or metabolize nutrients differently. Yet these genetic clues have still to become firm enough to guide medical practice. Genomics, proteomics and bioinformatics are just beginning to influence the practice of medicine, most notably in the diagnosis of disease, the development of drugs and recommendations for nutritious foods. To accelerate this influence, physicians must be better prepared. They need to understand the nature of the tests and of the information from which they will draw clinical inferences, and to assist patients in making clinical decisions, always taking cultural and ethical considerations into account [120].

Translating genomic information into successful clinical trials will require advances on several fronts. Despite extensive preclinical studies, the vast majority of clinical trials fail because the drugs do not work as anticipated in patients or cause intolerable side effects, mainly owing to the lack of basic information about physiology and the difficulty of predicting which treatments are likely to succeed. Current medical practice treats illnesses after they appear. However, with the extended human lifespan, averting one illness enables a person to live long enough to contract another. Therefore, disease prevention and the maintenance of overall health must become the new focus of research, and the nutrition and lifestyle of individuals can play a primary role in that effort. At some point in the future, genomic information and individual susceptibility data will be part of a healthcare system that tries to intervene or prevent at the earliest possible time, rather than, as now, treating disease after it has occurred.

Table 9. Web resources for health genomics.
European Nutrigenomics Organisation NuGO (www.nugo.org)
Nutrigenomics.UCDavis.edu (nutrigenomics.ucdavis.edu)
The Centre for Human NutriGenomics (http://www.nutrigenomics.nl/)
The IFR Food and Health Network (http://www.foodandhealthnetwork.com/)
Nutrition, Metabolism and Genomics Group (http://nutrigene.4t.com)
Center for Nutrigenomics TU Munich (http://www.nutriogenomics.com/)
Human Genome Project Information from the DOE (link)
NCBI Science Primer: Pharmacogenomics (http://www.ncbi.nlm.nih.gov/About/primer/pharm.html)
International Society of Pharmacogenomics (www.pharmacogenomics.org.uk/)
PharmGKB (http://www.pharmgkb.org)
Genomic, proteomic, metabolomic and informatics technologies will be crucial to a new health-care consciousness among the public, moving away from reactive “fix-it” medical treatment towards proactive, prospective and preventive medicine. This new health concept will start with a personalized assessment of individual nutritional status, environmental factors, lifestyle factors such as sport, and disease risk, and will end in an individual lifestyle and health-care plan.
Therefore, we need detailed knowledge of how proteins participate in the physiological processes of our bodies. We must know what the normal healthy state of our body is and how we might intervene to prevent adverse health conditions. For predictive medicine, we will require extensive information to analyze the genetic contribution to disease and, at least for the common afflictions, we will need to know the environmental components as well. Nutrition can certainly help in dealing with the growing problem of obesity. But how can we change the eating habits of our children? Should we exercise greater control over what people eat? And do we know which diet suits each of us? How can we judge when there are so few scientific data? Clearly, the completion of our genome sequence is not the end of our quest, nor is it the beginning; perhaps it is only the end of the beginning.

References
1. Wolkenhauer O. Systems Biology: The Reincarnation of Systems Theory Applied in Biology? New York, Columbia University Press, 2001, pp. 258–270.
2. Rosen R. Essays on life itself. New York, Columbia University Press, 1999.
3. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, Stange-Thomann N, Stojanovic N, Subramanian A, Wyman D, Rogers J, Sulston J, Ainscough R, Beck S, Bentley D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R, French L, Grafham D, Gregory S, Hubbard T, Humphray S, Hunt A, Jones M, Lloyd C, McMurray A, Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross M, Shownkeen R, Sims S, Waterston RH, Wilson RK, Hillier LW, McPherson JD, Marra MA, Mardis ER, Fulton LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook LL, Fulton RS, Johnson DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P, Wenning S, Slezak T, Doggett N, Cheng JF, Olsen A, Lucas S, Elkin C, Uberbacher E, Frazier M, Gibbs RA, Muzny DM, Scherer SE, Bouck JB, Sodergren EJ, Worley KC, Rives CM, Gorrell JH, Metzker ML, Naylor SL, Kucherlapati RS, Nelson DL, Weinstock GM, Sakaki Y, Fujiyama A, Hattori M, Yada T, Toyoda A, Itoh T, Kawagoe C, Watanabe H, Totoki Y, Taylor T, Weissenbach J, Heilig R, Saurin W, Artiguenave F, Brottier P, Bruls T, Pelletier E, Robert C, Wincker P, Smith DR, Doucette-Stamm L, Rubenfield M, Weinstock K, Lee HM, Dubois J, Rosenthal A, Platzer M, Nyakatura G, Taudien S, Rump A, Yang H, Yu J, Wang J, Huang G, Gu J, Hood L, Rowen L, Madan A, Qin S, Davis RW, Federspiel NA, Abola AP, Proctor MJ, Myers RM, Schmutz J, Dickson M, Grimwood J, Cox DR, Olson MV, Kaul R, Raymond C, Shimizu N, Kawasaki K, Minoshima S, Evans GA, Athanasiou M, Schultz R, Roe BA, Chen F, Pan H, Ramser J, Lehrach H, Reinhardt R, McCombie WR, de la BM, Dedhia N, Blocker H, Hornischer K, Nordsiek G, Agarwala R, Aravind L, Bailey JA, Bateman A, Batzoglou S, Birney E, Bork P, Brown DG, Burge CB, Cerutti L, Chen HC, Church D, Clamp M, Copley RR, Doerks T, Eddy SR, Eichler EE, Furey TS, Galagan J, Gilbert JG, Harmon C, Hayashizaki Y, Haussler D, Hermjakob H, Hokamp K, Jang W, Johnson LS, Jones TA, Kasif S, Kaspryzk A, Kennedy S, Kent WJ, Kitts P, Koonin EV, Korf I, Kulp D, Lancet D, Lowe TM, McLysaght A, Mikkelsen T, Moran JV, Mulder N, Pollara VJ, Ponting CP, Schuler G, Schultz J, Slater G, Smit AF, Stupka E, Szustakowski J, Thierry-Mieg D, Thierry-Mieg J, Wagner L, Wallis J, Wheeler R, Williams A, Wolf YI, Wolfe KH, Yang SP, Yeh RF, Collins F, Guyer MS, Peterson J, Felsenfeld A, Wetterstrand KA, Patrinos A, Morgan MJ and Szustakowki J. Initial sequencing and analysis of the human genome. Nature 2001;409:860–921.
4. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, Gocayne JD, Amanatides P, Ballew RM, Huson DH, Wortman JR, Zhang Q, Kodira CD, Zheng XH, Chen L, Skupski M, Subramanian G, Thomas PD, Zhang J, Gabor Miklos GL, Nelson C, Broder S, Clark AG, Nadeau J, McKusick VA, Zinder N, Levine AJ, Roberts RJ, Simon M, Slayman C, Hunkapiller M, Bolanos R, Delcher A, Dew I, Fasulo D, Flanigan M, Florea L, Halpern A, Hannenhalli S, Kravitz S, Levy S, Mobarry C, Reinert K, Remington K, Abu-Threideh J, Beasley E, Biddick K, Bonazzi V, Brandon R, Cargill M, Chandramouliswaran I, Charlab R, Chaturvedi K, Deng Z, Di FV, Dunn P, Eilbeck K, Evangelista C, Gabrielian AE, Gan W, Ge W, Gong F, Gu Z, Guan P, Heiman TJ, Higgins ME, Ji RR, Ke Z, Ketchum KA, Lai Z, Lei Y, Li Z, Li J, Liang Y, Lin X, Lu F, Merkulov GV, Milshina N, Moore HM, Naik AK, Narayan VA, Neelam B, Nusskern D, Rusch DB, Salzberg S, Shao W, Shue B, Sun J, Wang Z, Wang A, Wang X, Wang J, Wei M, Wides R, Xiao C, Yan C, Yao A, Ye J, Zhan M, Zhang W, Zhang H, Zhao Q, Zheng L, Zhong F, Zhong W, Zhu S, Zhao S, Gilbert D, Baumhueter S, Spier G, Carter C, Cravchik A, Woodage T, Ali F, An H, Awe A, Baldwin D, Baden H, Barnstead M, Barrow I, Beeson K, Busam D, Carver A, Center A, Cheng ML, Curry L, Danaher S, Davenport L, Desilets R, Dietz S, Dodson K, Doup L, Ferriera S, Garg N, Gluecksmann A, Hart B, Haynes J, Haynes C, Heiner C, Hladun S, Hostin D, Houck J, Howland T, Ibegwam C, Johnson J, Kalush F, Kline L, Koduru S, Love A, Mann F, May D, McCawley S, McIntosh T, McMullen I, Moy M, Moy L, Murphy B, Nelson K, Pfannkoch C, Pratts E, Puri V, Qureshi H, Reardon M, Rodriguez R, Rogers YH, Romblad D, Ruhfel B, Scott R, Sitter C, Smallwood M, Stewart E, Strong R, Suh E, Thomas R, Tint NN, Tse S, Vech C, Wang G, Wetter J, Williams S, Williams M, Windsor S, Winn-Deen E, Wolfe K, Zaveri J, Zaveri K, Abril JF, Guigo R, Campbell MJ, Sjolander KV, Karlak B, Kejariwal A, Mi H, Lazareva B, Hatton T, Narechania A, Diemer K, Muruganujan A, Guo N, Sato S, Bafna V, Istrail S, Lippert R, Schwartz R, Walenz B, Yooseph S, Allen D, Basu A, Baxendale J, Blick L, Caminha M, Carnes-Stine J, Caulk P, Chiang YH, Coyne M, Dahlke C, Mays A, Dombroski M, Donnelly M, Ely D, Esparham S, Fosler C, Gire H, Glanowski S, Glasser K, Glodek A, Gorokhov M, Graham K, Gropman B, Harris M, Heil J, Henderson S, Hoover J, Jennings D, Jordan C, Jordan J, Kasha J, Kagan L, Kraft C, Levitsky A, Lewis M, Liu X, Lopez J, Ma D, Majoros W, McDaniel J, Murphy S, Newman M, Nguyen T, Nguyen N and Nodell M. The sequence of the human genome. Science 2001;291:1304–1351.
5. Stuart JM, Segal E, Koller D and Kim SK. A gene-coexpression network for global discovery of conserved genetic modules. Science 2003;302:249–255.
6. Zhou H, Ranish JA, Watts JD and Aebersold R. Quantitative proteome analysis by solid-phase isotope tagging and mass spectrometry. Nat Biotechnol 2002;20:512–515.
7. Price ND, Reed JL, Papin JA, Wiback SJ and Palsson BO. Network-based analysis of metabolic regulation in the human red blood cell. J Theor Biol 2003;225:185–194.
8. Arias E, Anderson RN, Kung HC, Murphy SL and Kochanek KD. Deaths: final data for 2001. Natl Vital Stat Rep 2003;52:1–115.
9. Kitano H. Computational systems biology. Nature 2002;420:206–210.
10. Ideker T, Galitski T and Hood L. A new approach to decoding life: systems biology. Annu Rev Genomics Hum Genet 2001;2:343–372.
11. Ideker T, Thorsson V, Ranish JA, Christmas R, Buhler J, Eng JK, Bumgarner R, Goodlett DR, Aebersold R and Hood L. Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science 2001;292:929–934.
12. Poincaré H. Science and hypothesis; with a preface by J. Larmor.
13. Mungall AJ, Palmer SA, Sims SK, Edwards CA, Ashurst JL, Wilming L, Jones MC, Horton R, Hunt SE, Scott CE, Gilbert JG, Clamp ME, Bethel G, Milne S, Ainscough R, Almeida JP, Ambrose KD and Andrews. The DNA sequence and analysis of human chromosome 6. Nature 2003;425:805–811.
14. Stein LD, Bao Z, Blasiar D, Blumenthal T, Brent MR, Chen N, Chinwalla A, Clarke L, Clee C, Coghlan A, Coulson A, D’Eustachio P, Fitch DH, Fulton LA, Fulton RE, Griffiths-Jones S, Harris TW, Hillier LW, Kamath R, Kuwabara PE, Mardis ER, Marra MA, Miner TL, Minx P, Mullikin JC, Plumb RW, Rogers J, Schein JE, Sohrmann M, Spieth J, Stajich JE, Wei C, Willey D, Wilson RK, Durbin R and Waterston RH. The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics. PLoS Biol 2003;1:E45.
15. Pennisi E. Bioinformatics. Gene counters struggle to get the right answer. Science 2003;301:1040–1041.
16. Hood L and Galas D. The digital code of DNA. Nature 2003;421:444–448.
17. Davidson EH, Rast JP, Oliveri P, Ransick A, Calestani C, Yuh CH, Minokawa T, Amore G, Hinman V, Arenas-Mena C, Otim O, Brown CT, Livi CB, Lee PY, Revilla R, Rust AG, Pan Z, Schilstra MJ, Clarke PJ, Arnone MI, Rowen L, Cameron RA, McClay DR, Hood L and Bolouri H. A genomic regulatory network for development. Science 2002;295:1669–1678.
18. Lander ES. The new genomics: global views of biology. Science 1996;274:536–539.
19. Jasny BR and Roberts L. Are we there yet? Science 2003;302:587.
20. Pirmohamed M and Park BK. Cytochrome P450 enzyme polymorphisms and adverse drug reactions. Toxicology 2003;192:23–32.
21. Staddon S, Arranz MJ, Mancama D, Mata I and Kerwin RW. Clinical applications of pharmacogenetics in psychiatry. Psychopharmacology (Berl) 2002;162:18–23.
22. Higashi MK, Veenstra DL, Kondo LM, Wittkowsky AK, Srinouanprachanh SL, Farin FM and Rettie AE. Association between CYP2C9 genetic variants and anticoagulation-related outcomes during warfarin therapy. JAMA 2002;287:1690–1698.
23. Cortellino S, Turner D, Masciullo V, Schepis F, Albino D, Daniel R, Skalka AM, Meropol NJ, Alberti C, Larue L and Bellacosa A. The base excision repair enzyme MED1 mediates DNA damage response to antitumor drugs and is associated with mismatch repair system integrity. Proc Natl Acad Sci USA 2003.
24. Melton L. Pharmacogenetics and genotyping: on the trail of SNPs. Nature 2003;422:917.
25. Johnson JA. Pharmacogenetics: potential for individualized drug therapy through genetics. Trends Genet 2003;19:660–666.
26. Dennis C. Epigenetics and disease: altered states. Nature 2003;421:686–688.
27. Wolffe AP and Matzke MA. Epigenetics: regulation through repression. Science 1999;286:481–486.
28. Waterland RA and Jirtle RL. Transposable elements: targets for early nutritional effects on epigenetic gene regulation. Mol Cell Biol 2003;23:5293–5300.
29. Clark MS, Edwards YJ, Peterson D, Clifton SW, Thompson AJ, Sasaki M, Suzuki Y, Kikuchi K, Watabe S, Kawakami K, Sugano S, Elgar G and Johnson SL. Fugu ESTs: new resources for transcription analysis and genome annotation. Genome Res 2003;13:2747–2753.
30. Stears RL, Martinsky T and Schena M. Trends in microarray analysis. Nat Med 2003;9:140–145.
31. Lockhart DJ, Dong H, Byrne MC, Follettie MT, Gallo MV, Chee MS, Mittmann M, Wang C, Kobayashi M, Horton H and Brown EL. Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat Biotechnol 1996;14:1675–1680.
32. Velculescu VE, Zhang L, Vogelstein B and Kinzler KW. Serial analysis of gene expression. Science 1995;270:484–487.
33. Hackett CS, Hodgson JG, Law ME, Fridlyand J, Osoegawa K, de Jong PJ, Nowak NJ, Pinkel D, Albertson DG, Jain A, Jenkins R, Gray JW and Weiss WA. Genome-wide array CGH analysis of murine neuroblastoma reveals distinct genomic aberrations which parallel those in human tumors. Cancer Res 2003;63:5266–5273.
34. Howbrook DN, van der VaA, O’Shaughnessy MC, Sarker DK, Baker SC and Lloyd AW. Developments in microarray technologies. Drug Discov Today 2003;8:642–651.
35. Yamada K, Lim J, Dale JM, Chen H, Shinn P, Palm CJ, Southwick AM, Wu HC, Kim C, Nguyen M, Pham P, Cheuk R, Karlin-Newmann G, Liu SX, Lam B, Sakano H, Wu T, Yu G, Miranda M, Quach HL, Tripp M, Chang CH, Lee JM, Toriumi M, Chan MM, Tang CC, Onodera CS, Deng JM, Akiyama K, Ansari Y, Arakawa T, Banh J, Banno F, Bowser L, Brooks S, Carninci P, Chao Q, Choy N, Enju A, Goldsmith AD, Gurjal M, Hansen NF, Hayashizaki Y, Johnson-Hopson C, Hsuan VW, Iida K, Karnes M, Khan S, Koesema E, Ishida J, Jiang PX, Jones T, Kawai J, Kamiya A, Meyers C, Nakajima M, Narusaka M, Seki M, Sakurai T, Satou M, Tamse R, Vaysberg M, Wallender EK, Wong C, Yamamura Y, Yuan S, Shinozaki K, Davis RW, Theologis A and Ecker JR. Empirical analysis of transcriptional activity in the Arabidopsis genome. Science 2003;302:842–846.
36. Neumann NF and Galvez F. DNA microarrays and toxicogenomics: applications for ecotoxicology? Biotechnol Adv 2002;20:391–419.
37. Pedrioli PGA, Eng J, Hubley R, Pratt B, Nilsson E and Aebersold R. A standard open representation of mass spectrometry data and its application in a proteomics research environment. Unpublished work, 2004.
38. Taylor CF, Paton NW, Garwood KL, Kirby PD, Stead DA, Yin Z, Deutsch EW, Selway L, Walker J, Riba-Garcia I, Mohammed S, Deery MJ, Howard JA, Dunkley T, Aebersold R, Kell DB, Lilley KS and Roepstorff. A systematic approach to modeling, capturing and disseminating proteomics experimental data. Nat Biotechnol 2003;21:247–254.
39. Lee C, Atanelov L, Modrek B and Xing Y. ASAP: the alternative splicing annotation project. Nucleic Acids Res 2003;31:101–105.
40. Huang HD, Horng JT, Lee CC and Liu BJ. ProSplicer: a database of putative alternative splicing information derived from protein, mRNA and expressed sequence tag sequence data. Genome Biol 2003;4.
41. Kipps TJ. Advances in classification and therapy of indolent B-cell malignancies. Semin Oncol 2002;29:98–104.
42. al Khaldi SF, Martin SA, Rasooly A and Evans JD. DNA microarray technology used for studying foodborne pathogens and microbial habitats: minireview. J AOAC Int 2002;85:906–910.
43. Soini H and Musser JM. Molecular diagnosis of mycobacteria. Clin Chem 2001;47:809–814.
44. Paweletz CP, Charboneau L, Bichsel VE, Simone NL, Chen T, Gillespie JW, Emmert-Buck MR, Roth MJ, Petricoin EF III and Liotta LA. Reverse phase protein microarrays which capture disease progression show activation of pro-survival pathways at the cancer invasion front. Oncogene 2001;20:1981–1989.
45. Beaucage SL. Strategies in the preparation of DNA oligonucleotide arrays for diagnostic applications. Curr Med Chem 2001;8:1213–1244.
46. Chizhikov V, Rasooly A, Chumakov K and Levy DD. Microarray analysis of microbial virulence factors. Appl Environ Microbiol 2001;67:3258–3263.
47. Gene chip for viral discovery. PLoS Biol 2003;1:139–140.
48. Covitz PA. Class struggle: expression profiling and categorizing cancer. Pharmacogenomics J 2003;3:257–260.
49. Berger A, Mutch DM, Bruce GJ and Roberts MA. Unraveling lipid metabolism with microarrays: effects of arachidonate and docosahexaenoate acid on murine hepatic and hippocampal gene expression. Lipids Health Dis 2002;1:2.
50. Hanash S and Creighton C. Making sense of microarray data to classify cancer. Pharmacogenomics J 2003.
51. Aebersold R and Mann M. Mass spectrometry-based proteomics. Nature 2003;422:198–207.
52. Rabilloud T. Two-dimensional gel electrophoresis in proteomics: old, old fashioned, but it still climbs up the mountains. Proteomics 2002;2:3–10.
53. Unlu M, Morgan ME and Minden JS. Difference gel electrophoresis: a single gel method for detecting changes in protein extracts. Electrophoresis 1997;18:2071–2077.
54. Gauss C, Kalkum M, Lowe M, Lehrach H and Klose J. Analysis of the mouse proteome. (I) Brain proteins: separation by two-dimensional electrophoresis and identification by mass spectrometry and genetic variation. Electrophoresis 1999;20:575–600.
55. Link AJ, Eng J, Schieltz DM, Carmack E, Mize GJ, Morris DR, Garvik BM and Yates JR. Direct analysis of protein complexes using mass spectrometry. Nat Biotechnol 1999;17:676–682.
56. Han DK, Eng J, Zhou H and Aebersold R. Quantitative profiling of differentiation-induced microsomal proteins using isotope-coded affinity tags and mass spectrometry. Nat Biotechnol 2001;19:946–951.
57. Washburn MP, Wolters D and Yates JR. Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat Biotechnol 2001;19:242–247.
58. Keller A, Nesvizhskii AI, Kolker E and Aebersold R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal Chem 2002;74:5383–5392.
59. Nesvizhskii AI, Keller A, Kolker E and Aebersold R. A statistical model for identifying proteins by tandem mass spectrometry. Anal Chem 2003;75:4646–4658.
60. Conrads TP, Issaq HJ and Hoang VM. Current strategies for quantitative proteomics. Adv Protein Chem 2003;65:133–159.
61. Mirgorodskaya OA, Kozmin YP, Titov MI, Korner R, Sonksen CP and Roepstorff P. Quantitation of peptides and proteins by matrix-assisted laser desorption/ionization mass spectrometry using ¹⁸O-labeled internal standards. Rapid Commun Mass Spectrom 2000;14:1226–1232.
62. Yao X, Freas A, Ramirez J, Demirev PA and Fenselau C. Proteolytic ¹⁸O labeling for comparative proteomics: model studies with two serotypes of adenovirus. Anal Chem 2001;73:2836–2842.
63. Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, Knight JR, Lockshon D, Narayan V, Srinivasan M, Pochart P, Qureshi-Emili A, Li Y, Godwin B, Conover D, Kalbfleisch T, Vijayadamodar G, Yang M, Johnston M and I. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature 2000;403:623–627.
64. Ho YP and Hsu PH. Investigating the effects of protein patterns on microorganism identification by high-performance liquid chromatography-mass spectrometry and protein database searches. J Chromatogr A 2002;976:103–111.
65. Gavin AC, Bosche M, Krause R, Grandi P, Marzioch M, Bauer A, Schultz J, Rick JM, Michon AM, Cruciat CM, Remor M, Hofert C, Schelder M, Brajenovic M, Ruffner H, Merino A, Klein K, Hudak M, Dickson D and Rudi. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 2002;415:141–147.
66. Rout MP and Aitchison JD. The nuclear pore complex as a transport machine. J Biol Chem 2001;276:16593–16596.
67. Neubauer G, Gottschalk A, Fabrizio P, Seraphin B, Luhrmann R and Mann M. Identification of the proteins of the yeast U1 small nuclear ribonucleoprotein complex by mass spectrometry. Proc Natl Acad Sci USA 2001;94:385–390.
68. Mann M and Jensen ON. Proteomic analysis of post-translational modifications. Nat Biotechnol 2003;21:255–261.
69. Griffin TJ, Gygi SP, Ideker T, Rist B, Eng J, Hood L and Aebersold R. Complementary profiling of gene expression at the transcriptome and proteome levels in Saccharomyces cerevisiae. Mol Cell Proteomics 2002;1:323–333.
70. Betts JC, Lukey PT, Robb LC, McAdam RA and Duncan K. Evaluation of a nutrient starvation model of Mycobacterium tuberculosis persistence by gene and protein expression profiling. Mol Microbiol 2002;43:717–731.
71. Guina T, Purvine SO, Yi EC, Eng J, Goodlett DR, Aebersold R and Miller SI. Quantitative proteomic analysis indicates increased synthesis of a quinolone by Pseudomonas aeruginosa isolates from cystic fibrosis airways. Proc Natl Acad Sci USA 2003;100:2771–2776.
72. Zhang H, Li XJ, Martin DB and Aebersold R. Identification and quantification of N-linked glycoproteins using hydrazide chemistry, stable isotope labeling and mass spectrometry. Berlin, New York, Springer-Verlag, 2003, pp. 660–666.
73. Liotta LA, Ferrari M and Petricoin E. Clinical proteomics: written in blood. Nature 2003;425:905.
74. Petricoin EF, Ardekani AM, Hitt BA, Levine PJ, Fusaro VA, Steinberg SM, Mills GB, Simone C, Fishman DA, Kohn EC and Liotta LA. Use of proteomic patterns in serum to identify ovarian cancer. Lancet 2002;359:572–577.
75. Nicholson JK and Wilson ID. Opinion: understanding ‘global’ systems biology: metabonomics and the continuum of metabolism. Nat Rev Drug Discov 2003;2:668–676.
76. Watkins SM and German JB. Toward the implementation of metabolomic assessments of human health and nutrition. Curr Opin Biotechnol 2002;13:512–516.
77. German JB, Roberts MA, Fay L and Watkins SM. Metabolomics and individual metabolic assessment: the next great challenge for nutrition. J Nutr 2002;132:2486–2487.
78. German JB, Roberts MA and Watkins SM. Personal metabolomics as a next generation nutritional assessment. J Nutr 2003;133:4260–4266.
79. Kauffman KJ, Prakash P and Edwards JS. Advances in flux balance analysis. Curr Opin Biotechnol 2003;14:491–496.
80. Takahashi K, Ishikawa N, Sadamoto Y, Sasamoto H, Ohta S, Shiozawa A, Miyoshi F, Naito Y, Nakayama Y and Tomita M. E-Cell 2: Multi-platform E-Cell simulation system. Bioinformatics 2003;19:1727–1729.
81. Segre D, Zucker J, Katz J, Lin X, D’Haeseleer P, Rindone WP, Kharchenko P, Nguyen DH, Wright MA and Church GM. From annotated genomes to metabolic flux models and kinetic parameter fitting. OMICS 2003;7:301–316.
82. Ge H, Walhout AJ and Vidal M. Integrating ‘omic’ information: a bridge between genomics and systems biology. Trends Genet 2003;19:551–560.
83. Desiere F, German B, Watzke H, Pfeifer A and Saguy S. Bioinformatics and data knowledge: the new frontiers for nutrition and foods. Trends Food Sci Technol 2002;12:215–229.
84. Altschul SF, Gish W, Miller W, Myers EW and Lipman DJ. Basic local alignment search tool. J Mol Biol 1990;215:403–410.
85. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J and Wheeler DL. GenBank. Nucleic Acids Res 2003;31:23–27.
86. Xenarios I, Salwinski L, Duan XJ, Higney P, Kim SM and Eisenberg D. DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res 2002;30:303–305.
87. Bader GD, Betel D and Hogue CW. BIND: the Biomolecular Interaction Network Database. Nucleic Acids Res 2003;31:248–250.
88. Suzuki H, Saito R, Kanamori M, Kai C, Schonbach C, Nagashima T, Hosaka J and Hayashizaki Y. The mammalian protein-protein interaction database and its viewing system that is linked to the main FANTOM2 viewer. Genome Res 2003;13:1534–1541.
89. Zanzoni A, Montecchi-Palazzi L, Quondam M, Ausiello G, Helmer-Citterich M and Cesareni G. MINT: a Molecular INTeraction database. FEBS Lett 2002;513:135–140.
90. Breitkreutz BJ, Stark C and Tyers M. Osprey: a network visualization system. Genome Biol 2003;4.
91. Ho Y, Gruhler A, Heilbut A, Bader GD, Moore L, Adams SL, Millar A, Taylor P, Bennett K, Boutilier K, Yang L, Wolting C, Donaldson I, Schandorff S, Shewnarane J, Vo M, Taggart J, Goudreault M, Muskat B, Alfarano C, Dewar D, Lin Z, Michalickova K, Willems AR, Sassi H, Nielsen PA, Rasmussen KJ, Andersen JR, Johansen LE, Hansen LH, Jespersen H, Podtelejnikov A, Nielsen E, Crawford J, Poulsen V, Sorensen BD, Matthiesen J, Hendrickson RC, Gleeson F, Pawson T, Moran MF, Durocher D, Mann M, Hogue CW, Figeys D and Tyers M. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 2002;415:180–183.
92. Giot L, Bader JS, Brouwer C, Chaudhuri A, Kuang B, Li Y, Hao YL, Ooi CE, Godwin B, Vitols E, Vijayadamodar G, Pochart P, Machineni H, Welsh M, Kong Y, Zerhusen B, Malcolm R, Varrone Z, Collis A, Minto M, Burgess S, McDaniel L, Stimpson E, Spriggs F, Williams J, Neurath K, Ioime N, Agee M, Voss E, Furtak K, Renzulli R, Aanensen N, Carrolla S, Bickelhaupt E, Lazovatsky Y, DaSilva A, Zhong J, Stanyon CA, Finley RL Jr, White KP, Braverman M, Jarvie T, Gold S, Leach M, Knight J, Shimkets RA, McKenna MP, Chant J and Rothberg JM. A protein interaction map of Drosophila melanogaster. Science 2003;302:1727–1736.
93. Matys V, Fricke E, Geffers R, Gossling E, Haubrock M, Hehl R, Hornischer K, Karas D, Kel AE, Kel-Margoulis OV, Kloos DU, Land S, Lewicki-Potapov B, Michael H, Munch R, Reuter I, Rotert S, Saxel H and Scheer M. TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res 2003;31:374–378.
94. Zhu J and Zhang MQ. SCPD: a promoter database of the yeast Saccharomyces cerevisiae. Bioinformatics 1999;15:607–611.
95. Karp PD, Riley M, Saier M, Paulsen IT, Collado-Vides J, Paley SM, Pellegrini-Toole A, Bonavides C and Gama-Castro S. The EcoCyc database. Nucleic Acids Res 2002;30:56–58.
96. Kanehisa M, Goto S, Kawashima S and Nakaya A. The KEGG databases at GenomeNet. Nucleic Acids Res 2002;30:42–46.
97. Overbeek R, Larsen N, Pusch GD, D’Souza M, Selkov E, Kyrpides N, Fonstein M, Maltsev N and Selkov E. WIT: integrated system for high-throughput genome sequence analysis and metabolic reconstruction. Nucleic Acids Res 2000;28:123–125.
98. Brazma A, Parkinson H, Sarkans U, Shojatalab M, Vilo J, Abeygunawardena N, Holloway E, Kapushesky M, Kemmeren P, Lara GG, Oezcimen A, Rocca-Serra P and Sansone SA. ArrayExpress: a public repository for microarray gene expression data at the EBI. Nucleic Acids Res 2003;31:68–71.
99. Edgar R, Domrachev M and Lash AE. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 2002;30:207–210.
100. Sherlock G, Hernandez-Boussard T, Kasarskis A, Binkley G, Matese JC, Dwight SS, Kaloper M, Weng S, Jin H, Ball CA, Eisen MB, Spellman PT, Brown PO, Botstein D and Cherry JM. The Stanford Microarray Database. Nucleic Acids Res 2001;29:152–155.
101. Peri S, Navarro JD, Amanchy R, Kristiansen TZ, Jonnalagadda CK, Surendranath V, Niranjan V, Muthusamy B, Gandhi TK, Gronborg M, Ibarrola N, Deshpande N, Shanker K, Shivashankar HN, Rashmi BP, Ramya MA, Zhao Z, Chandrika KN, Padma N, Harsha HC, Yatish AJ, Kavitha MP, Menezes M, Choudhury DR, Suresh S, Ghosh N, Saravana R, Chandran S, Krishna S, Joy M, Anand SK, Madavan V, Joseph A, Wong GW, Schiemann WP, Constantinescu SN, Huang L, Khosravi-Far R, Steen H, Tewari M, Ghaffari S, Blobe GC, Dang CV, Garcia JG, Pevsner J, Jensen ON, Roepstorff P, Deshpande KS, Chinnaiyan AM, Hamosh A, Chakravarti A and Pandey A. Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res 2003;13:2363–2371.
102. Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H, Arkin AP, Bornstein BJ, Bray D, Cornish-Bowden A, Cuellar AA, Dronov S, Gilles ED, Ginkel M, Gor V, Goryanin II, Hedley WJ and Hodgman TC. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics 2003;19:524–531.
103. Vidal M. A biological atlas of functional maps. Cell 2001;104:333–339.
104. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B and Ideker T. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 2003;13:2498–2504.
105. Collins FS and Watson JD. Genetic discrimination: time to act. Science 2003;302:745.
106. Collins FS, Green ED, Guttmacher AE, Guyer MS and US National Human Genome Research Institute. A vision for the future of genomics research. Nature 2003;422:835–847.
107. Lederberg J. Getting in tune with the enemy – Microbes. The Scientist 2003;17:20.
108. Xu J and Gordon JI. Inaugural article: Honor thy symbionts. Proc Natl Acad Sci USA 2003;100:10452–10459.
109. Xu J, Bjursell MK, Himrod J, Deng S, Carmichael LK, Chiang HC, Hooper LV and Gordon JI. A genomic view of the human-Bacteroides thetaiotaomicron symbiosis. Science 2003;299:2074–2076.
110. Schell MA, Karmirantzou M, Snel B, Vilanova D, Berger B, Pessi G, Zwahlen MC, Desiere F, Bork P, Delley M, Pridmore RD and Arigoni F. The genome sequence of Bifidobacterium longum reflects its adaptation to the human gastrointestinal tract. Proc Natl Acad Sci USA 2002;99:14422–14427.
111. Paulsen IT, Banerjei L, Myers GS, Nelson KE, Seshadri R, Read TD, Fouts DE, Eisen JA, Gill SR, Heidelberg JF, Tettelin H, Dodson RJ, Umayam L, Brinkac L, Beanan M, Daugherty S, DeBoy RT, Durkin S, Kolonay J, Madupu R, Nelson W, Vamathevan J, Tran B, Upton J, Hansen T, Shetty J, Khouri H, Utterback T, Radune D, Ketchum KA, Dougherty BA and Fraser CM. Role of mobile DNA in the evolution of vancomycin-resistant Enterococcus faecalis. Science 2003;299:2071–2074.
112. Pridmore RD, Berger B, Desiere F, Vilanova D, Barretto C, Pittet AC, Zwahlen MC, Rouvet M, Altermann E and Barrangou R. Proc Natl Acad Sci USA 2004;101:2512–2517.
113. Oliveira MJ, Van Damme J, Lauwaet T, De C, V, De Bruyne G, Verschraegen G, Vaneechoutte M, Goethals M, Ahmadian MR, Muller O, Vandekerckhove J, Mareel M and Leroy A. beta-casein-derived peptides, produced by bacteria, stimulate cancer cell invasion and motility. EMBO J 2003;22:6161–6173.
114. Savage DC. Gastrointestinal microflora in mammalian nutrition. Annu Rev Nutr 1986;6:155–178.
115. Berg RD. The indigenous gastrointestinal microflora. Trends Microbiol 1996;4:430–435.
116. Urso ML and Clarkson PM. Oxidative stress, exercise and antioxidant supplementation. Toxicology 2003;189:41–54.
117. Goodman GE, Schaffer S, Omenn GS, Chen C and King I. The association between lung and prostate cancer risk and serum micronutrients: results and lessons learned from beta-carotene and retinol efficacy trial. Cancer Epidemiol Biomarkers Prev 2003;12:518–526.
118. Gann PH and Khachik F. Tomatoes or lycopene versus prostate cancer: Is evolution antireductionist? J Natl Cancer Inst 2003;95:1563–1565.
119. Hursting SD, Lavigne JA, Berrigan D, Perkins SN and Barrett JC. Calorie restriction, aging and cancer prevention: mechanisms of action and applicability to humans. Annu Rev Med 2003;54:131–152.
120. Omenn GS. Genetic advances will influence the practice of medicine: examples from cancer research and care of cancer patients. Genet Med 2002;4:15S–20S.