Metagenomics and Complementary Approaches for ...

2014, Microbial Biodiversity in Sustainable Agriculture Editor: Dr. Ram Chandra Published by: DAYA PUBLISHING HOUSE, NEW DELHI

Pages 243–262

Chapter 11

Metagenomics and Complementary Approaches for the Study of Microbial Communities

Dinesh Chandra1*, Suresh Chand Meena2, A. Chattopadhyay3 and Vinod Saharan1

1 Department of Molecular Biology and Biotechnology, 2 Department of Plant Pathology, MPUAT, Udaipur, Rajasthan
3 Division of Plant Pathology, Indian Agricultural Research Institute, New Delhi

* Corresponding Author: E-mail: [email protected]

Metagenomics

Metagenomic analyses of natural biological communities are revolutionizing our understanding of the diversity, function and inter-relationships of organisms in diverse ecological niches. The term ‘metagenomics’ was coined by Handelsman in 1998 to describe the emerging field that analyses genetic material recovered directly from environmental samples. Various other terms have surfaced in the last decade to describe essentially the same lines of scientific investigation, including Environmental Genomics, Ecological Genomics and Community Genomics. In the last few years, however, metagenomics has emerged as the dominant term. In the broad sense, metagenomics now encompasses any investigation that applies modern genomics techniques to the study of biological communities directly in their natural environments, bypassing the need for the isolation, laboratory cultivation and observation of individual organisms. At present, metagenomic studies focus predominantly on the genetic material of microorganisms in the environment. However, the general methods described here are applicable to other groups of organisms, including extinct and fossilized animals (Poinar et al., 2006).

Unlike traditional microbiology and microbial genome sequencing, which rely on pure clonal cultures, metagenomic analyses of microbial communities enable the study of organisms that are not easily cultured in the laboratory. They also allow organisms to be studied directly in their natural environments. The ability to obtain such information is essential for a comprehensive and realistic understanding of natural biological communities. It has been found that >99 per cent of microorganisms in natural environments cannot be cultured in the laboratory. Even for those that can be grown, the artificial environments we provide are likely to be very different from their natural ecological niches. As a result, the gene expression, protein and metabolite profiles observed in the laboratory will likely differ from those in nature for some organisms. Approaches that investigate microbial populations in their native habitats are therefore essential for a critical and realistic understanding of microbes in nature.

Earlier metagenomics research focused on the analysis of single genes (e.g. the small subunit ribosomal RNA gene) or random gene fragments. In recent years, metagenomics research has expanded to the analysis of other types of molecules in environmental samples, including RNA (transcripts), proteins and metabolites. In addition, different approaches (e.g. single cell vs. community genome) have been developed to analyse different components of the microbial community and the metagenome, depending on the objectives of the research.

Analysis of Metagenomes

To study genetic material from natural environmental samples, the first step is to extract the DNA from the sample. Any given environmental sample will likely contain a variety of organisms: bacteria, archaea, viruses, protozoa, algae, fungi, plants and animals. Different types of cells of the same organism may also be present. Different cell types have different cell structures, which can influence the efficiency of DNA extraction (Rondon et al., 2000). Indeed, different extraction protocols have been shown to influence the results obtained in subsequent metagenomic analyses. To ensure that what is extracted is representative of the DNA in the original environmental sample, a combination of extraction protocols may be needed. The following approaches have been used to study the extracted metagenomic DNA.

The Single Gene Approach

The analysis of DNA samples taken directly from the environment started long before the term metagenomics came into use. Early studies examined the distribution and diversity of specific gene fragments in a given environmental sample. For several reasons, the small subunit ribosomal RNA (SSU rRNA) gene has been the marker of choice for this type of study. First, this gene is universally distributed across all cellular organisms, which permits the potential identification, comparison and assessment of all cellular organisms in any given environment. Second, the gene is functionally homologous across all groups of cellular life forms, which ensures that the comparisons performed are meaningful. Third, because of its functional constraints, its secondary and tertiary structures are highly conserved, making its primary sequence relatively stable over evolutionary time scales. This relative evolutionary stability is essential for analysing high-level taxonomic relationships (e.g. at the domain, kingdom, phylum, class, order, family or genus levels) among organisms and sequences. Fourth, this gene has been shown to evolve at a relatively constant rate over time, allowing the inference of relative divergence across broad groups of taxa. The SSU rRNA gene is commonly known as the 16S rRNA gene in prokaryotes and the 18S rRNA gene in eukaryotes, names that reflect the sedimentation properties of the corresponding rRNAs during centrifugation.

To study the types and relative abundances of unique SSU rRNA genes in a given environmental sample, the most common approach after DNA extraction is to use universal (or group-specific) PCR primers to amplify the gene from the target organisms. The amplified products generally contain a mixture of SSU rRNA sequences from diverse groups of organisms in the original sample. To obtain the DNA sequences of individual organisms, the traditional approach is to clone individual PCR products into a vector containing selectable marker genes, transform the ligated products into suitable host cells (typically E. coli cells), and plate the cells on selective media.
Each individual colony grown from a single transformed cell should contain only one of the DNA fragments from the mixed PCR products. Individual colonies are then picked, and the cloned fragment is amplified with common primers located on the cloning vector and sequenced (Pace, 1991). The more recent bead-based pyrosequencing technique allows direct sequencing of PCR products, removing the cloning step (Edwards et al., 2006). Once the sequences are obtained, they can be compared to standard databases (e.g. the Ribosomal Database Project or GenBank) for identification at various taxonomic levels, from phylum to family, genus and potentially species. The sequences can also be used to calculate biological diversity. The diversity measures that can potentially be used to compare natural biological communities include nucleotide diversity, gene diversity, haplotype diversity, genotype diversity, species diversity, phylogenetic diversity, evolutionary diversity and functional diversity (Xu, 2006).

Norman Pace and colleagues conducted the early molecular work in this field, established the utility of SSU rRNA as an evolutionary chronometer and identified the three domains of cellular life (i.e. Bacteria, Archaea and Eukaryota) using this gene (Pace et al., 1991). SSU rRNA sequences were subsequently shown to be capable of separating organisms into lower-level taxonomic ranks. However, to ensure that the SSU rRNA sequences obtained were indeed from the specific environments, considerable efforts were, and still are, made to exclude the possibility of contamination and PCR false positives. This is true for sequences with cultivable representatives as well as for those without. One of the most common PCR artefacts is recombination during PCR, which generates chimeric sequences. Chimeras can be identified through sliding-window comparisons of DNA sequences. Specifically, if one part of a PCR-amplified sequence (e.g. A) is identical to another sequence (e.g. B), while another part of sequence A is identical to a third sequence (e.g. C) in the same sample that is distinctly different from B, then sequence A is likely a recombinant of sequences B and C, possibly formed during PCR amplification. In a typical PCR-based metagenomic analysis of environmental DNA samples, sequence A would be discarded. The existence of large databases of SSU rRNA sequences (e.g. over 80,000 entries at RDP and GenBank) for Bacteria and, to a lesser extent, Archaea has greatly accelerated the identification of potential PCR artefacts.

Although this methodology was limited to exploring highly conserved genes, its application supported early morphology-based observations that microbial diversity was far more complex than was known from culturing methods. Many subsequent studies followed, targeting different groups of organisms in different environmental niches. Such work revealed that the vast majority of microbial diversity had been missed by cultivation-based methods. Indeed, less than 1 per cent of the Bacteria and Archaea species in a given environmental sample can be cultured. The non-cultivable groups include some major phyla within all microbial groups (i.e. bacteria, archaea, fungi, viruses, algae and protozoa) (Xu, 2006).

Targeted Partial Metagenome Sequencing

Given the relatively high cost of sequencing whole metagenomes, early efforts to study functional gene distributions in the environment were based on the analysis of small genomic libraries containing large fragments of environmental DNA.
Some of the clones in these libraries can yield both phylogenetic and functional information about certain genomes in specific natural environments. Specifically, genes close to the SSU rRNA gene cluster or other marker genes can be cloned, identified and sequenced together, often providing very informative results at a reasonable cost. To achieve this, relatively high molecular weight DNA needs to be extracted and used to construct clone libraries in cosmid vectors or bacterial artificial chromosomes (BACs), so that large fragments can be cloned and analysed individually (Beja et al., 2000a). Through Southern blotting, phylogenetic probes such as the SSU rRNA gene can be used to identify clones that contain the signature gene fragment, allowing further phylogenetic inference. These clones can be sequenced at reasonable cost through either shotgun sequencing or pyrosequencing. The SSU rRNA gene and any other genes on the cloned fragments can then be identified and annotated. Together, this information allows inferences about the organisms and metabolic processes in the community as well as their potential ecological functions.

In 1995, Healy et al. reported the metagenomic isolation of functional genes from ‘zoolibraries’ constructed from a complex culture of environmental organisms grown in the laboratory on dried grasses. Subsequently, in a classic metagenomic study of genome fragments from a BAC library of marine picoplankton, Beja et al. (2000b) identified a new class of genes of the rhodopsin family, named proteorhodopsin, from the uncultivated gamma-proteobacterium SAR86. At that time, the rhodopsin family was known to exist only in extremely halophilic (salt-loving) archaea and had never before been observed in cultured bacteria. Unlike the archaeal rhodopsins, which do not express properly in model laboratory strains, the SAR86 proteorhodopsin was readily expressed in the laboratory model bacterium E. coli, where it functioned as a light-driven proton pump. Later studies showed that this new type of light-driven energy generation is in fact widespread in the ocean and that the absorption spectra of bacterial rhodopsins are optimized for different depths of ocean water (Beja et al., 2001).

Whole Metagenome Sequencing

The above approaches allow gene-specific investigations of environmental samples. Recent technological advances allow whole or partial metagenomes to be analysed relatively efficiently, though the costs and analyses are still prohibitive for most research laboratories. In this type of analysis, the DNA samples are either first cloned into a vector and then sequenced, or sequenced directly. Because the DNA fragments are randomly sheared, overlapping fragments should exist in the sample. If a sufficient number of DNA fragments is sequenced, using either the shotgun approach through cloning and sequencing or direct pyrosequencing, it is possible to reconstruct large DNA fragments from the randomly sequenced small pieces. In certain cases, it may even be possible to reconstruct a complete genome (Venter et al., 2004). Because the collection of DNA from an environment is largely uncontrolled, the most abundant organisms in an environmental sample should be the most highly represented in the resulting sequence data. To achieve the high coverage needed to fully resolve the genomes of underrepresented community members, large samples, often prohibitively so, are needed.
On the other hand, the random nature of shotgun sequencing ensures that many of these organisms will be represented by at least some small sequence segments. Owing to the limitations of microbial isolation and observation methods, the vast majority of these organisms would go unnoticed using traditional culturing techniques or microscopy tools.

Whole genome reconstruction from environmental samples is made possible largely by advances in bioinformatics associated with genome sequencing and annotation projects of known model organisms. However, the likelihood of obtaining complete or partial genome sequences depends on the relative abundance of the specific genome in the original sample as well as on the number of fragments sequenced from the metagenome. The more abundant a genome is in the original sample and the more fragments are sequenced, the more likely it is that the whole genome of that organism can be reconstructed from the metagenome. The reconstructed genomes allow assessment of the phylogenetic placement of the genomes as well as the structure and potential function of the dominant organisms in the specific environments. As has been shown in several studies, such extensive analyses often yield surprisingly informative results about broader microbial groups and their likely functions across diverse ecological niches. Because of the random nature of this approach, it is also possible to obtain high coverage of the SSU rRNA diversity in the original sample, and these SSU rRNA sequences can be used to infer the phylogenetic diversity of microorganisms in the specific environments.

In the past few years, a new sequencing platform, chip-based pyrosequencing, has been developed and used to analyse environmental DNA directly. This technique generates shorter reads than conventional Sanger sequencing platforms. However, this limitation is compensated for by the very large number of sequence reads, usually in the range of millions, that can be generated in a single sequencing run. In addition, the technique does not require PCR amplification of target genes or cloning of the DNA before sequencing, circumventing two of the main biases in metagenomics. In 2006, Robert Edwards and colleagues published the first sequences of environmental samples generated with pyrosequencing.

While much of the progress in metagenomic studies is due to the development of high-throughput sequencing technologies, the development of bioinformatics software has also played a major role and is poised to play a significantly greater one. The initial metagenome assemblies and analyses used software developed for genome sequencing projects of cultured or known individual organisms. In 2007, Huson et al. developed and published the first stand-alone metagenome analysis tool, MEGAN. This tool was originally developed to analyse the metagenome of a mammoth sample but was later adapted for shotgun reads of metagenome sequences. In 2008, Meyer et al. released the metagenomics RAST server (MG-RAST), a community resource for metagenome data analysis. Other software platforms and databases for analysing metagenomic data include IMG/M and CAMERA.

Single Cell Isolation, Whole Genome Amplification and Sequencing

While most current studies of uncultivable environmental microorganisms rely on the extraction and sequencing of whole community genomes, a recent paper reported the successful isolation, amplification and sequencing of the partial genome of a single uncultured archaeon cell from the environment. In this study, Kvist et al. (2007) first mapped the diversity of Archaea in a soil sample by generating a clone library using group-specific primers in combination with terminal restriction fragment length polymorphism profiling. Intact cells were then extracted from the same environmental sample and probed, using fluorescent in situ hybridization, with a Cy3-labelled DNA fragment designed from the clone library. Single cells with a bright fluorescent signal were isolated using a micromanipulator. The genome of a single isolated cell was then amplified through multiple displacement amplification (MDA) using the Phi29 DNA polymerase. The 16S rRNA gene sequence of the single cell showed >99 per cent identity to the soil crenarchaeotal clone SCA1170. Shotgun sequencing of the MDA product found its closest match in a crenarchaeotal BAC clone previously retrieved from a soil sample. In addition, the system was successfully validated using Methanothermobacter thermoautotrophicus as a single-cell test organism. The ability to isolate and sequence a single microbial cell from the environment will bypass many of the obstacles facing metagenomics research and will significantly accelerate our unbiased understanding of natural microbial communities.
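
The sliding-window chimera check described under the single gene approach can be illustrated with a short Python sketch. It is a toy illustration only and not the procedure of any particular chimera-detection program: the function names, the window and step sizes and the 95 per cent identity threshold are all assumptions made here, and the sequences are assumed to be already aligned and of equal length.

```python
def identity(a, b):
    """Fraction of positions at which two aligned sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def confident_parents(query, references, window=20, step=10, min_identity=0.95):
    """For each window, name the best-matching reference.

    Returns None for a window in which no reference reaches min_identity.
    All sequences are assumed to be pre-aligned and of equal length.
    """
    parents = []
    for start in range(0, len(query) - window + 1, step):
        segment = query[start:start + window]
        best_name, best_score = None, 0.0
        for name, ref in references.items():
            score = identity(segment, ref[start:start + window])
            if score > best_score:
                best_name, best_score = name, score
        parents.append(best_name if best_score >= min_identity else None)
    return parents


def looks_chimeric(query, references, window=20, step=10, min_identity=0.95):
    """Flag a query whose parts match different references (hypothetical rule)."""
    # A query that matches one reference along its full length is not a chimera.
    if any(identity(query, ref) >= min_identity for ref in references.values()):
        return False
    parents = confident_parents(query, references, window, step, min_identity)
    return len({p for p in parents if p is not None}) >= 2


if __name__ == "__main__":
    # Two artificial, clearly different "parent" sequences of 60 bases each.
    refs = {"B": "ACGT" * 15, "C": "TTGCA" * 12}
    # Sequence A: first half copied from B, second half copied from C.
    seq_a = refs["B"][:30] + refs["C"][30:]
    print(confident_parents(seq_a, refs))
    print(looks_chimeric(seq_a, refs))
```

With these artificial inputs the windows are assigned to B, B, no confident parent, C and C, so sequence A is flagged and would be discarded. Real SSU rRNA data would first need a multiple sequence alignment and thresholds tuned to the data.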

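The dependence of genome reconstruction on genome abundance and sequencing effort, noted under whole metagenome sequencing, can be made concrete with a rough calculation. The Python sketch below uses the idealized Lander-Waterman model, which is an assumption introduced here and not part of the studies cited above: reads are taken to be sampled uniformly at random, and `abundance` is the hypothetical fraction of all reads that derive from the genome of interest (which need not equal its share of cells in the community).

```python
import math


def expected_coverage(n_reads, read_length, genome_size, abundance):
    """Mean sequencing depth for one genome in a mixed sample.

    abundance is the assumed fraction of reads that come from this genome.
    """
    return n_reads * abundance * read_length / genome_size


def fraction_covered(coverage):
    """Lander-Waterman estimate of the fraction of bases hit by at least one read."""
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    # Illustrative numbers only: one million 100-base reads, a 2 Mb genome.
    for abundance in (0.5, 0.01, 0.001):
        depth = expected_coverage(1_000_000, 100, 2_000_000, abundance)
        print(abundance, depth, round(fraction_covered(depth), 3))
```

Under these assumptions a genome contributing half of the reads is sequenced to a mean depth of about 25, whereas one contributing 0.1 per cent of the reads reaches a depth of about 0.05, so only a few per cent of its bases are sampled at all. This is why rare community members appear only as scattered fragments unless sequencing effort is greatly increased.
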
Complementary Approach for the Study of Microbial Communities

Metagenomics holds an undisputed advantage in accessing and examining complex natural microbial communities that are difficult to study. However, the metagenomic approach, which studies the entire DNA content of a community, is still limited in its scope and capacity to derive ecologically meaningful information about the complex interactions that drive and shape communities. Difficulties inherent to this strategy, from problems associated with the extraction of genomic material to the loss of relevant information about the microorganisms and the ecosystem, necessarily limit the information that can be obtained from a particular study. Problems related to the limited recovery of DNA have recently been addressed by amplifying the isolated material using multiple displacement amplification (MDA). Another major drawback of metagenomics is that gene discovery relies on genomic data in the absence of information about the organisms themselves. Metagenomics has reached a point where scientific questions about the interactions among microorganisms and their roles in the environment can start to be addressed. This will require coupling genotypic and phenotypic analyses through the implementation of novel, powerful tools and the concerted integration of other ‘omic’ approaches such as proteomics and transcriptomics. The formidable plasticity displayed by microorganisms is related to their metabolic versatility, the interaction of complex regulatory networks and their capacity to trigger differential responses that become evident in the expressed metabolic potential. Focusing on the global analysis of all genes and expression profiles can therefore reveal information beyond what can be gathered from studies of individual genes, contributing substantially to our understanding of the physiology and strategies involved in microbial adaptation to changing environmental conditions (Schweder et al., 2008).
The major challenge in the future will be to integrate experimental approaches and formulate questions aimed at deriving relevant ecological information, questions that can only be addressed in the context of intact communities where population requirements and interactions are at work (Turnbaugh and Gordon, 2008).

Metatranscriptomics

Metatranscriptomics is the high-throughput detection and analysis, in terms of sequence diversity and associated functions, of the transcripts (RNA molecules) extracted from samples in which more than one microbial genome type is present. It is essentially a transcriptomic study of samples containing multiple cell types, species or operational taxonomic units (OTUs). The word ‘metatranscriptomic’ is derived by analogy with ‘metagenomic’. In the strict sense of the definition, metatranscriptomics could include all work involving the direct extraction and detection of RNA sequences from environmental samples, i.e. work involving reverse transcription, target amplification, sequencing and analysis of 16S rRNA gene transcripts (Felske et al., 1996a; Nogales et al., 2001b; Small et al., 2001; Weinbauer et al., 2002). However, if one considers metagenomics mostly as a sequence-based approach (excluding function-based screenings), metatranscriptomics could be restricted to analyses that have a broader scope and encompass the total mRNA and/or rRNA transcripts in a sample. This approach is made possible by massive sequencing efforts and ideally does not involve cloning procedures or targeted PCR amplification. However, the widespread use of 16S rRNA gene amplification to characterize microbial communities could be considered a special case, since this gene is still extremely useful for exploring diversity and complexity in microbial communities (Tringe and Hugenholtz, 2008).

Metatranscriptomics complements the metagenomic approach by focusing on the expressed subset of genes (the metatranscriptome), thus reducing the complexity of the data to be analysed. This allows, for example, the detection of sequences associated with a particular environmental condition that may not be so readily identified in metagenomic studies, and increases the chance of detecting ecologically relevant active functions. The discovery of functions induced in a sample in response to a certain environmental condition (exerted pressure) also gives insight into processes of adaptation and enriches our understanding of communities previously captured through metagenomic sequence surveys. Thus, this approach gives a composite view of the transcriptionally active subset of the genomes present in a community under the environmental condition sampled.

The inherent instability of RNA molecules has been one of the most limiting factors in the development of metatranscriptomics. Transcriptional studies had already revealed the complexity of working with RNA, an unstable molecule with rapid turnover and a short cellular half-life (seconds to minutes) compared with the more stable DNA. The lability of RNA also contrasts with the proteome, in which protein half-lives vary with each protein’s biochemical nature and localization.
The transient nature of a given RNA population will therefore influence the expression profiles observed, providing at best a snapshot of what are probably highly dynamic patterns of expression (Velculescu et al., 1995). Another factor limiting the capacity for deep sequence-based transcriptomic analyses of metagenomes is the low quantity of transcripts inherently present in, or recovered from, environmental samples. This is due to the substantially lower biomass found in these samples compared with a pure bacterial culture (Amann et al., 1995). In addition, contaminants that are co-extracted with the nucleic acids (Griffiths et al., 2000), such as humic acids in soils, can interfere with subsequent steps in sample processing such as quantification, enzymatic amplification, modification or hybridization (Alm et al., 2000; Roh et al., 2006). These problems, although shared with metagenomics, are particularly critical for metatranscriptomic studies. However, improvements in sample recovery and purification over recent years have opened the way for global analyses that involve the detection and identification of transcripts from environmental samples.

The recently developed high-throughput sequencing technologies have obvious advantages for exploring the metatranscriptome. Pyrosequencing, which is based on the detection of released pyrophosphate, represents a turning point because it dispenses with cloning and provides a fast and economical alternative for obtaining large-scale sequence information. The basic steps of the pyrosequencing-based metatranscriptomic approach are: isolation of environmental RNA (eRNA); generation of environmental complementary DNAs (ecDNAs) by random-primed reverse transcription; and conversion of these into double-stranded DNA fragments (ds ecDNAs). The ds ecDNAs are then ligated to adaptors, emulsified and subjected to the 454 sequencing process (Leininger et al., 2006). These DNAs contain information on the expressed ribosomal genes (rRNA: taxonomic community structure information) and protein-coding genes (mRNA: metabolic functions) within a microbial community and thus provide relevant input for more detailed downstream analyses (protein-based analyses or microarray design) at an unprecedented depth of coverage. This approach, which avoids the well-known biases associated with culturing, primer-probe specificity and sensitivity, PCR amplification, cloning and screening, was used by Urich et al. to rapidly and simultaneously characterize both the structure and the in situ function of a soil microbial community (Urich et al., 2008). The simultaneous analysis of actively transcribed rRNA and mRNA sequences obtained by pyrosequencing was thus useful both for taxonomic profiling of the community and for assessing actively transcribed genes and functional information.

In some cases it is desirable to focus on protein-coding genes and exclude the ribosomal content from the analysis. This focuses the work on predictions regarding the functionality or networking of the metabolic pathways that may be present. It also increases coverage and can reveal more diversity associated with a specific function. In microbial transcriptomics and metatranscriptomics, rRNA molecules are presently excluded by one of two methods. One method captures and removes the ribosomal content by using probes that target highly conserved regions of the ribosomal subunits, followed by selective hybridization and removal of the rRNA. The other takes advantage of a difference between mRNA and rRNA: a processive 5’-3’ exonuclease digests rRNA, which carries a 5’ monophosphate.
Metatranscriptomic studies that use mRNA decrease the complexity in a meaningful and useful way, offering the advantage of recovering sequences for putative proteins that could otherwise be overlooked or underrepresented in metagenomic surveys.

Future Perspective in Metatranscriptomics

Nowadays, metatranscriptomic studies consist of deep sequence surveys of the expressed genes from overwhelmingly complex metagenomes (Raes and Bork, 2008; Urich et al., 2008). Although a powerful approach to understanding functionality, this strategy still provides a relatively isolated and transient picture of what can be an amazingly diverse and largely unknown community. However, metatranscriptomics offers several advantages over the large-scale sequence-based metagenomic approach that seeks broad sequence coverage. By centring the analysis on the functions detected, it reduces the sequence complexity and provides a more meaningful alternative for the study of heterogeneous communities. One of the advantages of working with libraries generated from expressed transcriptional units is the increased chance of finding protein-coding, functional sequences and assigning possible roles to these proteins within a metabolic context (Dunlap et al., 2006). Metatranscriptomics can thus facilitate understanding of the variations within an ecosystem and the possible correlations between environmental variables and function (Gianoulis et al., 2009). It can also be used to target specific functions of environmental importance (Gilbert et al., 2009; Shrestha et al., 2008) and has the potential to identify genes that could go undetected in larger metagenomic sequencing datasets. The construction and analysis of cDNA libraries from diverse environments has revealed several unique sequences and the potential to uncover a high degree of novelty within microbial communities (McGrath et al., 2008). From a more pragmatic point of view, metatranscriptomics can be useful for describing the network of activities taking place in an ecosystem in order to obtain, for example, a specific metabolite.

Several improvements and developments are still required in order to exploit this approach more fully. One important aspect for future studies in metatranscriptomics is to define the rates of environmental RNA turnover (Kuechenmeister et al., 2009). This will allow us to fine-tune and correct metatranscriptomic observations, and to assess possible correlations with microbial diversity, composition and functions, as well as with the environmental conditions present. Efficient coupling of metatranscriptomics with other techniques used in environmental microbiology will also become more prevalent. These will include other ‘omic’ approaches, high-throughput sequencing and microarrays, where metatranscriptomics can provide a more efficient way of informing microarray probe design to match an ecosystem’s particular genomic and transcriptional content (Parro et al., 2007; Small et al., 2001; Urich et al., 2008). Metatranscriptomics will also be used in conjunction with complementary strategies such as stable isotope probing of nucleic acids, a technique that detects the DNA or RNA of the bacterial species metabolizing the substrate (Lueders et al., 2004). What will probably be very important, however, will be to increase the number of studies that follow the same community across temporal variations in order to obtain a more accurate picture of the expression dynamics involved.
The development of additional data mining tools to better interpret and integrate metatranscriptomics with data derived from complementary strategies should allow us to relate environmental factors to community performance. It should also improve our capacity to detect and predict the adaptation and evolution of microbial communities affected by natural or artificial pressures.

Metaproteomics
Metaproteomics has emerged over recent years as a powerful strategy that can contribute significantly towards our understanding of ecosystem functioning in microbial ecology (Wilmes et al., 2008). It is evident that this ecological information cannot be obtained from the study of genes alone and that genomics is limited in terms of elucidating critical aspects of microbial interactions (Graves and Haystead, 2002). In fact, an important difference with respect to genomic studies is that proteomics can reflect the dynamics of a system and capture changes driven by shifts in environmental conditions (Hagenstein and Sewald, 2006). The fact that proteins, not genes, are directly responsible for the phenotype of a cell makes proteomics an excellent tool for approaching functionality and revealing changes in protein synthesis and folding that result from rapid physiological responses (Lacerda et al., 2007). These protein expression profiles reflect specific microbial activities in a given ecosystem and can be more informative than the identification of either the functional genes present or even their corresponding messenger RNAs (Benndorf et al., 2007; Wilmes

and Bond, 2006). Proteomics is also useful because it can identify functional genes of importance within a community and can verify metabolic processes inferred from metagenomic data. In addition, the generation of de novo peptide sequences confers specificity in the identification of proteins and of their phylogenetic origin (Wilmes and Bond, 2006). While the rapid progress in technologies for both protein separation and identification, such as chromatography and mass spectrometry, has triggered exciting developments in the field, metaproteomics will surely gain more momentum with the advent and incorporation of additional tools and strategies for exploring microbial communities. The term proteomics, which was first used in 1995, can be defined as the large-scale study of the complete proteome, or the complete protein complement, expressed by a genome under different conditions (Graves and Haystead, 2002). The proteome represents the array of proteins that are expressed in a biological compartment (cell, tissue) at a particular time under a particular set of conditions (Beranova-Giorgianni, 2003). Because proteins are key structural and functional molecules, molecular characterization of proteomes is important for a complete understanding of biological systems. Therefore, proteomic studies, which involve different disciplines such as molecular biology, biochemistry and bioinformatics, can provide a more integrated view of a biological system by detecting modifications of its entire protein fraction. Although proteomics has been used extensively to study microorganisms in pure culture, information derived from these protein profiles may not necessarily reflect processes occurring in complex microbial communities found in natural settings (Wilmes and Bond, 2006). Moreover, the focus of research in microbial ecology goes beyond the individual species to study whole assemblages and ecosystems.
In this respect, the metaproteomic approach goes further than single microorganisms to encompass the spectrum of proteins present in a microbial community, giving a glimpse of its functional potential. Information generated using this strategy therefore complements environmental genomic databases and contributes to our understanding of natural ecosystems. A metaproteomic analysis includes several technically challenging steps, beginning with the extraction of microbial proteins from the surrounding matrix and ending with their identification (Maron et al., 2007b). The protein fraction in any ecosystem involves secreted and cellular proteins, some of which can be attached to the cell wall or embedded in membranes (integral proteins). The choice of the protein extraction technique is crucial owing to the complexity of native microbial communities, the heterogeneity of natural environments, and the presence of interfering compounds that can affect the efficiency of extraction (Ogunseitan, 2006). Since the extraction technique can influence recovery, it is often useful to define this step on the basis of the protein fraction being targeted and of the subsequent method of protein analysis (Hecker, 2003). There are many protocols for this purpose, including differential centrifugation, resolving soluble proteins in separate gels, and employing reagents with stronger solubilisation power for pellets enriched with membrane proteins (Molloy et al., 2000). The most commonly used technique in proteomics to separate and resolve

complex protein mixtures is polyacrylamide gel electrophoresis (PAGE) either in one (1-DE) or two dimensions (2-DE). 2-DE first uses isoelectric focusing (IEF) in immobilized pH gradients, followed by separation based on molecular weight using sodium dodecyl sulphate polyacrylamide gel electrophoresis (SDS-PAGE) in the second dimension. Despite being widely used for separation of proteins, 2-DE is time consuming and labour intensive and is limited in its capacity to resolve all the proteins in complex samples or environments (Graves and Haystead, 2002). In addition, PAGE separation can lead to an under-representation of very large or very small proteins, as well as of integral membrane proteins, and may fail to detect low-abundance proteins. To bypass the limitations of protein electrophoresis, alternative ways of separating proteins have been developed, one of which involves high performance liquid chromatography (HPLC) (Graves and Haystead, 2002). Once proteins have been separated, spots resolved in 2-D gels are digested with a protease, usually trypsin, and subjected to analysis using mass spectrometry for protein identification (Domon and Aebersold, 2006). The peptides must be ionized for mass spectrometry, and this is usually achieved by either matrix-assisted laser desorption/ionization (MALDI) or electrospray ionization (ESI). Newer ionization methods include desorption electrospray ionization (DESI) and the recently developed surface-assisted laser desorption/ionization (SALDI) method, which uses a non-volatile inorganic matrix of germanium on a silicon surface (Seino et al., 2007).
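The digestion and ionization steps described above can be sketched in silico. The example below is a simplified illustration rather than a reproduction of any cited workflow: it assumes the common rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P), ignores missed cleavages and modifications, and uses a hypothetical protein sequence. The function names are invented for this sketch.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free N- and C-termini)
PROTON = 1.00728   # mass added per positive charge


def trypsin_digest(protein: str) -> list[str]:
    """Split after K or R, except when the following residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mz(peptide: str, charge: int) -> float:
    """Mass-to-charge ratio of the peptide carrying `charge` protons."""
    return (peptide_mass(peptide) + charge * PROTON) / charge


protein = "MKTAYIAKQRPLEDK"  # hypothetical sequence, for illustration only
for pep in trypsin_digest(protein):
    print(f"{pep:10s} M={peptide_mass(pep):9.4f}  "
          f"[M+H]+={mz(pep, 1):9.4f}  [M+2H]2+={mz(pep, 2):9.4f}")
```

The charge state is left as a parameter because the same peptide appears at different m/z values depending on how many protons it carries after ionization.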
Ionization is followed by mass analysis in a mass spectrometer using different analysers, such as the commonly used quadrupole mass analysers, time-of-flight (TOF) instruments, ion trap mass analysers that trap molecular ions in a 3-D electric field, and tandem mass spectrometry (MS/MS), which can be used to acquire sequence information. There are several different mass analysers, and the choice of equipment will be defined by several criteria. Triple-quadrupole mass spectrometers, for example, are most commonly used to obtain amino acid sequences, while quadrupole-TOF (qTOF) is used for amino acid sequencing and determination of modifications. MALDI-TOF is usually used for peptide mass fingerprinting, MALDI-QqTOF allows both peptide mass fingerprinting and amino acid sequencing, and FT-ICR (Fourier transform ion cyclotron resonance) is useful because it can achieve higher resolution and accuracy (Graham et al., 2007; Graves and Haystead, 2002). Detection has been improved thanks to developments such as MS/MS and TOF/TOF instrumentation with optimized laser quality or direct analysis in real time (DART) (Lasaosa, 2008). The information generated by MS regarding peptide mass or sequence is then compared against published nucleotide or protein databases in order to predict and identify proteins (Wilmes and Bond, 2006). This identification therefore depends on the information available and relies heavily on bioinformatics tools for comparison and identification of homologues in databases. The growing number of reports on the characterization of microbial ecosystems in recent years is indicative of the great potential behind the metaproteomic approach. In-depth analyses of metaproteomic expression profiles are fundamental to our understanding of microbial interactions and of the role played by certain microorganisms in global nutrient cycles (Schweder et al., 2008). The first studies on

metaproteomics were carried out in microbial habitats with limited microbial diversity, but nowadays the range of habitats studied has increased to include complex microbial communities. To date, metaproteomic analyses have been conducted on microbial communities found in soils, activated sludge, wastewaters, acid mine drainage biofilms, marine ecosystems and even the human gastrointestinal tract (Kan et al., 2005; Klaassens et al., 2007; Schulze et al., 2005; Sowell et al., 2009; Tyson et al., 2004; Wilmes et al., 2008). In a pioneering study aimed at identifying proteins in dissolved organic matter (DOM) from complex environments such as lake waters, water extracted from soils and soil particles, Schulze et al. showed that, despite the limitations of the approach at the time, specific taxonomic groups could be identified, that proteomic composition varied depending on the ecosystem, and that the strategy could be useful for assessing the functionality of an ecosystem (Schulze et al., 2005). More recently, protein fingerprinting has been used to study natural communities and evaluate the correlation between community structure and ecosystem function. In one study, protein fingerprints generated by standard SDS-PAGE and ribosomal DNA fingerprints were used to analyse indigenous microbial communities in freshwater samples. Results showed that variations in the genetic and functional structure were complex and varied depending on the perturbations imposed on the community (Maron et al., 2007a). More recent work using the same strategy to analyse bacterial communities inoculated into sterile soils differing in their physicochemical properties showed a correlation between the functional structure of the community, as assessed by protein fingerprinting, and the physicochemical characteristics of the soil (Maron et al., 2008).
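Fingerprint comparisons of this kind need a numerical measure of how different two banding patterns are. The text does not say which measure the cited studies used, so the sketch below simply assumes the Bray-Curtis dissimilarity, a general-purpose index from community ecology, applied to hypothetical band intensities from SDS-PAGE lanes.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two band-intensity profiles.

    Returns 0.0 for identical profiles and 1.0 for profiles that share
    no bands. Both profiles must list intensities for the same band
    positions, in the same order.
    """
    if len(a) != len(b):
        raise ValueError("profiles must cover the same band positions")
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0  # two empty lanes are treated as identical
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Hypothetical band intensities at four gel positions (arbitrary units).
control = [10, 0, 5, 5]
perturbed = [5, 5, 5, 5]

print(bray_curtis(control, control))    # identical lanes
print(bray_curtis(control, perturbed))  # lanes after a perturbation
```

The same function could be applied to ribosomal DNA fingerprints, which would allow functional and genetic dissimilarities between samples to be set side by side.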
Challenges and Future Perspectives
It can be generally argued that the analysis of proteins through metaproteomics provides extremely useful functional information regarding microbial communities, more so than metagenomics or even metatranscriptomics (Stenuit et al., 2008). Despite its evident appeal and the great methodological and technical advances in terms of extracting and analysing proteins directly from environmental samples, the approach is still hampered by several limitations. Some of the inherent limitations of the approach include low protein extraction yields, difficulty in identifying peptides through database searches owing to reduced coverage of known protein sequences, and ambiguity in interpreting data in the absence of any corresponding metagenomic information. As a consequence of the diversity of protein function and structure, there is no single universal extraction method available. Addressing this will require both adjustments to established procedures and improvements in the efficiency of protein extraction, especially from highly contaminated samples. Other major challenges involve protein separation and identification techniques (Maron et al., 2007b) and bioinformatic capacity for analysis and management of the large volumes of data generated (Nesatyy and Suter, 2007; Wilke et al., 2003). Thus, improvements in sample preparation, MS techniques, and data capture and analysis will have to be paralleled by advances in bioinformatics tools designed for both organizing and processing proteomics and metaproteomics data (Yang and Zhang, 2008). Another major problem with metaproteomic studies is that assignment of peptide masses determined by MS relies

on known peptide sequences in databases. Despite the increasing number of available microbial peptide sequences, most of the proteins derived from environmental microorganisms still lack reference sequences in databases (Schweder et al., 2008). Thus, the limited number of organisms represented in the protein and gene sequence databases constrains the efficient application of cutting-edge high-throughput proteomics to environmental samples (Nesatyy and Suter, 2007). In addition, the high genetic variation in natural populations, as well as environmental changes that affect the organisms’ responses, could hamper the interpretation of protein expression levels from environmental samples. Another critical aspect of the approach is the reproducibility of the results. The difficulty associated with efforts at reducing the sources of variability has been made evident by the discrepancy in results obtained in different laboratories involved in the analysis of the same protein mixture (Tao, 2008). One additional and also very important challenge in the field will always be that of testing and validating the functional information obtained. In spite of its many limitations, metaproteomics still provides a powerful tool to study the functional diversity of environmental microbial communities. With the capacity to sample the total protein pool of a given natural population, the metaproteomics strategy provides a unique opportunity to obtain functional information regarding natural communities and link this information to population structure. The identification of peptide sequences, based on information from sequenced microorganisms and metagenomes, will improve in the years to come, offering more precise identification of specific enzymes and putative functions and helping our understanding of adaptations and responses to changing conditions. It can be anticipated that environmental proteomics will prove extremely useful on several fronts.
For example, the identification of conserved proteins could serve as markers for specific habitats. Proteins that change upon environmental perturbation could be used as indicators of stress on natural populations and ecosystems (Maron et al., 2007b). In addition to the identification of protein biomarkers, metaproteomics can also be very useful in the field of ecotoxicology by detecting minor changes in the proteome or metaproteome and quantifying the effects of stressors on natural populations, communities, and ecosystems (Nesatyy and Suter, 2007). Environmental proteomics can also lead to the identification of known or novel biochemical functions involved in complex biogeochemical processes and can help to address the role played by the succession of populations within an ecosystem. As techniques and databases become more robust, the likelihood of assigning phylogenetic affiliation and possible catalytic function to proteins from complex environments will increase (Rodriguez-Valera, 2004). Finally, metaproteomics can complement other meta-approaches in addressing fundamental questions in microbial ecology, such as the relationship between community structure and function and how these communities contribute to ecosystem dynamics and stability.
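The dependence on reference databases discussed in this section can be made concrete with a toy peptide mass fingerprinting search. This is a deliberately simplified sketch: the protein names, the precomputed theoretical peptide masses and the fixed mass tolerance are all hypothetical, and real search engines use far more elaborate scoring.

```python
# Hypothetical reference database: protein name -> theoretical monoisotopic
# masses (Da) of its tryptic peptides. Values are illustrative only.
DATABASE = {
    "protein_x": [277.146, 665.375, 884.472],
    "protein_y": [557.317, 720.322],
}


def count_matches(observed, theoretical, tol_da=0.01):
    """Number of observed masses within tol_da of any theoretical mass."""
    return sum(
        any(abs(m - t) <= tol_da for t in theoretical) for m in observed
    )


def best_hit(observed, database, tol_da=0.01):
    """Return (protein name, matched peptide count) for the top-scoring
    entry, or (None, 0) when no observed mass matches anything."""
    scores = {
        name: count_matches(observed, masses, tol_da)
        for name, masses in database.items()
    }
    name = max(scores, key=scores.get)
    return (name, scores[name]) if scores[name] > 0 else (None, 0)


# Masses measured from a protein that is represented in the database.
print(best_hit([277.148, 665.373], DATABASE))

# Mass measured from an organism with no reference sequence: however good
# the measurement, the search cannot return an identification.
print(best_hit([1094.497], DATABASE))
```

The second query shows the limitation in miniature: an accurately measured peptide from an unsequenced organism remains unidentified because nothing in the database corresponds to it.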

Metabolomics: New Expanding Area of Microbial Research
Metabolomics, which has been defined as the study of global metabolite profiles in a biological system under a given set of conditions, is one of the recent technologies introduced in the systems biology approach (Goodacre et al., 2004). This rapidly expanding area of scientific research faces many technological challenges in its aim

to encompass one of the outermost levels of the information flux, which displays greater complexity than the genome, the transcriptome or the proteome. While genomics and proteomics study macromolecular building blocks (DNA and proteins, respectively), metabolomics deals with structurally and physicochemically diverse small-molecule metabolites (typically
