Biotechnology Journal

DOI 10.1002/biot.200800201

Biotechnol. J. 2009, 4, 480–494

Review

Metagenomics: Concept, methodology, ecological inference and recent advances

Jagtar Singh1, Arvind Behal2, Neha Singla1, Amit Joshi3, Niti Birbian1, Sukhdeep Singh1, Vandana Bali2 and Navneet Batra2

1 Department of Biotechnology, Panjab University, Chandigarh, India
2 Department of Biotechnology, GGDSD College Sector-32, Chandigarh, India
3 Department of Biotechnology, SGGS College Sector-26, Chandigarh, India

Microorganisms constitute two thirds of the Earth’s biological diversity. As many as 99% of the microorganisms present in certain environments cannot be cultured by standard techniques. Culture-independent methods are required to understand the genetic diversity, population structure and ecological roles of the majority of organisms. Metagenomics is the genomic analysis of microorganisms by direct extraction and cloning of DNA from their natural environment. Protocols have been developed to capture unexplored microbial diversity and to overcome the existing barriers in estimation of diversity. New screening methods have been designed to select specific functional genes within metagenomic libraries to detect novel biocatalysts as well as bioactive molecules applicable to mankind. To study complete genes or operon clusters, various vectors including cosmids, fosmids and bacterial artificial chromosomes are being developed. Bioinformatics tools and databases have added much to the study of microbial diversity. This review describes the various methodologies and tools developed to understand the biology of uncultured microbes including bacteria, archaea and viruses through metagenomic analysis.

Received 12 September 2008 Revised 11 February 2009 Accepted 16 February 2009

Keywords: Genomic library · Metagenomics · Metaproteomics · Microbial community · rRNA

Correspondence: Dr. Navneet Batra, Department of Biotechnology, GGDSD College Sector 32, Chandigarh 160 030, India
E-mail: [email protected]
Fax: +91-172-2613656

Abbreviations: BAC, bacterial artificial chromosome; DD, differential display

© 2009 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim

1 Introduction

The first life forms, small microorganisms, have been found in fossils that are about 3.5 billion years old. At present, the total number of prokaryotic cells on earth has been estimated at 4 × 10^30–6 × 10^30 [1], comprising between 10^6 and 10^8 separate genospecies (distinct taxonomic groups based on gene sequence analysis) [2]. This diversity presents an enormous but largely unexplored genetic and biological pool and can be exploited for the recovery of novel genes, entire metabolic pathways and their products [3]. These microorganisms can be accessed primarily by a classical approach, involving culturing the microorganism by preparing a solid or liquid growth medium containing appropriate carbon, energy and electron acceptor sources depending on the physiological conditions under which the organism is to be isolated. However, general routine conditions provided in the laboratory tend to impose selective pressure, thereby preventing the growth of a large number of microorganisms [4]; studies have shown that only 1–15% of microbial genomes are cultivable under laboratory conditions and more than 85% have never been studied [2]. Further, simple morphological and physiological traits of most microbes provide few identification clues [5]. This problem can be rectified by the use of phylogenetically directed isolation strategies. Microbial ecologists, with the help of large-insert cloning vectors, such as cosmids, fosmids or bacterial artificial chromosomes (BACs) [6–9], are able to give invaluable information on the uncultivable microbes by extracting high-molecular-weight DNA from samples and preparing metagenomic libraries. Clones can be sequenced by using shotgun or chromosome-walking methods and comparatively analyzed.

Sequencing of ribosomal RNAs (rRNA) and the genes encoding them initiated a new era of microbial ecology to describe uncultured bacteria in the environment. As rRNA genes provide evolutionary chronometers [6, 10], Pace et al. [5] described the microbial diversity in Yellowstone’s most extreme thermal environment. Stahl et al. [11] and Lane et al. [12] used direct analysis of 16S rRNA gene sequences to describe the diversity of microorganisms in an environmental sample without culturing. 16S rRNA work alone has revealed the presence of more than 13 000 new prokaryotes. Similarly, 31 unique 16S rRNA sequences were detected in the Octopus Spring mat [13]. Culture-independent surveys have shown that there are at least 40 well-resolved major bacterial divisions, of which about 30 have no, or only very few, cultured representatives so far [14]. As the 23S rRNA gene offers additional diagnostic sequence stretches due to its greater length and its characteristic insertions and deletions, it may be a better molecular marker for phylogenetic resolution [15]. The early studies were technically challenging, relying on direct sequencing of RNA or sequencing of reverse transcription-generated DNA copies. The next technical breakthrough arrived with the development of PCR technology and the design of primers that can be used to amplify almost the entire gene. Thus, genomic analysis of a population of microorganisms, metagenomics, has emerged as a powerful tool to gain access to the physiology and genetics of uncultured organisms.
The word ‘metagenomics’ was coined [16] to capture the notion of analysis of a collection of similar but not identical items, as in a meta-analysis, which is an analysis of analyses [17]. The concept was used by Schmidt et al. [18] for the construction of a λ phage library from a seawater sample, which was screened for 16S rRNA genes [13]. This was followed by direct isolation of functional genes from mixed liquor of thermophilic microorganisms enriched on dried grasses in the laboratory [19]. The DeLong group obtained a 40-kb clone from sea water that contained a 16S rRNA gene affiliated to archaea, which had never been cultured, providing a landmark study [20].


www.biotechnology-journal.com

2 Methodologies for exploring the unculturable world

2.1 Isolation of DNA from environmental samples

Several protocols for extraction of DNA from soil and aquatic sources have been developed for constructing metagenomic libraries [21–26], aiming at high recovery, efficiency and suitability for molecular analysis [27]. Various physical methods such as freeze-thawing, bead-mill homogenization and ultrasonication have been used for cell lysis, thereby allowing all groups of microorganisms to be lysed in equal proportions. Mechanical bead beating has been shown to recover more diversity than chemical treatment [28]; however, these physical treatments can fragment the DNA to sizes of 5–10 kb or even less [29–31], and increase the risk of chimera formation from small template DNA during subsequent PCR [32]. Chemical methods include the use of SDS [33], which, besides being gentle, provides the highest DNA yields in comparison to freeze-thawing and sarkosyl-based lysis protocols [23]. Further, SDS treatment of sediment leads to more cell lysis than bead milling [34]. Combinations of physical and chemical methods suited to different soil types and microbial diversity have been reported [22]; however, polyphenolic compounds may be co-purified with the DNA and may interfere with enzymatic modification of the isolated DNA [35]. In addition, co-extracted humic substances such as humic and fulvic acids present in soil interfere with DNA detection and measurement, and inhibit various enzymes, including DNA polymerase in PCR [35, 36] and restriction enzymes [37] in digestion processes. An effect of these substances on transformation efficiency [38] and DNA hybridization specificity [39] has also been observed [22]. These limitations have led many laboratories to isolate DNA from the metagenome of a microbial community after pre-cultivation.
Although laboratory enrichment cultures result in isolation of a limited biodiversity, this technique is highly efficient for the rapid isolation of large DNA fragments and for cloning of operons and novel genes with high biotechnological value [40–42] (see Table 1). This also leads to an increased frequency of gene detection and isolation from reselected microorganisms of known desired traits [19, 42, 53]. Tsai and Olson [35] developed a rapid method of removing humic substances from sediments for PCR, which can be a tool for detecting low numbers of bacterial cells in environmental samples.


Table 1. Techniques for the enrichment of genomic DNA

Technique | Principle | References
Differential expression analysis (DEA) | Transcriptional differences in gene expression and detection of target genes | [27]
5-Bromo-2-deoxyuridine (BrdU) labeling | Isolation based on BrdU-labeled DNA in metabolically active cells | [43, 44]
Stable isotope probing (SIP) | Labeling of microbial DNA/RNA with stable isotopes followed by separation of labeled/unlabeled DNA by ultracentrifugation | [45, 46]
Suppressive subtractive hybridization (SSH) | Subtractive hybridization to select DNA fragments unique to each DNA sample and identification of genetic differences between microorganisms | [47, 48]
Microarray method | Arrays of DNA that are spatially arranged and high-throughput robotic screening for targeting multiple gene products | [46, 49]
Culture enrichment technique | Growth of target microorganisms based on nutritional, physical or chemical criteria | [50]
Phage display expression system | DNA sequence isolation by affinity selection of the surface-displayed protein in phage-display expression libraries | [51]
Multiple displacement amplification (MDA) | Application of Φ29 DNA polymerase and random exonuclease-resistant primers to amplify the entire genome | [52]

To protect against the shearing of DNA and loss of diversity during in situ lysis of bacteria in soil, an indirect method involving separation of the cells from the soil or sediment matrix followed by lysis of bacteria has been used in samples from oceanic bacterioplankton [54, 55]. For molecular studies of ancient DNA in the deeper sediment layers [56–58] and its possible role in natural transformation processes, Corinaldesi et al. [59] developed a protocol for isolation of the extracellular and intracellular DNA in aquatic sediments, thus removing the chance of contamination by intracellular DNA during the lysis process. Methods for the direct isolation of DNA targeting diverse soil types have been developed, but their efficiency in capturing DNA from all representatives of a complex community is still suspect [22], as the bacterial diversity did not differ much when these methods were compared with DNA isolation from cells separated from the soil matrix [60]. It has been demonstrated that, in marine sediments, a single DNA extraction method can significantly underestimate the total number of bacterial ribotypes present. Large quantities of water have to be filtered to obtain sufficient amounts of DNA for cloning purposes. This hinders the construction of large-insert DNA libraries. Rare organisms often contribute a relatively low proportion of the total DNA and the genome population might be overshadowed by a limited number of dominant organisms. This could lead to a selective bias in downstream manipulations such as PCR [61]. To increase our ability to detect the actual bacterial diversity in environmental samples, strategies to integrate different DNA extraction procedures are being developed along with the application of two or more different genetic markers. Another problem is overestimation of DNA due to persistence of DNA in the environment after cell death, leading to errors in determining the presence of living organisms. This problem can be corrected by various diagnostic methods based on RNA instead of DNA [62–66]. The drawbacks associated with these methods are rapid turnover [67, 68], risk of contamination by DNA and/or RNase, and physiological status-dependent expression. Addition of ethidium monoazide (EMA), a DNA-intercalating dye, to a mature biofilm before DNA isolation generated 16S rRNA gene community fingerprints that were qualitatively different from those of untreated samples [69], providing a promising tool to selectively favor the analysis of the viable portion.
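The point that rare organisms contribute only a small share of the total DNA can be made concrete with a small calculation. The sketch below is illustrative only: it assumes that every clone is an independent random draw from the extracted DNA pool and that extraction and cloning are unbiased, which the discussion above suggests is rarely true in practice. The function names `prob_at_least_one_clone` and `clones_needed` are hypothetical and do not come from the review.

```python
import math


def prob_at_least_one_clone(fraction: float, n_clones: int) -> float:
    """Probability that at least one of n_clones carries DNA from a taxon.

    `fraction` is the share of total community DNA contributed by the taxon.
    Assumption (not from the review): clones are independent, unbiased draws.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if n_clones < 0:
        raise ValueError("n_clones must be non-negative")
    return 1.0 - (1.0 - fraction) ** n_clones


def clones_needed(fraction: float, confidence: float = 0.95) -> int:
    """Smallest number of clones giving the requested chance of sampling
    the taxon at least once, under the same independence assumption."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fraction))


if __name__ == "__main__":
    # Compare a dominant taxon with progressively rarer ones.
    for share in (0.5, 0.01, 0.001):
        print(f"DNA share {share}: {clones_needed(share)} clones for 95% chance")
```

Under these simplifying assumptions the required number of clones grows roughly in inverse proportion to the taxon's share of the DNA, which is one way to see why dominant organisms can overshadow rare ones in a library of fixed size.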

2.2 Enrichment strategies for metagenomic clones

Several strategies have been developed to increase the number of desired clones in a genomic library. Microbial communities can be extensively screened for specific metabolic or biodegradative capabilities by a gene-specific PCR method. This has been used for assessing the biodegradative potential of indigenous microbial populations by testing for the presence of catechol 2,3-dioxygenase and phenol hydroxylase genes [70, 71]. Other reported examples include the identification of denitrifying bacteria [72] and polyhydroxyalkanoate-producing bacteria [73]. The major limitation of the method is that primer design depends on existing sequence information and thus favors known sequence types. Single-gene-family-specific sets of PCR primers are not able to detect functionally similar genes resulting from convergent evolution. Additional steps are needed to access full-length genes, as only a fragment of a structural gene will typically be amplified by gene-specific PCR. PCR-based strategies such as TAIL-PCR [74], panhandle PCR [75], cassette PCR [76], adaptor ligation PCR [77] and pre-amplification inverse PCR [78] can be employed. Although laborious and time consuming, these methods have been used successfully for the recovery of novel gene variants such as that of 2,5-diketo-D-gluconic acid reductase [79], and of catechol 2,3-dioxygenase genes from genomic DNA obtained from a phenol- and crude oil-degrading bacterial consortium [76]. For the discovery of bacterial genes, another alternative is the differential display (DD) technique. Using randomly primed reverse transcriptase PCR (RT–PCR), genes from environmental samples are analyzed by the direct retrieval and analysis of microbial transcripts without the constraint of targeting a specific organism, phylogenetic group or metabolic pathway [80–86]. Although recovery of mRNA is associated with certain technical difficulties, profiling functional microbial communities using RNA might be more effective than using DNA, as the former is a more sensitive biomarker: it provides wider genomic access, including structural genes from lower eukaryotes as well as from prokaryotes, and has the ability to select for functional genes in response to alterations in physiological conditions [87]. Using high-density sampling DD, Brzostowicz and co-workers [88] were able to identify an operon responsible for degradation of 2,4-dinitrophenol as well as cyclohexanone oxidation genes in a mixed microbial community.
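The primer-dependence limitation of gene-specific PCR noted above can be illustrated with a toy in silico check. The following is a minimal sketch, not a protocol from the review: it assumes perfect annealing with no mismatches tolerated, expands degenerate primers with the standard IUPAC nucleotide codes, and uses made-up sequences. The helper names (`primer_to_regex`, `reverse_complement`, `amplicon`) are hypothetical.

```python
import re

# Standard IUPAC nucleotide ambiguity codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_to_regex(primer: str) -> "re.Pattern[str]":
    """Compile a (possibly degenerate) primer into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def amplicon(template: str, fwd: str, rev: str):
    """Return the template region bounded by both primers, or None.

    The forward primer is searched on the given strand; the reverse primer
    is searched on the reverse-complement strand and must lie downstream.
    Assumption: exact matches only, which real PCR does not require.
    """
    template = template.upper()
    n = len(template)
    rc = reverse_complement(template)
    for f in primer_to_regex(fwd).finditer(template):
        for r in primer_to_regex(rev).finditer(rc):
            rev_site_start = n - r.end()  # reverse-primer site on the given strand
            if rev_site_start >= f.end():
                return template[f.start(): n - r.start()]
    return None


if __name__ == "__main__":
    known = "TT" + "GACCATGG" + "AAAACCCGGG" + "TTTACGTC" + "CAA"
    divergent = known.replace("GACCATGG", "GAGCATGG")  # one changed base in the primer site
    print(amplicon(known, "GAYCATGG", "GACGTAAA"))
    print(amplicon(divergent, "GAYCATGG", "GACGTAAA"))
```

In this toy model a template whose primer-binding site differs from the known sequence types at a position the degenerate primer does not cover yields no product, mirroring how primer-based screening favors genes that resemble those already in the databases.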
The spread of antibiotic resistance genes in bacterial populations involves various elements such as integrons and gene cassettes. The key structural features of an integron include a gene cassette integration site (attI), an intI gene that encodes an integrase, and two promoters that control the expression of the integrase gene and the incorporated gene cassettes [89, 90]. The gene cassettes are genetic elements existing as free, circular/linear, non-replicating DNA molecules, consisting of one or more open reading frame(s) (ORFs) and associated chromosomal attachment sites (attC, 59 base elements). The integrase catalyzes the insertion of the gene cassette into the integration site controlled by the strong promoter via site-specific recombination using attI and attC as its substrates [91]. Integrons, therefore, act as a repository of ORFs coding for many gene products and potentially provide a source of novel genes. Primers designed to target the conserved regions within the 59 base elements have successfully been used to recover novel genes homologous to DNA glycosylase, phosphotransferase, methyl transferase and thiotransferase [92].

3 Metagenomic DNA libraries

Metagenomic libraries are a powerful tool for exploring the diversity of microbes in uncultured systems and form the basis of genomic studies to link the phylogenetic and functional relationships of microbes and their environment. The classical method of metagenomic library construction involves insertion of small sequences of less than 10 kb into a standard sequencing vector. However, traditional cloning vectors have been replaced by large-insert cloning vectors such as cosmids, fosmids or BACs, with insert sizes of approximately 40–200 kb, because small-insert libraries do not allow the detection of large gene clusters or operons, and a large number of clones would have to be screened [6, 93–98]. E. coli, which is widely employed in industrial fermentations, batch production, separation, and downstream processing, is commonly used as a host strain during screening of soil-derived metagenomic DNA for novel biocatalysts and small molecules [40, 41, 99–101]. However, E. coli has limited ability to express DNA from soil microorganisms and only a small number of positive clones are obtained during a single round of screening (