
Chapter 26

Adaptation to Nutrient Availability in Marine Microorganisms by Gene Gain and Loss

Adam C. Martiny, Ying Huang, and Weizhong Li

Handbook of Molecular Microbial Ecology, Volume II: Metagenomics in Different Habitats, First Edition. Edited by Frans J. de Bruijn. © 2011 Wiley-Blackwell. Published 2011 by John Wiley & Sons, Inc.

26.1 INTRODUCTION

An important regulator of both primary productivity and respiration is nutrient availability [e.g., Sanudo-Wilhelmy et al., 2001; Bonnet et al., 2008; Mills et al., 2008; Wambeke et al., 2008]. The concentrations and ratios of nitrogen, phosphorus, and iron in the surface ocean vary predictably among oceanic regions. For example, phosphate is often present at very low concentration in the North Atlantic compared to the North Pacific Subtropical Gyre, whereas the reverse is true for nitrate [Wu et al., 2000]. Similarly, iron is generally low in the Eastern Equatorial Pacific and Southern Ocean compared to most other areas of the ocean [Coale et al., 1996]. Despite this variation in nutrient availability, both photosynthetic and heterotrophic bacteria like Prochlorococcus, Synechococcus, and the SAR11 group are often detected in high abundance across ocean regions [Partensky et al., 1999; Morris et al., 2002; Rusch et al., 2007]. Thus, a fundamental question is whether and how specific bacterial lineages adapt to spatial variation in nutrient availability.

It is becoming increasingly clear that adaptation to local nutrient conditions is often mediated by the gain and loss of entire gene cassettes containing the functionality responsible for nutrient acquisition. For example, a comparative analysis of Prochlorococcus genomes showed large variation in the presence of phosphate uptake genes [Martiny et al., 2006]. Furthermore, Prochlorococcus cells proliferating in low-P environments like the Sargasso Sea contain many P acquisition genes, whereas many of these genes are absent in cells from high-P areas [Martiny et al., 2009a]. This includes genes related to phosphate uptake, regulation, and utilization of organic phosphates. Similarly, a marine Synechococcus strain isolated from a coastal high-P environment lacked several P uptake genes compared to one isolated from the open ocean [Palenik et al., 2006]. Rusch and colleagues also observed that several phosphate uptake genes associated with SAR11 were more abundant in the Sargasso and Caribbean Sea than in the Eastern Pacific Ocean and attributed this to P availability [Rusch et al., 2007]. Furthermore, Prochlorococcus cells predominantly found in low-nitrogen regions contained genes necessary for nitrate assimilation, whereas these genes were less common in cells from elevated nitrate regions [Martiny et al., 2009b].

Here, we describe in detail some bioinformatic tools for using metagenomic libraries to elucidate how genome content changes within specific lineages in different ocean regions. We further test whether these variations are related to changes in nutrient availability in the ocean environment. This is done by first estimating the mean occurrence of core genes within a lineage and then testing whether a particular gene is significantly underrepresented in a given sample. This approach was first used to show a link between phosphate availability in the ocean and the presence of phosphate acquisition genes in Prochlorococcus [Martiny et al., 2009a], but here we further test whether the content of P acquisition genes in other abundant ocean bacteria (SAR11 and Synechococcus) follows the same pattern. Considering the genome-wide extent of gene gain and loss in Prochlorococcus, Synechococcus, and potentially SAR11, this may be a general process by which bacteria adapt to environmental variations in the ocean [Kettler et al., 2007; Rusch et al., 2007; Dufresne et al., 2008]. So although we focus on phosphate acquisition genes here, we want to emphasize that this approach is generally applicable and can be used for any type of adaptation involving gene gain and loss.

26.2 METHODS

To determine the relationship between nutrient availability and uptake genes in bacteria, we analyzed the metagenomes from 79 samples from the Global Ocean Sampling (GOS) expedition [Rusch et al., 2007]. Most of these originated from three oceanic regions: the North Atlantic, the Eastern Pacific, and the Indian Ocean. The approach is based on first estimating the mean occurrence of core genes within a lineage and then testing whether a particular gene is significantly underrepresented in a given sample.

The abundance of genes associated with a phylogenetic clade (e.g., Prochlorococcus, Synechococcus, or SAR11) was determined in two steps (Fig. 26.1) as described by Martiny et al. [2009a]. We first find all metagenomic reads associated with the clade and then map the reads as orthologs to one query genome. The two-step approach has two motivations. First, it uses the genetic information from all members of a clade to capture as many metagenomic reads as possible and assign them correctly to a taxon. Second, it allows all reads to be mapped to a single genome. This second step circumvents problems with proteins that have a better match to genomes outside the clade than to the one input genome (used in the second part of the search); the alternative, assigning orthologs within a group, can add uncertainty.

In the first step, we blasted all protein sequences associated with our input clade against all GOS samples (TBLASTN, e-value ≤ 1e-6, and minimum length of 25 letters). The translated GOS sequences of these hits were next compared to a reference database of all sequenced genomes (e.g., GenBank) using the fast CD-HIT-2D program (accurate mode) to determine whether a read has a reciprocal best match to the input genomes [Li and Godzik, 2006]. For sequences with no match using CD-HIT-2D, we also applied BLASTX to find top matches at e-value ≤ 1e-10. This first reciprocal blast step provides a reference dataset containing all GOS metagenomic reads associated with a particular clade.
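
The cutoffs used in the first search can be applied to BLAST output in a few lines. The sketch below is only an illustration, not the authors' code: it assumes the hits were written in the standard 12-column tabular BLAST format, and it interprets "minimum length of 25 letters" as the alignment length column. The function names `filter_hits` and `reads_for_clade` are hypothetical.

```python
import csv
import io

# Standard 12-column tabular BLAST layout (assumed here, not stated in the chapter).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(handle, max_evalue=1e-6, min_length=25):
    """Yield hits that pass the e-value and alignment-length cutoffs."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(COLUMNS, row))
        hit["length"] = int(hit["length"])
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["evalue"] <= max_evalue and hit["length"] >= min_length:
            yield hit


def reads_for_clade(handle, **cutoffs):
    """Return the set of metagenomic read IDs (TBLASTN subjects) that pass."""
    return {hit["sseqid"] for hit in filter_hits(handle, **cutoffs)}


# Hypothetical three-line example: only the first hit passes both cutoffs.
example = io.StringIO(
    "protA\tread1\t80.0\t30\t6\t0\t1\t30\t1\t90\t1e-10\t60.0\n"
    "protA\tread2\t50.0\t20\t10\t0\t1\t20\t1\t60\t1e-8\t40.0\n"
    "protB\tread3\t40.0\t40\t24\t0\t1\t40\t1\t120\t0.001\t30.0\n"
)
print(sorted(reads_for_clade(example)))
```

The reads that survive this filter would still need the reciprocal comparison against the reference database before being assigned to the clade.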

In the second step, we assigned each hit to a specific protein from one genome in the clade by a reciprocal blast approach (i.e., as orthologs). This could be a genome where additional information about gene function is available (e.g., microarray data or gene knockouts). A GOS hit is then considered an ortholog if (1) it is assigned to the input clade (e.g., Prochlorococcus) and (2) the reciprocal hit matches the original query protein when only the query genome is searched. We only assign one hit per GOS read for each query protein to accommodate errors such as frame shifting of the GOS reads.

To calculate the relative abundance of each gene, we divided the number of hits matching this gene by the mean number of hits to single-copy core genes. This number represents the fraction of cells in a population that carries a particular gene. Core genes can be determined with a traditional reciprocal blast approach, but there are several software programs available for ortholog assignment. We use OrthoMCL [Li et al., 2003]. All hits are normalized for gene length, because longer genes will recruit more blast hits. However, we have noted that the relationship between gene length and abundance is not linear and that better algorithms for this normalization step are needed (Fig. 26.2A). This may be partly attributed to the BLAST algorithm, because small fragments may be less likely to be detected. In addition, the number of metagenomic reads matching core genes (after normalization for length) is clearly not normally distributed, so a basic average would overestimate the mean (Fig. 26.2B). Thus, we use Matlab (Mathworks, MA) to provide a maximum likelihood estimate for the mean abundance and variance of core genes for each sample by fitting the number of hits to a gamma distribution. This procedure can be done with the Matlab GUI dfittool.
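
A maximum likelihood gamma fit of this kind can also be sketched outside Matlab. The following is a minimal illustration and not the authors' procedure: it assumes every length-normalized core-gene count is strictly positive, solves the usual gamma score equation for the shape parameter by Newton iteration, and uses hand-written series approximations for the digamma and trigamma functions. The names `fit_gamma`, `digamma`, and `trigamma`, as well as the example counts, are hypothetical.

```python
import math


def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))


def trigamma(x):
    """Trigamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + 1.0 / x + 0.5 * inv2
            + (1.0 / x) * inv2 * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 / 42)))


def fit_gamma(values, tol=1e-10, max_iter=100):
    """Return (shape, scale) maximizing the gamma likelihood of `values`."""
    xs = list(values)
    if not xs or min(xs) <= 0:
        raise ValueError("all values must be strictly positive")
    mean = sum(xs) / len(xs)
    s = math.log(mean) - sum(math.log(v) for v in xs) / len(xs)
    if s <= 0:
        raise ValueError("values show no spread; shape is not identifiable")
    # Closed-form starting point, then Newton steps on ln(k) - digamma(k) = s.
    k = (3.0 - s + math.sqrt((s - 3.0) ** 2 + 24.0 * s)) / (12.0 * s)
    for _ in range(max_iter):
        f = math.log(k) - digamma(k) - s
        f_prime = 1.0 / k - trigamma(k)
        k_new = k - f / f_prime
        if k_new <= 0:
            k_new = k / 2.0  # keep the shape positive
        converged = abs(k_new - k) <= tol * k
        k = k_new
        if converged:
            break
    return k, mean / k


# Hypothetical length-normalized hit counts for core genes in one sample.
core_hits = [8.0, 12.5, 10.1, 15.3, 9.7, 20.2, 11.4, 13.8]
shape, scale = fit_gamma(core_hits)
print("shape:", shape, "scale:", scale)
print("fitted mean:", shape * scale, "fitted variance:", shape * scale ** 2)
```

The fitted shape and scale give the mean (shape × scale) and variance (shape × scale²) against which individual genes can be compared.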
As Figure 26.2B shows, a gamma distribution is a better match to the probability density function of core blast hits than a normal distribution. The difference between a gamma and a normal distribution is more pronounced for samples with low coverage. It should be noted that the abundance of highly variable genes is probably underestimated due to difficulties with detecting them with BLAST. Thus, a bias due to sequence divergence is probably introduced.

Unfortunately, many metagenomic samples, including those from GOS, do not have much associated metadata (see also Chapter 40, Vol. I for a discussion of this topic). Thus, to compare the relative gene occurrence in the GOS samples to local nutrient concentrations, we retrieved monthly average values for phosphate and nitrate for each location and depth from the World Ocean Atlas [WOA, Boyer et al., 2006]. This approach may introduce additional variance in the analysis of correspondence between genome content and nutrient availability, since WOA values do not represent the exact nutrient concentration at the time of sampling. An advantage is that WOA values represent an average for a site, whereas an individual sample could contain an unusual concentration of phosphate or nitrate and not represent the environment to which the microbial community is adapted. One can also use satellite data to retrieve values for surface temperature, chlorophyll, and light intensity.

[Figure 26.1 flowchart, steps in order]
(A)
Input genomes from clade.
TBLASTN of all genes against metagenome (e-value = 1e-6).
Clustering of metagenome hits with CD-HIT (accurate mode).
Search of representative translated metagenome sequences (one from each CD-HIT cluster) against reference database of all sequenced genomes in GenBank using CD-HIT-2D. Sequences with no match using CD-HIT-2D were analyzed with BLASTX against GenBank reference database (e-value = 1e-10).
Database of metagenome sequences associated with input clade.
(B)
TBLASTN of genes from ONE genome against metagenome sequences associated with input clade (e-value = 1e-6).
BLASTP of translated metagenome hit against query genome (ortholog assigned if best hit is same input gene).
Ortholog abundance in metagenome of each gene from query genome.

Figure 26.1 Bioinformatic approach to estimate gene ortholog abundance in metagenome samples associated with specific microbial lineages. (A) Estimation of sequences in metagenomes associated with a specified lineage with several genomes available (e.g., Prochlorococcus, Synechococcus, or SAR11). (B) Estimation of ortholog abundance in metagenomic samples of each gene from one query genome.
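
The ortholog rule in part (B) of the pipeline (a read counts for a query protein only if the read belongs to the clade and its best reciprocal hit in the query genome is that same protein) can be written compactly. This is a hedged sketch rather than the published implementation: the tuple layouts for the forward and reverse hits, the use of bit score to pick the best reverse hit, and the function names are all assumptions made for illustration.

```python
def best_hit_per_read(reverse_hits):
    """Map each read to its top-scoring protein in the query genome.

    reverse_hits: iterable of (read_id, protein_id, bitscore) tuples, e.g.
    parsed from a BLASTP search of translated reads against the query genome.
    """
    best = {}
    for read_id, protein_id, bitscore in reverse_hits:
        if read_id not in best or bitscore > best[read_id][1]:
            best[read_id] = (protein_id, bitscore)
    return {read_id: protein for read_id, (protein, _) in best.items()}


def assign_orthologs(forward_hits, reverse_hits, clade_reads):
    """Return {protein_id: set of read IDs accepted as orthologs}.

    forward_hits: iterable of (protein_id, read_id) from the TBLASTN search.
    clade_reads: set of read IDs already assigned to the input clade.
    Using a set per protein keeps at most one hit per read for each protein.
    """
    best = best_hit_per_read(reverse_hits)
    orthologs = {}
    for protein_id, read_id in forward_hits:
        if read_id in clade_reads and best.get(read_id) == protein_id:
            orthologs.setdefault(protein_id, set()).add(read_id)
    return orthologs


# Hypothetical toy input: r2 maps back to a different protein, r4 is not in the clade.
forward = [("pstS", "r1"), ("pstS", "r1"), ("pstS", "r2"), ("phoB", "r3"), ("phoB", "r4")]
reverse = [("r1", "pstS", 100.0), ("r1", "pstA", 50.0), ("r2", "pstA", 80.0),
           ("r2", "pstS", 70.0), ("r3", "phoB", 60.0), ("r4", "phoB", 90.0)]
counts = {gene: len(reads)
          for gene, reads in assign_orthologs(forward, reverse, {"r1", "r2", "r3"}).items()}
print(counts)
```

The per-gene read counts produced this way are the raw values that are subsequently normalized for gene length and divided by the core-gene mean.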

[Figure 26.2: panel (A) plots gene abundance against gene length (bp); panel (B) plots density against blast hits (length normalized), with a gamma distribution fit and a normal distribution fit.]

Figure 26.2 Normalization of gene abundance. (A) Relationship between gene length and gene abundance (as determined by BLAST) in a metagenome sample (i.e., HTCC7211 single-copy core genes present in sample GS02). The red line describes a linear regression with b = 0. The green line is the best linear fit. (B) Probability density function for length-normalized core genes in a sample. The gray boxes represent the actual abundance of HTCC7211 core genes in sample GS02. The red line shows the maximum likelihood estimate for a gamma distribution, and the blue line shows the normal distribution estimate.

26.3 RESULTS AND DISCUSSION

Similar to some Prochlorococcus strains, Pelagibacter ubique HTCC7211 has a large genomic island that contains many genes involved in phosphate acquisition (Fig. 26.3). This genomic island is absent in two other strains (HTCC1002 and HTCC1062) belonging to SAR11. Genes in this island include those for the uptake of ortho-phosphate (e.g., the transporter pstABCS and the two-component regulator phoBRU) and for the utilization of organic phosphates (alkaline phosphatase and the phosphonate utilization proteins phnCDEFGHIJLMX). Thus, we examined the distribution of genes from this genomic island in different ocean regions. In open ocean samples from the Sargasso and Caribbean Sea, we observed most of these genes, although the phosphonate utilization genes were detected in low abundance. In coastal samples (e.g., from the Gulf of Maine or Galapagos Islands), all these genes are absent from the population. Open ocean samples from the Pacific and Indian Ocean contain genes responsible for ortho-phosphate uptake (i.e., pstABCS and phoBRU), whereas genes associated with organic phosphate utilization appear to be absent. Thus, it appears that SAR11 cells from the open ocean in the Sargasso and Caribbean Sea contain all the phosphate acquisition genes, some P genes are present in open ocean samples from the Pacific and Indian Ocean, and almost all genes are absent in coastal samples.

Recently, Tetu et al. [2009] examined genes expressed under P limitation in Synechococcus WH8102. This study identified genes involved in P acquisition, and therefore we used this genome as input to examine the gain and loss of P acquisition genes in marine Synechococcus in different ocean regions (Fig. 26.4). As with Prochlorococcus, the genes responding to P limitation include proteins involved in acquisition of both ortho-phosphate and organic phosphates. We observed that Synechococcus cells from the Sargasso Sea contain most genes relevant to P assimilation. In contrast, some of these genes are absent in coastal samples.

[Figure 26.3: strains shown are P. ubique HTCC7211, P. ubique HTCC1062, and P. ubique HTCC1002. Panel (B) plots relative abundance (gene abundance/mean core gene abundance, axis 0.0 to 2.0) per gene for samples from the Gulf of Mexico (GS17), Gulf of Maine (GS02), Sargasso Sea (GS00d), Indian Ocean (GS117), Rangirora Atoll (GS51), and Galapagos Islands (GS31).]

Figure 26.3 Occurrence of phosphate acquisition genes in SAR11. (A) Genes located in proximity of phoB in strain HTCC7211. It should be noted that some P uptake genes in HTCC1062 are located in a different genomic region. (B) Relative abundance of Pelagibacter ubique HTCC7211 genes surrounding phoB in different ocean samples. Abundance of individual genes is determined by reciprocal best BLAST hit and normalized against length. The frequency is calculated as the length-normalized occurrence of a specific gene divided by the length-normalized mean occurrence of single-copy core HTCC7211 genes at each site. The shaded box represents the abundance of 95% of the single-copy core genes (fitted to a gamma distribution). Yellow represents core genes, gray represents genes of unknown function, blue represents genes related to P uptake, and red represents genes related to phosphonate utilization.

[Figure 26.4: Synechococcus WH8102. Stacked panels, one per ocean sample (the first labeled Sargasso Sea (GS000a)), each plotting relative abundance (gene abundance/mean core gene abundance, axis 0.0 to 1.6) per gene.]

Figure 26.4 Occurrence of phosphate acquisition genes in marine Synechococcus. (A) Genes annotated as P acquisition genes and upregulated during P stress in Synechococcus WH8102 [Tetu et al., 2009]. (B) Relative abundance of Synechococcus P acquisition genes in different ocean samples. See text and Figure 26.3 for experimental details. Gene numbers refer to SYNWXXXX.
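
The relative abundance plotted in Figures 26.3 and 26.4, and the band covering 95% of core genes, can be approximated with the Python standard library. The sketch below is illustrative only: it assumes the gamma shape and scale for the core genes are already known, treats the band as the central 95% of the fitted distribution (the chapter does not say whether the interval is central), and approximates the quantiles by simulation instead of an exact inverse distribution function. All function names and example numbers are hypothetical.

```python
import random


def relative_abundance(gene_hits, gene_length_bp, core_mean_per_bp):
    """Length-normalized hits for one gene divided by the core-gene mean."""
    return (gene_hits / gene_length_bp) / core_mean_per_bp


def core_band(shape, scale, coverage=0.95, n_draws=20000, seed=0):
    """Approximate the central `coverage` range of the fitted gamma
    distribution, expressed relative to its mean (Monte Carlo estimate)."""
    rng = random.Random(seed)
    draws = sorted(rng.gammavariate(shape, scale) for _ in range(n_draws))
    tail = (1.0 - coverage) / 2.0
    low = draws[int(tail * n_draws)]
    high = draws[int((1.0 - tail) * n_draws) - 1]
    mean = shape * scale
    return low / mean, high / mean


def is_underrepresented(rel_abundance, band):
    """True if a gene falls below the lower edge of the core-gene band."""
    return rel_abundance < band[0]


# Hypothetical numbers: 5 hits to a 1000 bp gene, core mean of 0.02 hits per bp.
band = core_band(shape=4.0, scale=0.25)
rel = relative_abundance(gene_hits=5, gene_length_bp=1000, core_mean_per_bp=0.02)
print(band, rel, is_underrepresented(rel, band))
```

A gene whose relative abundance falls below the lower edge of the band would be a candidate for being absent from part of the population in that sample.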

In a previous study of Prochlorococcus, we observed that the presence of P acquisition genes was directly linked to phosphate concentration in the ocean. To test whether this correlation also holds for SAR11 and Synechococcus, we examined the relationship between P concentration and average occurrence of P acquisition genes (Fig. 26.5). Rather than a linear relationship, it appears that SAR11 cells from samples containing less than a threshold of 100 nM phosphate had many P genes (Fig. 26.5A). In contrast, these genes are largely absent from cells from regions with elevated P. In Synechococcus, there also appears to be a relationship between P concentration and genomic content (Fig. 26.5B). Generally, cells from low-P regions contain more genes associated with P acquisition than those from high-P regions. However, for this analysis, Synechococcus is only present in sufficient numbers in a few GOS metagenomic samples, making it difficult to identify a clear pattern.

[Figure 26.5: mean P acquisition gene occurrence (0 to 1) plotted against phosphate concentration (µM, 0 to 1) for (A) Pelagibacter (SAR11) and (B) Synechococcus.]

Figure 26.5 Relationship between phosphate concentration and average occurrence of P acquisition genes from SAR11 (A) and Synechococcus (B) in GOS samples. The average occurrence of P genes is the logarithmic mean of relative abundance of P acquisition genes. Phosphate concentration values for each site are retrieved from the World Ocean Atlas [Boyer et al., 2006].

A limitation of our bioinformatics pipeline is that it can only detect the presence or absence of genes found in already sequenced genomes. However, there are likely important genomic variants among uncultured lineages within Prochlorococcus, Synechococcus, and SAR11. As an example of this, it was recently observed that some uncultured Prochlorococcus lineages harbored a gene cluster associated with nitrate assimilation, but no cultured representatives carried these genes [Martiny et al., 2009b]. Thus, this adaptation would not be detected with our described approach.

The presented bioinformatics approach is very useful for detecting adaptations involving genes that are readily lost or gained and therefore not associated with a particular phylogenetic clade. This includes phosphate, nitrogen, and iron acquisition genes. Many other genes appear to be gained and lost during the evolution of marine bacteria [Kettler et al., 2007; Rusch et al., 2007; Dufresne et al., 2008], and some of these may also display a spatial or temporal distribution linked to environmental variations that can be detected with this approach. On the other hand, it has become clear that adaptation to variations in light or temperature commonly involves changes in protein sequences and regulatory systems and not gene gain and loss [Kettler et al., 2007]. These types of adaptation are often associated with specific phylogenetic clades and can instead be studied through the biogeography of phylotypes.

In summary, we observe that the genome content of Prochlorococcus, Synechococcus, and SAR11 populations is related to phosphate availability in different ocean regions. In low-P regions, all three lineages contain many genes for P acquisition, including regulation and organic P utilization. Some or most of these genes are gone in high-P environments. Strikingly, it appears that Prochlorococcus and SAR11 share a threshold of ≈100 nM, whereby cells below this threshold have many P genes. This may be related to a similar cell size and thereby physical constraints of nutrient uptake (e.g., surface-to-volume ratio). In contrast, there seems to be more of a linear relationship between genome content and P concentration in Synechococcus populations. Overall, our data suggest that gene gain and loss is important for genetic adaptation to nutrient availability in the ocean across several abundant microbial lineages.
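
The comparison behind Figure 26.5 and the ≈100 nM threshold can be illustrated with a short sketch. Several choices here are assumptions and not taken from the chapter: the "logarithmic mean" is interpreted as a geometric mean (the mean of log-transformed values, transformed back), relative abundances below a small hypothetical floor are raised to that floor so the logarithm is defined for absent genes, and the sample values are invented purely to show the mechanics.

```python
import math


def log_mean(values, floor=0.01):
    """Geometric mean of relative abundances.

    `floor` is a hypothetical lower bound applied so that genes with zero
    abundance do not make the logarithm undefined.
    """
    logs = [math.log(max(v, floor)) for v in values]
    return math.exp(sum(logs) / len(logs))


def summarize_by_threshold(samples, threshold_um=0.1):
    """Average P gene occurrence for samples below and above a phosphate threshold.

    samples: iterable of (phosphate in µM, list of P gene relative abundances).
    threshold_um: 0.1 µM corresponds to the 100 nM threshold in the text.
    """
    groups = {"below": [], "above": []}
    for phosphate_um, gene_abundances in samples:
        key = "below" if phosphate_um < threshold_um else "above"
        groups[key].append(log_mean(gene_abundances))
    return {key: (sum(vals) / len(vals) if vals else None)
            for key, vals in groups.items()}


# Invented toy samples, not data from the chapter.
toy_samples = [
    (0.02, [0.9, 1.1, 0.8]),
    (0.05, [1.0, 0.7, 1.2]),
    (0.40, [0.05, 0.0, 0.1]),
    (0.80, [0.0, 0.02, 0.0]),
]
print(summarize_by_threshold(toy_samples))
```

With real data, the phosphate values would come from the World Ocean Atlas and the relative abundances from the normalized ortholog counts for each sample.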

INTERNET RESOURCES

CD-HIT (http://weizhong-lab.ucsd.edu/cd-hit/)
OrthoMCL (http://www.orthomcl.org)
World Ocean Atlas (http://www.nodc.noaa.gov/OC5/WOA05/woa05data.html)
CAMERA (http://camera.calit2.net/)

Acknowledgments

The authors would like to thank Doug Rusch from JCVI, Jennifer B.H. Martiny from UCI, and Ben Temperton from Plymouth Marine Laboratory for many helpful comments. The work was supported by awards OCE0928544 from the National Science Foundation (ACM), 1R01AI075317 from NIH (YH), and the Gordon and Betty Moore Foundation (WL).

REFERENCES

Bonnet S, Guieu C, Bruyant F, Prasil O, Van Wambeke F, et al. 2008. Nutrient limitation of primary productivity in the Southeast Pacific (BIOSOPE cruise). Biogeosciences 5:215–225.
Boyer TP, Antonov JI, Garcia HE, Johnson DR, Locarnini RA, et al. 2006. World Ocean Database 2005. Washington, DC: U.S. Government Printing Office.
Coale KH, Fitzwater SE, Gordon RM, Johnson KS, Barber RT. 1996. Control of community growth and export production by upwelled iron in the equatorial Pacific Ocean. Nature 379:621–624.
Dufresne A, Ostrowski M, Scanlan DJ, Garczarek L, Mazard S, et al. 2008. Unraveling the genomic mosaic of a ubiquitous genus of marine cyanobacteria. Genome Biol. 9:R90.
Kettler GC, Martiny AC, Huang K, Zucker J, Coleman ML, et al. 2007. Patterns and Implications of Gene Gain and Loss in the Evolution of Prochlorococcus. PLoS Genet. 3:e231.
Li L, Stoeckert CJ, Roos DS. 2003. OrthoMCL: Identification of Ortholog Groups for Eukaryotic Genomes. Genome Res. 13:2178–2189.
Li W, Godzik A. 2006. Cd-hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22:1658–1659.
Martiny AC, Coleman ML, Chisholm SW. 2006. Phosphate acquisition genes in Prochlorococcus ecotypes: Evidence for genome-wide adaptation. Proc. Natl. Acad. Sci. USA 103:12552–12557.
Martiny AC, Huang Y, Li WZ. 2009a. Occurrence of phosphate acquisition genes in Prochlorococcus cells from different ocean regions. Environ. Microbiol. 11:1340–1347.
Martiny AC, Kathuria S, Berube PM. 2009b. Widespread metabolic potential for nitrite and nitrate assimilation among Prochlorococcus ecotypes. Proc. Natl. Acad. Sci. USA 106:10787–10792.
Mills MM, Moore CM, Langlois R, Milne A, Achterberg E, et al. 2008. Nitrogen and phosphorus co-limitation of bacterial productivity and growth in the oligotrophic subtropical North Atlantic. Limnol. Oceanogr. 53:824–834.
Morris RM, Rappe MS, Connon SA, Vergin KL, Siebold WA, et al. 2002. SAR11 clade dominates ocean surface bacterioplankton communities. Nature 420:806–810.
Palenik B, Ren QH, Dupont CL, Myers GS, Heidelberg JF, et al. 2006. Genome sequence of Synechococcus CC9311: Insights into adaptation to a coastal environment. Proc. Natl. Acad. Sci. USA 103:13555–13559.
Partensky F, Hess WR, Vaulot D. 1999. Prochlorococcus, a marine photosynthetic prokaryote of global significance. Microbiol. Mol. Biol. Rev. 63:106–127.
Rusch DB, Halpern AL, Sutton G, Heidelberg KB, Williamson S, et al. 2007. The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific. PLoS Biol. 5:e77.
Sanudo-Wilhelmy SA, Kustka AB, Gobler CJ, Hutchins DA, Yang M, et al. 2001. Phosphorus limitation of nitrogen fixation by Trichodesmium in the central Atlantic Ocean. Nature 411:66–69.
Tetu SG, Brahamsha B, Johnson DA, Tai V, Phillippy K, et al. 2009. Microarray analysis of phosphate regulation in the marine cyanobacterium Synechococcus sp WH8102. ISME J. 3:835–849.
Wambeke F, Bonnet S, Moutin T, Raimbault P, Alarcon G, Guieu C. 2008. Factors limiting heterotrophic bacterial production in the southern Pacific Ocean. Biogeosciences 5:833–845.
Wu J, Sunda W, Boyle EA, Karl DM. 2000. Phosphate Depletion in the Western North Atlantic Ocean. Science 289:759–762.