Podo_comparative Final annotated MS - DSpace@MIT

91 downloads 51156 Views 3MB Size Report
Jan 29, 2013 - ... and imported in Adobe. 512. Illustrator. .... the NSF Center for Microbial Oceanography Research and Education (C-MORE) (grant numbers.
1  

Genomes of marine cyanopodoviruses reveal multiple origins of diversity

2   3  

Labrie S.J.1, K. Frois-Moniz1, M.S. Osburne1, L. Kelly1, S.E. Roggensack1, M.B. Sullivan1,3, G.

4  

Gearin2, Q. Zeng2, M. Fitzgerald2, M.R. Henn2 and S.W. Chisholm1

5   6  

1

Department of Civil and Environmental Engineering, Massachusetts Institute of Technology,

7  

Cambridge, MA, USA.

8  

2

Broad Institute, Cambridge, MA, USA.

9  

3

Current Address: Ecology and Evolutionary Biology Department, University of Arizona,

10  

Tucson, AZ, USA

11   12  

Corresponding author: Sallie W. Chisholm

13  

MIT 48-419

14  

15 Vassar Street

15  

Cambridge, MA 02139

16  

Phone: (617) 253-1771

17  

Fax: (617) 324-0336

18  

Email: [email protected]

19   20  

Running title: Cyanopodovirus genomics

21  

   

1  

22  

Summary

23   24  

The marine cyanobacteria Prochlorococcus and Synechococcus are highly abundant in the global

25  

oceans, as are the cyanophage with which they co-evolve. While genomic analyses have been

26  

relatively extensive for cyanomyoviruses, only 3 cyanopodoviruses isolated on marine

27  

cyanobacteria have been sequenced. Here we present 9 new cyanopodovirus genomes, and

28  

analyze them in the context of the broader group. The genomes range from 42.2 to 47.7 kbp, with

29  

G+C contents consistent with those of their hosts. They share 12 core genes, and the pan-genome

30  

is not close to being fully sampled. The genomes contain 3 variable island regions, with the most

31  

hypervariable genes concentrated at one end of the genome. Concatenated core-gene phylogeny

32  

clusters all but one of the phage into three distinct groups (MPP-A and two discrete clades within

33  

MPP-B). The outlier, P-RSP2, has the smallest genome and lacks RNA polymerase, a hallmark of

34  

the Autographivirinae subfamily. The phage in groups MPP-B contain photosynthesis and carbon

35  

metabolism associated genes, while group MPP-A and the outlier P-RSP2 do not, suggesting

36  

different constraints on their lytic cycles. Four of the phage encode integrases and three have a

37  

host integration signature. Metagenomic analyses reveal that cyanopodoviruses may be more

38  

abundant in the oceans than previously thought.

39   40  

Introduction

41  

Viruses are abundant in the oceans, often outnumbering bacterioplankton by an order of

42  

magnitude (Bergh et al., 1989; Fuhrman, 1999; Wommack and Colwell, 2000; Weinbauer and

43  

Rassoulzadegan, 2004). Among marine bacteria, the cyanobacteria Prochlorococcus and

44  

Synechococcus are the numerically dominant oxygenic phototrophs (Waterbury et al., 1986;

45  

Partensky et al., 1999; Scanlan and West, 2002), and contribute significantly to global primary

46  

productivity and global biogeochemical cycles (Liu et al., 1997; Liu et al., 1998). They coexist

47  

with their specific viruses – cyanophage – which are believed to play a key role in maintaining

   

2  

48  

diversity by “killing the winner” (Waterbury and Valois, 1993; Suttle and Chan, 1994; Thingstad,

49  

2000). Moreover, cyanophage impact the evolution of their hosts by mediating horizontal gene

50  

transfer (Lindell et al., 2004; Zeidner et al., 2005; Sullivan et al., 2006; Yerrapragada et al.,

51  

2009).

52   53  

All cyanophage isolated thus far are Caudovirales – tailed, dsDNA viruses belonging to three

54  

families: Myoviridae, Podoviridae and Siphoviridae. Most of the cyanomyoviruses are similar to

55  

the archetypal coliphage T4, and have genome sizes ranging from 161 – 252 kb, (Sullivan et al.,

56  

2010). Cyanopodoviruses, with genome sizes ranging from 42 kb to 47 kb, are similar in gene

57  

content and genome organization to coliphage T7 (Chen and Lu, 2002; Sullivan et al., 2005; Pope

58  

et al., 2007). There are fewer examples of cyanosiphoviruses (Sullivan et al., 2009; Huang et al.,

59  

2011), which have genome sizes ranging from 30 kb to 108 kb and do not share common features

60  

with other bacteriophage (Huang et al., 2011). To date, 18 cyanomyovirus genomes (Sullivan et

61  

al., 2005; Weigele et al., 2007; Millard et al., 2009; Sullivan et al., 2010; Sabehi et al., 2012), 5

62  

cyanosiphovirus genomes (Sullivan et al., 2009; Huang et al., 2011), and 5 cyanopodovirus

63  

genomes have been published (Chen and Lu, 2002; Sullivan et al., 2005; Pope et al., 2007; Liu et

64  

al., 2007; Liu et al., 2008).

65   66  

A hallmark characteristic of the cyanomyoviruses and cyanopodoviruses is that they carry

67  

homologs to host genes (which we now refer to as phage/host shared genes (Kelly et al,

68  

submitted)) whose products are thought to increase phage fitness under certain conditions. A

69  

subclass of these genes, referred to as auxiliary metabolic genes (“AMG” (Breitbart et al., 2007)),

70  

encode proteins involved in host metabolic pathways such as the light reactions of photosynthesis

71  

(PsbA, PsbD, Hli, PsaA, B, C, D, E, K, J/F (Mann, 2003; Lindell et al., 2004; Lindell et al., 2005;

72  

Sullivan et al., 2006; Sharon et al., 2009; Béjà et al., 2012)), the pentose phosphate pathway (PPP

73  

(Sullivan et al., 2005; Thompson et al., 2011; Zeng and Chisholm, 2012)), phosphate acquisition

   

3  

74  

(Millard et al., 2004; Sullivan et al., 2005; Sullivan et al., 2010; Thompson et al., 2011; Zeng and

75  

Chisholm, 2012), nitrogen metabolism (Sullivan et al., 2010) and DNA synthesis (Sullivan et al.,

76  

2005), among others. It is thought that the phage carry these homologs to alleviate bottlenecks in

77  

these key pathways after host transcription of host homologs has stopped (Thompson et al.,

78  

2011).

79   80  

Several observations reveal very tight co-evolution of host and cyanophage genomes with regard

81  

to these phage/host shared genes. It has been demonstrated, for example, that phage AMGs are

82  

expressed simultaneously during infection (Lindell et al., 2007) regardless of their position in the

83  

genome, which is striking given the strict genome-order transcription normally associated with

84  

such (T7-like) phage. In the case of phage/host shared P-acquisition genes, it has been

85  

demonstrated that these genes are carried more frequently by phage in regions of the oceans

86  

where cells are P-stressed (Kelly et al., submitted), and expression of the phage version of a high-

87  

affinity PO4-transport protein is actually regulated by the host PhoRB two component regulatory

88  

system such that the phage gene is only upregulated when the phage is infecting a P-stressed host

89  

cell (Zeng and Chisholm, 2012).

90   91  

Phage also carry genes that in the host encode high-light inducible proteins (Hlis – also called

92  

small CAB-like proteins (Funk and Vermaas, 1999)) thought to protect the photosynthetic

93  

complex, or possibly to be involved in a more general stress response in the host (He et al., 2001).

94  

Photosynthesis-associated proteins (Hlis, PsbA and PsbD) found in cyanophage are related to

95  

their respective orthologous proteins found in cyanobacterial genomes, indicating that they are of

96  

cyanobacterial origin (Lindell et al., 2004; Sullivan et al., 2006). Interestingly, there are two types

97  

of hli genes found in cyanobacterial genomes, referred to as single- and multi-copy hlis (Bhaya et

98  

al., 2002). The single-copy hlis are part of the Prochlorococcus core genome while multi-copy

99  

hlis contribute to the flexible genome and are found in highly variable genomic islands (Coleman

   

4  

100  

et al., 2006). Cyanophage hlis are homologous to the multi-copy hlis, suggesting that cyanophage

101  

play a role in horizontal transfer of multi-copy hlis (Lindell et al., 2004).

102   103  

Of the 5 cyanopodoviruses for which complete genomes were available prior to this study, two

104  

are of marine origin: P-SSP7 and Syn5 from the Sargasso Sea (Sullivan et al., 2005; Pope et al.,

105  

2007), and one is of estuarine environment in Georgia (P60 -Chen and Lu, 2002). The two other

106  

isolates, Pf-WMP3 (Liu et al., 2008) and Pf-WMP4 (Liu et al., 2007), are derived from

107  

freshwater environment and were isolated on the filamentous cyanobacterium Leptolyngbya. The

108  

genome of P-SSP7 is organized in three classes, similar to coliphage T7 (Sullivan et al., 2005),

109  

the first involved in takeover of host enzymatic machinery, followed by DNA replication and

110  

transcription, and finally viral assembly and morphogenesis (Lindell et al., 2007). Interestingly,

111  

whereas P60, isolated from a coastal river (Chen and Lu, 2002), has a similar genetic architecture

112  

to the freshwater cyanophage, its genes have greater homology to marine cyanopodoviruses (See

113  

Note added in proofs).

114   115  

To expand our understanding of the diversity and evolution of cyanopodoviruses infecting marine

116  

cyanobacteria, and to provide more reference genomes for metagenomic analyses, we sequenced

117  

9 additional cyanopodovirus genomes (Table 1) isolated from diverse environments (Red Sea,

118  

Sargasso Sea, Gulf Stream, and Subtropical Pacific Gyre) on host strains belonging to four

119  

different ecotypes of Prochlorococcus (HL I, II and LL I, II), and analyzed them in the context of

120  

the entire collection.

121  

   

5  

122  

Results and discussion

123   124  

Cyanophage isolation and host range

125  

The cyanopodoviruses reported here were isolated over a period spanning more that a decade

126  

(1995-2006; Table 1). Diverse strains of Prochlorococcus, including representatives from both

127  

high-light and low-light clades, were used as hosts to isolate and maintain phage stocks (Table 1).

128  

In contrast to cyanomyoviruses, which can typically infect multiple bacterial strains (Sullivan et

129  

al., 2003), these cyanopodoviruses have narrow host ranges, infecting only one or two strains

130  

under laboratory conditions (Table 2).

131   132  

General features of cyanopodovirus genomes

133  

The general features of the cyanopodovirus genomes are shown in Table 1, and include 9

134  

genomes reported for the first time, along with 5 existing genomes that were used for comparative

135  

analyses. The genomes of cyanopodovirus P-SSP7 and Syn5 are known to be linear, with direct

136  

terminal repeats (Pope et al., 2007; Sabehi and Lindell, 2012), and we assume that the new

137  

genomes are linear as well. The marine cyanopodovirus genomes range from 42.2 kbp to 47.7

138  

kbp, and code for 48 to 68 putative open reading frames (ORFs). The majority of the putative

139  

genes are encoded on the same strand, but phage P-RSP2 and P60 that contain an inverted region

140  

of 1.5 kb and multiple genome rearrangements, respectively (ORF15-17P-RSP2 – Fig. 3) (See Note

141  

added in proofs). Phage isolated on Prochlorococcus have a G+C content of 34% to 40.5%, while

142  

those isolated on Synechococcus range from 53% to 55% (Table 1) reflecting the different G+C

143  

content of the two hosts and the selective pressure for the phage to adapt their codon usage to that

144  

of their hosts (Krakauer and Jansen, 2002; Limor-Waisberg et al., 2011). The ability of

145  

cyanomyoviruses to cross-infect both Prochlorococcus and Synechococcus, despite their different

146  

G+C content, is thought to be facilitated by the tRNAs encoded by this group of phage (Enav et

147  

al., 2012). Only two tRNAs were identified in the cyanopodoviruses, however – one partial tRNA

   

6  

148  

in P-SSP7 (Sullivan et al., 2005) and one glycine tRNA in P-RSP5. The latter does not

149  

correspond to a rare codon in its host genome or to a highly used codon in the P-RSP5 genome

150  

(data not shown), suggesting that the G+C content difference between the genomes of

151  

cyanopodoviruses that infect Synechococcus and Prochlorococcus is probably a significant

152  

barrier to cross-infectivity (Enav et al., 2012).

153   154  

DNA Polymerase Phylogeny and the Core and Pan Genomes

155   156  

As a foundation for the analyses that follow, we wanted to identify the core genes shared by a

157  

defined set of cyanopodoviruses, as well as their flexible gene set. Previous work on Podoviridae

158  

DNA polymerase diversity suggests that this gene could be an acceptable phylogenetic tracer for

159  

Podoviridae because it is conserved among different groups of phage and shows signs of vertical

160  

inheritance (Chen et al. 2009; Labonté et al. 2009). Thus we used the phylogeny of this gene to

161  

define sets of phage for the core and pan-genome analysis, and to guide our analysis of

162  

relatedness among the phage. We first cast a broad net, including 71 DNA polymerase genes

163  

from phage of different genera and families according to current International Committee on

164  

Taxonomy of Viruses (ICTV) classification (Fig. 1). All cyanopodoviruses fell into the same

165  

clade – designated the P60-like genus (Lavigne et al., 2008) – with the exception of two

166  

freshwater cyanopodoviruses (indicated by three blue dots in Fig 1, as DNA polymerase is

167  

encoded by two genes in one of the phage). The P60-like clade can be divided into three

168  

subclades, supported by bootstrap values greater than 95% which exclude an outlier – P-RSP2 .

169  

The first clade corresponds to the clade MPP-A (marine picocyanopodovirus A) established by

170  

Chen and colleagues (2009), while the other two fall within clade MPP-B and form two discrete

171  

clades (B1 and B2) (see the core genome phylogeny analysis section below – Figs 1 & 3).

172  

   

7  

173  

Using an analysis similar to that described in Tettelin et al (2005) and used in our analysis of

174  

cyanomyoviruses (Sullivan et al., 2010), we first defined a set of core genes using only the 10

175  

cyanopodoviruses isolated on Prochlorococcus (P-RSP2, P-HP1, P-SSP11, P-SSP10, P-GSP1, P-

176  

SSP2, P-SSP3, P-SSP7, P-RSP5 and P-SSP9 – Table 1). This core is composed of 19 genes (Fig.

177  

2A); adding Synechococcus-specific phage Syn5 to the analysis reduces this number to 17 (Fig.

178  

2B), and if Synechococcus phage P60 is added, the shared gene set drops to 12 (Table 3 – Fig.

179  

2C). The significant impact of adding P60 is perhaps not surprising given its estuarine habitat.

180  

P60’s genome also includes several frameshifts (see below) and incomplete proteins (Table 3)

181  

(See Note added in proofs). Finally, adding the two freshwater cyanopodoviruses to the analysis

182  

causes a precipitous drop to 3 core genes: primase/helicase, DNA polymerase, and terminase

183  

(Fig. 2D) – consistent with the divergence of these phage seen in the DNA polymerase tree (Fig.

184  

1).

185   186  

Of the 17 core genes shared by the 10 Prochlorococcus cyanopodoviruses and Syn5, 9 are

187  

involved in DNA metabolism and assembly of virions, 6 encode phage structural proteins (portal

188  

protein, MCP, tail tube proteins A and B, internal core protein, tail fiber), one encodes the

189  

terminase, and one codes for an hypothetical protein of unknown function (Table 3; Fig. 3, blue

190  

shading). The pan-genome of this set of cyanopodoviruses is composed of 241 clustered

191  

orthologous groups (COGs), and the cumulative curve of unique genes is nowhere near

192  

saturation, suggesting that vast diversity remains (Fig. 2). Each new genome contributed an

193  

average of 15 unique genes to the pan-genome, representing 22.0% to 31.6% of the genes in each

194  

genome. In a similar analysis of 16 cyanomyoviruses, each genome adds approximately 90 new

195  

genes, or 27.5% to 42.8% of their gene content (Sullivan et al., 2010). In both, the percentage is

196  

significantly higher than that observed for host strains, where each new sequenced genome added

197  

approximately 7.3% to 11.8% of their gene content to the pan-genome (Kettler et al., 2007).

198      

8  

199  

Genome organization

200  

With the exception of P60 (See Note added in proofs) and the two freshwater cyanophage (Pf-

201  

WMP3 and Pf-WMP4) gene order in these genomes is roughly consistent with their relatedness in

202  

the DNA polymerase tree and core genome analysis (Fig. 3). As in P-SSP7 (Sullivan et al., 2005),

203  

order is highly conserved, and strikingly similar to the distantly related prototype enterophage T7

204  

(Dunn et al., 1983), supporting the hypothesis that T7-like enterophage and cyanopodoviruses

205  

evolved from a common ancestor, diverging at the protein sequence level (Sullivan et al., 2005;

206  

Lavigne et al., 2008) while keeping a similar genome organization. The exception is P60, which

207  

has multiple inversions (Fig. 3), rendering its genome architecture more similar to the freshwater

208  

cyanopodoviruses Pf-WMP3 and Pf-WMP4 (Liu et al., 2007; Liu et al., 2008), while its protein

209  

sequences are more similar to those of marine cyanophage (Liu et al., 2007). That is, P60 evolved

210  

with the other marine cyanopodoviruses in terms of protein sequences, but underwent multiple

211  

genomic rearrangements altering the T7-like genome architecture (See Note added in proofs). We

212  

note again, that P60 was isolated from an estuarine environment – quite distinct from the open

213  

ocean habitat of the other marine phages.

214   215  

Similar to T7 (Molineux, 2006), P-SSP7 genes are grouped into three ordered classes of genes

216  

that are sequentially expressed over the course of infection – marked in red, green, and blue along

217  

the P-SSP7 genome in Fig. 3 (Lindell et al., 2007). Class I genes encode primarily small proteins,

218  

including MarR and gp0.7, thought to be involved in redirecting transcription from the host to

219  

the phage (Lindell et al., 2007). This region is highly variable and does not include core genes

220  

(see below). Class II includes genes from the RNA polymerase gene up to, but not including the

221  

major capsid protein (MCP) gene and is involved in transcription, DNA metabolism and

222  

replication, and code for phage scaffolding proteins and structural components. Class III consists

223  

of genes involved in phage assembly and DNA maturation (Molineux, 2006) and spans the rest of

224  

the genome (Lindell et al., 2007).

   

9  

225   226  

Since P60 was the first cyanopodovirus sequenced (Chen and Lu, 2002) we are upholding naming

227  

conventions for phage and referring to this as the “P60-like genus” (Lavigne et al., 2008), even

228  

though P60 is not a ‘typical’ phage in this group with respect to gene content and organization

229  

(See Note added in proofs).

230   231  

Phylogeny and classification based on core genomes

232  

To further examine the phylogenetic groupings established above, the amino acid sequences of

233  

the core genes shared by the marine cyanopodovirus genomes (Fig. 2C) were concatenated and

234  

aligned, and a maximum likelihood analysis was applied (Fig. 3, tree on the left). Three distinct

235  

subgroups (MPP-A, MPP-B1 and B2) emerged with a topology consistent with the DNA

236  

polymerase tree above (compare Fig. 1 and Fig. 3), with P-RSP2 as an outlier, but still belonging

237  

to the group. The two divergent freshwater cyanopodoviruses (Fig. 1) were excluded from this

238  

core phylogeny analysis since they are missing most of the core genes (Fig. 2D).

239   240  

Based on the sequence analysis of the concatenated core genomes (Fig. 3), and its congruence

241  

with the DNA polymerase tree (Fig. 1), the 12 marine cyanopodoviruses in Fig. 3 belong to the

242  

same genus – the P60-like genus of the subfamily of the Autographivirinae. Even though P-RSP2

243  

is divergent from the other members of the group, it clearly falls within this clade. Because P-

244  

RSP2 lacks an RNA polymerase gene, however, it would normally be excluded from the

245  

Autographivirinae subfamily – which currently includes even very distantly related Podoviridae

246  

(eg. T7 and phiKMV – Fig. 1, middle ring) – based on this single criterion. Although the presence

247  

of RNA polymerase has been considered a hallmark gene for assignment of a phage to the

248  

Autographivirinae, we argue that P-RSP2 should be included based on its similarities to other

249  

phage in the P60-like genus (Figs. 1 & 3).

250      

10  

251  

P-RSP2 – the outlier

252  

P-RSP2 shares the same genome organization as the other cyanopodoviruses (with the exception

253  

of an inverted region in the class III genes), and has the same set of core genes, but it is highly

254  

divergent (Figs. 1 & 3). In fact, only one of its core genes (DNA polymerase – Fig. 1) shares

255  

more than 60% amino acid identity with the other phage. That it is the only phage in the group

256  

that was isolated on Prochlorococcus strain MIT9302 raises the question of whether there is

257  

something unique about this phage/host relationship. As discussed above, P-RSP2 is also the only

258  

phage in this group that lacks an RNA polymerase gene, essential for inclusion in the

259  

Autographivirinae (Lavigne et al., 2008), which in the canonical podovirus coliphage T7 is

260  

required for efficient transcription of class II and class III phage genes (Summers and Szybalski,

261  

1968; Studier and Maizel, 1969; Studier, 1972).

262   263  

Since P- RSP2 does not encode its own RNA polymerase, it likely has evolved mechanisms to

264  

use host transcriptional machinery to transcribe class II-III genes, such as additional host-like

265  

promoters or modulation of host RNA polymerase with transcriptional regulators such as sigma

266  

factors (Sullivan et al., 2009; Pavlova et al., 2012). In T4, for example, middle and late gene

267  

expression is coordinated by two transcriptional activators (Brody et al., 1995), but a search for

268  

similar activators in P-RSP2 yielded nothing. The G+C content of cyanopodoviruses prohibits the

269  

use of computational approaches like those of Vogel et al. (2003) to search for host-like

270  

promoters, thus the mechanism by which P-RSP2 transcribes Class II and III genes remains a

271  

mystery.

272   273  

Comparative genomics

274  

The Class I gene set (Fig. 3 – red under the P-SSP7 genome), is composed of very short genes

275  

that are highly variable. The set is most conserved in the MPP-B1 group relative to MPP-B2 and

276  

MPP-A, and consists of a genetic module of 10-13 genes that code for putative proteins mostly of

   

11  

277  

unknown function (Fig. 3). Genes of interest include an integrase (in 4 genomes), and a protein

278  

similar to T7 gp0.7 (a transcriptional regulator involved in the takeover of the cellular metabolism

279  

by the phage (Molineux, 2006), found in 3 genomes). Three of the 4 genomes that have the

280  

integrase gene have a downstream integration signature sequence, suggestive of the potential for

281  

lysogeny (discussed in more detail below).

282   283  

Class II genes (Fig. 3 – green under the P-SSP7 genome) were among the most conserved (Table

284  

3) across all three MPP groups. In addition to core genes, Class II also includes genes encoding

285  

RNA polymerase (11/12 genomes), high light inducible proteins (Hli – 9/12 genomes),

286  

photosystem II D1 protein (PsbA – 8/12 genomes) and transaldolase (TalC – 8/12 genomes).

287  

These genes have orthologs in bacterial genomes (phage/host shared genes), and while

288  

photosynthesis-associated genes are thought to have been derived from the host, the origin of talC

289  

is not clear (Ignacio-Espinoza and Sullivan, 2012) (see discussion below). The genes hli, psbA

290  

and talC, only found in MPP-B1 and MPP-B2, are common in cyanophage (Lindell et al., 2004;

291  

Sullivan et al., 2005; Lindell et al., 2005; Sullivan et al., 2006; Chenard and Suttle, 2008; Sullivan

292  

et al., 2010; Thompson et al., 2011; Sabehi et al.,2012) and are thought to increase phage fitness

293  

during infection (Bragg and Chisholm, 2008; Thompson et al., 2011).

294   295  

Class III genes (Fig. 3 – blue under the P-SSP7 genome) mainly consist of genes coding for

296  

structural components of mature virions. This class contains a highly variable region that encodes

297  

host specificity determinants, including genes in the region downstream of the tail tube protein B

298  

(gp31P-SSP7) and through the tail fiber protein (gp36P-SSP7).

299   300  

P-SSP2 and P-SSP3: two co-isolated phage reveal a hypervariable genomic region

301  

Phage P-SSP2 and P-SSP3 were isolated on the same day, at the same station, from proximate

302  

depths (120m and 100m respectively), using Prochlorococcus MIT9312 as the host. Their

   

12  

303  

genomes share 95% overall nucleotide sequence identity, and most proteins are 100% identical

304  

(Fig. 4). They differ in only 7 genes (Table 5), each being either significantly divergent, or absent

305  

in one or the other. The Class I module in the two genomes includes 2 pairs of divergent genes:

306  

gp14P-SSP2/gp55P-SSP3 and gp18P-SSP2/gp52P-SSP3, whose gene products share 76% and 66% identity,

307  

respectively. Immediately adjacent to the latter pair, P-SSP2 encodes an additional orphan gene

308  

(gp17P-SSP2) (Fig. 4) that does not share similarity with proteins in public databases. A second

309  

divergent region is located at the C-terminus of the tail fiber(gp16P-SSP3 and gp57P-SSP2) (Fig. 4;

310  

Table 5) involved in host recognition,. The P-SSP3 tail fiber gene (gp16P-SSP3) is smaller than that

311  

of (gp57P-SSP2). Downstream of gp16P-SSP3 are two small genes - gp15P-SSP3 and gp14P-SSP3 – that are

312  

absent in the P-SSP2 genome. The former is an orphan while the latter shares 29% amino acid

313  

identity with genes gp40P-SSP7 (Figs. 1 & 3) – and 20% amino acid identity with gp28P-RSM4 in a

314  

cyanomyovirus isolated on Prochlorococcus MIT9303 (Sullivan et al., 2010). Genes gp40P-SSP7

315  

and gp14P-SSP3 are located in the same genomic region (Fig. 3).

316   317  

The N-terminal regions of all marine cyanopodoviruses tail fiber proteins are more conserved

318  

than the C-terminal regions (data not shown). The hypervariable C-terminal regions likely help

319  

phage adapt to host receptor diversity, and could either result from random

320  

mutation/recombination events or through an active mechanism. The latter has been reported in

321  

podoviruses that infect the pathogen Bordetella (Uhl and Miller, 1996), which encode a template-

322  

dependent, reverse transcriptase-mediated diversity generating mechanism (Liu et al., 2002; Liu

323  

et al., 2004; Doulatov et al., 2004), but we could find no evidence of this in our genomes. The

324  

counterpart of this phage hypervariable region in their hosts was studied by Avrani et al. (2011).

325  

They found that phage resistance in Prochlorococcus was acquired by accumulating mutations in

326  

hypervariable genomic islands coding for cell surface receptors, among others. Together, these

327  

recent findings beautifully illustrate the ongoing evolutionary arms race between phage and their

328  

hosts.

   

13  

329   330  

Phage/host shared genes, myo/podo shared genes, and genomic islands

331  

One of the most interesting features of some cyanophage is the set of genes they carry that appear

332  

to be of bacterial origin (Mann, 2003; Lindell et al., 2004; Millard et al., 2004; Sullivan et al.,

333  

2005; Lindell et al., 2005; Sullivan et al., 2006; Sullivan et al., 2010; Thompson et al., 2011;

334  

Zeng and Chisholm, 2012) – ‘phage/host shared genes’ (Kelly et al., submitted) – 3 of the most

335  

well studied examples being psbA, talC, and hlis. There are 66 genes in these cyanopodovirus

336  

genomes with orthologs in Prochlorococcus and Synechococcus (Proportal

337  

http://proportal.mit.edu/ - (Kelly et al., 2012)). They group into 12 COGs and are localized in

338  

three regions of the phage genomes (Fig. 5A - diamonds). The first includes genes involved in

339  

nucleotide metabolism that are found in all branches of the tree of life, and as such we don’t

340  

consider it an island. The second contains the psbA and hli genes, and the third includes talC,

341  

which is involved in host carbon metabolism, a nuclease-encoding gene, and a gene of unknown

342  

function – all genes likely acquired by horizontal gene transfer. These regions, which have some

343  

similarity to the genomic islands found in cyanomyoviruses (Millard et al., 2009), are referred to

344  

as Island II and III (Fig. 5A).

345   346  

Island II (Fig. 3, pink shading), surrounded by core genes, is composed of up to 6 genes,

347  

including psbA and hli and additional genes of unknown function (Table 4). Island II genes are

348  

not present in the Syn5, and P-RSP2 genomes, and P-SSP9 has only the hli gene (Figs. 3 and

349  

5A). The psbA and hli genes in this island have orthologs in cyanomyoviruses and hosts (Mann,

350  

2003; Lindell et al., 2004; Lindell et al., 2005; Sullivan et al., 2006), so we wondered whether the

351  

rest of the genes in this island did as well (Table 4). gp222_COG and gp30_COG, clusters of

352  

genes coding for hypothetical proteins, have orthologs in cyanomyoviruses but not in

353  

picocyanobacteria, while gp32_COG has orthologs only in host genomes (Table 4). While the

354  

synteny of Island II is not present in the hosts or cyanomyoviruses (data not shown), orthologous

   

14  

355  

genes in cyanomyovirus were often located within 15-20 genes of each other suggesting that

356  

Island II was likely acquired in small pieces via multiple gene gain events, or as a larger insert

357  

that underwent a series of deletions and reorganizations.

358   359  

Analysis of the phylogeny of the psbA and talC genes in this expanded set of phage genomes (Fig

360  

S1 and S2) generally confirms the conclusions of other reports (Lindell et al., 2004; Millard et al.,

361  

2004; Sullivan et al., 2006; Ignacio-Espinoza and Sullivan, 2012) that phage psbA was not

362  

recently acquired from picocyanobacteria (Fig. S1) and was likely acquired multiple times

363  

(Ignacio-Espinoza et al. 2012). But while the cyanomyovirus psbA genes are closely related to

364  

their specific hosts (Fig. S1), cyanopodovirus psbA genes form a clade distinct from those from

365  

both cyanomyoviruses and hosts (Fig. S1). Further, cyanopodovirus psbA genes appear more

366  

diverse than those of cyanomyoviruses, as indicated by the long branch lengths. As for talC, we

367  

confirm that the origin of phage talC is less clear, as it differs significantly from

368  

picocyanobacterial versions of this gene (Ignacio-Espinoza and Sullivan, 2012). In fact, phage

369  

talC genes are more related to organisms from different phyla (Gammaproteobacteria, Firmicute

370  

and Actinobacteria – Fig. S2). In contrast to psbA, cyanophage talC genes are highly conserved,

371  

form a monophyletic clade, and likely were only acquired once and then diverged (Ignacio-

372  

Espinoza and Sullivan, 2012).

373   374  

It is intriguing that if a genome has any of the three genes, psbA, hli or talC, it has them all - with

375  

the exception of P-SSP9 which has only one hli gene (Table 3). While Island II contains psbA

376  

and hli, and is in the middle of the genome, talC is at the extreme downstream end, making it

377  

unlikely that this set of genes could be simultaneously acquired or lost. Yet they are linked in the

378  

observed gene gain/loss pattern (Fig. 5A – green and turquoise diamonds in Island II, and red

379  

diamonds in Island III) and their co-expression, despite their separation in the genome, led

380  

Lindell et al. (2007) to argue that their physical separation might reflect “evolution in progress”

   

15  

381  

i.e. an initial step toward the co-localization of these co-transcribed genes (Molineux, 2006;

382  

Lindell et al., 2007). The fact that talC lies at the end of all of the cyanopodoviruses now in our

383  

collection, however, argues against this, and suggests that there is something significant about

384  

this positioning that still eludes us.

385   386  

We found 59 proteins (grouped into 16 COGs) shared only by cyanopodo and cyanomyoviruses –

387  

i.e. not present in hosts – and all are of unknown function (Table 6). The majority are in Islands II

388  

and III (Fig. 5A; Table 6) – also the location of all of the phage/host shared genes.

389   390  

The mechanisms underlying the genetic variability in islands in cyanopodoviruses are not clear.

391  

In small lambda-like siphoviruses, rapid evolution is facilitated by structural simplicity, a small

392  

set of core genes, and the exchange of compatible genetic modules (Botstein, 1980; Hendrix et

393  

al., 1999; Comeau et al., 2007). T4-like myoviruses, on the other hand, have a significantly

394  

larger, and syntenic, set of core genes, that are for the most part vertically inherited (Filée et al.,

395  

2006; Comeau et al., 2007; Ignacio-Espinoza and Sullivan, 2012). This core is involved in

396  

replication and assembly of the viruses, often requiring complex protein-protein interactions

397  

(Leiman et al., 2003), which reduces the probability of acquiring functional orthologs. Thus in

398  

T4-like phage, horizontal gene transfer events are concentrated in hypervariable islands (Comeau

399  

et al., 2007; Millard et al., 2009), while the optimal core genome is kept intact (Comeau et al.,

400  

2007). Cyanopodoviruses appear to use a strategy similar to T4-like phages, accessing the genetic

401  

diversity thought to be involved in adaptation to their host’s metabolism and ecological niche

402  

through genomic islands (Filée et al., 2006; Comeau et al., 2007), while conserving an optimal

403  

core genome.

404   405  

The flexible genome positioning reveals more islands

   

16  

406  

We explored whether the frequency of occurrence of a gene in this set of phage (Fig. 2) would be

407  

reflected in the position of that gene in a genome, hoping that this might ultimately yield insights

408  

into gene gain and loss mechanisms. We divided the flexible COGs into 3 groups for this

409  

analysis: i) hyperflexible genes (found in 1-3 genomes – Fig. 5B, red diamonds), ii) flexible genes

410  

(found in 4-6 genomes – Fig. 5B, green diamonds), and iii) conserved flexible genes (found in 7-

411  

10 genomes – Fig. 5B, blue diamonds). The hyperflexible genes are concentrated in the left

412  

extremity of the genomes, which we name Island I, while the flexible genes are more

413  

concentrated in Island II and the right arm of the genome (Island III). Finally, the core and the

414  

conserved flexible genes appear more distributed along the middle, and slightly in the right arm

415  

of the genomes.

416   417  

Assuming that these cyanopodoviruses reproduce similarly to T7 (Wolfson et al., 1972;

418  

Molineux, 2006), in which the genome replicates as linear concatemers that are cleaved before

419  

encapsidation, the propensity of hypervariable genes to be located in Island I could suggest that

420  

gene gain/loss events occur primarily at the extremities of the linear genomes. An alternative

421  

explanation is lysogeny, in which the temperate phage integrates into the host genome as a linear

422  

fragment, and the excision of the phage genome from host chromosome may be imprecise. Two

423  

published cyanopodovirus genomes (P-SSP7 (Sullivan et al., 2005) and Syn5 (Pope et al., 2007))

424  

and three reported here (P-SSP2, P-SSP3 and P-SSP9) encode a phage-like integrase gene.

425  

Furthermore, a 40-50 bp sequence with a perfect match to a cyanobacterial host sequence is found

426  

downstream – suggesting a possible host integration site (Sullivan et al., 2005).

427   428  

Despite indirect evidence for lysogeny in picocyanobacteria (McDaniel et al., 2002; Ortmann et

429  

al., 2002), none of the complete marine cyanobacterial genomes examined contains an intact

430  

prophage. This is perhaps not surprising as it is thought that lysogeny is favored when the

431  

environment is not optimal for growth of host cells, the opposite of optimally growing laboratory

   

17  

432  

cultures (Waterbury and Valois, 1993). Recently, however, a partial prophage sequence, highly

433  

similar to P-SSP7, was found in a genome fragment from a wild Prochlorococcus single-cell

434  

(Malmstrom et al., 2012).

435   436  

Biogeography of cyanopodoviruses

437  

To analyze the distribution of the cyanopodoviruses in the oceans and place it in the context of

438  

their hosts and other cyanophage, we recruited reads from marine metagenomic datasets using all

439  

the cyanophage genomes available (see methods) (Fig. 6-7). We first examined the relative

440  

number of metagenomic reads recruited by cyanosipho-, podo-, and myovirus genomes in the

441  

viral metagenome samples from the HOT212 sample (N. Pacific) and “Marine Virome”. Using

442  

only the 3 previously published cyanopodovirus genomes to recruit, cyanopodoviruses represent

443  

22% of all recruited reads in the HOT212 sample (Fig. 6). This jumps to 50% if all 12 genomes

444  

are used for recruitment, and a similar proportion emerges from the analysis of the MarineVirome

445  

database (Fig. 6).

446   447  

Analysis of the relative abundance of the three viral groups in the bacterial-fraction metagenomes

448  

from the North Pacific (HOT), Bermuda (BATS), Mediterranean (MedDCM), and the Global

449  

Ocean Survey (GOS) (Fig. 6) revealed the dominance of cyanomyoviruses in all samples,

450  

consistent with the observations of others for GOS and MedDCM databases (Williamson et al.,

451  

2008; Huang et al., 2011). The significant overabundance of cyanomyoviruses in these samples

452  

relative to those from the viral fraction (“Marine Virome and HOT212”) samples is likely due to

453  

the larger size of cyanomyoviruses, which would cause them to be preferentially retained by

454  

filters, either attached to cells or freely floating.

455   456  

We analyzed the geographic distribution of cyanopodo- and cyanomyoviruses in the Global

457  

Ocean Survey (GOS) and found that cyanopodoviruses are widespread but appear to be more

   

18  

458  

abundant in the Caribbean Sea, the Gulf of Mexico, the Eastern Tropical Pacific Ocean and the

459  

Indian Ocean (Fig. 7B). Interestingly, abundance of Prochlorococcus recruited reads also

460  

qualitatively corresponds to areas of relatively high cyanopodovirus counts (Fig. 7C). Thus

461  

although quantitative assessments are not possible, the additional reference genomes for

462  

cyanopodoviruses help document their widespread distribution, and point to some hotspots of

463  

abundance.

464   465  

Conclusions and future directions

466   467  

The growing number of cyanophage genomes is helping us better understand their relatedness

468  

and evolution, and their interactions with their host cells. Here we used four approaches to

469  

explore the similarities and differences among cyanopodoviruses: DNA polymerase phylogeny,

470  

concatenated core genome phylogeny, the presence or absence of RNA polymerase, and genome

471  

architecture. All but the extremely divergent freshwater cyanopodoviruses would fall into the

472  

“P60-like genus” by these criteria, except for P-RSP2, which is an outlier in the concatenated

473  

core genome tree, and lacks the hallmark RNA polymerase gene for this group. It is also the only

474  

phage isolated on Prochlorococcus MIT9302. Bcause its core genome architecture is similar to

475  

the others over much of the genome, and its position in the DNA polymerase tree assigns it to the

476  

“P60-like genus” group, we include it here.

477   478  

Cyanopodoviruses have two hypervariable island regions in which genes shared with their hosts,

479  

and/or with cyanomyoviruses, are concentrated. The positions of hyperflexible genes – i.e. those

480  

found in only 1 to 3 genomes – are highly concentrated in a third island at one extremity of the

481  

genome. These islands point to interesting regions for unveiling gene acquisition and loss

482  

mechanisms. Another hypervariable region, at a finer evolutionary scale, encompasses the C-

483  

terminal part of the tail fiber gene in the two very closely related phage, P-SSP2 and P-SSP3.

   

19  

484  

This region may indicate an underlying diversity-generating mechanism, helping phage to adapt

485  

to the vast diversity of host receptors found in marine environments.

486   487  

Our analysis contributes to the growing appreciation of the complexity of phage diversity in the

488  

oceans, and the degree to which it is under-sampled.

489   490  

Materials and Methods

491   492  

Bacteriophage isolation, characterization, DNA extraction

493  

Phage were isolated as previously described (Waterbury and Valois, 1993; Sullivan et al., 2003).

494  

All phage used in this study were isolated by triple (or greater) plaque purification, followed by

495  

two rounds of dilution to extinction. The phage stocks were filtered through 0.2µm and stored at

496  

4˚C in the dark. For each phage, we used the earliest sample in our collection that still retained

497  

infectivity, to minimize the number of infectious cycles the phage went through – and therefore,

498  

the accumulation of mutations in the genome. Nonetheless, all of these phage went through

499  

multiple transfers on serially transferred host cultures before the final stock was collected for

500  

sequencing. The DNA was extracted as previously described (Henn et al., 2010).

501   502  

Genome sequencing, assembly and annotation

503  

The genomes were sequenced by 454 pyrosequencing, and assembled and annotated at the Broad

504  

Institute as previously described (Henn et al., 2010). The protein sequences were clustered into

505  

orthologous groups using OrthoMCL program (van Dongen and Abreu-Goodger, 2012) (see

506  

below) with the available cyanophage genomes on Proportal (http://proportal.mit.edu/ ). The

507  

protein functional annotations were updated based on the information available on ProPortal.

508   509  

Comparative genomics

   

20  

510  

For Figure 3 and Figure 4, all marine cyanopodovirus proteins were compared using the program

511  

BLASTP (NCBI). The genomes in Figure 3 were extracted from the GenBank file using the

512  

software BioEdit (http://www.mbio.ncsu.edu/bioedit/bioedit.html) and imported in Adobe

513  

Illustrator. The comparison of P-SSP2 and P-SSP3 was done using BLASTP and the genome

514  

maps were generated in R using the package GenoplotR (Guy et al., 2010).

515   516  

Core genome analysis

517  

The method used for clustering cyanopodovirus proteins into homologous groups was similar to

518  

that described previously(Kettler et al., 2007; Sullivan et al., 2010). All marine cyanopodovirus

519  

proteins were paired using a reciprocal best BLASTP hit analysis where the sequence alignment

520  

covered at least 75% of the protein length of the longest protein and where the percentage of

521  

identity was at least 35%. The clusters were then built by transiently grouping these pairs. To

522  

increase the sensitivity of the method, HMM profiles (Sonnhammer et al., 1998) were built for

523  

each cluster from an alignment of proteins made with Muscle (version 3.7 (Edgar, 2004; Edgar,

524  

2004). The protein database was then searched de novo using the HMM models to group proteins

525  

with significant homology (E-value ≤ 1e-5). HMMBUILD and HMMSEARCH from HMMER

526  

were used to build and search for motifs in the sequence database, respectively.

527   528  

Phylogeny of the core genome and of the DNA polymerase

529  

All marine cyanopodoviruses were included for this analysis while the freshwater

530  

cyanopodoviruses were excluded because they lack most of the core genes. For each phage, the

531  

core protein sequences were concatenated in the same order, from the single strand binding

532  

protein to the terminase. The concatenated protein sequences were then aligned with MUSCLE

533  

(Edgar, 2004; Edgar, 2004) using the default parameters. The alignment was converted to phylip

534  

format using the BioPython package (Cock et al., 2009). Phylogenetic analysis of the

535  

concatenated proteins was performed using PhyML 3.0 (Guindon et al., 2010). The trees were

   

21  

536  

built from the command line with the following options: -d aa -b -4 -m JTT -v e -c 4 -a e -o tlr.

537  

Both trees are unrooted. The approach NNIs was used to search the tree topology. The initial tree

538  

was based on the BioNJ algorithm using the substitution model JTT (Jones et al., 1992). A

539  

discrete gamma model was estimated by the software with 4 categories and a gamma shape of

540  

1.384 with a proportion of invariant a.a. of 0.042. The maximum likelihood was estimated using

541  

the Shimodaira–Hasegawa–like procedure (Shimodaira, 2002). Finally, the trees were visualized

542  

with the online tool iTOL (Letunic and Bork, 2007; Letunic and Bork, 2011). The sequences of

543  

the DNA polymerase were retrieved from ACLAME database (ACLAME MGEs. Version 0.4 -

544  

family_vir_proph_26 (Leplae et al., 2009)) and were aligned as described above; the tree was

545  

built using the same approach as the core genome phylogeny analysis.

546   547  

Phage/host shared genes and hypervariable genetic islands in cyanopodoviruses

548  

Clustering cyanopodovirus/host and cyanopodovirus/cyanomyovirus shared genes was performed

549  

using the OrthoMCL program (van Dongen and Abreu-Goodger, 2012). The clustering was done

550  

with a conservative value of 35% for the percent identity and an E-value of 1E-05. To avoid

551  

clustering proteins solely on the basis of conserved domains, we pre-filtered our BLASTP results

552  

to accept the orthologous pairs only if the sequence alignment covered at least 75% of the length

553  

of the longer of the two sequences. The cyanophage and picocyanobacterial genomes used in the

554  

clustering analysis are listed in supplemental Table 1. Figure 5 was generated using the python

555  

matplotlib module (Hunter, 2007).

556   557  

P-RSP2 promoter analysis and transcriptional factor searches

558  

The P-RSP2 genome was screened for promoters as previously described (Vogel et al., 2003;

559  

Lindell et al., 2007). Briefly, a position-specific weight matrix was built from the -10 box of

560  

Prochlorococcus MED4 (Vogel et al., 2003) with the Motif module from the BioPython package

561  

(Cock et al., 2009). The phage genomes were searched for this motif. The threshold was set at 7.2

   

22  

562  

based on the distribution of scores for the established motif for the -10 promoter box sequences.

563  

P-RSP2 coding sequences were analyzed to detect transcription factors using InterProScan

564  

(Zdobnov and Apweiler, 2001), Pfam (Punta et al., 2012), and CDD (Marchler-Bauer and Bryant,

565  

2004). We were specifically looking for conserved protein domains related to transcription

566  

factors or DNA binding domain. Except for the phage proteins known to be involved in DNA

567  

metabolism (DNA polymerase, endo/exonuclease, DNA primase, single strand binding protein),

568  

no DNA binding motifs could be detected nor conserved domains related to transcription factors.

569   570  

Metagenomics

571  

Six metagenomic datasets were used in this study: four from the bacterial fraction, (The Global

572  

Ocean Survey dataset (GOS (Rusch et al., 2007)), the deep chlorophyll max Mediterranean

573  

dataset (Ghai et al., 2010), the Pacific Ocean datasets (Station Hawaii Ocean Time-Series –

574  

HOT179 and HOT186 (Frias-Lopez et al., 2008; Coleman and Chisholm, 2010)) and two viral

575  

fraction datasets (the MarineVirome (Angly et al., 2006) and the Pacific Ocean dataset (HOT212

576  

(this study – NCBI accession:  SRA059090)). All datasets, except HOT212, were obtained from

577  

the CAMERA website (http://camera.calit2.net/index.shtm). Only the sites with more than 10,000

578  

reads were used from the GOS database. The methods used were similar to those described by

579  

Malmstrom et al (2012) , and the reference genomes used for recruitment are listed in

580  

supplemental Table 2. Briefly, metagenomic reads were matched to reference genomes using

581  

BLASTN (Table S1), and those with a bit score of at least 40 were compared against the NCBI nt

582  

database to assess if there were other best hits. The number of recruited reads at a GOS site was

583  

normalized against the number of reads in the GOS database from that site. Finally, to compare

584  

the relative abundance of cyanopodo- and cyanomyoviruses, the normalized read counts for each

585  

GOS site were normalized to the average genome size of each phage family – 188780 bp and

586  

46320 bp for the cyanomyo- and cyanopodoviruses respectively. The bar graphs were generated

587  

in R using ggplot2 package (Wickham, 2009) and the map was generated in R using ggplot2

   

23  

588  

(Wickham, 2009), maps (http://CRAN.R-project.org/package=maps), gpclib (http://CRAN.R-

589  

project.org/package=gpclib), and maptools (http://CRAN.R-project.org/package=maptools)

590  

packages. The shapefile used to create the Galapagos Islands inset was downloaded from ©

591  

OpenStreetMap contributors (http://downloads.cloudmade.com).

592  

Note added to proof

593  

After this manuscript was accepted, we learned that a new version of P60 genome has been

594  

generated (Feng Chen, pers. comm.), which contains significant changes from the published

595  

version (Chen and Lu, 2002). We re-examined our data in the context of this revised P60 genome

596  

and found that some of our statements need to be modified, but the main conclusions of the paper

597  

remain the same.

598   599  

First, the revised P60 genome organization now makes it more similar to the other

600  

cyanopodoviruses, and all the genes are coded on the same DNA strand. Further, this genome

601  

makes P60 fall squarely in the P60-like genus as defined by Lavigne et al. (2008). The revised

602  

sequence also affects our core gene analysis such that marine cyanopodoviruses and P60 now

603  

share 15 core genes instead of 12.

604   605  

Acknowledgments

606  

We are grateful to Jessie W. Thompson and Qinglu Zeng for comments and edits on the

607  

manuscript, and Katherine Huang for her advice and analyses in the early stages of the genome

608  

sequencing. This work was supported by grants from the Gordon and Betty Moore Foundation

609  

(SWC and MRH), the US National Science Foundation (NSF) Biological Oceanography Section,

610  

the NSF Center for Microbial Oceanography Research and Education (C-MORE) (grant numbers

611  

OCE-0425602 and EF 0424599), and the US Department of Energy-GTL. SJL was supported by

612  

a postdoctoral fellowship from the “Fonds Québecois de la recherche sur la nature et les

613  

technologies”.

   

24  

614   615   616   617   618   619   620   621   622   623   624   625   626   627   628   629   630   631   632   633   634   635   636   637   638   639   640   641   642   643   644   645   646   647   648   649   650   651   652   653   654   655   656   657   658   659   660   661   662   663  

References Angly, F.E., Felts, B., Breitbart, M., Salamon, P., Edwards, R.A., Carlson, C., et al. (2006) The marine viromes of four oceanic regions. PLoS Biol. 4: e368. Avrani, S., Wurtzel, O., Sharon, I., Sorek, R., and Lindell, D. (2011) Genomic island variability facilitates Prochlorococcus-virus coexistence. Nature. 474: 604-608. Bergh, O., Børsheim, K.Y., Bratbak, G., and Heldal, M. (1989) High abundance of viruses found in aquatic environments. Nature. 340: 467-468. Béjà, O., Fridman, S., and Glaser, F. (2012) Viral clones from the GOS expedition with an unusual photosystem-I gene cassette organization. ISME. J. 6: 1617-1620. Bhaya, D., Dufresne, A., Vaulot, D., and Grossman, A. (2002) Analysis of the hli gene family in marine and freshwater cyanobacteria. FEMS Microbiol. Lett. 215: 209-219. Botstein, D. (1980) A theory of modular evolution for bacteriophages. Ann. N. Y. Acad. Sci. 354: 484-490. Bragg, J.G., and Chisholm, S.W. (2008) Modeling the fitness consequences of a cyanophageencoded photosynthesis gene. PLoS One. 3: e3550. Breitbart, M., Thompson, L.R., Suttle, C.A., and Sullivan, M.B. (2007) Exploring the Vast Diversity of Marine Viruses. Oceanography. 20: 135-139. Brody, N., Kassavetis, A., Ouhammouch, M., Sanders, M., Tinker, L., and Geiduschek, P. (1995) Old phage, new insights: Two recently recognized mechanisms of transcriptional regulation in bacteriophage T4 development. FEMS Microbiol. Lett. 128: 1-8. Chen, F., and Lu, J. (2002) Genomic Sequence and Evolution of Marine Cyanophage P60: a New Insight on Lytic and Lysogenic Phages. Appl. Environ. Microbiol. 68: 2589-2594. Chenard, C., and Suttle, C.A. (2008) Phylogenetic Diversity of Cyanophage Photosynthetic Genes (psbA) in Marine and Fresh Waters. Appl. Environ. Microbiol. 74: 5317-5324. Cock, P.J., Antao, T., Chang, J.T., Chapman, B.A., Cox, C.J., Dalke, A., et al. (2009) Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 25: 1422-1423. Coleman, M.L., and Chisholm, S.W. (2010) Ecosystem-specific selection pressures revealed through comparative population genomics. Proc. Natl. Acad. Sci. U. S. A. 107: 1863418639. Coleman, M.L., Sullivan, M.B., Martiny, A.C., Steglich, C., Barry, K., DeLong, E.F., and Chisholm, S.W. (2006) Genomic islands and the ecology and evolution of Prochlorococcus. Science. 311: 1768-1770. Comeau, A.M., Bertrand, C., Letarov, A., Tétart, F., and Krisch, H.M. (2007) Modular architecture of the T4 phage superfamily: a conserved core genome and a plastic periphery. Virology. 362: 384-396. Doulatov, S., Hodes, A., Dai, L., Mandhana, N., Liu, M., Deora, R., et al. (2004) Tropism switching in Bordetella bacteriophage defines a family of diversity-generating retroelements. Nature. 431: 476-481. Dunn, J.J., Studier, F.W., and Gottesman, M. (1983) Complete nucleotide sequence of bacteriophage T7 DNA and the locations of T7 genetic elements. J. Mol. Biol. 166: 477-535. Edgar, R.C. (2004) MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC. Bioinformatics. 5: 113. Edgar, R.C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32: 1792-1797. Enav, H., Béjà, O., and Mandel-Gutfreund, Y. (2012) Cyanophage tRNAs may have a role in cross-infectivity of oceanic Prochlorococcus and Synechococcus hosts. ISME J. 6: 619-628. Filée, J., Bapteste, E., Susko, E., and Krisch, H.M. (2006) A selective barrier to horizontal gene transfer in the T4-type bacteriophages that has preserved a core genome with the viral replication and structural genes. Mol. Biol. Evol. 23: 1688-1696.

   

25  

664   665   666   667   668   669   670   671   672   673   674   675   676   677   678   679   680   681   682   683   684   685   686   687   688   689   690   691   692   693   694   695   696   697   698   699   700   701   702   703   704   705   706   707   708   709   710   711   712   713  

Frias-Lopez, J., Shi, Y., Tyson, G.W., Coleman, M.L., Schuster, S.C., Chisholm, S.W., and DeLong, E.F. (2008) Microbial community gene expression in ocean surface waters. Proc. Natl. Acad. Sci. U. S. A. 105: 3805-3810. Fuhrman, J.A. (1999) Marine viruses and their biogeochemical and ecological effects. Nature. 399: 541-548. Funk, C., and Vermaas, W. (1999) A cyanobacterial gene family coding for single-helix proteins resembling part of the light-harvesting proteins from higher plants. Biochemistry. 38: 93979404. Ghai, R., Martin-Cuadrado, A.B., Molto, A.G., Heredia, I.G., Cabrera, R., Martin, J., et al. (2010) Metagenome of the Mediterranean deep chlorophyll maximum studied by direct and fosmid library 454 pyrosequencing. ISME J. 4: 1154-1166. Guindon, S., Dufayard, J.F., Lefort, V., Anisimova, M., Hordijk, W., and Gascuel, O. (2010) New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59: 307-321. Guy, L., Kultima, J.R., and Andersson, S.G. (2010) genoPlotR: comparative gene and genome visualization in R. Bioinformatics. 26: 2334-2335. He, Q., Dolganov, N., Bjorkman, O., and Grossman, A.R. (2001) The high light-inducible polypeptides in Synechocystis PCC6803. Expression and function in high light. J. Biol. Chem. 276: 306-314. Hendrix, R.W., Smith, M.C., Burns, R.N., Ford, M.E., and Hatfull, G.F. (1999) Evolutionary relationships among diverse bacteriophages and prophages: all the world's a phage. Proc. Natl. Acad. Sci. U. S. A. 96: 2192-2197. Henn, M.R., Sullivan, M.B., Stange-Thomann, N., Osburne, M.S., Berlin, A.M., Kelly, L., et al. (2010) Analysis of High-Throughput Sequencing and Annotation Strategies for Phage Genomes. PLoS One. 5: e9083. Huang, S., Wang, K., Jiao, N., and Chen, F. (2011) Genome sequences of siphoviruses infecting marine Synechococcus unveil a diverse cyanophage group and extensive phage-host genetic exchanges. Environ. Microbiol. 14: 540-558. Hunter, J.D. (2007) Matplotlib: A 2D graphics environment. Comput. Sci. Eng. 9: 90-95. Ignacio-Espinoza, J.C., and Sullivan, M.B. (2012) Phylogenomics of T4 cyanophages: lateral gene transfer in the 'core' and origins of host genes. Environ. Microbiol. 14: 2113-2126. Jones, D.T., Taylor, W.R., and Thornton, J.M. (1992) The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8: 275-282. Kelly L, Ding H, Huang KH, Osburne M, Chisholm SW. (submitted) Features of cyanomyophage gene content and evolution in wild populations and cultured genomes. ISME J. Kelly, L., Huang, K.H., Ding, H., and Chisholm, S.W. (2012) ProPortal: a resource for integrated systems biology of Prochlorococcus and its phage. Nucleic Acids Res. 40: D632-D640. Kettler, G.C., Martiny, A.C., Huang, K., Zucker, J., Coleman, M.L., Rodrigue, S., et al. (2007) Patterns and implications of gene gain and loss in the evolution of Prochlorococcus. PLoS Genet. 3: e231. Krakauer, D.C., and Jansen, V.A. (2002) Red queen dynamics of protein translation. J. Theor. Biol. 218: 97-109. Labonté, J.M., Reid, K.E., and Suttle, C.A. (2009) Phylogenetic analysis indicates evolutionary diversity and environmental segregation of marine podovirus DNA polymerase gene sequences. Appl. Environ. Microbiol. 75: 3634-3640. Lavigne, R., Seto, D., Mahadevan, P., Ackermann, H.W., and Kropinski, A.M. (2008) Unifying classical and molecular taxonomic classification: analysis of the Podoviridae using BLASTP-based tools. Res. Microbiol. 159: 406-414. Leiman, P.G., Kanamaru, S., Mesyanzhinov, V.V., Arisaka, F., and Rossmann, M.G. (2003) Structure and morphogenesis of bacteriophage T4. Cell. Mol. Life Sci. 60: 2356-2370.

   

26  

714   715   716   717   718   719   720   721   722   723   724   725   726   727   728   729   730   731   732   733   734   735   736   737   738   739   740   741   742   743   744   745   746   747   748   749   750   751   752   753   754   755   756   757   758   759   760   761   762   763  

Leplae, R., Lima-Mendez, G., and Toussaint, A. (2009) ACLAME: A CLAssification of Mobile genetic Elements, update 2010. Nucleic Acids Res. 38: D57-D61. Letunic, I., and Bork, P. (2007) Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics. 23: 127-128. Letunic, I., and Bork, P. (2011) Interactive Tree Of Life v2: online annotation and display of phylogenetic trees made easy. Nucleic Acids Res. 39: W475-W478. Limor-Waisberg, K., Carmi, A., Scherz, A., Pilpel, Y., and Furman, I. (2011) Specialization versus adaptation: two strategies employed by cyanophages to enhance their translation efficiencies. Nucleic Acids Res. 39: 6016-6028. Lindell, D., Jaffe, J.D., Coleman, M.L., Futschik, M.E., Axmann, I.M., Rector, T., et al. (2007) Genome-wide expression dynamics of a marine virus and host reveal features of coevolution. Nature. 449: 83-86. Lindell, D., Jaffe, J.D., Johnson, Z.I., Church, G.M., and Chisholm, S.W. (2005) Photosynthesis genes in marine viruses yield proteins during host infection. Nature. 438: 86-89. Lindell, D., Sullivan, M.B., Johnson, Z.I., Tolonen, A.C., Rohwer, F., and Chisholm, S.W. (2004) Transfer of photosynthesis genes to and from Prochlorococcus viruses. Proc. Natl. Acad. Sci. U. S. A. 101: 11013-11018. Liu, H., Campbell, L., Landry, M.R., Nolla, H.A., Brown, S.L., and Constantinou, J. (1998) Prochlorococcus and Synechococcus growth rates and contributions to production in the Arabian Sea during the 1995 Southwest and Northeast Monsoons. Deep-Sea Res. Part II. 45: 2327-2352. Liu, H., Nolla, H.A., and Campbell, L. (1997) Prochlorococcus growth rate and contribution to primary production in the equatorial and subtropical North Pacific Ocean. Aquat. Microb. Ecol. 12: 39-47. Liu, M., Deora, R., Doulatov, S.R., Gingery, M., Eiserling, F.A., Preston, A., et al. (2002) Reverse transcriptase-mediated tropism switching in Bordetella bacteriophage. Science. 295: 2091-2094. Liu, M., Gingery, M., Doulatov, S.R., Liu, Y., Hodes, A., Baker, S., et al. (2004) Genomic and genetic analysis of Bordetella bacteriophages encoding reverse transcriptase-mediated tropism-switching cassettes. J. Bacteriol. 186: 1503-1517. Liu, X., Kong, S., Shi, M., Fu, L., Gao, Y., and An, C. (2008) Genomic analysis of freshwater cyanophage Pf-WMP3 Infecting cyanobacterium Phormidium foveolarum: the conserved elements for a phage. Microb. Ecol. 56: 671-680. Liu, X., Shi, M., Kong, S., Gao, Y., and An, C. (2007) Cyanophage Pf-WMP4, a T7-like phage infecting the freshwater cyanobacterium Phormidium foveolarum: complete genome sequence and DNA translocation. Virology. 366: 28-39. Malmstrom, R.R., Rodrigue, S., Huang, K.H., Kelly, L., Kern, S.E., Thompson, A., et al. (2012) Ecology of uncultured Prochlorococcus clades revealed through single-cell genomics and biogeographic analysis. ISME J. 10.1038/ismej.2012.89. Mann, N.H. (2003) Phages of the marine cyanobacterial picophytoplankton. FEMS Microbiol. Rev. 27: 17-34. Marchler-Bauer, A., and Bryant, S.H. (2004) CD-Search: protein domain annotations on the fly. Nucleic Acids Res. 32: W327-W331. McDaniel, L., Houchin, L.A., Williamson, S.J., and Paul, J.H. (2002) Lysogeny in marine Synechococcus. Nature. 415: 496. Millard, A., Clokie, M.R., Shub, D.A., and Mann, N.H. (2004) Genetic organization of the psbAD region in phages infecting marine Synechococcus strains. Proc. Natl. Acad. Sci. U. S. A. 101: 11007-11012. Millard, A.D., Zwirglmaier, K., Downey, M.J., Mann, N.H., and Scanlan, D.J. (2009) Comparative genomics of marine cyanomyoviruses reveals the widespread occurrence of

   

27  

764   765   766   767   768   769   770   771   772   773   774   775   776   777   778   779   780   781   782   783   784   785   786   787   788   789   790   791   792   793   794   795   796   797   798   799   800   801   802   803   804   805   806   807   808   809   810   811   812   813   814  

Synechococcus host genes localized to a hyperplastic region: implications for mechanisms of cyanophage evolution. Environ. Microbiol. 11: 2370-2387. Molineux, I. (2006) The T7 group, In The bacteriophages. Calendar, R. (eds). New York: Oxford, UK: Oxford University Press, pp. 277-301. Ortmann, A.C., Lawrence, J.E., and Suttle, C.A. (2002) Lysogeny and Lytic Viral Production during a Bloom of the Cyanobacterium Synechococcus spp. Microb. Ecol. 43: 225-231. Partensky, F., Hess, W.R., and Vaulot, D. (1999) Prochlorococcus, a marine photosynthetic prokaryote of global significance. Microbiol. Mol. Biol. Rev. 63: 106-127. Pavlova, O., Lavysh, D., Klimuk, E., Djordjevic, M., Ravcheev, D.A., Gelfand, M.S., et al. (2012) Temporal Regulation of Gene Expression of the Escherichia coli Bacteriophage phiEco32. J. Mol. Biol. 416: 389-399. Pope, W.H., Weigele, P.R., Chang, J., Pedulla, M.L., Ford, M.E., Houtz, J.M., et al. (2007) Genome sequence, structural proteins, and capsid organization of the cyanophage Syn5: a "horned" bacteriophage of marine Synechococcus. J. Mol. Biol. 368: 966-981. Punta, M., Coggill, P.C., Eberhardt, R.Y., Mistry, J., Tate, J., Boursnell, C., et al. (2012) The Pfam protein families database. Nucleic Acids Res. 40: D290-D301. Rusch, D.B., Halpern, A.L., Sutton, G., Heidelberg, K.B., Williamson, S., Yooseph, S., et al. (2007) The Sorcerer II Global Ocean Sampling expedition: Northwest Atlantic through Eastern Tropical Pacific. PLoS Biology. 5: 398-431. Sabehi, , G., and Lindell D (2012) The P-SSP7 Cyanophage Has a Linear Genome with Direct Terminal Repeats. PLoS One. 7: e36710. Sabehi, G., Shaulov, L., Silver, D.H., Yanai, I., Harel, A., and Lindell, D. (2012) A novel lineage of myoviruses infecting cyanobacteria is widespread in the oceans. Proc. Natl. Acad. Sci. U. S. A. 109: 2037-2042.Scanlan, D.J., and West, N.J. (2002) Molecular ecology of the marine cyanobacterial genera Prochlorococcus and Synechococcus. FEMS Microbiol. Ecol. 40: 112. Sharon, I., Alperovitch, A., Rohwer, F., Haynes, M., Glaser, F., Atamna-Ismaeel, N., et al. (2009) Photosystem I gene cassettes are present in marine virus genomes. Nature. 461: 258-262. Shimodaira, H. (2002) An approximately unbiased test of phylogenetic tree selection. Syst. Biol. 51: 492-508. Sonnhammer, E.L., Eddy, S.R., Birney, E., Bateman, A., and Durbin, R. (1998) Pfam: multiple sequence alignments and HMM-profiles of protein domains. Nucleic Acids Research. 26: 320-322. Studier, F.W. (1972) Bacteriophage T7. Science. 176: 367-376. Studier, F.W., and Maizel, J.V. (1969) T7-directed protein synthesis. Virology. 39: 575-586. Sullivan, M.B., Coleman, M.L., Weigele, P., Rohwer, F., and Chisholm, S.W. (2005) Three Prochlorococcus Cyanophage Genomes: Signature Features and Ecological Interpretations. PLoS Biol. 3: e144. Sullivan, M.B., Huang, K.H., Ignacio-Espinoza, J.C., Berlin, A.M., Kelly, L., Weigele, P.R., et al. (2010) Genomic analysis of oceanic cyanobacterial myoviruses compared with T4-like myoviruses from diverse hosts and environments. Environ. Microbiol. 12: 3035-3056. Sullivan, M.B., Krastins, B., Hughes, J.L., Kelly, L., Chase, M., Sarracino, D., and Chisholm, S.W. (2009) The genome and structural proteome of an ocean siphovirus: a new window into the cyanobacterial 'mobilome'. Environ. Microbiol. 11: 2935-2951. Sullivan, M.B., Lindell, D., Lee, J.A., Thompson, L.R., Bielawski, J.P., and Chisholm, S.W. (2006) Prevalence and evolution of core photosystem II genes in marine cyanobacterial viruses and their hosts. PLoS Biology. 4: 1344-1357. Sullivan, M.B., Waterbury, J.B., and Chisholm, S.W. (2003) Cyanophages infecting the oceanic cyanobacterium Prochlorococcus. Nature. 424: 1047-1051. Summers, W.C., and Szybalski, W. (1968) Totally asymmetric transcription of coliphage T7 in vivo: correlation with poly G binding sites. Virology. 34: 9-16.

   

28  

815   816   817   818   819   820   821   822   823   824   825   826   827   828   829   830   831   832   833   834   835   836   837   838   839   840   841   842   843   844   845   846   847   848   849   850   851   852   853   854   855   856   857   858   859   860   861   862   863  

Suttle, C.A., and Chan, A.M. (1994) Dynamics and Distribution of Cyanophages and Their Effect on Marine Synechococcus spp. Appl. Environ. Microbiol. 60: 3167-3174. Tettelin, H., Masignani, V., Cieslewicz, M.J., Donati, C., Medini, D., Ward, N.L., et al. (2005) Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial "pan-genome". Proc. Natl. Acad. Sci. U. S. A. 102: 13950-13955. Thingstad, T.F. (2000) Elements of a theory for the mechanisms controlling abundance, diversity, and biogeochemical role of lytic bacterial viruses in aquatic systems. Limnol. Oceanogr. 45: 1320-1328. Thompson, L.R., Zeng, Q., Kelly, L., Huang, K.H., Singer, A.U., Stubbe, J., and Chisholm, S.W. (2011) Phage auxiliary metabolic genes and the redirection of cyanobacterial host carbon metabolism. Proc. Natl. Acad. Sci. U. S. A. 108: E757-E764. Uhl, M.A., and Miller, J.F. (1996) Integration of multiple domains in a two-component sensor protein: the Bordetella pertussis BvgAS phosphorelay. EMBO Journal. 15: 1028-1036. van Dongen, S., and Abreu-Goodger, C. (2012) Using MCL to extract clusters from networks. Methods. Mol. Biol. 804: 281-295. Vogel, J., Axmann, I.M., Herzel, H., and Hess, W.R. (2003) Experimental and computational analysis of transcriptional start sites in the cyanobacterium Prochlorococcus MED4. Nucleic Acids Res. 31: 2890-2899. Waterbury, J.B., and Valois, F.W. (1993) Resistance to cooccurring phages enables marine Synechococcus communities to coexist with cyanophages abundant in seawater. Appl. Environ. Microbiol. 59: 3393-3399. Waterbury, J.B., Watson, S.W., Valois, F.W., and Franks, D.G. (1986) Biological and ecological characterization of the marine unicellular cyanobacterium Synechococcus. Can. Bull. Fish. Aquat.. Sci. 214: 71-120. Weigele, P.R., Pope, W.H., Pedulla, M.L., Houtz, J.M., Smith, A.L., Conway, J.F., et al. (2007) Genomic and structural analysis of Syn9, a cyanophage infecting marine Prochlorococcus and Synechococcus. Environ. Microbiol. 9: 1675-1695. Weinbauer, M.G., and Rassoulzadegan, F. (2004) Are viruses driving microbial diversification and diversity? Environ. Microbiol. 6: 1-11. Wickham, H. (2009) Ggplot2 : elegant graphics for data analysis. New York: Springer. Williamson, S.J., Rusch, D.B., Yooseph, S., Halpern, A.L., Heidelberg, K.B., Glass, J.I., et al. (2008) The Sorcerer II Global Ocean Sampling Expedition: Metagenomic Characterization of Viruses within Aquatic Microbial Samples. PLoS ONE. 3: e1456. Wolfson, J., Dressler, D., and Magazin, M. (1972) Bacteriophage T7 DNA replication: a linear replicating intermediate (gradient centrifugation-electron microscopy-E. coli-DNA partial denaturation). Proc. Natl. Acad. Sci. U. S. A. 69: 499-504. Wommack, K.E., and Colwell, R.R. (2000) Virioplankton: viruses in aquatic ecosystems. Microbiol. Mol. Biol. Rev. 64: 69-114. Yerrapragada, S., Siefert, J.L., and Fox, G.E. (2009) Horizontal gene transfer in cyanobacterial signature genes. Methods Mol. Biol. 532: 339-366. Zdobnov, E.M., and Apweiler, R. (2001) InterProScan--an integration platform for the signaturerecognition methods in InterPro. Bioinformatics. 17: 847-848. Zeidner, G., Bielawski, J.P., Shmoish, M., Scanlan, D.J., Sabehi, G., and Béjà, O. (2005) Potential photosynthesis gene recombination between Prochlorococcus and Synechococcus via viral intermediates. Environ. Microbiol. 7: 1505-1513. Zeng, Q., and Chisholm, S.W. (2012) Marine Viruses Exploit Their Host's Two-Component Regulatory System in Response to Resource Limitation. Curr. Biol. 22: 124-128.

 

   

 

29  

864   865   866  

Tables Table 1. General features of the cyanopodoviruses from this study, and of those whose genomes have been previously published. MPP

1

Original host

# ORFs

Host Phage %GC %GC content content Site of origin

1-Sep-99

HQ634152

This study

100

64°16’W 64°16’W

5-Jun-96

HQ337022

This study

25m

22° 45'N 158°00'W

8-Mar-06

GU071104

This study

Aug-95

HQ332140

This study

1-Sep-99

NC_006882

(Sullivan et al., 2005)

31-Aug-95

HQ332137

This study

64°16’W 34°55’E

31-Aug-95

GU071107

This study

13-Sep-00

GU071102

This study

100

31°48’N

64°16’W

31-Aug-95

HQ316584

This study

HL(I)

47039

54

30.8

39.2

P-SSP10

Prochlorococcus NATL2A

LL(I)

47325

52

35

39.2

BATS

P-HP1

Prochlorococcus NATL2A

LL(I)

47536

65

35

39.9

HOTS

4

MPP-B2 P-GSP1

Prochlorococcus MED4

HL(I)

44945

53

30.8

39.6

80

38°21’N

66°49’W

P-SSP7

Prochlorococcus MED4

HL(I)

44970

54

30.8

38.8

BATS

5

100

31°48’N

64º16’W

P-SSP3

Prochlorococcus MIT9312 HL(II)

46198

56

31.2

37.9

BATS

100

31°48’N

64°16’W

P-SSP2

Prochlorococcus MIT9312 HL(II)

45890

59

31.2

37.9

BATS

120

P-RSP5

Prochlorococcus NATL1A

47741

68

35.1

38.7

Red Sea

130

31°48’N 29°28’N

BATS

1

40.5

Gulf Stream

100

Long.

Date water sampled Accession # Reference

Lat. 31°48’N 31°48’N

Prochlorococcus MIT9515

LL(I)

BATS

Depth

MPP-B1 P-SSP11

MPP-A

867   868   869   870   871   872  

Phage

Host Genome Clade2 size (kb)

P-SSP9

Prochlorococcus SS-120

LL(II)

46997

53

36.4

SYN5

Synechococcus WH8109

Syn.

46214

61

60.1

55

Sargasso Sea

Surface

36°58’N

73°42’W

30-Nov-86 NC_009531

(Pope et al., 2007)

P60

Synechococcus WH7803

Syn.

47872

80

60.2

53.2

Satilla River6

Surface

-

-

12-Jul-88

AF338467

(Chen and Lu, 2002)

-

P-RSP2

Prochlorococcus MIT9302 HL(II)

42257

48

-

34

Red Sea

Surface

29°28’N

34°53’E

14-Jul-96

HQ332139

This study

-

Pf-WMP3 Leptolyngbya foveolarum

FC3

43249

41

-

46.5

Lake Weiming

nd

-

-

22-Jul-03

EF537008.1

(Liu et al., 2008)

-

Pf-WMP4 Leptolyngbya foveolarum

FC

40938

55

-

51.8

Lake Weiming

nd

-

-

22-Jul-03

DQ875742.1

(Liu et al., 2007)

Classification of phage genomes based on the concatenated core genes phylogeny. “–“ indicates a phage that is not classified in one of the three groups (Fig. 1). 2 Clade names for Prochlorococcus as defined in Rocap et al., (2002) 3 FC = Freshwater cyanophage 4 HOTS = Hawaii Ocean Time Series Station 5 BATS = Bermuda Atlantic Time Series Station 6 Satilla River: estuary - salinity = 30‰ – See note added in proofs.

Table 2. Host range of some of the cyanopodoviruses reported here. + indicates successful infection; - indicates no infection. Clade designations for Prochlorococcus refer to light adaptation properties of host cells as defined in Rocap and colleagues (2002). [Correction added on 29 January 2013 after first online publication: P-SSP3 and P-SSP10 were removed from Table 2 as irregularities were detected in the lysates after publication. This does not affect the genomic data or any of the conclusions of the paper.] P-RSP5

P-RSP2

P-SSP11

-

+

-

-

-

-

+

+

-

-

-

-

HL(I)

-

-

-

-

-

+

LL(I)

-

+ -

+ +

-

-

-

-

-

-

-

P-GSP1

P-HP1

Phage P-SSP7

873   874   875   876   877   878  

Prochlorococcus MIT9302 Prochlorococcus MIT9312 Prochlorococcus MIT9215 Prochlorococcus GP2

HL(II)

-

Prochlorococcus MIT9202 Prochlorococcus AS9601 Prochlorococcus MIT9301

HL(II) HL(II)

-

Prochlorococcus MED4

HL(I)

Prochlorococcus MIT9515 Prochlorococcus NATL2A

Host strains tested

Host clade HL(II) HL(II) HL(II) HL(II)

Prochlorococcus NATL1A

LL(I)

-

Prochlorococcus MIT9313

LL(IV)

-

   

+

31

Table 3. Relatively conserved genes in cyanopodoviruses. Core genes of marine cyanopodoviruses are shown in bold. Classes of genes are as defined for P-SSP7 by Lindell et al. (2007), depicting the order of the timing of their transcription (see Fig. 3). Class II-b genes, which include talC, are transcribed with Class II genes, even though they are positioned at the end of the genome (Lindell et al., 2007) Freshwater Cyano T7-like phage

Class III

Class II-b

P-SSP9

gp28

gp54

gp29

gp15

gp6

Pf-WMP4

Syn5

gp51

Pf-WMP3

P-SSP10

gp11

P60 *

P-SSP11

gp42

P-RSP5

gp29

P-HP1

gp13

P-GSP1

RNA polymerase

P-SSP3

Class II

Putative Function P-SSP2

Gene Class

P-RSP2

Marine cyanopodoviruses

P-SSP7

879   880   881   882  

-

gp6

-

-

SSB

gp14

gp30

gp41

gp10

gp50

gp26

gp53

gp28

gp21

gp1

gp47

-

-

-

Endonuclease

gp15

gp31 gp40

gp9

gp49 gp25

gp52

gp26

gp22

gp52

gp46

gp16-17

-

gp17

Primase/Helicase

gp16

gp32 gp39

gp8

gp48 gp24

gp51

gp25

gp24

gp50

gp45

gp18

gp9

gp12

DNA polymerase

gp17

gp34 gp38

gp7

gp46 gp23

gp50

gp24

gp27

gp49

gp44

gp20

gp12-14

gp19

Exonuclease

gp19

gp35 gp37

gp6

gp44 gp22

gp49

gp23

gp29

gp47

gp42

gp21

-

-

Rnr

gp20

gp38

gp35

gp4

gp41

gp19

gp46

gp20

gp33

gp44

gp40

-

-

-

gp34

gp21

gp39

gp34

gp3

gp40

gp18

gp45

gp19

gp34

gp43

gp39

-

-

-

-

gp22

gp40 gp33

gp52

gp39 gp17

gp44

gp18

gp35

gp42

gp37

gp28-43

-

-

Portal

gp24

gp42 gp31

gp50

gp37 gp13

gp42

gp16

gp37

gp40

gp35

gp41

-

-

Scaffolding protein

gp25

gp43 gp30

gp49

gp36 gp11

gp41

gp15

gp38

gp38

gp33

gp38-39

-

-

Hli

gp26

gp44

gp29

gp48

gp35

gp9

gp40.5

gp14

-

gp38.5

-

-

-

-

PsbA

gp27

gp46

gp27

gp47

gp34

gp8

gp40

gp13

-

-

-

-

-

-

MCP

gp29

gp48 gp25

gp46

gp29

gp5

gp36

gp8

gp39

gp37

gp32

gp37

gp32

-

Tail tube A

gp30

gp50 gp23

gp45

gp28

gp2

gp35

gp7

gp40

gp36

gp31

gp35-36

-

-

Tail tube B

gp31

gp51 gp22

gp44

gp27

gp1

gp33-34

gp6

gp41

gp35

gp29

gp33-34

-

-

gp32

-

gp32

gp53

gp20

gp43

gp26

gp68

gp5

gp42

gp34

-

-

-

-

Internal core protein

gp35

gp56

gp17

gp39

gp19

gp65 gp27-26

gp2

gp45

gp31

gp26

-

-

-

Tail fiber

gp36

gp57

gp16

gp38-35

gp16

gp64

gp25

gp1

gp46

gp30-28

gp25

-

-

-

-

gp43

gp2

gp10

gp32

gp9

gp60

gp20

gp49

-

gp23

gp16

-

-

-

-

gp45

gp4

gp8

gp30

gp07

gp59

gp19

gp48

-

gp21

gp18

-

-

-

gp49

gp47

gp6

gp7

gp26

gp5

gp56

gp17

gp46

gp49

gp18

gp11

gp70

-

-

Terminase

gp51

gp10

gp3

gp21

gp1

gp51

gp13

gp48

gp60

gp14

gp9

gp54-55

gp36

gp40

TalC

gp54

gp12

gp1

gp19

gp2

gp50

gp14

gp46

-

-

-

-

-

-

P-SSP7

Orthologs present in cyanomyoviruses

Orthologs present in hosts

Table 4. Genes found in Island II (Fig 3, 5) – an island found in all but 3 of the cyanopodoviruses in Fig. 3 – showing whether they have orthologs in host genomes (Prochlorococcus and Synechococcus), and/or those of cyanomyoviruses.

P-SSP11

884   885   886  

gp40

gp27

+

+

gp40.5 gp26

+

+

887   888   889   890   891   892   893   894   895   896   897   898   899  

901   902   903   904  

P-SSP3

P-SSP2

P-SSP10

P-RSP5

P-HP1

P-GSP1

Phage

Cluster name1

Putative function

PsbA_COG

PsbA

gp47 gp34

gp9

gp13 gp46 gp27

Hli_COG

Hli

gp48 gp35

gp8

gp14 gp44 gp29

gp33

gp7

2

gp222_COG

gp222

gp12 gp45 gp28

gp39

+

-

gp30_COG

hypothetical protein

gp30 gp33

gp9

gp37

+

-

gp32_COG

hypothetical protein

gp32

gp11

gp38

-

+

gp47_COG

hypothetical protein

-

-

orphan

hypothetical protein

gp10

-

-

orphan

hypothetical protein

gp6

-

-

orphan

hypothetical protein

-

-

orphan

hypothetical protein

-

-

orphan

hypothetical protein

-

-

1 2

gp47 gp26

gp10 gp28 gp31

Cluster names refer to the putative function or a phage gene representing the cluster

gp222: conserved hypothetical protein

Table 5. The only genome differences between the most closely related cyanopodoviruses, P-SSP2 and P-SSP3, which were isolated from the same site, on the same host. The remainder of the proteins share ≥95% identity (see also Fig. 4). 900   Orthologous proteins P-SSP2

P-SSP3

% id (aa)

Putative function

gp14

gp55

76.4

Hypothetical protein

gp17

absent

-

Hypothetical protein

gp18

gp52

66.3

Hypothetical protein

gp32

gp39



Primase/helicase

gp57

gp16

77.7

Tail fiber

absent

gp15

-

Hypothetical protein

absent

gp14

-

Hypothetical protein

† Frameshifts -- High similarity between the nucleotide sequences

   

33

905   906   907   908  

Table 6. Cyanopodovirus genes shared with (A) picocyanobacterial hosts, Synechococcus and Prochlorococcus (“phage/host share genes”), (B) cyanomyoviruses (“podo/myo shared genes”), or (C) both (phage/host and podo/myo shared genes”). Single-strand binding protein (SSB, bolded) is the only core gene in this set.

A

B

C

909   910   911   912   913   914   915   916   917   918   919   920   921   922   923   924   925  

Syn5

P-SSP9

P-SSP11

P-SSP3

P-SSP2

P-SSP10

P-RSP5

P-RSP2

P-HP1

Putative function DNA primase RNA polymerase gp 0.72 SSB3 Class II Unknown Class III Unknown Unknown Thymidylate synthase

P-GSP1

Class1 Class I

P-SSP7

Phage

gp13 gp11 gp51 gp28 gp29 gp29 gp42 gp54 gp11 gp26 gp44 gp14 gp10 gp50 gp47 gp26 gp28 gp30 gp41 gp53 gp32 gp11 gp38 gp41 gp65 gp48 gp40 -

gp10 gp6 gp15 gp4 gp1 gp21 gp61

Class I

Endonuclease Unknown Unknown Class II gp2224 Unknown Class III Unknown Unknown Unknown Endonuclease Unknown Unknown Unknown

gp46 gp33 gp7 gp12 gp45 gp28 gp30 gp33 gp9 gp43 gp32 gp9 gp60 gp49 gp2 gp20 gp30 gp8 gp29 gp43 gp43 gp49 -

gp48 gp25 gp23 gp61

Class II

gp26 gp27 gp49 gp54

Hli PsbA Class III HNH endonuclease Class IIb TalC

gp48 gp35 gp8 gp47 gp34 gp9 gp25 gp3 gp10 gp19 gp2 gp50

gp39 gp37 -

gp14 gp44 gp29 gp40.5 gp38.5 gp13 gp46 gp27 gp40 gp44 gp8 gp5 gp15 gp43 gp12 gp1 gp14 -

-

1

Class of genes as defined for P-SSP7 by Lindell et al. (2007), according to the timing of their transcription gp 0.7: transcriptional regulator 3 Core gene, SSB: Single Strand Binding protein. 4 gp222: conserved hypothetical protein. 2

   

34  

926   927  

Supplemental table 1. Cyanophage and picocyanobacterial genomes used for the protein clustering analysis. Bacterial or viral strain

Genome size (kb)

NCBI accession number

Cyanophage S-TIM5 Syn33 syn9 MED4-213 P-HM1 P-HM2 P-RSM1 P-RSM3

152.3 174.3 177.3 181.0 181.0 183.8 177.2 178.8

JQ245707.1 GU071108.1 DQ149023.2

P-RSM4 P-SSM2 P-SSM3 P-SSM4 P-SSM5 P-SSM7 S-PM2 P-RSM6 S-RSM4 S-SM1 S-SM2 S-SSM4 S-SSM5 S-SSM7 S-ShM2 Syn1 Syn10 Syn19 Syn2 Syn30 P60 P-SSP7 P-RSP5

176.4 252.4 179.1 178.2 252.0 182.2 196.3 192.5 194.5 174.1 190.8 182.8 176.2 232.9 179.6 191.2 177.1 175.2 175.6 178.8 47.9 45.0 47.7

GU071099.1 GU071092.1

P-SSP2 P-SSP9 P-SSP10 P-GSP1 P-HP1 P-SSP11 Syn5 P-RSP2

   

GU071101.1 GU075905.1

CAMERA accession number

BROADPHAGEGENOMES_SMPL_SYN33_G1163 CAM_SMPL_001226 BROADPHAGEGENOMES_SMPL_P-HM1_G1154 BROADPHAGEGENOMES_SMPL_P-HM2_G1155 CAM_SMPL_001227 CAM_SMPL_001229

AF338467.1 AY939843.2 GU071102.1

BROADPHAGEGENOMES_SMPL_P-RSM4_G1161 CAM_SMPL_000950 CAM_SMPL_000897 CAM_SMPL_000949 BROADPHAGEGENOMES_SMPL_P-SSM7_G1169 CAM_SMPL_001230 BROADPHAGEGENOMES_SMPL_S-SM1_G1061 BROADPHAGEGENOMES_SMPL_S-SM2_G1159 CAM_SMPL_000897 BROADPHAGEGENOMES_SMPL_S-SSM5_G1166 BROADPHAGEGENOMES_SMPL_S-SSM7_G1167 BROADPHAGEGENOMES_SMPL_S-SHM2_G1164 BROADPHAGEGENOMES_SMPL_SYN1_G1160 CAM_SMPL_001202 BROADPHAGEGENOMES_SMPL_SYN19_G1165 CAM_SMPL_001201 CAM_SMPL_001200 BROADPHAGEGENOMES_SMPL_NATL1A-7_G1172

45.9 47.0 47.3 44.9

GU071107.1 GU071104.1

CAM_SMPL_000899

47.5 47.0 46.2 42.3

GU071104.1

AY940168.2 GU071103.1 AJ630128.1 FM207411.1 GU071094.1 GU071095.1 GU071097.1 GU071098.1 GU071096.1 GU071105.1 GU071106.1

CAM_SMPL_000948

EF372997.1

BROADPHAGEGENOMES_SMPL_NATL2A-133_G1171 CAM_SMPL_000947 CAM_SMPL_000945

35  

P-SSP3 MED4-184 MED4-117 P-SS2

47.1 38.3 38.8 107.5

Cyanobacteria Prochlo. MED4 Prochlo. MIT9313 Prochlo. MIT9303 Prochlo. NATL1A Prochlo. NATL2A Prochlo. AS9601 Prochlo. MIT9515 Prochlo. MIT9215 Prochlo. MIT9211 Prochlo. MIT9312 Prochlo. SS120 Prochlo. MIT9301 Synecho. CC9311 Synecho. CC9605 Synecho. CC9902 Synecho. WH8102 Synecho. WH7803 Synecho. RCC307 Synecho. WH7805 Prochlo. MIT9202 Synecho. BL107 Synecho. RS9917 Synecho. RS9916 Synecho. WH5701

1657 2410 2682 1864 1842 1669 1704 1738 1688 1709 1751 1641 2606 2510 2234 2434 2366 2224 2620 1691 2283 2579 2664 3043

GQ334450.1

BX548174 BX548175 CP000554 CP000553 CP000095 CP000551 CP000552 CP000825 CP000878 CP000111 AE017126 CP000576 CP000435 CP000110 CP000097 BX548020 CT971583 CT978603 AAOK00000000 AATZ00000000 AANP00000000 AAUA00000000

CAM_SMPL_000946 CAM_SMPL_001191 CAM_SMPL_001190 -

MF_SMPL_P9202 MF_SMPL_WH5701

928   929   930   931   932   933   934   935   936   937   938   939   940   941   942  

   

36  

943   944  

Supplemental table 2. Cyanophage reference genomes used for the metagenomic read recruitment. Phage family Myo.

Genome size (kb)

NCBI accession number

152.3

JQ245707.1

-

Syn33

Myo.

174.3

GU071108.1

BROADPHAGEGENOMES_SMPL_SYN33_G1163

syn9

Myo.

177.3

DQ149023.2

-

MED4-213

Myo.

181.0

P-HM1

Myo.

181.0

GU071101.1

BROADPHAGEGENOMES_SMPL_P-HM1_G1154

P-HM2

Myo.

183.8

GU075905.1

BROADPHAGEGENOMES_SMPL_P-HM2_G1155

P-RSM1

Myo.

177.2

P-RSM3

Myo.

178.8

P-RSM4

Myo.

176.4

GU071099.1

BROADPHAGEGENOMES_SMPL_P-RSM4_G1161

P-SSM2

Myo.

252.4

GU071092.1

-

P-SSM3

Myo.

179.1

P-SSM4

Myo.

178.2

P-SSM5

Myo.

252.0

P-SSM7

Myo.

182.2

GU071103.1

BROADPHAGEGENOMES_SMPL_P-SSM7_G1169

S-PM2

Myo.

196.3

AJ630128.1

-

P-RSM6

Myo.

192.5

S-RSM4

Myo.

194.5

FM207411.1

-

S-SM1

Myo.

174.1

GU071094.1

BROADPHAGEGENOMES_SMPL_S-SM1_G1061

S-SM2

Myo.

190.8

GU071095.1

BROADPHAGEGENOMES_SMPL_S-SM2_G1159

S-SSM4

Myo.

182.8

S-SSM5

Myo.

176.2

GU071097.1

BROADPHAGEGENOMES_SMPL_S-SSM5_G1166

S-SSM7

Myo.

232.9

GU071098.1

BROADPHAGEGENOMES_SMPL_S-SSM7_G1167

S-ShM2

Myo.

179.6

GU071096.1

BROADPHAGEGENOMES_SMPL_S-SHM2_G1164

Syn1

Myo.

191.2

GU071105.1

BROADPHAGEGENOMES_SMPL_SYN1_G1160

Syn10

Myo.

177.1

Syn19

Myo.

175.2

Syn2

Myo.

175.6

Syn30

Myo.

178.8

P60

Podo.

47.9

AF338467.1

-

P-SSP7

Podo.

45.0

AY939843.2

-

P-RSP5

Podo.

47.7

GU071102.1

BROADPHAGEGENOMES_SMPL_NATL1A-7_G1172

P-SSP2

Podo.

45.9

GU071107.1

-

P-SSP9

Podo.

47.0

GU071104.1

CAM_SMPL_000899

P-SSP10

Podo.

47.3

-

P-GSP1

Podo.

44.9

CAM_SMPL_000948

P-HP1

Podo.

47.5

P-SSP11

Podo.

47.0

Syn5

Podo.

46.2

Phage S-TIM5

   

CAMERA sample accession number

CAM_SMPL_001226

CAM_SMPL_001227 CAM_SMPL_001229

CAM_SMPL_000950 AY940168.2

CAM_SMPL_000897 CAM_SMPL_000949

CAM_SMPL_001230

CAM_SMPL_000897

CAM_SMPL_001202 GU071106.1

BROADPHAGEGENOMES_SMPL_SYN19_G1165 CAM_SMPL_001201 CAM_SMPL_001200

GU071104.1

BROADPHAGEGENOMES_SMPL_NATL2A-133_G1171 CAM_SMPL_000947

EF372997.1

-

37  

P-RSP2

Podo.

42.3

CAM_SMPL_000945

P-SSP3

Podo.

47.1

CAM_SMPL_000946

MED4-184

Sipho.

38.3

CAM_SMPL_001191

MED4-117

Sipho.

38.8

CAM_SMPL_001190

P-SS2

Sipho.

107.5

GQ334450.1

-

S-CBS1

Sipho.

53.7

HM480106.1

-

S-CBS2

Sipho.

73.5

GU936714.1

-

S-CBS3

Sipho.

28.0

GU936715.1

-

S-CBS4

Sipho.

62.9

HQ698895.1

-

945   946   947   948   949  

   

38  

950   951   952  

953   954   955   956   957   958   959   960   961   962   963   964   965   966   967   968   969   970   971   972   973  

FIGURES

Figure 1. Maximum likelihood, circular phylogenetic tree of phage DNA polymerase sequences retrieved from ACLAME database (ACLAME MGEs. Version 0.4 - family_vir_ 14 (Leplae et al., 2009)). The bar represents 1 amino acid substitution per site and branches with a bootstrap value greater than 80% are indicated by a black dot. Green dots indicate marine cyanopodoviruses while the three blue dots mark DNA polymerase genes from the two freshwater cyanopodoviruses, one of which encodes DNA polymerase with two genes. The outer, middle and inner rings respectively indicate the phage families, subfamilies and genus when available in NCBI taxonomy database (http://www.ncbi.nlm.nih.gov/taxonomy).

   

39  

A

B

Prochlorococcus  cyanopodoviruses

 Prochlorococcus  cyanopodoviruses  +  Syn5 200

250

100

50

0

4

2

6

8

125

120 100 80 60

10 2

11 4

3

4

5

6

k  genomes

4 8

0

10

100 50 0

974   975   976   977   978   979   980   981   982   983  

2

4

6

8

10

12

Number  of  genomes

6

8

210

350

150

100

300 250 200 150 100

50 14 17 0

2

50 7 2 2 2 4 3 3 9 12 4 6 8 10 12

k  genomes

50

13 0

10

400

200

100

19 8 4

2

6

2 4 6

7 8

k  genomes

1

10

17

10

Prochlorococcus  cyanopodoviruses D +  Syn5  +  P60  +  Freshwater  cyanopodoviruses

Number of genes

Number  of  genes  in  k  genomes

Number  of  genes

150

4

2

154 150

Number  of  genomes

250 300

200

100

8

Prochlorococcus  cyanopodoviruses  +  Syn5  +  P60

250

150

50

19

19

20

Number  of  genomes

C

200

40

0

10

Number  of  genes  in  k  genomes

150

140

Number of genes in k genomes

Number  of  genes

200

250

Number  of  genes

Number  of  genes  in  k  genomes

160

0

2

4

6

8 10 12 14

Number of genomes

300 287 250 200 150 100 50 1417 0

2

7 2 2 2 4 3 2 8 9 1 3 6 8 10 12 14

4

k genomes

Figure 2. Core and pan-genome analysis using different sets of phage genomes in the analysis, as indicated by the headers in A-D. Left panel in each pair: Number of total genes in the core- (circles) and pan- (triangles) genomes as a function of the number of genomes included in the analysis. The core genome is the set of genes shared by all the genomes included in the analyzed subset, while the pan-genome is the total number of unique genes found in the same subset. All possible combinations of genomes were analyzed; the line is drawn through the average. Right panel in each pair: The frequency distribution of genes among the genomes, showing that genes found in only one (k=1) of the genomes are the most common (See note added in proofs for panel C)

   

40  

0.3 P60 47872 bp

    65 67 66 68 70 71 69 72 73 74 75 76 77 78 79 80

55 56 57 58 59 60 62 61 63 64

*

61

60

52

47 (holin) 49

51

55 56 57 58 59

54

53

48 50

46 (Tail fiber)

45 (ICP) (ICP)

44

42 43

41 (Tail tube B)

40 (Tail tube A)

39 (MCP)

38 (Scaff.)

(HNH)

50 (TalC) 49 48

52 51

53

62 6160 59 (gp165) 58 57 56 5455

63 (Tail fiber)

64 (Tail fiber)

65 (ICP)

66

68 (gp42) 67

1 (Tail tube B)

12 11 (Scaff.) 10 9 (PsbA) 8 (Hli) 7 (gp222) 6 tRNA 5 (MCP) 3 4 2 (Tail tube A)

18 (gp34) 16 17 1415 13 (Portal)

22 (Exonuc.) 21 20 (gp32) 19 (Rnr)

23 (DNA pol)

21 20 (gp42) 19

34 (ICP)

52 53 54 (TalC)

51

37 38 39 40 41 42 43 44 45 46 47 4849 (HNH) 50

36 (Tail fiber)

35 (ICP)

22 (Tail tube B)

32 33

2 1 (TalC)

3

1211 10 9 8 6 7 (gp49) 5 (HNH) 4

15 14 13

16 (Tail fiber)

17 (ICP)

18 (ICP)

24 23 (Tail tube A)

30 (Scaff.) 29 (Hli) 28 27 (PsbA) 26 25 (MCP)

34 (gp34) 3332 31 (Portal)

37 (Exonuc.) 36 35 (Rnr)

38 (DNA pol.)

39 (Prim./Hel.)

41 (SSB) 40 (Endonuc.)

42 (RNA pol.)

31 (Tail tube B)

30 (Tail tube A)

25 (Scaff.) 26 ((Hli) 27 (PsbA) 28 29 (MCP)

21 (gp34) 22 23 24 (Portal)

20 (Rnr)

18 19 (Exonuc.)

17 (DNA pol.)

14 (SSB) 15 (Endonuc.) 16 (Prim./Hel.)

13 (RNA pol.)

11 12 (TalC)

10

59 1 2 3 4 56 8 7 9

58

57

56 (ICP)

55 (ICP)

52 53 (gp42) 54

51 (Tail tube B)

49 50 (Tail tube A)

48 (MCP)

43 (Scaff.) 44 (Hli) 45 46 (PsbA) 47

39 (gp34) 4041 42 (Portal)

35 (Exonuc.) 3637 38 (Rnr)

33 34 (DNA pol)

30 (SSB) 31 (Endonuc.) 32 (Prim./Hel.)

29 (RNA pol.)

28

20 19 (TalC)

21

33 31 32 30 29 28 27 26 25 (HNH) 24 23 22

34

35 (T4 like PTF)

36

37

38 (Tail fiber)

39 (ICP)

43 (gp42) 42 41 40

44 (Tail tube B)

45 (Tail tube A)

46 (MCP)

49 (Scaff.) 48 (Hli) 47 (PsbA)

50 (Portal)

3 (gp34) 2 1 53 51 52 (TMP)

6 (Exonuc.) 5 4 (Rnr)

7 (DNA pol)

8 (Prim/Hel)

10 (SSB) 9 (Endonuc.)

8 (MCP)

40

41

42

48 (gp165) 4647 (gp48) 44 45 (gp50) 43 (TalC)

51 50 49

52 (PNAP)

1 (Tail fiber)

2 (IPC)

3

5 (gp42) 4

6 (Tail tube B)

7 (Tail tube A)

15 (Scaff.) 14 (Hli) 13 (PsbA) 12 (gp222) 11 10 9

16 (Portal)

19 (gp34) 18 17 (TMP)

20 (Rnr)

23 (Exonuc.) 2221 (gp32)

24 (DNA pol.)

25 (Prim./Hel.)

28 (SSB) 27 26 (Endonuc.)

29 (RNA pol.)

39 38 37 36 35 (gp9) 3433 32 31 30

12

13

24 23 22 21 20 19 (gp165) 18 (gp48) 17 15 16(gp50) 14 (TalC)

25 (Tail fiber)

26 (ICP)

27 (ICP)

18

(gp50) (Endonuc.)

66

1

13 12 11 10 9 8 7 (gp165) 6 (gp48)5 4 3 2

14 (TFP)

15

16 (Tail fiber)

17

19 (ICP)

22 21 20

23

30 29 28

26 25 24

27 (Tail tube B)

28 (Tail tube A)

33 (gp222) 3231 30 29 (MCP)

36 (Scaff.) 35 (Hli) 34 (PsbA)

40 (gp34) 39 38 (TMP) 37 (Portal)

45 44 (Exonuc.) 43 42 (gp32) 41 (Rnr)

46 (DNA pol)

47

48 (Prim./Hel.)

50 (SSB) 48 (Endonuc.)

51 (RNA pol.)

64 63 (gp12) 62 60 61 59 (gp9) 58 57 56 55 54 52 53

32 (gp42) 31

33 (Tail tube B)

35 (Tail tube A) 34 (Tail tube B)

41 (Scaff.) 40.5 (Hli) 40 (PsbA) 39 ((gp222) 38 37 36 (MCP)

42 (Portal)

45 (gp34) 44 43 (TMP)

49 (Exonuc.) 48 47 (gp32) 46 (Rnr)

50 (DNA pol)

53 (SSB) 52 (Endonuc.) 51 (Prim./Hel.)

54 (RNA pol.)

11 10 9 8 (gp12) 7 6 (gp9) 5 4 3 2 1

65

(TalC)

13

28 (Tail fber) 27 26 25 24 23 22 21 20 19 18 17 (gp50) 16 15 14

29

30 (Tail fiber)

31 (ICP)

32 (IVP)

34 (PKS) 33 (IVP)

35 (Tail tube B)

36 (Tail tube A)

39 38 (Scaff.) 38.5 (Hli) 37 (MCP)

43 (gp34) 41 42 (gp35) 40 (Portal)

48 (Endonuc.) 47 (Exonuc.) 46 45 44 (Rnr)

49 (DNA pol.)

Class Class III III

54

53

42 43 44 45 46 47 48 49 50 51 52

41 (Portal)

38 39 (Scaff.) 40

35 (Tail tube A) 36 (Tail tube A) 37 (MCP)

34 (Tail tube B)

27 28 29 30 31 32 33

37 (Portal)

34 (gp34) 35 36

33 (Rnr)

28 29 (Exonuc.) 30 31 32

27 (DNA pol)

25 26

27 26 (SSB) 25 (Endonuc.) 24 (Prim./Hel.)

Class C ass IIII Class

26

21 (Exonuc.) 22 23 24 25

20 (DNA pol)

19

18 (Prim./Hel.)

13 14 (Endonuc.) 16 15 17

24 (Prim./Hel.)

2 1 (SSB) 53 52 (Endonuc.) 51 50 (Prim./Hel.)

28 (RNA pol.)

11 (RNA pol.)

12

18 17 16 15 13 14

P-SSP11 47039 bp

12

17 (Int) 19 20 18 21 (SSB) 22 23 (Endonuc.)

5 4 (MarR) 3 (Int)

Class Class II

16

15 (RNA pol.)

6 (RNA pol.)

13 14 15 16 1718 20 19 21 2223 2526 24 27 (Int)

P-HP1 47536 bp

10 11

Syn5 46214 bp 55 54 5352 5150 49 48 4647 4544 (gp0.7) 43 (Int)

3 2 1

7 6 5 4

8

18 (gp165) 17 16 1514 13 12 (gp48) 11 10 (Endonuc.) 9

2120 19

22

24 23

25 (Tail fiber)

26 (ICP)

27

28

29 (Tail tube B)

31 (Tail tube A) 30

32 (MCP)

34 33 (Scaff.)

35 (Portal)

38 37 (TMP) 36

39 (gp 34)

41 (gp32) 40 (Rnr)

43 (Endonuc.) 42 (Exonuc.)

44 (DNA pol)

45 (Prim/Hel)

48 47 (SSB) 46 (Endonuc.)

P-RSP2 42257 bp

8 9

P-SSP9 46997 bp

7

MPP-A

6 (RNA pol.)

P-RSP5 47741 bp

7 10 13

P-SSP7 44970 bp 56

P-SSP3 46198 bp

1 2 3 4 5 6 7 8 9 10 11 (gp0.7) 12 (Int)

P-SSP2 45890 bp

47 46 4445 43 42 41 40 383937 36 35 34 323331 2930

P-GSP1 44945 bp

10 (DNA Prim.) 9 7 8

MPP-B2

11 12

P-SSP10 47325 bp

2 4 5 6 8 9 11 12 14

MPP-B1

1 3

984   985   986   987   988   989   990   991   992   993   994   995   996   997   998   999   1000   1001   1002   1003   1004   1005   1006   1007   1008   1009   1010   1011   1012   1013   1014   1015   1016   1017   1018   1019   1020   1021   1022   1023   1024   1025   1026   1027   1028   1029   1030   1031   1032   1033   1034   1 2 3 4 5

Island II

(endonuc.)

*

Figure 3. Alignment of the genomes of 12 cyanopodoviruses. Orthologous proteins represented in color other than white share 60% amino acid identity or more, while those shown in white do not. The core proteins shared by all cyanopodoviruses are linked by blue shading and genomic Island II (see Fig. 5) is highlighted by pink shading. Cyanopodovirus/host shared proteins and cyanopodovirus/cyanomyovirus shared proteins are designated by small diamonds and triangles, respectively (see also Fig. 5 & Table 6), and each different cluster is represented by a different color except for singletons that are represented in white. The phylogenetic tree on the left was generated from an alignment of the concatenated core protein sequences using a maximum likelihood method. Branches with a bootstrap value greater than 80% are indicated by a black dot. The phage genomes were classified into three groups based on the concatenated core gene phylogeny of the 12 cyanopodoviruses (Boxes – MPP-A, MPP-B1 and MPP-B2 (MPP: Marine picocyanopodovirus) ); PRSP2 is an outlier based on this analysis. The bar represents 0.3 amino acid substitutions per site. (P60 genomes – See note added in proofs)

41  

1036   1037  

   

 

Figure 4. Alignment of the genomes of phage P-SSP2 and P-SSP3. Each gene product was aligned with its homolog and the percent identity was calculated using the length of the longest protein as the denominator. The colors indicate the percent identity between proteins (from 80% to 100%) while the red shading and thin red lines indicate the zone of homology between the DNA sequences where bit score is higher than 40. Proteins in white share less than 80% identity and are reported in Table 5.

1035  

42  

1038   1039   1040   1041   1042   1043   1044   1045   1046   1047   1048   1049   1050   1051   1052   1053   1054  

Figure 5. A) Position of cyanopodovirus/host shared genes (diamonds) and cyanopodovirus/cyanomyovirus shared genes (triangles) in cyanopodovirus genomes (symbols are positioned in the middle of the genes). The position of the genes is relative to the position (marked as 0) of the ribonucleotide reductase genes (rnr). When a diamond and a triangle co-localize, the cyanopodovirus gene is shared by both host and cyanomyovirus genomes. Orthologs determined using OrthoMCL are represented in the same color. Singletons are shown in white. B) Position of flexible genes (Fig. 2B) in the genomes, according to their frequency distribution (see Fig. 2) Red diamonds indicate genes shared by 1-3 genomes; green diamonds shared by 4-6 genomes; and blue diamonds shared by 7-10 genomes. The histogram on top indicates the relative counts of genes in the various categories present in overlapping sliding windows of 500 bp. The grey shading indicates apparent genome islands. Island I is identified primarily by the set of the most hypervariable genes, occurring in only a few genomes (red diamonds, panel B), while the other two islands are evident in both panels A and B. The orange shading marks the region of the genome involved in DNA replication and transcription, which is not considered a genomic island as these genes are shared by all branches of the tree of life. C) Relative counts of core genes present in overlapping sliding windows of 500 bp.

   

43  

Viral  metagenomic datasets

Bacterial  metagenomic datasets

100%   90%  

Cyanosiphovirus  

80%  

Cyanomyovirus  

70%  

Cyanopodovirus  

60%   50%   40%   30%   20%   10%  

1055   1056   1057   1058   1059   1060   1061   1062   1063   1064  

BATS216

HOT186

HOT179

MedDCM

GOS

Marine  Virome

HOT212

HOT212 (3  cyanopodovirus  genomes)

0%  

12  cyanopodovirus  genomes

Figure 6. Proportion of reads recruited from different metagenomic datasets by different families of cyanophage. The number of recruited reads was normalized to the average size of the genome of each phage family. “Bacterial metagenomes” refers to viral sequences found in samples that were designed to collect the bacterial fraction; viruses are by-catch. “Viral metagenomes” refers to samples that were collected specifically to capture the viral fraction. For the HOT212 sample, we compare the recruitment proportions obtainedusing the cyanopodovirus genomes extant before this study (3 phage: P-SSP7, Syn5 and P60), and those obtainedusing all marine cyanopodoviruses.

   

44  

A

3.5e-07 3.0e-07

Cyanomyovirus Normalized recruited read counts

2.5e-07 2.0e-07 1.5e-07 1.0e-07

C D

0.0e+00 1.0e-07 5.0e-08 0.0e+00

Prochlorococcus Normalized recruited read counts

GS051 G G GS047 G GS037 G GS035 G GS031 GS030 G GS036 G GS032 G GS029 G GS033 G GS027 G GS028 G GS026 G GS034 G GS025 G GS023 G GS022 G GS021 G GS020 G GS019 G GS018 G GS017 G GS016 G GS015 G GS014 G GS013 G GS012 G GS011 G GS010 G GS009 G GS008 G GS007 G GS006 G GS005 G GS004 G GS003 G GS002 G GS001a GS S GS001b GS S GS001c GS S GS000a GS S GS000b GS S GS000d GS S GS000c GS S GS123 G GS122a GS S GS122b GS S GS121 G GS120 G GS119 G GS148 G GS149 G GS117a GS S GS117b GS S GS116 G GS115 G GS114 G GS113 G GS112a GS S GS112b GS S GS111 G GS110a GS S GS110b GS S GS109 G GS108a GS S GS108b GS S

B

Cyanopodovirus Normalized recruited read counts

5.0e-08

50

0

GS117a GS S1 S117a

GS047 GS05 S051 GS051

1.5

GS035

GS116 GS115 5

GS114

GS113 GS112a GS111 GS110a G GS109 9 GS108a G 10

GS026

1.0

0.5

GS119

GS030 GS036 GS029

0.0

GS031 -0.5

-1.0

-91.5

1065   1066   1067   1068   1069   1070   1071   1072  

−50

GS120 G GS G GS121

GS034

GS122a G G GS123

GS032

-91.0

−100

GS027 GS028 GS033

-90.5

-90.0

-89.5

0

100

Figure 7. Normalized recruited read counts corresponding to (A) cyanomyoviruses and (B) cyanopodoviruses in the GOS database. Each bar represents a sampling site. The number of reads was normalized to the average size of the genome of each phage family and to the total number of sequencing reads at each of the GOS sites. (C) The relative abundance of Prochlorococcus is shown as a series of dots for which the size is proportional to the counts of normalized recruited reads. (D) Map illustrating the position of the GOS sites.

   

45  

WH7805  12138

15

31

02 1  

P9313  04931

WH7805  07341

SynRCC307  1441

Sy nec hoc occu s

 1 44 0 18 S9 7   3 2 90 00 2 9  22 S9 03 90 2  1 1 S9 97 90 91 2  1 97 BL1 61 07  0 816 BL1 9 07  0 935 1 BL1 07  0 9336 BL10 7  081 79 S8102   21261 S8102  105 91

07

 2

07

30

CC 3

S8102  23831

S9605  03131

nR

CC

CC 3

nR

RS9917  10626 SynW H7803  0784 WH7 805  0 0300 BL1 07  1 426 S99 0 02  1 014 S96 1 05  1 121 S8 10 1 2  1 S9 64 31 21 1  1 W 92 H5 61 70 66 1  1 60 26 RS  53 9  C  PR 917  1 G 29  0 75 00 95

0.1

S9605  05371

Sy

Sy

RS9916  2789 9

5 20 2 9   11 n1 00 Sy TG     CP  0  66 S 6 30 21 6í  gp V9 81 BS 001   UG  CP  60 65   660  220 Syn1 3  213 Syn3  66K0 CPLG  00104

cu oc oc ch ne Sy

WH7805  00325 315 WH7805  00 5 7  0802 RS991 498 17  03 RS99 4917 916  3 RS9 61  395 916 9 6 S R 950  3 6 20 991 11 RS 1  0 0 19 57 07 WH 1  1 0 26 57 55 H 1   W 1 61 70 36 1  0 H5 1 W 23 31 1 19 S9 1   39 1 3 24 9   S 11 3 S9

s

1073   1074   1075   1076   1077   1078   1079  

CPXG  00013

SynW H78 03  0 366 SynW H780 3  208 4 SynWH 7803  0 790 WH7805  1212 8

nR

Syn.  cyanomyovirus

s

25

 0

1 46 21 25 21 P9 0  0 51 2 1 32 S 1   S 1 1 TL 72 NA  15 1 TL 031 NA  03 1 L T 761 NA  12 TL2 NA 131 2  15 L T NA 2831 L2  0 T A N  18681 P9303 091 P9303  04 P9313  19281

5 P9

yovir u

Sy

Synechococcus

3660 PRTG  0 0160 3660  P930 1  02 451 P92 02  2 481 P92 02  1 281 A9 1 601  12 A9 521 60 1  0 P9 24 21 41 5  1 P9 30 31 11 2  1 P9 23 3 30 12 41  0 23 P9 (' 41  51  5    02  55 1

Pr o.   Cy an o Pro.  cyanopodovirus

CY IG PR OG  000 09 CY  0 00 PG 13  00 PR 03 UG 4  00 CY 04 OG 0  00 04 3

us cc co ro hlo oc Pr

Pro.  cyanomyovirus

35606

Prochlorococcus

3+0  3+0

Legend

 &30* 8  0010 PRAG  0 366 6 010 G  0 CPQ 185  00 XG 01 1 CY  00 PG  CP  7 60 02 66 00   46 SG 0 0 7 PR  0 02 LG 7   7 CY SP 004  0 G Q PR

nom

s iru ov d po

PS

 Cya

s

u vir yo om an Cy

n.   Sy

Syn.

yovirus Cyanom Pro.  

s iru ov y om yan .  C n y S

Figure S1. Maximum likelihood, circular phylogenetic tree of PsbA from cyanophage and marine picocyanobacteria ( (Kelly et al., 2012) - http://proportal.mit.edu/ ). The bar represents 0.1 amino acid substitutions per site and branches with a bootstrap value greater than 80% are indicated by a black dot. The ring indicates the origin of PsbA sequences.

   

46  

P PS SS SM M2   SS 5  (P (PS KLO XV M R SM WR SR 2 T UULG S Sy  (S G 2 7K HUP XV' 666 M4 n1  ( SM  00  244 id 1  ( S Fla OLQX atus RWRJ 60 0 SR yn 2  2 97) ) voba  So  D 6 S 1 5 cter P$V liba PD  66 M4  200 8) Nitr UL ium WU 0  0  osop  john $7& cter  u WLPD   46) ) &RO um OLQVH soni & sita     ilu  tu  OODD  HURI s  mariti ae  UW  s  Elli   DFLH  QV$ mus  S 101  (1  n607  2FH DQLFD 6 7&& CM  46 XOLV  1  (1 2997 Deino Chlor VS   6152 62 cocc obium us  ra +7&&  7809 )  phae diodu  obac rans   ) teroid es  DSM  R1  (16  $QDHUR  266  ( 157948  6DOLQL P\[RED 2266 7) EDFWHU FWHUGH 95 UX 49 KDORJHQ EHU'6 1) DQV& $QDSODV 3í& 0 PDSKDJR  F\WRSKLOX P+=  Ehrlichia  ru  minantium  s tr.  Gardel  (5 8416808) 6SKLQJRS\[LVDOD VNHQVLV5%   1HRULFNHWWVLDVHQQHWVXVWU0L\D\DPD   URS

nd

3LF

Ca

RWX

PE

GLX

VWUL

&OR

3HFWREDFWHULXPDWURVHSWLFXP6&5, 

an Cy

om yo v

               0   5)   5    259    QVLV  0      217 UL- OLH   \OR SVD L50  ps  (  Q  US X ce  WH WHU MX 0 rin 97 60 DF DF UMH UL5 ya  p D2 ' RE E FWH OD OLF S\OR ED FWHU mbla  ME HQV H is LJ OR + DP S\ ED  Tre ns 1) OH[ & DP S\OR tus nde UVD 772 978)  & DP ida  bla DFWH 4952 264   2 & and kea ORE ta  (    (8  C ine RKD bra ridis 5)    Re URP a  gla igrovi UHXV & 6076  Ellin &K ndid on  n VDX $7&   tus   Ca traod RFFX UHYLV sita V$ Te SK\ORF LOOXVE acter  u XVWULV%L 6WD REDF s  Solib DVSDO 2) /DFW didatu RPRQ  2256 Can RSVHXGnetii  (817  6    5KRG lla  bur   V6  :+ Coxie FKRFRFFX +  FXV: 3 6\QH  FKRFRF FXV0,7  6\QH  3 RURFRF V0,7 3URFKO ) URFRFFX H702  3URFKOR  JDOHWWLQJD 0  7KHUPRWR DJDODFWLDH1( 6WUHSWRFRFFXV 7) m  (Q88UA Lactobacillus  plantaru Enterococcus  faecalis  (81722290) Enterococcus  faecium  DO  (68196462) Leptospirillum  rubarum Candidatus  P elagibacter  ub ique  HTCC10 &DQGLGDWX 02  (91762288) :LJJOHV V&DUVRQHOODUXGGLL ZRUWKLD  JORVVLQ )UDQFL LGLDHQG VHOODWXOD 0DJQ RV\PELRQW UHQVLV RI*ORVV VXEVS +DKH DSRUWKHRU LQDEUHY KRODUF LSDOSLV  WLFD)7 Reine OODFKHMX \]DHí 1) )  4. Yers kea  bla HQVLV.&7 í  *,  & $OWH inia  pes ndensis   Bau URPRQ tis  Pes  MED29    7 &D man DGDOHV toides 0D QGLGD nia  cica EDFWH  F  (145 ULXP 2121 delli $OF ULQRE WXV% 7 20) nico ORFK 5 DQLY DFW la  s :í  Re KRGR RUD HUDT PDQ tr.   Hc   F ine VS [ER XDH QLDI  (Hom ; ulvim kea LULOOX UNX ROHL9 ORULG alod DQX A DQWK a  bla PU PH 7 isca V    7 gro R rina nd XEUX QVLV  coa  3D HWUD bac PRQ  pe ens P$ 6.  gula     UD K\ te DV lag is  M 7& ta)  (   P P riu FD i  H 9467 ED &  HF HQ m P T 6584 LX DW  tu SH CC 297   P K m 25 ) V  WU  WH HUP efa LV 06  S  ( WUD c  XUH RSK iens YF 1147  OLD LOD  str DPS 065    .  C HV 88 WULV ) 5   VWU   8  (1 5    89    05  21  )   

cus hococ

}  Synec coccus }  Prochloro

              6  0 $    3& WU   - VV    P QX      Ií QWX DUL    ID P 5Z 0    ( 1 3  LQ V   8:  LD FFX S 01    DQ FR UV CY DQV  & SKLOD 1)  & ER KP UR FWH p  C J  RH 0459 LV KOR ED  s HOH XP $7  (279 /H URF KUR ece GLWLV SRQLF DQQLL LDDP   ae)   3 V\F oth DE MD XP P\G    taci  í 3 an RUK PD ED KOD  pis ) P Cy DHQ WRVR FWHU WRF  HX H( zongia & KLV WRED V3UR FRLG HQLD  ai  6F LQH DWX PGLV HUHLV .  Bp  (B  í $F QGLG HOLX EDFW la  str )  $5  5288 co &D RVW KUR $+  50 di W\ S LV (154 í 'LF UPLQH a  aphi KRPDW 149   )Z 9H hner LDWUDF C  29 UVS Buc am\G REDFWH us  ATC    Chl URP\[ cus  gnav  QG  $QDH inococ $.  WU/DQJHOD )V Rum VKLORQLL LQXP  7796) 9LEULR LGLXPERWXO    (15450 CC  17982 &ORVWU GLXPGLIILFLOH icus  AT &ORVWUL yces  odontolyt 8052 ckii  NCIMB   Actinom m  beijerin ) Clostridiu /%í %DFLOOXVVS155  ULDH6G  6KLJHOODG\VHQWH $FWLQREDFLOOXVVXFFLQRJHQHV= 

s iru ov od nop Cya

1080   1081   1082   1083   1084   1085  

6\Q   Syn9  (BSV  Syn1 0  (CP 9  gp199 Syn2  (C UG  00196)) PTG  0 010 Syn19  ( Syn19  1 0) 90) 6\Q 6\ Q) PSSM7  (PSS M7  207) SSSM4  (CYXG  00162) SSSM5  (SSSM5  195) PRSM4  (PRSM4  209) SSM1  (SSM1  205)  6\Q &35* M4  171) PSSM4  (PSS  &34* 3560 5$*  3  3660 3* 061)  &3  00 3560 CPLG 2  027) 2  ( M SSSM 2  (SSh 0025)  0 ) SShM (CPXG  00014 ) 6   19 UG PRSM 6  (PR G  000 02) O 00 PSSP 5  (CY PG  0  )  Y 0 SP PS P1  (C 2* 005  PH  35 YIG  0  2)   C * 001 4) 63 5  ( 36 RSP 356 LG  0 7  05 9) P P 01  Y 63  (C SS 0  ) 36 SP2 7  (P QG  0  27 ) 8 PS SSP PR 0* M1  2  22 P P1  ( &3 H M2 S   (P H PG  M1  (P  PH M2 (' PH 0

Cyanomyovirus

Acidobacteria Epsilonproteobacteria Cyanobacteria_Chroococcales_Cyanothece Nitrospirae Euryarchaeota Chlamydiae Amoebozoa Bacteroidetes Thaumarchaeota Firmicutes Metazoa Gammaproteobacteria Betaproteobacteria Fungi Chlorobi Alveolata Cyanobacteria_Nostocales_Nostocaceae Thermotogae Euglenozoa Deinococcus Alphaproteobacteria Deltaproteobacteria Cyanobacteria_Gloeobacteria_Gloeobacterales Actinobacteria Planctomycetes

%LI LG RE DF m ari WH ULX ne  a P cti OR T QJ $F roph nob XP LGR ery ac 1 WKH m teri Ru a  w um N && UP bro X ba VF hip  PH oca  cte ple S rd  HOO r  x $ON X i   ORO\ TW C20 ioid   yla DOLO  n WLF LP 08 C1 es  s  QLF oph X V N ilu ROD  /27  (8 p  J  HK s  D ocard %  (2 881 S6    85 45 1 SM UOLF io 7 KLL  9 0 4 id  0/ 941 es  s  258 5)  1 +(  (1 í 08 p  JS  )   76 61  41 4  47  ) 

Dechloromonas  aromatica  RCB  (71907689)

Prochloroccus Cyanopodovirus Cyanomyovirus Synechococcus

$URPDWROHXPDURPDWLFXP(E1  $VSHUJLOOXVRU\]DH5,%   Xanthobact er  autotroph Candida icus  Py2  (15 tus  Kuene 4244785) Rhodop nia  stuttg artiensis *OXFRQ seudomonas  pa lustris  ( *UDQ REDFWHUR[\G 816198 DQV 55) Nitro XOLEDFWHUE +   Cand sococcus HWKHVGHQV   ocea LV&* Glo idatus '1,+ ni  AT   Cya eobact  Kuene CC  19  ni a  st 707  ( Nos nothec er  viola  7688 ceus uttgartie &D toc  sp e  sp 2118  (817 ns ) 1LW QGLGD .  PC  CCY 0901 is &D UDWLUX WXV. C  712 0110 1) RULE 0  (2 6X PLQ SWR 0141 $UF OIXUR LEDF UVS DFWH U WH 75 YHUV S RE YX U 6% DWLO 9) : ulfu DF PV PH  LV( &D ROLQ rim WHUE S1 GLDWO í OOLQ  DQ & P H on XW] %   & DP S\ OODV as  d OHUL & WLFXV    DP S OR X í   7 5 e   S\ \ORE EDF FFLQ nitri 0   %í  OR DF WHU RJ fica     ED WH F HQ    HV ns  D    FWH UIH RQ   UK WX FLVX   SM RP VV V   12    X 5  LQ  1  ( E LV VS   78 $  49 7& IHWX   72  V & 05 %   ) $$  í  í             

 7KLREDFLOOXVGHQLWULILFDQV$7&&  HXP  &KURPREDFWHULXPYLRODF   AM eningitidis  F  Neisseria  m 6$   GRVXV9&    DFWHUQR KLOD6/ 'LFKHORE DKDORS '600814) RGRVSLU UXEHU +DORUK EDFWHU 196  (8241  8052 6DOLQL CC  25 ii  NCIMB  is  AT  ck iform ijerin í   a  mult ium  be 1 ospir  862) 0 trid Nitros Clos 3+( &  (90962  ODULV 8    FHOOX L$7& CC11   ) LQWUD  U FDVH  99 RQLD LOOXV ivarius   +