Metagenome sequencing and 98 microbial genomes ... - Jungbluth Lab

2 downloads 0 Views 6MB Size Report
Dec 1, 2016 - ... of Earth Sciences, University of Southern California, Los Angeles, CA ... https://doi.org/10.7287/peerj.preprints.2613v1 | CC BY 4.0 Open ...
Metagenome sequencing and 98 microbial genomes from Juan de Fuca Ridge Flank subsurface fluids

Sean P. Jungbluth1#, Jan P. Amend1,2,3, and Michael S. Rappé4#

1

Center for Dark Energy Biosphere Investigations, University of Southern California, Los

Angeles, CA 2

Department of Earth Sciences, University of Southern California, Los Angeles, CA

3

Department of Biological Sciences, University of Southern California, Los Angeles, CA

4

Hawaii Institute of Marine Biology, SOEST, University of Hawaii, Kaneohe, HI

#Corresponding authors: [email protected] or [email protected]

PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2613v1 | CC BY 4.0 Open Access | rec: 1 Dec 2016, publ:

ABSTRACT The global deep subsurface biosphere is thought to be one of the largest reservoirs for microbial life on our planet. This study takes advantage of new sampling technologies and couples them with improvements to DNA sequencing and associated informatics tools to reconstruct the genomes of uncultivated Bacteria and Archaea from fluids collected deep within the subseafloor of the Juan de Fuca Ridge. Here, we generated two metagenomes from borehole observatories located 311 meters apart and, using binning tools, retrieved 98 genomes from metagenomes (GFMs) with completeness > 10%. Of the GFMs, 31 were estimated to be > 90% complete, while an additional 17 were > 70% complete. In most instances, estimated redundancy in the GFMs was < 10%. Phylogenomic analysis revealed 53 bacterial and 45 archaeal GFMs and nearly all were distantly related to known cultivates. In the GFMs, abundant bacteria included Chloroflexi, Nitrospirae, Acetothermia (OP1), EM3, Aminicenantes (OP8), Gammaproteobacteria, and Deltaproteobacteria and abundant archaea included Archaeoglobi, Bathyarchaeota (MCG), and Marine Benthic Group E (MBG-E). In this study, we identified the first near-complete genomes from archaeal and bacterial lineages THSCG, MBG-E, and EM3 and, based on the warm, subsurface and hydrothermally-associated from which these groups tend to be found, propose the names, Geothermarchaeota, Hydrothermarchaeota, and Hydrothermae, respectively. The data set presented here are the first description of Juan de Fuca igneous basement microbial GFMs reported and will provide a platform by which one can perform a higher level interrogation of the many uncultivated lineages presented herein.



2 PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2613v1 | CC BY 4.0 Open Access | rec: 1 Dec 2016, publ:

BACKGROUND & SUMMARY Beneath the sediments of the deep ocean, the subseafloor igneous basement presents a largely unexplored habitat that may play a crucial role in global biogeochemical cycling1. This system also provides a gradient of untapped environments for the discovery of novel microbial life. Because of extensive hydrothermal circulation, the porous uppermost igneous crust is likely quite suitable for microbial life2. Entrainment of deep seawater into young ridge flanks injects a variety of terminal electron acceptors into the deep ocean crust, establishing chemical gradients with the reducing deeper fluids, and thereby fueling redox-active elemental cycles3. The redox disequilibria and circulation of fluids through the permeable network of volcanic rock sustains a largely uncharacterized microbial community that potentially extends thousands of meters below the seafloor4. In such environments, temperatures are elevated, and energy and nutrient sources may be extremely limited, the combination of which provides unique challenges to microbial life. CORK (circulation obviation retrofit kit) observatories have been used in recent years to collect warm, anoxic crustal fluids originating from boreholes drilled into 1.2 and 3.5 million-year-old ridge flank of the Juan de Fuca Ridge (JdFR)5. This young, hydrologically-active basaltic crustal environment is overlain by a thick (>100 m) blanket of sediment that serves to locally restrict fluid circulation in the ocean basement6,7. The sampling of raw basement fluids enabled by CORK observatories has demonstrated the presence of novel microbial lineages that are related to uncultivated candidate microbial phyla with unknown metabolic characteristics8-11. Here, we present the genomes from metagenomes (GFMs) of two pristine large-volume igneous basement fluid samples



3 PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2613v1 | CC BY 4.0 Open Access | rec: 1 Dec 2016, publ:

collected from JdFR flank boreholes CORK observatories U1362A and U1362B (Figures 1A, 1B, and 1C). Shotgun sequencing produced 503 and 705 megabase pairs (Mbp) of unassembled sequence data from individual borehole U1362A and U1362B samples (Table 1). The metagenomes were assembled separately into 137,575 and 212,307 scaffolds totaling 170 and 168 Mbp of sequence data from U1362A and U1362B, respectively (Tables 1 and 2). The maximum scaffold lengths constructed from U1362A and U1362B metagenome were, respectively, 541 and 1,137 Mbp (Table 2). The success of this assembly to generate long scaffolds that represent major, intact fractions of individual genomes provides a significant foundation for which to apply binning methods to piece together genomes from populations in the original samples. Several methods were used to generate GFMs, which were then evaluated, further curated, and reduced to a set for additional characterization. Ultimately, analysis was performed on 98 GFMs that were over 200 Kbp in length, contained marker gene sets identified by CheckM, and were >10% complete (Tables 3 and S1). Phylogenetic analysis of concatenated universally conserved marker gene alignments (Figures 2-5) and taxonomic identification of SSU rRNA genes (Table S2) allowed for the phylum-level identification of most of the 53 bacterial and 45 archaeal GFMs. The U1362A and U1362B borehole fluid GFMs were comprised of many of the same microbial lineages described previously using SSU rRNA sequencing8,11; including bacterial groups Chloroflexi (11), Nitrospirae (8), Acetothermia (OP1; 7), EM3 (5), Aminicenantes (OP8; 4), Gammaproteobacteria (4), and Deltaproteobacteria (4), and archaeal groups Archaeoglobi (21), Bathyarchaeota (MCG; 9), and Marine Benthic



4 PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2613v1 | CC BY 4.0 Open Access | rec: 1 Dec 2016, publ:

Group E (MBG-E; 3) (Tables 4 and S1). In this study, we identified the first nearcomplete genomes from archaeal and bacterial lineages THSCG, MBG-E, and EM3 and, based on the warm, subsurface and hydrothermally-associated from which these groups tend to be found, propose the names, Geothermarchaeota, Hydrothermarchaeota, and Hydrothermae, respectively. The 98 genomes described here were functionally annotated and deposited into the National Center for Biotechnology Information (NCBI) and Integrated Microbial Genomes (IMG) databases12. The genome data described here are the first GFMs described from the deep subseafloor volcanic basement environment and will be used to interrogate the functional underpinnings of individual microbial lineages within this remote and distinct ecosystem. Considering that genome binning methods cannot yield comprehensive segregation of all entities in complex samples13, and that informatics tools are continuously improving, we recommend that anyone using these data verify the contents of these GFMs with the latest tools available.



5 PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2613v1 | CC BY 4.0 Open Access | rec: 1 Dec 2016, publ:

METHODS Borehole fluid sampling. Sample collection methods are described elsewhere11. Briefly, during R/V Atlantis cruise ATL18-07 (28 June 2011 – 14 July 2011) samples of basement crustal fluids were collected from CORK observatories located in 3.5 millionyear-old ocean crust east of the Juan de Fuca spreading center. Basement fluids were collected from lateral CORKs (L-CORKs) at boreholes U1362A (47°45.6628’N, 127°45.6720’W) and U1362B (47°45.4997’N, 127°45.7312’W) via polytetrafluoroethylene (PTFE)-lined fluid delivery lines that extend to 200 (U1362A) and 30 (U1362B) meters sub-basement. Fluids were filtered in situ through SteripakGP20 (Millipore, Billerica, MA, USA) polyethersulfone filter cartridges containing 0.22 µm pore-sized membranes using a mobile pumping system. Filtration rates were estimated at 1 L/min in laboratory trials, indicating that ~124 liters and ~70 liters were filtered from boreholes U1362A and U1362B, respectively.

DNA extraction and metagenome sequencing. Nucleic acids were extracted from borehole fluids using a modified phenol/chloroform lysis and purification method, and is described in detail elsewhere11 (samples SSF21-22, SSF23-24). Library preparation, DNA sequencing, read quality-control, metagenome assembly, and gene prediction and annotation were conducted by the Department of Energy Joint Genome Institute as part of their Community Sequencing Program using previously described informatics workflows12, which are described in detail elsewhere14.



6 PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2613v1 | CC BY 4.0 Open Access | rec: 1 Dec 2016, publ:

Genome binning. Assemblies from the U1362A and U1362B metagenomes were combined and used to generate GFMs. Four different genome binning approaches were used to identify the workflow that yielded the most favorable balance between maximizing genome completeness while minimizing contamination for these metagenomes: MaxBin15, ESOM16, MetaBAT17, and CONCOCT18. Genome binning was performed using MaxBin version 2.1.115 with the 40 marker gene set universal among Bacteria and Archaea19, minimum scaffold length of 2000 bp, and default parameters. Scaffold coverage from each metagenome was estimated using the quality-control filtered raw reads as input for mapping using Bowtie2 version 2.2.320 used within MaxBin. Genome binning was also performed using a combination of tetranucleotide frequencies and differential coverage in emergent self-organizing maps (ESOM) 16. Scaffold coverage was calculated using bbmap version 35.40 and the jgi_summarize_bam_contig_depths script from the MetaBAT pipeline17. Scripts downloaded from (http://github.com/tetramerFreqs/Binning) were used to calculate tetramer frequencies and create input files for ESOM. A robust Z-transformation was applied to the input data prior to generation of the ESOM. Scaffolds greater than or equal to 10 Kbp were cut into slices of 2000 bp prior to clustering. The number of epochs used for clustering was 20 and the dimensions of the ESOM were 400 x 430 (Figure 6). Using MetaBAT version 0.26.317, genome binning was performed with the jgi_summarize_bam_contig_depths script and the same scaffold coverage map calculated using bbmap described above. Default parameters were used.



7 PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2613v1 | CC BY 4.0 Open Access | rec: 1 Dec 2016, publ:

Finally, genome binning was performed using CONCOCT18 within the Anvi’o package, version 1.1.021. The metagenomic workflow employed here is described online (merenlab.org/2015/05/02/anvio-tutorial), and included as input data the quality-filtered raw sequence reads from both metagenomes, as well as assemblies generated by the JGI. The scaffold coverage map was calculated using bbmap version 35.82. Scaffolds greater or equal to 2.5 Kbp were used for binning with CONCOCT.

Comparison of genome binning methods and bin curation. Completeness and redundancy of all GFMs created using the four binning methods were assessed using CheckM version 1.0.522. Overall, GFMs generated with CONCOCT yielded the highest percent completeness for bins that were at least 50% complete (Table 3). Genome completeness was the primary criterion used in the selection of the binning method because the facilitated supervised binning via the “anvi-refine” function in Anvi’o works to remove contamination from a draft set of genome scaffolds. Manual refinements to the GFMs were executed in Anvi’o using differential coverage, tetranucleotide frequency, and marker gene content (i.e. completeness/redundancy). Bin splitting was assisted by the analysis of SSU rRNA genes identified using CheckM and inspected via the SILVA/SINA online aligner version 1.2.1123 with the following parameters: minimum identity with query sequence, 0.8, and number of neighbors per query sequence, 3. When SSU rRNA genes of different taxonomic origin were found to conflict within a single bin, those bins were further scrutinized and split manually. In most instances where redundancy was > 50%, splitting bins into their U1362A and U1362B components resolved conflicts. Bins were split until no more SSU rRNA gene conflicts



8 PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2613v1 | CC BY 4.0 Open Access | rec: 1 Dec 2016, publ:

were observed and all bins had been manually inspected and screened for outlying scaffolds. Four other marker gene sets18,24-26 were used to compare completeness and redundancy within Anvi’o (Figure 7). A total of 252 GFMs were identified after curation with Anvi’o, and completeness and redundancy of the final GFMs was ultimately estimated with CheckM and the marker gene set of Wu and colleagues19. Of these, 98 were at least 10% complete (Tables 4 and S1), which was used as a minimum cutoff because the GFMs all contained marker genes that allowed them to be given phylogenetic identities using CheckM. The 98 GFMs included a total of 16,066 scaffolds and 154,610,017 bp.

Phylogenomics and identification of genomes from metagenomes. From all genomes described here with completeness > 10% and relevant GFMs and singleamplified genomes (SAGs) from the Integrated Microbial Genomes (IMG)27, ggKbase, and National Center for Biotechnology Information (NCBI) GenBank databases, phylogenetically informative marker genes from were identified and extracted using the ‘tree’ command in CheckM. In CheckM, open reading frames were called using prodigal version 2.6.128 and a set of 43 lineage-specific marker genes, similar to the universal set used by PhyloSift29, were identified and aligned using HMMER version 3.1b130. The 61 GFMs with > 50% completeness were given taxonomic idendifications through analysis of a concatenated marker gene alignment (6988 amino acid positions) and placement in a phylogenomic tree with closest related GFMs and SAGs found in the NCBI, IMG, and ggKbase databases. The phylogeny was produced using FastTree (version 2.1.931) with the WAG amino acid substitution model and ‘fastest’ mode.



9 PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2613v1 | CC BY 4.0 Open Access | rec: 1 Dec 2016, publ:

Bootstrap values reported by FastTree analysis indicate local support values. To leverage the taxonomic identifications assigned to the GFMs with > 50% completeness toward the identification of the 37 GFMs with completeness 10-50%, an additional phylogenetic analysis with only the 98 Juan de Fuca GFMs was performed in ARB32 using RAxML version 7.7.233 with the PROTGAMMA rate distribution model and WAG amino acid substitution model. Bootstrapping was executred in ARB using the RAxML rapid bootstrap analysis algorithm34 with 100 bootstraps. To further aid in identification of GFMs, SSU rRNA genes were extracted successfully from 49 genome bins using the “ssu_finder” command within CheckM and identified via the SILVA/SINA online aligner version 1.2.11 (Pruesse et al., 2012) with the version 123 database and the following parameters: minimum identity with query sequence, 0.8, and number of neighbors per query sequence, 3 (Table S2).

DATA ACCESS The U1362A and U1362B metagenome projects and raw sequencing reads are available via the IMG-M web portal under Taxon ID numbers 330002481 (U1362A) and 3300002532 (U1362B). Gold Analysis Project ID numbers are Ga0004278 (U1362A) and Ga0004277 (U1362B). Sample metadata can be accessed using the BioProject identifier PRJNA269163. The NCBI BioSamples used here are SAMN03166137 (U1362A) and SAMN03166138 (U1362B). FASTA files containing the contigs of all 98 genomes from metagenomes can be accessed at doi: 10.6084/m9.figshare.4269587.v1. A FASTA file containing 54 SSU rRNA genes with length >300 base pairs extracted from the 98 genomes from metagenomes can be accessed at doi:



10 PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2613v1 | CC BY 4.0 Open Access | rec: 1 Dec 2016, publ:

10.6084/m9.figshare.4269593.v1. IMG/M-relevant files needed to isolate scaffold sets for all 98 genomes from metagenomes can be accessed at doi: 10.6084/m9.figshare.4269590.v1. IMG/M annotations associated with the scaffolds of all 98 genomes from metagenomes can be accessed at doi: 10.6084/m9.figshare.4269581.v1.

ACKNOWLEDGEMENTS We thank the captain and crew, Andrew Fisher, Keir Becker, Geoff Wheat, and other members of the science teams on board R/V Atlantis cruise AT18-07. We also thank the pilots and crew of remote-operated vehicle Jason II. We are grateful to Huei-Ting Lin, Chih-Chiang Hsieh, Alberto Robador, Brian Glazer, and Jim Cowen for sampling, and Beth Orcutt and Ramunas Stepanauskas for facilitating metagenome sequencing. We thank Brian Foster, Alex Copeland, Tijana Glavina del Rio, and Susannah Tringe of the Department of Energy Joint Genome Institute for metagenome sequencing and assembly (Community Sequencing Award 987 to R. Stepanauskas). This study used samples and data provided by the Integrated Ocean Drilling Program.

AUTHOR CONTRIBUTIONS S.P.J. and M.S.R. designed the study. S.P.J. performed all analyses outside of those included in the standard operating procedure of the JGI, generated all figures and wrote the manuscript. All co-authors commented on and provided critical feedback for the final manuscript.



11 PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2613v1 | CC BY 4.0 Open Access | rec: 1 Dec 2016, publ:

FUNDING INFORMATION This work, including the efforts of Sean Jungbluth and Michael Rappé, was funded by National Science Foundation grants MCB06-04014 and OCE-1260723. This work, including the efforts of all authors, was supported by the Center for Dark Energy Biosphere Investigations (C-DEBI), a National Science Foundation-funded Science and Technology Center of Excellence (OCE-0939564).



12 PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2613v1 | CC BY 4.0 Open Access | rec: 1 Dec 2016, publ:

REFERENCES 1

Schrenk, M. O., Huber, J. A. & Edwards, K. J. Microbial provinces in the subseafloor. Ann Rev Mar Sci 2, 279-304, doi:10.1146/annurev-marine-120308-081000 (2010).

2

Baross, J. A., Wilcock, W. S. D., Kelley, D. S., DeLong, E. F. & Cary, S. C. in The Subseafloor Biosphere at Mid-Ocean Ridges Geophysical Monograph (eds W. S. D. Wilcock et al.) 1-11 (American Geophysical Union, 2004).

3

Edwards, K. J., Bach, W. & McCollom, T. M. Geomicrobiology in oceanography: microbe-mineral interactions at and below the seafloor. Trends in Microbiology 13, 449456, 10.1016/j.tim.2005.07.005 (2005).

4

Edwards, K. J., Fisher, A. T. & Wheat, C. G. The deep subsurface biosphere in igneous ocean crust: frontier habitats for microbiological exploration. Frontiers in Microbiology 3, 8 (2012).

5

Wheat, C. G. et al. in Proceedings of the Integrated Ocean Drilling Program Vol. 327 (eds A. T. Fisher, T. Tsuji, K. Petronotis, & Expedition 327 Scientists) 1-36 (Integrated Ocean Drilling Program Management Internation, Inc., 2011).

6

Wheat, C. G. & Mottl, M. J. Hydrothermal circulation, Juan de Fuca Ridge eastern flank - factors controlling basement water composition. Journal of Geophysical ResearchSolid Earth 99, 3067-3080 (1994).

7

Cowen, J. P. The microbial biosphere of sediment-buried oceanic basement. Res Microbiol 155, 497-506, doi:10.1016/j.resmic.2004.03.008 (2004).

8

Cowen, J. P. et al. Fluids from aging ocean crust that support microbial life. Science 299, 120-123, doi:10.1126/science.1075653 (2003).

9

Jungbluth, S. P., Grote, J., Lin, H.-T., Cowen, J. P. & Rappé, M. S. Microbial diversity 13

PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2613v1 | CC BY 4.0 Open Access | rec: 1 Dec 2016, publ:

within basement fluids of the sediment-buried Juan de Fuca Ridge flank. ISME Journal 7, 161-172 (2013). 10

Jungbluth, S. P., Lin, H.-T., Cowen, J. P., Glazer, B. T. & Rappé, M. S. Phylogenetic diversity of microorganisms in subseafloor crustal fluids from boreholes 1025C and 1026B along the Juan de Fuca Ridge flank. Frontiers in Microbiology 5, 119, doi:10.2289/fmicb.2014.00119 (2014).

11

Jungbluth, S. P., Bowers, R., Lin, H.-T., Cowen, J. P. & Rappé, M. S. Novel microbial assemblages inhabiting crustal fluids within mid-ocean ridge flank subsurface basalt. ISME Journal 10, 2033-2047 (2016).

12

Huntemann, M. et al. The standard operating procedure of the DOE-JGI Metagenome Annotation Pipeline (MAP v.4). Stand Genomic Sci 11 (2016).

13

Nielsen, C. L. et al. Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes. Nat Biotechnology 32, 822-828 (2014).

14

Jungbluth, S. P., Glavina del Rio, T., Tringe, S., Stepanauskas, R. & Rappé, M. S. Genomic characterization of a marine lineage ubiquitous throughout terrestrial and oceanic subsurface environments. Submitted to PeerJ.

15

Wu, W.-Y., Tang, Y.-H., Tringe, S. G., Simmons, B. A. & Singer, S. W. MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm. Microbiome 2 (2014).

16

Dick, G. J. et al. Community-wide analysis of microbial genome sequence signatures. Genome Biol 10, R85, doi:10.1186/gb-2009-10-8-r85 (2009).



14 PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2613v1 | CC BY 4.0 Open Access | rec: 1 Dec 2016, publ:

17

Kang, D. D., Froula, J., Egan, R. & Wang, Z. MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities. PeerJ 3, doi:10.7717/peerj.1165 (2015).

18

Alneberg, J. et al. Binning metagenomic contigs by coverage and composition. Nat Methods 11, 1144-1146, doi:10.1038/nmeth.3103 (2014).

19

Wu, D., Jospin, G. & Eisen, J. A. Systematic Identification of Gene Families for Use as “Markers” for Phylogenetic and Phylogeny-Driven Ecological Studies of Bacteria and Archaea and Their Major Subgroups. PlOS One, e77033, doi:10.1371/journal.pone.0077033 (2013).

20

Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nature Methods 9, 357-U354, doi:10.1038/nmeth.1923 (2012).

21

Eren, A. M. et al. Anvi’o: an advanced analysis and visualization platform for ‘omics data. PeerJ 3 (2015).

22

Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res 25, 1043-1055, doi:10.1101/gr.186072.114 (2015).

23

Pruesse, E., Peplies, J. & Glockner, F. O. SINA: accurate high-throughput multiple sequence alignment of ribosomal RNA genes. Bioinformatics 28, 1823-1829, doi:10.1093/bioinformatics/bts252 (2012).

24

Creevey, C. J., Doerks, T., Fitzpatrick, D. A., Raes, J. & Bork, P. Universally Distributed Single-Copy Genes Indicate a Constant Rate of Horizontal Transfer. PLoS One 6, e22099 (2011).



15 PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2613v1 | CC BY 4.0 Open Access | rec: 1 Dec 2016, publ:

25

Dupont, C. L. et al. Genomic insights to SAR86, an abundant and uncultivated marine bacterial lineage. Isme Journal 6, 1186-1199, doi:10.1038/ismej.2011.189 (2012).

26

Campbell, J. H. et al. UGA is an additional glycine codon in uncultured SR1 bacteria from the human microbiota. Proc Natl Acad Sci U S A 110, 5540-5545 (2013).

27

Markowitz, V. M. et al. IMG/M 4 version of the integrated metagenome comparative analysis system. Nucleic Acids Res 42, D568-573, doi:10.1093/nar/gkt919 (2014).

28

Hyatt, D., LoCascio, P. F., Hauser, L. J. & Uberbacher, E. C. Gene and translation initiation site prediction in metagenomic sequences. Bioinformatics 28, 2223-2230, doi:10.1093/bioinformatics/bts429 (2012).

29

Darling, A. E. et al. PhyloSift: phylogenetic analysis of genomes and metagenomes. PeerJ 2, e243, doi:10.7717/peerj.243 (2014).

30

Eddy, S. R. Accelerated Profile HMM Searches. PLoS Comput Biol 7, e1002195, doi:10.1371/journal.pcbi.1002195 (2011).

31

Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One 5, e9490, doi:10.1371/journal.pone.0009490 (2010).

32

Ludwig, W. et al. ARB: a software environment for sequence data. Nucleic Acids Res 32, 1363-1371, doi:10.1093/nar/gkh293 (2004).

33

Stamatakis, A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688-2690, doi:10.1093/bioinformatics/btl446 (2006).

34



Stamatakis, A., Hoover, P. & Rougemont, J. A rapid bootstrap algorithm for the RAxML

16 PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2613v1 | CC BY 4.0 Open Access | rec: 1 Dec 2016, publ:

Web servers. Syst Biol 57, 758-771, doi:10.1080/10635150802429642 (2008).



17 PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2613v1 | CC BY 4.0 Open Access | rec: 1 Dec 2016, publ:

FIGURE LEGENDS Fig 1. (A) Bathymetric map of Juan de Fuca Ridge boreholes U1362A and U1362B with inset world map showing region location. (B) Schematic of CORK observatories at U1362A and U1362B. (C) Workflow used to process the basement crustal fluid samples and generate metagenomes and GFMs. This process included in situ filtration, extraction of nucleic acids via phenol-chloroform method, preparation of nucleic acids for Illumina sequencing, quality-filtering sequencing reads, assembling metagenome scaffolds, performing binning and associated quality-control and refinement, and gene annotation and phylogenetic analysis.

Fig 2. Phylogenomic relationships between archaeal genomes > 50% complete identified in CORK borehole fluid metagenomes and other closely related genomes retrieved from popular databases. The scale bar corresponds to 1.00 substitutions per amino acid position. Some groups are collapsed to enhance clarity and all groups with taxonomic identities are shown. The names of major lineages with GFMs found in Juan de Fuca Ridge basement fluids are indicated with the bold-face font. JdFR GFM prefixes are abbreviated from “JdFR” to “J” and labeled using red-colored text. Black (100%), gray (≥ 80%), and white (≥ 50%) circles indicate nodes with high local support values, from 1000 replicates.

Fig 3. Phylogenomic relationships between bacterial genomes > 50% complete identified in CORK borehole fluid metagenomes and other closely related genomes



18 PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2613v1 | CC BY 4.0 Open Access | rec: 1 Dec 2016, publ:

retrieved from popular databases. JdFR GFM prefixes are labeled using green-colored text. Other information as in Figure 2.

Fig 4. Phylogenomic relationships between archaeal GFMs > 10% complete identified in metagenomes from deep subseafloor crustal fluids of boreholes U1362A and U1362B. Archaeal GFMs found in this study were used as the outgroup. The scale bar corresponds to 0.001 substitutions per amino acid position. Black (100%), gray (≥ 80%), and white (≥ 50%) circles indicate nodes with bootstrap support, from 100 replicates. All bins are abbreviated “J” for “JdFR”.

Fig 5. Phylogenomic relationships between bacterial GFMs > 10% complete identified in metagenomes from deep subseafloor crustal fluids of boreholes U1362A and U1362B. Bacterial GFMs found in this study were used as the outgroup. Other information as in Figure 4.

Fig 6. Assignment of contigs from CORK borehole fluid metagenomes using ESOM implemented with tetranucleotide frequencies and differential coverage. The ESOM is shown (A) before and (B) after identification of GFMs. Each point represents a contig and identified bins have a non-white color.

Fig 7. Overview of GFM completeness and redundancy and calculated average using five different marker gene sets. All bins are abbreviated “J” for “JdFR.”



19 PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2613v1 | CC BY 4.0 Open Access | rec: 1 Dec 2016, publ:

A)

C)

50° N

48°

U1362A/B

DNA extraction phenol-chloroform method

Depth (m)

46°

0 1000

Illumina Sequencing TruSeq library prep HiSeq 2000 (2 x 150 bp)

2000

44°

3000 4000

42°

130°

128°

B)

126°

124°

122° W

U1362A

Data pre-processing Quality-filtering and adapter trimming

U1362B Fluid sampling port

Metagenome assemblies SOAPdenovo & Newbler, minimus 2

Seafloor 0 20

Seafloor CORK seal

Genome binning Tetranucleotide and coverage-based CONCOCT

40

Sediment

60 80

Casing

100 120

Casing

140

Genome quality control CheckM/Anvi’o

160 180 200

Cement

240

0

260

20 40

280

60

300 320

80

340

100

360 380 400 420 440 460 480

Depth (msb)

Depth (mbsf)

220

Basement

Filtration in situ, 0.22 μm

Cement Inflatable packers

140 160 180 200 220 240 260

520

280 300

Genome refinement Anvi’o

Fluid sampling line intake Fill

120

500 540

Cement Inflatable packers

Inflatable packers Fluid sampling line intake

Fill

Annotation IMG/ER Phylogenome analysis Inferred from single-copy genes extracted with CheckM

PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2613v1 | CC BY 4.0 Open Access | rec: 1 Dec 2016, publ:

icr

ap

ar

he

ch

ro

ae

Ba

um

cte

tri

Thermoprotei

Thorarchaeota

04 1 J- D8- 32-3 A

te

s

ria

gidus

J-39 J-41

us

il T. baroph

A. profundus G. acetivorans G. ahangari F. placidus

J-02

icus venef

J-35 J-342 J-3 1 J-3 7 J-2

us llid 24 a c J- 3 ati

Methanomicrobia-Methanocellales

tha nom

ob

Me

mi cr no tha Me

p

rmo

& Halobacteria

The

Methanomicrobia

39 J-45 AR08-3 A. sp. M A. boonei is M. stordalenmirens

Euryarchaeota NRA7

iaJ Me J- -21 tha 20 no icro s a bia rci -M na eth les a nos IMG Meth aet _25 anom 166 J-19 a icrobi 5 3 088 a-Me thano micro biales Halobacte ria

Archaeoglobi

Meth anob acter Oth ia er M etha Oth noc er M occ e i t han Ot he oco rM cci eth an M oc O oc th J- . the ci er r 03 m oli M t ho et tro ha ph no icu co s cc i Methanococci

43 50 J- M1) S 1-4 54 ales 2-( 711 mat 8-5 021 is 8-5 plas s SG 25 n _ e SG rmo IMGluminy The ataM. lasm

2 J- -22 J -42 J

lf

su

Thermococci

T. sp. PK T. sibiricus Methanopyr i

1.00

J-37

A.

8 SG 07

J- 08 Jta aeo h c r esa occi Had moc r e h rT occi Othe ermoc h T r e Oth

A. ful

A.

Bathyarchaeota

M D TZ BA G-45 -80 S 1 BA G82 321

Di

J-18 J-17

0-1 DG-7 70 DG- les a hae arc A384 C2 pM

Alti

ota ta & a a ae t e ot rch ha aeo ae c kia ar ch Lo ch ce ar ar Pa se gm oe ni W Ae

M

Thaum archae ota Cald iarch aeum J- 1 J-1 3 J -1 4 J- 0 S 11

THSCG

MBG-E

Thermoplasmata

DHVEG-2 & Aciduliprofundum

PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2613v1 | CC BY 4.0 Open Access | rec: 1 Dec 2016, publ:

tes

E3

he

Firmicutes

Ot

m eu ha s p T. ute tes ic icu sii tor us irm irm egen xvia rm F a he F d r d T . r A au e s th the D. 9 ccu

CP

WS

Arc

eria J-6 noco ria act i acte De anob inob t y c A C s

O O

R3

6

ete nad imo t a flexi Arm_56 Chloro 81 DG P1-4_ i sp. T CS loroflex 2 Ch 23_28_ SM 18 DG_ J-584_51_3 SG J-56 nthroporepellens D. lyka

TM

hae

Other

ia

O th er F rF irm irm icu O tes Ot the J T. oc icute O her r F 68 ean s th i i er Firm rmi cu Fi rm icut tes e icu s te s

ma

er

J-46 J-48 J-49 J-50 J-51 J-52 Fusob acteria Othe r Fir m i cute Oth s er F irmi cute s

ct

J-47

ba

W

64_32

el

W

ia

ne

se

rk

ge

do

Be

cro

A. autotro togae Thermo glomi Dictyo

rugen s T. na icute r Firm Othe C. exile6 J -6 5 s J-6 istete erg Syn tes hae m R2 biu iroc Sp micro TM7 CP n za Ka

En

r te ria a c te ib ac in ub egr r rc Pa Pe

Mi

6 SCGC_AAA255-C0 phicum

Acetothermia (OP1)

Firmicutes

6

a

N

itrosp irae T T. hydro. thiophilus geniphilu s

J-85

SM23_35 CG1_02A_Bin6

Other Chloroflexi Other Chloroflexi

J-86 J-88 J-87

Nitrospirae

SM23_84 C. aerophila

SG8_35_4 SG8_35_1

J-83 J-81

nitr

id

N teria ac J-95ii ars i Ro ba ilin D. . an . i D teo je d 8 o pr tie G_ 1 lta D. D 23_6 97 e JrD SM he eo

-1

-H

6

TA06

EM3 WOR_3

s oidete

Epsilonproteo.

Other

ed

Bacter

Zeta prote o. Alphap roteo. C. cresceJ-89 ntus Other Alphap roteo.

ot pr

O

ta el _D -8 P1

CS

4s

71 42 J- -7223_ 77 J M 1_ 8 0 S 7 6 SMG_ 23_ D M _63 S G

C 23 nia SM pto

D ia r 57 cte 23_ iba SM-76 yssi J . ab _31

Zix Kry iae cter viba robi Chlo

Igna

Calescamantes

Gammaproteo.

Dadabacteria

Deltaproteo.

Aquificae fobacteria Thermodesul

b Ot the acte he r D ria r e N. Del ltap Oth ta ro gr pr t a D er c Ga P. sp. Nit efer ilis ote eo. mm UR ros o. r i apr IL1 pin bac 4 Oth oteo. HWK J-9ae tere s 12 1 er G :I6 am ma Bet pro apro teo teo. .

b ku

Ot

Chloroflexi

Plan ctom ycet oph es i c a Chl a m Len y tisp diae Ge V h mm erruc aerae om a icr S tim ob Fibyne on ia DG ro rg ad i SM _ ba ste et SM 2 26 ct te es er s J 3 es LH-70 1_4 _40 0 C

Om

7 J-7 ria e c a t ob C10

Ac

SG8_1 A. the 9 SM23_rmophila 63 J-6 SM21 3_51

1.00

hilum L. ferrip uvii N. defl A02 252- F08 A A A _ 2SCGC _AAA25 J-80) 9 C G C 515 S 006 J-79 (Ga 8 3 in b J7

Aminicenantes (OP8)

J-64

Deferribacteres

Alphaproteo.

PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2613v1 | CC BY 4.0 Open Access | rec: 1 Dec 2016, publ:

J-45 J-44 DHVEG-2 J-43 Thermoplasmatales - 20c-4 J-42 J-41 J-40 J-39 J-38 J-37 J-36 J-35 J-34 J-33 J-32 J-31 Archaeoglobi J-30 J-29 J-28 J-27 J-26 J-25 J-24 J-23 J-22 J-21 J-20 Unknown Euryarchaeaota - NRA7 J-19 Methanomicrobia J-18 Marine Benthic Group E J-17 (MBG-E) J-16

J-15 Methanopyrus

0.001

Terrestrial Hot Springs J-14 Crenarchaeotic Group (THSCG) J-13

J-12 Unknown Crenarchaeota J-11 J-10 J-09 Bathyarchaeota J-08 (MCG) J-07 J06 J-05 J-04 J-03 Methanothermococcus J-02 Thermococcus J-01 Bathyarchaeota (MCG)

Bacteria

PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2613v1 | CC BY 4.0 Open Access | rec: 1 Dec 2016, publ:

Archaea

0.001

J-46 J-47 J-48 J-49 Acetothermia (OP1) J-50 J-51 J-52 J-53 Unknown Bacteria J-54 J-55 J-56 J-57 J-58 J-59 J-60 J-61 J-62 J-63 J-64 J-65 J-66 Unknown Bacteria J-67 J-68 Firmicutes J-69 J-70 Unknown Bacteria J-71 J-72 J-73 EM3 J-74 J-75 J-76 Deferribacteres J-77 J-78 Aminicenantes (OP8) J-79 J-80 J-81 J-82 J-83 J-84 Nitrospirae J-85 J-86 J-87 J-88 J-89 Alphaproteobacteria J-90 Betaproteobacteria J-91 J-92 Gammaproteobacteria J-93 J-94 J-95 J-96 Deltaproteobacteria J-97 J-98

PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2613v1 | CC BY 4.0 Open Access | rec: 1 Dec 2016, publ:

A)

B)

PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2613v1 | CC BY 4.0 Open Access | rec: 1 Dec 2016, publ:

t al .

t al .

Genome Bin Stats (%)

Wu e

al.

Cre eve ye

ll et pbe

t al .

Cam

Du pon te

rg e Aln be

t al .

t al .

t al .

Contamination

Wu e

al.

Cre eve ye

ll et

Cam

pbe

t al .

Du pon te

Aln be

rg e

t al .

Completeness

0

20

60

100

J-02 Thermococcus J-39 Archaeoglobi J-24 Archaeoglobi J-11 Bathyarchaeota (MCG) J-42 Archaeoglobi J-32 Archaeoglobi J-22 Archaeoglobi J-85 Nitrospirae J-86 Nitrospirae J-88 Nitrospirae J-87 Nitrospirae J-18 Marine Benthic Group E (MBG-E) J-69 Firmicutes - Ca. Desulforudis J-10 Bathyarchaeota (MCG) J-97 Deltaproteobacteria - Desulfarculaceae J-19 Methanomicrobia J-21 Unknown Euryarchaeota - NRA7 J-41 Archaeoglobi J-27 Archaeoglobi J-37 Archaeoglobi J-81 Nitrospirae J-14 THSCG J-78 Aminicenantes (OP8) J-13 THSCG J-80 Aminicenantes (OP8) J-31 Archaeoglobi J-45 DHVEG-2 J-48 Acetothermia (OP1) J-72 EM3 J-35 Archaeoglobi J-03 Methanothermococcus J-68 Firmicutes - Thermosediminibacter J-20 Unknown Euryarchaeota - NRA7 J-51 Acetothermia (OP1) J-56 Chloroflexi - Dehalococcoides J-52 Acetothermia (OP1) J-66 Unknown Bacteria J-77 Aminicenantes (OP8) J-47 Acetothermia (OP1) J-61 Chloroflexi - Anaerolineales J-71 EM3 J-89 Alphtaproteobacteria - Caulobacteraceae J-43 Thermoplasmatales - 20c-4 J-54 Chloroflexi - Dehalococcoides J-76 Deferribacteres J-04 Bathyarchaeota (MCG) J-79 Aminicenantes (OP8) J-64 Chloroflexi - Anaerolineales J-95 Deltaproteobacteria - Desulfarculaceae J-49 Acetothermia (OP1) J-07 Bathyarchaeota (MCG) J-23 Archaeoglobi J-83 Nitrospirae J-91 Gammaproteobacteria - Pseudomonas J-34 Archaeoglobi J-50 Acetothermia (OP1) J-70 Unknown Bacteria J-46 Acetothermia (OP1) J-17 Marine Benthic Group E (MBG-E) J-65 Unknown Bacteria J-08 Bathyarchaeota (MCG) J-74 EM3 J-82 Nitrospirae J-58 Chloroflexi - Dehalococcoides J-63 Chloroflexi - Anaerolineales J-30 Archaeoglobi J-38 Archaeoglobi J-96 Deltaproteobacteria - Desulfarculaceae J-67 Unknown Bacteria J-05 Bathyarchaeota (MCG) J-90 Betaproteobacteria - Cupriavidus J-75 EM3 J-06 Bathyarchaeota (MCG) J-73 EM3 J-59 Unknown Chloroflexi J-93 Gammaproteobacteria - Pseudomonas J-84 Nitrospirae J-53 Unknown Bacteria J-57 Chloroflexi - Dehalococcoides J-16 Marine Benthic Group E (MBG-E) J-62 Chloroflexi - Anaerolineales J-44 DHVEG-2 J-09 Bathyarchaeota (MCG) J-25 Archaeoglobi J-60 Chloroflexi - Anaerolineales J-92 Gammaprotebacteria - Pseudomonas J-28 Archaeoglobi J-29 Archaeoglobi J-36 Archaeoglobi J-26 Archaeoglobi J-98 Deltaproteobacteria - Desularculaceae J-40 Archaeoglobi J-55 Chloroflexi - Dehalococcoides J-15 Methanopyrus J-01 Bathyarchaeota (MCG) J-94 Gammaproteobacteria - Acinetobacter J-12 Unknown Crenarchaeota J-33 Archaeoglobi

PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2613v1 | CC BY 4.0 Open Access | rec: 1 Dec 2016, publ:

Table 1. Metagenome sequencing statistics reported in IMG U1362B

U1362A no. assembled (% of assembled)

no. unassembled (% of unassembled)

total (% of total)

no. assembled (% of assembled)

no. unassembled (% of unassembled)

total (% of total)

Number of sequences

137575 (8.08)

1564185 (91.92)

1701760 (100)

212307 (7.60)

2582305 (92.40)

2794612 (100)

Number of bases

169908118 (33.78)

333077167 (66.22)

502985285 (100)

168044831 (23.83)

537213224 (76.17)

705258055 (100)

GC count

82941377 (48.82)

163998454 (49.24)

246939831 (49.09)

87552944 (52.10)

270739112 (50.40)

358292056 (50.80)

rRNA genes

609 (0.22)

1124 (0.08)

1733 (0.10)

682 (0.21)

1219 (0.05)

1901 (0.07)

16S rRNA

198 (0.07)

162 (0.01)

360 (0.02)

199 (0.06)

191 (0.01)

390 (0.01)

23S rRNA

315 (0.12)

617 (0.04)

932 (0.05)

359 (0.11)

587 (0.02)

946 (0.04)

Protein coding genes

267511 (98.50)

1489984 (99.63)

1757495 (99.46)

319764 (98.87)

2344253 (99.37)

2664017 (99.31)

with Product Name

160006 (58.91)

438495 (29.32)

598501 (33.87)

170964 (52.86)

559698 (23.73)

730662 (27.24)

with COG

186319 (68.60)

675287 (45.16)

861606 (48.76)

207169 (64.06)

834581 (35.38)

1041750 (38.84)

with Pfam

172149 (63.38)

519243 (34.72)

691392 (39.13)

187717 (58.04)

647505 (27.45)

835222 (31.14)

with KO

131624 (48.46)

604486 (40.42)

736110 (41.66)

151186 (46.75)

773722 (32.80)

924908 (34.48)

with Enzyme (EC)

73927 (27.22)

356052 (23.81)

429979 (24.33)

83086 (25.69)

440214 (18.66)

523300 (19.51)

with MetaCyc

52288 (19.25)

244997 (16.38)

297285 (16.82)

58809 (18.18)

301799 (12.79)

360608 (13.44)

with KEGG

78361 (28.85)

365246 (24.42)

443607 (25.10)

88171 (27.26)

455581 (19.31)

543752 (20.27)

Genes

PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2613v1 | CC BY 4.0 Open Access | rec: 1 Dec 2016, publ:

Table 2. Metagenome scaffold length statistics

U1362A

U1362B

Minimum scaffold length

Num. of a Scaffolds

Total Scaffold a Length

Num. of a Scaffolds

Total Scaffold a Length

All

137575

169908118

212307

168044831

1 kb

25958

122371000

22179

94767619

2.5 kb

10118

98145686

7817

72903412

5 kb

4544

78915922

3232

57281039

10 kb

1933

60882353

1339

44376823

25 kb

615

41195243

435

30631998

50 kb

273

29394283

191

22129275

100 kb

105

18147775

72

13983109

250 kb

15

5160259

11

5597623

500 kb

1

540961

3

2801775

1 mb

0

0

1

1136825

a

Numbers listed are the cumulative sum of all scaffolds equal to or above the scaffold length

PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2613v1 | CC BY 4.0 Open Access | rec: 1 Dec 2016, publ:

Table 3. Genome binning method summary

a

Method

Num Bins

Num Bins >10% Complete

Num Bins >50% Complete

Avg. Completeness a (%)

Avg. Contamination a (%)

CONCOCT

66

56

46

90.9

50.8

ESOM

60

54

49

90.4

71.5

MaxBin2

75

66

51

85.7

42.9

MetaBAT

69

64

45

87.7

9.7

CONCOCT (post manual curation in Anvi’o)

252

98

61

84.4

3.3

Average calculated for bins >50% completeness

PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2613v1 | CC BY 4.0 Open Access | rec: 1 Dec 2016, publ:

Table 4. Summary of genomes from metagenomes (GFMs) Bin JdFR-01 JdFR-02 JdFR-03 JdFR-04 JdFR-05 JdFR-06 JdFR-07 JdFR-08 JdFR-09 JdFR-10 JdFR-11 JdFR-12 JdFR-13 JdFR-14 JdFR-15 JdFR-16 JdFR-17 JdFR-18 JdFR-19 JdFR-20 JdFR-21 JdFR-22 JdFR-23 JdFR-24 JdFR-25 JdFR-26 JdFR-27 JdFR-28 JdFR-29 JdFR-30 JdFR-31 JdFR-32 JdFR-33 JdFR-34 JdFR-35 JdFR-36 JdFR-37 JdFR-38 JdFR-39 JdFR-40 JdFR-41 JdFR-42 JdFR-43 JdFR-44 JdFR-45 JdFR-46 JdFR-47 JdFR-48 JdFR-49 JdFR-50 JdFR-51 JdFR-52 JdFR-53

Taxonomy Bathyarchaeota (MCG) Thermococcus Methanothermococcus Bathyarchaeota (MCG) Bathyarchaeota (MCG) Bathyarchaeota (MCG) Bathyarchaeota (MCG) Bathyarchaeota (MCG) Bathyarchaeota (MCG) Bathyarchaeota (MCG) Bathyarchaeota (MCG) Unknown Crenarch THSCG THSCG Methanopyrus MBG-E MBG-E MBG-E Methanomicrobia Unknown Euryarch - NRA7 Unknown Euryarch - NRA7 Archaeoglobi Archaeoglobi Archaeoglobi Archaeoglobi Archaeoglobi Archaeoglobi Archaeoglobi Archaeoglobi Archaeoglobi Archaeoglobi Archaeoglobi Archaeoglobi Archaeoglobi Archaeoglobi Archaeoglobi Archaeoglobi Archaeoglobi Archaeoglobi Archaeoglobi Archaeoglobi Archaeoglobi Thermoplasmatales - 20c-4 DHVEG-2 DHVEG-2 Acetothermia (OP1) Acetothermia (OP1) Acetothermia (OP1) Acetothermia (OP1) Acetothermia (OP1) Acetothermia (OP1) Acetothermia (OP1) Unknown Bacteria

Size (Kbp) 200 2362 1476 1068 433 735 908 919 499 1818 1726 301 1672 1635 208 1353 2178 2062 1289 1610 1417 2063 1629 2702 885 645 2360 752 738 1100 2351 1967 479 1088 1703 581 1972 1025 2279 530 1752 2149 1231 519420 1249758 1088071 1622290 1893129 1698648 1292279 1763844 1710598 346303

Contigs 53 44 164 204 101 72 43 45 16 57 12 65 4 6 54 241 344 22 27 209 22 130 101 154 31 21 67 14 185 55 103 120 89 88 167 75 52 99 93 76 103 42 219 139 161 254 230 62 312 255 185 155 76

Genes 262 2708 1639 1345 572 920 1061 1079 621 2051 1932 372 1780 1722 273 1714 2766 2328 1594 2109 1724 2360 1924 3011 1007 713 2680 829 967 1276 2752 2225 593 1246 2029 732 2248 1208 2743 671 1921 2479 1425 527 1454 1259 1811 1992 2003 1516 1919 1870 434

N50 3811 135395 11655 5722 4308 24679 33770 28674 39478 94071 251681 3767 488929 462362 3688 6267 7687 149032 78121 8881 102423 20363 25841 48758 65531 141267 82469 97698 3748 33048 43068 24104 5918 18271 14968 9486 64398 13482 55279 8743 25992 70809 6292 3618 9500 4279 8291 56879 5976 5492 13057 17111 4902

PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2613v1 | CC BY 4.0 Open Access | rec: 1 Dec 2016, publ:

%GC 38.2 38.4 33.0 41.9 42.4 38.1 39.1 38.6 39.1 51.4 52.0 36.1 37.6 41.9 36.0 50.4 50.1 39.1 43.2 41.1 41.2 39.7 40.0 38.2 39.9 40.2 40.6 40.4 44.9 39.5 42.1 41.8 40.3 41.3 41.2 40.6 44.7 44.5 43.8 42.8 42.3 40.3 37.5 55.3 57.9 65.9 63.7 61.1 59.9 62.2 63.0 62.7 38.4

JdFR-54 JdFR-55 JdFR-56 JdFR-57 JdFR-58 JdFR-59 JdFR-60 JdFR-61 JdFR-62 JdFR-63 JdFR-64 JdFR-65 JdFR-66 JdFR-67 JdFR-68 JdFR-69 JdFR-70 JdFR-71 JdFR-72 JdFR-73 JdFR-74 JdFR-75 JdFR-76 JdFR-77 JdFR-78 JdFR-79 JdFR-80 JdFR-81 JdFR-82 JdFR-83 JdFR-84 JdFR-85 JdFR-86 JdFR-87 JdFR-88 JdFR-89 JdFR-90 JdFR-91 JdFR-92 JdFR-93 JdFR-94 JdFR-95 JdFR-96 JdFR-97 JdFR-98

Chloroflexi - Dehalococcoides Chloroflexi - Dehalococcoides Chloroflexi - Dehalococcoides Chloroflexi - Dehalococcoides Chloroflexi - Dehalococcoides Unknown Chloroflexi Chloroflexi - Anaerolineales Chloroflexi - Anaerolineales Chloroflexi - Anaerolineales Chloroflexi - Anaerolineales Chloroflexi - Anaerolineales Unknown Bacteria Unknown Bacteria Unknown Bacteria Firmicutes Thermosediminibacter Firmicutes – “Can. Desulforudis” Unknown Bacteria EM3 EM3 EM3 EM3 EM3 Deferribacteres Aminicenantes (OP8) Aminicenantes (OP8) Aminicenantes (OP8) Aminicenantes (OP8) Nitrospirae Nitrospirae Nitrospirae Nitrospirae Nitrospirae Nitrospirae Nitrospirae Nitrospirae Caulobacteraceae (Alphaprot.) Cupriavidus (Betaprot.) Pseudomonas (Gammaprot.) Pseudomonas (Gammaprot.) Pseudomonas (Gammaprot.) Acinetobacter (Gammaprot.) Desulfarculaceae (Deltaprot.) Desulfarculaceae (Deltaprot.) Desulfarculaceae (Deltaprot.) Desulfarculaceae (Deltaprot.)

1691297 545773 1798153 776292 908200 2039159 863183 2906759 906207 1318067 2358376 936826 1838550 873684

67 132 95 207 203 527 239 380 183 282 468 193 224 172

1743 599 1964 951 1152 2194 992 2909 1068 1521 2680 1101 2056 978

54623 4081 28676 3716 4219 3822 3541 8915 5571 5066 5560 5093 11163 5273

59.9 62.8 57.1 57.6 57.4 61.0 64.5 64.2 52.3 52.4 52.6 42.2 40.4 41.0

2975652

458

3396

7798

39.7

1779991 921100 1702039 2059798 1305425 1800622 789045 3099908 2326237 2530072 2046026 2914703 2050278 735991 1165085 1526138 2325903 2103466 1861077 1858353 4055294 2666033 3770010 1611073 2230228 368932 2807742 1392833 4102562 932589

36 214 18 19 257 267 171 557 200 42 32 113 95 154 177 342 47 25 58 22 650 703 358 457 225 117 499 312 127 224

1865 596 1729 2026 1371 1861 979 3087 2380 2463 1995 2726 2117 875 1310 1833 2399 2166 2005 1983 4284 3105 3583 1865 2227 478 2872 1551 3814 1057

111790 4521 142680 176033 5573 8536 4417 5995 16386 114879 94220 57497 39421 5174 8103 4439 78042 124076 51364 131863 7375 3723 15126 3440 13223 3101 6282 4552 54652 3945

61.1 35.8 34.8 34.7 32.8 32.9 32.1 52.3 30.5 32.5 32.7 44.5 48.0 43.8 42.1 39.9 41.4 45.2 62.5 62.8 67.8 62.8 60.9 59.3 59.8 37.9 68.3 58.3 57.0 57.7



PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2613v1 | CC BY 4.0 Open Access | rec: 1 Dec 2016, publ: