The molecular mechanisms, evolution, and ecology of bacterial polymer cycling under oscillating feast famine conditions

By Ben Ozer Oyserman

A dissertation submitted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy (Civil and Environmental Engineering)

at the UNIVERSITY OF WISCONSIN-MADISON 2016

Date of final oral examination: July 21, 2016

The dissertation is approved by the following members of the Final Oral Committee:
Katherine D. McMahon, Professor, Civil and Environmental Engineering and Bacteriology
Daniel R. Noguera, Professor, Civil and Environmental Engineering
Gregory W. Harrington, Professor, Civil and Environmental Engineering
Katrina T. Forest, Professor, Bacteriology
Timothy J. Donohue, Professor, Bacteriology


© Copyright by Ben Ozer Oyserman 2016 All Rights Reserved


Abstract

Phosphorus (P) is an essential nutrient with a dual role in society as both a fertilizer and a pollutant. Nutrient removal technologies are therefore a fascinating and interdisciplinary topic with social, economic and ecological implications for both global food security and water security. Wastewater treatment is primarily a biological process that relies on high-density microbial communities. One of the most common treatment designs achieves P removal by selecting for communities that are capable of sequestering P in the storage polymer polyphosphate (polyP). Operating under conditions that separate the availability of the carbon source from that of the terminal electron acceptor introduces a growth imbalance, which selects for organisms that can overcome it by storing carbon and energy in polymers. This process is called Enhanced Biological Phosphorus Removal (EBPR). A key organism in EBPR treatment systems is Candidatus Accumulibacter phosphatis (Accumulibacter), which achieves P removal by cycling between three polymers: polyP, glycogen and polyhydroxyalkanoate (PHA). The ability to cycle between all three polymers is a phenotype yet to be described in any other organism. In this thesis, I first focus on the molecular mechanisms, ecology and evolution of this unique polymer cycling phenotype in Accumulibacter using metatranscriptomics, comparative genomics and ancestral genome reconstruction (Chapters 3 and 4). Next, I develop a comparative metatranscriptomics method to identify functional redundancy in ecosystems and demonstrate its utility by identifying additional polymer cycling organisms in an EBPR system (Chapter 5). Finally, I introduce a novel photosynthetic EBPR system that achieves phosphorus removal without mechanical aeration (Chapter 6).


Acknowledgments

I arrived in Madison on September 1st, 2011, just as the fall semester began, and completed my PhD in August 2016. In the five years I spent working towards a PhD, there was a seemingly infinite amount of work, and well over 40 hours a week doing it. The face value of a PhD degree is not so concrete, but the experience is tangible, evident in the work represented by a thesis and in the relationships gained along the journey. I am very proud of my accomplishments, and there are many people I would like to thank for their support and encouragement.

First and foremost, I would like to thank my advisor, Dr. Katherine D. McMahon, for her mentoring and guidance, and for her support of my scientific curiosity. Thank you, Trina. It was your excitement for science (originally conveyed through Skype) that brought me to Madison, and it is this excitement I hope to retain throughout my career. Thank you, Dr. Daniel R. Noguera, for your support. I always left your office feeling more confident about my research. I would have been completely lost at the beginning of my PhD were it not for the guidance of Jackie Bastyr Cooper. Thank you, Jackie, for your analytical and technical support throughout my PhD. I will miss our conversations. To my many peers, thank you for fruitful collaborations and for conversations that were scientific, political, silly and profound. I will miss sharing lattes with you, and I look forward to many more such occasions in the future.

To my family, especially my mom, dad and stepdad, thank you for inspiring me at every step and for sharing with me your love for science, nature and discovering the unknown. To my two sisters, I am so proud of you both. To my big sister Sivan, you have always looked out for me and been a good role model. Now you are raising your own beautiful family, and I hope I can move closer to see them grow. To my little sister Shira, you are Wonder Woman! Finally, I would like to thank my girlfriend Elizabeth. Our journey has taken us to Northern Michigan, Wisconsin and now the Netherlands.

Figure 0-1: Dukesy as a puppy in 2006 in Ann Arbor, Michigan. At that time, I was a student at Washtenaw Community College.


Table of Contents
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Abbreviations
Chapter 1 Introduction
1.1 Literature Cited

Chapter 2 Objectives
Chapter 3 Metatranscriptomic insights on gene expression and regulatory controls in Candidatus Accumulibacter phosphatis
3. Abstract
3.1 Introduction
3.2 Materials and Methods
3.2.1 Reactor Maintenance
3.2.2 Chemical Analysis
3.2.3 Biomass sample collection and RNA extraction
3.2.4 Community characterization
3.2.5 Library
3.2.6 Sequencing

3.2.7 Bioinformatics
3.2.8 Identification of highly expressed and highly dynamic genes
3.2.9 Functional enrichment analysis
3.2.10 Operons and upstream motif identification
3.3 Results
3.3.1 Community composition, chemical analysis, and total raw read statistics
3.3.2 Co-expression patterns during a single EBPR cycle
3.3.3 Differential transcript abundances across COG categories in a single EBPR cycle
3.3.4 Hydrogen gas production and glycine utilization in Accumulibacter
3.4 Discussion
3.4.1 Carbon Metabolism in Accumulibacter
3.4.2 Anaerobic Reducing Equivalents and Energy Metabolism
3.4.3 Low/high soluble phosphorus and carbon correlated gene expression
3.4.4 New Insights into Regulatory Mechanisms in Accumulibacter
3.5 Conclusion
3.6 Acknowledgements
3.7 Conflict of Interest Statement
3.8 Supplementary Material
3.9 Literature Cited

Chapter 4 Ancestral genome reconstruction identifies the evolutionary basis for trait acquisition in polyphosphate accumulating bacteria
4. Abstract
4.1 Introduction
4.2 Materials and Methods

4.2.1 Accession numbers
4.2.2 Orthologous gene clusters
4.2.3 Phylogenetic analysis of pan orthologs
4.2.4 Gene flux analysis
4.2.5 Core Genome Determination
4.2.6 Metabolic function analysis
4.2.7 Identifying laterally derived genes with KEGG annotations
4.2.8 Reactor operation, population characterization, kinetics and stoichiometry
4.3 Results
4.3.1 Identification of orthologous gene clusters
4.3.2 Gene flux analysis
4.3.3 Substrate uptake and internal flux kinetics and stoichiometry
4.3.4 Evolution of Accumulibacter metabolic pathways
4.3.5 Phylogenetic analysis of derived genes
4.3.6 Expression profiles of laterally derived genes
4.4 Discussion
4.4.1 Acetate activation
4.4.2 PHB synthesis
4.4.3 Anaerobic Reducing Equivalents: Glycolysis, Glycogen Degradation and PntAB
4.4.4 Pyruvate Metabolism
4.4.5 Phosphorus and Counter Cation Transport
4.4.6 Ferrous Iron Transport
4.4.7 Signaling and Regulation
4.4.8 Uncertainty in Reconstructions and Future Work

4.5 Conclusion
4.6 Competing Interests
4.7 Acknowledgements
4.8 Literature Cited

Chapter 5 Congruent transcriptional responses identify diverse lineages of polymer cycling organisms
5. Abstract
5.1 Introduction
5.2 Methods
5.2.1 Reactor operation
5.2.2 Genomic DNA extraction and sequencing
5.2.3 Genome assembly, draft genome binning, quality control and annotation
5.2.4 Metatranscriptomic sequence processing, mapping and normalization
5.2.5 Identification of polymer cycling organisms based on congruent transcriptional responses
5.2.6 Transcriptional marker gene selection
5.2.7 Scoring of CTR
5.2.8 Determining cut-offs of significance for CTR scores
5.3 Results
5.3.1 Metagenomic assembly, binning, completeness estimates and coverage
5.3.2 Metatranscriptomic sequence mapping
5.3.4 Identification of polymer cycling organisms based on CTR
5.3.5 Accumulibacter
5.3.6 Comamonadaceae and Rubrivivax

5.3.7 Actinomycetales
5.3.8 Bacteroidetes
5.3.9 Alphaproteobacteria, Gemmatimonas and Xanthomonadaceae
5.4 Discussion
5.4.1 A comparison of gene content and CTR
5.4.2 Congruent Transcriptional Responses in Accumulibacter
5.4.3 Congruent Transcriptional Responses in Comamonadaceae and Rubrivivax
5.4.4 Congruent Transcriptional Responses in Actinomycetales
5.5 Conclusions, perspectives and limitations
5.6 Competing Interests
5.7 Acknowledgements
5.8 Literature Cited

Chapter 6 Community Assembly and Ecology of Activated Sludge Under Photosynthetic Feast/Famine Conditions
6. Abstract
6.1 Introduction
6.2 Materials and Methods
6.2.1 Seed sludge
6.2.2 Reactor operation
6.2.3 Analytical chemistry
6.2.4 DNA extraction
6.2.5 Construction and sequencing of 16S and 18S rRNA gene amplicon libraries
6.2.6 16S and 18S rRNA gene sequence processing
6.2.7 Phylogenetic analysis of OTUs

6.2.8 Nucleotide accession numbers
6.3 Results
6.3.1 Reactor performance
6.3.2 Raw read processing
6.3.3 Oxygen diffusion and theoretical oxygen demand
6.3.4 16S and 18S rRNA gene based microbial community analysis
6.3.5 Phylogenetic analysis of Accumulibacter and Nitrosomonas OTUs
6.4 Discussion
6.4.1 Assessing photosynthetic-EBPR function
6.4.2 Assessing photosynthetic-EBPR function under continuous illumination
6.4.3 Community assembly
6.5 Conclusions
6.6 Competing Interests
6.7 Acknowledgements
6.8 Contributors
6.9 Literature Cited

Chapter 7 Future perspectives and conclusions
7.1 Introduction
7.2 Suggested Experiments
7.3 Concluding remarks
7.4 Literature Cited


List of Figures
Figure 0-1: Dukesy as a puppy in 2006.
Figure 1-1: Phosphorus added to half of Lake 226
Figure 1-2: A simplified line diagram of an EBPR system
Figure 3-1: Time-series gene expression profiles.
Figure 3-2: Updated metabolic model for Accumulibacter
Figure 3-3: Bar plots of the number of genes from each COG category in various gene subsets.
Figure 3-4: Hydrogen production and glycine uptake assays.
Figure 3-5: Upstream regulatory motifs and locations.
Figure 4-1: A simplified flow diagram of EBPR.
Figure 4-2: Determining core genes using expected probability cut-offs.
Figure 4-3: Gene gain and loss in the Accumulibacter lineage.
Figure 4-4: Five-way Venn diagram depicting the number of ancestral, derived, flexible and lineage specific genes within the CAP2UW1 Accumulibacter genome.
Figure 4-5: A simplified biochemical model and the measured kinetic and stoichiometric parameters for phosphorus, magnesium, potassium, acetate and polyhydroxybutyrate (PHB) of Accumulibacter Clade IIA.
Figure 4-6: The contribution of ancestral and derived genes to Accumulibacter metabolism
Figure 4-7: An evolutionary model of CAP2UW1 depicting ancestral, laterally derived, flexible and lineage specific genes.
Figure 5-1: A metagenomes and binning workflow
Figure 5-3: The statistically significant relationship between the number of DNA and RNA reads mapping to each genome
Figure 5-4: The cut-off to determine whether a gene had a CTR score
Figure 5-5: Identifying polymer cycling organisms using CTR scores

Figure 5-6: A comparison of gene content based and transcriptionally based scoring demonstrates that the added information of transcriptional profiles allows differentiation between genomes of similar content. This is shown most clearly for glycogen, where many genomes contain all genes involved in glycogen metabolism, but only a fraction of them have high CTR scores.
Figure 6-1: Chemical profiles during photosynthetic EBPR
Figure 6-2: Oxygen profiles during photosynthetic EBPR
Figure 6-3: Community structure under photosynthetic EBPR
Figure 6-4: Nitrifying community under photosynthetic EBPR
Figure 6-5: A phylogenetic tree of the Chlorobiales;SJA-28
Figure 6-6: Conceptual figure of oxygen production and consumption under photosynthetic EBPR.
Figure 7-1: Proposed mechanisms for allosteric regulation in Accumulibacter. Anaerobic cleavage of glycine produces Ac-P, activating PHB synthesis without activating glycolysis through the formation of AMP.
Figure 7-2: Identifying the derived regulatory mechanisms in Accumulibacter will be a key discovery towards understanding the difference between PAO and non-PAO organisms.


List of Tables
Table 3-1: A summary of the trend categories identified in this study and the patterns they display.
Table 4-1: Calculating the expected probability that a gene is core.
Table 5-1: The marker genes used in this investigation to identify PHA, polyP and glycogen cycling.
Table 6-1: The average values for various measurements taken during reactor operation over each phase
Table 6-2: Pairwise comparisons across the three phases of reactor operation


Abbreviations
AAC: Anaerobic Acetate Contact
ADP: Adenosine diphosphate
AMP: Adenosine monophosphate
ATP: Adenosine triphosphate
BNR: Biological Nutrient Removal
C: Carbon
CoA: Coenzyme A
COG: Clusters of Orthologous Groups
CTR: Congruent Transcriptional Response
DAPI: 4’,6-diamidino-2-phenylindole
DNA: Deoxyribonucleic acid
DO: Dissolved Oxygen
EBPR: Enhanced Biological Phosphorus Removal
FISH: Fluorescence in situ hybridization
GFMs: Genomes from metagenomes
HRAP: High rate algal ponds
LCA: Last common ancestor
LED: Light-emitting diode
N: Nitrogen
NADH: Nicotinamide adenine dinucleotide
NADPH: Nicotinamide adenine dinucleotide phosphate
NCBI: National Center for Biotechnology Information
OTU: Operational taxonomic unit
P: Phosphorus
PAOs: Polyphosphate Accumulating Organisms
PFOR: Pyruvate ferredoxin oxidoreductase
PHA: Polyhydroxyalkanoate
PHB: Polyhydroxybutyrate
PolyP: Polyphosphate
Ppk: Polyphosphate kinase
RNA: Ribonucleic acid
RPKM: Reads per kilobase per million mapped reads
rRNA: Ribosomal RNA
RT: Redox Transition
SAG: Single Amplified Genome
SBNR: Simultaneous biological nutrient removal
SOP: Standard operating procedure
SRA: Sequence Read Archive
TCA: Tricarboxylic acid
ThOD: Theoretical oxygen demand
VFA: Volatile fatty acids


Chapter 1 Introduction

Phosphorus (P) is an essential element that is incorporated into the most basic and indispensable molecules of life, including DNA, RNA, phospholipids and ATP. It is also often a scarce component of an ecosystem (Schindler, 1974) and is thus considered a limiting nutrient. The addition of phosphorus and other nutrients, such as nitrogen, to an environment often leads to rapid growth and the accumulation of biomass (Elser et al., 2007). In this context, phosphorus is both a life-giving nutrient essential for agricultural systems to flourish and a noxious pollutant that may stimulate unwanted and uncontrolled growth. The recognition that nutrients are fertilizing agents that pollute waterways was an important factor that prompted the development, and rapid global adoption, of advanced biological wastewater treatment systems such as the activated sludge system (Ardern & Lockett, 1914).

Figure 1-1: Phosphorus added to one half of Lake 226 (+P), which was separated by a barrier from the other half (-P), resulted in algal growth (Schindler, 1974).

The activated sludge system developed by Ardern and Lockett just over 100 years ago was revolutionary because it introduced the concept of recycling microbial biomass to treat wastewater. Recycling biomass provides two features: 1) a mechanism for achieving high densities of microorganisms and 2) a mechanism for the ecological selection of organisms based on their growth characteristics and physiology. Consequently, wastewater engineers may vary the configuration and operational parameters of a wastewater facility to provide conditions that select for specific biochemical processes such as nitrification, denitrification and polyphosphate (polyP) accumulation. One widely applied configuration to enhance P removal relies on exposing biomass to alternating ‘anaerobic feast’ and ‘aerobic famine’ environments (Figure 1-2; Seviour et al., 2003). Under these operational parameters, organisms that store polyP during periods of aerobic famine may be enriched, thereby providing P removal in excess of what may be achieved from microbial growth alone. Organisms that store polyP are termed polyP-accumulating organisms (PAOs), and the wastewater treatment process that enriches for them is called Enhanced Biological Phosphorus Removal (EBPR). By operating EBPR systems under cycles of ‘anaerobic feast’ and ‘aerobic famine’, the availability of organic carbon and of terminal electron acceptors for respiration is temporally and/or physically separated. This uncoupling of resource availability promotes the growth of organisms that store carbon and energy when they are abundant, for subsequent use when other resources are limiting. Internal storage compounds thereby complement the fluctuating availability of external resources. Many diverse storage polymers may be found under feast/famine conditions, including polyP, glycogen, and polyhydroxyalkanoates (PHA). It is the synthesis of polyP in particular that makes EBPR systems effective at P removal.

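Figure 1-2 panels. Top: influent enters the aeration basin (anaerobic zone followed by aerobic zone) and passes to the clarifier, from which effluent leaves and return activated sludge is recycled to the head of the basin. Bottom: relative concentration of glycogen, VFA, SRP and PHA plotted against travel time through the aeration basin.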

Figure 1-2: A simplified line diagram of an EBPR system showing the anaerobic zone, aerobic zone, clarifier and return activated sludge. “Activated sludge” refers to the biomass returned to the beginning of the process after settling in the clarifier and is an essential aspect of modern wastewater treatment (see text). The chemical profiles of glycogen, volatile fatty acids (VFA), soluble reactive phosphorus (SRP), and polyhydroxyalkanoate (PHA) observed during EBPR are shown below.

Treatment systems that operate EBPR display distinctive chemical cycling, characterized as follows. During the anaerobic phase there is a decrease in soluble volatile fatty acids (VFAs) and an increase in soluble P, with a concomitant increase in intracellular PHA and decreases in intracellular polyP and glycogen (Figure 1-2). This is followed by an aerobic phase in which soluble P decreases and intracellular polyP increases, while intracellular PHA decreases and glycogen is replenished. Based on this hallmark profile, the existence of PAOs with this general physiology was predicted even before their discovery. However, cultivation-dependent techniques failed to identify such an organism. The advent of culture-independent techniques rapidly changed this, and the 16S rRNA gene of a Rhodocyclus-related organism displaying all the hallmarks of a PAO, subsequently named Candidatus Accumulibacter phosphatis (henceforth Accumulibacter), was sequenced (Hesselmann et al., 1999; Zilles, Peccia, Kim, et al., 2002; Zilles, Peccia & Noguera, 2002). Since this initial discovery, an increasingly fine-scale understanding of Accumulibacter has come into focus. Initial investigations showed extensive diversity in the lineage, as shown by both 16S rRNA and polyphosphate kinase (ppk) gene phylogenies (McMahon et al., 2002, 2007; Kim et al., 2010). The ppk gene continues to be a good marker for both sequencing and fluorescence in situ hybridization, and the list of known lineages keeps growing as increasingly specific primers and probes are designed (Camejo et al., 2016; Mao et al., 2015; Zhang et al., 2016). With increasingly specific primer sets, the differentiation of traits across lineages has become accessible, and special attention has been given to differentiating denitrifying and non-denitrifying lineages of PAOs (Flowers et al., 2009; Kim et al., 2013; Camejo et al., 2016). The omics revolution of recent years has produced an explosion in the number of genomes available from diverse lineages of Accumulibacter (García Martín et al., 2006; Flowers et al., 2013; Skennerton et al., 2015; Mao et al., 2014), and has enabled various proteomic (Wilmes, Andersson, et al., 2008; Wexler et al., 2009) and metatranscriptomic investigations (He et al., 2010; He & McMahon, 2011; Mao et al., 2014). In this thesis I focus on the physiological (metabolic), ecological, evolutionary, and engineering aspects of P removal. First, I use metatranscriptomics to investigate the gene expression patterns and metabolism of a single clade within the Accumulibacter lineage (Chapter 3). While I was conducting the metatranscriptomic analysis, eight additional genomes became available, allowing a large-scale comparative analysis of Accumulibacter evolution. For this analysis, I identified the evolutionary changes that occurred at the last common ancestor (LCA) of the Accumulibacter lineage (Chapter 4). The genes inferred to have been gained at the LCA allowed me to develop a hypothesis about the metabolic requirements for the transition from non-PAO to PAO.
I next develop a novel high-throughput method to characterize functional redundancy and, using the information gathered from the metabolic and evolutionary investigations, predict additional organisms capable of polymer cycling from their transcriptional responses (Chapter 5). In the final research chapter, I change gears to address sustainability in wastewater treatment, investigating the microbial community dynamics of a novel EBPR treatment method that couples the phosphorus removal capabilities of Accumulibacter with the photosynthetic oxygen production of algae (Chapter 6). Finally, I end with some concluding remarks and suggestions for continued work on EBPR systems (Chapter 7).

1.1 Literature Cited

1. Ardern E, Lockett WT. (1914). Oxidation of sewage without filters. J Soc Chem Ind 33:523–539.
2. Camejo PY, Owen BR, Martirano J, Ma J, Kapoor V, Santodomingo J, et al. (2016). Candidatus Accumulibacter phosphatis clades enriched under cyclic anaerobic and microaerobic conditions simultaneously use different electron acceptors. Water Res 102:125–137.
3. Elser JJ, Bracken MES, Cleland EE, Gruner DS, Harpole WS, Hillebrand H, et al. (2007). Global analysis of nitrogen and phosphorus limitation of primary producers in freshwater, marine and terrestrial ecosystems. Ecol Lett 10:1135–1142.
4. Flowers JJ, He S, Malfatti S, del Rio TG, Tringe SG, Hugenholtz P, et al. (2013). Comparative genomics of two ‘Candidatus Accumulibacter’ clades performing biological phosphorus removal. ISME J 7:2301–2314.
5. Flowers JJ, He S, Yilmaz S, Noguera DR, McMahon KD. (2009). Denitrification capabilities of two biological phosphorus removal sludges dominated by different ‘Candidatus Accumulibacter’ clades. Environ Microbiol Rep 1:583–588.
6. García Martín H, Ivanova N, Kunin V, Warnecke F, Barry KW, McHardy AC, et al. (2006). Metagenomic analysis of two enhanced biological phosphorus removal (EBPR) sludge communities. Nat Biotechnol 24:1263–1269.
7. He S, Kunin V, Haynes M, Martin HG, Ivanova N, Rohwer F, et al. (2010). Metatranscriptomic array analysis of ‘Candidatus Accumulibacter phosphatis’-enriched enhanced biological phosphorus removal sludge. Environ Microbiol 12:1205–1217.
8. He S, McMahon KD. (2011). ‘Candidatus Accumulibacter’ gene expression in response to dynamic EBPR conditions. ISME J 5:329–340.
9. Hesselmann RPX, Werlen C, Hahn D, Roelof Van Der Meer J, Zehnder AJB. (1999). Enrichment, phylogenetic analysis and detection of a bacterium that performs enhanced biological phosphate removal in activated sludge. Syst Appl Microbiol 22:454–465.
10. Kim JM, Lee HJ, Kim SY, Song JJ, Park W, Jeon CO. (2010). Analysis of the fine-scale population structure of ‘Candidatus Accumulibacter phosphatis’ in enhanced biological phosphorus removal sludge, using fluorescence in situ hybridization and flow cytometric sorting. Appl Environ Microbiol 76:3825–3835.
11. Kim JM, Lee HJ, Lee DS, Jeon CO. (2013). Characterization of the denitrification-associated phosphorus uptake properties of ‘Candidatus Accumulibacter phosphatis’ clades in sludge subjected to enhanced biological phosphorus removal. Appl Environ Microbiol 79:1969–1979.
12. Mao Y, Graham DW, Tamaki H, Zhang T. (2015). Dominant and novel clades of Candidatus Accumulibacter phosphatis in 18 globally distributed full-scale wastewater treatment plants. Sci Rep 5:11857.
13. Mao Y, Yu K, Xia Y, Chao Y, Zhang T. (2014). Genome reconstruction and gene expression of ‘Candidatus Accumulibacter phosphatis’ clade IB performing biological phosphorus removal. Environ Sci Technol 48:10363–10371.
14. McMahon KD, Dojka MA, Pace NR, Jenkins D, Keasling JD. (2002). Polyphosphate kinase from activated sludge performing enhanced biological phosphorus removal. Appl Environ Microbiol 68:4971–4978.
15. McMahon KD, Yilmaz S, He S, Gall DL, Jenkins D, Keasling JD. (2007). Polyphosphate kinase genes from full-scale activated sludge plants. Appl Microbiol Biotechnol 77:167–173.
16. Schindler DW. (1974). Eutrophication and recovery in experimental lakes: implications for lake management. Science 184:897–898.
17. Seviour RJ, Mino T, Onuki M. (2003). The microbiology of biological phosphorus removal in activated sludge systems. FEMS Microbiol Rev 27:99–127.
18. Skennerton CT, Barr JJ, Slater FR, Bond PL, Tyson GW. (2015). Expanding our view of genomic diversity in Candidatus Accumulibacter clades. Environ Microbiol 17:1574–1585.
19. Wexler M, Richardson DJ, Bond PL. (2009). Radiolabelled proteomics to determine differential functioning of Accumulibacter during the anaerobic and aerobic phases of a bioreactor operating for enhanced biological phosphorus removal. Environ Microbiol 11:3029–3044.
20. Wilmes P, Andersson AF, Lefsrud MG, Wexler M, Shah M, Zhang B, et al. (2008). Community proteogenomics highlights microbial strain-variant protein expression within activated sludge performing enhanced biological phosphorus removal. ISME J 2:853–864.
21. Zhang AN, Mao Y, Zhang T. (2016). Development of quantitative real-time PCR assays for different clades of ‘Candidatus Accumulibacter’. Sci Rep 6:23993.
22. Zilles JL, Peccia J, Kim M, Hung C, Noguera DR. (2002). Involvement of Rhodocyclus-related organisms in phosphorus removal in full-scale wastewater treatment plants. Appl Environ Microbiol 68:2763–2769.
23. Zilles JL, Peccia J, Noguera DR. (2002). Microbiology of enhanced biological phosphorus removal in aerated-anoxic Orbal processes. Water Environ Res 74:428–436.


Chapter 2 Objectives

Chapter 3 - Metatranscriptomic insights on gene expression and regulatory controls in Candidatus Accumulibacter phosphatis
The objective of this chapter was to use a time-series transcriptional analysis to identify patterns of gene expression and regulatory mechanisms involved in Accumulibacter metabolism. An ancillary goal was to identify any novel metabolic traits necessary for Accumulibacter metabolism and to verify these hypotheses experimentally.

Chapter 4 - Ancestral genome reconstruction identifies the evolutionary basis for trait acquisition in polyphosphate accumulating bacteria
In this chapter I provide an evolutionary perspective on the polymer cycling phenotype in Accumulibacter. The objective of this analysis was to identify the metabolic innovations that occurred in the last common ancestor of Accumulibacter, with a focus on genes gained through horizontal gene transfer at this ancestral node. The Accumulibacter ‘pan-genome’ is thus divided into the following portions: an ancestral core, a derived core, a flexible portion, and a lineage-specific portion.

Chapter 5 - Congruent transcriptional responses identify diverse lineages of polymer cycling organisms
The objective of this chapter was to use comparative metatranscriptomics to develop a high-throughput method that enables the identification of functional redundancy in the EBPR system.

Chapter 6 - Community Assembly and Ecology of Activated Sludge Under Photosynthetic Feast/Famine Conditions
Here we present the first coupling of photosynthetic processes with EBPR and investigate the community structure under these novel conditions so that they may be further exploited.

Chapter 7 - Future perspectives
In this final chapter, I provide context on the importance of polymers in overall metabolic processes to set the stage for some suggested experiments for future research on EBPR.


Chapter 3 Metatranscriptomic insights on gene expression and regulatory controls in Candidatus Accumulibacter phosphatis

Ben O. Oyserman1,2, Daniel R. Noguera1, Tijana Glavina del Rio3, Susannah G. Tringe3, Katherine D. McMahon1,2,†

1Department of Civil and Environmental Engineering, University of Wisconsin at Madison, Madison, WI, 53706, USA; 2Department of Bacteriology, University of Wisconsin at Madison, Madison, WI, 53706, USA; 3US Department of Energy Joint Genome Institute, Walnut Creek, CA 94598, USA.

†Corresponding author

Citation: Oyserman, B.O., Noguera, D.R., del Rio, T.G., Tringe, S.G., McMahon, K.D., 2015. Metatranscriptomic insights on gene expression and regulatory controls in Candidatus Accumulibacter phosphatis. ISME J 1–13. doi:10.1038/ismej.2015.155

Author Contributions: BOO & KDM conceived and designed the experiment. TGdR and SGT provided support from JGI. BOO conducted all experimental work, data analysis and bioinformatics. BOO wrote the manuscript with guidance from DRN and KDM.

3. Abstract

Previous studies on enhanced biological phosphorus removal (EBPR) have focused on reconstructing genomic blueprints for the model polyphosphate accumulating organism Candidatus Accumulibacter phosphatis. Here, a time-series metatranscriptome generated from enrichment cultures of Accumulibacter was used to gain insight into anaerobic/aerobic metabolism and regulatory mechanisms within an EBPR cycle. Co-expressed gene clusters were identified that displayed ecologically relevant trends consistent with batch cycle phases. Transcripts displaying increased abundance during anaerobic acetate contact were functionally enriched in energy production and conversion, including upregulation of both cytoplasmic and membrane-bound hydrogenases, demonstrating the importance of transcriptional regulation in managing energy and electron flux during anaerobic acetate contact. We hypothesized and demonstrated hydrogen production after anaerobic acetate contact, a previously unknown strategy for Accumulibacter to maintain redox balance. Genes involved in anaerobic glycine utilization were identified, and P release after anaerobic glycine contact was demonstrated, suggesting that Accumulibacter routes diverse carbon sources to acetyl-CoA formation via previously unrecognized pathways. A comparative genomics analysis of sequences upstream of co-expressed genes identified two statistically significant putative regulatory motifs. One palindromic motif was identified upstream of genes involved in PHA synthesis and acetate activation and is hypothesized to be a phaR binding site, hence representing a hypothetical PHA modulon. A second motif was identified at approximately the -35 position upstream of a large and diverse array of genes and hence may represent a sigma factor binding site. This analysis provides a basis and framework for further investigations into Accumulibacter metabolism and the reconstruction of regulatory networks in uncultured organisms.

3.1 Introduction

Enhanced biological phosphorus removal (EBPR) is a widespread environmental biotechnology that exploits microorganisms capable of polyphosphate (polyP) accumulation to remove phosphorus (P) from wastewater (Hesselmann et al., 1999; Bond et al., 1995). The most widely studied organism responsible for EBPR in many wastewater treatment plants is Candidatus Accumulibacter phosphatis (henceforth Accumulibacter) (Nielsen et al., 2012). Although it has not yet been isolated, a great deal has been learned about Accumulibacter physiology by studying enrichment cultures in laboratory-scale bioreactors. Engineers have used this information to build quantitative metabolic models that predict how carbon, P, energy, and reducing equivalents move through the wastewater ecosystem (Oehmen et al., 2010; Comeau et al., 1986). The accuracy and utility of these models depend heavily on an accurate understanding of Accumulibacter physiology. It is well established that alternating cycles of carbon-rich (feast) anaerobic and carbon-poor (famine) aerobic environments are essential for successful EBPR operation (Oehmen et al., 2007). Under anaerobic conditions, Accumulibacter transports short-chain fatty acids (e.g., acetate and propionate) into the cell and stores the carbon as polyhydroxyalkanoates (PHA). Current metabolic models generally assume that the energy for this process is obtained from ATP generated through polyP and glycogen degradation, and that the required reducing equivalents are generated through the degradation of glycogen and the anaerobic operation of the TCA cycle (Wexler et al., 2009; Zhou et al., 2009; Oehmen et al., 2010; Filipe & Daigger, 1998). Under subsequent aerobic conditions, PHA degradation supplies carbon and energy for growth and for the replenishment of the glycogen and polyP storage molecules (Mino et al., 1998; Comeau et al., 1986).

The ability to store large quantities of polyP has led researchers to refer to organisms that display the aforementioned phenotype as polyphosphate accumulating organisms (PAOs). Although the abilities to produce polyP and carbon storage polymers such as PHA and glycogen are phylogenetically dispersed traits (Wood & Clark, 1988; Reddy et al., 2003), by linking these metabolic processes and synchronizing them with key environmental conditions, Accumulibacter and other PAOs exhibit a highly specialized and biotechnologically important phenotype that is the foundation of EBPR. To validate the underlying assumptions embedded in the metabolic models used by engineers, it is necessary to dissect the molecular mechanisms responsible for this synchronization. The sequencing and completion of the first Accumulibacter genome (García Martín et al., 2006) facilitated numerous transcriptional investigations, motivated by the hypothesis that Accumulibacter’s highly coordinated physiology is the result of dynamic gene expression. Changes in transcript abundance have previously been investigated with RT-qPCR, microarrays, and RNA-seq under both stable and perturbed conditions (He et al., 2010; He & McMahon, 2011a; Mao et al., 2014). However, these previous studies either targeted a handful of specific genes or examined limited time points during the anaerobic-aerobic cycle. Here we used high-resolution time-series metatranscriptomics with next-generation RNA-seq to identify highly expressed and highly dynamic genes and to identify putative co-regulated gene clusters. We then used comparative genomics to identify putative regulatory sequences and explore the underlying control mechanisms of the EBPR phenotype.
Our results further validated some previously hypothesized aspects of Accumulibacter metabolism, uncovered important metabolic pathways that had previously been overlooked, and identified two putative sequence motifs, providing a first step towards determining the mechanisms that regulate gene expression in Accumulibacter.

3.2 Materials and Methods

3.2.1 Reactor Maintenance

A single bioreactor was used in this study. A detailed description of the reactor and its operating conditions is provided in García Martín et al. (2006). Briefly, the sequencing batch reactor was operated with a 2-L working volume and was fed a mineral medium with acetate as the primary carbon source. The hydraulic retention time was 12 hours and the sludge retention time was 4 days. Each 6-hour anaerobic/aerobic cycle consisted of 140 min of anaerobic contact (sparging with N2 gas), 190 min of aerobic contact (sparging with air) and 30 min of settling. Nitrification was inhibited using allylthiourea. For the experiment described herein, the cycle differed from normal operation in that acetate was fed over a 60 min period in order to prolong acetate contact. Representative phosphate release and uptake, PHB synthesis and acetate profiles across the cycle are shown in Supplemental Figure 1A. Steady-state operation is demonstrated by the characteristically high anaerobic and low aerobic P concentrations during the month before the experiment (Supplemental Figure 1B).

3.2.2 Chemical Analysis

All chemical analyses were conducted during the same reactor cycle used for transcriptomic analysis (May 28th, 2013), except for the hydrogen production assays, which were conducted after the RNA-seq results were analyzed. To monitor the EBPR cycle, soluble phosphate, total suspended solids, volatile suspended solids, and acetate were measured using previously described methods (Flowers et al., 2009). Polyhydroxyalkanoate analysis was performed by GC-MS as outlined previously (Comeau et al., 1988). Hydrogen production was measured under anaerobic conditions using six batch tests conducted in 150 mL septum bottles with 25 mL of
sludge. Three were negative controls to which no acetate was fed, used to measure background hydrogen production rates, and three were fed 0.18 mmol of acetate. Hydrogen production was measured using a reduction gas analyzer (ta3000 Gas Analyzer, Trace Analytical, Ametek). To test the viability of glycine as a carbon source, anaerobic batch tests were conducted in triplicate with a negative control (no carbon addition), a positive control (acetate addition) and glycine, for a total of nine batch tests conducted in 60 mL septum bottles with 50 mL of sludge. Approximately 0.06 mmol of acetate or glycine was added to the respective bottles, and phosphorus release was measured as previously described (Flowers et al., 2009).

3.2.3 Biomass sample collection and RNA extraction

Six biomass samples were collected across a single reactor cycle to capture key transition points in the EBPR cycle (Supplemental Figure 1A and Supplemental Table 1). Bulk biomass (2 mL) was collected in microcentrifuge tubes. Samples were centrifuged, the supernatant was removed, and cell pellets were flash frozen in a dry ice and ethanol bath within three minutes of collection. RNA was extracted from the samples using an RNeasy kit (Qiagen, Valencia, California, U.S.A.) with a DNase digestion step. RNA integrity and DNA contamination were assessed using the Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, California).

3.2.4 Community characterization

Fluorescence in situ hybridization (FISH) was conducted using PAOMIX probes to target all Accumulibacter clades (Crocetti et al., 2000), Acc-I-444 to target clade IA, and Acc-II-444 to target clade IIA, as previously described (Flowers et al., 2009). Cells were counter-stained with 4',6-diamidino-2-phenylindole (DAPI). A Zeiss Imager.Z2 equipped with an AxioCam MRm
camera was used to image fluorescing cells, which were then enumerated using ImageJ software (Abràmoff et al., 2004).

3.2.5 Library preparation

Ribosomal RNA (rRNA) was removed from 1 µg of total RNA using the Ribo-Zero™ rRNA Removal Kit (Bacteria) (Epicentre). Libraries were generated using the TruSeq Stranded mRNA sample preparation kit (Illumina). Briefly, the rRNA-depleted RNA was fragmented and reverse transcribed using SuperScript II (Invitrogen), followed by second-strand synthesis. The fragmented cDNA was treated with end repair, A-tailing, adapter ligation and 10 cycles of PCR amplification.

3.2.6 Sequencing

The libraries were quantified using KAPA Biosystems’ next-generation sequencing library qPCR kit and run on a Roche LightCycler 480 real-time PCR instrument. The quantified libraries were then prepared for sequencing on the Illumina HiSeq 2000 platform using a TruSeq paired-end cluster kit, v3, and Illumina’s cBot instrument to generate a clustered flowcell. Sequencing of the flowcell was performed on the Illumina HiSeq 2000 sequencer using a TruSeq SBS sequencing kit, 200 cycles, v3, following a 2x150 indexed run recipe. Sequence data were deposited at IMG/M under Taxon Object IDs 3300002341–3300002346.

3.2.7 Bioinformatics

Reads were quality trimmed and quality statistics were calculated using the FASTX Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) (Supplemental Figure 2). Ribosomal RNA sequences were removed with SortMeRNA using six built-in databases for bacterial, archaeal, and
eukaryotic small and large subunits (Kopylova et al., 2012). Reads that passed filtering were then mapped to the chromosome and plasmids of Accumulibacter clade IIA strain UW-1 (CAP2UW1) (García Martín et al., 2006) using the BWA-MEM algorithm with default parameters (Li & Durbin, 2009). Read counts were then calculated using HTSeq in ‘intersection-strict’ mode (Anders et al., 2014). Read counts were normalized by the total number of reads in the sequencing run, the number of reads that remained after rRNA filtering, and the fraction of total reads that aligned to the Accumulibacter genome (Supplemental Tables 1-3). Non-rRNA reads represented between 35 and 65 percent of all reads. Normalized counts were then converted to log2-transformed reads per kilobase per million mapped reads (RPKM) (Mortazavi et al., 2008) (Supplemental Tables 4-5). Before clustering, genes with no observation at least one log2(RPKM) unit above their own minimum observation (that is, genes changing less than two-fold across the cycle) were removed from the data set. Of the 4735 genes in strain UW-1, 3893 passed this filter. These genes were further binned into clusters of co-expressed genes (Supplemental Table 6, Figures S3-S7) based on an uncentered Pearson similarity metric of expression profiles, followed by centroid linkage clustering, using Java TreeView (Saldanha, 2004) (Supplemental Figure 8). Clusters of co-expressed genes were then manually curated in Java TreeView. The resulting clusters are henceforth referred to as trend categories and are named with letters (e.g. Trend Category A). Trend categories were then classified into patterns (Table 3-1) through visual inspection and comparison with known solute and biopolymer transformations that occur during an EBPR cycle.
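The RPKM conversion and the pre-clustering filter described in 3.2.7 can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: it applies the standard RPKM formula (Mortazavi et al., 2008) with a single mapped-read total per sample, whereas the normalization above also accounts for rRNA filtering and alignment fractions. The pseudocount that keeps log2 finite for zero counts, the function names, and the example counts are all hypothetical.

```python
import math


def rpkm(count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads (standard RPKM)."""
    kilobases = gene_length_bp / 1e3
    millions_mapped = total_mapped_reads / 1e6
    return count / (kilobases * millions_mapped)


def log2_rpkm_profile(counts, gene_length_bp, mapped_per_sample, pseudocount=1.0):
    """log2(RPKM + pseudocount) for one gene across all samples.

    The pseudocount is an assumption of this sketch so that zero counts stay finite.
    """
    return [
        math.log2(rpkm(c, gene_length_bp, n) + pseudocount)
        for c, n in zip(counts, mapped_per_sample)
    ]


def passes_range_filter(profile, min_log2_range=1.0):
    """Keep a gene only if some observation lies >= 1 log2 unit above its minimum,
    i.e. the gene changes at least two-fold across the cycle."""
    return max(profile) - min(profile) >= min_log2_range


# Hypothetical six-sample example for a 1 kb gene with 1 million mapped reads per sample.
counts = [100, 120, 450, 300, 110, 95]
mapped = [1_000_000] * 6
profile = log2_rpkm_profile(counts, 1000, mapped)
print(passes_range_filter(profile))  # True
```

The range filter is written against the log2 profile, so a threshold of 1.0 corresponds directly to a two-fold change; a flat profile such as `[5.0, 5.2, 5.4, 5.1, 5.3, 5.0]` would be discarded.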

Pattern | # of genes | Trend categories | # of trend categories
Anaerobic Acetate Contact | 126 | Q, R | 2
Redox Transition | 697 | AA, BB, CC, DD, EE, FF, GG, HH, NN | 9
Aerobic | 1844 | II, JJ, KK, PP, QQ, RR, SS, TT, VV, WW, XX, YY, AAA, BBB, CCC, HHH, III, JJJ, KKK, LLL, MMM | 21
High Phosphorus | 40 | O, P | 2
Low Phosphorus | 438 | F, EEE, GGG, NNN, OOO, PPP, QQQ, RRR, SSS, TTT, UUU, VVV | 12
Sum | 3145 | - | 46

Table 3-1 A summary of the trend categories identified in this study and the patterns they display.
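The trend categories summarized in Table 3-1 were built on the uncentered Pearson similarity named in 3.2.7, which differs from the familiar centered form: profiles are not mean-subtracted, so two profiles with opposite trends can still score as fairly similar if both sit well above zero. The sketch below computes both forms and shows a naive centroid-linkage agglomeration driven by the uncentered similarity. It is only an illustration of the technique under stated assumptions: the clustering in this study used Java TreeView followed by manual curation, profiles are assumed to be non-zero, and the stopping threshold `min_similarity` is a hypothetical stand-in for that manual step.

```python
import math


def uncentered_pearson(x, y):
    """Uncentered Pearson similarity: sum(x*y) / sqrt(sum(x^2) * sum(y^2)).
    Profiles are NOT mean-centered, so this is the cosine of the angle between them."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return sxy / math.sqrt(sxx * syy)


def centered_pearson(x, y):
    """Ordinary Pearson correlation: the uncentered form applied to mean-subtracted profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return uncentered_pearson([a - mx for a in x], [b - my for b in y])


def centroid(profiles):
    """Element-wise mean of a list of equal-length profiles."""
    return [sum(column) / len(profiles) for column in zip(*profiles)]


def centroid_linkage(profiles, min_similarity):
    """Naive agglomerative clustering: repeatedly merge the two clusters whose
    centroids are most similar (uncentered Pearson) until no pair reaches
    min_similarity. Returns clusters as lists of profile indices."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > 1:
        cents = [centroid([profiles[i] for i in c]) for c in clusters]
        best, pair = -2.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = uncentered_pearson(cents[a], cents[b])
                if s > best:
                    best, pair = s, (a, b)
        if best < min_similarity:
            break
        a, b = pair
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


profiles = [[1, 2, 3], [2, 4, 6], [3, 2, 1], [6, 4, 2]]
print(round(uncentered_pearson(profiles[0], profiles[2]), 3))  # 0.714
print(round(centered_pearson(profiles[0], profiles[2]), 3))    # -1.0
print(centroid_linkage(profiles, min_similarity=0.99))         # [[0, 1], [2, 3]]
```

The example shows why the choice matters: a rising and a falling profile are perfectly anti-correlated under the centered form but still score about 0.71 under the uncentered form, while profiles that differ only by a scale factor (such as `[1, 2, 3]` and `[2, 4, 6]`) are identical under both.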

3.2.8 Identification of highly expressed and highly dynamic genes

Genes that displayed the highest relative transcript abundance and those that displayed the largest changes in relative transcript abundance were identified as follows. Each gene was represented as a vector of six relative transcript abundance values, one per sample. The maximum value of this vector represents the maximum relative expression of that gene, and the maximum minus the minimum value represents the relative change in abundance over the entire cycle. Using these statistics, genes may be ranked by highest relative expression and by largest relative change. Based upon the distribution of maximum expression values, a cut-off of 350 was chosen to identify highly expressed and highly dynamic genes (Supplemental Figure 9).

3.2.9 Functional enrichment analysis

Numerous subsets of genes were identified in this investigation, including highly expressed/dynamic genes and trend categories. To determine whether these subsets were enriched in specific functions, a bootstrap method was used to compare the distribution of COG functions in each subset with a null model produced from 1000 randomly generated gene subsets of the same size as the subset in question. The randomly generated null model for each gene subset was then compared to the observed abundance using a one-sided t-test.
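The ranking in 3.2.8 and the bootstrap null model in 3.2.9 can be sketched together, since the gene subsets selected by the first feed into the second. This minimal sketch assumes expression profiles are stored as a dictionary from locus tag to six abundance values and COG assignments as a dictionary from locus tag to a single category letter; all function names and example data are hypothetical. It applies the cut-off of 350 to either the maximum or the range. For the enrichment test it reports an empirical one-sided p-value computed from the 1000 random subsets, a plainly swapped-in alternative to the one-sided t-test used here, chosen so the sketch needs only the Python standard library.

```python
import random
import statistics


def rank_statistics(profiles):
    """Map each gene to (maximum expression, maximum - minimum) over its samples."""
    return {gene: (max(v), max(v) - min(v)) for gene, v in profiles.items()}


def select_genes(stats, cutoff=350.0, by_range=False):
    """Genes whose maximum (or range, if by_range=True) meets the cut-off."""
    index = 1 if by_range else 0
    return sorted(g for g, s in stats.items() if s[index] >= cutoff)


def cog_enrichment(subset, cog_of, category, n_boot=1000, seed=0):
    """Compare the count of one COG category in `subset` with a null distribution
    built from `n_boot` random gene subsets of the same size.

    Returns (observed count, mean null count, empirical one-sided p-value)."""
    rng = random.Random(seed)
    all_genes = sorted(cog_of)
    observed = sum(1 for g in subset if cog_of[g] == category)
    null_counts = [
        sum(1 for g in rng.sample(all_genes, len(subset)) if cog_of[g] == category)
        for _ in range(n_boot)
    ]
    # Add-one correction keeps the empirical p-value strictly above zero.
    n_extreme = sum(1 for c in null_counts if c >= observed)
    p_value = (n_extreme + 1) / (n_boot + 1)
    return observed, statistics.mean(null_counts), p_value


# Hypothetical ranking example.
profiles = {"geneA": [1, 400, 3, 2, 5, 6], "geneB": [10, 20, 30, 40, 50, 60]}
stats = rank_statistics(profiles)
print(select_genes(stats))  # ['geneA']

# Hypothetical enrichment example: 10 of 100 genes are in COG category "C",
# and the subset under test contains exactly those 10 genes.
cog_of = {f"g{i:03d}": ("C" if i < 10 else "S") for i in range(100)}
subset = [f"g{i:03d}" for i in range(10)]
observed, null_mean, p = cog_enrichment(subset, cog_of, "C")
print(observed)  # 10
```

Fixing the random seed makes the null model reproducible between runs; with 1000 random subsets the smallest attainable empirical p-value is 1/1001.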

3.2.10 Operons and upstream motif identification

Putative operons were determined using the following criteria: 1) adjacent genes have the same orientation; 2) adjacent genes are co-expressed, with a correlation cut-off of 0.7; and 3) the intergenic region is 1000 base pairs or less (Supplemental Table 7). For each identified trend category, upstream sequences of called operons were analyzed for putative upstream motifs using MEME (Bailey et al., 2009) (meme input.fasta -bfile background.txt -mod zoops -evt 0.05 -dna -nmotifs 10 -minsites 3 -o output), with one motif identified in a specific trend category (Supplemental Table 7). Additional motif sites were identified based on sequence homology using MAST (Bailey et al., 1998) (mast meme.txt input.fasta -bfile background.txt -oc . -nostatus -remcorr -ev 10 -norc -m 1). In addition, a motif search was conducted on the highly dynamic genes that displayed the anaerobic acetate contact pattern.

3.3 Results

3.3.1 Community composition, chemical analysis, and total raw read statistics

On the date samples were collected for transcriptomics, P removal exceeded 99%, and carbon and P dynamics characteristic of EBPR systems were observed (Supplemental Figure 1A). Accumulibacter relative abundance measured by FISH was 80% of total DAPI-stained cells, and clade IIA accounted for 99% of total Accumulibacter cells. P measurements at the end of the aerobic and anaerobic phases during the month of the investigation indicated steady-state operation (Supplemental Figure 1B). Illumina sequencing of rRNA-depleted total RNA yielded 1 461 769 869 reads across six samples (Supplementary Table 1). Quality filtering removed 695 865 184 reads, leaving 765 904 685 for downstream analysis. The resulting reads were then mapped to the finished
Accumulibacter clade IIA strain UW-1 reference genome (García Martín et al., 2006; Flowers et al., 2013), where 104 844 897 reads were aligned.

3.3.2 Co-expression patterns during a single EBPR cycle

Sampling across a single EBPR cycle and subsequent hierarchical clustering analysis allowed identification of clusters of co-expressed transcripts, henceforth referred to as trend categories. Trend categories had an average Pearson correlation of 0.96 and an average size of ~50 genes (Supplemental Table 8, Supplemental Figures 3-7). Ecologically relevant patterns of transcript abundance were identified that corresponded with the following important EBPR stages: High Phosphorus concentration (Figure 3-1B), Low Phosphorus concentration (Figure 3-1C), Anaerobic Acetate Contact (AAC) (Figure 3-1D), Redox Transition (RT) (Figure 3-1E), and Aerobic (Figure 3-1F). The largest number of transcripts displayed the Aerobic pattern, followed by the RT, Low Phosphorus, AAC and finally High Phosphorus patterns. The number of genes and the trend categories assigned to each pattern are summarized in Table 3-1. Ecologically relevant expression profile patterns were overlaid on a map of central Accumulibacter metabolic processes to enable interpretation of how they might relate to regulation of key pathways (Figure 3-2; see Supplemental Figure 11 for the model with locus tags). Biochemical transformations were color-coded based on the expression pattern to which the corresponding gene was assigned. Key genes identified as upregulated after anaerobic acetate contact (AAC) include those involved in acetate activation, PHB synthesis (CAP2UW1_3191) and regulation (phasins, CAP2UW1_0642-CAP2UW1_0643), glycine cleavage (CAP2UW1_1955-CAP2UW1_1960), phospholipid monolayer formation (CAP2UW1_3192, CAP2UW1_3266, CAP2UW1_3702,
CAP2UW1_0341, CAP2UW1_2586), carbonic anhydrase (CAP2UW1_1967), and hydrogenases (CAP2UW1_0998-CAP2UW1_0999, CAP2UW1_2286). The presence of High P was accompanied by relatively high expression levels of various transporters, such as low-affinity P transporters (Pit, CAP2UW1_2085), sulfur transporters (SulP, CAP2UW1_2094) and porins (CAP2UW1_1151, CAP2UW1_1152), as well as the regulatory gene phoU (CAP2UW1_2086, CAP2UW1_2093, CAP2UW1_3728). Low P conditions corresponded to increased relative transcript abundance of Calvin cycle genes, of high-affinity P transporters (Pst, CAP2UW1_2002-CAP2UW1_2008) and of numerous regulatory genes, including phoR (CAP2UW1_1995), phoB (CAP2UW1_1996) and phoD (CAP2UW1_1732).
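The three operon-calling criteria in Section 3.2.10 can be sketched as a single pass over genes sorted by position. This is a hypothetical illustration, not the study's code: the `Gene` record, the coordinates and the profiles are invented, and the co-expression measure is assumed to be a Pearson correlation between the transcript abundance profiles of neighbouring genes.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Sequence


@dataclass
class Gene:
    locus_tag: str
    start: int                # leftmost genomic coordinate (bp)
    end: int                  # rightmost genomic coordinate (bp)
    strand: str               # "+" or "-"
    profile: Sequence[float]  # relative transcript abundance per sample


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is flat."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def call_operons(genes: List[Gene], min_r: float = 0.7, max_gap: int = 1000) -> List[List[str]]:
    """Group adjacent genes into putative operons.

    Neighbouring genes are joined when (1) they share a strand,
    (2) their profiles correlate at r >= min_r, and
    (3) the intergenic region between them is <= max_gap bp.
    """
    if not genes:
        return []
    ordered = sorted(genes, key=lambda g: g.start)
    operons: List[List[str]] = []
    current = [ordered[0]]
    for prev, nxt in zip(ordered, ordered[1:]):
        gap = nxt.start - prev.end - 1  # negative when genes overlap
        linked = (
            prev.strand == nxt.strand
            and gap <= max_gap
            and pearson(prev.profile, nxt.profile) >= min_r
        )
        if linked:
            current.append(nxt)
        else:
            operons.append([g.locus_tag for g in current])
            current = [nxt]
    operons.append([g.locus_tag for g in current])
    return operons


# Hypothetical genes: g1-g2 satisfy all criteria; g3 is on the other strand;
# g4 lies more than 1000 bp downstream of g3.
genes = [
    Gene("g1", 1, 900, "+", [1, 2, 3, 4, 5, 6]),
    Gene("g2", 950, 1800, "+", [2, 4, 6, 8, 10, 12]),
    Gene("g3", 1850, 2700, "-", [1, 2, 3, 4, 5, 6]),
    Gene("g4", 5000, 5900, "-", [1, 2, 3, 4, 5, 6]),
]
print(call_operons(genes))  # [['g1', 'g2'], ['g3'], ['g4']]
```

Because only consecutive pairs are tested, a single failing criterion between two neighbours is enough to split an otherwise co-expressed run of genes.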


Figure 3-1: Time-series gene expression profiles, including soluble phosphorus and acetate (Panel A) and gene expression profile patterns (Panels B-F). Gray and white backgrounds represent anaerobic and aerobic phases, respectively. Panels B-F: Each panel depicts a single trend category that is representative of an ecologically relevant pattern. Genes were assigned to trend categories based on co-expression analysis using hierarchical clustering, as explained in the Methods. Trend categories were then binned into pattern groups with putative ecological relevance by manually inspecting the gene expression profiles relative to soluble phosphorus, acetate and PHB profiles, as well as redox state (aerobic/anaerobic). Each solid line represents the change in relative transcript abundance (measured as log2(RPKM)) relative to its minimum value. Panel B: Transcripts displaying the High Phosphorus pattern had relatively high abundances until the end of the aerobic phase, when phosphorus was low. In this panel they are represented by Trend Category P. Panel C: Transcripts displaying the Low Phosphorus pattern had relatively low abundances until the end of the anaerobic phase when phosphorus levels are low. In this panel, the transcripts within Trend Category PPP are representative of this pattern. Panel D: Transcripts displaying the Anaerobic Acetate Contact pattern increased drastically after acetate contact and peaked before oxygen contact. In this panel, the transcripts within Trend Category Q are representative of this pattern. Panel E: Transcripts displaying the Redox Transition pattern increased in abundance throughout the anaerobic period, peaking after oxygen contact. In this panel, the transcripts within Trend Category DD are representative of this pattern. Panel F: Transcripts displaying the Aerobic pattern increased in relative abundance during the aerobic phase. In this panel, the transcripts within Trend Category RR are representative of this pattern.
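The profile values plotted in Figure 3-1, and the average correlation reported for trend categories, can be sketched as below. This is a minimal sketch under stated assumptions: RPKM uses its standard definition (reads × 10^9 / (gene length in bp × total mapped reads)); the "average Pearson correlation" of a trend category is assumed to be the mean over all pairs of member profiles, which may differ from how the study averaged it; and the `pseudocount` parameter is a hypothetical addition for zero counts.

```python
import math
from itertools import combinations
from typing import List, Sequence


def rpkm(reads: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of gene per million mapped reads."""
    return reads * 1e9 / (gene_length_bp * total_mapped_reads)


def log2_change_from_min(rpkm_profile: Sequence[float], pseudocount: float = 0.0) -> List[float]:
    """log2(RPKM) at each time point minus the profile's minimum log2(RPKM).

    A pseudocount is needed if any RPKM is zero (log2(0) is undefined);
    its value here is illustrative, not the one used in the study.
    """
    logs = [math.log2(v + pseudocount) for v in rpkm_profile]
    floor = min(logs)
    return [v - floor for v in logs]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is flat."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def mean_pairwise_pearson(profiles: Sequence[Sequence[float]]) -> float:
    """Average Pearson correlation over all pairs of profiles in one category."""
    pairs = list(combinations(profiles, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


print(rpkm(500, 1000, 1_000_000))                 # 500.0
print(log2_change_from_min([100, 200, 400, 50]))  # ~[1.0, 2.0, 3.0, 0.0]
```

Subtracting the minimum puts every profile on a common zero baseline, so genes with very different absolute expression can be compared by shape alone.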

[Figure 3-2 image: metabolic map of central Accumulibacter metabolism. Legend: Redox Transition; Anaerobic Acetate Contact; Aerobic; High P; Low P; No Pattern; AAC Upstream Motif.]

Figure 3-2: Updated metabolic model for Accumulibacter with biochemical reactions color-coded based on the expression profile pattern to which the corresponding gene was assigned. Genes involved in PHB formation demonstrated the anaerobic acetate contact pattern and are colored green. Genes involved in the TCA cycle/glycolysis generally demonstrated high expression levels across the Redox Transition (RT) and are colored blue. Genes involved in the Calvin cycle demonstrated either the Aerobic or the Low P pattern and are colored red and orange, respectively. Genes grouped into the High Phosphorus pattern, which include low-affinity phosphate transporters, are colored yellow. Abbreviations: GlyA, glycogen amylose; Gly, glycogen; ADP-Glu, adenosine 5'-diphosphoglucose; Glu-1-P, glucose 1-phosphate; Glu, glucose; Glu-6-P, glucose 6-phosphate; Fru-6-P, fructose 6-phosphate; Fru-1-6P, fructose 1,6-bisphosphate; G3P, glyceraldehyde 3-phosphate; 1,3-bPG, 1,3-bisphosphoglyceric acid; 3-PG, 3-phosphoglyceric acid; 2-PG, 2-phosphoglycerate; PEP, phosphoenolpyruvate; Pyr, pyruvate; Ac-CoA, acetyl-CoA; AcAc-CoA, acetoacetyl-CoA; 3HB-CoA, (R)-3-hydroxybutanoyl-CoA; PE, phosphatidylethanolamine; Ptd-L-Ser, phosphatidylserine; CDPD, cytidine diphosphate diacylglycerol; PGP, 1,2-diacyl-sn-glycerol-3p; Long Chain FA, long chain fatty acid; FNR, NADPH-ferredoxin reductase; PntAB, proton-translocating transhydrogenase; C.I, complex I oxidative phosphorylation; C.II, complex II oxidative phosphorylation; C.III, complex III oxidative phosphorylation; C.IV, complex IV oxidative phosphorylation; PPP, pyrophosphate-energized proton pump; PolyP, polyphosphate; Ac-P, acetyl-P; Ac-AMP, acetyl-AMP; Ac, acetate; E4-P, erythrose 4-phosphate; S7-P, sedoheptulose 7-phosphate; SBP, sedoheptulose 1,7-bisphosphate; Ri5-P, ribose 5-phosphate; Ru5-P, ribulose 5-phosphate; X5P, xylulose 5-phosphate; Ri1,5P2, ribulose 1,5-bisphosphate; Glyc-P, glycerone-P.

3.3.3 Differential transcript abundances across COG categories in a single EBPR cycle

Numerous subsets of genes (transcripts) were identified in this investigation, including the highly expressed and highly dynamic genes (Supplementary Table S10) as well as Trend Categories Q and DD (Supplementary Table S6). In each of these four subsets, genes related to Energy Production and Conversion were enriched (p-values 5.7e-26, 1.6e-10, 1.9e-13 and 2.3e-08, respectively) (Figure 3-3A and 3-3B). Furthermore, in each of these subsets, Energy Production and Conversion represented the largest fraction of genes with predicted functions (Figure 3-3A and 3-3B). Additional details are located in the Supplemental Material.
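The bootstrap comparison described in Section 3.2.9 can be sketched for a single COG category as follows. This is a hypothetical illustration, not the original code: the toy genome, gene names and COG assignments are invented, and the one-sided t-test p-value is approximated with the normal distribution (reasonable with 999 degrees of freedom, but not the exact t distribution).

```python
import math
import random
import statistics
from typing import Dict, Sequence, Tuple


def count_in_category(genes: Sequence[str], cog: Dict[str, str], category: str) -> int:
    """Number of genes whose COG category equals `category`."""
    return sum(1 for g in genes if cog.get(g) == category)


def bootstrap_enrichment(
    subset: Sequence[str],
    genome: Sequence[str],
    cog: Dict[str, str],
    category: str,
    n_iter: int = 1000,
    seed: int = 0,
) -> Tuple[int, float, float]:
    """Compare a subset's category count with counts in random same-size subsets.

    Returns (observed count, mean null count, one-sided p-value). The p-value
    comes from a one-sample t statistic of the null counts against the
    observed count, converted with a normal approximation.
    """
    rng = random.Random(seed)
    observed = count_in_category(subset, cog, category)
    null = [
        count_in_category(rng.sample(genome, len(subset)), cog, category)
        for _ in range(n_iter)
    ]
    mean, sd = statistics.mean(null), statistics.stdev(null)
    if sd == 0:
        return observed, mean, (0.0 if observed > mean else 1.0)
    t = (observed - mean) / (sd / math.sqrt(n_iter))
    p_one_sided = 0.5 * math.erfc(t / math.sqrt(2))  # upper-tail probability
    return observed, mean, p_one_sided


# Hypothetical 100-gene genome in which 10 genes belong to COG category C.
genome = [f"gene{i}" for i in range(100)]
cog = {g: ("C" if i < 10 else "S") for i, g in enumerate(genome)}
observed, null_mean, p = bootstrap_enrichment(genome[:10], genome, cog, "C")
print(observed)  # 10; the null mean is close to 1 and p is far below 0.05
```

Note that dividing by the standard error of the null mean (rather than its standard deviation) makes this test very sensitive, which is consistent with the extremely small p-values that such a design can yield.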

[Figure 3-3 image. Panel A: Functional enrichment of highly expressed and dynamic transcripts (series: Highly Expressed, Highly Dynamic). Panel B: Functional enrichment of trend category Q and DD (series: Trend Category Q, Trend Category DD). x-axis: COG category; y-axis: number of genes; * marks significant enrichment.]

COG Category Key: [C] Energy production and conversion; [E] Amino acid transport and metabolism; [M] Cell wall/membrane/envelope biogenesis; [L] Replication, recombination and repair; [P] Inorganic ion transport and metabolism; [O] Posttranslational modification, protein turnover, chaperones; [H] Coenzyme transport and metabolism; [I] Lipid transport and metabolism; [G] Carbohydrate transport and metabolism; [N] Cell motility; [U] Intracellular trafficking, secretion, and vesicular transport.

Figure 3-3: Bar plots of the number of genes from each COG category in various gene subsets. Stars indicate statistically significant enrichment of a COG category over the expected number given the background abundance of each COG category in the CAP2UW1 genome. Panel A: The top 350 most highly expressed and dynamic genes. Panel B: Trend Categories Q and DD.

3.3.4 Hydrogen gas production and glycine utilization in Accumulibacter

Based on the transcriptional profiles of hydrogenases and of a glycine cleavage operon detected in this study, we hypothesized that hydrogen gas would be produced under anaerobic conditions after acetate addition, and that glycine is a viable carbon source whose anaerobic addition would result in P release. To test these hypotheses, two sets of batch tests were conducted. Hydrogen gas production after acetate addition was measured and exceeded background anaerobic hydrogen gas production levels (Figure 3-4A). Additionally, anaerobic glycine addition resulted in P release, albeit at a lower rate than that achieved by acetate contact (Figure 3-4B).

[Figure 3-4 image. Panel A: Hydrogen Gas Production After Acetate Addition; hydrogen gas production (ppb) vs. time (minutes), with the acetate addition marked. Panel B: P Release After Glycine Addition; soluble phosphorus (mg/L) vs. time (minutes) for acetate, glycine and control, with the acetate/glycine addition marked.]

Figure 3-4: Hydrogen production and glycine uptake assays. Panel A: Hydrogen production assay demonstrating low background levels of anaerobic hydrogen production without any carbon addition, and elevated hydrogen production after acetate addition. Hydrogen production after acetate addition may be due to the activity of a cytoplasmic hydrogen dehydrogenase restoring the NADH/NAD+ imbalance caused by anaerobic glycogen degradation. Panel B: Batch tests were conducted to test the viability of glycine as a carbon source for Accumulibacter. Phosphorus release after carbon contact was measured for acetate, glycine and a no-carbon-addition control. These results demonstrate that glycine stimulates phosphorus release and is therefore a viable carbon source for Accumulibacter.

3.3.5 Upstream Sequence Motif identification

In order to identify genes putatively co-regulated by cis-regulatory elements, an upstream motif analysis was conducted. A sequence motif was identified upstream of 51 sequences within Trend Category DD using MEME (each with a p-value 80% Clade IIA Accumulibacter abundance), soluble phosphate, total suspended solids, volatile suspended solids, and acetate were measured using previously described methods (Flowers et al., 2009). Additionally, PHA analysis was performed using GC-MS as outlined previously (Comeau et al., 1988). Calcium, magnesium and potassium were analyzed using a VISTA-MPX CCD Simultaneous ICP-OES (Varian Ibérica SL, Madrid, Spain). Kinetic rates for acetate, soluble P, polyhydroxybutyrate (PHB), polyhydroxyvalerate (PHV), magnesium, potassium, and calcium over an anaerobic/aerobic cycle were calculated based on the linear rates of change observed for each analyte and were normalized to the VSS and the Accumulibacter Clade IIA relative abundance.
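The rate calculation described above can be sketched as follows. It is a minimal sketch under two assumptions not spelled out in the text: the "linear rate of change" is taken as an ordinary least-squares slope over the linear portion of each profile, and normalization is taken as division by the product of VSS and Clade IIA relative abundance. The concentrations and sampling times are hypothetical.

```python
from typing import Sequence


def linear_slope(times: Sequence[float], values: Sequence[float]) -> float:
    """Ordinary least-squares slope of values versus times."""
    n = len(times)
    mt, mv = sum(times) / n, sum(values) / n
    sxy = sum((t - mt) * (v - mv) for t, v in zip(times, values))
    sxx = sum((t - mt) ** 2 for t in times)
    return sxy / sxx


def specific_rate(
    times_h: Sequence[float],
    conc_mmol_per_l: Sequence[float],
    vss_g_per_l: float,
    clade_fraction: float,
) -> float:
    """Volumetric rate (mmol/(L*hr)) divided by the Clade IIA share of VSS.

    Assumed normalization: rate / (VSS * Clade IIA relative abundance),
    giving mmol per g of Clade IIA-attributed VSS per hour.
    """
    return linear_slope(times_h, conc_mmol_per_l) / (vss_g_per_l * clade_fraction)


# Hypothetical anaerobic acetate profile: 3.0 -> 1.0 C-mmol/L over 0.5 hr.
rate = specific_rate([0.0, 0.25, 0.5], [3.0, 2.0, 1.0], vss_g_per_l=2.0, clade_fraction=0.8)
print(round(rate, 3))  # -2.5 (negative sign = consumption)
```

Fitting a slope over several points, rather than differencing two samples, makes the rate less sensitive to a single noisy measurement.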

99% of a Clade II sequences (Supplemental Spreadsheet 7). Thus, Clade IIA dominated the community, accounting for 95-99% of total PAOs. Supplemental Table 1 shows the relative abundance on the dates of the kinetic and stoichiometric investigation. Numerous previous investigations have measured key parameters of bulk PAOs, but only sporadically, and only in very recent studies, have these investigations included molecular identification of the dominant Accumulibacter clade being investigated (Welles et al., 2015). Here, the average kinetic parameters and stoichiometric values of Clade IIA for acetate, PHB, P, magnesium and potassium were estimated based on results from high-enrichment cultures (Figure 4-5, Supplemental Spreadsheet 3). Calcium uptake/release and PHV synthesis/degradation were negligible and are not reported. Anaerobic acetate uptake was measured at a rate of 4.8±0.8 C-mmol/(g VSS*hr). Anaerobic PHB synthesis (7.0±1 C-mmol/(g VSS*hr)) exceeded both the acetate uptake rate and the aerobic PHB degradation rate (3.4±0.5 C-mmol/(g VSS*hr)). In contrast, the uptake and release rates of P (2.4±0.4 and 2.1±0.4 P-mmol/(g VSS*hr), respectively), Mg (0.7±0.06 and 0.8±0.02 Mg-mmol/(g VSS*hr), respectively) and K (0.7±0.4 and 0.7±0.02 K-mmol/(g VSS*hr), respectively) were comparable between the anaerobic and aerobic phases. Mg and K were the dominant counter-cations for PolyP, with a molar equivalent ratio of approximately 1 (0.98 ± 0.008 P eq/(Mg+K) eq).
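The stoichiometric comparisons in this paragraph reduce to simple ratios, sketched below. The PHB and acetate inputs are the reported mean anaerobic rates; their difference of about 2.2 C-mmol/(g VSS*hr) is the PHB carbon that Section 4.4.2 attributes to anaerobic glycogen degradation. The charge-equivalent function assumes one negative charge per phosphate residue in long-chain polyP, two positive charges per Mg2+ and one per K+; the reported 0.98 ± 0.008 ratio came from the measured data, whereas the P, Mg and K amounts used here are hypothetical.

```python
def phb_to_acetate_ratio(phb_rate: float, acetate_rate: float) -> float:
    """C-mmol PHB formed per C-mmol acetate taken up."""
    return phb_rate / acetate_rate


def excess_phb_carbon(phb_rate: float, acetate_rate: float) -> float:
    """PHB carbon not accounted for by acetate uptake (C-mmol/(g VSS*hr))."""
    return phb_rate - acetate_rate


# Assumed charges per mole: one negative charge per phosphate residue in
# long-chain polyP; Mg2+ carries two positive charges and K+ one.
CHARGE = {"P": 1, "Mg": 2, "K": 1}


def p_to_cation_equivalents(p_mmol: float, mg_mmol: float, k_mmol: float) -> float:
    """Ratio of phosphate charge equivalents to (Mg + K) charge equivalents."""
    return (p_mmol * CHARGE["P"]) / (mg_mmol * CHARGE["Mg"] + k_mmol * CHARGE["K"])


# Reported mean anaerobic rates for Clade IIA (C-mmol/(g VSS*hr)).
print(round(phb_to_acetate_ratio(7.0, 4.8), 2))  # 1.46
print(round(excess_phb_carbon(7.0, 4.8), 1))     # 2.2
# Hypothetical paired P, Mg and K amounts (mmol) from one batch test.
print(round(p_to_cation_equivalents(2.0, 0.7, 0.6), 2))  # 1.0
```

Working in charge equivalents rather than moles is what allows divalent Mg2+ and monovalent K+ to be summed into a single counter-cation term.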


Figure 4-5: A simplified biochemical model and the measured kinetic and stoichiometric parameters for phosphorus, magnesium, potassium, acetate and polyhydroxybutyrate (PHB) of Accumulibacter Clade IIA. Calcium and polyhydroxyvalerate (PHV) were measured but showed negligible changes over an anaerobic/aerobic cycle.

4.3.4 Evolution of Accumulibacter metabolic pathways

To determine the influence of genetic flux on metabolic pathways at the Accumulibacter LCA, genes annotated within KEGG pathways (Kanehisa et al., 2014) were parsed into ancestral, derived, flexible, and lineage-specific portions (Supplemental Spreadsheet 4). A similar delineation was conducted for all inorganic ion transporters identified in the COG database (Tatusov et al., 2000). The KEGG categories of Carbohydrate metabolism, Lipid metabolism and Metabolism of other Amino Acids, and the COG category Inorganic Ion Transport and Metabolism, showed the highest proportions of derived genes (Figure 4-6A). In contrast, the KEGG categories of Translation, Amino Acid Metabolism and Nucleotide Metabolism showed high proportions of ancestral genes (Figure 4-6A). Specific pathways also showed differential contributions from ancestral and derived genes. Within the broad KEGG category of Carbohydrate metabolism, the starch and sucrose, glycolysis/gluconeogenesis and pyruvate metabolic pathways had a high proportion of derived genes, whereas ancestral genes dominated the citric acid cycle (TCA cycle) and glyoxylate/dicarboxylate pathways (Figure 4-6B). In Lipid Metabolism, the glycerophospholipids sub-category contained a high abundance of derived genes, whereas fatty acid degradation had a higher ancestral composition. For Inorganic Ion Transport and Metabolism, P, K, Mg and Fe transporters all showed high proportions of derived genes, especially P (Figure 4-6B).
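The per-pathway proportions summarized in Figure 4-6 can be computed along these lines. This sketch assumes each gene has already been assigned one origin label (ancestral, derived, laterally derived, flexible or lineage-specific) and that proportions are taken over all labeled genes in a pathway; the gene names and assignments are hypothetical, and the original analysis may have used a different denominator.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

DERIVED_LABELS = {"derived", "laterally derived"}


def pathway_composition(
    assignments: Iterable[Tuple[str, str, str]]
) -> Dict[str, Tuple[float, float]]:
    """Per pathway: (proportion derived or laterally derived, proportion ancestral).

    `assignments` holds (gene, pathway, origin) triples; a gene annotated to
    several pathways appears once per pathway.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for _gene, pathway, origin in assignments:
        counts[pathway][origin] += 1
    composition = {}
    for pathway, c in counts.items():
        total = sum(c.values())
        derived = sum(c[label] for label in DERIVED_LABELS)
        composition[pathway] = (derived / total, c["ancestral"] / total)
    return composition


# Hypothetical assignments for two pathways.
example = [
    ("g1", "Glycolysis", "laterally derived"),
    ("g2", "Glycolysis", "derived"),
    ("g3", "Glycolysis", "laterally derived"),
    ("g4", "Glycolysis", "ancestral"),
    ("g5", "TCA cycle", "ancestral"),
    ("g6", "TCA cycle", "ancestral"),
    ("g7", "TCA cycle", "flexible"),
    ("g8", "TCA cycle", "ancestral"),
]
for pathway, (p_derived, p_ancestral) in pathway_composition(example).items():
    print(pathway, p_derived, p_ancestral)
# Glycolysis 0.75 0.25
# TCA cycle 0.0 0.75
```

Flexible and lineage-specific genes count toward the denominator but toward neither proportion, so the two values need not sum to 1.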

[Figure 4-6 image. Panel A: Ancestral vs Derived Broad KEGG Maps (axes: proportion ancestral; proportion derived or laterally derived). Panel B: Ancestral vs Derived Specific KEGG Pathways (axes: number of ancestral genes; number of derived or laterally derived genes).]

Figure 4-6: The contribution of ancestral and derived genes to Accumulibacter metabolism. A. The contribution of ancestral and derived genes to broad KEGG maps and the COG categories involved in inorganic ion transport and metabolism. B. The contribution of ancestral and derived genes to specific KEGG pathways and to COG categories involved in specific inorganic ion transporters.

4.3.5 Phylogenetic analysis of derived genes

Determination of orthologous gene clusters depends on the BLAST and MCL parameters chosen. Strict parameters (e.g., high percent identity and coverage requirements) will increase the number of clusters identified, potentially splitting true clusters whose members have diverged sufficiently through mutation. In contrast, loose parameters will result in grouping of potentially non-orthologous genes. To address these concerns, we used relatively strict parameters and then manually differentiated between derived genes that likely arose through the accumulation of mutations and those that arose through HGT. To do so, we conducted a phylogenetic analysis for each of the 238 derived genes involved in KEGG pathways or in the COG category Inorganic Ion Transport and Metabolism. On average, approximately 10% of the top 100 BLAST hits per gene were to non-Accumulibacter Rhodocyclaceae, and 135 genes had fewer such hits than this average (Supplemental Spreadsheet 5). Based on these results, a separate classification within the derived portion of the CAP2UW1 genome was distinguished as "laterally derived" genes. Figure 4-7 depicts an evolutionary model of Accumulibacter with color-coded ancestral, derived, laterally derived, flexible and lineage-specific genes. A sensitivity analysis demonstrated that if the threshold were lowered to 5% of the top 100 BLAST hits, 106 (79%) of these genes would still be identified as having arisen by HGT, including 82% of the genes in the evolutionary model (Supplemental Spreadsheet 5). One of the salient features of this evolutionary model is the abundance of laterally acquired genes involved in the distinctive carbon metabolism of the PAO phenotype, including: glycogen degradation (CAP2UW1_0254, CAP2UW1_0255, CAP2UW1_2663), glycolysis (CAP2UW1_2124-2127, CAP2UW1_3196, CAP2UW1_2662, CAP2UW1_2669, CAP2UW1_0487, CAP2UW1_2666, CAP2UW1_1890), PHB metabolism (phaC - CAP2UW1_0143, CAP2UW1_3185, CAP2UW1_3191), pyruvate ferredoxin oxidoreductase
(PFOR - CAP2UW1_2510-2512) and acetate activation to acetyl-CoA (CAP2UW1_1515, CAP2UW1_2035). Another prominent set of laterally derived genes is involved in P transport (PHO4 - CAP2UW1_3785, CAP2UW1_3788) and regulation (phoR - CAP2UW1_1995; phoB - CAP2UW1_1996; phoR/phoB - CAP2UW1_1997; phoU - CAP2UW1_3786, CAP2UW1_3787, CAP2UW1_3789). Additional derived transporters arising from HGT included magnesium transport (corA - CAP2UW1_3581, CAP2UW1_2797) and ferrous iron transport (FeoA - CAP2UW1_0420; FeoB - CAP2UW1_0421, CAP2UW1_3321). Other notable HGT genes include those involved in energy metabolism, such as NADP/NADPH transhydrogenase (CAP2UW1_4179 - CAP2UW1_4180) and cytochrome-c oxidase (CAP2UW1_1790, CAP2UW1_1791). Finally, laterally derived genes were also identified to be involved in regulation and signaling, including two-component redox signaling (RegB - CAP2UW1_0008, RegA - CAP2UW1_0009) (Figure 4-7; see Supplemental Figure 4 for the evolutionary model with locus tags). Notably, laterally derived genes are absent from both the TCA cycle and polyP metabolism.
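The threshold rule and sensitivity analysis above can be sketched as follows. This sketch rests on assumptions not stated in the text: each derived gene's BLAST hits are represented as a ranked list of taxonomic family names with Accumulibacter hits already excluded, and a gene is called laterally derived when its Rhodocyclaceae fraction among the top 100 hits is strictly below the threshold (about 10% in this study, 5% in the sensitivity analysis). The gene names and hit lists are hypothetical.

```python
from typing import Dict, Sequence, Set

TOP_HITS = 100
FAMILY = "Rhodocyclaceae"


def rhodocyclaceae_fraction(hit_families: Sequence[str], top: int = TOP_HITS) -> float:
    """Fraction of a gene's top BLAST hits assigned to Rhodocyclaceae.

    `hit_families` lists the family of each hit in rank order, with
    Accumulibacter hits assumed to have been removed beforehand.
    """
    ranked = list(hit_families[:top])
    if not ranked:
        raise ValueError("gene has no BLAST hits to classify")
    return sum(1 for family in ranked if family == FAMILY) / len(ranked)


def laterally_derived(hits: Dict[str, Sequence[str]], threshold: float) -> Set[str]:
    """Derived genes whose Rhodocyclaceae fraction falls below `threshold`."""
    return {gene for gene, fams in hits.items() if rhodocyclaceae_fraction(fams) < threshold}


# Hypothetical derived genes with 100 ranked hits each.
hits = {
    "geneX": [FAMILY] * 2 + ["Other"] * 98,   # 2% Rhodocyclaceae
    "geneY": [FAMILY] * 8 + ["Other"] * 92,   # 8%
    "geneZ": [FAMILY] * 40 + ["Other"] * 60,  # 40%
}
at_10 = laterally_derived(hits, 0.10)
at_05 = laterally_derived(hits, 0.05)
print(sorted(at_10), sorted(at_05))     # ['geneX', 'geneY'] ['geneX']
print(len(at_05 & at_10) / len(at_10))  # 0.5, share retained at the stricter cut-off
```

The last line mirrors the sensitivity analysis: it reports what fraction of genes called laterally derived at the looser cut-off survive the stricter one.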

[Figure 4-7 image: evolutionary model of CAP2UW1. Legend: Ancestral; Derived; Laterally Derived; Lineage Specific; Flexible; Core Genome.]

Figure 4-7: An evolutionary model of CAP2UW1 depicting ancestral, derived, laterally derived, flexible and lineage-specific genes. Abbreviations: Ac, acetate; AcAc-CoA, acetoacetyl-CoA; Ac-CoA, acetyl-CoA; Ac-AMP, acetyl-AMP; Ac-P, acetyl-P; ADP-Glu, adenosine 5'-diphosphoglucose; CDPD, cytidine diphosphate diacylglycerol; C.I, complex I oxidative phosphorylation; C.II, complex II oxidative phosphorylation; C.III, complex III oxidative phosphorylation; C.IV, complex IV oxidative phosphorylation; E4-P, erythrose 4-phosphate; FNR, NADPH-ferredoxin reductase; Fru-1-6P, fructose 1,6-bisphosphate; Fru-6-P, fructose 6-phosphate; G3P, glyceraldehyde 3-phosphate; Glu, glucose; Glu-1-P, glucose 1-phosphate; Glu-6-P, glucose 6-phosphate; Gly, glycogen; GlyA, glycogen amylose; Glyc-P, glycerone-P; Long Chain FA, long chain fatty acid; PE, phosphatidylethanolamine; PEP, phosphoenolpyruvate; PGP, 1,2-diacyl-sn-glycerol-3p; PntAB, proton-translocating transhydrogenase; PolyP, polyphosphate; PPP, pyrophosphate-energized proton pump; Ptd-L-Ser, phosphatidylserine; Pyr, pyruvate; 1,3-bPG, 1,3-bisphosphoglyceric acid; Ri1,5P2, ribulose 1,5-bisphosphate; Ri5-P, ribose 5-phosphate; Ru5-P, ribulose 5-phosphate; S7-P, sedoheptulose 7-phosphate; SBP, sedoheptulose 1,7-bisphosphate; X5P, xylulose 5-phosphate; 3HB-CoA, (R)-3-hydroxybutanoyl-CoA; 2-PG, 2-phosphoglycerate; 3-PG, 3-phosphoglyceric acid.

4.3.6 Expression profiles of laterally derived genes

Recent metatranscriptomic investigations resulted in the identification of co-expressed gene clusters and of highly expressed genes in CAP2UW1 (Oyserman et al., 2015). Of the 135 putative HGT genes within the derived genome that have KEGG functional annotations, 31 were highly expressed. These included genes for glycogen degradation (CAP2UW1_0255, CAP2UW1_2663), glycolysis (CAP2UW1_2124, CAP2UW1_2126-2127, CAP2UW1_2662, CAP2UW1_2666, CAP2UW1_3196, CAP2UW1_0487), PHB metabolism (CAP2UW1_3185, CAP2UW1_3191), pyruvate ferredoxin oxidoreductase (PFOR - CAP2UW1_2510-2512), ferrous iron transport (FeoA - CAP2UW1_0420) and NADP/NADPH transhydrogenase (PntAB - CAP2UW1_4179 - CAP2UW1_4180) (Supplemental Spreadsheet 4, Sheet 1, Column J). Furthermore, the metatranscriptomic analysis demonstrated that of the 135 laterally derived genes identified in this study, 114 displayed co-expression patterns related to known environmental variables such as anaerobic acetate contact, including PhaC (CAP2UW1_3191) within the PHA synthesis modulon (Oyserman et al., 2015; see also Supplemental Spreadsheet 5, Sheet 2, Column Q).

4.4 Discussion

The transition from non-PAO to PAO, hypothesized to have occurred at the Accumulibacter LCA, was accompanied by significant molecular evolution in key carbon pathways, transporters, energy metabolism, and regulatory elements. The changes in these pathways ranged from considerable, as in glycolysis, to nearly none, as in the TCA cycle (Figure 4-6A and B). Below we provide a detailed discussion of key laterally derived genes in the context of known aspects of PAO metabolism and the measured stoichiometry/kinetics of
Accumulibacter Clade IIA determined in this study. Additionally, we incorporate previous metatranscriptomic analyses (Oyserman et al., 2015) to postulate the relative importance of these derived genes in optimizing and linking key pathways in the Accumulibacter-type PAO phenotype. Finally, we discuss the broader implications of these findings for the search for additional PAOs.

4.4.1 Acetate activation

The primary route for carbon acquisition in Accumulibacter is the anaerobic uptake of volatile fatty acids, such as acetate, and the subsequent synthesis of the storage polymer PHA. After anaerobic acetate contact, acetate is transported into the cell via both passive and active transport (Burow, Mabbett, McEwan, et al., 2008; Saunders et al., 2007) and activated to acetyl-CoA (Figures 4-4 and 4-5). Acetate is activated to acetyl-CoA through either acetyl-P or acetyl-AMP intermediates. The primary route for the activation of acetate is currently unknown; however, higher relative expression of acetyl-CoA synthetase genes suggests that the primary route is via acetyl-AMP (Oyserman et al., 2015). While no laterally derived acetate transporters were identified, both routes for acetate activation contain laterally derived genes (CAP2UW1_1515 and CAP2UW1_2035) (Figure 4-7). Numerous copies of acetyl-CoA synthetase are found in the CAP2UW1 genome: in addition to the laterally derived copy, these include flexible genes (CAP2UW1_1069, CAP2UW1_2247, CAP2UW1_3266) and an ancestral gene (CAP2UW1_3755). Of these, the laterally derived gene had the lowest transcription levels, while the ancestral copy (CAP2UW1_3755) was one of the most highly expressed genes in the CAP2UW1 genome (Oyserman et al., 2015). In contrast, no redundant copies of acylphosphatase are annotated in the CAP2UW1 genome aside from the laterally derived gene (CAP2UW1_1515), and this gene is
also not highly expressed (Oyserman et al., 2015). This analysis suggests that, despite involving laterally derived genes, the evolution of acetate activation at the Accumulibacter LCA may not have contributed substantially to the transition from non-PAO to PAO.

4.4.2 PHB synthesis

Once acetate has been transported into the cell and activated to acetyl-CoA, it enters the PHB synthesis pathway. The synthesis of PHB (7 C-mmol/(g VSS*hr)) in Accumulibacter Clade IIA occurs at twice the rate of its degradation (3.4 C-mmol/(g VSS*hr)) and is also greater than the acetate uptake rate (4.8 C-mmol/(g VSS*hr)) (Figure 4-5 and Supplemental Figure 4-5). The kinetic disparity between PHB synthesis, degradation and acetate uptake is due to the additional intracellular flux of carbon from anaerobic glycogen degradation via pyruvate and acetyl-CoA to PHB. Together, these kinetic parameters suggest that a strong evolutionary pressure for rapid PHB synthesis exists. Of the three enzymes in the PHA synthesis pathway (PhaA, PhaB and PhaC), only PhaC is encoded by laterally derived genes. Of the four copies of phaC in the CAP2UW1 genome, three are laterally derived (CAP2UW1_0143, CAP2UW1_3191 and CAP2UW1_3185), and two of these are among the most highly transcribed genes in CAP2UW1 (CAP2UW1_3191 and CAP2UW1_3185). Additionally, CAP2UW1_3191 is co-expressed with a predicted PHA modulon controlled by the ancestral core regulatory protein PhaR (CAP2UW1_3918) (Oyserman et al., 2015). Thus, in contrast to the activation of acetate to acetyl-CoA, the polymerization of 3-hydroxybutyryl-CoA to PHB is likely to occur primarily through laterally derived genes, suggesting that the evolution of PHB metabolism in Accumulibacter was significant in the transition from non-PAO to PAO. It is noteworthy that the laterally derived PhaC genes represent both
class I and class III PHA synthases (CAP2UW1_3191 and CAP2UW1_3185, respectively) (Rehm, 2003; Yuan et al., 2001), and that these genes were highly expressed with dissimilar expression profiles (Oyserman et al., 2015). The dissimilar expression profiles of these related but functionally divergent PhaC suggest that they contribute differentially to the PAO metabolism of Accumulibacter; however, more research is required to reach such a conclusion. Regardless, a gene dosage effect (e.g., multiple copies of phaC) has been shown to increase PHA synthesis capabilities (Maehara et al., 1998).

4.4.3 Anaerobic Reducing Equivalents: Glycolysis, Glycogen Degradation and PntAB

Anaerobic PHB synthesis requires both ATP and reducing equivalents. One strategy used by Accumulibacter to meet this demand is to use stored glycogen (Schuler & Jenkins, 1994). As noted earlier, a striking number of genes involved in glycogen degradation (starch/sucrose metabolism) and glycolysis are laterally derived (Figures 4-6 and 4-7). These include genes for glycogen degradation via glycogen phosphorylase (CAP2UW1_0255, CAP2UW1_2663), glucose-6-phosphate isomerase (CAP2UW1_2124), fructose-bisphosphate aldolase (CAP2UW1_2669, CAP2UW1_3196), phosphoglycerate kinase (CAP2UW1_0487), phosphopyruvate hydratase (CAP2UW1_2666) and pyruvate kinase (CAP2UW1_1890) (Figure 4-7). Although glycolysis produces reducing equivalents in the form of NADH, NADPH is generally required for PHB synthesis (Peoples & Sinskey, 1989; Steinbüchel et al., 1993; Madison & Huisman, 1999; Kim et al., 2014). A recent investigation demonstrating hydrogen gas production during anaerobic acetate contact in Accumulibacter-enriched bioreactors suggests that the regeneration of NAD+ may represent a bottleneck in PAO metabolism that is alleviated through hydrogenase activity (Oyserman et al., 2015). Furthermore, metatranscriptomic evidence
from this same study suggests that the demand for the conversion of NADH to NADPH is met by the NADPH/NADH transhydrogenase PntAB (CAP2UW1_4179, CAP2UW1_4180; Oyserman et al., 2015). While the hydrogenases are ancestral (CAP2UW1_0999, CAP2UW1_2286), both PntAB genes are, interestingly, laterally derived. Furthermore, these genes are highly expressed, as are many of the laterally derived genes involved in glycogen degradation and glycolysis (CAP2UW1_2124, CAP2UW1_2126, CAP2UW1_2127, CAP2UW1_2662, CAP2UW1_2663, CAP2UW1_2666, CAP2UW1_0255, CAP2UW1_3196, CAP2UW1_0487, CAP2UW1_1890, CAP2UW1_4179, CAP2UW1_4180; Oyserman et al., 2015). Together, this evidence suggests that considerable selective pressure existed at the Accumulibacter LCA to optimize the production of reducing equivalents in the form of NADPH via glycogen degradation, glycolysis and NADPH/NADH transhydrogenase activity, and that this optimization is an important adaptation for storing PHA anaerobically.

4.4.4 Pyruvate Metabolism

Anaerobic glycogen degradation provides both ATP and NADH, but also produces abundant pyruvate that must be converted to PHB via acetyl-CoA. In general, two complexes can convert pyruvate to acetyl-CoA: pyruvate-ferredoxin oxidoreductase (PFOR) and pyruvate dehydrogenase (PDH). These multi-enzyme complexes differ in that PFOR uses ferredoxin and is often coupled with hydrogen production (Chabrière et al., 1999), while PDH uses NAD+ and is inhibited by high levels of NADH (Snoep et al., 1993). Both complexes are highly expressed in CAP2UW1 and form separate operons (PFOR, CAP2UW1_2510-CAP2UW1_2512; PDH, CAP2UW1_1838-CAP2UW1_1840). However, because PFOR is the primary route from pyruvate to acetyl-CoA under NADH-rich conditions (Patel & Roche, 1990;
Blamey & Adams, 1993; Townson et al., 1996), it likely fills this role in Accumulibacter PAO metabolism, contributing to the recently reported hydrogen gas production (Oyserman et al., 2015). Interestingly, the PFOR operon in Accumulibacter is composed of laterally derived genes (Figure 4-7). Thus, the kinetic, evolutionary and transcriptional data all suggest that the ability to efficiently shunt pyruvate anaerobically to PHB via acetyl-CoA is an essential adaptation for the Accumulibacter-type PAO phenotype; without it, a buildup of pyruvate would likely inhibit glycogen degradation and stall the anaerobic metabolism of Accumulibacter.

4.4.5 Phosphorus and Counter Cation Transport

PolyP is a source of ATP in anaerobic PAO metabolism (Comeau et al., 1986). Thus, the degradation and synthesis of polyP are among the key metabolic processes in Accumulibacter. The degradation and synthesis of polyP must be accompanied by the transport of P into and out of the cell, as well as by the transport of the counter cations used to balance the negative charge of phosphate. Indeed, the stoichiometric analysis in this investigation demonstrates that P transport in Accumulibacter is linked to the counter cations magnesium and potassium at a nearly 1:1 molar equivalent ratio (Figure 4-5 and Supplemental Figure 5). Despite the obvious linkage between polyP metabolism and the transport of P, Mg and K, the evolutionary histories of these genes differ significantly. The polyP metabolism of Accumulibacter is ancestral, whereas many of the genes involved in P transport (Pit, CAP2UW1_3785, CAP2UW1_3788; PstS, CAP2UW1_1747; PstB, CAP2UW1_1751-1752; PstC, CAP2UW1_1749) and magnesium transport (corA, CAP2UW1_3581, CAP2UW1_2797) are laterally derived. The kinetic/stoichiometric and evolutionary data presented here suggest that an increased capability to transport P and counter cations such as Mg was an important adaptation at the Accumulibacter LCA, supporting and
expanding upon previous hypotheses that inorganic P transporters may be absolutely required for the Accumulibacter-PAO phenotype (Saunders et al., 2007; Kristiansen et al., 2013; Nobu et al., 2014).

4.4.6 Ferrous Iron Transport

Iron is an essential co-factor in many enzymes, and bacteria have evolved diverse strategies for acquiring iron from the environment (Andrews et al., 2003; Wandersman & Delepelaire, 2004). Under reducing (i.e., anaerobic) conditions, ferrous iron predominates over ferric iron, and ferrous iron transport via the Feo pathway is favored over alternative ferric iron uptake mechanisms such as siderophores (Cartron et al., 2006). The Feo system was laterally acquired at the Accumulibacter LCA, suggesting that meeting the anaerobic demand for iron-containing enzymes, such as the highly expressed PFOR and hydrogenases, is an important adaptation for the Accumulibacter-type PAO phenotype.

4.4.7 Signaling and Regulation

Accumulibacter has been shown to transcriptionally regulate genes in correlation with carbon, P and oxygen availability (Oyserman et al., 2015). To respond accurately to such environmental cues, bacteria rely primarily upon two-component systems (Chang, 1998). Furthermore, HGT of two-component systems is an important mechanism for niche adaptation, reflecting the selective pressures of the environment (Alm et al., 2006). In Accumulibacter, both the phosphate limitation (PhoR, CAP2UW1_1995; PhoB, CAP2UW1_1996; PhoR-PhoB, CAP2UW1_1997) and the redox signaling (RegB, CAP2UW1_0008; RegA, CAP2UW1_0009) two-component systems were laterally derived at the LCA. While it is difficult to surmise what specific
genes may be under the control of these two-component systems without additional molecular evidence, metatranscriptomic analysis identified many co-expressed genes responding to aerobic (1,844) and low-P (438) conditions (Supplemental Spreadsheet 5; Oyserman et al., 2015), which may be good candidates for further study. In addition to the evolution of novel regulatory mechanisms in Accumulibacter, laterally acquired genes may also integrate into existing regulatory networks; however, this process often occurs slowly, with both recent and ancient laterally acquired genes generally showing lower degrees of co-expression than their non-laterally transferred counterparts (Lercher & Pál, 2008). Currently, one of the best-examined aspects of the Accumulibacter regulatory network is a putative PHA regulon likely controlled by the ancestral core regulatory protein (CAP2UW1_3918) (Oyserman et al., 2015). A key gene proposed to be in this regulon, a type III PhaC, is laterally derived, providing evidence that laterally derived core genes were integrated into existing ancestral regulatory networks. Thus, the evolution of regulatory networks, both through novel P and redox signaling and through the integration of novel genes into existing networks such as the PHA regulon, likely contributed to the evolution of the PAO phenotype in Accumulibacter.

4.4.8 Uncertainty in Reconstructions and Future Work

The analysis of Accumulibacter evolution was conducted within the constraints of current knowledge of the phenotypic and genotypic diversity within the Rhodocyclaceae. We included all closely related, publicly available, completed genomes (in addition to the Accumulibacter genomes) available at the start of this analysis. Our understanding of the evolutionary and genomic capabilities of many lineages is continuously being rewritten as the available data on each lineage increase. For example, recent investigations have expanded the definition of the
Cyanobacteria phylum based on new genomic information (Soo et al., 2014). One of the key uncertainties in our analysis is the lack of closely related non-Accumulibacter Rhodocyclaceae genomes reconstructed from EBPR systems (e.g., from Dechloromonas spp.). Additionally, it remains difficult to identify ancient HGT events, especially if they are obscured by multiple gains and losses. Future discoveries may expand the known diversity of Rhodocyclaceae involved in EBPR, either blurring or clarifying the delineation between PAOs and non-PAOs.

4.5 Conclusion

Here we report the first evolutionary study of the PAO phenotype, combining ancestral genome reconstructions, identification of HGT and chemical characterization. Through this analysis, we identified important metabolic transformations that occurred in the Accumulibacter LCA, where the transition from non-PAO to PAO is hypothesized to have occurred. Prominent lateral acquisitions include numerous genes involved in glycogen degradation, glycolysis, pyruvate metabolism and PHB pathways, as well as regulatory and sensory mechanisms involved in redox and P metabolism. In contrast, the TCA cycle and polyP metabolism are composed almost entirely of ancestral genes present before the Accumulibacter LCA. The molecular evolution in these pathways was likely necessary to overcome key stoichiometric and kinetic bottlenecks identified in PAO metabolism: specifically, anaerobic carbon flux from glycogen to PHA via PFOR, P and counter cation transport to maintain polyP synthesis, and anaerobic NADPH production from NADH via PntAB. Convergent evolution often occurs when unrelated organisms under similar selective pressures independently evolve similar adaptations. Based on this assumption, the molecular evolution that occurred at the Accumulibacter LCA is
likely representative of the general adaptations necessary for the Accumulibacter-type PAO phenotype to emerge. This analysis demonstrates the value of differentiating the core genome of a lineage into ancestral and derived states when investigating a complex and phylogenetically cohesive phenotype.

4.6 Competing Interests

The authors declare that they have no competing interests.

4.7 Acknowledgements

The authors would like to thank Shaomei He, Sarah Stevens, Joshua Hamilton and Pamela Camejo for friendly review. KDM acknowledges funding from the US National Science Foundation (CBET-0967646 and MCB-1518130) and the UW-Madison Graduate School. The work described here would not have been possible without the ongoing support of scientists and programs at the US Department of Energy Joint Genome Institute.

4.8 Literature Cited

1. Alm E, Huang K, Arkin A. (2006). The evolution of two-component systems in bacteria reveals different strategies for niche adaptation. PLoS Comput Biol 2:1329–1342.

2. Altschul S, Gish W, Miller W. (1990). Basic Local Alignment Search Tool. J Mol Biol 215:403–410.

3. Andrews SC, Robinson AK, Rodríguez-Quiñones F. (2003). Bacterial iron homeostasis. FEMS Microbiol Rev 27:215–237.

4. Blamey JM, Adams MWW. (1993). Purification and characterization of pyruvate ferredoxin oxidoreductase from the hyperthermophilic archaeon Pyrococcus furiosus. Biochim Biophys Acta 1161:19–27.

5. Burow LC, Mabbett AN, McEwan AG, Bond PL, Blackall LL. (2008). Bioenergetic models for acetate and phosphate transport in bacteria important in enhanced biological phosphorus removal. Environ Microbiol 10:87–98.

6. Cartron ML, Maddocks S, Gillingham P, Craven CJ, Andrews SC. (2006). Feo – Transport of ferrous iron into bacteria. BioMetals 19:143–157.

7. Castresana J. (2000). Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol 17:540–552.

8. Chabrière E, Charon MH, Volbeda A, Pieulle L, Hatchikian EC, Fontecilla-Camps JC. (1999). Crystal structures of the key anaerobic enzyme pyruvate:ferredoxin oxidoreductase, free and in complex with pyruvate. Nat Struct Biol 6:182–90.

9. Chan AP, Sutton G, DePew J, Krishnakumar R, Choi Y, Huang X-Z, et al. (2015). A novel method of consensus pan-chromosome assembly and large-scale comparative analysis reveal the highly flexible pan-genome of Acinetobacter baumannii. Genome Biol 16:143.

10. Comeau Y, Hall KJ, Hancock REW, Oldham WK. (1986). Biochemical model for enhanced biological phosphorus removal. Water Res 20:1511–1521.

11. Comeau Y, Hall KJ, Oldham WK. (1988). Determination of Poly-3-Hydroxybutyrate and Poly-3-Hydroxyvalerate in Activated Sludge by Gas-Liquid Chromatography. Appl Environ Microbiol 54:2325–2327.

12. Connell JH. (1980). Diversity and the Coevolution of Competitors, or the Ghost of Competition Past. Oikos 35:131–138.

13. Crocetti GR, Hugenholtz P, Bond PL, Schuler A, Keller J, Jenkins D, et al. (2000). Identification of polyphosphate-accumulating organisms and design of 16S rRNA-directed probes for their detection and quantitation. Appl Environ Microbiol 66:1175–82.

14. Csurös M. (2010). Count: evolutionary analysis of phylogenetic profiles with parsimony and likelihood. Bioinformatics 26:1910–2.

15. van Dongen SM. (2000). Graph clustering by flow simulation. PhD thesis, University of Utrecht.

16. Flowers JJ, He S, Malfatti S, del Rio TG, Tringe SG, Hugenholtz P, et al. (2013). Comparative genomics of two ‘Candidatus Accumulibacter’ clades performing biological phosphorus removal. ISME J 7:2301–14.

17. Flowers JJ, He S, Yilmaz S, Noguera DR, McMahon KD. (2009). Denitrification capabilities of two biological phosphorus removal sludges dominated by different ‘Candidatus Accumulibacter’ clades. Environ Microbiol Rep 1:583–588.

18. García Martín H, Ivanova N, Kunin V, Warnecke F, Barry KW, McHardy AC, et al. (2006). Metagenomic analysis of two enhanced biological phosphorus removal (EBPR) sludge communities. Nat Biotechnol 24:1263–9.

19. Ghylin TW, Garcia SL, Moya F, Oyserman BO, Schwientek P, Forest KT, et al. (2014). Comparative single-cell genomics reveals potential ecological niches for the freshwater acI Actinobacteria lineage. ISME J 8:2503–16.

20. Grillo JF, Gibson J. (1979). Regulation of phosphate accumulation in the unicellular cyanobacterium Synechococcus. J Bacteriol 140:508–517.

21. Hacker J, Carniel E. (2001). Ecological fitness, genomic islands and bacterial pathogenicity. A Darwinian view of the evolution of microbes. EMBO Rep 2:376–81.

22. Hao W, Golding GB. (2006). The fate of laterally transferred genes: Life in the fast lane to adaptation or death. Genome Res 16:636–643.

23. He S, Gall DL, McMahon KD. (2007). ‘Candidatus Accumulibacter’ population structure in enhanced biological phosphorus removal sludges as revealed by polyphosphate kinase genes. Appl Environ Microbiol 73:5865–74.

24. He S, Gu AZ, McMahon KD. (2008). Progress toward understanding the distribution of Accumulibacter among full-scale enhanced biological phosphorus removal systems. Microb Ecol 55:229–36.

25. Jendrossek D. (2009). Polyhydroxyalkanoate granules are complex subcellular organelles (carbonosomes). J Bacteriol 191:3195–202.

26. Kanehisa M, Goto S, Sato Y, Kawashima M, Furumichi M, Tanabe M. (2014). Data, information, knowledge and principle: Back to metabolism in KEGG. Nucleic Acids Res 42:199–205.

27. Katoh K, Standley DM. (2013). MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30:772–80.

28. Kettler GC, Martiny AC, Huang K, Zucker J, Coleman ML, Rodrigue S, et al. (2007). Patterns and implications of gene gain and loss in the evolution of Prochlorococcus. PLoS Genet 3:e231.

29. Kim J, Chang JH, Kim E-J, Kim K-J. (2014). Crystal structure of (R)-3-hydroxybutyryl-CoA dehydrogenase PhaB from Ralstonia eutropha. Biochem Biophys Res Commun 443:783–8.

30. Kong Y, Nielsen JL, Nielsen PH. (2005). Identity and Ecophysiology of Uncultured Actinobacterial Polyphosphate-Accumulating Organisms in Full-Scale Enhanced Biological Phosphorus Removal Plants. Appl Environ Microbiol 71:4076–4085.

31. Kornberg A, Rao NN, Ault-Riché D. (1999). Inorganic polyphosphate: a molecule of many functions. Annu Rev Biochem 68:89–125.

32. Kristiansen R, Nguyen HTT, Saunders AM, Nielsen JL, Wimmer R, Le VQ, et al. (2013). A metabolic model for members of the genus Tetrasphaera involved in enhanced biological phosphorus removal. ISME J 7:543–54.

33. Larsson J, Nylander JA, Bergman B. (2011). Genome fluctuations in cyanobacteria reflect evolutionary, developmental and adaptive traits. BMC Evol Biol 11:187.

34. Latysheva N, Junker VL, Palmer WJ, Codd GA, Barker D. (2012). The evolution of nitrogen fixation in cyanobacteria. Bioinformatics 28:603–606.

35. Lefébure T, Stanhope MJ. (2007). Evolution of the core and pan-genome of Streptococcus: positive selection, recombination, and genome composition. Genome Biol 8:R71.

36. Lercher MJ, Pál C. (2008). Integration of horizontally transferred genes into regulatory interaction networks takes many million years. Mol Biol Evol 25:559–567.

37. Madison LL, Huisman GW. (1999). Metabolic engineering of poly(3-hydroxyalkanoates): from DNA to plastic. Microbiol Mol Biol Rev 63:21–53.

38. Maehara A, Ikai K, Ueda S, Yamane T. (1998). Gene dosage effects on polyhydroxyalkanoates synthesis from n-alcohols in Paracoccus denitrificans. Biotechnol Bioeng 60:61–69.

39. Markowitz VM, Chen I-MA, Palaniappan K, Chu K, Szeto E, Grechkin Y, et al. (2012). IMG: the integrated microbial genomes database and comparative analysis system. Nucleic Acids Res 40:D115–D122.

40. Maszenan AM, Seviour RJ, Patel BK, Schumann P, Burghardt J, Tokiwa Y, et al. (2000). Three isolates of novel polyphosphate-accumulating gram-positive cocci, obtained from activated sludge, belong to a new genus, Tetrasphaera gen. nov., and description of two new species, Tetrasphaera japonica sp. nov. and Tetrasphaera australiensis sp. nov. Int J Syst Evol Microbiol 50:593–603.

41. McMahon KD, Read EK. (2013). Microbial contributions to phosphorus cycling in eutrophic lakes and wastewater. Annu Rev Microbiol 67:199–219.

42. Mielczarek AT, Nguyen HTT, Nielsen JL, Nielsen PH. (2013). Population dynamics of bacteria involved in enhanced biological phosphorus removal in Danish wastewater treatment plants. Water Res 47:1529–1544.

43. Moreno-Hagelsieb G, Latimer K. (2008). Choosing BLAST options for better detection of orthologs as reciprocal best hits. Bioinformatics 24:319–24.

44. Nobu MK, Tamaki H, Kubota K, Liu WT. (2014). Metagenomic characterization of ‘Candidatus Defluviicoccus tetraformis strain TFO71’, a tetrad-forming organism, predominant in an anaerobic-aerobic membrane bioreactor with deteriorated biological phosphorus removal. Environ Microbiol 16:2739–2751.

45. Nowell RW, Green S, Laue BE, Sharp PM. (2014). The extent of genome flux and its role in the differentiation of bacterial lineages. Genome Biol Evol 6:1514–29.

46. Ochman H, Lawrence JG, Groisman EA. (2000). Lateral gene transfer and the nature of bacterial innovation. Nature 405:299–304.

47. Ochman H, Lerat E, Daubin V. (2005). Examining bacterial species under the specter of gene transfer and exchange. Proc Natl Acad Sci U S A 102 Suppl 1:6595–9.

48. Oyserman BO, Noguera DR, Glavina Del Rio T, Tringe SG, McMahon KD. (2016). Metatranscriptomic insights on gene expression and regulatory controls in Candidatus Accumulibacter phosphatis. ISME J 10:810–822.

49. Ozer EA, Allen JP, Hauser AR. (2014). Characterization of the core and accessory genomes of Pseudomonas aeruginosa using bioinformatic tools Spine and AGEnt. BMC Genomics 15:737.

50. Pál C, Papp B, Lercher MJ. (2005). Adaptive evolution of bacterial metabolic networks by horizontal gene transfer. Nat Genet 37:1372–1375.

51. Patel MS, Roche TE. (1990). Molecular biology and biochemistry of pyruvate dehydrogenase complexes. FASEB J 4:3224–33.

52. Peoples OP, Sinskey AJ. (1989). Poly-β-hydroxybutyrate (PHB) biosynthesis in Alcaligenes eutrophus H16. Identification and characterization of the PHB polymerase gene (phbC). J Biol Chem 264:15298–15303.

53. Polz MF, Alm EJ, Hanage WP. (2013). Horizontal gene transfer and the evolution of bacterial and archaeal population structure. Trends Genet 29:170–5.

54. Pruitt KD, Tatusova T, Maglott DR. (2007). NCBI reference sequences (RefSeq): A curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 35:61–65.

55. Rao NN, Liu S, Kornberg A. (1998). Inorganic polyphosphate in Escherichia coli: The phosphate regulon and the stringent response. J Bacteriol 180:2186–2193.

56. Ravenhall M, Škunca N, Lassalle F, Dessimoz C. (2015). Inferring Horizontal Gene Transfer. PLOS Comput Biol 11:e1004095.

57. Rehm BHA. (2003). Polyester synthases: natural catalysts for plastics. Biochem J 376:15–33.

58. Saunders AM, Mabbett AN, McEwan AG, Blackall LL. (2007). Proton motive force generation from stored polymers for the uptake of acetate under anaerobic conditions. FEMS Microbiol Lett 274:245–51.

59. Schluter D, Price T, Mooers AØ, Ludwig D. (1997). Likelihood of Ancestor States in Adaptive Radiation. Evolution 51:1699–1711.

60. Schuler AJ, Jenkins D. (1994). Enhanced biological phosphorus removal from wastewater by biomass with different phosphorus contents, Part I: Experimental results and comparison with metabolic models. Water Environ Res 75:485–98.

61. Schuler AJ, Jenkins D. Enhanced biological phosphorus removal from wastewater by biomass with different phosphorus contents, Part III: Anaerobic sources of reducing equivalents. Water Environ Res 75:512–22.

62. Seviour RJ, Mino T, Onuki M. (2003). The microbiology of biological phosphorus removal in activated sludge systems. FEMS Microbiol Rev 27:99–127.

63. Skennerton CT, Barr JJ, Slater FR, Bond PL, Tyson GW. (2015). Expanding our view of genomic diversity in Candidatus Accumulibacter clades. Environ Microbiol 17:1574–1585.

64. Snoep JL, de Graef MR, Westphal AH, de Kok A, Teixeira de Mattos MJ, Neijssel OM. (1993). Differences in sensitivity to NADH of purified pyruvate dehydrogenase complexes of Enterococcus faecalis, Lactococcus lactis, Azotobacter vinelandii and Escherichia coli: implications for their activity in vivo. FEMS Microbiol Lett 114:279–283.

65. Soo RM, Skennerton CT, Sekiguchi Y, Imelfort M, Paech SJ, Dennis PG, et al. (2014). An expanded genomic representation of the phylum Cyanobacteria. Genome Biol Evol 6:1031–1045.

66. Steinbüchel A, Hustede E, Liebergesell M, Pieper U, Timm A, Valentin H. (1993). Molecular basis for biosynthesis and accumulation of polyhydroxyalkanoic acids in bacteria. FEMS Microbiol Rev 10:347–50.

67. Tatusov RL, Galperin MY, Natale DA, Koonin EV. (2000). The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res 28:33–36.

68. Touchon M, Hoede C, Tenaillon O, Barbe V, Baeriswyl S, Bidet P, et al. (2009). Organised genome dynamics in the Escherichia coli species results in highly diverse adaptive paths. PLoS Genet 5:e1000344.

69. Townson SM, Upcroft A, Upcroft P. (1996). Characterisation and purification of pyruvate:ferredoxin oxidoreductase from Giardia duodenalis. Mol Biochem Parasitol 79:183–193.

70. Wandersman C, Delepelaire P. (2004). Bacterial iron sources: from siderophores to hemophores. Annu Rev Microbiol 58:611–647.

71. Welles L, Tian WD, Saad S, Abbas B, Lopez-Vazquez CM, Hooijmans CM, et al. (2015). Accumulibacter clades Type I and II performing kinetically different glycogen-accumulating organisms metabolisms for anaerobic substrate uptake. Water Res 83:354–366.

72. Wilkinson JF. (1963). Carbon and Energy Storage in Bacteria. J Gen Microbiol 32:171–176.

73. Yuan W, Jia Y, Tian J, Snell KD, Müh U, Sinskey AJ, et al. (2001). Class I and III polyhydroxyalkanoate synthases from Ralstonia eutropha and Allochromatium vinosum: characterization and substrate specificity studies. Arch Biochem Biophys 394:87–98.

74. Zaremba-Niedzwiedzka K, Viklund J, Zhao W, Ast J, Sczyrba A, Woyke T, et al. (2013). Single-cell genomics reveal low recombination frequencies in freshwater bacteria of the SAR11 clade. Genome Biol 14:R130.

75. Zhang F, Blasiak LC, Karolin JO, Powell RJ, Geddes CD, Hill RT. (2015). Phosphorus sequestration in the form of polyphosphate by microbial symbionts in marine sponges. Proc Natl Acad Sci 112:4381–4386.

76. Zilles JL, Peccia J, Kim M, Hung C, Noguera DR. (2002). Involvement of Rhodocyclus-Related Organisms in Phosphorus Removal in Full-Scale Wastewater Treatment Plants. Appl Environ Microbiol 68:2763–2769.


Chapter 5 Congruent transcriptional responses identify diverse lineages of polymer cycling organisms

Ben O. Oyserman¹*, Daniel R. Noguera¹, Katherine D. McMahon¹,²

¹Department of Civil and Environmental Engineering, University of Wisconsin at Madison, Madison, WI, 53706, USA; ²Department of Bacteriology, University of Wisconsin at Madison, Madison, WI, 53706, USA

*Corresponding author

Citation: This is a draft manuscript to be submitted in the near future.

Author Contributions: BOO conceived of and designed the analysis including all bioinformatics and statistics. BOO wrote the manuscript with feedback from DRN and KDM.

5. Abstract

Determining the traits of uncultivated microorganisms is one of the challenges of modern microbiology. The ‘omics revolution’ has provided an unprecedented amount of sequence data from which traits may be inferred. However, many traits are not linked to molecular markers with complete fidelity. In these situations, genomic content alone cannot predict the presence of a trait. Here we present a comparative metatranscriptomic method to identify traits in uncharacterized organisms. The method works by identifying a set of marker genes indicative of a trait in a model organism and screening all unknown genomes against this marker set. Each unknown organism is scored based on the presence and relative expression patterns of each gene in the marker set. We then developed a statistical analysis to determine whether the scores for a trait within a genome, or set of genomes, were significantly greater than expected from a randomly generated subset. Using this relatively basic scoring system, we demonstrate that it is possible to differentiate between several polymer cycling phenotypes, and between genomes with otherwise overlapping genomic content.
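To make the scoring idea concrete, the sketch below counts, for one candidate genome, the marker genes that are present and whose expression profile is congruent with the model organism's, then compares that count with the scores of randomly drawn gene subsets. Every specific here is an assumption rather than the chapter's published method: congruence is defined (hypothetically) as a Pearson correlation of at least `r_min = 0.7` across time points, the marker-to-gene `mapping` is taken as given, significance is an empirical permutation p-value, and all function names are illustrative.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two expression profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def ctr_score(marker_profiles, genome_profiles, mapping, r_min=0.7):
    """Count marker genes whose mapped gene in the genome has a congruent profile.

    marker_profiles: {marker_id: [expression per time point]} in the model organism
    genome_profiles: {gene_id: [expression per time point]} in the candidate genome
    mapping: {marker_id: gene_id or None}; None means no homolog was found
    r_min: hypothetical correlation cut-off for calling a response "congruent"
    """
    score = 0
    for marker, gene in mapping.items():
        if gene is None or gene not in genome_profiles:
            continue  # marker absent from this genome, contributes nothing
        if pearson(marker_profiles[marker], genome_profiles[gene]) >= r_min:
            score += 1
    return score


def empirical_p(marker_profiles, genome_profiles, mapping,
                n_perm=1000, r_min=0.7, seed=0):
    """Compare the observed score with scores of random gene subsets of equal size."""
    rng = random.Random(seed)
    observed = ctr_score(marker_profiles, genome_profiles, mapping, r_min)
    present = [m for m, g in mapping.items()
               if g is not None and g in genome_profiles]
    genes = list(genome_profiles)
    hits = 0
    for _ in range(n_perm):
        # Assign a random gene from the same genome to each marker that has a homolog.
        random_map = dict(zip(present, rng.sample(genes, len(present))))
        if ctr_score(marker_profiles, genome_profiles, random_map, r_min) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Drawing the random subsets from the same genome ties the null distribution to that genome's overall expression behaviour, and the add-one correction avoids reporting p = 0 from a finite number of permutations.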

5.1 Introduction

Metabolic traits are often recognized based on the presence or absence of key genes and pathways (Wang et al., 2014; Byrne-Bailey & Coates, 2012; Salinero et al., 2009). However, gene content-based classification of microbial traits has limitations, as not all traits are linked to unique molecular markers with complete fidelity. For example, some traits may be situationally dependent, so that their classification is conditional on a specific environmental context. In these instances, attempting to classify the trait using genomic content alone may yield a high number of false positives. Complex traits that lack unique molecular markers and exhibit situational dependency may thus be ‘hidden’ from high-throughput, genome-scale classification. A common microbial trait that is challenging to classify from gene content alone is polymer cycling. When environmental conditions are sub-optimal, polymer storage is considered an adaptive bet-hedging strategy to conserve resources (Lennon & Jones, 2011). However, the environmental conditions that trigger polymer storage may differ between organisms. For example, in Escherichia coli, P and amino acid starvation may trigger polyphosphate (polyP) storage (Rao et al., 1998), but in the so-called polyphosphate-accumulating organisms (PAOs), polyP storage occurs even under extremely high P levels and is triggered by feast/famine cycles related to carbon and terminal electron acceptor availability (Seviour et al., 2003). A further complication is that polyP is not used solely as a storage molecule (Kornberg, 1995); rather, it is an ancient and multifunctional molecule. Hence, the presence of genes for polyP metabolism is insufficient evidence of polymer cycling, nor does it indicate the environmental conditions that trigger cycling, an important characteristic when describing polymer cycling traits.

The inability to classify complex, situationally dependent traits such as feast/famine-induced polymer cycling by genomic content alone is a blind spot in the high-throughput classification of microbial metabolic potential. One promising solution is to integrate metatranscriptomic data, which may be used to distinguish transcriptional patterns indicative of particular traits and the environmental conditions that induce them. Similar transcriptional patterns for specific traits have been shown to evolve independently numerous times (Gallant et al., 2014; Pfenning et al., 2014; Sommer et al., 2014); thus, the molecular mechanisms driving a trait in a well-characterized model organism may be used to identify congruent transcriptional responses (CTR) in unknown organisms. Therefore, for traits that cannot be classified from genetic content alone, comparative metatranscriptomics and the identification of CTR may provide a high-throughput method to identify specific traits and functional redundancy within a microbial ecosystem. Here we demonstrate that when genomic content provides insufficient molecular markers for rapidly screening the presence or absence of a phenotype, the transcriptional patterns of model organisms may provide a practical tool for the high-throughput detection of other organisms that may possess these hidden traits. Specifically, we draw upon recent progress in the molecular and evolutionary understanding of polymer metabolism in Candidatus Accumulibacter phosphatis (henceforth Accumulibacter) to develop a set of transcriptional markers indicative of polyP, PHA and glycogen cycling under the feast/famine conditions typically found in wastewater treatment plants. Using this set of transcriptional markers, we coupled time-series metagenomics and metatranscriptomics to screen 42 high-quality metagenome-assembled draft genomes for organisms displaying a CTR for each trait of the model organism (Accumulibacter Clade IIA). Using this novel approach, we were able to distinguish between three distinct and biotechnologically relevant polymer cycling phenotypes: polyphosphate/PHA/glycogen, polyphosphate/PHA and polyphosphate/glycogen cycling.

5.2 Methods

5.2.1 Reactor operation

Two reactors (R1 and R2) were operated under feast/famine conditions as previously described (García Martín et al., 2006; Oyserman, Noguera, et al., 2016). Briefly, each reactor was fed during an anaerobic period with a mineral medium containing acetate as the primary carbon source (feast phase). Anaerobic conditions were maintained through continuous nitrogen gas sparging, and nitrification was inhibited by allylthiourea addition. After an approximately 2-hour anaerobic phase, air was introduced into the reactor to create an aerobic environment in the absence of any additional external carbon source (famine phase). Cycling between these phases enriches for organisms capable of storing carbon anaerobically, when no terminal electron acceptors are available, and storing P aerobically, a process commonly referred to as Enhanced Biological Phosphorus Removal (EBPR) (Seviour et al., 2003).

5.2.2 Genomic DNA extraction and sequencing

Three DNA samples from the two reactors, which were operated in parallel, were sequenced. Two were collected on dates close to a time-series metatranscriptomic experiment, 5/13/2013 (R2) and 5/23/2013 (R1), and the third was collected on the same day as the metatranscriptomic experiment, 5/28/2013 (R2). DNA was extracted from each sample using a phenol-chloroform extraction followed by RNase treatment. DNA quality was verified with a NanoDrop instrument (Thermo Scientific, Wilmington, MA), the Qubit® dsDNA HS Assay Kit (Life Technologies, Carlsbad, California, USA) and gel electrophoresis (Supplemental Spreadsheet 1). The high-quality DNA was shipped overnight on dry ice to the Joint Genome Institute (JGI), where sequencing was conducted on the Illumina HiSeq platform. For the three dates, 48,647,230, 46,923,296 and 62,989,752 paired-end reads (2×300) were sequenced, respectively; these are available from the Joint Genome Institute, https://img.jgi.doe.gov/, under Taxon Object IDs 3300003765, 3300003770 and 3300003842 (Supplemental Table 2).

5.2.3 Genome assembly, draft genome binning, quality control and annotation

Each metagenome was quality filtered using Sickle (github.com/najoshi/sickle) with a quality threshold of 20 and a minimum read length of 100 (Supplemental Spreadsheet 3). Using the quality-filtered reads, five different assembly/binning combinations were implemented, and the resulting bins were compared using BLAST (Camacho et al., 2009) to identify a subset of unique, high-quality bins with the greatest completeness, the fewest contigs and minimal contamination (Figure 5-1). First, individual assemblies for each date (5-13-2013, 5-23-2013 and 5-28-2013) and a co-assembly of all three metagenomes were conducted in IDBA-UD with k-mer length iterations from 20 to 100 bp (Peng et al., 2012) (Supplemental Spreadsheet 3). The resulting contigs from each assembly were then binned using MaxBin (Wu et al., 2014). Contigs from the co-assembly were also binned using MetaBAT (Kang et al., 2015). Supplemental Spreadsheet 4 summarizes the outputs from MaxBin (four sets of bins) and MetaBAT (one set of bins). All binning was done using the default parameters of MaxBin and MetaBAT.
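The quality-filtering step can be illustrated with a simplified sliding-window trimmer. This is not Sickle's source code: it is a sketch that only loosely follows Sickle's documented approach (a window of about 10% of the read length, trimming low-quality ends and discarding reads that become too short), using the thresholds stated above (quality 20, minimum length 100). It assumes Phred+33 quality encoding, and the function names are hypothetical.

```python
def phred33(qual):
    """Decode a Phred+33 (Illumina 1.8+) quality string into integer scores."""
    return [ord(c) - 33 for c in qual]


def sliding_window_trim(seq, qual, q_min=20, min_len=100, win_frac=0.1):
    """Trim low-quality ends of one read; return (seq, qual) or None if discarded."""
    scores = phred33(qual)
    n = len(scores)
    w = max(1, int(n * win_frac))  # window length, about 10% of the read
    last = n - w  # start index of the last full window

    def window_mean(i):
        return sum(scores[i:i + w]) / w

    # 5' end: advance until a window reaches the quality threshold.
    start = 0
    while start <= last and window_mean(start) < q_min:
        start += 1
    if start > last:
        return None  # no window passes: discard the read

    # 3' end: cut at the first window (after start) whose mean drops below q_min.
    end = n
    for i in range(start, last + 1):
        if window_mean(i) < q_min:
            end = i
            break

    if end - start < min_len:
        return None  # too short after trimming
    return seq[start:end], qual[start:end]
```

For example, a 150-base read whose last 40 bases have quality 2 is cut where the window mean first falls below 20 and is kept because it still exceeds 100 bases; a read that is uniformly low quality is discarded.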

[Figure 5-1 schematic: Assemble Metagenomes → Bin, QC, filter (5/13/2013, 5/23/2013 and 5/28/2013 assemblies with MaxBin/CheckM; co-assembly with MetaBAT/CheckM) → Pairwise Comparison → High Quality Unique Bins]

Figure 5-1: The metagenome assembly and binning workflow used in this study to obtain high-quality genomes with high completeness and low contamination. Five different assembly/binning combinations were used, and the resulting bins were compared using a pairwise BLAST analysis.

The resulting bins from each of the five assembly/binning combinations were evaluated for completeness and contamination, and given taxonomic classifications, using CheckM (Parks et al., 2015) (Supplemental Spreadsheet 5). For all high-quality bins (greater than 90% completeness and […] 99), a representative bin was chosen manually based on completeness, contamination and the number of contigs (Supplemental Spreadsheet 6). A high-quality Accumulibacter Clade IIA genome could not be assembled and binned, possibly due to high strain heterogeneity. However, a BLAST analysis revealed that numerous highly fragmented, incomplete bins had high percent identity (>99%) with a previously assembled, complete Accumulibacter Clade IIA genome from the same reactor (García Martín et al., 2006) (Supplemental Spreadsheet 7). Therefore, this complete and closed genome was used in all downstream analyses (see Oyserman, Noguera, et al., 2016). To determine the fraction of metagenomic reads mapping to each genome bin, the metagenomes were competitively mapped to the 43 genomes (42 unique high-quality genomes plus the complete Accumulibacter Clade IIA genome) using BWA-MEM (Li & Durbin, 2009). The number of reads mapped per genome was calculated using the SAMtools idxstats command (Li et al., 2009). A summary of the reads mapped to each contig, and the total number of reads mapped to each genome, may be found in Supplemental Spreadsheet 8. All unique high-quality genomes, and the Clade IIA genome, were annotated using MetaPathways (Konwar et al., 2015).

5.2.4 Metatranscriptomic sequence processing, mapping and normalization

The metatranscriptomic data set used in this analysis is publicly available from the Joint Genome Institute website, https://img.jgi.doe.gov/, under Taxon Object IDs 3300002341-3300002346. For additional information on experimental design and RNA processing, see the original publication (Oyserman, Noguera, et al., 2016). For this analysis, quality filtered data were downloaded and ribosomal RNA (rRNA) was removed (Supplemental Spreadsheet 10) with SortMeRNA using all built-in databases (Kopylova et al., 2012). The remaining non-rRNA sequences were then competitively mapped to the 42 unique high-quality draft genomes as well as Accumulibacter Clade IIA (CAP2UW1) (García Martín et al., 2006) using BWA-MEM with default parameters (Li & Durbin, 2009). The abundance of transcripts mapping to each gene was then calculated with HTSeq using the intersection-strict mode (Anders et al., 2014). The number of reads mapping to each gene at each time point is summarized in Supplemental Spreadsheet 11. To estimate changes in relative expression for each gene, read counts were normalized by the total reads in the sequencing run, the number of reads that remained after rRNA filtering and the fraction of total reads that aligned to the respective genome, and were finally converted to reads per kilobase per million mapped reads (RPKM) (Mortazavi et al., 2008) as described in Oyserman, Noguera, et al. (2016). Normalized reads may be found in Supplemental Spreadsheet 12.
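The per-genome read totals from SAMtools idxstats and the final RPKM conversion can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the contig-to-genome mapping and the function names are hypothetical, and the chained normalization by sequencing depth, rRNA filtering and genome fraction is collapsed here into a single per-genome denominator.

```python
from collections import defaultdict


def reads_per_genome(idxstats_text, contig_to_genome):
    """Sum mapped reads per genome from `samtools idxstats` output.

    Each idxstats line is tab-separated: reference name, sequence length,
    mapped reads, unmapped reads. The final "*" line counts reads not placed
    on any reference; it has no genome assignment and is skipped.
    """
    totals = defaultdict(int)
    for line in idxstats_text.strip().splitlines():
        name, _length, mapped, _unmapped = line.split("\t")
        genome = contig_to_genome.get(name)  # None for "*" or unbinned contigs
        if genome is not None:
            totals[genome] += int(mapped)
    return dict(totals)


def rpkm(gene_counts, gene_lengths_bp, mapped_reads):
    """Reads per kilobase of gene per million mapped reads.

    Assumption: `mapped_reads` is the number of reads mapped to the gene's
    own genome, used as a single normalization denominator.
    """
    per_million = mapped_reads / 1e6
    return {
        gene: count / (gene_lengths_bp[gene] / 1e3) / per_million
        for gene, count in gene_counts.items()
    }


# Toy example: contigs c1 and c2 are binned into genome "A", c3 into "B".
idx = "c1\t5000\t100\t0\nc2\t3000\t50\t0\nc3\t4000\t50\t0\n*\t0\t0\t25\n"
mapping = {"c1": "A", "c2": "A", "c3": "B"}
totals = reads_per_genome(idx, mapping)                    # {'A': 150, 'B': 50}
expr = rpkm({"geneX": 100}, {"geneX": 2000}, 1_000_000)    # {'geneX': 50.0}
```

Using the genome's own mapped reads as the denominator makes expression values comparable within a genome across time points regardless of how abundant that genome is in the community.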

5.2.5 Identification of polymer cycling organisms based on congruent transcriptional responses

To assign traits based on congruent transcriptional responses (CTR), the transcriptional responses of marker gene sets for polyP, PHA and glycogen cycling in 42 high-quality draft genomes of unknown function were compared against a well-characterized model (Accumulibacter Clade IIA). First, a marker gene set for each polymer was selected. Next, a method for scoring CTR between an unknown genome and the model was developed. Finally, a statistical cut-off for each trait was determined from the background distribution of CTR scores for randomly generated marker gene sets.

5.2.6 Transcriptional marker gene selection

A set of transcriptional marker genes (Table 5-1) involved in the synthesis and degradation of each polymer was chosen based on the molecular mechanisms of the model polymer cycling organism Candidatus Accumulibacter phosphatis Clade IIA (Oyserman, Noguera, et al., 2016; Oyserman, Moya, et al., 2016). Both primary and accessory marker genes were used. Primary markers are genes directly involved in polymer synthesis and degradation, whereas accessory markers are not directly involved but are nonetheless intimately linked through transport, redox regulation or otherwise.

Selected genes involved in glycogen metabolism included glycogen phosphorylase (EC 2.4.1.1), 1,4-alpha-glucan branching enzyme (EC 2.4.1.18) and glycogen synthase (EC 2.4.1.21). These genes were selected because they represent the initial and final steps in the synthesis and degradation of glycogen. No accessory genes were used for glycogen cycling. For PHA metabolism, the primary marker genes were PHA synthase (no EC number) and polyhydroxyalkanoate depolymerase (EC 3.1.1.75), selected because they represent the initial and final steps in the synthesis and degradation of PHA. In addition, three accessory genes involved in the regulation of PHA metabolism and in the activation of acetate to acetyl-CoA, an initial step in PHA metabolism, were identified: PHA granule-associated proteins (phasins, no EC number), phosphate acetyltransferase (EC 2.3.1.8) and acetyl-CoA synthetase (ADP-forming) (EC 6.2.1.13). For polyP metabolism, two genes were used as primary indicators: polyphosphate kinase (ppk, EC 2.7.4.1) and polyphosphate kinase 2 (ppk2, EC 2.7.4.1). Genes involved in the transport of P and of the key counter-cations Mg and K, as well as regulators of P metabolism, were included as accessory genes. These included low-affinity P transporters (Pit), high-affinity P transporters (PstS, PstB, PstA, PstC), phosphorus regulators (PhoB, PhoU) and transporters of potassium and magnesium (Kup, CorA). Recent investigations suggest that pyruvate ferredoxin oxidoreductase (PFOR subunits α, β and γ), which links carbon flux between glycogen and PHA, and the NAD-reducing hydrogenase operon (HoxEFHUY, EC 1.12.1.2), which manages redox balance, may play an important role in coupling anaerobic glycogen degradation and PHA synthesis (Oyserman, Moya, et al., 2016; Oyserman, Noguera, et al., 2016). Therefore, these genes were selected as accessory marker genes to identify organisms capable of linking glycogen and PHA cycling anaerobically. Table 5-1 summarizes the marker genes selected for each trait.

5.2.7 Scoring of CTR

There is currently limited understanding of how transcriptional patterns within a module may vary across lineages that provide redundant function. Not all organisms with a particular function are expected to regulate every gene in a given module in precisely the same way. For example, some genes in a module may be highly dynamic (even with different patterns) while others are constitutively expressed, and the subset of genes that are dynamic or highly expressed within a module may differ across lineages. Furthermore, redundant copies of a particular gene may be regulated differently, adding uncertainty about the contribution of any given transcriptional profile to overall function. It is therefore necessary to develop a scoring system that tolerates the variance expected on a gene-by-gene basis while capturing the overall similarity expected at the module level (Figure 5-2).

Figure 5-2: A fuzzy workflow for identifying modules in genomes of uncharacterized function that are significantly similar to modules in genomes of characterized function. Once a module has been defined, each gene in the module is scored pairwise. The gene scores are summed and normalized by the maximum possible score for the module. These scores may then be compared to a background distribution of scores from randomly generated modules of N genes to assess statistical significance and identify lineages for further phenotypic validation.

After module marker genes were selected based on the molecular mechanisms of model organisms with known functions, analogous genes in each binned genome were identified based on the annotations provided by MetaPathways (Konwar et al., 2015). Each marker gene in the unknown genomes was then scored as follows: +1 if present; if the model gene is dynamically expressed, the unknown gene additionally receives its Pearson correlation with the model (-1 to +1); if the model gene is not dynamically expressed, the unknown gene additionally receives +1 when it is highly expressed (top 10% of expressed genes). Thus, a dynamic gene in a module may be scored from 0-2; otherwise, it is scored absent (0), present (1), or present and highly expressed (2). The gene scores are then summed and normalized by the maximum score possible for the set of genes (as defined by the expression patterns in the model), giving a final CTR score. Of the 26 marker genes selected, 17 were scored based on presence/absence and correlation with a model expression profile (HoxF, HoxH, HoxU, HoxY, PHA, phasins, AcyetylCoa_low, AcyetylCoa_high, COG0306, PstS, PstB, PstA, PstC, PhoB, PhoU, Kup, starch_synthase). Of those remaining, 8 were scored based on presence/absence and whether they were highly expressed (alpha, beta, gamma, CorA, ppk, ppk2, glycogen_phosphorylase, alpha_glucan). Only 1 gene was scored solely on presence/absence (PHA depolymerase). The scores for the genes involved in polyP, PHA, glycogen, PFOR and Hox were each summed and then normalized by the maximum score possible for that trait, providing the final CTR score for each trait. Table 5-1 summarizes how pairwise gene comparisons between an unknown genome and the model were scored. There is currently limited understanding of the role of redundant gene copies and their contribution to organismal function; for simplicity, this method does not integrate this information. Therefore, when redundant copies of a gene with a particular function were present in the model genome, the most highly expressed and dynamic copy was used. When redundant copies were present in an unknown genome, the copy with the strongest correlation with the model, or the highest expression, was used for scoring. The current limitations of this method and future directions are considered in the discussion section.

Trait / annotation        Scoring category    Δlog2 expression (model)          Gene score
                                              B    D    H    L    J    N
PFOR (max score 6)
  PFOR α                  Highly Expressed    -                                 0,1,2
  PFOR β                  Highly Expressed    -                                 0,1,2
  PFOR γ                  Highly Expressed    -                                 0,1,2
NAD hydrogenase (HOX) (max score 8)
  HoxF                    Dynamic             2.0  4.3  5.5  2.8  3.0  0.0      0-2
  HoxH                    Dynamic             0.0  3.5  5.2  4.1  2.5  0.0      0-2
  HoxU                    Dynamic             1.3  4.5  5.7  3.5  3.2  0.0      0-2
  HoxY                    Dynamic             0.4  3.9  6.0  3.9  2.8  0.0      0-2
PHA (max score 9)
  PHA                     Dynamic             1.7  4.7  5.7  3.4  2.6  0.0      0-2
  phasins                 Dynamic             0.0  4.2  5.4  3.4  1.2  0.8      0-2
  AcyetylCoa_low          Dynamic             0.5  4.2  4.6  2.9  1.9  0.0      0-2
  AcyetylCoa_high         Dynamic             0.0  1.8  3.1  1.0  0.7  0.0      0-2
  PHA_d                   NA                  -                                 0,1
polyP (max score 22)
  Kup                     Dynamic             0.0  2.4  2.8  2.2  1.4  0.1      0-2
  COG0306                 Dynamic             4.7  5.1  5.6  7.0  6.7  0.0      0-2
  PstS                    Dynamic             0.1  0.3  0.0  0.0  0.4  2.9      0-2
  PstB                    Dynamic             0.0  0.7  0.4  0.2  0.0  2.2      0-2
  PstA                    Dynamic             0.1  0.0  0.5  0.3  0.1  2.2      0-2
  PstC                    Dynamic             0.1  0.4  0.5  0.5  0.0  2.6      0-2
  PhoB                    Dynamic             0.1  0.5  0.0  0.2  0.3  2.6      0-2
  PhoU                    Dynamic             0.4  0.3  0.5  0.2  0.0  2.3      0-2
  CorA                    Highly Expressed    -                                 0,1,2
  ppk                     Highly Expressed    -                                 0,1,2
  ppk2                    Highly Expressed    -                                 0,1,2
Glycogen (max score 6)
  starch_synthase         Dynamic             0.0  1.3  2.1  1.7  1.3  1.0      0-2
  glycogen_phosphorylase  Highly Expressed    -                                 0,1,2
  alpha_glucan            Highly Expressed    -                                 0,1,2

Table 5-1: The marker genes used in this investigation to identify PHA, polyP and glycogen cycling. In addition, PFOR and NAD hydrogenase (HOX) genes were used to identify organisms capable of linking glycogen and PHA cycling anaerobically. For each gene, the scoring category used for pairwise scoring (see Figure 5-2) is indicated. When genes were dynamically expressed in the model (Accumulibacter Clade IIA), the model's Δlog2 expression pattern is given for the time points of Oyserman, Noguera, et al. (2016). The possible scores for each gene are also indicated, as well as the maximum score for each trait, which is used for normalization (see Figure 5-2).
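As a consistency check, the maximum trait scores in Table 5-1 follow directly from the scoring categories: each Dynamic or Highly Expressed gene can contribute at most 2 points and the presence-only gene (PHA_d) at most 1. The short Python sketch below transcribes the table's categories into a dictionary (the dictionary layout and names are our own, for illustration) and reproduces the maximum scores of 6, 8, 9, 22 and 6.

```python
# Scoring category of each marker gene, transcribed from Table 5-1
# (D = Dynamic, HE = Highly Expressed, NA = presence/absence only).
TABLE_5_1 = {
    "PFOR": {"PFOR alpha": "HE", "PFOR beta": "HE", "PFOR gamma": "HE"},
    "HOX": {"HoxF": "D", "HoxH": "D", "HoxU": "D", "HoxY": "D"},
    "PHA": {"PHA": "D", "phasins": "D", "AcyetylCoa_low": "D",
            "AcyetylCoa_high": "D", "PHA_d": "NA"},
    "polyP": {"Kup": "D", "COG0306": "D", "PstS": "D", "PstB": "D",
              "PstA": "D", "PstC": "D", "PhoB": "D", "PhoU": "D",
              "CorA": "HE", "ppk": "HE", "ppk2": "HE"},
    "Glycogen": {"starch_synthase": "D", "glycogen_phosphorylase": "HE",
                 "alpha_glucan": "HE"},
}

# Highest score a single gene can reach in each category.
GENE_MAX = {"D": 2, "HE": 2, "NA": 1}


def trait_max_scores(table):
    """Maximum attainable (normalizing) score for each trait."""
    return {trait: sum(GENE_MAX[cat] for cat in genes.values())
            for trait, genes in table.items()}


max_scores = trait_max_scores(TABLE_5_1)
# → {'PFOR': 6, 'HOX': 8, 'PHA': 9, 'polyP': 22, 'Glycogen': 6}
```

The same transcription also recovers the counts given in the text: 26 marker genes in total, of which 17 are Dynamic, 8 Highly Expressed and 1 presence-only.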

5.2.8 Determining cut-offs of significance for CTR scores

To determine whether a CTR is significant, the score must be compared with a null distribution. Since a different number of genes (N = 3, 5 and 11) was used to score the different polymer cycling traits, three null distributions were generated, each consisting of 1000 CTR scores calculated from N random genes. First, a random gene is chosen from the model genome and scored from 0-2 in each of the GFMs with unknown functions (as described above). This is repeated N times (where N is equal to the number of genes in a module), with replacement, to obtain a single randomly generated CTR score for all GFMs from an ecosystem. To obtain a background distribution of CTR scores, the random pairwise scoring is repeated 1000 times for each module (glycogen, PHA and polyP). Once the background distribution for each module has been calculated, a statistical cut-off may be selected such that if an organism of unknown function has a CTR score for a module greater than a prescribed quantile, then that trait is putatively ascribed to it for downstream validation. To control for false discoveries, the cut-off quantile was set to at least $1 - \frac{1}{\#\text{ of genomes}}$, such that fewer than 1 genome would be expected by random chance to have a CTR score greater than this quantile. In this case, the 99% quantile was chosen as the cut-off to control for the multiple testing of 42 genomes: fewer than 1 genome would be expected by chance to score above this quantile (0.01 × 42 = 0.42 genomes). In contrast, if a 95% quantile were used, ~2 of the 42 genomes would be expected by chance to fall above it (0.05 × 42 = 2.1 genomes). These quantiles were used to determine whether each genome had significant evidence for the presence of a polymer cycling trait. To determine whether the community as a whole was enriched in a given trait, a one-sided Wilcoxon rank-sum test was conducted to assess whether the observed scores for the community were significantly higher than expected based on the random background distribution of 1000 CTR scores.
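A minimal Python sketch of the background distribution and quantile cut-off follows. It assumes a precomputed pool of 0-2 pairwise gene scores from which N genes are drawn with replacement, a simplification of the per-genome random scoring described above; the pool values, function names and seed are hypothetical.

```python
import random


def background_ctr_scores(pairwise_scores, n_genes, n_draws=1000, seed=1):
    """Null distribution of CTR scores for a module of `n_genes` genes.

    pairwise_scores: assumed pool of 0-2 scores for randomly eligible genes.
    Each background score sums n_genes scores drawn with replacement and
    divides by the maximum possible score, 2 * n_genes.
    """
    rng = random.Random(seed)
    return [
        sum(rng.choices(pairwise_scores, k=n_genes)) / (2 * n_genes)
        for _ in range(n_draws)
    ]


def empirical_quantile(values, q):
    """Quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


n_genomes = 42
minimum_quantile = 1 - 1 / n_genomes                        # about 0.976
chosen_quantile = 0.99
expected_false_hits = (1 - chosen_quantile) * n_genomes     # about 0.42 genomes

pool = [0.0, 0.5, 1.0, 1.5, 2.0, 1.2, 0.8]   # toy pairwise scores
null_polyp = background_ctr_scores(pool, n_genes=11)        # polyP module, N = 11
cutoff = empirical_quantile(null_polyp, chosen_quantile)
```

A genome whose polyP CTR score exceeds `cutoff` would be flagged for validation. For the community-level comparison, SciPy's `scipy.stats.mannwhitneyu(observed, background, alternative="greater")` implements the one-sided Wilcoxon rank-sum (Mann-Whitney U) test.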

To minimize the effect of uncharacterized and rare genes skewing the background distribution towards lower scores, random genes were required to 1) have at least 10 copies with a given annotation in the dataset, and 2) be present in the model organism (Clade IIA). A total of 1107 annotated genes within Accumulibacter fit these criteria.

5.3 Results

5.3.1 Metagenomic assembly, binning, completeness estimates and coverage

For the individual assemblies (5/13/2013, 5/23/2013, 5/28/2013), 180552, 190748 and 192256 contigs were assembled, with total lengths of 269109091, 312905568 and 300156806 bp, respectively. For the co-assembly, 333845 contigs were assembled, with a total length of 586433209 bp. A complete summary of assembly statistics may be found in Supplemental Spreadsheet 3. Of the resulting bins, 108 were high quality and were kept for comparison and downstream processing (Supplemental Spreadsheet 6). After removing overlapping draft genomes obtained from the various assembly/binning combinations, 42 unique high-quality bins were identified, totaling 166581749 bp in 5007 contigs. The set of unique high-quality bins had an average completeness of 97 ± 2.7% (range 90.7-100%), an average contamination of 1 ± 1.4% (range 0-3.58%) and an average of 119 contigs (range 10-677) (Supplemental Spreadsheet 6). Mapping the metagenomes back to the Accumulibacter Clade IIA genome and the unique high-quality genomes accounted for approximately 85% of all quality filtered DNA reads. The Accumulibacter Clade IIA genome and a bin subsequently identified as Accumulibacter Clade IA accounted for 149964637 and 29031028 (59% and 11%, respectively) of the quality filtered reads from all three dates. Reads mapping to the other unique high-quality bins ranged between 130231 and 8732619 reads, or 0.04-2.9% of the total quality filtered metagenomic reads from all three dates (Supplemental Spreadsheet 8).

5.3.2 Metatranscriptomic sequence mapping

The metatranscriptomic time series used in this analysis was collected previously from the same reactor on the date of one of the metagenomes (5/28/13) (Oyserman, Noguera, et al., 2016). Additional details on the metatranscriptomic experimental design may be found in that original publication; here we extend the metatranscriptomic analysis to the whole community. Briefly, the time series consisted of six time points (labeled B, D, H, J, L and N) covering a range of environmental conditions during a feast/famine EBPR cycle. Competitive mapping of the metatranscriptomic reads against all unique high-quality genomes and Clade IIA resulted in 31829374, 15335486, 22332060, 39579631, 54444894 and 50388541 mapped reads for time points B, D, H, J, L and N, respectively. On a per genome basis, the number of reads mapped ranged from 97358 to 133512118, with an average of 4974650 reads per genome over all time points. The numbers of metagenomic and metatranscriptomic reads mapping to each genome on 5/28/13 were strongly correlated when fit with a log-log (base 2) regression (R² = 0.68, p = 5.941e-12) (Figure 5-3). A detailed summary of the number of reads that mapped to each contig and genome may be found in Supplemental Spreadsheets 11 and 12.

[Figure 5-3: scatter plot of log2(RNA reads) against log2(DNA reads) mapped to each genome, with genomes grouped as Accumulibacter, Comamonadaceae, Actinomycetales, Bacteroidetes, Alphaproteobacteria and Other]

Figure 5-3: The statistically significant relationship between the numbers of DNA and RNA reads mapping to each genome (R² = 0.68, p = 5.941e-12).

5.3.4 Identification of polymer cycling organisms based on CTR

Based on the background distribution of randomly generated CTR scores, the 99% quantile cut-offs were 0.56, 0.67 and 0.71 for polyP, glycogen and PHA, respectively (Figure 5-4). Glycogen had the largest median score (0.645), followed by polyP (0.528). In contrast, PHA cycling scores were much lower, with a median of 0.197 (Figure 5-4A). Both glycogen and polyP had a significantly higher average score across all genomes than expected from the background distribution (p-values […] 10 times in the data set and present in the model. Each red circle represents a CTR score for one of the 42 genomes.

[Figure 5-5: (A) boxplots of Congruent Transcriptional Response (CTR) scores for glycogen, polyP and PHA; (B) CTR scores for PHA, polyP, glycogen, PFOR and HOX for each genome bin, labeled with its taxonomic classification]

Figure 5-5: Identifying polymer cycling organisms using CTR scores. A) Boxplots of the CTR scores for glycogen, polyP and PHA for each genome in this study. The average CTR scores for glycogen and polyP were significantly greater than expected; however, the average PHA score across all organisms was not. B) The CTR scores for each genome. Three types of polymer cycling organisms were identified in this study: PHA/glycogen/polyP (Accumulibacter), PHA/polyP (Comamonadaceae and Rubrivivax) and glycogen/polyP (Actinomycetales).

5.3.5 Accumulibacter

The Accumulibacter Clade IA genome (bin_009) had CTR scores greater than the 99% quantile for glycogen (0.858), polyP (0.793) and PHA (0.966). In addition, the Clade IA genome was the only genome that demonstrated transcriptional congruency for both the HOX and PFOR operons (0.803 and 1.000, respectively) (Figure 5-5).

5.3.6 Comamonadaceae and Rubrivivax

Numerous genomes within the sister lineages Comamonadaceae and Rubrivivax had polymer cycling scores greater than the 99% quantile threshold. In particular, Aquincola tertiaricarbonis / Rubrivivax sp. AAP5 (renamed_bin_037) had high CTR scores for PHA (0.802) and polyP (0.800), both above the 99% quantile. Its CTR score for glycogen metabolism was relatively high (0.6) but was not significant based on the 99% quantile.

Together, the PHA and polyP responses for this bin were the highest CTR scores identified aside from Clade IA. Another bin within the Rubrivivax lineage, most closely related to Ideonella (renamed_Bins_5_28_2013_022), had a relatively high PHA score (0.569) compared to the other genomes, although it was below the 99% quantile threshold. However, both the glycogen and polyP responses for Ideonella were above the 99% quantile (Figure 5-5). Within the Comamonadaceae, two bins related to Ottowia (Bins_5_23_2013_030 and Bins_5_28_2013_025) had PHA (0.740 and 0.737, respectively) and polyP (0.640 and 0.592, respectively) scores above the 99% quantile threshold. Two other bins within the Comamonadaceae, Ramlibacter (Metabat_bins_21) and Alicycliphilus (Metabat_bins_20), scored relatively high for PHA compared to the other unknown genomes, but not above the 99% quantile threshold. The polyP CTR score for Metabat_bins_21 (0.567) was greater than the 99% quantile, but lower than the Ottowia, Rubrivivax, Actinomycetales and Xanthomonadaceae polyP CTR scores.

5.3.7 Actinomycetales

Six bins within the Actinomycetales had transcriptional responses indicating the potential to cycle between polyP and glycogen, based on scores greater than the 99% quantile for both traits. Of these, three were most closely related to the polyphosphate accumulating organism Tetrasphaera (bin_024, Metabat_bins_26, Metabat_bins_16), one to the polyphosphate accumulating organism Microlunatus phosphovorus (bin_062), one to Microbacteriaceae Clavibacter (Bins_5_28_2013_021) and one to Austwickia chelonae (bin_034). Interestingly, all bins had highly expressed glycogen phosphorylase and alpha_glucan (1,4-alpha-glucan branching enzyme) genes except the Clavibacter bin (Bins_5_28_2013_021), which only had a highly expressed glycogen phosphorylase. The highest glycogen CTR scores within the Actinomycetales were for Tetrasphaera jenkinsii bin_024 (0.92), Propionibacteriaceae bin_062 (0.903), Austwickia chelonae bin_034 (0.874) and Tetrasphaera australiensis Metabat_bins_26 (0.872) (Figure 5-5).

5.3.8 Bacteroidetes

There were 10 Bacteroidetes genome bins, of which a majority (7/10) had a glycogen CTR score above the 99% quantile. Of these 7, 4 were from the Sphingobacteriales (2 Saprospiraceae, Metabat_bins_68 and Metabat_bins_59; and 2 Chitinophagaceae, Metabat_bins_13 and Metabat_bins_54). The others were a Chryseobacterium (Metabat_bins_19), Rhodothermus marinus (Metabat_bins_37) and Leadbetterella byssophila (Metabat_bins_61). The highest glycogen CTR score amongst the Bacteroidetes was for the Chitinophagaceae Metabat_bins_13 (0.944) (Figure 5-5).

5.3.9 Alphaproteobacteria, Gemmatimonas and Xanthomonadaceae

Two bins within the Alphaproteobacteria, a Sphingomonadales (Metabat_bins_46) and a Rhodospirillaceae (Metabat_bins_62), had polyP CTR scores above the 99% quantile. Additionally, a Gemmatimonas bin (Metabat_bins_53) and two Xanthomonadaceae bins had polyP CTR scores above the 99% quantile (Figure 5-5).

5.4 Discussion

Natural selection often gives rise to similar adaptive responses in divergent lineages, a process called convergent evolution (Darwin, 1859). The genes, pathways and transcriptional patterns responsible for convergent evolution have been shown to be frequently recurrent (Gallant et al., 2014; Pfenning et al., 2014; Sommer et al., 2014). A common and biotechnologically important adaptation is polymer synthesis, which is often observed as a stress response induced by non-optimal environmental conditions (Lennon & Jones, 2011; Wilkinson, 1963; Rao et al., 1998). Although polymer storage has a widespread phylogenetic distribution, the molecular mechanisms and transcriptional patterns of different classes of polymer cycling organisms likely share similarities that may be identified through comparative analysis. In this investigation, we explore the use of comparative transcriptomics to classify complex traits by identifying CTR between unknown and model organisms for three traits: polyP, glycogen and PHA cycling.

5.4.1 A comparison of gene content and CTR

Inferring an organismal trait from genomic content alone provides 2^N states for comparison, where N is the number of genes used to infer the trait and each gene is either present or absent. In this analysis, scoring based on the presence or absence of genes would have provided 4, 6 and 8 possible scores (percent of marker genes present) for glycogen, PHA and polyP, respectively. In contrast, calculating the CTR for glycogen, PHA and polyP yielded 31, 36 and 42 distinct scores, respectively (Figure 5-5). The benefit of further differentiating genomes using CTR scores is best illustrated by glycogen. For glycogen, 24 organisms had all the genes necessary for glycogen metabolism, and no further differentiation would be possible using presence/absence scoring alone. However, within these 24 genomes, the CTR scores ranged from 0.44 to 0.98, and 19 were above the 99% quantile (in red, Figure 5-6). Even amongst these, a considerable range of scores was found (0.69-0.98), and subsequent work validating predictions should focus on the genomes with the highest CTR scores. Thus, the CTR score provides substantially more information for inferring the presence or absence of glycogen cycling in different genomes than genome content alone.
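The gap between the number of presence/absence patterns and the number of distinct scores they produce can be made concrete with a short Python sketch (illustrative only; the function name is our own). With N marker genes there are 2^N presence/absence patterns, but a percent-present score collapses them into only N + 1 values; for glycogen (N = 3) and PHA (N = 5) this gives the 4 and 6 possible scores noted above.

```python
from itertools import product


def presence_absence_resolution(n_genes):
    """Count presence/absence patterns and distinct percent-present scores."""
    patterns = list(product((0, 1), repeat=n_genes))
    percent_present = {100 * sum(p) / n_genes for p in patterns}
    return len(patterns), len(percent_present)


glycogen = presence_absence_resolution(3)  # (8, 4)
pha = presence_absence_resolution(5)       # (32, 6)
```

Because a CTR score adds a continuous correlation term for every dynamic gene, genomes that share the same percent-present value can still receive different scores.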

[Figure 5-6: CTR score plotted against the percentage of marker genes present in each genome, in separate panels for PHA, glycogen and polyP]

Figure 5-6: A comparison of gene content-based and transcriptionally based scoring demonstrates that the added information from transcriptional profiles allows differentiation between genomes of similar content. This is shown most clearly for glycogen, where many genomes contain all genes involved in glycogen metabolism, but only a fraction of them have high CTR scores.

5.4.2 Congruent Transcriptional Responses in Accumulibacter

The unique combination of transcriptional responses in Accumulibacter (including PFOR and HOX) suggests that this lineage is the only one in this community capable of cycling all three polymers (Figure 5-5). This agrees with the published literature, which has yet to identify another organism with this phenotype.

5.4.3 Congruent Transcriptional Responses in Comamonadaceae and Rubrivivax

All Comamonadaceae/Rubrivivax genomes identified in this analysis were closely related to organisms that have demonstrated PHA storage (Ramana et al., 2006; Felföldi et al., 2011; Oosterkamp et al., 2015; de Luca et al., 2011; Tanaka et al., 2011). It is unclear how frequently polyphosphate accumulating conditions were tested in these investigations, but polyP accumulation is rarely reported; exceptions include suggestive evidence in Hollender et al. (2002) and unidentified Comamonadaceae capable of storing polyP and PHA in wastewater treatment systems (Ge & Batstone, 2014). In addition, Rubrivivax, the lineage of renamed_Bins_5_28_2013_022 and renamed_bin_037, are common in EBPR systems (Bond et al., 1995; Zilles, Peccia & Noguera, 2002) and have been shown to store PHA (Ramana et al., 2006; Mujahid et al., 2015). Comamonadaceae are also common in EBPR systems and have been shown to represent a majority of PHA storing organisms (Khan et al., 2002).

5.4.4 Congruent Transcriptional Responses in Actinomycetales

Tetrasphaera is an important, well-described glycogen and polyphosphate cycling lineage (Kristiansen et al., 2013; Nguyen et al., 2011), which includes the species found in this reactor: Tetrasphaera elongata, Tetrasphaera australiensis and Tetrasphaera sp. (Maszenan et al., 2000). However, we also identified an Austwickia chelonae genome, for which no previous evidence of glycogen and polyphosphate cycling exists (Hamada et al., 2010; Masters et al., 1995). Additionally, a genome distantly related to Microlunatus phosphovorus (bin_062) showed evidence of polyP and glycogen cycling; Microlunatus phosphovorus has been shown to cycle polyP and glycogen (Kawakoshi et al., 2012; Nakamura et al., 1995).

5.5 Conclusions, perspectives and limitations

In this investigation, we developed a novel and simple scoring methodology that uses a known model for a given trait to assess the potential of unknown organisms to display that trait, using both genomic content and transcriptional profiles. A statistical analysis was then used to determine whether the scores for a trait within a genome, or set of genomes, were significantly greater than expected from randomly generated gene sets. Using this relatively basic scoring system, we demonstrate that it is possible to differentiate between genomes with otherwise overlapping genomic content, and that the predictions made by this method align with the known metabolisms of the key polymer cycling lineages Accumulibacter, Tetrasphaera and the Comamonadaceae, including novel members such as Rubrivivax, Ottowia, Austwickia chelonae and Clavibacter. There is currently limited predictive understanding of how the genomic content and transcriptional patterns of an organism relate to the presence or magnitude of complex traits that depend on the interaction of numerous pathways, especially those with situational dependency such as polymer cycling. Key variables that future methods must account for include variation in gene copy number, the treatment of highly or constitutively expressed genes, and how to weigh the importance of transcriptional patterns against relative abundance (e.g., which matters more: high expression or high correlation?). Another hurdle in the development of high-throughput trait predictions based on transcriptional profiles is that strong local selection for traits within an ecosystem may skew the background distribution of how correlated any random subset of genes is. For example, in this investigation both glycogen and polyP had significantly higher CTR scores than expected in the community. Thus, if a community has low genomic diversity or few members, or if selection is strong across many traits within an ecosystem, then the background distribution for a randomly selected group of genes would likely skew higher, resulting in artificially high statistical cut-offs. Finally, one of the greatest challenges of classifying traits using omics will be differentiating between discrete and continuous traits, and predicting the magnitude of continuous traits.

As the transcriptional profiles of organisms with well-characterized metabolisms become available, predictions of the traits of uncharacterized genomes from their transcriptional profiles will become increasingly accurate and precise. The growth of available transcriptional data will allow scientists to identify which traits have the greatest transcriptional variance, and which have the greatest fidelity. This type of information will be invaluable for metabolic engineers looking to develop synthetic traits. Large-scale comparative transcriptomic analysis will also allow evolutionary biologists to test hypotheses on the relationship between transcriptional plasticity and the complexity and origin of metabolic networks.

5.6 Competing Interests

The authors declare they have no competing interests.

5.7 Acknowledgements

KDM acknowledges funding from the US National Science Foundation (CBET-0967646 and MCB-1518130) and the UW-Madison Graduate School. The work described here would not have been possible without the ongoing support of scientists and programs at the US Department of Energy Joint Genome Institute.

5.8 Literature Cited

1. Anders S, Pyl PT, Huber W. (2014). HTSeq – A Python framework to work with high-throughput sequencing data. bioRxiv.

2. Bond PL, Hugenholtz P, Keller J, Blackall LL. (1995). Bacterial community structures of phosphate-removing and non-phosphate-removing activated sludges from sequencing batch reactors. Appl Environ Microbiol 61:1910–1916.

3. Byrne-Bailey KG, Coates JD. (2012). Complete genome sequence of the anaerobic perchlorate-reducing bacterium Azospira suillum strain PS. J Bacteriol 194:2767–8.

4. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. (2009). BLAST+: architecture and applications. BMC Bioinformatics 10:421.

5. Gallant JR, Traeger LL, Volkening JD, Moffett H, Chen PH, Novina CD, et al. (2014). Genomic basis for the convergent evolution of electric organs. Science 344:1522–1525.

6. García Martín H, Ivanova N, Kunin V, Warnecke F, Barry KW, McHardy AC, et al. (2006). Metagenomic analysis of two enhanced biological phosphorus removal (EBPR) sludge communities. Nat Biotechnol 24:1263–9.

7. Ge H, Batstone DJ. (2014). Biological phosphorus removal from abattoir wastewater at very short sludge ages mediated by novel PAO clade Comamonadaceae. Water Res 69:1–10.

8. Hamada M, Iino T, Iwami T, Harayama S. (2010). Mobilicoccus pelagius gen. nov., sp. nov. and Piscicoccus intestinalis gen. nov., sp. nov., two new members of the family Dermatophilaceae, and reclassification of Dermatophilus chelonae (Masters et al. 1995) as Austwickia chelonae. J Gen Appl Microbiol 56:427–436.

9. Hollender J, Dreyer U, Kornberger L, Kämpfer P, Dott W. (2002). Selective enrichment and characterization of a phosphorus-removing bacterial consortium from activated sludge. Appl Microbiol Biotechnol 58:106–111.

10. Kang DD, Froula J, Egan R, Wang Z. (2015). MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities. PeerJ 3:e1165.

11. Kawakoshi A, Nakazawa H, Fukada J, Sasagawa M, Katano Y, Nakamura S, et al. (2012). Deciphering the genome of polyphosphate accumulating actinobacterium Microlunatus phosphovorus. DNA Res 19:383–394.

12. Khan ST, Horiba Y, Yamamoto M, Hiraishi A. (2002). Members of the family Comamonadaceae as primary poly(3-hydroxybutyrate-co-3-hydroxyvalerate)-degrading denitrifiers in activated sludge as revealed by a polyphasic approach. Appl Environ Microbiol 68:3206–3214.

13. Konwar KM, Hanson NW, Bhatia MP, Kim D, Wu SJ, Hahn AS, et al. (2015). MetaPathways v2.5: Quantitative functional, taxonomic and usability improvements. Bioinformatics 31:3345–3347.

14. Kopylova E, Noé L, Touzet H. (2012). SortMeRNA: fast and accurate filtering of ribosomal RNAs in metatranscriptomic data. Bioinformatics 28:3211–7.

15. Kornberg A. (1995). Inorganic polyphosphate: Toward making a forgotten polymer unforgettable. J Bacteriol 177:491–496.

16. Kristiansen R, Nguyen HTT, Saunders AM, Nielsen JL, Wimmer R, Le VQ, et al. (2013). A metabolic model for members of the genus Tetrasphaera involved in enhanced biological phosphorus removal. ISME J 7:543–54.

17. Lennon JT, Jones SE. (2011). Microbial seed banks: the ecological and evolutionary implications of dormancy. Nat Rev Microbiol 9:119–130.

18. Li H, Durbin R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–60.

19. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078–2079.

20. Masters AM, Ellis TM, Carson JM, Sutherland SS, Gregory AR. (1995). Dermatophilus chelonae sp. nov., isolated from chelonids in Australia. Int J Syst Bacteriol 45:50–56.

21. Maszenan AM, Seviour RJ, Patel BK, Schumann P, Burghardt J, Tokiwa Y, et al. (2000). Three isolates of novel polyphosphate-accumulating gram-positive cocci, obtained from activated sludge, belong to a new genus, Tetrasphaera gen. nov., and description of two new species, Tetrasphaera japonica sp. nov. and Tetrasphaera australiensis sp. nov. Int J Syst Evol Microbiol 50:593–603.

22. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. (2008). Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5:621–628.

23. Mujahid M, Prasuna ML, Sasikala C, Ramana CV. (2015). Integrated metabolomic and proteomic analysis reveals systemic responses of Rubrivivax benzoatilyticus JA2 to aniline stress. J Proteome Res 14:711–727.

24. Nakamura K, Hiraishi A, Yoshimi Y, Kawaharasaki M, Masuda K, Kamagata Y. (1995). Microlunatus phosphovorus gen. nov., sp. nov., a new gram-positive polyphosphate-accumulating bacterium isolated from activated sludge. Int J Syst Bacteriol 45:17–22.

25. Nguyen HTT, Le VQ, Hansen AA, Nielsen JL, Nielsen PH. (2011). High diversity and abundance of putative polyphosphate-accumulating Tetrasphaera-related bacteria in activated sludge systems. FEMS Microbiol Ecol 76:256–67.

26. Oyserman BO, Moya F, Lawson CE, Garcia AL, Vogt M, Hefferenen M, et al. (2016). Ancestral genome reconstruction identifies the evolutionary basis for trait acquisition in polyphosphate accumulating bacteria. ISME J (advance online publication):1–15.

27. Oyserman BO, Noguera DR, Glavina Del Rio T, Tringe SG, McMahon KD. (2016). Metatranscriptomic insights on gene expression and regulatory controls in Candidatus Accumulibacter phosphatis. ISME J 10:810–822.

28. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. (2015). CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res 25:1043–1055.

29. Peng Y, Leung HCM, Yiu SM, Chin FYL. (2012). IDBA-UD: A de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28:1420–1428.

30. Pfenning AR, Hara E, Whitney O, Rivas MV, Wang R, Roulhac PL, et al. (2014). Convergent transcriptional specializations in the brains of humans and song-learning birds. Science 346:1256846.

31. Ramana CV, Sasikala C, Arunasri K, Anil Kumar P, Srinivas TNR, Shivaji S, et al. (2006). Rubrivivax benzoatilyticus sp. nov., an aromatic hydrocarbon-degrading purple betaproteobacterium. Int J Syst Evol Microbiol 56:2157–2164.

32. Rao NN, Liu S, Kornberg A. (1998). Inorganic polyphosphate in Escherichia coli: The phosphate regulon and the stringent response. J Bacteriol 180:2186–2193.

33. Salinero KK, Keller K, Feil WS, Feil H, Trong S, Di Bartolo G, et al. (2009). Metabolic analysis of the soil microbe Dechloromonas aromatica str. RCB: indications of a surprisingly complex life-style and cryptic anaerobic pathways for aromatic degradation. BMC Genomics 10:351.

34. Seviour RJ, Mino T, Onuki M. (2003). The microbiology of biological phosphorus removal in activated sludge systems. FEMS Microbiol Rev 27:99–127.

35. Sommer LM, Molin S, Johansen HK, Marvig RL. (2014). Convergent evolution and adaptation of Pseudomonas aeruginosa within patients with cystic fibrosis. Nat Genet 47:1–9.

36. Wang Z, Zhang XX, Lu X, Liu B, Li Y, Long C, et al. (2014). Abundance and diversity of bacterial nitrifiers and denitrifiers and their functional genes in tannery wastewater treatment plants revealed by high-throughput sequencing. PLoS One 9:1–19.

37. Wilkinson JF. (1963). Carbon and energy storage in bacteria. J Gen Microbiol 32:171–176.

38. Wu Y-W, Tang Y-H, Tringe SG, Simmons BA, Singer SW. (2014). MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm. Microbiome 2:26.

39. Zilles JL, Peccia J, Noguera DR. (2002). Microbiology of enhanced biological phosphorus removal in aerated-anoxic Orbal processes. Water Environ Res 74:428–36.


Chapter 6 Community Assembly and Ecology of Activated Sludge Under Photosynthetic Feast/Famine Conditions

Ben O. Oyserman1*, Joseph M. Martirano1, Spenser Wipperfurth1, Brian R. Owen1, Daniel R. Noguera1, Katherine D. McMahon1,2

1Department of Civil and Environmental Engineering, University of Wisconsin at Madison, Madison, WI, 53706, USA; 2Department of Bacteriology, University of Wisconsin at Madison, Madison, WI, 53706, USA

Citation: Submitted to Environmental Science and Technology

Author Contributions: BOO conceived of and directed the research. BOO, JMM, SW and BRO performed experiments and data analysis. BOO, JMM, DRN and KDM contributed to manuscript preparation. All authors have approved the final article.

6. Abstract

The success of microbial consortia-based biotechnologies depends on an understanding of the ecological processes that select for a particular ecosystem function. Once the selective pressures that promote the deterministic assembly of a community with a targeted function have been identified, operational parameters may be optimized to promote these functions, thereby providing stable and economical biotechnologies such as wastewater treatment. In turn, novel operational parameters hypothesized to promote particular functions may be assessed, and the resulting communities analyzed to identify organisms and interactions that are selected for or against. With this conceptual framework in mind, we analyzed a novel operational configuration to determine whether photosynthetic communities might be used to sequester CO2 and replace mechanical aeration while maintaining the well-established wastewater treatment process known as Enhanced Biological Phosphorus Removal (EBPR). Stable operation of photosynthetic-EBPR was achieved, and the ecology of this novel system was investigated through a time series analysis of prokaryotic and eukaryotic biodiversity using the V3-V4 region of the 16S rRNA gene and the V4 region of the 18S rRNA gene, respectively. In the eukaryotic community, the dominant alga shifted from Desmodesmus to a mixed consortium of Desmodesmus, Parachlorella, Characiopodium and Bacillariophytina. The prokaryotic community also changed substantially. The polyphosphate accumulating organism (PAO) and nitrifying communities transitioned under photosynthetic-EBPR conditions, becoming dominated by Candidatus Accumulibacter phosphatis Acc-SG3 and Nitrosomonas ureae. Functional guilds that were not initially abundant became enriched, including the putative polyphosphate accumulating Cyanobacteria Obscuribacterales and Leptolyngbya, as well as the H2-oxidizing denitrifying autotroph Sulfuritalea. After approximately one month of operation, the most abundant member of the prokaryotic community belonged to an uncharacterized clade of Chlorobi classified as Chlorobiales;SJA-28 Clade III. This experiment represents the first investigation into the ecological interactions and community assembly during photosynthetic feast/famine conditions. Our findings suggest that photosynthesis may provide sufficient oxygen to drive EBPR.

Keywords: Enhanced Biological Phosphorus Removal, photosynthesis, community assembly, Chlorobiales

6.1 Introduction

Interest in energy efficient and carbon neutral wastewater treatment processes has been stimulated by an increasing awareness that wastewater is a resource from which water, nutrients and energy may be recovered (McCarty et al., 2011). Current biological nutrient removal (BNR) technology is often energy intensive, in part because of operational requirements such as mechanical aeration (Tchobanoglous et al., 2003). A common way to decrease these energy requirements is to operate treatment systems with minimal aeration (Fitzgerald et al., 2015; Meyer et al., 2005; Zeng et al., 2003; Li et al., 2008), creating conditions favorable for simultaneous nitrification, denitrification and phosphorus removal without differentially aerated zones (Daigger & Littleton, 2014). In addition to practices that minimize oxygen requirements, economical alternatives to mechanical aeration such as photosynthetic oxygenation have long been recognized (Oswald et al., 1953). Photosynthetic processes are commonly implemented in low-tech systems such as high rate algal ponds (HRAP) (Sutherland et al., 2015); however, they are rarely integrated into the predominantly heterotrophic activated sludge-type BNR systems (Rosso & Stenstrom, 2008; Sahely et al., 2006). Integrating photosynthesis and activated sludge processes is therefore an alluring aspiration because (1) oxygen provided by photosynthesis may be sufficient to fulfill treatment requirements (de Godos et al., 2009), eliminating the energy costs associated with mechanical aeration, and (2) photoautotrophic growth has the potential to contribute to carbon neutral wastewater treatment processes (Mo & Zhang, 2012). In photosynthetic wastewater treatment systems, algae and bacteria may form a reciprocal association in which carbon dioxide (CO2) and oxygen (O2) are exchanged.
Theoretical calculations for this symbiotic interaction in a closed system suggest that CO2 and O2 production are sufficient to drive each process, respectively, but that nitrogen (N) and phosphorus (P) remain in solution (Boelee et al., 2014). This drawback is also commonly observed in practice, as photosynthetic systems are often unable to meet nutrient removal requirements (Judd et al., 2015). One approach to address this stoichiometric imbalance, and hence improve nutrient removal in photosynthetic systems, may be to integrate the established feast/famine cycles of BNR technologies with photosynthetic processes. Traditional BNR treatment plants rely on microbial communities operated primarily in the dark. Therefore, there is currently very limited understanding of the communities and interactions that may be selected for in engineered systems that combine light and dark environments. Only after these interactions have been identified can they be managed through operational parameterization (e.g., to select for or against specific microorganisms). In this experiment, we investigated whether P removal through polyphosphate accumulation can be successfully coupled with photosynthetic oxygenation, and we tracked the microbial community structure that resulted from this coupling. Specifically, a reactor was inoculated with two activated sludge communities, a photosynthetic nitrifying culture and an EBPR culture, and operated for approximately two months under light/dark and feast/famine conditions. Throughout the experiment, we monitored EBPR function and conducted a time series 16S and 18S rRNA gene sequencing analysis to explore patterns of community assembly and function. Our results reveal that achieving P removal by coupling polyphosphate cycling with photosynthetic oxygen production is possible, and that continuously illuminated conditions may also be used to achieve biological P removal. Furthermore, we propose hypotheses about intriguing ecological interactions that may be exploited under photosynthetic feast/famine conditions, including the enrichment of a novel, uncharacterized lineage within the Chlorobi.
Lastly, the community analysis suggests that a key feature of photosynthetic aeration is the avoidance of gas stripping and, consequently, the development of syntrophic communities relying on gaseous metabolites such as H2.

6.2 Materials and Methods

6.2.1 Seed sludge

The photosynthetic-EBPR reactor was inoculated with sludge from two lab-scale wastewater treatment systems: an EBPR system achieving P removal and a photosynthetic bioreactor achieving nitrification. Operational parameters for the EBPR system are as previously reported (García Martín et al., 2006). The operational parameters for the photosynthetic nitrification system were identical to those described below, except that no acetate was added (see also Karya et al., 2013). The new photosynthetic-EBPR reactor was inoculated with biomass from each parent reactor. Mixed liquor from each reactor (500 mL) was collected, allowed to settle, decanted and then rinsed with tap water. Decanting and rinsing were done to remove contaminating nutrients such as carbon, and potential terminal electron acceptors such as nitrite and nitrate. The rinsed biomass was then mixed with ~2000 mL of anoxic mineral medium prepared by extended purging of the medium with N2 gas.

6.2.2 Reactor operation

A 2.5 L reactor was operated as a sequencing batch reactor with a 12-hour cycle, consisting of a 1-hour settling/decanting/fill period, a 2-hour dark period and a 9-hour light period. A hydraulic retention time (HRT) of 0.625 days and a solids retention time (SRT) of 13.3 days were achieved by decanting after settling to a volume of 500 mL (removing 2 L of effluent) twice per day, and wasting approximately 188 mL of the mixed volume once a day. After the first 10 minutes of the dark period, 25 mL each of media A and B were pumped into the reactor over a
period of 2.5 minutes. Media A was a mineral medium composed of 13.1 g/L sodium acetate, 0.88 g/L KH2PO4, 9.1 g/L NaHCO3 and 0.68 g/L KHSO4. Media B was composed of 1.96 g/L NH4Cl, 4.44 g/L CaCl2·2H2O, 3.03 g/L MgSO4 and 20 mL of a trace element solution composed of 5.51 g/L citric acid, 4.03 g/L hippuric acid, 0.73 g/L Na3NTA·H2O, 0.3 g/L Na3EDTA·4H2O, 3.03 g/L FeCl3·6H2O, 0.5 g/L H3BO3, 0.3 g/L ZnSO4·7H2O, 0.24 g/L MnCl2·4H2O, 0.12 g/L CuSO4·5H2O, 0.06 g/L KI, 0.06 g/L Na2MoO4·2H2O, 0.06 g/L CoCl2·6H2O, 0.06 g/L NiCl2·6H2O and 0.06 g/L Na2WO4·2H2O. After operating in this fashion for 46 days (~3.5 SRTs), the dark period was eliminated for an additional 21 days of operation (~1.5 SRTs) to investigate whether P cycling could be sustained under continuous illumination.

6.2.3 Analytical chemistry

To monitor reactor function, dry cell weight, dissolved oxygen (DO), soluble P, total P (TP), ammonium, nitrite and nitrate were measured. Soluble phosphate was measured using PhosVer® 3 phosphate reagent (HACH Company, Loveland, CO). Total P was measured using the PhosVer® 3 Ascorbic Acid method with Acid Persulfate Digestion (HACH Company, Loveland, CO). Dry cell weight was measured using standard methods (APHA et al., 1999). Soluble ammonia concentrations were measured using the salicylate method (Method 10031, HACH Company, Loveland, CO). Soluble nitrite and nitrate concentrations were measured by high-pressure liquid chromatography on a Prevail™ Organic Acids column (Discovery Sciences, Deerfield, IL) with UV detection at 214 nm. The mobile phase contained 25 mM KH2PO4, adjusted to pH 2.5 using phosphoric acid. Soluble samples were taken after feeding, at the end of the dark period, and at the end of the light period. When the reactor was operated under continuous illumination, sampling continued at the time when the dark period had
previously ended, to capture P release dynamics. DO was measured using a WTW Multi 3410 multiparameter meter with an FDO 925-6 dissolved oxygen probe (WTW GmbH, Weilheim, Germany). The DO concentration at the end of each cycle was recorded using the time point before settling began.

6.2.4 DNA extraction

For each of seven sampling days, biomass was collected and extracted in triplicate (a total of twenty-one samples). Specifically, 2 mL of mixed sludge was centrifuged at 8000 rpm and the supernatant was removed. The cell pellets were immediately frozen at -80 °C until further processing. DNA extraction was conducted using a PowerSoil DNA Isolation Kit with bead beating (Mo-Bio Laboratories Inc., Carlsbad, CA, USA). The quantity and quality of DNA were measured using a Qubit® dsDNA HS Assay Kit (Life Technologies, Carlsbad, California, USA) and by gel electrophoresis (Supplemental Spreadsheet 1).

6.2.5 Construction and sequencing of 16S and 18S rRNA gene amplicon libraries

The twenty-one genomic DNA samples were submitted to the University of Wisconsin-Madison Biotechnology Center. Samples were prepared as described in the 16S Metagenomic Sequencing Library Preparation Protocol, Part #15044223 Rev. B (Illumina Inc., San Diego, California, USA), with the following modifications. The 16S rRNA gene V3/V4 variable region was amplified using the S-D-Bact-0341-b-S-17 and S-D-Bact-0785-a-A-21 forward and reverse primers (Herlemann et al., 2011). The 18S rRNA gene V4 variable region was amplified using the Euk_1391f and EukBr-7R primers, based on the protocol outlined by the Earth Microbiome Project initiative (http://www.earthmicrobiome.org/emp-standard-protocols/18S/) (Caporaso et al., 2012). Both sets of primers were modified to add Illumina adapter overhang nucleotide
sequences to the gene-specific sequences. Following initial amplification, library size was verified on an Agilent DNA1000 chip, and libraries were cleaned using a 1x volume of AxyPrep Mag PCR clean-up beads (Axygen Biosciences, Union City, CA).

6.2.6 16S and 18S rRNA gene sequence processing

Raw 16S rRNA gene amplicon reads were quality filtered (quality threshold = 30) using the FastX Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/). Because of the short length of the 18S rRNA gene reads, adapter and quality trimming was done using the software Skewer (Jiang et al., 2014) with a quality score threshold of 30. After quality trimming and adapter removal, sequences were processed with Mothur (v. 1.36.1) (Schloss et al., 2009) using the MiSeq standard operating procedure (http://www.mothur.org/wiki/MiSeq_SOP) (Kozich et al., 2013). Taxonomic classification was done using the SILVA SEED database v119 (Quast et al., 2013). For downstream analysis, each sample was subsampled to the minimum depth within each dataset: 110,901 and 192,198 reads for the 16S and 18S datasets, respectively. Operational taxonomic units (OTUs) were clustered at 97% sequence similarity. Principal components analysis was conducted to assess the reproducibility of replicates using every OTU whose summed abundance represented >0.01% of total reads. A correlation matrix of the top 20 OTUs within each dataset was calculated using Spearman's correlation, and these correlations were hierarchically clustered using the “complete” agglomeration method of the R function hclust.

6.2.7 Phylogenetic analysis of OTUs

Key functions in wastewater treatment, such as P removal and nitrification, are provided by organisms with wide intra-genus ecological amplitudes with regard to terminal electron acceptors (Flowers et al., 2009; Kim et al., 2013) and DO concentrations (Fitzgerald et al., 2015;
Park et al., 2002). Therefore, a phylogenetic analysis of each operational taxonomic unit (OTU) classified within Candidatus Accumulibacter phosphatis (henceforth Accumulibacter) and Nitrosomonas was conducted to further resolve their taxonomy. Representative sequences for each Accumulibacter and Nitrosomonas OTU were aligned in MAFFT version 7.215 (Katoh & Standley, 2013) using the linsi option. Non-informative positions were masked using Gblocks version 0.91b (Castresana, 2000), allowing gaps in up to half of the taxa. A phylogeny was then constructed with RAxML version 8.0.14 using the GTRCAT substitution model and 1000 bootstraps. To better resolve the phylogenetic position of the OTU taxonomically classified as Chlorobiales;SJA-28, a phylogenetic analysis was conducted with representative sequences from the Chlorobi, including Ignavibacterium and OPB56, as well as from the Bacteroidetes. This phylogeny was constructed as described above; however, the maximum number of contiguous non-conserved positions and the minimum length of a block were changed from their default values to 8 and 5, respectively, decreasing the number of masked positions (Castresana, 2000).

6.2.8 Nucleotide accession numbers

The full 16S and 18S data sets were deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) under accession number SRP073389.

6.3 Results

6.3.1 Reactor performance

Based on reactor performance and operational parameters, three distinct phases were observed (Figure 6-1A). Phase 1 was characterized by rapid stabilization of P removal through EBPR and a slow degradation of N removal. Phase 2 was characterized by continued stability in P
removal and increasing N removal. Finally, Phase 3 was defined by the shift in operational mode from dark/light cycles to continuous illumination. Near complete P removal was achieved by day four of reactor operation and was maintained (94±7% removal) for the entirety of reactor operation (Figure 6-1A, Table 6-1). Average P removal improved in each successive phase; however, only the difference between Phase 1 and Phase 3 was statistically significant (Table 6-2). Even after initial inoculation (Cycle 1), when low P removal efficiency was observed, P release and cycling indicative of EBPR occurred. P release and cycling then continued for the duration of reactor operation, even under continuous illumination (Figure 6-1B-D, Table 6-1). Both TP and dry cell weight increased slightly during operation but stabilized between Phase 2 and 3 (Table 6-1). The ratio of TP to dry cell weight did not vary significantly throughout reactor operation (Table 6-1).

After the initial seeding event (Cycle 1), total soluble N removal was relatively high but decreased until reaching a minimum at the end of Phase 1 (Figure 6-1A). During Phase 2, without any modification of operational conditions, N removal rebounded, reaching peak efficiency during Phase 3, when on average approximately 90% of the soluble N was removed (Figure 6-1A, Table 6-1). During Phase 1, NH4+-N levels increased during the feast period relative to the beginning of the cycle (Figure 6-1B, Table 6-1). The change in NH4+-N concentration during the anaerobic feast (Δ NH4+-N AF) in Phase 1 was significantly different from that in Phases 2 and 3, where this behavior was not observed (Table 6-1, Figure 6-1B-D). NO2--N and NO3--N levels were low at the end of the feast period during Phases 1 and 2, and completely absent during Phase 3, a difference that was statistically significant (Table 6-1).
The NO2--N levels at the end of the cycle increased significantly from Phase 1 to 3, with intermediate values in Phase 2 (Table
6-1 and 6-2). NO3--N was completely absent in Phase 3 (Table 6-1), a statistically significant difference from Phase 1 (Table 6-2). A summary of chemical data collected at the beginning of the cycle, the end of the anaerobic feast, and the end of the cycle may be found in Supplemental Spreadsheet 2. A summary of chemical data from the days on which a full chemical profile was conducted (Figures 6-1B-D) may be found in Supplemental Spreadsheets 3-7.

During all three phases, DO concentrations were near saturation (~8 mg/L) immediately after fill. This oxygen was depleted rapidly and remained below 0.05 mg/L until the end of the cycle, when oxygen would occasionally begin to accumulate until the settling period began (Figures 6-2A and B). Specifically, DO concentrations were below 0.05 mg/L approximately 70% of the time, between 0.05 and 0.6 mg/L approximately 17% of the time, and above 0.6 mg/L only 13% of the time (Figure 6-2C). DO accumulation at the end of the cycle also differed between phases, especially during Phase 3, when continuous illumination resulted in frequent accumulation of DO at the end of the cycle (Figure 6-2D). Nitrite and nitrate concentrations were positively correlated with end-of-cycle DO concentrations. In contrast, P concentrations at the end of the anaerobic feast period were negatively correlated with end-of-cycle DO concentrations. A complete DO profile and analysis of end-of-cycle concentrations may be found in Supplemental Spreadsheet 8.
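Two of the operational figures in this chapter follow from simple arithmetic, sketched below. The retention-time function recomputes the HRT and SRT from the volumes given in Section 6.2.2, under the assumption (not stated in the text) that the reactor is completely mixed and that solids leaving in the decanted effluent are negligible, so the daily wasted mixed liquor is the only solids exit. The DO function bins evenly spaced readings (logged every 5 minutes, per Figure 6-2A) into the three ranges reported above. Function names are hypothetical; this is an illustration, not the analysis behind Supplemental Spreadsheet 8.

```python
"""Operational arithmetic for the photosynthetic-EBPR reactor (illustrative sketch).

Assumption: the SRT treats the daily wasted mixed liquor as the only solids
exit, i.e. solids in the decanted effluent are neglected.
"""


def retention_times(volume_l, effluent_l_per_day, wasted_l_per_day):
    """Return (HRT, SRT) in days for a completely mixed reactor."""
    hrt = volume_l / effluent_l_per_day
    srt = volume_l / wasted_l_per_day  # neglects solids in the effluent
    return hrt, srt


def do_time_fractions(do_readings, low=0.05, high=0.6):
    """Fractions of evenly spaced DO readings (mg/L) below `low`,
    between `low` and `high` (inclusive), and above `high`."""
    n = len(do_readings)
    below = sum(v < low for v in do_readings) / n
    above = sum(v > high for v in do_readings) / n
    return below, 1.0 - below - above, above


if __name__ == "__main__":
    # 2.5 L reactor, 2 L decanted twice per day, ~188 mL wasted once per day.
    hrt, srt = retention_times(2.5, 2 * 2.0, 0.188)
    print(f"HRT = {hrt:.3f} d, SRT = {srt:.1f} d")
    # 46 days of light/dark operation expressed in SRTs
    print(f"46 d = {46 / srt:.1f} SRTs")
```

With these inputs the sketch returns an HRT of 0.625 days and an SRT of about 13.3 days, matching the values in Section 6.2.2; because the readings are evenly spaced, the fraction of readings in each DO range equals the fraction of operating time.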


Figure 6-1: Chemical profiles during photosynthetic EBPR. A: The proportion of soluble nitrogen and phosphorus removed, as measured by soluble reactive phosphorus and the nitrogen species NH4+-N, NO2--N and NO3--N, during three phases of reactor operation. Phases 1, 2 and 3 are indicated by shaded regions. Phases 1 and 2 were operated with a light/dark cycle, while Phase 3 was operated under continuous illumination. Blue arrows indicate when biomass samples were collected for 16S and 18S rRNA gene sequencing. B-D: Representative soluble P and NH4-N profiles for the three phases of operation. Phosphorus cycling was observed during all phases of operation. During Phase 1, NH4-N appeared to be released upon acetate contact; NH4-N was not released upon acetate contact during subsequent phases.


Figure 6-2: Oxygen profiles during photosynthetic EBPR. A: Dissolved oxygen measurements taken every 5 minutes of operation. Each point is partially transparent, so darker circles indicate multiple successive time points with very similar readings, while light circles indicate rapid transitions in dissolved oxygen concentration, occurring at the beginning of cycles after fill and at the end of cycles when oxygen accumulation was observed. Red circles mark the final time point before settling began, i.e., the amount of oxygen that had accumulated, if any, before the cycle ended. B: A representative oxygen curve collected over one operation cycle. Three general properties of the oxygen profile were observed. First, oxygen concentrations were near saturation after fill. Second, this oxygen was rapidly depleted and was maintained at approximately 0.3 mg/L. Finally, oxygen accumulation was occasionally observed at the end of a cycle until the settling period began. C: The percent of operation time during which dissolved oxygen was within various ranges. On average across all phases, very low dissolved oxygen concentrations were observed during approximately 70% of the total operational time. D: At the end of each cycle, oxygen accumulation was often observed. During Phase 3, when the lights were on for the entire cycle, oxygen accumulation was observed much more frequently than during the other phases.


Table 1. Average Values
Measurement           Phase 1         Phase 2         Phase 3         All
P removal (fraction)  0.92 ± 0.08     0.94 ± 0.07     0.97 ± 0.06     0.94 ± 0.07
N removal (fraction)  0.47 ± 0.2      0.59 ± 0.2      0.9 ± 0.05      0.63 ± 0.25
P Begin (mg/L)        2.63 ± 1.22     2.7 ± 1.02      2.9 ± 0.29      2.72 ± 0.97
P End (mg/L)          0.26 ± 0.23     0.14 ± 0.18     0.08 ± 0.16     0.17 ± 0.21
P AF (mg/L)           22.06 ± 2.68    23.84 ± 3.09    23.85 ± 2.41    23.16 ± 2.84
NH4-N Begin (mg/L)    3.6 ± 0.53      4.47 ± 0.91     4.85 ± 0.83     4.24 ± 0.91
NH4-N End (mg/L)      1.75 ± 0.74     1.38 ± 0.6      0.35 ± 0.29     1.28 ± 0.83
NH4-N AF (mg/L)       5.09 ± 0.69     5.15 ± 1.33     4.73 ± 0.59     5.01 ± 0.91
Δ NH4-N AF (mg/L)     -1.38 ± 0.65    -0.25 ± 1.65    0.12 ± 0.88     -0.54 ± 1.3
NO2--N Begin (mg/L)   0.08 ± 0.1      0.1 ± 0.14      0.02 ± 0.03     0.08 ± 0.11
NO2--N End (mg/L)     0.11 ± 0.29     0.24 ± 0.39     0.26 ± 0.12     0.18 ± 0.31
NO2--N AF (mg/L)      0.03 ± 0.09     0.01 ± 0.04     ND              0.02 ± 0.06
NO3--N Begin (mg/L)   0.01 ± 0.03     0.02 ± 0.05     ND              0.01 ± 0.04
NO3--N End (mg/L)     0.16 ± 0.37     0.19 ± 0.44     ND              0.14 ± 0.36
NO3--N AF (mg/L)      0.01 ± 0.04     0.02 ± 0.04     ND              0.01 ± 0.04
DO End (mg/L)         1.2 ± 1.77      0.29 ± 0.31     1.89 ± 2.06     1.14 ± 1.71
Dry Cell Weight       660.7 ± 188.6   914.4 ± 253.9   958.6 ± 50.6    833.3 ± 232.9
TP                    114.4 ± 12.4    120.3 ± 13.5    135.7 ± 12.4    121.9 ± 14.7
TP/Dry Cell Weight    0.19 ± 0.07     0.14 ± 0.04     0.14 ± 0.02     0.16 ± 0.05
ND: not detected.

Table 6-1: Average values of measurements taken during reactor operation in each phase (Phase 1, 2 and 3) and over the entire duration (All). P, NH4-N, NO2--N and NO3--N were measured at the beginning (Begin) and end (End) of each cycle, as well as at the end of the anaerobic feast (AF) period. The average change in NH4-N during the anaerobic feast period is also reported (Δ NH4-N AF). Additionally, the average dissolved oxygen at the end of each cycle (DO End), the dry cell weight, total phosphorus (TP) and TP/dry cell weight are reported.
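Table 6-2 reports one-sided, heteroscedastic pairwise t-tests between phases, i.e. Welch's unequal-variance t-test. The sketch below shows one dependency-free way to compute such a test; it is not the script used to produce Table 6-2, and the function names are hypothetical. The Student t tail probability is obtained from the regularized incomplete beta function, evaluated with the standard continued-fraction (modified Lentz) method. In practice, `scipy.stats.ttest_ind(x, y, equal_var=False, alternative="greater")` performs the same test.

```python
"""One-sided Welch (unequal-variance) t-test using only the standard library."""
import math
import statistics


def _beta_cf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction.
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor ~1 means converged
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def t_sf(t, df):
    """Upper-tail probability P(T > t) of Student's t with `df` degrees of freedom."""
    tail = 0.5 * betainc_reg(df / 2.0, 0.5, df / (df + t * t))
    return tail if t > 0 else 1.0 - tail


def welch_one_sided(x, y):
    """Welch's t-test of H1: mean(x) > mean(y). Returns (t, df, p)."""
    vx = statistics.variance(x) / len(x)
    vy = statistics.variance(y) / len(y)
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df, t_sf(t, df)
```

For example, `welch_one_sided(phase3_values, phase1_values)` (hypothetical lists of per-cycle measurements) would test whether a quantity was higher in Phase 3 than in Phase 1; swapping the arguments tests the opposite direction.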


P removal (%) N removal (%) P End P AF NH4-N End NH4-N AF NO2--N Begin NO2--N End NO3--N End DO End DO End vs. NO2--N End TP/Dry Cell Weight Dry Cell Weight TP

Table 2. Pair-wise t-test, 1 sided, heteroscedastic 1v2 1v3 0.21 0.03 0.07 0.00 0.06 0.01 0.055 0.048 0.08 2E-07 0.02 2E-04 0.30 0.03 0.17 0.05 0.40 0.05 0.022 0.099 0.004 0.091 0.103 0.045 0.011 0.224 0.017

2v3 0.12 0.00 0.19 0.50 4E-05 3E-01 0.03 0.43 0.07