Molecular Ecology Resources (2012) 12, 369–373
doi: 10.1111/j.1755-0998.2011.03081.x
GeoSymbio: a hybrid, cloud-based web application of global geospatial bioinformatics and ecoinformatics for Symbiodinium–host symbioses
ERIK C. FRANKLIN, MICHAEL STAT, XAVIER POCHON, HOLLIE M. PUTNAM and RUTH D. GATES
Hawaii Institute of Marine Biology, School of Ocean and Earth Science and Technology, University of Hawaii at Manoa, PO Box 1346, Kaneohe, Hawaii 96744, USA
Abstract The genus Symbiodinium encompasses a group of unicellular, photosynthetic dinoflagellates that are found free living or in hospite with a wide range of marine invertebrate hosts including scleractinian corals. We present GeoSymbio, a hybrid web application that provides an online, easy to use and freely accessible interface for users to discover, explore and utilize global geospatial bioinformatic and ecoinformatic data on Symbiodinium–host symbioses. The novelty of this application lies in the combination of a variety of query and visualization tools, including dynamic searchable maps, data tables with filter and grouping functions, and interactive charts that summarize the data. Importantly, this application is hosted remotely or ‘in the cloud’ using Google Apps, and therefore does not require any specialty GIS, web programming or data programming expertise from the user. The current version of the application utilizes Symbiodinium data based on the ITS2 genetic marker from PCR-based techniques, including denaturing gradient gel electrophoresis, sequencing and cloning of specimens collected during 1982–2010. All data elements of the application are also downloadable as spatial files, tables and nucleic acid sequence files in common formats for desktop analysis. The application provides a unique tool set to facilitate research on the basic biology of Symbiodinium and expedite new insights into their ecology, biogeography and evolution in the face of a changing global climate. GeoSymbio can be accessed at https://sites.google.com/site/geosymbio/.
Keywords: biogeography, bioinformatics, ecoinformatics, hybrid web application, Symbiodinium, symbioses
Received 6 May 2011; revision received 12 September 2011; accepted 16 September 2011
Correspondence: Erik C. Franklin, Fax: 1 808 236 7443; E-mail: [email protected]
2011 Blackwell Publishing Ltd
Introduction The genus Symbiodinium is a group of unicellular, photosynthetic dinoflagellates that are found either free living or in hospite with a wide range of marine invertebrates including scleractinian corals. Symbiodinium is genetically diverse, encompassing nine divergent lineages (clades A-I; Pochon & Gates 2010), each containing multiple types, some of which have been assigned species names (e.g. LaJeunesse et al. 2009, 2010). During the last decade, the internal transcribed spacer 2 region (ITS2) of the nuclear ribosomal array has been used extensively for assessing the distribution of Symbiodinium types in invertebrate hosts sampled from a variety of habitats (LaJeunesse 2005; Stat et al. 2006; Correa & Baker 2009; Silverstein et al. 2011). This effort has resolved 409 distinct ITS2 types that have each been assigned unique identifiers. One of the major impediments to significant progress in the global research of the diversity, ecology and biogeography of Symbiodinium–host symbioses is that the collection event details and environmental attributes associated with the sequence records are buried in the text of peer-reviewed literature and in genetic repositories, which makes comparative data analysis and geographic visualization of these records difficult. Building the methodological and informational capacity to dynamically examine the basic ecology and biogeography of symbionts in coral reefs and other marine environments would represent a significant scientific and technical advance for the field.
To facilitate research on the spatial distribution, ecology and diversity of Symbiodinium, we introduce an open-access, hybrid web application, called GeoSymbio, which provides an online platform to query and visualize Symbiodinium data. The term ‘hybrid web application’ describes a web application that combines data, presentation methods and various functionalities from multiple sources to create new or enhanced data products and services. The GeoSymbio interface allows any user to perform text-based or spatial queries of a compiled global database of the occurrences of Symbiodinium–host symbioses using data based on the ITS2 marker from PCR-based techniques, including denaturing gradient gel electrophoresis (DGGE), sequencing and cloning of specimens collected from 1982 to 2010. The application structure draws functionality and information from a variety of visualization tools, digital data and reference sources, with the core of the application hosted remotely or ‘in the cloud’ using Google Apps. Thus, project information is accessible through any web browser with Internet access, so is not specific to a computer platform. The user can also download output as text files and spatial files for desktop analysis.
Data sources To build the global Symbiodinium data set, we compiled genetic, taxonomic, collection event and reference information for DNA sequences of the internal transcribed spacer-2 (ITS2) region of the nuclear ribosomal array, currently the most widely used and informative genetic marker for Symbiodinium (Sampayo et al. 2007). Data were mined from the US National Center for Biotechnology Information’s GenBank. DNA sequences of Symbiodinium were used to organize the organisms into clades and types based on partial 5.8S, complete ITS2 and partial 28S rDNA, hereafter referred to as ITS2 types. For Symbiodinium ITS2 sequences, GenBank contained many redundant entries, often with incomplete descriptions of associated attributes and mismatched fields in the submitted data. Comparison of ITS2 sequences from 79 published studies identified redundancies between records with synonymous sequences but different ITS2 type nomenclature. Identical sequences (i.e. 100% residue similarity) with different accession numbers were identified as synonyms, with the first published record designated as the ‘parent’ accession number. The source literature identified in the GenBank record was then searched to confirm or ascertain the following descriptive characteristics for each sequence: host taxa, location, collection year and laboratory methodology. The accurate mapping of the location of Symbiodinium occurrences required manual data mining of the primary literature sources identified in each GenBank accession, with a cross-check of location in GEOnet Names (http://earth-info.nga.mil/gns/html) and Google Earth (http://www.google.com/earth/index.html). The data compilation yielded 4833 records in 34 variables that described bioinformatic and ecoinformatic attributes including host taxonomy, genetic sequence, methodology, collection event and scientific references for all nine Symbiodinium clades with hosts from 20 taxonomic orders in all major tropical oceans between 1982 and 2010. For more details, the complete database and an EML-formatted metadata record are available in Supporting information for archival purposes.
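The synonym rule described above (identical sequences share one 'parent' accession, taken from the first published record) can be illustrated with a minimal Python sketch. This is not the authors' procedure as implemented; the tuple layout, the use of publication year as the ordering key, and the accession identifiers are all hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict

def assign_parent_accessions(records):
    """Map each accession to the earliest accession with an identical sequence.

    `records` holds (accession, year, sequence) tuples. The field layout and
    the year-based ordering are assumptions made for this sketch.
    """
    groups = defaultdict(list)
    for accession, year, sequence in records:
        # Normalise case and drop alignment gaps so identical residues compare equal.
        key = sequence.upper().replace("-", "")
        groups[key].append((year, accession))

    parent_of = {}
    for members in groups.values():
        # Earliest year wins; ties are broken alphabetically by accession.
        _, parent = min(members)
        for _, accession in members:
            parent_of[accession] = parent
    return parent_of

# Hypothetical accessions and made-up sequences, for illustration only.
records = [
    ("ACC0001", 2003, "ACGTTGCA"),
    ("ACC0002", 2007, "acgttgca"),
    ("ACC0003", 2005, "ACGTTGGA"),
]
synonyms = assign_parent_accessions(records)
```

Here ACC0002 would be treated as a synonym of ACC0001, while ACC0003 differs by one residue and remains its own parent.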
GeoSymbio structure and functionality The structure and functionality of GeoSymbio were developed around the data schema of the global Symbiodinium database. The GeoSymbio web application is composed of a collection of web pages with searchable text-based functions and spatial-based maps, the database with a schema that defines the organization of the data, a bibliography, and downloadable map and sequence files. The functionality of GeoSymbio is built upon the suite of Google Apps, including Google Sites, Google Spreadsheets, Google Fusion Tables, Google Earth and Google Gadgets. Using these applications, GeoSymbio provides four primary functions: (i) geospatial visualization, (ii) text-based queries, (iii) knowledge summary and (iv) data products (Table 1). A manual that describes the GeoSymbio components and functions is also available in Supporting information. Each of the primary functions and corresponding web addresses of GeoSymbio is described below.
Geospatial visualization The GeoSymbio maps web page allows searches for Symbiodinium clade and Symbiodinium type based on the ITS2 gene and the taxonomy of the host at the level of order (https://sites.google.com/site/geosymbio/maps). The Google Earth (KML) page provides a dynamic globe embedded in the website with the attributes of the GeoSymbio database accessible for each location in pop-up info windows (https://sites.google.com/site/geosymbio/kml).
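To make the KML layer concrete, the following Python sketch shows one way an occurrence record could be written as a KML placemark whose description would appear in a pop-up info window. It is only an illustration under stated assumptions: the dictionary keys (its2_type, clade, host_order, lat, lon) and the sample record are hypothetical and are not taken from the GeoSymbio schema.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

def occurrences_to_kml(occurrences):
    """Serialise occurrence dictionaries as a KML document string.

    The dictionary keys used here are hypothetical names for this sketch.
    """
    ET.register_namespace("", KML_NS)  # emit KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for occ in occurrences:
        placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = occ["its2_type"]
        ET.SubElement(placemark, f"{{{KML_NS}}}description").text = (
            f"Clade {occ['clade']}; host order {occ['host_order']}"
        )
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML lists coordinates as longitude,latitude (not latitude,longitude).
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = (
            f"{occ['lon']},{occ['lat']}"
        )
    return ET.tostring(kml, encoding="unicode")

# A made-up record near Kaneohe Bay, Hawaii, for illustration only.
sample = [{"its2_type": "C3", "clade": "C", "host_order": "Scleractinia",
           "lat": 21.43, "lon": -157.79}]
kml_text = occurrences_to_kml(sample)
```

The resulting string can be saved with a .kml extension and opened in a viewer such as Google Earth.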
Text-based queries The database page provides filtering and grouping functions, which offer flexible means to query the data (https://sites.google.com/site/geosymbio/database). Filtering the database is a simple yet powerful way to combine single filters on each attribute column; for example, a researcher interested in the occurrence of a specific ITS2 type within a particular host could apply the corresponding filters to view only the records that meet both criteria. The grouping method offers even greater capacity to summarize data through hierarchical relationships among the database attributes. To continue the previous example, a hierarchical grouping of host and clade with a count by ITS2 type would dynamically update the table to show the selected criteria with subtotal record counts by group elements.
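For readers who prefer to run the same kind of query on the downloaded table, the filter and grouping operations described above can be approximated on the desktop. The Python sketch below is a minimal illustration; the column names (host_order, clade, its2_type) and the sample rows are hypothetical and may not match the headers in the actual data file.

```python
import csv
import io
from collections import Counter

# Made-up rows with hypothetical column names, standing in for the data table.
SAMPLE_CSV = """host_order,clade,its2_type
Scleractinia,C,C3
Scleractinia,C,C3
Scleractinia,C,C1
Scleractinia,D,D1a
Actiniaria,A,A4
"""

def filter_rows(rows, **criteria):
    """Keep rows whose values equal every column=value criterion."""
    return [r for r in rows if all(r[k] == v for k, v in criteria.items())]

def group_counts(rows, *columns):
    """Count records for each combination of the given columns."""
    return Counter(tuple(r[c] for c in columns) for r in rows)

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
scleractinian = filter_rows(rows, host_order="Scleractinia")
counts = group_counts(scleractinian, "host_order", "clade", "its2_type")
```

Grouping on fewer columns, for example `group_counts(scleractinian, "clade")`, gives the subtotal level of the hierarchy.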
Table 1 Summary of the components of GeoSymbio functionality and application

Functionality             Component
Geospatial visualization  Symbiodinium clade map
                          Symbiodinium type map
                          Google Earth globe map
Text-based queries        Database filter by attribute
                          Database grouping and summation by attribute
Knowledge summary         Database dashboard
                          Bibliography
                          Database schema
Data products             Google Earth map file (.kml)
                          ESRI shape file (.shp)
                          Database table (.csv)
                          Database schema (.csv)
                          Bibliography (.csv)
                          Aligned files for each clade (A-I) of all type sequences (.fasta)
                          Unaligned FASTA file of all types (.fasta)
Knowledge summary The dashboard web page presents a set of interactive pie charts that summarize the information in the database by clade, ITS2 type, host order, collection year and location (https://sites.google.com/site/geosymbio/dashboard). The GeoSymbio bibliography web page (https://sites.google.com/site/geosymbio/bibliography) lists the references from which all data were collected. A schema page was included containing a detailed database schema, or descriptive metadata for the data elements in the map and database pages (https://sites.google.com/site/geosymbio/schema).
Data products The download page (https://sites.google.com/site/geosymbio/downloads) includes links to download the map files (as .kml and .shp) to view the data in a map program such as Google Earth or ESRI ArcGIS. In addition, a set of genetic sequence alignment files (.fasta) can also be downloaded from this page. Each of the nine sequence alignment files (representing each of nine existing Symbiodinium clades) was subjected to the following three steps. First, the sequence file was imported into the sequence alignment software BioEdit v7.0.9 (Hall 1999), where it was subjected to automatic alignment using ClustalW (Thompson et al. 1994) and further improved manually. Second, the aligned sequence file was run in ‘DNA to haplotype collapser and converter’ freely available at the online FASTA sequence toolbox (FABOX; http://www.daimi.au.dk/~biopv/php/fabox/). The output file allowed for the detection of redundant sequences (i.e.
characterization of identical ITS2 types published with different names). Third, redundant sequences were deleted from the alignment, so that a single ITS2 sequence per ITS2 type name was represented. This effort resulted in an alignment for each of the nine clades with the number of sequences in an alignment ranging from 1 in clade E to 267 in clade C. Finally, a global FASTA file named ‘Symbio_ITS2_local.fasta’ was also created by compiling all Symbiodinium clades and totalling 409 ITS2 sequences. This unaligned sequence file can be downloaded and used for local BLAST.
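As a quick check after downloading the unaligned file, a user might tally the sequences per clade and compare the result with the totals reported above. The following Python sketch assumes, hypothetically, that each FASTA header begins with the ITS2 type name and that the first letter of that name is the clade; the actual header format of the file is not described here, and the sequences below are made-up placeholders.

```python
from collections import Counter

def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    entries, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                entries.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        entries.append((name, "".join(chunks)))
    return entries

def count_by_clade(entries):
    """Tally sequences per clade, assuming the header's first letter is the clade."""
    return Counter(name[0].upper() for name, _ in entries if name)

# Placeholder headers and sequences, not real ITS2 data.
SAMPLE = """>C3
ACGTACGT
TTGA
>C1
ACGTACGA
>D1a
ACGGACGT
>E1
ACGTTTTT
"""
entries = read_fasta(SAMPLE)
clade_counts = count_by_clade(entries)
```

Run against the full file, the sum of the counts would be expected to equal the 409 ITS2 sequences stated in the text, provided the header assumption holds.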
Discussion The need for a tool like GeoSymbio arises from the difficulties of integrating multiple data sources and information to perform bioinformatic and ecoinformatic analyses, particularly in a geospatial context. These tasks can be challenging to execute without an interdisciplinary skill set of highly specialized scientific knowledge and a strong computing background, thus creating a barrier to progress among Symbiodinium researchers. Many of the data sources also do not lend themselves to automated data extraction because of the complexity of the information and poorly associated documentation (i.e. metadata); for example, submission of sequences to the primary repository for Symbiodinium genetic sequence information, the US National Center for Biotechnology Information’s GenBank, often results in redundant entries. Furthermore, because of the lack of detailed information from the submitters, records are often missing key ecological attributes, resulting in difficulty for analysis and synthesis. Additionally, the lack of geographic description or coarse geographic resolution common to GenBank submissions and primary literature severely challenges attempts to automate the linkage between genetic records and environmental data sets; for example, GenBank entries include location descriptions such as ‘Palau’ or ‘Indonesia’ that hamper geospatial analysis of genetic information with high-resolution, spatially explicit environmental data sets (i.e. exact latitude and longitude of each sample). Other efforts to disseminate Symbiodinium data online (such as SD2-GED at the URL: http://www.auburn.edu/~santosr/sd2_ged.htm) have provided useful access to the sample records maintained by individual researchers, but lack visualization tools, cover a limited geographic extent and do not include documentation or hyperlinks for source publications and accession identifiers.
In contrast, GeoSymbio provides a dynamic, interactive and comprehensive global Symbiodinium ITS2 data compilation. The power of GeoSymbio is the ability to synthesize Symbiodinium data without requiring the user to be an expert in GIS, web programming, database administration or bioinformatics. By providing access to analyse contemporary Symbiodinium data, this application should facilitate new discoveries and stimulate novel hypotheses in symbiont–host ecology, diversity and biogeography, particularly as they relate to climate change. Plans for future development of GeoSymbio include the refinement of query and visualization tools and expansion to additional genetic markers.
Summary We created GeoSymbio, an online and freely accessible interface, with the end goal of facilitating the exploration of Symbiodinium–host symbioses by scientists and educators. GeoSymbio provides text-based queries and geospatial searches to discover and explore dynamic combinations of Symbiodinium sequences, host and collection event details. The web application also provides a suite of downloads to facilitate further desktop analysis of the information. It is our hope that this tool stimulates an open-collaborative environment among Symbiodinium researchers to accelerate the pace of discovery and innovation, as well as with educators and managers to promote accessibility of the current state of Symbiodinium knowledge in the field.
Acknowledgements The National Marine Sanctuary Program (memorandum of agreement 2005-008/66882), the US Environmental Protection Agency Science To Achieve Results (STAR) PhD Fellowship (FP917096 and FP917199), the US National Science Foundation (NSF) for grants through Biological Oceanography (OCE0752604, OCE-1041673) and the Long-Term Ecological Research (LTER) program (NSF 04-17412) provided financial support for this research. This work was conducted as a part of the ‘Tropical coral reefs of the future: Modelling ecological outcomes from the analyses of current and historical trends’ Working Group supported by the National Center for Ecological Analysis and Synthesis, a Centre funded by NSF (Grant #EF-0553768), the University of California, Santa Barbara, and the State of California. This is HIMB contribution #1466 and SOEST contribution #8491.
References
Correa AMS, Baker AC (2009) Understanding diversity in coral-algal symbiosis: a cluster-based approach to interpreting fine-scale genetic variation in the genus Symbiodinium. Coral Reefs, 28, 81–93.
Hall TA (1999) BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symposium Series, 41, 95–98.
LaJeunesse TC (2005) “Species” radiations of symbiotic dinoflagellates in the Atlantic and Indo-Pacific since the Miocene-Pliocene transition. Molecular Biology and Evolution, 22, 570–581.
LaJeunesse TC, Finney JC, Smith R, Oxenford H (2009) Outbreak and persistence of opportunistic symbiotic dinoflagellates during the 2005 Caribbean mass coral “bleaching” event. Proceedings of the Royal Society of London. Series B, 276, 4139–4148.
LaJeunesse TC, Smith R, Walther M et al. (2010) Host-symbiont recombination vs. natural selection in the response of coral-dinoflagellate symbioses to environmental disturbance. Proceedings of the Royal Society of London. Series B, 277, 2925–2934.
Pochon X, Gates RD (2010) A new Symbiodinium clade (Dinophyceae) from soritid foraminifera in Hawaii. Molecular Phylogenetics and Evolution, 56, 492–497.
Sampayo EM, Franceschinis L, Hoegh-Guldberg O, Dove S (2007) Niche partitioning of closely related symbiotic dinoflagellates. Molecular Ecology, 16, 3721–3733.
Silverstein RN, Correa AMS, LaJeunesse TC, Baker AC (2011) Novel algal symbiont (Symbiodinium spp.) diversity in reef corals of Western Australia. Marine Ecology Progress Series, 422, 63–75.
Stat M, Carter D, Hoegh-Guldberg O (2006) The evolutionary history of Symbiodinium and scleractinian hosts—Symbiosis, diversity, and the effect of climate change. Perspectives in Plant Ecology, Evolution and Systematics, 8, 23–43.
Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research, 22, 4673–4680.
Data Accessibility Web application URL: GeoSymbio is at https://sites.google.com/site/geosymbio/.
Web application manual: Available in Supporting information at Molecular Ecology Resources: GeoSymbio_manual_1_0_1.pdf.
Data archival: Available in Supporting information at Molecular Ecology Resources as Symbiodinium_ITS2_1982_2010.zip that contains three comma-delimited ASCII text files. Symbiodinium occurrences, clade and subclade sequence type, host taxonomy, genetic sequence identifier, methodology, collection location and year are in Symbiodinium_ITS2_1982_2010_data.csv. The comma-delimited ASCII text file contains 4833 records and a header row. The variable descriptions for headers in the data file are provided in Symbiodinium_ITS2_1982_2010_schema.csv. It contains 34 records (one for each variable) in a comma-delimited ASCII text file. The list of bibliographic references of the data sources and their URLs are contained in Symbiodinium_ITS2_1982_2010_refs.csv which contains 79 records.
Metadata archival: Available in Supporting information at Molecular Ecology Resources: Symbiodinium_ITS2_1982_2010_metadata.pdf.
Supporting Information Additional supporting information may be found in the online version of this article.
Appendix S1 GeoSymbio manual.
Appendix S2 A zip file containing the three following files: data, schema, and bibliography tables.
Appendix S3 The data file of the Symbiodinium occurrence records and associated attributes.
Appendix S4 The schema of variable descriptions for the headers of the data file.
Appendix S5 The list of bibliographic references of data sources and their URLs.
Appendix S6 Metadata record for data, schema, and bibliography files (EML formatted).