In Silico Biology 5 (2005) 187–198 IOS Press
Bioinformatics Visualization and Integration with Open Standards: The Bluejay Genomic Browser

Andrei L. Turinsky, Andrew C. Ah-Seng, Paul M.K. Gordon, Julie N. Stromer, Morgan L. Taschuk, Emily W. Xu and Christoph W. Sensen ∗

University of Calgary, Faculty of Medicine, Sun Center of Excellence for Visual Genomics, 3330 Hospital Drive NW, Calgary, AB, Canada, T2N 4N1

Edited by E. Wingender; received 7 October 2004; revised and accepted 18 November 2004; published 27 November 2004

ABSTRACT: We have created a new Java™-based integrated computational environment for the exploration of genomic data, called Bluejay. The system is capable of using almost any XML file related to genomic data. Non-XML data sources can be accessed via a proxy server. Bluejay has several features that are new to bioinformatics, including an unlimited semantic zoom capability, coupled with Scalable Vector Graphics (SVG) output; an implementation of the XLink standard, which provides access to MAGPIE Genecards as well as any BioMOBY service accessible over the Internet; and the integration of gene chip analysis tools with the functional assignments. The system can be used as a signed web applet, as a Web Start application, or as a local stand-alone application, with or without a connection to the Internet. It is available free of charge and as open source via http://bluejay.ucalgary.ca.

KEYWORDS: Bioinformatics, visualization, open standards, eXtensible Markup Language (XML), Scalable Vector Graphics (SVG), Java™, BioMOBY web services, Bluejay
INTRODUCTION

The last decade has witnessed the rise of bioinformatics as an influential area supporting biological research. In all likelihood, it will become one of the dominant technologies in the life sciences, responsible for an ever larger share of new knowledge. The amount of data accumulated in genomic data banks is doubling, on average, faster than every two years, outpacing Moore’s Law. Even more challenging is the complexity of the data. The need to explore the rich idiosyncrasies of the data has led to the proliferation of bioinformatics tools. However, the development of software, especially at the early stages, has often focused on relatively specialized problems without much thought given to interoperability with other software products. As a result, the current bioinformatics software landscape is characterized by fragmentation [Stein, 2002]. Availability of software is not only a matter of convenience but sometimes also becomes a driving force in bioinformatics research. As an example from the area of DNA microarray data analysis,
∗ Corresponding author. E-mail: [email protected].
Electronic publication can be found in In Silico Biol. 5, 0018, 27 November 2004. 1386-6338/05/$17.00 © 2005 – IOS Press and Bioinformation Systems e.V. and the authors. All rights reserved
A.L. Turinsky et al. / Bioinformatics Visualization and Integration with Open Standards
early availability of free software for hierarchical clustering has contributed to the virtual domination of this type of algorithm over all other types of analysis [Kaminski and Friedman, 2002]. The main challenge before the bioinformatics community nowadays is the meaningful integration of knowledge. It is widely understood that this complex task requires integration of multiple data types [Chicurel, 2002]. Advances in this direction have been two-fold: development of standards for bioinformatics data, and creation of comprehensive data repositories. Data-related standards include data formats, protocols for data exchange, and controlled vocabularies for meta-data. Data repositories typically collect heterogeneous data at a central location, either as a data warehouse [Schönbach et al., 2000] or as a federation of separate databases tied together by a common query system [Chung and Wong, 1999; Chen et al., 1998]. However, equally important is tool integration, which has only recently started receiving due attention. Tools created for specialized purposes are now required to work together on comprehensive data exploration tasks. Support of common data standards allows tools to communicate data and results in an automated fashion. This, however, is not the only enabling factor for interoperation. Other factors include support of common interface conventions, availability of reusable software components, and the adoption of best practices in software integration. Using the Bluejay development as an example, we will demonstrate our approach to tool and data integration.

METHODS

Bluejay overview

Bluejay (Browser for Linear Units in Java, http://bluejay.ucalgary.ca) is a visualization tool for biological sequences. It consists of a browser client side and a proxy server side. The browser side creates a dynamic visualization of biological sequence data encoded as eXtensible Markup Language (XML), and provides the means for visual data exploration.
The proxy server communicates with remote resources, converts non-XML data into XML, pre-processes the data, and enables session management. The overall Bluejay architecture is shown in Fig. 1.

Browser Side Assembly

Before XML data can be visualized, the source needs to be parsed into an internal representation. Bluejay complies with the Simple API for XML (SAX), an event-driven interface for Java XML parsers (http://www.saxproject.org). The SAX de facto standard recognizes XML namespaces and allows several parsers to be organised into a series of data filters, for instance, for hyperlink extraction. Bluejay listens to the Apache-compliant Xerces SAX parser (http://xml.apache.org/xerces2-j), which emits parsing events as it processes the XML, and responds by building an internal representation of the XML elements. Bluejay is fully compliant with the XLink standard for hyperlinks between XML documents and elements (http://www.w3.org). It recognizes, extracts, and activates XLinks from the incoming data to other genomic data and web services. The XLink module acts as an optional SAX 2 filter on top of a SAX 2 parser (such as Xerces): it listens to parsing events and extracts all relevant XLinks into a link pool. Bluejay internal structures then handle dynamic link recognition and activation. To provide XLink support, we reused an existing but outdated Java XLinkFilter package [St Laurent, 1998], upgrading it to the current XLink specifications and SAX 2 interface. The resulting XLink module in Bluejay is, to our knowledge, the only free open-source light-weight XLink package that fully implements the latest
Fig. 1. Bluejay information flow. Software components used to implement the functionality at main steps are shown in italics.
XLink standard, including extended links, inbound and third-party links, linkbases, and other nontrivial cases. The use of XLinks in Bluejay will likely expand to support the visual integration of data and services. In Bluejay, the SAX parser creates internal data structures that implement the Document Object Model (DOM) interface (http://www.w3.org). DOM is a de facto standard for the internal representation of XML data. As a result, XML data in Bluejay can be processed by many of the standard XML tools that operate on DOM structures. In particular, the XPointer, XPath, and XQuery standards for query and retrieval of XML data can now be applied. Any DOM implementation package may be readily plugged in to provide DOM functionality. We have reused a DOM implementation module that is part of the Apache Batik browser project (http://xml.apache.org/batik), described below. Once the XML data is parsed and XLinks are processed, Bluejay builds an interactive legend of functional categories, enabled by support for the XPath and XPointer standards (http://www.w3.org). XPath defines a way to describe XML elements in a DOM tree that satisfy a certain condition, e.g. have the same gene function name attribute, while the XPointer standard defines locations of individual nodes in a DOM tree. The Bluejay legend allows users to manipulate the categories visually, e.g. by clicking on legend items, which then access the DOM data with customized XPath queries. To implement XPath access to DOM, an existing XPath module from the Apache Xalan-Java project (http://xml.apache.org/xalan-j) was plugged into Bluejay. Our future plans are to implement more extensive visual queries using the XQuery standard (http://www.w3.org), which allows very flexible descriptions of elements. Functional categories in the legend are built in accordance with existing controlled vocabularies.
Currently, Bluejay supports Gene Ontology (GO) categories [Harris et al., 2004] for genomic data but can also represent types of sequence features outside of the GO scope. Support of other ontologies and standard vocabularies will be added along with new data types.
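
The parsing steps described above (a SAX 2 filter that collects XLinks, followed by XPath queries against a DOM tree) can be illustrated with a minimal sketch that uses only the standard Java XML APIs. It is not the Bluejay XLink module: the class name XmlPipelineSketch, the element name gene, and the attributes id and function are hypothetical and chosen for illustration only, and the sketch handles only the xlink:href attribute rather than extended links or linkbases.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Illustrative sketch only; names and the sample document are assumptions.
class XmlPipelineSketch {
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";

    // Hypothetical input: a tiny genome document with two linked genes.
    static final String SAMPLE =
        "<genome xmlns:xlink=\"http://www.w3.org/1999/xlink\">"
        + "<gene id=\"g1\" function=\"transport\" xlink:type=\"simple\""
        + " xlink:href=\"http://example.org/g1\"/>"
        + "<gene id=\"g2\" function=\"metabolism\"/>"
        + "<gene id=\"g3\" function=\"transport\" xlink:type=\"simple\""
        + " xlink:href=\"http://example.org/g3\"/>"
        + "</genome>";

    /** SAX 2 filter that records each xlink:href and forwards the event. */
    static class XLinkCollector extends XMLFilterImpl {
        final List<String> hrefs = new ArrayList<>();

        XLinkCollector(XMLReader parent) {
            super(parent);
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            String href = atts.getValue(XLINK_NS, "href");
            if (href != null) {
                hrefs.add(href);  // the "link pool" of this sketch
            }
            super.startElement(uri, localName, qName, atts);
        }
    }

    /** Parses the XML through the filter and returns the collected links. */
    static List<String> collectXLinks(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);  // needed to resolve the xlink prefix
            XLinkCollector filter =
                new XLinkCollector(factory.newSAXParser().getXMLReader());
            filter.parse(new InputSource(new StringReader(xml)));
            return filter.hrefs;
        } catch (Exception e) {
            throw new IllegalStateException("XLink extraction failed", e);
        }
    }

    /** Builds a DOM tree and selects the ids of genes in one category via XPath. */
    static List<String> geneIdsInCategory(String xml, String category) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind $category as a variable instead of concatenating strings.
            xpath.setXPathVariableResolver(name -> category);
            NodeList nodes = (NodeList) xpath.evaluate(
                "//gene[@function=$category]/@id", doc, XPathConstants.NODESET);
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                ids.add(nodes.item(i).getNodeValue());
            }
            return ids;
        } catch (Exception e) {
            throw new IllegalStateException("XPath query failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(collectXLinks(SAMPLE));
        System.out.println(geneIdsInCategory(SAMPLE, "transport"));
    }
}
```

In this sketch a click on a legend item would correspond to one call of geneIdsInCategory with the category name; the filter and the DOM builder are kept separate so that either can be replaced by another standards-compliant implementation.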
Bluejay creates the visual model of the data as a collection of main sequences, such as DNA or protein backbones, with respect to which all other visual elements are placed. For example, genome visualization is performed by specifying basepair positions of genes and other features on a target DNA sequence, where positions are intrinsic to the data and remain constant as the user manipulates the display. This allows the Bluejay core to take advantage of the object-oriented capabilities of Java by separating high-level placement onto a sequence from low-level rendering onto a canvas:
– A Java interface called AbstractLinearGraphics, which is a part of the Bluejay package, defines a set of Java methods for the high-level placement onto target sequences. Its functionality on linear targets resembles the behaviour of the usual Java Graphics2D on two-dimensional regions.
– The low-level painting is delegated to core Bluejay classes that represent target sequences. These classes then call an implementation of the usual Java Graphics2D interface to draw onto the canvas.
As the user manipulates the display, the Java objects representing each sequence recompute the canvas coordinates of all elements accordingly. Through its Java interface, Bluejay currently delegates some of its drawing to other existing Java visualization packages. For example, the JFreeChart library of Java classes (http://www.jfree.org) was integrated into Bluejay to create visual summaries of the sequence data, such as pie charts of the functional category distribution. The benefit of using the Java Graphics2D interface is the ability to plug in any of its implementations. An example is the implementation called SVGGraphics2D provided in the Apache Batik browser, which we plugged into Bluejay as a visualization module. Scalable Vector Graphics (SVG) is a textual XML-based format for image description (http://www.w3.org).
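
The separation between placement on a sequence and painting on a canvas can be sketched as follows. This is not the AbstractLinearGraphics interface, whose methods are not listed here; the class LinearTargetSketch, its fields, and its zoom handling are hypothetical and serve only to show how base-pair positions stay fixed while canvas coordinates are recomputed, and how any Graphics2D implementation can be passed in for the actual drawing.

```java
import java.awt.Graphics2D;
import java.awt.geom.Line2D;
import java.awt.image.BufferedImage;

// Hypothetical stand-in for a linear target sequence; not Bluejay code.
class LinearTargetSketch {
    private final long lengthBp;     // sequence length in base pairs
    private final double originX;    // canvas x of position 0
    private final double baselineY;  // canvas y of the sequence backbone
    private final double widthPx;    // backbone width at zoom 1.0
    private double zoom = 1.0;

    LinearTargetSketch(long lengthBp, double originX, double baselineY, double widthPx) {
        if (lengthBp <= 0 || widthPx <= 0) {
            throw new IllegalArgumentException("length and width must be positive");
        }
        this.lengthBp = lengthBp;
        this.originX = originX;
        this.baselineY = baselineY;
        this.widthPx = widthPx;
    }

    /** Changing the zoom alters canvas coordinates, never base-pair positions. */
    public void setZoom(double zoom) {
        if (zoom <= 0) {
            throw new IllegalArgumentException("zoom must be positive");
        }
        this.zoom = zoom;
    }

    /** High-level placement: maps a base-pair position to a canvas x coordinate. */
    public double toCanvasX(long bp) {
        return originX + (bp / (double) lengthBp) * widthPx * zoom;
    }

    /** Low-level painting: draws a feature with whatever Graphics2D is supplied. */
    public void drawFeature(Graphics2D g, long startBp, long endBp) {
        g.draw(new Line2D.Double(toCanvasX(startBp), baselineY,
                                 toCanvasX(endBp), baselineY));
    }

    public static void main(String[] args) {
        LinearTargetSketch seq = new LinearTargetSketch(1000, 10.0, 20.0, 500.0);
        // A BufferedImage canvas is used here; an SVG-generating Graphics2D
        // implementation could be substituted without changing drawFeature.
        BufferedImage canvas = new BufferedImage(600, 40, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = canvas.createGraphics();
        seq.drawFeature(g, 100, 300);
        g.dispose();
        System.out.println(seq.toCanvasX(100) + " to " + seq.toCanvasX(300));
    }
}
```

Because drawFeature depends only on the abstract Graphics2D type, the same placement code can feed a raster canvas, a chart library, or an SVG generator, which is the design benefit the text attributes to the Graphics2D interface.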
In the SVG-enabled Batik browser, the SVGGraphics2D implementation outputs appropriate XML image tags in SVG format (or their internal DOM representations) in response to the standard Java 2D drawing functions called by the higher-level Bluejay drawing modules. We discuss the results of implementing SVG-based visualization in a subsequent section. Recently, a module for gene expression data was added to Bluejay. This was achieved by plugging in the TIGR MultiExperiment Viewer (TIGR MeV) package, which was developed at The Institute for Genomic Research and is written in Java (http://www.tigr.org/software/tm4). TIGR MeV is a versatile toolkit for gene expression data normalization, analysis, and mining. We have also developed a system of mapping between MeV-supported data and the visual model of a genome in Bluejay. The results of this integration effort are discussed in a subsequent section.

Proxy Side Assembly

Much of the proxy functionality is part of the core Bluejay package, written partly in Java and partly in Perl. The reused generic components are indicated below. While Bluejay is designed to visualize XML data, a number of non-XML data formats are frequently used in bioinformatics, such as flat files or HTML pages. The Bluejay proxy converts such data into XML on the fly. To implement the conversion functionality, an existing Java-based IUBio ReadSeq software package (http://iubio.bio.indiana.edu:7131/soft/molbio/readseq/java) was integrated as a component into the proxy. The proxy server performs much of its communication with other entities via the XML-based Simple Object Access Protocol (SOAP) (http://www.w3.org). We have reused the existing Apache Axis software (http://ws.apache.org/axis) to implement this functionality. Axis is a stable and comprehensive project that provides a full SOAP implementation. It is being actively developed and enhanced by the Apache
community to enable Java-based web service implementation. We also integrated the Skaringa package for Java-XML binding (http://www.skaringa.com) into Bluejay. The Skaringa module can serialize Java objects into XML documents and deserialize XML back into Java objects, a feature used in communication between the Bluejay proxy and the Bluejay client. We have implemented a MySQL [Widenius et al., 2002] database with user activity data collected by the proxy. This data is currently used for session management and caching by the proxy. We are also planning to use it to implement personalization. MySQL includes its own module for interfacing with Java programs through the Java Database Connectivity (JDBC) API.

RESULTS

Bluejay offers users a dynamic and highly customizable display of genomic or proteomic data, with sequence landmark sites, annotations, and related information available for visual exploration. Bluejay creates an interactive model for biological sequences that supports several levels of visual manipulation. Sequence-wide operations include splicing, reverse complementing, rotations, and linearizations (for circular microbial DNA). Also available are operations on functional categories and operations on individual elements through direct interaction with the display. Most image elements are linked to other bioinformatics sources, such as MAGPIE gene annotation pages (http://magpie.ucalgary.ca) and BioMOBY services [Wilkinson and Links, 2002]. The general Bluejay interface is shown in Fig. 2. As a result of plugging the Apache Batik SVG visualization module into Bluejay, the visual model in Bluejay is based on SVG (an XML format) and is maintained explicitly in an SVG DOM tree-like structure. Visual SVG elements on the canvas are directly connected to the original data, making the latter responsive to mouse clicks and other GUI events by a user.
This rather complex functionality is performed by the mapping between the Batik browser and the core Bluejay data classes, which was established using Java technology conventions and XML standards. Bluejay can export images in SVG format, either in full document mode or in screenshot mode. This allows the images to retain their full quality regardless of the scale, for example, preserving the finest details in wall-size posters of genomes. Scalable images created by Bluejay can be visualized and further edited using most graphics packages (e.g. CorelDRAW). Other key features of Bluejay include semantic zooming, which shows additional image details as the user zooms in, and several types of customizable image granularity. The zooming range in Bluejay is virtually unlimited. Furthermore, semantic zoom is becoming a required feature for genomic browsers [Loraine and Helt, 2002]. Without semantic zoom, all a user could get, e.g. by zooming in on a DNA image, is an enlarged copy of the same image, which is rather meaningless for visual data exploration. Instead, Bluejay internally translates the new SVG zoom scale into a higher level of detail, which allows the seamless visualization of finer, previously unseen features of the genomic sequence. The sequence view modes range from a pie-chart summary of the genomic features, to several types of sequence visualization, to a color-coded character view of the sequence. The integration of the TIGR MultiExperiment Viewer toolkit for gene expression analysis [Saeed et al., 2003] added several important features for Bluejay users. As a result of the mapping between TIGR MeV data and the Bluejay visual model, the gene expression level is now shown for each gene directly on the image of the DNA. The results of data mining, such as clusters of genes found by K-means clustering, can also be seen directly on the genome image, distinguished by color.
A new feature, which to our knowledge is implemented here for the first time, is the display of a time series of gene regulation directly on the genome, which allows the user to see a “movie” of gene activity through time (Fig. 3).
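
The semantic zoom described in this section, in which the zoom scale is translated into a level of detail, can be sketched in a few lines. The sketch assumes three view modes loosely modelled on the range of views mentioned above; the class SemanticZoomSketch, the enum constants, and both thresholds are hypothetical values chosen for illustration, not the rules used inside Bluejay.

```java
// Illustrative sketch only; thresholds and names are assumptions.
class SemanticZoomSketch {
    /** Hypothetical view modes, from coarsest to finest. */
    enum Detail { SUMMARY, FEATURES, BASES }

    // Assumed thresholds, expressed in base pairs per canvas pixel.
    static final double FEATURE_THRESHOLD = 1000.0;
    static final double BASE_THRESHOLD = 0.1;

    /** Resolution of the display: how many base pairs one pixel covers. */
    static double basePairsPerPixel(long sequenceLengthBp, double canvasWidthPx,
                                    double zoom) {
        if (sequenceLengthBp <= 0 || canvasWidthPx <= 0 || zoom <= 0) {
            throw new IllegalArgumentException("arguments must be positive");
        }
        return sequenceLengthBp / (canvasWidthPx * zoom);
    }

    /** Picks what to draw from the resolution rather than enlarging the image. */
    static Detail detailFor(double bpPerPixel) {
        if (bpPerPixel <= BASE_THRESHOLD) {
            return Detail.BASES;      // room for individual characters
        }
        if (bpPerPixel <= FEATURE_THRESHOLD) {
            return Detail.FEATURES;   // genes and other annotated features
        }
        return Detail.SUMMARY;        // overview only
    }

    public static void main(String[] args) {
        long length = 4_600_000L;     // illustrative sequence length
        for (double zoom : new double[] {1, 10, 100_000}) {
            double resolution = basePairsPerPixel(length, 1000.0, zoom);
            System.out.println("zoom " + zoom + ": " + detailFor(resolution));
        }
    }
}
```

Because the decision depends only on the resolution, the zoom factor itself is unbounded: each further zoom step either keeps the current level of detail or switches to a finer one.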
Fig. 2. Bluejay interface with a complete Escherichia coli K12 genome, XLinks from one of the genes to MAGPIE and BioMOBY, and sample BioMOBY service results.
Given that many biologists have little knowledge of advanced computer technology, ease of use is one of the most important considerations in the design of bioinformatics tools. Bluejay comes with several ease-of-use features. It is written in Java™ and is therefore portable to most existing platforms. It requires no tedious expert installation: a new user can run Bluejay off the web, either as a VeriSign-certified secure applet or as a Java Web Start application. Unlike regular web applets, the secure applet allows saving and uploading data files locally. A downloadable application version is also available (http://bluejay.ucalgary.ca), which can utilize more system resources, such as main memory. Downloading just one JAR archive file is sufficient to run the application locally. This contrasts favourably with many Perl-based bioinformatics software tools, which typically require system-dependent configuration. Moreover, Bluejay supports a number of XML standards and requires no pre-configured database
connection: a data file can be loaded and visualized immediately. Bluejay comes with ready-to-use data, available through bookmarks. Currently the bookmarks include nearly one hundred fully annotated genomes available at MAGPIE (http://magpie.ucalgary.ca). On-line help is available, including a step-by-step mini-tutorial. Bluejay also provides password-enabled session management, whereby users may return to their previous data exploration sessions, possibly from a different computational environment.

DISCUSSION

As the discipline of bioinformatics matures, data integration and tool interoperability are becoming increasingly important in the advancement of research. These efforts are enabled by two related factors: standardization of information exchange and standardization of software interfaces. We demonstrate here that by supporting XML standards and Java interface conventions, we were able to assemble a powerful visualization tool utilizing a number of existing, mutually unrelated components. Moreover, the Bluejay browser is, to our knowledge, unique: no other existing bioinformatics visualization tool has taken advantage of such a wide range of current XML standards, namely those for data, visualization, hyperlinks, and web services. With respect to data exchange, XML support is common. XML has become the standard of choice in many areas, including bioinformatics, due to its versatility, simplicity, highly descriptive nature, and the ubiquity of HTTP protocol support [Achard et al., 2001]. A growing number of biological data providers in diverse areas have been committing to an XML data view, e.g. [Wang et al., 2002; Orchard et al., 2004]. The Interoperable Informatics Infrastructure Consortium (I3C) was established to facilitate the effort (http://www.i3c.org). In the area of bioinformatics web services and middleware, XML-based systems now dominate over alternatives, such as the substantially more complex CORBA or Java RMI middleware architectures.
Examples of XML architectures include the Distributed Annotation System [Dowell et al., 2001] and BioMOBY [Wilkinson and Links, 2002]. The XML-based Web Services Description Language (WSDL), typically used with XML SOAP messaging, is gaining wide acceptance as the bioinformatics middleware standard of choice. However, adoption of standards for bioinformatics visualization has barely begun. One of the reasons is that even in the broader information technology context, visual data exploration is not yet a mature area. Nevertheless, a growing number of graphics tools support Scalable Vector Graphics (SVG), the recent W3C XML standard for visualization (http://www.w3.org/Graphics/SVG/SVG-Implementations). Similar to XML standards for data and more recently for web services, standards for visualization in bioinformatics will likely follow general trends in information technology, making SVG a de facto bioinformatics standard of choice over the next several years. To the best of our knowledge, currently only six SVG-enabled bioinformatics tools have been created besides Bluejay. While none of them matches the power and versatility of Bluejay, it is nevertheless important to discuss their usage of SVG. Four of these tools – ChromoViz [Kim et al., 2004], ChromoWheel [Ekdahl and Sonnhammer, 2004], the Human Protein Reference Database diagram viewer [Peri et al., 2004] and the Trial/DB Pedigree Viewer [Fernando et al., 2001] – are rather simple, with very limited visual functionality. They rely almost entirely on the features of the Adobe SVG Viewer, a generic web plug-in, for direct user interaction with the SVG display. Pathway diagrams from the Human Protein Reference Database also contain simple XLinks, which the SVG Viewer plug-in recognizes. The other two visualization tools offer a substantially more advanced utilization of SVG by embedding web scripts into the SVG markup. BioViz [Lewis et al., 2003] uses the Custom Graphical User Interface
Fig. 3. Gene expression analysis in Bluejay. Up- and down-regulation levels are shown for each gene of Guillardia theta on the inner ring (synthetic data). Also shown is the time series player for a gene activity “movie” (lower left corner).
(CGUI) toolkit, based on JavaScript, to create a multi-window interactive environment within the SVG image. While still quite limited in its functionality, it provides the user with several levels of detail and textual annotations for two genomes: Arabidopsis thaliana and Brassica napus. Its interactive SVG images are created by Perl CGI scripts on the BioViz server side, where the data is hosted. The Microbial Genome Viewer (MGV) [Kerkhoven et al., 2004] is another tool that embeds scripts into its SVG image markup. Its features include more precise navigation over the DNA, textual annotations, links to NCBI pages, and several gene coloring methods, including one based on gene expression levels. MGV contains data for twelve public genomes (compared to over 90 genomes viewable with Bluejay from the MAGPIE site http://magpie.ucalgary.ca/). Both BioViz and MGV are available over the web and rely on a web plug-in, such as the Adobe SVG Viewer. A serious drawback of the SVG-enabled tools listed above, compared to Bluejay, is their very limited
zoom and, more importantly, lack of semantic zoom. While scalability is a defining feature of SVG imagery, reliance on the Adobe SVG Viewer web plug-in limits it to only a short range (a factor of 16 in each zooming direction). By contrast, as discussed above, Bluejay offers a semantic zoom and a virtually unlimited zooming range. MGV also allows the DNA resolution to be set manually through a web form, but unlike the semantic zoom in Bluejay, this MGV feature is non-visual and limited in range (between 5 and 50 kbp per image line). Hyperlinks are a standard feature in web-based resources. With SVG, however, links can be embedded persistently into images, which provides a way of relating various pieces of information via a visual model rather than via a web resource. This may become one of the recognized benefits of using the SVG standard for bioinformatics visualization. SVG data may also contain elements from other XML formats. In Bluejay, interaction with the SVG-enabled canvas is based on Java technology and is coupled with XLink support. Bluejay recognizes not only the commonly used simple hyperlinks in the incoming data but also the more powerful extended XLinks, which may contain multiple links, default links, and third-party links. It also provides several XLink activation regimes. In particular, the use of extended links allows Bluejay to relate visual elements (such as genes on a DNA strand) to several external resources. To the best of our knowledge, Bluejay is currently the only bioinformatics tool that supports the XLink standard to such an extent. We also believe that, just like the usage of SVG, the usage of advanced XLink features in bioinformatics data will grow. Bluejay is not a web form-based browser and therefore does not embed web scripts into the SVG markup. We observe that many established genomic browsers have a web-based graphics interface, usually through dynamically generated web forms and image maps.
Examples include the Ensembl Genome Browser [Birney et al., 2004], the NCBI Map Viewer [Feolo et al., 2000], the TIGR Comprehensive Microbial Resource [Peterson et al., 2001], the UCSC Genome Browser [Karolchik et al., 2003], and GBrowse [Stein et al., 2002]. An advantage of these browsers is their immediate access to the wealth of information held at each facility. However, compared to Bluejay, they have several limitations: less user control over image appearance, the need for a stable internet connection, lack of a downloadable standalone application version, and lack of data and tool integration features. GBrowse is a component-based browser, supports BioMOBY services, and can be installed as a front end to other data sources. However, it only creates static PNG web form images and uses non-generic components developed in-house. Most of the SVG-enabled tools discussed above are also web-based, and hence are subject to similar limitations. In contrast, both the web-based and the downloadable versions of Bluejay maintain an SVG-enabled Java canvas provided by Batik, thereby utilizing the power of Java technology for client-side visual interaction. Also, unlike most existing genomic browsers, Bluejay must explicitly relate its data to other knowledge sources through XLinks. This may be considered a limitation. On the other hand, it provides an explicit standards-based mechanism for integrating a wider range of data and services, including BioMOBY, which makes Bluejay more generic. As the development of XML technology shows, bioinformatics standards generally follow broader trends. By supporting open standards, we were able to take advantage of many cross-disciplinary software modules created outside the bioinformatics field. Component reuse is an established technique in software engineering. It shortens development time, reduces code interdependencies, and encourages developers to optimize component interfaces.
Software components typically take the form of code libraries with a well-defined public interface. Some of the best known examples in bioinformatics are the Bio* libraries: BioJava, BioPerl, BioPython and BioRuby [Mangalam, 2002]. Nevertheless, we have demonstrated that a variety of existing non-bioinformatics software may also be readily reused as components in bioinformatics data visualization. We suggest, parenthetically, that this practice ought to
expand, for two reasons. First, it may allow the bioinformatics community to influence the development of such cross-disciplinary components, minimizing the required in-house alterations during component assembly. Second, reuse of high-quality generic modules, such as the Apache XML tools, will help standardize software interfaces in bioinformatics. It has also been noted that support of open community standards helps prevent the commercial domination of the field through proprietary standards [Chicurel, 2002], an important issue in Genomics and Proteomics.

CONCLUSION

The choice of a programming language is an important factor for software reuse. By supporting Java technology, we were able to assemble Bluejay from a number of portable components, with well-documented Javadoc interfaces and usage conventions, that provided extensive XML support and powerful visualization features. A growing number of XML processing packages that conform to the Java API for XML Processing (JAXP) and W3C XML specifications are being written. Other related developments include the BioJava library, the JavaBeans mechanism and the Java Community Process℠ program. From a user’s perspective, Bluejay offers several visual data exploration features that are novel, in addition to other features that are required of modern genomic browsers. The Bluejay visualization is highly customizable, through several types of granularity and sequence manipulation levels. Every element of its visual model is responsive to GUI events. Most visual elements are linked to other bioinformatics sources, such as BioMOBY web services (http://www.BioMOBY.org/) and MAGPIE annotation pages (http://magpie.ucalgary.ca/). The ultimate goal of the Bluejay project is to provide the bioinformatics community with a versatile and easy-to-use genomic browser that is capable of integrating a large number of data types and links to many external resources.
ACKNOWLEDGEMENTS

We wish to thank Jeroen Keijser for contributions to early versions of Bluejay, and Praveen Agrawal for his contribution to the implementation of session management. This work was supported by Genome Canada through Genome Prairie’s Integrated and Distributed Bioinformatics Platform Project, as well as by The Alberta Science and Research Authority, Western Economic Diversification, The Alberta Network for Proteomics Innovation, and the Canada Foundation for Innovation. Bluejay is available to the general public as a free, open source software project. The latest downloadable versions, source code and other information are available at the main Bluejay website (http://bluejay.ucalgary.ca). At the time of this writing, the next stable version of Bluejay is scheduled for release by January 1, 2005.

REFERENCES

• Achard, F., Vaysseix, G. and Barillot, E. (2001). XML, bioinformatics and data integration. Bioinformatics 17, 115-125.
• Birney, E., Andrews, D., Bevan, P., Caccamo, M., Cameron, G., Chen, Y., Clarke, L., Coates, G., Cox, T., Cuff, J., Curwen, V., Cutts, T., Down, T., Durbin, R., Eyras, E., Fernandez-Suarez, X. M., Gane, P., Gibbins, B., Gilbert, J., Hammond, M., Hotz, H., Iyer, V., Kahari, A., Jekosch, K., Kasprzyk, A., Keefe, D., Keenan, S., Lehvaslaiho, H., McVicker, G., Melsopp, C., Meidl, P., Mongin, E., Pettett, R., Potter, S., Proctor, G., Rae, M., Searle, S., Slater, G., Smedley, D., Smith, J., Spooner, W., Stabenau, A., Stalker, J., Storey, R., Ureta-Vidal, A., Woodwark, C., Clamp, M. and Hubbard, T. (2004). Ensembl 2004. Nucleic Acids Res. 32, D468-D470.
• Chen, I. M., Kosky, A. S., Markowitz, V. M., Szeto, E. and Topaloglou, T. (1998). Advanced query mechanisms for biological databases. Proc. Int. Conf. Intell. Syst. Mol. Biol. 6, 43-51.
• Chicurel, M. (2002). Bioinformatics: bringing it all together. Nature 419, 751, 753, 755.
• Chung, S. Y. and Wong, L. (1999). Kleisli: a new tool for data integration in biology. Trends Biotechnol. 17, 351-355.
• Dowell, R. D., Jokerst, R. M., Day, A., Eddy, S. R. and Stein, L. (2001). The distributed annotation system. BMC Bioinformatics 2, 7.
• Ekdahl, S. and Sonnhammer, E. L. (2004). ChromoWheel: a new spin on eukaryotic chromosome visualization. Bioinformatics 20, 576-577.
• Feolo, M., Helmberg, W., Sherry, S. and Maglott, D. R. (2000). NCBI genetic resources supporting immunogenetic research. Rev. Immunogenet. 2, 461-467.
• Fernando, S. K., Brandt, C. and Nadkarni, P. (2001). Generation of pedigree diagrams for web display using scalable vector graphics from a clinical trials database. Proc. AMIA Symp. 174-178.
• Harris, M. A., Clark, J., Ireland, A., Lomax, J., Ashburner, M., Foulger, R., Eilbeck, K., Lewis, S., Marshall, B., Mungall, C., Richter, J., Rubin, G. M., Blake, J. A., Bult, C., Dolan, M., Drabkin, H., Eppig, J. T., Hill, D. P., Ni, L., Ringwald, M., Balakrishnan, R., Cherry, J. M., Christie, K. R., Costanzo, M. C., Dwight, S. S., Engel, S., Fisk, D. G., Hirschman, J. E., Hong, E. L., Nash, R. S., Sethuraman, A., Theesfeld, C. L., Botstein, D., Dolinski, K., Feierbach, B., Berardini, T., Mundodi, S., Rhee, S. Y., Apweiler, R., Barrell, D., Camon, E., Dimmer, E., Lee, V., Chisholm, R., Gaudet, P., Kibbe, W., Kishore, R., Schwarz, E. M., Sternberg, P., Gwinn, M., Hannick, L., Wortman, J., Berriman, M., Wood, V., de la Cruz, N., Tonellato, P., Jaiswal, P., Seigfried, T. and White, R.; Gene Ontology Consortium (2004). The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 32 Database issue: D258-D261.
• Kaminski, N. and Friedman, N. (2002). Practical approaches to analyzing results of microarray experiments. Am. J. Respir. Cell Mol. Biol. 27, 125-132.
• Karolchik, D., Baertsch, R., Diekhans, M., Furey, T. S., Hinrichs, A., Lu, Y. T., Roskin, K. M., Schwartz, M., Sugnet, C. W., Thomas, D. J., Weber, R. J., Haussler, D. and Kent, W. J. (2003). The UCSC Genome Browser Database. Nucleic Acids Res. 31, 51-54.
• Kerkhoven, R., Van Enckevort, F. H., Boekhorst, J., Molenaar, D. and Siezen, R. J. (2004). Visualization for genomics: the Microbial Genome Viewer. Bioinformatics 20, 1812-1814.
• Kim, J., Chung, H. J., Park, C. H., Park, W. Y. and Kim, J. H. (2004). ChromoViz: visualization of gene expression data onto chromosomes using scalable vector graphics. Bioinformatics 20, 1191-1192.
• Lewis, C. T., Sharpe, A. G., Lydiate, D. J. and Parkin, I. A. P. (2003). The Brassica / Arabidopsis comparative genome browser: a novel approach to genome browsing. J. Plant Technology 5, 197-200.
• Loraine, A. E. and Helt, G. A. (2002). Visualizing the genome: techniques for presenting human genome data and annotations. BMC Bioinformatics 3, 19.
• Mangalam, H. (2002). The Bio* toolkits – a brief overview. Brief. Bioinform. 3, 296-302.
• Orchard, S., Hermjakob, H., Julian, R. K. Jr., Runte, K., Sherman, D., Wojcik, J., Zhu, W. and Apweiler, R. (2004). Common interchange standards for proteomics data: Public availability of tools and schema. Proteomics 4, 490-491.
• Peri, S., Navarro, J. D., Kristiansen, T. Z., Amanchy, R., Surendranath, V., Muthusamy, B., Gandhi, T. K., Chandrika, K. N., Deshpande, N., Suresh, S., Rashmi, B. P., Shanker, K., Padma, N., Niranjan, V., Harsha, H. C., Talreja, N., Vrushabendra, B. M., Ramya, M. A., Yatish, A. J., Joy, M., Shivashankar, H. N., Kavitha, M. P., Menezes, M., Choudhury, D. R., Ghosh, N., Saravana, R., Chandran, S., Mohan, S., Jonnalagadda, C. K., Prasad, C. K., Kumar-Sinha, C., Deshpande, K. S. and Pandey, A. (2004). Human protein reference database as a discovery resource for proteomics. Nucleic Acids Res. Database issue: D497-D501.
• Peterson, J. D., Umayam, L. A., Dickinson, T., Hickey, E. K. and White, O. (2001). The Comprehensive Microbial Resource. Nucleic Acids Res. 29, 123-125.
• Saeed, A. I., Sharov, V., White, J., Li, J., Liang, W., Bhagabati, N., Braisted, J., Klapa, M., Currier, T., Thiagarajan, M., Sturn, A., Snuffin, M., Rezantsev, A., Popov, D., Ryltsov, A., Kostukovich, E., Borisovsky, I., Liu, Z., Vinsavich, A., Trush, V. and Quackenbush, J. (2003). TM4: a free, open-source system for microarray data management and analysis. Biotechniques 34, 374-378.
• Schönbach, C., Kowalski-Saunders, P. and Brusic, V. (2000). Data warehousing in molecular biology. Brief. Bioinform. 2, 190-198.
• St Laurent, S. (1998). XLinkFilter: An Open Source Java XLink SAX Parser Filter. Avail. at http://www.simonstl.com/projects/xlinkfilter.
• Stein, L. (2002). Creating a bioinformatics nation. Nature 417, 119-120.
• Stein, L. D., Mungall, C., Shu, S., Caudy, M., Mangone, M., Day, A., Nickerson, E., Stajich, J. E., Harris, T. W., Arva, A. and Lewis, S. (2002). The generic genome browser: a building block for a model organism system database. Genome Res. 12, 1599-1610.
• Wang, L., Riethoven, J. J. and Robinson, A. (2002). XEMBL: distributing EMBL data in XML format. Bioinformatics 18, 1147-1148.
• Widenius, M., Axmark, D. and MySQL AB (2002). MySQL Reference Manual. O’Reilly & Associates, Sebastopol, CA.
• Wilkinson, M. D. and Links, M. (2002). BioMOBY: an open source biological web services proposal. Brief. Bioinform. 3, 331-341.