ProViz : Protein Interaction Visualization and Exploration Tool

4 downloads 1267 Views 46KB Size Report
sualization tools, ideally integrated within a software platform that is itself ... We present a software tool ... teractions of interest, either through keyword search or.
ProViz : Protein Interaction Visualization and Exploration Tool 

David Auber





, Florian Iragne



, Bertrand Mathieu



, Macha Nikolski



and David Sherman

LaBRI UMR 5800, Universit´e Bordeaux 1, 351 cours de la Lib´eration, 33405 Talence Cedex, France [email protected], [email protected], [email protected], [email protected], [email protected]

Abstract Summary : ProViz is a tool for the visualization of Protein-Protein Interaction Networks that provides facilities to navigate through large graphs, as well as biologically relevant exploration features. Availability : ProViz is a standalone program available under the General Public Licence (GPL) and is freely downloadable at “ftp adress”. Contact : [email protected]

Introduction Analysis of protein-protein interaction (PPI) networks requires a combination of algorithmic and visualization tools, ideally integrated within a software platform that is itself integrated with access to local and distant data banks. We present a software tool called ProViz that provides highly-interactive visualization of large networks of interactions, integrated with data model under development in the IntAct[4] european project. ProViz is similar in purpose to PIMrider from Hybrigenics [5], and Osprey from the GRID project [2].

Overview of ProViz Graph drawing and interactive graph exploration are well-studied and active domains in computer science and many tools are available for this task. Adaptation of these tools and techniques to the specific needs of biologists exploring PPI networks is a current effort in bioinformatics. The challenge is to add valuable information and functions that enables the user to discover interesting biological relations hidden within the data. Intended use Integration of controlled vocabularies and understanding of the way that biologists prefer to work is an essential feature of ProViz. It is useful to exploring large graphs for identifying proteins and interactions of interest, either through keyword search or through analysis of the combinatorial structure of the network; for comparing graphs from different strains or species over orthologous sets of genes; for extracting views and subgraphs for further analysis; and for

clustering related proteins and interactions. ProViz is highly interactive, providing screen updates within 50 ms on standard workstations while manipulating graphs with a million elements. User interface The ProViz screen is intentionally uncluttered (figure 1). The right half of the screen displays the current view of the current graph; the different views available are selected through the use of tabs above the window. Above this window in the tool bar are buttons for changing the layout of the current view. The mouse can be used to select elements or to move elements, and the mouse wheel can be used to zoom in or out and to pan the image. Below are buttons for cloning and for closing the view. Layout algorithms The layout algorithms chosen for ProViz were picked both for their performance and for their capacity to highlight biologically pertinent information. GEM [3] is an efficient version of a directed force-based graph drawing algorithm. It groups related nodes and can be used to quickly identify proteins with a given role, or for visualizing protein complexes. Hierarchical layout [7] tries to reveal the ancestral relationship among nodes clearly and unambiguously. This approach is useful when looking for cascade-type interactions or mapping the interaction networks against metabolic pathway data. Circular layout [6] places all nodes on the periphery of a single circle in a single plane. It does not attribute any semantics to edge relations, and is hence a neutral choice. Integrating biological attributes The left half of the screen is used for displaying information about element attributes. Four tabs are available in this section: Views, for viewing properties about existing views; Node Ontology, for selecting proteins based on GO terms; Edge Ontology, for selecting interactions based on controlled vocabularies; and Properties, for viewing the complete set of properties associated with a node or edge element. In figure 1 we see the property list for the node corresponding to yeast Dss4 (a GDP dissociation factor), including GO evidence, gene names, and external links.

Figure 1. Two views of a PPI graph for yeast, highlighting protein Dss4. On the left is a view obtained by filtering the graph on GO terms, on the right is the same subgraph, with the GEM layout applied. Data base access Database access integration in ProViz is a good example of functionality enrichment provided by the plugins mechanism. Most user programs that manipulate PPI data will prefer to use the PSI molecular-interaction XML format developed by the HUPO PSI working group [8]. So does ProViz. PSI files contain four principal parts: (1) proteins, with descriptive information such as GO terms and crossreferences to external databases; (2) experiments, describing the methods and procedures used to establish interactions between proteins; (3) interactions, describing binary and multi-ary interactions between proteins; and (4) availabilities, describing publication, copyright and other intellectual property issues for the data in the file. Tulip Development Platform ProViz development is based on the Tulip platform [9, 1], designed for management and three-dimensional display of large graphs. It provides a rich set of basic services for operations on graphs: metric computation, node and edge layout, selection, extraction of view and subgraphs, and labeling of nodes and edges with arbitrary sets of attributes. Further operations related to the application domain are provided by means of a software plugins. Any program using Tulip can add to the core features by providing its own domain-specific plugins.

Ackowledgements This work is supported by EU grant number QLRICT-2001-00015 under the RDD programme “Quality of Life and Management of Living Resources.”

References [1] D. Auber. Tulip software home page. http://www.tulipsoftware.org, 2002. [2] B.-J. Breitkreutz, C. Stark, and M. Tyers. The GRID: The general repository for interaction datasets. Genome Biology, 4(3), 2003. [3] A. Frick, A. Ludwig, and H. Mehldan. A fast adaptive layout algorithm for undirected graphs. In SpringerVerlag, editor, Proc. Workshop on Graph Drawing 94, volume LNCS 894, pages 389–403, 1994. [4] H. Hermjakob, L. Montecchi-Palazzi, C. Lewington, D. Wu, M. Vingron, B. Roechert, P. Roepstorff, D. Sherman, A. Valencia, H. Margalit, J. Armstrong, and R. Apweiler. Intact – an open standard and software for protein-protein interaction data. http://www.ebi.ac.uk/intact, 2003. [5] P. Legrain, J. Wojcik, and J.-M. Gauthier. Proteinprotein interaction maps: a lead towards cellular functions. Trends in Genetics, 17, 2001. [6] E. M¨akinen. On circular layouts. International Journal of Computer Mathematics, 24:29–37, 1988. [7] E. Messinger, L. Rowe, and R. Henry. A divide and conquer algorithm for the automatic layout of large directed graphs. IEEE Transactions on Systems, Man and Cybernetics, SMC-21(1):1–11, Jan/Feb 1991. [8] S. Orchard, P. Kersey, W. Zhu, L. Montecchi-Palazzi, H. Hermjakob, and R. Apweiler. Progress in establishing common standards for exchanging proteomics data: The 2nd meeting of the HUPO PSI. Comp. and Fun. Genomics, 4(2):203–206, 2003. [9] M. J. Petra Mutzel, editor. Springer Book on Graph Drawing Software, chapter Tulip - A huge graph visualization software. Mathematics and visualization series. Springer-Verlag, 2003 (to appear).