SimVIZ – A Desktop Virtual Environment for Visualization and Analysis of Protein Multiple Simulation Trajectories

Ricardo M. Czekster and Osmar Norberto de Souza

Laboratório de Bioinformática, Modelagem e Simulação de Biossistemas - LABIO, PPGCC - FACIN, PUCRS, Av. Ipiranga, 6681. Prédio 16 - Sala 106, 90619-900, Porto Alegre, RS, Brasil
[email protected]

Abstract. In silico protein conformation simulation generates massive amounts of data that need to be properly visualized and analyzed. We apply Desktop Information-Rich Virtual Environments (Desktop IRVE's) techniques and concepts to aid the analysis of multiple simulation trajectories, improving the user experience and providing a problem-solving environment that supports the decision-making process. We present SimVIZ, a tool that integrates visualization with simulation analysis, improving previous knowledge about trajectories. The environment shows informative panels, Contact Maps, RMSD charts, the Ramachandran Plot and a Parallel Coordinates multidimensional visualization of the simulation output in a single rendering scene. SimVIZ also opens multiple trajectories along with user-associated information concerning many aspects of the simulation. SimVIZ is an integrated problem-solving environment for multiple trajectories of protein simulations, offering various kinds of analysis and visualization tools used by the community to validate protein structures or to gain a better understanding of the protein folding process.

1 Introduction and Motivation

When performing in silico protein conformation simulation, it is very common to produce several trajectories, each generated by a different set of parameters and often yielding different outputs. This poses an intriguing challenge to visualization system designers: how to convey and extend the available information in a rendering scene in a simple and meaningful manner. Our approach is based on Desktop IRVE's [1], where users associate information with a trajectory, a simulation step or an amino acid. We propose a visualization-analysis environment offering numerous analysis options along with multidimensional visualization techniques for the simulation outputs. The available visualization and analysis tools are mutually exclusive, meaning that specific tasks are done by specific software. Our idea is to build a single environment where this problem is minimized, so users can benefit from the analysis experience and improve their knowledge and decisions.

M. Gavrilova et al. (Eds.): ICCSA 2006, LNCS 3980, pp. 202–211, 2006. © Springer-Verlag Berlin Heidelberg 2006


The environment, which we call SimVIZ (as in Simulation VIsualiZation), is an attempt to integrate information visualization techniques with simulation trajectory analysis tools, aggregating and extending them with user information and insights about the events. A few protein visualization representations were implemented, as well as a protein Contact Map [2], the Ramachandran Plot [3] and the RMSD (Root Mean Square Deviation) Plot [4, 5]. All graphics are drawn inside the scene, as is their associated information (written on transparent panels). This paper is organized as follows: Section 2 discusses theoretical aspects such as visualization, virtual environments, simulation analysis and previous related work. Section 3 lists the modules and features of the environment and presents the achieved results, showing SimVIZ and its visualization/analysis combinations. Section 4 discusses final considerations and future perspectives.

2 Theoretical Aspects

Visualization serves multiple purposes, but the most important one is that it comprises a series of tasks that aid the decision-making process about data [6, 7]. Understanding raw data is often easier when insightful images are used to convey the results [8]. Among its benefits is its effectiveness when representing data using interesting and relevant visual techniques [9].

2.1 Visualization and Desktop IRVE's

Shneiderman [10] introduced the Visual Information-Seeking Mantra: overview first, zoom and filter, details-on-demand. This is the main set of rules that visual system designers should follow in order to build meaningful information visualization software. First, all data is displayed in the scene so users can get an overview of it. Then, through zooming and filtering, users remove irrelevant objects or data from the scene. Lastly, details-on-demand dictates that the system should provide all available details about a selected object.

Another relevant aspect of visualization concerns the dimensional nature of the data, which may be 1-dimensional (linear data), 2-dimensional (maps, sizes, temperatures), 3-dimensional (real world objects), temporal (data varying through time), multi-dimensional (defined by n attributes) or tree-structured (hierarchical data). Multidimensional visualization is a current topic of research, along with how users detect patterns and relations inside the data [10].

One popular multi-dimensional technique is known as Parallel Coordinates [11, 12]. Vertical 1-dimensional axes are drawn in a rendering scene, each representing one attribute of the data. Lines connecting the values on adjacent axes create the final representation. This kind of visualization can be used to detect variability throughout the dataset. Another multidimensional visualization technique, known as Glyphs [13, 14], uses iconic representations of attributes to highlight patterns or anomalies within the data when seen in totality. This approach raises particular design issues, such as positioning and deciding which icons or shapes convey the most meaning for each dimension.

IRVE's are a combination of a Virtual Environment, or VE (a synthetic, spatial world seen from a first-person point of view), and Information Visualization [1, 6]. Their main objective is to enhance virtual worlds with abstract information. Desktop IRVE's are VE's that run on a desktop computer screen and require no special equipment to operate (unlike Virtual Reality, for instance). Bowman et al. [1] defined the theoretical aspects of using IRVE's, such as concerns about abstract information positioning, association and classifications for text layout [1, 15].

2.2 Proteins and Multiple Simulation Trajectories Analysis

Proteins are biological macromolecules formed by amino acids that perform specific functions inside the cell [3]. Proteins are classified through their hierarchical structure: primary structure (the amino acid sequence), secondary structure (α-helix, β-sheet, turns, coils), tertiary structure (combinations of secondary structure elements, forming a three-dimensional structure) and quaternary structure (combinations of tertiary structures) [3]. Simulation methods are used either to refine structures or to study the protein folding process [4]. Simulation using the method of Molecular Dynamics, or MD, computes atomic trajectories by numerically solving Newton's equations of motion [16]. MD simulations are used to evaluate and quantify protein folding as well as to search for energetically favorable structures. MD can predict thermodynamic properties where experimental data do not exist or are hard or even uncertain to obtain. Another practical aspect is related to system equilibrium: selected properties determine whether a simulation is aborted or continued. These properties include energy, temperature, pressure and structural behaviour [4]. In the context of this work, we use the AMBER software [17] for the force field and the software ptraj to process the output of the MD simulations, which serves as input to our environment. ptraj, a command line tool distributed with the AMBER toolkit, is used to extract trajectory information [17]. A simulation comprises different phases, from the setup of initial parameters to output generation and analysis. The purpose is to calculate protein thermodynamic properties and atom positions by extracting, filtering and interpreting the massive output of the simulation trajectory data. A distinct set of software tools is used to help interpretation and analysis, each responsible for specific tasks.
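To make the idea of numerically solving Newton's equations of motion more concrete, the following is a minimal sketch of one widely used integration scheme, velocity Verlet, applied to a single particle on a harmonic spring. It is an illustration only: the toy force, the function names and the parameter values are assumptions introduced here, and the sketch does not claim to reproduce how AMBER integrates a protein system.

```python
def harmonic_force(x, k=1.0):
    """Toy force F = -k * x, a stand-in for a real force field (assumption)."""
    return -k * x


def velocity_verlet(x, v, force, mass=1.0, dt=0.01, n_steps=1000):
    """Integrate m * a = F(x) and return the trajectory.

    The result is a list of (position, velocity) pairs, one per step,
    including the initial state.
    """
    trajectory = [(x, v)]
    a = force(x) / mass
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt  # advance the position
        a_new = force(x) / mass             # acceleration at the new position
        v = v + 0.5 * (a + a_new) * dt      # advance the velocity
        a = a_new
        trajectory.append((x, v))
    return trajectory


def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy of the toy oscillator."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


if __name__ == "__main__":
    traj = velocity_verlet(x=1.0, v=0.0, force=harmonic_force)
    drift = abs(total_energy(*traj[-1]) - total_energy(*traj[0]))
    print(f"steps: {len(traj) - 1}, energy drift: {drift:.2e}")
```

Monitoring a quantity such as the total energy along the trajectory is one simple way to judge whether a run is behaving as expected, in the spirit of the equilibrium checks mentioned above.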
A common set of useful analyses comprises the Ramachandran Plot [3], the Contact Map [2] and the RMSD (Root Mean Square Deviation) Plot [4]. The Ramachandran Plot shows the φ and ψ angles, which represent allowed or prohibited combinations for a conformation. This plot indicates, for all amino acids, which conformations are possible and which are structurally improbable. The Contact Map, on the other hand, is a map of amino acid distances used to discover protein structure formation, indicating the occurrence of contacts among amino acids [2]. Finally, the RMSD Plot shows the distance between the atoms of the simulated conformation and a Reference Structure (RS), a structure used for comparison that is often experimentally determined [4].

2.3 Related Works

This section discusses protein visualization tools, IRVE's and trajectory analysis software. Among the open-source protein visualization tools most commonly used nowadays, we highlight VMD - Visual Molecular Dynamics [18] and PyMOL [19]. VMD [18] is a powerful tool for visualization and analysis of biological systems, such as proteins, nucleic acids and lipids. Its authors used OpenGL for the graphical aspects. Among its main features, we emphasize its scripting capabilities and plugin installation. This software can be used for interactive molecular dynamics studies and trajectory analysis. PyMOL [19] is a graphical presentation system for molecules with a Python command line interpreter, developed mainly for real-time visualization and the generation of high-quality images. Its main features include optimized visualization of three-dimensional structures, eight different molecular representations, molecular superimposition and animation. An example of VE and Information Visualization that predates IRVE's is VDV, the Virtual Data Visualizer [20]. This VE is a toolkit for exploratory data visualization. VDV manipulates multiple data sources, relying on Glyphs to convey data characteristics. This toolkit is a good example of the integration of VE and Information Visualization. Another environment worth noting is PathSim Visualizer [21], designed specifically for the analysis and visualization of pathogen simulation results. Its main objective is to associate relevant abstract information and annotations such as texts, links and audio with a rendering scene, intensifying the user experience of the data. Its major drawback is that it addresses only pathogen simulation problems.

3 The SimVIZ Environment

The goal of SimVIZ is to provide a coherent environment centered on visualization and simulation analysis. To achieve this, we implemented a basic protein viewer with four visualization formats: Lines, Bonds, VDW and CPK. SimVIZ allows changing the heights, widths and definition of lines and cylinders. It is also possible to color by amino acid name, by chain, by structure and by chemical element.

We divided the environment into three phases: a) Data Input and Association, b) Data Processing, Extraction and Mapping and c) Data Presentation. In a), we consider all input from the user, for example a full trajectory and its output, as well as user interaction. The next phase comprises the processing needed to build the amino acid topology and to extract relevant information from the data, such as distances between amino acids and simple counts and statistics. The final phase is responsible for determining which information is shown in the rendering scene and for creating the visual elements, such as charts, plots, abstract textual information and visual protein structure representations.

The SimVIZ data flow is as follows: when a dataset containing a trajectory is opened, the system iterates from the first to the last step, opening the coordinate file (PDB) [22] and the output file (OUT). After the reading process, we build the amino acid topology and run STRIDE [23] to obtain the φ and ψ angles and the secondary structure for each conformation. Then, we calculate simple counts (number of amino acids and atoms) in order to build graphical mappings, which are presented in the rendering scene according to user-defined interface selections. In addition to RS data, users can associate abstract information with the trajectory, with a given simulation step and, within each step, with single amino acids. This information is saved for further analysis. We also allow multiple RMSD files to be mapped to the trajectory. SimVIZ draws a single plot containing all files, assigning a different color to each one.

The SimVIZ implementation is based on Desktop IRVE's concepts and characteristics to present abstract simulation output data and textual information concerning multiple trajectories. We used informative panels, textual output taxonomies and visual choices such as colors, fonts and transparencies, while adding charts, counts, statistics and different analysis plots. As mentioned before, besides the files added to the scene, such as RMSD and RS, users can associate abstract information with three elements: a) the trajectory (world-fixed), b) individual steps (display-fixed) and c) individual amino acids of individual steps (object-fixed) [1]. We also defined regions for textual abstract information containing simulation output data.
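The simple counting step of the data flow described above can be sketched as follows. This is a hedged illustration rather than the SimVIZ code, which is written in C/C++: the function name and the sample data are hypothetical, and the sketch relies only on the fixed-column layout of ATOM records in the PDB format (residue name in columns 18-20, chain identifier in column 22, residue sequence number in columns 23-26, insertion code in column 27).

```python
# Hypothetical three-atom sample in PDB fixed-column format.
SAMPLE_PDB = "\n".join([
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504",
    "ATOM      2  CA  ALA A   1      11.639   6.071  -5.147",
    "ATOM      3  N   GLY A   2      12.000   7.000  -4.000",
    "HETATM    4  O   HOH A   3      13.000   8.000  -3.000",
    "END",
])


def count_residues_and_atoms(pdb_text):
    """Return (number of amino acids, number of atoms) for one conformation.

    Only ATOM records are counted; HETATM, TER, REMARK and other records
    are skipped. A residue is identified by chain, sequence number
    (including insertion code) and residue name.
    """
    n_atoms = 0
    residues = set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        line = line.ljust(27)          # guard against truncated records
        n_atoms += 1
        res_name = line[17:20].strip()  # columns 18-20
        chain_id = line[21]             # column 22
        res_seq = line[22:27].strip()   # columns 23-27
        residues.add((chain_id, res_seq, res_name))
    return len(residues), n_atoms


if __name__ == "__main__":
    n_residues, n_atoms = count_residues_and_atoms(SAMPLE_PDB)
    print(f"amino acids: {n_residues}, atoms: {n_atoms}")
```

In a trajectory setting, a function of this kind would be applied to the coordinate file of every step, and the resulting counts mapped to the charts drawn in the scene.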
The upper region of the rendering scene shows transparent information panels with AMBER's parameters for Resource Use and Control Data. These panels show dynamic information, based on the current simulation step. The lower region shows the outputs of AMBER: the energies, temperature and pressure for a given conformation. The right side of the rendering scene lists all amino acids of the conformation and the secondary structure of each one, using STRIDE's one-letter code. One relevant design aspect concerns the panels, plots and charts: they are all transparent, so users can still see the structural behavior along the simulation trajectory run. In the center, we show the remaining charts, plots, counts and statistics, and we also draw the Ramachandran Plot, the Parallel Coordinates and the Contact Map. Some visual results are shown in Section 3.2.

3.1 Implementation Details and Main Features

SimVIZ is implemented in C/C++ for the core, with FLTK [24] for the Graphical User Interface and OpenGL [25] for graphics. To determine secondary structure formation throughout the trajectory and to obtain the φ and ψ angles, we integrated STRIDE [23] into SimVIZ. To draw the names of atoms and amino acids, we combine pure OpenGL text primitives with PLIB's font library [26]. This library is very useful for rendering diverse font types, colors and sizes. The distinguishing features of the environment are that it: opens multiple simulation trajectories; allows other relevant abstract information to be associated with the trajectory, the conformation and individual amino acids; offers simple but useful representation formats such as Lines, Bonds, VDW and CPK; is integrated with STRIDE; colors by various options; supports scaling, rotating and translating; provides simulation controls (play, goto, stop); uses a Parallel Coordinates multidimensional representation for output data; and offers charts, RMSD plots, the Ramachandran Plot and Contact Maps for all trajectory conformations as well as for the RS (if added by users).

3.2 Experiment and Visual Results

In order to illustrate SimVIZ, we used a dataset containing 100 nanoseconds of an MD simulation trajectory for a small 33 amino acid protein known as Helical Hairpin (or HH). The protein structure with PDB code 1ZDB is used as the Reference Structure (RS) for comparison. This study produced 100 coordinate files in PDB format, one per nanosecond, along with 100 output files in the OUT format containing analysis data for the simulation run.
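As an illustration of how each conformation of such a trajectory can be compared against the Reference Structure, the sketch below computes the RMSD as the square root of the mean squared distance between corresponding atoms. It is a minimal example under stated assumptions: the function names are hypothetical, coordinates are plain (x, y, z) tuples in the same atom order, and no superposition is performed, so the structures are assumed to be already fitted onto each other (in the workflow described in this paper, the RMSD files are produced by ptraj rather than computed this way).

```python
import math


def rmsd(coords, reference):
    """Root Mean Square Deviation between two equally sized coordinate lists.

    Each argument is a sequence of (x, y, z) tuples in the same atom order.
    """
    if len(coords) != len(reference):
        raise ValueError("structures must contain the same number of atoms")
    if not coords:
        raise ValueError("at least one atom is required")
    squared = sum(
        (x - rx) ** 2 + (y - ry) ** 2 + (z - rz) ** 2
        for (x, y, z), (rx, ry, rz) in zip(coords, reference)
    )
    return math.sqrt(squared / len(coords))


def rmsd_series(trajectory, reference):
    """RMSD of every conformation in a trajectory against one reference."""
    return [rmsd(frame, reference) for frame in trajectory]


if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    shifted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]  # every atom moved by 1 along z
    print(rmsd_series([reference, shifted], reference))
```

Plotting the values returned by `rmsd_series` against the simulation step gives one curve of an RMSD plot; several such series can share the same axes, each with its own color.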

Fig. 1. Backbone conformation, amino acids, outputs and Parallel Coordinates


Fig. 2. All-atom conformation, parameters, outputs and the Ramachandran Plot

Fig. 3. RMSD data, outputs and counting


SimVIZ opens user-produced AMBER output files, normally generated using ptraj's options. The more detail these files provide, the more information is presented and the more reliable the inferences that can be drawn about the simulation run. The environment is prepared to open whatever output data is produced, up to the memory limit.

An issue observed while implementing the environment relates to the simultaneous visualization of the massive output data pertaining to each conformation. To solve the resulting occlusion problems, SimVIZ opens multiple trajectories in multiple rendering windows. For example, if the same trajectory is opened twice, multiple analysis tools are presented without occlusion, enhancing visualization and interpretation. Each new rendering window sets its own visual options, so users can view different analysis aspects in different windows, for different trajectories or for the same one. This feature enhances and extends the analysis possibilities, showing multiple kinds of information about the same trajectory.

Figure 1 shows the protein conformation for a given simulation step, with the Resource Use in the upper region, the Output in the lower region, the amino acid list plus the secondary structure on the right and, in the center of the rendering scene, the Parallel Coordinates for 8 dimensions of the output data together with the conformation in the Lines format, for the backbone atoms only. Figure 2 shows the all-atom conformation, colored by the Backbone, with the Ramachandran Plot and simulation parameters. The coloring of the Ramachandran Plot follows the color selected for the conformation.

Fig. 4. Protein conformation, Contact Map, abstract textual information and outputs

Figure 3 shows 4 files generated by ptraj containing RMSD data. The system assigns a different color to each file, with the corresponding legend. The figure also shows the counts of amino acids and atoms along with the simulation output. In this example, the conformation is not drawn in the scene, which clears the rendering window. Figure 4 shows the conformation, the Contact Map, simulation output and related abstract information concerning the volume (on the upper left) and an individual step (on the upper right). The structure is drawn as Lines, colored by chemical element.
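A Contact Map such as the one described for Figure 4 can be derived from pairwise amino acid distances. The following is a minimal sketch under assumptions that are not taken from the paper: each amino acid is represented by a single point (for example its alpha-carbon), and two amino acids are considered in contact when that distance is at most a cutoff of 8.0 angstroms. Both the representative atom and the cutoff value are illustrative choices, and the function name is hypothetical.

```python
import math


def contact_map(residue_coords, cutoff=8.0):
    """Return a symmetric boolean matrix of residue-residue contacts.

    residue_coords: one (x, y, z) tuple per amino acid (assumed here to be
    a representative atom such as the alpha-carbon).
    cutoff: distance threshold in angstroms (an assumed value).
    """
    n = len(residue_coords)
    contacts = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            distance = math.dist(residue_coords[i], residue_coords[j])
            # The map is symmetric, so both entries are filled at once.
            contacts[i][j] = contacts[j][i] = distance <= cutoff
    return contacts


if __name__ == "__main__":
    coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (20.0, 0.0, 0.0)]
    for row in contact_map(coords):
        print("".join("#" if c else "." for c in row))
```

Computing one such matrix per simulation step makes it possible to follow how contacts form and break along the trajectory.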

4 Conclusions, Perspectives and Acknowledgements

There is a need in the structural biology community for high-quality, insightful visualization software. Visualization offers meaningful techniques to help researchers perform better analysis through visual inspection. At the same time, simulation quality becomes a concern when multiple trajectories are produced. This work addresses this aspect, presenting a graphical platform where users can learn more about their simulations. We developed a simulation annotation environment that combines visualization and analysis features. This work focused on the visualization and analysis of related abstract information to improve previous knowledge about trajectories. Our next tasks include adding more classic representations such as Ribbons and developing other user interaction techniques such as Semantic Zooming and 3D User Interfaces. We also intend to investigate the integration of other multidimensional visualization techniques such as Glyphs. We thank FAPERGS and CAPES for financial support, and Dr. Marcio Pinho and Regis Kopper for helpful discussions.

References

1. Bowman, D. A., North, C., Chen, J., Polys, N. F., Pyla, P. S., Yilmaz, U.: Information-rich virtual environments: theory, tools, and research agenda. VRST '03: Proceedings of the ACM symposium on Virtual reality software and technology, 2003, pp. 81-90.
2. Vendruscolo, M., Domany, E.: Efficient dynamics in the space of contact maps. Folding and Design, vol. 3(5), 329–336, 1998.
3. Branden, C., Tooze, J.: Introduction to protein structure. Garland, 1999.
4. Leach, A. R.: Molecular Modeling: Principles and Applications. Pearson Education, 2001.
5. van Gunsteren, W. F., Mark, A. E.: Validation of molecular dynamics simulation. J. Chem. Phys., v. 108, p. 6109–6116, 1998.
6. Ware, C.: Information Visualization: Perception for Design. Morgan Kaufmann, 2000.
7. Ma, Kwan-Liu: Visualization - A Quickly Emerging Field. ACM SIGGRAPH Computer Graphics Quarterly, 2004, vol. 38, number 1, pp. 4-7.
8. Vailaya, A., Bluvas, P., Kincaid, R., Kuchinsky, A., Creech, M., Adler, A.: An Architecture for Biological Information Extraction and Representation. Bioinformatics, 2005, vol. 21, pp. 430-438.
9. de Oliveira, M. C. F., Levkowitz, H.: From visual data exploration to visual data mining: a survey. IEEE Transactions on Visualization and Computer Graphics, vol. 9(3), 378–394, 2003.
10. Shneiderman, B.: The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations. VL '96: Proceedings of the 1996 IEEE Symposium on Visual Languages, 1996, pp. 336-343.
11. Inselberg, A., Dimsdale, B.: Parallel Coordinates: A Tool for Visualizing Multidimensional Geometry. Proceedings of IEEE Visualization, 1990, pp. 361-375.
12. Fua, Y.-H., Ward, M. O., Rundensteiner, E. A.: Hierarchical Parallel Coordinates for Exploration of Large Datasets. VISUALIZATION '99: Proceedings of the 10th IEEE Visualization 1999 Conference (VIS '99), 1999.
13. Ribarsky, W., Ayers, E., Eble, J., Mukherjea, S.: Glyphmaker: Creating Customized Visualizations of Complex Data. Computer, 1994, vol. 27, number 7, pp. 57-64.
14. Ward, M. O.: A taxonomy of glyph placement strategies for multidimensional data visualization. Information Visualization, 2002, vol. 1, number 3, pp. 194-210.
15. Polys, N. F., Bowman, D. A.: Design and display of enhancing information in desktop information-rich virtual environments: challenges and techniques. Virtual Reality, 2005, vol. 8, pp. 41-54.
16. van Gunsteren, W. F., Berendsen, H. J. C.: Computer Simulation of Molecular Dynamics: Methodology, Applications and Perspectives in Chemistry. Angewandte Chemie International Edition in English, v. 29, p. 992–1023, 1990.
17. Pearlman, D. A., Case, D. A., Caldwell, J. W., Ross, W. S., Cheatham III, T. E., DeBolt, S., Ferguson, D., Seibel, G., Kollman, P.: AMBER, a package of computer programs for applying molecular mechanics, normal mode analysis, molecular dynamics and free energy calculations to simulate the structural and energetic properties of molecules. vol. 91, 1–41, 1995.
18. Humphrey, W., Dalke, A., Schulten, K.: VMD - Visual Molecular Dynamics. J. Molec. Graphics, 1996, vol. 14, pp. 33-38.
19. DeLano, W. L.: The PyMOL Molecular Graphics System. DeLano Scientific, San Carlos, CA, USA. 2002.
20. van Teylingen, R., Ribarsky, W., van der Mast, C.: Virtual Data Visualizer. IEEE Transactions on Visualization and Computer Graphics, 1997, pp. 65-74.
21. Polys, N. F., Bowman, D. A., North, C., Laubenbacher, R., Duca, K.: PathSim visualizer: an Information-Rich Virtual Environment framework for systems biology. Web3D '04: Proceedings of the ninth international conference on 3D Web technology, 2004, pp. 7-14.
22. Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N., Bourne, P. E.: The Protein Data Bank. Nucleic Acids Research, 2000, vol. 28, pp. 235-242.
23. Frishman, D., Argos, P.: Knowledge-based protein secondary structure assignment. Proteins, vol. 23(4), 566–579, 1995.
24. Fast Light Toolkit (FLTK). Accessed on November, 2005. Available on http://www.fltk.org/
25. OpenGL - The Industry Standard for High Performance Graphics. Accessed on November, 2005. Available on http://www.opengl.org/
26. A Font Library for OpenGL. Accessed on November, 2005. Available on http://plib.sourceforge.net/fnt/index.html
