Indented Pixel Tree Browser for Exploring Huge Hierarchies

Michael Burch, Hansjörg Schmauder, Daniel Weiskopf

VISUS, University of Stuttgart

Abstract. In this paper we introduce the Indented Pixel Tree Browser, an interactive tool for exploring, annotating, and comparing huge hierarchical structures on different levels of granularity. We exploit the indented visual metaphor to map tree structures to one-dimensional zigzag curves, primarily to achieve an overview representation of the entire hierarchy. We focus on space-efficiency and on simultaneously uncovering tree-specific phenomena. Each displayed plot can be filtered for substructures, which are then mapped to a larger space and hence reveal fine-granular substructures that are hidden in the compressed overview. By representing tree structures side-by-side, the viewer can easily compare them visually and detect similar patterns as well as anomalies. In our approach, we follow the information-seeking mantra: overview first, zoom and filter, then details-on-demand. Further interactive features such as expanding and collapsing nodes, applying different color codings, or distorting the tree horizontally as well as vertically support the viewer when exploring huge hierarchical data sets. The usefulness of our interactive browsing tool is demonstrated in a case study on the NCBI taxonomy, which contains 324,276 hierarchically organized species and organisms.

1 Introduction

Data sets containing hierarchically organized elements exist in a variety of forms. Today’s software systems are huge and may consist of many thousands of files in a hierarchically structured file system. Such a hierarchy may contain millions of elements if we also take into account all the lines of code on the source code level or the abstract syntax tree derived from them. Another interesting application where hierarchical data occurs is phylogenetic analysis, which is applied to compute an evolutionary tree of life. The resulting hierarchical data sets are often very large, and exploring the textual data needs to be supported by interactive visualizations to uncover interesting insights in the data. The most convenient visual metaphor for depicting parent-child relationships is the node-link diagram as used by Reingold and Tilford [1]. The objects of a hierarchy are mapped to circular shapes, and direct links between two circles express some kind of subordination, see Figure 1 (a). By exploiting the Gestalt law of good continuation [2], the tree structure can be interpreted effectively and efficiently via the link information. Several node-link tree layouts have been developed that focus on making the tree structure clearer and on showing symmetries. The major drawback of node-link diagrams is that there are many empty spaces between the graphical primitives, which leads to scalability problems for large
trees. Icicle plots as used by Kruskal and Landwehr [3] have been developed as a more space-efficient representation that gives a clear impression of the tree structure, but too much space is used to visually map inner nodes, see Figure 1 (b). Treemaps developed by Shneiderman [4] are the most space-efficient diagram type: they leave no information gaps on screen and do not waste visual space when mapping hierarchical elements to the display, see Figure 1 (c). The major drawback of treemap representations is that the tree structure, and hence the parent-child relationships, is error-prone to interpret for huge hierarchies.

Fig. 1. An example hierarchy with eleven vertices shown in four possible visual tree metaphors: (a) Node-link diagram. (b) Layered icicle plot. (c) Treemap. (d) Indented Plot.

We build the tree browser proposed in this work on the indented tree plot visual metaphor as used by Burch et al. [5] which has the deciding advantage that huge trees can be displayed in a single static diagram and the tree structure remains visible even

Lecture Notes in Computer Science

for deep and huge trees, see Figure 1 (d). Furthermore, the indented plots can be represented in a compressed view with the benefit that the tree structure still remains visible. By using one-dimensional zigzag curves for displaying tree diagrams as in the indented plots, we can easily provide interactive features to manipulate and navigate in the original entire plot that serves as an overview representation. The user of the Indented Pixel Tree Browser can easily explore subtrees by selecting subregions in an indented plot. The selected part is mapped to a larger display space below the plot where the selection was applied and hence, more fine-granular substructures of the hierarchy become visible. The selection of subregions can be repeated until the deepest level of a tree is reached. The detail view can be used to explore the labelling information and to make annotations based on it. Furthermore, several selected subregions of the hierarchical data set can be displayed in a side-by-side view and can easily be compared to each other with the goal of detecting similar patterns or anomalies. To illustrate the usefulness of our Indented Pixel Tree Browser we applied it to the NCBI taxonomy [6] containing 324,276 species and organisms. In Section 4 we demonstrate how to use the tool to explore this huge hierarchical data set and report some insights that we gained by using our browsing tool. The interactive features of the tool are explained in Section 3.3 and illustrated in more detail in the case study on the real-world biological data set.

2 Related Work

The visualization and exploration of hierarchical structures has long been a focus of research [7, 8] and remains one today [9, 10]. The most convenient type of diagram for depicting parent-child relationships is obtained by using the node-link visual metaphor as used by Reingold and Tilford [1]. Showing connectedness for parent-child relationships as explicit links makes use of the law of good continuation of the Gestalt principles [2]. To obtain even more efficient diagrams, many variants of the node-link visual metaphor exist apart from the classical approach that places the root node on top of the display space and the child nodes on horizontal layers depending on their depth in the tree. Radial diagrams [11–13] and bubble or balloon layouts [12, 14] have been developed to use the available space more efficiently. As a drawback, a comparison of elements on the same depth in the hierarchy is difficult because of spatial distortions and differently oriented subtrees. Bubble trees suffer from the fact that the representative circles on which subtrees are visually encoded become very small even for subtrees with a low depth. Another strategy for displaying node-link diagrams is to draw the links orthogonally, which means allowing only ninety-degree bends and leads to many parallel lines for huge hierarchies. Node-link tree visualization techniques and more general node-link graph visualizations are typically designed to follow aesthetic drawing rules as presented by Purchase [15], such as avoiding link crossings or preserving symmetries if those exist. A more space-filling approach to depicting hierarchies is the layered icicle plot proposed by Kruskal and Landwehr [3], which uses stacked rectangles where representatives for inner nodes use as much horizontal space as the sum of the spaces used by all their child elements. The hierarchical structure is clearly discernible but the drawback is the wasted
space for the inner nodes and the fact that borders between inner nodes cannot be uncovered when trees have a high branching factor. Radial variants have been developed to achieve more aesthetically pleasing diagrams and to better exploit the given display space for deeper subtrees. Existing tools that focus on radial layered icicles are the Information Slices by Andrews and Heidegger [16], Sunburst by Stasko and Zhang [17], and InterRing by Yang et al. [18]. A compact and space-filling visualization technique is obtained by using nested rectangular shapes such as the treemap representations introduced by Shneiderman [4]. Here, too, many variants exist that focus on a better interpretability of the tree structure, such as squarified treemaps by Bruls et al. [19], cushion treemaps by van Wijk and van de Wetering [20], or Voronoi treemaps by Balzer et al. [21]. The problem with treemap representations is that they need the full screen to scale to huge trees and hence, a side-by-side representation of a variety of tree structures with different granularities is difficult. Furthermore, compared to Indented Pixel Tree Plots, the run time complexity of the layout and rendering algorithm is much higher in most cases, especially for Voronoi treemaps. In this work we focus on a space-efficient representation of hierarchies where the structures and substructures are still interpretable and a side-by-side representation is possible. Hence, we base our hierarchy browsing tool on Indented Pixel Tree Plots proposed and evaluated by Burch et al. [5]. These can be scaled down to very small regions on a display with the benefit that the structure of the hierarchy still remains visible and as little 'ink' as possible is used to draw such a plot. Furthermore, little space is used in the vertical direction since the hierarchical data is mapped to zigzag curves.
The remaining space can be used to display additional information such as detail views of several subregions of more coarse-granular structures. Our browsing tool is based on the visual information-seeking mantra: overview first, zoom and filter, then details-on-demand [22]. The Information Slices by Andrews and Heidegger [16] exploit the radial layered icicle metaphor and use a concept similar to that of the Indented Pixel Tree Browser for representing more fine-granular hierarchical substructures. Regions on semi-circular discs can be expanded to obtain a more detailed view of a subhierarchy. However, their approach does not allow a side-by-side comparison of many subhierarchies on several granularity layers. The hyperbolic tree browser [23] is based on radial node-link diagrams and supports several interactive features to explore hierarchical data. As a consequence of using node-link diagrams, the technique only scales to a few thousand nodes. The TreeJuxtaposer [24] allows good structural comparisons of large trees, but because it uses orthogonal node-link representations, a good overview of huge and deep hierarchies in a single static view is difficult to obtain. Annotations and highlighting are used in that approach to provide contextual information. The InterRing browser by Yang et al. [18] exploits the radial layered icicle visual metaphor, but it is difficult to browse into very deep and huge trees with the additional goal of having a side-by-side view of many subhierarchies at the same time. To the best of our knowledge, there is no hierarchy browsing tool that is able to show huge trees completely in a single static view and simultaneously allows for side-by-side comparisons of the substructures supported by interactive features.

3 Indented Pixel Tree Browser

We introduce the Indented Pixel Tree Browser for exploring huge hierarchical structures. The visualization technique is based on the indented visual metaphor omnipresent in graphical file browsers and pretty printing of source code. The approach benefits from space-efficiency that leaves enough space for comparisons of side-by-side and more fine-granular representations of subregions of the original entire data set. Apart from the entire overview plot many interactive features are supported for manipulating, exploring, and comparing tree structures and substructures on different levels of granularity.

Fig. 2. A hierarchy with 54 vertices represented as: (a) Node-link diagram. (b) Indented tree plot without color coding. (c) Indented tree plot with black to red color coding depending on the depth in the tree. Alternating horizontal gray and white bars indicate hierarchy levels.

3.1 Indented Pixel Tree Plot

We model a hierarchy in the graph-theoretic sense as a tree T = (V, E) where V denotes the set of vertices and E ⊂ V × V denotes the set of directed edges that express parent-child relationships directed from the root to the leaves of the tree. One vertex is designated as the root vertex. Edges are only shown implicitly in an indented tree plot, in contrast to node-link diagrams where edges are explicitly drawn as direct links. The root vertex and all inner vertices are mapped to vertically aligned lines, whereas leaf vertices are mapped to single horizontal lines. This asymmetric handling of inner and leaf vertices leads to a better separation of both types of visual hierarchy elements. Parent-child relationships are expressed by indentation of the corresponding geometric shapes with respect to the hierarchy levels of the respective parent and child vertices. Figures 1 (a) and (d) and Figures 2 (a)-(c) illustrate how hierarchical data sets are visually mapped to node-link diagrams and also to indented tree plots without (Figure 2 (b)) and with (Figure 2 (c)) color coding depending on the depth in the tree.

3.2 Selection of Subregions

To tap the full potential of the space-efficient indented tree plots we support interactive features such as selecting subregions and displaying them in a side-by-side representation on different layers starting with the overview layer on top of the display and ending at the detail layer closest to the bottom of the display.
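As a point of reference for the selection mechanism, the indented mapping and the selection of a subregion can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of the browser: it assumes that every vertex occupies one column in depth-first pre-order and that its depth determines the vertical indentation, and the names Node, indented_layout, render_ascii, and select_region are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def indented_layout(root):
    """Return one (column, depth, is_leaf, label) tuple per vertex.

    Assumption: vertices are placed in depth-first pre-order, one column
    each, and the depth gives the amount of vertical indentation.
    """
    rows = []
    stack = [(root, 0)]  # explicit stack avoids recursion limits on deep trees
    while stack:
        node, depth = stack.pop()
        rows.append((len(rows), depth, not node.children, node.label))
        # Push children in reverse so the leftmost child is visited first.
        for child in reversed(node.children):
            stack.append((child, depth + 1))
    return rows


def render_ascii(rows):
    """Draw inner vertices as '|' and leaves as '-' on a depth-by-column grid."""
    max_depth = max(depth for _, depth, _, _ in rows)
    grid = [[" "] * len(rows) for _ in range(max_depth + 1)]
    for column, depth, is_leaf, _ in rows:
        grid[depth][column] = "-" if is_leaf else "|"
    return "\n".join("".join(line).rstrip() for line in grid)


def select_region(rows, first_column, last_column):
    """Return the vertices whose columns fall inside the selected interval."""
    return [row for row in rows if first_column <= row[0] <= last_column]


if __name__ == "__main__":
    tree = Node("root", [Node("a", [Node("b"), Node("c")]), Node("d")])
    layout = indented_layout(tree)
    print(render_ascii(layout))
    print([label for _, _, _, label in select_region(layout, 1, 3)])
```

Because the layout is a flat sequence of columns, a selected subregion is simply a column interval, which is what makes it cheap to redraw the selection in a larger space on the next layer.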

Fig. 3. A certain number of regions can be selected in an Indented Pixel Tree Plot and subregions can be selected again from it that are displayed in a side-by-side layered representation: (a) Selection of one region. (b) Selection of several regions on the first layer. (c) Selections on already selected regions are applied and displayed on several layers. (d) Side-by-side representation of selected regions on several layers.

Figures 3 (a)-(d) show how regions and subregions can be selected in an Indented Pixel Tree Plot and how layering and side-by-side views are achieved. In Figure 3 (a) we can see how one single subregion is selected from an Indented Pixel Tree Plot and how it is displayed on the layer below in a larger space, which allows exploring more fine-granular substructures of the tree. In Figure 3 (b) several subregions are selected on the same tree layer and displayed on the layer below in a side-by-side view. Figure 3 (c) shows how subregions and subsubregions can be selected and displayed on several layers until a certain granularity of the tree structure is reached. Figure 3 (d) shows a side-by-side representation of selected subregions and subsubregions on many layers. All views in a side-by-side representation are visually connected by linking and brushing features. Selecting one subregion in one plot highlights all occurrences of common elements in all other displayed plots in all side-by-side views. This mechanism helps to preserve the viewer’s mental map and to set the selected region in context to others.

3.3 Interactive Features

Our browsing tool supports a variety of interactive features to manipulate the hierarchical data and to explore, compare, and annotate it on different levels of granularity.

– Region selection: A region on each indented plot can be selected by mouse drag-and-drop. The selected subregion is displayed as another indented plot right below on the next layer in a larger display space.
– Marking and annotation: Relevant nodes can be annotated and marked as interesting elements. Each of them is highlighted in every currently displayed indented plot whose subregion contains that node.
– Hierarchy expanding/collapsing: Clicking on a graphical element that represents a node collapses the subtree rooted at that node. If the subtree is already collapsed, it is expanded again.
– Vertical and horizontal distortion: To use the indented plots efficiently, vertical and horizontal distortions are allowed for each plot independently.
– Color coding: A variety of predefined color scales can be applied to each Indented Pixel Tree Plot separately, and users can also create their own color scale.
– Text pattern search: By typing in a text fragment, all hierarchical elements whose label contains this text fragment are highlighted.
– Geometric zoom/Lens function: A lens function can be used to geometrically zoom into a region on screen.
– Details-on-demand: By moving the mouse cursor over a graphical primitive on screen, the corresponding detail information of that element is displayed as a tooltip at the current mouse cursor position.
– Data and PNG export: Selected subregions of an indented plot can be stored in the predefined Newick data format, which allows additional edge length information or arbitrary lists of attributes attached to the nodes and edges. Furthermore, selected views can be exported as a PNG image.
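Three of the features listed above (text pattern search, expanding/collapsing, and Newick export) operate directly on the tree data and can be sketched as follows. The sketch is an assumption-laden illustration rather than the tool's code: the search is interpreted as a case-insensitive substring match, only labels (no edge lengths or attributes) are written to Newick, and all function names are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    collapsed: bool = False


def search(root, fragment):
    """Return the labels (in pre-order) that contain the text fragment."""
    needle = fragment.lower()
    hits, stack = [], [root]
    while stack:
        node = stack.pop()
        if needle in node.label.lower():
            hits.append(node.label)
        stack.extend(reversed(node.children))
    return hits


def toggle(node):
    """Collapse an expanded subtree, or expand a collapsed one."""
    node.collapsed = not node.collapsed


def visible_labels(root):
    """Return the labels that remain visible; collapsed subtrees are skipped."""
    out, stack = [], [root]
    while stack:
        node = stack.pop()
        out.append(node.label)
        if not node.collapsed:
            stack.extend(reversed(node.children))
    return out


def _quote(label):
    # Newick labels containing blanks or punctuation are wrapped in single
    # quotes, with embedded single quotes doubled.
    if re.search(r"[\s()\[\]':;,]", label):
        return "'" + label.replace("'", "''") + "'"
    return label


def to_newick(root):
    """Serialize the subtree rooted at `root` as a Newick string (labels only)."""
    def walk(node):
        if node.children:
            inner = ",".join(walk(child) for child in node.children)
            return "(" + inner + ")" + _quote(node.label)
        return _quote(node.label)
    return walk(root) + ";"


if __name__ == "__main__":
    cellular = Node("Cellular organisms", [Node("Archaea"), Node("Bacteria")])
    tree = Node("root", [Node("Viruses"), cellular])
    print(search(tree, "bact"))
    print(to_newick(tree))
    toggle(cellular)
    print(visible_labels(tree))
```

Keeping the collapsed state on the node itself means that every plot showing that node can honor the same flag, which fits the linked side-by-side views described in Section 3.2.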

4 Case Study

To demonstrate the usefulness of our Indented Pixel Tree Browser we apply it to the NCBI taxonomy that is a hierarchical organization of species and organisms [6]. First of all, the entire tree structure is displayed in a single static diagram as a starting point for tree exploration.

Fig. 4. An Indented Pixel Tree Plot serves as an overview representation of the entire provided hierarchical data set. This plot is used as the starting point for the exploration and as a contextual view during the exploration process.

4.1 NCBI Taxonomy

The NCBI taxonomy [6] contains 324,276 hierarchically organized species and organisms. Exploring such a huge hierarchy is a challenging task since an overview of the entire tree in a single static view is difficult to provide. Figure 4 shows an overview representation of the entire data set. Vertical indentation and depth-dependent color coding give a quick impression of the structure of the hierarchy and of where the tree branches into deep substructures. Our first observation from the overview plot in Figure 4 is that the entire tree is structured asymmetrically: it is unbalanced and consists of several subtrees on the first hierarchy level that do not show similar hierarchy patterns. For instance, many subregions close to the center of the plot branch very deep, down to depth 42. The leftmost and rightmost substructures behave differently; their maximal depth is 10 to 15.

The static plot already provides many insights, but to gain more we can interactively explore different subregions of the tree in side-by-side and layered representations. This is illustrated in Figure 5, which shows a snapshot of the graphical user interface of our Indented Pixel Tree Browser. The center view is used to browse the hierarchical structure. By using drag-and-drop operations the user can select subregions in each currently displayed plot; subregions can also be parts of other already selected subregions in the same plot, or they can totally enclose them. Each plot can be distorted vertically as well as horizontally, and color codings can be adjusted independently for both the Indented Pixel Tree Plot and the guiding patterns. Tree nodes can be highlighted as interesting elements and are consequently color coded in yellow (or any other user-defined color) in all of the displayed plots in which they occur. Already selected subregions can be shifted by using the drag-and-drop operation on the corresponding guiding pattern. All layers below the shifted one are also shifted and adapted. The leftmost panel of the GUI provides a variety of parameters that can be used to modify the appearance of the plot, to load hierarchical data from file, or to store selected regions in the predefined Newick format or as a PNG image. The rightmost panel shows details-on-demand for selected graphical elements. The text fields in the lower part are used to apply textual as well as hierarchy-specific filters.

Fig. 5. The graphical user interface of the Indented Pixel Tree Browser provides a variety of interactive features to manipulate the hierarchical data, to annotate and highlight subregions, to compare subtrees, and to navigate in it.

Fig. 6. Collapsing all subhierarchies belonging to levels deeper than 2 in the entire hierarchy and showing the label details uncovers insights about the hierarchy structure right below the root.

By filtering the tree structure for depth 1 and applying the details-on-demand function we can find out that there are five child nodes named Unclassified sequences, Viroids, Viruses, Other Sequences, and Cellular organisms, see Figure 6. The Cellular organisms branch splits again into three subhierarchies, namely Archaea containing 3,312 nodes, Eukaryota containing 232,984 nodes, and Bacteria containing 79,613 nodes, the three main domains of life.
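The two operations used in this step, restricting the tree to a maximal depth and counting the nodes of each subhierarchy, can be sketched as below. This is a minimal sketch under stated assumptions: the hierarchy is assumed to be available as a child-to-parent mapping (a hypothetical input format), the function names are made up for illustration, and the tiny example hierarchy is a toy that does not reproduce the real NCBI data or its node counts.

```python
from collections import defaultdict


def build_children(parent_of):
    """Turn a child -> parent mapping into (root, parent -> children)."""
    children = defaultdict(list)
    root = None
    for node, parent in parent_of.items():
        if parent is None:
            root = node
        else:
            children[parent].append(node)
    return root, children


def subtree_sizes(root, children):
    """Count the vertices of every subtree without recursion."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))
    size = {}
    # In reversed visiting order every child is handled before its parent.
    for node in reversed(order):
        size[node] = 1 + sum(size[c] for c in children.get(node, []))
    return size


def nodes_up_to_depth(root, children, max_depth):
    """Return the vertices (in pre-order) whose depth is at most max_depth."""
    kept, stack = [], [(root, 0)]
    while stack:
        node, depth = stack.pop()
        kept.append(node)
        if depth < max_depth:
            for child in reversed(children.get(node, [])):
                stack.append((child, depth + 1))
    return kept


if __name__ == "__main__":
    # Toy hierarchy for illustration only.
    parent_of = {
        "root": None,
        "Viruses": "root",
        "Cellular organisms": "root",
        "Archaea": "Cellular organisms",
        "Bacteria": "Cellular organisms",
        "Eukaryota": "Cellular organisms",
        "Fungi": "Eukaryota",
    }
    root, children = build_children(parent_of)
    print(nodes_up_to_depth(root, children, 1))
    print(subtree_sizes(root, children)["Cellular organisms"])
```

Both passes are linear in the number of vertices, so they remain practical for a hierarchy with several hundred thousand nodes.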

Fig. 7. Two selected subregions in the entire plot are mapped to a larger display space on the layer below and make more fine-granular substructures visible. Repetitions of this selection process finally lead to side-by-side detail views and additional labeling information.

Figure 7 shows how the entire plot can be used to browse the tree and gain interesting insights into deeper hierarchical substructures. First of all, we are interested in the deepest substructure of the hierarchy and select this subregion. Further selections lead the viewer to the detail view, where the labeling information of some species located in this subhierarchy can be found. On the deepest level in this subhierarchy we find Cyprinodon, a genus of small killifish belonging to the family Cyprinodontidae of ray-finned fish. Using the functionality of the browsing tool we can find out that it belongs to the Eukaryota subtree at the first hierarchy level. Fishes are located in the deepest levels of the phylogenetic tree, and their biodiversity is the reason for this visual phenomenon in the left part of the Indented Pixel Tree Plot in Figure 7. As a second example, we are interested in the long horizontal band on the right-hand side of the plot that catches the eye. Using the detail view again, we find out that it belongs to Unclassified Bacteria Miscellaneous. Tracking the path to the root leads to the Bacteria main branch. There are many more insights that one might find in this data set by using the Indented Pixel Tree Browser, but we chose just some of them as illustrative examples.
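Finding the deepest element and tracking its path to the root, as done in the two examples above, can be sketched on a child-to-parent mapping. The sketch assumes that representation (a hypothetical input format), uses made-up function names, and runs on a toy hierarchy rather than on the real taxonomy.

```python
def depth_of_all(parent_of):
    """Compute the depth of every vertex from a child -> parent mapping."""
    depth = {}
    for start in parent_of:
        chain, node = [], start
        # Walk upwards until the root or a vertex with known depth is reached.
        while node is not None and node not in depth:
            chain.append(node)
            node = parent_of[node]
        base = depth[node] if node is not None else -1
        for offset, vertex in enumerate(reversed(chain), start=1):
            depth[vertex] = base + offset
    return depth


def path_to_root(parent_of, node):
    """Return the vertices from `node` up to and including the root."""
    path = []
    while node is not None:
        path.append(node)
        node = parent_of[node]
    return path


def deepest_vertex(parent_of):
    """Return (vertex, depth) for one vertex of maximal depth."""
    depth = depth_of_all(parent_of)
    vertex = max(depth, key=depth.get)
    return vertex, depth[vertex]


if __name__ == "__main__":
    # Toy hierarchy for illustration only.
    parent_of = {
        "root": None,
        "Cellular organisms": "root",
        "Bacteria": "Cellular organisms",
        "Eukaryota": "Cellular organisms",
        "Fungi": "Eukaryota",
    }
    vertex, depth = deepest_vertex(parent_of)
    print(vertex, depth)
    print(path_to_root(parent_of, vertex))
```

The second entry from the end of such a path is the first-level branch, which corresponds to the way the case study attributes a subregion to a main branch of the hierarchy.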

5 Conclusion and Future Work

In this paper we demonstrated how the indented tree visual metaphor can be used to efficiently explore and compare huge and deep trees. We proposed the Indented Pixel Tree Browser, a tool for interactively navigating through huge hierarchical data sets. Furthermore, subtrees can easily be compared and explored for patterns and anomalies on different levels of granularity. We applied the tool to the NCBI taxonomy, a hierarchical data set containing 324,276 nodes classifying species and organisms. In this case study, we demonstrated how the interactive features can be applied to navigate, filter, and explore such a huge tree quickly and to gain many insights from the data set. In the future, we plan to apply the browsing tool to data sets from software development, where hierarchical structures may consist of many millions of elements when inspecting the source code level or the abstract syntax tree. By applying filtering functions, linking and brushing to the real source code, color coding, and details-on-demand functions, such a browsing tool can be a great help for software developers when maintaining the source code or inspecting statistics of interest in it. Static indented tree plots have been evaluated in our former work [5] and compared to node-link diagrams. A user study addressing the interactive plots should follow to check the usability of the Indented Pixel Tree Browser.

Acknowledgements. We would like to thank Dr. Kay Nieselt, University of Tübingen, for providing the NCBI taxonomy data set.

References

1. Reingold, E.M., Tilford, J.S.: Tidier drawings of trees. IEEE Transactions on Software Engineering 7 (1981) 223–228
2. Koffka, K.: Principles of Gestalt Psychology. Harcourt-Brace, New York (1935)
3. Kruskal, J., Landwehr, J.: Icicle plots: Better displays for hierarchical clustering. The American Statistician 37 (1983) 162–168
4. Shneiderman, B.: Tree visualization with tree-maps: a 2D space-filling approach. ACM Transactions on Graphics 11 (1992) 92–99
5. Burch, M., Raschke, M., Weiskopf, D.: Indented pixel tree plots. In: Proceedings of International Symposium on Visual Computing. (2010) 338–349
6. Sayers, E.W., Barrett, T.: Database resources of the national center for biotechnology information. Nucleic Acids Research 37 (2009) 5–15
7. Bertin, J.: Semiologie graphique: Les diagrammes, Les reseaux, Les cartes. Editions Gauthier-Villars, (2nd edition 1973, English translation 1983), Paris (1967)
8. Knuth, D.: The Art of Computer Programming, Volume I: Fundamental Algorithms. Addison-Wesley, Reading, MA, USA (1968)
9. McGuffin, M.J., Robert, J.M.: Quantifying the space-efficiency of 2D graphical representations of trees. Information Visualization 9 (2009) 115–140
10. Jürgensmann, S., Schulz, H.J.: A visual survey of tree visualization. In: Poster Compendium of the IEEE Conference on Information Visualization. (2010)
11. Battista, G.D., Eades, P., Tamassia, R., Tollis, I.G.: Graph Drawing: Algorithms for the visualization of graphs. Prentice Hall, Upper Saddle River, NJ (1999)
12. Herman, I., Melançon, G., Marshall, M.S.: Graph visualization and navigation in information visualization: A survey. IEEE Transactions on Visualization and Computer Graphics 6 (2000) 24–43
13. Eades, P.: Drawing free trees. Bulletin of the Institute for Combinatorics and its Applications 5 (1992) 10–36
14. Grivet, S., Auber, D., Domenger, J.P., Melançon, G.: Bubble tree drawing algorithm. In Wojciechowski, K., Smolka, B., Palus, H., Kozera, R.S., Skarbek, W., Noakes, L., eds.: Computer Vision and Graphics, Dordrecht, The Netherlands, Springer (2006) 633–641
15. Purchase, H.: Metrics for graph drawing aesthetics. Visual Languages and Computing 13 (2002) 501–516
16. Andrews, K., Heidegger, H.: Information slices: Visualising and exploring large hierarchies using cascading, semi-circular discs. In: Proceedings of the IEEE Information Visualization Symposium, Late Breaking Hot Topics. (1998) 9–12
17. Stasko, J.T., Zhang, E.: Focus+context display and navigation techniques for enhancing radial, space-filling hierarchy visualizations. In: Proceedings of the IEEE Symposium on Information Visualization. (2000) 57–66
18. Yang, J., Ward, M.O., Rundensteiner, E.A., Patro, A.: InterRing: a visual interface for navigating and manipulating hierarchies. Information Visualization 2 (2003) 16–30
19. Bruls, M., Huizing, K., van Wijk, J.: Squarified treemaps. In: Proceedings of Joint Eurographics and IEEE TCVG Symposium on Visualization. (2000) 33–42
20. van Wijk, J.J., van de Wetering, H.: Cushion treemaps: Visualization of hierarchical information. In: Proceedings of Information Visualization. (1999) 73–78
21. Balzer, M., Deussen, O., Lewerentz, C.: Voronoi treemaps for the visualization of software metrics. In: Proceedings of Software Visualization. (2005) 165–172
22. Shneiderman, B.: The eyes have it: A task by data type taxonomy for information visualizations. In: Proceedings of the IEEE Symposium on Visual Languages. (1996) 336–343
23. Lamping, J., Rao, R., Pirolli, P.: A focus+context technique based on hyperbolic geometry for viewing large hierarchies. In: Proceedings of Human Factors in Computing Systems. (1995) 401–408
24. Munzner, T., Guimbretière, F., Tasiran, S., Zhang, L., Zhou, Y.: TreeJuxtaposer: scalable tree comparison using focus+context with guaranteed visibility. ACM Transactions on Graphics 22 (2003) 453–462
