A Scalable Visualization for Dynamic Data in Software System Hierarchies
Michael Burch∗, Michael Raschke†, Adrian Zeyfang∗, and Daniel Weiskopf∗
∗ VISUS, University of Stuttgart, Allmandring 19, 70569 Stuttgart, Germany, Email: [email protected]
† Blickshift, Schelmenwasenstr. 34, 70567 Stuttgart, Germany, Email: [email protected]
Abstract—Software systems can grow large, consisting of thousands of hierarchically organized elements like directories, subdirectories, files, and functions. Moreover, those hierarchy elements can carry additional information worth investigating for a software developer. Getting an overview of both the hierarchy and the attached static or dynamic data can become a tedious task if it is not supported by a visually scalable visualization technique. In this paper, we use a hierarchy visualization based on the visual metaphor of indentation to generate an overview of the software system hierarchy and easily attach additional attributes. The extra information is aligned with the hierarchy elements and, hence, supports visual comparisons of the attachments on different levels of hierarchical granularity. Through interaction, we provide additional views on the data, e.g., by filtering, hierarchy transformations, or details-on-demand. We illustrate the usefulness of our hierarchy visualization technique by means of an application example exploring data from the open-source software project jEdit. We investigated the readability of the hierarchy visualization with a user experiment, comparing indentation to node-link diagrams for varying sizes of a hierarchy.
I. INTRODUCTION
Large software systems are typically developed over many years by several developers, resulting in a deeply structured system consisting of hierarchically organized software entities [1]. Moreover, not only the system structure itself, but also the extra data attached to each of the hierarchy elements can become of particular interest for the developers. Traditional visualization or visual analytics systems either support the visual exploration of the hierarchy alone or separate views on the extra data [2], but if those are represented in a combined way, they do not scale to large software systems. It is important to first get an overview [3] of the entire system together with additional information, providing a global overview of all of the data or at least a large part of it at a glance. Such an approach lets the user freely decide where to start the data exploration process, and supports the building of hypotheses about the data to confirm, reject, or refine, which is a major ingredient of a visual analytics system [4], [5]. Today’s software systems can be considered as being big data [6], [7] and, hence, demand a visually scalable approach, providing a starting point for further data exploration. It is challenging to design such an overview representation that reflects the hierarchical organization on the one hand, while being visually scalable on the other hand. The additional data attachments make this task even more challenging
because these should be visually encoded in an efficient and effective way to support comparison tasks done by the user without leading to misinterpretations of the data or a mapping to a wrong subhierarchy that would lead to false hypotheses about the data.
In this paper, we describe a hierarchy visualization [8] combined with additional data recorded, measured, and acquired in a software system. The hierarchy representation is based on the visual metaphor of indentation [9] with which a software developer is already familiar when implementing source code or when inspecting file systems in a file browser. The concept of pretty printing exploits the same idea to make the source code clearer and, thus, easier to maintain. The indented hierarchy visualizations can be scaled down to pixel or subpixel size by applying aggregation techniques, making them useful for large software systems. The hierarchy itself can be used as navigational aid to reduce the amount of displayed data or inspect the data on different levels of hierarchical granularity. Moreover, the additional data can be attached to each hierarchy element, inner vertices and leaf vertices alike, treating them similarly; this is an important advantage compared to other visualization techniques such as node-link diagrams, layered icicles, or treemaps. We show the usefulness of our visualization technique by applying it to the open-source software project jEdit [10]. Moreover, we conducted a user experiment comparing the indentation metaphor with node-link diagrams for varying dataset sizes and five different tasks.
II. RELATED WORK
Today, there are various visualization techniques for hierarchy data as surveyed and classified by Schulz et al. [8]. Although most of the newly developed hierarchy visualizations scale to thousands of hierarchically organized elements, not many of them allow the attachment of additional data to each of the hierarchy elements as, for example, shown by Vehlow et al.
[11] for dynamic hierarchies and graphs. Mostly, the combined visualizations do not scale to many entities or they do not allow one to efficiently compare the attached data with respect to the corresponding subhierarchies, i.e., for inner vertices as well as leaf vertices equally well. Moreover, comparisons on different levels of hierarchical granularities are not well supported in many systems since the hierarchy
elements are not aligned on common scales as evaluated by Cleveland and McGill [12]. We focus on the visual scalability aspect as well as the combination of large hierarchies with data attachment. This combined data structure is inherent to most software development data [1], [13], [14] that is built as a hierarchy while extra data is attached to each of the hierarchy elements (like file sizes, developer ownerships, bug probabilities, time stamps, number of lines of source code, deepest nesting structure in the code, and many more). Such data is typically not static but evolves over time, requiring a visualization technique that efficiently supports those comparison tasks with the goal to detect trend patterns or anomalies on different hierarchical granularities. In particular, Burch [15] developed a hierarchy visualization based on node-link diagrams [16] in which each link is used to visually encode an axis on which software metrics are visually encoded. Although the system allows interactions like expanding and collapsing, it is hard to visually compare the different software metrics because those are not aligned on common scales caused by the node-link layout. Moreover, the hierarchy visualization is not scalable to thousands of elements as in today’s software systems. For example, 3D hierarchy visualizations like 3D landscapes or code cities [17], [18] build an intuitive and easy-to-understand visual metaphor, particularly designed for software developers, but they do not scale to thousands of elements, they generate occlusion effects, and additionally attaching extra data with the goal to be efficiently and perceptually comparable becomes challenging. Many visual metaphors for hierarchy data have been investigated for attaching extra data, including treemaps [19], layered icicles [20], [21], [22], or node-link diagrams [16]. 
We chose indented hierarchy visualization [9], [23] as the basis for our work because it is visually scalable in the hierarchy dimension and allows us to show additional attributes [24], [25]. In this paper, we apply this approach of indented visualization combined with additional attributes to software system data. We believe that the indented visual metaphor is especially useful for software system data because it is intuitive for software developers who work with the concept of pretty printing on a regular basis. Moreover, algorithmic concepts like automatic hierarchy traversal based on value similarities are not supported in previous work.
III. DATA MODEL AND TRANSFORMATIONS
Before describing the visualization technique for software systems with additionally attached data, we present a data model reflecting data typically stored during a software development process. Moreover, we illustrate how the hierarchy data can be ordered based on the data attachments to get a visual overview that rapidly reflects visual patterns like groups or clusters. This is in particular useful to understand certain data patterns on different levels of hierarchical granularities.
A. Software System Hierarchy
A software system hierarchy can be modeled in a graph-theoretic sense as a hierarchy $H = (V, E)$ in which $V := \{v_1, \dots, v_n\}$, $n \in \mathbb{N}$, denotes the finite set of software entities. These are typically functions/methods, classes, files, subdirectories, directories, or packages, but can be extended to any other, even finer hierarchical granularity, for example, if we start on the level of an abstract syntax tree. The set $E := \{e_1, \dots, e_m\} \subset V \times V$, $m \in \mathbb{N}$, models the parent-child relationships indicating which elements are hierarchically related in the software system.
B. Attached Data Attributes and Aggregation
Each vertex $v \in V$, i.e., each element in a software system hierarchy, can be accompanied by a list of attributes $A_v := \{a_{v_1}, \dots, a_{v_k}\}$, $k \in \mathbb{N}$, where each $a_{v_j}$ expresses a value that can be of categorical or quantitative nature. A time-varying behavior of data attributes can be modeled by the index $k$ in which a chronological order is naturally given by the increasing index number. Certain subhierarchies can be collapsed, leading to an aggregation of the involved data attributes. If a list of attributes $a_1, \dots, a_l$ has to be aggregated, we support different options like building the average value, the sum of all attributes, or the maximum/minimum of them in the case we have to deal with quantitative data that allow arithmetic operations. In the case of categorical attributes, a representative one is computed: the one that occurs most frequently, or in an equal distribution, the first one in the list. This can, for example, be the case for a software system hierarchy in which we are interested in the different file types (that build a categorical information).
C. Hierarchy Traversal
The question arises how the hierarchy should be presented visually since there are many possible hierarchy traversal options, all preserving the hierarchical organization.
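The aggregation options of Section III-B can be illustrated with a minimal Python sketch. The function names `aggregate_quantitative` and `aggregate_categorical` and the `mode` parameter are hypothetical and chosen only for this illustration; the tie-breaking rule reads "the first one in the list" as the earliest value among the most frequent ones, which is an assumption about the intended behavior.

```python
from collections import Counter
from statistics import mean


def aggregate_quantitative(values, mode="sum"):
    """Aggregate quantitative attributes of a collapsed subhierarchy.

    `mode` is a hypothetical parameter selecting one of the four
    options named in the text: sum, average, max, or min.
    """
    operations = {"sum": sum, "average": mean, "max": max, "min": min}
    return operations[mode](values)


def aggregate_categorical(values):
    """Return the most frequent category.

    Assumption: ties are broken by taking the tied value that
    appears first in the list.
    """
    counts = Counter(values)
    highest = max(counts.values())
    for value in values:
        if counts[value] == highest:
            return value


# Example: file types of a collapsed directory (made-up data).
file_types = ["java", "gif", "gif", "java", "png"]
representative = aggregate_categorical(file_types)
```

In this example, `java` and `gif` both occur twice, so the sketch returns `java` because it appears first in the list.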
In our technique, we build a one-dimensional representation of the software system hierarchy in an indented visual metaphor. Based on the attached data attributes, we can compute a suitable order among the subhierarchies and hierarchy elements. To reach this goal we start with the hierarchical order on the topmost level and compute pairwise distances between the involved hierarchy elements. Those distances are based on the attributes $A_v$ for each representative parent vertex of a subhierarchy. If $A_v$ and $A_u$ are the attribute lists of two different subhierarchies rooted at the vertices $v \in V$ and $u \in V$, we first compute the Euclidean distance:
$$d_{v,u} := \sqrt{(a_{v_1} - a_{u_1})^2 + \dots + (a_{v_k} - a_{u_k})^2}$$
Applying this distance function to all parent vertices on a certain hierarchy level results in a distance matrix that can be used to order the hierarchy elements by applying a matrix reordering approach [26] based on heuristics of the minimum linear arrangement (MinLA) problem [27]. This can be done recursively and separately for the deeper, less aggregated levels, still preserving the hierarchical organization.
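The distance matrix and a possible ordering step can be sketched in Python as follows. The greedy nearest-neighbor ordering below is only a simple stand-in chosen for illustration; it is not the matrix reordering approach of [26] or a specific MinLA heuristic from [27]. All function names are hypothetical.

```python
import math


def euclidean(a_v, a_u):
    """Euclidean distance between two attribute lists of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a_v, a_u)))


def distance_matrix(attribute_lists):
    """Pairwise distances between the sibling subhierarchies of one level."""
    n = len(attribute_lists)
    return [[euclidean(attribute_lists[i], attribute_lists[j])
             for j in range(n)] for i in range(n)]


def greedy_order(dist):
    """Order elements so that similar ones end up next to each other.

    Assumption: a greedy nearest-neighbor chain starting at element 0,
    used here only as an illustrative placeholder heuristic.
    """
    if not dist:
        return []
    order = [0]
    remaining = set(range(1, len(dist)))
    while remaining:
        last = order[-1]
        # Pick the closest unplaced element; break ties by index.
        nearest = min(remaining, key=lambda j: (dist[last][j], j))
        order.append(nearest)
        remaining.remove(nearest)
    return order


# Made-up attribute lists for four sibling subhierarchies.
attrs = [[0, 0], [10, 10], [1, 1], [9, 9]]
order = greedy_order(distance_matrix(attrs))
```

For the made-up data, the sketch places the two subhierarchies with small values next to each other, followed by the two with large values. Applying the same function to the children of each vertex would give the recursive, per-level ordering described above.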
Removing the explicit orthogonal links does not change the appearance of the indented hierarchy visualization with respect to the meaning and interpretability that is not based on the links anymore but purely on the indentation.
IV. SOFTWARE SYSTEM HIERARCHY VISUALIZATION
Software systems can grow large and can become deeply nested, which requires a suitable visualization technique. Moreover, the attached data for each hierarchy element demands a technique that supports visual comparisons in an aligned way. The indented visual metaphor is an appropriate concept meeting these needs while it is also an intuitive concept for software developers, in particular, if they are mostly working as code implementers. Consequently, we were inspired by this visually scalable and intuitive idea and describe in detail how it can be used to support software developers.
A. Principle of Indentation
The idea of using indentation for displaying hierarchies [9] is not new, but it can be applied as a powerful concept for visualizing software systems since it generates visually scalable diagrams. Those do not require explicit links nor do they need explicit stacking or nesting as in layered icicles or treemap visualizations. The indentation of the hierarchical elements based on their location and depth in a hierarchy is sufficient to fully encode such a data structure.
Fig. 1. A small-sized hierarchy in two different visual metaphors: (a) node-link diagram, (b) indented hierarchy visualization (still with explicit orthogonal links that are not needed to reflect the hierarchical organization).
Figure 1 (a) shows a hierarchy consisting of 8 vertices in a traditional node-link layout. The vertices are color-coded to better link them with the corresponding indented hierarchy visualization depicted in Figure 1 (b). We can see that the explicit links needed in the node-link diagram are still present in the indented diagram, even though they are not required to reflect the hierarchy there. In this scenario, the diagram is read from top to bottom and left to right while the depth of the hierarchy grows from left to right.
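The reading order described above (top to bottom, with depth growing from left to right) corresponds to a pre-order traversal in which each vertex receives its own row and is indented by its depth. The following Python sketch illustrates this under stated assumptions: the function name `indented_layout`, the child-list dictionary, and the example hierarchy are hypothetical and do not reproduce the hierarchy of Figure 1.

```python
def indented_layout(children, root):
    """Assign each vertex a (depth, row) position.

    `children` maps a vertex to the ordered list of its children.
    The row is the pre-order index (top to bottom); the depth is
    the indentation (left to right). No links are needed.
    """
    positions = {}
    stack = [(root, 0)]
    row = 0
    while stack:
        node, depth = stack.pop()
        positions[node] = (depth, row)
        row += 1
        # Push children in reverse so the first child is visited first.
        for child in reversed(children.get(node, [])):
            stack.append((child, depth + 1))
    return positions


# Hypothetical example hierarchy.
tree = {"root": ["src", "doc"], "src": ["Main.java"]}
layout = indented_layout(tree, "root")
for name, (depth, row) in sorted(layout.items(), key=lambda item: item[1][1]):
    print("  " * depth + name)
```

Because every vertex occupies exactly one row, attribute values can later be aligned with the vertices along a common axis.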
Fig. 2. Top-down indented hierarchy visualization: Nodes without grid, edges are given implicitly by relative node layout (left). Explicit edges between nodes and uniform grid (right).
Figure 2 shows the small hierarchy from above in both scenarios, i.e., as standard indented hierarchy visualization and with explicit orthogonal links (and additional grid). The additional links are not needed since then the hierarchical organization in the data is encoded by two visual variables, the indentation and explicitly by the links. The hierarchy orientation does not change the hierarchical organization since it is invariant due to rotations and orientations provided that the root vertex is clearly indicated. Interpreting such a hierarchy visualization for specific features in a hierarchy like the root vertex, siblings, inner vertices, or leaf vertices can be explained in a straightforward way. Figure 3 shows how a hierarchy in an indented visual metaphor can be examined to identify the siblings of an interesting vertex. Here, we still show the orthogonal links as visual aid for understanding the detection of sibling nodes. In the figure on the left-hand side, the green node is under exploration and we are interested in identifying the nodes on the same hierarchy level, i.e., its sibling nodes. This task is pretty easy since we only have to visually scan the diagram in the vertical direction and look for the direct ancestor that is located toward the root node (colored red) and the next higher hierarchy level that is located in the vertical direction away from the root node (also colored red). All vertices located on the same horizontal position between these two border elements (marked with ‘STOP’) are the siblings of the green vertex in the figure on the left-hand side; these siblings are all colored in green on the right-hand side. Although the indented diagrams can be rotated easily, this operation should be applied with care since it can become perceptually problematic if the hierarchy has to be read
Fig. 3. Identifying siblings of a node marked green (left). All of its siblings are marked in green (right); the upper and lower search bounds, starting with the parent and ending with the next neighboring node, are marked in red.
from right-to-left, as illustrated in Figure 4. By default, we choose left-to-right and top-to-bottom orientation since we designed our software for users who are familiar with the same orientation for reading text. For other audiences outside western countries, different orientations may be more suitable. This cultural background is also the reason why we base the stimuli in the user experiment (described in Section VI) on this layout.
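The sibling search described for Figure 3 can be sketched in Python on the sequence of indentation depths in reading order. This is a minimal illustration, assuming the default left-to-right and top-to-bottom orientation; the function name `siblings` and the depth list are hypothetical.

```python
def siblings(depths, i):
    """Return the indices of all siblings of the vertex at index i.

    `depths` lists the indentation depth of each vertex in reading
    order. The scan stops in both directions at the first vertex
    that is less indented than vertex i: toward the root this is
    the parent, away from the root it is the next element on a
    higher hierarchy level.
    """
    depth = depths[i]
    lower = i
    while lower > 0 and depths[lower - 1] >= depth:
        lower -= 1
    upper = i
    while upper + 1 < len(depths) and depths[upper + 1] >= depth:
        upper += 1
    return [j for j in range(lower, upper + 1)
            if depths[j] == depth and j != i]


# Hypothetical indentation depths of seven vertices in reading order.
example_depths = [0, 1, 2, 2, 1, 2, 1]
same_level = siblings(example_depths, 1)
```

In the example, the vertex at index 1 has its siblings at indices 4 and 6, since all three share the root at index 0 as their parent.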
To reach the full potential of the indented hierarchy visualization we can use a scale-down to pixel or subpixel size, which supports the visualization of very large hierarchies. If not enough pixels are available to display all hierarchy elements, we use binning to visually aggregate the hierarchical data, which looks like rendering them in subpixel size. In any other case, there are no overlaps or aggregations of hierarchy elements, i.e., all of them are rendered on screen. Figure 5 (a) shows a larger hierarchy in a traditional top-down node-link diagram [28]. Although the structure of the hierarchy becomes clear, explicit links are required to read and understand the hierarchy. In Figure 5 (b), indentation is applied, which indicates the hierarchy without explicit links. This has the benefit that each hierarchy element has a unique representative location on a horizontal line virtually placed below the diagram.
Fig. 4. Visual demonstration of a drawback of indented hierarchy visualization: Unclear indented diagram that cannot be parsed from left to right (top). Explicitly drawing edges and specifying the root node reveals that the diagram is read right-to-left (bottom).
Fig. 5. Two representations for hierarchy data: (a) node-link diagram, (b) indented hierarchy visualization.
B. Data Attachments
The real benefit of such a pixel-based indented hierarchy visualization is achieved when additional data has to be explored in the context of the hierarchical organization. This is in particular the case in software systems that are developed over longer time periods and produce a wealth of additional data sources like source code metrics or developer-specific properties. Also, relations between software system elements are generated like call graphs or inheritance hierarchies. Moreover, code-developer properties can be of interest and might be changing over time. All of these aspects require a visually scalable visualization that provides an overview of the entire data first and then supports further data exploration by filtering. The combined visualization additionally supports the inspection of the dynamic data on several hierarchical granularities. Figure 6 shows a scenario in which additional data is
Fig. 6. Possible indented hierarchy visualization extension showing varying node widths depending on a node attribute.
attached to a hierarchy visualization. However, this direct mapping to the hierarchy elements does not support comparison tasks. The problem is the placement at non-aligned scales [12] that leads to performance problems when judging and comparing values. This issue can be addressed by first showing the scalable hierarchy visualization in a separate view, but on top of this, aligning the additional data with the diagram and with each of the hierarchy elements, applying the approach of indented timeline diagrams [24]. This results in judgments along one aligned and common scale, making value judgments and comparison tasks much easier, faster, and more accurate [12]. This approach is illustrated in Figure 7. Moreover, the visual elements are stretched in length to better connect the hierarchical elements.
C. Hierarchy as a Navigation Aid
Apart from using the hierarchy to understand the data on different hierarchical granularities, it can also be used to navigate in the data, i.e., to reduce the amount of data attachments by collapsing certain (uninteresting) subhierarchies. If this interaction is applied, the corresponding data is aggregated and the user has to decide how to do that, as described in Section III. There are several characteristic features in a hierarchy that might be used to change the view on the data, i.e., to change the amount of data to be displayed and the level of data aggregation. For example, inner vertices, leaf vertices, branching factors, depths, or subhierarchy sizes play a crucial role when it comes to hierarchy data exploration. Hence, such characteristics are useful as navigational aids and to further inspect the data.
D. Visual Pattern Identification
There are several visual patterns that can be identified easily and then be used to remap them to the data, serving as a data exploration tool.
By this mapping and remapping strategy supported by interaction techniques, we are able to confirm, reject, or refine formerly built hypotheses about the data. With our visualization technique, these hypotheses can take into account the hierarchical structure, the dynamic data attachments, but also both in combination, supporting
Fig. 7. Indented hierarchy visualization of the dynamics of attached quantities. The visual design shows both the hierarchy (top) and the data attachments (bottom). Each quantity of the data attachment is color-coded, aligned over time (vertically), and attached to the respective hierarchy element (horizontally). The indentation uses stretched visual elements to better link the hierarchy elements.
us in understanding the data on certain levels of hierarchical granularity. There are three categories of visual patterns caused by data properties and the visual encoding of the data. We base our visual pattern description on a left-to-right and top-to-bottom representation of the hierarchy with the root vertex placed at the leftmost and topmost position. As the first category, traditional static hierarchy patterns are observable:
• Root node: The root node is located at the leftmost position and, hence, can be detected easily.
• Inner nodes: Inner nodes can be detected by vertical pixel-based lines that are longer than one pixel and, hence, can be distinguished from leaf nodes.
• Leaf nodes: Leaf nodes are visually encoded by single pixels and are the smallest entities in this representation.
More complex tree structures can also be examined:
• Subtrees: Subtrees on the same level can be observed by inspecting the vertical lines that start at the same vertical position.
• Size of subtrees: The number of nodes in a subtree can easily be estimated by the area that is enclosed by two vertical lines that represent the same hierarchy level. This information can be used to compare subtrees with respect to their sizes and depths.
• Depth of subtrees: The depth of a subtree can be observed by inspecting the lowest horizontal line or pixel in the right direction of this subtree that ends at the next vertical line that starts at the same vertical position. The depth of two subtrees can be compared by examining the leaves that are located further below.
• Symmetries: Symmetrical subtrees can be observed by inspecting the global structure of the visualization and searching for identical substructures.
• Balanced/unbalanced trees: The same is true for balanced trees where a viewer must detect symmetry axes in each of the subtrees. Otherwise, the tree is unbalanced.
• Complete n-ary trees: All structures and substructures should be present in all hierarchy levels in the same way and with the same splitting number n.
Also time-varying (i.e., dynamic) visual patterns can be detected by inspecting the attached dynamic data:
• Trends: A trend in the hierarchical structure may be a subhierarchy that grows slowly by adding more and more substructures to that tree. Parallel trends are detected in the form of several subtrees that are evolving side by side. The quantitative values might exhibit some trends as well; these may either grow or decrease in some way or they may not change at all.
• Countertrends: Two or more subtrees may be evolving in opposite directions: some subtrees might be growing, whereas others might be shrinking in a similar way. This is a typical phenomenon in software evolution, where parts of the source code are outsourced and copied to a different location of the project’s directory tree. Countertrends might also be detected in the metric values.
• Anomalies/outliers: Some unexpected phenomena in the evolution should be classified as anomalies: a subtree might grow immensely during the evolution and at some time its growth might stop abruptly, or the metric values might show an alternating behavior that could be a sign for serious problems.
E. Interaction Techniques
Our visualization tool provides several interaction techniques to explore the global hierarchical structure on the one hand as well as the dynamics in the quantitative data on the other hand.
The key concepts of our visualization tool refer to Shneiderman’s information visualization mantra [3]: overview first, zoom and filter, then details on demand.
• Expanding and collapsing of subtrees: After having inspected the whole dataset or at least a large part of it, a user may decide to collapse several subtrees of lesser interest. This reduces the data space immensely. Doing a collapse or an expand operation will not influence the order of the remaining tree elements, which helps preserve one’s mental map.
• Time aggregation and deaggregation: Certain periods of time may be classified by the viewer as constant development phases of the quantitative data. Those periods might be less relevant and collapsed or filtered out by the user. Again, the order of the time-series data is not changed and, hence, the mental map is preserved in this dimension after applying such an operation.
• Zooming: Rectangular regions of the graphical representation of the data can be selected on user’s demand and enlarged to a predefined scale.
• Filtering: Subhierarchies as well as time periods and intervals may lead to a degradation of performance at some tasks when using the visualization tool. For this reason, several filtering functions are provided that allow the user to reduce the displayed information.
• Details-on-demand: Hierarchical elements can be selected and a details-on-demand function shows textual information about the tree element. When moving the cursor to a position that belongs to the timeline, quantitative information is displayed as a text fragment in a separate frame.
V. APPLICATION EXAMPLE
We demonstrate our visualization technique for the example of the open-source software project jEdit [10]. The examined dataset consists of transaction data that can be obtained by analyzing the software archive for this project in the time period from September 2, 2001 until October 30, 2004. As metric, we use the number of lines added or deleted between two consecutive versions of a file. To see the progress in the evolution we sum up the number of changed lines. Displaying only the version-by-version changes gives less insight into the time-series data. Figure 8 shows our visualization of jEdit. It is separated into two views. The topmost part indicates the hierarchical structure of the project as an indented hierarchy visualization. The part below shows the timeline view for the evolving software project with respect to the changed number of lines. The timeline is along the vertical direction. The project hierarchy consists of one large subhierarchy annotated by org. This subdirectory contains the main part of the source code to which many Java files belong. The org directory is again subdivided into certain subdirectories that contain gif and png files. These software artifacts are located at the leftmost position in the presented view. One can see that the gif files were checked in after the first third of the project start and have never been changed since then. The rest of the source code files change from time to time, a fact that can be obtained by inspecting the color coding for the timeline view. The project is organized into hierarchical structures. Even the documentation files are located in a certain subdirectory and we can see that there are several files that evolve in a similar way as the source code files.
VI. USER EXPERIMENT
In this section, we describe a user study to test the effectiveness and the readability of the indented hierarchy visualization. We compared this visualization type with the classical node-link tree diagram.
There are several user experiments for tree visualization techniques [28], [29], [30], but only Burch et al. [9] investigated the readability of node-link tree diagrams compared to
Fig. 8. The indented hierarchy visualization shows the number of changed lines of files in different hierarchy levels during the evolution of the open-source software project jEdit. Interesting trends can easily be recognized by inspecting the color-coded timelines.
indented hierarchy visualization. The major difference to our study is that Burch et al. [9] recruited non-expert participants: students with not much background knowledge in visualization and, in particular, in hierarchy visualization. In the current user experiment, we asked visualization experts to perform the given tasks that are also different from the previous study by Burch et al. Moreover, we employed a between-subjects study design to compare between node-link and indented visualizations.
A. Experimental Setup
To study advantages and disadvantages of the two visualizations we performed a 2 (visualization: indented hierarchy visualization vs. classical node-link diagram) × 3 (dataset: small vs. middle vs. large) mixed study design. For both visualization types, we used the same datasets. These parameter variations served as the independent variables in the study. We recruited 18 visualization experts (3 female, 15 male) from our scientific research group, all familiar with node-link diagrams as we checked by asking test questions prior to the study. We classified them randomly into two groups with the same size of 9 people. The first group only worked with the new indented hierarchy visualization. The second group only used the classical node-link diagrams. The between-subjects study design helped avoid learning effects between the two visualization types. For each of the three datasets, the procedure was the same.
Each participant was given an introduction to the visualization technique with a two-page tutorial document. After five minutes, five control questions were asked to ensure that the subjects understood the basic rules of both visualization techniques.
B. Tasks and Open Questions
Then, they were asked to perform five tasks (F1–F5) focusing on typical hierarchy properties:
• F1: Count the number of subtrees starting at the root.
• F2: Count the number of subtrees starting on the second level.
• F3: Find out whether there is a level in the tree without any nodes.
• F4: Find the deepest subtree in the displayed tree.
• F5: Find the most symmetric subtree in the tree.
Time and errors were measured as dependent variables during the experiment. We defined a timeout after one minute for each task. Incorrect answers were counted as a failure. Exceeding the timeout was also defined as a failure. In the last part of the user experiment, the participants had to answer open-ended questions:
• What was your first impression when you saw the node-link diagram/the indented hierarchy visualization?
• What have been the advantages or disadvantages of the node-link or indented hierarchy visualization technique?
Fig. 9. Accuracy results of the user study (number of correct answers) for the node-link visualization (blue bars) and the indented visualization (red bars), each for tasks F1–F5 and varying dataset size: (a) small-sized, (b) medium-sized, and (c) large-sized hierarchy.
Fig. 10. Average response times (in seconds) for the node-link visualization (blue) and the indented visualization (red), each for tasks F1–F5 and varying dataset size: (a) small-sized, (b) medium-sized, and (c) large-sized hierarchy.
Did you use a specific strategy when you carried out the tasks? We worked with a thinking-aloud method. The answers were recorded and gave us general impressions about the cognitive background of the subjects when they worked with the visualization techniques. •
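The scoring rule for the dependent variables (a one-minute timeout, with incorrect or late answers counted as failures) can be sketched as follows. The trial records, field names, and the `summarize` helper are hypothetical, and the sketch assumes that the times of failed trials remain part of the time statistics, which the text does not specify.

```python
from statistics import median

TIMEOUT_S = 60.0  # one-minute timeout per task, as stated in the text


def is_success(answer, expected, seconds):
    """A trial succeeds only if it is answered correctly within the timeout."""
    return seconds <= TIMEOUT_S and answer == expected


def summarize(trials):
    """Aggregate the trials of one task and condition into descriptive statistics."""
    times = [trial["seconds"] for trial in trials]
    correct = sum(
        is_success(trial["answer"], trial["expected"], trial["seconds"])
        for trial in trials
    )
    return {
        "correct": correct,         # number of correct answers
        "median_s": median(times),  # median completion time
        "fastest_s": min(times),    # fastest participant
        "slowest_s": max(times),    # slowest participant
    }


# Hypothetical trial records for one task in one condition (not study data).
trials = [
    {"answer": 3, "expected": 3, "seconds": 12.5},  # correct and in time
    {"answer": 4, "expected": 3, "seconds": 20.0},  # incorrect answer
    {"answer": 3, "expected": 3, "seconds": 61.0},  # correct but after the timeout
]
summary = summarize(trials)
```

In this made-up example only the first trial counts as correct; the second fails on the answer and the third on the timeout.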
C. Results

We start with results from the first main part of the user experiment: correctness (accuracy) and completion time. We use descriptive statistics (with statistical graphics) to show the results. Figure 9 depicts the correctness of the tasks. Blue bars show the number of participants who performed the study task correctly using node-link diagrams; red bars show those who worked with the indented hierarchy visualizations. The results are reported separately for each task and each dataset size.

The completion times are shown in seconds in Figure 10 in the form of box plots. Again, blue indicates the node-link visualization and red the indented visualization. The thick green line represents the median; the error bars show the spread between the fastest and the slowest participant. Green circles mark outliers. The results are again reported separately for each task and each dataset size.

For the open-ended questions, many participants stated that they were familiar with node-link diagrams, but those who worked with the indented hierarchies mentioned difficulties in
interpreting the diagrams. Many participants found indented hierarchies a clever way to represent this kind of data and, in particular, reported that they could be read easily once the underlying visual concept was understood. The tasks then became easy to solve; the more tasks were answered, the easier and faster they could be solved due to learning effects. However, similar learning effects were also reported by the participants who used the node-link diagrams.

D. Discussion and Open Questions

The number of participants who performed the user study tasks correctly is approximately the same for both techniques. Only the last task (“Find the most symmetric subtree in the tree”) was performed with higher correctness using the node-link diagrams. Across all datasets (small, medium, and large), the median of the time measurements is approximately the same in 11 of 15 cases. This suggests that the effectiveness of the indented hierarchy visualizations is nearly the same as that of the node-link diagrams. The spread between the fastest and the slowest subject across all tasks shows no clear preference for one of the two techniques.

The goal of the evaluation was to study the effectiveness and readability of the indented hierarchy visualization and whether it is a suitable concept for visualizing large hierarchies like software systems. Being used to node-link diagrams,
all participants understood the basic rules of the indented hierarchy visualization after a short introduction in the form of a tutorial text. After this brief explanation, they were able to work reliably with the technique, as indicated by the task performance results.

We would like to point out that this user experiment only addressed the readability of the indented hierarchy visualization because this is a key component of our overall visualization. We have not considered the additional data attachments, nor have we investigated the usability of the interaction techniques (Section IV-E). Furthermore, we have not investigated the learning effects for both indented and node-link diagrams. These aspects are left for future work.

VII. CONCLUSION AND FUTURE WORK

In this paper, we described an approach for a visually scalable hierarchy visualization based on the indented visual metaphor, which is intuitive for software developers because it resembles the pretty-printing concept. The hierarchy visualization can be scaled down to pixel size and allows the attachment of additional data typically occurring during software development, as well as measures derived from snapshots of a software system during development. Interaction techniques such as hierarchy filtering, hierarchy transformations, or details-on-demand are applicable. We illustrated the usefulness of our technique by means of an application example focusing on the jEdit open-source software system. Moreover, we conducted a user experiment to test how well the indented visualization can be read.

For future work, we plan to experiment with further interaction techniques. Also, additional data attachments might be worth investigating.
For example, a better view of the source code itself and of computed code metrics might be worth considering. In addition, more explicit code comparisons on different hierarchical granularities should be computed algorithmically and then displayed, resulting in a scalable comparison-based overview of the entire system. Finally, the user-based evaluation could be extended.

ACKNOWLEDGMENTS

We thank the German Research Foundation (DFG) for financial support under grant WE 2836/6-1.

REFERENCES

[1] S. Diehl, Software Visualization – Visualizing the Structure, Behaviour, and Evolution of Software. Springer, 2007.
[2] M. Burch, “Visual analysis of compound graphs,” in Proceedings of the IEEE Symposium on Visual Languages and Human-Centric Computing, VL/HCC, 2016, pp. 54–58.
[3] B. Shneiderman, “The eyes have it: A task by data type taxonomy for information visualizations,” in Proceedings of IEEE Symposium on Visual Languages, 1996, pp. 336–343.
[4] D. A. Keim, G. L. Andrienko, J. Fekete, C. Görg, J. Kohlhammer, and G. Melançon, “Visual analytics: Definition, process, and challenges,” in Information Visualization – Human-Centered Issues and Perspectives, 2008, pp. 154–175.
[5] P. C. Wong and J. Thomas, “Visual analytics,” IEEE Computer Graphics and Applications, vol. 24, no. 5, pp. 20–21, 2004.
[6] I. A. T. Hashem, I. Yaqoob, N. B. Anuar, S. Mokhtar, A. Gani, and S. U. Khan, “The rise of big data on cloud computing: Review and open research issues,” Information Systems, vol. 47, pp. 98–115, 2015.
[7] A. de Mauro, M. Greco, and M. Grimaldi, “What is big data? A consensual definition and a review of key research topics,” AIP Conference Proceedings, vol. 1644, no. 1, pp. 97–104, 2015.
[8] H. Schulz, S. Hadlak, and H. Schumann, “The design space of implicit hierarchy visualization: A survey,” IEEE Transactions on Visualization and Computer Graphics, vol. 17, no. 4, pp. 393–411, 2011.
[9] M. Burch, M. Raschke, and D. Weiskopf, “Indented pixel tree plots,” in Proceedings of International Symposium on Advances in Visual Computing, ISVC, 2010, pp. 338–349.
[10] jedit Developer Community, “jedit – Programmer’s Text Editor,” http://jedit.org/.
[11] C. Vehlow, F. Beck, and D. Weiskopf, “Visualizing dynamic hierarchies in graph sequences,” IEEE Transactions on Visualization and Computer Graphics, vol. 22, no. 10, pp. 2343–2357, 2016.
[12] W. S. Cleveland and R. McGill, “An experiment in graphical perception,” International Journal of Man-Machine Studies, vol. 25, no. 5, pp. 491–501, 1986.
[13] M. Burch, S. Diehl, and P. Weißgerber, “EPOSee – A tool for visualizing software evolution,” in Proceedings of the 3rd International Workshop on Visualizing Software for Understanding and Analysis, 2005, pp. 127–128.
[14] M. Burch, S. Diehl, and P. Weißgerber, “Visual data mining in software archives,” in Proceedings of the ACM 2005 Symposium on Software Visualization, 2005, pp. 37–46.
[15] M. Burch, “Visualizing software metrics in a software system hierarchy,” in Proceedings of International Symposium on Advances in Visual Computing, ISVC, 2015, pp. 733–744.
[16] E. M. Reingold and J. S. Tilford, “Tidier drawing of trees,” IEEE Transactions on Software Engineering, vol. 7, no. 2, pp. 223–228, 1981.
[17] R. Wettel and M. Lanza, “Visualizing software systems as cities,” in Proceedings of the 4th IEEE International Workshop on Visualizing Software for Understanding and Analysis, VISSOFT, 2007, pp. 92–99.
[18] F. Fittkau, A. Krause, and W. Hasselbring, “Exploring software cities in virtual reality,” in Proceedings of 3rd IEEE Working Conference on Software Visualization, VISSOFT, 2015, pp. 130–134.
[19] B. Johnson and B. Shneiderman, “Tree-maps: A space-filling approach to the visualization of hierarchical information structures,” in Proceedings of IEEE Visualization Conference, 1991, pp. 284–291.
[20] K. Andrews and H. Heidegger, “Information slices: Visualising and exploring large hierarchies using cascading, semi-circular discs (late breaking hot topic paper),” in Proceedings of the IEEE Symposium on Information Visualization, 1998, pp. 9–12.
[21] J. Stasko and E. Zhang, “Focus+context display and navigation techniques for enhancing radial, space-filling hierarchy visualizations,” in Proceedings of the IEEE Symposium on Information Visualization, 2000, pp. 57–65.
[22] J. Yang, M. O. Ward, E. A. Rundensteiner, and A. Patro, “InterRing: a visual interface for navigating and manipulating hierarchies,” Information Visualization, vol. 2, no. 1, pp. 16–30, 2003.
[23] M. Burch, H. Schmauder, and D. Weiskopf, “Indented pixel tree browser for exploring huge hierarchies,” in Proceedings of International Symposium on Advances in Visual Computing, ISVC, 2011, pp. 301–312.
[24] M. Burch, M. Raschke, M. Greis, and D. Weiskopf, “Enriching indented pixel tree plots with node-oriented quantitative, categorical, relational, and time-series data,” in Proceedings of International Conference on Diagrammatic Representation and Inference, Diagrams, 2012, pp. 102–116.
[25] M. Burch, C. Müller, G. Reina, H. Schmauder, M. Greis, and D. Weiskopf, “Visualizing dynamic call graphs,” in Proceedings of the Vision, Modeling, and Visualization Workshop, VMV, 2012, pp. 207–214.
[26] M. Behrisch, B. Bach, N. H. Riche, T. Schreck, and J. Fekete, “Matrix reordering methods for table and network visualization,” Computer Graphics Forum, vol. 35, no. 3, pp. 693–716, 2016.
[27] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, 1979.
[28] M. Burch, N. Konevtsova, J. Heinrich, M. Höferlin, and D. Weiskopf, “Evaluation of traditional, orthogonal, and radial tree diagrams by an eye tracking study,” IEEE Transactions on Visualization and Computer Graphics, vol. 17, no. 12, pp. 2440–2448, 2011.
[29] T. Barlow and T. Neville, “A comparison of 2-d visualizations of hierarchies,” in Proceedings of the IEEE Symposium on Information Visualization, 2001, pp. 131–138.
[30] Y. Wang, S. T. Teoh, and K.-L. Ma, “Evaluating the effectiveness of tree visualization systems for knowledge discovery,” in Proceedings of EuroVis, 2006, pp. 67–74.