Architecture and applications of the Hy+ visualization system
by M. P. Consens, F. Ch. Eigler, M. Z. Hasan, A. O. Mendelzon, E. G. Noik, A. G. Ryman, D. Vista
The Hy+ system is a generic visualization tool that supports a novel visual query language called GraphLog. In Hy+, visualizations are based on a graphical formalism that allows comprehensible representations of databases, queries, and query answers to be interactively manipulated. This paper describes the design, architecture, and features of Hy+ with a number of applications in software engineering and network management.
©Copyright 1994 by International Business Machines Corporation. Copying in printed form for private use is permitted without payment of royalty provided that (1) each reproduction is done without alteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied or distributed royalty free without further permission by computer-based and other information-service systems. Permission to republish any other portion of this paper must be obtained from the Editor.
IBM SYSTEMS JOURNAL, VOL 33, NO 3, 1994

Visual presentations are widely considered an effective tool to help manage large and complex collections of data. Researchers in scientific visualization (see, for example, McCormick et al.¹) were the first to exploit computer graphics technology to achieve dramatic improvements in the ability of people to understand the data with which they work. The motivation for this exploitation is summarized in the following words from the “Panel Report on Visualization in Scientific Computing,” which appeared in the November 1987 issue of Computer Graphics:

The gigabit bandwidth of the eye/visual cortex system permits much faster perception of geometric and spatial relationships than any other mode, making the power of supercomputers more accessible. Users from industry, universities, medicine and government are largely unable to comprehend or influence the “fire hoses” of data produced by contemporary sources such as supercomputers and satellites.

In other diverse domains, such as software engineering, computer network management, and parallel program monitoring, researchers, developers, and users increasingly recognize the importance of visual presentation of data. A related idea is to provide data manipulation tools that are themselves visually oriented. The iconic user interfaces common in today’s workstations are examples of these tools. Visual query languages for databases² are more ambitious tools for visual data manipulation.

In this paper, we give an overview of the approach to visual display and manipulation of databases that we have been investigating at the University of Toronto for the past few years. We present the design and architecture of the Hy+ visualization system and its associated visual query language, GraphLog. Moreover, we describe the use of Hy+ and GraphLog in two different application areas: software engineering and network management. In both areas the data have a graph-like structure that can be visualized. Visual queries explore that structure and help the user of Hy+ better understand it.

The next section gives a tour of the Hy+ system, using a software engineering application as the example. The succeeding section describes the architecture and implementation of the system,
particularly of the query processing and graph layout components. Following that section, there is a discussion of further applications to software engineering and, subsequently, a discussion of further applications to network management. We conclude with a discussion of further work.

A tour of Hy+

Hy+ provides a user interface with extensive support for visualizing structural (or relational) data as hygraphs,³ an extension of graphs inspired by Harel's higraphs.⁴ The Hy+ system supports visualization of the actual database instances, not just diagrammatic representations of the database schema. Given the large volume of data that the system must present to the user, it is fundamental to provide the user with two capabilities. First is the capability to define new relationships by using queries. This capability is the traditional way of using database queries: the newly defined relationship either gives a direct answer to a user question, or it provides a new view of the existing data. The derived data can later be presented visually by the system. The second capability is a way of using queries to decide what data to show. The user can selectively restrict the amount of information to be displayed. This filtering of relevant information³ is fundamental for conveying manageable volumes of visual information to the user. Selective data visualization can be used to locate relevant data, to restrict visualization to interesting portions of the data (that is, deciding what data to present), and to control the level of detail at which the data are presented (that is, choosing how to see the data).

To describe queries, Hy+ relies on a visual pattern-based notation. The patterns are expressions of the GraphLog query language. Overall, the system supports query visualization (that is, presenting the description of the query using a visual notation), visualization of data constituting the input to the query, and visual presentation of the result.

We present an example of using GraphLog and its environment Hy+ for visualizing the structure of the National Institutes of Health (NIH) public domain C++ class library. The IBM XL C++ compiler was used to extract 13 000 facts from the 35 000 lines of code in the library. In the generated hygraph, the nodes represent classes, functions, and variables. The edges represent relations among them, such as the subclass and friend relationships between classes, the mem relationship between a class and its member functions or variables, the ref relationship between a function and all variables referenced by it, and a calls relationship between functions.
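To make the shape of such a fact base concrete, the following is a minimal Python sketch of how extracted facts could be held as a labeled graph. It is an illustration only: the fact tuples, the class and function names, the argument order of subclass (subclass first, superclass second), and the helper names build_graph and nodes_by_functor are all assumptions introduced here, not the format produced by the compiler or used inside Hy+.

```python
from collections import defaultdict

# Hypothetical facts, invented for illustration (not taken from the NIH library).
# Each fact is (predicate, source_node, target_node). A node is a ground term
# written as a tuple whose first element is the functor.
FACTS = [
    ("subclass", ("class", "Collection"), ("class", "Object")),  # assumed order: sub, super
    ("mem", ("class", "Object"), ("function", "isEqual", 10)),
    ("mem", ("class", "Collection"), ("function", "isEqual", 42)),
    ("mem", ("class", "Collection"), ("function", "add", 57)),
    ("calls", ("function", "add", 57), ("function", "isEqual", 42)),
    ("ref", ("function", "add", 57), ("variable", "count")),
]


def build_graph(facts):
    """Collect the node set and index the edges by their predicate."""
    nodes = set()
    edges = defaultdict(set)
    for predicate, source, target in facts:
        nodes.update((source, target))
        edges[predicate].add((source, target))
    return nodes, edges


def nodes_by_functor(nodes):
    """Group nodes by functor, that is, by the kind of object they stand for."""
    groups = defaultdict(set)
    for node in nodes:
        groups[node[0]].add(node)
    return groups
```

Calling build_graph(FACTS) yields one edge set per relation (subclass, mem, calls, ref), and nodes_by_functor separates classes, functions, and variables, which is the same distinction the text draws between node kinds.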
CASE (computer-aided software engineering) tools usually provide a static visualization of the architecture of a software system. Hy+ permits a more dynamic approach to the visualization process, allowing different visualizations of the same component to be obtained through different GraphLog queries. In that respect, Hy+ can be used not only to visualize the structure of the class library, but also to explore and better understand that structure.

Any visualization can be queried. Figure 1 contains two examples of queries. The first query is what in GraphLog is called a filter query: a GraphLog expression enclosed in a showGraphLog box. It describes a pattern to be matched in the hygraph designated as the database. As can be seen, one of the edges in this pattern is thick, that is, distinguished. This is a visual way of distinguishing the edges that the user actually wants to see after the match is found. This particular filter query searches the database for subclasses C of class class('Object') that redefine some function F defined in class('Object'). From all objects that match this pattern, only the relation mem between the subclass C and the inherited function F is displayed to the user. Throughout the paper we use the convention that words beginning with an uppercase letter denote variables, whereas words beginning with a lowercase letter denote constants.

Figure 1 Examples of a filter and a define query

The other query in Figure 1 is a define query: a GraphLog expression in a defineGraphLog box. A thick edge here has a different meaning. It represents a relation that is defined every time the pattern is found in the hygraph designated as the database. The define query in Figure 1 defines the relation uses between a class C1 and another class C2, whenever C1 contains a function F1 that directly or indirectly calls a function F2, which is defined in C2.

Hygraphs extend graphs by using blobs in addition to edges to represent relationships among nodes. A blob in a hygraph represents a relation between a node, called the container node, and a set of other nodes, called the contained nodes. Blobs are hence generalizations of edges and can be used to cluster related nodes together. Visually they are represented as a rectangular area associated with the container node. The define query of Figure 2 demonstrates how we can change the representation of a relationship from edges to blobs. The blob called members clusters the member functions of a class together: a function F defined at line L is enclosed in the blob associated with class(C), whenever mem(class(C), function(F,L)). A blob relation can be treated similarly to an edge relation. The second hygraph in Figure 2 is a filter query that generates a hygraph containing all the members blobs and all the subclass, calls, and uses edges that exist in the database.

Figure 2 Definition and filtering of members
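The two define queries described above can be read operationally. The Python sketch below is one such reading, under stated assumptions: relations are plain sets of pairs, node names are short hypothetical strings and tuples, and the functions define_uses and define_members are names invented here. It is not how Hy+ evaluates queries (the system translates them to logic programs); it only shows what the derived uses edges and members blobs contain. Pairs with C1 equal to C2 are not filtered out, which is an assumption, since the text does not say whether the two class nodes must be distinct.

```python
from collections import defaultdict


def transitive_closure(pairs):
    """Return all (a, b) such that b is reachable from a in one or more steps."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in pairs if b == c}
        if new <= closure:
            return closure
        closure |= new


def define_uses(mem, calls):
    """uses(C1, C2): C1 has a member F1 that directly or indirectly calls F2 in C2."""
    calls_plus = transitive_closure(calls)
    return {
        (c1, c2)
        for (c1, f1) in mem
        for (caller, f2) in calls_plus if caller == f1
        for (c2, member) in mem if member == f2
    }


def define_members(mem):
    """members blob: container class -> set of contained member functions."""
    blobs = defaultdict(set)
    for cls, member in mem:
        if member[0] == "function":  # variables are members too, but not clustered here
            blobs[cls].add(member)
    return dict(blobs)


# Hypothetical data: three classes with one function each, and a call chain f -> g -> h.
F, G, H = ("function", "f", 1), ("function", "g", 5), ("function", "h", 9)
MEM = {("A", F), ("B", G), ("C", H), ("A", ("variable", "x"))}
CALLS = {(F, G), (G, H)}
```

With this data, define_uses(MEM, CALLS) contains (A, B), (A, C), and (B, C); the pair (A, C) exists only because the call from f to h is indirect. define_members(MEM) maps each class to the set of its member functions, which is the container and contained-nodes structure of a blob.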
Figure 3 Hy+ architecture overview
Components shown: Layout, Textual Label Editor, Functor Iconmap, Predicate Colormap, External Layout, Visual Attributes, Data Acquisition, Query Evaluator, Query Translation, Answer Management, Backend Communication, DB Backends, GXF Reader/Writer, Facts Reader/Writer.
System architecture and implementation
Hy+ has evolved from the G+ Visual Query System presented in Reference 7. The new system is implemented as a front end, written in Smalltalk, that communicates with other programs to carry out its tasks, including multiple database backends for the actual evaluation of the queries. An overview of the Hy+ system architecture is given in the diagram in Figure 3.
Hygraph browsers. Hy+ browsers allow users to interact with the hygraph-based visualizations that the system manipulates. They have extensive facilities for interactively editing hygraphs, including copy, cut and paste, panning and zooming, and textual editing of node and edge labels. Icons are automatically selected for nodes according to the functor of the node label (that is, the type of data object represented by the node). Similarly, the colors for edges and blobs are automatically selected based on the predicate in the label (that is, the relationship represented by the edge or blob).

Within Hy+, all blobs associated with one container node are represented by rectangles contained within a rectangular region that has the container node in the topmost left corner. Blob labels are drawn in the interior of the topmost left corner of the rectangle representing the blob. Control over the level of detail displayed is achieved by interactively hiding and showing blob contents. When blob contents are hidden, the incoming and outgoing edges are also hidden (but there are options to present summaries of the information carried by the hidden edges).

Query processing. Query processing in Hy+ is performed by translating queries (and data, if necessary) into logic programs suitable for execution by one of two backends: the logic programming language LDL of Microelectronics and Computer Technology Corporation (MCC)⁸ and the experimental deductive language CORAL of the University of Wisconsin.⁹ In this subsection, we give a more precise definition of GraphLog, and we describe how query processing proceeds within the Hy+ system.
In GraphLog, a term is one of the following: a constant, a variable, an anonymous variable (as in Prolog), an aggregate function f ∈ {MAX, MIN, COUNT, SUM, AVG} applied to a variable, or a functor f applied to a number of terms. An edge (blob) label is a path regular expression E generated by the following grammar, where T is a sequence of terms and p is a predicate:

$$E \leftarrow E \mid E;\quad E.E;\quad -E;\quad (E);\quad E^{+};\quad E^{*};\quad E?;\quad p(T);\quad \neg p(T)$$
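The grammar can be illustrated by evaluating a path regular expression over a set of labeled edges. The Python sketch below assumes the conventional reading of the operators (alternation, composition, inversion of edge direction, one-or-more closure, zero-or-more closure, and optionality); the text gives only the grammar, so these semantics, the tuple encoding of expressions, and the function names are assumptions made for illustration. Negated literals and predicate arguments other than the two end points are left out.

```python
def compose(left, right):
    """Relational composition: (a, c) when (a, b) is in left and (b, c) is in right."""
    return {(a, c) for (a, b) in left for (b2, c) in right if b == b2}


def closure(pairs):
    """One-or-more-step transitive closure of a set of node pairs."""
    result = set(pairs)
    while True:
        step = compose(result, pairs) - result
        if not step:
            return result
        result |= step


def evaluate(expr, edges, nodes):
    """Return the set of node pairs connected by a path matching expr.

    expr is a nested tuple such as ("seq", ("pred", "mem"), ("plus", ("pred", "calls"))).
    edges maps a predicate name to a set of (source, target) pairs.
    """
    op = expr[0]
    identity = {(n, n) for n in nodes}
    if op == "pred":
        return set(edges.get(expr[1], ()))
    if op == "alt":    # E | E
        return evaluate(expr[1], edges, nodes) | evaluate(expr[2], edges, nodes)
    if op == "seq":    # E . E
        return compose(evaluate(expr[1], edges, nodes), evaluate(expr[2], edges, nodes))
    if op == "inv":    # -E, read here as reversing the direction of the path
        return {(b, a) for (a, b) in evaluate(expr[1], edges, nodes)}
    if op == "plus":   # E+
        return closure(evaluate(expr[1], edges, nodes))
    if op == "star":   # E*
        return closure(evaluate(expr[1], edges, nodes)) | identity
    if op == "opt":    # E?
        return evaluate(expr[1], edges, nodes) | identity
    raise ValueError(f"unknown operator: {op}")


# Hypothetical database: class A owns f, class B owns g, and f calls g.
NODES = {"A", "B", "f", "g"}
EDGES = {"mem": {("A", "f"), ("B", "g")}, "calls": {("f", "g")}}

# mem . calls+ . -mem : from a class, through a chain of calls, back up to a class.
USES_EXPR = ("seq", ("pred", "mem"),
             ("seq", ("plus", ("pred", "calls")), ("inv", ("pred", "mem"))))
```

Evaluating USES_EXPR on this small database connects A to B, which matches the verbal description of the uses relation given in the tour. Treating each expression as a binary relation over nodes keeps the evaluator short, at the cost of recomputing subexpressions that a real query processor would share.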
Database instances are hygraphs whose nodes are labeled with ground terms and whose edges and blobs are labeled with predicates. Database instances of the object-oriented or relational model can easily be visualized as hygraphs. For example, an edge (blob) labeled $p(X)$ from a node labeled $T_1$ to a node (containing a node) labeled $T_2$ corresponds to tuple $(T_1, T_2, X)$ of relation p in the relational model. No key is associated with the relation p.

Queries are sets of hygraphs whose nodes are labeled by terms, and each edge (blob) is labeled by an edge (blob) label. As explained in the previous section, there are two types of queries: define and filter. In both, the query hygraph represents a pattern; the query evaluator searches the hygraph designated as the database for all occurrences of that pattern. The difference between the two types of queries stems from their interpretation of distinguished elements, explained below.

A hygraph pattern in a define query (which is enclosed in a defineGraphLog box) must have only one distinguished edge or blob labeled by a positive literal. The meaning of the define query hygraph is to define the predicate in this distinguished literal in terms of the rest of the pattern. The semantics of define queries is given by a translation to stratified Datalog. Each define hygraph translates to a rule with the label of the distinguished edge or blob in the head and as many literals in the body as there are nondistinguished edges and blobs in the hygraph. Additional rules may be necessary to define the predicates of nondistinguished edges or blobs that are labeled by regular expressions. The generation of these additional rules is based on the structure of the regular expression. An alternative translation is described in Reference 10.

A hygraph pattern in a filter query (which is enclosed in a showGraphLog box) may have several distinguished nodes, edges, and blobs. The meaning of a filter query hygraph is: for each instance of the pattern found in the database, retain the database objects that match the distinguished objects in the query. Given a hygraph in a showGraphLog box, for each distinguished edge (blob), we generate a set of define queries that match the distinguished object; that is, when they are evaluated, they determine all instances of the edge (blob) that exist in the portions of the database that match the hygraph pattern. The query evaluator evaluates each of the define queries in turn. The results are combined, and the answer to the filter query is found.
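The difference between the two query types can be sketched with a small pattern matcher. The Python code below is a direct reading of the stated meanings, not the translation to Datalog or to sets of define queries that the system performs. Everything named here is hypothetical: literals and facts are (predicate, source, target) tuples, strings that start with an uppercase letter are variables (following the convention used in this paper), the subclass literal is assumed to list the subclass first, and the derived predicate redefines is invented for the example. Path regular expressions and negation are not handled.

```python
def is_var(term):
    """Variables are strings beginning with an uppercase letter."""
    return isinstance(term, str) and term[:1].isupper()


def match_term(pattern, ground, binding):
    """Extend binding so that pattern equals ground, or return None on mismatch."""
    if is_var(pattern):
        if pattern in binding:
            return binding if binding[pattern] == ground else None
        return {**binding, pattern: ground}
    if isinstance(pattern, tuple):
        if not isinstance(ground, tuple) or len(pattern) != len(ground):
            return None
        for sub_pattern, sub_ground in zip(pattern, ground):
            binding = match_term(sub_pattern, sub_ground, binding)
            if binding is None:
                return None
        return binding
    return binding if pattern == ground else None


def matches(body, facts, binding=None):
    """Yield every binding under which all body literals occur among the facts."""
    binding = {} if binding is None else binding
    if not body:
        yield binding
        return
    for fact in facts:
        extended = match_term(body[0], fact, binding)
        if extended is not None:
            yield from matches(body[1:], facts, extended)


def substitute(term, binding):
    """Replace the variables in term by their bound values."""
    if is_var(term):
        return binding[term]
    if isinstance(term, tuple):
        return tuple(substitute(t, binding) for t in term)
    return term


def define(head, body, facts):
    """Define query: derive one new head tuple per match of the body."""
    return {substitute(head, b) for b in matches(body, facts)}


def show(distinguished, body, facts):
    """Filter query: keep only the distinguished literals of each match."""
    kept = set()
    for b in matches(body, facts):
        kept.update(substitute(literal, b) for literal in distinguished)
    return kept


# Hypothetical database; constants are lowercase so they are not read as variables.
FACTS = [
    ("subclass", ("class", "set"), ("class", "object")),
    ("subclass", ("class", "bag"), ("class", "object")),
    ("mem", ("class", "object"), ("function", "is_equal", 10)),
    ("mem", ("class", "set"), ("function", "is_equal", 42)),
    ("mem", ("class", "set"), ("function", "add", 57)),
    ("mem", ("class", "bag"), ("function", "add", 80)),
]

# Pattern: a subclass C of object that has its own definition of a function F of object.
BODY = [
    ("subclass", ("class", "C"), ("class", "object")),
    ("mem", ("class", "object"), ("function", "F", "L0")),
    ("mem", ("class", "C"), ("function", "F", "L1")),
]
```

Here show([BODY[2]], BODY, FACTS) returns the single existing mem edge from class(set) to its own is_equal function, while define(("redefines", ("class", "C"), "F"), BODY, FACTS) produces a new tuple that was not in the database. The first keeps part of each match and the second derives a conclusion from it.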
From a logic programming point of view, a define query corresponds to a conventional set of Horn clauses defining a certain predicate, whereas a filter query can be viewed as a set of Horn clause bodies in which certain literals are retained after each match and the rest are discarded. In a way, define queries generate theorems, while filter queries generate proofs. A formal definition of filter queries and filtering logic programs can be found in Reference 3.

Figure 4 Examples with aggregate functions
Adapted from M. P. Consens and M. Z. Hasan, "Supporting Network Management Through Declaratively Specified Data Visualizations," Proceedings of the IEEE/IFIP Third International Symposium on Integrated Network Management, III

GraphLog has the ability to collect multisets of tuples and to compute aggregate functions on them. The aggregate functions supported in GraphLog are the unary operators MAX, MIN, COUNT, SUM, and AVG. They are allowed to appear in the arguments of the distinguished relation of a define query as well as in its incident nodes. As an example of the use of aggregation in GraphLog, consider the two defineGraphLog blobs of Figure 4. The first one defines the relation quant-uses between two classes. Relation quant-uses is different from relation uses of Figure 1; the additional attribute in quant-uses defines the degree of coupling between the two classes. Thus, quant-uses(C1,C2,COUNT(F2)) is defined between C1 and C2, whenever C1 contains a function F1 such that F1 calls COUNT(F2) functions F2 in class C2. The second pattern defines the relation avg-usage to measure the average usage of each class as follows: avg-usage is defined between a class C1 and a number AVG(N), where AVG(N) is the average of all numbers N that count the number of calls from functions of class C1 to functions of some other class C2. Note that there is no explicit GROUP-BY list (as in SQL, Structured Query Language). Instead, grouping is done implicitly over all variables appearing in the distinguished edge or its end points.

GraphLog has higher expressive power than SQL; in particular, it can express, with no need for an explicit recursion construct, queries that involve computing transitive closures or similar graph traversal operations. The language is also capable of expressing first-order aggregation queries, as well as aggregation along path traversals (for example, shortest path queries).¹¹

Data acquisition. The Hy+ system relies on other programs (which are part of the Data Acquisition subsystem) to supply the raw data to be visually manipulated within the system. The File Manager subsystem can directly import files containing logical facts (like the ones produced by the XL C++ compiler for the NIH database of the previous section). These files can also be obtained from relational and deductive databases.

External hygraph representation. Graph Exchange Format (GXF) is a specification for a portable external representation for directed graphs and hygraphs.¹² Hy+ can read and write graphs in GXF format from and to UNIX** files, and uses optional records or "extensions" of GXF to store all layout and display-related information. Processor programs may use these Hy+-specific extensions to manipulate the visual representation of a hygraph, for example, for layout tasks.
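Returning to the aggregate define queries of Figure 4, the implicit grouping rule can be sketched in Python as follows. This is a hedged illustration, not the evaluation performed by the backends. Several choices are assumptions: only direct calls are followed, COUNT is taken over distinct called functions F2 (the text speaks of multisets but does not settle how duplicates are counted), each function is assumed to belong to exactly one class, and the class and function names are invented. The Python identifiers use underscores in place of the hyphens printed in the relation names.

```python
from collections import defaultdict
from statistics import mean


def quant_uses(mem, calls):
    """quant_uses(C1, C2, COUNT(F2)), grouped implicitly by the end points (C1, C2)."""
    owner = {member: cls for cls, member in mem}  # assumes one owning class per function
    called = defaultdict(set)
    for f1, f2 in calls:
        if f1 in owner and f2 in owner:
            called[(owner[f1], owner[f2])].add(f2)
    return {(c1, c2, len(f2s)) for (c1, c2), f2s in called.items()}


def avg_usage(quant):
    """avg_usage(C1, AVG(N)): average of the counts N from C1 to each other class C2."""
    counts = defaultdict(list)
    for c1, c2, n in quant:
        if c1 != c2:  # "some other class": calls within a class are ignored (assumption)
            counts[c1].append(n)
    return {c1: mean(ns) for c1, ns in counts.items()}


# Hypothetical data: class a calls two functions of class b and one function of class c.
MEM = {("a", "f1"), ("a", "f2"), ("b", "g1"), ("b", "g2"), ("c", "h1")}
CALLS = {("f1", "g1"), ("f1", "g2"), ("f2", "g1"), ("f2", "h1")}
```

For this data, quant_uses(MEM, CALLS) is {("a", "b", 2), ("a", "c", 1)} and avg_usage of that result maps a to 1.5. No grouping list is passed anywhere: the grouping keys are exactly the variables that remain in the derived tuple, which mirrors the rule stated in the text.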
Figure 5 Synchronized graphical and textual browsing of source code