Comparative Study of Various Graph Layout Algorithms - IJETTCS

International Journal of Emerging Trends & Technology in Computer Science (IJETTCS) Web Site: www.ijettcs.org Email: editor@ijettcs.org Volume 3, Issue 2, March – April 2014 ISSN 2278-6856

Comparative Study of Various Graph Layout Algorithms

Mayur Narkhede¹, Dr. S. T. Patil² and Vrushali Inamdar³

¹ M.Tech CSE/IT, Vishwakarma Institute of Technology, Pune, India
² Prof., Vishwakarma Institute of Technology, Pune, India
³ Team Lead, Persistent Labs, Pune, India

Abstract:

Information visualization has become a large field that is extensively useful to users in domains such as biology, computer network visualization and social networking. Information visualization and graph drawing use a basic data structure called a graph. As technology develops, more information needs to be analyzed, which calls for mechanisms that present complex data to the user as a graph drawing. Presenting such complex, grouped data is known as clustered graph visualization, in which the user needs to view the relations among the various clusters and their hierarchy. To start studying clustered graph algorithms, however, we need a basic understanding of the different graph visualization mechanisms. This article is intended for beginners who are interested in developing their own clustered graph algorithms. It addresses analytical questions such as which graph parameters should be considered, whether edges should be directed or undirected, which aesthetics should be considered, how to maintain the hierarchy between clusters, and which type of grid should be chosen. These parameters are analyzed with respect to clustered graph algorithms, using the existing basic layout algorithms as a reference.

Keywords: Graph layout, information visualization, graph drawing, clustered graph, circular layout, tree layout, force based layout, layout algorithm.

1. INTRODUCTION

Networks are increasingly encountered in numerous fields of study. A wide variety of situations can be modeled using networks (i.e. graphs), and many data sets are most naturally interpreted and depicted as networks. Graph visualization can be used to analyze various types of data in computing and in the emerging era of the Internet of Things [5]. The graph is the structure on which data visualization is implemented, and it has also made clustered visualization possible in practical graph drawing scenarios [4]. Comprehensive surveys of techniques for information visualization are available, and an entire discipline called graph drawing has matured. Despite the availability of such software, researchers, students and others who are competent at programming may wish to implement their own network visualizations and drawings. This may be to implement graph visualization on a new computing platform, or to integrate visualization within a larger software application. It may also be to learn the details of network visualization, possibly as the first step of a research project. Fortunately, there are some basic network layout algorithms that are easy to understand and implement. This article discusses such algorithms and their usage under various requirements in different fields of study, and gives sufficient detail for a competent programmer to implement them.

In information visualization, the information the user wants to visualize can be stored in a graph structure, where a graph is a set of nodes and edges that may itself contain child graphs within a hierarchy. We use the term network as a synonym for graph, defined as an ordered pair (N, E) of a set N of nodes and a set E of edges. Two nodes n1, n2 ∈ N are adjacent if and only if there exists an edge {n1, n2} ∈ E, in which case n1 and n2 are neighbors [4]. The degree of a node is the number of nodes adjacent to it, i.e. connected to it by an edge. The most common graphical representation of a network is a node-edge diagram, where each node is shown as a point, square, circle, ellipse or some other small graphical object, and each edge is shown as a line segment or curve connecting two nodes [4].

Drawing directed graphs is an important data visualization technique for layout algorithms. Each entity is taken as a node of the directed graph, and the dependency relationships between entities are expressed by the edges between nodes. Edge direction is determined by the dependency relationship: if entity A uses a service provided by entity B, then entity A depends on entity B, and there is a directed edge from the node representing A (the source node) to the node representing B (the target node). Many sophisticated algorithms exist for computing the positions of nodes and edges in such diagrams. We discuss some basic algorithms commonly used in information visualization, considering their various parameters.

1.1 Typical Application Areas

Graph visualization and drawing have various applications in different areas.
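The definitions given earlier in this introduction (a network as a pair (N, E), adjacency, degree and directed dependency edges) map naturally onto a small data structure. The Python sketch below is one possible representation, not taken from the paper; the class name `Network` and its methods are hypothetical, and treating adjacency and degree as direction-independent is an assumption.

```python
from collections import defaultdict


class Network:
    """Network (N, E): a node set N and an edge set E (hypothetical helper)."""

    def __init__(self, directed=False):
        self.directed = directed
        self.nodes = set()
        self.edges = set()
        self._neighbors = defaultdict(set)  # node -> adjacent nodes

    def add_edge(self, source, target):
        """Add an edge; in a directed network, `source` depends on `target`."""
        self.nodes.update((source, target))
        if self.directed:
            self.edges.add((source, target))             # ordered pair
        else:
            self.edges.add(frozenset((source, target)))  # unordered pair {n1, n2}
        self._neighbors[source].add(target)
        self._neighbors[target].add(source)

    def adjacent(self, n1, n2):
        """True iff some edge joins n1 and n2 (direction ignored)."""
        return n2 in self._neighbors.get(n1, ())

    def degree(self, node):
        """Number of nodes adjacent to `node`."""
        return len(self._neighbors.get(node, ()))
```

For example, on a directed network `g.add_edge("A", "B")` records that entity A depends on entity B; `g.adjacent("B", "A")` is still true because adjacency ignores direction.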
Most people have encountered a file hierarchy on a computer system. A file hierarchy can be represented as a tree (a type of graph). It is frequently necessary to traverse the hierarchy to find a particular file, and anyone who has done this has probably experienced some of the complications involved in graph visualization: "Where am I?" and "Where is the file that I'm looking for?" Other familiar types of graphs include the hierarchy illustrated in an organizational chart and taxonomies that represent the relations between species. Web site maps and browsing histories are further applications of graphs. In chemistry and biology, graphs are applied to molecular maps, protein functions, evolutionary trees, genetic maps, phylogenetic trees and biochemical pathways. Other areas of application include object-oriented systems (class browsers), data flow diagrams, data structures (compiler data structures in particular), subroutine-call graphs, entity relationship diagrams (e.g. UML and database structures), real-time systems (state-transition diagrams, Petri nets), VLSI (circuit schematics), virtual reality (scene graphs), semantic networks and knowledge-representation diagrams, project management (PERT diagrams), logic programming (SLD-trees) and document management systems [5]. The information is not always guaranteed to be in a purely hierarchical format, which necessitates techniques that can deal with more general graphs than trees.

1.2 Key Issues in Graph Visualization

The size of the graph to view is a key issue in graph visualization and drawing. Large, complex graphs pose a number of difficult problems. Even if it is possible to lay out and display all the elements, the problem of viewability or usability arises, because it becomes difficult to discern between nodes and edges [4]. If the number of components is large, it can compromise performance or even reach the limits of the viewing platform. In fact, usability becomes an issue even before the limit of discernibility is reached. Modelling the clusters within the whole structure is also difficult, since clusters add complexity to the implementation of the algorithm.
It is well known that comprehension and detailed analysis of data in graph structures are easiest when the displayed graph is small. In general, laying out an entire large graph may give an indication of the overall structure or of a position within it, but makes the details difficult to comprehend.

1.3 Aesthetics

Aesthetics are the criteria of a graph layout that determine its complexity and usability in visualization [11]. The following aesthetics are considered in most layouts:
Bends: increasing the number of edge bends reduces the understandability of the graph.
Crossings: increasing the number of crossings between edges reduces the understandability of the graph.
Angles: maximizing the minimum angle between edges leaving a node improves the readability of the drawing.
Orthogonality: fixing the nodes and edges of the graph to an orthogonal grid increases the understandability of the graph.
Symmetry: increasing symmetry makes the layout more understandable.
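Of these aesthetics, edge crossings are the easiest to measure automatically. As a minimal sketch, assuming straight-line edges and node positions given as (x, y) pairs, crossings can be counted by testing every pair of edges for a proper segment intersection with the standard orientation (cross-product) test. The function names are illustrative, and this brute-force check is quadratic in the number of edges, which is adequate only for small drawings.

```python
from itertools import combinations


def _orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    value = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (value > 0) - (value < 0)


def segments_cross(a, b, c, d):
    """True if segments ab and cd properly cross (touching or collinear cases excluded)."""
    return (_orientation(a, b, c) * _orientation(a, b, d) < 0
            and _orientation(c, d, a) * _orientation(c, d, b) < 0)


def count_crossings(positions, edges):
    """Count pairwise crossings among straight-line edges.

    positions: node -> (x, y); edges: iterable of (u, v) pairs.
    Edges sharing an endpoint meet at a node, so they are not counted.
    """
    crossings = 0
    for (u1, v1), (u2, v2) in combinations(list(edges), 2):
        if {u1, v1} & {u2, v2}:
            continue
        if segments_cross(positions[u1], positions[v1], positions[u2], positions[v2]):
            crossings += 1
    return crossings
```

On a unit square, the two diagonals give one crossing, while the four sides give none.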

2. LAYOUT ALGORITHMS

2.1 Circular Layout

Circular layout places the objects on the circumference of a circle. Some circular drawings also place hub nodes, i.e. nodes connected to many other nodes, at the center of the circle [1]. Laying out the objects becomes difficult when node sizes vary, so a circular layout mainly depends on calculating a radius that accommodates nodes of variable size. Edges can be drawn either as straight lines or as curves; we consider straight-line edges because they enhance the readability of the graph. The layout requires an ordering of the nodes that keeps the number of edge crossings small [2],[12]. Because the nodes must be arranged on the circumference, the area over which they can spread is restricted; this can cause edge crossings, so crossings cannot be reduced beyond a certain extent.
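The circular placement can be sketched in Python as follows. Each node i is placed at (cx + r cos θi, cy + r sin θi); the radius r is chosen so that the circumference equals the sum of the node sizes plus a fixed gap per node, and each node's angular slot is proportional to its size. The ordering follows the degree-based heuristic described later in this section (highest-degree node first, then its neighbors). The function names, the `gap` parameter and the treatment of a node's size as the diameter of its bounding circle are assumptions for illustration, not the authors' implementation.

```python
import math


def degree_order(adjacency):
    """Highest-degree unplaced node first, immediately followed by its unplaced
    neighbours (also by decreasing degree). `adjacency` must list every node."""
    degree = {n: len(nbrs) for n, nbrs in adjacency.items()}

    def rank(n):
        return (-degree[n], str(n))  # ties broken by name for determinism

    order, placed = [], set()
    for node in sorted(adjacency, key=rank):
        if node in placed:
            continue
        order.append(node)
        placed.add(node)
        for nbr in sorted(adjacency[node], key=rank):
            if nbr not in placed:
                order.append(nbr)
                placed.add(nbr)
    return order


def circular_layout(order, sizes, center=(0.0, 0.0), gap=10.0):
    """Return (radius, positions) with the nodes of `order` on one circle."""
    cx, cy = center
    arcs = [sizes[n] + gap for n in order]   # arc length reserved per node
    radius = sum(arcs) / (2 * math.pi)       # circumference = sum of the arcs
    positions, angle = {}, 0.0
    for node, arc in zip(order, arcs):
        half_span = arc / (2 * radius)       # half of the node's angle (radians)
        angle += half_span                   # centre of this node's slot
        positions[node] = (cx + radius * math.cos(angle),
                           cy + radius * math.sin(angle))
        angle += half_span                   # end of this node's slot
    return radius, positions
```

Shifting a cluster only changes `center`, and enlarging `gap` increases the radius and spreads the nodes, which mirrors the tweaks the text describes.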

Figure 1. Simple circular layout showing variable node sizes and proper ordering of nodes.

Circular layout is used in applications that require visualizing information with more emphasis on the aesthetics of the drawing. Aesthetics show designers of layout algorithms the most effective way to present relational information. Figure 1 shows a circular layout for a simple graph with variable node sizes. An inherent problem with circular layouts is that the rigid constraint on node placement often gives rise to long edges and an overall dense drawing. The circular layout is mainly used for small and medium-sized data analysis, and combines the main advantages of radial and hierarchical layouts. As a result, it achieves high space utilization in the same proportion, with a clearer hierarchy structure and fewer crossings, and clusters are displayed more clearly.

A simple circular layout depends on several parameters that must be considered at layout time: the ordering of the nodes determines the crossings within the circle, and the size of every node determines the radius and circumference of the circle. The distance between adjacent nodes should be uniform. All these parameters are handled by tweaking the following calculations for every node:
LaidOutX = CenterX + Radius × cos(Angle)
LaidOutY = CenterY + Radius × sin(Angle)
To shift the center of a cluster, modify the center coordinates in these equations. The spread between the nodes is adjusted by increasing the radius, and the next node is placed at an angular offset from the previous one. In this way the circular layout is tuned by modifying these equations. The ordering of the nodes is also a critical problem that must be solved in the circular layout. We use an ordering based on node degree: it selects the highest-degree node first, places its adjacent nodes immediately after it, and then applies the circular placement to all nodes in the resulting ordering list. Calculating the radius of the circle is also difficult, because the size of every node entity in the graph is variable, and the boundaries of clusters need to be calculated from the size parameters of the graph entities.

2.2 Tree Layout

The standard top-down drawing of trees has major advantages and is widely used in many applications. Its use for presenting hierarchies benefits from its natural interpretation. In the improved tree layout algorithm, the user can designate an entity as the root node according to known information. The layout is then carried out so that each child of the root node becomes the parent of the nodes at the next level.
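Designating a root and deriving the parent-child levels from it can be done with a breadth-first search. The sketch below is an illustration, assuming the graph is given as an adjacency dictionary; `root_tree` is a hypothetical helper. Because each node is visited only once, edges that would close a cycle are ignored, which also keeps the traversal from looping on cyclic input.

```python
from collections import deque


def root_tree(adjacency, root):
    """Turn a connected graph into a rooted tree by BFS from `root`.

    Returns (children, depth): children maps each node to its child list,
    depth maps each node to its level (root = 0).
    """
    children = {root: []}
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in sorted(adjacency.get(node, ()), key=str):
            if nbr not in depth:  # first visit: node becomes nbr's parent
                depth[nbr] = depth[node] + 1
                children[node].append(nbr)
                children[nbr] = []
                queue.append(nbr)
    return children, depth
```

Choosing a different root simply changes which node sits at level 0; the rest of the hierarchy follows from the BFS levels.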
The hierarchy structure is clear, so it is easy to view the association properties of particular entities. This algorithm is broadly divided into four steps. First, determine the layer to which each node belongs according to the direction of the edges. Then, adjust the order of the nodes in each layer to reduce crossings in the whole structure. After that, adjust the location of the nodes in each layer to shorten the edges, and finally draw the edges. The computational complexity of the algorithm is O( ). The advantages of this algorithm are a clear hierarchy of dependency relationships and fewer crossings. However, when the number of nodes is large, the drawing occupies a relatively large area, which makes it inconvenient to display all nodes in an overall view. The tree layout is suitable for large and medium-sized data sets that have a clear hierarchy.

1) Assign y coordinates:
   1. yCurrentCoordinate ← 0
   2. yIncrement ← (Y space available − margin) / depth of the graph
   3. For every vertex V in in-order traversal:
      3.1 yCurrentCoordinate ← depthOfVertex(V) × yIncrement
      3.2 Assign yCurrentCoordinate as the y coordinate of V.
2) Distance(Vertex U, Vertex V): the application-specific distance function logic is defined here.
   1. distance ← |Depth(U) − Depth(V)| × distanceConstant (layout width parameter)
   2. Return distance.
3) Assign x coordinates:
   1. Set the width needed by every vertex, including its child vertices.
   2. For each node in BFS order, starting from rootNode:
      2.1 Distribute the node's space among its child nodes in proportion to their width requirements.
      2.2 Draw the child vertices at the center of their assigned horizontal space.
   Here, space means a horizontal x-coordinate range, and width is an integer based on the number of child vertices that spread the tree horizontally.
4) Display layout: display the layout on the rectangular grid canvas according to the x, y coordinates of the nodes and edges.

In the first step, the algorithm computes the y coordinate of every node in the graph. The coordinates are assigned in in-order traversal sequence, so the child nodes of any node in the graph are distributed evenly below it, which improves the symmetry of the layout. The x coordinates are assigned afterwards. Assigning the x coordinates last is a distinctive feature of this algorithm: it allows the application-specific Distance(U, V) function to apply custom distance properties to the layout, computing the distances between nodes before the x coordinates are assigned. This layout is easy to interpret for tree-like graphs because of its horizontal and vertical structure, and it leaves space for edge annotations, a useful feature that the horizontal tree algorithm provides explicitly. On the other hand, it is not suitable for cyclic graphs.
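The y- and x-coordinate steps above can be sketched in Python as follows, assuming the tree is already rooted and given as a `children` dictionary. Two details are assumptions: a vertex's width requirement is taken to be the number of leaves in its subtree, and the application-specific Distance(U, V) hook is omitted. Function names and default canvas sizes are illustrative.

```python
from collections import deque


def subtree_widths(children, root):
    """Width requirement of every vertex: the number of leaves below it (assumption)."""
    width = {}

    def visit(v):
        kids = children.get(v, [])
        width[v] = sum(visit(c) for c in kids) if kids else 1
        return width[v]

    visit(root)
    return width


def tree_layout(children, root, x_space=800.0, y_space=600.0, margin=40.0):
    """Return {vertex: (x, y)} for a rooted tree given as {parent: [children]}."""
    # Step 1: y coordinate = depth * yIncrement (BFS also gives the step-3 order).
    depth, order, queue = {root: 0}, [root], deque([root])
    while queue:
        v = queue.popleft()
        for c in children.get(v, []):
            depth[c] = depth[v] + 1
            order.append(c)
            queue.append(c)
    max_depth = max(depth.values())
    y_increment = (y_space - margin) / max_depth if max_depth else 0.0
    y = {v: d * y_increment for v, d in depth.items()}

    # Step 3: split each vertex's horizontal range among its children by width.
    width = subtree_widths(children, root)
    space = {root: (0.0, x_space)}
    x = {root: x_space / 2}
    for v in order:                       # BFS order: parents before children
        left, right = space[v]
        kids = children.get(v, [])
        total = sum(width[c] for c in kids)
        span = right - left
        for c in kids:
            share = span * width[c] / total
            space[c] = (left, left + share)
            x[c] = left + share / 2       # centre of the assigned range
            left += share
    return {v: (x[v], y[v]) for v in order}
```

For a root r with children a and b, where a has two children, a receives two thirds of the horizontal space and b one third, so each parent ends up centered above its subtree.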
A cyclic graph does not have a tree structure, so a recursive tree layout applied to it can recurse indefinitely. The implementation is challenging for clustered graphs, but a tree layout can be devised at the cost of more inter-cluster crossings. Because it is a very simple layout, it is difficult to depict complex clustered graphs with it. Also, trees with large breadth stretch the layout vertically, which decreases the readability of the laid-out graph.

2.3 Force Based Layout

We explore a basic force-directed layout algorithm with certain additions that satisfy the general drawing conventions of clustered graphs. This clustered graph model extends the graph structure to a set of nodes, edges and clusters [3]. The elementary idea of the layout algorithm is to simulate a physical system in which nodes are physical objects with a certain electrical charge, connected via springs of a prespecified desired length [3],[12]. Objects attract or repel each other depending on the current lengths of the connected springs. Repulsive forces act between nodes that are too close to each other, so that the layout contains very few node-to-node overlaps.

Figure 2. Force-based layout for a simple graph; circles are the nodes of the graph and lines indicate the edges between them.

We consider gravitational forces to keep the graph components together: the gravitational force acts on the components of the graph and keeps them bounded. To handle varying node sizes (clusters and variable-size nodes) and to avoid overlaps with neighboring nodes, edge lengths are calculated from the part of each edge between the borders of its end nodes, as opposed to their centers. Finally, the total energy of the system is calculated by considering all the forces acting on it; the state whose total energy is minimal among the candidate solutions is taken as the optimal layout. Figure 2 shows a force-based layout applied to a simple graph, producing an aesthetically pleasing drawing through the forces acting among nodes and edges.

Force-based layout is applied in several stages, each concerned with specific forces. In the initialization stage, the node sizes and random initial node positions are calculated, together with threshold values for deciding convergence. The graph prepared for processing in this stage is called the skeleton graph [3]. The initialization stage visits each node of the graph once. The forces are then applied in three stages:
Stage 1: the skeleton graph is processed under the influence of the spring force only; other forces are not considered.
Stage 2: the skeleton graph, which was reduced in the initialization stage, is expanded level by level and kept under the influence of all the forces, i.e. gravitational, spring and repulsive forces.
Stage 3: the layout is stabilized and polished.

In the force-based layout of a clustered graph, each cluster is treated as a single node, which does not allow an edge between two cluster nodes or between a cluster node and a node belonging to that cluster; all such scenarios are allowed in a compound graph. Hence, in a clustered graph, no edge forces act on the cluster nodes, and this must be handled to avoid overlaps. We therefore need to decide at the start of the implementation whether to apply the forces per cluster, or to the whole graph without considering clusters and restore the clusters after the forces have been applied. In this way clusters add difficulty to layout algorithms.

Force Based Layout Algorithm: the force-based layout uses the following concepts:
1) Clipping Points: non-uniform node dimensions require force calculations to be based on clipping points rather than node centres. Figure 3 shows the clip points Px and Py for two nodes X and Y respectively.

Figure 3. Clipping point concept.

2) Spring Force: applied only between directly connected nodes. For an edge between nodes X and Y,

$$\vec{F}_s = k_s \left( \lVert P_Y - P_X \rVert - L \right) \hat{u}_{XY}$$

where $k_s$ is the elasticity constant, $P_X$ and $P_Y$ are the cut (clip) points at nodes X and Y, $\hat{u}_{XY}$ is the unit vector of $P_Y - P_X$, and $L$ is the desired spring length.

3) Repulsive Force: applied to every node within a certain range:

$$F_r = \frac{k_r}{d^2}$$

where $F_r$ is the repulsive force, $k_r$ is the repulsive force constant and $d$ is the distance between the clip points of the two nodes.

4) Gravitational Force: applied to nodes within the same cluster; it attracts the nodes towards the centre of the cluster.

5) Application Specific Force: sophisticated graph visualization applications require specific constraints to be integrated into layout algorithms, and these constraints may vary arbitrarily.

Algorithm: ApplyForceBasedLayout(Graph)
    Initialize graph M
    phase := 1
    if (layout is incremental) then        // respect current positions
        phase := 3
    while (phase ≤ 3) do
        step := maxIterCount[phase]        // use predefined iteration limits per phase
        while (step > 0 or (phase = 2 and !allTreesGrown of skeleton graph)) do
            Apply spring forces on graph M
            Apply repulsion forces on graph M
            if (phase > 1) then
                Apply gravitational forces on graph M
            Apply application-specific forces on graph M
            Calculate node positions and sizes according to the total force acting on each node
            if (phase = 2 and !allTreesGrown and step % treeGrowingStep = 0) then
                Grow trees one level of graph M
            step := step − 1
        phase := phase + 1

The force-based algorithm generates the drawing by computing all of these forces, summing them and applying the total force to each node [12]. This is repeated for many iterations until the graph is laid out.
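A compact Python sketch of the phased loop follows. It assumes rectangular nodes given by their half-width and half-height, a Hooke-type spring force k_s(l − L) and an inverse-square repulsion k_r/l² between all pairs (no range cutoff), where l is the distance between the two clip points, and a gravity term that pulls each node linearly towards its cluster's centroid. These force forms are the usual spring-embedder choices rather than formulas confirmed by the text; skeleton pruning, tree growing and application-specific forces are omitted, and all constants and names are hypothetical. Gravity is applied only after phase 1, and an incremental run starts directly at phase 3 from the given positions.

```python
import math
import random
from itertools import combinations


def clip_point(center, half, toward):
    """Point where the segment from `center` to `toward` leaves the node's rectangle.

    `half` is (half_width, half_height); the parameter is capped at 0.5 so the
    clip points of two overlapping nodes meet at the midpoint instead of crossing.
    """
    dx, dy = toward[0] - center[0], toward[1] - center[1]
    tx = half[0] / abs(dx) if dx else math.inf
    ty = half[1] / abs(dy) if dy else math.inf
    t = min(tx, ty, 0.5)
    return (center[0] + t * dx, center[1] + t * dy)


def clip_distance(pos, half, a, b):
    """Edge length measured between the borders (clip points) of nodes a and b."""
    return math.dist(clip_point(pos[a], half[a], pos[b]),
                     clip_point(pos[b], half[b], pos[a]))


def add_pair_force(force, pos, a, b, magnitude):
    """Positive magnitude pulls a and b together; negative pushes them apart."""
    dx, dy = pos[b][0] - pos[a][0], pos[b][1] - pos[a][1]
    d = math.hypot(dx, dy) or 1e-9
    fx, fy = magnitude * dx / d, magnitude * dy / d
    force[a][0] += fx
    force[a][1] += fy
    force[b][0] -= fx
    force[b][1] -= fy


def force_layout(half, edges, clusters, iters=(60, 60, 30), ideal=50.0, k_s=0.05,
                 k_r=2000.0, k_g=0.01, max_step=10.0, incremental=False,
                 start=None, seed=0):
    rnd = random.Random(seed)
    if incremental and start:
        pos = dict(start)                       # respect current positions
    else:
        pos = {n: (rnd.uniform(0, 400), rnd.uniform(0, 400)) for n in half}
    phase = 3 if incremental else 1
    while phase <= 3:
        for _ in range(iters[phase - 1]):       # predefined iteration limit per phase
            force = {n: [0.0, 0.0] for n in half}
            for a, b in edges:                  # spring: k_s * (l - ideal)
                add_pair_force(force, pos, a, b,
                               k_s * (clip_distance(pos, half, a, b) - ideal))
            for a, b in combinations(half, 2):  # repulsion: k_r / l^2, l floored at 1
                l = max(clip_distance(pos, half, a, b), 1.0)
                add_pair_force(force, pos, a, b, -k_r / l ** 2)
            if phase > 1:                       # gravity towards each cluster centroid
                for members in clusters:
                    cx = sum(pos[n][0] for n in members) / len(members)
                    cy = sum(pos[n][1] for n in members) / len(members)
                    for n in members:
                        force[n][0] += k_g * (cx - pos[n][0])
                        force[n][1] += k_g * (cy - pos[n][1])
            for n, (fx, fy) in force.items():   # move along the total force, capped
                mag = math.hypot(fx, fy)
                scale = min(1.0, max_step / mag) if mag else 0.0
                pos[n] = (pos[n][0] + fx * scale, pos[n][1] + fy * scale)
        phase += 1
    return pos
```

Capping each move at `max_step` is a simple stand-in for the convergence thresholds mentioned in the text; it keeps the strong repulsion between overlapping nodes from throwing them across the canvas.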

3. COMPARISON OF LAYOUT ALGORITHMS

Table 1 compares the three basic layout algorithms. These parameters are useful for anyone planning to implement complex graph layouts.

Table 1: Various parameters in layout algorithms

Parameter | Force based layout | Circular graph layout | Tree graph layout
Computations | More calculations | Fewer calculations | Fewer calculations
Crossings | Fewer edge crossings | More edge crossings are present | Very few crossings are present
Overlapping | Less overlapping | More overlapping can be present | Very little overlapping
Edge length | Edge length is important for calculating forces | May or may not differ | Handling is difficult
Packing | Packing is required | Other methods are required | Packing is required
Symmetry | Produces a symmetric structure | No symmetry | A more symmetric structure can be produced
Parameters handling | More parameters need to be taken care of | Fewer parameters | Fewer parameters
Iterations | A large number of iterations is required | Fewer iterations | Fewer iterations
Bends | Bends are not present | Not present | A bending mechanism must be handled

When implementing a clustered graph layout, we need to manage the trade-off between maintaining clusters and limiting crossings [3],[6],[11]. Two approaches are possible. In the first, the layout is applied to the whole graph and cluster proximity is enforced afterwards, which requires handling spacing and edge crossings across the whole graph. In the second, the layout is applied cluster by cluster, which leads to intra-cluster crossings and requires packing of the clusters.

4. CONCLUSION

We have studied and presented some basic layout algorithms that should be analyzed before developing any new, complex visualization mechanism. The parameters these elementary algorithms need to consider are also discussed. We have analyzed the features of each layout algorithm and offered suggestions on how to choose a layout algorithm for specific application areas.

REFERENCES

[1] J. M. Six and I. G. Tollis, “A framework and algorithms for circular drawings of graphs,” Journal of Discrete Algorithms, vol. 4, no. 1, pp. 25–50, 2006.
[2] E. R. Gansner and Y. Koren, “Improved Circular Layouts,” Proc. 14th Int’l Conf. Graph Drawing, pp. 386–398, 2006.
[3] U. Dogrusoz, E. Giral, A. Cetintas, A. Civril, and E. Demir, “A Layout Algorithm for Undirected Compound Graphs,” Information Sciences, vol. 179, pp. 980–994, 2009.
[4] M. J. McGuffin, “Simple Algorithms for Network Visualization: A Tutorial,” Tsinghua Science and Technology, vol. 17, no. 4, pp. 383–398, Aug. 2012.
[5] I. Herman, G. Melançon, and M. S. Marshall, “Graph Visualization and Navigation in Information Visualization: A Survey,” IEEE Transactions on Visualization and Computer Graphics, vol. 6, 2000.
[6] T. Dwyer and F. Schreiber, “Optimal leaf ordering for two and a half dimensional phylogenetic tree visualization,” in N. Churcher and C. Churcher, Eds., Information Visualisation (Proc. invis.au 2004), Conf. Res. Pract. Information Technology, vol. 35, pp. 109–115, 2004.
[7] B. H. McCormick, T. A. DeFanti, and M. D. Brown, “Visualization in Scientific Computing,” ACM Press, 1987.
[8] J. Felsenstein, “The Newick tree format,” http://evolution.gs.washington.edu/phylip/newicktree.html, 1995.
[9] E. M. Reingold and J. S. Tilford, “Tidier Drawings of Trees,” IEEE Transactions on Software Engineering, vol. 7, no. 2, pp. 223–228, 1981.
[10] G. Di Battista, R. Tamassia, and I. G. Tollis, “Graph Drawing: Algorithms for the Visualization of Graphs,” Prentice-Hall, 1999.
[11] H. Purchase, “Which aesthetic has the greatest effect on human understanding?” The University of Queensland, Australia.
[12] U. Dogrusoz, M. E. Belviranli, and A. Dilek, “CiSE: A Circular Spring Embedder Layout Algorithm,” IEEE Transactions on Visualization and Computer Graphics, vol. 19, no. 6, June 2013.
[13] L. Hong, F. Meng, and J. Cai, “Research on Layout Algorithms for Better Data Visualization,” Proc. ISCSCT, Huangshan, P. R. China, pp. 369–372, 26–28 Dec. 2009.
