2013 10th International Conference Computer Graphics, Imaging and Visualization
Improving the Quality of Clustered Graph Drawing Through a ‘Dummy Element’ Approach
Mao Lin Huang 1,2
1 School of Computer Software, Tianjin University, Tianjin 300072, China
2 Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney 2007, Australia
Jie Hua
Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney 2007, Australia
Abstract: Clustered graphs have been widely considered in Graph Drawing to overcome the problem of drawing large (or huge) graphs with thousands, or perhaps millions, of nodes. The force-directed algorithm is one of the available approaches for drawing such graphs with good layouts. However, force-directed methods often produce very different layout outcomes depending on the clustering algorithm used, which greatly affects the quality of the final result. In this paper, we propose a ‘dummy element’ approach for drawing clustered graphs using the traditional force-directed algorithm. Our approach attempts to maximize the satisfaction of aesthetics in graph drawing. The experimental results show that our new method can reduce the side effects of choosing different clustering methods and improve the quality of the final layouts.
Keywords: Graph Drawing, Graph Clustering, Force-Directed Graph Layout, Information Visualization
978-0-7695-5051-0/13 $26.00 © 2013 IEEE DOI 10.1109/CGIV.2013.23

I. INTRODUCTION AND BACKGROUND
Graphs generated in real-world applications can be very large, with thousands or perhaps millions of nodes; examples include academic citation and collaboration networks and the World Wide Web (WWW). As networks grow rapidly in size, drawing a large graph with a clear representation of the data and its network structure is a challenge for the graph drawing community. The key issue is not only to provide users with a comprehensible display of large graphs on the screen, but also to show users the whole structure of the graphs efficiently. Attempts to overcome this problem have proceeded in two main directions:
• Clustering. Groups of related nodes are clustered into super-nodes. The user sees a summary of the graph with the super-nodes (clusters) and super-edges between the super-nodes (clusters). Some clusters may be shown in more detail than others [1, 6, 22, 24].
• Navigation. The user sees only a small subset of the nodes and edges at any one time, and facilities are provided to navigate through the graph [2, 8, 10, 12, 13, 20, 25].
Clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). Cluster analysis itself is not one specific algorithm, but the general task to be solved. It can be achieved by various algorithms that differ significantly in their notion of what constitutes a cluster and how to find clusters efficiently. Popular notions of clusters include groups with small distances among the cluster members, dense areas of the data space, intervals or particular statistical distributions. Clustering can therefore be formulated as a multi-objective optimization problem [1, 7]. There are many different clustering methods; the most common are based on the following models:
• Connectivity-based clustering (hierarchical clustering)
• Centroid-based clustering (k-means clustering)
• Distribution-based clustering
• Density-based clustering
Running the different clustering algorithms described above on the same clustered graph may give very different final layouts.
Force-directed layout algorithms use a physical analogy to draw graphs. A graph is viewed as a system of bodies with forces acting between the bodies. The algorithm seeks a configuration of the bodies with locally minimal energy, that is, a position for each body such that the sum of forces on each body is zero. The method is easy to understand and the results are normally good [3, 4, 5, 6, 9, 17].
The combination of the clustered graph structure and the force-directed layout algorithm may provide better quality graph layouts in terms of aesthetic satisfaction. However, the layouts vary with the clustering method, and the final layouts can sometimes still be difficult to understand after applying the force-directed algorithm. This is because the connections among the vertices differ, which may lead to totally different results.
To solve this problem, we propose to use the ‘dummy element’ to reduce the side effects of applying different clustering algorithms and to provide aesthetically pleasing graph layouts.
II. METHODS
In our experiments, we used the ‘dummy element’ approach to draw clustered graphs in two different ways and then compared the final results after applying the forces to the graphs, to see whether the added ‘dummy elements’ improve the final layouts. We describe our approach in the following three subsections.

A. Container Clustering Algorithm (CCA)
The CCA clustering is based on the connectivity of vertices in the graph. The basic idea is that if a vertex vi is assigned to a cluster cj, then we intend to include all the vertices connected to it in that cluster. Suppose that nlim is an initial (minimum) number of vertices in a cluster. We calculate the value of nlim based on the maximum number of connections in the graph. Assume that G = (V, E) is a connected undirected graph, where V is the set of vertices and E is the set of edges among V. A clustered graph C = (G’, T) consists of a graph G’ = (V’, E’) and a rooted tree T, where G’ is a sub-graph of G. The CCA algorithm is described below:
a. If vi ∈ V has ni connections, where ni >= nlim, then we add all vertices that are connected to vi into the same cluster cji, except those vertices that have more connections than nlim;
b. Repeat step (a) until every vertex satisfies the conditions described in (a);
c. If a vertex vm1 does not belong to any cluster (vm1 ∉ V’) and has only one connection (vm1, vm2), and vm2 does not belong to any cluster either (vm2 ∉ V’), then we add the two vertices vm1 and vm2 into a new cluster cmi ∈ C;
d. Repeat step (c) until every vertex vm1 ∉ V’ satisfies the condition described in (c);
e. After steps (a) and (c), if there is an excess vertex vk that is still not included in any cluster (vk ∉ V’), we randomly choose a vertex vrandom to connect to vk:
1) If vrandom does not belong to any cluster (vrandom ∉ V’), then we add the two vertices into a new cluster ck;
2) If vrandom belongs to a cluster (vrandom ∈ V’), then we add vertex vk into the same cluster c, where vrandom ∈ c;
f. Repeat step (e) until all vertices are included in clusters;
g. The final clusters are C = {cj1, …, cjn, cm1, …, cmm, ck1, …, ckx}.
Therefore, selecting a different value for the constant nlim leads to a different grouping of clusters. In our experiments, we set the value of nlim to the largest number of connections of a single vertex in graph G.
For example, Figure 1 shows a graph G consisting of 12 vertices and 20 edges. If we choose nlim = 4, then the proposed algorithm produces six clusters with six corresponding vertex groups: {v0, v1, v5, v6, v9}, {v2, v3}, {v4}, {v7}, {v8, v11}, {v10}. If we choose nlim = 5, then only two clusters are produced: {v0, v1, v5, v6, v9} and {v2, v3, v4, v7, v8, v10, v11}.

Figure 1. A graph G consisting of 12 vertices and 20 edges.

B. A Classic Force-Directed Layout Algorithm
The aim of the force-directed algorithm is to position nodes with as few edge crossings as possible, by assigning forces among the set of nodes and the set of edges, so that graphs are drawn in an aesthetically pleasing way. The spring forces are used to keep all elements at reasonable distances: not too close and not too far. The entire graph is then simulated as if it were a physical system. In the force-directed algorithm, we need to calculate all the forces acting on every element, and then move the elements to suitable positions to avoid edge crossings. There are three steps in each iterative calculation:
1) Calculate the effect of the attractive forces fa(d) = d²/k between adjacent vertices;
2) Calculate the effect of the repulsive forces fr(d) = k²/d between all pairs of vertices;
3) Finally, stop the iteration when fa and fr tend not to change,
where d is the distance between two vertices and k is the optimal distance between vertices.

C. Dummy Elements
As the traditional force-directed layout can handle only a limited number of vertices because of its computational complexity, we propose the use of a set of dummy elements at the drawing level, both to reduce the computational complexity of drawing large graphs and to reduce the human cognitive effort through an extra level of view abstraction. The dummy elements include ‘dummy vertices’ and ‘dummy edges’.
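The force laws of Section II.B, fa(d) = d²/k and fr(d) = k²/d, can be made concrete with a short sketch. The Python code below is illustrative only and is not the implementation used in the experiments: the function names, the cap `step` on how far a vertex may move per iteration, the linear cooling of that cap and the tolerance `tol` are all assumptions introduced here. The double loop over vertex pairs is the quadratic cost that limits the classic algorithm on large graphs.

```python
import math


def force_directed_step(pos, edges, k, step=0.1):
    """Run one iteration in place and return the largest vertex move."""
    disp = {v: [0.0, 0.0] for v in pos}
    verts = list(pos)

    # Step 2 of the text: repulsion f_r(d) = k^2 / d between all pairs.
    for i, u in enumerate(verts):
        for v in verts[i + 1:]:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9  # guard against coincident points
            f = k * k / d
            disp[u][0] += dx / d * f
            disp[u][1] += dy / d * f
            disp[v][0] -= dx / d * f
            disp[v][1] -= dy / d * f

    # Step 1 of the text: attraction f_a(d) = d^2 / k along each edge.
    for u, v in edges:
        dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
        d = math.hypot(dx, dy) or 1e-9
        f = d * d / k
        disp[u][0] -= dx / d * f
        disp[u][1] -= dy / d * f
        disp[v][0] += dx / d * f
        disp[v][1] += dy / d * f

    # Move every vertex along its net force, capped at `step` (assumption).
    largest = 0.0
    for v in verts:
        fx, fy = disp[v]
        length = math.hypot(fx, fy)
        if length > 0.0:
            move = min(length, step)
            pos[v] = (pos[v][0] + fx / length * move,
                      pos[v][1] + fy / length * move)
            largest = max(largest, move)
    return largest


def layout(pos, edges, k, max_iter=500, step0=0.1, tol=1e-4):
    """Step 3 of the text: iterate until the positions stop changing.

    The cap shrinks linearly to zero (an assumed cooling schedule) so that
    vertices settle instead of oscillating around the equilibrium.
    """
    for i in range(max_iter):
        cap = step0 * (1.0 - i / max_iter)
        if force_directed_step(pos, edges, k, step=cap) < tol:
            break
    return pos
```

For two adjacent vertices the two forces cancel exactly at d = k, which is why k acts as the optimal distance: a pair placed farther apart is pulled together and a pair placed closer is pushed apart.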
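As a hedged illustration of how a clustering turns into dummy elements, the sketch below treats each cluster index as a dummy vertex and derives the dummy edges under the two rules named in this section (Limited and Entire dummy edges). The function names and the list-of-sets representation are assumptions made for this example. The third-party test is one possible reading of the rule: two clusters with no direct edge receive an extra dummy edge only if no vertex of one shares a neighbour with any vertex of the other.

```python
from itertools import combinations


def cluster_of(clusters):
    """Map each vertex to the index of the cluster that contains it."""
    return {v: i for i, members in enumerate(clusters) for v in members}


def limited_dummy_edges(clusters, edges):
    """Dummy edge (cj, ck) whenever an original edge joins the two clusters."""
    owner = cluster_of(clusters)
    result = set()
    for u, v in edges:
        cu, cv = owner[u], owner[v]
        if cu != cv:
            result.add((min(cu, cv), max(cu, cv)))
    return result


def entire_dummy_edges(clusters, edges):
    """Limited dummy edges plus extra edges between cluster pairs that are
    neither directly connected nor third-party connected (assumed reading)."""
    neighbours = {v: set() for members in clusters for v in members}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    result = limited_dummy_edges(clusters, edges)
    for j, k in combinations(range(len(clusters)), 2):
        if (j, k) in result:
            continue  # already linked by a direct connection
        third_party = any(neighbours[a] & neighbours[b]
                          for a in clusters[j] for b in clusters[k])
        if not third_party:
            result.add((j, k))
    return result


# Toy input (hypothetical): a path 0-1-2-3-4-5 split into three clusters.
path_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
path_clusters = [{0, 1}, {2, 3}, {4, 5}]
limited = limited_dummy_edges(path_clusters, path_edges)
entire = entire_dummy_edges(path_clusters, path_edges)
```

On this toy path the two end clusters have neither a direct nor a third-party connection, so the Entire rule adds one dummy edge between them that the Limited rule does not.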
1) Dummy Vertex
A ‘dummy vertex’ is a red dot (or small circle) that represents a cluster cj in its abstract state (or its closed state). In our experiments, we clustered the undirected connected graphs and used dummy vertices to represent all clusters generated by CCA.
For example, after applying CCA to the graph G shown in Figure 1, we obtained six clusters {c0, c1, c2, c3, c4, c5}, where {v0, v1, v5, v6, v9} ∈ c0, {v2, v3} ∈ c1, {v4} ∈ c2, {v7} ∈ c3, {v8, v11} ∈ c4 and {v10} ∈ c5, and used the dummy vertices {c0, c1, c2, c3, c4, c5} as shown in Figure 2. The use of dummy vertices reduces the complexity of views and provides users with a clear understanding of the clustering structure of the vertices.

Figure 2. The visualization of a clustered graph C(G) in its abstract view, displaying only its dummy elements, where G is the same graph shown in Figure 1.

2) Dummy Edge
A ‘dummy edge’ is a virtual edge (ci, cj) that links two clusters; it is visually represented as a straight line. In addition, if two nodes vm and vn have no connection, i.e. (vm, vn) ∉ E, but both are connected to a third node vj, i.e. (vm, vj) ∈ E and (vn, vj) ∈ E, then we consider vm and vn to be third-party connected. We used two ways to add dummy edges among dummy vertices.
A. Entire dummy edges: If there is a connection between any two vertices vm and vn (vm ∈ cj, vn ∈ ck, cj ∈ C, ck ∈ C, m ≠ n, j ≠ k), where (vm, vn) ∈ E, then we add a dummy edge e’bi(cj, ck). If there is no connection between any two vertices vm and vn (vm ∈ cj, vn ∈ ck, cj ∈ C, ck ∈ C, m ≠ n, j ≠ k), where (vm, vn) ∉ E, and there is no third-party connection between vm and vn, then we add a dummy edge e’vi(cj, ck). Therefore, the dummy edges are Edummy(C) = {e’b0, ..., e’bj, e’v0, ..., e’vk}; see Figure 3.
B. Limited dummy edges: If there is a connection between two vertices vm and vn (vm ∈ cj, vn ∈ ck, cj ∈ C, ck ∈ C, m ≠ n, j ≠ k), where (vm, vn) ∈ E, then we add a dummy edge e’bi(cj, ck). Therefore, the dummy edges are Edummy(C) = {e’b0, ..., e’bj}; see Figure 2.

Figure 3. An example of Entire dummy edges in the drawing of the clustered graph C(G).

As we can see from Figures 2 and 3, there are five more red edges drawn in Figure 3. These extra red dummy edges {e’v0, ..., e’v4} are added among the clusters because no third-party connections are found between them in the graph. Therefore, the Entire dummy edges method normally produces more dummy edges than the Limited dummy edges method. In fact, the drawing in Figure 3 looks better than the one in Figure 2, because additional spring forces have been applied among the dummy vertices.

3) Dummy Edge Length
In our experiments, the dummy edge lengths are not all the same. The basic aims of changing the length of dummy edges are to:
• Keep the clusters farther apart;
• Keep the vertices within the same cluster closer together.
Suppose lini is the edge length in the original graph, nvi is the number of edges of the vertex vi, and nci is the number of edges of the cluster ci. We define four models of dummy edge lengths:
1. Edge length between two clusters cm and cn: Lcmn = (1 + (0.12 * (ncm + ncn - 1))) * lini * 1.3;
2. Edge length between two vertices vi and vj within the same cluster ci: Lcv2v = (1 + (0.11 * (nvi + nvj - 1))) * lini * 3/2;
3. Edge length between one vertex vi and a cluster cn (vi ∈ cm, vj ∈ cn, m ≠ n, i ≠ j), where (vi, vj) ∈ E: Lv2cc = (1 + (0.115 * (nvi + ncn - 1))) * lini / 2;
4. Edge length between one vertex vi and a cluster cn (vi ∈ cm, vj ∈ cn, m ≠ n, i ≠ j), where (vi, vj) ∉ E: Lv2cn = (1 + (0.115 * (nvi + ncn - 1))) * lini * 1.5.
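The four edge-length models are plain arithmetic on the original edge length and two edge counts, so they translate directly into code. The sketch below is a minimal transcription of the formulas; the function and parameter names are assumptions introduced here, and the counts are passed in by the caller because the text does not spell out how nvi and nci are obtained.

```python
def cluster_to_cluster(l_ini, n_cm, n_cn):
    """Model 1: dummy edge between two clusters cm and cn."""
    return (1 + 0.12 * (n_cm + n_cn - 1)) * l_ini * 1.3


def vertex_to_vertex(l_ini, n_vi, n_vj):
    """Model 2: edge between two vertices inside the same cluster."""
    return (1 + 0.11 * (n_vi + n_vj - 1)) * l_ini * 3 / 2


def vertex_to_cluster_connected(l_ini, n_vi, n_cn):
    """Model 3: vertex vi to cluster cn when (vi, vj) is an edge of E."""
    return (1 + 0.115 * (n_vi + n_cn - 1)) * l_ini / 2


def vertex_to_cluster_unconnected(l_ini, n_vi, n_cn):
    """Model 4: vertex vi to cluster cn when (vi, vj) is not an edge of E."""
    return (1 + 0.115 * (n_vi + n_cn - 1)) * l_ini * 1.5


# Hypothetical inputs: original edge length 100, edge counts 3 and 2.
lengths = {
    "cluster-cluster": cluster_to_cluster(100.0, 3, 2),
    "vertex-vertex": vertex_to_vertex(100.0, 3, 2),
    "vertex-cluster (connected)": vertex_to_cluster_connected(100.0, 3, 2),
    "vertex-cluster (unconnected)": vertex_to_cluster_unconnected(100.0, 3, 2),
}
```

For equal counts, model 4 always gives three times the length of model 3 (a factor of 1.5 against 1/2), so a vertex sits closer to a cluster it is connected to than to one it is not.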
We apply these four models to C(G) to change the edge lengths in both the Entire dummy edges and the Limited dummy edges methods.

III. OUR ‘DUMMY ELEMENT’ APPROACH
This section describes the details of our approach, which uses the dummy elements to improve the quality of force-directed graph drawings:
• The first step is to apply the force-directed algorithm to a given graph G = (V, E);
• Apply the CCA to create a clustering structure for the given graph G = (V, E);
• Apply the force-directed algorithm to the clustered graph C(G) = (G’, T) with all its clusters ‘closed’ (shown as red dots in the layout), using each of the two dummy edges methods with variable edge lengths;
• Wait until the force-directed drawing has converged and reached the energy balance;
• ‘Open’ all the clusters in the layout of C(G);
• Apply the forces to the vertices within each cluster separately again to achieve the energy balance;
• In the final step, compare the qualities of the final layouts (e.g. edge crossings).

IV. EXPERIMENTS AND ANALYSIS
We randomly created 50 connected undirected graphs, ranging from 50 to 99 vertices, for testing, then applied the CCA and the forces to each graph and compared the qualities of the final layouts. See Table 1 for the results of our experiments.

Table 1. Edge crossings in the experiments on graphs with 50 to 99 vertices.
Graph (|V|.|E|) | Original Edge Crossings | Cluster Opened Edge Crossings (Limited) | Cluster Opened Edge Crossings (Entire)
50.128 | 1227 | 992 | 577
51.146 | 1656 | 1397 | 823
52.138 | 1091 | 1177 | 653
53.89 | 170 | 218 | 151
54.130 | 1063 | 1202 | 566
55.126 | 1143 | 1058 | 499
56.124 | 1107 | 835 | 420
57.111 | 607 | 633 | 389
58.118 | 578 | 513 | 314
59.108 | 445 | 380 | 279
60.106 | 322 | 409 | 132
61.156 | 2075 | 1878 | 1016
62.86 | 137 | 201 | 78
63.107 | 404 | 360 | 241
64.111 | 417 | 571 | 254
65.142 | 1006 | 1234 | 544
66.226 | 4848 | 3608 | 2596
67.117 | 400 | 582 | 266
68.137 | 904 | 962 | 593
69.185 | 2632 | 2043 | 1366
70.144 | 1345 | 952 | 586
71.108 | 257 | 376 | 153
72.179 | 2365 | 1775 | 957
73.155 | 1058 | 1172 | 675
74.129 | 454 | 765 | 341
75.154 | 1180 | 1129 | 633
76.151 | 580 | 1000 | 541
77.205 | 3112 | 2763 | 1529
78.165 | 1299 | 1298 | 707
79.286 | 8507 | 6129 | 4159
80.222 | 4525 | 3239 | 1742
81.222 | 3237 | 2516 | 1738
82.287 | 7936 | 6370 | 4130
83.248 | 4242 | 4468 | 2842
84.227 | 3473 | 3199 | 2179
85.260 | 5727 | 5018 | 3048
86.172 | 1545 | 1339 | 714
87.171 | 981 | 1008 | 706
88.171 | 1130 | 1227 | 578
89.224 | 3420 | 2483 | 1947
90.223 | 3344 | 2410 | 1678
91.204 | 2017 | 2656 | 1123
92.149 | 672 | 629 | 444
93.164 | 1022 | 855 | 612
94.258 | 4629 | 4903 | 2363
95.224 | 3355 | 2659 | 1463
96.139 | 310 | 542 | 235
97.272 | 7259 | 5255 | 3170
98.260 | 4580 | 4303 | 2565
99.206 | 1565 | 1756 | 1258

Figure 4 shows the experimental results. In Figure 4, the X-axis is |V| and the Y-axis is the number of edge crossings. The blue line shows the edge crossings of the original graph; the orange line shows the edge crossings of the clustered graph under the Limited dummy edges method after the clusters are opened; and the red line shows the edge crossings of the clustered graph under the Entire dummy edges method after the clusters are opened. From Figure 4 we can see that most values of the Limited dummy edges method are below the original ones, although some are even higher (e.g. the two black points in Figure 4), and that all the edge crossing numbers of the Entire dummy edges method are smaller than the original ones. This shows that our methods can reduce edge crossings effectively, and that they have more effect on ‘complex’ graphs with more edge crossings, as the peak values in Figure 4 show.
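The quality measure used in these experiments is the number of edge crossings in a straight-line layout. The text does not state how crossings were counted, so the following is only a minimal sketch under stated assumptions: edges are straight segments between vertex positions, two edges that share an endpoint are never counted, and only proper crossings (the segments intersect at an interior point of both) are counted. All names are hypothetical.

```python
from itertools import combinations


def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, 0 or -1."""
    val = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (val > 0) - (val < 0)


def segments_cross(a, b, c, d):
    """True if segments ab and cd cross at a point interior to both."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0 and
            _orient(c, d, a) * _orient(c, d, b) < 0)


def count_edge_crossings(pos, edges):
    """Count crossing pairs of edges by testing every pair (quadratic)."""
    total = 0
    for (u1, v1), (u2, v2) in combinations(edges, 2):
        if len({u1, v1, u2, v2}) < 4:
            continue  # edges sharing an endpoint are not counted
        if segments_cross(pos[u1], pos[v1], pos[u2], pos[v2]):
            total += 1
    return total


# Toy layout (hypothetical): a unit square with both diagonals drawn.
square = {"a": (0, 0), "b": (1, 0), "c": (1, 1), "d": (0, 1)}
k4_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"),
            ("a", "c"), ("b", "d")]
crossings = count_edge_crossings(square, k4_edges)
```

In the toy layout only the two diagonals cross, so the count is one; applying the same function to the original layout and to each cluster-opened layout would give the kind of per-graph comparison reported in Table 1.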
Figure 4. The comparison of the edge crossings on the final layouts.

From the results in Figure 4 we can see that the Entire dummy edges method can improve the final layout by reducing edge crossings, and that it has a better effect than the Limited dummy edges method. However, the effects are only significant on original graphs with a large number of edge crossings. Therefore, using dummy elements does affect the final layouts.

V. CONCLUSIONS AND FUTURE WORKS
The final layout quality can be improved by adding dummy edges to clustered graphs drawn with the traditional force-directed algorithm. Our new method, the Entire dummy edges algorithm, proved useful in our experiments. However, we only compared the methods on graphs with a small number of vertices, and only on the basis of edge crossings; factors such as time complexity and display still need to be considered, so the comparisons are limited. In future work, we will consider more aspects and compare graph qualities from different facets on graphs with a large number of elements.

REFERENCES
[1] Wikipedia, P.2013, Cluster analysis, Sydney, viewed 24 February 2013, <http://en.wikipedia.org/wiki/Cluster_analysis>.
[2] Huang, M.L., & Nguyen, Q.V. 2007, Navigating Large Clustered Graphs with Triple-Layer Display, 11th International Conference Information Visualization, 2-6 July 2007, Zürich, Switzerland, pp. 684-692.
[3] Battista, G.D., Eades, P., Tamassia, R. & Tollis, I.G. 1999, Graph drawing algorithms for the visualization of graphs, Prentice-Hall, New Jersey, USA.
[4] Eades, P., A heuristic for graph drawing. Congressus Numerantium, 42:149-160, 1984.
[5] Lin, C.C., Yen, H.C. 2005, A New Force-Directed Graph Drawing Method Based on Edge-Edge Repulsion, Ninth International Conference on Information Visualisation, 6-8 July 2005, London, UK, pp. 329-334.
[6] Omote, H. & Sugiyama, K. 2007, Force-Directed Drawing Method for Intersecting Clustered Graphs, APVIS 2007, 6th International Asia-Pacific Symposium on Visualization 2007, 5-7 February 2007, Sydney, Australia, pp. 85-92.
[7] Brandes, U., Delling, D., Gaertler, M., Gorke, R., Hoefer, M., Nikoloski, Z., Wagner, D., On Modularity Clustering, Knowledge and Data Engineering, IEEE Transactions on, pp. 172-188, Volume: 20, Issue: 2, Feb. 2008.
[8] Sarkar, M. & Brown, M.H. 1994, ‘Graphical fisheye views’, Communications of the ACM, Volume 37 Issue 12, pp. 73-83.
[9] Battista, G.D., Eades, P., Tamassia, R. & Tollis, I.G. 1999, Graph drawing algorithms for the visualization of graphs, Prentice-Hall, New Jersey, USA.
[10] Huang, M.L., Eades, P. & Wang, J. 1998, On-line animated visualization of huge graphs using a modified spring algorithm, Journal of Visual Language and Computing, no. vl980093, pp. 623-645.
[11] Hua, J., Huang, M.L., Huang, W.D., Wang, J.H., & Nguyen, Q.V, 2012, Force-directed Graph Visualization with Pre-positioning Improving Convergence Time and Quality of Layout, 16th International Conference on Information Visualisation, pp. 124-129.
[12] Huang, M.L., Eades, P. & Cohen, R.F. 1998, Webofdav – navigating and visualizing the web on-line with animated context swapping, Computer Networks and ISDN Systems, 30(1), pp. 638-42.
[13] Nguyen, Q.V. & Huang, M.L. 2005: EncCon: an approach to constructing interactive visualization of large hierarchical data. Information Visualization 4(1): 1-21.
[14] Huang, M.L., Huang, W. & Lin, C.C. (2012) Evaluating Force-Directed Algorithms with a New Framework. The 27th ACM Symposium on Applied Computing (SAC’12), Trento, Italy, 25 Mar - 29 Mar, 2012.
[15] Bertault, F. (1999) A force-directed algorithm that preserves edge crossing properties. In Proc. the 7th International Symposium on Graph Drawing (GD’99), LNCS vol. 1731, Springer, 351-358.
[16] Eades, P., Huang W. and Hong, S.H. (2009) A Force-Directed Method for Large Crossing Angle Graph Drawing. TR No. 640, University of Sydney.
[17] Lin, C.C. & Yen, H.C. 2008, A new force-directed graph drawing based on edge-edge repulsion, 9th International Conference on Information Visualization IV2008, 6-8 July, London, England, pp. 329-34.
[18] Huang, W., Eades, P., Hong, S.H. & Lin, C.C. 2010, Improving Force-Directed Graph Drawings by Making Compromises between Aesthetics. In Proc. VL/HCC’10, 176-183.
[19] Helen C. Purchase 1998: Performance of Layout Algorithms: Comprehension, not Computation. J. Vis. Lang. Comput. 9(6): 647-657.
[20] Do Nascimento, H.A.D. & Eades, P. 2002: A Focus and Constraint-Based Genetic Algorithm for Interactive Directed Graph Drawing. HIS 2002: 634-643.
[21] Huang, W., Huang, M.L. & Lin, C.C. 2011, Aesthetic of angular resolution for node-link diagrams: Validation and algorithm. VL/HCC 2011: 213-216.
[22] Huang, X. & Lai, W., 2005, Clustering graphs for visualization via node similarities. J. Vis. Lang. Comput. 17, 3 (June 2006), 225-253. Elsevier Ltd.
[23] Di Battista, G., Garg, A., Liotta, G., Tamassia, R., Tassinari, E., Vargiu, F.: An experimental comparison of four graph drawing algorithms. Comput. Geom. 7, 303-325 (1997).
[24] Huang, M., Nguyen, Q. V., A space efficient clustered visualization of large graphs, Image and Graphics, 2007. ICIG 2007. Fourth International Conference on, 920-927.
[25] Nguyen, Q. V. and Huang, M. L., A focus+context visualization technique using semi-transparency, The Fourth International Conference on Computer and Information Technology, 2004. CIT'04, 101-108.