
MODELS AND METHODS FOR BIG GRAPH VISUALIZATION

BY QUAN NGUYEN

A thesis submitted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

The School of Information Technologies

2013

ORIGINALITY STATEMENT

‘I hereby certify that the work embodied in this thesis is the result of original research and has not been submitted for a higher degree to any other University or Institution.’

.......................................................................... Quan Nguyen

Sydney 20 January 2013

ABSTRACT

Graph visualizations create pictures that ease the understanding of network-structured data, especially in today’s big data era. Graphs are becoming larger and more complex (e.g., with multiple attributes and high connectivity) and are dynamically generated by modern applications in domains ranging from social networks and the financial industry to biology. To cope with this, visualization methods are becoming more complex. It is a real challenge to assess how reliable visualization methods and models are.

In this thesis, we propose a general model for measuring the quality of graph visualizations. We introduce a new criterion, namely faithfulness, as a step toward more effective graph visualizations. Unlike the classic “readability” criteria from 30 years of graph visualization literature, faithfulness is concerned with the consistency between the data and its transformed forms, such as pictures or human knowledge. We use a high-level model of the Data-Visualization-Human interaction loop to convey the intuition behind the faithfulness concept. We then demonstrate its usefulness with several representative visualization methods, including classic force-directed methods, multi-dimensional scaling methods, edge bundling approaches, matrix-based visualizations and map-based visualizations.

We then investigate methods for the visualization of large graphs. We target our study toward edge bundling, which has been extensively studied as a way to reduce visual clutter and improve the readability of big graph visualizations. We propose a general model of edge bundling aimed at more effective visual analysis of large graphs. The model integrates important aspects such as topology and network analysis. The final results help display high-level topological structures of the studied social networks, biological networks and geographic networks.

We further extend our bundling model to deal with dynamic graphs. In particular, we propose a new framework to study edge bundling in a broader and more challenging context that deals with streaming data. In stream graphs, graph elements appear and disappear continuously at different timestamps and may be associated with multi-dimensional data. The new framework enables the analysis of massive graph data.

Our case studies suggest that bundled graph visualizations faithfully present the underlying data for the studied tasks. Empirical results show that edge bundling adds little overhead to graph visualizations, while bundling-driven visualizations yield significant improvements for visual analysis. The results in the studied application domains, including geographic networks and financial trading data, suggest that edge bundling increases both readability and task-faithfulness for big graph analytics.

DEDICATION

To my beloved wife.

ACKNOWLEDGEMENTS

This thesis could not have been completed without the great support and help of a number of people to whom I am indebted. Special thanks go to my supervisor, Prof. Peter Eades, for his constant support and guidance over the years. I would also like to thank A.Prof. Seok-Hee Hong for helping me to shape the first part of my work, which integrates network analysis with visualization. I am very grateful for the financial support from the University of Sydney and Capital Markets CRC (CMCRC) Limited. I would like to thank CMCRC for placing me in R&D activities in financial markets. Special thanks go to SmartsGroup-Nasdaq OMX, especially to Dr. Robert Lang and Mr. Andrew Franklin, for the great work experience in the market surveillance industry. I also deeply thank Dr. Hui Zheng for the CMCRC TrADeLab project and CMCRC PhD students George Li and Sean Foley for the collaborative work. Thanks to other CMCRC staff members: Prof. Michael Aitken, Dr. Will Renner and Mr. Alastair Ferguson, to name just a few. Thanks to CMCRC Limited, SIRCA Limited and SmartsGroup-Nasdaq OMX for providing data for my studies. Thanks to all others who gave permission to use their images or data in this dissertation. I would also like to thank Dr. Bernhard Scholz for encouraging me over the years. Thanks to the administrative staff at the School of Information Technologies, University of Sydney, for helping with university procedures on many occasions. I would like to thank the visiting PhD students Sebastian Janowski, for some collaborative work, and Salvatore Romeo, for some research discussions. Last but not least, I deeply thank all reviewers for their kind and helpful comments, which helped me to improve my work and complete this dissertation.

Contents

List of Figures

1 Introduction
1.1 Visualization
1.2 Graphs and networks
1.3 Big data and visualization challenges
1.4 Aims and thesis contributions
1.5 Research methodology
1.6 Application domains
1.7 Outline of this dissertation

2 Related work
2.1 Information visualization
2.1.1 Visualization models
2.1.2 Visual clutter
2.1.3 Visualization tasks
2.2 Graph visualization
2.2.1 Node-link diagrams
2.2.1.1 Tree drawing
2.2.1.2 Circular layout
2.2.1.3 Force-directed method
2.2.2 Matrix layout
2.2.3 Region adjacency drawings
2.2.3.1 Treemap
2.2.3.2 Map-based visualization
2.2.4 Hybrid visualization
2.2.4.1 Clustered graphs
2.2.4.2 Tree+Link layout
2.2.4.3 Map+Link visualization
2.2.5 3D drawing
2.2.6 2.5D layout
2.2.7 Graph visualization tasks
2.2.8 Dynamic graph visualization
2.2.8.1 Mental map
2.2.8.2 Visual clutter for dynamic graphs
2.2.9 Readability
2.2.9.1 Layout metrics for readability
2.2.9.2 Visual clutter
2.3 Visual clutter reduction
2.3.1 Graph simplification
2.3.2 Edge routing approaches
2.3.2.1 Edge concentration
2.3.2.2 Edge bundling
2.3.2.3 Semantic zoom
2.3.3 Sampling
2.4 Network analysis
2.4.1 Social network analysis
2.4.2 Centrality analysis
2.4.2.1 Vertex centrality
2.4.2.2 Edge centrality
2.4.3 k-core analysis
2.5 Visual analytics
2.5.1 Community detection
2.5.2 Dimensionality reduction
2.5.3 Stream algorithms
2.5.3.1 Data-based techniques
2.5.3.2 Task-based techniques
2.6 Concluding remarks

3 Visualization model
3.1 Introduction
3.1.1 Motivating examples
3.1.2 Aims and contributions
3.2 Related work
3.2.1 Evaluation of visualization
3.2.2 Readability in graph visualization
3.2.3 Mental map preservation in graph visualization
3.2.4 Temporal and spatial analysis
3.3 Graph visualization model
3.3.1 Visualization
3.3.2 Perception
3.3.3 Task
3.4 Faithfulness model
3.4.1 Information faithfulness
3.4.2 Task faithfulness
3.4.3 Change faithfulness
3.4.4 Remarks
3.4.4.1 Faithfulness and correctness
3.4.4.2 Faithfulness and readability
3.4.4.3 Faithfulness and determinism
3.4.4.4 Faithfulness in space and time
3.5 Faithfulness metrics
3.5.1 An example of information faithfulness metrics
3.5.2 An example of task faithfulness metrics
3.5.3 An example of change faithfulness metrics
3.6 Example 1: Multidimensional scaling and force directed approaches
3.6.1 Information faithfulness
3.6.2 Task faithfulness
3.6.3 Change faithfulness
3.6.4 Remarks
3.7 Example 2: Edge bundling
3.7.1 Information faithfulness
3.7.2 Task faithfulness
3.7.3 Remarks
3.8 Example 3: Visualization metaphors
3.8.1 Matrix representation
3.8.2 Cartography
3.8.3 Compound visualizations
3.9 Discussions and future work
3.9.1 Display device
3.9.2 Interaction
3.9.2.1 Affine transformation
3.9.2.2 Distortion techniques
3.9.2.3 Level of detail
3.9.2.4 Model extension
3.9.2.5 Metrics for compound visualizations
3.10 Concluding Remarks
3.10.1 Guidelines
3.10.2 Remarks on 3D drawings

4 TGI-EB: Edge Bundling integrating Topology, Importance and Geometry
4.1 Introduction
4.1.1 Motivating example
4.1.2 Aims and contributions
4.2 Related work
4.2.1 Edge bundling
4.2.1.1 Compatibility
4.2.2 Social network analysis
4.2.2.1 Centrality analysis
4.2.2.2 k-core analysis
4.3 New edge compatibility measures
4.3.1 Importance compatibility
4.3.2 Topology compatibility
4.3.3 Plane compatibility
4.4 Integrated framework for edge bundling
4.4.1 The framework
4.4.1.1 TGI framework (2D)
4.4.1.2 TGIP framework (2.5D)
4.4.2 Centrality based edge bundling (CenEB)
4.4.3 Topology based edge bundling (TopoEB)
4.4.3.1 TopoEB-A
4.4.3.2 TopoEB-B
4.4.4 Radial edge bundling (RadEB)
4.4.4.1 Clustering constraints
4.4.5 Orthogonal edge bundling (OrthEB)
4.4.6 2.5D bundling (2.5D-EB)
4.4.7 Time complexity and implementation
4.5 Experimental results
4.5.1 Social networks
4.5.1.1 Data set
4.5.1.2 Visual analysis
4.5.2 Biological networks
4.5.2.1 Data set
4.5.2.2 Visual analysis
4.5.3 Geographic networks
4.5.3.1 Data set
4.5.3.2 Visual analysis
4.5.4 Clustered graphs
4.5.4.1 Data set
4.5.4.2 Visual analysis
4.5.5 2.5D visualizations
4.5.5.1 Data set
4.5.5.2 Visual analysis
4.6 Concluding remarks

5 StreamEB: Stream Edge Bundling
5.1 Introduction
5.1.1 Graph streams
5.1.2 Motivating example
5.1.3 Aims and contributions
5.2 Related work
5.2.1 Stream algorithms
5.2.2 Dynamic graph visualization
5.2.3 Edge bundling
5.2.3.1 Hierarchical edge bundling
5.2.3.2 Force-based edge bundling
5.3 StreamEB Framework
5.3.1 Problem definition and notation
5.3.2 General model for stream bundling
5.3.3 Criteria for stream analytics
5.3.4 Compatibility metrics
5.3.4.1 Temporal compatibility
5.3.4.2 Neighborhood compatibility
5.3.4.3 Spatial compatibility
5.3.4.4 Data-driven compatibility
5.3.5 Aggregate compatibility
5.3.6 High-level stream bundling methods
5.3.7 Aggregating compatibility
5.4 Stream bundling algorithms
5.4.1 FStreamEB: Force-directed stream bundling
5.4.1.1 S-FStreamEB
5.4.1.2 F-FStreamEB (FStreamEB with Dynamic layout)
5.4.1.3 G-FStreamEB (FStreamEB with Static Geometry)
5.4.2 TStreamEB: Tree-based stream bundling
5.5 Implementation
5.6 Experimental results
5.6.1 Geographic networks
5.6.1.1 Data set
5.6.1.2 Visual analysis
5.6.2 Trading networks
5.6.2.1 Data set
5.6.2.2 Visual analysis
5.6.3 Performance comparison
5.7 Discussions
5.8 Concluding remarks
5.8.1 Future work

6 General remarks, Conclusion and Future work
6.1 Contributions
6.2 Future work
6.3 Concluding remark

Bibliography

List of Figures

1.1 Charles Minard’s chart
1.2 A visualization of text documents (by Spire [325])
1.3 The Seven Bridges of Königsberg problem: (a) Euler’s drawing [31]; (b) Ball’s abstract drawing [31]
1.4 Examples of node-link graph visualization and graph splatting (by Van Liere et al. [311])
1.5 Example of large and complex graph visualization
2.1 Visualization pipeline
2.2 Examples of node-link tree layouts: (a) classical hierarchical view [247]; (b) radial view [247]; (c) Balloon view [68]
2.3 Examples of Matrix Views (by Becker et al. [41])
2.4 Example of tree maps
2.5 Map-based visualizations (by Gansner et al. [138])
2.6 Examples of clustered graph visualizations
2.7 Examples of Tree+link visualization
2.8 A Hybrid Visualization of Social Networks using NodeTrix
2.9 Examples of 3D graph visualizations
2.10 Examples of 3D tree visualizations
2.11 Botanical visualization of a unix home directory
2.12 Examples of 2.5D graph visualizations
2.13 Refugee Flows between the World’s Countries in 1996, 2000, and 2008 (by JFlowmap [1])

2.14 Examples of graphs simplified at different levels (by van Ham and van Wijk [309])
2.15 Examples of Edge Bundling (by Holten [170])
2.16 Examples of edge compatibility measurements in FDEB [171]
2.17 Examples of EdgeLens (by Wong et al. [328])
3.1 An example of edge concentration [239]
3.2 Example of confluent drawing of a bipartite graph [109]
3.3 An example graph using force-directed edge bundling
3.4 A 10 percent modification of the example graph in Figure 3.3(a) and the result using force-directed edge bundling
3.5 Graph visualization model
3.6 Interaction groups between Health researchers in the EuroSiS dataset
3.7 Examples of US airline network visualizations using edge bundling
3.8 Comparisons of the faithfulness of the edge bundled worldcup visualizations
3.9 Visualization of FIFA worldcup data year 2006
3.10 Map-based visualizations (by Gansner et al. [138])
3.11 A Hybrid Visualization of Social Networks using NodeTrix
3.12 Enhanced graph visualization model

4.1 NF-κB network visualization using force-directed layout and without bundling
4.2 NF-κB network visualization using our RadEB
4.3 Example of forces in FDEB
4.4 Forces in radial layout: radial forces and clustering forces
4.5 Examples of forces in CenEB and OrthEB
4.6 Simple visualization of the collaboration network
4.7 The collaboration network with k-cores
4.8 Collaboration network using CenEB
4.9 Collaboration network using OrthEB
4.10 Collaboration network with RadEB and CenEB
4.11 Collaboration network using RadEB, OrthEB and CenEB
4.12 NF-κB network in radial layout and without bundling
4.13 NF-κB network using RadEB: (1) important elements are marked, (2) important edges are wider and less transparent, (3) six important pathways are circled
4.14 PPI network using RadEB (zoomed)
4.15 A subgraph of 50 highest centrality edges from the PPI network
4.16 Visualizations of US airline network using FDEB [171]
4.17 Visualization of US airline network using CenEB
4.18 Visualization of US Airline network using OrthEB
4.19 Visualization of US Airline network using RadEB
4.20 Dense clustered graph with 20 clusters in Circular-Circular layout
4.21 An 8-cluster clustered graph in Circular-Circular layout
4.22 A Circular-Circular visualization of 9-cluster clustered graphs
4.23 A 9-cluster clustered graphs in Circular-Circular layouts using TopoEB
4.24 2.5D drawings of a clustered graph
4.25 2.5D drawings of a clustered graph using 2.5D-EB
4.26 A clustered graph in multi-plane drawings
4.27 A clustered graph in multi-plane drawings
5.1 The classes of dynamic graphs
5.2 Visualizations of the stock trade data from TSX Venture-Canada equity exchange
5.3 Visualizations of the stock trade data from TSX Venture-Canada equity exchange
5.4 Visualizations of the stock trade data from TSX Venture-Canada equity exchange
5.5 Stream Bundling Pipeline
5.6 Force models: (a) S-FStreamEB: forces applied on control points only; (b) F-FStreamEB: forces applied on nodes + control points
5.7 TStreamEB: hierarchy and double-interpolation
5.8 Visualizations of a random graph using (a) FDEB and (b) S-FStreamEB
5.9 Visualizations of a random graph using TStreamEB
5.10 Visualization of 5000-second windows using S-FStreamEB
5.11 Visualization of sequence-based windows using TStreamEB
5.12 Visualization of the dataset V on sequence-based sliding window W = 160. Using MDS layout and S-FStreamEB.
5.13 Visualization of the dataset V on sequence-based sliding window W = 160. Using MDS layout and TStreamEB.
5.14 Visualizations of the trade data using TStreamEB
5.15 Runtime comparisons of stream bundling methods.

Chapter 1

Introduction

“Imagination is more important than knowledge.” – Albert Einstein

“Visualization and belief in a pattern of reality activates the creative power of realization.” – A. L. Linall, Jr.

This dissertation addresses visualization models and techniques for the visual analysis of large and evolving graphs. In particular, we concentrate on the development of a new model in which graph visualizations can be judged and measured. We then propose several approaches that support visual analysis and make high-level structures in graph visualizations more visible. In this introductory chapter, we start with the current need for formal models and improved methods in graph visualization in the era of ever-growing data, together with the opportunities offered and the challenges imposed by current technological advances.

1.1 Visualization

The term “visualization” is used in a broad range of areas. Examples include architectural visualization, terrain visualization, 3D medical / volume visualization, 2D or 3D flow visualization and flow topology visualization. Other examples include presentation graphics, abstract data visualization, information dashboards, music visualization, photomontage or collage, traffic signs, icons, visualization of concepts, and many others.

There are several reasons why visualizations are so popular. First, it is well known that a picture is worth a thousand words. It seems more convenient and efficient to communicate using visual representations (such as graphs and charts) than using a long list of data records. Second, the human brain has strong visual processing capacities. With visualizations, one can perform tasks more easily by directly perceiving and using the visual representations, without having to reprocess and reformulate the data explicitly.

Despite the broad variety of uses of visualization, there is no clear or generally accepted definition of the term. Such a definition would apparently vary between fields, such as computer science, graphic design and the arts. In fact, there are very different types of visual communication, and many of them are not generally considered visualization. Intuitively, visualization is the process of making pictures to represent data. The following criteria are a minimal set of typical requirements for any visualization [206]:

• The input of a visualization is based on (non-visual) data. The input data are non-visual and come from outside the visualization channel / program.
• A visualization produces one or more images. The main goal of visualization is to produce images.
• The result of a visualization is readable and recognizable. Output images are readable to a viewer and help the viewer understand the underlying data. The images are recognizable and must not represent something else.

Two common goals for visualization are communication and investigation (or analytics) [120]:

• Communication: A common practice of visualization is to convey information about the underlying data [306]. A good example is Charles Minard’s chart, a flow-map depiction of the fate of Napoleon’s Grand Army in the 1812 Russian campaign (see Figure 1.1). The figure encodes the relationships between six variables: army size, geographic location (in two dimensions), direction of movement, temperature and time. It has been regarded as “the best graphic ever produced” [307].

• Investigation or Analytics: Another goal is to examine an unknown data set by using different visualizations of the data. This process is also known as visual data mining [201], which aims to discover unexpected properties of the data through its visual representations. A good example is the story of Dr. John Snow, who discovered a link between cholera and the London water supply: the homes of the victims were clustered around a water pump [146].

Figure 1.1: A depiction of the moments of Napoleon’s 1812 Russian campaign army (by Charles Minard [274])

As an example of information visualization, Figure 1.2 shows a visualization of large text documents. The visualization is from Spire [325], which is based upon Wise’s themescape [326]. A document corpus is represented via a landscape metaphor to represent a collection of visual attributes.

Figure 1.2: A visualization of text documents (by Spire [325])

There have been a number of definitions and models of Information Visualization to guide researchers, scientists and visualization practitioners as well as users [312, 262, 67, 70, 211]. Unfortunately, most of them do not focus on graph visualization. This thesis is concerned with Information Visualization, focusing on Graph Visualization: the visualization of relational data, which comprise data entities and relationships. Typical types of network data and the challenges in Graph Visualization are described in Sections 1.2 and 1.3. Subsequently, we review the formal models for graph visualization and investigate the implications of these models for existing graph visualization methods.

1.2 Graphs and networks

The notion of a graph is an elegant and powerful abstraction that has been widely applied in computer science, social science, chemical data analysis, computational biology, web link analysis, the finance industry, computer networks, geovisualization and many other fields. Graphs can be used to represent any domain that can be modelled as a collection of nodes that are linked together. Graphs have a long history: they have been used for scientific purposes since the famous paper of Euler in 1736 [112]. This paper used the concept of a graph comprising nodes and edges for a path-tracing problem, in which Euler asked whether it was possible to walk through the town of Königsberg in such a way as to cross each of the seven bridges exactly once. The first abstract graph drawing appeared in Ball’s book [31], in which the Königsberg problem was redrawn using a node-link diagram. Figure 1.3 shows Euler’s drawing and its abstract node-link diagram by Ball [31].

Figure 1.3: The Seven Bridges of Königsberg problem: (a) Euler’s drawing [31]; (b) Ball’s abstract drawing [31]

Since then, there has been a huge amount of foundational research in graph theory and its applications. A number of domain-independent algorithms have been proposed for efficient processing of graphs. Over the last 30 years, there has been a vast amount of research in graph visualization. Classical visualization metaphors for general graphs include node-link diagrams [303], matrix plots [308], and graph splatting [311]. Figure 1.4 depicts a node-link diagram and the result after applying graph splatting. For specific types of graphs, many other methods exist. For example, radial layouts and treemaps can be used for drawing hierarchies or trees. Section 2.2 gives a comprehensive survey of common graph visualization techniques.
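Euler’s bridge-crossing question discussed above can be checked mechanically once the town is modelled as a graph. The following is a minimal sketch, not taken from the thesis: the land-mass labels A to D and the function names are illustrative assumptions. It relies on the classical result that a connected multigraph admits a walk using every edge exactly once only if it has zero or two nodes of odd degree.

```python
from collections import defaultdict

# The seven bridges as a multigraph; each pair is one bridge.
# Labels are hypothetical: A is the island, B and C the river banks, D the eastern land mass.
BRIDGES = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]


def degrees(edges):
    """Count how many edge ends touch each node."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def is_connected(edges):
    """Depth-first search over the nodes that appear in at least one edge."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    if not adj:
        return True
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(adj)


def has_euler_walk(edges):
    """True if some walk crosses every edge exactly once."""
    odd = sum(1 for d in degrees(edges).values() if d % 2 == 1)
    return is_connected(edges) and odd in (0, 2)
```

In this model all four land masses have odd degree, so `has_euler_walk(BRIDGES)` is false, which matches Euler’s negative answer.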

Figure 1.4: Examples of node-link graph visualization and graph splatting (by Van Liere et al. [311])

A vast number of graphs come from relational (and / or multivariate) data sources and may range from simple to complex structures. Traditional sources of graphs include, for example, friendship networks, migration networks and collaboration networks. Often, these graphs are static networks that summarize time-sampled (e.g., monthly or yearly) results of the examined data. Examples arise in a wide variety of application domains. Some data sets with well-studied graph models are described next:

• In the domain of the World-Wide Web [15], nodes represent web pages and links represent hyperlinks.
• In the web browsing domain, web site maps and browsing histories are another application of graphs [76].
• In computer networks, such as the Internet, nodes are routers and computers, and edges are physical or wireless links connecting them [14].
• In biology and chemistry, graphs are applied to evolutionary trees, phylogenetic trees, molecular maps, genetic maps, biochemical pathways, and protein functions. Biological networks such as protein-protein interaction networks are general graphs: nodes represent proteins, and links represent possible interactions between them [331].
• In social networks, such as co-authorship or citation networks and friendship networks, nodes represent persons or scientists and links represent the relationships between them [318, 220, 240, 298].
• In lexical or semantic networks [183, 288] such as WordNet [229], nodes represent words and links represent word relationships.
• In software engineering, graphs appear in object-oriented systems (such as object call graphs, subroutine-call graphs and class hierarchies); data structures (such as compiler data structures); real-time systems (state-transition diagrams, Petri nets); software modelling (such as data flow diagrams and entity relationship diagrams); project management (such as PERT diagrams) [173, 140, 186, 275, 88].

Figure 1.5 shows a visualization of the Internet graph by the Opte Project¹. This is an example of graphs typically seen from the World-Wide Web [15] with web pages as nodes and hyperlinks as links.

1.3 Big data and visualization challenges

Graphs have a natural visual representation as node-link diagrams, in which nodes are often drawn as simple shapes such as circles or boxes and connecting links are drawn as lines. These classic node-link diagrams are the default representation studied in an enormous amount of research in the Graph Drawing and Information Visualization communities [303]. Visualizations of graphs are helpful for gaining insight into a domain, for understanding network structures, and for performing tasks such as selecting nodes and tracing paths.

¹ The Opte Project, available at http://opte.org/maps/ (2012)

Figure 1.5: Example of large and complex networks. Visualization of the Internet graph (by the Opte Project)

The past decade has witnessed a significant revolution in computational power. Multicore processors have become ubiquitous, and computers with terabytes of memory and/or high-performance graphical processing capacity have become commonplace. These recent advances in hardware and software have enabled the generation of massive amounts of data in a wide range of fields. These data are produced continuously and at high rates. Examples include sensor networks, web logs and computer network traffic. With the advent of the Internet and Web 2.0, a number of social networking and social media sites are emerging, and people can connect with each other easily and frequently. Internet speeds have recently rocketed, bringing many benefits. For example, fast Internet connections make transferring data from place to place and between organizations an ever-easier task, and ease social / commercial networking communications between people in cyberspace. Consequently, Graph Mining and Graph Visualization have taken on a much larger scale: millions of actors or even more in a network and the data
size is in gigabytes or more. Examples include email communication networks [91], instant messenger networks [218], mobile call networks [238] and friendship networks [231].

Different applications result in quite different kinds of graphs, and the corresponding challenges also vary. We examine two types of challenges, from data-centric and layout-centric perspectives. Data-centric challenges arise in processing and visualizing graphs with respect to scalability, complexity and dynamics:
• Scalability: The size of the graph to view is a key issue in large graph visualization; for example, numerous large-scale graphs are typically seen in domains such as the web, biological networks and social networks. Making a large number of nodes visually distinct is a challenge. In fact, it is well known that comprehension and detailed analysis of data in graph structures is easiest when the displayed graph is small relative to human perceptual capacities. Ghoniem et al. [142] show that even for simple tasks such as locating a node or finding paths between two nodes, performance on node-link diagrams declines for graphs with 20 nodes or more.
• Complexity: Besides the size of the graph, the complexity of the graph data is another challenge. The input may be a multi-attribute network and may have a clustering structure. For example, in a Facebook social network, nodes represent people with information such as age, gender and identity; edges represent links between people, with supplementary information on whether they are classmates, colleagues or family members. Many other application domains are associated with complex networks, such as co-authorship or citation networks [298], biological networks, metabolic pathways, genetic regulatory networks, food webs and neural networks. The challenges are to find ways to overlay the underlying graph and the clustering structure effectively and to display multiple attributes in the visuals efficiently.
• Dynamics: In some cases, the graphs may be dynamic and time-evolving. Dynamic graphs change over time, and they may change very quickly. Dynamically generated graphs arise in modern applications, such as social networks like Facebook and Twitter, financial activities, flight monitoring activities, email communication networks and software component dependencies. Such
graphs change rapidly in structure over time. The temporal aspect of network analysis is important, and it is a challenge to characterize the changes and to identify trends.

Data streams involve a combination of the challenges for large, complex and dynamic graphs. Many graph applications, such as those in telecommunications, social networks and the financial industry, create continuous streams of edges. In such applications, the number of graph elements is large and the graph is also very dynamic. In fact, the entire graph is often too big to be held either in main memory or on disk; this creates tremendous constraints for the underlying graph algorithms and visualizations. Another challenge is due to the standard “one-pass constraint”, which requires the stream to be processed and analyzed in a single pass / iteration over the data. It is difficult to construct a global view of stream graphs to explore the structural characteristics of the underlying graph.

From the layout-centric perspective, the major challenges are as follows:
• Visual clutter: Generating overviews is the first step in analyzing large graphs, as summarised in the widely known Visual Information Seeking Mantra (“Overview first, zoom and filter, then details-on-demand”) of Shneiderman [286]. Visual clutter refers to the problem of high density and large numbers of edge crossings when visualizing large, complex and dynamic graphs. A picture of the whole graph is useful for conveying information and for finding global patterns, such as clusters and outliers in a data set. Yet visualizations often incur visual clutter, which hinders human understanding and analysis of the networks. To achieve “nice” graph layouts, graph drawing methods often aim for one or more aesthetic criteria to improve the “readability” of graph layouts.
These criteria include, for example, a small number of edge crossings, a small number of bends per edge and a small total area [303]. As an alternative approach, “edge bundling” has been demonstrated in extensive studies to be a useful way to reduce the visual clutter of graph visualizations. Clutter reduction helps unveil potential high-level patterns and improves human comprehension when deriving information from visualizations of large graphs.
• Display of changes: When visualizing dynamic graphs, it is difficult to show the changes of the graphs over time. Since the graph structure changes over time, the temporal aspect of network analysis is important.
An important criterion for dynamic node-link diagrams is to preserve the user’s mental map [232]. On the other hand, displaying graph changes in other metaphors is not well studied. For example, the matrix representation, which displays a graph as an adjacency matrix, requires quadratic space and careful adaptation to dynamic graphs.

Many scalable and high-quality algorithms have been proposed in the Graph Drawing literature to tackle some of the above challenges. These algorithms include, for example, multi-level force-directed algorithms [131, 156, 174] and scalable multidimensional scaling methods [55]. Nonetheless, with recent advances in computing power and technologies, data are being produced far faster than they can be processed. As a consequence, the demand for faster and more scalable graph visualizations has increased.

For large and complex graphs, the large number of elements can compromise performance or overwhelm the capacity of the viewing platform. The typical “hair-ball” layouts resulting from visualizing large graphs with force-directed methods are well known. In such visual representations, it is hard to distinguish individual nodes and edges. Displaying an entire large graph may convey the overall structure but at the same time makes the graph difficult to comprehend. Despite many revolutionary achievements in hardware technologies, “screen size” is still a precious but limited resource. Here, screen size refers to the number of pixels in a display; commercial computer screens as of 2012 are limited to at most nine million pixels.
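The quadratic space requirement of the matrix representation mentioned above can be made concrete with a small sketch. The function names below are illustrative assumptions rather than part of any method in this thesis; the example builds the adjacency matrix of an undirected graph and counts the cells that a matrix view would have to draw.

```python
def adjacency_matrix(nodes, edges):
    """Build a symmetric 0/1 matrix with one row and one column per node."""
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        i, j = index[u], index[v]
        matrix[i][j] = 1
        matrix[j][i] = 1  # undirected graph: mirror the entry
    return matrix


def matrix_cells(matrix):
    """Number of cells in the matrix view, independent of the number of edges."""
    return sum(len(row) for row in matrix)
```

For n nodes the matrix always has n × n cells, however sparse the graph is. A graph with 3,000 nodes would therefore already need nine million cells, which is the screen budget quoted above if each cell is drawn as a single pixel.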

1.4 Aims and thesis contributions

This section first briefly describes a common process for exploring large graphs. We then introduce our approaches to measuring and improving the quality of graph visualizations. A general process commonly used for exploring large graphs is summarized in the Visual Information Seeking Mantra by Shneiderman [286]: “Overview first, zoom and filter, then details-on-demand”. The process usually starts with choosing a layout that shows users the global structure of the whole data set. With the overall structure, users may find some interesting trends, clusters, or outliers in the data set. Once these are found, users can proceed with subsequent investigations; they can reduce the samples by using dimension reduction, filtering, or navigation methods.

Faithful representation of graphs: With a large pool of methods and techniques for visualizing data sets at different levels of granularity, one could ask how reliable the resulting visualizations are. This question has been addressed for some time. In recent years, there has been an upsurge of interest, and several formal models have been proposed [312, 262, 67, 70, 211] for assessing visualizations as well as for guiding future research in Information Visualization. This demand is especially urgent given the popularity of network data (see Section 1.2), the information overload from technological advances and the various challenges of visualizing large, complex and dynamic networks (see Section 1.3).

In response, this thesis proposes a formal model of the visualization of graph data. In particular, this thesis introduces a general model of the quality of graph visualizations. We distinguish two important concepts: the “faithfulness” and the readability of visualizations of graphs. Readability criteria, which have been extensively studied and embedded in numerous visualization systems [303, 296], stem from the motivation of depicting the visuals clearly. The “presumption” that the quality of a visualization is measured by its readability is commonplace; see [303, 296, 265, 263]. Such criteria rest on the presumption that readability implies that the picture is a faithful representation of the data. The presumption may be true for traditional node-link diagrams with a fairly limited number of nodes and edges. Nonetheless, for modern visualization metaphors, such as 2.5D visualizations, map-based visualizations, matrix representations and their hybrid variants, this presumption may need to be re-examined. With the extensive use of clutter reduction techniques, such as edge bundling (see Section 2.3), the presumption can be demonstrably false.
Consequently, we believe that readability alone is insufficient for modern graph visualizations, and in this thesis we argue that faithfulness is the criterion that has been missing. Chapter 3 describes our visualization model and shows the position of readability as well as the new faithfulness criteria in the model.

Large graph visualization: Beyond the proposed faithfulness model for assessing graph visualizations, we are also concerned with techniques that help the visual analysis of large and complex graphs. In particular, we consider edge bundling, which has been extensively studied for clutter reduction in large and dense graphs.
Although the results of existing edge bundling techniques do show high-level patterns and are perceived much better than their unbundled counterparts, they are limited to showing simple geometric or semantic patterns. Research in edge bundling has not yet been able to support analysis of the importance and topological structures of the underlying graphs. Furthermore, existing edge bundling techniques do not exploit modern visualization metaphors, such as 2.5D visualizations. To overcome these limitations, we propose an edge bundling framework, namely TGI-EB, that takes into account the concerns of topology and importance analysis (see Chapter 4). In the framework, we also address how to extend edge bundling to deal with 2.5D visualization. These extensions appear to be useful for visual analysis tasks, such as identifying important links within a social network and critical links bridging different network communities / clusters.

Stream graph visualization: Last but not least, our third piece of research is concerned with the visual analysis of massive data sets of relational time series data. These data sets are commonly seen in financial activities and security monitoring systems. The volume and velocity of the data are the major difficulties in analysing such massive data sets. In particular, the visual clutter commonly incurred in large dynamic graph visualizations is also a major issue. Furthermore, the visualization of graph streams needs to take the mental map into account to support visual-temporal analysis. To address these challenges, we propose a framework, namely StreamEB, for the analysis of massive data. We use a common technique for processing stream data, called “sliding windows” (see Section 2.5.3.2), and apply it to visualizing graph streams. We introduce a new force-directed method that integrates a visual clutter reduction technique within a force-directed layout algorithm.
Chapter 5 describes our approach to visualizing the data streams.

In summary, this dissertation makes three primary contributions:
1. A faithfulness model for Graph Visualization: We develop a general framework to assess different quality categories of graph visualizations. The framework is defined based on the Data-Visualization-Human interaction loop and separates faithfulness (or consistency) from the well-established readability criteria. This thesis then defines several metrics, called faithfulness metrics, to evaluate graph visualization methods and to show how faithful the results are to the underlying network data.
2. Edge bundling for large graphs: We present the TGI-EB framework for edge bundling of static graphs. Our new framework integrates concerns about importance and topology into its model. Edge-bundled results from our framework can help show core patterns and simplified graph structures in citation networks and geographic networks.
3. Edge bundling for stream graphs: Based on our general model, we integrate edge bundling with stream graph visualization. We have also developed a framework, StreamEB, for dynamic edge bundling. We have studied our framework and experimented with data sets in various application domains, such as geographic networks and financial trading data.
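The “sliding windows” technique and the one-pass constraint mentioned in this chapter can be illustrated with a short sketch. This is a generic sliding window over a time-stamped edge stream and not the StreamEB implementation; the function name, the `(timestamp, source, target)` record layout and the `window` parameter are assumptions made for illustration only.

```python
from collections import deque


def sliding_window_graphs(edge_stream, window):
    """Yield (time, edges) snapshots while reading the stream exactly once.

    edge_stream yields (timestamp, source, target) records in
    non-decreasing timestamp order; only edges newer than
    `timestamp - window` are kept in memory.
    """
    recent = deque()
    for t, u, v in edge_stream:
        recent.append((t, u, v))
        # Expire edges that have fallen out of the window.
        while recent and recent[0][0] <= t - window:
            recent.popleft()
        yield t, [(a, b) for _, a, b in recent]
```

Each snapshot could then be handed to a layout or bundling step. Memory use depends on the number of edges inside one window rather than on the length of the whole stream, which is the point of the one-pass setting.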

1.5 Research methodology

Our research methodology is primarily empirical and consists of three broad stages: design, implementation and validation. We design general models for graph visualization and distinguish readability criteria from the “missing” faithfulness criteria. We then propose models and frameworks for bundling edges in static and streamed graphs, using various criteria that have not yet been explored in previous work. These frameworks are implemented in visualization systems, which enable experimental validation and evaluation. The validation is performed for each framework in an application-specific context using real-world data. The results receive feedback from application experts, such as biologists. We report the usefulness of our approaches based on the experimental results of several case studies.

1.6 Application domains

The approaches introduced in this thesis have a broad scope of applicability. The application domains range from geographic networks, social networks (such as collaboration networks) and biological networks to trading networks (of financial activities). In this thesis, we give the results of our approaches on several representative types of graphs, which are typical examples of large, complex and dynamic graphs.

Although graph visualization has no well-established benchmarks, we can evaluate our methods using data sets that have been commonly used. The specific application domains and data are categorised as follows.

Geographic Networks: Many network-structured data are associated with geographic locations. A good example is the airline network, in which airports are nodes and flights are edges connecting the airports. In many cases, these networks are dynamic and can have multiple attributes, such as data from world-wide flight monitoring systems. The number of flights connecting domestic airports is huge, and flight records are associated with multiple attributes.

Collaboration Networks: Humans interact or collaborate with each other to complete tasks. In science, researchers collaborate with each other on research projects and to produce research papers.

Bioinformatics: Bio-elements interact at, for example, the gene level or the protein level. By examining interaction networks, such as protein-protein interaction networks, scientists can determine important proteins and discover critical connections between a certain gene and a specific disease.

Trading networks: Stock trading activities generate a huge amount of data. These data are relational time series with multiple attributes, such as prices, volumes and tags. Stock market surveillance systems examine these trading data in order to ensure that everyone complies with the rules.

1.7 Outline of this dissertation

This dissertation comprises six chapters. In addition to this introduction, the dissertation is organized into the following chapters.
• Chapter 2 gives a comprehensive introduction to visualization and data mining. We survey representative models, methods and techniques used in Visualization, Graph Visualization and Data Mining. For clarity, we discuss graph visualization and visual clutter reduction in separate sections.
• Chapter 3 describes the inspiration for a new type of quality criterion for graph visualizations, namely faithfulness. We model the visual knowledge discovery process as a Data-Visualization-Human loop, and based on the model we define the
faithfulness concept, which is essentially the consistency from data to pictures to human knowledge. We then define three different types of faithfulness and propose several metrics.
• Chapter 4 describes our general model TGI-EB for edge bundling, which is characterised by importance and topology concerns for large and dense graphs.
• Chapter 5 gives the details of our approach to analysing massive graphs and stream graphs. This chapter describes our StreamEB framework and its applications to financial trading data and geographical event tracking activities.
• Finally, Chapter 6 summarizes the work presented in the dissertation, including the contributions made. The chapter then discusses several possible improvements and extensions for future work.

Chapter 2

Related work

“The whole idea is to enable you to see mentally the picture at all hours of the day.” – Claude M. Bristol

The work in this dissertation is founded on the concepts of visualization (specifically, graph visualization) and data mining (specifically, clustering and graph analysis). The basic material given in this chapter aims to ease the discussions in the later chapters. For clarity, we put graph visualization, visual clutter reduction, network analysis and visual analytics in separate sections.

2.1 Information visualization

The term “visualization” has been used in many aspects of life. We have probably seen visualizations in a broad range of areas, including architectural visualization, terrain visualization, medical visualization, music visualization, traffic signs, and many others. Visualization is popular due to its effectiveness for communication. First, a picture is worth a thousand words; it is often more convenient and efficient to communicate through graphical / visual representations (such as graphs and charts) than through a long list of data records. Second, human beings have strong visual processing capacities; thus tasks can be performed more easily by perceiving and using the visual representations, without having to reprocess and reformulate the data explicitly.

There are very different types of visual communication, yet there is no single definition of visualization that is widely accepted. Such a definition would vary between fields, such as computer science, graphic design and the arts. Typical visualizations share a minimal set of requirements:
1. The input is based on (non-visual) data. Input data must not be images themselves, as in the case of image processing or photography.
2. It produces one or more images. The main aim of a visualization is to produce images from data. The images are the primary means of communicating the data.
3. The result is readable and recognizable. There are often many ways to transform data into images. Output images must be “readable” by a viewer, although sometimes training and practice are required. Output images must also be “recognizable” and must not appear as a representation of something else.

In general, research in visualization, following the taxonomy of Card et al. [65], can be categorised into two sub-fields: Scientific Visualization and Information Visualization.
• Scientific visualization: includes any type of visualization of scientific data, which is usually related to the physical geometry of real-world objects or the environment. Typical examples include modelling the flow of air in physics and engineering, modelling fluid flows over surfaces for natural disaster predictions, and reconstructing MRI scans into virtual reality models in medical applications.
• Information visualization: deals with more abstract data that is often not related to an underlying physical geometry and generally offers more freedom than scientific data. Examples include financial data such as trading data, price/volume data, age/education/population data, collaboration networks, citation networks and traffic monitoring data. Although the data in Information Visualization is abstract, it sometimes has geographic references. For example, traders are located in cities, which are geographic locations.

2.1.1 Visualization models

There are a number of models for visualization. One approach is to model visualizations as routines or stages that are wired together. Examples include the
InfoVis reference model [65], the Model Human Processor [66], Norman’s seven stages of action [249] and Pinker’s computational model of graph comprehension [259].

One approach, from Card et al. [65], defines visualization as “the use of computer-supported interactive visual representations of data to amplify cognition”, where cognition is “the acquisition or use of knowledge”. This definition focuses on the purpose of visualization, with visualization as the means. Card et al. further state that “the purpose of visualization is insight, not pictures”. This insight aims at discovery, decision making and explanation. Gaining insight helps information assimilation and the monitoring of large amounts of data. Information Visualization is a useful tool for several reasons. First, visualization gives the human additional resources for perceptual processing and expands working memory. Second, it can reduce the search for information. Third, visualization can enhance the recognition of patterns. Fourth, visualization enables the use of perceptual inference and perceptual monitoring. Finally, visualization can be manipulable and interactive, which helps in performing tasks.

Figure 2.1: Visualization pipeline. Stages: Raw Data, Data Tables, Visual Structures, Views; transformations: Data Transformation, Visual Mapping, View Transformation; also shown: User Interaction and Task.

Visualization typically involves a number of steps that transform raw data into pictures. Figure 2.1 depicts the visualization reference model of Card et al. [65], which was adapted from the data state model of Chi [71].
• A Data Transformation maps Raw Data, that is, data in some idiosyncratic format, into Data Tables, that is, relational descriptions of data.
• Visual Mapping transforms Data Tables into Visual Structures, that is, structures that combine spatial substrates, marks, and graphical properties.
• Finally, View Transformation creates Views of the Visual Structures by specifying graphical parameters of the visuals. Examples of the graphical parameters include position, scaling, color, transparency, and clipping.

• User Interaction controls parameters of these transformations. For example, users may restrict the view to certain data ranges, or change the nature of the transformation.
• The visualization and user interactions enable users to perform tasks.

Another approach presents models of visualization in terms of the structure and formal descriptions of visualizations. Examples of this approach include the models of Mackinlay [224] and Wilkinson [323]. These models analyze different types of visualizations and extract or synthesize a fundamental set of graphical language elements. Algebraic combinations of these basic elements can generate different visualizations with different semantics.

Mackinlay [224] addresses the Graphical Presentation Problem, which abstracts the problem of visualization as a graphical language. This problem is the synthesis of a graphical design that expresses a set of relations and their structural properties effectively. Mackinlay further proposes two important criteria for graphical presentations: expressiveness and effectiveness.
• Expressiveness: Communication of any type is possible if the participants know how messages are constructed and interpreted. For graphical communication (of visualization results), it is important to determine how information is encoded by graphical objects. “A set of facts is expressible in a language” if the language contains a sentence that both “encodes all the facts in the set” and “encodes only the facts in the set”.
• Effectiveness: Given two graphical languages that express some information, an often raised question is “which language involves a design that specifies more effective presentation”. Unlike expressiveness, which only depends on the syntax and semantics of the graphical language, effectiveness also depends on the capabilities of the perceiver.
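The reference model of Card et al. described earlier in this section can be read as a chain of functions, one per transformation. The sketch below is only an illustration of that reading under simplifying assumptions: the raw data is a list of "source,target" text lines, the visual mapping places nodes on a horizontal line as a placeholder layout, and user interaction is reduced to two parameters, `scale` and `offset`. None of these names come from the reference model itself.

```python
def data_transformation(raw_lines):
    """Raw Data -> Data Tables: parse 'source,target' lines into an edge table."""
    return [tuple(line.strip().split(",")) for line in raw_lines if line.strip()]


def visual_mapping(edge_table):
    """Data Tables -> Visual Structures: nodes get positions, edges become line marks."""
    nodes = sorted({v for edge in edge_table for v in edge})
    positions = {v: (float(i), 0.0) for i, v in enumerate(nodes)}  # placeholder layout
    lines = [(positions[u], positions[v]) for u, v in edge_table]
    return {"positions": positions, "lines": lines}


def view_transformation(structure, scale=1.0, offset=(0.0, 0.0)):
    """Visual Structures -> Views: apply graphical parameters (scaling, position)."""
    def place(p):
        return (p[0] * scale + offset[0], p[1] * scale + offset[1])
    return {
        "positions": {v: place(p) for v, p in structure["positions"].items()},
        "lines": [(place(a), place(b)) for a, b in structure["lines"]],
    }


def render(raw_lines, scale=1.0, offset=(0.0, 0.0)):
    """Run the whole pipeline; scale and offset stand in for user interaction."""
    table = data_transformation(raw_lines)
    return view_transformation(visual_mapping(table), scale, offset)
```

Calling `render` again with a different `scale` corresponds to a user changing a view parameter: only the last stage produces different output, while the data table and the visual structure are unchanged.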

2.1.2

Visual clutter

The term “clutter” in information visualization has been used with different meanings. Sometimes clutter simply refers to the number or the density of objects, whereas many visualization researchers are concerned with criteria beyond density. According to Tufte [305, 307], clutter is anything that causes confusion in the visual representation. “Clutter and confusion are failures of design, not attributes of information”, as said by Tufte [305]. In addition to the size of data, Tufte identifies several other sources of clutter, such as poor layouts, strong contrast, improper coloring, and “chartjunk”. Chartjunk may include extra lines and unnecessary icons, to name just a few. Further, Lloyd [223] presents several case studies of visual clutter and models for clutter reduction. In general, clutter refers to the interference of different visual elements when displaying information. Such interference may confuse viewers. Visual clutter in graph visualizations specifically is discussed in Sections 2.2.9.2 and 2.2.8.2.

2.1.3

Visualization tasks

Visualization is a useful means of analyzing data. In many cases, visualizations are developed to serve domain-specific tasks [304]. Examples of common tasks include identifying important actors and communities in a social network, or exploring possible pathways in a biological network. Domain case studies (see, for example, [318, 52]) can be used to identify such tasks. Tasks can be divided into “low-level” tasks [321] and “high-level” tasks [168]. A number of low-level tasks are relevant across a wide variety of domains. Such tasks have been identified and classified; for example, Wehrend and Lewis [321] describe a list of tasks that one can perform for data analysis: identify, locate, distinguish, categorize, cluster, distribute, rank, compare, associate and correlate. For high-level tasks, Hibino [168] describes seven tasks for the analysis of a data set using a visualization tool: prepare (gathering background information), plan (generating hypotheses and strategy), explore (getting users familiar with the data set), present (organizing the data), overlay (comparing different displays), reorient (reviewing goals and progress) and other (such as gathering statistics).


2.2

Graph visualization

This section gives a brief survey of different graphical representations for network data and their layout algorithms. Graphs are the most popular abstract representation for inherent relations within data [152, 151]. Formally, a graph G = (V, E) comprises a set of nodes V and a set of edges E, which is a subset of V × V. An edge can be directed or undirected. A directed edge connects a source node to a target node. Directed edges are often represented by arrows in drawings. In some cases, nodes and edges may have additional data attributes. For example, in Facebook friendship networks, persons are nodes that may have a name, age, education, country, etc., whereas the links between them may be typed as friendship, teammate or business relations. Furthermore, there are two types of graphs: static graphs and dynamic graphs. Static graphs have fixed sets of nodes and edges. Dynamic graphs change over time; the change can be structural or in the attributes. The primary goal of graph visualization is to create pictures of graphs. Visualization methods differ depending on the type of graph and can be divided into static graph visualization and dynamic graph visualization. We first survey current state-of-the-art graph layouts and aesthetics for static networks. In the last part of this section, we cover some of the literature on common approaches and methods for dynamic graphs. Surveys of graph visualization can be found in [39, 167, 80, 296].

2.2.1

Node-link diagrams

Node-link diagrams are the most popular way to visualize inherent relations within data. In node-link diagrams, nodes are drawn as circles or boxes, and edges are drawn as lines connecting the nodes. Given a graph, the basic graph layout algorithm needs to determine the positions of the nodes. Formally, graph drawing algorithms can be seen as node positioning algorithms that map every vertex v ∈ V of the graph G = (V, E) to a location p_v. Typically, the location p_v is a point in two or three dimensions, depending on the application.

Edges are then drawn as straight line segments connecting related nodes. In extended layout methods, edges can be represented using polylines or curves. Edge representations have an effect on graph visualizations. Smooth curves are preferable to polylines for human perception because polylines have bends. According to the Gestalt principle of continuity [317], humans are more likely to construct visual entities that are smooth than ones with abrupt changes of direction.

2.2.1.1

Tree drawing

In specific applications, graphs may appear as hierarchies or trees, instead of “general” graphs. Node-link layouts of trees show parent-child relationships by using links between nodes. Compared with general network structures, trees are more tractable to draw and easier to understand. Many optimized methods have been proposed for laying out trees, and tree drawing problems typically have lower complexity than the corresponding problems for general graphs of the same size. The classical layout draws trees using hierarchical views, in which child nodes are placed under their parent node. A very satisfactory solution for this node-link layout was proposed by Reingold and Tilford [271] (see Figure 2.2(a)). Another layout is the radial layout. Radial views [167, 247] place the focus node at the center of the layout and the other nodes on concentric circles around it. Children of a sub-tree are positioned in a circular wedge according to their depths in the tree (see Figure 2.2(b)). A third option is the balloon layout [68], in which sibling sub-trees are placed in circles around their parent node. This layout can be seen as a projection of a cone tree [273] onto the plane (see Figure 2.2(c)). There are many other ways to visualize trees, such as treemaps [192, 285]. We give a brief overview of treemaps in Section 2.2.3.1. Furthermore, some examples of tree layouts in 3D are shown in Section 2.2.5. For more examples, see the survey of tree layouts in [167].
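As a rough illustration of the radial view described above, the following sketch places the root at the origin and every other node on a circle whose radius grows with its depth. Splitting each angular wedge in proportion to the number of leaves is an assumption made here for simplicity, and the names radial_layout, children and ring_gap are hypothetical; this is not the algorithm of [167, 247].

```python
import math


def radial_layout(children, root, ring_gap=1.0):
    """Return {node: (x, y)} for a tree given as {parent: [child, ...]}."""

    def leaves(v):
        kids = children.get(v, [])
        return 1 if not kids else sum(leaves(c) for c in kids)

    pos = {}

    def place(v, depth, start, end):
        # A node sits in the middle of its wedge, on the circle of its depth.
        angle = (start + end) / 2.0
        radius = depth * ring_gap
        pos[v] = (radius * math.cos(angle), radius * math.sin(angle))
        kids = children.get(v, [])
        total = sum(leaves(c) for c in kids)
        cursor = start
        for c in kids:
            # Each child receives a share of the wedge proportional to its leaves.
            span = (end - start) * leaves(c) / total
            place(c, depth + 1, cursor, cursor + span)
            cursor += span

    place(root, 0, 0.0, 2.0 * math.pi)
    return pos


tree = {"r": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}
layout = radial_layout(tree, "r")
for node, (x, y) in layout.items():
    print(f"{node}: ({x:.2f}, {y:.2f})")
```

In this sketch every node at depth d lies at distance d * ring_gap from the root, so the depth of a node can be read directly from the circle it sits on.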

Figure 2.2: Examples of node-link tree layouts: (a) classical hierarchical view [247]; (b) radial view [247]; (c) balloon view [68]

2.2.1.2

Circular layout

Circular layout is one of the most commonly used methods for drawing general graphs [289, 40, 245]. In a circular layout, nodes are placed along a circle and edges are drawn to connect the relevant nodes. Such layouts can be computed very quickly, yet they may have many edge crossings. The problem of minimizing the number of edge crossings in circular layouts is NP-complete. Several heuristics reduce the number of crossings by exploiting the fact that the number of crossings is determined by the ordering of the vertices along the circle. For example, Baur and Brandes [40] present a “shifting method” for vertex reordering to reduce edge crossings. On the other hand, Nguyen et al. [245] propose an approach to increase the crossing angles in circular layouts. They introduce a post-processing step that improves crossing angles when the ordering of the vertices on the circle has been preassigned. Nguyen et al. model the problem of enlarging crossing angles as a quadratic programming problem and report significant improvements in experimental results. In another approach, “edge bundling” has been used to reduce the effect of crossings in circular layouts [135, 78, 79]. More details on edge bundling are given in Section 2.3.
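The observation that crossings in a circular layout depend only on the vertex ordering can be illustrated with a short sketch. Two straight edges drawn inside the circle cross exactly when their endpoints interleave along the circle. The function name circular_crossings and the brute-force pairwise loop are assumptions chosen for readability; this is not one of the cited heuristics.

```python
from itertools import combinations


def circular_crossings(order, edges):
    """Count edge crossings of a circular layout from the vertex order alone."""
    index = {v: i for i, v in enumerate(order)}
    count = 0
    for (a, b), (c, d) in combinations(edges, 2):
        if len({a, b, c, d}) < 4:
            continue  # edges that share an endpoint cannot cross
        lo, hi = sorted((index[a], index[b]))
        c_inside = lo < index[c] < hi
        d_inside = lo < index[d] < hi
        # The edges cross iff exactly one endpoint of (c, d) lies between a and b.
        if c_inside != d_inside:
            count += 1
    return count


edges = [(0, 2), (1, 3), (0, 1)]
print(circular_crossings([0, 1, 2, 3], edges))
print(circular_crossings([0, 2, 1, 3], edges))
```

Reordering the vertices from 0, 1, 2, 3 to 0, 2, 1, 3 removes the single crossing in this small example, which is the effect that reordering heuristics aim for on larger graphs.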

2.2.1.3

Force-directed method

Spring layout, also known as spring-embedder layout or force-directed layout, is one of the most popular strategies for drawing general graphs. This approach models a graph as a physical system of rings and springs. Intuitively, repulsive forces push nodes away from each other, while attractive forces pull connected nodes together. Eades [101] proposed a spring algorithm in 1984, one of the first practical algorithms for drawing general graphs. Since then, there have been a number of improvements to the original method, including, for example, those of Fruchterman and Reingold [130], Gansner and North [137], Noack [248] and Dwyer et al. [100]. There are also a number of extensions of force-directed algorithms for drawing large graphs, including the multi-scale method by Harel and Koren [161], the multi-level approach by Walshaw [316] and the space-division approach by Quigley and Eades [267]. Formally, a spring layout is built upon a mathematical cost (or energy) function, which maps each layout of a graph to a numeric value. Spring algorithms typically aim to minimize this energy function. Spring approaches differ from each other in their energy models and in the strategies used to achieve minimization. A few selected examples include the Newton-Raphson method used by Kamada and Kawai [196] and the simulated annealing method used by Davidson and Harel [84]. In general, the physical analogy of spring layouts can easily be extended to meet aesthetic requirements by adjusting the forces between nodes. Spring layouts can also be adapted to avoid node overlaps. Such approaches include, for example, the force-scan algorithm by Eades et al. [232]. Li et al. [219] also proposed two similar spring embedder models to reduce overlaps. In general, transforming a given overlapping drawing into a minimum-area layout without node overlaps while preserving the orthogonal ordering is an NP-hard problem.
Force-directed algorithms work very well for relatively small graphs and produce nice results. Yet they do not scale well to large graphs. For large graphs, the energy function often takes a long time to converge to results of reasonable quality. Most of the time, a local optimum is reached instead of the global optimum. In addition, force-directed algorithms are typically iterative. Node positions are recalculated and updated at every iteration; each iteration usually takes O(n^2 + m) time, where n is the number of vertices and m is the number of edges. A reasonable layout usually needs a number i of iterations, which is sometimes proportional to n or even n^2. The overall running time is O(i · (n^2 + m)) for i iterations. Moreover, force-directed algorithms show a lack of “determinism”: results of the same algorithm on the same graph are unlikely to be identical or even alike. This lack of determinism may be troublesome for user navigation and exploration of graphs.
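The per-iteration cost of O(n^2 + m) discussed above can be seen in a minimal spring-embedder sketch. The force formulas (repulsion k^2/d between all pairs, attraction d^2/k along edges) are loosely in the style of Fruchterman and Reingold [130]; the fixed step cap in place of a cooling schedule, the parameter defaults and the name spring_layout are assumptions for illustration only.

```python
import math
import random


def spring_layout(nodes, edges, iterations=100, k=1.0, step=0.1, seed=0):
    """Return {node: [x, y]} after a fixed number of force iterations."""
    rng = random.Random(seed)  # the random start is the source of non-determinism
    pos = {v: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for v in nodes}
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsion between every pair of nodes: O(n^2) work per iteration.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Attraction along every edge: O(m) work per iteration.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node along its net force, capped at `step`.
        for v in nodes:
            dx, dy = disp[v]
            d = math.hypot(dx, dy)
            if d > 0:
                move = min(d, step)
                pos[v][0] += dx / d * move
                pos[v][1] += dy / d * move
    return pos


nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d")]
pos = spring_layout(nodes, edges)
```

Fixing the seed makes repeated runs reproducible in this sketch; changing it changes the random starting positions and, in general, the final drawing, which is the lack of determinism noted above.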

2.2.2

Matrix layout

Another approach, commonly used alongside node-link diagrams, is the matrix-based representation [41, 163, 164]. In this approach, graphs are represented by their adjacency matrices or connectivity matrices, drawn as a matrix of glyphs. The glyph at entry (i, j) represents the edge connecting node i to node j. In many situations, edge attributes can be encoded as visual attributes of the glyphs, such as color, shape, size and transparency. Figure 2.3 shows examples of matrix-based representations of graphs.

Figure 2.3: Examples of Matrix Views (by Becker et al. [41])

The major benefit of adjacency matrices is their readability. With matrices, one can easily show graphs with thousands of nodes. The visual clutter incurred in node-link diagrams is inherently avoided in matrix representations. Yet an arbitrary ordering of rows and columns may make the networks hard to perceive and make the detection of outliers or clusters intractable. In recent years, there have been several research works on reordering rows and columns to better unveil the outliers, clusters and patterns in the underlying data sets. For example, Henry and Fekete [163] present two matrix layout methods. These techniques compute the distances between every pair of nodes and thus do not scale well. Abello and van Ham [5] present a method that uses hierarchical aggregation to visualize and navigate matrices. This method shows a hierarchical view of the matrix data in which users can view and navigate matrices as trees.
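The effect of row and column ordering can be illustrated with a small sketch that builds the adjacency matrix of the same graph under two orderings. The helper names adjacency_matrix and bandwidth, and the use of bandwidth as a crude indicator of how well clusters show up as diagonal blocks, are assumptions for illustration; they are not the reordering methods of [163].

```python
def adjacency_matrix(order, edges):
    """Build a symmetric 0/1 adjacency matrix with rows and columns in `order`."""
    index = {v: i for i, v in enumerate(order)}
    n = len(order)
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1
    return matrix


def bandwidth(matrix):
    """Largest distance of a non-zero entry from the main diagonal."""
    return max(
        (abs(i - j) for i, row in enumerate(matrix) for j, x in enumerate(row) if x),
        default=0,
    )


# Two triangles: {a, b, c} and {x, y, z}.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("x", "y"), ("y", "z"), ("x", "z")]
interleaved = adjacency_matrix(["a", "x", "b", "y", "c", "z"], edges)
grouped = adjacency_matrix(["a", "b", "c", "x", "y", "z"], edges)

for row in grouped:
    print(" ".join(str(x) for x in row))
print(bandwidth(interleaved), bandwidth(grouped))
```

With the grouped ordering the two triangles appear as two dense blocks on the diagonal, whereas the interleaved ordering scatters the same entries across the matrix.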

2.2.3

Region adjacency drawings

2.2.3.1

Treemap

In contrast with traditional node-link tree layouts, treemaps [192, 285] draw trees in a nested way. Treemaps place nodes within their parent nodes and thus need no links to show parent-child relationships. Figure 2.4(a) shows a file system drawn by Johnson et al. [192]. A different file system is depicted in Figure 2.4(b) using Cushion treemaps by van Wijk [313]. Another interesting variation is the Voronoi treemap by Balzer et al. [33]. Figure 2.4(c) shows a Voronoi treemap visualization of the static structure of the JFree library software system. Figure 2.4(d) shows another example of a treemap, the Market Map¹, which shows the performance of different market sectors. It also supports user interaction; for example, it displays more details when the user moves the mouse over a region of the treemap.

¹ Market Map at www.smartmoney.com/map-of-the-market/
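The nesting idea behind treemaps can be sketched with a simple slice-and-dice subdivision: a node's rectangle is split among its children in proportion to their sizes, alternating the split direction at each level. The nested-dictionary input format, the function names and the assumption that node names are unique are all hypothetical choices for this illustration; cushion and Voronoi treemaps use different constructions.

```python
def size(node):
    """Size of a leaf, or the total size of all leaves below an inner node."""
    kids = node.get("children", [])
    return node["size"] if not kids else sum(size(c) for c in kids)


def slice_and_dice(node, x, y, w, h, depth=0, rects=None):
    """Return {name: (x, y, width, height)}; children are nested in parents."""
    if rects is None:
        rects = {}
    rects[node["name"]] = (x, y, w, h)
    kids = node.get("children", [])
    total = size(node)
    offset = 0.0
    for child in kids:
        share = size(child) / total
        if depth % 2 == 0:
            # Even depth: split the rectangle along the x axis.
            slice_and_dice(child, x + offset * w, y, share * w, h, depth + 1, rects)
        else:
            # Odd depth: split the rectangle along the y axis.
            slice_and_dice(child, x, y + offset * h, w, share * h, depth + 1, rects)
        offset += share
    return rects


files = {
    "name": "root",
    "children": [
        {"name": "src", "children": [
            {"name": "a.py", "size": 30},
            {"name": "b.py", "size": 10},
        ]},
        {"name": "docs", "size": 60},
    ],
}
rects = slice_and_dice(files, 0.0, 0.0, 100.0, 100.0)
for name, rect in rects.items():
    print(name, tuple(round(v, 2) for v in rect))
```

The area of each leaf rectangle is proportional to its size, and containment of rectangles replaces the links of a node-link drawing.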

Figure 2.4: Examples of treemaps: (a) a treemap (by Johnson et al. [192]); (b) Cushion treemaps (by van Wijk et al. [313]); (c) Voronoi treemaps (by Balzer et al. [33]); (d) Market Map

2.2.3.2

Map-based visualization

The map-based approach is a promising way to produce appealing visualizations of graphs [138, 222, 175]. Map-based visualizations look like familiar geographic maps: they are composed of a set of “countries” on the plane (see Figure 2.5). Each country encapsulates a node or a set of nodes within its boundary. Edges are then drawn to connect adjacent nodes. For large graphs, map-based visualization may rely on a “good” clustering of the nodes.

(a) A map of trade relations between countries

Figure 2.5: Map-based visualizations (by Gansner et al. [138])

Recently, such approaches have proved quite useful for displaying clustered graphs as maps. For example, Gronemann and Jünger [150] have used maps for social network analysis to reveal the underlying structural properties of the network. Most recently, map-based visualizations have been studied for dynamic data to show changes and trends. For example, Mashima et al. [226] use maps to visualize user trends from the Internet radio station last.fm and TV viewing trends from an IPTV service. Gansner et al. [133] propose using maps to visualize stream data, specifically to analyze and find trends in daily tweets from Twitter.

2.2.4

Hybrid visualization

2.2.4.1

Clustered graphs

(a) Clustered graph visualizations in 2.5D (by Eades and Feng [102])

(b) Clustered graphs in 2D (by Pulo [260])

(c) Clustered graph with different level of details (by Balzer and Deussen [32])

Figure 2.6: Examples of clustered graph visualizations

A clustered graph C = (G(V_G, E_G), H(V_H, E_H)) is defined from a graph G with an additional cluster hierarchy tree H. The nodes of the underlying graph are the leaf nodes of the cluster hierarchy, and so V_G ⊂ V_H. The remaining nodes in V_H \ V_G are cluster nodes. The edge sets E_G and E_H are disjoint, that is, E_G ∩ E_H = ∅, because E_G contains only edges connecting the underlying graph nodes V_G, whereas E_H contains only edges between cluster nodes and edges between cluster nodes and graph nodes. This definition relies on a recursive grouping or clustering of the nodes of G specified by the cluster hierarchy. Figure 2.6 shows examples of clustered graph visualizations. Clustered graphs can be drawn in a similar fashion to normal graphs. Some visualizations of clustered graphs draw nodes as boxes and restrict the drawing of H such that child graph nodes and cluster nodes must be drawn within their parent’s boundary. Other models [117] are concerned with c-planarity, a condition in which edges may not intersect clusters to which they have no relationship. There are also more general graph models, such as compound graphs [293, 278] and higraphs [160]. Pulo [260] gives a detailed survey of clustered graph visualizations.

2.2.4.2

Tree+Link layout

Large graphs are much more difficult to visualize than trees. One way to make the visualization of graphs faster is to lay out their spanning trees directly. A particular class of graphs, the quasi-hierarchical graphs reported by Munzner, can be visualized using minimum spanning trees. However, for general graphs all links are important, and thus choosing a representative spanning tree is non-trivial. Different spanning trees may give different results, and choosing an arbitrary spanning tree may give misleading results. Beyond purely tree-based methods, another approach is to visualize graphs with tree+link layouts. In this approach, a spanning tree is first extracted from the general graph and drawn using tree visualization techniques such as node-link tree layouts [271, 68, 273] or treemap views [170]. Subsequently, the remaining edges are added back to give the final representation. In many practical cases, data sets contain several kinds of relations. When one of them is a hierarchical relationship, such as in software dependency graphs or web browsing histories, tree+link layouts become a suitable tool. Tree visualization techniques can be used to show the hierarchical structure by using, for example, node-link layouts or space-nesting approaches to display the parent-child relationships. The other relations can then be shown by connecting the corresponding nodes; each new link is added to connect the node to its ancestor node. Figure 2.7 depicts two example results of this approach.
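The first step of a tree+link layout, separating a spanning tree from the remaining links, can be sketched as follows. A breadth-first spanning tree is used here purely as an assumption because it is easy to compute; as noted above, the choice of spanning tree matters, and the function name spanning_tree_and_links is hypothetical.

```python
from collections import deque


def spanning_tree_and_links(nodes, edges, root):
    """Split the edges of a connected graph into BFS tree edges and extra links."""
    adjacency = {v: [] for v in nodes}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)

    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)

    # Tree edges are stored as (parent, child) pairs.
    tree = {(p, v) for v, p in parent.items() if p is not None}
    # Every edge that is not a tree edge is drawn on top of the tree layout.
    links = [(u, v) for u, v in edges if (u, v) not in tree and (v, u) not in tree]
    return tree, links


nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
tree, links = spanning_tree_and_links(nodes, edges, "a")
print(sorted(tree))
print(links)
```

The tree edges would be handed to any of the tree layouts discussed earlier, and the remaining links would then be added back on top of that drawing.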

Figure 2.7: Tree+link layouts: (a) balloon view with added links; (b) treemap with added links (by Danny Holten [170])

2.2.4.3

Map+Link visualization

Other examples of hybrid visualization include MatLink [164] and NodeTrix [165], which integrate matrix views with node-link diagrams. Figure 2.8 shows an example of a hybrid visualization produced by NodeTrix [165]. NodeTrix, proposed by Henry et al. [165], combines node-link diagrams, which show the global structure of the network, with matrix representations of groups of nodes. The method thus reduces the visual complexity and clutter of the network while still providing all the information.

Figure 2.8: A Hybrid Visualization of Social Networks using NodeTrix


2.2.5

3D drawing

Three dimensions seem to be a natural way for people to see the objects surrounding them in real life. A number of 2D layouts have been extended to 3D. The additional dimension gives more flexibility and can help display larger graph structures. Furthermore, 3D seems to be a natural setting for designing metaphors that help viewers perceive complex structures.

(a) Phylogenetic trees in Hyperbolic space (by Walrus [179])

(b) Hyperbolic view of a tree (by T. Munzner [234])

(c) The SGI File System Navigator

(d) Information Cube (by J. Rekimoto [272])

Figure 2.9: Examples of 3D graph visualizations

Figure 2.9 depicts several example results of this approach. Views of trees in hyperbolic space are shown in Figure 2.9(a) and (b). The File System Navigator, which came with earlier versions of SGI workstations, is depicted in Figure 2.9(c). The layout shows a tree representing a user’s file directory space, in which files are represented by 3D blocks on the plane whose sizes are proportional to the file sizes. Another example is the Information Cube proposed by Rekimoto [272] (see Figure 2.9(d)). The algorithm draws information cubes inside their parent cubes to represent parent-child relationships. By using transparency, the layout can show nested cubes. Textual labels are displayed on semi-transparent cube surfaces. Further examples are cone trees [273, 162], which use cones to represent parent-child relationships. Figure 2.10 shows some examples of 3D drawings of trees.

(a) A cone tree (by Robertson et al. [273])

(b) Another cone tree (by Hemmje et al. [162])

Figure 2.10: Examples of 3D tree visualizations

The Botanical visualization by Kleiberg et al. [204] is also a good example of 3D tree visualization (see Figure 2.11). It draws tree structures in a way similar to real-life trees, with branches and leaves, and creates appealing 3D visualizations of trees.

Figure 2.11: Botanical visualization of a unix home directory (by Kleiberg et al. [204])

Although 3D visualizations are useful for examining a data set in three dimensions, 3D graph visualizations are generally hard to navigate and often incur visual clutter due to the projection onto a 2D computer display.

2.2.6

2.5D layout

Two-and-a-half dimensional (2½D or 2.5D) visualization [99, 51, 172, 169] is another way to visualize graphs. In 2.5D visualization, a graph is drawn on multiple planes: each plane contains a subset of the nodes, and edges are drawn to connect the nodes (see Figure 2.12 and Figure 2.6(a)). Edges that connect nodes in the same plane are called intra-plane edges, whereas edges that connect nodes in different planes are called inter-plane edges.
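The distinction between intra-plane and inter-plane edges follows directly from an assignment of nodes to planes, as the following small sketch shows. The dictionary plane_of and the function name classify_edges are hypothetical names used only for this illustration.

```python
def classify_edges(plane_of, edges):
    """Split edges into intra-plane and inter-plane edges."""
    intra, inter = [], []
    for u, v in edges:
        if plane_of[u] == plane_of[v]:
            intra.append((u, v))  # both endpoints lie on the same plane
        else:
            inter.append((u, v))  # the edge connects two different planes
    return intra, inter


plane_of = {"a": 0, "b": 0, "c": 1, "d": 1}
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]
intra, inter = classify_edges(plane_of, edges)
print(intra)
print(inter)
```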

(a) Evolution of research area in InfoVis community (by Tim Dwyer [99])

(b) 2.5D visualizations of trees (by Hong and Murtagh [172])

(c) Clustered graphs in 2.5D (by Hong and Ho [169])

Figure 2.12: Examples of 2.5D graph visualizations

Visualizing graphs in 2.5D brings several advantages. For example, compared with 2D visualization, 2.5D visualization offers more freedom with the additional “half” dimension. Also, 2.5D visualization intuitively has fewer issues with visual clutter than 3D visualization, since nodes are located on planes rather than positioned at arbitrary 3D locations. As a result, navigation is easier in 2.5D visualization than in 3D visualization.

2.2.7

Graph visualization tasks

A good graph layout lets users easily get the information they want. There are a number of tasks that users commonly perform with visualizations [287, 329]. Simple tasks, such as counting the number of nodes, can give users an idea of the scale of the data. More complex tasks, such as finding shortest paths, are often required by applications. Several common tasks are listed below, though there are many more.
• Given a graph, count the number of nodes;
• Given a node, count its in- and out-degree, identify its adjacent nodes, or find the nodes reachable from it in a certain number of steps;
• Given a pair of nodes, find a shortest path between them;
• For the whole graph, find connected components or closely connected clusters of nodes.

For certain tasks, the performance of different visualizations may be surprisingly different, and different tasks may require different types of visualizations. Graphs can also have different visual representations, such as node-link diagrams and treemaps. A visualization of a graph can include different visual elements, from the basic network to labels and attributes [287]. Good visualizations, in general, should be simple, because showing extra information may distract from the task at hand. In general, “chartjunk” should be avoided [307]. In addition, many tasks work best with user interaction and navigation support. Interaction techniques help users reveal the detailed structures in large graphs, for example by filtering out unimportant parts and focusing on important parts of the graphs. Yi et al. [335] categorise interaction techniques into:
• select: highlighting certain graph elements in a user’s focus.
• explore: changing the current viewpoint so that users can see other parts of the data; for example, by panning, zooming or rotating.
• reconfigure: switching between different layouts.
• encode: switching between different visual representations; for example, changing from a node-link representation to a treemap representation.
• abstract/elaborate: adjusting the level of abstraction of the data; for example, using zooming and clustering to give different insights.
• filter: reducing the amount of data displayed in order to make the remaining data more visible.
• connect: highlighting the connections between items in a user’s focus.

Among these techniques, Yi et al. further show that three of them (explore, abstract/elaborate and filter) are especially helpful for large graphs.
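Some of the basic graph tasks listed earlier in this section (identifying adjacent nodes, finding a shortest path and finding connected components) have simple computational counterparts, sketched below with breadth-first search on an undirected, unweighted graph. The function names are hypothetical, and the sketch says nothing about how well a given visualization supports a human performing these tasks.

```python
from collections import deque


def build_adjacency(nodes, edges):
    adjacency = {v: set() for v in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


def shortest_path(adjacency, source, target):
    """Return one shortest path as a list of nodes, or None if unreachable."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in sorted(adjacency[u]):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None


def connected_components(adjacency):
    """Return the connected components as a list of node sets."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in component:
                    component.add(v)
                    queue.append(v)
        seen |= component
        components.append(component)
    return components


nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e")]
adjacency = build_adjacency(nodes, edges)
print(len(nodes), sorted(adjacency["a"]))
print(shortest_path(adjacency, "a", "c"))
print(connected_components(adjacency))
```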

2.2.8

Dynamic graph visualization

Most research in graph visualization has focused on static graphs; however, many graphs in practice change over time. The basic goal of dynamic graph visualization is to create visual representations that help viewers see the changes. A large body of research addresses the visualization of dynamic graphs [251, 56, 177, 59, 90, 128, 330, 215, 129, 221]. Much of it has focused on large-scale dynamic graphs; for example, [283, 157]. Several other techniques, such as graph / edge splatting [63], have been proposed to display connectivity patterns in dynamic graphs. Further, large dynamic clustered graphs have also been studied to show the evolution of clusters and nodes over time; for example, using node-link metaphors [128, 210, 277] and map-based metaphors [226, 133]. In general, the most common techniques for representing temporal data are animation and the “small multiples” display (see, for example, [25]).
• The animation approach shows visualizations of the sequence of graphs in consecutive frames.
• The small multiples display uses multiple charts laid side by side, corresponding to consecutive time periods or moments in time [20]. A thorough discussion of such techniques can be found in [20]. For example, the small multiples display is used for representing and analyzing trajectories. Figure 2.13 shows small multiples of global migration networks.

Figure 2.13: Refugee Flows between the World’s Countries in 1996, 2000, and 2008 (by JFlowmap [1])

When dealing with dynamic graph visualization, the following challenges are often encountered: preserving the user’s mental map and reducing visual clutter.

2.2.8.1

Mental map

An important criterion for dynamic graph drawings is mental map preservation [104] or stability [256]. The mental map concept refers to the abstract geometric structure that a person forms in his or her mind when exploring visual information. The better the mental map is preserved, the easier it is for the person to grasp and understand the structural changes of a graph: familiarity with the old drawing helps in understanding the new drawing with less effort. Preserving the mental map is one of the most important concepts in visualizing dynamic graphs. For the analysis of dynamic graphs, it is important to depict statistical trends and changes over time while preserving the user’s mental map [232]. Some layout adjustment algorithms use a notion of proximity to preserve the mental map and update a drawing in order to improve some aesthetic criteria. These algorithms include, for example, incremental drawing of directed acyclic graphs [251], or computing the layout of a sequence of graphs offline [90]. Other algorithms, which apply different adjustment strategies to compute the new layout, include the DA-TU system [], which allows navigating and interactively clustering huge graphs. Other works have studied mental map preservation while the graph is being updated [128, 215, 221]. This can be achieved by using force-directed layout techniques [128] or simulated annealing [215, 221].

Often, dynamic graph visualization is concerned with the balance between quality and stability [53]. In this stream of work, the quality of a layout refers to an assessment of the overall layout, based on the vertex positions, with respect to some layout metrics. Stability refers to the changes in node positions between frames. Balancing quality and stability when visualizing graph changes is a challenging problem. A common technique uses stress majorization [136] or, more generally, multi-dimensional scaling. Initializing the layout of each graph with the preceding layout is the most common strategy [178, 233]. However, with this strategy, layout readability may degrade over the sequence of graphs. Other approaches address stability by either:
• placing vertices at fixed locations taken from the layout of an aggregate (over time) of all graphs [233]; or
• anchoring vertices to reference positions [56]; or
• linking vertices to instances of themselves that are close in the sequence [111].

A comprehensive study of existing models and the trade-offs between stability and readability is given in Brandes et al. [53].
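Since stability is described above as the change in node positions between frames, one simple way to quantify it is the mean distance that shared nodes move between two consecutive layouts. This particular measure and the name mean_displacement are assumptions for illustration; the cited works define their own, more refined metrics.

```python
import math


def mean_displacement(previous, current):
    """Average distance moved by nodes present in both layouts.

    Each layout maps a node to an (x, y) position. Smaller values mean a more
    stable transition between the two frames.
    """
    shared = previous.keys() & current.keys()
    if not shared:
        return 0.0
    return sum(math.dist(previous[v], current[v]) for v in shared) / len(shared)


frame_1 = {"a": (0.0, 0.0), "b": (1.0, 0.0)}
frame_2 = {"a": (0.0, 0.0), "b": (1.0, 2.0), "c": (5.0, 5.0)}  # node c is new
print(mean_displacement(frame_1, frame_2))  # 1.0
```

Nodes that appear or disappear between frames are ignored here, which is one of several possible conventions.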

2.2.8.2

Visual clutter for dynamic graphs

The second challenging problem is to reduce visual clutter in the visualization of dynamic graphs. Visual clutter reduction has been studied quite extensively for static graph visualizations (see Section 2.2.9.2), whereas there has been comparatively little research on visual clutter reduction in dynamic graph visualization.


2.2.9

Readability

Much research in graph visualization aims to produce “nice” drawings of graphs. A number of criteria have been proposed and implemented in commercial visualization systems. Most research related to these criteria either improves layout aesthetics or reduces visual clutter. These two concepts are different, although they may overlap.

2.2.9.1

Layout metrics for readability

The problem of making “nice” graphical layouts is generally very hard. There are certain criteria to consider when designing such layouts. Generally accepted aesthetic measures [263, 265] include:
• Distribute nodes and edges evenly,
• Avoid edge crossings,
• Display isomorphic substructures,
• Minimize the number of bends along the edges.

Among these aesthetics, Purchase et al. [265] show that reducing crossings is the most important, and that minimizing the number of bends and maximizing symmetry have a lesser effect. Although each aesthetic is useful, it is generally impossible to satisfy all of them at the same time, because some of them conflict with each other and some involve computationally very expensive optimization. In fact, the problem of simultaneously optimizing multiple criteria is, in many cases, NP-hard. As a consequence, graphical layouts are often the result of a compromise among several aesthetics.
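The edge-crossing aesthetic can be measured directly on a straight-line drawing. The sketch below counts pairs of edges whose segments properly cross, using the standard orientation test; the quadratic loop over edge pairs and the function names are assumptions chosen for clarity rather than efficiency.

```python
from itertools import combinations


def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def segments_cross(p1, p2, p3, p4):
    """True if segment p1-p2 properly crosses segment p3-p4."""
    d1 = orientation(p3, p4, p1)
    d2 = orientation(p3, p4, p2)
    d3 = orientation(p1, p2, p3)
    d4 = orientation(p1, p2, p4)
    # Each segment must have its endpoints strictly on opposite sides of the other.
    return d1 * d2 < 0 and d3 * d4 < 0


def count_crossings(pos, edges):
    """Number of crossing edge pairs in a straight-line drawing."""
    total = 0
    for (a, b), (c, d) in combinations(edges, 2):
        if len({a, b, c, d}) < 4:
            continue  # edges sharing an endpoint are not counted as crossing
        if segments_cross(pos[a], pos[b], pos[c], pos[d]):
            total += 1
    return total


edges = [("a", "c"), ("b", "d")]
crossed = {"a": (0, 0), "b": (1, 0), "c": (1, 1), "d": (0, 1)}
untangled = {"a": (0, 0), "c": (1, 0), "b": (1, 1), "d": (0, 1)}
print(count_crossings(crossed, edges))    # 1
print(count_crossings(untangled, edges))  # 0
```

Swapping the positions of two nodes removes the crossing in this example, which illustrates how the same graph can score differently on this aesthetic depending on the layout.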

2.2.9.2

Visual clutter

Visual clutter is related to aesthetics, but they are not the same. Visual clutter is a problem commonly seen in large graph visualization. The clutter is due to the density of graphical elements or the size of the graph, and to the number of crossings (see Section 2.1.2 for other meanings of clutter in information visualization). A good layout should intrinsically minimize visual clutter. Aesthetics-based approaches and clutter-reduction approaches share the criterion of reducing the number of edge crossings in the visualization. However, they differ in several other criteria. For example, layout aesthetics may try to reduce the number of bends along each edge, while visual clutter reduction techniques may use polylines and curves to draw edges. Further, even if a large graph visualization satisfies all the aesthetic criteria, for example by having no crossings, it may still incur visual clutter due to the density of elements (for example, some regions of the visualization may be very dense). In addition, one aesthetic criterion is to minimize the total area used for drawing a graph, whereas visual clutter is more concerned with the density, that is, the number of elements per unit area. We give an overview of techniques for reducing visual clutter in graph visualizations in Section 2.3. More details on clutter reduction can be found in the taxonomy of clutter reduction in visualization by Ellis and Dix [107].

2.3

Visual clutter reduction

Visual clutter refers to a problem typically seen when displaying large graphs. This problem is due to the density of nodes and edges, and a large number of edge crossings in the visualizations. Visual clutter can happen in both static and dynamic graph visualizations, as mentioned in Sections 2.2.8.2 and 2.2.9.2. Reducing visual clutter is a common aim in achieving good layouts. However, finding optimal layouts with minimal visual clutter often involves optimization that is very difficult or sometimes not practical for large graphs. A number of research efforts have attempted to reduce visual clutter for large graphs. This section briefly describes the most commonly used methods: graph simplification, edge routing and sampling.


2. RELATED WORK

2.3.1

Graph simplification

Graph simplification techniques reduce visual clutter by simplifying the graph prior to layout. This simplification step is done by grouping very cohesive nodes and edges into meta-nodes. Then classical node-link layouts for visualization can be employed. Several simplification methods exist, for example, [6, 24]. Typical graph simplification requires node clustering. Node clustering is also a key concept for cluster analysis, grouping, classification and pattern recognition. In the context of visual clutter reduction, clustering refers to the process of dividing the whole graph into a number of subgraphs (clusters), which are then represented as metanodes. By grouping similar visual elements, the graphs can be represented in a more compact form, to save free space and reduce visual clutter. An important benefit of this approach is to allow us to emphasize the global structure and ease the perception tasks. Figure 2.14 shows the original graph and the representation of its simplified abstraction.

Figure 2.14: Examples of graphs simplified at different levels, panels (a) and (b) (by van Ham and van Wijk [309])

A number of clustering methods have been used to reduce the visual clutter. Different clustering methods use different metrics to measure the distance or similarity between two nodes. Such metrics can be defined either semantically in semantic-based approach or graph-topologically in structure-based approach.

• In the semantic-based (or content-based) approach, the semantic data associated with the graph elements are used to define the distance between a pair of nodes. The approach can produce meaningful results and is thus often used in data mining tasks, such as classifying data items or detecting outliers. Yet the approach is application dependent; a semantic-based clustering technique that is specialized to a certain problem may not work well for other problems.
• In the structure-based approach, clusters are defined from the connections of the nodes. Clusters are subgraphs in which there are more connections within them than to the outside. A number of heuristics have been proposed to guide the construction and the quality measurement of clusters. These criteria vary from connectivity, cluster size and geometric proximity to statistical approximation.

Node clustering algorithms are often complex; finding the optimal clusters of a graph is believed to be NP-complete. Heuristic clustering algorithms, such as k-means, require polynomial time. Newman [242] proposes a fast algorithm, which takes O((m + n)n) time to detect community structure in networks, where m and n are the number of edges and nodes, respectively.

Graph simplification is attractive in the sense that it can reuse existing node-link layouts. Nonetheless, the results are often sensitive to the simplification parameters, which in turn depend on the type of graph being processed. A major drawback of simplification is that it may change node positions due to the grouping into meta-nodes. The method is therefore undesirable when positions encode information, such as in geovisualization.
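The grouping of clusters into meta-nodes can be sketched as follows. This is a minimal illustration rather than any of the cited methods: it assumes the cluster assignment is already given (by either a semantic-based or a structure-based clustering), and the names `simplify_graph` and `cluster_of` are hypothetical.

```python
from collections import defaultdict


def simplify_graph(edges, cluster_of):
    """Collapse each cluster into one meta-node.

    edges      -- iterable of (u, v) pairs of an undirected graph
    cluster_of -- dict mapping each node to a cluster label (assumed given)
    Returns (meta_nodes, meta_edges): meta_nodes maps a label to its member
    set; meta_edges maps a sorted label pair to the number of original
    edges that it aggregates.
    """
    meta_nodes = defaultdict(set)
    for node, label in cluster_of.items():
        meta_nodes[label].add(node)

    meta_edges = defaultdict(int)
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu != cv:  # intra-cluster edges are hidden inside the meta-node
            meta_edges[tuple(sorted((cu, cv)))] += 1
    return dict(meta_nodes), dict(meta_edges)


# Two triangles joined by a single bridge edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
cluster_of = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
meta_nodes, meta_edges = simplify_graph(edges, cluster_of)
```

The seven original edges reduce to one meta-edge between two meta-nodes, which a classical node-link layout can then draw. Where a meta-node should be placed is a separate decision, and it is exactly the step that moves nodes away from their original positions.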

2.3.2

Edge routing approaches

In many graphs, the positions of the nodes are fixed and cannot be changed. For example, node positions may encode some semantic data, such as geographical positions, or may be the results of some graph layout. In such cases, common methods that reduce visual clutter by moving nodes and changing the layout cannot be applied. In general, finding a straight-line embedding of a graph in a planar surface so as to minimize the number of edge crossings is NP-complete [141], and even in the optimum drawing there may still be many crossings. Besides, this approach has a known side-effect due to the zig-zag links introduced in the final layouts.

For visualizations in which nodes have fixed locations, edges can be drawn in different shapes to reduce visual clutter. The effects of edge crossings can be reduced by drawing edges as splines or polylines, as first studied by Gansner et al. [139].

2.3.2.1

Edge concentration

Newbery [239] first introduces a method, called edge concentration, for reducing clutter in drawings of directed graphs, using a biclique. A modification of this approach was implemented in Graphviz [140]. A similar approach, called confluent drawing, is proposed by Dickerson et al. [89]. This work merges edges of a graph by reducing non-planar graphs into planar ones. Consequently, planar algorithms can then be applied.

2.3.2.2

Edge bundling

In edge bundling (also known as edge clustering) approaches, edges are represented by polylines or curves. Edges that are close to each other geometrically or semantically are referred to as compatible. The main idea of edge bundling is to combine these compatible edges together to reduce the area covered by the edges. The resulting bundles can form high-level structures and show node and edge patterns. The use of attractions on control points for curved edges was first introduced by Brandes and Wagner [57], though the term “edge bundling” was coined several years later by others. Phan et al. [258] present FlowMap, which is the first work on edge bundling. Flow map layouts merge edges that share destinations to produce nice visual flows. Their layout algorithm takes as input a set of edges that share a common end point. The common end point is considered as the tree root and the algorithm builds a hierarchy based on the leaf positions. The final layout draws edges as a free-style binary tree layout in which edge width is determined by edge weight. This flow map layout can clearly show the flow distribution. However, it can only deal with a small subset of graph visualization problems, such as migration flow from one source. Showing multiple flow maps from multiple sources in a graph is difficult: overlaps between flow maps are very distracting and make the flows difficult to read.

Gansner and Koren [135] improve circular layouts by merging splines of edges together. This method is based on the minimization of the total amount of ink needed to draw the edges. Holten [170] introduces the Hierarchical Edge Bundling method for bundling hierarchical graphs using B-splines (see Figure 2.15). This method uses two types of relations as input. One is a hierarchical structure, such as a file directory tree; the other is a set of adjacency relations, such as the reference relations among the files in that tree. The bundling method then clusters links in tree+link layouts (see Section 2.2.4.2 for details on tree+link layout). In the final layout, edges are drawn as curves along the tree structure.

Figure 2.15: Examples of Edge Bundling (by Holten [170])

Dwyer et al. [100] introduce a method that integrates edge routing into force-directed layout. This method uses curved edges to minimize crossings and draws bundle-like shapes in the final layouts. Edge routing is based on constrained stress majorization. Several constraints are proposed to move nodes and edge bend points so as to prevent edges from overlapping with nodes and from crossing each other. Balzer and Deussen [32] propose a multi-level compound visualization using transparent surfaces and edge bundling for a hierarchical 3D visualization. Zhou et al. [338] present a hierarchical edge clustering using Delaunay triangulation, where control points are hierarchically clustered by energy-based optimization. Geometry Based Edge Bundling by Cui et al. [81] uses a control mesh for edge clustering, where edge bundles share the same control points on the mesh. Lambert et al. [214] generalize the control mesh to route graph edges using a shortest path algorithm; mesh edge weights are updated to encourage graph edges to share mesh edges.

Cornelissen et al. [79] present a circular bundle view of hierarchical graphs for software engineering tasks, such as studying program execution traces. Holten and van Wijk introduce the Force-Directed Edge Bundling (FDEB) algorithm [171], which models edges intuitively as flexible springs that can attract each other. The attractive force depends on the distance between the springs and the compatibility of the edges. The method achieves smoother bundles that are easy to read, although it incurs high computational complexity. Lambert et al. [22] present an edge bundling method using GPU processing for real-time interaction in real applications. Telea and Ersoy [301] propose Image-Based Edge Bundles, which aims for coarse-grained shapes of bundled edges to further simplify the visual representation of the network structure. Pupyrev et al. [261] consider edge bundling in layered drawings in which edges are already routed as polylines or splines; the method preserves the topology of the original drawing and disambiguates edges. Gansner et al. [134] introduce a multi-level method which approximates k-neighbor edge proximity graphs using a kd-tree as input for their agglomerative bundling algorithm. They report experiments in which graphs of up to one million edges are bundled in a few minutes.

Compatibility

Most edge bundling methods use the geometry of the graph to define the geometric compatibility between edges. Here, the geometry may come from the result of a graph layout (or visual mapping), or may encode semantic information (for example, the geographic coordinates of the airports in airline networks). Several metrics are proposed in the FDEB (force-directed edge bundling) method [171] to define the geometric compatibility between a pair of edges e, e′. Figure 2.16 illustrates these metrics. The final compatibility is defined as the product of these metrics (see [171] for details).
• The “Angle” metric is designed to avoid bundling edges that are almost perpendicular. Angle compatibility is computed from the cosine of the angle α between e and e′.
• The “Scale” metric ensures that edges that differ considerably in length are not bundled together. Scale compatibility is defined as a function of the lengths of e and e′.
• The “Position” metric aims to avoid bundling edges that are very far apart. Position compatibility is computed from the average length of e and e′ and the distance between the midpoint em of e and the midpoint e′m of e′.
• The “Visibility” metric avoids bundling edges that are parallel, equal in length and close together, yet offset from each other along their direction (for example, opposite sides of a skewed parallelogram). Visibility compatibility is computed from the distance between the midpoint of e and the midpoint of the projection of e′ on e (e.g., line segment i).

Figure 2.16: Examples of edge compatibility measurements in FDEB [171]: (a) angle compatibility; (b) scale compatibility; (c) position compatibility; (d) visibility compatibility

While these edge bundling methods can reduce visual clutter and show some high-level edge patterns, they may not necessarily highlight the important skeletal structure of the network.
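The four metrics can be combined as in the following sketch for straight-line edges in the plane. It is an illustration under stated assumptions, not the reference implementation of [171]: the helper names are hypothetical, the scale term uses one normalized form that equals 1 for edges of equal length, and degenerate (zero-length) edges are not handled. The authoritative definitions are those in [171].

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1])


def _length(v):
    return math.hypot(v[0], v[1])


def _mid(p, q):
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)


def _project(point, a, b):
    """Orthogonal projection of `point` onto the line through a and b."""
    ab = _sub(b, a)
    t = ((point[0] - a[0]) * ab[0] + (point[1] - a[1]) * ab[1]) / (ab[0] ** 2 + ab[1] ** 2)
    return (a[0] + t * ab[0], a[1] + t * ab[1])


def _visibility(p, q):
    """Visibility of edge p with respect to the projection of q onto p's line."""
    i0, i1 = _project(q[0], p[0], p[1]), _project(q[1], p[0], p[1])
    span = _length(_sub(i1, i0))
    if span == 0.0:  # q is perpendicular to p: nothing is visible
        return 0.0
    offset = _length(_sub(_mid(p[0], p[1]), _mid(i0, i1)))
    return max(0.0, 1.0 - 2.0 * offset / span)


def compatibility(p, q):
    """Product of angle, scale, position and visibility compatibility.

    Each edge is a pair of 2D end points, e.g. ((x0, y0), (x1, y1)).
    """
    vp, vq = _sub(p[1], p[0]), _sub(q[1], q[0])
    lp, lq = _length(vp), _length(vq)
    l_avg = (lp + lq) / 2.0
    angle = abs((vp[0] * vq[0] + vp[1] * vq[1]) / (lp * lq))
    scale = 2.0 / (l_avg / min(lp, lq) + max(lp, lq) / l_avg)  # assumed form
    position = l_avg / (l_avg + _length(_sub(_mid(p[0], p[1]), _mid(q[0], q[1]))))
    visibility = min(_visibility(p, q), _visibility(q, p))
    return angle * scale * position * visibility


e = ((0.0, 0.0), (4.0, 0.0))
parallel = ((0.0, 1.0), (4.0, 1.0))
perpendicular = ((2.0, -2.0), (2.0, 2.0))
c_same = compatibility(e, e)
c_parallel = compatibility(e, parallel)
c_perpendicular = compatibility(e, perpendicular)
```

Identical edges are fully compatible, a nearby parallel edge is penalized only by the position term, and a perpendicular edge has zero compatibility, so it would exert no bundling force.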

2.3.2.3

Semantic zoom

Another approach reroutes edges based on user interactions, depending on the current focus. For example, Wong et al. [328] propose EdgeLens, a local strategy for removing edge clutter (see Figure 2.17). With EdgeLens, graph edges are curved away from the focus area without changing the node positions. It gives a clear view of the local structure of nodes and edges. Further, Wong and Carpendale [327] introduce Edge Plucking, an interactive technique that plucks edges apart to clarify node-edge relationships.

Figure 2.17: Examples of EdgeLens (by Wong et al. [328]): (a) a radial layout; (b) EdgeLens view

2.3.3

Sampling

Sampling or filtering is another approach to reduce visual clutter by removing a number of nodes and edges. Reducing the amount of information to be displayed can be achieved either by filtering out elements randomly while maintaining the properties of the graph, or by filtering out data that is outside the user’s interests.
• The first approach, also the most common strategy, is to randomly select nodes for removal. In Krishnamurthy et al.’s study [208], most graph properties, such as the degree distribution, can still be preserved by random removal of nodes with sample sizes down to less than a third. Rafiei et al. [269] study several sampling policies and show that in most cases the topological properties of a network can be preserved. They further propose a sampling method that is based on the “focus” (a region) specified by the users. The visualization then emphasizes the focal area and its neighborhood in the graph.
• The second approach filters out nodes and edges that are not important to the users. For example, users may filter by connected components or by the degree of the nodes. Users may also filter data based on semantic criteria, such as by time or by name.

Although it reduces visual clutter, sampling has several disadvantages. First, sampling can cause problems due to the randomness in node selection. The approach is unpredictable in that different sampling results of the same graph may lead to different interpretations by the users. Second, when nodes and edges are filtered out, the network topology is changed and some properties may be lost. Furthermore, finding a good sampling strategy remains a very challenging problem.

2.4

Network analysis

Networks, which consist of sets of nodes or vertices joined together by links or edges, appear frequently in various technological, social and biological contexts [290, 14, 95, 241, 36]. These networks include the Internet [114], the World Wide Web [15], social networks [318, 220], scientific collaboration networks [240], lexicon or semantic networks [183, 288], neural networks [319], food webs [324], metabolic networks [191], and protein-protein interaction networks [331]. They have been shown to share global statistical features, such as the “small world”, “scale-free” and “clustering” properties.
• Small world: An important feature is that the average distance between nodes in the network is short. The average distance usually grows logarithmically with the total number of nodes [320]. In graph analysis, the distance between a pair of nodes usually refers to the length of a shortest path between them.
• Scale-free: Several real-world networks share a common characteristic in which there are many nodes with low degree and only a small number with high degree (so-called “hubs” [35]). The node degree is simply the number of ties or links a node has with other nodes. In scale-free networks, the node degree follows a power-law distribution.
• Clustering: Clustering is the property that two linked nodes are each linked to a third node [15]. In consequence, these three nodes form a triangle, and clustering is frequently measured by counting the number of triangles in the network [147].

Both triangles and other types of subgraphs are important in real networks. We say that a graph G′ = (V′, E′) is a subgraph of G = (V, E) if V′ ⊆ V and E′ ⊆ E. Network motifs designate those patterns that occur in a network more often than in random networks with the same degree sequence [230]. Network motifs found in technological and biological networks are small subgraphs that capture specific patterns of interconnection characterizing the networks at the local level [230, 334].
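The three statistics behind these properties (average shortest-path distance, degree distribution and triangle count) can be computed on a small unweighted graph as sketched below. The function names are hypothetical, the graph is assumed to be undirected and connected, and the all-pairs breadth-first search is only practical for small graphs.

```python
from collections import Counter, deque
from itertools import combinations


def build_adjacency(edges):
    """Adjacency sets of an undirected graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def average_distance(adj):
    """Mean shortest-path length over all ordered pairs of distinct nodes."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def degree_histogram(adj):
    """Map each degree value to the number of nodes having that degree."""
    return Counter(len(neighbors) for neighbors in adj.values())


def count_triangles(adj):
    """Each triangle is seen once from each of its three corners."""
    seen = sum(1 for v in adj for a, b in combinations(adj[v], 2) if b in adj[a])
    return seen // 3


# A triangle 1-2-3 with a pendant node 4 attached to node 3.
adj = build_adjacency([(1, 2), (2, 3), (1, 3), (3, 4)])
avg = average_distance(adj)
hist = degree_histogram(adj)
triangles = count_triangles(adj)
```

On a real network, the degree histogram would be inspected on a log-log scale to check for a power law, and the average distance would be compared against the logarithm of the number of nodes.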

2.4.1

Social network analysis

Social network analysis [318] is the analysis of relations between individuals in social structures. The relationships between individuals, for example kinship, friendship and neighbourhood, are represented as a network in which nodes represent individuals and edges represent the interactions between them. Traditional social network analysis is confined to a limited scale, typically around hundreds of actors per network.

2.4.2

Centrality analysis

Centrality is one of the best-known individual-level network analysis methods; it determines the relative prominence of vertices and edges in a network. The analysis is based on their connectivity within the network structure [127, 318]. For example, using centrality measures, one can identify the most prominent and influential researchers of a research area in a research collaboration network, or the most important and influential papers cited by other scientists in a citation network.

2.4.2.1

Vertex centrality

A centrality is a function that assigns a node u in the node set V of a given graph G a value C(u). These centrality values are then compared to determine the importance of nodes; a node u is more important than v if and only if C(u) > C(v). There are many different centrality measures such as degree, betweenness, closeness, eccentricity, eigenvalue and status [46, 125, 126, 127, 207, 243, 318], which can be used to analyze networks. Several centrality measures are described as follows:

• Degree centrality (DC) is one of the most used centrality measures [15]. Given the degree d(v) of a vertex v, the centrality measure is defined as:

$$C_{\mathrm{degree}}(v) = d(v).$$

This measure is a fundamental quantity describing the topology of scale-free networks [320]. DC can be interpreted as a measure of immediate influence [126]; for instance, if a certain proportion of nodes in the network are infected, those nodes having a direct connection with them will also be infected.

• Betweenness centrality (BC) characterizes how influential a node is in communicating between pairs of nodes [125]. In other words, BC measures the number of times that a shortest path between nodes s and t travels through a node v whose centrality is being measured. Betweenness centrality is defined by:

$$C_{\mathrm{betweenness}}(v) = \sum_{s \neq v \in V} \; \sum_{t \neq v \in V} \frac{\sigma_{st}(v)}{\sigma_{st}},$$

where σst is the total number of shortest paths from node s to node t and σst(v) is the number of those paths that pass through v. A related measure is called stress centrality (SC), defined as:

$$C_{\mathrm{stress}}(v) = \sum_{s \neq v \in V} \; \sum_{t \neq v \in V} \sigma_{st}(v).$$

• Closeness centrality (CC) is defined from the farness of a vertex with respect to the whole network. The farness of a vertex is the sum of the lengths of the geodesics to every other vertex, and the closeness is the inverse of the farness. The more central a node is, the lower its farness. The closeness centrality is defined as:

$$C_{\mathrm{closeness}}(v) = \frac{1}{\sum_{u \in V} \mathrm{dist}(u, v)},$$

where dist(u, v) is the shortest distance between u and v. Closeness can be seen as a measure of how fast information can be passed from a node v to all other nodes. This measure is only applicable to connected networks, since the distance between unconnected nodes is undefined. Normalized closeness centralities of a vertex also exist [15, 126].

• A centrality measure that is not restricted to shortest paths is the eigenvector centrality (EC) [46]. The centrality is defined as the principal or dominant eigenvector of the adjacency matrix A, which represents the connected component of the network. That is, it is defined as:

$$\lambda \cdot C_{\mathrm{eigen}} = A \cdot C_{\mathrm{eigen}}.$$

The main idea is that each node affects all of its neighbors simultaneously [47]. EC can be regarded as an extended version of degree centrality in which the value is proportional to the sum of the centralities of the node’s neighbors.

In addition to the measures presented here, a total of 17 centrality values were used for the analysis of biological networks in Junker et al. [194]. For a further survey on centrality indices, see Brandes et al. [52]. These indices account for different node characteristics that permit nodes to be ranked in order of importance in the network.
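The degree, closeness and betweenness definitions above can be evaluated on a small unweighted graph as sketched below. The betweenness routine uses the well-known accumulation technique of Brandes, which is a choice made for this illustration and not something prescribed by the text; all function names are hypothetical. The graph is assumed to be undirected and connected, and betweenness is summed over ordered pairs (s, t), matching the double sum in the definition.

```python
from collections import deque


def bfs_shortest_paths(adj, s):
    """Return (dist, sigma, preds, order) for source s in an unweighted graph.

    sigma[v] is the number of shortest s-v paths, preds[v] lists the
    predecessors of v on those paths, and order is the BFS visiting order.
    """
    dist = {s: 0}
    sigma = {v: 0 for v in adj}
    sigma[s] = 1
    preds = {v: [] for v in adj}
    order = []
    queue = deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, preds, order


def degree_centrality(adj):
    return {v: len(adj[v]) for v in adj}


def closeness_centrality(adj):
    """Inverse of the farness; requires a connected graph with >= 2 nodes."""
    result = {}
    for v in adj:
        dist, _, _, _ = bfs_shortest_paths(adj, v)
        result[v] = 1.0 / sum(dist.values())
    return result


def betweenness_centrality(adj):
    """Sum of sigma_st(v) / sigma_st over ordered pairs (s, t)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        _, sigma, preds, order = bfs_shortest_paths(adj, s)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):  # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


# Path graph a - b - c - d.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
dc = degree_centrality(adj)
cc = closeness_centrality(adj)
bc = betweenness_centrality(adj)
```

On the path graph, the two inner nodes obtain the highest values under all three measures. Halving the betweenness values gives the count over unordered pairs that many tools report for undirected graphs.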

2.4.2.2

Edge centrality

Centrality can be defined for edges as well, to determine the importance of connections within a network. Certain edges are more important than others. Often, important edges are located on many shortest paths between nodes. Edges with high betweenness centrality typically connect communities of nodes, each of which typically contains dense connections. Connections within the same community usually have low betweenness centrality, whereas inter-community connections often have high betweenness centrality. A simple approach to measuring edge centralities first constructs an edge graph EG from a graph G. Each node in EG corresponds to an edge in G. Two nodes in EG are connected only if the corresponding edges in G are adjacent. Standard methods for computing vertex centralities can then be applied to the edge graph EG.
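The edge graph construction can be sketched as follows. The function name `edge_graph` is hypothetical, edges are assumed to be undirected and distinct, and degree centrality stands in for whichever vertex centrality one wishes to apply to EG.

```python
from itertools import combinations


def edge_graph(edges):
    """Build the edge graph EG of an undirected graph G.

    Every edge of G becomes a node of EG; two nodes of EG are adjacent
    when the corresponding edges of G share an end point.
    """
    nodes = [tuple(sorted(e)) for e in edges]
    adj = {e: set() for e in nodes}
    for e, f in combinations(nodes, 2):  # quadratic in the number of edges
        if set(e) & set(f):
            adj[e].add(f)
            adj[f].add(e)
    return adj


# Two triangles joined by the bridge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
eg = edge_graph(edges)
edge_degree = {e: len(neighbors) for e, neighbors in eg.items()}
most_central = max(edge_degree, key=edge_degree.get)
```

In this example the inter-community bridge receives the highest value. Comparing all pairs of edges is adequate for a sketch; grouping edges by shared end point would avoid the quadratic scan on larger graphs.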

A number of works have used edge centrality for visualising small worlds [310]. Edge centrality has been used in analyzing biological networks [336]. Also, various kinds of edge centralities have been studied for community detection [253].

2.4.3

k-core analysis

One of the major concerns of social network analysis is the identification of cohesive subgroups (i.e., important dense subgroups) of actors within a network. Cohesive subgroups are subsets of actors among whom there are relatively strong ties [318]. One of the most important concepts in group analysis is the k-cores of a graph. A k-core of a graph G is a maximal connected subgraph of G such that the degree of each vertex is at least k in the subgraph. According to Batagelj et al. [38], the core number of a graph is the maximum k such that there exists a k-core. Note that k-cores can be computed efficiently in linear time by an algorithm that repeatedly removes minimum-degree vertices [38]. The notion of k-cores has been used in social networks such as collaboration networks [64], as well as in biological networks, including the analysis of protein interaction networks [30] and the prediction of protein functions [322, 332, 315].
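The removal procedure can be sketched as below. For readability, this version scans for a minimum-degree vertex at every step and therefore runs in quadratic time; the linear-time algorithm of [38] relies on bucket-sorted degrees instead. The function name `core_numbers` is hypothetical.

```python
def core_numbers(adj):
    """Core number of every vertex, found by peeling minimum-degree vertices.

    adj maps each vertex to the set of its neighbors (undirected graph).
    A vertex with core number c belongs to a c-core but to no (c + 1)-core.
    """
    degree = {v: len(adj[v]) for v in adj}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=degree.get)  # simple O(n) scan per step
        k = max(k, degree[v])               # core values never decrease
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core


# A triangle 1-2-3 with a pendant node 4 attached to node 3.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
core = core_numbers(adj)
graph_core_number = max(core.values())
```

The pendant node is peeled first with core number 1, while the triangle forms the 2-core, so the core number of this graph is 2.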

2.5

Visual analytics

Intelligent data analysis has passed through several stages. Statistical exploratory data analysis explores the input data and tests specific hypotheses. The machine learning field then arose with the advances in computing technologies, with the objective of finding computationally efficient solutions to data analysis problems. Data analysis then aimed for scalability to deal with very large databases. Subsequently, data mining emerged as an interdisciplinary field of study that extracts models and patterns from large amounts of information stored in data repositories [158, 159, 43]. Lately, visual analytics has emerged as a cross-discipline research field that combines visualization with data mining to help people explore and analyze data. The work in this thesis is most closely related to visual analytics of large and dynamic graphs; this section gives a brief introduction to visual analytics. In particular, we cover common data analysis methods, which include community detection, dimensionality reduction and stream algorithms for big data analysis.

2.5.1

Community detection

Detecting clusters or communities in real-world graphs, such as large social networks, web graphs, and biological networks, is a problem of considerable practical interest [147, 154, 123, 73, 199]. A “network community” (also sometimes referred to as a module or cluster) is typically a group of nodes with more interactions amongst its members than between its members and the remainder of the network [268, 147]. To extract such sets of nodes one typically chooses an objective function that captures the above intuition of a community as a set of nodes with better internal connectivity than external connectivity. Unfortunately, the objective function is often NP-hard to optimize exactly [216, 26, 279]. Thus, one employs heuristics [147, 200, 87] or approximation algorithms [217, 282, 19] to find sets of nodes that approximately optimize the objective function and that can be understood or interpreted as “real” communities. Alternatively, one might define communities as the output of a community detection procedure, and might presume to find good communities [147, 244]. Once extracted, such clusters of nodes are often interpreted as organizational units in social networks, functional units in biochemical networks, ecological niches in food web networks, or scientific disciplines in citation and collaboration networks [147, 268].
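As an illustration of such an objective function, the sketch below evaluates conductance, one commonly used score that compares the edges leaving a candidate set with the total degree inside it. The choice of conductance is an assumption made for this example, since the text does not single out a particular objective; lower values indicate better-separated communities. The candidate set is assumed to be a non-empty proper subset of a graph without isolated nodes.

```python
def conductance(adj, community):
    """Edges leaving `community` divided by the smaller of the two volumes.

    adj maps each node to the set of its neighbors (undirected graph);
    the volume of a node set is the sum of the degrees of its members.
    """
    inside = set(community)
    cut = sum(1 for u in inside for v in adj[u] if v not in inside)
    volume_in = sum(len(adj[u]) for u in inside)
    volume_out = sum(len(adj[u]) for u in adj if u not in inside)
    return cut / min(volume_in, volume_out)


# Two triangles joined by the bridge (3, 4).
adj = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}
good = conductance(adj, {1, 2, 3})  # a whole triangle
poor = conductance(adj, {1, 2})     # a triangle cut in half
```

The whole triangle scores far lower than the split one. A heuristic or approximation algorithm would search over candidate sets for low scores, because exhaustive optimization is infeasible.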

2.5.2

Dimensionality reduction

Dimensionality reduction is common in data mining. Real-world data often have many variables or attributes, which makes visualization difficult. A human being can typically perceive at most three spatial dimensions easily. When there are more dimensions, understanding the visualization is no longer straightforward. Thus, a number of dimensionality reduction techniques have been developed and shown to be useful in various fields. The easiest way to reduce the dimension is simply to reduce the number of variables. One may select only the most interesting variables based on domain knowledge and then visualize them. This technique, however, inevitably results in a loss of some information. To overcome this limitation, several advanced techniques have been proposed. Amongst them, principal component analysis (PCA) and multi-dimensional scaling (MDS) are quite common [159, 225]. Linear discriminant analysis (LDA) [314] is another alternative for reducing dimensions. Principal component analysis aims at finding linear combinations of the variables of a data set that preserve the maximum amount of information. Typically, the information is measured by variance. PCA is commonly used for exploratory data mining. A reduced dimension is obtained when the original high-dimensional data is projected from the original p-dimensional space into a lower q-dimensional space (p ≫ q). The projected values are determined by the principal components. Other approaches include, for example, Isomap [302] for non-linear reduction. Self-organizing maps (SOMs) [205], on the other hand, employ artificial neural networks that are trained to produce a low-dimensional representation. However, this method is computationally intensive and the implicit vector quantization is not always desirable.
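The projection step of PCA can be sketched with the standard library only. The example below finds the first principal component by power iteration on the sample covariance matrix and projects the data onto it (q = 1). The function name is hypothetical, power iteration is a choice made for this illustration, and the sketch assumes the data has non-zero variance; a numerical library would normally be used in practice.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, scores) of the first principal component.

    rows -- list of equally long numeric tuples, one per observation
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (p x p).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Power iteration converges to the direction of maximum variance.
    w = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    # One-dimensional coordinates of the observations.
    scores = [sum(c[j] * w[j] for j in range(p)) for c in centered]
    return w, scores


# Points lying exactly on the line y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
direction, scores = first_principal_component(data)
```

Because all variance lies along the line y = 2x, the recovered direction is proportional to (1, 2) and the one-dimensional scores lose no information for this data.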

2.5.3

Stream algorithms

Advances in hardware and software technologies have led to the mass production of data in a broad range of application domains. In many cases, these data are generated continuously and at quite high rates. Data stream mining is important, and applications of data stream analysis vary from critical scientific and astronomical applications to important business and financial services. Examples include data generated by the finance industry, security activities, sensor networks, web logs, and computer network traffic. At the highest level, the grand challenges are the storage, querying and mining of such data sets, which are highly computationally challenging tasks. Mining data streams involves extracting knowledge structures, represented as models and patterns, from non-stopping streams of information. Numerous algorithms, systems and frameworks have been proposed and developed over the past few years to address these challenges [28, 235].

More specifically, research on data streams has extensively studied graph problems [115], such as counting triangles, distance estimation, connectivity and graph matching. Node clustering determines groups of nodes based on the density of linkage behavior; examples include graph partitioning, minimum cut and dense subgraph determination. Stream reasoning has also become popular in stream research [37]. The research problems and challenges that have arisen in mining data streams have solutions using statistical and computational approaches. These solutions can be categorized into data-based and task-based ones. In data-based solutions, the idea is to examine only a subset of the whole dataset or to transform the data vertically or horizontally into an approximate, smaller data representation. Task-based solutions, on the other hand, adopt techniques from computational theory to achieve time- and space-efficient solutions. In this section, we review these theoretical foundations.

2.5.3.1

Data-based techniques

Data-based techniques refer to choosing a subset of the incoming stream to analyze or summarizing the whole dataset. Sampling, load shedding and sketching techniques are representatives of the former. The latter includes synopsis data structures and aggregation techniques. These techniques are briefly described as follows, together with their common applications in data stream analysis.

Sampling

Sampling refers to a statistical approach that probabilistically chooses a data item to process. In this approach, the boundaries of the error rate of the computation are given as a function of the sampling rate. Several techniques [93] have used the Hoeffding bound to measure the sample size with respect to some derived loss functions. In the context of data stream analysis, a major problem of the sampling approach is often due to the unknown size of the dataset being examined. For data streams, a special analysis is often required to find the error bounds. Sampling is also not suitable for surveillance analysis, which tries to check for anomalies in the data streams. In addition, sampling may have an issue with fluctuating data rates. Typically, sampling algorithms have addressed the relationship among three parameters: data rate, sampling rate and error bound.
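The relationship between sample size and error bound can be made concrete with the Hoeffding bound. The sketch below uses the one-sided form that is commonly quoted in the stream mining literature: after n independent observations of a variable with range R, the true mean exceeds the sample mean by more than ε with probability at most δ, where ε = sqrt(R² ln(1/δ) / (2n)). The function names are hypothetical, and whether a given method in [93] uses exactly this form is not stated in the text.

```python
import math


def hoeffding_epsilon(value_range, delta, n):
    """Error bound after n observations, holding with probability 1 - delta."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def hoeffding_sample_size(value_range, delta, epsilon):
    """Smallest n for which the error bound is at most epsilon."""
    return math.ceil(value_range ** 2 * math.log(1.0 / delta) / (2.0 * epsilon ** 2))


# Values in [0, 1], 95% confidence, error of at most 0.05.
n_required = hoeffding_sample_size(1.0, 0.05, 0.05)
eps_at_n = hoeffding_epsilon(1.0, 0.05, n_required)
```

The bound does not depend on the distribution of the data or on the total length of the stream, which is why it is attractive when the dataset size is unknown. It does assume independent observations, so it gives no guarantee under concept drift.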

Load shedding

Load shedding is a technique that drops a sequence of data streams based on a set of sampling policies [29, 299]. Load shedding has been used successfully in querying data streams. Yet it has the same problems as sampling. Load shedding is difficult to use with mining algorithms, because the dropped chunks of the data streams could be part of a pattern of interest for the analysis and thus affect the generated outcomes.

Sketching

Sketching [28, 235] refers to the process of randomly projecting a subset of the features. Typical sketching algorithms sample the incoming stream vertically. Sketching has been applied to compare different data streams and has been used for aggregate queries. The major issue of sketching is accuracy; thus, it is hardly used for data stream mining. Principal Component Analysis (PCA) (see Section 2.5.2) would be a better solution and has been applied in streaming applications [198].

Synopsis data structures

Another approach is to create a synopsis of the data. This approach applies techniques to summarize the incoming stream for further analysis. For instance, wavelet analysis [145], histograms, quantiles and frequency moments [28] have been proposed as synopsis data structures. A synopsis does not represent all the characteristics of the dataset; consequently, only approximate answers can be produced when querying such data structures.

Aggregation

Aggregation techniques compute statistical measures, such as the mean and the variance, to summarize the incoming stream. The aggregated data can be used by mining algorithms. However, aggregation does not perform well with highly fluctuating data distributions. Several research works have studied the merging of online aggregation with offline mining [8, 9, 10] for better results.
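A running mean and variance can be maintained in constant memory with Welford's online update, as sketched below. This is one standard way to compute such aggregates and is not taken from [8, 9, 10]; the class name `RunningStats` is hypothetical.

```python
class RunningStats:
    """Mean and sample variance of a stream, updated one item at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; defined as 0.0 for fewer than two items."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(value)
```

Only three numbers are kept regardless of the stream length. Because every past item carries equal weight, the aggregate reacts slowly when the distribution shifts, which reflects the weakness with fluctuating distributions noted above.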

2.5.3.2 Task-based techniques

Task-based techniques address computational challenges of data stream processing. They include, for example, approximation algorithms and sliding windows. We cover some basics of these techniques and their application for data stream analysis.


Approximation algorithms

Approximation algorithms [235] are used for computationally hard problems. These algorithms involve approximating answers and estimating error bounds. In data stream processing, the approximation is often difficult because of data speed and resource constraints; high data rates and limited available resources have rendered most approximation algorithms inadequate on their own. As such, other tools should be combined with these algorithms.

Sliding windows

The idea of a sliding window is that users are often more concerned with the analysis of the most recent data items in the streams than the old ones. The detailed analysis is often performed over the most recent data items and summarized versions of the old ones. Sliding windows have been widely used in a number of comprehensive data stream mining systems [94]. In window processing [23], a window extracts from the stream the most recent data stream elements, which are then considered by the query.

Sliding window techniques can be used to produce an approximate answer to a data stream query. They evaluate the query not over the entire past history of the data streams, but only over the most recent data from the streams. For example, answers to a user query may be deduced only from the data within the last day, whereas older data are discarded. Data-window extraction in sliding windows can be sequence-based or time-based. The sequence-based approach is concerned with a specific number of tuples or data items from the data streams. The time-based approach takes all the tuples occurring during a given time interval; the number of tuples in this case often varies over time.

Sliding windows on data streams have been used as a natural method for approximation with several nice properties. First, the semantics of the approximation are well-defined, which can increase confidence in the analysis results. Second, the techniques are deterministic, so common problems with arbitrary random choices, such as in load-shedding techniques, are avoided. Finally and most importantly, sliding windows emphasize recent data. In most real-world applications, recent data is more important and more relevant than old data. For instance, if one is trying to identify patterns in real time from network traffic, phone calls, transaction records, or scientific sensor data, then general insights based on the recent information in the data stream are more informative and useful than insights based on stale (old) data. In fact, for many such applications, approximate answers from sliding windows are the answers that users desire and expect, whereas computation over all historical data is not relevant.
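The two kinds of data-window extraction described for sliding windows can be sketched as follows. This is a minimal illustration, assuming items arrive in non-decreasing timestamp order; the class names `SequenceWindow` and `TimeWindow` are hypothetical and do not come from the cited systems.

```python
from collections import deque


class SequenceWindow:
    """Keeps the last `size` items of the stream (sequence-based window)."""

    def __init__(self, size: int) -> None:
        self._items = deque(maxlen=size)  # deque evicts the oldest item itself

    def push(self, item) -> None:
        self._items.append(item)

    def items(self) -> list:
        return list(self._items)


class TimeWindow:
    """Keeps items with timestamp in (newest - span, newest] (time-based window)."""

    def __init__(self, span: float) -> None:
        self._span = span
        self._items = deque()  # (timestamp, item), timestamps non-decreasing

    def push(self, timestamp: float, item) -> None:
        self._items.append((timestamp, item))
        # Evict everything that is `span` or more older than the newest item.
        while self._items and self._items[0][0] <= timestamp - self._span:
            self._items.popleft()

    def items(self) -> list:
        return [item for _, item in self._items]


seq = SequenceWindow(size=3)
timed = TimeWindow(span=10.0)
for t, value in [(0, "a"), (4, "b"), (9, "c"), (12, "d"), (30, "e")]:
    seq.push(value)
    timed.push(t, value)
print(seq.items())    # the last three items
print(timed.items())  # the items from the last 10 time units
```

The sequence-based window always holds at most three items, whereas the size of the time-based window depends on how densely items arrive, as the text notes.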

2.6 Concluding remarks

This chapter has presented background materials to ease the discussion in later chapters. The chapter has described important concepts and research works in the literature in information visualization, graph visualization, graph analysis and graph mining.


Chapter 3

Visualization model

“See things as you would have them be instead of as they are.” (Robert Collier)

In this chapter, we address some grand challenges in understanding visualization and then propose a formal model for judging and measuring graph visualization. We begin with some observations about visualization. First, the human visual system accounts for over 70 percent of the neurons in the human brain, and thus visualization plays an important role in data mining research, especially visual mining. Second, visualization is a multidisciplinary field that synthesizes theory from computational vision, cognitive science, graphical design and image processing. In addition, the principle “We see only what we are prepared to see” by Ralph W. Emerson forms a basis for designing contexts for data visualization.

3.1 Introduction

The goal of visualization is to enable a transformation or a deduction from information to knowledge through visual means. Visualization aids human acuity in detecting patterns and discovering knowledge. It has been widely used for illustrating, formulating hypotheses, identifying patterns, constructing knowledge, performing tasks and making decisions.


Advances in information technologies have dramatically increased the volume of network data generated in modern applications. As such, graphs become larger and more complex. Graph visualization techniques, to cope with this, are getting more sophisticated and involve complex parameter settings to turn graphs into drawings. Thus, the major challenge is to justify how reliable visualization methods and models are.

Graph drawing algorithms developed over the past 30 years aim to produce “readable” pictures of graphs. Here “readability” is measured by aesthetic criteria, such as:

• Crossings: the picture should have few edge crossings.
• Bends: the picture should have few edge bends.
• Area: the area of a grid drawing should be small.

Readability criteria have been extensively studied in numerous visualization systems [303, 296]. Algorithms that attempt to optimise aesthetic criteria have been successfully embedded in systems for analysis in a wide variety of domains, from the finance industry to biotechnology.

In this chapter, we argue that readability criteria for visualizing graphs, though necessary, are not sufficient for effective graph visualization. Traditionally, it is commonly presumed that the quality of a visualization is measured by its readability; see [303, 296, 265, 263]. Such readability criteria rest on the presumption that readability implies that the picture is a faithful representation of the data. The presumption may clearly be true for traditional node-link diagrams, which comprise a fairly limited number of nodes and edges. However, for modern visualization metaphors, such as 2.5D visualizations, map-based visualizations, matrix representations and their combinations (see Section 2.2), this presumption may need to be reviewed. Further, the presumption can be demonstrably false because of the extensive use of clutter reduction techniques, such as edge bundling (see Section 2.3).

In response, we introduce another kind of criterion, generically called “faithfulness”, that we believe is necessary in addition to readability. Intuitively, a graph drawing algorithm is “faithful” if it maps different graphs to distinct drawings¹. In mathematical terms, a faithful graph drawing algorithm encodes an injective function. In other words, a faithful graph drawing algorithm never maps distinct graphs to the same drawing. We show several motivating examples below for the new faithfulness criterion.

¹ In mathematics, a faithful representation of a group on a vector space is a linear representation in which different elements of the group are represented by distinct linear mappings.

3.1.1 Motivating examples

Many graph visualization algorithms have embedded one or more aesthetic criteria to achieve readable layouts. In some cases however, graph layout algorithms cannot avoid visual clutter or edge cluttering due to high edge density from intrinsic connectivity and overlapping between nodes and edges. In such cases, edge concentration [239], confluent drawing [89, 108] and edge bundling [170, 338, 134, 246] become useful. These edge routing algorithms share the same idea: they simplify edge connections in the picture to increase readability. Such improved readability is helpful with respect to a number of tasks, yet the pictures may become less faithful.

Figure 3.1: An example of edge concentration [239]: (a) drawing with straight-line edges; (b) drawing with concentrated edges.

As an example, edge concentration [239] simplifies edge connections in the picture to increase readability. Figure 3.1 shows two pictures of a bipartite graph; Figure 3.1a is a simple drawing with straight-line edges and Figure 3.1b is another drawing with “concentrated” edges. Figure 3.1b has fewer edge crossings and is more readable. However, at first sight, a viewer who does not know how the picture was made would read Figure 3.1b as a graph comprising ten circular nodes and two boxes connected by twelve lines. This demonstrates the lack of faithfulness of edge concentration.

For our second example, Figure 3.2a shows another bipartite graph; the confluent drawing of the graph is depicted in Figure 3.2b. Confluent drawings may not be faithful. For example, one may find that there is no link connecting the two red circles in Figure 3.2a. In contrast, one may see in Figure 3.2b a curve connecting the two red circles. This is clearly an inconsistency in the confluently drawn graph.

Figure 3.2: Example of confluent drawing of a bipartite graph [109]: (a) the graph; (b) its confluent drawing.

As an example of edge bundling, Figure 3.3 shows two pictures of the same graph. Figure 3.3a is a simple circular layout with straight-line edges and Figure 3.3b has the same node positions but with “bundled” edges. Bundling often sacrifices faithfulness. While the visual effect of bundled edges gives a more readable representation of the overall graph, the ability to locate, select or navigate individual edges when hopping between nodes is lost. Even worse, bundling can result in a situation where two different graphs are mapped to the same picture. For example, Figure 3.4a shows a graph that differs from the graph in Figure 3.3a by almost 10 percent of the total number of edges. This graph is bundled in Figure 3.4b. The bundled representations of the two different graphs (Figure 3.3b and Figure 3.4b) are identical.

3.1.2 Aims and contributions

The examples in the previous section have demonstrated that faithfulness is an important aspect of graph visualizations. Yet faithfulness has not yet received enough attention from the visualization community. There are a few notions along the lines of faithfulness in scientific visualization, such as fidelity of the picture [227] and visual reconstructability for flow visualization [187]. In addition, several formal models have been proposed [312, 262, 67, 70, 211]; they aim to assess visualizations as well as to guide future research in Information Visualization. Nevertheless, there is no model of faithfulness for graph visualization.

Figure 3.3: An example graph using force-directed edge bundling: (a) unbundled; (b) bundled.

In this chapter, we distinguish two important concepts: the “faithfulness” and the readability of visualizations of graphs. Faithfulness criteria are especially relevant for modern methods that handle very large and complex graphs. Information overload from very large data sets means that the user can get lost in irrelevant detail, and methods have been developed to increase readability by decreasing detail in the picture. With a vast number of methods and techniques for visualizing data sets at various levels of granularity, it is important to know how reliable the resulting visualizations are. This demand has become urgent given the popularity of network data (see Section 1.2), the information overload from technological advances and the various challenges in visualizing large, complex and dynamic networks (see Section 1.3).

In summary, we make the following contributions:

• General model of graph visualization: In order to describe the faithfulness concept, we present a general model of the graph visualization process in Section 3.3.
• General model of faithfulness: Here, the faithfulness concept is divided into three types: information faithfulness, task faithfulness and change faithfulness.
• Examples of faithfulness metrics: We further define some examples of faithfulness metrics, which aim to compare the usefulness and effectiveness of different visualizations. We should emphasize that these metrics are only examples and are not the “only” ones.
• Case studies of faithfulness: As a demonstration, we use popular visualization methods, including force-directed methods, multidimensional scaling, matrix representations, and hybrid metaphors, to evaluate our faithfulness concept.

Figure 3.4: A 10 percent modification of the example graph in Figure 3.3(a) and the result using force-directed edge bundling: (a) unbundled; (b) bundled.

The rest of the chapter is outlined as follows. Section 3.3 presents our general model of the graph visualization process. We use this visualization model to describe the faithfulness concept. Section 3.4 describes our general model of the faithfulness of a graph visualization method. Here, we divide the general concept into three kinds of faithfulness: information faithfulness, task faithfulness, and change faithfulness. Section 3.5 shows our examples of faithfulness metrics, which are computed for quantifying and comparing the faithfulness of different visualizations. We illustrate faithfulness with several examples in Sections 3.6, 3.7 and 3.8: multidimensional scaling, edge bundling and several selected visualization metaphors (matrix-based and map-based visualizations). Section 3.10 concludes the chapter with a remark about the failure of 3D graph drawing to make industrial impact.

3.2 Related work

3.2.1 Evaluation of visualization

Recent years have witnessed increasing interest in evaluative research methodologies and empirical work [312, 262, 67, 70, 211, 212]. There are several notions along the lines of faithfulness in scientific visualization. These include fidelity of the picture [227] and visual reconstructability for flow visualization [187]. Other related work is the concept of a faithful “functional visualization schema” introduced by Tsuyoshi et al. [292]. This notion judges the transformation from a data schema to a visual abstraction schema. In particular, several quality metrics have recently been proposed for evaluating high-dimensional data visualization [44]. A survey of quality concerns for parallel coordinates is given in [83]. There is also a proposal for judging visualizations regarding the presence of difficulties and distracting elements (so-called “chartjunk”) [180]. Other research focuses on narrative visualization and its effects on interpretation with respect to the intended story [181]. Previous research on visualization evaluation has focused largely on information visualization in a broad sense, whereas our research targets the quality of graph visualizations.

3.2.2 Readability in graph visualization

Graph drawing algorithms of the past 30 years typically take into account one or more aesthetic criteria in order to increase the readability of the drawing and to achieve “nice” drawings. A wide range of aesthetic criteria has been proposed for graph visualizations. They include, for example:

• minimizing the number of edge crossings: a major graph drawing aesthetic is edge crossing minimization [271, 303]; the number of edge crossings should be minimized, or at least kept as small as possible [271].
• minimizing the number of bends in polyline edges [271].
• increasing orthogonality: placing nodes and edges on an orthogonal grid [271, 254].
• distributing nodes evenly [297, 84, 300].
• maximizing the minimum angle between the edges incident to a node [264, 300].
• minimizing the total area used [297].
• maximizing the symmetries in the underlying network structure [137].
• short edge lengths: edge lengths should be short but not too short [74].

Amongst these aesthetics, minimizing edge crossings is the most important criterion according to previous user studies [263, 265]. Optimizing two or more criteria simultaneously is an NP-hard problem in general. As such, graph drawings are often the result of a compromise among several aesthetics. Most previous research has also aimed for computational efficiency while achieving drawing readability.

In the literature, graph drawing algorithms can be broadly classified into, for example, the following categories.

• One popular technique is force-directed layout, which uses physical analogies to achieve an aesthetically pleasing drawing [196, 39, 50]. Several works have extended force-directed algorithms for drawing large graphs [161, 316].
• Multidimensional scaling is another popular method for graph visualization and visual mining [58, 237, 62, 189].
• Many other graph drawing algorithms try to take advantage of any knowledge of topology (such as planarity or SPQR decomposition [239, 118, 89, 155, 236, 103]) to optimize the drawing in terms of readability.
• Other approaches offer representations composed of visual abstractions of clusters to improve readability.

The faithfulness criterion we propose in this chapter is important for comparing the usefulness of graph visualizations. This faithfulness is different from the readability criteria in the literature.
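To make the edge-crossing criterion from this subsection concrete, the sketch below counts crossings in a straight-line drawing by testing every pair of edges with an orientation test. It is a brute-force illustration under simple assumptions (straight-line edges, no three collinear points of interest); the function names are hypothetical and it is not an algorithm taken from the cited works.

```python
from itertools import combinations


def _orient(p, q, r) -> float:
    """Twice the signed area of triangle pqr (> 0 if counter-clockwise)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def _properly_cross(a, b, c, d) -> bool:
    """True if segments ab and cd intersect in a single interior point."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)


def count_crossings(positions: dict, edges: list) -> int:
    """Number of pairs of straight-line edges that cross in the drawing."""
    crossings = 0
    for (u, v), (x, y) in combinations(edges, 2):
        if len({u, v, x, y}) < 4:
            continue  # edges sharing an endpoint do not count as crossing
        if _properly_cross(positions[u], positions[v], positions[x], positions[y]):
            crossings += 1
    return crossings


# K4 drawn on the corners of a square: only the two diagonals cross.
square = {0: (0, 0), 1: (1, 0), 2: (1, 1), 3: (0, 1)}
k4_edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (1, 3)]
# Moving node 3 inside triangle 0-1-2 gives a crossing-free drawing.
planar = dict(square)
planar[3] = (0.7, 0.3)
print(count_crossings(square, k4_edges), count_crossings(planar, k4_edges))
```

Both drawings show the same graph, so a readability measure of this kind distinguishes drawings but says nothing about whether either drawing represents the data faithfully.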


3.2.3 Mental map preservation in graph visualization

An important criterion for dynamic graph drawings is mental map preservation [104] or stability [256]. The mental map concept refers to the abstract geometric structures in a person’s mind while exploring visual information. The better the mental map is preserved, the more easily the structural change of a graph is understood. The user’s familiarity with the old drawing can help them understand the new drawing with less effort.

To preserve the mental map, some layout adjustment algorithms use a notion of proximity and rearrange a drawing in order to improve some aesthetic criteria. These algorithms include, for example, incremental drawing of directed acyclic graphs [251], computing the layout of a sequence of graphs offline [90], and using different adjustment strategies to compute the new layout [177]. Other works have studied mental map preservation while the graph is being updated [128, 215, 221]. This can be achieved by using force-directed layout techniques [128] or simulated annealing [215]. There are trade-offs between the readability and the stability of (offline) dynamic graph drawing methods. Brandes et al. [53] compare different methods with respect to readability and stability. Section 2.2.8.1 gives more details about the techniques for mental map preservation.

In contrast to the mental map in the previous work, this thesis defines “change faithfulness”, which captures the sensitivity of the pictures to changes in the data. Change faithfulness aims at the understanding and evaluation of dynamic graph visualization.

3.2.4 Temporal and spatial analysis

The analysis of dynamic graphs commonly requires showing statistical trends and changes over time. Visualization of dynamic graphs also needs to preserve the user’s mental map [232]. The most common techniques for representing temporal data are animation and the “small multiples” display (see [25]). The animation approach shows visualizations of the sequence of graphs in consecutive frames. The small multiples display uses multiple charts laid side by side, corresponding to consecutive time periods or moments in time [20]. Section 2.2.8 gives more details about dynamic graph visualization.

Generally speaking, this previous work has focused on the space dimension. For example, most of the work considers readability and stability or mental map preservation of 2D / 3D drawings of graphs. In this chapter, we are concerned with faithfulness in both the space and the time dimensions.

3.3 Graph visualization model

In this section we describe a semi-formal model for graph visualization. The section first gives basic notation for graphs and then describes our graph visualization model.

A graph G = (N, E) consists of a set of nodes N and a set of edges E. Note that in this chapter we use N to denote the set of nodes and reserve V to denote visualization. In practice, the nodes and edges may have multiple attributes, such as textual labels. For example, in Facebook friendship networks, a node may be associated with a name, education, marital status, current position, etc., whereas an edge may carry the type of relationship between two persons. These attributes can be important for visualization and analysis. Further, nodes and edges may be timestamped, and the visualization varies over time. A broad range of examples of graphs is given in Section 1.2 and Section 1.3.

Figure 3.5 shows our visualization model. It is an extension of the van Wijk model [312]. Our model encapsulates the whole knowledge discovery process, from data to visualization to human; unlike the van Wijk model, our model includes tasks. The main processes of the model are “visualization” V, “perception” P, and “task” T, which are described as follows.

3.3.1 Visualization

The visualization process maps a data item $d \in D$ (an attributed graph) to a layout item (sometimes called a “picture”) $\ell = V(d, s) \in L$ according to a specification $s \in S$. We write this as follows¹:

\[ V : D \times S \to L, \tag{3.1} \]

with data space $D$, specification space $S$ and layout space $L$.

¹ For this semi-formal model, we use mathematical notation more as a concise shorthand than as a precise description. For example, we describe processes as functions, but we should warn the reader that the domains and ranges of these are sometimes ill-defined.

Figure 3.5: Graph visualization model. (Labels recovered from the diagram: Data, Picture, Human, Task; D, S, V, L, P, K, E, T, R; dS/dt, dK/dt; [Readability], [Faithfulness].)

• The type of data space D to be visualized can vary from a simple list of nodes and edges to a time-varying graph with complex attributes on nodes and edges. There is a vast amount of network data (see Section 1.2), much of which is generated at fairly high rates (see Section 1.3).
• The specification space S includes, for example, a specification of the hardware used, such as the size and the resolution of the screen. The specification can also include parameter inputs for visualization, navigation or interaction.
• The layout space L may consist of graph drawings in the usual sense, but more generally consists of structured objects in a multidimensional geometric space. Sometimes it is convenient to regard the layout space as the screen; in this case, using the language of Computer Graphics, it is an image space. Section 2.2 gives a comprehensive list of graph visualization methods and their example outputs.

In many applications, nodes and edges of a graph in the data space D may contain several attributes. For example, a social network from Facebook has nodes representing people and edges representing their social connections. In this network, nodes (people) may have other information such as age, gender, identity, marital status, education, and many more. Edges may be associated with additional information on whether they are classmate, colleague or family relationships.

Typically, to represent different attributes in the layout space $L$, the drawings may comprise a variety of visual cues, such as color, shape or transparency.

For classic drawing algorithms, the visualization process may compute the layout directly from the data. For incremental algorithms, the visualization process may use the previous layout when computing the current layout. The capability of modelling incremental algorithms as well as dynamic graphs makes our visualization model more general than the van Wijk model [312]. In this case, the general form of the visualization process becomes:

\[ V : D \times S \times L \to L, \]

where the previous layout is an input for computing the current layout. This is necessary, for example, when a layout algorithm attempts to preserve the user’s mental map. However, unless otherwise stated, we take the simpler model in equation (3.1).
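The difference between the two forms of the visualization process can be illustrated with a toy layout function. The sketch below is only an assumption-laden illustration: the data item is an adjacency dictionary, the specification is a single slot spacing, and nodes are placed in slots on a horizontal line. The name `visualize` and this placement rule are hypothetical and are not a layout method from the thesis.

```python
def visualize(d: dict, s: float, previous: dict = None) -> dict:
    """Toy V: place nodes in slots on a line, `s` units apart.

    With previous=None this has the form V : D x S -> L; with a previous
    layout it has the incremental form V : D x S x L -> L.
    """
    # Nodes that were already drawn keep their old position.
    layout = {n: p for n, p in (previous or {}).items() if n in d}
    used = {round(x / s) for x, _ in layout.values()}
    slot = 0
    for node in sorted(n for n in d if n not in layout):
        while slot in used:  # new nodes take the first free slot
            slot += 1
        layout[node] = (slot * s, 0.0)
        used.add(slot)
    return layout


d1 = {"a": ["b"], "b": ["a"], "c": []}
d2 = {"b": ["d"], "c": [], "d": ["b"]}  # node "a" removed, node "d" added

layout1 = visualize(d1, 50.0)
from_scratch = visualize(d2, 50.0)               # ignores the old picture
incremental = visualize(d2, 50.0, previous=layout1)
print(layout1["b"], from_scratch["b"], incremental["b"])
```

In this example the incremental call leaves the surviving nodes where they were, while the from-scratch call moves them, which is the behaviour a mental-map-preserving algorithm would aim for.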

3.3.2 Perception

The perception process maps a picture from the layout space $L$ to the knowledge space $K$. We write this as follows:

\[ P : L \to K \tag{3.2} \]

In this model we use the term knowledge (sometimes called insight or mental picture) to denote the effect on the human of their observation of the picture. Again, in time-varying situations, the human’s perception can depend on the knowledge gained from the previous picture, and perhaps it is better to write:

\[ P : L \times K \to K \]

However, we use the simpler model (3.2) unless otherwise stated. Of course, it is difficult to formally model human knowledge or insight, and it is difficult to assess its value [250]. In particular, in some situations such as exploratory visualization, the information that is contained in the data is not known a priori, and we make pictures to get serendipitous insight. This is a well-known paradox: we do not know in advance the aspects or features that should be visible, and thus it is hard to assess how successful we are.

In terms of perception, the human perceptual system has several constraints. Some visual channels are probably more representational, expressive or perceivable than others. For instance, size and length are more effective for quantitative data, yet they are less useful for ordinal or nominal data. As another example, one person may distinguish some pairs or groups of colors more easily than another person can. The effectiveness of a visual channel may thus vary between people; different people may perceive a picture differently.

3.3.3 Task

Visualizations are a useful means for exploring and examining data using visual representations or pictures. In many practical cases, visualizations are developed to serve domain-specific tasks [304]. Common tasks include, for example, identifying important actors and communities in a social network, or exploring possible pathways in a biological network. Domain case studies (see, for example, [318, 52]) can be used to identify such tasks. Tasks can be distinguished as “low-level” tasks [321] and “high-level” tasks [168]. To understand faithfulness, it is important to model these tasks.

Low-level tasks are relevant across a wide variety of domains; such tasks have been identified and classified by psychologists. For example, Wehrend and Lewis [321] describe a list of possible tasks that one can perform for data analysis. These tasks include: identify, locate, distinguish, categorize, cluster, distribute, rank, compare, associate and correlate.

On the other hand, tasks can also be categorized as high-level tasks. For example, seven high-level tasks are described in Hibino’s study [168] of a tuberculosis data set using a visualization tool. These tasks are: prepare (gathering background information), plan (generating hypotheses and strategy), explore (getting users familiar with the data set), present (organizing the data), overlay (comparing different displays), reorient (reviewing goals and progress), and other (such as gathering statistics).

This thesis models all tasks as processes that map the data space D, the layout space L, and the knowledge space K to a result space R. Note that our graph visualization model differs from the van Wijk model [312] in its task model. The result space R may be a simple boolean space {true, false}; more commonly it is a multidimensional space, with each dimension modelling a separate subtask.

Tasks are often executed by users or data analysts. However, tasks are not necessarily performed by the same person who creates the pictures from the data. Furthermore, tasks can be performed directly by extracting the answers from the data, with or without using a picture of that data. The central model for the task process is $T = (T_D, T_L, T_K)$, where $T_D$, $T_L$ and $T_K$ are three functions:

\[ T_D : D \to R, \qquad T_L : L \to R, \qquad T_K : K \to R. \]

The function $T_K$, which returns a result from the human’s knowledge, is the common notion of a task. We extend this common notion with two less common notions, $T_D$ and $T_L$. These functions are more abstract; they do not take the user’s perception or knowledge into account. The function $T_D$ returns a result directly from the data. Intuitively, one can imagine that $T_D$ is computed by a “data oracle”, who can extract a result perfectly from the data. Similarly, one can think of the function $T_L$ as returning a result directly from the picture. Again, one can imagine a “picture oracle”, who extracts perfect results from the picture. If the visualization mapping creates a picture that is not entirely consistent with the data, then it is possible for the picture oracle to return a different result from the data oracle. In this way, the picture oracle may be limited by the faithfulness of the visualization mapping.

In this model, the details of a task (such as its questions) are considered less important, much as we ignore the details (such as algorithms and mechanisms) of a visualization method. Thus, our model is not concerned with the specific questions of tasks. This task model is perhaps over-simplistic; for example, it does not model the a priori knowledge of the human. In addition, the task process can be made more general by taking some combination of data, layout and knowledge and then mapping it to the result space. However, the simple model is sufficient to demonstrate the concept of faithfulness, in particular task faithfulness. We use this simple model of task throughout the rest of the chapter.


3.4 Faithfulness model

Informally, a graph visualization is faithful if the underlying network data and the visual representation are logically consistent. In this section, we develop this intuition into a semi-formal model. In fact, we distinguish three kinds of faithfulness: information faithfulness, task faithfulness, and change faithfulness. Then we discuss the difference between faithfulness and correctness, and then between faithfulness and readability.

3.4.1 Information faithfulness

The simplest form of faithfulness is information faithfulness. This is based on the idea that the visual representation of a data set should contain all the information of the data set, irrespective of tasks. In terms of the notation developed above, a visualization V is information faithful if the function V is injective, that is, V has an inverse on its image, so that the data can be recovered from the picture.

As an example, consider the classical barycenter visualization function that takes as input a planar graph G = (N, E), places nodes from a specified face on the vertices of a convex polygon, and places every other node at the barycenter of its neighbors (see [296]). This function is information faithful on internally triconnected (see [296]) planar graphs. However, if the input graph is not internally triconnected, then the same picture can result from several input graphs (each internal triconnected component is collapsed onto a line), and the method is not information faithful.

3.4.2 Task faithfulness

The intuition behind task faithfulness is that the visualization should be accurate enough for tasks to be performed correctly. In terms of the functions $V$ and $T$ defined above, a visualization $V$ is task faithful with respect to a specification $s \in S$ if

\[ T_L(V(d, s)) = T_D(d) \tag{3.3} \]

for every data item $d \in D$.

If a visualization is information faithful, then clearly it is task faithful, assuming the picture oracle can extract all information from the picture to perform tasks. In many practical cases, such extraction may depend on the perception skills of the viewers. However, the converse may not hold.

Consider, for example, a visualization function $V_{cir}$ that draws all nodes of a graph $G$ on a circle. Clearly, $V_{cir}$ is information faithful; Figure 3.3a is an example, in which we can find all nodes and edges in the drawing. Further, $V_{cir}$ is task faithful: all the data is represented in the drawing, and so all tasks can be performed correctly using the drawing.

On the other hand, Figure 3.3b shows a graph drawing using edge bundling. Consider the task of estimating the number of edges between two contiguous groups of nodes on the boundary of the circle. Another task is to determine whether there is a link connecting groups of nodes. The edge-bundled drawing is certainly task faithful for these tasks. However, it is not information faithful, as the original graph can no longer be reconstructed from the bundled layout.
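The distinction between task faithfulness and information faithfulness can be checked mechanically on a small example. The sketch below is a toy stand-in for edge bundling, not the force-directed bundling used in the figures: the "picture" keeps only one bundle per pair of node groups together with its edge count, and the specification s is fixed and omitted. All names (`GROUP`, `v_bundled`, `t_data`, `t_layout`) are hypothetical.

```python
from collections import Counter

# Hypothetical data: four nodes split into two groups on the circle boundary.
GROUP = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}


def v_bundled(d: frozenset) -> frozenset:
    """Toy bundled picture: one bundle per group pair, with its edge count."""
    bundles = Counter(tuple(sorted((GROUP[u], GROUP[v]))) for u, v in d)
    return frozenset(bundles.items())


def t_data(d: frozenset) -> int:
    """Data oracle T_D: number of edges between groups A and B."""
    return sum(1 for u, v in d if {GROUP[u], GROUP[v]} == {"A", "B"})


def t_layout(picture: frozenset) -> int:
    """Picture oracle T_L: thickness of the A-B bundle in the picture."""
    return dict(picture).get(("A", "B"), 0)


def is_task_faithful(graphs) -> bool:
    """Equation (3.3) checked on a finite set of data items."""
    return all(t_layout(v_bundled(d)) == t_data(d) for d in graphs)


def is_information_faithful(graphs) -> bool:
    """Injectivity of V checked on a finite set of data items."""
    pictures = [v_bundled(d) for d in graphs]
    return len(set(pictures)) == len(set(graphs))


g1 = frozenset({("a1", "b1"), ("a2", "b2")})
g2 = frozenset({("a1", "b2"), ("a2", "b1")})  # different edges, same bundle
g3 = frozenset({("a1", "a2")})
print(is_task_faithful([g1, g2, g3]), is_information_faithful([g1, g2, g3]))
```

On these three graphs the toy picture answers the edge-count task correctly for every data item, yet g1 and g2 produce the same picture, so the mapping is not injective.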

3.4.3 Change faithfulness

The intuition behind change faithfulness is that a change in the visual representation should be consistent with the change in the original data. Note that this is a different concept to the mental map [104] or stability [256]; while these concepts are concerned with the user’s interpretation of change, the concept of change faithfulness is concerned with the geometry of change. Change faithfulness is important in dynamic settings, such as interactive or streamed graph drawing. However, it is also valid in static settings, because the difference between two pictures should be consistent with the difference between the two data items that they represent. Consider, for example, a function Vgroups that visualizes the interaction networks d that occur between European Science in Society (EuroSiS) researchers in Health¹. Suppose that Vgroups uses a force directed algorithm to draw the connected components of a graph d ∈ D separately, and arranges these components horizontally across the screen, as in Figure 3.6. Note that Vgroups is information faithful. However, Vgroups is not change faithful, because a small change in the graph d (such as adding an edge) can result in a large change in the picture.

Figure 3.6: Interaction groups between Health researchers in the EuroSiS dataset

¹ Available at http://wiki.gephi.org/index.php/Datasets. The data used in this example are filtered for Health researchers only.

3.4.4 Remarks

3.4.4.1 Faithfulness and correctness

We should stress that faithfulness is a different concept to the classical idea of correctness of an algorithm. An algorithm is correct if it does what it is required to do; correctness is an essential property of every algorithm. However, a visualization method may do exactly what it is required to do without achieving faithfulness.

3.4.4.2 Faithfulness and readability

More importantly, faithfulness is a different concept to readability. Readability is well studied in the Graph Drawing literature. It refers to the perceptual and cognitive interpretation of the picture by the viewer. Readability depends on how the graphical elements are organized and positioned. It does not depend on whether the picture is a faithful representation of the data. We can divide the readability concept into three subconcepts in the same way as we divided faithfulness:

1. Information readability: A drawing is information-readable if the perception function P is one-one; that is, if two pictures appear the same to the user, then they are pictures of the same graph. Effectively, this is saying that all the information from the picture can be perceived by the human.
2. Task readability: A visualization is task-readable for a task T = (T_D, T_L, T_K) if T_K(P(ℓ)) = T_L(ℓ) for every layout ℓ ∈ L.
3. Change readability: This is the classical mental map.

A good graph visualization method should achieve both faithfulness and readability; in practice, however, there may be a trade-off between the two ideals. This is especially true with large graphs, when the data size is too large for the visualization to be information-faithful; indeed, the number of pixels may be smaller than the graph size. In such cases, faithfulness is sometimes sacrificed for readability. In specific domains, there are important tasks for which the visualization can be both readable and task-faithful.

3.4.4.3 Faithfulness and determinism

Faithfulness is also a different concept to determinism. Determinism refers to the requirement that applying the same visualization to the same graph should give the same or similar results. For example, tree drawing algorithms are deterministic, whereas other graph drawing algorithms, such as spring embedders, are non-deterministic. Determinism is an important criterion for visual exploration and navigation of graphs. However, a visualization method may be either deterministic or non-deterministic and still aim for faithfulness.

3.4.4.4 Faithfulness in space and time

The concepts of faithfulness (information-, task- and change-faithfulness) are mostly concerned with the “space” dimension. That is, the visual mapping from data d ∈ D to image ℓ = V(d) ∈ L does not (explicitly) consider the “time” dimension. We can extend the faithfulness concepts to integrate the time dimension. Let T denote the time domain and let t ∈ T be a time point. A graph d at time t can be represented by a two-dimensional data item (d, t) consisting of the graph d and the time t. The visualization process V transforms the two-dimensional data item into:

• a drawing V(d, t) in which the drawing V(d) is placed at a location in space determined by t (in the small-multiple display approach); or
• a drawing V(d) at a time frame V(t) (in the animation approach).

Dynamic graph visualizations often take the sequence number of the graph d in the sequence of input graphs as the time t; in other words, V(t) = t is the identity function. Thus, the faithfulness concepts may disregard the time factor in these cases without loss of accuracy. However, when time t is considered in a more general setting, we should consider the following.

1. First, consider faithfulness in the small-multiple approaches.
• Information faithfulness should consider whether the time t can be recovered from the picture; for example, by placing graph elements of d at a time t close together and avoiding mixing elements of different times.
• Task faithfulness should consider the accuracy of task performance regarding the time attributes. For example, one should be able to identify correctly whether or not two data elements belong to the same time.
• Change faithfulness should further consider the change in time in the final visualizations. For example, two graphs d at time t and d′ at time t′ are placed close together if |t − t′| is small, and far apart if |t − t′| is large.

2. Second, consider faithfulness in the animation approaches. The transformation of time t to V(t) can be, for example, the identity function, a linear function, a sequence-based function, or a non-linear function.
• Information faithfulness should consider whether the time t can be recovered from the animation; for example, by placing graph elements of d at a time t in a separate frame t.
• Task faithfulness should consider the accuracy of task performance regarding the time attributes. For example, one should be able to identify correctly whether or not a graph element exists at time t.
• Change faithfulness should further consider the change of graphs together with the change in time in the animation. For example, two graphs d at time t and d′ at time t′ appear at frames V(t) and V(t′) that are close in the animation if |t − t′| is small, and far apart if |t − t′| is large.

3.5 Faithfulness metrics

Faithfulness is not a boolean concept; a visualization method may be somewhat faithful without being 100% faithful. Often, one cannot tell whether a picture is absolutely faithful or unfaithful, but one can judge whether one picture is more faithful than another picture of the same data.

The classical concept of readability of a graph drawing can be evaluated using a number of metrics. These include, for example, the number of edge crossings, the number of edge bends, and the area of a grid drawing. These readability metrics are formal enough that the problem of constructing a readable graph drawing can be stated as a number of optimization problems; thus optimization algorithms can be used. There is no clear winner amongst the metrics, and there is no single readability metric. For example, the edge crossing number does not equate to graph readability, as a drawing with one more crossing is not necessarily “less readable”.

We aim to create a list of faithfulness metrics that play the same role. Yet we should emphasize that the faithfulness metrics proposed in this section are just examples to show the “possibility”; they are not the only ways to measure the faithfulness concepts. Furthermore, we do not aim to create a generic “faithfulness number”; rather, we try to show how quantitative measurements of the faithfulness concepts are possible.

In this section we develop a framework for such metrics. We assume that the spaces D, L and R each have a norm, which we generically denote by ‖·‖.

• First, each data item d in data space D can be modelled in a multi-dimensional space $A_1 \times A_2 \times \cdots \times A_m$, where $A_i$ is the i-th attribute dimension. A norm in data space D can be the Euclidean norm ($\lVert\cdot\rVert_2$), the Manhattan norm ($\lVert\cdot\rVert_1$), or a general p-norm ($\lVert\cdot\rVert_p$).
• Second, norms in result space R can be defined in a similar manner to the norm in data space D.
• Third, for layout space L, the definition of a norm can be a bit more complex. Layout space L can be modelled as $\mathbb{R}^3 \times C_1 \times \cdots \times C_r$, where $C_i$ is the i-th dimension of visual attributes. In a visualization, visual attributes of nodes include location, color, shape, transparency, texture, etc., while visual attributes of edges may further include edge bends.

Further, we assume that each of these spaces has a distance function ∆ that assigns a non-negative real number ∆(a, b) to each pair a, b of elements of the space.

3.5.1 An example of information faithfulness metrics

The simplest way to measure the information faithfulness of a graph visualization function V with a specification s is to measure its “ambiguity”. For each data item d and visualization ℓ = V(d, s) ∈ L, let $V^{-1}(\ell)$ denote the set {(d, s) ∈ D × S : V(d, s) = ℓ}. Let $|V^{-1}(\ell)|$ denote the number of elements in $V^{-1}(\ell)$. Then the information faithfulness of V is a function $f_{\mathrm{info}}$, defined by

\[ f_{\mathrm{info}}(d, \ell) = \frac{1}{|V^{-1}(\ell)|}. \tag{3.4} \]

The metric equals 1 when the picture determines the data uniquely (very faithful) and approaches 0 as the ambiguity grows (unfaithful). A more subtle approach is to measure the information faithfulness of a graph visualization function V as the information loss in the channel from data to picture. In information theory, such loss is quantified in terms of entropy. The information loss is easier to measure than the total information content of a data set. There are several techniques for measuring information content and information loss (for a full discussion, see [262]).
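The ambiguity metric (3.4) can be illustrated on a toy data space. The following is a minimal sketch, assuming that D is the set of all labelled graphs on three nodes, that the specification s is ignored, and that `degree_picture` is a hypothetical lossy visualization invented for this illustration; none of these names come from the thesis.

```python
from collections import Counter
from itertools import combinations, product


def all_graphs(n):
    """Yield every labelled undirected graph on nodes 0..n-1 as a frozenset of edges."""
    pairs = list(combinations(range(n), 2))
    for mask in product([False, True], repeat=len(pairs)):
        yield frozenset(p for p, keep in zip(pairs, mask) if keep)


def degree_picture(edges, n):
    """Hypothetical lossy visualization: only the sorted degree sequence is shown."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return tuple(sorted(deg))


def f_info(visualize, data_space):
    """Return {d: 1 / |V^{-1}(V(d))|} for every data item d, following (3.4)."""
    data_space = list(data_space)
    preimage_size = Counter(visualize(d) for d in data_space)
    return {d: 1.0 / preimage_size[visualize(d)] for d in data_space}


n = 3
graphs = list(all_graphs(n))
identity_scores = f_info(lambda e: e, graphs)                  # injective mapping
lossy_scores = f_info(lambda e: degree_picture(e, n), graphs)  # not injective
print(min(identity_scores.values()), min(lossy_scores.values()))
```

Enumerating the preimage is only feasible for very small data spaces; for realistic inputs the size of the preimage would have to be derived analytically, as is done for edge bundling in Section 3.7.1.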

3.5.2 An example of task faithfulness metrics

We can measure task faithfulness as the difference between the result from the data d and the result from the visualization ℓ = V(d, s). The task distortion of V is a function $\Delta_{\mathrm{task}}$, defined by

\[ \Delta_{\mathrm{task}}(d, \ell) = \Delta(T_L(\ell), T_D(d)) \tag{3.5} \]

with respect to the task T = (T_D, T_L, T_K) and specification s ∈ S. One could then define a normalized task faithfulness $f_{\mathrm{task}}$ as

\[ f_{\mathrm{task}}(d, \ell) = \frac{1}{\Delta_{\mathrm{task}}(d, \ell) + 1}. \tag{3.6} \]

The metric can have values ranging from 1 (very task-faithful) to 0 (task-unfaithful). Our task faithfulness metric is deliberately kept simple; it accounts neither for how the task result is computed from the data nor for how users interpret the drawing to arrive at a task result. In fact, different users may perceive a drawing differently in many practical cases.
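As a small illustration of (3.5) and (3.6), the sketch below assumes a numeric task (counting the edges between two node groups) and takes ∆ to be the absolute difference. The function names and the estimated count read from the picture are assumptions made for this example, not measured results.

```python
def task_distortion(result_from_picture, result_from_data):
    """Delta_task in (3.5), with Delta assumed to be the absolute difference."""
    return abs(result_from_picture - result_from_data)


def f_task(result_from_picture, result_from_data):
    """Normalized task faithfulness (3.6): 1 / (Delta_task + 1)."""
    return 1.0 / (task_distortion(result_from_picture, result_from_data) + 1.0)


def count_edges_between(edges, group_a, group_b):
    """T_D for a hypothetical task: count the edges joining two node groups."""
    return sum(
        1
        for u, v in edges
        if (u in group_a and v in group_b) or (u in group_b and v in group_a)
    )


edges = [(0, 3), (0, 4), (1, 3), (2, 5)]
left, right = {0, 1}, {3, 4}
true_count = count_edges_between(edges, left, right)  # T_D(d)
estimated_count = 4  # assumed reading of the picture, T_L(l); illustrative only
print(true_count, f_task(estimated_count, true_count))
```

A picture from which the count is read exactly gives a score of 1, and the score decays towards 0 as the estimate drifts away from the true value.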

3.5.3 An example of change faithfulness metrics

Tufte [307] defines the “lie-factor” as the ratio of change in a graphical representation to the change in the data. We can express Tufte’s concept in terms of our model. For a visualization V with specification s ∈ S, the lie factor for two distinct data items d, d′ ∈ D with d ≠ d′ is defined by

\[ \mathrm{lie}(d', d, \ell', \ell) = \frac{\Delta(\ell, \ell')}{\Delta(d', d)}, \]

where ℓ = V(d, s) and ℓ′ = V(d′, s). Tufte’s aim is to measure the quality of static visualizations in terms of the lie-factor, but we can apply the same principle in a dynamic setting. In the ideal case (no “lie”), the lie metric has the value of 1. Intuitively, the lie factor increases as change faithfulness decreases, and so for two “distinct” data elements d′ and d we can measure the change faithfulness (normalized) as:

\[ f_{\mathrm{change}}(d', d, \ell', \ell) = e^{-\mathrm{lie}(d', d, \ell', \ell)} \tag{3.7} \]

The values of this change metric can vary from 1 (very change-faithful) to 0 (change-unfaithful).
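The lie factor and the normalized metric (3.7) can be sketched as follows. The two distance functions are assumptions chosen for illustration (symmetric difference of edge sets on D, total node displacement on L); the text leaves the choice of ∆ open.

```python
import math


def data_distance(edges_a, edges_b):
    """Assumed Delta on D: size of the symmetric difference of the edge sets.

    Edges are expected as sorted tuples so that (0, 2) and (2, 0) do not differ.
    """
    return len(set(edges_a) ^ set(edges_b))


def layout_distance(pos_a, pos_b):
    """Assumed Delta on L: total Euclidean displacement of the common nodes."""
    return sum(math.dist(pos_a[u], pos_b[u]) for u in pos_a.keys() & pos_b.keys())


def lie_factor(delta_layout, delta_data):
    """Lie factor: change in the picture divided by change in the data."""
    if delta_data <= 0:
        raise ValueError("the two data items must be distinct (delta_data > 0)")
    return delta_layout / delta_data


def f_change(delta_layout, delta_data):
    """Normalized change faithfulness (3.7): exp(-lie)."""
    return math.exp(-lie_factor(delta_layout, delta_data))


d_old = [(0, 1), (1, 2)]
d_new = [(0, 1), (0, 2), (1, 2)]          # one edge added
pos_old = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0)}
pos_new = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 3.0)}  # node 2 moved by 3 units

lie = lie_factor(layout_distance(pos_old, pos_new), data_distance(d_old, d_new))
print(lie, f_change(layout_distance(pos_old, pos_new), data_distance(d_old, d_new)))
```

Because the two distances are measured in different units, the numerical value of the lie factor depends on how ∆ is scaled in each space; any comparison between methods should use the same pair of distance functions.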

3.6 Example 1: Multidimensional scaling and force directed approaches

This section discusses the multidimensional scaling (MDS) [54] and force directed approaches to Graph Drawing [101, 130, 267, 121, 137, 176] in terms of faithfulness. The MDS approach to Graph Drawing works as follows. The input is a graph G = (N, E), and a |N| × |N| matrix of dissimilarities $\delta_{u,v}$. The goal is to map each node u ∈ N to a point $p_u \in \mathbb{R}^k$ such that the given dissimilarities $\delta_{u,v}$ are well-approximated by the distances $d_{u,v} = \lVert p_u - p_v \rVert$. The set of points $p_u$ forms the layout ℓ = V(G) of G. In practice, k is commonly 2 or 3. In most applications, $\delta_{u,v}$ is chosen to be the graph theoretic distance between nodes u and v. To measure the success of an MDS function, a “stress” [209] function is commonly used to compute the distortion between dissimilarities $\delta_{u,v}$ and fitted distances $d_{u,v}$. In the simplest case, the stress of a layout ℓ ∈ L is:

\[ \mathrm{stress}^{(\mathrm{node})}(\ell) = \sum_{u \neq v} (\delta_{u,v} - d_{u,v})^2. \tag{3.8} \]

MDS can be seen as an optimization problem where the goal is to minimize this stress function. Force directed algorithms have a similar flavour, but view the problem as finding equilibrium in a system of forces.
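The stress function (3.8) can be sketched in a few lines. The sketch assumes a connected, unweighted graph, takes the dissimilarities to be BFS hop counts, and sums over ordered pairs u ≠ v; these choices and the function names are assumptions for illustration.

```python
import math
from collections import deque


def graph_distances(nodes, edges):
    """All-pairs graph-theoretic distances by BFS (assumes a connected graph)."""
    adj = {u: set() for u in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    dist = {}
    for s in nodes:
        seen = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen[w] = seen[u] + 1
                    queue.append(w)
        for t, hops in seen.items():
            dist[s, t] = hops
    return dist


def node_stress(nodes, edges, pos):
    """Stress (3.8): sum over ordered pairs u != v of (delta_uv - d_uv)^2."""
    delta = graph_distances(nodes, edges)
    return sum(
        (delta[u, v] - math.dist(pos[u], pos[v])) ** 2
        for u in nodes
        for v in nodes
        if u != v
    )


nodes = [0, 1, 2]
path = [(0, 1), (1, 2)]
straight = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0)}
bent = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0)}
print(node_stress(nodes, path, straight), node_stress(nodes, path, bent))
```

The straight drawing of the path reproduces every graph distance exactly and so has zero stress, while bending the path shortens the Euclidean distance between the end nodes and raises the stress.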

3.6.1 Information faithfulness

For most MDS approaches, it is possible that vertices overlap in the “optimal” layout. In these cases, the method is not information faithful, since the same result can be produced from different data sets.

3.6.2 Task faithfulness

The stress formula (3.8) can be seen in terms of our task faithfulness framework. Suppose that T is a task that depends on the graph theoretic distance between nodes; let R be the set of real-valued matrices indexed on the node set. For a graph G = (N, E) ∈ D, let $T_D(G)$ be the matrix $[\delta_{u,v}]_{u,v \in N}$. Suppose that the visualization V places node u at location $p_u$; let $T_L(V(G))$ be the matrix $[d_{u,v}]_{u,v \in N}$. Then define

\[ f^{(\mathrm{node})}_{\mathrm{task}}(G) = \max_{G \in D_n} \lVert T_L(V(G)) - T_D(G) \rVert_2, \tag{3.9} \]

where $D_n$ is the set of graphs of size n and $\lVert\cdot\rVert_2$ is the Frobenius norm. Clearly, minimising $f^{(\mathrm{node})}_{\mathrm{task}}$ is equivalent to minimising the stress defined by (3.8).

3.6.3 Change faithfulness

Further, we can evaluate the change faithfulness of an MDS method. In fact, MDS methods have been used extensively in dynamic settings, using stress to preserve the mental map. Suppose that at time t, we have a graph $G^{(t)}$, and the visualization function places node u at point $p_u^{(t)}$ at time t. A stress function can calculate the difference between the layout $\ell^{(t)} \in L$ at time t and the layout $\ell^{(t')} \in L$ at an earlier time t′:

\[ \mathrm{stress}^{(\mathrm{node})}(\ell^{(t)}, \ell^{(t')}) = \sum_{u \in N} \lVert p_u^{(t)} - p_u^{(t')} \rVert^2. \]

In the so-called “anchoring” approach, t′ is zero; in the “linking” approach, t′ is the previous time frame before t (see [53]). These measures, however, aim for mental map preservation (or change readability) rather than change faithfulness. For example, they aim to ensure that if the change in the graph is small, then the change in the layout is small. They do not ensure that if the change in the graph is large, then the change in the layout is large. However, we can use the stress approach to define the lie factor, such as:

\[ \mathrm{lie}(d^{(t')}, d^{(t)}, \ell^{(t')}, \ell^{(t)}) = \left( \sum_{u \neq v} \left\lVert \frac{d^{(t)}_{u,v} - d^{(t')}_{u,v}}{d^{(t')}_{u,v}} \right\rVert \right) \Bigg/ \left( \sum_{u \neq v} \left\lVert \frac{\delta^{(t)}_{u,v} - \delta^{(t')}_{u,v}}{\delta^{(t')}_{u,v}} \right\rVert \right) \]

where $d^{(t)}_{u,v} = \lVert p^{(t)}_u - p^{(t)}_v \rVert$ and $\delta^{(t)}_{u,v}$ denotes the graph theoretic distance between u and v in $G^{(t)}$. Then the normalized change faithfulness can be measured in terms of the distortion of the data change relative to the layout change:

\[ f_{\mathrm{change}}(d^{(t')}, d^{(t)}, \ell^{(t')}, \ell^{(t)}) = e^{-\mathrm{lie}(d^{(t')}, d^{(t)}, \ell^{(t')}, \ell^{(t)})} \]

3.6.4 Remarks

The force directed and MDS approaches have had considerable impact on the commercial world, despite the fact that they do not have explicit or validated readability goals. We believe that the success of these approaches is due to the fact that they have explicit and validated task faithfulness goals with respect to tasks that depend on graph theoretic distances. We suggest that better MDS methods could be designed by optimising their change faithfulness using the lie factor stress above.
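The stress-based lie factor of Section 3.6.3 can be sketched as below. The sketch assumes that the pairwise layout distances and graph theoretic distances at both times are already available as dictionaries over the same unordered node pairs, and that all distances at the earlier time are non-zero; the function names are hypothetical.

```python
import math


def relative_change(new, old):
    """Sum over node pairs of |new_uv - old_uv| / old_uv."""
    return sum(abs(new[pair] - old[pair]) / old[pair] for pair in old)


def stress_lie_factor(d_new, d_old, delta_new, delta_old):
    """Relative layout change divided by relative change in graph distances."""
    data_change = relative_change(delta_new, delta_old)
    if data_change == 0:
        raise ValueError("graph theoretic distances did not change")
    return relative_change(d_new, d_old) / data_change


# Path 0-1-2 at time t'; the edge (0, 2) is added at time t.
delta_old = {(0, 1): 1, (0, 2): 2, (1, 2): 1}
delta_new = {(0, 1): 1, (0, 2): 1, (1, 2): 1}
# Layout distances: a straight line at t', an equilateral triangle at t.
d_old = {(0, 1): 1.0, (0, 2): 2.0, (1, 2): 1.0}
d_new = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0}

lie = stress_lie_factor(d_new, d_old, delta_new, delta_old)
print(lie, math.exp(-lie))
```

Summing over unordered rather than ordered pairs leaves the ratio unchanged, since both the numerator and the denominator are halved.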

3.7 Example 2: Edge bundling

Edge bundling, as illustrated in Figures 3.3b and 3.4b, has been extensively investigated as a way to reduce visual clutter in graph visualizations. Many edge bundling techniques have been proposed, including hierarchical edge bundling [170], geometry-based edge clustering [170, 338, 81, 214], force-directed edge bundling [171, 203, 246, 280] and multi-level agglomerative edge bundling [134]. The US airline networks are typical examples used to demonstrate edge bundling methods. Figure 3.7 shows several edge bundling results. Edge bundling seems to increase task readability with respect to some tasks; for example, the classic bundling of air traffic routes in the USA (see [171, 81, 301, 134, 246]) seems to make it easier for a human to identify the main hubs and flight corridors. However, some readability metrics are sacrificed; for example, the number of bends is increased, making individual paths difficult to follow (the authors are not aware of any human experiments that measure readability for edge bundling). In this section we make some remarks about the faithfulness of edge bundling.

Figure 3.7: Examples of US airline network visualizations using edge bundling: (a) Unbundled; (b) FDEB [171]; (c) IBEB [301]; (d) Lambert et al. [213]; (e) TGI-EB [246]; (f) MINGLE [134]

3.7.1 Information faithfulness

As noted in Section 3.7, edge bundling reduces information faithfulness: as more edges are bundled together, it becomes harder to reconstruct the network data from a bundled layout. We can propose a rough-and-ready metric for this reduction based on the model presented in Section 3.5. Given an input graph G = (N, E), an edge bundling visualization process V partitions E into bundles $E = B_1 \cup B_2 \cup \dots \cup B_k$. Let $G_i$ denote the subgraph of G with edge set $B_i$ and node set $N_i$ consisting of endpoints of edges in $B_i$. Edge bundling methods ensure that $G_i$ is bipartite; suppose that $N_i = X_i \cup Y_i$ is the bipartition of $G_i$. In the bundled layout, $G_i$ is indistinguishable from a complete bipartite graph on the parts $X_i$ and $Y_i$. This representation has inherent information loss. Let $x_i = |X_i|$ and $y_i = |Y_i|$. The number of (labelled) bipartite graphs with parts $X_i$ and $Y_i$ is $2^{x_i y_i}$. Thus if $q = \sum_{i=1}^{k} x_i y_i$, then there are $2^q$ graphs that have the same layout as G. This can be used as a simple model for computing the information faithfulness of V.
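The counting argument above translates directly into code. The sketch below assumes each bundle is given as a pair of node sets (X_i, Y_i) and, as one possible reading, plugs the count of 2^q indistinguishable graphs into the ambiguity metric (3.4); the function names are illustrative.

```python
def bundle_ambiguity_exponent(bundles):
    """Return q = sum_i x_i * y_i for bundles given as (X_i, Y_i) node-set pairs."""
    return sum(len(x) * len(y) for x, y in bundles)


def bundled_f_info(bundles):
    """Assumed combination with (3.4): 2^q graphs share the layout, so f_info = 2^(-q)."""
    return 2.0 ** -bundle_ambiguity_exponent(bundles)


# Two hypothetical bundles: 2 x 3 endpoints and 1 x 2 endpoints.
bundles = [({0, 1}, {5, 6, 7}), ({2}, {8, 9})]
q = bundle_ambiguity_exponent(bundles)
print(q, bundled_f_info(bundles))
```

For realistic bundles q grows quickly and 2^(-q) underflows to zero in floating point, so comparing methods by q itself (a logarithmic scale) may be more practical.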

3.7.2 Task faithfulness

Most bundling methods use a compatibility function; roughly speaking, a compatibility function C assigns a real number C(e, e′) to each pair e, e′ of edges. Two edges e and e′ are more likely to be bundled together if the value of C(e, e′) is large. A number of compatibility functions have been proposed and tested; these include spatial compatibility [171] from length, position, angle and visibility between edges, semantic compatibility [203] for bundling multi-attributed edges, connectivity compatibility [280], and importance compatibility and topology compatibility in TGI-EB [246]. Some of these functions depend only on the input graph G, and some depend also on the layout of G. For a number of tasks, such as identifying hubs in a network, highly compatible edges are equivalent; the correct performance of such tasks does not depend on distinguishing between them. Here we show that stress functions can be used to define metrics for computing the task faithfulness of the edge bundled layout relative to such tasks. Given a pair of edges e and e′ in an input graph G, let C(e, e′) denote their compatibility. We assume that this compatibility function depends only on G and not on its layout. Let ℓ ∈ L be the layout of G. For two edges e and e′, let d(e, e′) be the distance between the curves representing e and e′ in ℓ. We can choose from a number of distance functions for curves, such as the Fréchet distance [17] and several distance measures for point sets [60]. The stress in ℓ is then defined as:

\[ \mathrm{stress}^{(\mathrm{edge})}(\ell) = \sum_{e \neq e'} (C(e, e') - d(e, e'))^2. \tag{3.10} \]
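The edge stress (3.10) can be sketched as follows. The curve distance used here (mean distance between corresponding control points) is a simple stand-in chosen for brevity rather than the Fréchet distance, the compatibility values are made up, and the sum runs over unordered pairs; how C and d are scaled against each other is not specified in the text, so the formula is applied as written.

```python
import math
from itertools import combinations


def mean_pointwise_distance(curve_a, curve_b):
    """Assumed d(e, e'): mean distance between corresponding control points.

    Both curves must have the same number of control points.
    """
    return sum(math.dist(p, q) for p, q in zip(curve_a, curve_b)) / len(curve_a)


def edge_stress(curves, compatibility):
    """Stress (3.10) over unordered pairs e != e' of (C(e, e') - d(e, e'))^2."""
    return sum(
        (compatibility[e, f] - mean_pointwise_distance(curves[e], curves[f])) ** 2
        for e, f in combinations(sorted(curves), 2)
    )


compatibility = {("e1", "e2"): 1.0}  # hypothetical value that depends only on G
unbundled = {
    "e1": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    "e2": [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],
}
bundled = {
    "e1": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    "e2": [(0.0, 1.0), (1.0, 0.5), (2.0, 1.0)],  # middle control point pulled in
}
print(edge_stress(unbundled, compatibility), edge_stress(bundled, compatibility))
```

Tracking this value across the iteration cycles of a bundling algorithm gives the kind of stress curve that the experiments in Section 3.7.3 report.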

3.7.3 Remarks

Despite the plethora of recent papers in edge bundling [182], there are few evaluations of effectiveness. Using the formal models outlined above, one can begin to evaluate faithfulness and compare bundling methods. For example, one can test the following hypotheses:

Hypothesis 1 The force-directed edge bundling (FDEB) method [171] and its force-directed variants [203, 246, 280] are task-faithful.

Hypothesis 2 Force-directed edge bundling methods are more task-faithful when using more control points per edge.

Figure 3.8: Comparisons of the faithfulness of the edge bundled worldcup visualizations: (a) cycle vs. stress ratio; (b) number of control points vs. stress ratio

We have conducted several experiments with the faithfulness metrics described in Section 3.5. We use the worldcup data¹ as our example. Our initial studies using the metrics above support the above hypotheses.

¹ Data available at http://gd2006.org/contest/WCData/

Figure 3.9: Visualization of FIFA worldcup data year 2006: (a) cycle 0; (b) cycle 2; (c) cycle 5; (d) cycle 9

Figure 3.8 shows statistics of stress values at different iteration cycles of the force-directed edge bundling algorithm (FDEB) [171]. The figure shows that FDEB achieves more faithful results as more iterations are performed. Furthermore, in later iterations, the result becomes more faithful as the algorithm uses more control points to improve its layout.

Figure 3.9 shows the visualizations of the football matches in the year 2006. The figure also shows the bundled results computed by FDEB.

3.8 Example 3: Visualization metaphors

This section discusses our new notions of faithfulness for several representative graph visualization metaphors.

3.8.1 Matrix representation

Besides node-link diagrams, the visualization of graphs in matrix form is also popular. In fact, matrix and node-link diagrams have different characteristics and suit different tasks and datasets. Generally speaking, the matrix metaphor is very faithful. All the nodes and edges are represented in the visualization. Matrices do not suffer from node overlapping and edge crossing. However, the matrix metaphor is not very task-readable for several tasks. Ghoniem et al. [142] report their studies on the performance of matrix and node-link diagrams for several low-level tasks. Their results show that node-link diagrams are preferable for very small (20 vertices or less) and sparse networks. Ghoniem et al. also show that matrices are more effective, even for large graphs, except when the tasks are related to path tracing. Path-related tasks (such as tracing a path between nodes or finding a shortest path between a pair of nodes) are the weakness of matrices; this known problem is reported in the study of Ghoniem et al. [142].

For certain tasks, matrix representations are more task-readable than node-link diagrams. For example, for tasks such as locating and selecting nodes, this representation is more appropriate as node labels are often more readable. Matrices do not suffer from edge crossings, which are the most troublesome aspect of viewing node-link visualizations of dense graphs. Lastly, matrices can give an immediate overview of the sparse and dense regions within a network as well as the directedness of the connections.

A number of methods aim to improve the readability of the matrix metaphor. For example, reordering columns/rows to show highly connected groups of nodes [257, 163] increases readability without sacrificing faithfulness. The matrix metaphor often requires display space quadratic in the number of nodes. To visualize large graphs, information-reduction methods such as collapsing rows and columns are applied. However, such information-reduction methods increase readability at the expense of faithfulness.

3.8.2 Cartography

The map-based approach is a promising way to produce appealing visualizations of graphs [138, 175, 150]. In a map-based visualization, a set of “countries” is drawn on the plane (see Figure 3.10). Each country encapsulates a node or a set of nodes within its boundary. Edges are then drawn to connect adjacent nodes. The visualization produced by this approach is appealing and looks like a typical geographic map. Maps have been studied as a way to display changes and trends in dynamic data [226] and stream data [133]; see Section 2.2.3.2. These map-based approaches increase task faithfulness for many tasks, such as identifying clusters or similar topics. However, map-based approaches sometimes sacrifice information faithfulness. For example, links with small weights are sometimes discarded to enable users to focus on more important links and to create appealing maps with more readable boundaries.

3.8.3 Compound visualizations

Compound visualization techniques combine several types of visualization metaphors into the final result. Examples of compound visualizations include MatLink [164] and NodeTrix [165], which integrate matrix views with node-link diagrams. More examples are given in Section 2.2.3 and Section 2.2.4. Figure 3.11 shows an example of a hybrid visualization produced by NodeTrix [165]. NodeTrix, proposed by Henry et al. [165], integrates node-link diagrams, which show the global structure of the network, with matrix representations of groups of nodes. Thus, the method reduces the complexity and clutter of the network, while still providing all the information.

Figure 3.10: Map-based visualizations (by Gansner et al. [138]): (a) Author collaboration map for the GD conference (1994-2004); (b) A map of trade relations between countries

Figure 3.11: A Hybrid Visualization of Social Networks using NodeTrix

In general, these techniques [164, 165] seem to increase readability for some parts of the networks that are of most interest and at the same time may sacrifice faithfulness and readability in other parts of the network that are less important. Furthermore, there can be a trade-off between global readability and local readability, and between global faithfulness and local faithfulness.

3.9 Discussions and future work

This section discusses some important factors in visualization that have a direct impact on visualization research, and their implications for the faithfulness concepts. Research in visualization in general, and graph visualization in particular, is influenced by two main factors: (1) display capacity (hardware), and (2) interaction. The former is related to the availability of display resources. The latter refers to the development of novel interaction methods to adapt the main display parameters such as the level of detail, data selection and aggregation. These factors have a major impact on how the data is presented. We discuss each of these below.

3.9.1 Display device

Although there have been numerous revolutions in hardware technologies, screen size is still a precious but limited resource. Technically speaking, “screen size” refers to the number of pixels in a display rather than the physical dimensions of the screen.

Examples of displays include high-resolution displays, large-scale power walls and small portable devices. As of 2012, commercial computer screens have at most around 9 million pixels. Faithfulness metrics should be formulated specifically for the characteristics of the available output devices. For visualizing large data sets, the large number of data elements can compromise performance or overwhelm the capacity of the viewing platform. In addition, visualization methods should take user-centric requirements (from user inputs or interactions) into account to produce results that balance faithfulness and readability.

3.9.2 Interaction

User interactions on the data help users to derive knowledge. Interaction techniques, such as zooming, panning or rotating, are very helpful for exploring large data sets. Ideally, the user interface and interaction techniques should be simple enough to help users perform their tasks with less cognitive effort; a complex user interface may sometimes distract the users. Novel interaction techniques need to seamlessly support visual communication (user inputs) between the user and the system. These techniques are useful for navigating and analyzing the data, memorizing insights and making decisions. Consider the case when the user needs to explore a certain part of the graph. For example, the user may make data queries or manipulate the same graphical elements several times. The visualization system can derive the user’s area of interest from these interactions. The layout adjustment algorithms should consider these interests when adapting to user needs. These algorithms could judiciously sacrifice overall faithfulness (the overall structure) in favour of local faithfulness (of the part of the graph that the user is interested in). We now discuss the implications of the faithfulness concept for three specific types of interaction.

3.9.2.1 Affine transformation

Intuitively, faithfulness does not depend on global affine transformations such as rotation, translation and scaling. As long as the drawing is consistent with the data, the visualization is considered faithful. In fact, a faithful picture preserves its faithfulness under these affine transformations: a faithful visualization is injective, and composing it with an invertible transformation keeps it injective. However, some other transformations of local parts of the graph may increase the readability of these parts, yet may degrade the faithfulness of other parts of the graph.

3.9.2.2 Distortion techniques

Distortion techniques, such as lens effects and occlusion reduction [328, 327], also provide the analyst with a trade-off between faithfulness and visual clarity. These methods typically transform objects’ positions to improve local readability at the cost of the accuracy of global relations.

3.9.2.3 Level of detail

In many cases, relevant data patterns and relationships may need to be visualized at several levels of detail, combining visualizations of selected analysis details with a global overview. In some cases, these patterns may need visualizations at appropriate levels of data and visual abstraction. The overall goal is to balance faithfulness and readability so as to best meet user expectations.

3.9.2.4 Model extension

A possible direction for future work is to extend our graph visualization model (Section 3.3) and faithfulness model (Section 3.4). Figure 3.12 shows an enhanced version of the graph visualization model depicted in Figure 3.5. In this new model, the faithfulness concept has been extended. Faithfulness is defined not only from the visualization process (from data to pictures), but also from the combination of the visualization and the perception (from data to pictures to knowledge).

Figure 3.12: Enhanced graph visualization model [diagram labels: Data, Picture, Human, Task; D, L, K, R; V, P, T, S, E; dS/dt, dK/dt; Faithfulness, Readability]

Intuitively, the new model of faithfulness is also concerned with the consistency between what the users perceive and what is given in the data. We extend the faithfulness concept in Section 3.4 to perceptual faithfulness. Generally speaking, this new concept of perceptual faithfulness is a compound of visualization faithfulness and visualization readability. This is inherently more general than our basic model. We offer the following intuitions for such an extension:

• A visualization that is readable may not be faithful. This observation is the main motivation of this chapter; our motivating example is given in Section 3.1.1.
• A faithful visualization may not be readable. Intuitively, one can choose a visual encoding such that (1) it is faithful (so that a user who knows how to decode it can still recover the underlying data) and (2) it is complex enough that it is not straightforward for other users, who do not know how to decode the visualization, to get the underlying data.
• A visualization is perceptually faithful if it is both faithful and readable. Ideally, it is a faithful encoding from data to picture (injective). Meanwhile, the viewer can see the picture and perceive all the information within it.

This notion of perceptual faithfulness appears useful for modern visualization techniques that handle large and complex graphs. To be considered good, a visualization technique should aim for both faithfulness and readability, that is, for perceptual faithfulness.

3.9.2.5 Metrics for compound visualizations

Section 3.8.3 discusses some implications of faithfulness on compound graph visualizations. It would be interesting to define faithfulness metrics for compound visualizations. Intuitively, the compound methods should balance between faithfulness values of different parts of the networks, and they should also balance between faithfulness globally (overall) and locally (sub-networks).

3.10 Concluding Remarks

This chapter has introduced the concept of faithfulness for graph visualization. We believe that the classical readability criteria are necessary but not sufficient for quality graph drawing; faithfulness is a generic criterion that is missing. The chapter has described the faithfulness concept in a semi-formal model. We have distinguished three kinds of faithfulness: information faithfulness, task faithfulness, and change faithfulness. Table 3.1 gives a summary of these faithfulness concepts.

Based on the visualization model and the faithfulness concepts, we have also presented a model for faithfulness metrics. In Sections 3.6, 3.7 and 3.8, we illustrate the faithfulness concept with three examples.

• The first example is multidimensional scaling / force-directed methods. We believe that future directions of these methods would need to balance the aims of readable outputs versus faithful representations.

• The second example is edge bundling. Despite a recent upsurge of interest in edge bundling, there are very few evaluations; we show that faithfulness metrics may prove the key to evaluation.

• The last example includes matrix metaphors and map-based visualizations. These methods are useful for large graphs; we show that future directions of these methods would need to balance global / local readability against global / local faithfulness.

Table 3.1: Faithfulness and readability

Information
  Faithfulness:
    • The data set is faithfully represented by the picture.
    • All the original data is in the picture.
    • The visualization process is injective.
  Readability:
    • The user perceives all the data in the picture.
    • The perception process is injective.

Task
  Faithfulness:
    • The picture contains enough data to correctly perform the task.
  Readability:
    • The user perceives enough data from the picture to correctly perform the task.

Change
  Faithfulness:
    • Changes in the picture are consistent with changes in the data.
  Readability:
    • The mental map is preserved.
    • The user can remember one screen from the previous screen.

Table 3.2: Faithfulness of existing visualization methods. Rows (visualization methods): force-directed, MDS, edge concentration, confluent drawing, edge bundling, matrix metaphor, map-based metaphor, and combination. Columns: faithfulness (Info, Task, Change) and readability (Info, Task, Change), each cell rated ‘+’ or ‘−’.

3.10.1 Guidelines

Table 3.2 summarizes the faithfulness of some selected visualization methods. In this table, ‘−’ represents ‘no’ or ‘low’, and ‘+’ represents ‘yes’ or ‘high’. These ratings are offered as a reference and may differ from the reader’s own assessment. A ‘+’ for task faithfulness should be interpreted as: the method is faithful for some tasks, rather than for all tasks. In Sections 3.6, 3.7 and 3.8, we describe several specific tasks for which these visualizations are task-faithful.

3.10.2 Remarks on 3D drawings

We conclude this chapter with a remark about 3D graph drawing (for more details of 3D graph drawing techniques, see Section 2.2.5). The occlusion problem for 3D means that, even with binocular displays, some part of the graph is always hidden. This can be seen as a lack of not only readability but also information faithfulness. We suggest that the lack of commercial impact of 3D graph drawing is partially due to its inherent lack of faithfulness. The next two chapters describe our research focusing on edge bundling. Rather than considering the readability and faithfulness of edge bundling in general as described in Section 3.7, we propose new frameworks to help visual analysis of large, complex and dynamic graphs.

Chapter 4

TGI-EB: Edge Bundling integrating Topology, Importance and Geometry

“Visualize this thing that you want, see it, feel it, believe in it. Make your mental blue print, and begin to build.”
- Robert Collier

4.1 Introduction

In this chapter, we describe our approaches for visualizing large, complex, and dense graphs based on edge bundling. For large and complex networks, constructing an “overview” is very useful for conveying information and is commonly used for extracting global patterns, such as clusters and outliers in a data set. However, visualizing large and complex networks is subtle and challenging, especially for large dense graphs. A known issue for large, complex and dense graph visualizations is “visual clutter”, which is caused by overlapping nodes and edges (see Section 2.2.9.2). Clutter makes it hard to examine the details of the graph structure, and thus hinders human understanding and analytic tasks.

There are a number of approaches to reduce visual clutter in visualization. Amongst these, edge bundling methods have become popular for visualizing large dense networks since the work of Phan et al. [258] and Holten [170]. In edge bundling methods, edges are typically polylines or splines that are bundled together if they are “compatible”. Edge bundling has received much attention from the Graph Drawing and Information Visualization communities [135, 170, 81, 171, 214]. Section 2.3 covers more background on visual reduction techniques and edge bundling.

This chapter also makes use of social network analysis methods (for example, see [318]), which can be used to find structural properties in networks. One important method is centrality analysis, which determines the relative importance of vertices and edges in a network [46, 125, 126, 127, 207, 243, 318]. Another important method is k-core decomposition, used to identify cohesive groups of actors within a network [318, 38, 64]. Section 2.4 covers more details of social network analysis.

In this section, we give a motivating example and then describe our aims and contributions.

4.1.1 Motivating example

To illustrate the usefulness of our new bundling framework, we use an example from biological networks. We use a data set that we have examined in collaboration with a bioinformatics group including S. J. Janowski, J. Stoye and C. Kaltschmidt from the University of Bielefeld, Germany [188]. The example data set is an integrated NF-κB protein-protein interaction and signalling transduction network. Figure 4.1 shows a graph visualization¹ produced by a force-directed method, in which visual clutter is severe due to the density of the edges and the many edge crossings. It is almost impossible to conduct any useful analysis with this “hair-ball” visualization.

In contrast, Figure 4.2 shows a k-core radial layout to which our centrality-based edge bundling is applied. This figure clearly shows six important paths and structures of the network. Important elements can be easily identified from this figure; for example, more important elements are placed closer to the center. One can also examine connections among important elements, and between important and less important elements.

¹ High resolution versions of the figures used in this chapter can be downloaded from http://it.usyd.edu.au/~qnguyen/edgebundling/highresolution

Figure 4.1: NF-κB network visualization using force-directed layout and without bundling

These results have helped the bioinformatics scientists from Bielefeld to confirm known connections [188]. Furthermore, the visualizations produced by our framework have guided the biologists to derive new biological hypotheses, and laboratory experiments have subsequently been conducted (see Section 4.5.2 for details).

4.1.2 Aims and contributions

The main goal of visualizing the overall structure of a data set is to help convey information about the topological structure and to support analysis tasks. Typical edge bundling approaches have not yet directly integrated the structural importance and topology of the graph into their models. Furthermore, although the end results of edge bundling are impressive and show high-level simplified structures of the network, existing edge bundling methods have not yet exploited modern metaphors such as orthogonal layouts [106, 105] and 2.5D visualizations (see Section 2.2.6). To overcome these limitations, we propose a new approach to improve edge bundling results.

Figure 4.2: NF-κB network visualization using our RadEB

In this chapter, we present a new framework for edge bundling, which tightly integrates topology, geometry and importance. In particular, we introduce new measures of edge compatibility based on network analysis and topology, namely importance compatibility and topology compatibility. The new compatibility measures are independent of the geometry of the given input drawing.

• Importance compatibility: As an example of a definition of importance compatibility, we use social network analysis methods [318]. For example, centrality analysis determines the relative importance of vertices and edges in a network. The k-core decomposition can be used to identify cohesive groups of actors within a network.

• Topology compatibility: As an example of a definition of topology compatibility, we use a clustered graph model.

We also introduce plane compatibility, adapted from geometry compatibility, for edge bundling in 2.5D visualizations.

We present five variations of the force-directed edge bundling method, based on our new framework. The first three variations aim at topology and importance analysis.

• CenEB (Centrality-based edge bundling): integrates edge centrality analysis with edge bundling.

• TopoEB (Topology-based edge bundling): integrates clustered graph topology with edge bundling.

• RadEB (Radial edge bundling): integrates k-core analysis with edge bundling.

The other two variations are introduced to work with modern visualization metaphors, including orthogonal graph drawing and 2.5D visualizations.

• OrthEB (Orthogonal edge bundling): uses an orthogonal-like edge representation to produce orthogonal-like crossings.

• 2.5D-EB (2.5 dimensional edge bundling): integrates 2.5D visualizations with edge bundling.

We implemented our new framework and conducted experiments with social networks, biological networks, geographic networks and clustered graphs. Our experimental results show that our new framework is useful for highlighting the most important topological skeletal structures of the input network, which is particularly useful for visual analysis (see Section 4.5).

In summary, this chapter makes the following contributions:

• New compatibility measures for edge bundling, namely topology compatibility, importance compatibility, and plane compatibility;

• The TGI-EB framework for edge bundling, which integrates the new compatibility measures (topology, importance and plane compatibility) with the base geometry compatibility;

• Five different variations of force-directed bundling methods based on the framework.

The rest of the chapter is organized as follows. Section 4.2 gives a survey of related work on edge bundling and network analysis. Section 4.3 describes our new edge compatibility metrics. Section 4.4 presents our TGI-EB framework with the five variations of force-directed methods. The experimental results are given in Section 4.5. Section 4.6 concludes and suggests several directions for future work.

4.2 Related work

4.2.1 Edge bundling

The use of attractions on control points for curved edges was introduced in [57], though the term “edge bundling” was coined several years later by others.

Hierarchical approaches. The study of edge bundling has attracted more attention since the work of Phan et al. [258], which presents an edge clustering method for the edges of migration networks. Holten [170] presents the Hierarchical Edge Bundling method for hierarchical graphs using B-splines. Zhou et al. [338] then present another hierarchical edge clustering method using Delaunay triangulation, where control points are hierarchically clustered by energy-based optimization.

Circular layout. Gansner and Koren [135] improve circular layouts by merging splines of edges to minimize the total amount of ink needed to draw the edges. Cornelissen et al. [78, 79] apply a circular bundle view of hierarchical graphs in software engineering, for example to program execution traces.

Force-directed approaches. Holten and van Wijk introduce the Force-Directed Edge Bundling (FDEB) algorithm [171], which models edges intuitively as flexible springs that can attract each other. The attractive force depends on the distance between the springs and the compatibility of the edges. The method achieves smoother bundles that are easy to read, although it incurs high computational complexity. More specifically, the FDEB algorithm first inserts control points in each edge, and then uses a force-directed method to compute the positions of the control points. In the original version, forces depend on the “geometry compatibility” $G(e, e')$, for edges $e, e' \in E$. For an edge $e$ that has $k$ subdivision points (control points), let $e_i$ ($1 \le i \le k$) denote the $i$-th control point of $e$. For consistency, let $e_0$ and $e_{k+1}$ denote the end points of $e$.

Figure 4.3: Forces in FDEB. Two interacting edges $e$ and $e'$, with the spring ($F_{spring}$) and electrostatic ($F_{elec}$) forces on a control point $e_2$

For a subdivision point $e_i$ on edge $e$, the total force $F(e_i)$ exerted on $e_i$ is the sum of the spring forces $F_{spring}$ exerted by its two neighbours $e_{i-1}$ and $e_{i+1}$, and the total of the electrostatic forces $F_{elec}$. Figure 4.3 depicts examples of the spring forces ($F_{spring}$) and the electrostatic forces ($F_{elec}$). The total force $F(e_i)$ is defined by:

$$F(e_i) = F_{elec}(e_i) + F_{spring}(e_i) \tag{4.1}$$

$$F_{spring}(e_i) = k_e \left( |p_{e_{i-1}} - p_{e_i}| + |p_{e_i} - p_{e_{i+1}}| \right) \tag{4.2}$$

where $k_e$ is the stiffness of edge $e$, and $p_x$ is the location of $x$. Note that these forces apply only to the control points of edges, not to their end points.
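The force model above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the thesis implementation: the function name `fdeb_force`, the data layout (each edge as a list of 2D points including both end points) and the default parameter values are hypothetical. The spring term is written in vector form, and the electrostatic term is assumed to pull a control point towards the matching control point of each compatible edge with magnitude $G(e, e') \cdot |p_{e_i} - p_{e'_i}|^{-d}$, as in Equation (4.3).

```python
import math

def fdeb_force(edge, i, compatible, k_e=0.1, d=1, eps=1e-9):
    """Return the 2D force on the i-th control point of `edge`.

    edge:       list of (x, y) points; index 0 and the last index are end points.
    compatible: list of (other_edge, G) pairs, where other_edge has the same
                number of points as `edge` and G is the compatibility G(e, e').
    k_e, d:     spring stiffness and distance exponent (hypothetical defaults).
    """
    if not 0 < i < len(edge) - 1:
        raise ValueError("forces apply only to control points, not end points")
    px, py = edge[i]
    fx = fy = 0.0
    # Spring term: pull towards the two neighbouring points on the same edge.
    for qx, qy in (edge[i - 1], edge[i + 1]):
        fx += k_e * (qx - px)
        fy += k_e * (qy - py)
    # Electrostatic term: attraction towards the matching control point of
    # every compatible edge, with magnitude G * dist^(-d).
    for other, g in compatible:
        qx, qy = other[i]
        dist = math.hypot(qx - px, qy - py)
        if dist < eps:
            continue  # coincident points: direction undefined, skip
        mag = g * dist ** (-d)
        fx += mag * (qx - px) / dist
        fy += mag * (qy - py) / dist
    return fx, fy
```

In an iterative scheme, each control point would be displaced by a small step along this force in every iteration; the step size and the number of iterations are tuning choices not fixed by this sketch.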

In the original FDEB, the electrostatic force model $F_{elec}$ is defined by:

$$F_{elec}(e_i) = \sum_{e' \in C(e)} G(e, e') \cdot |p_{e_i} - p_{e'_i}|^{-d} \tag{4.3}$$

where $d$ is a numeric constant, and $C(e)$ is the set of “compatible edges” of $e$. The set $C(e)$ contains all edges $e' \in E$ such that the compatibility $G(e, e')$ is greater than a given threshold value. This helps to reduce the running time for computing new positions of the control points: at each iteration, the algorithm examines only the edges contained in $C(e)$ for each $e$, rather than all edges in $E \setminus \{e\}$.

Geometry-based approaches. Geometry Based Edge Bundling [81] uses a control mesh for edge clustering, where edge bundles share the same control points on the mesh. The control mesh is then generalized to route graph edges using a shortest path algorithm, and mesh edge weights are updated to encourage graph edges to share mesh edges [214, 213]. Qu et al. [266] present controllable and progressive edge clustering for large networks.

Image-based approaches. Telea and Ersoy [301] propose an image-based edge bundling approach that aims for coarse-grained shapes of bundled edges to further simplify the visual representation of the network structure. Ersoy et al. [110] introduce skeleton-based edge bundling, which is a variation on the approach by Telea et al. [301]. To improve performance, GPU processing power is used for real-time interactions in real applications [22].

Other approaches. Balzer and Deussen [32] propose a multi-level compound visualization using transparent surfaces and edge bundling for a hierarchical 3D visualization. Pupyrev et al. [261] consider edge bundling in layered drawings in which edges are already routed as polylines or splines; the method preserves the topology of the original drawing and disambiguates edges. Gansner et al. [134] introduce a multi-level method that approximates k-neighbor edge proximity graphs using a kd-tree as input for an agglomerative bundling algorithm. They report experiments with the approach on up to one million edges in a few minutes.

4.2.1.1 Compatibility

Previous work on edge bundling reduces visual clutter and displays some high-level patterns. Yet the “bundles” are mainly based on geometry, disregarding the importance and the topology of the network. Section 4.1.1 gives an overview of several geometric compatibility measures proposed in FDEB [171]. This motivates our new framework for edge bundling, which integrates topology, geometry and importance to highlight important skeletal structures of the networks.

Our TGI framework [246] proposes importance compatibility and topology compatibility to improve edge bundling results. Independently, several works extend FDEB with other criteria for edge bundling [203, 280] to target different application domains. To improve bundling results, these works have proposed a variety of compatibility measures:

• Semantic compatibility by Kienreich and Seifert [203];

• Connectivity compatibility by Selassie et al. [280];

• Traceability by Fabian et al. [113].

These measures, however, are different from our importance compatibility, topology compatibility and plane compatibility, which are described in Section 4.3.

4.2.2 Social network analysis

4.2.2.1 Centrality analysis

Centrality is one of the most well-known individual-level network analysis methods; it determines the relative prominence of vertices and edges in a network [318]. Centrality measures include, for example, degree, betweenness, closeness, eccentricity, eigenvalue and status; see, for example, Wasserman and Faust [318], and Brandes and Erlebach [52]. Centrality is also defined for edges, to determine the importance of connections within a network. Edge centrality has been used for analyzing biological networks [336] and for community detection [253]. Section 2.4.2 describes centrality analysis in more detail.

4.2.2.2 k-core analysis

A major concern of social network analysis is identifying cohesive subgroups of actors, among whom there are relatively strong ties [318, 52]. An important concept in group analysis is the k-core of a graph. A k-core of a graph $G$ is a maximal connected subgraph of $G$ in which every vertex has degree at least $k$ within the subgraph. The notion of k-core has been used in social networks [64] and biological networks [30, 322, 332, 315]. For more details, see Section 2.4.3. We aim to integrate network analysis and k-core analysis into edge bundling.
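The k-core definition can be illustrated with the standard peeling procedure, which repeatedly removes a vertex of minimum remaining degree. The sketch below is a simple quadratic-time version, written for clarity rather than speed; the function name `core_numbers` and the adjacency-dictionary input are assumptions made for this illustration. It returns the core number of each vertex, that is, the largest $k$ such that the vertex belongs to a k-core; the k-cores themselves are then the connected components of the subgraph induced by the vertices with core number at least $k$.

```python
def core_numbers(adj):
    """Return {vertex: core number} for an undirected graph.

    adj maps each vertex to the set of its neighbours (hypothetical input
    format chosen for this sketch).
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel off a vertex of minimum remaining degree.
        v = min(remaining, key=degree.get)
        # The core number never decreases along the peeling order.
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core
```

For a triangle with one pendant vertex attached, the three triangle vertices receive core number 2 and the pendant vertex receives core number 1.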

4.3 New edge compatibility measures

Existing edge bundling methods mainly use geometry to define the geometry compatibility $G(e, e')$. For instance, several metrics are proposed in the FDEB (force-directed edge bundling) method [171] to define geometry compatibility (the FDEB paper uses the notation $C(e, e')$; see Section 2.3.2.2).

4.3.1 Importance compatibility

Here, we introduce a new measure, “importance compatibility”, to integrate importance with geometry for edge bundling. Importance compatibility is a conceptually new measure that guides the bundling with respect to the importance of edges. This measure is independent of geometry, which is computed from the positions of nodes in a graph. Importance can be defined from the application domain or from a specific analysis in an analytic task.

4.3.2 Topology compatibility

We now introduce another new notion of compatibility, called “topology compatibility”. The topology compatibility can be defined from topological structure or combinatorial structure of a given graph model. The topology compatibility, like importance compatibility, is independent of geometry.

4.3.3 Plane compatibility

We also introduce a new notion of compatibility, called “plane compatibility”. Plane compatibility can be defined for two-and-a-half dimensional (2.5D) visualizations, such as multi-plane graph visualizations. It takes into account the planes in which the vertices reside. To a certain extent, plane compatibility is a new type of geometry compatibility, designed for the 2.5D metaphor.

4.4 Integrated framework for edge bundling

This section presents our new generic framework for edge bundling which tightly integrates topology, geometry and importance. Our framework is flexible: one can use other measures for importance, geometry, and topology. For our specific framework, we first use a force-directed edge bundling method as a basis, and then integrate geometry with importance, defined by centrality and k-core analysis. We then integrate plane into the model, defined by a 2.5D graph model. We further integrate topology into the model, defined by a clustered graph model.

4.4.1 The framework

As an example of the integrated framework, we integrate our new edge compatibility measures into Holten and van Wijk’s force-directed edge bundling method (FDEB) [171]. Section 4.2.1 gives more details of FDEB. We use FDEB as the base model for our framework because (1) FDEB is one of the first edge bundling techniques that can be applied to general graphs, and (2) it has an intuitive force-directed model for bundling edges that can be easily adapted to include new types of compatibility. On this basis, we present our TGI-EB framework, which integrates topology, geometry and importance. The framework is described in the following two versions.

4.4.1.1 TGI framework (2D)

In its most general form, our new electrostatic force model can be defined as:

$$F_{elec}(e_i) = \sum_{e' \in E} G(e, e') \cdot I(e, e') \cdot T(e, e') \cdot g(|p_{e_i} - p_{e'_i}|) \tag{4.4}$$

where:

• $C(e)$ is the set of compatible edges of $e$;

• $G(e, e')$ is the geometry compatibility for a pair of edges $e$ and $e'$;

• $I(e, e')$ is the importance compatibility for a pair of edges $e$ and $e'$;

• $T(e, e')$ is the topology compatibility for a pair of edges $e$ and $e'$;

• $g$ is a function of $|p_{e_i} - p_{e'_i}|$. For instance, $g$ can be defined as $|p_{e_i} - p_{e'_i}|^{-d}$, where $d$ is a numeric constant. Typically, $d$ is chosen as 1, 2 or 3.
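As a rough illustration of Equation (4.4), the following sketch multiplies three pluggable compatibility functions and applies $g(x) = x^{-d}$. The function name `tgi_force`, the dictionary-based edge storage and the callable signatures are hypothetical choices for this example, and the force is assumed to act along the direction from $p_{e_i}$ towards $p_{e'_i}$, as in FDEB.

```python
import math

def tgi_force(e, i, edges, G, I, T, d=1, eps=1e-9):
    """Electrostatic force on control point i of edge e, in the spirit of Eq. (4.4).

    edges:   dict mapping an edge id to its list of (x, y) points; all edges are
             assumed to have the same number of control points.
    G, I, T: callables (e, e2) -> float giving the geometry, importance and
             topology compatibilities (supplied by the caller).
    """
    px, py = edges[e][i]
    fx = fy = 0.0
    for e2, pts in edges.items():
        if e2 == e:
            continue
        weight = G(e, e2) * I(e, e2) * T(e, e2)
        if weight == 0.0:
            continue  # any zero compatibility switches the interaction off
        qx, qy = pts[i]
        dist = math.hypot(qx - px, qy - py)
        if dist < eps:
            continue
        mag = weight * dist ** (-d)  # g(dist) = dist^(-d)
        fx += mag * (qx - px) / dist
        fy += mag * (qy - py) / dist
    return fx, fy
```

Because the three measures enter as a product, setting any one of them to a constant 1 recovers a reduced model; for example, `T = lambda e, e2: 1.0` ignores topology.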

4.4.1.2 TGIP framework (2.5D)

In its most general form, the electrostatic force model is defined as:

$$F_{elec}(e_i) = \sum_{e' \in E} G(e, e') \cdot I(e, e') \cdot T(e, e') \cdot P(e, e') \cdot g(|p_{e_i} - p_{e'_i}|) \tag{4.5}$$

where $P(e, e')$ is the new plane compatibility. Note that we extend the geometry compatibility $G(e, e')$, which is defined for 2D drawings, for use with 2.5D drawings: the geometry compatibility is then defined using 3D vertex positions.

Our new framework TGI-EB is very general and flexible. For example, one can derive various models by controlling the weight parameters between $G(e, e')$, $I(e, e')$, $T(e, e')$ and $P(e, e')$. Furthermore, one can define specific metrics for geometry compatibility, importance compatibility, topology compatibility and plane compatibility.

4.4.2 Centrality based edge bundling (CenEB)

As an example of how to define importance compatibility, we use edge centrality (see Section 2.4.2.2). Centrality is one of the most well-known network analysis methods; it determines the relative prominence of vertices and edges in a network [318, 52]. For instance, edge centrality analysis, which finds the important edges, has been used for mesh coarsening, analyzing biological networks and community detection.

Our CenEB is a special case of the general model TGI-EB described in Equation (4.4), in which importance compatibility and geometry compatibility are integrated and $T(e, e')$ is absent. We use the edge centrality metric to highlight important edges and to bundle high-centrality edges together. The most general form of our electrostatic force model for CenEB is defined by:

$$F_{elec}(e_i) = \sum_{e' \in E} G(e, e') \cdot I(e, e') \cdot g(|p_{e_i} - p_{e'_i}|) \tag{4.6}$$

where:

• $C(e)$ is the set of compatible edges of $e$;

• $I(e, e')$ is calculated based on the centrality values of the edges $e$ and $e'$;

• $g = |p_{e_i} - p_{e'_i}|^{-d}$, where $d$ is a numeric constant.
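The text leaves the concrete form of $I(e, e')$ open. The sketch below shows one possible instantiation, and both of its ingredients are assumptions made for illustration only: edge betweenness (computed with a Brandes-style breadth-first accumulation for unweighted graphs) is used as the edge centrality, and $I(e, e')$ is taken to be the product of the two centralities after normalization by the maximum, so that it is close to 1 only when both edges are highly central.

```python
from collections import deque

def edge_betweenness(adj):
    """Edge betweenness of an unweighted, undirected graph (Brandes-style).

    adj maps each vertex to the set of its neighbours; the result maps
    frozenset({u, v}) to the number of shortest paths through that edge.
    """
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist = {s: 0}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in adj}  # shortest-path predecessors
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):     # accumulate dependencies back towards s
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((v, w))] += c
                delta[v] += c
    # Every unordered vertex pair was counted from both of its end points.
    return {e: b / 2.0 for e, b in bet.items()}

def importance_compatibility(c_e, c_e2, c_max):
    """Hypothetical I(e, e') in [0, 1]: high only if both edges are central."""
    if c_max <= 0:
        return 0.0
    return (c_e / c_max) * (c_e2 / c_max)
```

Other choices, such as a different centrality measure or a min/max ratio that favours edges of similar centrality, would fit the same slot in Equation (4.6).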

4.4.3 Topology based edge bundling (TopoEB)

As an example of topology compatibility, here we use a flat clustered graph model. In this simple clustered graph model, we assume that a graph $G = (V, E)$ consists of a set of clusters, each of which contains a set of nodes (a subset of $V$) and edges (a subset of $E$). This flat model does not allow nested clusters. Section 2.2.4.1 describes more general clustered graph models (for example, a clustered graph $C = (G(V_G, E_G), H(V_H, E_H))$ is defined from a graph $G$ with an additional cluster hierarchy tree $H$). We use such a general clustered graph model to define 2.5D edge bundling (see Section 4.4.6).

With a clustered graph model, an edge that connects two nodes of the same cluster is called an intra-cluster edge, while an edge connecting two nodes from different clusters is called an inter-cluster edge. Using the topology of the clustered graphs, we can define topology compatibility as follows:

• Two intra-cluster edges are not topology-compatible unless they belong to the same cluster;

• All inter-cluster edges are topology-compatible; in fact, they all belong to the root cluster $G$;

• A pair of an intra-cluster edge and an inter-cluster edge is not topology-compatible.

The metric is defined in three cases, depending on whether the edges $e$ and $e'$ are intra-cluster edges in the same cluster (intra-intra), inter-cluster edges (inter-inter), or one inter-cluster edge and one intra-cluster edge (inter-intra).

The benefits of topology compatibility in the clustered graph model are two-fold.

• First, by using topology compatibility, the number of compatible edges $C(e)$ of an edge $e$ can be significantly reduced, which results in faster bundling iterations.

• Second, for better flexibility, one can define a topology compatibility metric $T(e, e')$ that allows bundling intra- and inter-cluster edges together.

As an example of integration for TopoEB, we can integrate topology compatibility with geometry compatibility. TopoEB is then a special case of the general model TGI-EB in Equation (4.4), and can be described as follows:

$$F_{elec}(e_i) = \sum_{e' \in E} G(e, e') \cdot T(e, e') \cdot g(|p_{e_i} - p_{e'_i}|) \tag{4.7}$$

where $T(e, e')$ is defined from the clustered graph model for two edges $e$ and $e'$.

More generally, TopoEB can also integrate importance compatibility. That is, TopoEB can be described from Equation (4.4) as follows:

$$F_{elec}(e_i) = \sum_{e' \in E} G(e, e') \cdot I(e, e') \cdot T(e, e') \cdot g(|p_{e_i} - p_{e'_i}|) \tag{4.8}$$

where:

• $I(e, e')$ is defined based on the centrality values of $e$ and $e'$;

• $T(e, e')$ is defined from the clustered graph model.

Note that inter-cluster edges often have higher edge centralities than intra-cluster edges.

Two specific variations, TopoEB-A and TopoEB-B, are described below.

4.4.3.1 TopoEB-A

For example, the metric $T(e, e')$ can be simply defined as:

• $c_{intra}$: if $e$ and $e'$ are intra-cluster edges in the same cluster;

• $c_{inter}$: if $e$ and $e'$ are inter-cluster edges;

• $c_{mix}$: if $e$ and $e'$ are a pair of an intra-cluster edge and an inter-cluster edge;

where the constants $c_{intra}$, $c_{inter}$ and $c_{mix}$ are chosen from 0 to 1. Note that when these constants are chosen equal, the value $T(e, e')$ is the same for every pair of edges, and thus the method is said to be topology-insensitive. When $c_{mix}$ is zero, there is no bundling between intra-cluster edges and inter-cluster edges. Generally, one may choose a value close to 1 for $c_{intra}$ and $c_{inter}$ and a small value for $c_{mix}$.
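The case analysis of TopoEB-A can be written as a small lookup. In this sketch the function name `topo_compat_a`, the representation of an edge as a pair of vertices, the `cluster_of` mapping and the default constants are all hypothetical. The three cases above do not state a value for two intra-cluster edges from different clusters; the sketch assumes 0 for that case, following the rule in Section 4.4.3 that such edges are not topology-compatible.

```python
def topo_compat_a(e, e2, cluster_of, c_intra=1.0, c_inter=1.0, c_mix=0.1):
    """Sketch of the TopoEB-A metric T(e, e') for a flat clustered graph.

    e, e2:      edges given as (u, v) vertex pairs.
    cluster_of: dict mapping each vertex to its cluster id.
    """
    def kind(edge):
        cu, cv = cluster_of[edge[0]], cluster_of[edge[1]]
        # An intra-cluster edge remembers its cluster; an inter-cluster edge does not.
        return ("intra", cu) if cu == cv else ("inter", None)

    k1, k2 = kind(e), kind(e2)
    if k1[0] == "intra" and k2[0] == "intra":
        # Assumption: intra-cluster edges of different clusters are incompatible.
        return c_intra if k1[1] == k2[1] else 0.0
    if k1[0] == "inter" and k2[0] == "inter":
        return c_inter
    return c_mix  # one intra-cluster edge and one inter-cluster edge
```

Returning 0 for incompatible pairs also allows those pairs to be dropped from the candidate set before the force loop, which is the speed benefit noted in Section 4.4.3.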

4.4.3.2 TopoEB-B

With TopoEB-A, all pairs of inter-cluster edges (inter-inter) are treated equally. Thus, the bundled results from TopoEB-A are sometimes not easy to analyse for a particular pair of clusters. To improve the bundled results, we propose a new model that extends TopoEB-A by distinguishing between different kinds of inter-inter pairs. That is, the new model prefers to bundle inter-cluster edges from the same pair of clusters (e.g., cluster A and cluster B), or inter-cluster edges starting or ending at the same cluster. The inter-inter case of TopoEB-A is thus split into the following three cases in TopoEB-B:

• $c_{inter\text{-}same}$: if $e$ and $e'$ are inter-cluster edges from the same pair of clusters;

• $c_{inter\text{-}adj}$: if $e$ and $e'$ are inter-cluster edges either starting or ending at the same cluster, but not having the same pair of clusters;

• $c_{inter\text{-}other}$: for the other cases of inter-cluster edges $e$ and $e'$;

where $c_{inter\text{-}same}$, $c_{inter\text{-}adj}$ and $c_{inter\text{-}other}$ are constants from 0 to 1. When these constants are chosen equal, TopoEB-B is the same as TopoEB-A. Generally, one may choose a value close to 1 for $c_{inter\text{-}same}$ to simplify the connections between any given pair of clusters. One may choose a high value for $c_{inter\text{-}adj}$ to improve the traceability of edges to or from the same cluster.
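The three inter-inter cases of TopoEB-B reduce to comparing the unordered cluster pairs of the two edges. The following sketch assumes, for illustration only, the function name `inter_compat_b`, edges given as vertex pairs, a `cluster_of` mapping and arbitrary default constants.

```python
def inter_compat_b(e, e2, cluster_of, c_same=1.0, c_adj=0.6, c_other=0.2):
    """Sketch of the TopoEB-B refinement for two inter-cluster edges.

    e, e2:      edges given as (u, v) vertex pairs.
    cluster_of: dict mapping each vertex to its cluster id.
    """
    pair1 = frozenset((cluster_of[e[0]], cluster_of[e[1]]))
    pair2 = frozenset((cluster_of[e2[0]], cluster_of[e2[1]]))
    if len(pair1) != 2 or len(pair2) != 2:
        raise ValueError("both edges must be inter-cluster edges")
    if pair1 == pair2:
        return c_same   # same pair of clusters
    if pair1 & pair2:
        return c_adj    # exactly one cluster in common
    return c_other      # no cluster in common
```

Setting the three constants to the same value makes the function constant on inter-inter pairs, which corresponds to the TopoEB-A behaviour.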

4.4.4 Radial edge bundling (RadEB)

We now present another variation of edge bundling, called Radial bundling (RadEB), which uses a radial layout consisting of concentric circles for the input of edge bundling. The radial layout can be used to display hierarchy or k-core analysis of graphs. As a specific example in this chapter, we used k-core analysis to define a radial layout.

Figure 4.4: Forces in radial layout: radial forces and clustering forces

An important group-level network analysis task is to identify cohesive subgroups of actors with strong ties [318, 52]. A well-known example is the k-cores of a graph, each of which is a maximal connected subgraph whose nodes have induced degree at least $k$ [52]. The k-core analysis has been used in social networks, such as collaboration networks, and in biological networks for analyzing PPI networks.

For a radial layout, we use forces to constrain the vertices $u$ in each k-core to a circle of radius $r_u = f(k)$, where $f$ is typically a linear function, although we have also used logarithmic functions. The forces place vertices from the same k-core along the same circle.

We integrate the standard force-directed layout method with a new radial force for each vertex $u$:

$$F_{rad} = c_{rad} \left( |p_u - p_o| - r_u \right)$$

where $c_{rad}$ is a constant and $o$ is the center of the circles.
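A minimal sketch of the radial force follows. The formula gives only a signed magnitude, so the sketch assumes that the force acts along the line through the center $o$ and pulls the vertex back towards its target circle. The linear radius function is likewise one hypothetical choice of $f$, with higher cores mapped to smaller radii so that more important elements sit closer to the center; the names and default constants are not from the thesis.

```python
import math

def radial_force(p_u, p_o, r_u, c_rad=0.5):
    """Force vector pulling vertex u towards the circle of radius r_u around o."""
    dx, dy = p_u[0] - p_o[0], p_u[1] - p_o[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return (0.0, 0.0)  # direction undefined exactly at the center
    mag = c_rad * (dist - r_u)  # positive when the vertex is outside its circle
    # Assumed direction: towards the center if outside, away from it if inside.
    return (-mag * dx / dist, -mag * dy / dist)

def radius_for_core(k, k_max, r_max=100.0):
    """Hypothetical linear f(k): core k_max is innermost, core 1 is at r_max."""
    return r_max * (k_max - k + 1) / k_max
```

A vertex that already lies on its circle receives zero radial force, so the other layout forces alone decide where along the circle it settles.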

4.4.4.1 Clustering constraints

We further extended RadEB to handle clustering constraints. We introduce a similarity clustering force, which attracts vertices with close similarity indices together. Thus, a new attraction force is applied between every pair of vertices $u$ and $v$:

$$F_{clus} = \sum_{(u,v) \in E} s(u, v) \cdot \exp(-|i_u - i_v|)$$

where $s$ is some distance function, and $i_u$ and $i_v$ are the clustering indices of $u$ and $v$, respectively. The clustering index can be defined based on the application: for example, functional similarity for biological networks, and group membership for social networks. Figure 4.4 shows an example of radial forces and clustering forces.

Note that our radial layout is different from the k-core visualization by Alvarez et al. [18], which produces a radial layout using polar coordinates. In fact, our model is more flexible, since we can further combine clustering constraints. After producing the radial layout for visualizing k-cores, we apply our CenEB to the resulting layout for radial bundling.

4.4.5

Orthogonal edge bundling (OrthEB)

We also present a new variation of edge representation for edge bundling, called OrthEB, which produces orthogonal-like edge bundles. Orthogonal edge bundling can be effective for producing bundles with right-angle crossings. More specifically, we adapt the forces in CenEB using magnetic field forces [294] to produce orthogonal-like bundled edges. Figure 4.5a and Figure 4.5b show examples of the forces applied in each iteration of CenEB and OrthEB, respectively.


4. TGI-EB: EDGE BUNDLING INTEGRATING TOPOLOGY, IMPORTANCE AND GEOMETRY

Figure 4.5: Examples of forces in CenEB and OrthEB. (a) Two interacting edges e and e′, with the spring and electrostatic forces on a control point e2. (b) Orthogonal forces on edge e.

The orthogonal forces are applied on the control points of each edge. The orthogonal force on point ei is based on the tangent of the subsegment ei−1 ei of the edge, and ei is sequentially moved towards the axis (either x-axis or y-axis) that forms a smaller angle. Consequently, sub-segments are placed almost horizontally or vertically. In the final drawing, splines are used to connect the control points in each edge to achieve aesthetically pleasing bundling effects.
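One pass of the orthogonal adjustment described above can be sketched as follows. This is a simplified illustration rather than the magnetic-field formulation of [294]: the function name, the `strength` parameter and the decision to keep the two end points of the edge fixed are assumptions.

```python
def orthogonal_step(points, strength=0.5):
    """Move each interior control point towards the nearer axis direction.

    For the sub-segment from e_{i-1} to e_i, if it is closer to horizontal
    the point e_i is moved vertically towards the height of e_{i-1};
    otherwise it is moved horizontally towards the x-position of e_{i-1}.
    Points are processed sequentially, so each step sees the updated
    position of its predecessor. End points stay fixed (assumption).
    """
    pts = [tuple(p) for p in points]
    for i in range(1, len(pts) - 1):
        (px, py), (x, y) = pts[i - 1], pts[i]
        dx, dy = x - px, y - py
        if abs(dx) >= abs(dy):
            y -= strength * dy  # flatten towards a horizontal segment
        else:
            x -= strength * dx  # straighten towards a vertical segment
        pts[i] = (x, y)
    return pts
```

With `strength=1.0` a single pass snaps every interior sub-segment to an axis; smaller values apply the adjustment gradually over several iterations, alongside the bundling forces.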

4.4.6

2.5D bundling (2.5D-EB)

In previous variations, we only consider drawings on a 2D plane. In this section, we present another variation of edge bundling that can be used for 2.5 dimensional (2.5D) visualization.

Previous work on 2.5D visualization includes, for example, clustered graph layouts in 2.5D [102, 169] and 2.5D visualizations for temporal-spatial analysis and visualization of graph thickness [99, 51]. There are a few studies of edge routing in 3D [69] and in two planes [75]. For example, VisLink [75] shows element-wise relationships between two visualizations drawn on two planes. Another example is the edge routing technique developed on top of a city view metaphor for software package analysis [69].

As an example of plane compatibility in 2.5D, we use a multi-plane graph model, which is specifically a general clustered graph model (see Section 2.2.4.1). A multi-plane graph model is a clustered graph C = (G(V_G, E_G), H(V_H, E_H)), which is defined from a graph G with an additional cluster hierarchy tree H. Nodes of G are the leaf nodes of the cluster hierarchy, and so V_G ⊂ V_H. The remaining nodes in V_H \ V_G are called cluster nodes. The edge sets E_G and E_H are disjoint (E_G ∩ E_H = ∅), because E_G contains only edges connecting the underlying graph nodes V_G, whereas E_H contains only edges between cluster nodes and edges between cluster nodes and graph nodes. For simplicity, we assume E_H is an empty set; that is, we only consider the edge set E_G.

Each cluster in this model is a plane. An edge that connects two nodes in the same plane is called an intra-plane edge, while an edge connecting two nodes from different planes is called an inter-plane edge. This multi-plane graph model is more general than the simple clustered graph model used by TopoEB, which is given in Section 4.4.3. In fact, a single plane in this multi-plane model may contain a clustered graph. Consequently, our 2.5D model subsumes the TopoEB model in Section 4.4.3.

Using the multi-plane graph model, we can define plane compatibility for 2.5D bundling:
• Two intra-plane edges are plane-compatible if and only if they belong to the same plane;
• All inter-plane edges are plane-compatible;
• A pair of an intra-plane edge and an inter-plane edge is not plane-compatible.

We adapt TopoEB to handle three-dimensional vertex positions. In particular, the computation of geometry compatibility G(e, e′) is generalized to 3D vertex positions.
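The three plane-compatibility rules above can be written as a small predicate. This is a minimal sketch: the representation of an edge as a pair of node identifiers and the `plane_of` mapping from node to plane are assumptions made for the example.

```python
def plane_compatible(e1, e2, plane_of):
    """Return True if edges e1 and e2 are plane-compatible.

    e1, e2: pairs (u, v) of node identifiers.
    plane_of: dict mapping each node to the plane (cluster) it lies on.
    """
    a1, b1 = plane_of[e1[0]], plane_of[e1[1]]
    a2, b2 = plane_of[e2[0]], plane_of[e2[1]]
    intra1, intra2 = a1 == b1, a2 == b2
    if intra1 and intra2:
        return a1 == a2  # two intra-plane edges: same plane required
    if not intra1 and not intra2:
        return True      # all inter-plane edges are compatible
    return False         # mixed intra-plane / inter-plane pair
```

A bundling implementation could use this predicate to filter the candidate set C(e) before any geometric compatibility is evaluated.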

With plane compatibility, the number of compatible edges C(e) of an edge e is reduced, which results in faster bundling iterations compared to the version without plane compatibility. Plane compatibility also gives better flexibility and better bundling results. One can define a plane compatibility metric P(e, e′) to guide the bundling of intra- and inter-plane edges together.

As an example of integration for 2.5D-EB, we integrate plane compatibility with geometry compatibility. 2.5D-EB is a special case of the general model TGI-EB in Equation (4.5), and can be described as follows:

$$F_{spring}(e_i) = \sum_{e' \in E} G(e, e') \cdot P(e, e') \cdot g(|p_{e_i} - p_{e'_i}|), \tag{4.9}$$

where P(e, e′) is defined from the clustered graph model for two edges e and e′.

As a more general example of integration for 2.5D-EB, we integrate plane compatibility with the TopoEB model. That is, 2.5D-EB can be described from Equation (4.5) as follows:

$$F_{spring}(e_i) = \sum_{e' \in E} G(e, e') \cdot I(e, e') \cdot T(e, e') \cdot P(e, e') \cdot g(|p_{e_i} - p_{e'_i}|), \tag{4.10}$$

where:
• I(e, e′) is defined based on the centrality values of e and e′,
• T(e, e′) is defined from the clustered graph model, and
• P(e, e′) is defined from the multi-plane graph model.
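The structure of Equations (4.9) and (4.10), a product of compatibility terms that weights a distance-dependent force between corresponding control points, can be sketched as follows. This is an illustration under stated assumptions: edges are stored as 2D polylines with the same number of control points, the compatibility terms are passed in as functions of two edge indices, and g(d) = 1/d is a placeholder since g is defined elsewhere in the thesis. All names are hypothetical.

```python
import math


def bundling_force(e_idx, i, polylines, compat_terms, g=lambda d: 1.0 / d):
    """Force on control point i of edge e_idx from all other edges.

    polylines: list of edges, each a list of (x, y) control points.
    compat_terms: functions term(e_idx, other_idx) -> weight, for example
        the G, I, T and P terms; their product weights each contribution.
    g: assumed distance kernel (placeholder, not taken from the thesis).
    """
    px, py = polylines[e_idx][i]
    fx = fy = 0.0
    for o_idx, other in enumerate(polylines):
        if o_idx == e_idx:
            continue
        weight = 1.0
        for term in compat_terms:
            weight *= term(e_idx, o_idx)
            if weight == 0.0:
                break  # incompatible pair: skip the remaining terms
        if weight == 0.0:
            continue
        qx, qy = other[i]
        d = math.hypot(qx - px, qy - py)
        if d < 1e-9:
            continue  # coincident control points exert no force
        magnitude = weight * g(d)
        fx += magnitude * (qx - px) / d
        fy += magnitude * (qy - py) / d
    return fx, fy
```

Passing only the G and P terms corresponds to the form of Equation (4.9), and adding the I and T terms to the form of Equation (4.10). Because a binary term such as plane compatibility zeroes the weight early, incompatible pairs cost almost nothing per iteration.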

4.4.7

Time complexity and implementation

Like force-directed edge bundling (FDEB), our TGI-EB traverses every pair of edges to determine compatible edges; thus it takes O(|E|^2) time for each iteration of applying forces. Our force-directed radial layout with clustering constraints takes O(|V|^2) time. Nevertheless, our experimental results show that our methods are quite fast for graphs with up to a few hundred nodes and two thousand edges: it took a few seconds to produce a nicely bundled layout.

We have implemented our new edge bundling methods using our own Java implementation of the k-core radial layout, a prototype implementation of FDEB from the jFlowMap project [1], and various clustered graph layouts [169] implemented in GEOMI [13]. Specifically, we developed the 2.5D edge bundling methods in GEOMI for 2.5D visualizations.

4.5

Experimental results

This section describes our experimental results of the TGI-EB framework with social networks, biological networks, geographic networks and clustered graphs. We also include our bundling results with 2.5D visualizations.

4.5.1

Social networks

Analysis of social networks commonly aims to identify the important actors, groups and connections within the network. We describe a case study of our approach on collaboration networks, where the task is to identify important researchers, research groups and collaboration patterns.

4.5.1.1

Data set

As an example of a social network, we use the 2010 Graph Drawing competition data set (available at http://www.graphdrawing.de/contest2010/; see also the Graph Drawing 2010 contest report [96]). The data represents research collaborations between researchers on Graph Drawing research papers from 2004 to 2010, derived from the GDEA database. The data set is a graph with 362 nodes and 942 edges. In the contest, researchers are represented by labelled nodes with fixed position and size, and edges connect researchers who have co-authored the same paper. The data set does not take into account the number of times an author appears, the number of actual collaborations between each pair of researchers, or the significance of the paper. Figure 4.6 depicts a simple visualization of the collaboration data set. The layout is from the input data. This visualization draws edges as straight lines.

Figure 4.6: Simple visualization of the collaboration network

We give some examples of visual analysis using our framework; these analysis results are intended for reference only, rather than for judging researchers, their collaborations or their papers.

4.5.1.2

Visual analysis

This section describes our analysis of the collaboration network (high-resolution figures are available at http://it.usyd.edu.au/∼qnguyen/edgebundling/). For importance analysis, the k-core is an important notion for determining cohesive groups (see Section 4.2.2.2). Table 4.1 shows the total number of nodes and the total number of edges for each k-core. Each coreness value k is associated with a color, as shown in the table.

Figure 4.7 shows a visualization of the collaboration network using colours from k-cores. The figure uses edge transparency to show edge betweenness centrality (see Section 4.2.2.1 for edge centrality). The vertices are colored from their k-core values, with the color scheme shown in Table 4.1. With this visualization, one can easily identify the major research groups. The largest collaborative group is a 13-core (red) of Spanish and German researchers; this comes from a paper by Wolff et al. [27]. The second largest group is an 11-core (blue) of an Australian clique, based on a paper by Ahmed et al. [13].

Figure 4.7: The collaboration network with k-cores

k        |V|    |E|    Color
2        319    901    ◆
3        235    771    ◆
4        151    582    ◆
5         97    429    ◆
6         56    281    ◆
7         42    218    ◆
8-11      26    157    ◆
12-13     14     91    ◆

Table 4.1: Collaboration network: k-cores and the color scheme

Figure 4.8 and Figure 4.9 show visualizations using our CenEB and OrthEB, respectively. In these figures, edges that have similar edge betweenness centrality are bundled together. The k-core values are used for coloring vertices, with the color scheme given in Table 4.1. In fact, an A0 poster of Figure 4.8 and Figure 4.9 (available at http://it.usyd.edu.au/∼qnguyen/edgebundling/highresolution/) was the second runner for the Graph Drawing 2010 contest prize. Compared to the original visualization in Figure 4.6, our bundling results are cleaner and make it easier to identify important edges. The figures enable the following visual analyses.

First, one can easily identify the major research groups and the major research collaborations between the groups. The largest collaborative group is a 13-core (red) of Spanish and German researchers; this comes from a paper by Wolff et al. [27]. The second largest group is an 11-core (blue) of an Australian clique, based on a paper by Ahmed et al. [13].

Second, the drawing clearly highlights researchers with high betweenness centrality: Brandes, Brandenburg, Kaufmann, Kobourov, Kratochvil, Liotta, Mutzel and Wolff.

Third, one can identify several important edges with high centrality values. These include, for example, collaborations between Kaufmann and Kobourov in [124]; between Kaufmann, Wolff and Symvonis in [42]; between Kratochvil and Wolff in [148]; between Brandes and Dwyer in [51]; between Brandes and Symvonis in [45]; and between Kobourov and Sander in [97, 98].

Finally, one can find a clique of four people with high betweenness centrality values: Brandenburg, Kobourov, Liotta and Mutzel, from their joint work [49].

Visualizations with radial layout: Further, we examine the data set without considering the given node positions. We show visualizations that apply edge bundling to a radial layout (see Section 4.4.4). Figure 4.10 shows a drawing of the collaboration network produced by the integration of RadEB and OrthEB. The figure shows a clear structure of the groups within different k-core circles. The innermost circle contains the 13-core group of researchers. The next circle contains the 11-core group of researchers. In other circles, each research group consists of several researcher nodes, which are placed close together and tightly linked by bundled edges. As another example of radial layout, Figure 4.11 depicts a visualization using RadEB, OrthEB and CenEB. The figure highlights the important collaboration paths between researchers. Strong collaboration paths become more visible and the orthogonal-like bundled edges also look appealing.

The results in Figure 4.10 and Figure 4.11 show interesting patterns of the collaboration network. However, we should emphasize again that the depicted patterns and findings should be considered within the limitations of the given data set. In particular, we do not aim to judge the importance of researchers, research groups or their research significance.

Figure 4.8: Collaboration network using CenEB

Figure 4.9: Collaboration network using OrthEB

Figure 4.10: Collaboration network with RadEB and CenEB (labelled nodes: Giuseppe Liotta, Antonios Symvonis, Michael Kaufmann, Ulrik Brandes, Franz Brandenburg, Alexander Wolff, Jan Kratochvil, Stephen Kobourov, Tim Dwyer, Petra Mutzel)

Figure 4.11: Collaboration network using RadEB, OrthEB and CenEB

4.5.2

Biological networks

In the life sciences, centrality analysis helps scientists to understand the underlying biological processes and has been successfully applied to different biological networks [336, 253]. A considerable amount of research on biological networks has shown correlations between specific centrality measures and functionally important properties [190, 333].

4.5.2.1

Data set

We use the same case study as in Section 4.1.1, but in more detail. This case study aims to identify new important regulatory elements and structures in a protein-protein interaction (PPI) network. We have examined the data set collaboratively with the bioinformatics scientists from Bielefeld. The PPI network is based on the NF-κB signal transduction system. The NF-κB transcription factor is one of the most investigated transcription factors in animals and humans [195]. The role of NF-κB in the nervous system has gained interest because of its involvement in synaptic processes, neurotransmission, and neuroprotection. We consider the NF-κB PPI network consisting of 778 nodes and 1868 edges. Figure 4.1 in the motivating example (see Section 4.1.1) shows a visualization of the PPI network using a force-directed layout.

4.5.2.2

Visual analysis

Table 4.2 shows the total number of nodes (N) and edges (E) for each coreness k of the graph data. The network has 14 levels of coreness.

k      1     2     3     4     5     6     7     8     9    10    11    12    13    14
N    778   293   179   138   114    89    80    70    63    58    55    50    45    39
E   1868  1384  1157  1034   939   816   762   693   638   594   564   510   450   374

Table 4.2: Statistical results of k-cores of the NF-κB network
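Per-k statistics of this kind can be computed by repeatedly pruning low-degree nodes. The following is a minimal sketch, not the code used to produce the table: it assumes an undirected simple graph given as an edge list, and it counts all nodes and edges that survive pruning at level k, without separating connected components.

```python
from collections import defaultdict


def kcore_sizes(edges):
    """Return {k: (number of nodes, number of edges)} for every non-empty k-core."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    sizes = {}
    k = 1
    while adj:
        # Remove nodes of degree < k until no such node is left.
        while True:
            low = [u for u, nbrs in adj.items() if len(nbrs) < k]
            if not low:
                break
            for u in low:
                for w in adj.pop(u):
                    if w in adj:
                        adj[w].discard(u)
        if not adj:
            break
        n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
        sizes[k] = (len(adj), n_edges)
        k += 1
    return sizes
```

Because the k-cores are nested, the pruning for level k + 1 can start from the graph left over at level k, which is what the loop above does.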

In the first example, we show the usefulness of our approach using the motivating example in Section 4.1.1. The visualization of the PPI network without edge bundling is depicted in Figure 4.1. Analyzing the network using this visualization is difficult due to the many crossings and overlaps between edges and nodes. In contrast, Figure 4.12 shows another visualization that uses a radial layout, with edges colored by their edge centrality values. One may find it more readable than the force-directed visualization in Figure 4.1. Furthermore, Figure 4.13 depicts a visualization of the PPI network using our RadEB, with important elements and paths highlighted (Figure 4.13 is a smaller version of Figure 4.2). The structure of the network is clearly depicted and important nodes can be identified easily from the concentric circles.

Figure 4.12: NF-κB network in radial layout and without bundling


Figure 4.13: NF-κB network using RadEB: (1) important elements are marked, (2) important edges are wider and less transparent, (3) six important pathways are circled

Figure 4.13 shows a visualization produced by RadEB. This figure clearly shows six significant paths indicating different cell functionality. The visualization also depicts several protein groups with specific functionality. In fact, our new visualization has inspired bioinformatics scientists to generate a new hypothesis, based on the six newly identified important paths and the important proteins around the paths. Some lab experiments are being conducted to verify the hypothesis. The importance of elements is shown from edge centralities (for example, through edge width and edge transparency). One can identify important proteins that directly influence the translocation of the NF-κB transcription factor. Further, proteins that act in similar biological processes are grouped together to form network structures and motifs. The case studies described in this section were conducted in collaboration with the biologists S. J. Janowski, J. Stoye and C. Kaltschmidt from the Faculty of Technology, University of Bielefeld, Germany.

Figure 4.14: PPI network using RadEB (zoomed)

For a second example, Figure 4.14 shows the zoomed-in network at the center. Here, the radial force is reduced slightly and the similarity-based force is increased. With this setting, vertices of close clusters (and thus more functionally similar) are placed closer together, while the important paths remain well highlighted. An important part of the NF-κB system is located at the center of Figure 4.14.

As a third example, Figure 4.15 shows the subgraph of the 50 edges of highest centrality values, which are located at the center of Figure 4.14. The visualization in Figure 4.15 applies our RadEB, in which all the values of centralities and k-cores are computed from the subgraph.

Figure 4.15: A subgraph of 50 highest centrality edges from the PPI network

The figure depicts some of the most important biological elements of the known system, such as TNFR-α, NF-κB IKK, IκB-α, IκB-β, IκB-ε. In addition, the picture enables us to identify potential regulatory structures and potential proteins that might play a crucial role within the system. The analyses have been conducted by the biologists to find these potential regulatory structures and potential proteins.

4.5.3

Geographic networks

For geographic networks, critical analyses include tasks such as flight scheduling and facility allocation. We extend our case study on airline networks to identify important airports and flights.

4.5.3.1

Data set

We evaluated our method on the US airlines network [81, 171]. The network contains 235 nodes and 2101 directed edges. Figure 4.16a shows the visualization of the airline network. As can be seen, the figure contains many edge crossings, and it is hard to see and compare the importance of the airlines and the flights.

4.5.3.2

Visual analysis

The k-core values in the US airlines network range from 1 to 13. We use the color scheme shown in Table 4.3 to color vertices in our visualizations of the airline network.

k        |V|               |E|                   Color
1-2      235, 201          1297, 1263            ◆
3        162               1190                  ◆
4        147               1146                  ◆
5        134               1097                  ◆
6        118               1021                  ◆
7-9      103, 92, 76       931, 854, 726         ◆
10-13    62, 47, 30, 23    602, 457, 275, 149    ◆

Table 4.3: US airlines network: k-cores and the color scheme

Figure 4.16b depicts a visualization of the US airline network using force-directed edge bundling [171]. This example shows that bundling helps to reduce the visual clutter of the airline network visualization. The bundled edges show clearer structures of the network and reveal hidden structures inside the network. Yet the visualization does not show the “importance” of routes within the network, and one may not be able to identify the important elements. Here, importance is meant in the sense of social network analysis: it reflects the significance of the airlines and flights based on their structural positions within the network, rather than their actual real-world frequency of flights or similar measures. On the other hand, Figure 4.17 shows a visualization of the same airlines network using our centrality-based edge bundling (CenEB) described in Section 4.4. The bundled edges highlight important airports and flights.

Figure 4.16: Visualizations of US airline network using FDEB [171]: (a) without bundling; (b) FDEB

Figure 4.17 and Figure 4.18 show the airlines network using CenEB and OrthEB, respectively. Airport nodes are colored with k-core values. One can identify several important flights: between SEA and DTW, between BIL and MSP, between LAX and SEA, and between BIL and ATL.

The airlines network using RadEB is shown in Figure 4.19. The figure shows that the most highly connected group consists of the 23 airports of the 13-core around the innermost circle, including important airports such as SEA, DTW, MSP, ATL, MEM and IAH. One can also identify all the important flights connecting the 13-core airports. Interestingly, an outlier was identified as depicted in the figure: BIL airport, in a low core, has several “important” flights (those connected to MEM, ATL, MSP and SEA airports). This is possibly because BIL is geographically located in the middle between MEM, ATL, MSP airports (the east) and SEA airport (the west), as shown in Figure 4.17 and Figure 4.18.

Figure 4.17: Visualization of US airline network using CenEB

Figure 4.18: Visualization of US Airline network using OrthEB

Figure 4.19: Visualization of US Airline network using RadEB

4.5.4

Clustered graphs

For clustered graphs, important analysis includes identifying the connections inside a group and the connections between groups, and finding group “leaders” and important actors connecting groups. We have experimented with randomly generated clustered graphs with different inter-cluster edge densities: sparse and dense. We use the clustered graph layouts of Ho and Hong [169] implemented in GEOMI [13].

4.5.4.1

Data set

This case study uses clustered graphs that are randomly generated. For the clustered graphs, we apply circular-circular clustered graph layout. In this circular-circular layout, the centres of the clusters are placed on a circle; and all the nodes of each cluster are drawn using a circular layout.
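The circular-circular layout described above can be sketched in a few lines. This is an illustrative sketch only and not the GEOMI implementation of [169]: the function name, the two radius parameters and the even angular spacing are assumptions.

```python
import math


def circular_circular_layout(clusters, big_radius=10.0, small_radius=2.0):
    """Place cluster centres on a large circle and nodes on small circles.

    clusters: list of lists of node identifiers, one list per cluster.
    Returns {node: (x, y)}.
    """
    positions = {}
    m = len(clusters)
    for c, nodes in enumerate(clusters):
        angle = 2.0 * math.pi * c / m
        cx = big_radius * math.cos(angle)
        cy = big_radius * math.sin(angle)
        n = len(nodes)
        for j, node in enumerate(nodes):
            theta = 2.0 * math.pi * j / n
            positions[node] = (
                cx + small_radius * math.cos(theta),
                cy + small_radius * math.sin(theta),
            )
    return positions
```

In practice the small radius would be chosen relative to the cluster size and the large radius so that neighbouring clusters do not overlap.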

4.5.4.2

Visual analysis

For all clustered graphs in this section, we use the same color for all nodes of the same cluster. We found that clustered graphs with sparse inter-cluster edges show weaker edge bundling effects than instances with dense inter-cluster edges. Thus, we present two examples with dense inter-cluster edges. The two instances were selected from randomly generated clustered graphs: one has 20 clusters consisting of 191 nodes and 2165 edges, and the other has 8 clusters consisting of 272 nodes and 2407 edges. Figure 4.20 shows the dense clustered graph with 20 clusters before and after using TopoEB, and Figure 4.21 shows the dense instance with 8 clusters before and after TopoEB.

Figure 4.20b and Figure 4.21b show our TopoEB results on the two clustered graphs using the circular-circular layout. The figures clearly show important inter-cluster and intra-cluster edges, and the clusters from intra-cluster edge bundles. Inter-cluster edge bundling is shown to be effective for dense clustered graphs.

Comparison of TopoEB-A and TopoEB-B: We also experiment with the TopoEB-A and TopoEB-B variants of our TopoEB. We compare different visualizations of the same graph: without edge bundling, with TopoEB-A and with TopoEB-B.

Figure 4.20: Dense clustered graph with 20 clusters in Circular-Circular layout: (a) without bundling; (b) using TopoEB

Figure 4.21: An 8-cluster clustered graph in Circular-Circular layout: (a) without bundling; (b) using TopoEB

Figure 4.22: A Circular-Circular visualization of a 9-cluster clustered graph: (a) unbundled; (b) FDEB

Figure 4.23: A 9-cluster clustered graph in Circular-Circular layout using TopoEB: (a) TopoEB-A; (b) TopoEB-B

Figure 4.22a shows a 9-cluster clustered graph without edge bundling. The figure suffers from edge clutter, and one may find it difficult to compare the clusters as well as to identify important edges. Figure 4.22b shows another visualization of the same clustered graph using FDEB, with less visual clutter. On the other hand, Figure 4.23 shows different results using TopoEB-A (see Figure 4.23a) and TopoEB-B (see Figure 4.23b). In these figures, edge centrality values are used to determine edge transparency; important edges are more visible than the others. By using TopoEB, important edges are grouped together, making it possible to identify the nodes and edges that are structurally and topologically important. Further, the results show that TopoEB-B is better than TopoEB-A, because it shows clearer structures of edge connections between clusters.

4.5.5

2.5D visualizations

For 2.5D visualizations, important analysis includes identifying important connections inside a plane and important connections between planes. It is relevant to locate important actors among those residing on a plane, and important actors that connect different planes. These analyses are critical to show temporal-spatial relationships in dynamic graphs [99] or graph structures [102, 51, 169].

4.5.5.1

Data set

As an example of 2.5D-EB, we used 2.5D visualizations of clustered graphs which are randomly generated by [169]. As another example of 2.5D-EB, we developed a more general model of clustered graphs with 3 nested levels and used the multi-plane drawings of the newly generated clustered graphs to demonstrate the usefulness of 2.5D-EB.

4.5.5.2

Visual analysis

For a first example, Figure 4.24a shows a clustered graph visualization in 2.5D [169] in GEOMI. The visual clutter in the figure is caused by the dense intra-cluster edges and inter-cluster edges. Figure 4.25 shows 2.5D visualizations of a clustered graph and the bundled results using our 2.5D-EB. Compared to the original straight-line drawing without edge bundling in Figure 4.24a, both 2.5D-EB models produce visualizations with less visual clutter and clearer high-level edge structure. In fact, TopoEB-B seems to show a better result than TopoEB-A, in that edges connecting a pair of clusters tend to be grouped together.

For a second example, we use a more general model of clustered graph with three nested levels. Figure 4.26 shows an example of this graph. The visualization in Figure 4.26 does not apply edge bundling. There are many edge crossings, and one may find it hard to do any analysis using this visualization. Figure 4.27 shows the result for the same graph after applying our TopoEB. The result looks cleaner and one can distinguish inter-plane and intra-plane edges.

Figure 4.24: 2.5D drawings of a clustered graph: (a) unbundled; (b) FDEB extended for 3D

Figure 4.25: 2.5D drawings of a clustered graph using 2.5D-EB: (a) 2.5D-EB with TopoEB-A; (b) 2.5D-EB with TopoEB-B

Figure 4.26: A clustered graph in multi-plane drawings

Figure 4.27: A clustered graph in multi-plane drawings


4.6

Concluding remarks

In this chapter, we have described a new approach that tightly integrates network analysis methods and edge bundling techniques to enable visual analysis of large dense networks. We have proposed a TGI-EB framework for edge bundling that integrates the following to show high-level patterns of the graphs:
• structural properties, such as k-core and centralities of the graphs,
• topological properties, such as clustered graphs,
• multiple-plane arrangements, such as in 2.5D visualizations.

We have also presented five variations of force-directed edge bundling based on the TGI-EB framework:
• CenEB (Centrality-based edge bundling),
• TopoEB (Topology-based edge bundling),
• RadEB (Radial edge bundling),
• OrthEB (Orthogonal edge bundling),
• 2.5D-EB (2.5 dimensional edge bundling).

We have evaluated our new approach with biological networks, social networks and geographic networks. In particular, the new approach has proved very useful in the analysis of the integrated NF-κB protein-protein interaction and signaling transduction networks. This has led to a potential use of our framework for the analysis of biological systems and for generating new hypotheses.

Future work: Our future work is to improve the running time to address the scalability problem for huge network instances; for example, by adapting the agglomerative edge bundling algorithm of Gansner et al. [134]. We also plan to design new criteria or metrics to evaluate the performance of edge bundling methods. We plan to generalize the magnetic field in our orthogonal edge bundling method to handle arbitrary angles rather than just 90 degrees, similar to the gradient computation in Strzodka et al. [291].

Although our prototype implementation supports basic zooming, our future work is to consider ways to interact with edge bundles more effectively, such as using semantic zoom [328].

Final remark: This chapter has described a model for edge bundling of static graphs. In the next chapter, we show our other model, which applies edge bundling to dynamic graphs.


Chapter

5

StreamEB: Stream Edge Bundling

“Imagination gives you the picture. Vision gives you the impulse to make the picture your own.” (Robert Collier)

5.1

Introduction

This chapter addresses the concerns of readability in stream graph visualizations. The main motivation is to reduce visual clutter in stream graph visualizations and to enable us to perform visual analysis of the massive graph data.

5.1.1

Graph streams

Data streaming has become ubiquitous since the late 1990s [166, 16, 116]. Graph streaming is recently increasingly popular in many areas and in research [34, 122, 115, 153, 270, 12]. Examples of graph streams are from numerous applications in a wide variety of application domains, such as social networks like Facebook and Twitter, flight scheduling systems and financial markets systems. Many applications nowadays are operating on large graph infrastructures. The streaming model permits processing of huge graph structures produced from modern applications such as in Telecommunication traffic [4], World-Wide Web [61] and Internet Data [114]. Other applications include continuous monitoring applications, telephone

153

5. STREAMEB: STREAM EDGE BUNDLING call graphs and webgraphs from continuous web crawling [77, 115, 132, 82, 11]. The update speed of these data streams can range in a few seconds, a few milliseconds to even a few microseconds. The graphs in these applications are not only too large to fit on the screen but they are in general too large to fit in the main memory. In graph streams, individual edges of the underlying graphs arrive sequentially in a stream. This sequential arrival of edges permits computations on streams and the processing often consumes only a small amount of memory. Furthermore, many real-world applications, such as real-time monitoring systems, impose a single-pass constraint over the data while others may allow a small number of passes. In graph streams, individual edges of the underlying graphs arrive sequentially in a stream and thus any computation on the stream consumes a relatively small amount of memory. A vast number of graph stream algorithms exist; for example, for computing social network metrics over graphs [34, 115, 153], for clustering algorithms for graph streams, such as, for node clustering via multi-dimensional data [8, 252, 12], and for stream reasoning [37]. Existing stream algorithms share common principles: • processing data streams on the fly; • exploiting the temporal order of the data stream to optimize the computation; • using one pass (or a small number of passes) on the data; and • requiring a workspace much smaller than the size of the data. However, a comparatively tiny number of research works have focused on visualization of the rapid changes such as in the context of graph streams [3, 295, 85, 72, 45]. Graph streams and dynamic graphs: There are a number of different types of dynamic graphs, which are distinguished from the characteristics. These types of dynamic graphs are depicted in Figure 5.1 and are described as follows: • Relational time series are dynamic graphs of time-stamped edges. 
• Graph streams, on the other hand, are updated very frequently, and edges arrive in a flow with or without a timestamp.


Figure 5.1: The classes of dynamic graphs

• A stricter class of dynamic graphs is the Streaming Relational Time Series, which contains graph streams of time-stamped tuples.

Stream graph visualization: Stream graph visualization differs from traditional dynamic graph visualization in several aspects:
• the graph stream can be very large, for example, along a long time dimension;
• the data changes very quickly (in seconds or faster) and arrives incrementally as a stream;
• the whole graph is not always available in the streaming context; and
• the response time may become a critical factor in designing visualization methods that show the evolution of graph streams.

In graph streams, large amounts of data are continuously collected and patterns change over time. This often requires on-the-fly analysis of current overviews of the graph streams.

5.1.2 Motivating example

Visual clutter is a significant problem in visualizations of large graphs, and therefore in graph stream visualization. Edge bundling is a popular technique to reduce visual clutter and show high-level edge patterns [135, 170, 81, 171, 214, 203, 246, 110, 280] (see Sections 2.3 and 4.2).

Figure 5.2: Visualizations of the stock trade data from TSX Venture-Canada equity exchange: (a) without bundling (Unbundled) and (b) after using FDEB

To illustrate the issue of visual clutter, we compare a straight-line drawing with a drawing produced by Holten and van Wijk’s force-directed edge bundling (FDEB) [171]. We use FDEB as an example of edge bundling because: (1) FDEB is the first edge bundling technique that can be applied to general graphs, (2) it has an intuitive force-directed model for bundling edges, and (3) we later show how the method can be extended with our new types of compatibility.

Figure 5.2a shows an example visualization of stock trading data from the TSX Venture-Canada equities exchange, obtained from Thomson Reuters (using a sequence-based window of size W = 160). Nodes represent traders and edges represent trades between traders. Node placement is computed with multi-dimensional scaling from trader performance. Nodes are colored by total trades, varying from blue (small amount) to red (large amount), and each edge is drawn with a color gradient between the colors of its end nodes. Visual clutter in this visualization hinders most analytic tasks. Figure 5.2b applies Holten and van Wijk’s force-directed edge bundling [171], revealing high-level edge patterns that are hidden in the unbundled graph.

However, most edge bundling methods have focused on static graphs. Further, compatibility between edges in existing methods is defined either geometrically or semantically. In this chapter, we show how to use other criteria, such as temporal information from the graph streams, and integrate them with topological and semantic properties to visualize graph streams.

5.1.3 Aims and contributions

This chapter initiates the study of stream edge bundling to support visual temporal analysis of changes in graph topology over time. More specifically, we show how edge bundling can reduce visual clutter in visualizations of graph streams. This can assist key mining tasks such as:
• Community discovery: how can edge bundles help to identify groups or communities of nodes/edges that are associated with each other?
• Change detection: how can edge bundles help to detect newly forming or decaying communities in the underlying network?

Our research is based on the premise that edge connection patterns may change rapidly, but “edge bundles” should change much more slowly. The term edge bundle, in both static and dynamic contexts, refers to a group of edges that are drawn closely together. Edge bundled layouts show the high-level structure of the network as a collection of edge bundles. Edge bundling methods may help unveil the extent to which communities have shrunk, split or emerged over time.

We present a new framework, namely StreamEB¹, which is the first work that addresses edge bundling for graph streams. Our main research addresses the flexibility (e.g., adaptivity to user inputs at runtime) of methods to extract high-level patterns in graph streams. Our StreamEB framework integrates several compatibility measures for stream bundling: temporal, neighborhood, data-driven and spatial compatibility. Among these metrics, temporal compatibility and neighborhood compatibility are introduced for the first time, whereas the other compatibility metrics are carefully adapted for fast graph streams.

Based on the framework, we present two bundling methods, called FStreamEB (Force-directed Stream Edge Bundling) and TStreamEB (Tree-based Stream Edge Bundling). We evaluate our framework using US flights data and Reuters stocks data to show its effectiveness in supporting various stream mining tasks. The main aim is the comprehension of massive relational time series that are dynamically generated from financial activities and flight monitoring activities. More specifically, we show how our framework can be used in the following scenarios:
• identifying important actors and groups of closely related actors, and
• locating time-varying (abnormal) patterns.

As an example, we show several visualizations using our new methods on the motivating example in Section 5.1.2. Figure 5.3a applies FStreamEB to the stock trade network. Figure 5.3b shows a layout in which every edge is routed along a quad tree, which is a hierarchical structure of nodes computed from node positions (for example, a quad tree has been used for fast graph drawing in FADE [267]). Figures 5.4a and 5.4b show the visualizations of TStreamEB applied to the stock trade network before and after edge smoothing.

¹ See pictures and movies at http://www.it.usyd.edu.au/~qnguyen/streameb/

Figure 5.3: Visualizations of the stock trade data from TSX Venture-Canada equity exchange: (a) edge bundled using S-FStreamEB and (b) edge routed along a quad tree.

Figure 5.4: Visualizations of the stock trade data from TSX Venture-Canada equity exchange using TStreamEB: (a) without curve smoothing and (b) after curve smoothing

In Figures 5.3a, 5.4a and 5.4b, the visualizations use a combination of spatial, temporal, data-driven and neighborhood compatibility (the ratios are 0.7, 0.1, 0.1 and 0.1, respectively). Our methods allow the ratios of the compatibility measures to be varied at runtime; such flexibility is absent from existing edge bundling methods.

The rest of this chapter is organized as follows. Section 5.2 reviews related work on stream mining and reasoning, dynamic graph visualization and edge bundling. We present the details of our StreamEB framework in Section 5.3 and our bundling algorithms in Section 5.4. We then discuss implementation in Section 5.5. Case studies applying our stream bundling methods to a collection of real-world datasets are given in Section 5.6. Section 5.7 gives some discussion and future work, and Section 5.8 concludes.

5.2 Related work

5.2.1 Stream algorithms

The history of streaming algorithms goes back to the late 1970s; see, for instance, [281]. The first publication formalizing streaming algorithms [166] appeared in 1998. Data stream algorithms then became popular following the well-known results of [16] on approximating frequency moments. Much of the existing work has focused on computing statistics of a stream of data elements, e.g., frequency moments [16, 185], $\ell_p$ distances [116, 184], histograms [144, 153], and quantiles [149]. Streaming has become an active area of research and an important paradigm for processing massive data sets; see [166, 16, 116].

There has been a considerable amount of research on mining graph streams [34, 115, 153], such as counting triangles, distance estimation, properties of degree sequences, connectivity and graph matching [34, 77, 115, 86, 193, 228, 82]. Stream algorithms involve trade-offs between the number of passes and space, for example for shortest path problems in graph streams [86]. In general, it is difficult to approximate many graph properties while maintaining space that is sub-linear in the number of vertices and using a constant number of passes over the stream. More details of graph mining algorithms may be found in [12]. Below, we give an overview of clustering and reasoning techniques.

Clustering: The problem of clustering has been studied extensively in the data mining literature, in the context of multi-dimensional data [252, 8], and recently in the context of graph data [122, 270, 12]. Graph clustering has traditionally been studied as node clustering of individual graphs. Node clustering aims to determine groups of nodes based on the density of linkage behavior, for example, in the context of graph partitioning [202], minimum-cut determination [197] and dense subgraph determination [143, 337]. Recently, graph-level clustering of graph streams has also been considered [7].

Reasoning: Stream reasoning, that is, reasoning upon rapidly changing information, has become an active area of research [37, 276, 92, 21, 119, 255]. It has been applied to sensor networks, healthcare, financial fraud detection and social media analysis. Stream reasoning uses rich static background knowledge (for example, from geographic maps) to reason about the resulting time-varying knowledge. The reasoning process often has strict time constraints, for example, from a few seconds down to a few milliseconds per query. When processing a “continuous” flow of information, recent information in the stream is often more relevant, because it describes the current state of a dynamic system better than outdated information. Commonly used techniques try to exploit this fact to gain better accuracy and efficiency in answering a user query.

Common techniques: Typically, a stream reasoner or miner selects the relevant data in the input stream by exploiting window-processing [23] and load-shedding techniques [29, 299]. More details of common techniques are given in Section 2.5.3.1 and Section 2.5.3.2.
• In window-processing [23], a window extracts from the stream the most recent data stream elements, which are then considered by the query. Such extraction can be sequence-based (a given number of tuples) or time-based (all the tuples which occur during a given time interval, the number of which varies over time).
• Load-shedding techniques are less common than window processing. Load shedding probabilistically drops stream data elements based on a set of sampling policies.
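Load shedding as described above can be sketched in a few lines of Python. This is only a minimal illustration under the simplest possible sampling policy, where each element is kept independently with a fixed probability. The function name `shed_load` and the parameter `keep_prob` are hypothetical and are not taken from [29, 299].

```python
import random
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def shed_load(stream: Iterable[T], keep_prob: float,
              rng: Optional[random.Random] = None) -> Iterator[T]:
    """Yield each element with probability keep_prob; drop it otherwise."""
    if not 0.0 <= keep_prob <= 1.0:
        raise ValueError("keep_prob must lie in [0, 1]")
    if rng is None:
        rng = random.Random()
    for element in stream:
        # Uniform sampling policy (assumed); a real policy could instead
        # weight elements by recency or importance.
        if rng.random() < keep_prob:
            yield element
```

Because the function is a generator, it handles the stream in a single pass and never stores more than the current element, in line with the stream-processing principles listed earlier.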

5.2.2 Dynamic graph visualization

Previous research in graph visualization has focused on static graphs or sets of static graphs of relatively modest size. A comparatively small number of works have focused on the visualization of graph streams [3, 295, 85, 72, 45]. Visualization of graph streams is more difficult due to the dynamics of the streams. When analyzing dynamic graphs, it is critical to be able to see statistical trends and changes over time while preserving the user’s mental map [232]. The most common techniques for representing temporal data are animation and the “small multiples” display. Dynamic graph visualization has several challenges:

Quality and stability: The first challenge is the balance between readability and stability when visualizing graph changes. A common technique for drawing graphs is stress majorization [136]. The most common approach to stability is to initialize the layout of each graph with the preceding layout [178, 233]; yet layout readability may degrade over the sequence of graphs. Other approaches address stability by placing vertices at fixed locations taken from the layout of an aggregate of all graphs [233], by anchoring vertices to reference positions [56], or by linking vertices to instances of themselves that are close in the sequence [111]. A comprehensive study of existing models and the trade-offs between stability and readability is given by Brandes et al. [53].

Visual clutter: The second challenge is that visualizations of large graphs often suffer from visual clutter. Edge bundling [170, 338, 81, 214, 203, 246, 280] is one of the most popular approaches to reduce visual clutter and to detect clusters in static graph visualizations.

5.2.3 Edge bundling

Edge bundling has been quite successful in reducing clutter and has been extensively studied, including hierarchical edge bundling [170], geometry-based edge clustering [170, 338, 81, 214], force-directed edge bundling [171, 203, 246, 280] and multi-level agglomerative edge bundling [134].

However, most existing bundling techniques are not directly concerned with dynamic graphs. An isolated study [48] applies force-directed edge bundling [171] to yearly migration graphs and then displays the bundled results in small multiples for analysis of US migration flows over the years. Another work proposes the use of edge bundling in a so-called 1.5D visualization [284]; a limited number of time points are placed at the center of the proposed metaphor, and edges of the same time are routed via the time nodes.

In this section, we describe two common edge bundling algorithms, which we then adapt for our stream bundling algorithms.

5.2.3.1 Hierarchical edge bundling

Holten proposes hierarchical edge bundling (HEB) [170] to visualize input data comprising a graph and a hierarchy. HEB draws edges along their associated paths in the hierarchy: it applies tree algorithms to the input hierarchy and then bundles adjacency edges along the tree branches. Intuitively, HEB uses a “single step” interpolation on the geometry of individual edges, with the “backbone” tree acting as the compatibility between pairs of edges. This single interpolation can be expressed as $\mathrm{inter}(\mathrm{ctrl}(e_i), e_i, \beta)$, where:
• $\mathrm{ctrl}(e_i)$ is the set of control points of an edge $e_i$;
• $\mathrm{inter}(S, e_i, \beta)$ denotes the set of interpolation points of a point set $S$ with respect to $e_i$; and
• $\beta \in [0, 1]$ is a bundling strength factor.

An example of the interpolation inter proposed in HEB is given as follows:

$$p'_k = \beta \cdot p_k + (1 - \beta)\left(p_{e_i^{start}} + \frac{k+1}{P+1}\left(p_{e_i^{end}} - p_{e_i^{start}}\right)\right), \qquad (5.1)$$

where:
• $P$ is the number of control points;
• $p_k$ is the control point at index $k \in \{0, \ldots, P-1\}$ in the point set $S$;
• $p'_k$ is the interpolated point of $p_k$, so that $\mathrm{inter}(S, e_i, \beta) = \{p'_k\}$;
• $p_x$ is the location of $x$; and
• $e_i^{start}$ and $e_i^{end}$ are the start and end of $e_i$.
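The interpolation of Equation (5.1) can be sketched in Python as follows. The function name `heb_interpolate` and the representation of points as 2D tuples are assumptions made for illustration; the arithmetic follows the equation term by term.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def heb_interpolate(ctrl: List[Point], start: Point, end: Point,
                    beta: float) -> List[Point]:
    """Apply Equation (5.1) to every control point p_k, k = 0..P-1."""
    P = len(ctrl)
    result = []
    for k, (px, py) in enumerate(ctrl):
        s = (k + 1) / (P + 1)  # fraction along the straight start-end line
        line_x = start[0] + s * (end[0] - start[0])
        line_y = start[1] + s * (end[1] - start[1])
        # beta = 1 keeps the control point (full bundling along the tree);
        # beta = 0 moves it onto the straight line (no bundling).
        result.append((beta * px + (1 - beta) * line_x,
                       beta * py + (1 - beta) * line_y))
    return result
```

For a single control point (1, 2) on an edge from (0, 0) to (2, 0), beta = 0.5 gives the point halfway between the control point and the midpoint of the straight edge.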

5.2.3.2 Force-based edge bundling

Holten and van Wijk [171] introduce the Force-directed Edge Bundling (FDEB) method, which models edges intuitively as flexible springs that can attract each other. The FDEB algorithm first inserts control points in each edge, and then uses a force-directed method to compute the positions of the control points. The attractive forces depend on the so-called “spatial compatibility” $S(e_i, e_j)$. For a subdivision point $e_i^{(l)}$ on edge $e_i$, the total force $F(e_i^{(l)})$ exerted on $e_i^{(l)}$ is the sum of the two spring forces $F_{spring}$ exerted by its two neighbors $e_i^{(l-1)}$ and $e_i^{(l+1)}$, and the total of the electrostatic forces $F_{elec}$:

$$F(e_i^{(l)}) = F_{elec} + F_{spring} \qquad (5.2)$$

$$F_{spring} = k_e \left( \left| p_{e_i^{(l-1)}} - p_{e_i^{(l)}} \right| + \left| p_{e_i^{(l)}} - p_{e_i^{(l+1)}} \right| \right), \qquad (5.3)$$

where $k_e$ is the stiffness of the edges. The electrostatic force on a control point $e_i^{(l)}$ of $e_i$ in FDEB is defined by:

$$F_{elec} = \sum_{e_j \in \mathcal{C}(e_i)} S(e_i, e_j) \cdot \left| p_{e_i^{(l)}} - p_{e_j^{(l)}} \right|^{-d}, \qquad (5.4)$$

where $\mathcal{C}(e_i)$ is the set of compatible edges of $e_i$, and $S(e_i, e_j)$ is the spatial compatibility for a pair of edges $e_i$ and $e_j$. To speed up the iterative computation of forces, $\mathcal{C}(e_i)$ contains only those edges $e_j$ whose spatial compatibility $S(e_i, e_j)$ is greater than a user-defined threshold $t \in [0, 1]$.
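A minimal Python sketch of the force on one subdivision point is given below. The equations above specify force magnitudes only; the sketch assumes that each spring force points toward the corresponding neighbor and that each electrostatic force points toward the matching subdivision point of the compatible edge. The function name `fdeb_force` and its parameter layout are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float]


def fdeb_force(points_i: Sequence[Vec], l: int,
               compatible: Sequence[Tuple[float, Sequence[Vec]]],
               k_e: float = 0.1, d: float = 1.0) -> Vec:
    """Force on interior subdivision point l of edge e_i.

    compatible holds one (S(e_i, e_j), subdivision points of e_j) pair per
    edge e_j in C(e_i); every edge must use the same number of points.
    """
    px, py = points_i[l]
    fx = fy = 0.0
    # Spring part, Equation (5.3): magnitude k_e * distance to each neighbour,
    # so the vector toward a neighbour q is simply k_e * (q - p).
    for qx, qy in (points_i[l - 1], points_i[l + 1]):
        fx += k_e * (qx - px)
        fy += k_e * (qy - py)
    # Electrostatic part, Equation (5.4): magnitude S * distance^(-d),
    # directed toward the corresponding point on the compatible edge.
    for s_ij, points_j in compatible:
        dx, dy = points_j[l][0] - px, points_j[l][1] - py
        dist = math.hypot(dx, dy)
        if dist > 0.0:  # coincident points exert no force
            magnitude = s_ij * dist ** (-d)
            fx += magnitude * dx / dist
            fy += magnitude * dy / dist
    return (fx, fy)
```

On a straight, evenly subdivided edge the two spring forces cancel, so the net force comes entirely from the compatible edges.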

FDEB’s successors: Several works have adapted FDEB to cope with different bundling criteria in various application domains [203, 246, 280]; for example, for analyzing semantic graphs [203], for analyzing important connections and cluster-graphs [246], and for analyzing connectivity and edge directions [280].

Edge compatibility measures: A number of compatibility measures have been proposed [171, 203, 246, 280]:
• spatial compatibility is defined based on geometry mapping, for example, to avoid bundling edges that differ greatly in length, position, angle and visibility [171];
• semantic compatibility avoids bundling dissimilar multi-attributed edges of static graphs [203];
• connectivity compatibility avoids bundling edges in different disconnected components [280];
• importance compatibility avoids bundling edges of very different importance together [246]; and
• topology compatibility guides bundling between inter-cluster and intra-cluster edges for clustered graphs [246].

In this chapter, we introduce temporal compatibility measures and adapt existing compatibility measures for stream bundling. Furthermore, force-directed methods and tree-based methods that directly target graph streams are introduced.

5.3 StreamEB Framework

5.3.1 Problem definition and notation

A graph stream has an underlying graph $G = (V, E)$, where $V$ is the vertex set and $E$ is the edge set of $G$. The graph stream is a sequence of elements $e_0, e_1, \ldots, e_i, \ldots$. Each element $e_i$ has the general form $(x_i, y_i, t_i, [d_i], a_i^1, \ldots, a_i^m)$, where:
• $(x_i, y_i)$ is a directed edge encountered at timestamp $t_i$, where $x_i$ and $y_i$ are nodes in $V$;
• $d_i$ is the duration associated with the edge $(x_i, y_i)$ at time $t_i$. The edge $e_i$ exists from time $t_i$ to time $t_i + d_i$, unless a new edge between the same nodes $(x_i, y_i)$ arrives at a time $t_j$ such that $t_j < t_i + d_i$. In some applications $d_i$ is not well-defined;
• each incoming streamed edge has multiple attributes $a_i^1, \ldots, a_i^m$, where $a_i^j$ denotes the $j$-th attribute associated with the edge $(x_i, y_i)$ at time $t_i$.

An edge $(x_i, y_i)$ may appear multiple times at different timestamps, but no edge occurs multiple times at any one timestamp. There are no self-loops. Multiple distinct edges may occur at a single timestamp.

To process a long stream, several techniques can be applied, for example, sliding windows and load shedding (see Section 2.5.3 for details). Among these techniques, the sliding window is commonly used to select the more recent elements. Two types of sliding windows are:
• A sequence-based window $G_W^{(S)}(n)$ consists of the $W$ most recent data elements, from the $\max(0, n - W + 1)$-th to the $n$-th data element.
• A time-based window $G_W^{(T)}(t_i)$ at time $t_i$ consists of the data elements arriving within the last $W$ time units, with timestamps from $t = \max(0, t_i - W + 1)$ to $t_i$.

Our methods are insensitive to whether the sliding windows are sequence-based or time-based. We simply use $W_i$ to denote the current window of some size $W$, and $G_i = (V, E_i)$ to denote the graph at the current window $W_i$. Typically, the set of interactions in a particular sliding window of a graph stream is of modest size, although the total number of distinct edges may be very large in the aggregate data. This property is known as sparsity and has been observed in a variety of real applications, such as social networks and collaboration networks.
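The stream element and the two window types defined above can be sketched in Python as follows. The class name `StreamEdge` and the function names are hypothetical, and the windows are computed by slicing a stored list purely for illustration; a real streaming implementation would more likely maintain the window incrementally, for example with a bounded queue.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class StreamEdge:
    """One stream element (x_i, y_i, t_i, [d_i], a_i^1, ..., a_i^m)."""
    x: str
    y: str
    t: float
    d: Optional[float] = None          # duration; may be undefined
    attrs: Tuple[float, ...] = ()      # the m attributes of the edge


def sequence_window(stream: List[StreamEdge], n: int, W: int) -> List[StreamEdge]:
    """Elements max(0, n-W+1) .. n (0-based, inclusive): the W most recent."""
    return stream[max(0, n - W + 1): n + 1]


def time_window(stream: List[StreamEdge], t_i: float, W: float) -> List[StreamEdge]:
    """Elements with timestamps from max(0, t_i-W+1) to t_i."""
    lo = max(0, t_i - W + 1)
    return [e for e in stream if lo <= e.t <= t_i]
```

Note that a sequence-based window has a fixed maximum size, whereas the number of elements in a time-based window varies with the arrival rate.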

5.3.2 General model for stream bundling

Figure 5.5: Stream Bundling Pipeline

The general pipeline for stream bundling and our StreamEB framework are depicted in Figure 5.5.
• Data Input includes the main input source (Stream Data), a sequence of tuples $(x_i, y_i, t_i, d_i, a_i^1, \ldots, a_i^m)$, and an optional input (Input Geometry) giving (reference) locations for the graph vertices.
• The StreamEB framework consists of several modules. The Mapping module may take the optional Input Geometry and/or use a graph layout algorithm to compute node placement. Central is the Routing module, which takes Geometric Data and performs Edge Bundling for streams. The bundling process may provide feedback to the node placement to better cope with changes in graph streams and user inputs.
• User Input includes user interactions to support analytic tasks. The interactions include, for example, selecting sliding window types, selecting layout algorithms, tuning parameters for bundling, zooming the bundled visualizations, and highlighting stream elements.
• Output contains the final bundled images generated by Rendering.

5.3.3 Criteria for stream analytics

There are several criteria for enabling stream analytics. Aesthetics aims for visually pleasing bundling results, with smooth curves and clear bundles. Well-formed clusters of edges help analysts to identify both global and local structural changes. Mental map preservation concerns the positions and directions of the resulting bundles relative to each other over time.

5.3.4 Compatibility metrics

This section defines several compatibility measures for a pair of stream elements $e_i$ and $e_j$ in the current active window.

5.3.4.1 Temporal compatibility

We introduce a new measure, namely “temporal compatibility”, $T(e_i, e_j)$, for stream elements. Temporal compatibility is defined from the temporal information of the edges and is thus independent of spatial compatibility. The following measures of temporal compatibility are defined based on the timestamps $t_i$ and durations $d_i$ of the streamed edges.
• Timestamp compatibility takes the timestamps of edges into account. The timestamp compatibility $T_T(e_i, e_j)$ aims to avoid bundling edges whose timestamps differ greatly. Often one prefers to analyze edges of similar ages.
• Duration compatibility is another new measure, concerning the time interval over which an element lasts. The duration compatibility $T_D(e_i, e_j)$ avoids bundling a long-lasting element with a short-lasting element. The intuition is that a short-lasting stream element may expire within a couple of windows; thus bundling a long-lasting one with a short-lasting one may not preserve the mental map.
• Endtime compatibility: The dual of a start time (given by the timestamp) is the end time of a stream element. Sometimes one is more interested in how close together stream elements end than in when they start or how long they last. For instance, one analytic task may need to cluster flights with similar arrival times in order to arrange facilities and public transport.
• Time-overlapping compatibility: In some cases, one may be more interested in the amount of time overlap between two edges. For instance, one can determine the crash potential of flights from the amount of time overlap between the flights. As such, edge bundling prefers to bundle edges if their time intervals $[t_i, t_i + d_i]$ and $[t_j, t_j + d_j]$ have a large overlap, and not to bundle them otherwise. Let $o(e_i, e_j)$ denote the overlap between two edges.

The timestamp compatibility $T_T(e_i, e_j)$, the duration compatibility $T_D(e_i, e_j)$, the endtime compatibility $T_E(e_i, e_j)$ and the time-overlapping compatibility $T_O(e_i, e_j)$ can be defined as:

$$T_T(e_i, e_j) = f(|t_i - t_j|) \qquad (5.5)$$

$$T_D(e_i, e_j) = f(|d_i - d_j|) \qquad (5.6)$$

$$T_E(e_i, e_j) = f(|t_i + d_i - t_j - d_j|) \qquad (5.7)$$

$$T_O(e_i, e_j) = f(o(e_i, e_j)), \qquad (5.8)$$

where $f(x)$ is a continuous and decreasing function $[0, +\infty) \to [0, 1]$; for example, $(1 + x)^{-1}$ or $(1 + \log(1 + x))^{-1}$.

Since the newly defined compatibility metrics are closely related (they are all temporal), the final temporal compatibility $T(e_i, e_j)$ is defined in a general form, using the Generalised Multiplicative model in Section 5.3.7, as:

$$T(e_i, e_j) = T_T^{\alpha_1}(e_i, e_j) \cdot T_D^{\alpha_2}(e_i, e_j) \cdot T_E^{\alpha_3}(e_i, e_j) \cdot T_O^{\alpha_4}(e_i, e_j), \qquad (5.9)$$

for some parameters $\alpha_1$, $\alpha_2$, $\alpha_3$ and $\alpha_4$.

Durations are quite important in some applications, such as flight scheduling, but less so in others, such as stock trading (trades generally never expire, and only in a few cases can trades be cancelled). In the absence of durations $d_i$, the temporal compatibility $T(e_i, e_j)$ can be expressed simply in terms of the timestamp compatibility $T_T(e_i, e_j)$ alone.
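Equations (5.5) to (5.9) can be sketched in Python as follows, assuming the example choice f(x) = 1/(1 + x) and taking the overlap o as the length of the intersection of the two time intervals (an assumption; the text does not fix a formula for o). The function names are hypothetical. Equation (5.8) is implemented as written; since f is decreasing, a larger overlap yields a smaller value, so an implementation that wants to favor large overlaps may need a different argument to f, which the text leaves open.

```python
from typing import Optional, Sequence


def f(x: float) -> float:
    """Example decreasing map [0, +inf) -> (0, 1] from the text: 1 / (1 + x)."""
    return 1.0 / (1.0 + x)


def overlap(t_i: float, d_i: float, t_j: float, d_j: float) -> float:
    """Length of the intersection of [t_i, t_i+d_i] and [t_j, t_j+d_j] (assumed)."""
    return max(0.0, min(t_i + d_i, t_j + d_j) - max(t_i, t_j))


def temporal_compatibility(t_i: float, d_i: Optional[float],
                           t_j: float, d_j: Optional[float],
                           alphas: Sequence[float] = (1.0, 1.0, 1.0, 1.0)) -> float:
    """T(e_i, e_j) of Equation (5.9); falls back to T_T without durations."""
    a1, a2, a3, a4 = alphas
    T_T = f(abs(t_i - t_j))                      # Equation (5.5)
    if d_i is None or d_j is None:
        return T_T                               # durations undefined
    T_D = f(abs(d_i - d_j))                      # Equation (5.6)
    T_E = f(abs(t_i + d_i - t_j - d_j))          # Equation (5.7)
    T_O = f(overlap(t_i, d_i, t_j, d_j))         # Equation (5.8), as written
    return T_T ** a1 * T_D ** a2 * T_E ** a3 * T_O ** a4
```

Setting an exponent to 0 switches the corresponding measure off, which is one way the parameters can be adjusted at runtime.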

5.3.4.2 Neighborhood compatibility

The basic neighborhood compatibility $N(e_i, e_j)$ aims to capture the causal relation between neighboring edges. For instance, one delayed flight may (directly) result in other delayed flights, or one stock trade may affect other stock trades. The neighborhood proximity is defined by some path-related distance function $d(e_i, e_j)$ between $e_i$ and $e_j$. Often, one is interested in bundling edges when such a distance is small.
• Ego-centric compatibility: An ego-network $ego(x_i)$ of a node $x_i$ contains $x_i$, $x_i$’s neighbors (so-called “alters”), and the induced edges. We define the ego-network of an edge $(x_i, x_j)$ as the union of $ego(x_i)$ and $ego(x_j)$. Our ego-centric compatibility $N_E(e_i, e_j)$ avoids bundling edges $e_i$ and $e_j$ that belong to different local communities or have few common neighbors. It can be defined as $N_E(e_i, e_j) = f(d_E(e_i, e_j))$, where $d_E(e_i, e_j)$ is based on the intersection of the ego-networks of the two edges, and $f(x)$ is a continuous and decreasing function with $f(x) \in [0, 1]$ for all $x \geq 0$.
• Trace compatibility is a measure to avoid bundling edges that are very far from each other in the graph. The trace compatibility $N_T(e_i, e_j)$ is based on $d_T(e_i, e_j)$, the smallest graph-theoretic distance between one end of $e_i$ and one end of $e_j$. This metric is similar to the connectivity compatibility proposed by Selassie et al. [280], which avoids bundling edges in different disconnected components of static graphs. For tracing, one is often interested in cases with a small value of $d_T(e_i, e_j)$ ($\leq 2$). An example formula for trace compatibility is $N_T(e_i, e_j) = f(d_T(e_i, e_j))$, where $f(x)$ is a continuous and decreasing function $[0, +\infty) \to [0, 1]$.

Thus, the neighborhood compatibility can be defined from the ego-centric compatibility and the trace compatibility:

$$N(e_i, e_j) = N_E^{\alpha_1}(e_i, e_j) \cdot N_T^{\alpha_2}(e_i, e_j), \qquad (5.10)$$

for some $\alpha_1$ and $\alpha_2$. A common use case may involve either the ego-centric compatibility or the trace compatibility.
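The two neighborhood measures can be sketched in Python as follows. The text only states that $d_E$ is based on the intersection of the ego-networks, so the sketch assumes a Jaccard distance on the node sets of the two ego-networks; it also treats the graph as undirected for path lengths and uses f(x) = 1/(1 + x). All function names and the adjacency-dictionary representation are hypothetical.

```python
import math
from collections import deque
from typing import Dict, Set, Tuple

Adj = Dict[str, Set[str]]      # undirected adjacency lists of the window graph
Edge = Tuple[str, str]


def f(x: float) -> float:
    return 1.0 / (1.0 + x)     # f(inf) evaluates to 0.0


def ego_nodes(adj: Adj, e: Edge) -> Set[str]:
    """Node set of the ego-network of an edge: union of both endpoint egos."""
    return {e[0], e[1]} | adj.get(e[0], set()) | adj.get(e[1], set())


def ego_distance(adj: Adj, e_i: Edge, e_j: Edge) -> float:
    """d_E as a Jaccard distance between ego-network node sets (assumed)."""
    a, b = ego_nodes(adj, e_i), ego_nodes(adj, e_j)
    return 1.0 - len(a & b) / len(a | b)


def hops(adj: Adj, src: str, dst: str) -> float:
    """Breadth-first shortest path length; inf if dst is unreachable."""
    if src == dst:
        return 0
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        v, dist = queue.popleft()
        for w in adj.get(v, ()):
            if w == dst:
                return dist + 1
            if w not in seen:
                seen.add(w)
                queue.append((w, dist + 1))
    return math.inf


def trace_distance(adj: Adj, e_i: Edge, e_j: Edge) -> float:
    """d_T: smallest distance between one end of e_i and one end of e_j."""
    return min(hops(adj, u, v) for u in e_i for v in e_j)


def neighborhood_compatibility(adj: Adj, e_i: Edge, e_j: Edge,
                               a1: float = 1.0, a2: float = 1.0) -> float:
    """N(e_i, e_j) of Equation (5.10)."""
    return f(ego_distance(adj, e_i, e_j)) ** a1 * f(trace_distance(adj, e_i, e_j)) ** a2
```

With this choice of f, edges in different connected components receive a trace compatibility of 0, which matches the intent of avoiding bundles across disconnected components.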

5.3.4.3 Spatial compatibility

Spatial compatibility $S(e_i, e_j)$ aims to avoid bundling edges that differ greatly in length, distance, visibility and crossing angles [171]. Though intensively studied in recent work, spatial compatibility has so far mainly been applied to edge bundling of static graphs [203, 246, 280]. In our framework, its use is extended to stream bundling, where it denotes the spatial compatibility of a pair of stream elements.

5.3.4.4 Data-driven compatibility

We extend the semantic compatibility [203], which was proposed for static graphs, to graph streams. This data-driven metric determines the similarity of a pair of multi-attributed streamed edges. Let $\langle a_i^1, \ldots, a_i^m \rangle$ and $\langle a_j^1, \ldots, a_j^m \rangle$ denote the data vectors of the edges $e_i$ and $e_j$, where $m$ is the number of attributes. The data-driven compatibility $D(e_i, e_j)$ can be defined from the similarity (or dissimilarity) between the two vectors, for example using a cosine function.
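A cosine-based data-driven compatibility can be sketched in Python as follows. The function name `data_compatibility` is hypothetical, and clamping negative cosine values to 0 so that the result stays in [0, 1] is an assumption of this sketch, not something the text prescribes.

```python
import math
from typing import Sequence


def data_compatibility(a_i: Sequence[float], a_j: Sequence[float]) -> float:
    """Cosine similarity of two attribute vectors, clamped to [0, 1]."""
    if len(a_i) != len(a_j):
        raise ValueError("attribute vectors must have the same length m")
    dot = sum(p * q for p, q in zip(a_i, a_j))
    norm = math.sqrt(sum(p * p for p in a_i)) * math.sqrt(sum(q * q for q in a_j))
    if norm == 0.0:
        return 0.0  # an all-zero vector carries no direction (assumed convention)
    # Negative cosine (opposed attributes) is treated as incompatible (assumed).
    return max(0.0, min(1.0, dot / norm))
```

Cosine similarity ignores vector magnitude, so attributes on very different scales may need normalizing before this measure is meaningful.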

5.3.5 Aggregate compatibility

The compatibility $C(e_i, e_j)$ between two edges is the aggregate value of the temporal compatibility $T(e_i, e_j)$, neighborhood compatibility $N(e_i, e_j)$, spatial compatibility $S(e_i, e_j)$ and data-driven compatibility $D(e_i, e_j)$. For instance, $C(e_i, e_j)$ can be defined using the Linear model in Section 5.3.7 as

$$C(e_i, e_j) = \eta_S \cdot S(e_i, e_j) + \eta_T \cdot T(e_i, e_j) + \eta_D \cdot D(e_i, e_j) + \eta_N \cdot N(e_i, e_j), \qquad (5.11)$$

for some parameters $\eta_S, \eta_T, \eta_D, \eta_N \in [0, 1]$ with $\eta_S + \eta_T + \eta_D + \eta_N = 1$.
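As a worked illustration of the linear model, interpret the ratios reported for Figures 5.3a, 5.4a and 5.4b as the weights $\eta_S = 0.7$ and $\eta_T = \eta_D = \eta_N = 0.1$, and assume purely hypothetical compatibility values $S = 0.9$, $T = 0.5$, $D = 0.2$ and $N = 0.4$ for some pair of edges:

```latex
\begin{aligned}
C(e_i, e_j) &= 0.7 \cdot 0.9 + 0.1 \cdot 0.5 + 0.1 \cdot 0.2 + 0.1 \cdot 0.4 \\
            &= 0.63 + 0.05 + 0.02 + 0.04 = 0.74
\end{aligned}
```

Because the weights sum to 1, the aggregate stays in $[0, 1]$ whenever each component does, so it can be compared directly against a threshold $t \in [0, 1]$.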

5.3.6 High-level stream bundling methods

At the highest level, stream bundling methods can be divided into offline, online and hybrid methods:
• Offline methods process the whole sequence of graphs in an offline fashion and are ideal for historical data, provided that all graphs are known in advance. The control points of each edge can be precomputed for every sliding window. Thus, during streaming, the only task is to render the splines or polylines for the edges. Considerable effort goes into making smooth transitions from one bundled image to the next.
• Online methods process graphs as if no information were available a priori. The approach computes control points for all edges at every sliding window. This approach suits real-time scenarios in which all past information is transparent. It is the hardest case, since both fast recomputation of bundles for each window and smooth transitions between frames are required.
• Hybrid methods may rely on the previous layouts to achieve mental map preservation. The global information is assumed to help the Mapping and Routing processes at each timeframe. The bundling is still performed on the fly, but the transitions between frames are less challenging than in the fully online approach. This approach is potentially adaptable to both historical data and real-time scenarios.

5.3.7 Aggregating compatibility

Edge bundling methods work with several compatibility metrics; for example, Angle-Scale-Position-Visibility [171], Topology-Geometry-Importance [246], Spatial-Direction-Connectivity [280], and Spatial-Semantic [203]. These works define the aggregate compatibility value with a so-called multiplicative model: $C = \prod_{i=1}^{k} C_i$. However, this model may not be flexible enough to be adjusted at runtime, whereas analytical tasks often need to weight the concerns differently at runtime. For runtime support, we propose several ways to compute the aggregate value $C(C_1, \ldots, C_k)$ for stream bundling:
• the Generalised multiplicative model defines $C = \prod_{i=1}^{k} C_i^{\alpha_i}$,
• the Linear model defines $C = \sum_{i=1}^{k} \alpha_i C_i$, and
• the Hybrid model combines the above models.

The new models permit the parameters $\alpha_i$ to be adjusted at runtime for analytics with different scales of the concerns $C_i$. Typically, one can use the (generalised) multiplicative model for a set of closely related concerns, for example, the spatial compatibilities. The linear model is used for a set of independent concerns.
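The three aggregation models can be sketched in Python as follows. The generalised multiplicative and linear models follow the formulas above directly. The text does not specify how the hybrid model combines them, so the sketch assumes one plausible form: a generalised product inside each group of closely related concerns, followed by a weighted sum across groups. All function names are hypothetical.

```python
import math
from typing import Sequence


def multiplicative(values: Sequence[float], alphas: Sequence[float]) -> float:
    """Generalised multiplicative model: product of C_i ** alpha_i."""
    return math.prod(c ** a for c, a in zip(values, alphas))


def linear(values: Sequence[float], alphas: Sequence[float]) -> float:
    """Linear model: sum of alpha_i * C_i."""
    return sum(a * c for c, a in zip(values, alphas))


def hybrid(groups: Sequence[Sequence[float]],
           group_alphas: Sequence[Sequence[float]],
           weights: Sequence[float]) -> float:
    """Assumed hybrid form: product within each group, weighted sum across groups."""
    per_group = [multiplicative(g, a) for g, a in zip(groups, group_alphas)]
    return linear(per_group, weights)
```

Setting every exponent to 1 in `multiplicative` recovers the plain multiplicative model used by the earlier methods.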

5.4 Stream bundling algorithms

There are generally two approaches to computing the bundled layout of the current window $W_i$ (see the notation in Section 5.3.1). The first approach performs local updates on the bundled result of the previous window with respect to the recent changes. The second approach considers $W_i$ globally rather than just the recent changes. The former is faster but could lead to results in which edges have swinging effects. Thus, we focus on the latter approach.

Our StreamEB framework bundles graph streams by integrating the compatibility measures introduced in Section 5.3.4. Specifically, we introduce two approaches for stream bundling that realize the StreamEB framework: Force-directed stream bundling (FStreamEB) and Tree-based stream bundling (TStreamEB).

5.4.1 FStreamEB: Force-directed stream bundling

Our Force-directed stream bundling (FStreamEB) inserts control points into stream edges and applies a spring algorithm to those control points with respect to the aggregate compatibility between pairs of stream edges. Three variations of the force-directed edge bundling method are proposed: (1) a simple extension of FDEB [171] (S-FStreamEB), (2) an integration of dynamic mapping (for example, Force Layout) with stream bundling (F-FStreamEB), and (3) a version optimized for static geometry (G-FStreamEB). Figure 5.6 shows the force models of our FStreamEB algorithms.

5.4.1.1

S-FStreamEB

A simple version of our FStreamEB model extends FDEB [171] by integrating new stream compatibility measures. For a subdivision point $e_i^{(l)}$ on edge $e_i$, the electrostatic force model of our model is

$$F_{spring} = \sum_{e_j \in \mathcal{C}(e_i)} C(e_i, e_j) \cdot g\left(\left|p_{e_i^{(l)}} - p_{e_j^{(l)}}\right|\right), \qquad (5.12)$$

where $C(e_i, e_j)$ is the aggregate of spatial, temporal, data-driven and neighborhood compatibility measures; $\mathcal{C}(e)$ is the set of compatible edges of $e$; and $g$ is a function of $|p_{e_i^{(l)}} - p_{e_j^{(l)}}|$; for example, $g = |p_{e_i^{(l)}} - p_{e_j^{(l)}}|^{-d}$ for a constant $d$.

Figure 5.6: Force models: (a) S-FStreamEB: forces applied on control points only; (b) F-FStreamEB: forces applied on nodes + control points

This method applies to the current graph $G_i$ at every timeframe. Algorithm 1 outlines our S-FStreamEB algorithm, which is adapted from the FDEB algorithm.

Algorithm 1: S-FStreamEB
    input: E, the edge set of the current graph
    t ← constant;
    foreach edge e_i of E do
        C(e_i) ← {e_j | e_j ∈ E − {e_i} ∧ S(e_i, e_j) ≥ t}
    foreach step s ∈ [0..I] do
        foreach edge e_i of E do
            pnts_i ← points of e_i;
            foreach edge e_j of C(e_i) do
                pnts_j ← points of e_j;
                force(pnts_i, pnts_j)
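The loop structure of this algorithm can be sketched in Python as follows. This is a simplified illustration rather than the thesis implementation: it assumes every edge carries the same number of 2D control points, uses $g = \text{dist}^{-d}$ as the example force function, and omits the spring forces between consecutive control points that FDEB also applies. The names `compatible_sets`, `bundle_step`, `step` and `eps` are hypothetical.

```python
import math

def compatible_sets(edges, compat, t):
    """For each edge index i, list the indices j != i with compat(e_i, e_j) >= t."""
    n = len(edges)
    return {i: [j for j in range(n) if j != i and compat(edges[i], edges[j]) >= t]
            for i in range(n)}

def bundle_step(points, neighbours, weight, step=0.1, d=1.0, eps=1e-9):
    """One iteration: move each inner control point along the summed attraction
    toward the matching control points of its compatible edges."""
    new_points = [list(pts) for pts in points]
    for i, pts_i in enumerate(points):
        for l in range(1, len(pts_i) - 1):        # end points stay on the nodes
            fx = fy = 0.0
            for j in neighbours[i]:
                dx = points[j][l][0] - pts_i[l][0]
                dy = points[j][l][1] - pts_i[l][1]
                dist = math.hypot(dx, dy)
                if dist < eps:                     # already coincident
                    continue
                mag = weight(i, j) * dist ** (-d)  # C(e_i, e_j) * g(dist)
                fx += mag * dx / dist
                fy += mag * dy / dist
            new_points[i][l] = (pts_i[l][0] + step * fx, pts_i[l][1] + step * fy)
    return new_points

# Two parallel edges with three control points each.
edges = [[(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
         [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]]
nb = compatible_sets(edges, lambda a, b: 1.0, 0.5)
for _ in range(3):
    edges = bundle_step(edges, nb, lambda i, j: 1.0)
print(edges[0][1], edges[1][1])
```

Because the attraction grows as the points approach each other, a real implementation would typically cap the step size or decrease it over the iterations.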

5.4.1.2

F-FStreamEB (FStreamEB with Dynamic layout)

A more general model of FStreamEB integrates a dynamic layout (Mapping) with force-directed bundling (Routing). In particular, we integrate a force-directed algorithm for the dynamic layout.

In the classical force-directed methods [101], the force on a node $x$ is a combination of forces $F = F_{spring} + F_{repulsion} + F_{external}$, where $F_{spring}$ is the spring force exerted on each vertex by its neighbors, $F_{repulsion}$ is the repulsion force between nodes, and $F_{external}$ is the force from other sources, such as a magnetic field or an anchoring force. The general force model of our F-FStreamEB includes a new force $F_{bundle}$, which is the force exerted by the related bundles. This bundle-aware force provides feedback from the bundling results to the node placement, as depicted in Figure 5.5. Figure 5.6 depicts the force models of S-FStreamEB and F-FStreamEB. The force on a node is therefore:

$$F = F_{spring} + F_{repulsion} + F_{external} + F_{bundle} \qquad (5.13)$$

The bundle-aware force $F_{bundle}$ of a node $x$ can be defined as:

$$F_{bundle}(x) = \sum_{e_i \in N_{out}(x)} k_e \left|p_{e_i^{(0)}} - p_x\right| + \sum_{e_i \in N_{in}(x)} k_e \left|p_x - p_{e_i^{(P-1)}}\right|, \qquad (5.14)$$

where $N_{in}(x)$ and $N_{out}(x)$ are the in-neighbors and out-neighbors of $x$, and $k_e$ is the stiffness of the edges. In our model, $F_{external}$ includes the gravity force, the magnetic force and the anchoring force. Specifically, the anchoring force $F_{anchor}$ for a node $x$ is defined as $F_{anchor}(x) = \sum_{x \in V} k_a |q_x - p_x|$, where $q_x$ is the preferred location of node $x$, and $k_a$ is the anchoring force strength.
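As a rough illustration of how the bundle-aware and anchoring terms could enter a force layout, the sketch below uses a vector form of the two forces. This is an assumption: the formulas above give magnitudes only, so the directions (toward the nearest control point of each incident edge, and toward the preferred location) are an interpretation, and the function names are hypothetical.

```python
def bundle_force(p_x, out_edges, in_edges, k_e=1.0):
    """Assumed vector form of the bundle-aware force on a node at p_x.

    out_edges / in_edges are lists of control-point lists [e^(0), ..., e^(P-1)].
    The node is pulled toward the first control point of each outgoing edge
    and toward the last control point of each incoming edge.
    """
    fx = fy = 0.0
    for pts in out_edges:
        fx += k_e * (pts[0][0] - p_x[0])
        fy += k_e * (pts[0][1] - p_x[1])
    for pts in in_edges:
        fx += k_e * (pts[-1][0] - p_x[0])
        fy += k_e * (pts[-1][1] - p_x[1])
    return fx, fy

def anchor_force(p_x, q_x, k_a=1.0):
    """Assumed vector form of the anchoring force: pull p_x toward q_x."""
    return k_a * (q_x[0] - p_x[0]), k_a * (q_x[1] - p_x[1])

# A node at the origin with one outgoing and one incoming bundled edge.
print(bundle_force((0.0, 0.0), [[(1.0, 0.0), (2.0, 0.0)]], [[(5.0, 5.0), (0.0, 2.0)]]))
print(anchor_force((0.0, 0.0), (3.0, 4.0), k_a=0.5))
```

Both terms would then be added to the spring and repulsion forces of the underlying layout before the node positions are updated.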

5.4.1.3

G-FStreamEB (FStreamEB with Static Geometry)

We introduce another variant, G-FStreamEB, that reduces the computational cost of stream bundling when the spatial information of the nodes remains unchanged. This approach precomputes all compatible edges for each edge and the compatibility values for every pair of edges in an “offline” step. Then, during streaming, the offline information helps to avoid redundant computations (spatial compatibility and compatible edge sets) in the “online” step. Let $G_i = \{V, E_i\}$ and $G = \{V, E\}$ be the graph in the current window and the underlying graph of the stream, respectively. Let $S^G(e_i, e_j)$ and $S(e_i, e_j)$ denote the spatial compatibility of edges $e_i$ and $e_j$ in $G$ and $G_i$, respectively. With fixed geometry, $S(e_i, e_j)$ has the same value as $S^G(e_i, e_j)$. Let $\mathcal{C}^G(e_i)$ and $\mathcal{C}(e_i)$ denote the sets of compatible edges of edge $e_i$ in $G$ and $G_i$, respectively.


• The offline step computes the spatial compatibility $S^G(e_i, e_j)$ and the compatible edge set $\mathcal{C}^G(e_i)$ for each edge $e_i$ in $G$. The set $\mathcal{C}^G(e_i)$ contains only the edges $e_j$ for which $S^G(e_i, e_j)$ is at least a threshold $t_{off} \in [0, 1]$.

• The online step computes the compatible edge set $\mathcal{C}(e_i)$ for each edge $e_i \in E_i$ by taking all the edges in $\mathcal{C}^G(e_i) \cap E_i$ whose aggregate compatibility $C(e_i, e_j)$ is at least a threshold $t_{on} \in [0, 1]$. The iterations then proceed in the same manner as in S-FStreamEB.

Algorithms 2 and 3 show the offline and online steps of our G-FStreamEB.

Algorithm 2: Offline step of G-FStreamEB
    input: E, the edge set of the union of graphs
    t_off ← constant;
    foreach edge e_i of E do
        C^G(e_i) ← {e_j | e_j ∈ E − {e_i} ∧ S^G(e_i, e_j) ≥ t_off}

Algorithm 3: Online step of G-FStreamEB
    input: E, the edge set of the current graph
    t_on ← constant;
    foreach edge e_i of E do
        C(e_i) ← {e_j | e_j ∈ C^G(e_i) ∩ (E − {e_i}) ∧ C(e_i, e_j) ≥ t_on};
    foreach step s ∈ [0..I] do
        foreach edge e_i of E do
            pnts_i ← points of e_i;
            foreach edge e_j of C(e_i) do
                pnts_j ← points of e_j;
                force(pnts_i, pnts_j)
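The split between the offline and online steps can be sketched as below. The sketch assumes that edges are identified by hashable keys and that the spatial and aggregate compatibility functions are supplied by the caller; the function names are hypothetical.

```python
def offline_step(all_edges, spatial, t_off):
    """Precompute, once, the spatially compatible edges of every edge in G."""
    return {ei: {ej for ej in all_edges if ej != ei and spatial(ei, ej) >= t_off}
            for ei in all_edges}

def online_step(window_edges, compat_G, aggregate, t_on):
    """For the current window, filter the precomputed candidates by the
    aggregate compatibility instead of testing all pairs again."""
    window = set(window_edges)
    return {ei: {ej for ej in compat_G[ei] & window
                 if ej != ei and aggregate(ei, ej) >= t_on}
            for ei in window}

# Toy example: only edges "a" and "b" are spatially close.
spatial = lambda u, v: 0.9 if {u, v} == {"a", "b"} else 0.1
compat_G = offline_step(["a", "b", "c"], spatial, t_off=0.5)
print(online_step(["a", "b"], compat_G, lambda u, v: 0.6, t_on=0.5))
```

The online step only evaluates the aggregate compatibility for pairs that survived the offline spatial filter, which is where the saving comes from when the geometry is fixed.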

5.4.2

TStreamEB: Tree-based stream bundling

We also introduce a new stream bundling method that extends HEB [170]. The key idea of our TStreamEB is a double-interpolation process which takes the aggregate compatibility $C(e_i, e_j)$ between a pair of edges into account. The first interpolation is the same as the interpolation in HEB. The second interpolation is $inter(P, e_j, \beta_{ij})$, where $inter$ is defined in Section 5.2.3, $\beta_{ij} = \gamma \cdot \beta \cdot C(e_i, e_j)$, and $\gamma \in [0, 1]$ is the second bundling strength. Figure 5.7 depicts the interpolations in our TStreamEB. Algorithm 4 shows our TStreamEB bundling method. Compared to the iterative force calculations in FStreamEB, TStreamEB is faster since only a single iteration of interpolations is performed. Our TStreamEB is not restricted to any specific type of tree structure. In fact, the tree structure may be part of the input given by the application domain, or may come from hierarchical clustering algorithms.

Algorithm 4: TStreamEB(E, H)
    input: E, the edge set of the current graph; H, the hierarchy
    foreach edge e_i of E do
        ctrl(e_i) ← points along H from e_i^start to e_i^end;
        pnts_i ← inter(ctrl(e_i), e_i, β);
        foreach edge e_j of C(e_i) do
            pnts_i ← inter(pnts_i, e_j, β_ij);

Figure 5.7: TStreamEB: hierarchy and double-interpolation

5.5

Implementation

We have implemented a prototype of our StreamEB framework based on the GEOMI visualization framework [13]. In particular, we have implemented the FStreamEB (with its three variations S-, G- and F-) and TStreamEB methods that integrate the stream compatibility measures described in Section 5.3.4. In most experiments, we set the default values of the compatibility parameters to $\eta_S = 0.7$, $\eta_T = 0.1$, $\eta_N = 0.1$, $\eta_D = 0.1$. We have also provided some animations¹ for the case studies. Since our studies do not involve hierarchical data, we simply use a geometric clustering, namely the quad tree decomposition implemented in FADE [267], for our TStreamEB bundling method. All figures in this chapter are produced by our prototype. All time measurements were conducted on a 1.73GHz Intel Quad Core i7-740QM laptop with 6GB L3 cache, ATI Radeon HD5000 GPU, Ubuntu 10.04 OS and OpenGL enabled. The experiments ran on jdk-1.7 and j3d-1.5.2.

Rendering: Holten and van Wijk [171] suggested a continuous interpolation inter between straight edges and bundled edges for smoothing curves and for helping users to understand the bundle structure. We adopt that approach for our FStreamEB and TStreamEB methods. Edges are then rendered using Java3D.

Bundling parameters: Our framework provides a UI for users to adjust the bundling parameters for analytic tasks. For example, users can select the bundling parameters for temporal, neighborhood, spatial and data-driven compatibility at runtime for both F-FStreamEB and TStreamEB. Users can also select the parameters of the force layout in F-FStreamEB, such as $F_{repulsion}$, $F_{spring}$, $F_{bundle}$ and $F_{anchor}$. To illustrate this, Figure 5.8 shows different bundled results of a random graph using S-FStreamEB. The random graph has ten nodes, and its edges are randomly connected. Each edge has a random timestamp and a data vector of three random integers. For TStreamEB, users may also alter the bundling parameters $\beta$ and $\gamma$ to achieve different bundling results. Figure 5.9 depicts different bundled results of the same random graph using force-directed edge bundling and TStreamEB.

User interactions: Our StreamEB framework provides users with geometric zooms, high-resolution screen shots and animation recording.

¹ See pictures and movies at http://www.it.usyd.edu.au/~qnguyen/streameb/

Figure 5.8: Visualizations of a random graph using (a) FDEB and (b) S-FStreamEB with parameters $(\eta_S, \eta_T, \eta_D, \eta_N)$ of (0.7,0.3,0,0), (0.7,0,0.3,0), (0.7,0,0,0.3) and (0.7,0.1,0.1,0.1), respectively, from left to right.

Figure 5.9: Visualizations of a random graph using TStreamEB: (a) original; (b) (β,γ)=(1,0); (c) (β,γ)=(1,0.2) without smoothing; (d) (β,γ)=(1,0.2) with smoothing; (e) (β,γ)=(1,0.8) without smoothing; (f) (β,γ)=(1,0.8) with smoothing

Sliding windows: Two types of sliding windows, sequence-based and time-based, have been implemented. Users can switch from one type to the other at runtime. The GUI allows users to change the window size with a slider.
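The two window types can be illustrated with a small Python sketch. It assumes that a sequence-based window keeps the most recent W stream edges and that a time-based window keeps the edges of the most recent time span; the precise definitions are those of Section 5.3.1, and the class names here are hypothetical. Timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque

class SequenceWindow:
    """Keeps the `size` most recent stream edges."""
    def __init__(self, size):
        self.items = deque(maxlen=size)

    def push(self, timestamp, edge):
        self.items.append((timestamp, edge))   # oldest item drops out automatically
        return [e for _, e in self.items]

class TimeWindow:
    """Keeps the edges whose timestamps lie within the last `span` time units."""
    def __init__(self, span):
        self.span = span
        self.items = deque()

    def push(self, timestamp, edge):
        self.items.append((timestamp, edge))
        while self.items and self.items[0][0] <= timestamp - self.span:
            self.items.popleft()               # expire edges that are too old
        return [e for _, e in self.items]

seq, tim = SequenceWindow(2), TimeWindow(10)
for t, e in [(0, "a"), (5, "b"), (12, "c")]:
    print(seq.push(t, e), tim.push(t, e))
```

Changing the window size at runtime, as the slider in the GUI does, would amount to rebuilding the deque with a new `size` or updating `span`.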

Time complexity: The most expensive step of the stream bundling methods is the computation of the compatibility between all pairs of edges, which requires $\Omega(M^2)$ time, where $M$ is the number of edges $|E_i|$ in the current graph $G_i$.

• S-FStreamEB takes $O(I \cdot M^2 \cdot P)$ time at every time step, where $I$ is the number of iterations, and $P$ is the number of control points per edge. The offline step of G-FStreamEB requires $O(|E|^2)$ time to compute the compatibility between all pairs of edges, where $|E|$ is the number of edges in the union graph $G$. For F-FStreamEB, the anchor forces need $O(N)$ time and the bundle-aware forces at the last iteration require $O(M^2)$ time.

• TStreamEB uses a double interpolation that takes $O(M^2 \cdot P)$ time, where $P$ is the average number of control points per edge. The edge smoothing in the post-processing step takes $O(M \cdot P \cdot I_{smooth})$ time, where $I_{smooth}$ is the number of iterations in the smoothing process. The hierarchy requires $O(N)$ time, where $N$ is the number of nodes. When the hierarchy is fixed, it does not need to be recalculated.

5.6

Experimental results

5.6.1

Geographic networks

Common analytic tasks include identifying the airports that have suffered from severe delays, pairs of airports that often had delays, US regions where most flight delays occurred, and correlations between geographic distances and delays.

5.6.1.1

Data set

We used US flights data¹. The data set contains schedules and delays of US flights from 1987 to 2008. The number of flights varies from over one million to over seven million per year. In our studies, we keep only the flights that actually departed and landed.

¹ US Flights: http://stat-computing.org/dataexpo/2009/the-data.html

Year    |E|     |Stream|
1987    3440    1287334
1988    3804    5126499
1989    3647    4925483
1990    3684    5110528
1991    3694    4995006
1992    3572    5020652
1993    3419    4993588
1994    3370    5078412
1995    3351    5219141
1996    3183    5209327
1997    3153    5302000
1998    3149    5227052
1999    3242    5360019
2000    3342    5481304
2001    3530    5723674
2002    3213    5197861
2003    4320    6375690
2004    4409    6987730
2005    4487    6992839
2006    4655    7003803
2007    5032    7275289
2008    5200    6855030

Table 5.1: US Flight dataset

The most relevant of the 29 attributes of each flight record are origin, destination, date-time, actual duration, scheduled duration, departure delay, arrival delay and distance. For each flight, a positive delay indicates an actual delay, while a negative delay means that the flight was earlier than scheduled. Our study considers only flights that connect US mainland airports. Table 5.1 gives general statistics of the dataset for each year. Over the years, the total number of edges $|E|$ ranges from 3149 to 5200, and the total number of flights ranges from around one million to over seven million.

5.6.1.2

Visual analysis

We applied our methods to the US flight data set to show the effect of the terrorist attacks on 11-Sep-2001. In this study, we use the following coloring scheme to separate departure and arrival delays. Each node is represented by two concentric circles: the inner and outer circles represent the aggregate values for incoming and outgoing edges, respectively. For both sides (in or out), the aggregate values are scaled and mapped to a red-to-green color gradient. The radius of a node is based on its out-value; the radius of the inner circle is half of the radius of the outer circle. Edges have gradient colors based on the out-colors of their end nodes. The colors of the relevant nodes are updated at each incoming flight.
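A red-to-green mapping of this kind can be sketched as follows. The sketch assumes a linear interpolation in RGB space over a value range [lo, hi], with low values (early flights) in red and high values (delays) in green, which matches how the colors are read in the analysis that follows; the exact scaling used by the prototype is not specified, and the function name is hypothetical.

```python
def red_to_green(value, lo, hi):
    """Map a value in [lo, hi] to an (r, g, b) tuple from red (lo) to green (hi).

    Values outside the range are clamped; a degenerate range maps to red.
    """
    t = 0.0 if hi == lo else (value - lo) / (hi - lo)
    t = min(1.0, max(0.0, t))
    return (round(255 * (1 - t)), round(255 * t), 0)

# Aggregate delays in minutes, scaled over the range observed in a window.
for delay in (-10, 0, 30):
    print(delay, red_to_green(delay, -10, 30))
```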

Figure 5.10: Visualization of 5000-second windows using S-FStreamEB. US flight delays before and after the 11-Sep-2001 attacks

Figure 5.11: Visualization of sequence-based windows (W=160) using TStreamEB. US Flight delays before and after the 11-Sep-2001 attacks.

The impact of the terrorist attacks on 11-Sep-2001 was significant and led to the cancellation of most flights. For instance, all flights were cancelled after 9:20am for half a day, and there was a single flight on the next day. Several studies are given as follows.

Analysis of flight delays: Figure 5.10 applies S-FStreamEB on sliding windows before and after the terrorist attacks on 11-Sep-2001. The figure shows a significant drop in flights from 8am to 8:30am, and from 9:00am to 9:20am. The number of edges in each window is 741, 549, 392 and 255, respectively. The figure also shows that many flights actually departed earlier (red) than scheduled in the 8am window. Then many flights were delayed (green) from around 9am to 9:20am, the period after the final attack and before the shut-down period. Figure 5.11 depicts a visualization of flights using our TStreamEB. Most flights from 8:00am to 8:30am were within the Eastern US region. A few flights were delayed (green-yellow), while many of them departed earlier (red-yellow).

Geographic comparison: Figures 5.10 and 5.11 show that most flights from 8am to 9:20am were within the Eastern US region. They also show the distribution of flight delays across US cities over time. A few flights were delayed (green-yellow), while many of them departed earlier (red-yellow) within the time period.

Group of flights: Figures 5.10 and 5.11 also show bundles of flights that are closely related. Further, they show the evolution of connecting flights and of flight delays among a group of airports. For instance, many flights in a small group of airports in the Middle-South region departed earlier than scheduled at 8am, but then many of them were delayed from 8:30am onward.

5.6.2

Trading networks

Our case studies of stock trading networks aim to identify important traders, groups of traders and the dynamic money flows among them.

5.6.2.1

Data set

We use trading data from Thomson Reuters [2], which consists of time series feeds of stock trades. Four datasets were downloaded from the following equity exchanges: TSX Venture - Canada (V), Pure Trading - Canada (GO), OMX Nordic Exchange - Denmark (CO) and Stockholm SE - Sweden (ST). Each dataset contains the transactions of the last week of May 2011 from the corresponding exchange. The datasets are named after the corresponding stock exchanges: V, GO, CO and ST. The most relevant of the 44 attributes in the data are buyer, seller, date-time, price and volume. The total numbers of traders in the datasets are 93 (V), 85 (GO), 71 (CO) and 89 (ST), respectively. The total numbers of traded securities or stocks are 1956 (V), 1315 (GO), 217 (CO) and 240 (ST), respectively. The total numbers of trading connections between traders over the entire period are 3149, 1107, 6336 and 2955, and the numbers of transactions are 249800, 196662, 18545932 and 188488, for datasets CO, GO, ST and V, respectively.

Data set    |E|     Stream size
CO          3149    249800
ST          6336    18545932
GO          1107    196662
V           2955    188488

Table 5.2: Statistics of the stock trading data

5.6.2.2

Visual analysis

The most common tasks include, for example, determining frequent traders, frequent trading pairs, and groups of traders that have similar trading behaviours. Examining these aspects helps to answer sophisticated questions in stock market surveillance, such as detecting market manipulation. In particular, our analysis includes overviews of the performance of traders and groups of traders over time, and shows the evolution of the money flows.

For analysis of the trading volumes between traders, we use the following two color encodings. First, total-trade coloring colors traders based on their total traded volumes (both buys and sells) (see Figures 5.12 and 5.13). For example, colors can range from blue to red to represent low to high performance, respectively. To emphasize trader performance, node size is computed from trading performance. Edges are colored based on the interpolated colors of their end nodes. Second, buy-sell coloring distinguishes active buyers from active sellers by using the double-circle metaphor for nodes (see Figure 5.14). Each node is represented by two concentric circles: the inner circle represents an aggregate buy value and the outer circle represents an aggregate sell value over time.

Figure 5.12: Visualization of the dataset V on sequence-based sliding window W =160. Using MDS layout and S-FStreamEB.

Figure 5.13: Visualization of the dataset V on sequence-based sliding window W =160. Using MDS layout and TStreamEB.

Analysis of trader performance: Figures 5.12 and 5.13 show how active the traders are in terms of buying and selling. Figure 5.12 applies S-FStreamEB and the MDS layout on the sequence-based sliding windows W = 160. Figure 5.13 applies TStreamEB with β = 1 and γ = 0.2, and the MDS layout. The total trade patterns in the dataset V at market open time are depicted in Figure 5.14(a). The figure applies a circular layout and the TStreamEB method with β = 1 and γ = 0.2. For a similar analysis, Figure 5.14(b) applies F-FStreamEB on sliding windows W = 160 with anchoring forces that anchor vertices to a circular layout. From the figures, the most active traders are 001, 002, 007 and 009, while most other traders remain inactive at that time. Some traders, such as 005 and 019, traded a moderate amount during the first few minutes, but then traded more over time.

Analysis of trades between groups: Figure 5.12 shows four groups of traders located at the corners. The most active traders are located at the top-left corner. The second-most, third-most and least active traders are located at the top-right, bottom-left and bottom-right corners, respectively. Clearer pictures of the trading groups and their trading patterns are depicted in Figure 5.13. The figure also shows that the most active group traded most frequently with the second, the third and the least active groups, in that order. Interestingly, there were fewer trades between the second and the third groups than between the second and the least active groups.

Analysis of market manipulation: Figure 5.12 and Figure 5.13 show that a number of active traders (top-left) are “frequent” traders whose total trade volumes are fairly small. For instance, traders 088 and 124 are fairly blue during the first 30 minutes. In contrast, some other traders in the least active group, such as traders 027, 066 and 068, initially traded at high volumes. These abnormal trade patterns are useful for detecting price and volume manipulations. When combined with more specialized tools for stock analysis, these may be very useful for replaying and confirming market manipulation cases.

Figure 5.14: Visualization of the dataset V on sequence-based sliding window W =160: (a) the first few seconds of market opening, 500 seconds windows, using Circular layout and TStreamEB; (b) using F-FStreamEB: Anchoring positions from Circular Layout.

5.6.3

Performance comparison

We compared the performance of the three force-directed variations S-FStreamEB, G-FStreamEB and F-FStreamEB, and our tree-based bundling TStreamEB.

Figure 5.15: Runtime comparisons of stream bundling methods.

For our experiments, a thousand graph instances of at most nine hundred edges were randomly extracted from graph streams of the flight dataset. We ran each method separately and repeated the experiments three times. The number of iterations $I$ in all S-, G- and F- variants was set to 20, while the number of smoothing iterations $I_{smooth}$ in TStreamEB was set to 10. Figure 5.15 shows the average runtime. The figure shows that TStreamEB, which requires only a single iteration, is the fastest. For large graphs, the G- variant outperformed all the other variants of FStreamEB; this is because G-FStreamEB avoids redundant computations of spatial compatibility and compatible edges. Also, the bundle-aware forces added a small overhead to F-FStreamEB, due to the computation of the force layout.


5.7

Discussions

Applicability: Our framework is intended to demonstrate the usability of edge bundling for streams. The framework and methods introduced in this chapter are not necessarily better than existing works in terms of the quality of the bundled results. Our bundling methods have been shown to reduce visual clutter in circular layouts and MDS layouts of graph streams.

Limitations: The bundling parameters provide flexibility to adapt to analytic needs at runtime. However, they also require the user to tune a large number of parameters to match those needs. In the current implementation, TStreamEB uses a synthetic quad tree structure to guide the bundling process, which may produce edge bundles that lead to misleading interpretations.

Scalability: Our future work will consider the Barnes-Hut algorithm to improve the time complexity of FStreamEB from $O(I \cdot M^2 \cdot P)$ to $O(I \cdot M \cdot \log M \cdot P)$ for each step, where $M$ is the number of edges $|E_i|$ in the current graph $G_i$, $I$ is the number of iterations, and $P$ is the number of control points per edge. Computing compatibility values for all pairs requires $O(M^2)$ time and is a dominant factor. To overcome this limitation, future work might investigate k-means clustering and hierarchical clustering methods to identify the top k edges with the highest compatibility, as studied in Gansner et al. [134], in order to reduce the time complexity. Such clustering methods also need to be adapted for graph streams.

Interaction: Our prototype implementation provides a few interaction techniques for user inputs and zooming. Our future work is to consider ways to interact with edge bundles more effectively. Specifically, the techniques for semantic zoom [328] are worth considering.

Representation of time: In general, there are two ways to represent time in graph drawing: using animation and using a spatial dimension. For streamed graphs, animation is more relevant because the whole graph is not known a priori.

For animation, the mapping from data time to picture time is important. In offline representations of time, this mapping can be complex. In real-time applications, it is often simple, as the mapping is the identity function. In general, one may use the identity function, a linear scale function or a non-linear function for the time mapping. Our case studies are a replay of real-time scenarios; thus, the mapping from data time to picture time can become complex again. For example, in the 9/11 airline traffic data, there is a long period (in data time) in which nothing happens. In picture time, this period can be compressed.
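One simple non-linear time mapping that compresses idle periods is sketched below. This is an illustrative assumption, not the mapping used in the case studies: gaps between consecutive events are capped at a hypothetical `max_gap`, and the remaining intervals are scaled linearly by a hypothetical `scale`.

```python
def compress_time(data_times, max_gap, scale=1.0):
    """Map sorted data timestamps to picture time, capping idle gaps at max_gap.

    With a very large max_gap and scale=1.0, this reduces to the identity
    mapping shifted so that the first event is shown at picture time 0.
    """
    picture = []
    t = 0.0
    prev = None
    for d in data_times:
        if prev is not None:
            t += scale * min(d - prev, max_gap)   # long silences are shortened
        picture.append(t)
        prev = d
    return picture

# Events with a long silence between data times 20 and 1000.
print(compress_time([0, 10, 20, 1000, 1010], max_gap=30))
```

Here the 980-unit silence is shown as a 30-unit pause, while the spacing of the other events is preserved.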

5.8

Concluding remarks

This chapter has presented the first study of edge bundling for graph streams. We have proposed a new framework, namely StreamEB, that integrates several compatibility measures, including temporal, neighborhood, data-driven and spatial compatibility. Of these metrics, temporal compatibility and neighborhood compatibility are introduced for the first time in this work, whereas the other compatibility measures are adapted for fast graph streams. Further, we have presented two bundling methods:

• force-directed stream bundling (FStreamEB), and

• tree-based stream bundling (TStreamEB).

We have implemented a prototype of the framework with the new stream bundling methods. The experimental results on US flight datasets and financial datasets have shown that our approaches are useful for stream mining and reasoning tasks. There are several directions for future work, which are described below.

5.8.1

Future work

Hierarchical data: A possible direction for future work is to study real hierarchical data, such as those in software engineering. One could visualize the hierarchical structures in the data and could then use TStreamEB.

Visual encoding: Different visual encodings can be integrated to help visual analysis of stream graphs. For geographic networks, a geographic map helps the analysis of geographic trends and patterns over time. One could color map regions to represent changes in the stream data. For multi-attribute data, glyphs can be used to represent multi-attributed nodes. In this chapter, however, we use simple gradient coloring (specifically, red-to-green or blue-to-red gradient colors) to show the changes of some edge value associated with the nodes. One may apply a similar coloring scheme to glyphs. The color schemes can be used to represent the edge value or the aggregate value of the end nodes of each edge. In addition, one may also use color intensity (as in a heat map) to distinguish nodes with low and high values.

Clustered graphs: We would also like to study graph streams with some inherent clustered graph model. It would be useful to extend our StreamEB framework with the clustered graph bundling in TGI-EB [246].

Interpolation between frames: Another useful property is a simple interpolation to display transitions between consecutive bundled layouts. After computing the next bundled layout, we interpolate all control points of an edge $e_i$ from their locations in the previous layout to the newly computed locations. The interpolation helps in tracking changes of bundles in both replay and online scenarios.
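Such a transition can be sketched with a linear blend of control points. The sketch assumes that both layouts contain the same edges in the same order and the same number of control points per edge; the function name and the parameter `s` are hypothetical.

```python
def interpolate_layout(prev_pts, next_pts, s):
    """Blend two bundled layouts: s=0 gives the previous layout, s=1 the next.

    Each layout is a list of edges; each edge is a list of (x, y) control points.
    """
    return [
        [((1 - s) * x0 + s * x1, (1 - s) * y0 + s * y1)
         for (x0, y0), (x1, y1) in zip(edge_prev, edge_next)]
        for edge_prev, edge_next in zip(prev_pts, next_pts)
    ]

prev_layout = [[(0.0, 0.0), (2.0, 0.0)]]
next_layout = [[(0.0, 2.0), (2.0, 4.0)]]
for s in (0.0, 0.5, 1.0):          # three animation frames
    print(interpolate_layout(prev_layout, next_layout, s))
```

Edges that appear or disappear between two windows would need separate handling, for example fading in and out, which this sketch does not cover.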

Other metrics: Due to the high update rates of graph streams, edge directions are not yet considered in this study. Also, the edge importance is not taken into account. These factors can be integrated into our framework to produce good bundled results.


Chapter 6

General remarks, Conclusion and Future work

“Visualization is daydreaming with a purpose.” - Bo Bennett

“Make sure you visualize what you really want, not what someone else wants for you.” - Jerry Gillies

In the previous chapters, we have presented a new type of criterion to justify the quality of graph visualizations, and we have also presented techniques for visualizing large static and dynamic graphs, based on edge bundling. In this chapter, we summarize the main contributions of our visualization models, metrics and techniques. We discuss how these map to the challenges in Section 1.3 (big data and graph visualization challenges) and the need to justify and measure graph visualizations. Lastly, several possible directions for future work are discussed.

6.1

Contributions

This section summarises the main contributions of this thesis. A full list of contributions is given in Chapter 1; here we highlight where these have been presented in the dissertation. Specifically, we propose a new concept of faithfulness and two new frameworks, called TGI-EB and StreamEB, for visualizing large graphs and stream graphs.

Faithfulness in graph visualization: Graph visualizations have been seen as the “solution” to help humans comprehend a huge amount of network data in a broad range of real-world applications. A large number of graph visualization methods have been proposed, and they have been tackled at various granularity levels, from overview to detailed analysis. It is important to know whether the pictures produced by a visualization method are true representations of the underlying data and, if so, how much one can rely on them. This demand is especially urgent because of the popularity of network data (see Section 1.2), the big data arising from technological advances, and the various challenges of large, complex and dynamic network visualizations (see Section 1.3). This thesis proposes a formal model of graph visualization and, based on the model, distinguishes two important concepts: the “faithfulness” and the readability of visualizations of graphs. We believe that the new faithfulness criterion is relevant for large, complex and dynamic graph visualizations, especially in modern visualization metaphors, such as edge bundling, 2.5D visualizations, map-based visualizations, matrix representations and their hybrid variants.

Large graph visualization: Edge bundling techniques reduce visual clutter and display high-level patterns of graphs, which are perceived much better than in unbundled visualizations. However, they are limited to showing simple geometric or semantic patterns. Our major goal is to support more advanced analyses, such as analysis of the importance or topological structures of large and complex graphs. To this end, we propose bundling techniques to visualize large graphs.

Stream graph visualization: We also address the challenge of visual analysis of massive data sets, especially those that are relational time series data. These data sets are commonly seen in financial activities and security monitoring systems. The volume and velocity of the data are the major difficulties in analysing such massive data sets. In particular, the visual clutter commonly incurred in large dynamic graph visualizations is a major issue. Furthermore, the visualization of graph streams needs to take the mental map into account to support visual-temporal analysis.

In summary, we have made the following contributions:

196

6.1 Contributions • To justify how reliable graph visualizations are with respect to user expectations, we have developed a new generic criterion for the quality of graph visualizations – namely faithfulness. In Chapter 3, we describe the visual-knowledge discovery process, taken from Data to Visualization to Human. Based on the model, we define the faithfulness concept, which is specifically the consistency from data to pictures to human knowledge. We subsequently distinguish three different types of faithfulness (information faithfulness, task faithfulness and change faithfulness). Further, we present sample metrics, which are used to quantify the faithfulness of a visualization. For case studies, we examine these faithfulness concepts with various representative visualization metaphors, such as force-directed approaches, multi-dimensional scaling approaches, edge bundling approaches, map-based approaches and compound visualization approaches. • To visualize large complex graphs, we have proposed a framework based on edge bundling. In Chapter 4, we describe our TGI-EB framework that integrates the concerns of structural importance and topology of the graphs for visual analysis of large graphs. The framework specifically integrates new compatibility measures, such as importance compatibility, topology compatibility and plane compatibility, with geometric compatibility. Based on the framework, we have presented five variations of force-directed edge bundling including centrality-based edge bundling (CenEB), Topology-based edge bundling (TopoEB), Radial edge bundling (RadEB), Orthogonal edge bundling (OrthEB), and 2.5D edge bundling (2.5D-EB). Experimental results in our case studies of biological networks, social networks and geographic networks have indicated the usefulness of our approach. It also has led to a potential use of our framework for analysis of biological networks and for generating new hypotheses. 
• For the analysis of massive time-series data sets, we have proposed a framework called StreamEB, which is described in Chapter 5. Our work is the first study of edge bundling for graph stream visualization. In particular, we have introduced temporal compatibility and neighborhood compatibility, and the framework integrates these new compatibility measures with geometric and semantic compatibility for graph streams. We have presented force-directed stream bundling (FStreamEB) and tree-based stream bundling (TStreamEB). For evaluation, we have used trading data from Thomson Reuters and US flight data. Experimental results have indicated that our StreamEB framework and stream bundling methods are useful for visual analysis of graph streams.
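As a rough illustration of how several compatibility measures can be combined in a force-directed bundling framework, the Python sketch below multiplies per-criterion scores in [0, 1], following the convention FDEB [171] uses for its geometric terms. The angle and position terms follow the FDEB definitions. The function `importance_compatibility`, and the choice to fold it into the product, are illustrative assumptions only and are not the definitions given in Chapter 4.

```python
import math

# An edge is a pair of 2D endpoints: ((x1, y1), (x2, y2)).
# Edges are assumed to have non-zero length.

def _vector(edge):
    (x1, y1), (x2, y2) = edge
    return (x2 - x1, y2 - y1)

def _length(edge):
    dx, dy = _vector(edge)
    return math.hypot(dx, dy)

def _midpoint(edge):
    (x1, y1), (x2, y2) = edge
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def angle_compatibility(p, q):
    """FDEB angle term: |cos| of the angle between the two edges."""
    (px, py), (qx, qy) = _vector(p), _vector(q)
    return abs((px * qx + py * qy) / (_length(p) * _length(q)))

def position_compatibility(p, q):
    """FDEB position term: decreases as the edge midpoints move apart."""
    l_avg = (_length(p) + _length(q)) / 2.0
    (mx, my), (nx, ny) = _midpoint(p), _midpoint(q)
    return l_avg / (l_avg + math.hypot(mx - nx, my - ny))

def importance_compatibility(w_p, w_q):
    """HYPOTHETICAL stand-in: ratio of the smaller to the larger
    positive edge importance (e.g. a centrality-derived weight)."""
    return min(w_p, w_q) / max(w_p, w_q)

def combined_compatibility(p, q, w_p, w_q):
    """ASSUMPTION: all terms are combined by multiplication,
    as FDEB does for its geometric terms."""
    return (angle_compatibility(p, q)
            * position_compatibility(p, q)
            * importance_compatibility(w_p, w_q))

# Two parallel unit edges one unit apart, with importance 1 and 4.
p = ((0.0, 0.0), (1.0, 0.0))
q = ((0.0, 1.0), (1.0, 1.0))
print(combined_compatibility(p, q, 1.0, 4.0))  # 1.0 * 0.5 * 0.25 = 0.125
```

With a multiplicative combination, a low score on any single criterion is enough to keep two edges from being bundled, so an added measure can only restrict the bundling that the geometric terms alone would allow.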

6.2 Future work

As discussed in previous chapters, there are several directions that would be interesting for future investigation. We summarize some of these as follows.

Faithfulness metrics of compound visualization: Compound graph visualization combines several visualization metaphors in a single visualization (see Section 2.2.3 and Section 2.2.4). Examples include Matrix+Link visualizations such as MatLink [164] and NodeTrix [165], which integrate matrix views with node-link diagrams. A compound graph visualization may increase the faithfulness of certain parts of the network while sacrificing the faithfulness of other parts. It would be interesting to have an explicit metric that captures faithfulness in compound visualization. Such a metric could, for example, be expressed as a vector of numbers instead of a single number.
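To make the idea of a vector-valued metric concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the network is split into named parts (for example a matrix region and a node-link region) and that the faithfulness of each part is approximated by the fraction of its data edges that can be read back from the picture. Neither this proxy nor the names `faithfulness_vector`, `data_edges_by_part` and `readable_edges_by_part` come from the thesis; the metrics of Chapter 3 may differ.

```python
def faithfulness_vector(data_edges_by_part, readable_edges_by_part):
    """Return one score in [0, 1] per part of a compound visualization.

    ASSUMPTION: each score is the fraction of a part's data edges that
    remain readable in the picture. This is a hypothetical proxy.
    """
    scores = {}
    for part, data_edges in data_edges_by_part.items():
        readable = readable_edges_by_part.get(part, set())
        if data_edges:
            scores[part] = len(data_edges & readable) / len(data_edges)
        else:
            scores[part] = 1.0  # nothing to misrepresent in an empty part
    return scores

# Hypothetical compound drawing: a dense cluster shown as a matrix,
# and a sparse remainder shown as a cluttered node-link diagram.
data = {
    "matrix": {(1, 2), (2, 3)},
    "node_link": {(3, 4), (4, 5), (5, 6), (6, 3)},
}
readable = {
    "matrix": {(1, 2), (2, 3)},
    "node_link": {(3, 4)},
}
print(faithfulness_vector(data, readable))
# {'matrix': 1.0, 'node_link': 0.25}
```

Keeping one score per part preserves exactly the trade-off described above: a single averaged number would hide the fact that one part is drawn faithfully while another is not.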

Scalability of bundling frameworks: Our bundling methods in the TGI-EB framework and the FStreamEB method in the StreamEB framework are built on Holten and van Wijk’s FDEB method [171]. These methods have a fairly high runtime complexity. To improve runtime performance, we aim to investigate more scalable approaches, such as integration with multilevel agglomerative edge bundling (MINGLE) [134].

Semantic zoom: Our prototype implementations of the TGI-EB and StreamEB frameworks support basic geometric zoom. We aim to explore more effective ways of interacting with edge bundles, for example by using semantic zoom [328].


6.3 Concluding remark

To sum up the significance of this dissertation, we close with a remark based on our new faithfulness criterion: “Good graph visualizations, they must be readable and faithful.” (2012)


Bibliography

References include a page number reference to the citation(s) in this thesis.

[1] jFlowMap. http://code.google.com/p/jflowmap/, 2010. xi, 38, 121
[2] Thomson Reuters. http://thomsonreuters.com/, 2011. 186
[3] J. Abello, I. Finocchi, and J. Korn. Graph sketches. Proceedings of the IEEE Symposium on Information Visualization, pages 67–71, 2001. 154, 163
[4] J. Abello and J. Korn. MGV: A system for visualizing massive multidigraphs. TVCG, 8(1):21–38, 2002. 153
[5] J. Abello and F. van Ham. Matrix Zoom: A visual interface to semi-external graphs. In Information Visualization, 2004. Proceedings. 1999 IEEE Symposium on. IEEE, 2004. 27
[6] J. Abello, F. Van Ham, and N. Krishnan. Ask-graphview: A large scale graph visualization system. Visualization and Computer Graphics, IEEE Transactions on, 12(5):669–676, 2006. 42
[7] C. Aggarwal, Y. Zhao, and S.Y. Philip. On clustering graph streams. In SIAM International Conference on Data Mining, 2010. 162
[8] C.C. Aggarwal, J. Han, J. Wang, and P.S. Yu. A framework for clustering evolving data streams. In VLDB, volume 29, pages 81–92. VLDB Endowment, 2003. 57, 154, 162
[9] C.C. Aggarwal, J. Han, J. Wang, and P.S. Yu. A framework for projected clustering of high dimensional data streams. In Proceedings of the Thirtieth international conference on Very large data bases-Volume 30, pages 852–863. VLDB Endowment, 2004. 57
[10] C.C. Aggarwal, J. Han, J. Wang, and P.S. Yu. On demand classification of data streams. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 503–508. ACM, 2004. 57
[11] C.C. Aggarwal, Y. Li, P.S. Yu, and R. Jin. On dense pattern mining in graph streams. VLDB Endowment, 3(1-2):975–984, 2010. 154

[12] C.C. Aggarwal and H. Wang. Managing and Mining Graph Data, volume 40. Springer-Verlag New York Inc, 2010. 153, 154, 161, 162
[13] A. Ahmed, T. Dwyer, M. Forster, X. Fu, J. W. K. Ho, S. H. Hong, D. Koschützki, C. Murray, N. S. Nikolov, R. Taib, A. Tarassov, and K. Xu. GEOMI: GEOmetry for Maximum Insight. In Patrick Healy and Nikola S. Nikolov, editors, Graph Drawing, volume 3843 of Lecture Notes in Computer Science, pages 468–479. Springer, 2005. 121, 124, 125, 140, 177
[14] R. Albert and A.L. Barabási. Statistical mechanics of complex networks. Reviews of modern physics, 74(1):47, 2002. 6, 49
[15] R. Albert, H. Jeong, and A. L. Barabási. Internet: The diameter of the world wide web. Nature, (6749):130–131, 1999. 6, 49, 51, 52
[16] N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. J. Comput. Syst. Sci., 58:137–147, February 1999. 153, 161
[17] H. Alt and M. Godau. Computing the Fréchet distance between two polygonal curves. International Journal of Computational Geometry and Applications, 5(1):75–91, 1995. 87
[18] H.J.I Alvarez, A.L. Dall, A. Barrat, and A. Vespignani. Large scale networks fingerprinting and visualization using the k-core decomposition. Advances in neural information processing systems, 18:41, 2006. 117
[19] R. Andersen, F. Chung, and K. Lang. Local graph partitioning using PageRank vectors. In Foundations of Computer Science, 2006. FOCS’06. 47th Annual IEEE Symposium on, pages 475–486. IEEE, 2006. 54
[20] N. Andrienko and G. Andrienko. Exploratory analysis of spatial and temporal data. Springer Verlag, 2006. 37, 38, 69
[21] D. Anicic, S. Rudolph, P. Fodor, and N. Stojanovic. Retractable complex event processing and stream reasoning. Rule-Based Reasoning, Programming, and Applications, pages 122–137, 2011. 162
[22] L. Antoine, A. David, and G. Melançon. Living flows: Enhanced exploration of edge-bundled graphs based on GPU-intensive edge rendering. In Information Visualisation (IV), 2010 14th International Conference, pages 523–530. IEEE, 2010. 46, 108
[23] A. Arasu, S. Babu, and J. Widom. The CQL continuous query language: semantic foundations and query execution. The VLDB Journal - The International Journal on Very Large Data Bases, 15(2):121–142, 2006. 58, 162
[24] D. Archambault, T. Munzner, D. Auber, et al. Grouse: Feature-based, steerable graph hierarchy exploration. In Proc. of Eurographics/IEEE VGTC Symp. on Visualization (EuroVis 07), pages 67–74, 2007. 42

[25] D. Archambault, H. Purchase, and B. Pinaud. Animation, small multiples, and the effect of mental map preservation in dynamic graphs. IEEE Transactions on Visualization and Computer Graphics, 17(4):539–552, 2011. 37, 69
[26] S. Arora, S. Rao, and U. Vazirani. Expander flows, geometric embeddings and graph partitioning. In Proceedings of the thirty-sixth annual ACM symposium on Theory of computing, pages 222–231. ACM, 2004. 54
[27] N. Atienza, N. de Castro, C. Cortés, M. Á. Garrido, C. I. Grima, G. Hernández, A. Márquez, A. Moreno, M. Nöllenburg, J. R. Portillo, P. Reyes, J. Valenzuela, M. T. Villar, and A. Wolff. Cover contact graphs. In Graph Drawing, pages 171–182. Springer-Verlag, 2007. 124, 125
[28] B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and issues in data stream systems. In ACM Principles of database systems, pages 1–16. ACM, 2002. 55, 57
[29] B. Babcock, M. Datar, and R. Motwani. Load shedding techniques for data stream systems. In The 2003 Workshop on Management and Processing of Data Streams. Citeseer, 2003. 57, 162
[30] G.D. Bader and C.W.V. Hogue. An automated method for finding molecular complexes in large protein interaction networks. BMC bioinformatics, 4(1):2, 2003. 53, 110
[31] W.W.R. Ball. Mathematical recreations and essays. MacMillan, 1914. xi, 4, 5
[32] M. Balzer and O. Deussen. Level-of-detail visualization of clustered graph layouts. In APVIS, pages 133–140, 2007. 30, 45, 108
[33] M. Balzer, O. Deussen, and C. Lewerentz. Voronoi treemaps for the visualization of software metrics. In Proceedings of the 2005 ACM symposium on Software visualization, pages 165–172. ACM, 2005. 27, 28
[34] Z. Bar-Yossef, R. Kumar, and D. Sivakumar. Reductions in streaming algorithms, with an application to counting triangles in graphs. In SCDA, pages 623–632. Society for Industrial and Applied Mathematics, 2002. 153, 154, 161
[35] A.L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999. 49
[36] A.L. Barabási and Z.N. Oltvai. Network biology: understanding the cell’s functional organization. Nature Reviews Genetics, 5(2):101–113, 2004. 49
[37] D.F. Barbieri, D. Braga, S. Ceri, E.D. Valle, and M. Grossniklaus. Querying RDF streams with C-SPARQL. ACM SIGMOD Record, 39(1):20–26, 2010. 56, 154, 162
[38] V. Batagelj and M. Zaversnik. An O(m) algorithm for cores decomposition of networks. Arxiv preprint cs/0310049, 2003. 53, 102
[39] G. D. Battista, P. Eades, R. Tamassia, and I. G. Tollis. Graph Drawing: Algorithms for the Visualization of Graphs. Prentice-Hall, 1999. 22, 68

[40] M. Baur and U. Brandes. Crossing reduction in circular layouts. In WG 2004, Volume 3353 of LNCS, pages 332–343. Springer, 2004. 24
[41] R.A. Becker, S.G. Eick, and A.R. Wilks. Visualizing network data. Visualization and Computer Graphics, IEEE Transactions on, 1(1):16–28, 1995. xi, 26
[42] Michael A. Bekos, Michael Kaufmann, Antonios Symvonis, and Alexander Wolff. Boundary labeling: Models and efficient algorithms for rectangular maps. In Graph Drawing, pages 49–59, 2004. 125
[43] M.R. Berthold and D.J. Hand. Intelligent data analysis: an introduction. Springer, 2007. 53
[44] E. Bertini, A. Tatu, and D. Keim. Quality metrics in high-dimensional data visualization: An overview and systematization. Visualization and Computer Graphics, IEEE Transactions on, 17(12):2203–2212, 2011. 67
[45] Carla Binucci, Ulrik Brandes, Giuseppe Di Battista, Walter Didimo, Marco Gaertler, Pietro Palladino, Maurizio Patrignani, Antonios Symvonis, and Katharina Anna Zweig. Drawing trees in a streaming model. In Graph Drawing, pages 292–303, 2009. 125, 154, 163
[46] P. Bonacich. Factoring and weighting approaches to status scores and clique identification. The Journal of Mathematical Sociology, 2(1):113–120, 1972. 50, 52, 102
[47] P. Bonacich. Power and centrality: A family of measures. American journal of sociology, pages 1170–1182, 1987. 52
[48] I. Boyandin, E. Bertini, and D. Lalanne. Using flow maps to explore migrations over time. In Geospatial Visual Analytics Workshop, volume 2, 2010. 163
[49] F.J. Brandenburg, D. Eppstein, M.T. Goodrich, S.G. Kobourov, G. Liotta, and P. Mutzel. Selected open problems in graph drawing. In Graph Drawing, pages 515–539. Springer, 2004. 125
[50] U. Brandes. Drawing on physical analogies. Drawing Graphs, pages 71–86, 2001. 68
[51] U. Brandes, T. Dwyer, and F. Schreiber. Visualizing related metabolic pathways in two and a half dimensions. In Graph Drawing, pages 111–122. Springer, 2004. 35, 119, 125, 145
[52] U. Brandes and T. Erlebach. Network analysis: methodological foundations. Springer Verlag, 2005. 21, 52, 73, 109, 110, 113, 116
[53] U. Brandes and M. Mader. A quantitative comparison of stress-minimization approaches for offline dynamic graph drawing. In Graph Drawing, pages 99–110. Springer, 2012. 39, 69, 84, 163
[54] U. Brandes and C. Pich. Eigensolver methods for progressive multidimensional scaling of large data. In Graph Drawing, pages 42–53. Springer, 2007. 83

[55] U. Brandes and C. Pich. An experimental study on distance-based graph drawing. In Graph Drawing, pages 218–229. Springer, 2009. 10
[56] U. Brandes and D. Wagner. A bayesian paradigm for dynamic graph layout. In Graph Drawing, pages 236–247. Springer, 1997. 37, 39, 163
[57] U. Brandes and D. Wagner. Using graph layout to visualize train interconnection data. In Graph Drawing, pages 44–56. Springer, 1998. 44, 106
[58] Ulrik Brandes and Christian Pich. Eigensolver methods for progressive multidimensional scaling of large data. In Graph Drawing, pages 42–53, 2006. 68
[59] J. Branke. Dynamic graph drawing. Drawing graphs, pages 228–246, 2001. 37
[60] S.S. Bridgeman and R. Tamassia. A user study in similarity measures for graph drawing. J. Graph Algorithms Appl., 6(3):225–254, 2002. 87
[61] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Graph structure in the web. Computer networks, 33(1):309–320, 2000. 153
[62] A. Buja, D.F. Swayne, M.L. Littman, N. Dean, H. Hofmann, and L. Chen. Data visualization with multidimensional scaling. Journal of Computational and Graphical Statistics, 17(2):444–472, 2008. 68
[63] M. Burch, C. Vehlow, F. Beck, S. Diehl, and D. Weiskopf. Parallel edge splatting for scalable dynamic graph visualization. Visualization and Computer Graphics, IEEE Transactions on, 17(12):2344–2353, 2011. 37
[64] C. Calero, R. Buter, C. Cabello Valdés, and ED Noyons. How to identify research groups using publication analysis: an example in the field of nanotechnology. Scientometrics, 66(2):365–376, 2006. 53, 102, 110
[65] S.K. Card, J.D. Mackinlay, and B. Shneiderman. Readings in information visualization: using vision to think. Morgan Kaufmann, 1999. 18, 19
[66] S.K. Card, T.P. Moran, and A. Newell. The psychology of human-computer interaction. CRC, 1986. 19
[67] S. Carpendale. Evaluating information visualizations. Information Visualization, pages 19–45, 2008. 4, 11, 65, 67
[68] J. Carriere and R. Kazman. Interacting with huge hierarchies: beyond cone trees. In Information Visualization, 1995. Proceedings., pages 74–81. IEEE, 1995. xi, 23, 24, 31
[69] P. Caserta, O. Zendra, and D. Bodénès. 3D hierarchical edge bundles to visualize relations in a software city metaphor. In Visualizing Software for Understanding and Analysis (VISSOFT), 2011 6th IEEE International Workshop on, pages 1–8. IEEE, 2011. 119
[70] M. Chen, D. Ebert, H. Hagen, R.S. Laramee, R. Van Liere, K.L. Ma, W. Ribarsky, G. Scheuermann, and D. Silver. Data, information, and knowledge in visualization. Computer Graphics and Applications, IEEE, 29(1):12–19, 2009. 4, 11, 65, 67
[71] E.H. Chi. A Framework for Information Visualization Spreadsheets. PhD thesis, University of Minnesota, 1999. 19
[72] G. Chin, M. Singhal, G. Nakamura, V. Gurumoorthi, and N. Freeman-Cadoret. Visual analysis of dynamic data streams. InfoVis, 8(3):212, 2009. 154, 163
[73] A. Clauset, M.E.J. Newman, and C. Moore. Finding community structure in very large networks. Physical review E, 70(6):066111, 2004. 54
[74] M.K. Coleman and D.S. Parker. Aesthetics-based graph layout for human consumption. Software: Practice and Experience, 26(12):1415–1438, 1996. 68
[75] C. Collins and S. Carpendale. VisLink: Revealing relationships amongst visualizations. IEEE Transactions on Visualization and Computer Graphics, 13(6):1192–1199, 2007. 119
[76] Robert Cooley, Bamshad Mobasher, Jaideep Srivastava, et al. Data preparation for mining world wide web browsing patterns. Knowledge and information systems, 1(1):5–32, 1999. 6
[77] G. Cormode and S. Muthukrishnan. Space efficient mining of multigraph streams. In ACM SIGMOD-SIGACT-SIGART Principles of database systems, pages 271–282. ACM, 2005. 154, 161
[78] B. Cornelissen, D. Holten, A. Zaidman, L. Moonen, J.J. Van Wijk, and A. Van Deursen. Understanding execution traces using massive sequence and circular bundle views. In IEEE International Conference on Program Comprehension (ICPC), pages 49–58. IEEE, 2007. 25, 106
[79] B. Cornelissen, A. Zaidman, D. Holten, L. Moonen, A. Van Deursen, and J.J. van Wijk. Execution trace analysis through massive sequence and circular bundle views. Journal of Systems and Software, 81(12):2252–2268, 2008. 25, 46, 106
[80] W. Cui and H. Qu. A survey on graph visualization. Hong Kong University of Science and Technology, 2007. 22
[81] W. Cui, H. Zhou, H. Qu, P.C. Wong, and X. Li. Geometry-based edge clustering for graph visualization. TVCG, pages 1277–1284, 2008. 45, 85, 102, 108, 135, 155, 163
[82] A. Das Sarma, S. Gollapudi, and R. Panigrahy. Estimating PageRank on graph streams. In ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 69–78. ACM, 2008. 154, 161
[83] A. Dasgupta and R. Kosara. Adaptive privacy-preserving visualization using parallel coordinates. Visualization and Computer Graphics, IEEE Transactions on, 17(12):2241–2248, 2011. 67
[84] R. Davidson and D. Harel. Drawing graphs nicely using simulated annealing. ACM Transactions on Graphics (TOG), 15(4):301–331, 1996. 25, 68

[85] W. De Pauw and H. Andrade. Visualizing large-scale streaming applications. InfoVis, 8(2):87, 2009. 154, 163
[86] C. Demetrescu, I. Finocchi, and A. Ribichini. Trading off space for passes in graph streaming problems. ACM Transactions on Algorithms (TALG), 6(1):6, 2009. 161
[87] I.S. Dhillon, Y. Guan, and B. Kulis. Weighted graph cuts without eigenvectors: a multilevel approach. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 29(11):1944–1957, 2007. 54
[88] G Di Battista, E Pietrosanti, R Tamassia, and IG Tollis. Automatic layout of pert diagrams with x-pert. In Visual Languages, 1989., IEEE Workshop on, pages 171–176. IEEE, 1989. 6
[89] M. Dickerson, D. Eppstein, M.T. Goodrich, and J.Y. Meng. Confluent drawings: visualizing non-planar diagrams in a planar way. In Graph Drawing, pages 1–12. Springer, 2004. 44, 63, 68
[90] S. Diehl and C. Görg. Graphs, they are changing. In Graph drawing, pages 23–31. Springer, 2002. 37, 38, 69
[91] J. Diesner, T. L. Frantz, and K. M. Carley. Communication networks from the Enron email corpus “it’s always about the people. Enron is no different”. Comput. Math. Organ. Theory, 11(3):201–228, October 2005. 8
[92] T. Do, S. Loke, and F. Liu. Answer set programming for stream reasoning. Advances in Artificial Intelligence, pages 104–109, 2011. 162
[93] P. Domingos and G. Hulten. A general method for scaling up machine learning algorithms and its application to clustering. In Machine Learning - International Workshop then Conference -, pages 106–113, 2001. 56
[94] G. Dong, J. Han, L.V.S. Lakshmanan, J. Pei, H. Wang, and P.S. Yu. Online mining of changes from data streams: Research problems and preliminary results. In Proceedings of the 2003 ACM SIGMOD Workshop on Management and Processing of Data Streams, 2003. 58
[95] S.N. Dorogovtsev and J.F.F. Mendes. Evolution of networks. Advances in physics, 51(4):1079–1187, 2002. 49
[96] C. A. Duncan, C. Gutwenger, L. Nachmanson, and G. Sander. Graph drawing contest report. In Proceedings of the 18th international conference on Graph drawing, GD’10, pages 406–411. Springer-Verlag, 2011. 121
[97] Christian A. Duncan, Gunnar W. Klau, Stephen G. Kobourov, and Georg Sander. Graph-drawing contest report. In Graph Drawing, pages 448–452, 2006. 125
[98] Christian A. Duncan, Stephen G. Kobourov, and Georg Sander. Graph drawing contest report. In Graph Drawing, pages 395–400, 2007. 125

[99] T. Dwyer. Two-and-a-half-dimensional Visualisation of Relational Networks. PhD thesis, School of Information Technologies, Faculty of Science, University of Sydney, 2004. 35, 119, 145
[100] T. Dwyer, K. Marriott, and M. Wybrow. Integrating edge routing into force-directed layout. In Graph Drawing, pages 8–19. Springer, 2007. 25, 45
[101] P. Eades. A heuristic for graph drawing, 1984. 25, 83, 175
[102] P. Eades and Q.W. Feng. Multilevel visualization of clustered graphs. In Graph drawing, pages 101–112. Springer, 1997. 30, 119, 145
[103] P. Eades, C. Gutwenger, S. H. Hong, and P. Mutzel. Graph drawing algorithms. In Algorithms and theory of computation handbook, pages 6–6. Chapman & Hall/CRC, 2010. 68
[104] P. Eades, Lai W., K. Misue, and Sugiyama K. Preserving the mental map of a diagram. In Proceedings of COMPUGRAPHICS, pages 34–43. International Institute for Advanced Study of Social Information Science, Fujitsu Limited, 1991. 38, 69, 76
[105] M. Eiglsperger, S. Fekete, and G. Klau. Orthogonal graph drawing. Drawing graphs, pages 121–171, 2001. 104
[106] M. Eiglsperger, U. Fößmeier, and M. Kaufmann. Orthogonal graph drawing with constraints. In Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms, pages 3–11. Society for Industrial and Applied Mathematics, 2000. 104
[107] Geoffrey Ellis and Alan Dix. A taxonomy of clutter reduction for Information Visualization. IEEE Transactions on Visualization and Computer Graphics, 13(6):1216–1223, 2007. 41
[108] D. Eppstein, M. Goodrich, and J. Meng. Delta-confluent drawings. In Graph Drawing, pages 165–176. Springer, 2006. 63
[109] D. Eppstein, M.T. Goodrich, and J.Y. Meng. Confluent layered drawings. Algorithmica, 47(4):439–452, 2007. xii, 64
[110] O. Ersoy, C. Hurter, F. Paulovich, G. Cantareiro, and A. Telea. Skeleton-based edge bundling for graph visualization. TVCG, 17(12):2364–2373, 2011. 108, 155
[111] C. Erten, S. Kobourov, V. Le, and A. Navabi. Simultaneous graph drawing: Layout algorithms and visualization schemes. In Graph Drawing, pages 437–449. Springer, 2004. 39, 163
[112] L. Euler. Solutio problematis ad geometriam situs pertinentis. Commentarii academiae scientiarum Petropolitanae, 8:128–140, 1741. 4
[113] Beck F., Puppe M., Braun P., Burch M., and Diehl S. Edge bundling without reducing the source to target traceability. In InfoVis, pages 298–305. IEEE, 2011. 109

[114] M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet topology. In ACM SIGCOMM Computer Communication Review, volume 29, pages 251–262. ACM, 1999. 49, 153
[115] J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang. On graph problems in a semi-streaming model. Theoretical Computer Science, 348(2-3):207–216, 2005. 56, 153, 154, 161
[116] J. Feigenbaum, S. Kannan, M. Strauss, and M. Viswanathan. An approximate L1-difference algorithm for massive data streams. In Foundations of Computer Science, pages 501–511. IEEE, 1999. 153, 161
[117] Q. Feng. Algorithms for drawing clustered graphs. PhD thesis, University of Newcastle, Australia, 1997. 31
[118] Q. Feng, R. Cohen, and P. Eades. How to draw a planar clustered graph. Computing and Combinatorics, pages 21–30, 1995. 68
[119] P. Fergus, J. Haggerty, M. Taylor, and L. Bracegirdle. Towards a whole body sensing platform for healthcare applications. Whole Body Interaction, pages 135–149, 2011. 162
[120] M.C. Ferreira de Oliveira and H. Levkowitz. From visual data exploration to visual data mining: A survey. Visualization and Computer Graphics, IEEE Transactions on, 9(3):378–394, 2003. 2
[121] B. Finkel and R. Tamassia. Curvilinear graph drawing using the force-directed method. In GD 2004, pages 448–453. Springer. 83
[122] G.W. Flake, R.E. Tarjan, and K. Tsioutsiouliklis. Graph clustering and minimum cut trees. Internet Mathematics, 1(4):385–408, 2004. 153, 162
[123] S. Fortunato and M. Barthelemy. Resolution limit in community detection. Proceedings of the National Academy of Sciences, 104(1):36–41, 2007. 54
[124] F. Frati, M. Kaufmann, and S. G. Kobourov. Constrained simultaneous and near-simultaneous embeddings. In Graph Drawing, pages 268–279. Springer-Verlag, 2007. 125
[125] L.C. Freeman. A set of measures of centrality based on betweenness. Sociometry, 40(1):35–41, 1977. 50, 51, 102
[126] L.C. Freeman. Centrality in social networks conceptual clarification. Social networks, 1(3):215–239, 1979. 50, 51, 52, 102
[127] L.C. Freeman, S.P. Borgatti, and D.R. White. Centrality in valued graphs: A measure of betweenness based on network flow. Social Networks, 13(2):141–154, 1991. 50, 102
[128] Y. Frishman and A. Tal. Dynamic drawing of clustered graphs. In Information Visualization, 2004. INFOVIS 2004. IEEE Symposium on, pages 191–198. IEEE, 2004. 37, 39, 69

[129] Y. Frishman and A. Tal. Online dynamic graph drawing. IEEE Transactions on Visualization and Computer Graphics, 14(4):727–740, 2008. 37
[130] T. M. J. Fruchterman and E. M. Reingold. Graph drawing by force-directed placement. Softw. Pract. Exper., 21(11):1129–1164, 1991. 25, 83
[131] P. Gajer, M.T. Goodrich, S.G. Kobourov, et al. A fast multi-dimensional algorithm for drawing large graphs. In Graph Drawing, pages 211–221, 2000. 10
[132] S. Ganguly and B. Saha. On estimating path aggregates over streaming graphs. Algorithms and Computation, pages 163–172, 2006. 154
[133] E. Gansner, Y. Hu, and S. North. Visualizing streaming text data with dynamic maps. In Graph Drawing, pages 439–450. Springer, 2012. 29, 37, 91
[134] E. Gansner, Y. Hu, S. North, and C. Scheidegger. Multilevel agglomerative edge bundling for visualizing large graphs. In Pacific Visualization Symposium (PacificVis), 2011 IEEE, pages 187–194. IEEE, 2011. 46, 63, 85, 86, 108, 150, 163, 192, 198
[135] E. Gansner and Y. Koren. Improved circular layouts. In GD, pages 386–398. Springer, 2006. 25, 45, 102, 106, 155
[136] E. Gansner, Y. Koren, and S. North. Graph drawing by stress majorization. In Graph Drawing, pages 239–250. Springer, 2005. 39, 163
[137] E. R. Gansner and S. C. North. Improved force-directed layouts. In Graph Drawing, pages 364–373, 1998. 25, 68, 83
[138] E.R. Gansner, Y. Hu, and S. Kobourov. GMap: Visualizing graphs and clusters as maps. In PacificVis, pages 201–208. IEEE, 2010. xi, xii, 29, 91, 92
[139] E.R. Gansner, E. Koutsofios, S.C. North, and K.P. Vo. A technique for drawing directed graphs. Software Engineering, IEEE Transactions on, 19(3):214–230, 1993. 44
[140] E.R. Gansner and S.C. North. An open graph visualization system and its applications to software engineering. Software: practice and experience, 30(11):1203–1233, 2000. 6, 44
[141] M. R. Garey and D. S. Johnson. Crossing number is NP-complete. In SIAM Journal on Algebraic and Discrete Methods, volume 4, pages 312–316, 1983. 43
[142] M. Ghoniem, J.D. Fekete, and P. Castagliola. On the readability of graphs using node-link and matrix-based representations: a controlled experiment and statistical analysis. Information Visualization, 4(2):114–135, 2005. 8, 90
[143] D. Gibson, R. Kumar, and A. Tomkins. Discovering large dense subgraphs in massive graphs. In VLDB, pages 721–732. VLDB Endowment, 2005. 162
[144] A.C. Gilbert, S. Guha, P. Indyk, Y. Kotidis, S. Muthukrishnan, and M.J. Strauss. Fast, small-space algorithms for approximate histogram maintenance. In ACM symposium on Theory of computing, pages 389–398. ACM, 2002. 161

[145] A.C. Gilbert, Y. Kotidis, S. Muthukrishnan, and M.J. Strauss. One-pass wavelet decompositions of data streams. Knowledge and Data Engineering, IEEE Transactions on, 15(3):541–554, 2003. 57
[146] E.W. Gilbert. Pioneer maps of health and disease in England. Geographical Journal, pages 172–183, 1958. 3
[147] M. Girvan and M.E.J. Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12):7821–7826, 2002. 49, 54
[148] X. Goaoc, J. Kratochvíl, Y. Okamoto, C. Shin, and A. Wolff. Moving vertices to make drawings plane. In GD’07: Proceedings of the 15th international conference on Graph drawing, pages 101–112, Berlin, Heidelberg, 2008. Springer-Verlag. 125
[149] M. Greenwald and S. Khanna. Space-efficient online computation of quantile summaries. In ACM SIGMOD Record, volume 30, pages 58–66. ACM, 2001. 161
[150] M. Gronemann and M. Jünger. Drawing clustered graphs as topographic maps. In Graph Drawing, pages 426–438. Springer, 2012. 29, 91
[151] J.L. Gross and J. Yellen. Handbook of graph theory. CRC, 2003. 22
[152] J.L. Gross and J. Yellen. Graph theory and its applications. Chapman & Hall/CRC, 2005. 22
[153] S. Guha, N. Koudas, and K. Shim. Approximation and streaming algorithms for histogram construction problems. TODS, 31(1):396–438, 2006. 153, 154, 161
[154] R. Guimera, M. Sales-Pardo, and L.A.N. Amaral. Modularity from fluctuations in random graphs and complex networks. Physical Review E, 70(2):025101, 2004. 54
[155] C. Gutwenger, M. Jünger, G. W. Klau, S. Leipert, and P. Mutzel. Graph drawing algorithm engineering with AGD. Springer, 2002. 68
[156] S. Hachul and M. Jünger. Drawing large graphs with a potential-field-based multilevel algorithm. In Graph Drawing, pages 285–295. Springer, 2005. 10
[157] Steffen Hadlak, H-J Schulz, and Heidrun Schumann. In situ exploration of large dynamic networks. Visualization and Computer Graphics, IEEE Transactions on, 17(12):2334–2343, 2011. 37
[158] D.J. Hand. Statistics and data mining: intersecting disciplines. ACM SIGKDD Explorations Newsletter, 1(1):16–19, 1999. 53
[159] D.J. Hand, H. Mannila, and P. Smyth. Principles of data mining. MIT press, 2001. 53, 55
[160] D. Harel. On visual formalisms. Communications of the ACM, 31(5):514–530, 1988. 31
[161] D. Harel and Y. Koren. A fast multi-scale method for drawing large graphs. In Graph Drawing, pages 235–287. Springer, 2001. 25, 68

[162] M. Hemmje, C. Kunkel, and A. Willett. LyberWorld - a visualization user interface supporting fulltext retrieval. In Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval, pages 249–259. Springer-Verlag New York, Inc., 1994. 34
[163] N. Henry and J.D. Fekete. Matrix Explorer: A dual-representation system to explore social networks. IEEE Transactions on Visualization and Computer Graphics, 12(5):677–684, 2006. 26, 27, 90
[164] N. Henry and J.D. Fekete. Matlink: Enhanced matrix visualization for analyzing social networks. Human-Computer Interaction–INTERACT, pages 288–302, 2007. 26, 32, 91, 93, 198
[165] N. Henry, J.D. Fekete, and M.J. McGuffin. NodeTrix: a hybrid visualization of social networks. TVCG, 13(6):1302–1309, 2007. 32, 91, 93, 198
[166] M.R. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on data streams. Technical report, Technical Note 1998-011, Digital Systems Research Center, Palo Alto, CA, 1998. 153, 161
[167] I. Herman, G. Melançon, and M.S. Marshall. Graph visualization and navigation in information visualization: A survey. Visualization and Computer Graphics, IEEE Transactions on, 6(1):24–43, 2000. 22, 23
[168] S.L. Hibino. A task-oriented view of information visualization. In Conference on Human Factors in Computing Systems: CHI’99 extended abstracts on Human factors in computing systems, volume 15, pages 178–179, 1999. 21, 73
[169] J. Ho and S.H. Hong. Drawing clustered graphs in three dimensions. In Graph Drawing, pages 492–502. Springer, 2005. 35, 119, 121, 140, 145
[170] D. Holten. Hierarchical edge bundles: Visualization of adjacency relations in hierarchical data. TVCG, pages 741–748, 2006. xii, 31, 32, 45, 63, 85, 102, 106, 155, 163, 164, 176
[171] D. Holten and J. J. van Wijk. Force-directed edge bundling for graph visualization. Computer Graphics Forum, 28(3):983–990, 2009. xii, xiii, 46, 47, 85, 86, 87, 88, 89, 102, 107, 109, 110, 111, 135, 136, 155, 157, 163, 165, 171, 172, 173, 178, 198
[172] S.H. Hong and T. Murtagh. Visualisation of large and complex networks using polyplane. In Graph Drawing, pages 471–481. Springer, 2004. 35
[173] Susan Horwitz and Thomas Reps. The use of program dependence graphs in software engineering. In Proceedings of the 14th international conference on Software engineering, pages 392–411. ACM, 1992. 6
[174] Y. Hu. Efficient, high-quality force-directed graph drawing. Mathematica Journal, 10(1):37–71, 2005. 10

[175] Y. Hu, S.G. Kobourov, and S. Veeramoni. Embedding, clustering and coloring for dynamic maps. In Pacific Visualization Symposium (PacificVis), 2012 IEEE, pages 33–40. IEEE, 2012. 29, 91
[176] Y.F. Hu. Efficient and high quality force-directed graph drawing. The Mathematica Journal, 10:37–71, 2005. 83
[177] M. Huang and P. Eades. A fully animated interactive system for clustering and navigating huge graphs. In Graph Drawing, pages 374–383. Springer, 1998. 37, 69
[178] M.L. Huang, P. Eades, and J. Wang. On-line animated visualization of huge graphs using a modified spring algorithm. Journal of Visual Language and Computing, 9(6):623–645, 1998. 39, 163
[179] T. Hughes, Y. Hyun, and D.A. Liberles. Visualising very large phylogenetic trees in three dimensional hyperbolic space. BMC bioinformatics, 5(1):48, 2004. 33
[180] J. Hullman, E. Adar, and P. Shah. Benefitting InfoVis with visual difficulties. Visualization and Computer Graphics, IEEE Transactions on, 17(12):2213–2222, 2011. 67
[181] J. Hullman and N. Diakopoulos. Visualization rhetoric: Framing effects in narrative visualization. Visualization and Computer Graphics, IEEE Transactions on, 17(12):2231–2240, 2011. 67
[182] C Hurter, O Ersoy, and A Telea. Smooth Bundling of Large Streaming and Sequence Graphs. In IEEE PacificVis (to appear). IEEE, 2013. 87
[183] R.F. i Cancho and R.V. Solé. The small world of human language. Proceedings of the Royal Society of London. Series B: Biological Sciences, 268(1482):2261–2265, 2001. 6, 49
[184] P. Indyk. Stable distributions, pseudorandom generators, embeddings and data stream computation. In Foundations of Computer Science, pages 189–197. IEEE, 2000. 161
[185] P. Indyk and D. Woodruff. Optimal approximations of the frequency moments of data streams. In ACM symposium on Theory of computing, pages 202–208. ACM, 2005. 161
[186] Robert J. K. Jacob. A state transition diagram language for visual programming. Computer, 18:51–59, 1985. 6
[187] H. Jänicke, T. Weidner, D. Chung, R.S. Laramee, P. Townsend, and M. Chen. Visual reconstructability as a quality metric for flow visualization. In Computer Graphics Forum, volume 30, pages 781–790. Wiley Online Library, 2011. 64, 67
[188] S. Janowski, B. Kormeier, K. Hippe, Q. Nguyen, Seok-Hee Hong, R. Hofestadt, J. Stoye, B. Kaltschmidt, and C. Kaltschmidt. Reconstruction and analysis of biological networks based on large scale data from the NF-kB pathway. In International Conference on Integrative Bioinformatics, 2011. 102, 103

[189] N. Jaworska and A. Chupetlovska-Anastasova. A review of multidimensional scaling (MDS) and its utility in various psychological domains. Tutorials in Quantitative Methods for Psychology, 5(1):1–10, 2009. 68 [190] H. Jeong, S.P. Mason, A.L. Barabási, and Z.N. Oltvai. Lethality and centrality in protein networks. Nature, 411(6833):41–42, 2001. 130 [191] H. Jeong, B. Tombor, R. Albert, Z.N. Oltvai, and A.L. Barabási. The large-scale organization of metabolic networks. Nature, 407(6804):651–654, 2000. 49 [192] B. Johnson and B. Shneiderman. Tree-maps: A space-filling approach to the visualization of hierarchical information structures. In Visualization, 1991. Visualization’91, Proceedings., IEEE Conference on, pages 284–291. IEEE, 1991. 23, 27, 28 [193] H. Jowhari and M. Ghodsi. New streaming algorithms for counting triangles in graphs. Computing and Combinatorics, pages 710–716, 2005. 161 [194] B.H. Junker, D. Koschützki, and F. Schreiber. Exploration of biological network centralities with centibin. BMC bioinformatics, 7(1):219, 2006. 52 [195] Barbara Kaltschmidt and Christian Kaltschmidt. Nf-kappab in the nervous system. Cold Spring Harb Perspect Biol, 1(3):a001271, Sep 2009. 130 [196] T. Kamada and S. Kawai. An algorithm for drawing general undirected graphs. Information processing letters, 31(1):7–15, 1989. 25, 68 [197] D.R. Karger. Random sampling in cut, flow, and network design problems. In ACM symposium on Theory of computing, pages 648–657. ACM, 1994. 162 [198] H. Kargupta, R. Bhargava, K. Liu, M. Powers, P. Blair, S. Bushra, J. Dull, K. Sarkar, M. Klein, M. Vasa, et al. Vedas: A mobile and distributed data stream mining system for real-time vehicle monitoring. In Proceedings of SIAM International Conference on Data Mining, volume 334, 2004. 57 [199] B. Karrer, E. Levina, and M.E.J. Newman. Robustness of community structure in networks. Physical Review E, 77(4):046119, 2008. 54 [200] G. Karypis and V. Kumar.
A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20(1):359–392, 1998. 54 [201] D.A. Keim et al. Information visualization and visual data mining. IEEE transactions on Visualization and Computer Graphics, 8(1):1–8, 2002. 3 [202] B.W. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs. Bell System Technical Journal, 49(2):291–307, 1970. 162 [203] W. Kienreich and C. Seifert. An application of edge bundling techniques to the visualization of media analysis results. In Information Visualisation (IV), 2010 14th International Conference, pages 375–380. IEEE, 2010. 85, 87, 88, 109, 155, 163, 165, 171, 172

[204] E. Kleiberg, H. Van De Wetering, and J.J. Van Wijk. Botanical visualization of huge hierarchies. In Proceedings of the IEEE Symposium on Information Visualization, page 87, 2001. 34 [205] T. Kohonen, S. Kaski, K. Lagus, J. Salojarvi, J. Honkela, V. Paatero, and A. Saarela. Self organization of a massive document collection. Neural Networks, IEEE Transactions on, 11(3):574–585, 2000. 55 [206] R. Kosara. Visualization criticism-the missing link between information visualization and art. In The 11th International Conference Information Visualization (IV’07), pages 631–636. IEEE, 2007. 2 [207] D. Koschützki and F. Schreiber. Comparison of centralities for biological networks. In Proc German Conf Bioinformatics (GCB’04), volume 53, pages 199–206. Citeseer. 50, 102 [208] V. Krishnamurthy, M. Faloutsos, M. Chrobak, L. Lao, LH. Cui, and AG. Percus. Reducing large internet topologies for faster simulations. In IFIP Networking, 2005. 48 [209] J.B. Kruskal. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29(1):1–27, 1964. 83 [210] Gautam Kumar and Michael Garland. Visual exploration of complex time-varying graphs. Visualization and Computer Graphics, IEEE Transactions on, 12(5):805–812, 2006. 37 [211] H. Lam, E. Bertini, P. Isenberg, C. Plaisant, and S. Carpendale. Seven guiding scenarios for information visualization evaluation. Technical Report, UNIVERSITY OF CALGARY Calgary, 2011. 4, 11, 65, 67 [212] H. Lam, E. Bertini, P. Isenberg, C. Plaisant, and S. Carpendale. Empirical studies in information visualization: Seven scenarios. IEEE Transactions on Visualization and Computer Graphics, 18(9):1520–1536, 2012. 67 [213] A. Lambert, R. Bourqui, and D. Auber. 3D Edge Bundling for Geographical Data Visualization. In InfoVis, pages 329–335. IEEE, 2010. 86, 108 [214] A. Lambert, R. Bourqui, and D. Auber. Winding Roads: Routing edges into bundles. In Computer Graphics Forum, volume 29, pages 853–862.
Wiley Online Library, 2010. 45, 85, 102, 108, 155, 163 [215] Y.Y. Lee, C.C. Lin, and H.C. Yen. Mental map preserving graph drawing using simulated annealing. In Proceedings of the 2006 Asia-Pacific Symposium on Information Visualisation, volume 60, pages 179–188. Australian Computer Society, Inc., 2006. 37, 39, 69 [216] T. Leighton and S. Rao. An approximate max-flow min-cut theorem for uniform multicommodity flow problems with applications to approximation algorithms. In Foundations of Computer Science, 1988., 29th Annual Symposium on, pages 422–431. IEEE, 1988. 54

[217] T. Leighton and S. Rao. Multicommodity max-flow min-cut theorems and their use in designing approximation algorithms. Journal of the ACM (JACM), 46(6):787–832, 1999. 54 [218] J. Leskovec and E. Horvitz. Planetary-scale views on a large instant-messaging network. In Proceedings of the 17th international conference on World Wide Web, WWW ’08, pages 915–924, New York, NY, USA, 2008. ACM. 8 [219] W. Li, P. Eades, and N. Nikolov. Using spring algorithms to remove node
overlapping. In Proceedings of the 2005 Asia-Pacific symposium on Information visualisation-Volume 45, pages 131–140. Australian Computer Society, Inc., 2005. 25 [220] F. Liljeros, C. R. Edling, L. A. Amaral, H. E. Stanley, and Y. Aberg. The web of human sexual contacts. Nature, 411(6840):907–908, June 2001. 6, 49 [221] C.C. Lin, Y.Y. Lee, and H.C. Yen. Mental map preserving graph drawing using simulated annealing. Information Sciences, 2011. 37, 39, 69 [222] Hui Liu. Dynamic concept cartography for social networks. PhD thesis, 2011. 29 [223] N. Lloyd. Clutter measurement and reduction for enhanced information visualization. PhD thesis, Worcester Polytechnic Institute, 2005. 21 [224] J. Mackinlay. Automating the design of graphical presentations of relational information. ACM Transactions on Graphics (TOG), 5(2):110–141, 1986. 20 [225] K. V. Mardia, J. T. Kent, and J. M. Bibby. Multivariate Analysis. Academic Press, London, 1980. 55 [226] D. Mashima, S. Kobourov, and Y. Hu. Visualizing dynamic data with maps. Visualization and Computer Graphics, IEEE Transactions on, 18(9):1424–1437, 2012. 29, 37, 91 [227] T. Matsuyama, X. Wu, T. Takai, and S. Nobuhara. Real-time 3d shape reconstruction, dynamic 3d mesh deformation, and high fidelity visualization for 3d video. Computer Vision and Image Understanding, 96(3):393–434, 2004. 64, 67 [228] A. McGregor. Finding graph matchings in data streams. Approximation, Randomization and Combinatorial Optimization. Algorithms and Techniques, pages 611–612, 2005. 161 [229] G. A Miller. WordNet: a lexical database for English. Communications of the ACM, 38(11):39–41, 1995. 6 [230] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: simple building blocks of complex networks. Science Signalling, 298(5594):824, 2002. 50 [231] A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee. Measurement and analysis of online social networks. 
In Proceedings of the 7th ACM SIGCOMM conference on Internet measurement, IMC ’07, pages 29–42, New York, NY, USA, 2007. ACM. 8

[232] K. Misue, P. Eades, W. Lai, and K. Sugiyama. Layout adjustment and the mental map. Journal of visual languages and computing, 6(2):183–210, 1995. 10, 25, 38, 69, 163 [233] J. Moody, D. McFarland, and S. Bender-deMoll. Dynamic network visualization. American Journal of Sociology, 110(4):1206–1241, 2005. 39, 163 [234] T. Munzner. H3: Laying out large directed graphs in 3D hyperbolic space. In Information Visualization, 1997. Proceedings., IEEE Symposium on, pages 2–10. IEEE, 1997. 33 [235] S. Muthukrishnan. Data streams: algorithms and applications. In SODA, pages 413–413, 2003. 55, 57, 58 [236] P. Mutzel. The SPQR-tree data structure in graph drawing. Springer, 2003. 68 [237] E. Namey, G. Guest, L. Thairy, and L. Johnson. Data reduction techniques for large qualitative data sets. Handbook for team-based qualitative research, pages 137–162, 2007. 68 [238] A. A. Nanavati, S. Gurumurthy, G. Das, D. Chakraborty, K. Dasgupta, S. Mukherjea, and A. Joshi. On the structural properties of massive telecom call graphs: findings and implications. In Proceedings of the 15th ACM international conference on Information and knowledge management, CIKM ’06, pages 435–444, New York, NY, USA, 2006. ACM. 8 [239] F.J. Newbery. Edge concentration: A method for clustering directed graphs. In Proceedings of the 2nd International Workshop on Software configuration management, pages 76–85. ACM, 1989. xii, 44, 63, 68 [240] M.E.J. Newman. The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences, 98(2):404–409, 2001. 6, 49 [241] M.E.J. Newman. The structure and function of complex networks. SIAM review, 45(2):167–256, 2003. 49 [242] M.E.J. Newman. Fast algorithms for detecting community structure in networks. Physical Review E, 96(6):66133, 2004. 43 [243] M.E.J. Newman. A measure of betweenness centrality based on random walks. Social networks, 27(1):39–54, 2005. 50, 102 [244] M.E.J. Newman and M. Girvan.
Finding and evaluating community structure in networks. Physical review E, 69(2):026113, 2004. 54 [245] Q. Nguyen, P. Eades, S.H. Hong, and W. Huang. Large crossing angles in circular layouts. In Graph Drawing, pages 397–399. Springer, 2010. 24 [246] Q. Nguyen, S.H. Hong, and P. Eades. TGI-EB: A new framework for edge bundling integrating topology, geometry and importance. In Graph Drawing, pages 123–135. Springer, 2011. 63, 85, 86, 87, 88, 109, 155, 163, 165, 166, 171, 172, 194

[247] Q.V. Nguyen and M.L. Huang. A space-optimized tree visualization. In Information Visualization, 2002. INFOVIS 2002. IEEE Symposium on, pages 85–92. IEEE, 2002. xi, 23, 24 [248] A. Noack. An energy model for visual graph clustering. In Graph Drawing, pages 425–436. Springer, 2004. 25 [249] D. Norman. The design of everyday things. Basic books, 2002. 19 [250] C. North. Toward measuring visualization insight. IEEE Computer Graphics and Applications, 26(3):6–9, 2006. 72 [251] S. North. Incremental layout in dynadag. In Graph Drawing, pages 409–418. Springer, 1996. 37, 38, 69 [252] L. O’Callaghan, N. Mishra, A. Meyerson, S. Guha, and R. Motwani. Streaming-data algorithms for high-quality clustering. In IEEE ICDE, pages 685–694. IEEE, 2002. 154, 162 [253] A. Ochoa and L. Arco. Differential Betweenness in Complex Networks Clustering. Progress in Pattern Recognition, Image Analysis and Applications, pages 227–234, 2008. 53, 109, 130 [254] A. Papakostas and I.G. Tollis. Efficient orthogonal drawings of high degree graphs. Algorithmica, 26(1):100–125, 2000. 68 [255] J. Park, Y. Shin, K. Kim, and B.S. Chung. Searching social media streams on the web. IEEE Intelligent Systems, 25(6):24–31, 2010. 162 [256] F.N. Paulisch and W.F. Tichy. EDGE: An extendible graph editor. Software: Practice and Experience, 20(S1):S63–S88, 1990. 38, 69, 76 [257] W. Peng, M.O. Ward, and E.A. Rundensteiner. Clutter reduction in multidimensional data visualization using dimension reordering. In IEEE Symposium on Information Visualization (INFOVIS 2004), pages 89–96. IEEE, 2004. 90 [258] D. Phan, L. Xiao, R. Yeh, and P. Hanrahan. Flow map layout. In Information Visualization, 2005. INFOVIS 2005. IEEE Symposium on, pages 219–224. IEEE, 2005. 44, 102, 106 [259] S. Pinker. A theory of graph comprehension. Artificial intelligence and the future of testing, pages 73–126, 1990. 19 [260] K.J. Pulo. Structural Focus+Context Navigation of Relational Data.
PhD thesis, School of Information Technologies, Faculty of Science, University of Sydney, 2004. 30, 31 [261] S. Pupyrev, L. Nachmanson, and M. Kaufmann. Improving layered graph layouts with edge bundling. In Graph Drawing, pages 329–340. Springer, 2011. 46, 108 [262] H. Purchase, N. Andrienko, T. Jankun-Kelly, and M. Ward. Theoretical foundations of information visualization. InfoVis, pages 46–64, 2008. 4, 11, 65, 67, 81

[263] H.C. Purchase. Which aesthetic has the greatest effect on human understanding? In Proceedings of the 5th International Symposium on Graph Drawing, pages 248–261. Springer-Verlag, 1997. 11, 40, 62, 68 [264] H.C. Purchase. Metrics for graph drawing aesthetics. Journal of Visual Languages & Computing, 13(5):501–516, 2002. 68 [265] H.C. Purchase, R.F. Cohen, and M. James. Validating graph drawing aesthetics. In Proceedings of the 5th International Symposium on Graph Drawing, volume 1027, page 435. Springer, 1996. 11, 40, 62, 68 [266] H. Qu, H. Zhou, and Y. Wu. Controllable and progressive edge clustering for large networks. In Graph Drawing, pages 399–404. Springer, 2007. 108 [267] A. Quigley and P. Eades. Fade: Graph drawing, clustering, and visual abstraction. In Graph Drawing, pages 197–210, London, UK, 2001. Springer-Verlag. 25, 83, 158, 178 [268] F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and D. Parisi. Defining and identifying communities in networks. Proceedings of the National Academy of Sciences of the United States of America, 101(9):2658–2663, 2004. 54 [269] D. Rafiei and S. Curial. Effectively visualizing large networks through sampling. In Visualization, 2005. VIS 2005. IEEE Symposium on, pages 48–48. IEEE, 2005. 48 [270] M.J. Rattigan, M. Maier, and D. Jensen. Graph clustering with network structure indices. In ICML, pages 783–790. ACM, 2007. 153, 162 [271] E.M. Reingold and J.S. Tilford. Tidier drawings of trees. Software Engineering, IEEE Transactions on, (2):223–228, 1981. 23, 31, 67, 68 [272] J. Rekimoto and M. Green. The information cube: Using transparency in 3D information visualization. In Proceedings of the Third Annual Workshop on Information Technologies & Systems (WITS93), pages 125–132, 1993. 33, 34 [273] G.G. Robertson, J.D. Mackinlay, and S.K. Card. Cone trees: animated 3D visualizations of hierarchical information. In Proceedings of the SIGCHI conference on Human factors in computing systems: Reaching through technology, pages 189–194. ACM, 1991. 23, 31, 34 [274] A.H. Robinson. The thematic maps of charles joseph minard. 1967. 3 [275] James Rumbaugh, Ivar Jacobson, and Grady Booch. Unified Modeling Language Reference Manual, The (2nd Edition). Pearson Higher Education, 2004. 6 [276] M. Ruta, S. Colucci, F. Scioscia, E. Di Sciascio, and F.M. Donini. Finding commonalities in RFID semantic streams. Procedia Computer Science, 5:857–864, 2011. 162 [277] A. Sallaberry, C. Muelder, and K. L. Ma. Clustering, visualizing, and navigating for large dynamic graphs. In Graph Drawing, pages 487–498. Springer, 2012. 37

[278] G. Sander. Layout of compound directed graphs. Technical report, University of the Saarlands, 1996. 31 [279] S.E. Schaeffer. Graph clustering. Computer Science Review, 1(1):27–64, 2007. 54 [280] D. Selassie, B. Heller, and J. Heer. Divided edge bundling for directional network data. IEEE Transactions on Visualization and Computer Graphics, 17(12):2354–2363, 2011. 85, 87, 88, 109, 155, 163, 165, 170, 171, 172 [281] P.G. Selinger, M.M. Astrahan, D.D. Chamberlin, R.A. Lorie, and T.G. Price. Access path selection in a relational database management system. In ACM SIGMOD international conference on Management of data, pages 23–34. ACM, 1979. 161 [282] J. Shi and J. Malik. Normalized cuts and image segmentation. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 22(8):888–905, 2000. 54 [283] Lei Shi, Nan Cao, Shixia Liu, Weihong Qian, Li Tan, Guodong Wang, Jimeng Sun, and Ching-Yung Lin. HiMap: Adaptive visualization of large-scale online social networks. In Visualization Symposium, 2009. PacificVis’ 09. IEEE Pacific, pages 41–48. IEEE, 2009. 37 [284] Lei Shi, Chen Wang, and Zhen Wen. Dynamic network visualization in 1.5D. In Pacific Visualization Symposium (PacificVis), 2011 IEEE, pages 179–186, 2011. 164 [285] B. Shneiderman. Tree visualization with Tree-maps: 2-d space-filling approach. ACM Transactions on graphics (TOG), 11(1):92–99, 1992. 23, 27 [286] B. Shneiderman. The eyes have it: A task by data type taxonomy for information visualizations. In Visual Languages, 1996. Proceedings., IEEE Symposium on, pages 336–343. IEEE, 1996. 9, 10 [287] B. Shneiderman and A. Aris. Network visualization by semantic substrates. Visualization and Computer Graphics, IEEE Transactions on, 12(5):733–740, 2006. 36 [288] M. Sigman and G.A. Cecchi. Global organization of the wordnet lexicon. Proceedings of the National Academy of Sciences, 99(3):1742–1747, 2002. 6, 49 [289] J. M. Six and I. G. Tollis. Circular drawings of biconnected graphs.
In ALENEX ’99: Selected papers from the International Workshop on Algorithm Engineering and Experimentation, pages 57–73, London, UK, 1999. Springer-Verlag. 24 [290] S.H. Strogatz. Exploring complex networks. Nature, 410(6825):268–276, 2001. 49 [291] R. Strzodka and A. Telea. Generalized distance transforms and skeletons in graphics hardware. In VisSym, pages 221–230, 2004. 150 [292] T. Sugibuchi, N. Spyratos, and E. Siminenko. A framework to analyze information visualization based on the functional data model. In Proceedings of the 2009 13th International Conference Information Visualisation, IV ’09, pages 18–24, Washington, DC, USA, 2009. IEEE Computer Society. 67

[293] K. Sugiyama and K. Misue. Visualization of structural information: Automatic drawing of compound digraphs. Systems, Man and Cybernetics, IEEE Transactions on, 21(4):876–892, 1991. 31 [294] K. Sugiyama and K. Misue. Graph drawing by the magnetic spring model. Journal of Visual Languages and Computing, 6(3):217–231, 1995. 117 [295] J. Sun, C. Faloutsos, S. Papadimitriou, and P.S. Yu. Graphscope: parameter-free mining of large time-evolving graphs. In SIG KDD, pages 687–696. ACM, 2007. 154, 163 [296] R. Tamassia. Handbook of Graph Drawing And Visualization. Discrete Mathematics And Its Applications. Online draft available at http://www.cs.brown.edu/~rt/gdhandbook/. Taylor & Francis, 2013. 11, 22, 62, 75 [297] R. Tamassia, G. Di Battista, and C. Batini. Automatic graph drawing and readability of diagrams. IEEE Transactions on Systems, Man and Cybernetics, 18(1):61–79, 1988. 68 [298] Lei Tang, Huan Liu, Jianping Zhang, and Zohreh Nazeri. Community evolution in dynamic multi-mode networks. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’08, pages 677–685, New York, NY, USA, 2008. ACM. 6, 8 [299] N. Tatbul, U. Çetintemel, S. Zdonik, M. Cherniack, and M. Stonebraker. Load shedding in a data stream manager. In Proceedings of the 29th international conference on Very large data bases-Volume 29, pages 309–320. VLDB Endowment, 2003. 57, 162 [300] M. Taylor and P. Rodgers. Applying graphical design techniques to graph visualisation. In Proceedings of the ninth International Conference on Information Visualisation, pages 651–656. IEEE, 2005. 68 [301] A. Telea and O. Ersoy. Image-Based Edge Bundles: Simplified Visualization of Large Graphs. Computer Graphics Forum, 29(3):843–852, 2010. 46, 85, 86, 108 [302] J.B. Tenenbaum, V. De Silva, and J.C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000. 55 [303] I. G. Tollis, G. Di Battista, P. Eades, and R. Tamassia. Graph Drawing: Algorithms for the Visualization of Graphs. Prentice Hall, July 1998. 4, 7, 9, 11, 62, 67 [304] L.A. Treinish. Task-specific visualization design: a case study in operational weather forecasting. In Proceedings Visualization’98, pages 405–409. IEEE, 1998. 21, 73 [305] E.R. Tufte. Envisioning information. Optometry & Vision Science, 68(4):322–324, 1991. 20, 21

[306] E.R. Tufte. Visual explanations: images and quantities, evidence and narrative. Graphics Press Cheshire, CT, 1997. 2 [307] E.R. Tufte. The visual display of quantitative information. Visual Explanations, pages 194–95, 2001. 2, 20, 36, 82 [308] F. Van Ham. Using multilevel call matrices in large software projects. In Information Visualization, 2003. INFOVIS 2003. IEEE Symposium on, pages 227–232. IEEE, 2003. 4 [309] F. van Ham and J.J. van Wijk. Interactive visualization of small world graphs. In Information Visualization, 2004. IEEE Symposium on, pages 199–206. IEEE, 2004. xii, 42 [310] F. Van Ham and M. Wattenberg. Centrality based visualization of small world graphs. Computer Graphics Forum, 27(3):975–982, 2008. 53 [311] R. Van Liere and W. De Leeuw. Graphsplatting: Visualizing graphs as continuous fields. Visualization and Computer Graphics, IEEE Transactions on, 9(2):206–212, 2003. xi, 4, 5 [312] J.J. Van Wijk. The value of visualization. In VIS ’05, pages 79–86. IEEE, 2005. 4, 11, 65, 67, 70, 72, 73 [313] J.J. Van Wijk and H. Van De Wetering. Cushion treemaps: Visualization of hierarchical information. In Information Visualization, 1999.(Info Vis’ 99) Proceedings. 1999 IEEE Symposium on, pages 73–78. IEEE, 1999. 27, 28 [314] Dillion R. W. and Goldstein M. Multivariate Analysis. Wiley, London, 1984. 55 [315] S. Wachi, K. Yoneda, and R. Wu. Interactome-transcriptome analysis reveals the high centrality of genes differentially expressed in lung cancer tissues. Bioinformatics, 21(23):4205, 2005. 53, 110 [316] C. Walshaw. A multilevel algorithm for force-directed graph drawing. In Graph Drawing, pages 31–55. Springer, 2001. 25, 68 [317] C. Ware. Information visualization. Morgan Kaufmann, 2nd edition, 2004. 23 [318] S. Wasserman and K. Faust. Social Network Analysis: Methods and Applications. Cambridge University Press, 1 edition, November 1994. 6, 21, 49, 50, 53, 73, 102, 105, 109, 110, 113, 116 [319] D. J. Watts and S. H. Strogatz. Collective dynamics of ‘small-world’ networks. Nature, 393(6684):409–10, 1998. 49 [320] D.J. Watts. Small worlds: the dynamics of networks between order and randomness. Princeton university press, 2003. 49, 51 [321] Stephen Wehrend and Clayton Lewis. A problem-oriented classification of visualization techniques. In VIS ’90, pages 139–143. IEEE, 1990. 21, 73 [322] J.C. Whisstock and A.M. Lesk. Prediction of protein function from protein sequence and structure. Quarterly reviews of biophysics, 36(03):307–340, 2003. 53, 110

[323] L. Wilkinson. The grammar of graphics. Springer, 2005. 20 [324] R.J. Williams, N.D. Martinez, et al. Simple rules yield complex food webs. Nature, 404(6774):180–183, 2000. 49 [325] J.A. Wise. The ecological approach to text visualization. Journal of the American Society for Information Science, 50(13):1224–1233, 1999. xi, 3 [326] J.A. Wise, J.J. Thomas, K. Pennock, D. Lantrip, M. Pottier, A. Schur, and V. Crow. Visualizing the non-visual: Spatial analysis and interaction with information from text documents. In Information Visualization, 1995. Proceedings., pages 51–58. IEEE, 1995. 3 [327] N. Wong and S. Carpendale. Interactive poster: Using edge plucking for interactive graph exploration. In Poster in the IEEE Symposium on Information Visualization, 2005. 48, 95 [328] N. Wong, S. Carpendale, and S. Greenberg. Edgelens: An interactive method for managing edge congestion in graphs. In InfoVis, pages 51–58. IEEE, 2003. xii, 47, 48, 95, 151, 192, 198 [329] P. C. Wong, H. Foote, G. Chin, P. Mackey, and K. Perrine. Graph signatures for visual analytics. IEEE Transactions on Visualization and Computer Graphics, 12(6):1399–1413, 2006. 36 [330] P. C. Wong, P. Mackey, K. Perrine, J. Eagan, H. Foote, and J. Thomas. Dynamic visualization of graphs with extended labels. In Information Visualization, 2005. INFOVIS 2005. IEEE Symposium on, pages 73–80. IEEE, 2005. 37 [331] S. Wuchty. Scale-free behavior in protein domain networks. Molecular biology and evolution, 18(9):1694–1702, 2001. 6, 49 [332] S. Wuchty and E. Almaas. Peeling the yeast protein network. Proteomics, 5(2):444–449, 2005. 53, 110 [333] S. Wuchty and P.F. Stadler. Centers of complex networks. Journal of theoretical biology, 223(1):45–53, 2003. 130 [334] E. Yeger-Lotem, S. Sattath, N. Kashtan, S. Itzkovitz, R. Milo, R.Y. Pinter, U. Alon, and H. Margalit. Network motifs in integrated cellular networks of transcription–regulation and protein–protein interaction.
Proceedings of the National Academy of Sciences of the United States of America, 101(16):5934–5939, 2004. 50 [335] J.S. Yi, Y. ah Kang, J.T. Stasko, and J.A. Jacko. Toward a deeper understanding of the role of interaction in information visualization. Visualization and Computer Graphics, IEEE Transactions on, 13(6):1224–1231, 2007. 36 [336] J. Yoon, A. Blumer, and K. Lee. An algorithm for modularity analysis of directed and weighted biological networks based on edge-betweenness centrality. Bioinformatics, 22(24):3106, 2006. 53, 109, 130

[337] Z. Zeng, J. Wang, L. Zhou, and G. Karypis. Out-of-core coherent closed quasi-clique mining from large dense graph databases. ACM Transactions on Database Systems (TODS), 32(2):13, 2007. 162 [338] H. Zhou, X. Yuan, W. Cui, H. Qu, and B. Chen. Energy-based hierarchical edge clustering of graphs. In IEEE PacificVis, pages 55–61. IEEE, 2008. 45, 63, 85, 106, 163
