Parameterising and Modelling the Internet Topology
Shi Zhou
Department of Electronic Engineering Queen Mary, University of London
A thesis submitted to the University of London for the degree of Doctor of Philosophy. July 2004
To my family.
Abstract
Simulation plays a vital role in studying the complex behaviour of both existing telecommunications networks and proposed future architectures. When modelling the behaviour of the Internet it is crucial to obtain a good description of its structure, because structure fundamentally affects function. The aim of this work is to provide quantitative parameters to fully characterise network structures and propose realistic models which can accurately reproduce the Internet topology at the autonomous systems (AS) level. This thesis introduces the novel concept of the rich-club phenomenon to describe the Internet hierarchical structure, in which a small number of highly connected nodes are tightly interconnected with each other. This structure is quantitatively characterised by the rich-club connectivity and the node-node link distribution. The metric of the rich-club connectivity is a milestone in parameterising the Internet topology. Using this unique metric, the author reports that the existing degree-based models do not match the Internet hierarchical structure. The author shows that an appreciation of the rich-club connectivity is essential for a proper examination of network behaviours, such as routing efficiency, redundancy and robustness. The author also uses this metric to reveal the major topological disparities between the Internet measurements obtained using different methodologies. The author introduces an original Interactive Growth (IG) model, which closely resembles both the power-law degree distribution and the rich-club connectivity of the AS-level Internet. Based on observations of the Internet history data, the author improves the IG model and introduces the Positive-Feedback Preference (PFP) model, which is doubtlessly the most complete and detailed model to date. The PFP model accurately reproduces all the relevant topological properties of the Internet, including degree distribution, rich-club connectivity, the maximum degree, shortest path length, short cycles, disassortative mixing and betweenness centrality. The PFP model's non-linear preference mechanism provides a novel insight into the basic dynamics that could be responsible for the evolving topology of complex networks. This successful research has provided a number of promising contributions. These achievements represent a profound extension of the state-of-the-art knowledge in the area of parameterising and modelling the Internet topology.
Acknowledgements
The author would like to express his deepest gratitude to the many people who have kindly supported and assisted his work, including Dr. Chris Phillips and Dr. Matthew Woolf, and especially to his supervisor, Dr. Raúl J. Mondragón, for his great help and guidance through every step of the author's research. Thanks also to Dr. Andre Broido (CAIDA) for the inspiring discussions. The author thanks the Department of Electronic Engineering, Queen Mary, University of London, for its hospitality and support. This work was funded by the U.K. Engineering and Physical Sciences Research Council (EPSRC) under Grant No. GR-R30136-01.
Contents
Abstract
Acknowledgements
Contents
List of Figures
List of Tables
1 Introduction
  1.1 Challenges
  1.2 Contributions of this thesis
    1.2.1 Parameterising The Internet Topology
    1.2.2 Modelling The Internet Topology
  1.3 Structure of this thesis
2 Preliminaries
  2.1 Internet Topology
  2.2 Topological Properties
    2.2.1 Network Size
    2.2.2 Degree
    2.2.3 Degree Distribution
      2.2.3.1 Poisson Degree Distribution
      2.2.3.2 Power-Law Degree Distribution
    2.2.4 Shortest Path Length
    2.2.5 Node Betweenness Centrality
    2.2.6 Clustering Coefficient
    2.2.7 Disassortative Mixing (Degree Correlations)
  2.3 Random Networks
  2.4 Small-World Networks
  2.5 Summary
3 Measurements and Models Of The AS-Level Internet
  3.1 Introduction
  3.2 Topology Measurements Of The AS-Level Internet
    3.2.1 Passive Measurement - BGP AS Graph
    3.2.2 Extended BGP AS Graph
    3.2.3 Active Measurement - Traceroute AS Graph
    3.2.4 Discovery Of The Internet Power-Law Degree Distribution
    3.2.5 Which AS Graph?
  3.3 Topology Models Of The AS-Level Internet
    3.3.1 Tiers Model
    3.3.2 GT-ITM Model
    3.3.3 User-Provider Model
    3.3.4 Inet Model
    3.3.5 Barabási and Albert Model
    3.3.6 Fitness BA Model
    3.3.7 Generalised BA Model
    3.3.8 BRITE Model
    3.3.9 Dorogovtsev-Mendes Model
    3.3.10 Generalised Linear Preference Model
    3.3.11 Generalised Network Growth Model
    3.3.12 Highly Optimised Tolerance Model
  3.4 Discussions
    3.4.1 Structure-Based Models vs Degree-Based Models
    3.4.2 Accuracy vs Simplicity
  3.5 Summary
4 Rich-Club Phenomenon
  4.1 Introduction
    4.1.1 Internet Hierarchical Structure
    4.1.2 Connectivity Of The Core
    4.1.3 Motivation
  4.2 Rich-Club Phenomenon
    4.2.1 Rich-Club Connectivity
    4.2.2 Node-Node Link Distribution
  4.3 Discussion
    4.3.1 Rich-Club Subgraph
    4.3.2 Rich-Club Phenomenon Is Relevant
    4.3.3 Modelling The Rich-Club
  4.4 Summary
5 Interactive Growth Model
  5.1 Introduction
  5.2 Interactive Growth Model
  5.3 Model Validation
    5.3.1 Degree Distribution
      5.3.1.1 Degree Distribution
      5.3.1.2 Degree vs Rank
    5.3.2 Rich-club Phenomenon
      5.3.2.1 Rich-Club Connectivity
      5.3.2.2 Node-Node Link Distribution
  5.4 Discussion
    5.4.1 Maximum Degree
    5.4.2 Rich-Club Connectivity
  5.5 Summary
6 Structure Affects Functions
  6.1 Introduction
  6.2 Routing Efficiency
  6.3 Network Redundancy
  6.4 Network Robustness
    6.4.1 Node Error
    6.4.2 Node Attack
    6.4.3 Link Error
    6.4.4 Link Attack
  6.5 Discussion
  6.6 Summary
7 Topological Disparities Between Internet Measurements
  7.1 Introduction
  7.2 Comparison
    7.2.1 Degree Distribution
    7.2.2 Rich-Club Connectivity
    7.2.3 Shortest Path Length
    7.2.4 Short Cycles
    7.2.5 Disassortative Mixing
    7.2.6 Betweenness Centrality
  7.3 Discussion
  7.4 Summary
8 The Positive-Feedback Preference Model
  8.1 Introduction
  8.2 Modelling The Maximum Degree
  8.3 The Positive-Feedback Preference Model
  8.4 Model Validation
    8.4.1 Degree Distribution, Rich-Club Connectivity and Maximum Degree
    8.4.2 Short Cycles
    8.4.3 Disassortative Mixing
    8.4.4 Shortest Path Length
    8.4.5 Betweenness Centrality
  8.5 Discussion
    8.5.1 The Positive-Feedback Preferential Attachment
    8.5.2 Critical Assessment of The PFP Model
  8.6 Summary
9 Discussion and Conclusion
  9.1 Discussion
  9.2 Future Work
  9.3 Conclusion
Appendix I. Queen Mary Topology Simulator
Appendix II. Author's Publications
Glossary
Bibliography
List of Figures
2.1 Structure of the Internet
2.2 The motorway network of the USA.
2.3 Poisson degree distribution.
2.4 The air traffic route network of the USA.
2.5 Power-law degree distribution
2.6 Three Networks
2.7 Small-world properties
3.1 A map of the AS-level Internet.
3.2 Degree Frequency
3.3 Growth of the BA model.
3.4 The growth of the GLP model.
4.1 Two network structures.
4.2 Cumulative distribution of degree. For each model, ten networks are generated and averaged.
4.3 Rich-club connectivity
4.4 Node-node link distribution
4.5 Degree distribution inside the rich-club subgraph
5.1 The interactive growth mechanism of the IG model.
5.2 Degree distribution. For each model, ten networks are generated and averaged.
5.3 Degree vs rank
5.4 Rich-club connectivity
5.5 Node-node link distribution
5.6 Node-node link distribution
5.7 A network generated by the IG model.
5.8 Time-evolution of node degree
6.1 Cumulative distribution of degree.
6.2 Rich-club connectivity.
6.3 Cumulative distribution of shortest path length
6.4 Distribution of triangle coefficient
6.5 Cumulative distribution of triangle coefficient
6.6 Distribution of quadrangle coefficient
6.7 Cumulative distribution of quadrangle coefficient
6.8 Node attack.
6.9 Link attack.
6.10 Network robustness under node error.
6.11 Network robustness under node attack.
6.12 Network robustness under link error.
6.13 Network robustness under link attack.
6.14 A conical structure model.
7.1 Cumulative degree distribution.
7.2 Degree distribution.
7.3 Degree vs rank.
7.4 Rich-club connectivity φ(k) as a function of degree.
7.5 Rich-club connectivity φ(r/N) as a function of normalised rank.
7.6 Cumulative distribution of shortest path length.
7.7 Correlation between shortest path length and degree.
7.8 Cumulative distribution of clustering coefficient
7.9 Correlation between clustering coefficient and degree.
7.10 Cumulative distribution of triangle coefficient
7.11 Correlation between triangle coefficient and degree.
7.12 Cumulative distribution of quadrangle coefficient
7.13 Correlation between quadrangle coefficient and degree.
7.14 Correlation between nearest-neighbours average degree and degree
7.15 Cumulative distribution of betweenness.
7.16 Correlation between betweenness and degree
7.17 The three AS graph measurements.
8.1 Degree vs rank
8.2 Rich-club connectivity
8.3 Three degree functions
8.4 Degree growth
8.5 Degree distribution
8.6 Cumulative degree distribution
8.7 Cumulative distribution of triangle coefficient.
8.8 Cumulative distribution of quadrangle coefficient.
8.9 Correlation between triangle coefficient and degree
8.10 Correlation between quadrangle coefficient and degree
8.11 Cumulative distribution of nearest-neighbours average degree.
8.12 Correlations between nearest-neighbours average degree and degree
8.13 Cumulative distribution of shortest path length.
8.14 Correlation between shortest path length and degree
8.15 Cumulative distribution of betweenness centrality
8.16 Correlations between betweenness centrality and degree
8.17 Network properties of a growing PFP model
10.1 Function flowchart of the QMUL Topology Simulator.
10.2 Window of “Parameters for generating networks”.
10.3 Window of the main interface.
10.4 Window of “Save plot data files”.
List of Tables
4.1 Distribution of ASes in the Internet hierarchy [78]
4.2 Networks parameters
4.3 Rich-club properties
5.1 Network properties
5.2 Node-node link distribution
6.1 Network Parameters
6.2 Network Short Cycles
7.1 Parameters of the three AS graphs
7.2 Rich-club connectivity as a function of degree
7.3 Rich-club connectivity as a function of normalised rank
7.4 Parameters of the three AS graphs (continued)
8.1 Network Parameters
Chapter 1 Introduction
Recently there have been considerable efforts to understand the topology of complex systems [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]. Of particular interest is the Internet as it is so influential in our daily life.
1.1 Challenges
Effective engineering of the Internet is predicated on a detailed understanding of issues such as the large-scale structure of its underlying physical topology, the manner in which it evolves over time, and the way in which its constituent components contribute to its overall function [17]. In the last three decades, the Internet has experienced fascinating evolution, both exponential growth in its traffic and endless expansion in its topology [18]. This emphasises the need for more thorough and rigorous analysis of the nature of the Internet topology. Unfortunately, developing a deep understanding of these issues has proven to be a challenging task [19, 20, 21, 22, 23], since it in turn involves solving difficult problems such as mapping the actual topology [24], characterising it, and developing models that capture its emergent behaviour. Reliable measurements of the Internet topology became available only recently [25, 26, 27, 28, 29, 30, 31]. Based on measurement data, Faloutsos et al. reported in 1999 that the Internet has a power-law degree distribution [32]. This discovery invalidated all previous research results on modelling the Internet topology, because they were based on random network theories [33, 34, 35]. Even though the networking community and physicists have since proposed a number of Internet topology models [36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47], it remains an open question how representative the topologies they generate are [21].
1.2 Contributions of this thesis
The aim of this work is to provide quantitative parameters to fully characterise the network structure and propose realistic models to accurately reproduce the Internet topology at the autonomous systems (AS) level. The author's own research contributions are presented in Chapters 4, 5, 6, 7 and 8. All the research is based on actual measurements of the Internet.
1.2.1 Parameterising The Internet Topology
Chapter 4 introduces the novel concept of the rich-club phenomenon to describe the hierarchical structure of the AS-level Internet, i.e. highly connected nodes not only have large numbers of links but also are tightly interconnected with each other. Two metrics are defined to quantitatively characterise this structural property: the rich-club connectivity and the node-node link distribution. The calculation of the two parameters is rather simple and based only on the network connectivity information. Using the two parameters, the author shows that degree-based models may not reproduce the Internet hierarchical structure. The metric of the rich-club connectivity is a milestone in parameterising the Internet topology and it provides a new criterion for network models. Inspired by the rich-club phenomenon, Chapter 5 proposes an original Interactive Growth (IG) model, which adopts a so-called interactive growth mechanism that has been observed in the Internet history data. The IG model closely resembles both the power-law degree distribution and the rich-club connectivity of the AS-level Internet. Using the IG model as an example of networks containing a rich-club, Chapter 6 shows it is relevant to reproduce the Internet's rich-club structure, because an Internet model that does not contain a rich-club underestimates the actual network's routing efficiency (shortest path length) and routing flexibility (alternative reachable paths), and overestimates the network robustness under node attack. This result highlights the importance of studying the Internet topological structure because structure fundamentally affects function. Chapter 7 provides a novel comparison of different Internet data sources obtained by using distinct measuring methodologies. Results show that the measurements contain non-trivial topological differences. The major structural discrepancy is revealed by the rich-club connectivity.
1.2.2 Modelling The Internet Topology
Using the IG model as a precursor, Chapter 8 introduces the Positive-Feedback Preference (PFP) model. The PFP model is superior to any other currently known Internet topology generator. The PFP model accurately reproduces all the relevant topological properties of the AS-level Internet, including degree distribution, rich-club connectivity, the maximum degree, shortest path length, short cycles, disassortative mixing and betweenness centrality. Moreover, the two growth mechanisms of the PFP model, namely the appearance of new internal links and the positive-feedback preference, are based on (and supported by) observations of the Internet history data. The model's unique non-linear preference provides a novel insight into the basic dynamics that could be responsible for the evolving topology of complex networks. The PFP model is a significant achievement in modelling the Internet topology. In summary, the author's successful research has provided a number of promising contributions. The two main achievements are the metric of rich-club connectivity and the PFP model. These novel contributions represent a significant extension of the state-of-the-art knowledge in the area of parameterising and modelling the Internet topology.
1.3 Structure of this thesis
Chapter 2 defines a number of topological properties that are used in network research, including network size, degree, rank, degree distribution, shortest path length, node betweenness centrality, clustering coefficient and disassortative mixing (degree correlation). Chapter 2 also introduces two classical network theories used before the discovery of the power-law degree distribution of the Internet topology. Chapter 3 provides the up-to-date background of this research. It introduces data sources of the Internet topology and their measuring methodologies. Chapter 3 describes a number of existing topology models that have been used for generating Internet-like graphs, then discusses the problems of the models and points out the objectives of the research. Chapters 4, 5, 6, 7 and 8 present the author's research contributions on parameterising and modelling the Internet topology. Chapter 9 reviews the methodology used in this research and provides possible directions for future work. Appendix I provides a brief introduction to the self-developed QMUL Topology Simulator software kit, which is used to conduct simulations and obtain numerical results. Appendix II lists the author's publications. Most of the material presented in this thesis has been published in peer-reviewed journals and conferences.
Chapter 2 Preliminaries
This Chapter introduces the Internet topology and a number of topological properties that have been widely used by the network research community. This Chapter also introduces the two important classes of networks, namely random networks and small-world networks, which had been used in studying and modelling the Internet topology until the practical measurements of the Internet became available.
2.1 Internet Topology
In general terms, the Internet is a global net of computers, which are interconnected by wires (links) [8]. This network provides electronic transmission of information between computers. The connections in the Internet can be abstracted in the dimension of network administration, which groups IP addresses into subnetworks, subnetworks into network prefixes and prefixes into autonomous systems (AS). Figure 2.1 [32] shows a scheme of the structure of the Internet. The vertices (nodes) of the Internet are:
• Hosts that are the computers of users.
• Servers that are computers or programs providing a network service, which also can be hosts.
• Routers that distribute traffic across the Internet.
• Domains (autonomous systems), where routers are grouped into subnetworks.
Figure 2.1: Structure of the Internet [32]. The global structure of the Internet is determined by the routers (the router level) and domains (the AS level).
In 2001 the Internet contained about 100 million ($10^8$) hosts. However, it is not the hosts that determine the structure of the Internet but routers and domains. So, one can consider the topology of the Internet at the router level or the AS level. The net of routers is much larger than the net of autonomous systems. In 2001 there were roughly 228,000 routers in total and the total number of autonomous systems was about $10^4$ [18]. An autonomous system is the term that the Border Gateway Protocol (BGP) [48] gives to an entity that manages one or more networks and has a coherent policy for routing IP traffic both internally and to other autonomous systems. Within autonomous systems, the routing of information is advertised by some internal rules and algorithms (internal protocols). In principle, the internal protocols of distinct autonomous systems should not coincide. Therefore the network structure inside an autonomous system only affects local traffic behaviours. This thesis focuses on the AS-level Internet topology, in which each node is an autonomous system, because the delivery of IP traffic through the Internet depends on the complex interactions between thousands of autonomous systems that exchange routing information using the Border Gateway Protocol [49, 50]. For example, research [51] has shown that the topology of the AS-level Internet has a major impact on the delayed BGP routing convergence. When studying the topology of the Internet, the network connectivity information is represented with a graph, in which nodes are connected by links. Usually nodes and links in the graph do not contain physical properties, such as the buffer volume of a router or the length of an optical cable. The Internet graph is subject to some assumptions: all links are undirected, no link connects a node to itself (self-loop) and each node has at least one link ($k \geq 1$). Also, no portion is separated from the network; in other words, any node is reachable from any other node.
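The graph assumptions listed above can be checked mechanically. The following is a minimal sketch, not the thesis's own tooling: the function name `check_as_graph` and the representation (nodes numbered 0 to N-1, links as pairs) are assumptions made purely for illustration.

```python
from collections import deque

def check_as_graph(num_nodes, links):
    """Return True if the graph meets the stated assumptions:
    undirected links, no self-loops, every node has k >= 1, and connected.
    Nodes are assumed (hypothetically) to be numbered 0..num_nodes-1."""
    if num_nodes == 0:
        return False
    neighbours = {v: set() for v in range(num_nodes)}
    for a, b in links:
        if a == b:
            return False  # a self-loop is not allowed
        # Storing each link in both directions treats it as undirected.
        neighbours[a].add(b)
        neighbours[b].add(a)
    if any(len(nbrs) == 0 for nbrs in neighbours.values()):
        return False  # some node has degree 0, violating k >= 1
    # Breadth-first search from node 0 to test that every node is reachable.
    seen = {0}
    queue = deque([0])
    while queue:
        v = queue.popleft()
        for w in neighbours[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == num_nodes
```

A chain of four nodes passes the check, whereas a graph made of two separate pairs fails it because it is not connected.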
2.2 Topological Properties
2.2.1 Network Size
The size of a network is given by the total number of nodes $N$ and the total number of links $L$. For example, in 1999 the AS-level Internet had 6374 nodes and 13641 links [26] and in 2001 it had 11122 nodes and 30054 links [113].
2.2.2 Degree
The degree $k$ of a node, also called node connectivity, is the number of links which have the node as an end-point, or equivalently, the number of nearest neighbours of the node. The average degree of a network, $\langle k \rangle$, is given by $\langle k \rangle = 2L/N$, where $L$ is the number of links and $N$ is the number of nodes; the factor of 2 arises because each link contributes to the degree of both of its end-points. The average degree of the AS-level Internet was 4.28 in 1999 and 5.4 in 2001.
The maximum degree of a network, $k_{max}$, is the largest degree that a node has in the network. In 2001 the maximum degree of the AS-level Internet was 2839, which was nearly a quarter of the number of nodes, $k_{max} \simeq N/4$, where $N = 11122$. The concept of rank is often used when studying the property of degree. The rank $r$ of a node denotes its position on a list of all nodes sorted in decreasing degree. The node with rank $r = 1$ has the largest degree. When a group of nodes have the same degree, they are arbitrarily assigned a position within that group. Therefore $r \in [1, N]$, where $N$ is the number of nodes in the network.
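The definitions of degree, average degree, maximum degree and rank can be illustrated with a short sketch. The function names and the link-list representation are assumptions for illustration only. Ties in rank are broken here by node number, which is one arbitrary choice among many.

```python
def average_degree(num_nodes, num_links):
    """Average degree <k> = 2L/N."""
    return 2 * num_links / num_nodes

def degree_summary(num_nodes, links):
    """Return (degree list, average degree, maximum degree, rank by node).
    Nodes are assumed (hypothetically) to be numbered 0..num_nodes-1."""
    degree = [0] * num_nodes
    for a, b in links:
        # Each undirected link adds one to the degree of both end-points.
        degree[a] += 1
        degree[b] += 1
    # Sort nodes by decreasing degree; sorted() is stable, so nodes with
    # equal degree keep their node-number order (an arbitrary tie-break).
    order = sorted(range(num_nodes), key=lambda v: -degree[v])
    rank = {v: r for r, v in enumerate(order, start=1)}
    return degree, average_degree(num_nodes, len(links)), max(degree), rank

# A small four-node example graph.
degree, k_avg, k_max, rank = degree_summary(4, [(0, 1), (0, 2), (0, 3), (1, 2)])
```

Applying `average_degree` to the network sizes quoted in Section 2.2.1 gives values that round to the 4.28 (1999) and 5.4 (2001) stated above.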
2.2.3 Degree Distribution
If $p(k, s, N)$ is defined as the probability that the node $s$ in the network of size $N$ has a degree $k$, the degree distribution is [8]
$$P(k, N) = \frac{1}{N} \sum_{s=1}^{N} p(k, s, N), \qquad (2.1)$$
which is often denoted as $P(k)$. While degree is a local property, the probability distribution of the degree gives important information of the global properties of a network and can be used to characterise different network topologies. For example the so-called complex networks [2, 5, 7, 8] are characterised by highly heterogeneous degree distributions [52].
2.2.3.1 Poisson Degree Distribution
Figure 2.2 shows the motorway network of the USA. Most cities (nodes) have 3 or 4 motorway connections, only a few cities have many motorway connections and only a few cities have only one or two motorway connections. This motorway network is characterised by a Poisson degree distribution as shown in Figure 2.3. The distribution curve is symmetric and the majority of nodes are distributed around the average degree of the network, $\langle k \rangle$. Networks with a Poisson degree distribution are often referred to as exponential networks.
Figure 2.2: The motorway network of the USA.
Figure 2.3: Poisson degree distribution.
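For a single measured graph, the degree distribution $P(k)$ of Section 2.2.3 can be estimated as the fraction of nodes that have degree $k$. The sketch below assumes that each node's degree is known exactly, so that $p(k, s, N)$ is either 0 or 1. The function name and the input format are hypothetical.

```python
from collections import Counter

def degree_distribution(num_nodes, links):
    """Empirical P(k): fraction of nodes whose degree equals k.
    Nodes are assumed (hypothetically) to be numbered 0..num_nodes-1."""
    degree = [0] * num_nodes
    for a, b in links:
        degree[a] += 1
        degree[b] += 1
    counts = Counter(degree)  # maps degree k -> number of nodes with that degree
    return {k: c / num_nodes for k, c in sorted(counts.items())}

# One node of degree 3, two of degree 2 and one of degree 1.
p_k = degree_distribution(4, [(0, 1), (0, 2), (0, 3), (1, 2)])
```

The returned probabilities sum to one, as expected of a distribution over degrees.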
2.2.3.2
Power-Law Degree Distribution
Figure 2.4 shows the air traffic route network of the USA. There are a very large number of airports in the USA, but most airports have just a few airline connections. Only a few hub cities have huge numbers of airline connections, and they dominate the traffic of the whole network. This network is characterised by a power-law degree distribution, as shown in Figure 2.5. The distribution curve on a log-log scale is a straight line, which suggests that the power-law degree distribution has the form $P(k) \sim k^{-\gamma}$, where the constant $\gamma$ is the power-law exponent. Networks with a power-law degree distribution are often referred to as scale-free networks [38]. Both scale-free networks and exponential networks widely exist in nature and human society [7]. Faloutsos et al [32] reported in 1999 that the AS-level Internet topology exhibits a power-law degree distribution, $P(k) \sim k^{-\gamma}$, where $\gamma \simeq 2.22$.
Figure 2.4: The air traffic route network of the USA.
Figure 2.5: Power-law degree distribution (on a log-log scale).
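Because a power law is a straight line on a log-log plot, the exponent can be read off as the negative slope of that line. The sketch below fits the slope by ordinary least squares on synthetic data. This is only a rough illustration: the function name is hypothetical, and a log-log regression is a crude estimator that is not claimed to be the fitting method used in [32].

```python
import math

def fit_power_law_exponent(pk):
    """Least-squares slope of log P(k) against log k; returns gamma = -slope."""
    points = [(math.log(k), math.log(p)) for k, p in pk.items() if k > 0 and p > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return -sxy / sxx

# Synthetic, noise-free data following k^(-2.2); normalisation does not affect the slope.
synthetic_pk = {k: k ** -2.2 for k in range(1, 101)}
gamma_hat = fit_power_law_exponent(synthetic_pk)
```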
2.2.4
Shortest Path Length
The shortest path is a route connecting two nodes with the least number of hops. In a graph, the number of hops along a route is called the length of the path. The average shortest path length l of a node is defined as the average length of the shortest paths from the node to all other nodes in the network. In this thesis, the shortest path length is calculated using Dijkstra’s algorithm [53]. The characteristic path length $l^*$ of a network is the average length of the shortest paths over all pairs of nodes. The characteristic path length indicates the network’s overall routing efficiency. A network with a smaller value of $l^*$ may achieve better dynamic performance [54, 55, 56]. The characteristic path length of the AS-level Internet was 3.7 in 1999 and 3.13 in 2001, which are very small values considering the huge size of the network. The maximum length of the shortest paths over all pairs of nodes is the network’s diameter, D. A network’s diameter may not increase proportionally with the network size; it mainly depends on the topological structure of the network.
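When path length is counted in hops, every link has the same weight and a breadth-first search returns the same distances as Dijkstra's algorithm. The sketch below uses that simplification (it is not the thesis implementation) to compute the characteristic path length and the diameter of a small connected graph; the function names and the toy graph are assumptions.

```python
from collections import deque

def bfs_hops(adj, source):
    """Hop counts from source to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_length_stats(adj):
    """Return (characteristic path length, diameter) of a connected graph."""
    total, pairs, diameter = 0, 0, 0
    for s in adj:
        for d, hops in bfs_hops(adj, s).items():
            if d != s:
                total += hops
                pairs += 1
                diameter = max(diameter, hops)
    return total / pairs, diameter

# Hypothetical toy graph: a path 0 - 1 - 2 - 3.
path_adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
l_star, diam = path_length_stats(path_adj)
```

For this four-node path the pairwise distances are 1, 2, 3, 1, 2 and 1, giving a characteristic path length of 5/3 and a diameter of 3.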
2.2.5
Node Betweenness Centrality
On a network, there are nodes that are more prominent than others because they are heavily used when transferring information. A way to measure this “importance” is by using the concept of node betweenness centrality, also called betweenness, which measures the proportion of shortest paths which visit a certain node. The betweenness centrality of a node is defined as the total number of data packets passing through that node when every pair of nodes sends and receives a data packet along the shortest path connecting the pair. When more than one shortest path exists between a pair of nodes, the data packet is divided evenly among them. Given a source node s and a destination node d, the number of different shortest paths is g(s, d). The number of shortest paths that contain the node w is g(w; s, d). The proportion of shortest paths, from s to d, which contain node w is $p_{s,d}(w) = g(w; s, d)/g(s, d)$. The betweenness centrality of node w is calculated [57, 58, 59] as
\[ C_B(w) = \sum_{s} \sum_{d \neq s} p_{s,d}(w), \tag{2.2} \]
where the sum is over all possible pairs of nodes with $s \neq d$. If all pairs of nodes of a network communicate at the same rate, and the traffic goes by the shortest paths, then the traffic through a node is proportional to the betweenness of the node. In other words, the betweenness estimates the capacity of each node needed for a free-flow state [57]. A node with a large $C_B$ is “important” because it carries a large traffic load. If this node fails or gets congested, the consequences for the network traffic can be drastic [59, 60]. It is natural to expect that the betweenness of a node strongly correlates with its degree. In this thesis the betweenness centrality is normalised by N, the total number of nodes, and denoted as $C_B^*$. The average betweenness centrality over all nodes is $\langle C_B^* \rangle = l^* + 1$ [59], where $l^*$ is the network’s characteristic path length.
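Equation 2.2 can be evaluated directly on small graphs. The sketch below counts shortest paths with a breadth-first search and uses the fact that w lies on a shortest path from s to d exactly when the hop counts s→w and w→d add up to the hop count s→d. It assumes that the end nodes s and d count as lying on their own path, which is the reading consistent with the relation $\langle C_B^* \rangle = l^* + 1$. The function names are hypothetical and the cubic-time loop is for illustration only.

```python
from collections import deque

def bfs_counts(adj, source):
    """Return (dist, sigma): hop counts and numbers of shortest paths from source."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma

def betweenness(adj):
    """C_B(w) of Equation 2.2 for an undirected graph, end nodes included."""
    info = {s: bfs_counts(adj, s) for s in adj}
    cb = {w: 0.0 for w in adj}
    for s in adj:
        dist_s, sigma_s = info[s]
        for d in adj:
            if d == s or d not in dist_s:
                continue
            for w in adj:
                dist_w, sigma_w = info[w]
                on_path = (w in dist_s and d in dist_w
                           and dist_s[w] + dist_w[d] == dist_s[d])
                if on_path:
                    # g(w; s, d) / g(s, d)
                    cb[w] += sigma_s[w] * sigma_w[d] / sigma_s[d]
    return cb

# Hypothetical toy graph: a path 0 - 1 - 2.
toy_cb = betweenness({0: [1], 1: [0, 2], 2: [1]})
```

On this three-node path the middle node lies on the path of all six ordered pairs, while each end node lies on four.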
2.2.6
Clustering Coefficient
If a node has k neighbours, then at most $k(k-1)/2$ inter-neighbour links can exist between the neighbours. If $n_c$ denotes the number of inter-neighbour links the node has, then the clustering coefficient c of the node is defined as the fraction of the allowable links that actually exist [61],
\[ c = \frac{n_c}{k(k-1)/2}. \tag{2.3} \]
The clustering coefficient reflects the extent to which neighbours of a node are also neighbours of each other, and thus it measures the cliquishness of a typical neighbour circle. In other words, it characterises the ‘density’ of connections in the environment close to a node. When a node has only one neighbour (k = 1), the value of c is zero. The maximum value of c is one, which means all neighbours are connected to each other and the maximum linkage in this cluster (the maximum ‘clustering’) is reached. The average clustering coefficient of a network, $\langle c \rangle$, is the average value of the clustering coefficient over all nodes. Depending on the measurement data sources, the average clustering coefficient of the AS-level Internet is between 0.24 and 0.49 [11].
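Equation 2.3 translates almost directly into code. The following sketch assumes the graph is stored as a dictionary of neighbour sets; the function names and the toy graph are illustrative only.

```python
def clustering_coefficient(adj, node):
    """c = n_c / (k(k-1)/2); taken as 0 when the node has fewer than 2 neighbours."""
    neighbours = list(adj[node])
    k = len(neighbours)
    if k < 2:
        return 0.0
    # n_c: number of links that exist between pairs of neighbours.
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if neighbours[j] in adj[neighbours[i]])
    return links / (k * (k - 1) / 2)

def average_clustering(adj):
    """Average of the clustering coefficient over all nodes."""
    return sum(clustering_coefficient(adj, v) for v in adj) / len(adj)

# Hypothetical toy graph: triangle 0-1-2 with a pendant node 3 attached to node 0.
tri_adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
tri_c0 = clustering_coefficient(tri_adj, 0)
tri_avg = average_clustering(tri_adj)
```

Node 0 has three neighbours with one link among them (c = 1/3), nodes 1 and 2 have c = 1, and the pendant node has c = 0, so the average is 7/12.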
2.2.7
Disassortative Mixing (Degree Correlations)
Complex networks can be grouped into assortative, disassortative and neutral networks [62, 63, 64, 65]. Social networks (e.g. the co-authorship network) are assortative networks, in which high-degree nodes prefer to attach to other high-degree nodes. Information networks (e.g. the World Wide Web and the Internet) and biological networks (e.g. protein interaction networks) have been classified as disassortative networks, in which high-degree nodes tend to connect with low-degree ones. A network’s degree mixing pattern is identified by the conditional probability $p_c(k'|k)$ that a link connects a node with degree k to a node with degree $k'$.
This joint degree-degree distribution is inconvenient for empirical analysis due to the poor statistics obtained using the limited data sources. Pastor-Satorras et al [66, 67] found that the conditional probability can be indicated by the nearest-neighbours average degree $k_{nn}$ of a node with degree k. In this dependence, only one variable (degree k) is present. A disassortative network exhibits a negative correlation between the nearest-neighbours average degree and the degree. The degree correlations are absent in classical random graphs, but are natural in growing networks. For example, the AS-level Internet exhibits disassortative mixing behaviour [63, 64, 66, 67], where high-degree nodes tend to connect to peripheral nodes with low degrees.
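The nearest-neighbours average degree can be tabulated as a function of k by averaging, over all nodes of degree k, the mean degree of their neighbours. The sketch below is one straightforward way to do this; the function name and the star-shaped toy graph are assumptions chosen to show an extreme disassortative case.

```python
from collections import defaultdict

def knn_by_degree(adj):
    """Return {k: mean over degree-k nodes of their neighbours' average degree}."""
    sums, counts = defaultdict(float), defaultdict(int)
    for node, neighbours in adj.items():
        k = len(neighbours)
        if k == 0:
            continue                    # isolated nodes have no neighbours to average
        sums[k] += sum(len(adj[v]) for v in neighbours) / k
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}

# Hypothetical toy graph: a star with hub 0 and four leaves.
star_adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
star_knn = knn_by_degree(star_adj)
```

In the star, degree-1 nodes see a neighbour of degree 4 and the degree-4 hub sees neighbours of degree 1, so $k_{nn}$ decreases with k, which is the signature of disassortative mixing.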
2.3
Random Networks
The classical random network theory was introduced by Erdős and Rényi [33, 34, 68]. There are two main constructions of Erdős-Rényi graphs with a fixed number of nodes N:
1. Each pair of nodes of the network is connected by a link with probability p. Naturally, this link is absent with probability 1 − p.
2. The nodes are randomly connected by a given number L of links. One can realise this construction procedure by repeatedly adding new links between pairs of randomly chosen nodes. In graph theory, this is called a random graph process.
These two constructions define two equivalent statistical ensembles of graphs. The set of graphs in construction (1) is all $2^{N(N-1)/2}$ graphs with any number of links smaller than or equal to $N(N-1)/2$. The set of graphs in construction (2) consists of all possible graphs with N nodes and a given number L of links. The constructions above naturally generate uncorrelated graphs. In other words, correlations between their nodes are absent. Each node in the graph with N nodes is in the same situation. It can have any number of links attached, from zero (a “bare” node) to N − 1. If a node is of degree k, then its k links can occupy N − 1 possible positions. Standard combinatorics readily leads to the following degree distribution of the classical random graph:
\[ P(k) = \binom{N-1}{k} p^k (1-p)^{N-1-k}, \tag{2.4} \]
that is, the binomial distribution, so that the average degree is $\langle k \rangle = p(N-1)$ and the network contains, on average, $pN(N-1)/2$ links. For large N and fixed $\langle k \rangle$, the degree distribution takes the Poisson form
\[ P(k) = e^{-\langle k \rangle} \langle k \rangle^k / k!. \tag{2.5} \]
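Construction (1) and the two degree distributions are easy to check numerically. The sketch below generates one realisation of the first construction and compares the binomial form of Equation 2.4 with the Poisson form of Equation 2.5 for an illustrative choice of N = 1000 and average degree 4; the function names, the parameter values and the random seed are all assumptions made for this example.

```python
import math
import random

def erdos_renyi(n, p, rng):
    """Construction (1): link every pair of nodes independently with probability p."""
    return [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p]

def binomial_pk(k, n, p):
    """Equation 2.4."""
    return math.comb(n - 1, k) * p ** k * (1 - p) ** (n - 1 - k)

def poisson_pk(k, mean_degree):
    """Equation 2.5."""
    return math.exp(-mean_degree) * mean_degree ** k / math.factorial(k)

n, mean_degree = 1000, 4.0
p = mean_degree / (n - 1)                       # from <k> = p(N - 1)
er_links = erdos_renyi(n, p, random.Random(1))
max_gap = max(abs(binomial_pk(k, n, p) - poisson_pk(k, mean_degree))
              for k in range(20))
```

The number of links in `er_links` should be close to pN(N − 1)/2 = 2000, and `max_gap` should be small, illustrating how closely the Poisson form tracks the binomial one at this size.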
The Erdős-Rényi model generates statistically homogeneous networks in which, despite the fundamental randomness of the model, most nodes have approximately the same number of links, $\langle k \rangle$ (the average degree). In particular, the connectivity follows a Poisson distribution that peaks strongly at $\langle k \rangle$, implying that the probability of finding a highly connected node decays exponentially ($P(k) \simeq e^{-k}$ for $k \gg \langle k \rangle$). The Waxman model [35] provides another construction for random networks with a Poisson degree distribution and has been widely used to generate random topologies for network simulations. It starts by placing N nodes uniformly on an n by n plane. Once all nodes have been placed on the plane, the model computes the probability of creating a link between two nodes μ and υ using the following probability function:
\[ P(\mu, \upsilon) = \alpha e^{-d(\mu, \upsilon)/(\beta L)}, \tag{2.6} \]
where d(μ, υ) is the Euclidean distance between μ and υ, L is the maximum Euclidean distance between two nodes, and α and β are parameters in the range (0, 1]. Then a random number is generated between 0 and 1. A link is created between μ and υ only if the random number is smaller than P(μ, υ). The above random networks are static, in the sense that they have a fixed size. Starting with a constant set of N disconnected nodes, these networks are defined by the rules assigning links between pairs of nodes. These networks share a random nature in the process of placing the links, which is in general independent of the local properties of nodes. Despite this extreme simplification, random networks have for a long time provided the theoretical reference framework in network modelling, including the modelling of the Internet. The characteristic path length of a random network can be approximated [33, 34] by
\[ l^* \approx \ln(N) / \ln \langle k \rangle, \tag{2.7} \]
where N is the number of nodes and $\langle k \rangle$ is the average degree.
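The Waxman construction of Equation 2.6 can be sketched in a few lines. The version below is a minimal illustration, not the generator of [35]: the function name, the parameter values in the usage line and the random seed are assumptions, and L is taken, as in the text, to be the largest distance between any two placed nodes.

```python
import math
import random

def waxman(n_nodes, side, alpha, beta, rng):
    """Return (positions, links) for a Waxman graph on a side-by-side plane."""
    pos = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n_nodes)]

    def dist(u, v):
        return math.hypot(pos[u][0] - pos[v][0], pos[u][1] - pos[v][1])

    pairs = [(u, v) for u in range(n_nodes) for v in range(u + 1, n_nodes)]
    longest = max(dist(u, v) for u, v in pairs)      # L in Equation 2.6
    # Create a link only if a uniform random number falls below P(u, v).
    links = [(u, v) for u, v in pairs
             if rng.random() < alpha * math.exp(-dist(u, v) / (beta * longest))]
    return pos, links

wax_positions, wax_links = waxman(50, 100.0, 0.4, 0.2, random.Random(7))
```

Larger α raises the overall link density, while larger β increases the share of long links relative to short ones.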
2.4
Small-World Networks
A regular network is characterised by its neighbour clustering. For example, the ring-lattice network shown in Figure 2.6-a has a large number of triangles and the grid-lattice network has a large number of quadrangles. This structural property provides the network with a large number of alternative routing choices and makes the network as a whole highly fault-tolerant. A random network is characterised by its random connections, which provide routing shortcuts and make the characteristic path length $l^*$ (see Equation 2.7) of the network significantly smaller than that of an equivalent regular network.
Figure 2.6: Three networks with the same number of nodes, the same number of links and the same placement of nodes. a. Regular network (ring-lattice). b. Small-world network (the Watts-Strogatz model). c. Random network.
A small-world network [61, 69, 70, 71, 72] has the following properties:
• The clustering coefficient c is much larger than that of a random graph with the same number of nodes and the same average degree.
• The characteristic path length $l^*$ is almost as small as $l^*$ for the corresponding random graph.
This means a small-world network has a large number of triangles and quadrangles and also has random connections. The AS-level Internet is regarded as a good example of a small-world network because, despite the immense size of the network, it has a very small characteristic path length ($l^*$ between 3 and 4) and a fairly large average clustering coefficient ($\langle c \rangle$ between 0.30 and 0.49). Watts [61] demonstrated that a regular lattice can be transformed into a small-world network by making a small fraction of the connections random. Figure 2.6-a shows a ring-lattice regular network, in which each node is connected to its 4 closest neighbours. If a small fraction p of the links is made random, the network turns into a small-world network (Figure 2.6-b). If all the links are made random, the network becomes a random network (Figure 2.6-c).
Figure 2.7: Small-world properties [61], plotted as c(p)/c(0) and l*(p)/l*(0) against p. c(p) is the average clustering coefficient and l*(p) is the characteristic path length of a network with a fraction p of its links randomly rewired.
As shown in Figure 2.7, when only a fraction p = 0.01 of the links is rewired randomly, the network’s average clustering coefficient is nearly the same as that of the ring-lattice regular network, $c(p = 0.01)/c(0) \simeq 1$, while the network’s characteristic path length is significantly smaller than that of the ring-lattice regular network, $l^*(p = 0.01)/l^*(0) \simeq 0.18$, and close to that of the random network.
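The rewiring idea can be sketched as follows. This is a simplified illustration rather than the exact procedure of [61]: each link of a ring lattice is visited once and, with probability p, its far end is moved to a randomly chosen node that is not already a neighbour. The function names, the lattice size and the seed are assumptions.

```python
import random

def ring_lattice(n, k):
    """Links of a ring in which each node joins its k nearest neighbours (k even, n > k)."""
    return [(u, (u + j) % n) for u in range(n) for j in range(1, k // 2 + 1)]

def rewire(links, n, p, rng):
    """Return adjacency sets after rewiring each link with probability p."""
    adj = {u: set() for u in range(n)}
    for u, v in links:
        adj[u].add(v)
        adj[v].add(u)
    for u, v in links:
        if rng.random() < p:
            # New far end: any node that is not u and not already linked to u.
            candidates = [w for w in range(n) if w != u and w not in adj[u]]
            if candidates:
                w = rng.choice(candidates)
                adj[u].discard(v)
                adj[v].discard(u)
                adj[u].add(w)
                adj[w].add(u)
    return adj

lattice_links = ring_lattice(20, 4)
small_world = rewire(lattice_links, 20, 0.3, random.Random(0))
```

Rewiring moves links but never adds or removes them, so the number of nodes and the number of links stay fixed for every value of p, as in Figure 2.6.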
2.5
Summary
This chapter defined the following topological properties: network size, degree, degree distribution, shortest path length, node betweenness centrality, clustering coefficient and disassortative mixing. They are going to be used in the rest of this thesis. This chapter also introduced the concepts of random networks and small-world networks.
Chapter 3
Measurements and Models Of The AS-Level Internet
3.1
Introduction
This Chapter introduces three types of data sources that predominate in Internet research and two methodologies used to obtain the data sets. The Chapter also introduces a number of recently proposed network models which are of relevance to this work. This Chapter sets out the immediate context of this research and points out what the challenges are.
3.2
Topology Measurements Of The AS-Level Internet
There are currently two primary methods of inferring the Internet structure at the AS-level: the passive measurement, which uses BGP inter-domain routing tables, and the active measurement, which actively probes IP addresses to get the actual paths that packets travel from a source to a destination. The strength of this research is that it is based on the real measurement data of the Internet topology.
3.2.1
Passive Measurement - BGP AS Graph
The Internet passive measurement [25, 26, 27] produces the BGP AS graphs, which are constructed from Internet inter-domain BGP routing tables. The BGP tables contain the information of connections from an AS to its immediate AS neighbours. The widely used BGP data are available from the Active Measurement Project at the National Laboratory for Applied Network Research [25] and the Route Views Project at the University of Oregon [26]. Both projects connect to a number of operational routers within the Internet for the purpose of collecting BGP routing tables. The Measurement and Network Analysis Group of the US National Laboratory for Applied Network Research (NLANR) [25] has developed the Network Analysis Infrastructure (NAI). The NAI is the largest project of its kind that makes all data publicly available for use by other network researchers. On its web site, http://most.nlanr.net/, one can find extensive Internet routing related information collected since November 1997. For nearly every day NLANR has a complete map of connections of operating autonomous systems. BGP tables have the advantage that they are relatively easy to parse, process and comprehend. However, despite widespread public availability, BGP data has several limitations [73]. BGP tables do not reflect how traffic actually travels in the network and provide only a local perspective from a router toward a destination.
3.2.2
Extended BGP AS Graph
The Topology Project at the University of Michigan [30] provided the extended version [74, 75] of BGP AS graphs by using additional data sources, such as the Internet Routing Registry (IRR) data and the Looking Glass (LG) data. The IRR maintains the routing information of individual ISPs (Internet Service Providers) in several public repositories to coordinate global routing policy. The LG sites are maintained by individual ISPs to help troubleshoot Internet-wide routing problems. Extended BGP AS graphs typically have 20-50% more links than the original BGP AS graphs and provide more complete pictures of the Internet topology.
3.2.3
Active Measurement - Traceroute AS Graph
Figure 3.1: A map of the AS-level Internet measured by the Internet Mapping Project [29] of Bell Labs.
The Internet active measurement [28, 29, 76] produces the Traceroute AS graphs. In 1998, the Cooperative Association for Internet Data Analysis (CAIDA [28]) began its Macroscopic Topology Project to collect and analyse Internet-wide topology and latency data at a representatively large scale. In the course of this project CAIDA has created several innovative measurement, analysis and visualisation tools. The primary topology measurement tool is skitter, which uses the Internet Control Message Protocol (ICMP) to collect the forward path from the monitor to a given destination and capture the addresses of intermediate routers in the path. Skitter runs on more than 20 monitors around the globe and actively collects forward IP paths to over half a million destinations. The Traceroute AS graph extracts [28, 29, 76] the interconnection information of ASes from the massive traceroute data collected by skitter.
3.2.4
Discovery Of The Internet Power-Law Degree Distribution
Figure 3.2: Degree Frequency [32] of a BGP AS graph measured on 5th December 1998.
Based on the BGP measurement data, Faloutsos et al [32] reported in 1999 that the degree distributions of the AS-level Internet (see Figure 3.2) and the router-level Internet are described by a power law
\[ P(k) \sim k^{-\gamma}, \tag{3.1} \]
where the power-law exponent is $\gamma \simeq 2.2$ for the AS-level Internet. The discovery of the Internet power-law degree distribution is of fundamental importance because it showed that the Internet topology cannot be modelled by network models with a Poisson degree distribution, such as random networks and small-world networks. In fact, this property literally invalidated all previous research on modelling the Internet topology.
3.2.5
Which AS Graph?
Most recent studies on the AS-level Internet topology were based on the BGP AS graphs and the Extended AS graphs, including studies of the power-law degree distribution [32], the error and attack tolerance [77] and other research works [37, 66, 67, 78, 79]. Comparison studies [73, 80, 81, 82] have shown that the Traceroute AS graph is more complete and reliable than the BGP AS graph. However, it is not clear whether the Traceroute AS graph is more complete than the Extended BGP AS graph, which has captured even more Internet connections than the Traceroute AS graph. Chapter 7 will compare the three types of AS graphs in detail by examining all the topological properties. Based on the comparison results, the author suggests that the Traceroute AS graphs are more realistic measurements for Internet research. In this thesis, Chapters 4 and 5 are based on an Extended AS graph measured in 2001. Chapters 6 and 8 are based on a Traceroute AS graph measured in 2002.
3.3
Topology Models Of The AS-Level Internet
This section introduces a selection of the existing Internet models which have been widely used in the study of the Internet. The Tiers model, the GT-ITM model and the User-Provider model focus on the Internet hierarchical structure [83]. The Inet model, the Barabási and Albert (BA) model and the modifications of the BA model are degree-oriented models. The BRITE model, the Dorogovtsev-Mendes model, the Generalised Network Growth (GNG) model, the Generalised Linear Preference (GLP) model and the Highly Optimised Tolerance (HOT) model are examples of models using more complex growth mechanisms. This thesis will further study the Inet model, the BA model, the Fitness BA model and the GLP model in the following chapters.
3.3.1
Tiers Model
The Tiers generator [84] is based on a three-level hierarchy that represents Wide Area Networks (WAN), Metropolitan Area Networks (MAN), and Local Area Networks (LAN). To generate a random topology using Tiers, one specifies a target number of LANs and MANs. Currently Tiers cannot generate more than one WAN per random topology. For each level of hierarchy, one also specifies a fixed number of nodes per network. A minimum spanning tree is computed to connect all nodes, then other links are created based on user-specified average inter-level and intra-level redundancy. The link formation favours close-by nodes, resulting in topologies with large diameters (see Section 2.2.4).
3.3.2
GT-ITM Model
GT-ITM (Transit-Stub) model [85, 83] generates topologies based on several different models. The connectivity used to generate each connected graph can be selected from one of six methods: PureRandom, Waxman1, Waxman2, Doar-Leslie, Exponential, or Locality [85, 83]. Similar to Tiers, the model has a well-defined hierarchical structure. It generates topologies with two levels of hierarchy: one consisting of transit ASes, and the other consisting of stub ASes. Also similar to Tiers, the GT-ITM model allows for extra links to be added between stub ASes and between stub and transit ASes.
3.3.3
User-Provider Model
The User-Provider model [36] generates networks using a self-organised interaction between users and providers, where the interactions can be rearranged during the network growth. All nodes in the model are divided into two roles: providers and users. Providers can have several links, pointing to other sites which correspond to users. Users have a single link pointing to their providers. At each time-step, a node is added to the network. The new node can be either a provider with probability r or a user with probability 1 − r. When a provider is added, D(t) users in the network are chosen at random and rewired to the new provider. Links to the previous providers are removed. It is assumed that the integer number D(t) is a random variable with a Poisson distribution and that each user has the same probability (1/k) of being rewired.
3.3.4
Inet Model
The Inet model [86, 37] (see footnote 1) was designed to match the degree distribution as measured in the BGP AS graphs. The model generates networks in three steps:
• Build a spanning tree with all nodes that have degrees greater than one.
• Connect all nodes with degree one to nodes in the spanning tree with a linear preference.
• Connect the remaining free links in the spanning tree.
The number of links generated by the model depends on two parameters, which are the total number of nodes and the percentage of nodes with degree k = 1. Since the model is based on the original BGP AS graph, it typically generates 26% fewer links than the extended BGP AS graph.
3.3.5
Barab´ asi and Albert Model
Pursuing a very different class of dynamic graph models, Barabási and Albert [38, 87] showed that power-law graphs can arise from a simple dynamic model that combines incremental growth with a preference for new nodes to connect to existing ones that are already well connected. The BA model starts with a small random network. The system “grows” by attaching a new node with m links (see footnote 2) to m different nodes that are already present in the system (see Figure 3.3), and the attachment is “preferential” [88] because the probability that a new node connects to node i with degree $k_i$ is
\[ \Pi(k_i) = \frac{k_i}{\sum_j k_j}, \tag{3.2} \]
which is a linear function of $k_i$.
Figure 3.3: Growth of the BA model (a new node attaching to the existing network).
Footnote 1: During the research for this thesis, the author found that the Inet-2.1 model contains redundant links in its output. Following his report, the Inet research group (http://topology.eecs.umich.edu/inet/) identified the programming bug and updated the model to version 2.2 and later Inet-3.0.
Footnote 2: Use m = 3 to obtain Internet-like networks.
The BA model has generated great interest in various research areas [89, 90, 91, 92]. Barabási and Albert state [40, 93] that this intuitively appealing growth model applies to the Internet’s AS graph and therefore explains why AS graphs exhibit power-law degree distributions. The model has also been used as a starting point in research into the error and attack tolerance of the Internet [77, 94]. Simplicity and parsimony are the two advantages of the BA model. The BA model is also important because it can be analysed mathematically. Using mean-field theory, Barabási and Albert [95] showed that the BA model generates networks with a degree distribution of $P(k) \sim k^{-\gamma}$ with the power-law exponent $\gamma = 3.0$, which is independent of network size (growth time) and the parameter m.
Mean-field theory for scale-free random networks
After t time-steps, the network has $N = t + m_0$ nodes and $mt$ links. The time dependence of the connectivity $k_i$ of a given node i can be calculated analytically using a mean-field approach. Assume that k is continuous, so that the probability $\Pi(k_i) = k_i / \sum_j k_j$ can be interpreted as a continuous rate of change of $k_i$. Consequently,
\[ \frac{\partial k_i}{\partial t} = m \Pi(k_i) = m \frac{k_i}{\sum_{j=1}^{N-1} k_j}. \]
Taking into account the total growth in the number of links, $\sum_j k_j = 2mt$, then $\partial k_i / \partial t = k_i / 2t$. The solution of this equation, with the initial condition that node i was added to the system at time $t_i$ with connectivity $k_i(t_i) = m$, is
\[ k_i(t) = m \left( \frac{t}{t_i} \right)^{\beta}, \quad \beta = 1/2. \]
The probability that a node has a degree $k_i(t)$ smaller than k, $P(k_i(t) < k)$, can be written as
\[ P(k_i(t) < k) = P\left( t_i > \frac{m^{1/\beta} t}{k^{1/\beta}} \right). \]
If the nodes are added at equal time intervals, the probability density of $t_i$ is $P(t_i) = 1/(m_0 + t)$. Then,
\[ P\left( t_i > \frac{m^{1/\beta} t}{k^{1/\beta}} \right) = 1 - \frac{m^{1/\beta} t}{(m_0 + t) k^{1/\beta}}. \]
The degree probability distribution is
\[ P(k) = \frac{\partial P(k_i(t) < k)}{\partial k} = \frac{2 m^{1/\beta} t}{(m_0 + t) k^{1/\beta + 1}}, \]
where $\frac{1}{\beta} + 1 = 3$, so that $P(k) \sim k^{-3}$.
3.3.6
Fitness BA Model
The Fitness BA (FBA) model [39] is a modification of the BA model. It uses generalised preferential attachment, which ensures that even a relatively young node with a small number of links can acquire new links at a high rate if it has a large fitness parameter. The reason the author studies this model is that, for the uniform fitness parameter distribution, the network generated by this model has a power-law exponent similar to that of the AS graph. The FBA model [39] is identical to the BA model except that a new parameter, fitness, is introduced into the calculation of the probability Π. In the real Internet, the probability that a new node will be connected to node i does not depend only on the node’s connectivity k. The node’s fitness describes its ability to compete for links at the expense of other nodes. The Fitness BA model generates networks with a power-law degree distribution whose exponent is closer to that of the actual Internet degree distribution. A fixed fitness parameter η is assigned to each node, where η is chosen uniformly from the interval [0, 1]. The preferential probability becomes:
\[ \Pi(i) = \frac{\eta_i k_i}{\sum_j \eta_j k_j}. \tag{3.3} \]
Using mean-field theory, Bianconi and Barabási [39] showed that the Fitness BA model generates networks with a power-law degree distribution of $P(k) \sim k^{-\gamma}$, where the slope $\gamma = 2.25$, which is closer to that of the Internet ($\gamma \simeq 2.2$).
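Equations 3.2 and 3.3 differ only in the weight given to each existing node, so one growth loop can illustrate both. The sketch below is a simplified illustration, not the reference implementation of either model: the function name is hypothetical, the seed network is assumed to be a complete graph on m + 1 nodes, and fitness values are drawn from (0, 1] rather than [0, 1] so that no weight is exactly zero.

```python
import random

def grow_preferential(n_nodes, m, rng, use_fitness=False):
    """Grow a network by preferential attachment; returns (links, degree).

    use_fitness=False follows Equation 3.2 (BA); True follows Equation 3.3 (FBA).
    """
    m0 = m + 1                                   # assumed seed: complete graph
    links = [(u, v) for u in range(m0) for v in range(u + 1, m0)]
    degree = [m] * m0
    eta = [1.0 - rng.random() if use_fitness else 1.0 for _ in range(m0)]
    for new in range(m0, n_nodes):
        weights = [eta[i] * degree[i] for i in range(new)]
        targets = set()
        while len(targets) < m:                  # m different existing nodes
            targets.add(rng.choices(range(new), weights=weights)[0])
        for t in sorted(targets):
            links.append((t, new))
            degree[t] += 1
        degree.append(m)
        eta.append(1.0 - rng.random() if use_fitness else 1.0)
    return links, degree

ba_links, ba_degree = grow_preferential(200, 3, random.Random(42))
fba_links, fba_degree = grow_preferential(200, 3, random.Random(42), use_fitness=True)
```

The usage lines take m = 3, the value suggested in the text for Internet-like networks. Recomputing all weights at every step keeps the sketch short at the cost of quadratic running time.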
3.3.7
Generalised BA Model
The Generalised BA model [40] is an extension of the BA model. It can generate networks with power-law exponents between 2 and 4. In the Generalised BA model, three possible activities could happen in every growth step:
• With probability p (0 ≤ p < 1), m (m < m0) new links are added.
• With probability q (0 ≤ q < 1 − p), m links are rewired.
• With probability 1 − p − q, a new node is added.
The preferential probability is
\[ \Pi_i = \frac{k_i + 1}{\sum_j (k_j + 1)}, \tag{3.4} \]
which is proportional to $k_i + 1$, such that there is a nonzero probability that isolated nodes ($k_i = 0$) acquire new links. Albert and Barabási [40] showed that the network’s power-law degree distribution is:
\[ P(k) = \frac{t}{m_0 + t} D(p, q, m) \left( k + A(p, q, m) + 1 \right)^{-1 - B(p, q, m)}, \tag{3.5} \]
where
\[ A(p, q, m) = (p - q) \left( \frac{2m(1 - q)}{1 - p - q} + 1 \right), \tag{3.6} \]
\[ B(p, q, m) = \frac{2m(1 - q) + 1 - p - q}{m} \tag{3.7} \]
and
\[ D(p, q, m) = \left( m + A(p, q, m) + 1 \right)^{B(p, q, m)} B(p, q, m). \tag{3.8} \]
The power-law exponent is $\gamma = 1 + B(p, q, m)$ and varies between 2 and 4.
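Equations 3.6 to 3.8 are simple enough to evaluate directly, which makes it easy to see how the exponent responds to p, q and m. The helper below is a hypothetical convenience function written for this illustration; it only transcribes the formulas above.

```python
def generalised_ba_parameters(p, q, m):
    """Return (A, B, D, gamma) from Equations 3.6-3.8, with gamma = 1 + B."""
    if not (0 <= p < 1 and 0 <= q < 1 - p):
        raise ValueError("need 0 <= p < 1 and 0 <= q < 1 - p")
    a = (p - q) * (2 * m * (1 - q) / (1 - p - q) + 1)
    b = (2 * m * (1 - q) + 1 - p - q) / m
    d = (m + a + 1) ** b * b
    return a, b, d, 1 + b

# With no link addition or rewiring (p = q = 0) and m = 1.
a0, b0, d0, gamma0 = generalised_ba_parameters(0, 0, 1)
```

For p = q = 0 and m = 1 the formulas give A = 0, B = 3 and D = 24, so the exponent is γ = 4, the upper end of the range quoted above.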
3.3.8
BRITE Model
BRITE [41, 96] is an approach towards universal topology generation. BRITE combines a number of topology generation tools, which can be used to flexibly control various parameters (such as connectivity and growth models) and study various properties of generated network topologies (such as power laws, average path length, etc). It has the following features:
• Flexible: BRITE supports multiple generation models. Models can be enhanced by assigning link attributes such as bandwidth and delay.
• Extensible: BRITE’s object-oriented architecture provides researchers with the ability to add new models of generation and with the ability to import from and export to custom topology files.
• Interoperable: BRITE allows importing topologies from other topology generators and extending or combining them with other topologies.
3.3.9
Dorogovtsev-Mendes Model
Dorogovtsev and Mendes [42] introduced a model using the addition of new internal links. If the parameter m is the number of new internal links that appear at each growth time-step, the model evolves according to the following rules.
• At each time-step, a new node is added and linked with node i with the probability given by the BA model (see Equation 3.2).
• In addition,
– m ≥ 0 new internal links are added between unconnected pairs of old nodes i and j with probability proportional to the product of their degrees, $k_i \times k_j$.
– In the case of m ≤ 0, some old links between old nodes are removed with equal probability.
The parameter m may also be non-integer. Dorogovtsev and Mendes showed that, for a wide range of m, this model can generate networks with power-law degree distributions and that the power-law exponent γ can be adjusted through m. However, this model produces the wrong kind of degree-degree correlation.
3.3.10
Generalised Linear Preference Model
Bu et al [44] recently introduced the Generalised Linear Preference (GLP) model. This model is a modification of the BA model. It reflects the fact that the evolution of the Internet topology is mostly due to two operations, the addition of new nodes and the addition of new links between existing nodes.
Figure 3.4: The growth of the GLP model. a. Addition of new nodes. b. Addition of new links. The two operations are independent.
The model starts with $m_0$ nodes connected through $m_0 - 1$ links. As shown in Figure 3.4, at each time-step, one of the following two operations is performed:
• With probability ρ ∈ [0, 1], m (m < m0) new links are added between m pairs of nodes chosen from existing nodes;
• With probability 1 − ρ, one new node is added connecting to m old nodes.
The GLP model uses the generalised linear preference that the probability Π(i) to choose node i with degree $k_i$ is given by
\[ \Pi(i) = \frac{k_i - \beta}{\sum_j (k_j - \beta)}, \quad \beta \in (-\infty, 1). \tag{3.9} \]
The parameter β can be adjusted such that nodes have a stronger preference for being connected to high-degree nodes than predicted by the linear preference of the BA model given by Equation 3.2. Bu et al showed that the GLP model, using the recommended parameter values (ρ = 0.66, m = 1, m0 = 10 and β = 0.6447), resembles the characteristic path length and the clustering coefficient of a BGP AS graph measured in September 2000.
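The two GLP operations and the preference of Equation 3.9 can be sketched as below. This is a rough illustration under stated assumptions, not the generator of Bu et al [44]: the function name is hypothetical, the $m_0$ seed nodes are assumed to form a chain (which uses $m_0 - 1$ links), and for brevity the link-addition step skips a draw when both ends coincide and does not check for duplicate links.

```python
import random

def grow_glp(n_nodes, m, m0, rho, beta, rng):
    """Grow a GLP-style network; returns (degree, links)."""
    # Assumption: the m0 >= 2 seed nodes form a chain, which uses m0 - 1 links.
    links = [(i, i + 1) for i in range(m0 - 1)]
    degree = [1] + [2] * (m0 - 2) + [1]

    def pick():
        # Equation 3.9: choose node i with probability proportional to k_i - beta.
        weights = [k - beta for k in degree]
        return rng.choices(range(len(degree)), weights=weights)[0]

    while len(degree) < n_nodes:
        if rng.random() < rho:
            for _ in range(m):                  # new links between existing nodes
                u, v = pick(), pick()
                if u != v:                      # simplification: skip self-pairs
                    links.append((u, v))
                    degree[u] += 1
                    degree[v] += 1
        else:
            new = len(degree)                   # one new node with m links
            targets = set()
            while len(targets) < m:
                targets.add(pick())
            for t in sorted(targets):
                links.append((t, new))
                degree[t] += 1
            degree.append(m)
    return degree, links

glp_degree, glp_links = grow_glp(300, 1, 10, 0.66, 0.6447, random.Random(3))
```

The usage line takes the parameter values recommended in the text. Because every node always has degree at least 1 and β < 1, all weights $k_i - \beta$ stay positive.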
3.3.11
Generalised Network Growth Model
The Generalised Network Growth (GNG) model [45, 97] is similar to the GLP model. The basic idea of the GNG model is to allow both the addition of a vertex (with probability p) and the addition of a link (with probability 1 − p), but the model applies a new preference scheme. According to its definition, at each time-step,
• either a node is added and linked with node i with probability
\[ \Pi(i) = p \cdot \frac{k_i}{\sum_{j=1}^{N} k_j}, \]
• or a link is added (if absent) between nodes i and j, which are already present in the system, with probability
\[ \Pi(i) = (1 - p) \cdot \frac{k_i}{\sum_{k=1}^{N} k_k} \cdot \frac{|k_i - k_j|}{\sum_{k=1, k \neq i}^{N} |k_i - k_k|}. \]
The resulting network is a scale-free one, with the power-law exponent
\[ \gamma(p) = 2 + \frac{p}{2 - p}. \]
From the above rules, the case p = 1 (no link creation) corresponds to a traditional BA model where only one connection is added per time-step. This model exhibits some agreement with the Internet measurements for the degree distribution, the betweenness distribution, the clustering coefficient and the correlation functions for the degrees. However, the growth dynamics of the GNG model are not supported by the real measurements.
3.3.12
Highly Optimised Tolerance Model
Carlson et al [46, 98] introduced another mechanism for generating power-law distributions, referred to as Highly Optimised Tolerance (HOT), which is motivated by biological organisms and advanced engineering technologies. Their focus is on systems which are optimised, either through natural selection or engineering design, to provide robust performance despite uncertain environment. They suggest that power-laws in these systems are due to tradeoffs between yield, cost of resources, and tolerance to risks. The characteristic features of HOT systems include: 1) high efficiency, performance, and robustness to designed-for uncertainties; 2) hypersensitivity to design flaws and unanticipated perturbations; 3) nongeneric, specialised, structured configurations; and 4) power-laws.
3.4
Discussions
3.4.1
Structure-Based Models vs Degree-Based Models
Following the long-held belief that the Internet is hierarchical, the network topology generators most widely used by the Internet research community, e.g. the Tiers model and the GT-ITM model, create networks with a deliberately hierarchical structure. However, in 1999 Faloutsos et al [32] revealed that the Internet’s degree distribution is a power-law and Tangmunarunkit et al [99] showed that the degree distributions produced by structure-based topology generators are not power-laws. Since then the research community has largely dismissed the structure-based models as inadequate and proposed new network generators that attempt to generate graphs with power-law degree distributions. Tangmunarunkit et al [99, 100] also discovered, much to their surprise, that network generators based on the degree distribution more accurately capture the Internet large-scale structure (such as the hierarchical structure measured by Subramanian et al [78]). However their judgements were based on simple qualitative comparisons and heuristic assumptions. Tangmunarunkit et al and other researchers recognised [20, 21] that there is a need for further studies to characterise the network topology structures. One objective of this thesis is to provide parameters to quantitatively characterise and differentiate the hierarchical structure of Internet-like scale-free networks.
3.4.2
Accuracy vs Simplicity
Since the discovery of the power-law degree distribution in the Internet, the number of models trying to explain the power-law has been growing very rapidly. However, there is still no Internet evolution model that is satisfactory from both the physical and networking standpoints [101]. As a result, the laws governing the Internet evolution remain unclear. The Barabási-Albert (BA) model and its derivatives, popular among physicists, have been heavily criticised by the networking community for being too general, not incorporating any domain specifics and, hence, failing to predict correctly many characteristics of the Internet topology and evolution. For example, by examining the AS graph data sets from the Topology Project of the University of Michigan, Chen et al [74] show that the available historical data of the AS-level Internet do not support the connectivity-based dynamics assumed in the BA model, and that the detailed dynamics underlying the BA modelling approach do not explain the complex structure of the AS maps. The modified BA models have similar problems. The same type of argument has been actively used against the BA model by biologists. On the other hand, the models proposed by the networking community try to incorporate Internet evolution specifics by introducing a number of non-physical parameters, which allow one to easily fit the output of a model to the observed data (e.g. [102]). It is easy to see that any model with a sufficient number of external parameters can be forced to produce any required output by parameter manipulation. A model can be of some theoretical value only when all its parameters can be expressed via physical variables. All the existing Internet models focus only on selected network properties, and no model is capable of accurately capturing all the relevant topological properties of the Internet. Furthermore, it is uncertain which model is better than the others, and researchers are not even sure whether it is feasible at all to accurately reproduce the Internet topology with an evolving model using fairly simple and realistic mechanisms.
Because of the above inadequacy and uncertainty of the research on the Internet topology, random networks and regular lattice graphs are still often used by the Internet engineering community in practical studies on routing behaviours and protocol simulations [103]. Another objective of this thesis is to provide realistic models to accurately reproduce the AS-level Internet topology.
3.5
Summary
This Chapter introduces the recent measurements of the AS-level Internet topology and a number of the Internet topology generators. This Chapter also discusses the challenges in parameterising and modelling the Internet topology and sets out the immediate context for this research.
Chapter 4
Rich-Club Phenomenon
4.1
Introduction
Inspired by detailed measurements on the Internet hierarchical structure, this chapter introduces the concept of the rich-club phenomenon, which describes an overlooked hierarchical structure of the AS-level Internet, that high-degree nodes are tightly interconnected with each other. Two metrics are provided to quantitatively characterise this structural property.
4.1.1
Internet Hierarchical Structure
It is well known that the Internet topology has a hierarchical structure. However, the description of this structure has been merely qualitative and vague. Recently, based on measurements, Subramanian et al [78] classified and identified the details of the tier structure of the AS-level Internet topology. They studied the topology structure in terms of customer-provider and peer-peer relationships between autonomous systems as manifested in the BGP routing policies. Using heuristic arguments based on the commercial relationships [104] between ASes, they proposed a five-level classification of ASes.
Dense Core: For every AS present in the dense core, all of its peers and its provider should also be present in the core. The core of the network should include the small number of so-called tier-1 providers. In practice, the term Tier-1 provider is loosely defined as a “large” AS or as an AS that does not have any upstream provider. These ASes could be identified by looking for all provider-free nodes. The dense core consists of 20 ASes, including large Internet Service Providers (ISPs) such as Genuity, Sprint, AT&T and UUNet. The top 20 ASes have a very dense connectivity of 312 peering links. The top 15 of the 20 ASes almost form a clique, with only three links missing from the clique.
Transit Core: ASes in the transit core are large national providers and hosting companies that have peering relationships with many of the ASes in the dense core.
Outer Core: The remaining ASes in the core form the outer core. The members of the outer core typically represent regional ISPs which have a few customer ASes and a few peering relationships with other such regional ISPs.
Small Regional ISPs: Small regional ISPs are ASes that have one or more customers and no peering relationships with other ASes.
Customers: Customers are those stub networks which are origins and sinks of traffic and which do not carry any transit traffic.
Table 4.1: Distribution of ASes in the Internet hierarchy [78]
| Level | Number of ASes |
| Dense core (0) | 20 |
| Transit core (1) | 129 |
| Outer core (2) | 897 |
| Small regional ISPs (3) | 971 |
| Customers (4) | 8898 |
4.1.2
Connectivity Of The Core
Subramanian et al have shown that the graph constructed from ten BGP dumps on 18 April 2001 has 10,915 ASes, of which 8,898 are customers and 971 are small regional ISPs (see Table 4.1). The remainder of the network is the core, consisting of a connected component with just 1046 ASes and 6249 connections. This represents approximately 25% of the total number of connections in the graph. The nodes in the core have an average degree of 6. The key result is that the Internet has a tier structure, where Tier 1 consists of a “core” of ASes which are well interconnected with each other. However, the network research community did not pay sufficient attention to this hierarchical property, because the approach used in Subramanian et al’s analysis has a number of limitations. Firstly, it is a time-consuming process, which involves scrutinising large amounts of data from various information sources. Secondly, it is based on a number of heuristic assumptions about the commercial relationships between network elements. Thirdly, the result is represented as several tables of numbers. Thus this analysis applies only to this specific case and provides no comparison with other networks.
4.1.3
Motivation
The author noticed Subramanian et al’s work and was very interested in the fact that highly connected nodes are tightly interconnected with each other. It is known that the AS-level Internet has a power-law degree distribution; therefore it contains a small number of nodes which have very large numbers of links. The AS-level Internet also exhibits disassortative mixing behaviour [66, 67], where high-degree nodes tend to connect to nodes with low degrees. However, neither the power-law degree distribution nor the disassortative mixing indicates whether the high-degree nodes are tightly or loosely interconnected with each other.
Figure 4.1: Two disassortative networks. (a) High-degree nodes are loosely interconnected. (b) High-degree nodes are tightly interconnected.
As shown in Figure 4.1, two networks having similar degree distributions and disassortative mixing behaviours can exhibit different structures. In Figure 4.1-a the high-degree nodes are not directly interconnected, whereas in Figure 4.1-b the high-degree nodes are tightly interconnected. This structural difference is relevant because network routing is much more efficient when the high-degree nodes have direct connections with each other. The author realised that Subramanian et al’s measurement of the connectivity of the core actually revealed a structural property that had not been characterised by the existing topological parameters. The author then recognised that there was a need for further studies to characterise this critical structural feature, and expected that measuring the interconnectivity among the high-degree nodes with a quantitative metric might provide a clue for a deeper understanding of the Internet topology, namely to answer the following two questions:
• How can the rich-club phenomenon be characterised quantitatively?
• Do networks having power-law degree distributions, such as maps of the AS-level Internet and synthetic scale-free networks generated by models, show similar hierarchical structures?
4.2
Rich-Club Phenomenon
In 2002 the author introduced the concept of the rich-club phenomenon [105] to describe the above hierarchical structure of the AS-level Internet. The rich-club phenomenon has two meanings. Firstly, the network contains a small number of highly connected nodes. These nodes are called “rich” nodes. Secondly, the rich nodes are tightly interconnected with each other and form a tight group, which is called the “rich-club”. The term rich-club alludes to a familiar phenomenon in human society, where rich upper-class people form an exclusive club to promote social and business connections among the club members. Note that the rich-club phenomenon does not imply that the majority of the rich nodes’ links are directed to other club members. Indeed, rich nodes have very large numbers of links and only a few of them are enough to provide the connectivity to other club members, whose number is in any case small. After many calculations and tests on various candidate parameters [106, 107], the author provided two metrics to quantitatively characterise the rich-club phenomenon: the rich-club connectivity and the node-node link distribution. These two parameters are not associated with any heuristic assumption but are based only on the network connectivity information. The calculation of the metrics is fairly simple and their topological meanings are straightforward.
The Four Networks
In this section, the two metrics of the rich-club phenomenon are defined and measured in four different networks, which include an Extended BGP AS graph measured in May 2001 [30] and three synthetic networks generated by the Barabási-Albert (BA) model, the Fitness BA (FBA) model and the Inet-3.0 model. For each model, ten networks are generated with different seed numbers and all results are averaged over the ten networks. As shown in Table 4.2, the four networks have the same number of nodes and similar numbers of links (except the Inet-3.0 network). Figure 4.2 shows that the cumulative degree distributions Pcum(k) of the four networks follow power-laws. The Pcum(k) of the AS graph is characterised by a power-law of slope −1.22, which yields the power-law degree distribution $P(k) \sim k^{-\gamma}$, $\gamma \simeq 2.22$. Table 4.2 shows that the Inet-3.0 model and the FBA model have power-law exponents similar to that of the AS graph, whereas the power-law exponent of the BA model is 3.0. The author chose and compared these three models because the BA model is the most widely studied scale-free model, the FBA model generates networks with a power-law exponent similar to that of the AS graph, and the Inet-3.0 model is designed to resemble the AS graph’s degree distribution. Notice that the author is not trying to characterise all the existing power-law network generators, but to show that it is possible to distinguish between them by studying the properties of the rich-club.
Table 4.2: Network parameters
| | AS Graph | Inet-3.0 | Fitness BA | BA Model |
| Number of nodes, N | 11461 | 11461 | 11461 | 11461 |
| Number of links, L | 32730 | 24171 | 34366 | 34366 |
| Average degree, ⟨k⟩ | 5.7 | 4.2 | 6.0 | 6.0 |
| Max. degree, kmax | 2432 | 2010 | 1793 | 329 |
| Power-law exponent, γ | 2.22 | 2.22 | 2.255 | 3.0 |
Figure 4.2: Cumulative distribution of degree (log-log plot of cumulative distribution against degree for the Extended BGP AS graph, the Inet-3.0 model, the FBA model and the BA model). For each model, ten networks are generated and averaged.
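The step from the slope of the cumulative distribution to the exponent γ can be made explicit. The derivation below is a sketch that assumes Pcum(k) denotes the fraction of nodes with degree at least k and approximates the sum by an integral, which is valid for γ > 1.

```latex
P_{cum}(k) = \sum_{k' \ge k} P(k')
\;\approx\; \int_{k}^{\infty} k'^{-\gamma}\, dk'
\;\propto\; k^{-(\gamma - 1)}
% The measured slope of the cumulative distribution is -1.22, hence
-(\gamma - 1) = -1.22 \;\Longrightarrow\; \gamma = 2.22
```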
4.2.1
Rich-Club Connectivity
A quantitative assessment of the rich-club phenomenon is obtained by measuring the rich-club connectivity φ, defined as the fraction of allowable links¹ that actually exist among members of a rich-club. The rich-club membership is specified in two ways: nodes with degrees higher than k (“guys richer than k”), or nodes with ranks less than r (“the top r richest guys”). Thus the rich-club connectivity can be plotted as a function of node degree or node rank. In order to be independent of the network size, the rich-club connectivity is often plotted as a function of node rank normalised by the number of network nodes. The rich-club connectivity measures how well the members of the rich-club “know” each other. A rich-club connectivity of 100% means that every member has a direct link to every other member. Lower percentages mean fewer connections between them.
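The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this work: the function name `rich_club_connectivity`, the edge-list input format and the tie-breaking rule for nodes of equal degree are all assumptions made for the example.

```python
def rich_club_connectivity(edges, r):
    """Fraction of allowable links present among the r highest-degree nodes.

    `edges` is an iterable of (u, v) pairs describing an undirected graph.
    Self-loops and duplicate links are ignored.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Rank nodes by decreasing degree; ties are broken by node label
    # (an arbitrary choice made only for this sketch).
    ranked = sorted(adj, key=lambda node: (-len(adj[node]), str(node)))
    club = set(ranked[:r])
    n = len(club)
    if n < 2:
        raise ValueError("a rich-club needs at least two members")

    # Every link inside the club is seen from both of its ends.
    links_inside = sum(1 for u in club for v in adj[u] if v in club) // 2
    allowable = n * (n - 1) / 2
    return links_inside / allowable
```

Calling the function for r = 1% of N, 2% of N and so on gives the curve φ(r/N) as a function of normalised rank.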
Figure 4.3: Rich-club connectivity φ(r/N) as a function of normalised rank r/N (log-log plot for the AS graph, the Inet-3.0 model, the Fitness BA model and the BA model).
Figure 4.3 shows the rich-club connectivity φ(r/N) as a function of normalised rank r/N. The figure illustrates that in the four networks, the rich-club subgraphs formed by nodes of higher degrees are progressively more interconnected. However, it is clear that the four networks exhibit profound structural differences in the tendency of high-degree nodes to be well interconnected with each other. For example, the rich nodes of the AS graph are significantly more tightly interconnected than those of the three synthetic networks. As shown in Table 4.3, the top 1% rich nodes in the AS graph have 32% of the allowable links, compared with φ(0.01) = 18% for the Inet-3.0 model and only φ(0.01) = 5% for the BA model and the Fitness BA model.
¹ The number of allowable links in an n-node subgraph is n(n − 1)/2.
Table 4.3: Rich-club properties
| | AS Graph | Inet-3.0 | Fitness BA | BA Model |
| φ(r/N = 0.01) | 32% | 18% | 5% | 5% |
| Σ_rj l(ri ≤ 5%, rj) | 28602 | 22620 | 20929 | 15687 |
| l(ri ≤ 5%, rj ≤ 5%) | 8919 | 3697 | 1426 | 1511 |
φ(r/N = 0.01) is the rich-club connectivity among the top 1% richest nodes. Σ_rj l(ri ≤ 5%, rj) is the number of links connecting to the top 5% rich nodes. l(ri ≤ 5%, rj ≤ 5%) is the number of links connecting among the top 5% rich nodes.
4.2.2
Node-Node Link Distribution
The node-node link distribution is introduced to provide a more detailed view of the network rich-club structure. Network nodes are divided into subsets according to their ranks; for example, ranks are normalised by the total number of nodes and divided into 5% bins. Then the node-node link distribution l(ri, rj) is defined as the number of links connecting nodes in the subset ri to nodes in the subset rj, where ri ≤ rj. Figure 4.4 illustrates the node-node link distribution l(ri, rj) against the corresponding rank bins ri and rj. In the Extended BGP AS graph (Figure 4.4a), rich nodes (see columns in the row of ri = 5%) are connected preferentially to other rich nodes: the number of links interconnecting the top 5% rich nodes (the far corner column) is significantly larger than the numbers of links connecting the rich nodes to other, less rich nodes.
Figure 4.4: Node-node link distribution l(ri, rj) against rank bins ri and rj (5% to 100%). (a) Extended BGP AS graph. (b) Inet-3.0 model. (c) FBA model. (d) BA model.
The node-node link distribution of the Inet-3.0 (Figure 4.4b) is similar to that of the AS graph, however, the number of links interconnecting among the top 5% rich nodes (far corner, 3697 links) is significantly smaller than that of the AS graph (8919 links, see Table 4.3). The node-node link distributions of the BA and the Fitness BA graphs (Figure 4.4c, 4.4d) are fundamentally different from that of the AS graph. The top 5% rich nodes of the BA and the Fitness BA graphs are connected to all node subsets with similar probabilities regardless of the rank range of subsets. Networks generated by these two models do not contain a tightly interconnected rich-club at all.
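The binning behind the node-node link distribution can be sketched as follows. This is an illustrative Python example under stated assumptions: the function name `node_node_link_distribution`, the edge-list input, the tie-breaking rule for equal degrees and the zero-based bin indices are choices made here and are not taken from the measurements above. With `n_bins=20` each bin covers 5% of the ranked nodes.

```python
from collections import Counter


def node_node_link_distribution(edges, n_bins=20):
    """Count links between rank bins of an undirected graph.

    Returns a Counter mapping (bin_i, bin_j), with bin_i <= bin_j, to the
    number of links joining a node in bin_i to a node in bin_j.
    Bin 0 holds the richest nodes.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Rank 0 is the highest-degree node; ties are broken by node label
    # (an arbitrary choice made only for this sketch).
    ranked = sorted(adj, key=lambda node: (-len(adj[node]), str(node)))
    rank = {node: idx for idx, node in enumerate(ranked)}
    n = len(ranked)

    # Integer arithmetic avoids floating-point errors at bin boundaries.
    bin_of = {node: (idx * n_bins) // n for node, idx in rank.items()}

    counts = Counter()
    for u in adj:
        for v in adj[u]:
            if rank[u] < rank[v]:  # count every undirected link once
                counts[(bin_of[u], bin_of[v])] += 1
    return counts
```

In this sketch the entry for `(0, 0)` corresponds to l(ri ≤ 5%, rj ≤ 5%), and summing the entries whose first index is 0 gives the number of links attached to the top 5% rich nodes.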
4.3
Discussion
The rich-club phenomenon describes a hierarchical property of the AS-level Internet: high-degree nodes are tightly interconnected with each other. Until recently this structural feature had been overlooked by the network research community. The author’s novel contribution is the introduction of the rich-club connectivity and the node-node link distribution, which for the first time provide a realistic way to quantitatively characterise and differentiate this structural property of networks having power-law degree distributions. Results show that synthetic scale-free networks generated by degree-based models may exhibit different hierarchical structures.
4.3.1
Rich-Club Subgraph
Figure 4.5: Degree distribution inside the rich-club subgraph consisting of the top 1% rich nodes of the AS graph (distribution against degree).
As shown in Figure 4.5, if the rich-club comprises the top 1% rich nodes of the Internet AS graph, the probability distribution of degrees among the club members is not a power-law, in fact it peaks around degree k = 25. Calculation shows that the average distance between rich nodes is 1.73 hops, which is very small and means if two club members do not have a direct link between them, very likely they share a neighbour member.
4.3.2
Rich-Club Phenomenon Is Relevant
The rich-club consists of highly connected nodes, which are well interconnected with each other, and the average hop distance among the club members is very small (1 to 2 hops). The rich-club is a “super” traffic hub of the network and the Internet’s disassortative mixing property ensures that peripheral nodes are always near the hub. These two structural properties contribute to the routing efficiency of a network. Modelling the rich-club phenomenon is relevant [108], because an Internet model that does not reproduce the properties of the rich-club will underestimate the actual network’s routing efficiency (shortest path length) and routing flexibility (alternative reachable paths), and will also overestimate the network robustness under node-attack [77]. Chapter 6 will investigate the impact of the Internet rich-club structure in more detail.
4.3.3
Modelling The Rich-Club
Results show that the Inet-3.0 model does not exhibit the rich-club phenomenon as strongly as the Extended BGP AS graph. The reason is that the Inet-3.0 model is designed to resemble the original BGP AS graph. For example, networks generated by the model typically have 27% fewer links than the Extended BGP AS graph. The BA model and the Fitness BA model generate strict power-law degree distributions, which are very different from that of the AS-level Internet. Moreover, they do not show the rich-club phenomenon of the AS graph at all. This is due to the growth dynamics of the models. In both models, new links are brought into the system only by the addition of new nodes, and new nodes are preferentially connected to high-degree nodes. Thus inter-rich links can only appear when some new nodes grow into rich nodes. However, due to the preferential attachment, the probability for a new node to become a rich node decreases as the network grows. As a result, rich nodes are not well interconnected with each other. This suggests a simple modification to these models to generate a rich-club: as the network grows, new internal links appear which are preferentially attached between the existing nodes. An example is the Interactive Growth model, which will be introduced in Chapter 5.
The above analysis of the three network models demonstrates that the rich-club connectivity is useful in revealing structural details of complex networks and provides a new perspective for analysing the growth mechanisms of evolving network models. In the following chapters, the rich-club connectivity is used both as a new criterion for validating network structures and as a practical guideline for proposing new models.
4.4
Summary
The rich-club phenomenon describes the hierarchical structure of the AS-level Internet where high-degree nodes are tightly interconnected with each other. This structural property is quantitatively characterised by the rich-club connectivity and the node-node link distribution. The calculation of the two metrics is simple and solely based on graph connectivity information. The rich-club connectivity is a critical complement to the existing topology parameters to explicitly and thoroughly characterise large-scale complex network structures and it provides a new criterion for network models.
Chapter 5
Interactive Growth Model
5.1
Introduction
Chapter 4 shows that the rich-club connectivity quantitatively characterises the hierarchical structure of the AS-level Internet and that a number of degree-based Internet models do not reproduce the rich-club connectivity of the actual network. This chapter introduces the Interactive Growth (IG) model [109, 110], which uses a growth mechanism based on observations of the Internet history data. The model is validated against an Extended BGP AS graph and is also compared with a number of other Internet models. Results show that the IG model compares favourably with other models because it closely resembles both the power-law degree distribution and the rich-club connectivity of the AS-level Internet. The chapter also discusses the reasons for the topological differences between the network models. The IG model, as an example of networks containing a rich-club, will be used in the next chapter to investigate the impact of network structure on network behaviour. The IG model is also the precursor of the Positive Feedback Preference (PFP) model, which will be introduced in Chapter 8.
5.2
Interactive Growth Model
The Interactive Growth (IG) model modifies the Barabási-Albert (BA) model (see Section 3.3.5 on page 39) by using a so-called interactive growth mechanism, which is based on a number of dynamic behaviours observed [74, 66, 67, 79] in the Internet history data. Firstly, there are two main operations that account for the evolution of the Internet graph: the addition of new nodes and the appearance of new internal links between already existing nodes (old nodes). Secondly, the majority of new nodes are added to the system by attaching them to only one or two old nodes. Thirdly, the degree distribution of the AS-level Internet is not a strict power-law; for example, it has more nodes with degree two than nodes with degree one (P(k = 2) > P(k = 1)). Lastly, the majority of nodes (those with low degrees) in the AS-level Internet exhibit a linear preferential attachment as described in the BA model (see Equation 3.2 on page 39).
Figure 5.1: The interactive growth mechanism of the IG model. a) A new node is attached to one old node and at the same time-step two new internal links appear. b) A new node is attached to two old nodes and one new internal link appears.
The interactive growth mechanism is shown in Figure 5.1. The IG model starts with a small random network. At each time-step,
• with probability p ∈ [0, 1] (see Figure 5.1-a), a new node is attached to one old node (host node), and at the same time two new internal links appear connecting the host node to two other old nodes (peer nodes),
• with probability 1 − p (see Figure 5.1-b), a new node is attached to two host nodes and one new internal link appears connecting one of the two host nodes to one peer node.
The linear preference probability given by the BA model is used for the attachment of new nodes and the appearance of new internal links. From numerical simulations, the author found that p = 0.4 produces the best fit to the degree distribution and the rich-club connectivity of the AS-level Internet. The interactive growth mechanism satisfies all the above observations on the Internet evolution. Since the two growth operations are interdependent, at each time-step the number of nodes of the network increases by one and the number of links increases by three. Therefore the model produces a ratio of links to nodes (L/N ≃ 3) similar to that of the AS-level Internet.
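The two growth operations above can be sketched as a short Python generator. This is a simplified illustration under several assumptions that are not specified in the text: the seed network is a 5-node ring rather than a small random network, the host is restricted to nodes that still have enough non-neighbours so that every time-step can add exactly three new links, and in the second operation the internal link always starts from the first of the two hosts. The function name `ig_model` is hypothetical. Each time-step scans all nodes, so the sketch runs in quadratic time.

```python
import random


def ig_model(n_nodes, p=0.4, seed=None):
    """Grow a network with the interactive growth mechanism (sketch).

    Returns an adjacency dict {node: set(neighbours)}.
    """
    if n_nodes < 5:
        raise ValueError("n_nodes must be at least 5 (size of the seed ring)")
    rng = random.Random(seed)

    # Seed network: a 5-node ring (an assumption of this sketch).
    adj = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}

    def add_link(u, v):
        adj[u].add(v)
        adj[v].add(u)

    def pick(candidates):
        # Linear preference: probability proportional to current degree.
        weights = [len(adj[c]) for c in candidates]
        return rng.choices(candidates, weights=weights, k=1)[0]

    for new in range(5, n_nodes):
        old = list(adj)
        # Hosts that still have at least two old non-neighbours,
        # so that the internal links of this time-step are always new.
        eligible = [u for u in old if len(old) - 1 - len(adj[u]) >= 2]
        host = pick(eligible)
        if rng.random() < p:
            hosts, n_internal = [host], 2      # one host, two internal links
        else:
            other = pick([u for u in old if u != host])
            hosts, n_internal = [host, other], 1  # two hosts, one internal link

        adj[new] = set()
        for h in hosts:
            add_link(new, h)
        for _ in range(n_internal):
            peers = [v for v in old if v not in hosts and v not in adj[host]]
            add_link(host, pick(peers))
    return adj
```

For example, `ig_model(11461, p=0.4, seed=1)` would grow a network of the same size as the AS graph, with three links added per new node.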
5.3
Model Validation
The IG model is compared against an Extended BGP AS graph measured in May 2001. The IG model is also compared with synthetic networks generated by other Internet models, such as the BA model, the Inet-3.0 model (see Section 3.3.4 on page 39) and the GLP model (see Section 3.3.10 on page 44). For each model, ten networks are generated with different seed numbers and all results are the average over the ten networks. As shown in Table 5.1, all the model networks have the same number of nodes and similar numbers of links as the AS graph. The GLP(1) network is generated using parameters of ρ = 0.66, m = 1, m0 = 10 and β = 0.6447, as recommended by Bu et al [44]. The GLP(2) network uses the same parameters except β = 0, which makes the GLP model’s generalised linear preference of Equation 3.9 (on page 45) equivalent to the linear preference of the Equation 3.2 (on page 39) used by the BA model and the IG model.
Table 5.1: Network properties
| | N | L | γ | kmax | ⟨k⟩ | P(k = 1) | P(k = 2) | P(k = 3) |
| AS graph | 11461 | 32730 | 2.22 | 2432 | 5.7 | 28.9% | 40.3% | 11.6% |
| IG model | 11461 | 34363 | 2.22 | 842 | 6.0 | 26.0% | 33.8% | 10.5% |
| GLP(1) | 11461 | 34363 | 2.20 | 517 | 6.0 | 68.4% | 11.3% | 5.1% |
| GLP(2) | 11461 | 34363 | 2.20 | 524 | 6.0 | 52.0% | 16.3% | 7.9% |
| Inet-3.0 | 11461 | 24171 | 2.22 | 2010 | 4.2 | 40.0% | 36.7% | 8.2% |
| BA model | 11461 | 34363 | 3.0 | 329 | 6.0 | 0% | 0% | 40.0% |
N: number of nodes. L: number of links. γ: power-law exponent. kmax: maximum degree. ⟨k⟩: average degree. P(k): degree distribution, percentage of nodes with degree k.
5.3.1
Degree Distribution
5.3.1.1
Degree Distribution
Figure 5.2: Degree distribution P(k) against degree k on a log-log scale for the AS graph, the IG model, GLP(1), GLP(2), the Inet model and the BA model. For each model, ten networks are generated and averaged.
Figure 5.2 and Table 5.1 show that the IG model and the Inet-3.0 model closely match the degree distribution of the AS graph, particularly the low-range degree distribution, where the percentage of nodes with degree one, P(1), is actually smaller than the percentage of nodes with degree two, P(2). The low-range degree distribution is important because nodes with degree one and two account for more than 70% of the total number of nodes in the AS graph. The IG model is a dynamic growing model and it is the growth mechanism that defines the model’s topological properties, including the degree distribution. The Inet-3.0 model matches the AS graph’s degree distribution well because this static model is designed to resemble the Internet measurements: links are attached to nodes according to pre-assigned node degrees. The BA model is based solely on the addition of new nodes. In order to obtain a ratio of links to nodes similar to that of the AS-level Internet, each new node in the BA model is attached to three old nodes (m = 3) and therefore P(1) = P(2) = 0. Bu et al recommend the parameter m = 1 for the GLP model, thus each new node is attached to only one old node. As a result, the percentage of nodes with degree one in the two GLP networks is significantly larger than that of the actual network (see Table 5.1). For example, P(1) of GLP(1) is as high as 68.4%, which is more than twice that of the AS graph.
5.3.1.2
Degree vs Rank
Figure 5.3: Degree k as a function of rank r on a log-log scale for the AS graph, the IG model, GLP(1), GLP(2), the Inet model and the BA model.
Figure 5.3 shows degree k as a function of rank r on a log-log scale. The AS graph has a nearly strict power-law relationship between degree and rank, $k \sim r^{-0.85}$. The curves of the two GLP networks are not power-laws. The BA model exhibits a power-law behaviour between degree and rank, but the power-law exponent is significantly different from that of the AS graph. The curve of the Inet-3.0 network deviates from the AS graph between $k = 10^1$ and $k = 10^3$. Apart from a few richest nodes ($r \le 10^1$), the IG model in general matches the correlation between degree and rank of the AS graph well.
5.3.2
Rich-club Phenomenon
Networks generated using the IG model and the GLP model should exhibit a higher rich-club connectivity than the BA model, because new internal links added in the IG model and the GLP model preferentially connect already well connected nodes.
5.3.2.1
Rich-Club Connectivity
Figure 5.4: Rich-club connectivity, φ(r/N), as a function of normalised rank, r/N, on a log-log scale for the AS graph, the IG model, GLP(1), GLP(2), the Inet model and the BA model.
Figure 5.4 shows the rich-club connectivity φ(r/N) as a function of normalised rank r/N on a log-log scale. The plot shows that only the IG model closely matches the rich-club connectivity of the AS graph. The rich-club connectivities of the Inet-3.0 model and the BA model are significantly lower than that of the AS graph. It is interesting to notice that the rich-club connectivities of the two GLP networks are higher than that of the AS graph. This means the rich nodes in these two models are even more tightly interconnected with each other than in the AS graph. For example, the AS graph and the IG model have φ(0.01) = 32%, compared with φ(0.01) = 72% for GLP(1) and φ(0.01) = 50% for GLP(2).
5.3.2.2
Node-Node Link Distribution
Figure 5.5: Node-node link distribution l(ri, rj), normalised by L, the total number of links, against rank bins ri and rj (5% to 100%). (a) AS graph. (b) IG model.
Figure 5.5 shows that the IG model closely resembles the node-node link distribution of the Extended BGP AS graph.
Table 5.2: Node-node link distribution
| | AS graph | IG | GLP(1) | GLP(2) | Inet | BA |
| Number of links, L | 32730 | 34363 | 34363 | 34363 | 24171 | 34363 |
| Σ_rj l(ri ≤ 5%, rj) | 29602 | 26422 | 32376 | 29073 | 22620 | 15687 |
| l(ri ≤ 5%, rj ≤ 5%) | 8919 | 7806 | 16210 | 11540 | 3697 | 1511 |
Σ_rj l(ri ≤ 5%, rj) is the number of links connecting to the top 5% rich nodes; l(ri ≤ 5%, rj ≤ 5%) is the number of links connecting among the top 5% rich nodes.
In order to compare all the networks together, Figure 5.6 shows a simplified version of the node-node link distribution, l(ri ≤ 5%, rj), which has only one variable, rj, and illustrates where the top 5% rich nodes (ri ≤ 5%) are connected
j)
5% ,r l(r
i
1, j kj
(8.2)
which favours high-degree nodes. A numerical experiment (called the Test* model) using Equation (8.2) instead of Equation (8.1) in the IG model showed that, when α = 1.15 ± 0.01, this nonlinear preferential growth creates a network with a maximum degree kmax similar to the AS graph.
[Figure 8.2 plot: rich-club connectivity versus normalized rank (r/N), log-log axes; curves: AS graph, PFP model, IG model, BA model, Test* model; the point r/N = 0.01 is marked.]
Figure 8.2: Rich-club connectivity φ(r/N ) vs normalized rank r/N .
However, as shown in Figure 8.2, the rich-club connectivity produced by the Test* model deviates from that of the AS graph. For example, the 1% best connected nodes of the Test* model have 42% of the allowable interconnections, compared with 27% for the AS graph.
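The "allowable interconnections" figure can be computed directly from an edge list. The sketch below assumes that φ for the r best-connected nodes is the number of links among them divided by the maximum possible r(r − 1)/2, and that ties in degree are broken by node label; the function name and the tie-breaking rule are illustrative assumptions, not taken from the thesis.

```python
def rich_club_connectivity(edges, r):
    """phi for the r best-connected nodes: links among them / (r(r-1)/2).

    `edges` is a list of (u, v) pairs of a simple undirected graph; r >= 2.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Rank by decreasing degree; ties broken by node label (an assumption).
    ranked = sorted(degree, key=lambda n: (-degree[n], n))
    club = set(ranked[:r])
    links_inside = sum(1 for u, v in edges if u in club and v in club)
    return links_inside / (r * (r - 1) / 2)


if __name__ == "__main__":
    # A triangle of hubs, each with one extra leaf.
    toy = [(0, 1), (1, 2), (0, 2), (0, 3), (1, 4), (2, 5)]
    print(rich_club_connectivity(toy, 3))
```

For a measured graph, r would be set to about 1% of N to obtain the quantity quoted in the text.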
8.3 The Positive-Feedback Preference Model
Based on the Internet history data, Pastor-Satorras et al [66] and Vázquez et al [67, 124] measured that the probability that a new node links with a low-degree old node indeed follows the linear preferential attachment given by Equation (8.1), whereas Chen et al [74] reported that high-degree nodes have a stronger ability to acquire new links than predicted by Equation (8.1). The Internet history data also show that at early times node degree increases very slowly; later on, node degree grows more and more rapidly. Taking these observations into account, we modified the IG model by using the nonlinear preferential attachment

\[ \Pi(i) = \frac{k_i^{1+\delta \log_{10} k_i}}{\sum_j k_j^{1+\delta \log_{10} k_j}}, \qquad \delta \in [0, 1]. \tag{8.3} \]
Equation (8.3) is used for both the attachment of new nodes and the appearance of new internal links. We call this the Positive-Feedback Preference (PFP) model [125, 126]. From numerical simulations, we found that δ = 0.048 produces the best result. It is interesting to notice that, for δ = 0.048 and kmax = 2839, the exponent 1 + δ log10 kmax ≈ 1.166 is close to the value of α used in the Test* model to reproduce the AS graph’s maximum degree.

The PFP model also modifies the IG model’s interactive growth mechanism. The PFP model starts with a small random network. At each time-step,
• with probability p ∈ [0, 1], a new node is attached to one old node; at the same time, with probability q ∈ [0, 1] one new internal link appears between old nodes, and with probability 1 − q two new internal links appear;
• with probability 1 − p, a new node is attached to two old nodes; at the same time, with probability q one new internal link appears, and with probability 1 − q two new internal links appear.

When p = 0.4 and q = 0.9, the generated PFP networks have the same ratio of links over nodes as the AS graph (see Table 8.1). Each time-step then adds on average (0.4 × 1 + 0.6 × 2) + (0.9 × 1 + 0.1 × 2) = 2.7 links per new node, which agrees with L/N = 30054/11122 ≈ 2.70 for the AS graph.
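The growth rule above can be sketched in a few lines. This is an illustrative sketch only, not the thesis implementation: the 3-node path used as the seed network, the choice to start each internal link at one of the host nodes, and all function names are assumptions (the text says only "a small random network" and "between old nodes"). An internal link is skipped when the chosen host is already linked to every other node.

```python
import math
import random


def pfp_weight(k, delta=0.048):
    """Unnormalised preference of Equation (8.3): k ** (1 + delta * log10(k))."""
    return k ** (1.0 + delta * math.log10(k))


def pick_by_preference(degree, exclude, rng, delta=0.048):
    """Pick one node outside `exclude` with probability proportional to pfp_weight."""
    candidates = [n for n in degree if n not in exclude]
    weights = [pfp_weight(degree[n], delta) for n in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def grow_pfp(n_nodes, p=0.4, q=0.9, delta=0.048, seed=0):
    """Grow a network to n_nodes nodes; returns {node: set of neighbours}."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0, 2}, 2: {1}}  # assumed seed network: a 3-node path
    for new in range(3, n_nodes):
        degree = {n: len(nbrs) for n, nbrs in adj.items()}
        n_hosts = 1 if rng.random() < p else 2      # hosts for the new node
        n_internal = 1 if rng.random() < q else 2   # new internal links
        hosts = []
        for _ in range(n_hosts):
            hosts.append(pick_by_preference(degree, set(hosts), rng, delta))
        adj[new] = set()
        for host in hosts:
            adj[new].add(host)
            adj[host].add(new)
        for _ in range(n_internal):
            host = rng.choice(hosts)  # assumption: internal links start at a host
            blocked = adj[host] | {host, new}
            if len(blocked) >= len(adj):
                continue  # host is already linked to every other node
            degree = {n: len(nbrs) for n, nbrs in adj.items()}
            peer = pick_by_preference(degree, blocked, rng, delta)
            adj[host].add(peer)
            adj[peer].add(host)
    return adj


if __name__ == "__main__":
    net = grow_pfp(2000)
    links = sum(len(nbrs) for nbrs in net.values()) // 2
    print(len(net), links, max(len(nbrs) for nbrs in net.values()))
```

Each preference draw scans all nodes, so the sketch runs in quadratic time; it is meant to show the mechanism rather than to generate 11122-node networks quickly.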
[Figure 8.3 plot: degree functions versus degree, log-log axes from 10^0 to 10^3; curves: k, k^α, k^(1+δ log10 k).]
Figure 8.3: Three degree functions: k, k^α with α = 1.15 and k^(1+δ log10 k) with δ = 0.048.
[Figure 8.4 plot: degree (10^0 to 10^3) versus age in time-steps (10^0 to 10^4), log-log axes; curves: PFP model, IG model, BA model.]
Figure 8.4: Degree growth of a node added in an early time-step.
The PFP model satisfies the observations of Pastor-Satorras et al, Vázquez et al and Chen et al. For low-degree nodes, the attachment preference is approximated by the linear preference of Equation (8.1). For high-degree nodes, the attachment preference increases as a nonlinear function of the node degree (see Figure 8.3). As a result, as time passes, the rate of degree growth in the PFP model becomes faster than in the IG model and the BA model (see Figure 8.4).
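The two regimes can be seen by factoring the linear term out of the PFP degree function. The short worked calculation below uses only δ = 0.048 from the text; the sample degrees k = 2, 10, 100, 1000 are chosen here for illustration.

```latex
\begin{align*}
k^{1+\delta\log_{10}k} &= k \cdot k^{\delta\log_{10}k}
                        = k \cdot 10^{\delta(\log_{10}k)^{2}}, \\
\left.\frac{k^{1+\delta\log_{10}k}}{k}\right|_{k=2}
  &= 10^{0.048 \times 0.0906} \approx 1.01, \\
\left.\frac{k^{1+\delta\log_{10}k}}{k}\right|_{k=10}
  &= 10^{0.048} \approx 1.12, \\
\left.\frac{k^{1+\delta\log_{10}k}}{k}\right|_{k=100}
  &= 10^{0.192} \approx 1.56, \\
\left.\frac{k^{1+\delta\log_{10}k}}{k}\right|_{k=1000}
  &= 10^{0.432} \approx 2.70.
\end{align*}
```

A node of degree 2 is therefore preferred almost exactly as under the linear rule, while a node of degree 1000 carries about 2.7 times its linear weight.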
8.4 Model Validation
In this chapter, the analysis is based on the Traceroute AS graph measured in April 2002 by CAIDA [28, 113]. The AS graph is compared with networks generated by the PFP model, the IG model and the BA model. For each model, ten networks are generated with different seed numbers, and all results are averages over the ten networks. The networks have the same number of nodes and similar numbers of links as the AS graph (see Table 8.1).

Table 8.1: Network Parameters

                                       AS graph   PFP model      IG      BA
Number of nodes N                         11122       11122   11122   11122
Number of links L                         30054       30151   33349   33349
Power-law exponent γ                       2.22        2.22    2.22       3
Degree distribution P(k = 1)                26%         28%     26%      0%
Degree distribution P(k = 2)                38%         36%     34%      0%
Degree distribution P(k = 3)                14%         12%     11%     40%
Average degree ⟨k⟩                          5.4         5.4     6.0     6.0
Max. degree kmax                           2839        2686     700     292
Rich-club connectivity φ(r/N = 0.01)        27%         30%     32%    4.5%
Avg. triangle coef. ⟨kt⟩                   12.7          12    10.4     0.1
Max. triangle coef. kt−max                 7482        8611    4123      64
Avg. quadrangle coef. ⟨kq⟩                  277         247   105.4     1.3
Max. quadrangle coef. kq−max               9648        9431    8780     527
Charact. path length l*                    3.13        3.14     3.6     4.3
Average knn ⟨knn⟩                           660         482     103      20
Avg. betweenness ⟨CB*⟩                     4.13        4.14     4.6     5.3
Max. betweenness CB*max                    3237        3419    1002    1064

8.4.1 Degree Distribution, Rich-Club Connectivity and Maximum Degree
The PFP model closely matches the degree distribution (see Figures 8.5 and 8.6), the rich-club connectivity (see Figure 8.2) and the maximum degree (see Table 8.1) of the AS graph. The PFP model also has the same power-law relationship between degree and rank, k ∼ r^(−0.85), as the AS graph (see Figure 8.1).
[Figure 8.5 plot: degree distribution (10^-4 to 10^0) versus degree (10^0 to 10^3), log-log axes; curves: AS graph, PFP model, IG model, BA model; reference slope −2.22.]
Figure 8.5: Degree distribution.
[Figure 8.6 plot: cumulative degree distribution (10^-4 to 10^0) versus degree (10^0 to 10^3), log-log axes; curves: AS graph, PFP model, IG model, BA model; reference slope −1.22.]
Figure 8.6: The cumulative degree distribution.
In a certain respect, the accuracy with which the PFP model reproduces these properties is not a surprise. After all, the model was designed to match these properties.
8.4.2 Short Cycles

[Figure 8.7 plot: cumulative distribution (10^-4 to 10^0) versus triangle coefficient (10^0 to 10^4), log-log axes; curves: AS graph, PFP model, IG model, BA model.]
Figure 8.7: Cumulative distribution of triangle coefficient.
[Figure 8.8 plot: cumulative distribution (10^-4 to 10^0) versus quadrangle coefficient (10^0 to 10^4), log-log axes; curves: AS graph, PFP model, IG model, BA model.]
Figure 8.8: Cumulative distribution of quadrangle coefficient.
Figures 8.7 and 8.8 show that the AS graph and the PFP model have similar cumulative distributions of short cycles.
[Figure 8.9 plot: triangle coefficient (10^-2 to 10^4) versus degree (10^0 to 10^3), log-log axes; curves: AS graph, PFP model, IG model, BA model.]
Figure 8.9: Correlation between triangle coefficient kt and degree, where kt is the average over nodes with the same degree.
[Figure 8.10 plot: quadrangle coefficient (10^-1 to 10^4) versus degree (10^0 to 10^3), log-log axes; curves: AS graph, PFP model, IG model, BA model.]
Figure 8.10: Correlation between quadrangle coefficient kq and degree, where kq is the average over nodes with the same degree.
Figures 8.9 and 8.10 show that the AS graph and the PFP networks also exhibit similar correlations between short cycles and degree. The AS graph and the PFP model have higher densities of short cycles (see ⟨kt⟩ and ⟨kq⟩ in Table 8.1) than the IG model and the BA model, and therefore exhibit higher degrees of network routing flexibility.
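As an illustration of how a short-cycle count per node can be obtained, the sketch below assumes that the triangle coefficient kt of a node is the number of triangles the node belongs to; that definition and the function name are assumptions made here, since this section does not restate the definition.

```python
def triangle_coefficients(adj):
    """Map each node to the number of triangles it belongs to.

    `adj` maps every node to the set of its neighbours (simple undirected graph).
    """
    kt = {}
    for node, nbrs in adj.items():
        # Each triangle through `node` is a linked pair of its neighbours;
        # the u < v test counts every such pair once.
        kt[node] = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
    return kt


if __name__ == "__main__":
    # One triangle (0, 1, 2) with a pendant node 3 attached to node 0.
    toy = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
    print(triangle_coefficients(toy))
```

Averaging the returned values over all nodes gives a quantity comparable in spirit to ⟨kt⟩, subject to the assumed definition.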
8.4.3 Disassortative Mixing

[Figure 8.11 plot: cumulative distribution (0 to 1) versus nearest-neighbours average degree (10^0 to 10^4), log x-axis; curves: AS graph, PFP model, IG model, BA model.]
Figure 8.11: Cumulative distribution of nearest-neighbours average degree.
[Figure 8.12 plot: nearest-neighbours average degree (10^0 to 10^3) versus degree (10^0 to 10^3), log-log axes; curves: AS graph, PFP model, IG model, BA model.]
Figure 8.12: Correlations between nearest-neighbours average degree knn and degree, where knn is the average over nodes with the same degree.
The AS graph and the PFP model have similar cumulative distributions of the nearest-neighbours average degree (see Figure 8.11). Figure 8.12 shows that the AS graph and the PFP networks exhibit similar negative correlations between the nearest-neighbours average degree and degree, and therefore show similar disassortative mixing behaviours.
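The correlation plotted in Figure 8.12 can be sketched as follows: for every node, average the degrees of its neighbours, then average that value over all nodes of the same degree. The function name and the dictionary-of-sets graph representation are illustrative assumptions.

```python
from collections import defaultdict


def knn_by_degree(adj):
    """Return {degree k: nearest-neighbours average degree over nodes of degree k}."""
    knn_sum = defaultdict(float)
    count = defaultdict(int)
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k == 0:
            continue  # isolated nodes have no neighbours to average over
        knn_sum[k] += sum(len(adj[n]) for n in nbrs) / k
        count[k] += 1
    return {k: knn_sum[k] / count[k] for k in sorted(count)}


if __name__ == "__main__":
    # A star: the hub's neighbours have degree 1, the leaves' neighbour has degree 4.
    star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
    print(knn_by_degree(star))
```

A curve that decreases with k, as for the star above, is the signature of disassortative mixing.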
8.4.4 Shortest Path Length

[Figure 8.13 plot: cumulative distribution versus shortest path length (2.0 to 5.0); curves: AS graph, PFP model, IG model, BA model.]
Figure 8.13: Cumulative distribution of shortest path length.
[Figure 8.14 plot: shortest path length (1.5 to 4.5) versus degree (10^0 to 10^3), log x-axis; curves: AS graph, PFP model, IG model, BA model.]
Figure 8.14: Correlation between shortest path length l and degree, where l is the average over nodes with the same degree.
Figures 8.13 and 8.14 show that the PFP model accurately reproduces the cumulative distribution of shortest path length and the correlation between shortest path length and degree of the AS graph. Table 8.1 shows that the AS graph and the PFP model have nearly the same characteristic path length, which is significantly shorter than that of the IG model and the BA model.
The PFP model accurately reproduces the routing efficiency properties (shortest path length and characteristic path length) of the AS graph because the model correctly resembles both the rich-club connectivity and the disassortative mixing of the AS graph. The rich-club consists of highly connected nodes, which are well interconnected with each other, and the average hop distance among the club members is very small (1 to 2 hops). The rich-club is a “super” traffic hub of the network, and the disassortative mixing property ensures that peripheral nodes are always near the hub. These two structural properties together contribute to the routing efficiency of a network. In contrast, the BA model does not reproduce the two structural properties and therefore underestimates the actual network’s routing efficiency.
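The hub argument can be checked on small graphs with a breadth-first search. The sketch below assumes that the characteristic path length is the mean hop distance over all connected node pairs of an unweighted graph; the function names are illustrative.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def characteristic_path_length(adj):
    """Mean shortest path length over all ordered pairs of connected nodes."""
    total = 0
    pairs = 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs


if __name__ == "__main__":
    star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
    chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
    print(characteristic_path_length(star), characteristic_path_length(chain))
```

With the same number of nodes and links, the star (every peripheral node one hop from the hub) has a shorter characteristic path length than the chain.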
8.4.5 Betweenness Centrality

[Figure 8.15 plot: cumulative distribution (10^-4 to 10^0) versus betweenness centrality (10^0 to 10^4), log-log axes; curves: AS graph, PFP model, IG model, BA model; reference slope −1.1.]
Figure 8.15: Cumulative distribution of betweenness centrality, Pcum(CB*).
[Figure 8.16 plot: betweenness (10^0 to 10^3) versus degree (10^0 to 10^3), log-log axes; curves: AS graph, PFP model, IG model, BA model.]
Figure 8.16: Correlations between betweenness centrality CB* and degree, where CB* is the average over nodes with the same degree.

Figure 8.15 shows that the cumulative distributions of betweenness centrality Pcum(CB*) of the four networks exhibit similar power-law behaviours characterised by slope −1.1. However, as shown in Table 8.1, the maximum values of the betweenness centrality CB*max of the AS graph and the PFP model are significantly larger than those of the IG model and the BA model. Figure 8.16 also shows that only the PFP model closely matches the correlation between betweenness centrality and degree of the AS graph.
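For readers who want to experiment with betweenness on small graphs, the following is a sketch using Brandes' accumulation scheme on an unweighted graph. It is not the author's method and it does not reproduce the normalisation behind CB*: it returns the plain count of shortest paths between other node pairs that pass through each node (split fractionally when several shortest paths exist). The function name is an assumption.

```python
from collections import deque


def betweenness(adj):
    """Unnormalised shortest-path betweenness of every node (endpoints excluded)."""
    cb = {v: 0.0 for v in adj}
    for s in adj:
        dist = {s: 0}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in adj}  # predecessors on shortest paths from s
        order = []                    # nodes in order of non-decreasing distance
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):     # accumulate dependencies back towards s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    # Every unordered pair was visited from both of its endpoints.
    return {v: c / 2 for v, c in cb.items()}


if __name__ == "__main__":
    star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
    print(betweenness(star))
```

In the star, all six leaf-to-leaf shortest paths pass through the hub, which illustrates why the best-connected nodes dominate the maximum betweenness.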
8.5 Discussion

8.5.1 The Positive-Feedback Preferential Attachment
The positive-feedback preferential attachment means that, as a node acquires new links, the node’s relative advantage in competing for more new links increases through a non-linear feedback loop. This implies that the inequality in link-acquiring ability between rich nodes and non-rich nodes widens as the network evolves. Rich nodes not only become richer; they become disproportionately richer.
8.5.2 Critical Assessment of The PFP Model
The PFP model accurately reproduces the AS-level Internet topology. Compared with other existing Internet models, the PFP model has a number of advantages.
• Firstly, the model closely matches all the topological properties that are widely studied by the network research community, including degree distribution, rich-club connectivity, the maximum degree, shortest path length, short cycles, disassortative mixing and betweenness centrality.
• Secondly, the model reproduces these properties with remarkable accuracy.
• Thirdly, the two growth mechanisms used in the model, namely the interactive growth and the positive-feedback preference, are based on (and supported by) the observations on the Internet history data.
• Finally, the validation of the model was conducted with the traceroute-derived AS graph, which is regarded as more realistic than measurements based on the BGP tables (see Chapter 7).

While the initial motivation was to create a model that can accurately reproduce the rich-club connectivity and the maximum degree of the AS graph, the PFP model actually captures all other topological properties as well. This suggests that the Internet structure can be described by only three topological properties: the degree distribution, the rich-club connectivity and the maximum degree.

[Figure 8.17 plots, four log-log panels: (a) P(k), % versus k; (b) k versus r; (c) φ, % versus r/N, %; (d) knn versus k; curves for N = 1K, 3K and 11K.]
Figure 8.17: Network properties of a growing PFP model with the number of nodes N = 1000 (1K), 3000 (3K) and 11122 (11K). (a) Degree distribution. (b) Degree vs rank. (c) Rich-club connectivity. (d) Nearest-neighbours average degree vs degree.
The PFP model is a phenomenological model. Further studies are needed to explain why the Internet growth seems to follow the non-linear preferential attachment given by the PFP model and what are the consequences of the PFP growth mechanism for the future of the Internet. Figure 8.17 shows a number of network properties of a growing PFP model with different numbers of nodes. It would be interesting to investigate whether the PFP model also resembles other evolution stages of the Internet topology without customising the model parameters.
8.6 Summary
There are two mechanisms that are necessary for the correct modelling of the Internet topology at the AS level: the interactive growth and a nonlinear preferential growth, where the growth is described by a positive-feedback mechanism. The Positive-Feedback Preference model uses the two mechanisms and accurately reproduces all the topological properties of the AS-level Internet. The PFP model is superior to other Internet models.
Chapter 9 Discussion and Conclusion

9.1 Discussion
Three years ago the research on the Internet topology was still at a preliminary stage. The Internet has a power-law degree distribution. This means the network contains a small number of nodes with very large numbers of links, and the average degree cannot characterise this heterogeneous nature. The discovery of the power-law degree distribution invalidated all previous research on the Internet topology because it was based on random network theories. Many degree-based Internet models have been proposed. However, no model accurately reproduces the full picture of the Internet topology. Some models are not based on real measurement data, and some models even use non-physical growth mechanisms to produce selected network properties that are of the researcher’s own interest.

During his year-long literature survey, the author developed an intuition that the difficulties in modelling the Internet are due to the lack of means to thoroughly describe the complex structure of the Internet. There might be some hidden properties that have not been explicitly characterised by the existing topology parameters. Therefore, the author did not follow the normal way of starting the research by examining and comparing all the existing models, which of course would be a daunting job. Instead the author started his research by searching for the hidden structure in the Internet topology.

Researchers have looked for other topological properties to characterise the Internet topology. For example, by studying the correlation between degree and nearest-neighbours average degree, researchers have reported that the Internet exhibits disassortative mixing behaviour, where high-degree nodes tend to connect to low-degree nodes. However, disassortative mixing does not characterise how high-degree nodes are connected with each other. Preliminary measurement data suggested that the Internet has a large number of links connecting high-degree nodes to each other. The author realised that this is a key property of the Internet hierarchical structure. The author then introduced the concept of the rich-club phenomenon to describe this overlooked structure, i.e. highly connected nodes not only have large numbers of links but are also tightly interconnected with each other. The rich-club phenomenon is quantitatively characterised by the rich-club connectivity and the node-node link distribution.

The metric of the rich-club connectivity is a milestone in parameterising the Internet topology. Using the rich-club connectivity, the author discovered the structural deficiencies of the Internet models and also revealed the structural discrepancies between different Internet measurements. Moreover, the author showed that the rich-club connectivity is relevant to network behaviours such as routing efficiency, redundancy and robustness.

Inspired by the rich-club properties, the author introduced the IG model, which closely resembles both the power-law degree distribution and the rich-club connectivity of the AS-level Internet. The IG model uses the interactive growth mechanism that is abstracted from observations on the Internet history data.
An important contribution of the IG model is that it demonstrates a possible way to capture more structural properties by adopting realistic mechanisms originating from measurements of the Internet evolution.

The author noticed that the IG model still had limitations. For example, the model does not reproduce the maximum degree of the AS graph. The author found that this shortfall could be responsible for not accurately reproducing other topological properties, such as disassortative mixing. In fact it is well known that the Internet features a very large maximum degree, but no model using evolving mechanisms can reproduce this property. The author discovered that, by increasing the preference probability, the modified IG model can reproduce the maximum degree. However, the rich-club connectivity of the generated network then deviates from that of the AS graph.

After painstaking study of the Internet history data and with some inspiration, the author introduced the PFP model. The model modifies the IG model by using the so-called Positive-Feedback Preference, which only favours high-degree nodes. As a result the model accurately reproduces the maximum degree, the degree distribution and the rich-club connectivity at the same time. While the initial motivation was to reproduce three degree-related structural properties, the PFP model accurately captures all other topological properties as well, including properties of short cycles, shortest path length, disassortative mixing and betweenness centrality. The PFP model is doubtlessly the most complete and accurate model to date.

The author is confident in the above results because, as an important methodology that guided the whole research, the author bases the research only on actual measurements of the Internet. The author uses the Internet measurement data to study the network structure and to validate the Internet models. Moreover, the growth mechanisms adopted by the IG model and the PFP model are abstracted from (and supported by) the observations on Internet history data.
9.2 Future Work
The immediate work is to study the phenomenological PFP model to explain why the preferential attachment is given by a non-linear feedback loop and what the consequences of this growth mechanism are for the future of the Internet.

Future research work should take into account the two major challenges of the Internet.
• Due to its rapid growth, the Internet has evolved to such an immense scale that the existing methods are no longer valid for carrying out practical simulations, e.g. to test new routing protocols.
• The Internet is constantly disrupted by traffic congestion, facility failures and malicious attacks. Quality-of-Service (QoS) issues are of growing concern when deploying future network infrastructures.

Considering the above challenges and based on the research achievements presented in this thesis, we propose two possible future directions as follows:
1. Scaling problem [127]. Can the network simulation be simplified by using models with smaller size and less complexity? Are all scales important at all?
2. Cascading effects [128, 129]. Does local disorder cause a cascading disruption of the whole network? How can this be predicted and prevented? How long will it take to recover?
9.3 Conclusion
The Internet topology has been measured at two different levels. By inferring router adjacencies it is possible to measure the Internet Router (IR) level graph. At another level, the graph of the Internet is obtained from the AS routing path information. These two measurements are related but describe the Internet at different levels. The AS level describes the aggregation of the routers and links in a given domain. The two ways to measure the AS Internet are (1) passive measurements obtained from the BGP routing tables and (2) active measurements where a probe traces the routers that an IP packet visits when traversing the network (that is, at the IR level). The AS graph is obtained by mapping the router information obtained by the probe to its AS domain. The active measurements are considered to give a better description of the Internet connectivity because they can collect ephemeral adjacencies not captured by looking only at the BGP tables.

In summary, the AS graph is a heterogeneous network characterised by a power-law degree distribution. The majority of nodes have only a few links, whereas a small number of rich nodes have large numbers of links; in particular, the best connected node has links to nearly a quarter of the nodes in the network. Based on the Internet measurement data, the author concluded that the AS graph exhibits a rich-club phenomenon where the highly connected nodes are tightly interconnected with each other. In fact the top 100 richest nodes form a fully connected mesh.

The existence of a rich-club is critical for the description and understanding of the AS Internet. The rich-club is a “super” traffic hub of the network, and the disassortative mixing property ensures that peripheral nodes are always near the hub. Thus the rich-club structure together with the disassortative mixing explains why the network has a very small characteristic path length. Scale-free models without the rich-club structure may under-estimate the flexibility of the traffic routing in the Internet. Moreover, there is also a counter-intuitive consequence of modelling networks without the rich-club. A network without the rich-club may over-estimate the robustness of the network to a node attack, where the removal of a small percentage of its richest club members can break down the network integrity.

The PFP model demonstrates that the degree distribution, the maximum degree and the rich-club connectivity can be accurately reproduced by using two realistic growth mechanisms based on the Internet history data, namely the interactive growth and the positive-feedback preference. Moreover, when the above three structural properties are closely resembled, all other topological properties of the AS graph are also reproduced at the same time. The PFP model is the most precise and complete Internet topology generator to date. The PFP model not only is a practical model for representative Internet simulation but also provides insights into the fundamental rules that govern the evolution of complex networks.

The above novel contributions represent a profound extension of the state-of-the-art knowledge in the research field of parameterising and modelling the Internet topology.
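The node-attack argument can be explored numerically by removing the best-connected nodes and measuring what remains connected. The sketch below is an illustrative experiment design, not the procedure used in the thesis: the function name, the choice to rank by degree once before removal, and the use of the largest connected component (as a fraction of the original N) as the integrity measure are all assumptions.

```python
def largest_component_after_attack(adj, fraction):
    """Remove the top `fraction` of nodes by degree; return the size of the
    largest remaining connected component divided by the original node count."""
    n_remove = int(round(fraction * len(adj)))
    # Rank once by decreasing degree; ties broken by node label (an assumption).
    ranked = sorted(adj, key=lambda n: (-len(adj[n]), n))
    removed = set(ranked[:n_remove])
    seen = set()
    best = 0
    for start in adj:
        if start in removed or start in seen:
            continue
        # Depth-first search over the surviving nodes.
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in removed and w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best / len(adj)


if __name__ == "__main__":
    star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
    print(largest_component_after_attack(star, 0.0))
    print(largest_component_after_attack(star, 0.2))
```

Running the same function on a measured AS graph and on a model network, for a range of small fractions, would give a direct comparison of how each responds to the loss of its richest nodes.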
Appendix I. QMUL Topology Simulator

The QMUL Topology Simulator provides all the calculation and simulation results presented in this thesis. The motivation for developing the topology simulation tool was that there was no suitable kit available for this research, which involves generating self-designed models and calculating self-defined properties. The simulator was developed by the author himself using Microsoft Visual C++ 6.0 and runs on the MS Windows 2000 operating system. It has the following functions (see Figure 10.1):
• It grows scale-free networks using the BA model series, including the BA model, the Fitness BA model and the Generalised BA model, with various settings of initial status and parameters. It also imports topology data generated by the Inet model of version 2.1 ∼ 3.0.
• It generates Internet-like networks using self-designed models, such as the Interactive Growth model and the Positive-Feedback Preference model.
• It parses and imports the Internet measurement data (AS graphs).
• It calculates all the topological properties used in this thesis, such as clustering coefficient, degree distribution, shortest path length, betweenness, nearest-neighbours average degree, rich-club connectivity, triangle coefficient and quadrangle coefficient.
• It exports topology data into the Pajek [130] file format, which can be used to visualise the network graphs, e.g. Figure 5.7. It also exports plot data files in the Gnuplot [131] format to create scientific plot figures.
• It saves the network connectivity information and all the calculation results of topological properties in the ‘*.topo’ file format, which can be restored for further use.
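To illustrate the kind of export described in the list above, the sketch below writes an undirected graph in the basic Pajek network layout (a `*Vertices` section with 1-based ids and quoted labels, followed by an `*Edges` section). It is a minimal sketch and not the simulator's exporter; the function name and the dictionary-of-sets input are assumptions.

```python
def to_pajek(adj):
    """Return the text of a Pajek network file for an undirected graph.

    `adj` maps each node label to the set of its neighbours' labels.
    """
    nodes = sorted(adj)
    index = {node: i + 1 for i, node in enumerate(nodes)}  # Pajek ids start at 1
    lines = [f"*Vertices {len(nodes)}"]
    lines += [f'{index[n]} "{n}"' for n in nodes]
    lines.append("*Edges")
    for u in nodes:
        for v in sorted(adj[u]):
            if index[u] < index[v]:  # write each undirected link once
                lines.append(f"{index[u]} {index[v]}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    toy = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
    print(to_pajek(toy), end="")
```

The returned string can be written to a file and loaded into Pajek for visualisation.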
[Figure 10.1 flowchart. Inputs: parameter settings; Inet data files produced by the Inet generator (parsed); Internet raw data, i.e. BGP and Traceroute AS graphs (imported); other data (imported). QMUL Topology Simulator (inner data structure): generates an initial status and grows networks using the BA, FBA, GBA, GLP, IG and PFP models; saves and loads ‘*.topo’ topology files; calculates topological properties (degree distribution, rich-club connectivity, shortest path length, triangle coefficient, degree-degree correlations, betweenness, ...). Outputs: standard Pajek network data files, loaded into Pajek for visualised graphs; standard Gnuplot plot data files, loaded into Gnuplot for scientific plot figures. The legend distinguishes the QMUL Topology Simulator from other software and data sources.]
Figure 10.1: Function flowchart of the QMUL Topology Simulator.
The strength of this simulator is that, by using a tight linear data structure to store the topology information, it achieves an optimal balance between calculation speed and the amount of memory required by the process. Running on a Dell desktop computer with merely 256MB RAM and an Intel 1.0GHz CPU, the simulator takes only 30 seconds to generate a BA model network of 11K nodes and 33K links. The author also improved Dijkstra’s algorithm [53] for calculating the shortest path length between every pair of nodes, so that the same process also calculates the betweenness centrality of every node. It takes only about 5 hours to calculate the two properties.

The QMUL Topology Simulator also has the following features:
• Flexible. The simulator supports multiple evolving network models.
• Extensible. The simulator uses an object-oriented architecture, which provides the ability to add new network models and to handle customised file formats.
• Large-scale. The simulator is capable of processing large-scale networks with up to 100K nodes and 4.5M links.
• User Friendly. The simulator provides a Graphical User Interface as shown in Figures 10.2, 10.3 and 10.4.
Figure 10.2: Window of “Parameters for generating networks”.
Figure 10.3: Window of the main interface.
It took four months for the author to design, code and debug the first version of the QMUL Topology Simulator in late 2001. Since then the simulator has been updated and optimised many times in order to fix program bugs, add new functions and improve the calculation speed. The latest version of the program has more than 5000 lines of code and has proved to be an efficient and powerful network simulation tool. The following is a list of functions defined in the C++ class “CQMUL_Topo”.
long Do_UnifyData(long thisManyXdata);
long Do_ReadPlotFile(CString thisPlotName, int thisPlotFormat);
long Do_GenerateGrowOneNonLineal(long thisOneEnd, long thisCreateTime);
long Do_GenerateGetNonLinealPreferential(long thisException, double thisAlpha);
long Do_PlotAverageErrorBar(long thisManyData);
long Do_GenerateGrowOnePreference(long thisOneEnd, long thisCreateTime);
long Do_GenerateGrowOneRandom(long thisOneEnd, long thisCreateTime);
long Do_GenerateGetLinealPreferential();
long Do_GenerateGetLinealPreferentialFBA(double thisTotal);
long Do_ReadBarabasiActor(CString thisFileName);
long Do_GetLinkDistDataRank(long X1, long X2, long Y1, long Y2);
long Do_GetLinkDistDataDegree(long X1, long X2, long Y1, long Y2);
long Do_GetRichClubRankLink(long thisRank);
long Do_PlotPercentage(long thisManyData);
long Do_ArrangeRawData(long thisManyData);
long Do_ReadDataFile(CString thisFileName, int thisFileFormat);
long GetSmallestLabel();
double Do_GetRandom(double theBase);
void Do_GenerateNLP();
void Do_CalculateLength();
void Do_Plot70();
void Do_GetBetweenThis();
void Do_Plot40_K(long thisPlot);
void Do_PlotValueRank(long thisMany);
void Do_Plot10_Rank(long thisPlot);
void Do_CalcalateLocal();
void Do_GenerateDoro();
void Do_GenerateIG();
void Do_GenerateFBA();
void Do_GenerateInitialStatus();
void Do_GenerateBA();
void Do_GenerateRandom();
void Do_Plot63();
void Do_Plot62();
void Do_PlotSortData(long thisMany);
void Do_PlotCumulative(long thisMany);
void Do_GetTopoInfo();
void Do_Plot20_Distribution(long thisPlot);
void Do_Plot60();
void Do_Plot61();
void Do_Plot51();
void Do_Plot00_ID(long thisPlot);
void Do_ArrangeData();
void Do_InitNetwork();
void Do_WritePlotFile(CString thisFileName, long thisLongData, BOOL ifAverageErrorBar, BOOL ifAllLong);
CString Do_GetCString(double thisData);
CString Do_ComposeFileName();
BOOL DoScan(long thisSmallest);
BOOL Do_IfHasLink(long thisStart, long thisEnd);
BOOL Do_IfHasLinkAfterSort(long thisStart, long thisEnd);
BOOL Do_AddNewLink(long thisLinkID, long thisStart, long thisEnd, long thisCreateTime, BOOL thisIfCheck);
BOOL Do_AddNewNode(long thisNodeID, long thisCreateTime);
BOOL Do_CheckData();

Figure 10.4: Window of “Save plot data files”.
Appendix II.
Author’s Publications Journal Papers 1. S. Zhou and R. J. Mondrag´on. The rich-club phenomenon in the Internet topology. IEEE Communications Letters, volume 8, page 180, March 2004. 2. S. Zhou and R. J. Mondrag´on.
Redundancy and robustness of the AS-
level Internet topology and its models. IEE Electronic Letters, volume 40, page 151, January 2004. 3. S. Zhou and R. J. Mondrag´on. Accurately modelling the Internet topology. Accepted by Physical Review E, 2004. 4. M. Woolf, D. K. Arrowsmith, S. Zhou, R. J. Mondrag´on and J. M. Pitts. Dynamical modelling of TCP packet traffic on scale-free networks. Submitted to Physical Review E, 2004.
Conference Papers 5. S. Zhou and R. J. Mondrag´on. - the Interactive Growth model.
Towards modelling the Internet topology In J. Charzinski, editor, Proc. of 18th 127
International Teletraffic Congress (ITC18), volume 5a of Teletraffic Science and Engineering (Elsevier), pages 121–130, Berlin, German, Sept. 2003. 6. S. Zhou and R. J. Mondrag´on.
The missing links in the BGP-based AS
connectivity maps. In Proc. of Passive and Active Measurement Workshop (PAM2003) , pages 219–222, San Diego, USA, April 2003. 7. S. Zhou and R. J. Mondrag´on. Internet topology.
Analyzing and modelling the AS-level
In Prof. of 1st International Working Conference on
Performance Modelling and Evaluation of Heterogeneous Networks (HETNETs’03), Ilkley, West Yorkshire, UK, July 2003. 8. S. Zhou and R. J. Mondrag´on.
Topological properties of the AS-level
Internet. In Proc. of IEEE & IEE International Conference on Telecommunications (ICT2002) , volume 3, pages 497–501, Beijing, China, June 2002. 9. S. Zhou and R. J. Mondrag´on.
Connectivity in the Internet topology. In
Proc. of PGNet2002, pages 157–162, Liverpool, UK, May 2002. 10. S. Zhou and R. J. Mondrag´on. The Positive-Feedback Preference model of the AS-level Internet topology. Submitted to IEEE ICC, 2005. 11. S. Zhou and R. J. Mondragon, Sampling Methodologies and Structural Deficiencies of the AS-level Internet Topology Measurements. Submitted to The International Conference on Information Networking (ICOIN) 2005.
Glossary

AS Autonomous System, a collection of routers operated in a coordinated way so that the routers implement the same routing policy; typically operated by a single administrative entity.
ASN Autonomous System Number, a two-byte number that uniquely identifies an AS.
BGP Border Gateway Protocol, the primary inter-domain routing protocol used in the Internet.
ICMP Internet Control Message Protocol, the diagnostic part of the network layer used in the Internet for reporting status information, checking connectivity, and so on.
IP Internet Protocol, the network layer protocol used by the Internet.
ISPs Internet Service Providers.
LANs Local Area Networks.
MANs Metropolitan Area Networks.
Protocol A standard procedure for regulating data transmission between computers.
Router A computer that typically has two or more interfaces on different networks and provides forwarding of packets between those networks.
Routing The process by which a router calculates a forwarding table by using its knowledge of the network taken from local configurations.
Routing Table A conceptual data structure used to hold routing information.
Server A hardware and software device designed to perform a specific function for many users.
TCP Transmission Control Protocol, the principal reliable transport protocol used in the Internet.
UDP User Datagram Protocol.
WANs Wide Area Networks.
WWW World Wide Web.
Bibliography

[1] L. A. Adamic and B. A. Huberman, “Power-law distribution of the world wide web,” Science, vol. 287, p. 2115, 2000.
[2] S. H. Strogatz, “Exploring complex networks,” Nature (London), vol. 410, p. 268, 2001.
[3] P. L. Krapivsky and S. Redner, “Organization of growing random networks,” Phys. Rev. E, vol. 63, p. 066123, 2001.
[4] L. A. Adamic, R. M. Lukose, A. R. Puniyani, and B. A. Huberman, “Search in power-law networks,” Physical Review E, vol. 64, p. 046135, 2001.
[5] A. L. Barabási, Linked: The New Science of Networks. Perseus Publishing, 2002.
[6] R. Albert and A. L. Barabási, “Statistical mechanics of complex networks,” Rev. Mod. Phys., vol. 74, pp. 47–97, 2002.
[7] S. Bornholdt and H. G. Schuster, Handbook of Graphs and Networks - From the Genome to the Internet. Weinheim Germany: Wiley-VCH, 2002.
[8] S. N. Dorogovtsev and J. F. F. Mendes, Evolution of Networks - From Biological Nets to the Internet and WWW. Oxford University Press, 2003.
[9] A. Vazquez, R. P.-S. M. Boguna, Y. Moreno, and A. Vespignani, “Topology and correlations in structured scale-free networks,” Physical Review E, vol. 67, no. 046111, 2003.
[10] R. Cohen and S. Havlin, “Scale-free networks are ultrasmall,” Physical Review Letters, vol. 90, no. 5, p. 058701, 2003.
[11] R. Pastor-Satorras and A. Vespignani, Evolution and Structure of the Internet - A Statistical Physics Approach. Cambridge University Press, 2004.
[12] S. T. Park, D. Pennock, and C. L. Giles, “Comparing static and dynamic measurements and models of the Internet’s AS topology,” in Proc. of IEEE INFOCOM 2004, 2004.
[13] A. Medina, I. Matta, and J. Byers, “On the origin of power laws in Internet topologies,” ACM SIGCOMM Computer Communication Review, 2000.
[14] K. I. Goh, B. Kahng, and D. Kim, “Fluctuation-driven dynamics of the Internet topology,” Physical Review Letters, 2002.
[15] A. C. Zorach and R. E. Ulanowicz, “Quantifying the complexity of flow networks: how many roles are there?” Complexity, vol. 8, no. 3, 2003.
[16] A. L. Barabási, Z. Deszo, E. Ravasz, S. H. Yook, and Z. Oltvai, “Scale-free and hierarchical structures in complex networks,” to appear in Sitges Proceedings on Complex Networks, 2004.
[17] S. Floyd, “Simulation is crucial,” IEEE Spectrum, January 2001.
[18] G. F. Riley and M. H. Ammar, “Simulating large networks - how big is big enough?” in Proc. of 1st Intl. Conf. on Grand Challenges for Modeling and Simulation, 2002.
[19] V. Paxson and S. Floyd, “Why we don’t know how to simulate the Internet,” in Proc. of the 1997 Winter Simulation Conference, 1997.
[20] S. Floyd and V. Paxson, “Difficulties in simulating the Internet,” IEEE/ACM Transactions on Networking, vol. 9, no. 4, pp. 392–403, August 2001.
[21] S. Floyd and E. Kohler, “Internet research needs better models,” ACM SIGCOMM Computer Communications Reviews, vol. 33, no. 1, pp. 29–34, January 2003.
[22] W. Willinger and V. Paxson, “Where mathematics meets the Internet,” Notices of the American Mathematical Society, vol. 45, no. 8, 1998.
[23] B. Yao, R. Viswanathan, F. Chang, and D. Waddington, “Topology inference in the presence of anonymous routers,” in Proc. IEEE INFOCOM, 2003.
[24] T. Petermann and P. D. L. Rios, “Exploration of scale-free networks – do we measure the real exponents?” Eur. Phys. J., vol. 38, pp. 201–204, 2004.
[25] NLANR (National Laboratory for Applied Network Research), http://moat.nlanr.net/.
[26] Route Views Project, University of Oregon, Eugene. http://www.routeviews.org/.
[27] Routing Information Service, RIPE Network Coordination Center. http://www.ripe.net/.
[28] CAIDA (Cooperative Association For Internet Data Analysis), http://www.caida.org/.
[29] Internet Mapping Project, Lumeta, http://research.lumeta.com/ches/map/.
[30] Topology Project, University of Michigan, Ann Arbor. http://topology.eecs.umich.edu/.
[31] M. Murray and kc claffy, “Measuring the immeasurable: global Internet measurement infrastructure,” in Proc. of PAM2001, 2001.
[32] M. Faloutsos, P. Faloutsos, and C. Faloutsos, “On power-law relationships of the Internet topology,” Comput. Commun. Rev., vol. 29, pp. 251–262, 1999.
[33] P. Erdős and A. Rényi, “On random graphs,” Publ. Math. Debrecen, vol. 6, p. 290, 1959.
[34] P. Erdős and A. Rényi, “On the evolution of random graphs,” Publ. Math. Inst. Hung. Acad. Sci., vol. 5, p. 17, 1960.
[35] B. M. Waxman, “Routing of multipoint connections,” IEEE Journal of Selected Areas in Communications, vol. 6, no. 9, pp. 1617–1622, 1988.
[36] A. Capocci, G. Caldarelli, R. Marchetti, and L. Pietronero, “Growing dynamics of Internet providers,” Physical Review E, vol. 64, no. 035105, 2001.
[37] J. Winick and S. Jamin, “Inet-3.0 Internet topology generator,” University of Michigan, Tech. Rep. UM-CSE-TR-456-02, 2002.
[38] A. L. Barabási and R. Albert, “Emergence of scaling in random networks,” Science, vol. 286, pp. 509–512, 1999.
[39] G. Bianconi and A. L. Barabási, “Competition and multiscaling in evolving networks,” Europhysics Letters, vol. 54, no. 4, pp. 436–442, 2001.
[40] R. Albert and A. L. Barabási, “Topology of evolving networks: local events and universality,” Physical Review Letters, vol. 85, no. 24, pp. 5234–5237, 2000.
[41] A. Medina and I. Matta, “Brite: A flexible generator of Internet topologies,” Boston University, Tech. Rep. BU-CS-TR-2000-005, 2000.
[42] S. N. Dorogovtsev and J. F. F. Mendes, “Scaling behaviour of developing and decaying networks,” EuroPhys. Lett., vol. 52, no. 33, p. 33, 2000.
[43] T. E. D. Vukadinovic, P. Huang, “A spectral analysis of the Internet topology,” Technical report ETH TIK-NR. 118, 2001.
[44] T. Bu and D. Towsley, “On distinguishing between Internet power law topology generators,” in Proc. of IEEE INFOCOM 2002, 2002, p. 638.
[45] G. Caldarelli, P. D. L. Rios, and L. Pietronero, “Generalized network growth: from microscopic strategies to the real Internet properties,” arXiv:cond-mat/0307610 v1, 2004.
[46] J. M. Carlson and J. C. Doyle, “Highly optimized tolerance: A mechanism for power laws in designed systems,” Physical Review E, vol. 60, pp. 1412–1428, 1999.
[47] I. Norros and H. Reittu, “Architectural features of the power-law random graph model of Internet: nodes on soft hierarchy, vulnerability and multicasting,” in Proceedings of the 18th International Teletraffic Congress - ITC 18, Elsevier, 2003.
[48] C. P. B. Quoitin and L. Swinnen, “Interdomain traffic engineering with BGP,” IEEE Communications Magazine, May 2003.
[49] D. K. Arrowsmith and M. Woolf, “Modelling of TCP packet traffic in a large interactive growth network,” IEEE Proc. of Systems and Circuits, 2004.
[50] M. Barenco and D. K. Arrowsmith, “The autocorrelation of double intermittency maps and the simulation of computer packet traffic,” to appear in Jnl of Dyn. Sys, 2004.
[51] C. Labovitz, A. Ahuja, R. Wattenhofer, and S. Venkatachary, “The impact of Internet policy and topology on delayed routing convergence,” in Proc. of INFOCOM 2001, 2001.
[52] R. V. Solé and S. Valverde, “Information theory of complex networks: On evolution and architectural constraints,” Santa Fe Institute, Tech. Rep. DOI: SFI-WP 03-11-061, 2003.
[53] A. Kershenbaum, Telecommunications network design algorithms. McGraw-Hill, Inc., 1993.
[54] M. Steenstrup, Routing in communications networks. Prentice Hall, 1995.
[55] H. Tangmunarunkit, R. Govindan, S. Shenker, and D. Estrin, “The impact of routing policy on Internet paths,” in Proc. of IEEE INFOCOM 2001, 2001.
[56] R. Guerin and A. Orda, “Computing shortest paths for any number of hops,” IEEE/ACM Transactions on Networking, vol. 10, no. 5, October 2002.
[57] K. I. Goh, B. Kahng, and D. Kim, “Universal behavior of load distribution in scale-free networks,” Phys. Rev. Lett., vol. 87, no. 278701, 2001.
[58] K. I. Goh, E. Oh, B. Kahng, and D. Kim, “Betweenness centrality correlation in social networks,” Phys. Rev. E, vol. 67, no. 017101, 2003.
[59] P. Holme and B. J. Kim, “Vertex overload breakdown in evolving networks,” Phys. Rev. E, vol. 65, no. 066109, 2002.
[60] P. Holme, B. J. Kim, C. N. Yoon, and S. K. Han, “Attack vulnerability of complex networks,” Phys. Rev. E, vol. 65, no. 056109, 2002.
[61] D. J. Watts and S. H. Strogatz, “Collective dynamics of ‘small-world’ networks,” Nature, vol. 393, 1998.
[62] M. E. J. Newman, “Assortative mixing in networks,” Phys. Rev. Lett., vol. 89, no. 208701, 2002.
[63] M. E. J. Newman, “Mixing patterns in networks,” Phys. Rev. E, vol. 67, no. 026126, 2003.
[64] S. Maslov, K. Sneppen, and A. Zaliznyak, “Detection of topological patterns in complex networks: correlation profile of the Internet,” Physica A, vol. 333, p. 529, 2004.
[65] R. Xulvi-Brunet, W. Pietsch, and I. M. Sokolov, “Correlations in scale-free networks: Tomography and percolation,” Phys. Rev. E, vol. 68, no. 036119, 2003.
[66] R. Pastor-Satorras, A. Vázquez, and A. Vespignani, “Dynamical and correlation properties of the Internet,” Phys. Rev. Lett., vol. 87, no. 258701, 2001.
[67] A. Vázquez, R. Pastor-Satorras, and A. Vespignani, “Large-scale topological and dynamical properties of Internet,” Phys. Rev. E, vol. 65, no. 066130, 2002.
[68] S. Janson, T. Luczak, and A. Rucinski, Random Graphs. Wiley-Interscience, 2000.
[69] J. Watts, Small Worlds: The Dynamics of Networks between Order and Randomness. New Jersey, USA: Princeton University Press, 1999.
[70] L. Adamic, “The small world web,” in Proceedings of ECDL’99, 1999, pp. 443–452.
[71] M. E. J. Newman and D. J. Watts, “Scaling and percolation in the small-world network model,” Phys. Rev. E, vol. 60, p. 7332, 1999.
[72] M. E. J. Newman and D. J. Watts, “Renormalization group analysis of the small-world network model,” Physics Letters A, vol. 263, pp. 341–346, 1999.
[73] Y. Hyun, A. Broido, and k. claffy, “Traceroute and BGP AS path incongruities,” http://www.caida.org/outreach/papers/2003/ASP/.
[74] Q. Chen, H. Chang, R. Govindan, S. Jamin, S. J. Shenker, and W. Willinger, “The origin of power laws in Internet topologies (revisited),” in Proc. of IEEE INFOCOM 2002, 2002, pp. 608–617.
[75] H. Chang, R. Govindan, S. Jamin, S. J. Shenker, and W. Willinger, “Towards capturing representative AS-level Internet topology,” Computer Networks Journal, vol. 44, no. 6, pp. 737–755, 2004.
[76] R. Govindan and H. Tangmunarunkit, “Heuristics for Internet map discovery,” in Proc. of IEEE INFOCOM 2000, 2000.
[77] R. Albert, H. Jeong, and A. L. Barabási, “Error and attack tolerance of complex networks,” Nature, vol. 406, pp. 378–381, 2000.
[78] L. Subramanian, S. Agarwal, J. Rexford, and R. H. Katz, “Characterizing the Internet hierarchy from multiple vantage points,” in Proc. of IEEE INFOCOM 2002, 2002, pp. 618–627.
[79] S. T. Park, A. Khrabrov, D. M. Pennock, S. Lawrence, C. L. Giles, and L. H. Ungar, “Static and dynamic analysis of the Internet’s susceptibility to faults and attacks,” in Proc. of IEEE INFOCOM 2003, vol. 3, April 2003, pp. 2144–2154.
[80] A. Broido and kc Claffy, “Internet topology: connectivity of IP graphs,” in SPIE International symposium on Convergence of IT and Communication 2001, 2001.
[81] B. Huffaker, D. Plummer, D. Moore, and kc Claffy, “Topology discovery by active probing,” in Proc. of the 2002 Symposium on Applications and the Internet, 2002.
[82] E. N. A. Broido and kc Claffy, “Internet expansion, refinement and churn,” European Transactions on Telecommunications 2002, 2002.
[83] K. L. Calvert, M. B. Doar, and E. W. Zegura, “Modeling Internet topology,” IEEE Communications Magazine, June 1997.
[84] M. Doar, “A better model for generating test networks,” Proc. of IEEE GLOBECOM 1996, Nov. 1996.
[85] E. W. Zegura, K. L. Calvert, and M. J. Donahoo, “A quantitative comparison of graph-based models for Internet topology,” ACM/IEEE Transactions on Networking, vol. 5, no. 6, pp. 770–783, 1997.
[86] C. Jin, Q. Chen, and S. Jamin, “Inet: Internet topology generator,” University of Michigan, Tech. Rep. UM-CSE-TR-433-00, 2000.
[87] A. L. Barabási, “The architecture of complexity: From the diameter of the WWW to the structure of the cell,” http://www.nd.edu/ networks/.
[88] Z. N. H. Jeong and A. L. Barabási, “Measuring preferential attachment in evolving networks,” Europhysics Letters, vol. 61, no. 4, pp. 567–572, 2003.
[89] A. L. Barabási, “The physics of the web,” Physics World, July 2001.
[90] D. Cohen, “All the world is a net,” New Scientist, April 2002.
[91] R. Cohen and S. Havlin, “Scale-free networks are ultrasmall,” Phys. Rev. Lett., vol. 90, no. 5, p. 058701, 2003.
[92] Y. Moreno, R. Pastor-Satorras, A. Vázquez, and A. Vespignani, “Critical load and congestion instabilities in scale-free networks,” Europhys. Lett., vol. 62, p. 292, 2002.
[93] S. H. Yook, H. Jeong, and A. L. Barabási, “Modelling the Internet’s large-scale topology,” Proc. of the Nat’l Academy of Sciences, vol. 99, pp. 13 382–13 386, 2002.
[94] R. Pastor-Satorras and A. Vespignani, “Epidemic spreading in scale-free networks,” Physical Review Letters, vol. 86, no. 14, pp. 3200–3203, 2001.
[95] A. L. Barabási, R. Albert, and H. Jeong, “Mean-field theory for scale-free random networks,” Physica A, vol. 272, pp. 173–187, 1999.
[96] A. Medina, A. Lakhina, I. Matta, and J. Byers, “Brite: Universal topology generation from a user’s perspective,” Boston University, Tech. Rep. BUCS-TR-2001-003, 2001.
[97] G. Bianconi, G. Caldarelli, and A. Capocci, “Number of h-cycles in the Internet at the autonomous system level,” arXiv:cond-mat/0310339, 2003.
[98] A. Fabrikant, E. Koutsoupias, and C. H. Papadimitriou, “Heuristically optimized trade-offs: A new paradigm for power laws in the Internet,” in Proc. of ICALP 2002, 2002.
[99] H. Tangmunarunkit, R. Govindan, S. Jamin, S. Shenker, and W. Willinger, “Network topology generators: Degree-based vs. structural,” Proc. of ACM/SIGCOMM 2002, pp. 147–159, 2002.
[100] H. Tangmunarunkit, J. Doyle, R. Govindan, and S. Jamin, “Does AS size determine degree in AS topology?” ACM SIGCOMM Computer Communication Review, 2001.
[101] D. Krioukov, http://www.krioukov.net/ dima/rs.html.
[102] J. Spencer and L. Sacks, “Modelling IP network topologies by emulating network development processes,” in IEEE Softcom 2002, 2002.
[103] H. Fukś and A. T. Lawniczak, “Performance of data networks with random links,” Mathematics and Computers in Simulation, vol. 51, pp. 103–119, 1999.
[104] L. Gao, “On inferring autonomous system relationships in the Internet,” in Proc. of IEEE Global Internet, 2000.
[105] S. Zhou and R. J. Mondragón, “The rich-club phenomenon in the Internet topology,” IEEE Comm. Lett., vol. 8, no. 3, pp. 180–182, March 2004.
[106] S. Zhou and R. J. Mondragón, “Connectivity in the Internet topology,” in Proc. of PGNet2002. Liverpool, UK: EPSRC, May 2002, pp. 157–162.
[107] S. Zhou and R. J. Mondragón, “Topological properties of the AS-level Internet,” in Proc. of Int. Conf. on Telecommunications (ICT) 2002, vol. 3. Beijing, China: IEEE and IEE, June 2002, pp. 497–501.
[108] S. Zhou and R. J. Mondragón, “Redundancy and robustness of the AS-level Internet topology and its models,” IEE Elec. Lett., vol. 40, no. 2, pp. 151–152, January 2004.
[109] S. Zhou and R. J. Mondragón, “Analyzing and modelling the AS-level Internet topology,” in Proc. of 1st Int. Working Conf. on Performance Modelling and Evaluation of Heterogeneous Networks (HET-NETs’03), Ilkley, West Yorkshire, UK, July 2003, arXiv:cs.NI/0303030.
[110] S. Zhou and R. J. Mondragón, “Towards modelling the Internet topology - the interactive growth model,” in Proc. of 18th Int. Teletraffic Congress (ITC18), ser. Teletraffic Science and Engineering, J. Charzinski, Ed., vol. 5a. Berlin, Germany: Elsevier, Sept. 2003, pp. 121–130.
[111] M. Woolf and D. K. Arrowsmith, “Modelling of TCP packet traffic in a large interactive growth network,” in IEEE Int. Symposium on Circuits and Systems (ISCAS), Vancouver, Canada, May 2004.
[112] M. Woolf, D. K. Arrowsmith, S. Zhou, R. J. Mondragón, and J. M. Pitts, “Dynamical modelling of TCP packet traffic on scale-free networks,” (submitted), 2004.
[113] The Data Kit #0204 was collected as part of CAIDA’s Skitter initiative, http://www.caida.org. Support for Skitter is provided by DARPA, NSF, and CAIDA membership.
[114] P. M. Gleiss, P. F. Stadler, A. Wagner, and D. A. Fell, “Small cycles in small worlds,” SFI Working Paper 00-10-058, 2000.
[115] G. Bianconi and A. Capocci, “Number of loops of size h in growing scale-free networks,” Phys. Rev. Lett., vol. 90, no. 078701, 2003.
[116] R. P.-S. G. Caldarelli and A. Vespignani, “Structure of cycles and local ordering in complex networks,” The European Physical Journal B, vol. 28, no. 2, pp. 183–186, 2004.
[117] M. M. C. Gkantsidis and E. Zegura, “Spectral analysis of Internet topologies,” in Proc. of IEEE INFOCOM 2003, 2003.
[118] G. Iannaccone, C. N. Chuah, R. Mortier, S. Bhattacharyya, and C. Diot, “Analysis of link failures in an IP backbone,” Proc. of the second ACM SIGCOMM Workshop on Internet measurement, 2002.
[119] D. S. Callaway, M. E. J. Newman, S. H. Strogatz, and D. J. Watts, “Network robustness and fragility: Percolation on random graphs,” Physical Review Letters, vol. 85, no. 25, p. 5468, December 2000.
[120] S. L. Tauro, C. Palmer, G. Siganos, and M. Faloutsos, “A simple conceptual model for the Internet topology,” in Proc. of Global Internet, San Antonio, Texas, 2001.
[121] S. Zhou and R. J. Mondragón, “The missing links in the BGP-based AS connectivity maps,” in Proc. of Passive and Active Measurement (PAM) Workshop 2003. San Diego, USA: NLANR, April 2003, pp. 219–222, arXiv:cs.NI/0303028.
[122] S. Zhou and R. J. Mondragón, “On measuring and modeling the Internet topology at the autonomous systems level,” Submitted to ACM/IMC2004, 2004.
[123] P. L. Krapivsky, S. Redner, and F. Leyvraz, “Connectivity of growing random networks,” Phys. Rev. Lett., vol. 85, no. 4629, 2000.
[124] A. V. A. Vazquez, R. Pastor-Satorras, “Internet topology at the router and autonomous system level,” cond-mat/0206084, 2002.
[125] S. Zhou and R. J. Mondragón, “The positive-feedback preference model of the AS-level Internet topology,” Submitted to IEEE Communications Letters, 2004.
[126] S. Zhou and R. J. Mondragón, “Accurately modelling the Internet topology,” 2004, preprint: arXiv.cs.NI/0402011.
[127] K. Psounis, R. Pan, B. Prabhakar, and D. Wischik, “The scaling hypothesis: simplifying the prediction of network performance using scaled-down simulations,” ACM SIGCOMM Computer Communications Review, vol. 33, no. 1, 2003.
[128] A. V. Y. Moreno, R. Pastor-Satorras and A. Vespignani, “Critical load and congestion instabilities in scale-free networks,” Europhys. Lett., vol. 62, no. 2, pp. 292–298, 2003.
[129] S. Agarwal, C. N. Chuah, and R. H. Katz, “OPCA: Robust interdomain policy routing and traffic control,” in Proc. of the 6th International Conference on Open Architectures and Network Programming (OPENARCH 2003), 2003.
[130] Pajek, http://vlado.fmf.uni-lj.si/pub/networks/pajek/.
[131] Gnuplot, http://t16web.lanl.gov/Kawano/gnuplot/.