Introduction to the Physics of Complex Networks

Jörg Reichardt
Institute for Theoretical Physics, University of Würzburg, Germany

Würzburg, Dec 2012

Outline

1 Introduction
2 Random Graphs as Models of Networks
3 Real Networks vs. Random Networks
4 Models of Real Networks
  • Small World Model
  • Scale Free Models

Acknowledgments
• Albert-László Barabási
• Daniel Bartholomae
• Stefan Bornholdt
• Dirk Brockmann
• Jahn Philip Gehrke
• Claudius Gros
• Annemarie Köhl
• Frederike Petzschner
• Stefan Pinkert
• Carl-Friedrich Schleussner
• Jörg Schultz
• Benjamin Stadtmüller
• Douglas R. White
• Konstantin Klemm

What are networks made of?
• A network may represent a physical reality or an abstraction of such
• Nodes = vertices
• Links = connections = edges
• Degree = number of neighbours
• Directed, undirected, weighted, non-weighted

The ultimate Network

The almost ultimate Network

Feuillet, L., Dufour, H. & Pelletier, J., et al. Lancet 370, 262 (2007)

Networks as Physical Reality
• Protein-Interaction Network
• Nodes = proteins
• Links = physical interactions (pairwise binding)
• Network structure to represent biological function
• Questions: robustness, inference of biological function from network
Palla et al., Nature, 435 (2005)

Networks as Physical Reality
• Gene Regulatory Networks
• Nodes = genes/gene products
• Links = regulatory influence (promotion, suppression)
• Network structure to represent biological function
• Questions: robustness, dynamical behavior, inference of biological function from network

Networks as Virtual Reality
• WWW
• Nodes = political blogs prior to 2004 US election
• Links = hyperlinks
• Network structure to represent opinion (formation)
• Questions: emergent properties from observing autonomous agents
Lada Adamic, hp-Information Dynamics Lab

Networks as Virtual Reality
• Co-Authorship network
• Nodes = authors of scientific articles on networks
• Links = co-authorship
• Network structure to represent collaboration
• Questions: centrality, influence, thematic interests, expertise
Newman, Girvan, Phys. Rev. E, 2004

Networks as Virtual Reality
• Co-citation/Co-Appearance Network
• Nodes = words, citations to scientific articles
• Links = co-appearance
• Network structure to represent “knowledge map”
• Questions: “bibliometrics”, thematic grouping, “mapping of science”, “knowledge domains”
PNAS Cover 2004

Networks as Virtual Reality
• Protein-Folding Network
• Nodes: protein configurations during MD simulation
• Links: observed transitions
• Network structure to represent energy landscape
• Questions: basins, barriers, transition states, stable and meta-stable configurations
F. Rao and A. Caflisch, J. Mol. Bio., 342, 299 (2004)

Manifesto
• Networks are ubiquitous in nature
• Physical reality or abstraction of relations/dependencies
• Form follows function - function follows form
• What can we learn about complex systems by studying the topology of interactions?

Random Graphs as Models of Networks
• After Paul Erdős (1913-1996) and Alfréd Rényi (1921-1970) (ER graphs)
• Pairs of nodes are connected independently and with equal probability
• Two ensembles: G(N, M) and G(N, p)
• N nodes, M edges, connection probability p
• “Microcanonical” and “canonical”
• Equivalent in the thermodynamic limit with ML estimate for p:
  $p = \frac{2M}{N(N-1)}$
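The two ensembles can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function names `sample_gnp`, `sample_gnm` and `ml_estimate_p` are made up for this example, and the brute-force enumeration of all pairs is only sensible for small N.

```python
import itertools
import random


def sample_gnp(n, p, rng):
    """Canonical ensemble G(N, p): keep each of the N(N-1)/2 pairs with probability p."""
    return [pair for pair in itertools.combinations(range(n), 2) if rng.random() < p]


def sample_gnm(n, m, rng):
    """Microcanonical ensemble G(N, M): pick exactly M distinct pairs uniformly at random."""
    all_pairs = list(itertools.combinations(range(n), 2))
    return rng.sample(all_pairs, m)


def ml_estimate_p(n, m):
    """Maximum-likelihood estimate p = 2M / (N(N-1)) from an observed edge count M."""
    return 2 * m / (n * (n - 1))


rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
edges_gnp = sample_gnp(200, 0.05, rng)
edges_gnm = sample_gnm(200, 995, rng)
p_hat_gnp = ml_estimate_p(200, len(edges_gnp))
p_hat_gnm = ml_estimate_p(200, len(edges_gnm))
```

In G(N, M) the edge count is fixed and the estimate of p is exact; in G(N, p) the edge count fluctuates around pN(N-1)/2, so the estimate fluctuates around p.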

Important Characteristics
• How are the numbers of neighbours distributed?
  $p(k) = \binom{N-1}{k} p^k (1-p)^{N-1-k} \approx e^{-pN} \frac{(pN)^k}{k!} = e^{-\langle k \rangle} \frac{\langle k \rangle^k}{k!}$
• p(k) is called the “Degree Distribution”
• Average connectivity is
  $\langle k \rangle = \sum_k k\, p(k) = p(N-1) \approx pN$
• With variance
  $\Delta k^2 = \sum_k (k - \langle k \rangle)^2\, p(k) = \langle k \rangle$
• Important features: $\langle k \rangle$ grows with N for constant p, and $\Delta k / \langle k \rangle \to 0$ for large $\langle k \rangle$.

FIG. 7. The degree distribution that results from the numerical simulation of a random graph. We generated a single random graph with N = 10,000 nodes and connection probability p = 0.0015, and calculated the number of nodes with degree k, X_k. The plot compares X_k/N with the expectation value of the Poisson distribution (13), E(X_k)/N = P(k_i = k), and we can see that the deviation is small.
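How close the Poisson form is to the exact binomial can be checked numerically. The sketch below uses the parameters quoted in the FIG. 7 caption (N = 10,000, p = 0.0015); the function names and the cut-off at k = 60 are choices made for this example only.

```python
import math


def binomial_pk(k, n, p):
    """Exact degree distribution of G(N, p): Binomial(N - 1, p)."""
    return math.comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)


def poisson_pk(k, mean):
    """Poisson approximation with the same mean degree."""
    return math.exp(-mean) * mean**k / math.factorial(k)


n, p = 10_000, 0.0015          # values from the FIG. 7 caption
mean_k = p * (n - 1)           # average degree <k>
ks = range(60)                 # the tail beyond k = 60 is negligible for <k> of about 15

max_diff = max(abs(binomial_pk(k, n, p) - poisson_pk(k, mean_k)) for k in ks)
mean_from_pk = sum(k * binomial_pk(k, n, p) for k in ks)
var_from_pk = sum((k - mean_k) ** 2 * binomial_pk(k, n, p) for k in ks)
```

The pointwise difference `max_diff` stays small, and the variance computed from p(k) is close to the mean, as expected for a Poissonian degree distribution.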

Is a network connected?
• Let u be the probability that a randomly chosen node does not belong to the giant component (g.c.).
• If a node does not belong to the g.c., neither will its neighbours:
  $u = \sum_k p(k)\, u^k = e^{-\langle k \rangle} \sum_k \frac{(\langle k \rangle u)^k}{k!} = e^{\langle k \rangle (u-1)}$
• The size of the largest connected component is hence:
  $S = 1 - u = 1 - e^{-\langle k \rangle S}$

FIG. 5. Illustration of the graph evolution process for the Erdős-Rényi model. We start with N = 10 isolated nodes (upper panel, p = 0), then connect every pair of nodes with probability p. The lower panel of the figure shows two different stages in the graph’s development, corresponding to p = 0.1 and p = 0.15. We can notice the emergence of trees (a tree of order 3, drawn with dashed lines) and cycles (a cycle of order 3, drawn with dotted lines) in the graph, and a connected cluster which unites half of the nodes at p = 0.15 = 1.5/N.
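The self-consistency equation for S has no closed-form solution, but it is easy to solve by fixed-point iteration. The following is a minimal sketch; the function name, the starting guess and the tolerance are assumptions made for this example.

```python
import math


def giant_component_fraction(mean_k, iterations=10_000, tol=1e-12):
    """Solve S = 1 - exp(-<k> S) by fixed-point iteration.

    Starting from S = 1 avoids getting stuck at the trivial root S = 0,
    which solves the equation for every <k>.
    """
    s = 1.0
    for _ in range(iterations):
        s_new = 1.0 - math.exp(-mean_k * s)
        if abs(s_new - s) < tol:
            return s_new
        s = s_new
    return s


s_below = giant_component_fraction(0.5)  # below the transition: S tends to 0
s_above = giant_component_fraction(2.0)  # above the transition: finite S
```

Close to ⟨k⟩ = 1 the iteration converges slowly, so the default iteration cap may be reached before the tolerance is met.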

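Properties of the largest connected component
• Mean component size:
  $\langle s \rangle = \frac{1}{1 - \langle k \rangle + \langle k \rangle S}$
• We observe a phase transition with
  $S \propto (\langle k \rangle - 1)^{\beta}$ with $\beta = 1$
  $\langle s \rangle \propto |\langle k \rangle - 1|^{-\gamma}$ with $\gamma = 1$
Adding more and more links to the network, we pass from a disconnected phase to a connected phase as soon as $\langle k \rangle = 1$.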

How fast can we go from A to B?
• Reformulate the question: How many vertices can we possibly reach with n steps?
• How many choices do we have after the first step?
• “Excess Degree Distribution”:
  $q(d) = \frac{(d+1)\, p(k=d+1)}{\sum_{d=0} (d+1)\, p(k=d+1)} = \frac{(d+1)}{\langle k \rangle}\, p(k=d+1)$
• Average number of options after one step:
  $\langle d \rangle = \sum_d d\, q(d) = \frac{\langle k^2 \rangle}{\langle k \rangle} - 1$
• For Poissonian p(k) we have q(d) = p(k = d) and thus $\langle d \rangle = \langle k \rangle$.
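The excess degree distribution is straightforward to compute from any tabulated p(k). The sketch below does this for a truncated Poisson distribution and checks the two statements on the slide; the helper names and the truncation at k = 80 are choices made for this example.

```python
import math


def excess_degree_distribution(pk):
    """Given pk[k] = p(k), return q with q[d] = (d + 1) p(d + 1) / <k>."""
    mean_k = sum(k * p for k, p in enumerate(pk))
    return [(d + 1) * pk[d + 1] / mean_k for d in range(len(pk) - 1)]


def mean(dist):
    """Mean of a distribution given as a list indexed by value."""
    return sum(x * p for x, p in enumerate(dist))


mean_k = 4.0
# Poisson degree distribution, truncated where the tail is negligible.
poisson = [math.exp(-mean_k) * mean_k**k / math.factorial(k) for k in range(80)]

q = excess_degree_distribution(poisson)
mean_d = mean(q)
second_moment = sum(k * k * p for k, p in enumerate(poisson))
```

For the Poisson case `q[d]` coincides with `poisson[d]`, and `mean_d` equals both ⟨k⟩ and ⟨k²⟩/⟨k⟩ - 1 up to rounding.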

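How fast can we go from A to B?
• With one step, we could visit $z_1 = \langle k \rangle$ nodes.
• With two steps, we could visit $z_2 = \langle k \rangle \langle d \rangle$ nodes.
• After m steps, we could visit
  $z_m = \left( \frac{z_2}{z_1} \right)^{m-1} z_1 = \langle d \rangle^{m-1} \langle k \rangle$
• Hence, if $\langle d \rangle = \langle k \rangle$ we could visit every node in the network with only
  $D \approx \frac{\log N}{\log \langle d \rangle}$ steps.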

• Often, we study the average shortest path length
  $l = \frac{2}{N(N-1)} \sum_{i<j} d_{ij}$
  where $d_{ij}$ is the length of the shortest path between nodes i and j.
• Short average path lengths
• Any deviation from this must be explained by non-random processes!
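The average shortest path length l can be measured on any connected, unweighted graph with one breadth-first search per node. The sketch below assumes the graph is stored as an adjacency dictionary and writes d_ij for the shortest-path distance between nodes i and j; both are conventions chosen for this example.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distances (in steps) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_shortest_path_length(adj):
    """l = 2 / (N(N-1)) * sum over pairs i < j of d_ij, for a connected undirected graph."""
    n = len(adj)
    total = 0
    for i in adj:
        dist = bfs_distances(adj, i)
        if len(dist) != n:
            raise ValueError("graph is not connected")
        total += sum(d for j, d in dist.items() if j > i)  # count each pair once
    return 2 * total / (n * (n - 1))


# Toy examples: a ring of 6 nodes and a path of 3 nodes.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
path = {0: [1], 1: [0, 2], 2: [1]}
l_ring = average_shortest_path_length(ring)
l_path = average_shortest_path_length(path)
```

Running one BFS per node costs O(N(N + M)) in total, which is fine for small graphs but too slow for very large networks.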

Small World Networks
• First problem: nodes are not connected independently
• Social networks: two people are more likely to be “friends” if they have a common friend.
• Clustering coefficient c(k) and $\langle c \rangle$:
  $c(k) = \frac{2m}{k(k-1)}$
  where m is the number of links among the k neighbours of a node.
• For ER graphs, we have c(k) = p independent of k and network size
• Can the short path lengths still hold in highly clustered networks, such as social networks?
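The clustering coefficient of a single node follows directly from the definition: count the links m among its k neighbours and divide by the k(k-1)/2 possible ones. A minimal sketch, assuming an adjacency dictionary of sets with integer node labels and assigning c = 0 to nodes with fewer than two neighbours (a common convention, not stated on the slide):

```python
def local_clustering(adj, node):
    """c = 2m / (k(k-1)), with m the number of links among the k neighbours of node."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention: undefined case set to zero
    m = sum(1 for u in neighbours for v in neighbours if u < v and v in adj[u])
    return 2 * m / (k * (k - 1))


def average_clustering(adj):
    """<c>: local clustering averaged over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


# Toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 0.
adj = {
    0: {1, 2, 3},
    1: {0, 2},
    2: {0, 1},
    3: {0},
}
c0 = local_clustering(adj, 0)
c_avg = average_clustering(adj)
```

Node 0 has three neighbours with one link among them, so its coefficient is 1/3, while nodes 1 and 2 sit in a closed triangle and have coefficient 1.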

Six Degrees of Separation
Stanley Milgram (1933-1984)
• Letters addressed to a stockbroker in Boston given out to randomly selected people in Omaha, Nebraska
• Passing only to personal acquaintances
Milgram, S., “The small world problem”, Psychol. Today 2, 60-67 (1967)
Dodds et al., Science, Vol. 301, no. 5634, pp. 827-829 (2003)

Milgram’s Experiment

Starting Population      Mean Chain Length
Nebraska Random          5.7
Nebraska Stockholders    5.4
All Nebraska             5.5
Boston Random            4.4
All                      5.2

• Surprisingly many letters arrived, mean of 5-6 steps
• Conclusion: short paths exist, humans are able to find them
• 2001: repeated using e-mail (60,000 users, 18 targets, 13 countries)
Milgram, S., “The small world problem”, Psychol. Today 2, 60-67 (1967)
Dodds et al., Science, Vol. 301, no. 5634, pp. 827-829 (2003)

How to connect clustering and short pathlengths?
The Watts-Strogatz Model:
• Every node is connected to 2k nearest neighbours on a ring.
• Rewire connections with probability p
• Interpolates between lattice and ER graph
D. Watts & S. Strogatz, Nature, 393, 440 (1998)
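The construction can be sketched as follows. This is one possible reading of the rewiring rule, not necessarily the exact procedure of the original paper: each lattice link keeps its first endpoint and, with probability p, moves its second endpoint to a uniformly chosen node that is not yet a neighbour. The function name is made up, and N > 2k is assumed.

```python
import random


def watts_strogatz(n, k, p, rng):
    """Ring of n nodes, each linked to its 2k nearest neighbours; rewire each link w.p. p."""
    adj = {i: set() for i in range(n)}
    # Regular ring lattice: connect i to its k successors (and hence k predecessors).
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    # Rewiring pass over the original lattice links.
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            if j in adj[i] and rng.random() < p:
                candidates = [v for v in range(n) if v != i and v not in adj[i]]
                if candidates:
                    new = rng.choice(candidates)
                    adj[i].discard(j)
                    adj[j].discard(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return adj


lattice = watts_strogatz(20, 2, 0.0, random.Random(1))   # p = 0: regular ring
rewired = watts_strogatz(20, 2, 1.0, random.Random(1))   # p = 1: every link rewired
```

Because a rewired link never duplicates an existing one, the number of links stays at N k for every p; only the degree of individual nodes changes.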

Pathlength and Clustering in Small World Networks
• Cross-over between localized and fully mixed (mean field) interactions
• Allows short path lengths and high clustering in one network

Analytic Results
$l(p) = \frac{N}{k}\, \frac{1}{2\sqrt{x^2 + 2x}}\, \tanh^{-1} \sqrt{\frac{x}{x+2}}$
where x = Nkp is the number of shortcuts
Newman, Moore, Watts, PRL, 84, 3201-3204 (2000)

$c(p) = \frac{3(k-1)}{2(2k-1)} (1-p)^3$
Barrat, Weigt, EPJ B, 13, 547-560 (1999)

[Figure: clustering coefficient C(p) versus p for k = 2 (C(0) = 0.5) and N = 1000, 2000, 5000; Barrat & Weigt (1999)]
[Figure: average path length as a fraction of system size on a small-world graph, plotted against the average number of shortcuts; circles are numerical measurements, the solid line is the analytic solution for the continuum model; Newman, Moore & Watts (2000), FIG. 2]
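The two analytic results can be evaluated directly. The sketch below assumes the scaling form l = (N/k) f(x) with f(x) = tanh⁻¹(√(x/(x+2))) / (2√(x²+2x)) for the path length, as given by Newman, Moore and Watts, and the Barrat-Weigt expression for the clustering coefficient; the function names are made up for this example, and the formulas should be checked against the cited papers before serious use.

```python
import math


def clustering_ws(k, p):
    """c(p) = 3(k-1) / (2(2k-1)) * (1-p)^3."""
    return 3 * (k - 1) / (2 * (2 * k - 1)) * (1 - p) ** 3


def path_length_ws(n, k, p):
    """l(p) = (N/k) f(x) with x = N k p, the number of shortcuts."""
    x = n * k * p
    if x == 0:
        return n / (4 * k)  # limit f(x) -> 1/4 of the regular ring
    root = math.sqrt(x * x + 2 * x)
    return (n / k) * math.atanh(math.sqrt(x / (x + 2))) / (2 * root)


c_lattice = clustering_ws(2, 0.0)            # regular ring with k = 2
l_lattice = path_length_ws(1000, 2, 0.0)     # no shortcuts
l_few = path_length_ws(1000, 2, 0.001)       # x = 2 shortcuts
l_many = path_length_ws(1000, 2, 0.1)        # x = 200 shortcuts
```

With these expressions a handful of shortcuts already shortens l substantially, while c(p) stays near its lattice value as long as p is small, which is the small-world regime.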

Are the WWW and the Internet wired randomly?

World-Wide-Web:
• Squares: Albert et al., 1999, N = 325,727
• Circles: Broder et al., 2000, N > 2 × 10^8
• Log-Bins
[Figure: log-log plots of the in-degree and out-degree distributions P_in(k) and P_out(k) of the WWW]

Internet at Router Level:
• Govindan and Tangmunarunkit, 2000, N ≈ 150,000
• Log-Bins
[Figure: log-log plot of the degree distribution P(k) of the Internet at the router level]

Exponential and Scale free Degree distributions compared
[Figure: log-log plot of two degree distributions with the same average degree]
• Same average degree $\langle k \rangle = 5$; blue circles Poissonian, purple squares scale free with exponent γ = 3
• Exponential: mean degree is typical; scale free: there is no typical degree

Models of Scale Free Networks
The Barabási-Albert Model
• Idea: Growth + Preferential Attachment
• Start with n_0 nodes wired randomly
• At each time step, add a node and connect it to m nodes already present in the network
• Probability of connecting to node i is proportional to k_i
[Figure: the network at time steps t and t + 1 for m = 3]
A.L. Barabási & R. Albert, Science, Vol 286, 509 (1999)
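The growth rule can be sketched in a few lines. Two choices below are assumptions of this example and not taken from the slide: the seed network is a complete graph on m + 1 nodes (the slide only says the initial nodes are wired randomly), and preferential attachment is implemented by drawing uniformly from a list that contains every node once per link end.

```python
import random


def barabasi_albert(n_total, m, rng):
    """Grow a network to n_total nodes; each new node links to m distinct
    existing nodes chosen with probability proportional to their degree.
    Returns a dict mapping node -> degree."""
    # Seed (assumption): m + 1 fully connected nodes, each with degree m.
    degree = {i: m for i in range(m + 1)}
    # Each node appears once per link end, so a uniform draw is degree-biased.
    stubs = [i for i in range(m + 1) for _ in range(m)]
    for new in range(m + 1, n_total):
        targets = set()
        while len(targets) < m:          # redraw until m distinct targets are found
            targets.add(rng.choice(stubs))
        degree[new] = m
        for t in targets:
            degree[t] += 1
            stubs.extend([new, t])
    return degree


n_total, m = 2000, 3
degree = barabasi_albert(n_total, m, random.Random(7))
k_max = max(degree.values())
```

Even in this small network the largest degree is many times m, whereas the most recently added nodes still have degree m: old nodes accumulate links, which is the origin of the broad degree distribution.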

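Continuum solution to the BA-Model
$\frac{\partial k_i}{\partial t} = m\, \Pi(k_i)$ with $\Pi(k_i) = \frac{k_i}{\sum_j k_j}$, where $\sum_j k_j = 2M = 2mt$
$\frac{\partial k_i}{\partial t} = \frac{k_i}{2t}$
$k_i(t) = m \sqrt{\frac{t}{t_i}}$ with i.c. that $k_i(t_i) = m$
$P(k_i(t) < k) = P\left( t_i > \frac{m^2 t}{k^2} \right)$, i.e. added later than $\frac{m^2 t}{k^2}$
Nodes are added at constant time intervals, hence $P(t_i) = \frac{1}{n_0 + t} = \frac{1}{N(t)}$
$P\left( t_i > \frac{m^2 t}{k^2} \right) = 1 - \frac{m^2 t}{k^2}\, \frac{1}{n_0 + t}$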
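The step from the rate equation to k_i(t) can be made explicit by separating variables. A short worked derivation, using only the symbols of this slide (t' is a dummy integration variable):

```latex
\begin{align}
\frac{\partial k_i}{\partial t} = \frac{k_i}{2t}
\;&\Rightarrow\;
\int_{m}^{k_i(t)} \frac{dk}{k} = \int_{t_i}^{t} \frac{dt'}{2t'} \\
&\Rightarrow\;
\ln \frac{k_i(t)}{m} = \frac{1}{2} \ln \frac{t}{t_i} \\
&\Rightarrow\;
k_i(t) = m \sqrt{\frac{t}{t_i}}, \\
k_i(t) < k
\;&\Leftrightarrow\;
t_i > \frac{m^2 t}{k^2}.
\end{align}
```

The last line is what turns a statement about degrees into a statement about arrival times, which are uniformly distributed.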

Continuum solution to the BA-Model
• We calculate the degree distribution
  $p(k) = \frac{\partial P(k_i(t) < k)}{\partial k} = \frac{2 m^2 t}{n_0 + t}\, \frac{1}{k^3} \approx \frac{2 m^2}{k^3}$ for $t \gg n_0$
• Find a scale free distribution $p(k) \propto k^{-\gamma}$ with exponent γ = 3
• a) different m
• b) different system sizes

FIG. 21. (a) Degree distribution of the scale-free model, with N = m_0 + t = 300,000 and m_0 = m = 1 (circles), m_0 = m = 3 (squares), m_0 = m = 5 (diamonds) and m_0 = m = 7 (triangles). The slope of the dashed line is γ = 2.9. The inset shows the rescaled distribution P(k)/2m^2.

Alternatives?
• BA model without preferential attachment
• Start with n_0 nodes
• At each time step, add a node and connect it randomly to m nodes in the network
• Degree distribution falls off exponentially
• Circles, squares, diamonds, triangles for m = 1, 3, 5, 7
[Figure: degree distribution P(k) versus k, with inset k_i(t) versus t; FIG. 22 (a)]

Alternatives?
• BA model without growth
• Start with N nodes
• At each time step, pick a node randomly and connect it to m other nodes under preferential attachment
• No scale free distribution
• Circles, squares, diamonds for t = N, 5N, 40N
[Figure: degree distribution P(k) versus k, with inset k_i(t) versus t]

Meaning of it all
• BA-Model is “father” of a large class of network models producing SF degree distributions
• Basic ingredients growth + preferential attachment are present in all models
• Preferential attachment and resulting broad degree distributions are ubiquitous in real world networks
  • In technological, biological and economical contexts
  • At the level of individuals and groups of individuals
• Broad degree distribution has immense impact on processes taking place on networks
• Before fitting a power law, read:
  A. Clauset, C. R. Shalizi and M. E. J. Newman, Power-law distributions in empirical data, http://arxiv.org/abs/0706.1062
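One central recommendation of that paper is to estimate the exponent by maximum likelihood instead of fitting a straight line to a log-log histogram. The sketch below uses the continuous-data estimator α = 1 + n / Σ ln(x_i / x_min) with standard error (α - 1)/√n; treating x_min as known and using continuous synthetic data are simplifications made for this example (degrees are discrete, and the paper also covers the choice of x_min and goodness-of-fit tests).

```python
import math
import random


def powerlaw_mle(samples, x_min):
    """Continuous maximum-likelihood estimate of alpha in p(x) ~ x^(-alpha), x >= x_min.

    Returns (alpha_hat, standard_error).
    """
    tail = [x for x in samples if x >= x_min]
    n = len(tail)
    alpha = 1 + n / sum(math.log(x / x_min) for x in tail)
    sigma = (alpha - 1) / math.sqrt(n)
    return alpha, sigma


# Synthetic test data with known exponent, drawn by inverse-transform sampling:
# x = x_min * (1 - u)^(-1 / (alpha - 1)) for u uniform in [0, 1).
rng = random.Random(42)
true_alpha, x_min = 3.0, 1.0
samples = [x_min * (1 - rng.random()) ** (-1 / (true_alpha - 1)) for _ in range(20_000)]

alpha_hat, sigma = powerlaw_mle(samples, x_min)
```

On this synthetic sample the estimate lands close to the true exponent 3, with a standard error of order 0.01.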

More network characteristics: Mixing and Assortativity
• “Assortativity matrix” $e_{rs}$ as fractions of links connecting nodes of type r and s.
• Row/column totals $a_s = \sum_r e_{rs}$ and $b_r = \sum_s e_{rs}$
• Obey sum rules
  $\sum_{r,s} e_{rs} = \sum_s a_s = \sum_r b_r = 1$
• Assortativity coefficient:
  $r = \sum_s (e_{ss} - a_s b_s)$
  measures the deviation from the expectations based on marginals
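Given a mixing matrix, the coefficient is a few sums. The sketch below returns both the plain sum shown on the slide and the normalised form r = Σ_s (e_ss - a_s b_s) / (1 - Σ_s a_s b_s) used by Newman, which equals 1 for perfectly assortative mixing; the normalisation is added here for illustration and does not appear on the slide. The matrix values are invented toy numbers.

```python
def assortativity(e):
    """Return (raw, normalised) assortativity for a square mixing matrix e[r][s]."""
    n = len(e)
    a = [sum(e[r][s] for r in range(n)) for s in range(n)]  # a_s = sum_r e_rs
    b = [sum(e[r][s] for s in range(n)) for r in range(n)]  # b_r = sum_s e_rs
    expected = sum(a[s] * b[s] for s in range(n))           # diagonal weight expected from marginals
    raw = sum(e[s][s] for s in range(n)) - expected
    normalised = raw / (1 - expected)
    return raw, normalised


# Toy mixing matrices for two node types; entries sum to 1.
mixed = [[0.4, 0.1],
         [0.1, 0.4]]
segregated = [[0.5, 0.0],
              [0.0, 0.5]]

raw_mixed, r_mixed = assortativity(mixed)
raw_seg, r_seg = assortativity(segregated)
```

For the first matrix 80% of the links stay within a type while 50% would be expected from the marginals, giving a raw value of 0.3 and a normalised value of 0.6.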

Assortativity by degree
• r > 0: assortative, typically social
• r < 0: disassortative, typically technological, biological

Network Comparison
Newman, SIAM Review, 2003

A first conclusion

• Real networks are far from randomly wired
• Differ in many aspects from “thermodynamical” expectations
• Need to build models to understand network formation process
• Why is topology different from random?
• Because topology matters for the processes taking place on networks!
• Next section will explore this connection somewhat.