
J Stat Phys (2013) 152:519–533 DOI 10.1007/s10955-013-0775-z

The Influence of Network Properties on the Synchronization of Kuramoto Oscillators Quantified by a Bayesian Regression Analysis Guilherme F. de Arruda · Thomas Kauê Dal’Maso Peron · Marinho Gomes de Andrade · Jorge Alberto Achcar · Francisco Aparecido Rodrigues

Received: 13 November 2012 / Accepted: 23 May 2013 / Published online: 6 June 2013 © Springer Science+Business Media New York 2013

Abstract The influence of network structure on the emergence of collective dynamical behavior is an important research topic that is not yet fully understood. In the current work, we show how statistical regression analysis can be used to address this issue. The proposed regression model suggests that the average shortest path length is the network property that most influences the degree of synchronization of Kuramoto oscillators. Moreover, the model proved to be very accurate, with the predicted and measured values of synchronization being highly correlated. Therefore, regression modeling allows the values of the dynamic variable to be predicted from the network structure.

Keywords Complex networks · Synchronization · Regression analysis

1 Introduction

Complex systems are composed of discrete elements connected through non-linear interactions, presenting self-organization and emergent behavior [1, 2]. The structure of these systems can be described by the network of interactions among their components [2–5]. Examples of complex networks include the Internet [6], which is composed of routers connected by optical fibers; protein networks [7], which consist of proteins linked by physical interactions; the brain [8], which is formed by a set of neurons linked by synapses; and social groups [9], which are represented by people connected through social relationships [4].

G.F. de Arruda · M.G. de Andrade · F.A. Rodrigues: Departamento de Matemática Aplicada e Estatística, Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, Campus de São Carlos, Caixa Postal 668, 13560-970 São Carlos, SP, Brazil

T.K. Dal’Maso Peron: Instituto de Física de São Carlos, Universidade de São Paulo, Av. Trabalhador São Carlense 400, Caixa Postal 369, CEP 13560-970, São Carlos, São Paulo, Brazil

J.A. Achcar: Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, São Paulo, Brazil


Research on complex networks was initially motivated by the observation that maps of natural and artificial systems have intricate connectivity when compared to regular graphs (e.g., lattices) or random graphs (e.g., [3, 4, 10, 11]). Indeed, many complex networks are characterized by a non-trivial organization, with a distribution of connections that follows a power law [4] and complex structures such as communities [9]. Many authors have verified that this highly complex topology plays a fundamental role in the emergence of collective behavior, such as synchronization [12, 13] and percolation [3, 12]. Various works have investigated the basic mechanisms through which topological properties influence the emergence of the synchronous state [13]. Watts and Strogatz [11] showed that when the average shortest path length of a network is shortened, a more efficient coupling is obtained, enhancing the synchronization of Kuramoto oscillators. They verified this tendency by increasing the probability of edge rewiring in regular networks, which creates shortcuts between pairs of vertices. Using the master stability function approach, Barahona and Pecora obtained similar results [14]. In addition, Nishikawa et al. [15] analyzed the influence of the load, quantified by the betweenness centrality, on network synchronization. They suggested that networks with a homogeneous distribution of connectivity are more synchronizable than heterogeneous ones. Following the same formalism, other authors observed how the clustering coefficient (e.g., [16]) and the degree correlations (e.g., [17]) are related to the synchronization of coupled oscillators. All these works investigated how the network topology influences synchronization by varying structural properties while trying to keep other features constant.
Although the analyses performed by those authors are important to understand the relationship between the structure and dynamics of networks, their results are not conclusive. When the parameters characterizing the original network are modified, other network properties also change, which makes it difficult to draw conclusions about the relationship between one single property and the degree of synchronization [13]. To overcome these difficulties, here we propose the use of statistical regression analysis, which describes how the typical value of the dependent random variable changes when any one of the independent variables is modified while the other variables are held fixed. Thus, defining the macroscopic complex order parameter as the dependent variable and the network properties as the independent ones, we can quantify how each topological feature influences the synchronization of coupled oscillators. Although regression does not imply causation, the influence of the structure of networks on dynamical processes is assumed, since it has been verified in many works (e.g., [12, 13]). Our results suggest that the average shortest path length is the network property that most influences the synchronization of Kuramoto oscillators, followed by the variance of the betweenness centrality. Moreover, the regression model proved to be very accurate, with the predicted and measured values of synchronization being highly correlated. Therefore, it is possible to predict the level of synchronization from the topological properties of networks. In the following sections, we introduce some concepts related to complex networks, synchronization of coupled oscillators, regression analysis and Bayesian inference. Results and discussion are presented subsequently. Concepts of network characterization and Markov chain Monte Carlo are presented at the end of this paper.

2 Network Models

Many stochastic models have been developed to generate networks with different topological properties [12]. A simple model based on a Bernoulli process, called the random graph, was proposed by Erdős and Rényi [18]. In this case, n vertices are connected according to a fixed probability p. Therefore, the number of connections follows a binomial distribution, converging to a Poisson distribution in the limit of large n and constant average degree [3]. Another model, called the small-world model, was introduced by Watts and Strogatz [11]. To construct a small-world network, one starts with a regular lattice of n vertices in which each vertex is connected to κ nearest neighbors in each direction, totaling 2κ connections, where $n \gg \kappa \gg \log(n) \gg 1$. Next, each edge is randomly rewired with probability p. When p = 0, the network is an ordered lattice with a high number of loops of length three but large distances. On the other hand, if p → 1, the network becomes a random graph (not in the sense of Erdős and Rényi [19]) with short distances but few loops. The small-world regime is obtained for intermediate values of p, when both short distances and a large number of loops are present. Random graphs and small-world networks have a homogeneous degree distribution, which is seldom observed in real-world networks [4]. To overcome this limitation, Barabási and Albert proposed a model based on growth and preferential attachment, called the scale-free network model [20]. In this case, a network is generated starting with a set of $m_0$ connected vertices. After that, new vertices with m edges are added to the network. The probability that the new vertex i connects to an existing vertex j is proportional to the number of connections of j, i.e.,

$$P(i \to j) = \frac{k_j}{\sum_u k_u}. \tag{1}$$

The most connected vertices have greater probability of receiving new vertices. In this way, networks generated by this model present a power-law degree distribution, $P(k) \sim k^{-\gamma}$, where γ = 3 in the thermodynamic limit (n → ∞) [3]. The preferential attachment model of Barabási and Albert was generalized by Krapivsky et al. [21], who considered a non-linear preferential attachment probability, i.e.,

$$P(i \to j) = \frac{k_j^{\alpha}}{\sum_u k_u^{\alpha}}, \tag{2}$$

where α is a constant. If α < 1, the network generated presents a stretched exponential degree distribution. On the other hand, for α > 1, a single site tends to connect to nearly all other sites. When α = 1 we recover the Barabási and Albert model.
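The growth rule of Eq. (2) can be illustrated with a short sketch. The function name `nonlinear_ba`, the complete seed graph on m + 1 vertices, and the choice to draw the m targets without replacement are assumptions made for this illustration; they are not details given in the text.

```python
import random


def nonlinear_ba(n, m, alpha, seed=None):
    """Grow a network by non-linear preferential attachment (Eq. (2)).

    Returns a list of neighbor sets, one per vertex.
    Assumption: the seed graph is a complete graph on m + 1 vertices,
    so every new vertex can always find m distinct targets.
    """
    rng = random.Random(seed)
    m0 = m + 1
    neighbors = [set(range(m0)) - {v} for v in range(m0)]
    for new in range(m0, n):
        # Attachment weights k_j ** alpha for every existing vertex j.
        weights = [len(nb) ** alpha for nb in neighbors]
        targets = []
        for _ in range(m):
            j = rng.choices(range(new), weights=weights)[0]
            targets.append(j)
            weights[j] = 0  # assumed: targets are drawn without replacement
        neighbors.append(set(targets))
        for j in targets:
            neighbors[j].add(new)
    return neighbors
```

With m = 2 the resulting average degree approaches 4 for large n, and alpha = 1 corresponds to the linear Barabási and Albert rule of Eq. (1).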

3 Synchronization

A model to describe the synchronization of a system was proposed by Kuramoto [22]. In complex topologies, each oscillator obeys an equation of motion defined as [13]

$$\dot{\theta}_i = \omega_i + \lambda \sum_{j=1}^{n} a_{ij} \sin(\theta_j - \theta_i), \qquad i = 1, \ldots, n, \tag{3}$$

where λ is the coupling strength, ωi is the natural frequency of the oscillator i, distributed according to some function g(ω), and aij are the elements of the adjacency matrix A. Coupling strengths higher than a critical value λc produce the onset of synchronization [13]. Indeed, as the coupling strength is increased, more and more oscillators get entrained around the average phase of the whole system and the network settles in the complete synchronized state.



Note that the term $a_{ij}$ represents the influence of the network structure on the Kuramoto model, since for the traditional case, in which the network is fully connected [22], $a_{ij} = 1$ for all pairs (i, j). The collective dynamics of such a system can be measured by the macroscopic complex order parameter [13], i.e.,

$$r(t) = \left| \frac{1}{n} \sum_{j=1}^{n} e^{i\theta_j(t)} \right|, \tag{4}$$

where 0 ≤ r(t) ≤ 1 measures the phase coherence of the population. When r(t) ≈ 1, all nodes oscillate with similar phases, whereas r(t) ≈ 0 indicates that there is no coherence among the oscillators.
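Equations (3) and (4) can be sketched numerically as follows. The explicit Euler scheme, the step size, the number of steps and the function names are assumptions made for illustration; the natural frequencies are drawn uniformly from [−1/2, 1/2], the distribution adopted later in the paper.

```python
import cmath
import math
import random


def order_parameter(theta):
    """Eq. (4): modulus of the average of exp(i * theta_j)."""
    return abs(sum(cmath.exp(1j * t) for t in theta) / len(theta))


def simulate_kuramoto(neighbors, coupling, steps=2000, dt=0.05, seed=0):
    """Integrate Eq. (3) and return r at the final time.

    neighbors[i] lists the vertices j with a_ij = 1.
    Assumption: explicit Euler integration with a fixed step dt.
    """
    rng = random.Random(seed)
    n = len(neighbors)
    omega = [rng.uniform(-0.5, 0.5) for _ in range(n)]          # natural frequencies
    theta = [rng.uniform(-math.pi, math.pi) for _ in range(n)]  # initial phases
    for _ in range(steps):
        # Right-hand side of Eq. (3) for every oscillator.
        dtheta = [
            omega[i]
            + coupling * sum(math.sin(theta[j] - theta[i]) for j in neighbors[i])
            for i in range(n)
        ]
        theta = [t + dt * d for t, d in zip(theta, dtheta)]
    return order_parameter(theta)
```

Because Eq. (3) has no 1/n normalization, the step size must be small compared with the inverse of the coupling times the largest degree for the Euler scheme to remain stable.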

4 Regression Analysis

Regression analysis is used for prediction and forecasting in many scientific areas [23–26]. It consists of statistical techniques for modeling and analyzing several variables when we are interested in the relationship between a dependent variable and one or more independent variables. In this way, regression analysis is useful to understand how the value of the dependent variable changes when any one of the independent variables is varied while the other independent variables are held fixed. In general, the main goal is the estimation of a function of the independent variables called the regression function. We are also interested in characterizing the variability of the dependent variable around the regression function, which can be described by a probability distribution. The regression function can be given by linear or non-linear models (see, for example, [24]). The regression analysis is performed by relating a dependent variable Y to k fixed independent variables given in a vector denoted as $\mathbf{X} = (X_1, \ldots, X_k)$. In addition, associated with the vector of independent variables there is a vector of unknown regression parameters, denoted as $\boldsymbol{\alpha} = (\alpha_0, \alpha_1, \ldots, \alpha_k)$. A general regression model is defined as

$$Y = f(\mathbf{X}, \boldsymbol{\alpha}) + \varepsilon, \tag{5}$$

where $f(\mathbf{X}, \boldsymbol{\alpha})$ is a specified function and the error term ε is a random variable assumed to have a particular probability distribution. This random error includes all other factors that could influence the dependent variable Y and are not included in the regression model. A particular regression model is the linear model [26], which is given by

$$y_i = \alpha_0 + \alpha_1 x_{i1} + \alpha_2 x_{i2} + \cdots + \alpha_k x_{ik} + \varepsilon_i, \tag{6}$$

for i = 1, ..., n, where n is the sample size. The error term $\varepsilon_i$ is a random variable assumed to be normally distributed with mean zero and standard deviation σ. In the case of synchronization of Kuramoto oscillators, we assumed that the dependent variable $y_i$ represents a function of r(t) given in Eq. (4), which is obtained for a large value of t, e.g., $t \approx 10^4$. On the other hand, the independent variables are the network structural measures (see Appendix 8.1 for a presentation of some network measures). Note that each observation i corresponds to a network generated by a given model. Since the dependent variable r(t) is defined in the interval (0, 1), we could consider the standard Beta regression models introduced in the literature to analyze the data (e.g., [27, 28]). Alternatively, we could transform the data in order to assume a standard linear model (Eq. (6)), which is a simpler approach. Here, we adopted this last option by taking into account a log-additive transformation of the dependent variable r, given by a logit transformation, i.e., y = log(r/(1 − r)), since the dependent variable is defined in the interval (0, 1). The use of standard transformations of the data, especially of the dependent or response variable, is usual in regression analysis to satisfy the assumptions required by linear regression models for purposes of inference and prediction [29]. This transformation has been adopted by many authors in Dirichlet regression models (also known as regression for compositional data in the presence of covariates), which generalize the beta regression models (e.g., [30, 31]). Therefore, considering the logit transformation, the regression model relating the structure and synchronization of networks is given by

$$y = \log\left(\frac{r}{1 - r}\right) = \alpha_0 + \alpha_1 H + \alpha_2 B + \alpha_3 V(B) + \alpha_4 \ell + \alpha_5 R + \alpha_6 S + \varepsilon, \tag{7}$$

where H is the Shannon entropy of the degree distribution, B is the average betweenness centrality, V(B) is the variance of the betweenness centrality, ℓ is the average shortest path length, R is the assortativity coefficient, and S is the network average search information. All these measures are presented in an appendix at the end of this paper.
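As a small illustration of the transformation used in Eq. (7), the sketch below implements the logit, its inverse, and the linear predictor on the right-hand side of Eq. (6). The function names are hypothetical and chosen only for this example.

```python
import math


def logit(r):
    """Map an order parameter r in (0, 1) to the real line: y = log(r / (1 - r))."""
    if not 0.0 < r < 1.0:
        raise ValueError("r must lie strictly between 0 and 1")
    return math.log(r / (1.0 - r))


def inv_logit(y):
    """Inverse transformation: recover r from y."""
    return 1.0 / (1.0 + math.exp(-y))


def linear_predictor(alpha, x):
    """alpha_0 + alpha_1 * x_1 + ... + alpha_k * x_k, without the error term."""
    return alpha[0] + sum(a * v for a, v in zip(alpha[1:], x))
```

The inverse transformation is what allows a value of y predicted by the linear model to be reported on the original scale of r.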

5 Bayesian Inference

Bayesian inference has been used to address data analysis problems in physics, ranging from extra-solar planet detection to inference from surface experiments (see, for instance, [32, 33]). Its use is associated with the task of drawing conclusions from missing data, as Bayesian methods have been developed for reasoning quantitatively when statements cannot be made with certainty [34]. Since the reliability of network data is a source of great concern (e.g., [35, 36]), Bayesian inference can be considered a suitable tool for network analysis. Linear regression modeling under the Bayesian paradigm treats the parameters of the model, $\alpha_1, \ldots, \alpha_6$ (see Eq. (7)), as random variables. Note that in the classical approach, such parameters are constants [24]. All the prior knowledge about these parameters is represented by prior probability density functions. The prior distribution represents the amount of knowledge available before obtaining the data. In the absence of information, uninformative priors are adopted, such as the Jeffreys prior [37]. A vast number of prior probability density functions can be found in [37]. In the current work, we adopted non-informative normal distributions with zero mean and variance $\sigma_0^2$. Considering the likelihood function built from the hypotheses made about the data $\{y_i, i = 1, \ldots, n\}$ (or equivalently, about the errors $\{\varepsilon_i, i = 1, \ldots, n\}$), the methodology of Bayesian inference consists in determining a posterior probability density by combining, through Bayes' theorem, the likelihood function and the joint prior density of the parameters of interest. Therefore, once the likelihood and prior distributions are specified, the problem is reduced to computing the posterior probability distributions, which provide the full description of the state of knowledge about the parameters of the model.
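Considering that the errors $\varepsilon_i$ are independent and identically distributed according to a normal distribution $N(0, \sigma^2)$, we have the likelihood function for estimation of $\boldsymbol{\alpha} = (\alpha_0, \ldots, \alpha_6)$, which is given by [34]

$$L(\boldsymbol{\alpha}, \sigma^2 \mid \mathbf{y}) = \prod_{i=1}^{n} g(y_i \mid \boldsymbol{\alpha}, \sigma^2), \tag{8}$$

where $\sigma^2 = \mathrm{Var}(\varepsilon_i)$, $\mathbf{y} = (y_1, \ldots, y_n)$ and

$$g(y_i \mid \boldsymbol{\alpha}, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{1}{2\sigma^2} \left[ y_i - f(\mathbf{x}_i, \boldsymbol{\alpha}) \right]^2 \right\}, \tag{9}$$

where $f(\mathbf{x}_i, \boldsymbol{\alpha})$ is defined by Eqs. (5) and (7). Therefore, Eq. (8) can be written as

$$L(\boldsymbol{\alpha}, \sigma^2 \mid \mathbf{y}) \propto (\sigma^2)^{-n/2} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} \left[ y_i - f(\mathbf{x}_i, \boldsymbol{\alpha}) \right]^2 \right\}. \tag{10}$$

Adopting prior densities for the parameters $\alpha_i$, i = 0, ..., 6, and $\sigma^2$ given as

$$\pi_0(\alpha_i) \propto \frac{1}{\sigma_0} \exp\left\{ -\frac{1}{2\sigma_0^2} (\alpha_i - \theta_i)^2 \right\}, \qquad i = 0, 1, \ldots, 6, \tag{11}$$

and

$$\pi_0(\sigma^2) \propto \frac{1}{\sigma^2}, \tag{12}$$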

i.e., $\alpha_i \sim N(\theta_i, \sigma_0^2)$, i = 0, ..., 6, in which $\theta_i$ and $\sigma_0^2$ are known hyperparameters. For $\sigma^2$ we considered an uninformative prior [37]. Note that when normal prior densities $N(\theta_i, \sigma_0^2)$ are adopted for the parameters $\alpha_i$, we can choose large values of $\sigma_0^2$, which controls the level of prior information (or lack of information) about these parameters. When $\sigma_0^2 \to \infty$, we have uninformative uniform priors for $\alpha_i$, i = 0, 1, ..., 6. The joint posterior density for $\boldsymbol{\alpha}$ and $\sigma^2$ is given as

$$\pi(\boldsymbol{\alpha}, \sigma^2 \mid \mathbf{y}) \propto L(\boldsymbol{\alpha}, \sigma^2 \mid \mathbf{y})\, \pi_0(\boldsymbol{\alpha})\, \pi_0(\sigma^2), \tag{13}$$

where $\pi_0(\boldsymbol{\alpha}) = \pi_0(\alpha_0)\pi_0(\alpha_1) \cdots \pi_0(\alpha_6)$, assuming prior independence between the parameters $\alpha_i$, i = 0, 1, ..., 6. To obtain Bayesian estimates of $\alpha_0, \ldots, \alpha_6$, we can summarize the posterior probability density function (Eq. (13)) in terms of a few numbers, such as the mean (expected value), variance, median and mode. These posterior summaries of interest are calculated numerically via Markov chain Monte Carlo (MCMC) simulation methods, such as the popular Gibbs sampling algorithm [38], or the Metropolis–Hastings algorithm when the conditional posterior distributions required for the Gibbs sampling algorithm do not have standard parametric forms (e.g., [39]). A brief introduction to MCMC is presented in an appendix at the end of this paper. The simulation of samples from the joint posterior distribution of the model parameters is greatly simplified by the WinBugs software [40], which only requires the specification of the distribution of the data and the prior distributions of the parameters. The use of the MCMC algorithm results in a sample of the vector of parameters, denoted here as $\Theta^{(j)} = (\boldsymbol{\alpha}^{(j)}, \sigma^{2(j)})$, j = 1, ..., M, generated from the posterior density (see Appendix 8.2). Under a quadratic loss function, the Bayesian estimators of the parameters are the means of the posterior distribution. In this case, these estimates are obtained by the Monte Carlo method as the mean of the generated sample $\{\Theta^{(j)}, j = 1, \ldots, M\}$. The summary of the posterior distribution of the parameters can also be obtained from this sample. The significance of the parameters of the regression model can be evaluated by considering the credible interval with 100(1 − α) % credibility (here α denotes the significance level, not a regression coefficient), i.e., $IC = [\Theta_{\alpha/2}, \Theta_{1-\alpha/2}]$, whose bounds are empirical quantiles of the sample chosen such that $P(\Theta_{\alpha/2} \le \Theta \le \Theta_{1-\alpha/2}) = 1 - \alpha$. Such a credible interval can be understood as the Bayesian analog of frequentist confidence intervals. Additional information about Bayesian inference in physics can be found in the recent reviews by Toussaint [32] and Trotta [41] and in the book by Gregory [42]. Concepts and methods of Bayesian inference can be found in textbooks such as [34, 43].
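To make the structure of the posterior in Eq. (13) concrete, the following sketch samples it for a model with a single predictor. It is not the procedure used in the paper (which relies on Gibbs sampling in WinBugs): a random-walk Metropolis sampler is swapped in because it needs only the log-posterior. The function names, the step size, the starting point and the reparametrization u = log σ² are assumptions of this illustration.

```python
import math
import random


def log_posterior(alpha, log_s2, xs, ys, prior_mean=0.0, prior_var=100.0):
    """Log of Eq. (13), up to a constant, for y = alpha_0 + alpha_1 * x + error.

    The variance is parametrized as u = log(sigma^2). The prior 1/sigma^2 of
    Eq. (12) is flat in u, so it contributes no term here.
    """
    s2 = math.exp(log_s2)
    sse = sum((y - (alpha[0] + alpha[1] * x)) ** 2 for x, y in zip(xs, ys))
    log_lik = -0.5 * len(ys) * log_s2 - sse / (2.0 * s2)            # Eq. (10)
    log_prior = -sum((a - prior_mean) ** 2 for a in alpha) / (2.0 * prior_var)  # Eq. (11)
    return log_lik + log_prior


def metropolis(xs, ys, n_iter=20000, step=0.1, seed=0):
    """Random-walk Metropolis over (alpha_0, alpha_1, log sigma^2).

    Returns a list of draws (alpha_0, alpha_1, sigma^2), one per iteration.
    """
    rng = random.Random(seed)
    state = [0.0, 0.0, 0.0]  # assumed starting point
    current = log_posterior(state[:2], state[2], xs, ys)
    chain = []
    for _ in range(n_iter):
        proposal = [s + rng.gauss(0.0, step) for s in state]
        candidate = log_posterior(proposal[:2], proposal[2], xs, ys)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            state, current = proposal, candidate
        chain.append((state[0], state[1], math.exp(state[2])))
    return chain
```

The default prior variance of 100 mirrors the approximately non-informative normal priors adopted later in the paper; every other numerical setting is illustrative only.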

6 Results and Discussion

In order to characterize the large-scale network topology, we adopted the following measures: (i) the Shannon entropy of the degree distribution (H), (ii) the average (B) and (iii) the variance of the betweenness centrality (V(B)), (iv) the average shortest path length (ℓ), (v) the assortativity coefficient (R), and (vi) the average search information (S). These measures quantify different network properties, such as heterogeneities in the number of connections (H), typical distance between nodes (ℓ), centrality (B), degree correlations (R) and capacity of information propagation (S). We extracted the data of the independent variables, i.e., the topological measures, from networks generated by different models, including non-linear Barabási-Albert networks (NBA) whose preferential attachment exponent was equal to α = 1, 1.5, 2 and 3 (see Eq. (2)), random graphs (ER), and small-world networks (SW) with rewiring probability p = 0.2 and 0.3. These models were selected in order to obtain variability in the independent variables, since they generate topologies with different properties [44]. For each of these seven models, we generated a set of 50 networks composed of $n = 10^3$ vertices with average degree ⟨k⟩ = 4, obtaining a database of 350 networks. The number of vertices and connections was kept constant in all networks because we were interested in verifying only how the organization of connections influences the degree of synchronization. This choice guaranteed that network size and density of connections did not influence the degree of synchronization, since most network measures and the level of synchronization depend on n and ⟨k⟩ [19, 44].
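Two of the measures listed above, the Shannon entropy of the degree distribution and the average shortest path length (Eqs. (14) and (15) of the appendix), can be computed as in the following sketch. The adjacency-list representation, the function names and the use of breadth-first search are choices made for this illustration, and the graph is assumed to be connected and unweighted.

```python
import math
from collections import Counter, deque


def degree_entropy(neighbors):
    """Shannon entropy of the degree distribution, H = -sum_k P(k) log P(k)."""
    n = len(neighbors)
    counts = Counter(len(nb) for nb in neighbors)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def average_shortest_path_length(neighbors):
    """Average of d_ij over all ordered pairs i != j (connected graph assumed)."""
    n = len(neighbors)
    total = 0
    for source in range(n):
        # Breadth-first search gives the distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total / (n * (n - 1))
```

For a path graph on four vertices, for example, the two end vertices have degree 1 and the two inner vertices have degree 2, so H = log 2, and the twelve ordered distances sum to 20, so ℓ = 20/12.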
Due to the difference in scales of the network measures, we applied a standardization procedure, i.e., each measurement x is normalized as $z = (x - \bar{x})/\sigma_x$, where $\bar{x}$ and $\sigma_x$ are the average and standard deviation of the values of x calculated over the whole set of networks, respectively. The probability distribution of each transformed measurement has zero mean and unit variance. The data representing the dependent variable were generated by simulating the Kuramoto model on the 350 networks generated by the models considered. We adopted a uniform distribution of the natural frequencies in the range [−1/2, 1/2] and considered the coupling values λ = 0.15 and λ = 0.25. Figure 1 shows the coherence diagram, which helps to justify the choice of these values of λ. Note that for λ > 0.25, the values of r become similar for all network models. On the other hand, values λ < 0.15 approach the critical couplings necessary to reach the onset of synchronization, which are not the same for all network models [13]. To analyze the relationship between the structure and dynamics of networks under a Bayesian approach, considering the regression model defined in Eq. (7), we assumed approximately non-informative normal distributions for the regression parameters, with zero mean and variance equal to 100. Using the WinBugs software [40], we first simulated a



Fig. 1 Coherence diagram r(λ) for non-linear Barabási-Albert network (NBA) with exponents α = 1, 1.5, 2 and 3, random graphs (ER) and small-world networks (SW) with probability of rewiring p = 0.2 and 0.3. Each point is an average over 50 networks

“burn-in” sample of size 10,000, which was discarded to eliminate the effect of the initial values used in the Gibbs sampling algorithm. After this burn-in period, we simulated another 50,000 Gibbs samples for each regression parameter. From these, we selected a final sample of size 1,000 by keeping every 50th simulated value, which yields an approximately uncorrelated sample for computing the posterior summaries of interest. The posterior summaries of interest (posterior mean, posterior standard deviation and 95 % credible intervals) are given in Tables 1 and 2. Convergence of the algorithm was monitored using standard trace plots of the simulated samples (see, for example, [45]). Tables 1 and 2 present the coefficients of the regression model defined in Eq. (7) for λ = 0.15 and λ = 0.25, respectively. Only the regression parameters related to the variance of the betweenness centrality (V(B)), the average shortest path length (ℓ) and the average search information (S) are statistically significant for both coupling values (their 95 % credible intervals do not include zero), which indicates that these measures most influence the degree of synchronization. In addition, since the coefficient $\alpha_j$ is the expected change in the dependent variable $y_i$ for a one-unit change in $x_j$ when the other covariates are held fixed, we can conclude that the average shortest path length is the network property that most affects the value of r ($|\alpha_4| = \max_i |\alpha_i|$, i = 1, ..., 6). Moreover, since $\alpha_4 < 0$, higher average shortest path lengths imply a smaller degree of synchronization for a given coupling λ. This can be explained by the fact that the average shortest path length is a network property related to the efficiency of information processing [13]. Indeed, it was verified before that networks with shorter paths present more efficient coupling and thus an enhanced degree of synchronization (e.g., [11, 14]).
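The burn-in and thinning scheme described above, together with the posterior summaries reported in the tables, can be sketched for the chain of a single parameter as follows. The function name and the simple nearest-rank rule used for the empirical quantiles are assumptions of this illustration; WinBugs may compute quantiles differently.

```python
import statistics


def summarize_chain(draws, burn_in=10000, thin=50, level=0.95):
    """Posterior mean, standard deviation and credible interval of one parameter.

    draws: list of simulated values for a single parameter, in chain order.
    The first `burn_in` values are discarded and every `thin`-th value is kept.
    """
    kept = sorted(draws[burn_in::thin])
    tail = (1.0 - level) / 2.0
    # Nearest-rank empirical quantiles (an assumed convention).
    lower = kept[int(round(tail * (len(kept) - 1)))]
    upper = kept[int(round((1.0 - tail) * (len(kept) - 1)))]
    return {
        "n": len(kept),
        "mean": statistics.mean(kept),
        "sd": statistics.stdev(kept),
        "interval": (lower, upper),
    }
```

With 60,000 simulated values, a burn-in of 10,000 and a thinning interval of 50, exactly 1,000 values are retained, which matches the final sample size quoted in the text.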
The regression analysis also suggests that another important measure is the variance of the betweenness centrality. As shown in Tables 1 and 2, since $\alpha_3 < 0$, the increase of this

Table 1 Posterior summaries for the model described in Eq. (7) considering λ = 0.15 in the Kuramoto model

Parameter   Mean      Std. Dev.   95 % credible intervals   Measure
α0          −1.145    0.023       (−1.190, −1.098)          –
α1           0.099    0.065       (−0.031, 0.227)           H
α2          −0.096    0.141       (−0.360, 0.167)           B
α3          −0.535    0.162       (−0.836, −0.217)          V(B)
α4          −1.617    0.394       (−2.356, −0.862)          ℓ
α5           0.028    0.176       (−0.290, 0.380)           R
α6           0.251    0.064       (0.120, 0.378)            S
σ²           0.162    0.012       (0.138, 0.187)            –

Table 2 Posterior summaries for the model described in Eq. (7) considering λ = 0.25 in the Kuramoto model

Parameter   Mean      Std. Dev.   95 % credible intervals   Measure
α0           0.192    0.023       (0.158, 0.225)            –
α1           0.242    0.065       (0.142, 0.339)            H
α2          −0.062    0.141       (−0.272, 0.146)           B
α3          −0.741    0.162       (−0.974, −0.501)          V(B)
α4          −1.189    0.394       (−1.765, −0.602)          ℓ
α5           0.250    0.176       (−0.008, 0.514)           R
α6          −0.220    0.064       (−0.316, −0.120)          S
σ²           0.095    0.012       (0.082, 0.110)            –

measure tends to decrease the degree of synchronization. In scale-free networks, the high variance of the betweenness centrality is due to the correlation between node degree and load. In this case, highly connected oscillators synchronize faster among themselves and form synchronization clusters [46]. These clusters synchronize with each other on a longer time scale [47], which undermines the emergence of the synchronous state. Therefore, higher variance in the load, quantified by V(B), decreases the synchronization level. With respect to the average search information, higher values of this measure imply higher values of r. Thus, less structured networks are easy to navigate and easy to synchronize, since the average search information is higher in these networks [48]. Regarding the remaining measures, it is interesting to observe that the average betweenness centrality and the degree of assortativity do not influence the synchronous state, since they are not statistically significant (see Tables 1 and 2). The entropy of the degree distribution is statistically significant only for λ = 0.25. Some works verified that the higher the network heterogeneity, the higher the degree of synchronization [49]. Our analysis revealed that this effect is mainly due to the fact that an increase in the Shannon entropy of the degree distribution reduces the average distance between nodes. Similarly, it was observed before that disassortative networks tend to be more synchronizable [47]. However, this effect is due to the increase of the average shortest path length when a network becomes more assortative. This fact is illustrated in Fig. 2 for the scale-free model described in [50], which allows the assortativity of a network to be controlled.
Therefore, the influence of degree correlations and heterogeneity on the degree of synchronization verified in these works is mainly due to the modifications on the average shortest path length when structural variations are imposed in the networks.



Fig. 2 Evolution of the average shortest path length as a function of the assortativity coefficient for the model described in [50]. The data points were obtained from networks with $n = 10^3$ nodes and average degree ⟨k⟩ = 4

The differences in $\alpha_i$ observed between Tables 1 and 2 are due to the role of the coupling strength (λ) in the relationship between the structure and dynamics of networks. As λ is increased, the dynamics becomes less dependent on the network organization. Note in Fig. 1 that for λ > 0.6, networks of all models reach approximately the same level of synchronization. Despite these differences, the network properties that most influence the synchronization are the same for λ = 0.15 and λ = 0.25, i.e., ℓ, V(B) and S. Having defined the influence of each network measure on the synchronization dynamics, it is necessary to verify the validity and accuracy of the proposed regression model (Eq. (7)). It is important to point out that some assumptions are needed to fit linear regression models to the data for purposes of inference and prediction: linearity of the relationship between dependent and independent variables; independence of the errors (no serial correlation); homoscedasticity (constant variance) of the errors; and normality of the error distribution [24]. If any of these assumptions is violated, then the forecasts and confidence intervals are not reliable, since the regression model may be inefficient, seriously biased or misleading. All these assumptions can be verified by observing the normal probability plot of the residuals, which is a graphical technique for normality testing in which the points should form an approximately straight line. Since we verified that the distribution of the residuals is approximately normal (see Fig. 3), the requirements for a regression analysis are fulfilled. In this way, the confidence intervals identified are valid. Another important issue to point out is the correlation between network measures. The problem of multicollinearity in regression analysis arises when two or more predictor variables in a multiple regression model are highly correlated.
In this case, the regression parameter estimates may vary greatly with small changes in the model or the data. One serious problem of high multicollinearity is inaccurate computation, since matrix inversions are required to determine the least squares regression parameter estimates. For situations with very large multicollinearity, we could use ridge



Fig. 3 Normal probability plot of the residuals calculated from the regression model given by Eq. (7)

regression or principal components regression (see, for example, [24, 26]), but these alternatives were not needed in our case, since we obtained a very good fit of the model to the data set. Indeed, the correlation between the simulated and the estimated values of r is larger than 0.9 for both models. This is due to the reduced effect of multicollinearity that results from considering a large data set (350 observations): a large number of observations can produce more precise parameter estimates and reduce the effect of multicollinearity [26]. Therefore, despite these limitations, our analysis is valid and the regression model can be used to predict the value of the order parameter r from the structural properties of networks.
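The prediction step mentioned above can be sketched by plugging the posterior means of Table 1 (λ = 0.15) into Eq. (7) and inverting the logit transformation. The function names are hypothetical, and the inputs are assumed to be already standardized with the averages and standard deviations of the 350-network data set, which are not reported in the text.

```python
import math

# Posterior means alpha_0, ..., alpha_6 from Table 1 (lambda = 0.15).
ALPHA_TABLE_1 = (-1.145, 0.099, -0.096, -0.535, -1.617, 0.028, 0.251)


def standardize(value, mean, std):
    """z = (x - mean) / std, with mean and std taken over the whole data set."""
    return (value - mean) / std


def predict_r(z, alpha=ALPHA_TABLE_1):
    """Predicted order parameter for standardized measures.

    z: standardized values of (H, B, V(B), ell, R, S), in the order of Eq. (7).
    """
    y = alpha[0] + sum(a * x for a, x in zip(alpha[1:], z))
    return 1.0 / (1.0 + math.exp(-y))  # inverse of y = log(r / (1 - r))
```

For a network whose measures all equal the data-set averages (z = 0 for every measure), the prediction is r ≈ 0.24, and raising the average shortest path length by one standard deviation lowers it, in line with the negative sign of α4.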

7 Conclusions

Although statistical inference is a well-established research area, presenting powerful and robust methods, only a few works have considered inferential techniques for the analysis of networks. Most of these works focus on the prediction of missing connections by considering the maximum likelihood approach with Monte Carlo sampling algorithms (e.g., [51]). In the current work, we showed how regression analysis can be used to quantify the influence of network topological properties on a dynamic process, reinforcing the importance of statistical inference methods for complex network analysis. The regression model proposed here suggests that the average shortest path length is the network property that most influences the degree of synchronization. The variance of the betweenness centrality is also related to the emergence of the synchronous state. Other measures, such as the assortativity and degree entropy, are not significantly related to this dynamic process. Indeed, although some works verified the importance of these features (e.g., [47, 49]), their influence on the degree of synchronization is mainly due to the modifications of the average shortest path length when the network organization is changed. In addition, the regression modeling allows the values of the order parameter to be predicted from the network properties with high accuracy. Our analysis can be extended by considering other network measures, as well as additional regression models based on other transformations of the dependent variable or on a beta regression [27]. Additional dynamic processes, such as epidemic spreading or cascade failures, can also be investigated in terms of regression modeling.



Acknowledgements Francisco A. Rodrigues would like to acknowledge CNPq (305940/2010-4) and FAPESP (2010/19440-2) for the financial support given to this research. Jorge A. Achcar would like to acknowledge CNPq (302142/2002-9). Thomas and Guilherme thank FAPESP for sponsorship.

Appendix

8.1 Network Characterization

A complex network can be represented by its adjacency matrix $A$, whose elements $a_{ij}$ are equal to one if there is a connection between the nodes $i$ and $j$, or equal to zero otherwise. Many topological measures have been developed to describe the organization of networks [12, 44]. The number of connections of each node $i$ is called its degree $k_i$. The degree distribution $P(k)$ of a network is the probability that a given node has exactly $k$ connections. The Shannon entropy of the degree distribution quantifies the heterogeneity of a network [52]. This measure is calculated as

$$H = -\sum_{k} P(k) \log P(k), \tag{14}$$
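As an illustration of Eq. (14), the following minimal Python sketch computes the degree entropy directly from an adjacency matrix. It is not the authors' implementation: the function name `degree_entropy`, the use of the natural logarithm (the text does not state the base), and the two four-node example graphs are assumptions made only for this example.

```python
import math
from collections import Counter


def degree_entropy(adjacency):
    """Shannon entropy of the degree distribution, Eq. (14), with natural logs (assumed)."""
    # The degree k_i is the row sum of the 0/1 adjacency matrix.
    degrees = [sum(row) for row in adjacency]
    n = len(degrees)
    # P(k) is the fraction of nodes with exactly k connections.
    counts = Counter(degrees)
    return sum(-(c / n) * math.log(c / n) for c in counts.values())


# Hypothetical 4-node examples: a ring (all degrees equal) and a star.
ring = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]

print(degree_entropy(ring))  # homogeneous degrees: zero entropy
print(degree_entropy(star))  # heterogeneous degrees: positive entropy
```

A network in which every node has the same degree has a single-valued degree distribution, so its entropy vanishes; any spread in the degrees makes the entropy positive.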

In Eq. (14), the sum is performed over all possible degrees in the network. The length of a path connecting two nodes $i$ and $j$ is given by the number of edges visited only once while going from $i$ to $j$. The paths with the minimum length connecting two nodes $i$ and $j$ are called the shortest paths. Defining the distance matrix $D$, whose elements $d_{ij}$ are equal to the length of the shortest path connecting nodes $i$ and $j$, the average shortest path length is calculated as

$$\ell = \frac{1}{n(n-1)} \sum_{i \neq j} d_{ij}. \tag{15}$$

The shortest distance between pairs of nodes can be used to quantify the centrality of a node. In this case, a possible definition of node centrality is given by the betweenness centrality [9], i.e.,

$$B_u = \sum_{ij} \frac{\sigma(i, u, j)}{\sigma(i, j)}, \tag{16}$$

where $\sigma(i, u, j)$ is the number of shortest paths connecting vertices $i$ and $j$ that pass through vertex $u$, and $\sigma(i, j)$ is the total number of shortest paths between $i$ and $j$. The sum is performed considering all pairs $i, j$ of distinct vertices. In this way, a central node should be crossed by many paths and therefore present a high value of betweenness centrality. The large-scale network organization can be characterized by the average, $\langle B \rangle$, and variance, $V(B)$, of the betweenness centrality. The probability of following the path $p(i, m)$ through a random walk is given as

$$P[p(i, m)] = \frac{1}{k_i} \prod_{j \in p(i,m)} \frac{1}{k_j - 1}, \tag{17}$$
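The path-based quantities in Eqs. (15) and (16) can be illustrated on a small graph. The following is a minimal Python sketch, not the authors' implementation. It assumes a connected, undirected, unweighted graph stored as a hypothetical adjacency-list dictionary, takes the sum in Eq. (16) over ordered pairs $(i, j)$, and excludes the endpoints ($u \neq i, j$); the text does not fix these conventions. The helper names `bfs_counts`, `average_shortest_path_length` and `betweenness` are illustrative only.

```python
from collections import deque


def bfs_counts(adj_list, source):
    """Return (dist, sigma): shortest-path lengths and shortest-path counts from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj_list[v]:
            if w not in dist:            # first visit fixes the distance
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:   # v precedes w on a shortest path
                sigma[w] += sigma[v]
    return dist, sigma


def average_shortest_path_length(adj_list):
    """Eq. (15): mean of d_ij over ordered pairs i != j (connected graph assumed)."""
    n = len(adj_list)
    total = 0
    for i in adj_list:
        dist, _ = bfs_counts(adj_list, i)
        total += sum(dist.values())      # d_ii = 0 contributes nothing
    return total / (n * (n - 1))


def betweenness(adj_list):
    """Eq. (16), summed over ordered pairs (i, j) with u different from i and j (assumed)."""
    info = {v: bfs_counts(adj_list, v) for v in adj_list}
    b = {u: 0.0 for u in adj_list}
    for i in adj_list:
        dist_i, sigma_i = info[i]
        for j in adj_list:
            if j == i:
                continue
            for u in adj_list:
                if u == i or u == j:
                    continue
                dist_u, sigma_u = info[u]
                # u lies on a shortest i-j path iff d(i,u) + d(u,j) = d(i,j);
                # then sigma(i,u,j) = sigma(i,u) * sigma(u,j).
                if dist_i[u] + dist_u[j] == dist_i[j]:
                    b[u] += sigma_i[u] * sigma_u[j] / sigma_i[j]
    return b


# Hypothetical example: the three-node path 0 - 1 - 2.
path = {0: [1], 1: [0, 2], 2: [1]}
print(average_shortest_path_length(path))
print(betweenness(path))
```

In the three-node path, the middle vertex lies on the only shortest path between the two end vertices, so under the ordered-pair convention it receives a betweenness of 2, while the end vertices receive 0. The triple loop is cubic in the number of nodes and is meant only for checking small cases.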

In Eq. (17), $k_j$ is the degree of vertex $j$ and the product includes all vertices $j$ in the path $p(i, m)$, excluding $i$ and $m$. The total information necessary to identify one of the shortest paths connecting $i$ and $m$ can be quantified by the search information [48], which is defined as

$$S(i, m) = -\log_2 \sum_{\{p(i,m)\}} P[p(i, m)], \tag{18}$$
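Eqs. (17) and (18) can be evaluated by brute force on small graphs. The sketch below is a minimal Python illustration under stated assumptions, not the authors' implementation: the graph is connected, undirected and stored as a hypothetical adjacency-list dictionary, the two vertices are distinct, and all shortest paths are enumerated explicitly, which is only feasible for small networks. The function names are illustrative.

```python
import math
from collections import deque


def shortest_paths(adj_list, source, target):
    """Enumerate all shortest paths from source to target as lists of vertices."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj_list[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)

    def back(v):
        # Walk back from v through neighbours that are one step closer to source.
        if v == source:
            return [[source]]
        return [p + [v]
                for u in adj_list[v] if dist.get(u) == dist[v] - 1
                for p in back(u)]

    return back(target)


def path_probability(adj_list, path):
    """Eq. (17): 1/k_i times the product of 1/(k_j - 1) over interior vertices j."""
    prob = 1.0 / len(adj_list[path[0]])
    for j in path[1:-1]:
        prob *= 1.0 / (len(adj_list[j]) - 1)
    return prob


def search_information(adj_list, i, m):
    """Eq. (18): -log2 of the total probability of all shortest paths from i to m."""
    total = sum(path_probability(adj_list, p) for p in shortest_paths(adj_list, i, m))
    return -math.log2(total)


# Hypothetical example: a star with centre 0 and leaves 1, 2, 3.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(search_information(star, 1, 2))  # leaf to leaf through the centre
print(search_information(star, 0, 1))  # centre to one particular leaf
```

In the star, a walker leaving leaf 1 must step to the centre and then chooses between the two remaining leaves, so the single shortest path to leaf 2 has probability 1/2 and the search information is one bit.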

In Eq. (18), the sum is taken over all shortest paths $p(i, m)$ from $i$ to $m$. The average search information is defined as [48]

$$S = \frac{1}{n^2} \sum_{im} S(i, m). \tag{19}$$

This measure characterizes the ease or difficulty of navigation in a network [48].

Networks can also present degree correlations. For instance, highly connected nodes can have a tendency to be connected to nodes of similar degrees [53]. Such correlations can be calculated as

$$A = \frac{\frac{1}{E}\sum_{j>i} k_i k_j a_{ij} - \left[\frac{1}{E}\sum_{j>i} \frac{1}{2}(k_i + k_j) a_{ij}\right]^2}{\frac{1}{E}\sum_{j>i} \frac{1}{2}(k_i^2 + k_j^2) a_{ij} - \left[\frac{1}{E}\sum_{j>i} \frac{1}{2}(k_i + k_j) a_{ij}\right]^2}, \tag{20}$$

where $E$ is the total number of edges. Note that $-1 \leq A \leq 1$. If $A > 0$, highly connected vertices tend to connect with vertices of similar degree and the network is classified as assortative. If $A < 0$, vertices of high degree tend to connect with vertices of low degree, or vice versa, and the network is called disassortative. If $A = 0$, there is no correlation between the number of connections.

8.2 Markov Chain Monte Carlo

Markov chain Monte Carlo (MCMC) methods are a class of algorithms for sampling from probability distributions based on constructing a Markov chain that has the desired distribution as its equilibrium distribution (e.g., [37]). The state of the chain after a large number of steps is used as a sample of the desired distribution. The quality of the sample improves as a function of the number of steps. Typical use of MCMC sampling can only approximate the target distribution, as there is always some residual effect of the starting position. The most common application of these algorithms is the numerical calculation of multi-dimensional integrals.

An important example is the Gibbs sampling algorithm, which is a special case of the Metropolis–Hastings algorithm. Gibbs sampling uses the fact that, given a multivariate distribution, it is simpler to sample from a conditional distribution than to marginalize by integrating over a joint distribution. Suppose we want to obtain $k$ samples of $X = \{x_1, \ldots, x_n\}$ from a joint distribution $p(x_1, \ldots, x_n)$. Denote the $i$th sample by $X^{(i)} = \{x_1^{(i)}, \ldots, x_n^{(i)}\}$. We proceed as follows:
• We begin with some initial value $X^{(0)}$ for each variable.
• For each sample $i = 1, \ldots, k$, sample each variable $x_j^{(i)}$ from the conditional distribution $p(x_j^{(i)} \mid x_1^{(i)}, \ldots, x_{j-1}^{(i)}, x_{j+1}^{(i-1)}, \ldots, x_n^{(i-1)})$. That is, sample each variable from the distribution of that variable conditioned on all other variables, making use of the most recent values and updating the variable with its new value as soon as it has been sampled.

The samples then approximate the joint distribution of all variables. Furthermore, the marginal distribution of any subset of variables can be approximated by simply examining the samples for that subset of variables, ignoring the rest. In addition, the expected value of any variable can be approximated by averaging over all the samples.
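The two steps above can be sketched for a target whose conditionals are known in closed form. The following minimal Python example is an illustration only and is not the sampler used for the regression models in this work: the target (a standard bivariate normal with correlation `rho`), the burn-in length, the seed and the function names are all assumptions chosen for the example. For this target, each full conditional is a normal distribution with mean `rho` times the other variable and variance `1 - rho**2`.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho (toy target)."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0                        # initial value X^(0), chosen arbitrarily
    cond_sd = math.sqrt(1.0 - rho * rho)   # standard deviation of each full conditional
    samples = []
    for step in range(burn_in + n_samples):
        # Each variable is drawn conditioned on the most recent value of the other.
        x = rng.gauss(rho * y, cond_sd)    # x | y ~ N(rho * y, 1 - rho^2)
        y = rng.gauss(rho * x, cond_sd)    # y | x uses the x drawn just above
        if step >= burn_in:                # discard early draws affected by X^(0)
            samples.append((x, y))
    return samples


def summarize(samples):
    """Return the sample means of x and y and their sample correlation."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxx = sum((x - mx) ** 2 for x, _ in samples)
    syy = sum((y - my) ** 2 for _, y in samples)
    sxy = sum((x - mx) * (y - my) for x, y in samples)
    return mx, my, sxy / math.sqrt(sxx * syy)


draws = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
print(summarize(draws))  # means should be near 0 and the correlation near 0.8
```

Averaging over the retained draws approximates expected values, and looking at the `x` values alone approximates the marginal distribution of `x`, as described above. Discarding an initial burn-in is a common way to reduce the residual effect of the starting position; its length here is arbitrary.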



References

1. Mitchell, M.: Complexity: A Guided Tour. Oxford University Press, Oxford (2009)
2. Amaral, L.A.N., Ottino, J.M.: Complex networks: augmenting the framework for the study of complex systems. Eur. Phys. J. B 38(2), 147–162 (2004)
3. Newman, M.: Networks: An Introduction. Oxford University Press, Oxford (2010)
4. da Costa, L.F., Oliveira, O. Jr, Travieso, G., Rodrigues, F.A., Boas, P.R.V., Antiqueira, L., Viana, M.P., Rocha, L.E.C.: Analyzing and modeling real-world phenomena with complex networks: a survey of applications. Adv. Phys. 60(3), 329–412 (2011)
5. Barabási, A.: The network takeover. Nat. Phys. 8(1), 14–16 (2011)
6. Faloutsos, M., Faloutsos, P., Faloutsos, C.: On power-law relationships of the Internet topology. In: ACM SIGCOMM Computer Communication Review, vol. 29, pp. 251–262. ACM, New York (1999)
7. Barabási, A., Oltvai, Z.: Network biology: understanding the cell’s functional organization. Nat. Rev. Genet. 5(2), 101–113 (2004)
8. Bullmore, E., Sporns, O.: Complex brain networks: graph theoretical analysis of structural and functional systems. Nat. Rev. Neurosci. 10(3), 186–198 (2009)
9. Girvan, M., Newman, M.: Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 99(12), 7821 (2002)
10. Albert, R., Jeong, H., Barabási, A.-L.: The diameter of the world wide web. Nature 401, 130–131 (1999)
11. Watts, D.J., Strogatz, S.H.: Collective dynamics of ‘small-world’ networks. Nature 393(6684), 440–442 (1998)
12. Boccaletti, S., Latora, V., Moreno, Y., Chavez, M., Hwang, D.: Complex networks: structure and dynamics. Phys. Rep. 424(4), 175–308 (2006)
13. Arenas, A., Díaz-Guilera, A., Kurths, J., Moreno, Y., Zhou, C.: Synchronization in complex networks. Phys. Rep. 469(3), 93–153 (2008)
14. Barahona, M., Pecora, L.: Synchronization in small-world systems. Phys. Rev. Lett. 89(5), 54101 (2002)
15. Nishikawa, T., Motter, A., Lai, Y., Hoppensteadt, F.: Heterogeneity in oscillator networks: are smaller worlds easier to synchronize? Phys. Rev. Lett. 91(1), 14101 (2003)
16. McGraw, P., Menzinger, M.: Clustering and the synchronization of oscillator networks. Phys. Rev. B 72(1), 015101 (2005)
17. Motter, A., Zhou, C., Kurths, J.: Enhancing complex-network synchronization. Europhys. Lett. 69, 334 (2005)
18. Erdős, P., Rényi, A.: On random graphs. Publ. Math. 6, 290–297 (1959)
19. Barrat, A., Barthélemy, M., Vespignani, A.: Dynamical Processes on Complex Networks. Cambridge University Press, Cambridge (2008)
20. Barabási, A.-L., Albert, R.: Emergence of scaling in random networks. Science 286(5439), 509–512 (1999)
21. Krapivsky, P.L., Redner, S., Leyvraz, F.: Connectivity of growing random networks. Phys. Rev. Lett. 85(21), 4629–4632 (2000)
22. Acebrón, J.A., Bonilla, L.L., Vicente, C.J.P., Ritort, F., Spigler, R.: The Kuramoto model: a simple paradigm for synchronization phenomena. Rev. Mod. Phys. 77(1), 137 (2005)
23. Fox, J.: Applied Regression Analysis, Linear Models, and Related Methods. Sage, Thousand Oaks (1997)
24. Draper, N.R., Smith, H.: Applied Regression Analysis. Wiley Series in Probability and Statistics
25. Rawlings, J.O., Pantula, S.G., Dickey, D.A.: Applied Regression Analysis: A Research Tool. Springer, Berlin (1998)
26. Montgomery, D.C., Peck, E.A., Vining, G.G.: Introduction to Linear Regression Analysis. Wiley, New York (2007)
27. Ferrari, S., Cribari-Neto, F.: Beta regression for modelling rates and proportions. J. Appl. Stat. 31(7), 799–815 (2004)
28. Kieschnick, R., McCullough, B.: Regression analysis of variates observed on (0, 1): percentages, proportions and fractions. Stat. Model. 3(3), 193–213 (2003)
29. Box, G., Cox, D.: An analysis of transformations. J. R. Stat. Soc., Ser. B, Methodol., 211–252 (1964)
30. Aitchison, J.: The Statistical Analysis of Compositional Data. Chapman & Hall, New York (1986)
31. Iyengar, M., Dey, D.: Box–Cox transformation in Bayesian analysis of compositional data. Environmetrics 9, 657–671 (1998)
32. von Toussaint, U.: Bayesian inference in physics. Rev. Mod. Phys. 83(3), 943 (2011)
33. Dose, V.: Bayesian inference in physics: case studies. Rep. Prog. Phys. 66(9), 1421 (2003)
34. Bernardo, J.M., Smith, A.F.: Bayesian Theory, vol. 405. Wiley, New York (2009)
35. Guimerà, R., Sales-Pardo, M.: Missing and spurious interactions and the reconstruction of complex networks. Proc. Natl. Acad. Sci. USA 106(52), 22073–22078 (2009)



36. Clauset, A., Moore, C., Newman, M.E.: Hierarchical structure and the prediction of missing links in networks. Nature 453(7191), 98–101 (2008)
37. Paulino, C., Turkman, M., Murteira, B.: Estatística bayesiana. Fundação Calouste Gulbenkian, Lisboa (2003)
38. Gelfand, A., Smith, A.: Sampling-based approaches to calculating marginal densities. J. Am. Stat. Assoc., 398–409 (1990)
39. Chib, S., Greenberg, E.: Understanding the Metropolis–Hastings algorithm. Am. Stat., 327–335 (1995)
40. Spiegelhalter, D., Thomas, A., Best, N., Lunn, D.: WinBUGS Version 1.4 User Manual (2003)
41. Trotta, R.: Bayes in the sky: Bayesian inference and model selection in cosmology. Contemp. Phys. 49(2), 71–104 (2008)
42. Gregory, P.: Bayesian Logical Data Analysis for the Physical Sciences, vol. 10. Cambridge University Press, Cambridge (2005)
43. Sivia, D., Skilling, J.: Data analysis: a Bayesian tutorial
44. Costa, L., Rodrigues, F., Travieso, G., Boas, P.: Characterization of complex networks: a survey of measurements. Adv. Phys. 56(1), 167–242 (2007)
45. Gamerman, D., Lopes, H.: Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference. Chapman & Hall/CRC, New York (2006)
46. Gómez-Gardeñes, J., Moreno, Y., Arenas, A.: Paths to synchronization on complex networks. Phys. Rev. Lett. 98(3), 34101 (2007)
47. Arenas, A., Díaz-Guilera, A., Pérez-Vicente, C.: Synchronization reveals topological scales in complex networks. Phys. Rev. Lett. 96(11), 114102 (2006)
48. Rosvall, M., Trusina, A., Minnhagen, P., Sneppen, K.: Networks and cities: an information perspective. Phys. Rev. Lett. 94(2), 28701 (2005)
49. Ichinomiya, T.: Frequency synchronization in random oscillator network. Phys. Rev. B 70, 026116 (2004)
50. Xulvi-Brunet, R., Sokolov, I.: Reshuffling scale-free networks: from random to assortative. Phys. Rev. B 70(6), 066102 (2004)
51. Clauset, A., Moore, C., Newman, M.: Hierarchical structure and the prediction of missing links in networks. Nature 453(7191), 98–101 (2008)
52. Solé, R.V., Valverde, S.: Information theory of complex networks: on evolution and architectural constraints. In: Lecture Notes in Physics, vol. 650, pp. 189–207 (2004)
53. Newman, M.E.J.: Mixing patterns in networks. Phys. Rev. B 67(2), 026126 (2003)