modeling the evolution of complex networks through the path-star ...

April 14, 2010

13:29

WSPC/S0218-1274

02608

International Journal of Bifurcation and Chaos, Vol. 20, No. 3 (2010) 795–804 © World Scientific Publishing Company DOI: 10.1142/S0218127410026083

MODELING THE EVOLUTION OF COMPLEX NETWORKS THROUGH THE PATH-STAR TRANSFORMATION AND OPTIMAL MULTIVARIATE METHODS

L. DA F. COSTA∗,†, F. A. RODRIGUES‡ and P. R. VILLAS BOAS∗

∗Institute of Physics at São Carlos, University of São Paulo, PO Box 369, São Carlos, São Paulo, 13560-970 Brazil
†National Institute of Science and Technology for Complex Systems
∗[email protected]
‡Departamento de Matemática Aplicada e Estatística, Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, Campus de São Carlos, Caixa Postal 668, 13560-970 São Carlos, SP, Brazil

Received November 4, 2008; Revised January 16, 2009

The topology of real-world complex networks, such as in transportation and communication, is always changing with time. Such changes can arise not only as a natural consequence of their growth, but also due to major modifications in their intrinsic organization. For instance, the network of transportation routes between cities and towns (hence locations) of a given country undergoes a major change with the progressive implementation of commercial air transportation. While the locations could be originally interconnected through highways (paths, giving rise to geographical networks), transportation between those sites progressively shifted to, or was complemented by, air transportation, with scale-free characteristics. In the present work we introduce the path-star transformation (in its uniform and preferential versions) as a means to model such network transformations where paths give rise to stars of connectivity. It is also shown, through optimal multivariate statistical methods (i.e. canonical projections and maximum likelihood classification), that while the US highways network adheres closely to a geographical network model, its path-star transformation yields a network whose topological properties closely resemble those of the respective airport transportation network.

Keywords: Complex networks; modeling; complex systems.

1. Introduction

The connectivity of complex networks representing several real-world structures has been subject to constant change as a consequence of their natural evolution [Christensen & Albert, 2007]. For instance, the small-world network of social interactions is getting larger as a consequence of human population expansion. However, in some cases the changes in a given network result from more drastic effects that alter its very pattern of connectivity. One prototypical example of such a situation is provided by transportation networks. Cities and towns (hence locations) were initially interconnected through highways, and then by air routes between the respective airports [Costa, 2004b]. While highway interconnection is implemented by paths going through several locations, and is therefore strongly affected by spatial distances and adjacency between locations (geographical organization), air routes often involve point-to-point displacement and no (or just a few) connecting stops. In addition, air routes are frequently organized from central airports, which act as hubs in the respective networks. As such, airport transportation networks


are known to exhibit scale-free topologies with respect to their degrees [Guimera et al., 2005]. While the evolution of growing complex networks is intrinsically accounted for by respective theoretical models (e.g. the Barabási–Albert scale-free model), the more drastic alterations in network evolution, such as those implied by changing paradigms (e.g. the highway/airport interaction discussed above), have received little attention in the literature. In this work we introduce the concept of the path-star transformation, in which paths give rise to stars of connections. More specifically, given a network, a path is chosen by traversing the network through a self-avoiding random walk starting from a specific node. One of the nodes along such a path is then chosen and connected directly to all other nodes along that path, implying a star of connections. Figure 1 illustrates such a situation with respect to the piece of highway (the path shown in gray) extending from Cleveland to San Antonio. We choose Nashville as pivot and connect it directly to all the other eight locations along that path, defining the star of connections centered on Nashville. Interestingly, all flight routes in this star are actually available in the real world. Typically, the path-star transformation is applied repeatedly, leading to the definition of a respective dynamical evolution. Though this transformation can be considered in order to produce a new network from a previously existing structure

Fig. 1. The path (gray) defined by the highway from San Antonio to Cleveland contains 9 cities/towns (labelled in the figure: Cleveland, Columbus, Indianapolis, Louisville, Nashville, Little Rock, Memphis, Fort Worth and San Antonio). The path-star transformation of this highway, with Nashville chosen as pivot, yields the star of connections shown in black, which actually corresponds to respective air routes in the real world.

(e.g. from highway to air transportation networks) or to complement a specific structure (e.g. in case the stars are incorporated into the original network), in the current work we consider only the former possibility. In other words, we apply several path-star transformations on the original, unchanged network, in order to derive a second, new structure containing the respective stars.

It should also be observed that the choice of paths, as well as of the respective pivots, can be performed in different ways. In this work, we obtain paths by randomly selecting one of the nodes and performing self-avoiding walks through two outgoing links until their respective terminations (i.e. reaching a node of degree 1 or a node leading only to already visited nodes). This procedure allows a more uniform sampling of the paths in the network than would be obtained by starting the self-avoiding walks preferentially from nodes with higher degree. As for the choice of the respective pivot in each path, we take into account two different strategies: (i) the pivot is chosen with uniform probability among all nodes in the path; and (ii) the pivot is chosen with probability proportional to the degree of those nodes. The second approach assumes that the importance of the nodes along the path is proportional to their degrees. This is particularly relevant for the highway-airport transformation, where locations originally more interconnected through highways tend to be more important, and thus become air route centers.

It is interesting to observe that the progressive path-star transformation of a given network conceptually resembles the rewiring procedure frequently considered in complex networks research as a means to randomize the connections of a given network while maintaining the respective degree distribution. However, the similarity goes only as far as the fact that a given network is progressively altered by a specific set of rules: the path-star transformation by no means preserves the degree distribution, and it has a clear tendency to produce hubs (related to the pivots).

The modifications performed on a network, e.g. for improving some of its specific features, can be divided into two main categories: (i) global, where the modifications are performed indiscriminately throughout the whole network; and (ii) local, in which case the modifications are restricted to specific parts of the network. Examples of global modifications are provided by the rewiring schemes which have been traditionally used in complex networks research (e.g. [Milo et al., 2003]), in which any connection may be changed. An example of


local modification is to replace a motif [Milo et al., 2002] by another, as we present in this work. A critical research issue regards the effect of the modifications on the network topological and dynamical characteristics [Wang, 2002]. For instance, the rewiring scheme in [Milo et al., 2003] changes several network features, but does not affect the degree distribution. Therefore, it is interesting to characterize as completely as possible the effects of network modifications. This can be done by considering several measurements [Costa et al., 2007], especially those which are more closely related to the expected features, as well as optimal multivariate statistical methods for the respective analysis. The path-star transformation is also potentially related to the optimization of specific aspects of a given network (e.g. average shortest path length). For instance, small-world networks are a natural choice for implementing solutions where a relatively short path can be found between any pair of nodes, which is an important feature favoring fast communication. On the other hand, geographical networks tend to present high clustering coefficients, implying a good robustness to edge attacks, and large average shortest path length (such networks are among the few structures which are not small world) [Criado et al., 2007]. An interesting possibility is, given a complex network, to try to modify it in order to improve its fitness with respect to some imposed criteria. One of the few approaches in the complex network literature addressing the latter subject has been described in [Costa, 2004b], where the effects of complementing the connectivity through several strategies were assessed with respect to improvements in the network resilience to attacks.

In order to illustrate the potential of the path-star transformation, we also apply it to the US highway transportation network in order to yield a respective estimation of the air transportation network. By considering several measurements of the respective connectivity of these networks, as well as optimal multivariate methods for statistical analysis (i.e. canonical projections and maximum likelihood classification), we show that: (a) the highway transportation network adheres closely to a geographical model; (b) the path-star transformation of the US highway transportation network up to the same average degree as the respective air transportation network yields a structure whose connectivity features closely resemble those of the latter. The fact that most of the dispersion of the measurement space


was properly captured by the first two canonical variables corroborates the statistical relevance of such findings. This article starts by summarizing the basic concepts related to complex networks in terms of measurements and models, as well as the path-star transformation. It proceeds by reporting the methodology adopted for network classification considering optimal multivariate statistical dimensionality reduction and maximum likelihood. The path-star transformation is then applied to a US highways network and the results are compared to the US air transportation network.

2. Basic Concepts and Methods

2.1. Network representation and measurements

Complex networks are discrete structures involving N nodes and E edges connecting those nodes. In this work, we focus attention on undirected networks, which can be represented by a symmetric adjacency matrix A. The presence of each edge (i, j), where i and j are any network nodes and $i \neq j$, implies A(i, j) = A(j, i) = 1, with A(i, j) = A(j, i) = 0 otherwise. Two edges (i, j) and (l, m) are adjacent whenever they share an extremity (i.e. i = l or j = l or i = m or j = m). A sequence of adjacent edges constitutes a walk. Observe that a walk can repeat nodes or edges. On the other hand, a path is a walk which never revisits any edge or node. A closed path is called a cycle. The immediate neighbors of a node i are those nodes which are distant by one edge from i. The degree of a node i, k(i), is equal to the number of its immediate neighbors, i.e. $k(i) = \sum_j A(i, j)$. The clustering coefficient of a node i quantifies how well the immediate neighbors of that node are interrelated. More specifically, if n(i) is the number of immediate neighbors of node i, then its respective clustering coefficient can be calculated as:

$$cc(i) = \frac{2e(i)}{n(i)[n(i) - 1]}, \tag{1}$$

where e(i) is the total number of undirected edges connecting the immediate neighbors of i. Though the degree and the clustering coefficient, which are traditionally adopted local measurements, are defined for each node in a network, it is also interesting to consider the average of those values as global features of the whole network.
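As an illustration of the two local measurements above, the following minimal Python sketch computes k(i) and cc(i) directly from an adjacency matrix stored as a list of lists. The function names and the toy network are hypothetical, and returning 0 for nodes with fewer than two neighbors is an assumed convention, since Eq. (1) is undefined in that case.

```python
def degree(A, i):
    """k(i): number of immediate neighbors of node i (row sum of A)."""
    return sum(A[i])


def clustering(A, i):
    """cc(i) = 2 e(i) / (n(i) [n(i) - 1]), as in Eq. (1)."""
    nbrs = [j for j, a in enumerate(A[i]) if a]
    n = len(nbrs)
    if n < 2:
        return 0.0  # assumed convention: Eq. (1) is undefined for n(i) < 2
    # e(i): edges among the neighbors of i, each undirected edge counted once
    e = sum(A[u][v] for pos, u in enumerate(nbrs) for v in nbrs[pos + 1:])
    return 2.0 * e / (n * (n - 1))


# Toy network: triangle 0-1-2 plus a pendant node 3 attached to node 0.
A = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
mean_cc = sum(clustering(A, i) for i in range(len(A))) / len(A)
```

Averaging the per-node values, as in the last line, gives the kind of global feature mentioned in the text.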



The average neighbor connectivity, $k_{nn}(i)$, measures the average degree of the neighbors of each vertex i in the network [Pastor-Satorras et al., 2001]. The average shortest path length ($\ell$) is calculated by taking into account the shortest distance between each pair of vertices in the network. The diameter of a network corresponds to the length of the longest shortest path between any pair of nodes. The assortative coefficient, r, measures the correlation between vertex degrees [Newman, 2002]. If r > 0 the network is assortative; if r < 0, the network is disassortative; for r = 0 there is no correlation between vertex degrees. While in assortative networks vertices with similar degree tend to connect to each other, in disassortative structures the highly connected nodes tend to be attached to poorly connected nodes [Newman, 2002]. Other measurements, related to centrality, are based on the betweenness centrality [Freeman, 1979; Girvan & Newman, 2002], which is defined as

$$B_u = \sum_{ij} \frac{\sigma(i, u, j)}{\sigma(i, j)}, \tag{2}$$

where σ(i, u, j) is the number of shortest paths between vertices i and j that pass through vertex u, σ(i, j) is the total number of shortest paths between i and j, and the sum is over all pairs i, j of distinct vertices. The average betweenness centrality (B) is calculated considering the whole set of vertices in the network. The central point dominance is defined in terms of the betweenness by the following equation,

$$c_D = \frac{1}{N - 1} \sum_i (B_{\max} - B_i), \tag{3}$$

where $B_{\max}$ represents the maximum betweenness found in the network. The central point dominance is zero for complete graphs and one for a star graph, in which a central vertex is included in all paths [Freeman, 1977]. Traditional complex networks measurements can be extended in a more generalized, hierarchical fashion [Costa, 2004a; Costa & Silva, 2006; Costa & Rocha, 2006; Costa & Andrade, 2007]. Hierarchical (or concentric) measurements are defined by considering the successive neighborhoods around each node. In this way, it is possible to define the ring of vertices $R_d(i)$ formed by those vertices which are distant d edges from the reference vertex i. The hierarchical degree at distance d, $k_d(i)$, is defined as the number of edges connecting the rings $R_d(i)$ and $R_{d+1}(i)$. The hierarchical clustering coefficient is calculated as the ratio between the number of edges in the respective d-ring, $m_d(i)$, and the maximum number of possible edges between the vertices in that ring, i.e.

$$cc_d(i) = \frac{2 m_d(i)}{n_d(i)[n_d(i) - 1]}, \tag{4}$$

where $n_d(i)$ represents the number of vertices in the ring $R_d(i)$. The divergence ratio at distance d of i corresponds to the ratio between the number of vertices in the ring $R_d(i)$ and the hierarchical degree at distance d − 1,

$$dv_d(i) = \frac{n_d(i)}{k_{d-1}(i)}. \tag{5}$$

In this paper, we consider up to the second hierarchical level (d = 2) for network characterization. Similarly to the degree and clustering coefficient, it is also interesting to consider the averages of the hierarchical measurements as global measurements of each network. Table 1 presents all the ten measurements considered in the present work, as well as their respective abbreviations.
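The hierarchical measurements can be sketched with a breadth-first search that groups nodes into rings $R_d(i)$. The Python fragment below is only an illustrative sketch: the function names, the adjacency-list representation and the toy network are assumptions, and returning 0 when a ratio is undefined is a convention chosen here rather than one stated in the text.

```python
from collections import deque


def rings(adj, i):
    """Return {d: set of nodes at shortest-path distance d from i} (BFS)."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    out = {}
    for v, d in dist.items():
        out.setdefault(d, set()).add(v)
    return out


def hier_degree(adj, i, d):
    """k_d(i): number of edges between rings R_d(i) and R_{d+1}(i)."""
    R = rings(adj, i)
    inner, outer = R.get(d, set()), R.get(d + 1, set())
    return sum(1 for u in inner for v in adj[u] if v in outer)


def hier_clustering(adj, i, d):
    """cc_d(i) = 2 m_d(i) / (n_d(i) [n_d(i) - 1]), as in Eq. (4)."""
    ring = rings(adj, i).get(d, set())
    n = len(ring)
    if n < 2:
        return 0.0  # assumed convention for an undefined ratio
    # each edge inside the ring is seen from both endpoints, hence // 2
    m = sum(1 for u in ring for v in adj[u] if v in ring) // 2
    return 2.0 * m / (n * (n - 1))


def divergence_ratio(adj, i, d):
    """dv_d(i) = n_d(i) / k_{d-1}(i), as in Eq. (5)."""
    n = len(rings(adj, i).get(d, set()))
    k_prev = hier_degree(adj, i, d - 1)
    return n / k_prev if k_prev else 0.0


# Toy network (hypothetical): rings around node 0 are {0}, {1, 2}, {3, 4}.
adj = {0: [1, 2], 1: [0, 3, 4], 2: [0, 4], 3: [1, 4], 4: [1, 2, 3]}
```

For node 0 of the toy network, three edges join the first and second rings, so hier_degree(adj, 0, 1) is 3 and divergence_ratio(adj, 0, 2) is 2/3.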

Table 1. The measurements of the network connectivity considered in the present work and their respective abbreviations.

Measurement                                    Abbreviation
Maximum degree                                 kmax
Average clustering coefficient                 cc
Average neighbor connectivity                  knn
Average shortest path length                   ℓ
Assortative coefficient                        r
Average betweenness centrality                 B
Central point dominance                        cD
Average hier. degree of level two              hk2
Average hier. clustering coeff. of level two   hcc2
Average divergence ratio of level two          dv2

2.2. Complex networks models

In the current work, we take into account six network models. The first, the Erdős–Rényi random graph (ER), is constructed by connecting each pair of vertices in the network with a fixed probability p [Erdős & Rényi, 1959], where each pair of vertices (i, j) is selected at random only once. This model generates a Poisson degree distribution. The second model, the small-world model of Watts and Strogatz (WS), is generated starting with a regular lattice of N vertices in which each vertex is connected to κ nearest neighbors in each direction. Each edge is then randomly rewired with probability p [Watts & Strogatz, 1998]. The third, the Waxman geographical model (GN), is constructed by distributing N vertices at random in a 2D space and connecting them according to geographical distance, the probability of connecting two vertices i and j separated by a distance $D_{ij}$ being $P(i \to j) \sim e^{-\lambda D_{ij}}$ [Waxman, 1988]. The fourth, the Barabási–Albert scale-free model (BA), is generated by starting with a set of $m_0$ vertices; at each time step, the network grows with the addition of a new vertex with m links. The vertices which receive the new edges are chosen following a linear preferential attachment rule, i.e. the probability of the new vertex i connecting to an existing vertex j is proportional to the degree of j, $P(i \to j) = k_j / \sum_u k_u$ [Barabási & Albert, 1999]. Similarly, the fifth, the limited scale-free model (LSF), is generated as in the BA model, but the maximum degree is limited so as to be equal to that of the real network [Amaral et al., 2000]. The last model, the nonlinear preferential network model (NLSF), is constructed as in the BA model, but instead of a linear preferential attachment rule, the vertices are connected following a nonlinear preferential attachment rule, i.e. $P(i \to j) = k_j^{\alpha} / \sum_u k_u^{\alpha}$. In this case, for α < 1 the network has a stretched exponential degree distribution, whereas for α > 1 a single site connects to nearly all other sites [Krapivsky et al., 2000]. In the current paper, we considered α = 0.5 and α = 1.5 in the NLSF model.
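The growth rule shared by the BA and NLSF models can be sketched as follows. This is a hedged illustration, not the generator used in the paper: the function name is hypothetical, the seed graph is assumed to be a complete graph on m + 1 vertices (the text only states that growth starts from $m_0$ vertices), and the m targets of each new vertex are drawn one at a time without repetition.

```python
import random


def nonlinear_pa(n, m, alpha, seed=None):
    """Grow a network of n nodes; each new node links to m distinct existing
    nodes j drawn with probability proportional to k_j ** alpha.
    alpha = 1 gives the linear (BA-like) rule."""
    rng = random.Random(seed)
    m0 = m + 1  # assumption: complete seed graph, so every degree is nonzero
    edges = {(i, j) for i in range(m0) for j in range(i + 1, m0)}
    deg = [m] * m0  # each node of the complete seed graph has degree m
    for new in range(m0, n):
        targets = set()
        while len(targets) < m:
            cands = [j for j in range(new) if j not in targets]
            weights = [deg[j] ** alpha for j in cands]
            targets.add(rng.choices(cands, weights=weights, k=1)[0])
        for j in targets:
            edges.add((j, new))
            deg[j] += 1
        deg.append(m)  # the new node enters with m links
    return edges, deg


edges, deg = nonlinear_pa(50, 2, 1.5, seed=1)
```

With n = 50 and m = 2, the sketch yields 3 seed edges plus 2 edges for each of the 47 added nodes, whatever the value of α.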

2.3. The path-star transformation

The path-star transformation maps a path into a respective star by considering self-avoiding random walk dynamics. More specifically, a path is identified in the network by a self-avoiding random walk, a pivot vertex is chosen along it, and the connections among the nodes in the path are replaced so that the pivot becomes directly connected to all the remaining nodes along the original path. The connections are established in a new network, while the original network is maintained. Two versions of the path-star transformation can be considered: (i) the uniform path-star transformation (UPST), where the pivot is chosen uniformly at random among the nodes in the original path; and (ii) the preferential path-star transformation (PPST), where the pivot vertex is selected following a preferential attachment probability based on the degree of each of the nodes along the path (as in the BA model). An example of path-star transformation is provided in Fig. 1, where the node “Nashville” was chosen as pivot.
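The procedure can be sketched in Python as below. This is a minimal illustration under stated assumptions: all function names are hypothetical, the self-avoiding walk is grown from both ends of the starting node until it gets stuck (one possible reading of the walk described in the Introduction), PPST weights use the degrees in the original network, and a max_steps guard is added so that an unreachable target average degree cannot loop forever.

```python
import random


def self_avoiding_path(adj, start, rng):
    """Self-avoiding random walk grown from both ends of `start` until
    neither end has an unvisited neighbor (assumed interpretation)."""
    path, visited = [start], {start}
    for end in (-1, 0):  # grow the tail first, then the head
        while True:
            free = [v for v in adj[path[end]] if v not in visited]
            if not free:
                break
            nxt = rng.choice(free)
            visited.add(nxt)
            if end == -1:
                path.append(nxt)
            else:
                path.insert(0, nxt)
    return path


def choose_pivot(adj, path, rng, preferential):
    """UPST: uniform choice; PPST: probability proportional to degree."""
    if preferential:
        weights = [len(adj[v]) for v in path]
        return rng.choices(path, weights=weights, k=1)[0]
    return rng.choice(path)


def path_star(adj, target_mean_degree, preferential=False, seed=None,
              max_steps=100_000):
    """Build a NEW edge set of stars; the original network is left unchanged."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    star_edges = set()
    for _ in range(max_steps):
        if 2 * len(star_edges) / len(nodes) >= target_mean_degree:
            break
        path = self_avoiding_path(adj, rng.choice(nodes), rng)
        if len(path) < 2:
            continue  # isolated start node: no star can be formed
        pivot = choose_pivot(adj, path, rng, preferential)
        star_edges.update(
            (min(pivot, v), max(pivot, v)) for v in path if v != pivot
        )
    return star_edges


# Toy network (hypothetical): a 6-cycle with one chord between nodes 0 and 3.
adj = {0: [1, 5, 3], 1: [0, 2], 2: [1, 3], 3: [2, 4, 0], 4: [3, 5], 5: [4, 0]}
stars = path_star(adj, target_mean_degree=2.0, preferential=True, seed=0)
```

Stopping on the average degree mirrors the criterion used later in the paper, where the transformed highway network is grown until it matches the average degree of the air transportation network.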

2.4. Networks classification

The set of M measurements obtained for a complex network under analysis can be represented in terms of a feature vector in the $\mathbb{R}^M$ space. The so-defined M-dimensional space is frequently called the measurement space adopted in the specific investigation. Therefore, each distinct network is mapped into a point, defined by the respective feature vector, in the measurement space. Groups of points (also called clusters) appearing in such spaces indicate possible categories of networks exhibiting similar topological properties. Networks which are mapped into relatively distant points can be understood to exhibit substantial differences in their topology (e.g. [Costa et al., 2007]). In order to identify the most likely model for a given network, we considered an optimal multivariate method for dimensionality reduction (involving the maximization of the separation between the given categories), followed by a maximum likelihood (also called Bayesian) decision procedure [Duda et al., 2000]. These two optimal multivariate methods are described in the following respective subsections.

3. Canonical Variable Analysis

Because the number of adopted measurements M is frequently larger than 2 or 3, it becomes impossible to visualize the distribution of the networks when mapped into the M-dimensional measurement space. Fortunately, it is possible to use optimal stochastic projections in order to reduce the dimensionality of such spaces while maximizing the separation between the known categories. In the present work we consider the canonical projection approach (e.g. [Costa et al., 2007]), which provides a powerful extension of principal component analysis [Johnson & Wichern, 1998]. In canonical discriminant analysis [Campbell & Atchley, 1981; Costa & Cesar Jr., 2001], linear combinations of the original variables, the so-called canonical variables, are determined in such a way that the distances between groups (network models in our approach) are maximized relative to the dispersion within groups (i.e. between the networks generated by each respective model). This projection allows the data to be represented in two or three dimensions, defined respectively by



the first canonical variables. In order to perform the canonical analysis, it is necessary to construct a matrix which quantifies the variation inside the previously defined groups, and a second matrix which quantifies the variation among these groups. If we consider C classes (network models), each one identified as $C_i$, i = 1, ..., C, and that each network realization n is represented by its respective feature vector $x_n = (x_1, x_2, \ldots, x_p)^T$, the intraclass scatter matrix is defined as

$$S_{\mathrm{intra}} = \sum_{i=1}^{C} \sum_{n \in C_i} (x_n - \langle x \rangle_i)(x_n - \langle x \rangle_i)^T, \tag{6}$$

and the interclass scatter matrix is given as

$$S_{\mathrm{inter}} = \sum_{i=1}^{C} N_i (\langle x \rangle_i - \langle x \rangle)(\langle x \rangle_i - \langle x \rangle)^T, \tag{7}$$

where $\langle x \rangle_i$ collects the averages of the variables for the class i, $\langle x \rangle$ collects the general averages of the variables over all classes, and $N_i$ is the number of networks in class $C_i$. By computing the eigenvectors of the matrix $S_{\mathrm{intra}}^{-1} S_{\mathrm{inter}}$ and selecting those corresponding to the highest absolute eigenvalues, $\lambda_1, \ldots, \lambda_p$, it is possible to project the set of variables into fewer dimensions (usually 2 or 3, depending on the number of highest eigenvalues considered) while maximizing the separation between the given categories. The transformation of the original data in the measurement space into the 2D space was obtained in terms of the inner products between the measurement vectors and the two eigenvectors corresponding to the highest eigenvalues.
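The projection can be sketched with NumPy as follows. This is an illustrative sketch rather than the authors' implementation: the function name is hypothetical, the pseudo-inverse is used in place of the plain inverse as a safeguard against a singular intraclass matrix, and the synthetic two-class data exist only to exercise the code.

```python
import numpy as np


def canonical_projection(X, labels, n_components=2):
    """Project the rows of X onto the leading eigenvectors of
    S_intra^{-1} S_inter (Eqs. (6) and (7))."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    p = X.shape[1]
    mean_all = X.mean(axis=0)
    S_intra = np.zeros((p, p))
    S_inter = np.zeros((p, p))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mean_c = Xc.mean(axis=0)
        D = Xc - mean_c
        S_intra += D.T @ D                      # Eq. (6)
        d = (mean_c - mean_all)[:, None]
        S_inter += len(Xc) * (d @ d.T)          # Eq. (7)
    # pinv instead of inv: an assumption of this sketch, not of the paper
    evals, evecs = np.linalg.eig(np.linalg.pinv(S_intra) @ S_inter)
    order = np.argsort(-np.abs(evals))[:n_components]
    W = np.real(evecs[:, order])                # projection matrix
    return X @ W, W, np.real(evals[order])


# Synthetic example: two classes that differ only along the first variable.
rng = np.random.default_rng(0)
X0 = rng.normal([0.0, 0.0, 0.0], [0.1, 1.0, 1.0], size=(30, 3))
X1 = rng.normal([1.0, 0.0, 0.0], [0.1, 1.0, 1.0], size=(30, 3))
X = np.vstack([X0, X1])
y = np.array([0] * 30 + [1] * 30)
Z, W, ev = canonical_projection(X, y, n_components=1)
```

The returned matrix W can be reused to project a network that was not part of the training set, which is how a real-world network is placed in the same 2D space as the model realizations.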

4. Maximum Likelihood Decision Theory

After obtaining the measurements and their optimal dimensionality reduction, the regions of classification can be determined for each complex network model by using decision theory. Such a methodology, called supervised learning, assumes that the network models are known a priori, while the unknown network is assigned to the model implying the largest respective probability density amongst all considered models [Costa et al., 2007]. In this work, the maximum likelihood decision is performed by taking into account nonparametric estimation [Duda et al., 2000] of the probability densities of each category in the measurement space, which allows the identification of optimal decisions in situations involving uncertainty [Bishop, 2006], minimizing the chance of wrong classifications. The mass probabilities $P_i$, which correspond to the probability that a network belongs to class $C_i$, are taken into account jointly with the conditional probability densities, $p(x_n | C_i)$, which are here estimated by nonparametric methods (see [Costa & Cesar Jr., 2001; Duda et al., 2000]). The decision rule can be expressed as: if $P(x_n | C_a) P(C_a) = \max_{b=1,\ldots,m} \{P(x_n | C_b) P(C_b)\}$ then select $C_a$, where $x_n$ is the vector that stores the set of measurements of the network and $C_a$ is the class of networks associated with the model a. Figure 2 presents an example of the application of the Bayes rule with respect to a 1D measurement space. Computationally, each point in the 2D space is considered as a Dirac delta function. Next, these functions are convolved with a normalized Gaussian function in order to obtain, from their sum, the conditional probabilities (see the 1D case in Fig. 2). Further details about such an approach can be found in [Costa et al., 2007; McLachlan et al., 1992; Duda et al., 2000].
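The decision rule with Gaussian-smoothed densities can be sketched in a few lines of Python. The names, the isotropic kernel, the bandwidth h = 0.5 and the toy samples are all assumptions made for illustration; when no priors are given, the sketch estimates the mass probabilities from the class sizes.

```python
import math


def kde(points, x, h):
    """Density at x from 2D points, each replaced by a normalized isotropic
    Gaussian of width h (the convolved Dirac deltas described above)."""
    norm = 1.0 / (2.0 * math.pi * h * h * len(points))
    return norm * sum(
        math.exp(-((x[0] - p[0]) ** 2 + (x[1] - p[1]) ** 2) / (2.0 * h * h))
        for p in points
    )


def classify(x, samples, priors=None, h=0.5):
    """Return the class a maximizing p(x | C_a) P(C_a).
    samples: {class_name: [(z1, z2), ...]} in the projected 2D space."""
    if priors is None:
        total = sum(len(pts) for pts in samples.values())
        priors = {c: len(pts) / total for c, pts in samples.items()}
    scores = {c: kde(pts, x, h) * priors[c] for c, pts in samples.items()}
    return max(scores, key=scores.get)


# Toy clouds of projected model realizations (hypothetical values).
samples = {
    "GN": [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.3)],
    "BA": [(3.0, 3.0), (3.2, 2.9), (2.8, 3.1)],
}
label = classify((0.1, 0.1), samples)
```

Evaluating classify on a grid of points and recording where the winning class changes would trace decision boundaries such as those drawn in the classification figures.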

4.1. Highway and air transportation networks construction

The US air transportation network is composed of cities/towns connected by airlines. We considered the Pajek database [Batagelj & Mrvar, 2006], excluding the cities in Alaska and on islands such as Puerto Rico and Hawaii, because we are interested only in cities that can be reached by highways. The obtained network is composed of 244 vertices and 1896 edges. This network presents a degree distribution following a power law. From the

Fig. 2. Example of maximum likelihood classification in a 1D space, showing the weighted densities P(x|C1)P(C1) and P(x|C2)P(C2) and the respective decision regions R1 and R2. The boundary of the decision regions corresponding to classes C1 and C2 is indicated by the dashed line.



cities obtained in the US air transportation network, we obtained their highway links from US maps available on the World Wide Web. The resulting network, which has 244 vertices connected by 472 highways, does not follow a power-law degree distribution [Gastner & Newman, 2006].

5. Results and Discussion

In order to investigate the path-star transformation in real-world networks, we considered the US air transportation network as well as its respective highway network. Initially, we determined which of the considered theoretical models best reproduces the highway topological properties. We considered the set of measurements described in Sec. 2, i.e. {kmax, cc, knn, ℓ, r, B, cD, hk2, hcc2, dv2}. Then, we generated 50 realizations of each of the models presented in Sec. 2.2, i.e. GN, ER, WS, BA, LSF, NLSF with α = 0.5 and NLSF with α = 1.5, and calculated the measurements for each network realization. The parameters of each model were defined so as to obtain the same average degree as the real-world network. In the case of the WS model, the value of p was also determined so as to generate a network with the same average clustering coefficient as the real network. The set of


measurements was standardized in order to present zero mean and standard deviation equal to one. The standardization of a random variable consists of subtracting its respective average and dividing by the standard deviation [Costa & Cesar Jr., 2001; Costa et al., 2007]. In this way, each of the networks was represented by a vector of 10 measurements, which describes its respective topological properties. Next, we performed the projection of the networks into the 2D space by considering the canonical variable analysis (see Sec. 3). For each network model, represented by a respective cloud of points in the projected 2D space, we obtained the conditional probability by the nonparametric fitting procedure described in Sec. 4, therefore obtaining the respective classification regions. The highway network was then projected into this space by using the same projection matrix determined in the canonical analysis, and we verified that the GN theoretical model generates the networks whose topological properties are most similar to those of the real-world highway network (see Fig. 3). In this way, cities tend to be connected according to geographic distance, i.e. the probability that two cities i and j are connected would be $P(i \to j) \sim e^{-\lambda D_{ij}}$. Table 2 presents the comparison between the measurements obtained for the real-world highway network and the GN theoretical model. The topological properties of the networks

Fig. 3. The classification of the highway network after the application of canonical variable analysis and maximum likelihood decision in order to obtain the classification spaces. The highway network was classified as geographical (GN), which indicates that the connections between cities are established mainly based on the geographical distance between cities. The considered models are: + GN, × ER, ⊕ BA, WS, ♦ LSF, NLSF (α = 0.5) and NLSF (α = 1.5). The highway network is represented by (an arrow indicates the position). The lines correspond to the boundary of the separation regions.



Table 2. Average and standard deviation of each of the network measurements taken into account for characterization of the US highways network and the network resulting from the Waxman geographic model (GN). Each measurement was calculated taking into account 50 realizations of each model.

Measurement   Highway   GN network model
kmax          11        11.5 ± 1.0
cc            0.23      0.15 ± 0.02
knn           4.9       4.5 ± 0.2
ℓ             5.8       7.6 ± 1.0
r             0.10      0.28 ± 0.06
B             0.19      0.17 ± 0.11
cD            0.027     0.015 ± 0.008
hk2           0.082     0.097 ± 0.010
hcc2          0.16      0.11 ± 0.01
dv2           0.82      0.78 ± 0.01

resulting from the GN model are close to those of the highway network even when all original measurements are considered, as shown in Table 2. Next, we investigated which model best reproduces the US air transportation topology. In this case, in addition to the models considered for the highway network classification, we also took into account the path-star transformation in its uniform (UPST) and preferential (PPST) versions. So, the considered models included GN, ER, WS, BA, LSF, NLSF with α = 0.5, NLSF with α = 1.5, and the networks resulting from the UPST and PPST. The original real-world highway transportation network was transformed through several path-star transformations (uniform or preferential), which were performed until the transformed network reached the same average degree as the real-world air transportation network. Because the choice of the paths and respective pivots is stochastic, we performed 50 realizations each of the uniform and the preferential transformation of the highway transportation network. The classification methodology was then applied in a similar fashion as in Sec. 4. As shown in Fig. 4, the preferential path-star transformation (PPST) is the model that resulted in topological properties (quantified by the measurements) most similar to those of the original real-world air transportation network. Table 3 presents the comparison between the measurements obtained for the networks resulting from the PPST, the UPST and the original air transportation network. It is interesting to note that the uniform path-star transformation (UPST) is also reasonably close to the real network in that measurement space. Although the measurement values are not identical, the similarities

Fig. 4. The classification of the US air transportation network. The networks obtained from the path-star transformation with preferential attachment applied on the highway network are the best that reproduce the air transportation topology. The considered models are: + GN, × ER, ⊕ BA, WS, ♦ LSF, NLSF (α = 0.5) and NLSF (α = 1.5). The networks resulting from the uniform path-star transformation (UPST) are represented by and for the preferential path-star transformation (PPST), by ◦. The air transportation network is represented by (an arrow indicates the position). The lines correspond to the boundary of the separation regions.

Table 3. Average and standard deviation of network measurement for the US air transportation network and the networks resulting from the preferential path-star transformation (PPST) and from the uniform path-star transformation (UPST). Each measurement was calculated taking into account 50 realizations of each model.

Measurement   Airports   PPST networks      UPST networks
kmax          124        93 ± 13            81 ± 12
cc            0.65       0.52 ± 0.02        0.37 ± 0.02
knn           55         40 ± 3             34 ± 2
              2.18       2.19 ± 0.04        2.25 ± 0.04
r             −0.28      −0.32 ± 0.03       −0.33 ± 0.04
B             0.18       0.12 ± 0.03        0.10 ± 0.03
cD            0.0057     0.0054 ± 0.0003    0.0056 ± 0.0002
hk2           1.45       1.67 ± 0.10        1.74 ± 0.73
hcc2          0.17       0.10 ± 0.04        0.09 ± 0.01
dv2           0.53       0.48 ± 0.01        0.47 ± 0.01

observed in this table suggest that the path-star transformation is, in principle, related to the way in which air routes are established between cities.

The projection of the networks into the 2D space can result in loss of information. Nevertheless, we observed that the two main axes, defined by the first two canonical variables, account for almost all of the variation in the measurements. For the highway network, the first four standard deviations are 1.45, 1.27, 0.27, and 0.27; for the US air transportation network, they are 1.38, 0.43, 0.30, and 0.04.
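As a rough check of how much variation the 2D projection retains, the short sketch below squares the standard deviations quoted above and computes the share carried by the two largest. It assumes that variance is the square of each reported standard deviation and that only the four reported values enter the total, so the result is a share relative to those four values and not necessarily to all canonical variables. The function name `top_two_share` is illustrative and does not come from the paper.

```python
def top_two_share(std_devs):
    """Fraction of the summed variance carried by the two largest values.

    Assumption: variance = (standard deviation) ** 2, and only the
    values passed in are counted in the total.
    """
    variances = sorted((s * s for s in std_devs), reverse=True)
    return sum(variances[:2]) / sum(variances)


# First four standard deviations reported in the text.
highway = [1.45, 1.27, 0.27, 0.27]
air = [1.38, 0.43, 0.30, 0.04]

print(f"highway: {top_two_share(highway):.3f}")
print(f"air:     {top_two_share(air):.3f}")
```

Under these assumptions both shares come out at roughly 96%, which is consistent with the statement that the first two canonical variables account for almost all of the variation.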

6. Concluding Remarks

The path-star transformation has been suggested in this work as a means to model the evolution of complex networks in which existing paths motivate the creation of respective stars of connectivity. The potential of this approach, complemented by optimal multivariate methods for dimensionality reduction and decision theory, has been demonstrated with respect to the US highway and air transportation networks. First, we showed that the real-world highway transportation network has properties substantially close to those of the respective geographical models with the same number of nodes and average degree. Then, we transformed this highway network by using path-star transformations (with uniform and preferential choice of pivots along each path) until the resulting network had the same average degree as the real-world air transportation network. The original air transportation network, as well as the path-star transformed versions of the highway network, were then compared among themselves as well as with

several other theoretical models from the complex networks literature. The results clearly indicate that, while none of the latter models adheres reasonably to the real-world air transportation network, the path-star transformed versions of the original highway network have properties substantially close to those of the real-world air transportation network, with the preferential model yielding the best adherence. These results suggest that the path-star transformations of the original highway transportation network provide a model of how air transportation routes are established while considering the previously existing highway transportation structure. Prospects for future work include the consideration of geographical features and population densities when defining the air routes.
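The procedure summarized above can be illustrated with a minimal sketch. It makes several assumptions that are not taken from this section: each path is a shortest path between two nodes drawn uniformly at random, the preferential version picks the pivot with probability proportional to its current degree, and the star is formed by linking the pivot to every other node of the path. The formal definition given earlier in the paper takes precedence over this sketch, and all function names here are hypothetical.

```python
import random
from collections import deque


def shortest_path(adj, source, target):
    """Breadth-first search; returns one shortest path as a node list, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nb in adj[node]:
            if nb not in parent:
                parent[nb] = node
                queue.append(nb)
    return None


def average_degree(adj):
    return sum(len(nbs) for nbs in adj.values()) / len(adj)


def path_star_step(adj, rng, preferential):
    """Pick a path and a pivot, then link the pivot to all other path nodes."""
    a, b = rng.sample(sorted(adj), 2)   # assumed: random endpoints
    path = shortest_path(adj, a, b)     # assumed: shortest path between them
    if path is None:
        return
    if preferential:
        # Assumed rule: pivot chosen with probability proportional to degree.
        weights = [len(adj[v]) for v in path]
        pivot = rng.choices(path, weights=weights, k=1)[0]
    else:
        pivot = rng.choice(path)        # uniform choice of pivot
    for v in path:
        if v != pivot:
            adj[pivot].add(v)
            adj[v].add(pivot)


def path_star_transform(adj, target_avg_degree, preferential=False,
                        seed=None, max_steps=100_000):
    """Repeat path-star steps on a copy until the average degree is reached."""
    rng = random.Random(seed)
    adj = {v: set(nbs) for v, nbs in adj.items()}  # leave the input untouched
    steps = 0
    while average_degree(adj) < target_avg_degree and steps < max_steps:
        path_star_step(adj, rng, preferential)
        steps += 1
    return adj


# Toy example: a ring of 20 nodes (average degree 2) transformed to average degree 4.
ring = {i: {(i - 1) % 20, (i + 1) % 20} for i in range(20)}
star_like = path_star_transform(ring, 4.0, preferential=True, seed=1)
print(average_degree(ring), average_degree(star_like))
```

The stopping rule mirrors the one described in the text, namely matching the average degree of the target network. The `max_steps` guard is an addition of this sketch, included only to prevent an endless loop when the target cannot be reached.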

Acknowledgments

Luciano da F. Costa is grateful to FAPESP (05/00587-5) and CNPq (301303/06-1 and 573583/20080) for financial support. Francisco A. Rodrigues acknowledges FAPESP sponsorship (07/50633-9), and Paulino R. Villas Boas acknowledges FAPESP sponsorship (08/53721-9).

References

Amaral, L. A. N., Scala, A., Barthélémy, M. & Stanley, H. E. [2000] “Classes of small-world networks,” Proc. Natl. Acad. Sci. USA 97, 11149–11152.
Barabási, A.-L. & Albert, R. [1999] “Emergence of scaling in random networks,” Science 286, 509–512.
Batagelj, V. & Mrvar, A. [2006] Pajek Datasets, http://vlado.fmf.uni-lj.si/pub/networks/data.
Bishop, C. M. [2006] Pattern Recognition and Machine Learning (Information Science and Statistics) (Springer-Verlag, NY).
Campbell, N. A. & Atchley, W. R. [1981] “The geometry of canonical variate analysis,” Syst. Zool. 30, 268–280.
Christensen, C. & Albert, R. [2007] “Using graph concepts to understand the organization of complex systems,” Int. J. Bifurcation and Chaos 17, 2201–2214.
Costa, L. da F. & Cesar Jr., R. M. [2001] Shape Analysis and Classification: Theory and Practice (CRC Press).
Costa, L. da F. [2004a] “The hierarchical backbone of complex networks,” Phys. Rev. Lett. 93, 98702.
Costa, L. da F. [2004b] “Reinforcing the resilience of complex networks,” Phys. Rev. E 69, 66127.
Costa, L. da F. & Rocha, L. E. C. [2006] “A generalized approach to complex networks,” Eur. Phys. J. B 50, 237–242.

Costa, L. da F. & Silva, F. N. [2006] “Hierarchical characterization of complex networks,” J. Stat. Phys. 125, 841–872.
Costa, L. da F. & Andrade, R. F. S. [2007] “What are the best concentric descriptors for complex networks?” New J. Phys. 9, 311.
Costa, L. da F., Rodrigues, F. A., Travieso, G. & Villas Boas, P. R. [2007] “Characterization of complex networks: A survey of measurements,” Adv. Phys. 56, 167–242.
Criado, R., Hernandez-Bermejo, B. & Romance, M. [2007] “Efficiency, vulnerability and cost: An overview with applications to subway networks worldwide,” Int. J. Bifurcation and Chaos 17, 2289.
Duda, R. O., Hart, P. E. & Stork, D. G. [2000] Pattern Classification (Wiley-Interscience).
Erdős, P. & Rényi, A. [1959] “On random graphs,” Publ. Math.-Debr. 6, 290–297.
Freeman, L. C. [1977] “A set of measures of centrality based on betweenness,” Sociometry 40, 35–41.
Freeman, L. C. [1979] “Centrality in social networks: Conceptual clarification,” Social Networks 1, 215–239.
Gastner, M. & Newman, M. [2006] “The spatial structure of networks,” Eur. Phys. J. B 49, 247–252.
Girvan, M. & Newman, M. E. J. [2002] “Community structure in social and biological networks,” Proc. Natl. Acad. Sci. USA 99, 7821–7826.
Guimera, R., Mossa, S., Turtschi, A. & Amaral, L. [2005] “The worldwide air transportation network: Anomalous centrality, community structure, and cities’ global roles,” Proc. Natl. Acad. Sci. USA 102, 7794–7799.
Johnson, R. A. & Wichern, D. W. [1998] Applied Multivariate Statistical Analysis (Prentice Hall).
Krapivsky, P. L., Redner, S. & Leyvraz, F. [2000] “Connectivity of growing random networks,” Phys. Rev. Lett. 85, 4629–4632.
McLachlan, G., Wiley, J. & InterScience, W. [1992] Discriminant Analysis and Statistical Pattern Recognition (Wiley, NY).
Milo, R., Kashtan, N., Itzkovitz, S., Newman, M. E. J. & Alon, U. [2003] Uniform Generation of Random Graphs with Arbitrary Degree Sequences (cond-mat/0312028).
Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D. & Alon, U. [2002] “Network motifs: Simple building blocks of complex networks,” Science 298, 824–827.
Newman, M. E. J. [2002] “Assortative mixing in networks,” Phys. Rev. Lett. 89, 208701.
Pastor-Satorras, R., Vázquez, A. & Vespignani, A. [2001] “Dynamical and correlation properties of the internet,” Phys. Rev. Lett. 87, 258701.
Wang, X. [2002] “Complex networks: Topology, dynamics and synchronization,” Int. J. Bifurcation and Chaos 12, 885–916.
Watts, D. J. & Strogatz, S. H. [1998] “Collective dynamics of small-world networks,” Nature 393, 440–442.
Waxman, B. M. [1988] “Routing of multipoint connections,” IEEE J. Sel. Areas Commun. 6, 1617–1622.