Decentralized algorithms for evaluating centrality in complex networks

0 downloads 0 Views 2MB Size Report
emphasis on the algorithm of the betweenness centrality. The betweenness centrality is the most complex measure and best suited for describing network ...
SUBMITTED TO IEEE

1

Decentralized algorithms for evaluating centrality in complex networks Katharina A. Lehmann, Michael Kaufmann

Abstract— Centrality indices are often used to analyze the functionality of nodes in a communication network. Up to date most analyses are done on static networks where some entity has global knowledge of the networks properties. To expand the scope of these analyzing methods to decentral networks we will propose here a general framework for decentral algorithms that calculate centrality indices. We will describe variants of the general algorithm to calculate four different centralities, with emphasis on the algorithm of the betweenness centrality. The betweenness centrality is the most complex measure and best suited for describing network communication based on shortest paths and predicting the congestion sensitivity of a network. The communication complexity of this latter algorithm is asymptotically optimal and the time complexity scales with the diameter of the network. The calculated centrality index can be used to adapt the communication network to given constraints and changing demands such that the relevant properties like the diameter of the network or uniform distribution of energy consumption is optimized. Index Terms— Centrality Indices, distributed algorithm, Selfdiagnostic networks.

I. I NTRODUCTION INCE the first days of telecommunications the world has changed dramatically. After a long period where a letter would take days to be delivered the invention of cable and telephone made instant communication possible and achievable for many people. To grant this service a widespread network of cables had to be built that is being remodeled periodically. However, due to the tremendous costs it had never been possible to rebuild the whole hardwired system despite the fact that the demands on communication have changed drastically in the last decades. In contrast to this wireless communication networks and fast changing peer-to-peer networks give the ability to choose the appropriate network architecture. Power efficient and stable networks are accomplishable if this architecture is dynamically changed, according to the demands of the user. But up to date most protocols for peer-to-peer-networks just randomly choose the set of peers with which data is shared directly. Due to this, the resulting communication network has random features that hamper e.g. efficient searching algorithms. A first attempt to overcome this random character of the network has been proposed by Ng and Sia [3]: every

S

This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. The authors are with Parallel Computing, Wilhelm-Schickard-Institut f¨ur Informatik, Eberhard-Karls-Universit¨at T¨ubingen, Sand 14, 72076 T¨ubingen, Germany. Email: lehmannk/[email protected]

participant of the peer-to-peer network holds a list of the kind of data packages he is interested in. A new participant chooses some entry point and then walks randomly through the existing network to find persons with similar interests. Then the participant chooses some of them for building up so called quality links and additionally some random links to guarantee the connectivity of the whole network. By this procedure every participant will be clustered into a subnet of persons with similar interests. A fast searching algorithm is now able to use the long random links in areas that are not likely to contain the required data and to switch to a broadcasting protocol in subclusters of persons with similar interests to the query. The appropriate network architecture will also be extremely important in so called wireless multi-hop ad hoc communication networks. In classic mobile networks communication packets are routed over a centralized backbone system of basestations while in multi-hop networks the mobile devices have the routing functionalities. Each device is able to communicate directly with a part of the adjacent neighbors and provides routing services for communication between non-adjacent devices. There are two strict constraints that have to be considered: One is the limitation of resources such as energy or bandwidth, the second is the lack of any global information. To generate stable and efficient network architectures special algorithms have to be developed that are working decentrally and allow self-organization of the system. In [4] the following local organisation rule is proposed: Every new node will send a so called hello messages to its direct neighbors within a specified transmission radius to get connected to them. If the number of neighbors is too small within this radius it enhances its transmission power until a given minimal number k of neighbors is achieved. Every node receiving a hello message from a new node is forced to answer with a hello-reply message even if it already has the minimal number of neighbors. We will refer to this model in the following as k-next-neighbors model. The resulting network will be connected for reasonable high with high probability and most devices will not have many more than k neighbors. Both introduced decentral approaches generate quite functional networks but the main problem is that they tend to disfavour some nodes in a severe manner. In the first model commonly known entry points will have a higher probability of gaining too many links and in the k-nearest-neighbors model the special geometry of the network is disregarded. Since the network results in a grid like architecture routing on approximately shortest paths will affect inner nodes much more than nodes on the border of the network. That is, energy 

0000–0000/00$00.00 c 2002 IEEE

SUBMITTED TO IEEE

consumption is not uniformly distributed but allows higher demands in some regions while relieving others unproportionally (s. Fig. 6(a)). To overcome this situation the evaluation of the current network architecture is needed, followed by some appropriate adaption if necessary. A common way to determine the ability of a network to provide efficient communication is the calculation of centrality indices, first introduced in the social network analysis [5], [6]. One of the most prominent of such indices is the betweenness-centrality index that has become an important analysis tool in fields as diverse as epidemiology, design of router networks, load of communication networks and co-citation networks [7], [8], [9], [10], [11]. It measures which proportion of communications on shortest paths would be afflicted by a removal of a given node in the graph and was first proposed by Freeman [2]. It has been shown that this static feature of a network has a non-trivial inter-dependency with the load of the communicating nodes [10]. As a static feature of the network it gives a good approximation of the network’s sensitivity to congestion. As such it can be used for the routing protocols to avoid paths with high congestion probability or to reconstruct the connections from individual nodes to optimize the network structure. The improvement of communication networks by evaluating the centrality and appropriate adaptation has been first proposed in [9] but the author did not provide any decentral algorithm with which the nodes could do the evaluation themself. We want to propose here a general framework that can be used for the calculation of different centrality indices that are based on shortest paths and distances in the network. The emphasis of this paper is on the calculation of the betweenness-centrality index which gives most information on the structure of the network. We will show that the proposed algorithm for this latter centrality index is asymptotically optimal in its communication complexity and that its time complexity scales with the diameter of the network. We will further introduce the concept of evolving networks. By a periodically evaluating and subsequently adaptation the network is globally optimized. We will provide a proof of concept that illustrates the flexibility of evolving networks. The model is applicable to many networks first of all wireless multi-hop ad-hoc communication networks, but also router networks, peer-to-peer networks, upcoming e-government systems, and even biological networks. The remainder of the paper is organized as follows: in Section 2 we define the setting of a wireless multi-hop system and describe a general framework for all centrality calculating algorithms. Section 3 gives the details of the algorithms for four classic centrality indices. We will further analyze their respective complexity and the decentrality, scalability and reliability of the algorithms. In Section 4 we will discuss how the evaluation of centrality can be used to optimize the global network architecture.

2

II. G ENERAL F RAMEWORK A. The Model: Definitions and Settings 1) General terms: A wireless multi-hop ad hoc communication network consists of a set of communication devices that are connected by communication protocols that enable direct communication between devices within a given distance. In the following we will describe such a network in the following terms: Let be a graph with undirected edges, without multi-edges. Every node represents one device, every edge represents a direct communication between nodes and . The number n of nodes is defined as the cardinality of , the number m of edges is defined as the cardinality of . A node is a neighbor of node if both are directly communicating with each other. We assume that every node maintains a list of its neighbors and that the graph is always connected. The degree of node is defined as the number of its neighbors. A path from a source node to a target node is defined as a set of nodes such that and all are directly connected. The hop length of a path denotes the number of edges on it, that is . The distance between two nodes and is defined as the minimum hop length of any path between and . Every path with minimum hop distance length will be called shortest path between and . A node is called predecessor of node with respect to some source node if and are neighbors and there is at least one shortest path between and running over . Vice versa, is called successor of node . Neighbors that are neither predecessors nor successors with respect to a given source node are called neutral. denotes the set of all predecessors of node with respect to source node . The number of shortest paths between and will be denoted , with . The number of shortest paths between by and that run over node , that is , is denoted by . The probability of lying on a randomly . picked shortest path from to is given by The diameter of the graph is defined as the length of the longest shortest path between any two nodes of the graph. The transmission radius of a node is defined as the furthest geographical distance to any of its neighbors. Every node has a unique identification string . 2) Synchronization: We assume that the system can be synchronized. The basic idea is that each communication device has a maximal possible transmission radius . Let be the time needed for any message to travel . Such, any message sent at time will be received by all neighbors at the latest at time . We further assume that any computation at any node can be done within a fixed time span . This assumption is based on the fact that the number of messages received at one time unit is restricted by the number of nodes in the system. In summary, we assume that the receiving and computation of all current informations in the system is finished after . With the introduction of a global some time span

 

        

 



  

' 1&, 23#  "6/7# ' # 

 !" # $% '&( *)",+-+.+- "/10  *4  *4-5&3

67!" #  8 :9JC @ K LNMPO@M%QOR.Q SUT VXW # Y 

Z1V[ 

\ ] \ #9 ] # ]_^ ] 9 ]^

SUBMITTED TO IEEE

3

#

clock it is now possible to synchronize the system. At a given starting time  all nodes will send information to all neighbors. Every node will wait time interval for messages from its neighbors and after that start computation on the received messages. The newly generated informations will be to be sure that every sent to all neighbors in time  node has received all relevant messages and has finished every necessary computation. To simplify things the notion ’received (computed) in time ’ will be a short cut for ’received  (computed) in the time interval from time to  time ’.  3) Analysis of algorithms: The communication complexity  of an algorithm is defined as the number of elementary messages that are sent throughout the algorithm (adapted from [15]). The time complexity of an algorithm is defined as the number of time units required until the algorithm terminates [15]. The package complexity of an algorithm is defined as the number of packages that have to be sent throughout the algorithm. A package is defined as some set of elemantary messages that can be sent together in one message.

]

# 9]9]^

#4 # 9  9JC  

The Betweenness Centrality is the most complex measure. It sums over all probabilities . Such, it measures the expected routing service demands of node if every node was communicating with every other simultaneously. This can be interpreted as an approximative congestion sensitivity of . C. Two phase model The general framework for algorithms calculating these centralities consists of two phases. In the first phase, called Count Phase, the distance or number of shortest paths is calculated, respectively. In the second phase, called Report Phase, the according value is reported back to all other nodes. We will first describe the two phases and then show that both phases can be interlaced with each other. The general framework is sketched in Fig. II-C. In the following, messages that carry new and relevant data will be called relevant messages. A message that contains data a node already has received earlier is called a backfiring message. 1) Count Phase: To describe the calculation of the number of shortest paths between any source node and target node we first have to state the following Lemmata. Lemma 1: A node is on the shortest path from a source node to a target node if and only if . This follows directly from the definitions of shortest paths and distance. Lemma 2: The number of shortest paths from to is the sum of shortest paths from to all predecessors of :

#

# '# 8 B !'  9  #

A>JC?

Graph Centrality [13] 



 

(2)

Stress Centrality [14] 

   > A >JC  

Q> C



 



(3)

  

Betweenness Centrality [2] 

   >  >JC @ K

Q> C

 



(4)

!  "

Centrality indices are designed such that the highest value indicates the most central node. An example graph with these centrality indices can be seen in Fig. 1 The interpretation of the meaning of this centrality indices is based on two assumptions. First, most routing protocols try to establish communications on shortest paths. Second, the hop distance is a (scaled) approximation for the real length of the path between two communicating nodes. Under this assumptions, the Closeness Centrality gives an indication how long it would take the node to communicate in a serial manner to all other nodes in the network. The Graph Centrality indicates how long the parallel communication to all other nodes would take at most. The Stress Centrality is measuring on how many shortest paths a node lies.



# %$

COUR T AB> #

A >JC

#

 ##

(5)

Lemma 2 follows from Lemma 1. We use the synchronization of the network as assumed above. All nodes start with the counting at the same global time . At this starting time an initial number of shortest paths (NOSP) message is generated and sent to all neighbors. Every NOSP message contains the of the source node and an integer number that denotes the number of shortest paths from to the currently sending node. This number will be called the weight of the message. The weight of the initial message will be ’one’ since the number of shortest paths from a node to itself is defined to be one. Based on Lemma 2, every node will sum the weight of all messages from direct predecessors with respect to . This sum is the number of shortest paths from to and will be stored such that it can be retrieved by the of the source node. This newly computed number of shortest paths will be broadcasted with to all its neighbors in the next time unit. This includes also the broadcasting to the predecessors. Thereby, the message will be a backfiring message for some of the neighbors. To decide whether a message came from a predecessor and therefore is relevant or a backfiring message every node checks for all messages whether the already has an entry in their number of shortest paths table. If so, it ignores the according message and will not broadcast it further. By definition of the synchronization it is clear that in time all nodes with distance will receive their first message

Z1V[P





Z'V P

A >S

Z'V[!P

#

Z1V[P

#/

SUBMITTED TO IEEE

4

Graph Centrality

Closeness Centrality

CG(a)=1/2

CC(a)=1/6

a

a CC(d)=1/5

CC(b)=1/7

b

e

d

CC(c)=1/6

CG(b)=1/3

CBC(e)=1/8

b

CG(d)=1/2

CG(c)=1/2

c

e

d

c

Stress Centrality

Betweenness Centrality

CS(a)=4

CB(b)=2

a

b CS(d)=10

CS(b)=2

b

Fig. 1.

CG(e)=1/3

CS(e)=0

CB(a)=1

e

a

d CS(c)=4

CB(c)=2

c

c

CB(d)=7

CB(e)=0

d

e

The figure shows the four different centrality indices that are based on shortest paths and therefore are important for communication networks

)

,1 (a

(a ,1

)

t0

Decentralized Centrality Calculation a e

t1 (a ,b

e t2

)

)/2 )/2

(a ,d

)/1

a

(a ,c

)

,2 (a

(a ,2

c

t3

)

(a,2)

e

(a,e)/1

e

)/2

d

,c (a

d c

c

b

a b

e

a

(a ,1

)

,1 (a

t2

(a,1)

(a,d)/1

b

(a ,c

(a ,1

d

)/1

)

,1 (a

b

d

,b (a

)

b

a

c

)/1

c t1

a

,d (a

)/1

d

(a (a ,c) ,e /2 )/1

b

d

e

)/2 ,c /1 (a ,e) (a

)/2

,c (a

C

(a,c)/2 (ae)/1

Count Phase Report Phase Fig. 2. The figure shows the general algorithm for calculating centralities in a decentralized manner at the example of calculating the number of shortest paths from node to all other nodes. On the left side of the figure the Count Phase is shown, on the right the Report Phase. Red messages indicate backfiring  node is sending its NOSP message to its neighbors  and  .  broadcasts it to (ignored) messages that will be ignored by the receiving node. In time and  in . Additionally  sends a report message   to and  because it knows its number of shortest paths to . This report message will be ignored by  since it is not yet in the Report Phase regarding source node . Also node  will broadcast the NOSP message to (ignored),  and  . Additionally, it  sends the report message    to all its neighbors. In time every node knows its number of shortest paths to and nodes  and  start reporting it back. The last report message will reach node at time and the algorithms terminates.

SUBMITTED TO IEEE

5

from source node and are able to calculate their number of shortest paths to . With this observation the hop distance between any two nodes can be determined by storing the time at which the first message with has been received. It follows that no weight is needed in a NOSP message if only the distance is calculated. It is important to note that the NOSP messages from different source nodes do not interfere with each other. Thus, all number of shortest paths and distances can be calculated simultaneously. Moreover, every node can collect all relevant NOSP messages and send them to all its neighbors in packages. The elementary NOSP messages might then be coded as pairs of ,weight). 2) Report Phase: As observed above every node with distance to any other source node will know its distance and number of shortest paths in time . To report the according value back every node will create a report message with an -pair and as a weight the distance or number of shortest paths , respectively. This report message is sent to all neighbors of in time . A node receiving a report message in time will perform the appropriate computation to calculate its centrality and forward the message without changing it to all neighbors in time . To avoid the forwarding of backfiring messages each node has to build up an array of size . In this is stored whether the report message with a certain -pair has been received already. Every distance or number of shortest paths is uniquely identified by the source and target node and will only be reported once. It follows, that every relevant report message will have a different -pair. Since all pairs of nodes will report some value back an array of size is necessary and sufficient. The detailed computations for the four different centralities introduced above will be described in more detail in the next section. 3) Interlacing of the Phases: All NOSP messages are independent from each other as are the report messages from each other. Of course, a report message can only be sent if the according distance or number of shortest paths is already calculated. But beside this dependency a node can be in different phases with respect to different source nodes. That is, a node may at the same time be waiting for some NOSP message from some node , sending the report message for some node and having finished the Report Phase for another node two time units ago. 4) Analysis: We will first analyze the time complexity of the general algorithm and then determine the communication and package complexity. Let denote the furthest distance of any node to . Node will receive its last NOSP message at time  . It will send its last report message at time  and it will receive its last report message at time   . It is important to observe that until this time the node has received at least one report message every two time units. The reason for this is that there is some node in every distance from to and that the time until the according report message gets back to is twice the distance. With this observation, every node is

#4

Z1V[P

@Z1V[!%

B!" # Z1V

@Z1V[# 2 Z1V[P  #



# 4-5&





#/

A >JC #

# 4 # /U5I&

) 1Z V

Z'V

V[  

#

)

P^ ^ P^

# # R.SUT # 5  & R . U S T # ) .R SUT 5 &  ;

able to determine the end of the calculation and afterwards react to the calculated value.  In a global view the algorithm will stop after time units. That is, the time complexity T of any algorithm in this . framework is  It is very important to state that the diameter of a graph is more depending on the geographical area that the networks spans than on the number of nodes in it. We assume that every node has an average transmission radius that depends very little on the number of nodes in the system. In this case, the expected diameter of the graph will be the diameter of the geographical area divided by the average transmission radius. Such, the time complexity T is not scaling with the number of nodes but approximately scaling with the root of the spanned geographical area. It is obvious that the time complexity cannot further be diminished. To calculate the communication complexity we make the following observation. In the Count Phase every node sends one NOSP message over every edge. In the Report Phase every node sends  report messages back to the source.  

Thus, the communication complexity is  . This is optimal since every node needs to know the distance or number of shortest paths to every other node to calculate the centrality index it is interested in. To calculate the package complexity we observe that one node cannot receive and send more than different elementary messages in one time unit. This implies that the number of packages sent over one edge per time unit is bounded by a constant. The package complexity P is therefore given by 

 .

VW9;

V W

VXW

;

)

)9

@V W

III. D ECENTRALIZED C ALCULATION

OF

C ENTRALITY

In the following we will sketch the algorithms for the four different centralities and analyse their respective complexity. In the following, a report message that contributes to the centrality index of a node is called a contributing message.



A. Closeness Centrality The distance between all nodes is calculated in the Count Phase as described above. A report message is contributing to the Closeness Centrality of node if it contains its . Every node will compute its Closeness Centrality by adding up all weights from contributing messages. Contributing message will not be broadcasted since no other node beside is interested in the weight of the message. After having received the last contributing message the Closeness Centrality is determined by taking the inverse of the sum.





Z'V



B. Graph Centrality



The distance between all nodes is calculated in the Count Phase as described above. Every node will store the currently maximal weight of all received contributing messages. After it has received its last contributing message the stored value is inversed and hence the Graph Centrality of the node.

SUBMITTED TO IEEE

6

C. Stress Centrality To calculate Stress Centrality distance and number of shortest paths have to be computed in the Count Phase. The report messages hold as weight the distance from pair . Before describing the Stress Centrality algorithm a further lemma is needed. Lemma 3: The number of shortest path from a source node to a target node running over is given by  . Proof: The proof is given by an inductional argument. For a direct successor Lemma 3 and Lemma 2 coincide regarding that the number of shortest paths between and is one if and are neighbors. Assume that the lemma is true for all nodes up to distance from . Let be a node in distance . Accordingly, the number of shortest path from any source node to is given by

Z1V

@Z1V[P 2 Z1V[#

A >JC @ K  A > S A S C

#



A >JC  

#

#

9;



#

AB>JC2@ K 

#



CUO R T AB> @ K

 COUR T A > S A S  A> S C A S OR T A S C O R C T A S # Z'V @ Z1V[P U Z'V # 



#

# %$

# %$

(6)





#

#

(8)

Equ. (6) is given by lemma 1, equ.(7) uses the assumption. # is by lemma 1 the number of # %$ Regarding that shortest paths from to lemma 3 is proved. To calculate its Stress Centrality a node has to decide whether a given report message with -pair is contributing to its centrality or not. According to lemma 1 this can be easily determined by adding up the distances of to and to and compare it to the weight of the message. If the weight equals the sum the message is contributing otherwise the message will just be broadcasted to all neighbors. Following lemma 3, a node will calculate for all contributing messages with -pair the product  of and sum them up. This is possible because the number of shortest paths is symmetric. That is, and these values are stored in the NOPS table of . After a node has received the last report message the correct value of its Stress Centrality is obtained.

#

A >S A SC

@Z1V[P U Z'V # 

A > S  A S >

D. Betweenness Centrality The Betweenness Centrality is the most interesting centrality index for a communication network. It is superior to the Stress Centrality because it indicates the congestion sensitivity of a node where the Stress Centrality just absolutely counts the number of shortest paths the node lies on. The algorithm is nearly the same as the one for Stress Centrality but it needs additionally the report of the number of shortest paths between any node pair (referred to as NOSP weight). The definition of contributing is the same as in Stress Centrality algorithm. For every contributing report message with -pair it will calculate the  and divide it by the NOSP appropriate product of weight of the message. The sum of all these fractions gives the Betweenness Centrality. Fig. 3 and Fig. 4 describes the pseudocode for this algorithm.

'#

Z1V

A @Z1> S V[AP US C Z'V # 

Fig. 3. This pseudocode describes the Count Phase of a Betweenness Centrality calculating algorithm.

(7)

# %$

Z1V

Count Phase for calculating Betweenness Centrality at node v  initialize first message with ownID and initial numberOfShortestPaths 1  new message( , 1);  send message to all neighbors in time  while any message has been received in time  do forall messages do if if  of received message has been stored already in time  do ignore else do

store  and sum numberOfShortestPaths over all messages with same 

store time and numberOfShortestPaths together with 

new message( , numberOfShortestPaths)

add new message to listOfMessagesToSent od od  sent listOfMessagesToSent to all neighbors at time   od

Report Phase for calculating Betweenness Centrality at node v forall   between v and s known at time  generate report message with ( , ) and weight     send report messages to all neighbors in time   while any report message has been received in time  do forall messages do if ID-pair ( , ) has already been registered do ignore else do

multiplicate   with    , divide by weight of message and add value to betweenness-centrality memory cell

add report message to listOfMessagesToSent od od  sent listOfMessagesToSent to all neighbors at time   od Fig. 4.

Pseudocode for the preparing and handling of report messages

E. Decentrality, Scalability and Reliability In the following we want to discuss the decentrality, scalability and reliability features of the general framework. 1) Decentrality It can be easily seen by the above given pseudocodes that all operations only require the list of neighbors which with direct communication is possible. Since these lists can be provided by algorithms with local decision rules like in Glauche et al. [4], the algorithm is totally decentral. 2) Scalability As mentioned above the proposed algorithm scales in its time complexity with the diameter of the graph. As outlined before the diameter of a graph is normally depending on the area the network spans and not as much on the number of nodes in this area. It follows that the time needed for calculation might be constant if the area is restricted. The demands on memory scale with . This can be reduced to a linear dependency without enhancing the communication complexity by more than

)

SUBMITTED TO IEEE

7

a factor. 1 3) Reliability under failure Centrality indices are measures for static graphs. If the computation time is so long that a big part of the network has meanwhile changed the results will not be very meaningful. However, we want to show here that the algorithms always terminates and will give meaningful results if the change of the network has not been too dramatically. The termination of any of the algorithms is given by the fact that every node will forward every NOSP and report message just once. The quality of centrality indices after some change of the network is depending on the architecture of the network. The architecture of a wireless ad-hoc network is grid like due to the restricted transmission radius. By this the number of shortest paths between any nodes is very high and the failure of some portion of the graph likely will not change the distance between them. This implies that the calculated values for Closeness Centrality and the Graph Centrality of the remaining nodes will resemble the values in the changed network. This result can be further improved if failing nodes are able to broadcast a failure notice with all known distances to other nodes. With this every other node is able to update its centrality index by simply subtracting the according values. The situation is more complicated with Stress and Betweenness Centrality. With every failure the number of shortest paths between any node pair is changing in a complex way. This makes any updating of the current calculated values very complicated. Still, all remaining nodes will calculate the correct centrality with respect to the moment were the Count Phase began. It is reasonable to assume that the Betweenness Centrality depends mainly on the geographical position a node has in the spanned network area. If it is on a border of that area the likeliness is high that the index is low. If it is an inner node the index will be high. With this observation it follows that the absolute centrality index may vary due to minor changes in the network but still the calculated value will reflect the position of a node in the network and an adaptation according to the current value is reasonable even in cases of failure. IV. D ISCUSSION We have shown in this paper that the decentral calculation of diverse centrality indices is possible. The general framework has been shown to work optimally with respect to all analyzed complexity measurements. We want to discuss here how the proposed algorithms give rise to optimization of the network. We introduce here a model of evolution of the networks that consists of evaluating and adaptation phases. We assume that some lower level protocol will generate the network architecture whenever a node is lost or a new node is attached to the network. This architecture might not be optimal. As described above the 1 We

will be happy to give the details of this improvement on request.

algorithms for a peer-to-peer network in [3] and wireless multi-hop ad hoc networks in [4] disfavour some of the nodes. The latter will produce a mesh or grid like architecture where inner nodes are more probable to route a communication than are nodes on the border of the network. This can be seen in Fig. 6(a). A periodically evaluation of for example the Betweenness Centrality will reveal these differences. With an appropriate local rule each node decides whether and how it will change its list of neighbors. This gives rise to an emergent behaviour that optimizes the global network architecture. The evolution of a network as a cyclic process of evaluation and adaptation is sketched in Fig. 5. As a proof

Evolution of a Network Loss & addition of nodes General Architecture Protocol Adjusts network by local rules Appropriate Adaptation, e.g. re-attachment

generates networks dynamically

Evaluation of Centrality Reveals bad substructures

Fig. 5. The diagram shows an evolving network in which a basic architecture algorithm generates networks. The network is changing whenever nodes are lost or added to it. All nodes periodically evaluate one or more centrality indices. After evaluation they adapt their direct neighborhood according to some local rule. With an appropriate adaptation rule nodes with an too high load will be relieved, the diameter of the network decreased and the communication stabilized.

of concept we made some simulations on a network with 1000 nodes. The nodes were identically indepently distributed in a (10,000 x 10,000) 2D-area and connected according to the k-next-neighbors rule with 10 nearest neighbors. One example can be seen in Fig. 6(a). We conducted 10 evaluation steps with the following adaption rule. Adaptation rule If the Betweenness Centrality of a node is smaller than a given minimum it connects to the next furthest node. Thus it increases its transmission radius. If the Betweenness Centrality of a node is higher than a given maximum it will remove the connection to the neighbor with the highest centrality index. The minimum threshold was set to 1000, the maximum to 50 000.

SUBMITTED TO IEEE

8

747

747

285

947

285

947 632

398

632

398

190

880

190

880

213 647

213

520

385

647

520

385

206

206

712

712

600

600 360

360 590

590

904 706

904 706

417

773

417

773

649

649 442

442

727

519

589

727

519

589

777

777

35

35

350

350 329

599

329

599

699 910

699 910

548

548

670

670

735

735

953

953

454

454 479

479

352

352 740

644

740

644

951

951 915

915 81

81

507

708

507

708

897

897

253

253 508

508 898

898

793

793 125

125

478

680

478

99

646

588

489

680

787

99

646

588

489

787

41

41

682

682 909

909

990 985

990 985

689

324

529

689

324

529 950

950 698

698 355

48

759

977

434 991

637

822

822

139 799

991

637

659

143

359

77

143

359

799

77

686

686

539

307

539

307

309

542

759

977

434 659

139

355

48

309

542 968

625

968

625

713

713 198

198

55 988

55 988

452

452

287

287 938 89

938 89

811

221

811

221

214 34

620

251

403

214 34

620

231

251

403

231

72 924

956

123

72

742

924

937

956

123

742

937 751

751 376

376 247

247

936

936

273 332

455

702 467 192

467 192

226

806 687

124

254

279 226

624

806 687

492 153

629

157

675

183

300 158

304

421

279

163

332

455

702 157

675 254

492 669

273

124

183

300 158

304

421

153

624

629

163

669

111

111 238 115

238 115

622

318

622

318

284

284

863

831

82

558

863

831

82

558

846

846

176

176

78

78 996

100

272

278

104

546

546

23

145

358

996

100

272

278

104 23

145

358

393

393 792

792

530

530

569

549

181

569

369

763

322

549

181

582

582 656 568

970

180

615

809

235

888

550

180

615

809

833

833

690

690

364

672

655

970

966

235

888

550

369

763

322

656 568 966

140

364

672

655

140

544 368

976

532

602 182

544

856

182

368

370

374

784

856

784

348 972

972 306

496

635

306

496

635

654

812

812

594

581 555

338

976

532

602

370

374 348

654 594

581 555

338

779

779

420

420

194

194 586 908 323

663

556

586 908 323

963

663

556

928

963

928

245 810

457

245

174

810

457

498

531

482

174 498

531

482 739

739

842

463

842

463

202

652

843

758

758

493

469

168

202

652

843

493

469

168

828

335

828

335

873

764

873

764

339

339 610

610

484

484 864

516

567 845 282

845 408

159

282

408 308

729

574

525

345 159

850

229

729

715

113

256

308

850

864

516

567

345

715

113

256

229 574

525

3

3

1

270

1

879

270

879

42 431

42

22

431

912

958

22

912

958 940

940 585

585 547

776

547

776

852

741

86 129

816

895

514

15

852

741

86 129

816

895

518

514

15

518

552

552 858

858 607

607 835

835

460

406

460

406 108

633

108 633

732 372

732 372

411

411

730

778

730

778

982

982

396 560

396 560

233 220

233

379

26

330

577

220

379 533

326

665

215

884

26

330

577 533

326 147

147

665

215

884 334

334 678

366

678

366

772

772

283

283

664

391 75 172 701 44

545

701 728

189

44

545

448

728

189

106 500

500 51

133

491 905

900

575

818

887

887

844

444

51

133

913

491 905

818

804

343

80

658

303

106

913

575

448

343

80

658

303

900

664

391

75 172

844

444

804

12

12 102

102

232

232 63

964

443

63

964

243

697

443

243

697 453

453

685

685

813

262

17 919

813

27

997

93

93

43

29

43

29 823

67

320

262

17 919

27

997

752

823

67

320

752

252

333

252

333

228

228 266

617

57

774

849

381

266

617

57

774

849

381

660 38

660 38

293

6

293

6

76

76

962

962 766

316

138

112

237

298

766

316

138

112

237

651

704

298

651

704

37 187

37

327

187

668

327

668

573

573 614

614 559

536

559

536

310

310

691

691

336

336

705

319 114

705

319 114

760

142

760

142

13

13

79

79

134 914

881

340

914

881

340

584

130

717 584

999 662

989

999 662

605

429

989

429

661 137

134

130

717

605

661

317

395

137

317

395

869

869

18

18 540

676

540

676

5

290

5

290

471

471 466

466

45

413

45

413 767

767

782

782

750

886

750

886 627

794 579

423

579

862 432 164

983

32

384

837

967

692

641

692

935

19

515

694

354

384

667

347

694

893

893

230

230

923 127

896

933

296

815

923 127

896

933

693

475

36

415

723

641

354

935

19

515

299

885 983

32

667

347

801

468 561

862 432 164

837

942

726 162

415

723

967

627

794 423

468 561

299

885

801

942

726 162

693

475

36

296

815

932

932 24 805

24

388

557

805

945

388

557

945

363

363

894

222

894

892

734

218

785

222

892

734

218

785

796

796

265

265 402

992

261

797

878

402

992

261

797

878

87

87

606

606

66

66 195

195

872

116

872

116 592

592

640

640

674

674

200

200 167

167

248

440

955

248

440

955

921

666

921

666

502

280

502

280

244

244 721

817

721

817 534

534 239

754

239

754

883

883 74

430

110

551

74

430

110

551

480

866

480

866 566

240

825

33

673

639

566

240

825

33

673 745

639

745

73

73

720

720

631 807 49

631 807

105 49

90

511

105

90

511

152

765

598

152

765

598

714

509

714

509

286

465

286

465

907

907 613

613

648

141

890

109

258

109

325 526

497

930

10

826

580

524

205

824

313

930

356

512

10 165

392

576

526 356

512 337

762 671

185 593 459

325 392

576

83

60

554

236 611

258

497 826 205

648

762 671

185 593 459

580 824

141

890

554

236 611

242

83

60

524

165

337

313

242

738

738 925

925

96

96

9

9 169

859

169

122

859

122

700

748

700

748

636

193

636

193

7

7

40

971

40

971

178

178 931

458

830

931

458

830

481

851

481

851

571

571

965

965 865

865

435 445

435 445

857

857

62

62

733

733

450

450

136

414 362

645

136

414 362

645

946

761

946

761

382

382

854

854 436 117 487

436

757 117

25 877

926

487

456

196

477

860

344

877

267

267

175

177

719 564

944

250 118

250 118

959 841

800

473

959 841

451 473

634

638 281

802

315

501 173

795

486

638 281

802

315

173

210

819

795

486

819 409

409

927

927

154

154

126 688

126 688

595

0

474

331

975

175

177

719 564

944

451

210

595

917

0

474

331

975

906

917

906 722

276

54

101

722

276

54

101 441

441 798

798 257

257

68

259

483

838

994

472

346

166

553

537

746

166

746

219 349

84

84

875

14

14

389

389

224

224 227

227 783

783

737

476 91

314

148

737

476

847

612

987

377

995 146

612

889

570

390

405

199

986

780

949

397

889

570

390 377

506

786

780

949

397

952

847

504

987 506

786

148

834

427

504

91

314

834

405 995

346

439 769

537 219

472

209

553

439 769

349

68

259

483

838

994 209

875

427

477

401

446 803 503

634 501

860

344

121 781

803 503

800

757

25

456

978

922

401

446

121

926

196

978

922 781

199

986

146

952

46

46 623

179

623

179

513

603

513

603

353

11

353

11

394

979

394

979

425

425

736

736

911

961

911

961

65

65 424

424

861

711

255

217 874

201

596

743

743 103

103 297

297 867

274

39

867

274

39 461

416

461 416

707

707

419

419 731

630

731 630

98

98

208

876

903

4 771

288 902

902

681

681 643

643

70

70 107

289

289

499

69

373

47

47

28

621 974

216

128

31

52

216

517

517

653

149

28

621 974

128

31

941

184

241

52

373

791

184

241

499

69

941

791

208

876

827

903

4 771

107

151

653

149

583

151

583

562 407

268

510 407

572 92 642 993

587

207

16

207

20

790

616 312

609

20

790 292

616 312

609

246 58

53

918 400

53

918 204

400

871

840

650

204

604

399

94

277 449

449

677

870

871

428

399

94

277

150

58

749

21

840

650

428 604

16 998

246 749

21

291

263

934 703

132

679

61

8

703

993

587 957

998

263

934

572 92 642

291

132

679

61

8

292

562

268

510

957

861

711

255

217 874

201

596

827 288

527

677

870

527

808

626

948

295

839

71

591

85

855

150

948

295

839

71

591

85

855

808

626 426

464

375

271

901

64

426

464

212

375

271

901

64

212

294 683

294 683

755

770

755

770 341 203

788

341 203

788

365 943

404

718

365 943

404

718

973

973

969

969 848 485

891

848 485

170

160

170

160 891

916

628

916

628

156 775

156 775

521

695

521

695

367

367

351 188

929

351 188

899

899

657

161 357

186 543

657

161

56

438

357

186

249

832

929

543

56

438 249

832

495

495

684

684 2

920

2

920

302

305

302

305

43730

386

43730

386

301 387

422

301 387

422

95

95 528

528 565

565

269

269 144

981

538

155

814

211

144

981

311

960

538

155

814

211 311

960

371

371 789

789

724

378

696

724

378

696 710

618

494

710

618

494 275

135

725

954

954

608 836

608 836

361

361

984

984 505

131

131

756

756 120

829

619

275

135

725

505

321

264

120

829

619

853 470

383

321

980 462

191 433

342

535

264

853 470

980 462

191

383

433

342

535

418

418 709

821

59 223

709

821

59 223

380

380

447

744

447

744

488

488

50

50 490

410 234

490

410

563

225

234

939

563

225

939

328

328

88

88

601

171

820

601

412

171

820 412

578

578

197

197 541

541 119

119 597

597

522

522 523

523

260 716

260

882

768

716

882

768

753 97

868

(a) Network before evolution

753 97

868

(c) Network after evolution

Fig. 6. The figure shows on the left a communication network of 1000 nodes. The nodes are identically independently distributed on a (10,000x10,000) grid. Each node is connected to its next 10 neighbors, symbolized by an edge between the nodes. The network on the right has evolved from the left network after 10 simulation steps. In each simulation step the Betweenness Centrality of all nodes has been calculated. Subsequently, every node with a Betweenness Centrality index less than 1500 enhanced its transmission radius to connect to the next furthest. Every node with a Betweenness Centrality index of more than 50 000 cut the connection to the neighbor with the highest Betweenness Centrality. The result is a network in which nodes on the border spread their connections deep into the network. The inner part of the network is partly sparser than before. The evolved network has a lower diameter and lower average distance. Whereas in the left network some nodes have Betweenness Centralities of up to 105 000, in the right network the maximal index is 49 000.

After 10 simulation steps the diameter of the graph is reduced from 26 to 22, the average hop distance from 10.66 to 10.07. The maximal transmission radius has increased from 1262 to 1776, the average transmission radius from 654 to 814. The maximal Betweenness Centrality is more than halved from 105 000 to 49 000. Of course, every graph will reduce its diameter if the average transmission radius is enhanced. The important observation here is that the nodes with high load in routing will maintain or decrease their transmission radius. The global decrease of the diameter is therefore conducted by increasing the transmission radius of the less loaded nodes. This single simulation shown here is exemplary. Still, finding the correct minimal and maximal Betweenness Centralities according to the number of nodes is not trivial as is neither determining an appropriate adaptation rule. We want to emphasize that the main goal of our paper is to show that a decentral computation of centralities is possible in an efficient way. The above given example is therefore just an illustration of what we think can be done with these centralities. Nonetheless, the given adaptation rule might be a good starting point for real applications: It encourages the nodes to enhance their transmission radius if they are not likely to suffer high routing demands from others. Simultaneously, highly recommended router points are relieved. Thereby, every node contributes similar amounts of energy to the global system, either by maintaining a higher transmission radius or

by an expanded routing service. R EFERENCES [1] U. B RANDES , ”Faster Evaluation of Shortest-Path Based Centrality Indices,” J. Math. Soc., vol. 25(2), pp. 163-177, 2001 [2] L.C. F REEMAN , ”A set of measures of centrality based on betweenness,” Sociometry, vol 40, pp. 35-41, 1977 [3] C.H. N G , K.C. S IA , C.H. C HAN , I. K ING , ”Peer Clustering and Firework Query Model in the Peer-to-Peer Network,” Dep. Comp. Science & Eng., Chinese University of HongKong, HongKong, China, Tech. Rep.,2002 [4] I. G LAUCHE , W. K RAUSE , R. S OLLACHER , M. G REINER Continuum percolation of wireless ad hoc communication networks Physica A, 325 (3-4), 577-600 (2003) [5] S. WASSERMANN , K. FAUST Social Network Analysis: Methods and Applications Cambridge University Press, 1994 [6] J. S COTT Social Network Analysis: A handbook Sage Publications, London, 2003 (2nd ed.) [7] M. E. J. N EWMAN , M. G IRVAN Finding and evaluating community structure in networks preprint cond-mat/0308217 [8] M.E.J. N EWMAN Who is the best connected scientist? A Study of scientific coauthorship networks Phys. Rev. E 64, (2001) [9] V. K REBS The social Life of Routers The Internet Protocol Journal 3(4),14-25 (2000) [10] P. H OLME Congestion and centrality in traffic flow on complex networks Advances in Complex Systems 6, 163-176 (2003) [11] K LOVDAHL , A.S., G RAVISS , E.A., YAGANEHDOOST, A., R OSS , M.W., WANGER , A., A DAMS , G. AND M USSER , J.M. Networks and Tuberculosis: An Undetected Community Outbreak Involving Public Places Social Science and Medicine, 52(5), 681-694, 2001. [12] G. S ABIDUSSI The centrality index of a graph Psychometrika 31, 581603 (1966) [13] P. H AGE , F. H ARARY Eccentricity and Centrality in Networks Social Networks 17, 57-63 (1995)

SUBMITTED TO IEEE

[14] A. S HIMBEL Structural parameters of communication networks Bull. Math. Biophys.15, 501-507 (1953) [15] R OBERT G. G ALLAGER Distributed Minimum Hop Algorithms Technical Report LIDS -P- 1175, MIT, (1982)

9

Suggest Documents