Fault-Tolerant Architecture for Peer to Peer Network Management Systems

Maryam Barshan (1), Mahmood Fathy (1), and Saleh Yousefi (2)

(1) Computer Engineering Faculty, Iran University of Science and Technology (IUST)
[email protected], [email protected]
(2) Computer Engineering Department, Urmia University
[email protected]
Abstract. In this paper we propose a 3-tier hierarchical architecture, based on the peer to peer model, for network management. The main focus of the proposed architecture is providing fault tolerance, which in turn increases the availability of the Network Management System (NMS). In each tier of the architecture we use redundancy to achieve this goal. However, we do not add redundant peers, so no peer redundancy is imposed on the system. Instead, we use selected peers in several roles and therefore add only some software redundancy, which is easily tolerated by the advanced processors of the NMS's peers. Due to the hierarchical structure, the failure of nodes in each tier may affect the NMS's availability differently. Therefore, we examine how failures of peers that play different roles in the architecture affect the availability of the system by means of an extensive simulation study. The results show that the proposed architecture offers higher availability than a previously proposed peer to peer NMS. It is also less sensitive to node failures.

Keywords: availability, fault tolerance, network management, hierarchical P2P networks.
S. Balandin et al. (Eds.): NEW2AN/ruSMART 2009, LNCS 5764, pp. 241–252, 2009. © Springer-Verlag Berlin Heidelberg 2009

1 Introduction

Nowadays almost all companies want to take advantage of new technologies and reap their benefits, which increases network complexity. One of the challenges is offering various services at high quality. However, the emergence of new equipment and services and the growing number of users with diverse service demands make the management of new generation networks difficult. Managing such complex networks successfully requires a robust, reliable NMS. IETF's SNMP (Simple Network Management Protocol) and ITU-T's TMN (Telecommunications Management Network) have been the two main network management technologies in the computer networking industry. Due to new challenges, including dynamic topology configuration, interoperability among heterogeneous networks, and QoS-guaranteed services, they show weaknesses in the management of such complex networks [1, 2]. To address these shortcomings, two alternative technologies have recently been introduced in the network management community: web services and peer to peer (P2P) [3].

The well-known characteristics of P2P approaches bring scalability, flexibility and reliability, and improve the quality of current network management solutions. Therefore, one way to address network management challenges is to use P2P-based NMSs, which are built on overlay networks. A management overlay can potentially encompass different administrative domains. Thus it can bring together different human administrators and diverse systems and networks in order to accomplish a management task in a cooperative and integrated manner.

A network management system works effectively and efficiently only when it can obtain the information it needs from the network and can change the configuration or apply attributes when necessary. Therefore, one of the requirements for a management system to deliver the needed service is reasonable availability. The availability of a system can be investigated from several viewpoints, including QoS [4], security [5] and fault tolerance [6]. Fault tolerance is the ability of a system to offer services even in the presence of a fault. It is one of the requirements of P2P networks, needed to avoid data loss and to support proper message transfer. Redundancy is a technique for achieving fault tolerance and increasing the reliability and availability of P2P systems. It can be applied to different resources.

In this paper we propose an architecture for increasing the availability of the network management system. We achieve this goal by increasing the fault tolerance of the NMS through redundancy. However, the proposed architecture does not impose any peer (hardware) redundancy, only software redundancy. This is because we use some peers in several roles and thus add only some software redundancy, which is easily tolerated by the advanced processors of the NMS's peers.
We conduct an extensive simulation study to examine the performance of the proposed architecture in the presence of node failures. We also investigate the effect of failures in different nodes and the sensitivity of the architecture to those failures. To the best of our knowledge, this is the first work that addresses fault tolerance in peer to peer network management, and the proposed architecture is the first of its type.

The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 introduces the proposed architecture. In Section 4 we evaluate the performance of the proposed architecture through an extensive simulation study and present the results. Finally, Section 5 concludes the paper along with some guidelines for future work.
2 Related Work

The work related to our study falls into three main topics: P2P-based network management systems, fault tolerance in P2P systems, and fault tolerance in hierarchical P2P systems.

Reference [7] carried out one of the first investigations of the P2P paradigm in network management. The authors of [8] designed a P2P overlay to address fault and performance management. Another use of P2P management, for Ambient Networks (ANs), is reported in [9]. More importantly, the European Celtic Madeira project [10] aims to implement P2P-based management systems for mesh and ad-hoc networks. The project uses inherent P2P characteristics for automatic and dynamic network management in such networks. Madeira peers are organized in clusters with selected nodes as cluster heads. The cluster heads are clustered again in another layer, and this continues until one peer remains at the topmost level. This process leads to a multi-tier hierarchical P2P architecture. The authors of [11] point out that such an architecture, and more specifically a 2-tier hierarchical P2P technology, is a flexible, scalable and easy to use solution for network management applications. Note that [10] and [11], like all P2P architectures, make use of application layer routing, which improves connectivity between management entities. Furthermore, the advantage of grouping in each layer for balancing management tasks is discussed.

Deploying redundancy is a technique for increasing the reliability and availability of P2P systems. In DHT (Distributed Hash Table) P2P systems, three kinds of redundancy are discussed: replication, erasure coding, and a combination of the two [12, 13]. Due to the high churn rate in P2P networks, the authors of [14] proposed a technique to ease fault discovery and network recovery, aiming at dependability and performance. This method, called MSPastry, is a new implementation of Pastry for real environments that provides consistent message routing. Assuming highly dynamic structured P2P networks, reference [15] designed a failure recovery protocol and evaluated its performance. Moreover, the authors of [16] address two issues: first, how a data structure can be built for routing in the presence of faulty nodes, and second, how secure message routing can be achieved.

In [17] an efficient method for fault tolerance in hierarchical P2P systems is introduced.
It proposes a so-called multiple publication technique, in which normal peers connect to one or more SPs (Super Peers) in other groups. When a normal peer finds out that its own SP no longer works, it selects the SP of one of the other groups as its new SP. The authors of [18] took advantage of a BSP (backup SP) for fault tolerance in each group, i.e. whenever an SP fails, it is replaced by a BSP. The paper further proposed a scalable algorithm for assigning peers to groups, selecting SPs and maintaining the overlay network. In [19] the authors also used redundancy, but their methodology differs from the one in [18] in that in each group several SPs form a virtual SP. The nodes belonging to the virtual SP then use Round Robin (RR) to take turns serving as the SP. Reference [20] reports using two layers for SP fault tolerance: first all peers are organized in a flat layer, and then peers with more resources form another overlay for managing the other, less resourceful peers. In case of an SP failure, peers use the flat layer instead of the second overlay, because they are already organized in that layer and have the necessary connections with other peers there. However, delivery is not guaranteed.
3 Proposed Architecture for P2P-Based NMS

The proposed network management system is a 3-tier hierarchical architecture. From bottom to top, the layers are LLM (Low Level Manager), MLM (Mid Level Manager) and TLM (Top Level Manager). The peers of each layer are arranged in groups. These nodes belong to the NMS and are called manager nodes hereafter. The manager nodes manage other nodes called managed elements (end nodes), which are the nodes of the network under management. LLM groups are in charge of collecting management data from managed elements. As shown in Fig. 1, end nodes connect to their SPs through a star topology. Since they are managed devices, they do not need to connect to each other in the NMS. This does not contradict their ability to communicate with each other in the network under management. Furthermore, each end node holds the address of a BSP, to be used if necessary.
Fig. 1. Connectivity of end nodes and corresponding SP
In the proposed NMS architecture, each peer in the LLM layer can be used as a BSP for other peers that act as SPs for end nodes. For example, in Fig. 2, A′ is a BSP for A and A′′ is a BSP for A′. The SP of each group is used as the BSP of others in order to apply redundancy with minimum overhead. Thus there is no hardware overhead in this case. In each LLM group one SP and one BSP (the most powerful peers) are selected among all peers to connect to the upper tier (i.e. the MLM layer). The process of selecting SP and BSP nodes is out of the scope of this paper. In each LLM group, peers are connected to the selected SP and BSP through a star topology. It should be stressed that there is a direct link between the SP and the BSP in these groups. This link is used to send alive messages so that each is informed of any failure of the other. Fig. 2 shows the structure of an LLM group when the redundancy factor is 2.
Fig. 2. LLM layer intra-groups connectivity
The MLM layer is formed by grouping the SPs and BSPs of the LLM layer groups. The proposed dynamic topology for an MLM group is depicted in Fig. 3. As shown in the figure, SPs are connected through a degree-three chordal ring and BSPs are connected to the neighbors of their SPs. Note that in Fig. 3 each BSP acts only as a backup and has no responsibility in MLM functionality until its corresponding SP fails. When the SP fails, the BSP connects to the MLM layer through a neighbor of the failed SP. Thus, while the SP is active, it is in charge of managing both the LLM and MLM layers, and during this whole period the BSP is only in charge of collecting management data from its assigned end nodes.

Fig. 3. MLM layer intra-groups connectivity

Fig. 4 shows the set of connections between the SPs and BSPs of the LLM layer in the MLM layer. As shown in the figure, in addition to the links between the LLM and MLM layers, some horizontal links are created among LLM groups (i.e. groups at the same hierarchical level). These horizontal links, shown by dashed lines in the figure, may be used to enable more cooperation in the proposed NMS. Moreover, no additional peers are used in this layer (i.e. MLM), so no hardware cost is imposed on the proposed NMS. It is worth mentioning that Management by Delegation (MbD) can be performed in the MLM layer.
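The degree-three chordal ring used among the SPs can be illustrated with a small sketch. The paper does not state the chord length, so the construction below follows the common textbook definition (an even number of nodes on a ring, plus one chord of odd length w per node); the function name `chordal_ring` and the choice w = 3 are assumptions made only for illustration.

```python
def chordal_ring(n: int, w: int) -> dict[int, set[int]]:
    """Build a degree-three chordal ring with n nodes and chord length w.

    Assumed definition: nodes 0..n-1 form a ring, and every odd node i
    gets one extra chord to node (i + w) mod n. With n even and w odd,
    each chord joins an odd node to an even node, so every node ends up
    with exactly three neighbors.
    """
    if n < 6 or n % 2 != 0:
        raise ValueError("n must be even and at least 6")
    if w % 2 == 0 or not 3 <= w <= n - 3:
        raise ValueError("w must be odd with 3 <= w <= n - 3")
    adjacency: dict[int, set[int]] = {i: set() for i in range(n)}

    def link(a: int, b: int) -> None:
        adjacency[a].add(b)
        adjacency[b].add(a)

    for i in range(n):          # ring edges
        link(i, (i + 1) % n)
    for i in range(1, n, 2):    # one chord per odd node
        link(i, (i + w) % n)
    return adjacency


ring = chordal_ring(8, 3)
for node, neighbors in sorted(ring.items()):
    print(node, sorted(neighbors))
```

Each node keeps three links, so the loss of a single SP leaves every other SP with at least two remaining neighbors, which is the property that lets a BSP reach the MLM layer through a neighbor of the failed SP.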
Fig. 4. Cooperation between LLM layer SPs and BSPs
In the MLM layer we use a set of SP nodes called a VSP (Virtual SP). Using a Round Robin method, at any moment (e.g. at any failure time) one of the members of the VSP set takes the responsibility of SP in each MLM group. The current SP node makes management decisions and links the LLM layer to the TLM layer. With this organization, several SPs take turns serving other peers. Note that, as illustrated in Fig. 5, from the TLM point of view this set can be regarded as a single SP. At any moment, the peer in charge of TLM and MLM connectivity is called the RSP (Real SP). MLM groups are formed from the SPs and BSPs of the LLM layer and are in charge of LLM and TLM connectivity. One problem of a static architecture here is the increased load on edge peers (the peers between the MLM and TLM layers). To alleviate this problem, VSPs are used to make the connection with the TLM dynamic. In effect, using VSPs distributes the load of transferring management data among several more powerful SPs.
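The Round Robin selection of the RSP inside a VSP set can be sketched as follows. This is a minimal illustration under our own assumptions, not the implementation used in the simulations: the class name `VirtualSuperPeer`, its methods, and the rule that a failed RSP immediately hands over to the next live member are all hypothetical.

```python
class VirtualSuperPeer:
    """A set of SPs that the TLM sees as one SP (hypothetical sketch)."""

    def __init__(self, members):
        if not members:
            raise ValueError("a VSP needs at least one member")
        self.members = list(members)
        self.failed = set()
        self._current = 0  # index of the member currently acting as RSP

    @property
    def rsp(self):
        """The peer currently in charge of MLM/TLM connectivity."""
        return self.members[self._current]

    def rotate(self):
        """Hand the RSP role to the next non-failed member, Round Robin."""
        n = len(self.members)
        for step in range(1, n + 1):
            candidate = (self._current + step) % n
            if self.members[candidate] not in self.failed:
                self._current = candidate
                return self.members[candidate]
        raise RuntimeError("all VSP members have failed")

    def mark_failed(self, peer):
        """Record a failure; if the RSP failed, rotate immediately."""
        self.failed.add(peer)
        if peer == self.rsp:
            self.rotate()

    def mark_repaired(self, peer):
        self.failed.discard(peer)


vsp = VirtualSuperPeer(["sp1", "sp2", "sp3"])
vsp.rotate()            # sp2 takes its turn
vsp.mark_failed("sp3")  # sp3 is skipped from now on
vsp.rotate()            # back to sp1
print(vsp.rsp)
```

Keeping the rotation inside the VSP object mirrors the idea that the TLM only ever addresses "the SP" of an MLM group, while the identity of the RSP changes underneath.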
Fig. 5. Method of connecting MLM to TLM
The topology of the TLM layer, shown in Fig. 5, is a full mesh because of the layer's importance and its communication with human administrators. The whole TLM layer can also be regarded as one virtual peer. The peers of this layer are different from the peers of the MLM layer, which are formed from the SPs and BSPs of the LLM groups. At first, all peers are divided into two parts. One part is used to form the LLM layer, and then the MLM layer from the SPs and BSPs of the LLM groups. The other part is used to form the TLM layer. Finally, the schematic of the proposed architecture used in the simulation is depicted in Fig. 6.
Fig. 6. A sample of hierarchical proposed architecture
In the MLM layer of the described architecture, SP and BSP failures are checked periodically. When a BSP (or SP) detects that its corresponding SP (or BSP) has failed, it immediately reports the failure to the TLM through a critical message (failure alarm). A source routing algorithm is then executed in the TLM to update (reconfigure) the architecture by putting the BSP in place of the failed SP. In other words, in the absence of the SP, the BSP takes over the responsibility of linking the LLM layer to the TLM layer. Note that this failure is reported to the human administrators in the TLM layer and the repair process is launched afterward. Because failures are reported toward the upper levels (i.e. from LLM and MLM toward TLM and, in the case of MbD, from LLM toward MLM), there is no need for a redundancy factor k greater than 2. In fact, the probability of a simultaneous failure of an SP and its BSP can be ignored.
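The alarm-and-reconfigure step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the names `LlmGroup`, `TopLevelManager`, `failure_alarm` and `repaired` are hypothetical, the source routing update is reduced to changing which peer acts as the group's uplink, and the redundancy factor is k = 2 (one SP and one BSP per group).

```python
from dataclasses import dataclass


@dataclass
class LlmGroup:
    """One LLM group with its SP and BSP (redundancy factor k = 2)."""
    name: str
    sp: str
    bsp: str
    uplink: str = ""  # peer currently linking the group toward the TLM

    def __post_init__(self):
        if not self.uplink:
            self.uplink = self.sp


class TopLevelManager:
    """Receives failure alarms and reconfigures the uplinks (sketch)."""

    def __init__(self, groups):
        self.groups = {g.name: g for g in groups}
        self.alarms = []  # kept so administrators can launch repairs

    def _group_of(self, peer):
        for group in self.groups.values():
            if peer in (group.sp, group.bsp):
                return group
        raise KeyError(peer)

    def failure_alarm(self, reporter, failed):
        """Handle a critical message sent by the surviving partner."""
        self.alarms.append((reporter, failed))
        group = self._group_of(failed)
        if failed == group.sp:
            group.uplink = group.bsp  # BSP replaces the failed SP
        # A failed BSP needs no rerouting: the SP is still the uplink.

    def repaired(self, peer):
        """After repair, the SP takes its responsibility back."""
        group = self._group_of(peer)
        if peer == group.sp:
            group.uplink = group.sp


tlm = TopLevelManager([LlmGroup("g1", sp="A", bsp="A'")])
tlm.failure_alarm(reporter="A'", failed="A")
print(tlm.groups["g1"].uplink)  # the BSP now links the group upward
tlm.repaired("A")
print(tlm.groups["g1"].uplink)  # the SP is the uplink again
```

Recording every alarm, even when no rerouting is needed, reflects the point that failures are always reported upward so that repair can start before a second failure occurs.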
4 Experimental Results

The proposed architecture has been evaluated with the PeerSim simulator [21]. The simulated NMS is composed of 100 management peers (Fig. 6). In this scenario 4 of the 100 peers are TLM peers and the remaining 96 are initially LLM members. These 96 peers are arranged in 8-member groups. For each of the resulting 12 groups, one SP and one BSP are selected and moved to the MLM layer, where 8-member groups are formed again. As a result, the number of LLM members is 72 and the number of MLM members is 24. In each MLM group three SPs are selected as VSPs. The topology described above is then applied to the whole structure, and a form of source routing with minimum latency is implemented to route messages between the different elements of the architecture.

Failure rates differ by peer type. VSPs in the MLM layer have a lower failure rate than other nodes in the MLM layer. Furthermore, SPs and BSPs have a lower failure rate than LLM peers. In the simulation the failure rates of LLM and TLM peers are assumed to be zero (λLLM = λTLM = 0): the TLM layer because of its full-mesh topology, and the LLM layer because its peers are connected to end nodes, which are not part of the NMS. The times to failure of MLM peers are drawn from an exponential distribution with the failure rates (λMLM and λVSP) given in each figure. Note that λMLM denotes the failure rate of all MLM peers except VSPs.

To evaluate the availability of the proposed NMS architecture we measure the percentage of available peers. Availability is measured from the TLM's viewpoint. In other words, peers that are available but not reachable by the TLM are counted as unavailable. The following figures show the average of 1000 experiments.

In the repairable case, once the repair time has elapsed, the SP takes back its responsibility. From then until the next failure (10 failures are assumed for each peer), the BSP has a role only in the LLM layer and no longer has any responsibility in the MLM layer. In this case the time to repair is fixed (500 time slots, i.e. 1/40 of the simulation time). Furthermore, over the whole simulation time there are 500 periodic alive checks and 100 messages sent to check for available peers.

The non-fault tolerant architecture used for comparison makes the same assumptions as the proposed architecture regarding the number of peers, the number of groups and the number of peers in each group. However, it differs as follows: it does not use BSPs in the LLM layer, does not use VSPs in the MLM layer, and does not use the RR method for selecting the RSP in each MLM group (in this architecture a VSP failure means an RSP failure). Thus its topology is a star in the LLM layer (with SPs only), a degree-three chordal ring in the MLM layer (without BSP connections) and a full mesh in the TLM layer.
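The measurement procedure can be approximated outside PeerSim with a short stand-alone sketch. It is a simplification under several assumptions that are ours, not the paper's: the simulation length is taken as 20000 time slots (derived from 500 slots being 1/40 of it), check instants are evenly spaced, routing is reduced to plain reachability from a TLM node, and the names `down_intervals`, `reachable` and `availability` are hypothetical.

```python
import random
from collections import deque

SIM_TIME = 20_000    # assumed: 500 slots is stated to be 1/40 of the run
REPAIR_TIME = 500    # fixed time to repair, in time slots
MAX_FAILURES = 10    # failures assumed per peer


def down_intervals(rate, rng):
    """Sample the (start, end) outage intervals of one peer."""
    intervals, t = [], 0.0
    if rate <= 0:
        return intervals
    for _ in range(MAX_FAILURES):
        t += rng.expovariate(rate)   # exponential time to failure
        if t >= SIM_TIME:
            break
        end = min(t + REPAIR_TIME, SIM_TIME)
        intervals.append((t, end))
        t = end                      # peer is back up after repair
    return intervals


def reachable(adjacency, root, down):
    """Peers reachable from root when the peers in `down` have failed."""
    if root in down:
        return set()
    seen, queue = {root}, deque([root])
    while queue:
        peer = queue.popleft()
        for neighbor in adjacency[peer]:
            if neighbor not in down and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def availability(adjacency, root, rates, n_checks=100, seed=0):
    """Percentage of peers reachable from root, averaged over checks."""
    rng = random.Random(seed)
    outages = {p: down_intervals(rates.get(p, 0.0), rng) for p in adjacency}
    total = 0
    for k in range(1, n_checks + 1):
        t = SIM_TIME * k / (n_checks + 1)
        down = {p for p, ivs in outages.items()
                if any(start <= t < end for start, end in ivs)}
        total += len(reachable(adjacency, root, down))
    return 100.0 * total / (n_checks * len(adjacency))


# Tiny example: one TLM peer, one SP, two LLM peers behind the SP.
topology = {"tlm": {"sp"}, "sp": {"tlm", "l1", "l2"},
            "l1": {"sp"}, "l2": {"sp"}}
print(availability(topology, "tlm", {"sp": 1 / 20000}))
```

In this toy topology an outage of the single SP hides both LLM peers from the TLM, which is exactly the effect the "TLM viewpoint" metric is meant to capture and the BSP is meant to prevent.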
The following figures are drawn under the repairable assumption. They show the availability improvement of the proposed architecture over the non-fault tolerant architecture (Fig. 7), the effects of different failure rates (Fig. 8), and the sensitivity of the proposed architecture to VSP and MLM failures (Fig. 9). The failure rates used are stated in each figure.

[Figure 7: percentage of available peers versus trial number, repairable case, λ(MLM) = 1/20000, λ(VSP) = 1/20000; curves for the proposed architecture and the non-fault tolerant architecture.]

Fig. 7. The proposed architecture availability versus the non-fault tolerant availability

Table 1. Average, Min and Max values of the curves of Fig. 7

Repairable case                                           Avg     Max     Min
Proposed arch., λMLM = 1/20000, λVSP = 1/20000            97.91   99.70   97.46
Non-fault tolerant arch., λMLM = 1/20000, λVSP = 1/20000  95.34   99.66   94.56
First, Fig. 7 shows the effectiveness of the proposed architecture in improving the NMS's availability. The figure assumes the same failure rate for MLM and VSP peers (λVSP = λMLM = 1/20000) and no failures in TLM and LLM peers. The NMS's availability in the proposed architecture is noticeably higher than in the non-fault tolerant one: 97.91% versus 95.34% on average, an increase of 2.57%, which is significant given that even 1% of availability matters. Next, Fig. 8 studies the effect of the failure rate and the peer type on the NMS's availability. In Fig. 8(a) we assume no failure in other peers and vary the failure rate of the VSP nodes. As the failure rate increases, the availability of the NMS deteriorates considerably (on average 98.5%, 97.1%, and 94.2% for the failure rates in the figure, i.e. a decrease of 2.147% on average). Fig. 8(b) shows the availability of the NMS for different failure rates of MLM nodes. It follows the same trend as Fig. 8(a), but the decrease is smaller (on average 99.4%, 98.78%, and 97.65% for the failure rates in the figure, i.e. a decrease of 0.872% on average), which shows that failures of VSP nodes have a larger effect than failures of other MLM nodes. This is confirmed by Fig. 8(c), in which the same failure rate applied to MLM and VSP nodes yields a noticeable difference in the NMS's availability (on average 99.39% versus 98.47%, i.e. a difference of 0.92%). In this figure all failure rates other than the ones mentioned are set to zero.
[Figure 8(a): percentage of available peers versus trial number, repairable case; proposed architecture with λ(VSP) = 1/20000, 2/20000 and 4/20000.]

8(a). Comparing different VSP failure rates

[Figure 8(b): percentage of available peers versus trial number, repairable case; proposed architecture with λ(MLM) = 1/20000, 2/20000 and 4/20000.]

8(b). Comparing different MLM failure rates

[Figure 8(c): percentage of available peers versus trial number, repairable case; proposed architecture with λ(MLM) = 1/20000 and with λ(VSP) = 1/20000.]

8(c). The effects of similar MLM and VSP failure rates

Fig. 8. The effect of MLM and VSP failure rates on the NMS's availability

Table 2. Average, Min and Max values of the curves of Fig. 8(a)

Repairable case                  Avg     Max     Min
Proposed arch., λVSP = 1/20000   98.50   99.66   97.90
Proposed arch., λVSP = 2/20000   97.11   99.31   96.59
Proposed arch., λVSP = 4/20000   94.21   98.82   93.51

Table 3. Average, Min and Max values of the curves of Fig. 8(b)

Repairable case                  Avg     Max     Min
Proposed arch., λMLM = 1/20000   99.40   99.95   99.27
Proposed arch., λMLM = 2/20000   98.78   99.88   98.59
Proposed arch., λMLM = 4/20000   97.65   99.68   97.16

Table 4. Average, Min and Max values of the curves of Fig. 8(c)

Repairable case                  Avg     Max     Min
Proposed arch., λMLM = 1/20000   99.39   99.89   99.27
Proposed arch., λVSP = 1/20000   98.47   99.80   98.10
Figure 9 compares the sensitivity of the proposed architecture and the non-fault tolerant architecture to peer failure. For this purpose we measure how much the availability decreases in response to an increase in failure rates. In Fig. 9(a) we increase the failure rate of VSP nodes from 1/20000 to 2/20000 and measure the NMS's availability for the proposed architecture and the non-fault tolerant one. Fig. 9(b) shows the same comparison for failures of MLM nodes. Both VSP and MLM failures have less influence on the availability of the proposed architecture than on that of the non-fault tolerant architecture. For the failure rates in the figure, the sensitivity of the proposed architecture is 3.38 percent lower (for VSP failure) and 0.31 percent lower (for MLM failure) than that of the non-fault tolerant one.

[Figure 9(a): percentage of available peers versus trial number, repairable case; proposed and non-fault tolerant architectures with λ(VSP) = 1/20000 and 2/20000.]

9(a) Sensitivity to VSP failure

[Figure 9(b): percentage of available peers versus trial number, repairable case; proposed and non-fault tolerant architectures with λ(MLM) = 2/20000 and 4/20000.]

9(b) Sensitivity to MLM failure

Fig. 9. Comparing the sensitivity of the proposed architecture and non-fault tolerant architecture to VSP and MLM failures

Table 5. Average, Min and Max values of the curves of Fig. 9(a)

Repairable case                            Avg     Max     Min
Proposed arch., λVSP = 1/20000             97.86   99.67   97.36
Proposed arch., λVSP = 2/20000             96.32   99.46   95.64
Proposed arch., λVSP = 4/20000             93.47   98.80   92.67
Proposed arch., λVSP = 8/20000             88.59   97.49   86.94
Non-fault tolerant arch., λVSP = 1/20000   95.33   99.46   94.51
Non-fault tolerant arch., λVSP = 2/20000   91.88   99.38   90.75
Non-fault tolerant arch., λVSP = 4/20000   85.76   98.59   84.35
Non-fault tolerant arch., λVSP = 8/20000   75.92   97.56   73.16

Table 6. Average, Min and Max values of the curves of Fig. 9(b)

Repairable case                            Avg     Max     Min
Proposed arch., λMLM = 1/20000             97.88   99.61   97.30
Proposed arch., λMLM = 2/20000             97.30   99.54   96.73
Proposed arch., λMLM = 4/20000             96.00   99.39   95.42
Proposed arch., λMLM = 8/20000             93.96   99.03   93.16
Non-fault tolerant arch., λMLM = 1/20000   95.33   99.56   94.52
Non-fault tolerant arch., λMLM = 2/20000   94.61   99.54   93.74
Non-fault tolerant arch., λMLM = 4/20000   93.01   99.54   92.14
Non-fault tolerant arch., λMLM = 8/20000   90.49   99.06   89.31
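The text does not spell out how the sensitivity figures are obtained. One reading that reproduces the quoted values from the average columns of Tables 5 and 6 is to take, for each architecture, the mean drop in average availability per doubling of the failure rate, and then subtract the two. The sketch below works through that arithmetic; the interpretation and the names `mean_step_decrease` and `sensitivity_gap` are our assumptions.

```python
def mean_step_decrease(averages):
    """Mean drop in availability between successive failure rates."""
    steps = [a - b for a, b in zip(averages, averages[1:])]
    return sum(steps) / len(steps)


def sensitivity_gap(proposed, non_fault_tolerant):
    """How much less availability the proposed architecture loses per step."""
    return (mean_step_decrease(non_fault_tolerant)
            - mean_step_decrease(proposed))


# Avg columns for failure rates 1/20000, 2/20000, 4/20000, 8/20000.
TABLE5_PROPOSED = [97.86, 96.32, 93.47, 88.59]   # VSP failures
TABLE5_NON_FT = [95.33, 91.88, 85.76, 75.92]
TABLE6_PROPOSED = [97.88, 97.30, 96.00, 93.96]   # MLM failures
TABLE6_NON_FT = [95.33, 94.61, 93.01, 90.49]

vsp_gap = sensitivity_gap(TABLE5_PROPOSED, TABLE5_NON_FT)
mlm_gap = sensitivity_gap(TABLE6_PROPOSED, TABLE6_NON_FT)
print(f"VSP: {vsp_gap:.2f}, MLM: {mlm_gap:.2f}")
```

Under this reading the gaps come out at about 3.38 (VSP) and 0.31 (MLM), matching the values in the text. Applying `mean_step_decrease` to the average columns of Tables 2 and 3 gives roughly 2.145 and 0.875, close to the 2.147 and 0.872 quoted for Fig. 8; the small differences are presumably due to rounding in the tabulated averages.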
5 Conclusion

Due to the weaknesses of traditional client/server based network management systems, we aimed at designing a highly available NMS based on the peer to peer paradigm. In this paper we proposed a self-reconfigurable and self fault-managed architecture that applies fault tolerance. Our main focus is to offer high availability with minimum overhead, in contrast to costly redundancy mechanisms. For this purpose we use multi-role peers, where each node, in addition to its normal role, plays the role of backup for other selected nodes. Thus no peer redundancy is added, only some software redundancy, which is easily tolerated by today's advanced NMS peers. Experimental results show the effectiveness of the proposed architecture in improving the NMS's availability compared with a non-fault tolerant NMS. It is also less sensitive to node failures. We further examined the effect of failures of nodes at different levels of the hierarchy on the availability of the proposed NMS. As future work we intend to study the availability of the proposed fault-tolerant architecture from an analytical point of view.
Acknowledgment This work is supported by the Iran Telecommunication Research Center (ITRC).
References

1. Choi, M.-J., Won-Ki Hong, J.: Towards Management of Next Generation Networks. IEICE Trans. Commun. E90–B(11), 3004–3014 (2007)
2. Li, M., Sandrasegaran, K.: Network Management Challenges for Next Generation Networks. In: Proceedings of the IEEE Conference on Local Computer Networks 30th Anniversary (LCN 2005), Sydney, Australia (2005)
3. Cassales Marquezan, C., Raniery Paula dos Santos, C., Monteiro Salvador, E., Janilce Bosquiroli Almeida, M., Luis Cechin, S., Zambenedetti Granville, L.: Performance Evaluation of Notifications in a Web Services and P2P-Based Network Management Overlay. In: 31st Annual IEEE International Computer Software and Applications Conference (COMPSAC 2007), Beijing, China, vol. 1, pp. 241–250 (2007)
4. Menascé, D.A., Almeida, V.A.F., Dowdy, L.W.: Performance by Design: Computer Capacity Planning by Example. Prentice Hall PTR, Englewood Cliffs (2004)
5. Stallings, W.: Cryptography and Network Security Principles and Practices, 4th edn. Prentice Hall, Englewood Cliffs (2005)
6. Avižienis, A., Laprie, J.-C., Randell, B., Landwehr, C.: Basic concepts and taxonomy of dependable and secure computing. IEEE Transactions on Dependable and Secure Computing 1(1), 11–33 (2004)
7. State, R., Festor, O.: A Management Platform Over Peer-to-Peer Service Infrastructure. In: Proceedings of 10th International Conference on Telecommunications, ICT 2003, Vancouver, BC, Canada, pp. 124–131 (2003)
8. Binzenhöfer, A., Tutschku, K., Graben, B.a.d., Fiedler, M., Arlos, P.: A P2P-based framework for distributed network management. In: Cesana, M., Fratta, L. (eds.) Euro-NGI 2005. LNCS, vol. 3883, pp. 198–210. Springer, Heidelberg (2006)
9. Brunner, M., Galis, A., Cheng, L., Colás, J.A., Ahlgren, B., Gunnar, A., Abrahamsson, H., Szabó, R., Csaba, S., Nielsen, J., Schuetz, S., Prieto, A.G., Stadler, R., Molnar, G.: Towards ambient networks management. In: Magedanz, T., Karmouch, A., Pierre, S., Venieris, I.S. (eds.) MATA 2005. LNCS, vol. 3744, pp. 215–229. Springer, Heidelberg (2005)
10. Llopis, P.A., Frints, M., Abad, D.O., Ordás, J.G.: Madeira: A peer-to-peer approach to network management. In: The Wireless World Research Forum (WWRF), Shanghai, China (April 2006)
11. Zambenedetti Granville, L., Moreira da Rosa, D., Panisson, A., Melchiors, C., Janilce Bosquiroli Almeida, M., Margarida Rockenbach Tarouco, L.: Managing Computer Networks Using Peer-to-Peer Technologies. IEEE Communications Magazine (October 2005)
12. Rodrigues, R., Liskov, B.: High availability in DHTs: Erasure coding vs. Replication. In: Castro, M., van Renesse, R. (eds.) IPTPS 2005. LNCS, vol. 3640, pp. 226–239. Springer, Heidelberg (2005)
13. Chen, G., Qiu, T., Wu, F.: Insight into redundancy schemes in DHTs. The Journal of Supercomputing 43(2), 183–198 (2008)
14. Castro, M., Costa, M., Rowstron, A.: Performance and dependability of structured peer-to-peer overlays. In: IEEE, Proc. Dependable Systems and Networks (DSN 2004), June 2004, pp. 9–18 (2004)
15. Lam, S.S., Liu, H.: Failure Recovery for Structured P2P Networks: Protocol Design and Performance Evaluation. In: SIGMETRICS/Performance 2004. ACM, New York (2004)
16. Hildrum, K., Kubiatowicz, J.: Asymptotically Efficient Approaches to Fault-Tolerance in Peer-to-Peer Networks. In: Fich, F.E. (ed.) DISC 2003. LNCS, vol. 2848, pp. 321–336. Springer, Heidelberg (2003)
17. Lin, J.-W., Yang, M.-F., Tsai, J.: Fault Tolerance for Super-Peers of P2P Systems. In: Proceedings of the 13th Pacific Rim International Symposium on Dependable Computing (PRDC), pp. 107–114 (2007)
18. Garcés-Erice, L., Biersack, E.W., Felber, P., Ross, K.W., Urvoy-Keller, G.: Hierarchical peer-to-peer systems. In: Kosch, H., Böszörményi, L., Hellwagner, H. (eds.) Euro-Par 2003. LNCS, vol. 2790, pp. 1230–1239. Springer, Heidelberg (2003)
19. Yang, B., Garcia-Molina, H.: Designing a Super-Peer Network. In: Proceedings of the 19th International Conference on Data Engineering, Bangalore, India, March 2003, pp. 49–62 (2003)
20. Panisson, A., Moreira da Rosa, D., Melchiors, C., Zambenedetti Granville, L., Janilce Bosquiroli Almeida, M., Margarida Rockenbach Tarouco, L.: Designing the Architecture of P2P-Based Network Management Systems. In: IEEE Symposium on Computers and Communications (ISCC 2006), Pula-Cagliari, Sardinia, Italy (2006)
21. PeerSim: A Peer-to-Peer Simulator, http://peersim.sourceforge.net/