International Conference on Emerging Trends in Computer and Electronics Engineering (ICETCEE'2012) March 24-25, 2012 Dubai
Strong Consensus in Cloud Computing
Farah Habib Chanchary and Samiul Islam
Abstract – In the context of distributed systems it is very important that a common agreement exists among all participating nodes of the system. Over a long period of time, the Byzantine Agreement (BA) and consensus problems have been used in fully connected, broadcast and generalized connected networks to achieve agreement. These agreement protocols maintain a set of rules that allow the healthy nodes of a distributed system to agree on a common value and thereby support the reliable execution of tasks. Recently, cloud computing has emerged as a new paradigm of large-scale distributed computing. It refers to the use of, and access to, multiple server-based computational resources via a digital network, where users may reach the server resources from any kind of computing device. To achieve successful cloud storage management, massive data processing and resource scheduling, a cloud computing infrastructure often requires that all processing nodes of the underlying network reach a common agreement. In this paper, we focus on the application of the strong consensus problem to a cloud computing infrastructure, so that all healthy nodes in the distributed system can reach a common consensus with a minimum number of message-exchange rounds.

Keywords – Arbitrary Fault, Cloud Computing, Manifest Fault, Secure Transmission, Strong Consensus Protocol, Symmetric Fault.

I. INTRODUCTION

A cloud computing system is an Internet-based system in which large, scalable computing resources are provided "as a service" over the Internet to users. In a cloud system, applications are provided and managed by the cloud server, and data is also stored remotely in the cloud configuration. In most cases, cloud computing infrastructures consist of services delivered through parallel or distributed data centers that appear to consumers as a single point of access for their computing needs. Successful execution of users' requests requires excellent teamwork among several clusters of computing nodes and servers that may be physically located in different places, and all nodes in a cluster are expected to exchange information with one another. It is common for some of these components to suffer from faulty symptoms and act maliciously, so a common agreement must be established among all fault-free computing nodes prior to executing tasks.

Over a long period of time, two standard problems, the Byzantine Agreement (BA) problem and the consensus problem, have been extensively studied and used in fully connected, broadcast and generalized connected networks. Since no standard topology is available for cloud systems, the question of how cloud computing resources can be managed with minimum effort demands careful consideration. In this paper the authors focus on the application of a strong consensus protocol in a proposed cloud topology, so that all processing nodes can reach an agreement with a minimum number of message-exchange rounds. The rest of this paper is organized as follows. Section II introduces the strong consensus problem and its conditions. Section III presents related work. Section IV describes the cloud topology used in this study. Section V discusses the proposed protocol and its properties, the behaviors of faulty nodes and the allowable number of faulty nodes. Section VI illustrates the complete process of the protocol with an example. Conclusions are presented in Section VII.

Farah Habib Chanchary is with the Department of Computer Science, Najran University, Najran, Kingdom of Saudi Arabia (e-mail: [email protected]). Samiul Islam is with the Department of Computer Science, Najran University, Najran, Kingdom of Saudi Arabia (e-mail: [email protected]).

II. STRONG CONSENSUS PROTOCOL

Pease, Shostak and Lamport first proposed and solved the BA problem in [1]. The consensus problem and BA are two closely related fundamental problems of agreeing on a common value in a distributed system. A variant of distributed consensus, called the strong consensus (SC) problem, was introduced by G. Neiger [2]. The SC problem assumes the following:
1. There are n processors in the synchronous network, of which at most F processors are subject to Byzantine fault.
2. Each processor P starts with an initial value v_i that belongs to a finite set V of all possible values (|V| = m).
3. Each fault-free processor sends its initial value v_i to all other processors.
4. On receipt of the value v_i, each processor exchanges the received value with the other processors.
5. In addition, there is an adversary that controls up to F (n > max{mF, 3F}) of the processors and can arbitrarily deviate from the designated protocol specification.
6. After ⌊(n − 1)/max{m, 3}⌋ + 1 rounds of message exchange, a common value can be obtained.
The protocol for the SC problem is to enable all fault-free processors to obtain a common value. After execution of the protocol, the common value obtained by the fault-free processors shall satisfy the following conditions: (1) Agreement: all fault-free processors agree on the same common value v; (2) Strong Validity: v is the initial value of some fault-free processor or the predefined default value.
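The resilience condition n > max{mF, 3F} and the round bound of condition 6 can be checked mechanically. A minimal sketch in Python (the function names are ours, for illustration only):

```python
def sc_feasible(n: int, m: int, f: int) -> bool:
    """Check the strong-consensus resilience condition n > max{mF, 3F}."""
    return n > max(m * f, 3 * f)

def sc_rounds(n: int, m: int) -> int:
    """Number of message-exchange rounds: floor((n-1)/max(m,3)) + 1."""
    return (n - 1) // max(m, 3) + 1

# The setting of Section VI: 8 servers, |V| = 3 possible values.
print(sc_rounds(8, 3))       # 3 rounds, as used in the example
print(sc_feasible(8, 3, 2))  # True: 8 > max(3*2, 3*2) = 6
```

Note that with m > 3 the value set, not the fault bound, dominates both the round count and the resilience requirement.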
III. RELATED WORKS

Though BA and consensus have been applied in generalized networks for the last few decades, they have rarely been studied for cloud computing in particular. In 2002, Hsiao, Chin and Yang examined the SC problem in a general network in which both processors and communication links may be subject to different fault types simultaneously and the network topology need not be fully connected [3]. In 2009, Cheng, Wang and Liang presented a paper on the consensus problem in a combined wired/wireless network where the servers, processors and communication links can all be faulty [4]. They introduced a hierarchical concept in their model, in which most of the communication and computation overhead was carried by the consensus servers. In 2009, Yan, Wang and Huang [5], [6] proposed a two-level network topology for cloud computing in which the transmission media connecting nodes of the same level, as well as nodes of different levels, can be faulty, and introduced a Dual Consensus Protocol that solves consensus with malicious and dormant transmission media. The same authors suggested a protocol for the agreement problem in a cloud computing environment with fallible processes [7]. Reference [8] extended this idea by including both faulty media and faulty nodes. In [9] the authors proposed a cloud topology with a central node for better control of agreement and employed a consensus protocol on it. This paper explores the effect of SC on a similar cloud topology and finds the number of allowable faulty nodes with a minimal number of message-exchange rounds.

IV. CLOUD TOPOLOGY

The cloud topology used in this paper follows a hierarchical concept [Fig. 1] [9]. The network is assumed to be reliable and synchronous. In the distributed system of cloud computing, nodes (servers and processors) are connected to the Internet and provide computation, software, data access and storage services to end users. Clients may use any computing device and network interface to access cloud service providers and send requests to them. To support different services, the cloud topology requires a set of servers and one or more clusters of processors associated with each server. In the proposed topology a user's requests are received by a central node C, with the assumption that C is a fixed, fault-free node with high bandwidth. All servers and processors are organized in two levels. Level S consists of a set of servers s_i, 4 ≤ i ≤ s, where s is the total number of servers. These are powerful devices with high computing ability, storage space and memory. All nodes of level S are connected to the other nodes of the same level through reliable transmission media. Each of these servers is responsible for managing one or more services requested by users. Level P consists of a total of m clusters t_j, 1 ≤ j ≤ m, where each cluster has processors p_k ∈ t_j, 4 ≤ k ≤ p, where p is the total number of processors in cluster t_j. Each cluster is associated with a server depending on the type of processing it performs. A server can have one or more dedicated clusters for evaluating users' requests. All processors of a cluster are connected to the other processors of the same cluster. Multiple transmission media are used to connect the central node C with the servers of level S and the processors of all clusters for reliable communication. Both level S and level P may contain nodes with different faulty statuses. Each node in the network can be uniquely identified, but a node does not know the faulty status of the other nodes in the underlying network.

The system model is as follows:
1. C receives a user's request and sends it to all servers of level S.
2. All servers of level S communicate among themselves, and all fault-free servers decide unanimously which server is responsible for the particular request and the set of clusters associated with it.
3. This information is passed back to C, and C directs the request to the clusters that are dedicated to the job.
Fig. 1 Cloud computing topology
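The two-level topology of Fig. 1 can be modeled as a small data structure: a fault-free central node C, a set of level-S servers, and per-server clusters of level-P processors. A minimal sketch (the class and field names are ours, not taken from the paper):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Cluster:
    """A level-P cluster t_j: a fully connected set of processors."""
    cluster_id: int
    processors: List[int]  # processor ids p_k in this cluster

@dataclass
class Server:
    """A level-S server with one or more dedicated clusters."""
    server_id: int
    clusters: List[Cluster] = field(default_factory=list)

@dataclass
class CloudTopology:
    """The two levels below the central node C; C itself is assumed fault-free."""
    servers: List[Server] = field(default_factory=list)

    def level_s_size(self) -> int:
        return len(self.servers)

    def level_p_size(self) -> int:
        return sum(len(c.processors) for s in self.servers for c in s.clusters)

# A toy instance: 8 servers, each with one 4-processor cluster.
topo = CloudTopology([Server(i, [Cluster(i, list(range(4)))])
                      for i in range(1, 9)])
print(topo.level_s_size(), topo.level_p_size())  # 8 32
```

Keeping the server-to-cluster association explicit mirrors step 2 of the system model, where the servers decide which clusters serve a request.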
V. PROPOSED PROTOCOL

This section considers the behaviors of faulty nodes, the strong consensus protocol (SCP) and its modified properties, and the restrictions on the number of allowable faulty nodes.
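A building block used throughout this section is the Secure Transmission Protocol (STP) [9] described below: its first phase (STP-P1) protects intra-level messages with a shared symmetric key (AES), so that a message from a manifest faulty node fails to decrypt and the fault is exposed. As a rough standard-library-only illustration of that detection idea, the sketch below substitutes a keyed HMAC tag for AES encryption (this substitution, and all names in the code, are ours; the paper itself prescribes AES):

```python
import hashlib
import hmac

SHARED_KEY = b"cluster-shared-secret"  # known in advance to sender and receiver

def stp_send(message: bytes, key: bytes = SHARED_KEY):
    """Attach a keyed tag; a fault-free node uses the shared key."""
    return message, hmac.new(key, message, hashlib.sha256).digest()

def stp_receive(message: bytes, tag: bytes):
    """Verify the tag; an unverifiable message marks its sender manifest-faulty."""
    expected = hmac.new(SHARED_KEY, message, hashlib.sha256).digest()
    return message if hmac.compare_digest(expected, tag) else None  # None ~ phi

msg, tag = stp_send(b"v_i = 1")
print(stp_receive(msg, tag))        # b'v_i = 1'
bad_msg, bad_tag = stp_send(b"v_i = 1", b"wrong-key")  # manifest faulty sender
print(stp_receive(bad_msg, bad_tag))  # None
```

As in STP-P1, a value that cannot be validated with the shared key is replaced by the placeholder φ (here `None`) and its sender is ignored in subsequent rounds.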
A. Behaviors of Faulty Nodes

This investigation assumes that only the nodes can be faulty, not the communication links. A node may exhibit any one of three fault symptoms: 1) manifest fault, 2) symmetric fault and 3) arbitrary fault. Dormant faulty nodes can show the crash failure property or the omission failure property. In a crash failure, a node stops executing prematurely and shows no further activity. In a manifest failure, a node omits or delays sending and receiving messages and thus produces missing messages. A symmetric faulty node sends the same wrong value to all processors, and an arbitrary faulty node may send different wrong values to different processors. The last two types may coordinate with other faulty nodes to prevent fault-free nodes from reaching a common value [3].

The proposed Strong Consensus Protocol (SCP) is executed in both level S and level P, and the nodes of both levels need to exchange messages. To maintain security, all nodes of the cloud topology transmit values using the Secure Transmission Protocol (STP) [9], which (a) identifies manifest faulty nodes and (b) provides secure communication among C, the servers of level S and the processors of level P. STP has two phases, P1 and P2. P1 is used by nodes of level S and nodes of level P for internal communication only. P2 is used by the central node C to communicate with level S and level P. The two phases are as follows.

STP-P1: A symmetric cryptographic system, the Advanced Encryption Standard (AES), is used to encrypt and decrypt delivered messages with a shared key. Both sender and receiver know the shared key in advance. A message sent by a fault-free node can be correctly decrypted by the receiver using the common secret key. A message sent by a manifest faulty node cannot be decrypted by the symmetric key, so a manifest fault can be identified.

STP-P2: To make the communication among nodes of different levels secure over the Internet, a digital signature is used to encode the common value transmitted between levels S and P via C. Since the central node C is assumed fault-free, a value sent by C to level S or level P can always be identified correctly by verifying its digital signature.

B. Properties of Strong Consensus Problem

Once C receives a user request, it initiates SCP by proposing and transferring an initial value v_i ∈ V to all servers (|V| = m). All non-faulty servers of level S follow the message-passing algorithm SCP to come to an agreement. In SCP, every node transmits its initial value to all other nodes (excluding itself), receives values from the other nodes and eventually agrees on a common value v_c. v_c should be the value proposed by C, or a predefined default value ∂ in case a common value cannot be obtained. C receives the common value v_c from level S together with the cluster numbers t_j, |j| ≥ 1. Then C passes v_c to all processors p_k of the selected clusters. In this way, all processors p_k receive their initial value v_c and follow SCP to reach agreement. The total process can be structured as follows:
a) The central node C receives a user request and initiates the process by passing the initial value v_i ∈ V to all servers of level S using STP-P2.
b) All servers of level S follow the message-passing algorithm SCP, using STP-P1, to come to an agreement and obtain a common value v_c; v_c is transferred to C using STP-P2.
c) C passes v_c to the designated clusters t_j of level P using STP-P2, so that the processors p_k ∈ t_j can accomplish the specific task requested by the user.
d) All processors p_k ∈ t_j execute SCP starting with v_c as their initial value and exchange messages with each other using STP-P1 to obtain the consensus value v.

All servers and processors in the network go through two phases in SCP, namely the Message Exchanging (ME) phase and the Agreement Making (AM) phase. The ME phase collects the messages exchanged by each node in every round, and the AM phase computes the common value with a majority voting system.

I. Message Exchanging phase
• Decide the number of rounds γ required by the message exchanging phase according to (1):

γ = ⌊(n − 1)/max{m, 3}⌋ + 1   (1)

• For round r = 1, do the following steps:
  o Create a tree data structure with root R at level 0 and set val(R) = null.
  o Each node transmits its initial value v_i to all other nodes of the same cluster, excluding itself.
  o Each node receives n − 1 values from the other nodes and stores them at level 1 of the tree.
  o For a manifest faulty server, the value transmitted from it is replaced by the value φ, and all fault-free servers ignore the messages received from it in every subsequent message-exchange round.
• For rounds 2 ≤ r ≤ γ, do the following steps:
  o Each node transmits the values at level (r − 1) to all other nodes, excluding itself and repeated nodes.
  o Each node receives (n − r) values from the other nodes and stores them at level r of the tree.
  o If a value at level (r − 1) is φ, then all values transmitted from it are replaced by φ.
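The ME phase described above builds, at each node, an information-gathering tree whose paths record which nodes relayed a value. A minimal single-node sketch, under the simplifying assumption that non-manifest nodes relay values unchanged (the dictionary layout and all function names are ours):

```python
def sc_rounds(n, m):
    """gamma = floor((n-1)/max(m,3)) + 1, eq. (1)."""
    return (n - 1) // max(m, 3) + 1

def run_me_phase(initial, manifest, n, m):
    """Build one node's tree: tree[path] = value relayed along that sender path.

    initial[i] is node i's round-1 value as seen by this receiver; nodes in
    the set `manifest` are replaced by the placeholder 'phi'.
    """
    phi = "phi"
    tree = {(): None}                      # root R, val(R) = null
    # Round 1: store the level-1 values.
    level = {}
    for i, v in initial.items():
        level[(i,)] = phi if i in manifest else v
    tree.update(level)
    # Rounds 2..gamma: relay level r-1 values, skipping repeated nodes.
    for r in range(2, sc_rounds(n, m) + 1):
        nxt = {}
        for path, v in level.items():
            for j in initial:
                if j in path:
                    continue               # exclude repeated nodes
                # phi propagates; a manifest relay also yields phi.
                nxt[path + (j,)] = phi if (v == phi or j in manifest) else v
        tree.update(nxt)
        level = nxt
    return tree

# 8 servers, values as received by s1 in the Section VI example; s7 is manifest.
vals = {1: 1, 2: 0, 3: 1, 4: 2, 5: 1, 6: 1, 7: 1, 8: 1}
tree = run_me_phase(vals, manifest={7}, n=8, m=3)
print(tree[(7,)], tree[(2,)])  # phi 0
```

With n = 8 and m = 3 the tree grows to three levels (8, 56 and 336 entries), matching γ = 3 rounds.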
II. Agreement Making phase
• The majority value MAJ is calculated bottom-up for each vertex of the tree built by a node, as follows:
  o If x is a leaf, then MAJ(x) = val(x).
  o Otherwise, let y be the most common value among MAJ(αx) for all children x of vertex α stored at depth i of the tree; let w be the number of copies of the value y; and let z = max{m, 3}·(r − i + 1) + [(n − 1) mod max{m, 3}]. If w ≥ z and y = φ, then output val(α); else if y ≠ φ, then output y; else output φ.
  o If no majority value exists, the result is the default value ∂.

Hence, two properties of the consensus problem in a cloud computing environment are modified as follows: 1) Consensus: all fault-free servers and all fault-free processors associated with a fault-free server agree on a common value; 2) Validity: if the central node is fault-free and proposes an initial value v_i, then all fault-free nodes (both servers and processors) agree on v_i.

C. Restrictions on the Number of Allowable Faulty Nodes

In the consensus problem the total number of allowable faulty nodes depends on the total number of nodes in the underlying network. Since the proposed cloud network has two separate levels of nodes, there are two sets of constraints on the allowable faulty nodes, where s_a, s_s and s_m (respectively p_a, p_s and p_m) denote the numbers of arbitrary, symmetric and manifest faulty nodes in level S (respectively level P). For level S the constraints are:

s > max{m(s_a + s_s) + s_m, 3(s_a + s_s) + s_m}   (2)

s > max{⌊(s − 1)/3⌋ + m·s_s + s_m, ⌊(s − 1)/3⌋ + 3·s_s + s_m}   if s_a = 0   (3)

s > max{⌊(s − 1)/3⌋ + m(s_a + s_s), ⌊(s − 1)/3⌋ + 2(s_a + s_s)}   if s_m = 0 and s_a ≤ ⌊(s − 1)/3⌋   (4)

For level P the constraints are:

p > max{m(p_a + p_s) + p_m, 3(p_a + p_s) + p_m}   (5)

p > max{⌊(p − 1)/3⌋ + m·p_s + p_m, ⌊(p − 1)/3⌋ + 3·p_s + p_m}   if p_a = 0   (6)

p > max{⌊(p − 1)/3⌋ + m(p_a + p_s), ⌊(p − 1)/3⌋ + 2(p_a + p_s)}   if p_m = 0 and p_a ≤ ⌊(p − 1)/3⌋   (7)

VI. EXAMPLE

This section illustrates the working procedure of SCP in level S with a sample cloud setup and a set of values. We assume that in this sample cloud environment, level S contains 8 servers s_i, 1 ≤ i ≤ 8, connected to the central node C through reliable communication media. Servers s_2, s_4 and s_7 suffer from arbitrary, symmetric and manifest faults respectively; all other servers are fault-free. The set of possible values used in this example is {0, 1, 2}. Initially C receives a user request and transmits the initial value 1 to all servers. The values received by each of these servers are shown in Table I. Since s_4 is a symmetric faulty server, it sends the same wrong value to all other servers; suppose s_4 sends the value 2. On the other hand, s_2 is an arbitrary faulty server, so it tries to prevent the fault-free servers from coming to an agreement by sending different values to different servers. Figure 2 shows the values passed by s_2 to the other nodes in all rounds of SCP.

[TABLE I: INITIAL VALUES OF SERVERS. The table, not fully recoverable from this copy, lists the initial value held by each server s_1 through s_8.]

[Fig. 2: Server s_2 transmits different values to the other servers. The figure distinguishes fault-free, manifest-faulty, symmetric-faulty and arbitrary-faulty servers.]

This example uses 8 servers and 3 possible initial values {0, 1, 2}, so SCP requires 3 rounds of message exchange (3 = ⌊(8 − 1)/3⌋ + 1). Figures 3, 4 and 5 show the trees generated by s_1 in rounds 1, 2 and 3 respectively. In the agreement making phase, s_1 takes the majority value at each of its vertices, from the leaves up to the root, as presented in Table II. A similar process is followed by all other servers. Once all servers agree on a common value, each server transmits its common value to C using STP-P2. C can uniquely identify each value transmitted by these servers, the server responsible for this particular user request, and the number of processor clusters required for executing the process.
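At the root of the tree, the AM phase reduces to taking the most common value among the children's majorities, falling back to the default value ∂ when no majority exists. A minimal sketch of that root step (tie-handling per the "no majority → ∂" rule; the names are ours):

```python
from collections import Counter

DEFAULT = "d"  # stands in for the predefined default value

def majority(values, default=DEFAULT):
    """Most common value; the default if the top count is tied (no majority)."""
    counts = Counter(values).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return default
    return counts[0][0]

# Root step for s1 in the example: the level-1 majorities are 0, 1, 2, 1, 1, 2, 1,
# so the common value is 1 -- the value originally proposed by C.
print(majority([0, 1, 2, 1, 1, 2, 1]))  # 1
print(majority([0, 0, 1, 1]))           # 'd' (tie -> default)
```

This illustrates the Strong Validity condition: despite two faulty inputs, the decided value is the one a fault-free source (here C) proposed.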
[Fig. 3: Trees generated by all servers in round 1 of SCP. Each server's tree stores at level 1 the eight values val(S1)–val(S8) it received; in s_1's tree these are 1, 0, 1, 2, 1, 1, φ, 1.]

[Fig. 4: Trees generated by s_1 in round 2 of SCP, storing the relayed values val(S_iS_j) at level 2.]

[Fig. 5: Trees generated by s_1 in round 3 of SCP, storing the relayed values val(S_iS_jS_k) at level 3.]

[TABLE II: DECISION MAKING PHASE FOR s_1. Applying MAJ from the leaves upward yields the level-2 values 0, 1, 2, 1, 1, 2, 1 for the subtrees rooted at S2–S8, and at level 1 the root majority MAJ(0, 1, 2, 1, 1, 2, 1) = 1, so s_1 decides on the value 1 proposed by C.]

In the next step, C transmits the common value v_c to all processors p_k of cluster t_j, and the processors of level P start processing the phases of SCP. Eventually all processors go through the ME and AM phases of SCP and come to an agreement.

VII. CONCLUSION

This paper deals with the strong consensus problem, a variant of the agreement problem, in the context of a cloud computing topology. Cloud computing is a new form of distributed system in which all nodes must reach a common agreement before processing tasks. This study combined the idea of the strong consensus protocol with a secure transmission protocol and proposed the SCP protocol to enhance the security and communication ability of the participating nodes of a cloud topology in the agreement process, with a minimum number of message-exchange rounds and a maximum number of allowable faulty nodes.

REFERENCES
[1] M. Pease, R. Shostak, and L. Lamport, "Reaching Agreement in the Presence of Faults," Journal of the Association for Computing Machinery, vol. 27, no. 2, pp. 228-234, April 1980.
[2] G. Neiger, "Distributed consensus revisited," Information Processing Letters, vol. 49, pp. 195-201, 1994.
[3] H. S. Hsiao, Y. H. Chin, and W. P. Yang, "Reaching Strong Consensus in a General Network," Journal of Information Science and Engineering, pp. 601-625, 2002.
[4] C. F. Cheng, S. C. Wang, and T. Liang, "Investigation of Consensus Problem over Combined Wired/Wireless Network," Journal of Information Science and Engineering, vol. 25, pp. 1267-1281, 2009.
[5] K. Q. Yan, S. C. Wang, S. S. Wang, and C. P. Huang, "Revisit Consensus with Dual Fallible Communication in Cloud Computing," Proceedings of the International MultiConference of Engineers and Computer Scientists, vol. I, Hong Kong, March 18-20, 2009.
[6] S. C. Wang, K. Q. Yan, and C. P. Huang, "Consensus Under Cloud Computing Environment within Malicious Faulty Transmission Media," The E-Learning and Information Technology Symposium, Tainan, Taiwan, April 1, 2009.
[7] S. C. Wang, S. S. Wang, K. Q. Yan, and C. P. Huang, "The Anatomy Study of Fallible Processes Agreement for Cloud Computing," International Journal of Advanced Information Technologies (IJAIT), vol. 4, no. 2, December 2010.
[8] S. C. Wang, S. S. Wang, K. Q. Yan, and C. P. Huang, "The New Territory of Generalized Agreement in a Cloud Computing Environment," Journal of Computers, vol. 21, no. 1, April 2010.
[9] F. H. Chanchary and S. Islam, "Challenges of Using Consensus Protocol in Cloud Computing," First Taibah University International Conference on Computing and Information Technology, to be published.