
Group-based Dynamic Computational Replication Mechanism in Peer-to-Peer Grid Computing

SungJin Choi∗, MaengSoon Baik†, JoonMin Gil‡, ChanYeol Park§, SoonYoung Jung¶, and ChongSun Hwang∗

∗ Dept. of Computer Science & Engineering, Korea University, 5-1 Anam-dong, Seongbuk-gu, Seoul 136-713, Republic of Korea. Email: {lotieye, hwang}@disys.korea.ac.kr
† IT R&D Center, SAMSUNG SDS. Email: [email protected]
‡ Dept. of Computer Science Education, Catholic University of Daegu. Email: [email protected]
§ Supercomputing Center, Korea Institute of Science and Technology Information (KISTI). Email: [email protected]
¶ Dept. of Computer Science Education, Korea University. Email: [email protected]

Abstract

Peer-to-peer grid computing runs on desktop computers at the edge of the Internet, so it is complicated by heterogeneous capabilities, failures, volatility, and lack of trust. To improve the reliability of computation and achieve better performance, a replication mechanism must adapt to these distinct features. In other words, volunteers must be classified into groups with similar properties, and different replication algorithms must then be applied dynamically to each group. However, existing mechanisms do not provide replication on a per-group basis, so they incur high overhead and poor performance. To solve these problems, we propose a new group-based computational replication mechanism that adapts to an unstable, untrusted, and dynamic peer-to-peer grid computing environment. The mechanism adaptively replicates computations on the basis of volunteer-group properties such as availability, credibility, and volunteering service time. It thereby reduces the degree of redundancy and completes more tasks.

I. INTRODUCTION

A grid computing system is a platform that provides access to various computing resources owned by institutions by forming virtual organizations [4, 5]. A peer-to-peer grid computing system, on the other hand, is a platform that achieves high-throughput computing by harvesting a large number of idle desktop computers (called volunteers) owned by individuals at the edge of the Internet, using peer-to-peer computing technologies [1-11]. Peer-to-peer grid computing systems usually support embarrassingly parallel applications. These applications consist of many instances of the same computation, each with its own data. They usually address scientific problems that need large amounts of processing capacity over long periods of time. Interest in peer-to-peer grid computing systems has grown rapidly because of the success of popular examples such as SETI@Home [1] and distributed.net [2].

Peer-to-peer grid computing is complicated by heterogeneous capabilities, node failures, intermittent presence, and lack of trust [1-11]. Because it is based on desktop computers at the edge of the Internet, volunteers have heterogeneous properties (e.g., CPU power, network bandwidth, and latency). In addition, volunteers are exposed to link and crash failures. Moreover, volunteers provide their resources voluntarily, so they can freely join and leave in the middle of an execution without any constraints. Accordingly, they have different volunteering times (i.e., periods of donation), and a public execution (i.e., the execution of a task as a volunteer) can be stopped arbitrarily. Furthermore, volunteers are not dedicated exclusively to peer-to-peer grid computing, so a private execution (i.e., the execution of a private job as a personal user) can temporarily suspend a public execution. In this paper, we regard these unstable situations as volunteer autonomy failures. They delay or block the execution of tasks and can even cause partial or entire loss of the executions. Because a peer-to-peer grid computing system is based on dynamic desktop computers, volunteer autonomy failures occur more frequently there than in a grid computing environment. Volunteers also differ in how often volunteer autonomy failures occur, depending on their computation behavior. Finally, some malicious volunteers tamper with the computation and return corrupted results.

In such an environment, replicating computation is important not only to tolerate these failures and erroneous results, but also to improve performance. However, existing replication mechanisms do not adapt to a highly dynamic, unstable, and untrusted peer-to-peer grid computing environment. They do not apply different replication strategies depending on the various properties of volunteers, and they do not provide group-based replication. As a result, they suffer from high overhead and performance degradation.

In this paper, we propose a new group-based dynamic computational replication mechanism for peer-to-peer grid computing. The mechanism adaptively replicates computation on the basis of volunteer properties such as volunteer availability, volunteer credibility, and volunteering service time. To achieve this, we classify volunteers and construct volunteer groups. The mechanism then applies different replication algorithms to the volunteer groups at the same time, depending on their properties. In other words, it dynamically adjusts the degree of redundancy and selects replicas (i.e., volunteers that execute the replicated computations) according to the properties of each volunteer group. As a result, our replication mechanism reduces the degree of redundancy and therefore completes more tasks.

The rest of the paper is structured as follows. Section II reviews related replication approaches and presents our motivation.
Section III describes our group-based dynamic computational replication mechanism in detail. Section IV presents simulation results, and Section V concludes the paper.

II. BACKGROUND AND MOTIVATION

A. Replication Approaches

Replication is a well-known technique for improving reliability and performance in distributed systems [20, 21]. Several studies have addressed replication in grid or P2P computing environments [12-18]. Li and Mascagni [12] proposed computational replication to improve performance in a large-scale computational grid. They determine the number of task replicas needed to meet performance goals on the basis of node and network failure rates. Kondo et al. [13] proposed computational replication (i.e., duplication and timeout mechanisms) for desktop grid computing. With the duplication mechanism, each task is replicated up to the maximum degree of redundancy. With the timeout mechanism, a task is replicated if its result is not returned within a predefined timeout. Ranganathan and Foster [14] proposed dynamic replication strategies for data grids. They provide six different strategies for replicating large amounts of data in order to reduce bandwidth consumption and access latency. Ranganathan et al. [15] proposed dynamic model-driven replication to improve data availability in large peer-to-peer communities. Their methods compute the number of replicas per file and determine the location of each new replica on the basis of storage and transfer costs. Cohen and Shenker [16] proposed uniform, proportional, and square-root replication strategies for unstructured peer-to-peer networks in order to minimize the expected search size. Cuenca-Acuna et al. [17] proposed replication to increase the availability of shared data in unstructured peer-to-peer systems; they replicate files using an erasure code.
Sarmenta [18] proposed a sabotage-tolerance mechanism for volunteer computing systems. It tolerates erroneous results from malicious volunteers by using majority voting and spot-checking. With majority voting in particular, the same task is replicated at as many different volunteers as the degree of redundancy required to meet the desired error rate. Most replication approaches focus on data replication (i.e., replicating data or files). In contrast, Li and Mascagni [12], Kondo et al. [13], and Sarmenta [18] deal with computational replication (i.e., replicating the execution of tasks). Data replication is mainly used to improve data availability and access time in peer-to-peer networks or data grids. Computational replication, by contrast, is mainly used for fault tolerance or result certification in computational grids or desktop grids. In particular, Li and Mascagni [12] and Kondo et al. [13] used replication for fault tolerance and performance, whereas Sarmenta [18] used it for result certification (i.e., tolerating malicious volunteers). Moreover, none of these replication mechanisms [12, 13, 18] operates on a per-group basis.

B. Motivation

Some replication strategies have been proposed for P2P networks and grid computing environments. However, there is no dynamic computational replication mechanism that adapts to a peer-to-peer grid computing environment. Applying existing computational replication approaches to a highly dynamic peer-to-peer grid computing environment raises the following problems.

1) There are no computational replication mechanisms on a per-group basis. In a peer-to-peer grid computing environment, peers (i.e., volunteers) differ in properties such as the occurrence rates and types of volunteer autonomy failures, availability, credibility, and volunteering time. These differences make it difficult for volunteers to execute tasks reliably and continuously. To improve the reliability of computation and performance, a replication mechanism must consider these properties. In other words, volunteers must be classified into groups with similar properties, and different replication algorithms must then be applied to each group. However, no group-based computational replication mechanism exists for peer-to-peer grid computing. Existing mechanisms statically apply only one replication algorithm at a time; that is, the same algorithm is applied to all volunteers. As a result, they incur high overhead and poor performance.

2) Existing replication approaches do not adapt to a dynamic peer-to-peer grid computing environment. In a peer-to-peer grid computing environment, volunteers have various properties such as capabilities (e.g., CPU power, storage, or network bandwidth), location, availability, and credibility.
To adapt to these properties, a replication mechanism should take them into account when deciding the degree of redundancy and selecting replicas. Existing replication mechanisms do not do so, and as a result they require more redundancy and incur high overhead. For example, Sarmenta [18] simply selects the volunteer that first completes a task as the next replica, without considering its credibility or availability. If the selected volunteer has low credibility, more volunteers are needed to achieve majority voting. If a volunteer with low availability is chosen, majority voting is delayed or blocked by volunteer autonomy failures. Similarly, Li and Mascagni [12] do not consider these properties (e.g., volunteer autonomy failures) when deciding the degree of redundancy or selecting a replica.

3) Existing replication mechanisms do not consider volunteer autonomy failures. A peer-to-peer grid computing system respects the autonomy of volunteers. Volunteers can leave in the middle of a public execution, and they may start a private execution at any time, interrupting the public execution. In this paper, these events are called volunteer autonomy failures because the public execution is stopped or suspended. Because a peer-to-peer grid computing system is based on dynamic desktop computers, volunteer autonomy failures occur more frequently there than in a grid computing environment. They also occur much more frequently than crash and link failures. Therefore, a replication mechanism must consider volunteer autonomy failures when deciding the degree of redundancy or selecting replicas. If only crash and link failures are considered, the degree of redundancy is calculated incorrectly and does not provide sufficient reliability. In addition, if the selected volunteers have a high rate of volunteer autonomy failures, a task may not be completed even though it is replicated continuously. For example, Li and Mascagni [12], Kondo et al. [13], and Sarmenta [18] do not consider volunteer autonomy failures when calculating the degree of redundancy or selecting replicas.

To solve these problems, we propose a new dynamic computational replication mechanism based on volunteer groups. The groups are constructed according to volunteer availability, volunteer credibility, and volunteering service time. The proposed mechanism applies different replication algorithms to volunteer groups depending on their properties.

III. GROUP-BASED DYNAMIC COMPUTATIONAL REPLICATION MECHANISM

Our dynamic computational replication mechanism specifies both how to calculate the degree of redundancy and how to select replicas on the basis of volunteer groups. In this section, we first describe how to construct volunteer groups from volunteer properties such as volunteer availability, volunteer credibility, and volunteering service time. Second, we illustrate how to calculate the degree of redundancy. Third, we present how to select replicas. Finally, we explain how to distribute tasks to replicas.

A. How to construct volunteer groups

A volunteer group is a set of volunteers with similar properties, such as volunteer availability, volunteer credibility, and volunteering service time. Volunteer groups are constructed so that different replication algorithms can be applied dynamically and simultaneously. When volunteers are classified, their CPU and memory capacities matter. However, volunteering time, availability, and credibility matter more, because a peer-to-peer grid computing system is based on dynamic desktop computers and these latter properties vary widely [3, 4, 19, 24]. The task completion time is therefore strongly affected by the latter factors, and in this paper we focus on them.

Traditional availability in distributed systems relates only to crash and link failures [20, 21]. In a peer-to-peer grid computing environment, however, volunteer autonomy failures delay and block execution more frequently than crash and link failures do. Availability must therefore reflect volunteer autonomy failures, and to this end we define a new notion of volunteer availability. Likewise, traditional credibility is calculated according to whether results are correct or not [18]. In a peer-to-peer grid computing environment, however, volunteer autonomy failures prevent some volunteers from returning results at all. This should be reflected in the definition of volunteer credibility. Volunteering time, volunteer availability, and volunteer credibility are defined as follows.

Definition 1 (Volunteering time): Volunteering time (Υ) is the period during which a volunteer is supposed to donate its resources:

$$\Upsilon = \Upsilon_R + \Upsilon_S$$

Here, the reserved volunteering time (ΥR) is the time a volunteer has reserved for providing computing resources. During ΥR, a volunteer mostly performs public execution and rarely performs private execution. The selfish volunteering time (ΥS), by contrast, is unexpected volunteering time. During ΥS, a volunteer usually performs private execution and only sometimes performs public execution.

Definition 2 (Volunteer availability): Volunteer availability (αv) is the probability that a volunteer is operating correctly and is able to deliver volunteer services during the volunteering time Υ:

$$\alpha_v = \frac{\mathrm{MTTVAF}}{\mathrm{MTTVAF} + \mathrm{MTTR}}$$

Here, MTTVAF denotes the "mean time to volunteer autonomy failures", that is, the average time before a volunteer autonomy failure occurs. MTTR denotes the "mean time to rejoin", that is, the mean duration of a volunteer autonomy failure. Thus αv reflects the degree of volunteer autonomy failures, whereas traditional availability in distributed systems is mainly related to crash failures.

Definition 3 (Volunteer credibility): Volunteer credibility (Cv) is the probability that a result produced by a volunteer is correct:

$$C_v = \frac{CR}{ER + CR + IR}$$

```
// VA: A class, VB: B class, VC: C class, VD: D class
// VGA′: A′ class, VGB′: B′ class, VGC′: C′ class, VGD′: D′ class

// To classify the registered volunteers into A, B, C, D classes, respectively
ClassifyVolunteers(V);

// To construct volunteer groups
for each Vi in V do                  // Vi: one of the classified volunteers
    if (Vi ∈ VA) then
        if (Vi.Cv ≥ ϑ) then          // Vi.Cv: Cv of Vi
            Vi → VGA′;               // →: assign
        else
            Vi → VGC′;
        fi;
    else if (Vi ∈ VB) then
        if (Vi.Cv ≥ ϑ) then
            Vi → VGB′;
        else
            Vi → VGD′;
        fi;
    else if (Vi ∈ VC) then
        Vi → VGC′;
    else
        Vi → VGD′;
    fi;
od;
```

Fig. 1. Algorithm of volunteer group construction
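A runnable sketch of Fig. 1 is given below. The paper states that classes A to D depend on Θ and αv (Fig. 2 (a)) but does not give the availability cut-off. The sketch therefore assumes that a volunteer has "enough time" when Θ ≥ ∆ and "high availability" when αv ≥ alpha_min, where alpha_min is a hypothetical parameter. The names Volunteer, classify_volunteer, and construct_groups are illustrative, and the sample values (∆ = 16 minutes, ϑ = 0.98) are borrowed from the evaluation in Section IV.

```python
from dataclasses import dataclass


@dataclass
class Volunteer:
    name: str
    availability: float   # alpha_v
    service_time: float   # Theta, in minutes
    credibility: float    # C_v


def classify_volunteer(v: Volunteer, delta: float, alpha_min: float) -> str:
    """Assign class A, B, C, or D (Fig. 2 (a)).

    Assumption: Theta is compared with the task time delta, and alpha_v with
    a hypothetical availability threshold alpha_min.
    """
    enough_time = v.service_time >= delta
    available = v.availability >= alpha_min
    if enough_time and available:
        return "A"
    if available:
        return "B"  # high availability, but short service time
    if enough_time:
        return "C"  # enough service time, but low availability
    return "D"


def construct_groups(volunteers, delta, alpha_min, theta):
    """Fig. 1: split classes A and B by the credibility threshold theta."""
    groups = {"A'": [], "B'": [], "C'": [], "D'": []}
    for v in volunteers:
        cls = classify_volunteer(v, delta, alpha_min)
        if cls == "A":
            target = "A'" if v.credibility >= theta else "C'"
        elif cls == "B":
            target = "B'" if v.credibility >= theta else "D'"
        elif cls == "C":
            target = "C'"
        else:
            target = "D'"
        groups[target].append(v.name)
    return groups


# Hypothetical volunteers; alpha_min = 0.8 is an assumed cut-off.
pool = [
    Volunteer("v1", 0.90, 40, 0.99),
    Volunteer("v2", 0.90, 40, 0.90),
    Volunteer("v3", 0.90, 10, 0.99),
    Volunteer("v4", 0.70, 40, 0.99),
    Volunteer("v5", 0.70, 10, 0.99),
]
groups = construct_groups(pool, delta=16, alpha_min=0.8, theta=0.98)
print(groups)  # {"A'": ['v1'], "B'": ['v3'], "C'": ['v2', 'v4'], "D'": ['v5']}
```

Note that v2 falls into class A but lands in C′ because its credibility is below ϑ, exactly as the first branch of Fig. 1 prescribes.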

Here, ER is the number of erroneous results, CR the number of correct results, and IR the number of incomplete results, so ER + CR + IR is the total number of tasks the volunteer has executed. In the case of majority voting, when the volunteers within a voting group reach agreement, their Cv increases.

Volunteering time does not reflect volunteer autonomy failures: if a volunteer suffers from volunteer autonomy failures, the time it actually spends executing its task decreases. We therefore define volunteering service time as follows.

Definition 4 (Volunteering service time): Volunteering service time (Θ) is the expected time during which a volunteer performs public execution within Υ:

$$\Theta = \Upsilon \times \alpha_v$$
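Definitions 1 to 4 reduce to a few lines of arithmetic, sketched below. The sketch assumes that the monitoring layer already supplies the reserved and selfish times, MTTVAF, MTTR, and result counts. The function names and the sample numbers are illustrative and are not taken from the paper's data.

```python
def volunteering_time(reserved: float, selfish: float) -> float:
    """Definition 1: Upsilon = Upsilon_R + Upsilon_S."""
    return reserved + selfish


def volunteer_availability(mttvaf: float, mttr: float) -> float:
    """Definition 2: alpha_v = MTTVAF / (MTTVAF + MTTR)."""
    return mttvaf / (mttvaf + mttr)


def volunteer_credibility(correct: int, erroneous: int, incomplete: int) -> float:
    """Definition 3: C_v = CR / (ER + CR + IR); incomplete results lower C_v."""
    total = correct + erroneous + incomplete
    if total == 0:
        raise ValueError("no executed tasks, so C_v is undefined")
    return correct / total


def volunteering_service_time(upsilon: float, alpha_v: float) -> float:
    """Definition 4: Theta = Upsilon * alpha_v."""
    return upsilon * alpha_v


# Illustrative inputs (minutes and counts), not measurements from the paper.
upsilon = volunteering_time(reserved=40, selfish=20)                # 60
alpha_v = volunteer_availability(mttvaf=20, mttr=5)                 # 0.8
c_v = volunteer_credibility(correct=90, erroneous=5, incomplete=5)  # 0.9
theta = volunteering_service_time(upsilon, alpha_v)                 # about 48
```

The example shows why Θ rather than Υ is used for classification. A volunteer that nominally donates 60 minutes but is unavailable a fifth of the time effectively offers only about 48 minutes of public execution.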

When volunteers are classified, Θ is more appropriate than Υ because Θ represents the time a volunteer actually spends executing tasks in the presence of volunteer autonomy failures. Volunteer groups are constructed by the algorithm shown in Fig. 1. First, the registered volunteers are classified into classes A, B, C, and D according to volunteering service time and volunteer availability, as shown in Fig. 2 (a). The classified volunteers are then assigned to volunteer groups according to volunteer credibility. The algorithm thus yields four volunteer groups (A′, B′, C′, and D′), as shown in Fig. 2 (b). Here, ∆ is the expected computation time of a task, and ϑ is the desired credibility threshold.

The A′ volunteer group has high Cv, high Θ, and high αv, which are sufficient to execute tasks reliably, so it is highly likely to produce correct results. The B′ volunteer group has high Cv and high αv but low Θ. It is likely to produce correct results, but it may be unable to complete its tasks for lack of computation time. In addition, volunteer autonomy failures occur frequently in the middle of execution. The C′ volunteer group has high Θ but low Cv and low αv. It has enough time to execute tasks, but its results may be incorrect. To strengthen credibility, the C′ group therefore requires more spot-checking or more redundancy for voting than the A′ or B′ group. The D′ volunteer group has low Cv, low Θ, and low αv. It does not have enough time to execute tasks, and it is unlikely to produce correct results. Moreover, volunteer autonomy failures occur frequently in the middle of execution. Tasks are therefore not allocated to the D′ volunteer group, because the management cost is too high and its results are likely to be incorrect.

Fig. 2. The classification of volunteers and volunteer groups: (a) classes A (high quality), B (intermediate quality), C (intermediate quality), and D (low quality), by volunteer availability αv and by volunteering service time Θ relative to ∆; (b) volunteer groups A′ (high quality), B′ (high-intermediate quality), C′ (low-intermediate quality), and D′ (low quality), by volunteer credibility Cv and by Θ relative to ∆.

B. How to calculate the degree of redundancy

Our dynamic computational replication mechanism calculates the degree of redundancy for each volunteer group. It takes volunteer autonomy failures and volunteer credibility into account simultaneously. In a peer-to-peer grid computing environment, volunteer autonomy failures occur much more frequently than crash and link failures, so they must be considered when calculating the degree of redundancy. However, existing methods (footnote 1) do not consider volunteer autonomy failures. To reflect them, our mechanism makes use of volunteer availability and volunteer autonomy failures, as follows. The degree of redundancy r required for reliability is calculated by Eq. 1. Here, τ denotes the MTTVAF of a volunteer and τ′ denotes the MTTVAF of a volunteer group.

$$\left(1 - e^{-\Delta/\tau'}\right)^{r} \le 1 - \gamma \tag{1}$$

The parameter γ is the reliability threshold, and τ′ is computed as

$$\tau' = \frac{V_1.\tau + V_2.\tau + \cdots + V_n.\tau}{n}$$

Here, n is the total number of volunteers in the volunteer group, and Vn.τ denotes the τ of volunteer Vn. In Eq. 1, the expression $e^{-\Delta/\tau'}$ (footnote 2) represents the reliability of each volunteer group, that is, the probability that a task is completed within ∆. It therefore reflects volunteer autonomy failures. The term $(1 - e^{-\Delta/\tau'})^{r}$ is the probability that all r replicas fail to complete the replicated task. Each volunteer group has a different r. For example, the A′ and C′ volunteer groups have smaller r than the B′ volunteer group.

In a peer-to-peer grid computing environment, some malicious volunteers tamper with the computation and return corrupted results. Peer-to-peer grid computing systems must therefore detect and tolerate erroneous results, which is called result certification. Majority voting and spot-checking have been used for this purpose (footnote 3). In this paper, we focus on majority voting. In particular, we calculate the degree of redundancy for majority voting by using volunteer credibility.

Footnote 1. Ranganathan et al. [15] proposed how to calculate the number of replicas per file in large peer-to-peer communities, and Li and Mascagni [12] proposed how to calculate the number of replicas in a computational grid.

Footnote 2. If the lifetime of a volunteer is exponentially distributed, the reliability of the volunteer is $R(t) = e^{-\lambda' t}$ [19-22], where the parameter λ′ is the rate of volunteer autonomy failures. Because $1/\lambda' = \tau'$, the probability that a task is completed within the interval ∆ is $e^{-\Delta/\tau'}$.

Footnote 3. Sarmenta [18] proposed how to calculate the degree of redundancy for majority voting. However, he does not consider volunteer credibility or volunteer autonomy failures, and he does not provide a replication mechanism on a per-group basis.
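Eq. 1 can be solved for the smallest integer r by a direct search. This avoids rounding trouble at the boundary of the closed form r ≥ ln(1 − γ) / ln(1 − e^{−∆/τ′}). The minimal sketch below assumes that the MTTVAF values of a group's volunteers are already known. The function names are hypothetical, and the sample inputs (∆ = 16 minutes, τ′ between 5 and 50 minutes) are borrowed from the simulation settings in Section IV.

```python
import math


def group_mttvaf(mttvafs):
    """tau': the mean MTTVAF of the volunteers in a volunteer group."""
    if not mttvafs:
        raise ValueError("empty volunteer group")
    return sum(mttvafs) / len(mttvafs)


def redundancy_for_reliability(delta, tau_group, gamma, r_max=10_000):
    """Smallest r with (1 - exp(-delta / tau'))**r <= 1 - gamma (Eq. 1)."""
    if not 0 < gamma < 1:
        raise ValueError("gamma must lie in (0, 1)")
    # Probability that a single replica does not finish within delta.
    p_fail = 1 - math.exp(-delta / tau_group)
    for r in range(1, r_max + 1):
        if p_fail ** r <= 1 - gamma:
            return r
    raise ValueError("no r up to r_max satisfies Eq. 1")


tau = group_mttvaf([40, 50, 60])                   # 50.0 minutes
print(redundancy_for_reliability(16, tau, 0.5))    # 1
print(redundancy_for_reliability(16, tau, 0.99))   # 4
print(redundancy_for_reliability(16, 5, 0.5))      # 17
```

With τ′ = 5 minutes, even the modest threshold γ = 0.5 already needs 17 replicas in this sketch. This illustrates how costly replication becomes for a group with frequent volunteer autonomy failures.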

Fig. 3. Algorithm for calculating the degree of redundancy. Each volunteer group is processed in turn. If majority voting is considered, r is calculated by Eq. 2 and checked against Eq. 1 (if it does not meet Eq. 1, r is calculated by Eq. 1); otherwise, r is calculated by Eq. 1. The resulting r is assigned to the volunteer group, and the procedure repeats until no volunteer groups remain.

The degree of redundancy r for majority voting is calculated dynamically through Eq. 2, where r = 2k + 1 (footnote 4):

$$\sum_{i=k+1}^{2k+1} \binom{2k+1}{i} \left(1 - C_v'\right)^{i} \left(C_v'\right)^{2k+1-i} \le 1 - \vartheta \tag{2}$$

The parameter ϑ is the desired credibility threshold that a task must achieve. The parameter Cv′ is the average credibility of the volunteer group, that is, the probability that a volunteer within the group produces a correct result. Hence 1 − Cv′ is the probability of an erroneous result:

$$C_v' = \frac{V_1.C_v + V_2.C_v + \cdots + V_n.C_v}{n}$$

In Eq. 2, the left-hand side is the error probability of majority voting in each volunteer group, that is, the probability that at least k + 1 of the 2k + 1 replicas return erroneous results. It is bounded by the following expression [23]:

$$\frac{\left[4C_v'\left(1 - C_v'\right)\right]^{k+1}}{2\left(2C_v' - 1\right)\sqrt{\pi k}}$$
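Eq. 2 and the flowchart of Fig. 3 can be combined into one per-group routine, sketched below. Two points are assumptions. First, the bound from [23] is only used as a sanity check. Second, Fig. 3 does not say how a reliability-driven r is reconciled with the odd size that voting needs. When the voting size from Eq. 2 does not meet Eq. 1, the sketch therefore keeps increasing k until the odd size 2k + 1 satisfies both inequalities. This is one reading of the "Calculate r by Eq. 1" branch, not necessarily the authors' rule. Function names are hypothetical. The sample credibilities 0.87 and 0.80 are values that also appear in Table I, and the thresholds ϑ = 0.98 and γ = 0.5 are the ones used in Section IV.

```python
import math


def majority_error(c, k):
    """Left-hand side of Eq. 2: P(at least k+1 of 2k+1 replicas err), error rate 1 - c."""
    n = 2 * k + 1
    return sum(math.comb(n, i) * (1 - c) ** i * c ** (n - i)
               for i in range(k + 1, n + 1))


def majority_error_bound(c, k):
    """Upper bound quoted from [23]; requires c > 0.5 and k >= 1."""
    return (4 * c * (1 - c)) ** (k + 1) / (2 * (2 * c - 1) * math.sqrt(math.pi * k))


def meets_eq1(r, delta, tau_group, gamma):
    """Eq. 1: (1 - exp(-delta / tau'))**r <= 1 - gamma."""
    return (1 - math.exp(-delta / tau_group)) ** r <= 1 - gamma


def group_redundancy(c_group, theta, delta, tau_group, gamma, voting=True, k_max=500):
    """Per-group r following Fig. 3 (the reconciliation rule is an assumption)."""
    if not voting:
        # Majority voting not considered: smallest r that satisfies Eq. 1 alone.
        for r in range(1, 2 * k_max + 2):
            if meets_eq1(r, delta, tau_group, gamma):
                return r
        raise ValueError("Eq. 1 is not satisfiable within the search limit")
    if c_group <= 0.5:
        raise ValueError("majority voting cannot converge when C_v' <= 0.5")
    for k in range(k_max + 1):
        r = 2 * k + 1
        if (majority_error(c_group, k) <= 1 - theta
                and meets_eq1(r, delta, tau_group, gamma)):
            return r
    raise ValueError("no odd r within the search limit satisfies Eqs. 1 and 2")


print(group_redundancy(0.87, 0.98, delta=16, tau_group=50, gamma=0.5))  # 5
print(group_redundancy(0.80, 0.98, delta=16, tau_group=50, gamma=0.5))  # 9
print(group_redundancy(0.87, 0.98, delta=16, tau_group=5, gamma=0.5))   # 17
```

In the first two calls Eq. 2 is the binding constraint, so lower credibility raises r from 5 to 9. In the third call, frequent volunteer autonomy failures (τ′ = 5 minutes) make Eq. 1 binding instead.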

Each volunteer group has a different degree of redundancy. For example, the A′ and B′ volunteer groups have smaller r than the C′ volunteer group. To account for volunteer autonomy failures and volunteer credibility simultaneously when calculating the degree of redundancy, we propose the algorithm shown in Fig. 3.

Footnote 4. If k malicious volunteers return erroneous results (i.e., the volunteers exhibit Byzantine failures), a minimum of 2k + 1 volunteers is needed to achieve k-fault tolerance [19, 20].

C. How to select replicas

After deciding the degree of redundancy for each volunteer group, our mechanism selects replicas (i.e., volunteers that execute the replicated task) according to that degree. Each volunteer group therefore contains many replication groups, where a replication group is the set of replicas for one task. To form the replication group for a task, the volunteers within each volunteer group are sorted by volunteer availability αv, volunteering service time Θ, and volunteer credibility Cv. Specifically, the A′ volunteer group is sorted by αv and then by Θ. Cv does not matter there, because it already meets or exceeds the desired credibility threshold ϑ in the A′ volunteer group. The B′ volunteer group is sorted by Θ and then by αv; Θ comes first because volunteering service time is scarce in the B′ volunteer group. The C′ volunteer group is sorted by Cv and then by αv because of its low credibility. After each volunteer group is sorted, the replication groups are constructed according to r. When a volunteer fails, it is replaced by a new one. In our replication mechanism, the replacement has higher or equal quality, so that credibility and availability remain at least as high as before. For example, a failed volunteer in the C′ volunteer group is replaced by a new volunteer from the A′ or C′ volunteer group.

D. How to distribute tasks to replicas

A task can be distributed to a replication group in two ways, parallel distribution or sequential distribution, as shown in Fig. 4. In Fig. 4, the replication group consists of volunteers V0, V1, and V2 (i.e., r = 3). With parallel distribution, a task (Ti) is distributed to all members at the same time and executed simultaneously, as shown in Fig. 4 (a). With sequential distribution, a task (Ti) is distributed and executed sequentially, as shown in Fig. 4 (b).

For the A′ volunteer group, sequential distribution is more appropriate than parallel distribution because it allows more tasks to be performed. The A′ volunteer group is highly likely to produce correct results, so it can carry out its tasks reliably without failures (especially volunteer autonomy failures). For example, in Fig. 4 (b), if V0 completes task Ti and its reliability requirement is satisfied, Ti need not be executed at V2. Likewise, with majority voting, if the first two results of Ti+2, generated at V1 and V2, agree, Ti+2 need not be executed at V0 (dotted line in Fig. 4 (b)), because a majority (2 out of 3) has already been reached. The volunteers can therefore execute other tasks as soon as a majority is reached, instead of the tasks indicated by the solid line in Fig. 4 (b).

For the B′ volunteer group, sequential distribution is also more appropriate than parallel distribution, as with the A′ volunteer group. However, because the B′ volunteer group has low Θ, its volunteers may be unable to complete their tasks for lack of computation time. The manager of the B′ volunteer group must therefore provide task migration so that tasks can be executed continuously. Task migration makes the outcome depend on the new volunteer to which a task is migrated. If a malicious volunteer is wrongly selected as the new volunteer, it ruins the correct result generated by the former volunteer. The new volunteer must therefore be chosen from the B′ or A′ volunteer group, not from the C′ or D′ volunteer group.

For the C′ volunteer group, parallel distribution is more appropriate than sequential distribution. The C′ volunteer group has enough time to execute tasks, but its low credibility makes incorrect results likely. Moreover, its low αv exposes each volunteer to volunteer autonomy failures. With majority voting, sequential distribution delays the voting procedure, so result certification takes longer and incurs higher overhead. With parallel distribution, the overhead and time are relatively smaller because the voting procedure for each task completes within a single step, as shown in Fig. 4 (a).

Fig. 4. Parallel and sequential distribution for a replication group of volunteers V0, V1, and V2 (r = 3) and tasks Ti, Ti+1, and Ti+2: (a) parallel distribution, in which every volunteer executes Ti, Ti+1, and Ti+2 in the same order; (b) sequential distribution, in which V0 executes Ti, Ti+1, Ti+2, V1 executes Ti+1, Ti+2, Ti, and V2 executes Ti+2, Ti, Ti+1.

IV. EVALUATION

TABLE I
SIMULATION ENVIRONMENT (P. = population; Θ in minutes)

Case    Group   P.            αv     Θ     Cv
Case 1  A′      103 (51.5%)   0.86   42    0.98
Case 1  B′      18 (9%)       0.87   17    0.98
Case 1  C′      64 (32%)      0.79   39    0.87
Case 1  D′      15 (7.5%)     0.86   17    0.89
Case 1  Total   200           0.84   37    0.94
Case 2  A′      72 (36%)      0.84   36    0.98
Case 2  B′      24 (12%)      0.80   16    0.98
Case 2  C′      74 (37%)      0.78   32    0.87
Case 2  D′      30 (15%)      0.75   16    0.80
Case 2  Total   200           0.80   29    0.91
Case 3  A′      33 (16.5%)    0.83   35    0.98
Case 3  B′      47 (23.5%)    0.81   13    0.98
Case 3  C′      60 (30%)      0.78   36    0.87
Case 3  D′      60 (30%)      0.74   13    0.86
Case 3  Total   200           0.79   23    0.91

Fig. 5. The average degree of redundancy without majority voting

We compare our group-based dynamic computational replication mechanism with existing replication mechanisms [12, 13, 18]. The evaluation focuses on how much performance improvement is achieved depending on whether volunteer groups are considered. To this end, volunteer groups with different volunteering service times, volunteer availabilities, and volunteer credibilities are set up intentionally, as described in Table I. We evaluate the proposed mechanism by simulation based on the Korea@Home project [24-27]. Korea@Home currently has about 7,500 volunteers, and its average daily performance is about 300 Gflops. In Korea@Home, volunteers can take part in one of three applications: global risk management, new drug candidate discovery, and climate prediction. The 200 simulated volunteers have various volunteer availabilities, volunteer credibilities, and volunteering service times, as shown in Table I, where P. (population) denotes the number of volunteers.
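The completed-task comparison in this section depends on the distribution policies of Section III-D. The sketch below replays Fig. 4 for a single replication group under a deliberately simplified model. Every volunteer finishes one task per step, assignments are fixed at the start of each step, and a task is skipped once a majority of matching results exists. The result tables and the function name distribute are hypothetical inputs chosen for illustration, and volunteer autonomy failures are not modelled.

```python
from collections import Counter


def distribute(results, sequential):
    """Replay parallel or sequential distribution (Fig. 4) with early majority.

    results[j][t] is the value volunteer j would return for task t.
    Returns (decided, executions): decided maps each task that reached a
    majority to its agreed value; executions counts the task runs performed.
    """
    r = len(results)            # replication group size, r = 2k + 1
    m = len(results[0])         # number of tasks handled by the group
    need = r // 2 + 1           # matching results needed for a majority
    votes = [Counter() for _ in range(m)]
    decided = {}
    executions = 0
    for step in range(m):
        # Assignments are fixed at the start of the step and run concurrently.
        batch = []
        for j in range(r):
            # Parallel: everyone runs the same task. Sequential: rotated order,
            # so V0 runs T0, T1, T2; V1 runs T1, T2, T0; V2 runs T2, T0, T1.
            t = (step + j) % m if sequential else step
            if t not in decided:
                batch.append((j, t))
        for j, t in batch:
            votes[t][results[j][t]] += 1
            executions += 1
        for t in range(m):
            if t not in decided and votes[t]:
                value, count = votes[t].most_common(1)[0]
                if count >= need:
                    decided[t] = value
    return dict(sorted(decided.items())), executions


honest = [[10, 20, 30], [10, 20, 30], [10, 20, 30]]
faulty = [[99, 20, 30], [10, 20, 30], [10, 20, 30]]   # V0 errs on T0
print(distribute(honest, sequential=False))  # ({0: 10, 1: 20, 2: 30}, 9)
print(distribute(honest, sequential=True))   # ({0: 10, 1: 20, 2: 30}, 6)
print(distribute(faulty, sequential=True))   # ({0: 10, 1: 20, 2: 30}, 7)
```

In this toy model, a reliable group (as assumed for A′) reaches all three majorities with six executions under sequential distribution instead of nine. This is the saving the paper attributes to sequential distribution. Parallel distribution, in turn, settles each task's vote within one step, which is why the paper prefers it for the low-credibility C′ group.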
We assume that the range of M T T V AF is 1/0.2 ∼ 1/0.02 minutes, and M T T R is 3 ∼ 10 minutes. A task consumes 16 minutes of execution time on a dedicated Pentium 1.4 GHz. In Table 1, the volunteering service time Θ is decreasing from Case 1 to

Fig. 6.

Fig. 7.

The average number of redundancy with majority voting

The number of completed tasks with majority voting

Case 3. Case 1 differs from Case 2 in volunteer availability and volunteer credibility, and Case 2 differs from Case 3 in volunteering service time. Cases 1 and 2 have more A' and C' volunteer groups than B' and D' volunteer groups, whereas Case 3 has more D' volunteer groups than A' and B' volunteer groups. Each simulation was repeated 10 times per case.

Fig. 5 shows the average redundancy r without result certification (i.e., without majority voting); that is, r is calculated with Eq. 1. Our proposed mechanism, which calculates r separately for each volunteer group, yields a lower r than mechanisms that calculate r without considering volunteer groups. As the volunteering service time Θ decreases (i.e., from Case 1 to Case 3), r increases, as shown in Figs. 5(a) to 5(c). Similarly, r increases as the reliability threshold γ increases. The gap between the two lines is greater in Case 3 than in Case 2, both because Case 3 has fewer volunteers in the A' and C' groups than Case 2 and because its Θ is smaller.

Fig. 6 shows the average redundancy r when result certification (i.e., majority voting) is applied. Here r is calculated by the algorithm shown in Fig. 3, with the credibility threshold ϑ set to 0.98 and γ set to 0.5. In this case, r is affected by the volunteer credibility Cv. First, our proposed mechanism yields a lower r than mechanisms that calculate r without volunteer groups. Second, r is larger in Cases 2 and 3 than in Case 1, because Case 1 has a higher Cv than Cases 2 and 3. Finally, the gap between the two bars is greater in Case 3 than in Case 2, both because Case 3 has fewer volunteers in the A' and C' groups than Case 2 and because its Cv is smaller.

Fig. 7 shows the number of completed tasks when result certification is applied, with ϑ and γ set as in Fig. 6. Our adaptive replication mechanism calculates r for each volunteer group and then distributes tasks to the volunteer groups using two distribution approaches (i.e., parallel distribution and sequential distribution). As a result, it completes more tasks than a mechanism that uses neither volunteer groups nor these distribution approaches. Because Case 1 has higher αv, Θ, and Cv than Cases 2 and 3, it completes more tasks than either of them.

V. CONCLUSION
In this paper, we proposed a group-based dynamic computational replication mechanism that adapts to a dynamic peer-to-peer grid computing environment. Our proposed mechanism simultaneously applies different replication algorithms to different volunteer groups according to their properties. To this end, we first specified volunteer autonomy failures, volunteer availability, volunteer credibility, and volunteering service time, which reflect the unstable, untrusted, and dynamic properties of peer-to-peer grid computing. Second, we proposed how to construct volunteer groups according to these properties. Third, we proposed methods to calculate the degree of redundancy for each volunteer group so as to meet the required reliability and credibility thresholds. Our proposed mechanism dynamically adjusts the degree of redundancy and selects replicas on the basis of the properties of each volunteer group. Fourth, we proposed how to construct replication groups within each volunteer group according to its properties. Fifth, we proposed two methods to distribute tasks to replication groups: the parallel and sequential distribution approaches. Finally, we evaluated our proposed mechanism through simulations based on Korea@Home [24-27]. Our simulation results showed that the proposed mechanism reduces the degree of redundancy and therefore completes more tasks.

ACKNOWLEDGMENT
This work was supported by the Korea Institute of Science and Technology Information (KISTI) as part of the Korea@Home project.

REFERENCES
[1] SETI@home, http://setiathome.ssl.berkeley.edu
[2] Distributed.net, http://distributed.net
[3] D. S. Milojicic, V. Kalogeraki, R. Lukose, K. Nagaraja, J. Pruyne, B. Richard, S. Rollins, and Z. Xu, "Peer-to-Peer Computing," HP Laboratories Palo Alto, HPL-2002-57, March 2002.
[4] I. Foster and A. Iamnitchi, "On Death, Taxes, and the Convergence of Peer-to-Peer and Grid Computing," 2nd International Workshop on Peer-to-Peer Systems (IPTPS'03), February 2003.
[5] F. Berman, G. C. Fox, and A. J. G. Hey, "Grid Computing: Making the Global Infrastructure a Reality," Wiley, 2003.
[6] L. F. G. Sarmenta and S. Hirano, "Bayanihan: Building and Studying Volunteer Computing Systems Using Java," Future Generation Computer Systems, Special Issue on Metacomputing, Vol. 15, No. 5/6, 1999.
[7] G. Fedak, C. Germain, V. Neri, and F. Cappello, "XtremWeb: A Generic Global Computing System," 1st IEEE Int. Symposium on Cluster Computing and the Grid: Workshop on Global Computing on Personal Devices, pp. 582-587, May 2001.
[8] M. O. Neary, S. P. Brydon, P. Kmiec, S. Rollins, and P. Cappello, "Javelin++: Scalability Issues in Global Computing," Concurrency: Practice and Experience, pp. 727-735, December 2000.
[9] D. P. Anderson, "BOINC: A System for Public-Resource Computing and Storage," 5th IEEE/ACM Int. Workshop on Grid Computing, pp. 4-10, Nov. 2004.
[10] A. Chien, B. Calder, S. Elbert, and K. Bhatia, "Entropia: Architecture and Performance of an Enterprise Desktop Grid System," Journal of Parallel and Distributed Computing, Vol. 63, Issue 5, pp. 597-610, 2003.
[11] D. Thain, T. Tannenbaum, and M. Livny, "Distributed Computing in Practice: The Condor Experience," Concurrency and Computation: Practice and Experience, Vol. 17, Issue 2-4, pp. 323-356, Feb. 2005.
[12] Y. Li and M. Mascagni, "Improving Performance via Computational Replication on a Large-Scale Computational Grid," 3rd IEEE/ACM Int. Symposium on Cluster Computing and the Grid (CCGRID 2003), pp. 442-448, May 2003.
[13] D. Kondo, A. A. Chien, and H. Casanova, "Resource Management for Rapid Application Turnaround on Enterprise Desktop Grids," Supercomputing 2004 (SC 2004), November 2004.
[14] K. Ranganathan and I. Foster, "Identifying Dynamic Replication Strategies for a High-Performance Data Grid," 2nd Int. Workshop on Grid Computing (GRID 2001), pp. 75-86, November 2001.
[15] K. Ranganathan, A. Iamnitchi, and I. Foster, "Improving Data Availability through Dynamic Model-Driven Replication in Large Peer-to-Peer Communities," 2nd IEEE/ACM Int. Symposium on Cluster Computing and the Grid (CCGRID 2002), pp. 346-351, May 2002.
[16] E. Cohen and S. Shenker, "Replication Strategies in Unstructured Peer-to-Peer Networks," Annual Conf. of the Special Interest Group on Data Communication (SIGCOMM'02), pp. 177-190, October 2002.
[17] F. M. Cuenca-Acuna, R. P. Martin, and T. D. Nguyen, "Autonomous Replication for High Availability in Unstructured P2P Systems," 22nd International Symposium on Reliable Distributed Systems (SRDS 2003), pp. 99-108, October 2003.
[18] L. F. G. Sarmenta, "Sabotage-Tolerance Mechanisms for Volunteer Computing Systems," Future Generation Computer Systems, 18(4), 2002.
[19] D. Kondo, M. Taufer, J. Karanicolas, C. L. Brooks, H. Casanova, and A. Chien, "Characterizing and Evaluating Desktop Grids: An Empirical Study," 18th Int. Parallel and Distributed Processing Symposium (IPDPS'04), April 2004.
[20] P. Jalote, "Fault Tolerance in Distributed Systems," Prentice-Hall, 1994.
[21] A. S. Tanenbaum and M. van Steen, "Distributed Systems: Principles and Paradigms," Prentice Hall, 2002.
[22] K. S. Trivedi, "Probability and Statistics with Reliability, Queuing and Computer Science Applications," Second Edition, Wiley, 2002.
[23] Yu. A. Zuev, "On the Estimation of Efficiency of Voting Procedures," Theory of Probability & Its Applications, Vol. 42, No. 1, pp. 73-81, 1998.
[24] Korea@Home, http://www.koreaathome.org/eng/
[25] S. Choi, M. Baik, C. Hwang, J. Gil, and H. Yu, "Mobile Agent based Adaptive Scheduling Mechanism in Peer to Peer Grid Computing," International Conference on Computational Science and its Applications (ICCSA 2005), LNCS 3483, pp. 936-947, May 2005.
[26] S. Choi, M. Baik, C. Hwang, J. Gil, and H. Yu, "Volunteer Availability based Fault Tolerant Scheduling Mechanism in Desktop Grid Computing Environment," 3rd IEEE International Symposium on Network Computing and Applications, Workshop on Adaptive Grid Computing, pp. 476-483, August 2004.
[27] M. Baik, S. Choi, C. Hwang, J. Gil, and H. Yu, "Adaptive Group Computation Approach in the Peer-to-Peer Grid Computing Systems," Workshop on Adaptive Grid Middleware, September 2004.