Task Reallocation for Maximal Reliability in

Task Reallocation for Maximal Reliability in Distributed Computing Systems with Uncertain Topologies and Non-Markovian Delays

Jorge E. Pezoa and Majeed M. Hayat∗
Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM 87131, USA
E-Mail: {hayat,jpezoa}@ece.unm.edu
∗ M. M. Hayat is also with Center for High Technology Materials, University of New Mexico, Albuquerque, NM, USA

Abstract—The ability to model and optimize reliability is central in designing survivable distributed computing systems (DCSs) where servers are prone to fail permanently. In this paper the service reliability of a DCS in uncertain topologies is analytically characterized by using a novel regeneration-based probabilistic analysis. The analysis takes into account the stochastic failure times of servers, the heterogeneity and randomness of both service times and communication delays, as well as arbitrary task-reallocation policies. Auxiliary age variables are introduced in the analysis to capture the memory associated with the non-Markovian (non-exponential) communication and service random times, thereby enabling the recursive analytical characterization of reliability. Implications of the non-exponential times on reliability are studied, and the results are compared to those obtained using a Markovian formulation; in particular, the effect of increasing the mean communication times is investigated. The model is further used to solve the optimization problem of task-reallocation for maximal reliability, and the results are compared to those from Monte-Carlo simulations and actual experiments conducted on a small-scale DCS over the Internet.

Index Terms—reliability, regeneration time, distributed computing, task reallocation, load balancing, communication delay, node failure, uncertain topology

I. INTRODUCTION

A distributed computing system (DCS) is a distributed-memory, multi-processor computing environment that provides its users with the means to serve large, computationally intensive workloads. Unlike parallel computing environments, server nodes are geographically dispersed and present heterogeneous computing capabilities. In addition, the ultrafast local-area networks used in parallel computers are replaced in DCSs by wide area networks (WANs), whose links exhibit both low bandwidth and significant latency in the information exchange. Dynamic task re-allocation (DTR) is a run-time control action that dictates when and how tasks must be exchanged among the servers. The goal of a DTR policy is to use the resources of the system efficiently. The main challenge that any decentralized DTR policy faces is the risk of taking DTR decisions based upon dated and/or incomplete information about the state of servers and links. The lack of a central coordinator in a DCS forces the servers to exchange information about their state. Such information is typically the number of tasks queued at each server. Moreover, upon the failure of a server, failure-notice (FN) messages have to be broadcast to inform the working nodes about the failed state of that server. However, the unavoidable speed and bandwidth limitations imposed by the WAN connecting the servers introduce uncertainty in the task re-allocation (TR) process.

DTR is typically employed as the mechanism to achieve an even distribution of tasks among the servers [1], [2]. In addition, DTR can be used to improve the service reliability of a DCS, that is, to increase the probability of processing a workload before all servers fail permanently. With this aim, DTR policies maximize the reliability while simultaneously reducing the response time of the entire workload [3]–[7]. Existing analytical models for the service reliability are based on heuristics, genetic algorithms, optimization and probability theory [8]–[10]. We argue that in uncertain communication environments, like those imposed by a WAN, the task-reallocation problem has to be tackled in a probabilistic framework.

In our earlier work [11], we studied the problem of devising decentralized DTR policies for maximizing the service reliability of a DCS assuming a Markovian queueing network. In this paper we relax the assumption of exponentially distributed random times governing the DCS. To this end, we introduce in our analysis a continuous-time age matrix, which keeps track of the memory of all the non-exponential random times. The age matrix augments the state-space model presented in [11], yielding a hybrid continuous and discrete state-space. This hybrid model is used in conjunction with an age-dependent stochastic regeneration theory to analytically characterize the service reliability of a DCS. The model for the reliability is used to devise optimal DTR policies with maximal service reliability. Next, we follow the algorithm presented in [11] to extend, in a scalable fashion, the optimal policies for two-server DCSs to systems with an arbitrary number of servers.
In our evaluations we have found that the Markovian approach presented in [11] does not provide an accurate approximation for the reliability when the average communication delays are severe. Also, we have experimentally characterized the random times of a testbed DCS and used our model to predict the service reliability of the system. Predictions are compared with both actual experiments and Monte-Carlo (MC) simulations, and the effect of stochastic network delays on the service reliability is investigated.

This paper is organized as follows. In Section II we build the age-dependent regeneration theory for the service reliability of a DCS. In Section III we compare the age-dependent approach with our previous Markovian regenerative solution. DTR policies for maximal reliability are devised and tested using real experiments. Our conclusions are given in Section IV.

II. THEORY

A. Problem statement

Consider a DCS composed of $n$ servers that communicate through a fully connected network. A workload comprising $M$ independent tasks has to be served by the DCS. Assume that at $t = 0$ there are $m_j$ tasks allocated at the $j$th server, with $m_j$ a non-negative integer and $M = \sum_{j=1}^{n} m_j$. Suppose that servers can fail permanently at any random instant and suppose also that at $t = 0$ all the servers are functioning. Assume also that servers frequently exchange information messages; therefore, each server has an estimate of the number of tasks queued at other servers. The service reliability is defined here as the probability that the workload assigned to the DCS can be served before all servers fail.

The question that we ask and answer in this paper is the following: how should the $M$ tasks be re-allocated to $n$ servers such that the service reliability is maximized? More precisely, we must determine the number of tasks, $L_{ij}$, to migrate from the $i$th to the $j$th server in order to maximize the service reliability, with $L_{ij}$ a non-negative integer and $i, j \in \{1, \ldots, n\}$, $i \neq j$. In this paper we have assumed that there is no arrival of external tasks to the DCS. This assumption can be relaxed to accommodate dynamic situations by triggering a local DTR action every time a server receives a new workload [2].

Note that the TR problem tackled here can be solved only if some degree of task redundancy is provided by the DCS. The task redundancy mechanism described here is a distributed version of the centralized task redundancy solution given in [6].
We have assumed that each server has a secondary system that backs up unserved tasks of the main server and handles the failure of the main server. Upon failure of the latter server, the backup system re-allocates the unserved tasks onto the working servers in the DCS, and also handles the reception of tasks in transit to the main node. After receiving such tasks, the backup system re-allocates them back to the DCS. Finally, the following assumptions are imposed on the random times characterizing the DCS:

Assumption A1: For any $j \neq k$, the following times are random and their probability distribution functions (pdfs) are known: (i) $W_{ki}$: the service time of the $i$th task at the $k$th server, with pdf $f_{W_{ki}}(x)$; (ii) $Y_k$: the failure time of the $k$th server, with pdf $f_{Y_k}(x)$; (iii) $X_{jk}$: the transfer time of the FN packet sent from the $j$th to the $k$th server, with pdf $f_{X_{jk}}(x)$; and (iv) $Z_{ik}$: the transfer time of the $i$th group of tasks sent to the $k$th server, with pdf $f_{Z_{ik}}(x)$ (see footnote 1).

Footnote 1: For convenience, the following random times are set to infinity in special cases: (i) the service time at a faulty server or at a server with no task; (ii) the failure time of a faulty server; and (iii) the transfer time of either a FN packet or a group of tasks when no FN packet or group of tasks is in transit.

Assumption A2: All the random variables listed in Assumption A1 are mutually independent.

B. State-space model for the service reliability

Given an initial configuration of the DCS, the dynamics of the system are governed by the random times in Assumption A1. We define the system configuration to be the number of tasks at each server, the number of working and failed servers in the system and the number of tasks queued in the communication network. In this paper we construct a state-space representation for the system configurations and formulate the stochastic process modeling the dynamics of the DCS. In an earlier work [11] we presented a discrete state-space model, similar to the one described here, for the service reliability of a DCS in a Markovian setting (i.e., when all the random times are exponentially distributed). For completeness, we review here the germane definitions.

In a Markovian setting, the state of an $n$-server DCS can be described using three fundamental quantities: (i) the $n$-dimensional column vector $\mathbf{M}$ whose $i$th component specifies the number of tasks queued at the $i$th server; (ii) the $n$-by-$n$ binary matrix $\mathbf{F}$ whose $ij$th element describes the functional or dysfunctional state of the $j$th server as perceived by the $i$th server (see footnote 2); and (iii) the network-state matrix $\mathbf{C}$ that specifies the number of tasks in transit over the network to each server. These state matrices, however, are not sufficient for describing a non-Markovian DCS where random times have memory. One of the main contributions of this paper is to develop an age-dependent extension of the state-space model in [11]. This is discussed next.

1) Auxiliary age variables: Let $T$ be a non-negative random variable with pdf $f_T(t)$. Intuitively speaking, if we know that $T > a$, then we can think of the aged version, $T_a$, of $T$ as the "replacement" of $T$ measured from $a$.
More precisely, the age variable, $a$, associated with the random variable $T$, is a non-negative, real-valued variable that defines on the event $A = \{T \geq a\}$ the random variable $T_a = T - a$, termed the aged version of $T$, whose pdf is equal to the conditional pdf of $T$ given that $A$ has occurred. Namely, $f_{T_a}(t; a) = f_{T|A}(t|a)$. (Note that if $T$ is an exponentially distributed random variable, then the pdfs of $T$ and $T_a$ are identical.) Here we exploit this property of the age variable as follows: as soon as a random time $T$ is triggered by some event its associated age variable is set to zero, and as time elapses, the age variable keeps track of the age of $T$ and accordingly adjusts the pdf of $T$ to show the effect of elapsed time on its likelihood. In what follows, we will associate to each random time an aged version of it. This enables us to specify the "age" of the DCS and will further enable us to characterize recursively the service reliability.

Let $a_{M_i}$ and $a_{F_i}$ be age variables associated with the service time of a task at the $i$th server and the failure time of the $i$th server, respectively, with $i = 1, \ldots, n$. Also, let $a_{F_{ij}}$ be the age variable connected to the random transfer time of a FN packet from the $i$th to the $j$th server, with $i, j = 1, \ldots, n$, $i \neq j$. We can arrange all these age variables in the column vector $\mathbf{a}_M$ and the $n$-by-$n$ matrix $\mathbf{a}_F$. The $\mathbf{a}_M$ vector contains the $a_{M_i}$ age variables and the $\mathbf{a}_F$ matrix contains both the $a_{F_i}$ variables (on the diagonal of the matrix) and the $a_{F_{ij}}$ variables (at the off-diagonal positions). Let $a_{C_{j,i}}$ be the age variable associated with the transfer of the $j$th group of tasks to the $i$th server, with $j$ a positive integer. We can also arrange these age variables in matrix form to obtain $\mathbf{a}_C$, whose $ij$th component is the age variable $a_{C_{j,i}}$. We define the system-age as the concatenated matrix $\mathbf{a} \triangleq (\mathbf{a}_M, \mathbf{a}_F, \mathbf{a}_C)$. Further, for a given time $t$ we define the age-dependent system-state matrix as the concatenated matrix $\mathbf{S}(t) \triangleq (\mathbf{M}(t), \mathbf{F}(t), \mathbf{C}(t), \mathbf{a}(t))$, which describes completely the state of an $n$-server DCS. (Note that in a Markovian setting the memoryless property of the exponential distribution makes the system-age matrix unnecessary; therefore, the system-state matrix reduces to $\mathbf{S}(t) = (\mathbf{M}(t), \mathbf{F}(t), \mathbf{C}(t))$ as in [11].) With this we can introduce the stochastic process $\{\mathbf{S}(t), t \geq 0\}$ characterizing the stochastic dynamics of the DCS.

Footnote 2: In [11] $\mathbf{F}$ is the concatenation of $n$ row vectors. For clarity of the presentation, here we have stacked the vectors to form a matrix. A similar stacking was carried out for $\mathbf{C}$.

We now define formally the service reliability of a DCS. Let the workload completion time be defined as the random time taken by the DCS to serve its entire workload when the initial system configuration is as specified by $\mathbf{S}_0 = \mathbf{S}(0)$. More precisely, we define $T(\mathbf{S}_0) \triangleq \inf\{t > 0 : \mathbf{M}(t) = \mathbf{0} \text{ and } \mathbf{C}(t) = \mathbf{0}\}$. Note that by construction, the workload completion time is infinite when all the servers have failed and at least one task remains unserved; and $P\{T(\mathbf{S}_0) = \infty\} > 0$ since servers can fail permanently with non-zero probability. The service reliability is defined as the probability that the system workload can be processed before all servers fail, that is, $R(\mathbf{S}_0) \triangleq P\{T(\mathbf{S}_0) < \infty\}$.
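The aged version $T_a$ defined in Section II-B.1 can be illustrated numerically. The sketch below assumes the usual residual-lifetime form $f_{T_a}(t; a) = f_T(t + a) / P\{T \geq a\}$ for $t \geq 0$, which is our reading of the conditional pdf above. The helper `aged_pdf` and the example parameters (rate 0.5, Pareto scale 1 and shape 3) are hypothetical and not taken from the paper.

```python
import math

def aged_pdf(pdf, survival, a):
    """Return the pdf of T_a = T - a given {T >= a} (assumed residual-lifetime form)."""
    tail = survival(a)
    if tail <= 0.0:
        raise ValueError("P{T >= a} must be positive")
    return lambda t: pdf(t + a) / tail if t >= 0 else 0.0

# Exponential random time with rate lam (illustrative value).
lam = 0.5
def exp_pdf(t):
    return lam * math.exp(-lam * t) if t >= 0 else 0.0
def exp_survival(t):
    return math.exp(-lam * t) if t >= 0 else 1.0

# Pareto (type I) random time with scale xm and shape alpha (illustrative values).
xm, alpha = 1.0, 3.0
def pareto_pdf(t):
    return alpha * xm**alpha / t**(alpha + 1) if t >= xm else 0.0
def pareto_survival(t):
    return (xm / t)**alpha if t >= xm else 1.0

exp_aged = aged_pdf(exp_pdf, exp_survival, a=4.0)
pareto_aged = aged_pdf(pareto_pdf, pareto_survival, a=4.0)

# Memoryless case: aging leaves the exponential pdf unchanged.
print(exp_pdf(1.0), exp_aged(1.0))
# Non-exponential case: the aged Pareto pdf differs from the original one.
print(pareto_pdf(1.0), pareto_aged(1.0))
```

In this sketch the exponential pdf is unaffected by the age, whereas the Pareto pdf changes with it, which is why the age variables must be carried in the state for non-exponential random times.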
Recall that at $t = 0$ an arbitrarily specified DTR policy, $\mathbf{L}$, is executed by all the servers; therefore, the reliability is also a function of $\mathbf{L}$. Note that the service reliability is less than unity since $P\{T(\mathbf{S}_0) = \infty\} > 0$. In Section II-C2, Theorem 1 and Remark 1, we characterize recursively the service reliability of a DCS.

C. Characterization of the service reliability

1) Rationale: The idea of our approach is to define a regeneration event and analyze the stochastic process emerging immediately after the first occurrence of this event. The regeneration event is defined as the first occurrence of either the service of a task at any server, the failure of any server, the reception of a FN packet by any server, or the reception of a group of tasks by any server. The point here is that upon the occurrence of the regeneration event, a fresh copy of the original stochastic process emerges at the regeneration time, albeit with a new initial configuration that transpires from the regeneration event. In a Markovian setting, the memoryless property of the exponential distribution guarantees that the process regenerates itself at any time. However, when the stochastic process is not Markovian, it is necessary to keep track of the memory of all non-exponential distributions in order to yield a regenerative process. Therefore, the auxiliary age variables, $\mathbf{a}$, introduced in the previous section must be included in the description of the process configuration.

Recall the process $\{\mathbf{S}(t), t \geq 0\}$ and suppose that at time $t = 0$ the system configuration is as specified by $\mathbf{S}_0 = (\mathbf{M}_0, \mathbf{F}_0, \mathbf{C}_0, \mathbf{a}_0)$. Formally, we define the age-dependent regeneration time, denoted by $\tau_a$, as the minimum of the following four random variables: the time to the first task service by any server, the time to the first occurrence of failure at any server, the time to the first arrival of a FN packet at any server, or the time to the first arrival of a group of tasks at any server. Mathematically,

$$\tau_a \triangleq \min\Big( \min_k W_{k1},\ \min_k Y_k,\ \min_{j \neq k} X_{jk},\ \min_{i,k} Z_{ik} \Big).$$

The upcoming example describes how the age-dependent regeneration time and system-state matrix yield a recursive characterization for the service reliability. Suppose that the first event occurring in the DCS happens to be the execution of a task at the $i$th server at $t = s$. The occurrence of this event implies that all the random times governing the DCS have aged by $s$ units of time and there is one less task queued at the $i$th server; all the other dynamics remain unchanged. Thus, the occurrence of the event $\{\tau_a = s, \tau_a = W_{i1}\}$ gives birth to a new DCS at $t = s$, represented by $\{\mathbf{S}(t), t \geq s\}$, that is statistically identical to the original process while having a new initial configuration $\mathbf{S}_0' = (\mathbf{M}_0', \mathbf{F}_0', \mathbf{C}_0', \mathbf{a}_0')$ resulting from the regeneration event $\{\tau_a = s, \tau_a = W_{i1}\}$. More precisely, the new initial system configuration is as follows: $\mathbf{M}_0'$ is identical to $\mathbf{M}_0$ but with one unit less at its $i$th element, $\mathbf{F}_0' = \mathbf{F}_0$, $\mathbf{C}_0' = \mathbf{C}_0$, and the new system-age matrix is $\mathbf{a}_0' = \mathbf{a}_0 + s$ with the $i$th component of $\mathbf{a}_M'$ set to zero if at least one task remains queued at the $i$th server.
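The regeneration step described above can be sketched in a few lines. The snippet below is only an illustration under stated assumptions: the residual random times are given as a dictionary keyed by hypothetical event labels such as `"W_21"`, inactive random times are stored as infinity (as in footnote 1), and the bookkeeping of $\mathbf{M}$, $\mathbf{F}$ and $\mathbf{C}$ after the event is omitted.

```python
import math

def regeneration_step(residual_times, ages):
    """Find the regeneration event and age every random time by tau_a.

    residual_times: dict mapping an event label to the time remaining until
                    that event (math.inf when the random time is inactive).
    ages:           dict mapping the same labels to their current age variables.
    Returns (event, s, new_ages), where s plays the role of tau_a.
    """
    event = min(residual_times, key=residual_times.get)
    s = residual_times[event]
    if math.isinf(s):
        raise ValueError("no active random time: the process has stopped")
    # a0' = a0 + s: every age variable advances by the regeneration time ...
    new_ages = {name: age + s for name, age in ages.items()}
    # ... except the one whose random time just fired, which restarts at zero.
    new_ages[event] = 0.0
    return event, s, new_ages

# Hypothetical snapshot of a two-server DCS (all numbers are made up).
residual = {
    "W_11": 3.2, "W_21": 1.5,            # next task service at servers 1 and 2
    "Y_1": 40.0, "Y_2": 12.0,            # failure times of servers 1 and 2
    "X_12": math.inf, "X_21": math.inf,  # no FN packet in transit
    "Z_12": 2.0,                         # first group of tasks in transit to server 2
}
ages = {name: 0.0 for name in residual}
event, s, new_ages = regeneration_step(residual, ages)
print(event, s, new_ages["Y_1"], new_ages[event])
```

Here the service of a task at server 2 occurs first, so every other age advances by 1.5 time units while the age of the service time at server 2 is reset.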
Similar transformations on the initial configuration are observed when the regeneration event is any of the remaining events, namely, the failure of the $i$th server, the arrival of a FN packet from server $j$ to server $k$, or the arrival of the $i$th group of tasks to the $k$th server.

2) Characterization of the service reliability: Before stating Theorem 1, we introduce some useful definitions. Let us define the term $G_{X,a}(s) \triangleq P\{X = \tau_a | \tau_a = s\} f_{\tau_a}(s)$, where $X$ is any of the random times listed in Assumption A1, $f_{\tau_a}(s)$ is the pdf of the age-dependent regeneration time $\tau_a$ and $P\{X = \tau_a | \tau_a = s\}$ is the probability that the regeneration event is $\{\tau_a = X\}$ conditional on the event $\{\tau_a = s\}$. This conditional probability can be computed explicitly, either analytically or numerically, using Assumptions A1 and A2. Let

$$\mathbf{M} = \begin{pmatrix} m_1 \\ m_2 \end{pmatrix}, \quad \mathbf{F} = \begin{pmatrix} f_{11} & f_{12} \\ f_{21} & f_{22} \end{pmatrix}, \quad \text{and} \quad \mathbf{C} = \begin{pmatrix} g_1 & L_{1,1} & L_{2,1} & \ldots & L_{g_1,1} \\ g_2 & L_{1,2} & L_{2,2} & \ldots & L_{g_2,2} \end{pmatrix},$$

with $f_{ij} \in \{0, 1\}$ for $i, j = 1, 2$, where $g_i$ is a non-negative integer representing the number of groups of tasks in transit to the $i$th server, and $L_{k,j}$ is the $k$th group of tasks being transferred to the $j$th server, $k \in [1, g_j]$. Theorem 1, whose sketch of proof is given in the Appendix, characterizes the service reliability of a DCS. For ease of presentation we show here the two-server case. This characterization can be extended to an n-server system in a

straightforward manner.

Theorem 1. Consider a two-server DCS with an arbitrarily specified initial system configuration $\mathbf{S}(0) = (\mathbf{M}, \mathbf{F}, \mathbf{C}, \mathbf{a}_M, \mathbf{a}_F, \mathbf{a}_C)$. The service reliability satisfies:

$$
\begin{aligned}
R(\mathbf{M}, \mathbf{F}, \mathbf{C}, \mathbf{a}_M, \mathbf{a}_F, \mathbf{a}_C) = \int_0^\infty \Bigg[
&\sum_{i=1}^{2} G_{W_{i1},a}(s)\, R\left( \begin{pmatrix} m_1 - \delta_{1i} \\ m_2 - \delta_{2i} \end{pmatrix}, \mathbf{F}, \mathbf{C}, \mathbf{a}_M^{i}, (\mathbf{a}_F, \mathbf{a}_C) + s \right) \\
&+ \sum_{i=1}^{2} G_{Y_i,a}(s)\, R\left( \mathbf{M}^{i}, \mathbf{F}^{ii}, \begin{pmatrix} g_1 + \delta_{2i} & L_{1,1} \ldots L_{g_1,1} & m_2 \delta_{2i} \\ g_2 + \delta_{1i} & L_{1,2} \ldots L_{g_2,2} & m_1 \delta_{1i} \end{pmatrix}, (\mathbf{a}_M + s)^{i}, (\mathbf{a}_F + s)^{ii}, \mathbf{a}_C + s \right) \\
&+ \sum_{i=1}^{2} \sum_{j=1, j \neq i}^{2} G_{X_{ij},a}(s)\, R\left( \mathbf{M}, \mathbf{F}^{ji}, \mathbf{C}, (\mathbf{a}_M, \mathbf{a}_F, \mathbf{a}_C) + s \right) \\
&+ \sum_{i=1}^{g_1} G_{Z_{i1},a}(s)\, R\left( \begin{pmatrix} m_1 + L_{i,1} f_{11} \\ m_2 \end{pmatrix}, \mathbf{F}, \begin{pmatrix} g_1 - 1 & L_{1,1} \ldots L_{i-1,1} \; L_{i+1,1} \ldots L_{g_1,1} & \\ g_2 + \delta_{0 f_{11}} & L_{1,2} \ldots L_{g_2,2} & L_{i,1} \delta_{0 f_{11}} \end{pmatrix}, (\mathbf{a}_M, \mathbf{a}_F, \mathbf{a}_C) + s \right) \\
&+ \sum_{i=1}^{g_2} G_{Z_{i2},a}(s)\, R\left( \begin{pmatrix} m_1 \\ m_2 + L_{i,2} f_{22} \end{pmatrix}, \mathbf{F}, \begin{pmatrix} g_1 + \delta_{0 f_{22}} & L_{1,1} \ldots L_{g_1,1} & L_{i,2} \delta_{0 f_{22}} \\ g_2 - 1 & L_{1,2} \ldots L_{i-1,2} \; L_{i+1,2} \ldots L_{g_2,2} & \end{pmatrix}, (\mathbf{a}_M, \mathbf{a}_F, \mathbf{a}_C) + s \right) \Bigg]\, ds, \qquad (1)
\end{aligned}
$$

where $\delta_{ij}$ is the Kronecker delta, $m_i - 1$ is set to zero when $m_i = 0$, and the vector $\mathbf{v}^{i}$ (correspondingly, matrix $\mathbf{A}^{ij}$) is identical to $\mathbf{v}$ (correspondingly, $\mathbf{A}$) but with its $i$th (correspondingly, $ij$th) component set to zero. □

Remark 1. A non-Markovian characterization for the reliability of an $n$-server DCS can be obtained in a straightforward manner following the same principles as those for a two-server system. However, the state-space of such characterization grows exponentially in the number of servers, yielding an intractable model.

In order to compute the service reliability, $R(\mathbf{S}_0)$, we must consider the system configuration at $t = 0$. This configuration is: (i) $\mathbf{M}_0 = \begin{pmatrix} r_1 \\ r_2 \end{pmatrix}$, where $r_i$ is the number of tasks queued at the $i$th server, with $r_i = m_i - L_{ij}$, $i = 1, 2$, $i \neq j$, and $L_{ij}$ is the number of tasks reallocated from the $i$th to the $j$th server according to an arbitrary DTR policy executed at $t = 0$; (ii) $\mathbf{F}_0$ is an all-ones matrix because at $t = 0$ both servers are assumed to be functioning; (iii) $L_{12}$ and $L_{21}$ tasks are being transferred in the network; therefore, $g_1 = g_2 = 1$, $L_{1,1} = L_{21}$ and $L_{1,2} = L_{12}$; and (iv) the system-age matrix is the null matrix. After plugging $\mathbf{S}_0$ in (1), we obtain a recursion for $R(\mathbf{S}_0)$ in terms of $r_1$ and $r_2$, the number of tasks queued at the servers after executing an arbitrary DTR policy.

It turns out that to solve this recursion not only the values of $\mathbf{S}_0$ for $r_1 - 1$ and $r_2 - 1$ are required, but also other system configurations, such as when only one of the servers is functioning, when more than one group of tasks is in transit to a server and when no tasks are in transit in the network. Consequently, starting with $\mathbf{S}_0$ and (1) we have to construct a system of eighteen recurrence equations, which has to be solved following a particular order. The equations forming such a system are derived in a straightforward manner using (1) and the new initial configurations shown at the right-hand side of (1). Finally, recursions are solved using the following initial conditions: $R(\mathbf{S}_0) = 1$ when there are no tasks to be served in the DCS and $R(\mathbf{S}_0) = 0$ when both servers have failed and at least one task remains unserved.

At this point we highlight three differences between the Markovian and non-Markovian characterizations for the service reliability. First, in a Markovian setting the conditional probabilities associated with the regeneration event remain constant, while in the non-Markovian case they are age-dependent. Second, in a Markovian setting the reliability is characterized by an algebraic recurrence equation with constant coefficients, while in the non-Markovian scenario the recursion comprises an integral with age-dependent coefficients. Third, the state-space representation for the service reliability is discrete in the Markovian case, while in the non-Markovian case it is a hybrid discrete and continuous representation.

D. Optimal task re-allocation policy for a DCS

Recall that the service reliability is a function of the DTR policy $\mathbf{L}$, which is executed by all servers at $t = 0$. For a two-server system, we can employ the reliability model given in Theorem 1, with the initial system configuration $\mathbf{S}_0$, to search for the optimal DTR policy, $\mathbf{L}^* = (L_{12}^*, L_{21}^*)$, that maximizes the service reliability. Formally, we have:

$$(L_{12}^*, L_{21}^*) = \operatorname*{argmax}_{(L_{12}, L_{21})} R(\mathbf{L}; \mathbf{S}_0), \qquad (2)$$

subject to: (i) $L_{ij} + r_i = m_i$ for $i = 1, 2$, $i \neq j$; and (ii) $L_{ij}$ an integer number in $[0, m_i]$, for $i = 1, 2$, $i \neq j$.

For a DCS with an arbitrary number of servers, we can attempt to solve the optimization problem using the $n$-server characterization for the reliability; however, computing the reliability using the exact $n$-server characterization is computationally expensive for a large number of servers, as the number of computations grows exponentially in the number of servers. As an alternative for systems with an arbitrary number of servers, we follow [11] and provide a sub-optimal algorithm for DTR policies that scales linearly with the number of servers. The key idea is to decompose an $n$-server system into several two-server DCSs and exploit our exact characterization of optimal policies for two-node systems.

E. Algorithm for devising maximal reliability DTR policies

The algorithm computes the number of tasks to re-allocate from the $i$th to the $j$th server at the $k$th iteration, $L_{ij}^{(k)}$, as follows. The $i$th server has an estimate, $\hat{m}_{j,i}$, of the number of tasks queued at the $j$th server. Using these estimates, the $i$th server constructs its collection of candidate recipient servers, $U_i$. From such collection, the $i$th server picks the $j$th server, say, and obtains $L_{ij}^{(k)}$ by solving (2) with $m_1 = r_i$ and $m_2 = \hat{m}_{j,i}$, where $r_i$ is the number of tasks queued at the $i$th server assuming that such server has already re-allocated tasks to all its other candidate recipient servers, with the exception of the $j$th server. In order to produce an algorithm independent of the order in which servers are selected from $U_i$, the $L_{ij}^{(k)}$ quantities are iteratively computed until all of them converge to some value or until a maximum number of iterations, $K$, is reached. A pseudocode for the algorithm is shown in Algorithm 1. Note that each server has to solve the optimization problem (2) at most $n - 1$ times, and such computation has to be repeated no more than $K$ times. From this, we observe that the complexity of the algorithm increases linearly in the number of servers.

Algorithm 1 DTR policy for multi-server DCSs
Require: $K$, $\hat{m}_{j,i}$ and $L_{ij}^{(0)}$, with $j = 1, \ldots, n$, $i \neq j$
Ensure: $L_{ij}$
  Set $U_i = \{j : L_{ij}^{(0)} > 0\}$, $U_i' = \emptyset$ and $k = 1$
  loop
    while $j \in U_i$ do
      $U_i \leftarrow U_i \setminus \{j\}$
      $m_1 = m_i - \sum_{\ell \in U_i} L_{i\ell}^{(k-1)} - \sum_{\ell \in U_i'} L_{i\ell}^{(k)}$
      $m_2 = \hat{m}_{j,i}$
      Solve (2) using $m_1$ and $m_2$ to obtain $L_{ij}^{(k)}$
      $U_i' \leftarrow U_i' \cup \{j\}$
    end while
    Set $U_i = \{j : L_{ij}^{(0)} > 0\}$, $U_i' = \emptyset$ and $k \leftarrow k + 1$
    if $\sum_{j=1}^{n} \big( L_{ij}^{(k)} - L_{ij}^{(k-1)} \big) = 0$ or $k > K$ then
      $L_{ij} = L_{ij}^{(k)}$ for all $j \in U_i$ and exit
    end if
  end loop

The algorithm requires the following parameters: $K$, $\hat{m}_{j,i}$ and $L_{ij}^{(0)}$. The parameter $K$ is selected by the user. The estimates $\hat{m}_{j,i}$ are obtained from queue-length information packets frequently exchanged among the servers. Finally, the initial DTR policy, $L_{ij}^{(0)}$, is computed using:

$$L_{ij}^{(0)} = \left\lfloor \hat{m}_{j,i} - \lambda_{f_j}^{-1} \frac{\hat{M}_i}{\sum_{\ell=1}^{n} \lambda_{f_\ell}^{-1}} \right\rfloor, \qquad (3)$$

where $\hat{M}_i = m_i + \sum_{j=1, j \neq i}^{n} \hat{m}_{j,i}$ is the total load in the system (as estimated by the $i$th server), $\lambda_{f_j}$ is the failure rate of the $j$th server and $\lfloor x \rfloor$ is the greatest integer smaller than or equal to $x$. Note that (3) follows the intuition that more tasks have to be allocated to the most reliable servers.
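Because the feasible set of (2) is a finite grid of integer pairs, the two-server problem can in principle be solved by exhaustive search once a reliability function is available. The sketch below shows only that search. The function `toy_reliability` is a hypothetical surrogate, not the characterization of Theorem 1: it simply rewards policies that shorten the busy period of the busiest server, using made-up service times of 5 s and 2 s per task and ignoring transfer delays.

```python
import math
from itertools import product

def optimal_policy(m1, m2, reliability):
    """Exhaustive search over the feasible set of (2).

    L12 ranges over the integers in [0, m1] and L21 over those in [0, m2].
    reliability(l12, l21) must return the reliability of that policy.
    """
    best_r, best_l12, best_l21 = -1.0, 0, 0
    for l12, l21 in product(range(m1 + 1), range(m2 + 1)):
        r = reliability(l12, l21)
        if r > best_r:  # strict '>' keeps the first maximizer encountered
            best_r, best_l12, best_l21 = r, l12, l21
    return best_l12, best_l21, best_r

def toy_reliability(l12, l21, m1=50, m2=25):
    """Hypothetical stand-in for R(L; S0); NOT the model of Theorem 1."""
    busy1 = 5.0 * (m1 - l12 + l21)   # assumed 5 s per task at server 1
    busy2 = 2.0 * (m2 - l21 + l12)   # assumed 2 s per task at server 2
    return math.exp(-max(busy1, busy2) / 300.0)

l12_opt, l21_opt, r_opt = optimal_policy(50, 25, toy_reliability)
print(l12_opt, l21_opt, round(r_opt, 4))
```

The search evaluates $(m_1 + 1)(m_2 + 1)$ policies, so its cost is dominated by the cost of one reliability evaluation; in Algorithm 1 this two-server solver would be the step labeled "Solve (2)".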

Fig. 1. Service reliability, R(L; S0), as a function of L12 (tasks) under different scenarios: (a) Low network-delays; (b) Moderate network-delays; and (c) Severe network-delays. Curves shown in each panel: Exponential, Model 1, Model 2, and Pareto with r = ∞, r = 50, r = 1 and r = 0.1.

III. RESULTS

A. Comparing Markovian and non-Markovian models

1) Service reliability of a two-server system: Let us compare predictions for the service reliability obtained using the non-Markovian characterization to those provided by the Markovian model in [11]. In our calculations we have assumed that the workload comprises $m_1 = 50$ and $m_2 = 25$ tasks, allocated at servers 1 and 2, respectively. The mean service time per task is 5 and 2 s for servers 1 and 2, respectively. We have also assumed that failure times follow exponential distributions with means $\lambda_{f_1}^{-1} = 300$ and $\lambda_{f_2}^{-1} = 150$ s. Communication channels were assumed to be homogeneous and three cases have been considered: low, moderate and severe network-delays. In the low network-delay case, migrating a task to the fastest server and processing the task takes, on average, less time than processing the task at the slowest server. In particular, for the low delay case the ratio between these two times was set to one half. For the moderate and severe network-delay cases the aforementioned ratio was set to one and five, respectively. Also, the mean transfer time of FN packets is 0.2, 0.4 and 1.0 s for the low, moderate and severe delay scenarios, respectively.

We have employed different stochastic models for the service and transfer times. The Markovian setting is represented in the Exponential model. In Model 1 service times follow exponential distributions while transfer times follow shifted exponential distributions. In Model 2 both service and transfer times follow shifted exponential distributions. Both service and transfer times follow Pareto distributions for the Pareto r models. The parameter r in the Pareto models denotes the ratio between the variance of the Pareto distribution and its exponential approximation. For comparison, all distributions modeling the same random time have identical mean.

Figure 1 shows the service reliability as a function of several DTR policies, for the three network-delay conditions considered. In these DTR policies the number of tasks re-allocated from server 2 to 1, $L_{21}$, is 12 tasks (approximately 50% of the initial load is re-allocated from the second to the first node). It can be noted that the Markovian approximation for the service reliability shows a remarkable accuracy in the low network-delay case. In fact, the relative approximation error is below 1.6% in all cases, except for the Pareto r = ∞ model where the error is below 8.8%. As the mean transfer time increases, the Markovian approximation for the reliability loses its accuracy. For example, the relative approximation error is less than 10% when network-delays are moderate, while in the case of severe network-delays the Markovian approximation produces an unacceptable amount of error (about 120% error). Observe also that as the mean transfer times increase the service reliability decreases. In order to counterbalance the effect of network delays on the service reliability, the optimal DTR policy dictates that the number of tasks to migrate from server 1 to server 2 has to be reduced as compared to the case of low network delays.

We now solve the optimization problem (2) to find the DTR policy with maximal service reliability for each network-delay condition. Figure 2 shows, for all the network-delay cases considered, the service reliability as a function of the DTR policies when Pareto r = 1 is the model to optimize. Optimization results for all the models and all the network-delay cases are listed in Table I.

Fig. 2. Service reliability for the Pareto r = 1 model as a function of the DTR policy for different amounts of network-delays. Axes: L12 (tasks), L21 (tasks) and R(L; S0); surfaces: low delay, moderate delay and severe delay.

TABLE I
OPTIMAL DTR POLICY FOR EACH MODEL AND NETWORK-DELAY CONDITION. IN ALL CASES L21 = 0.

                  Low delay          Moderate delay     Severe delay
Model             Reliability  L12   Reliability  L12   Reliability  L12
Exponential       0.637        28    0.598        24    0.423        7
Model 1           0.638        28    0.601        24    0.429        7
Model 2           0.639        28    0.601        24    0.429        7
Pareto r = ∞      0.661        28    0.634        26    0.468        10
Pareto r = 50     0.644        29    0.616        26    0.426        9
Pareto r = 1      0.644        29    0.618        26    0.418        9
Pareto r = 0.1    0.650        30    0.627        26    0.412        9

Note that when the network-delay is low the Markovian characterization predicts a maximal reliability of 0.637, which is achieved by the policy $L_{12} = 28$, $L_{21} = 0$. Such policy is in fact the optimal policy for Model 1, Model 2 and Pareto r = ∞ only. If one executes the Markovian optimal policy as the DTR policy for the remaining Pareto models, the service reliability is reduced by about 1% with respect to its optimal value. Similarly, when network-delays are either moderate or severe the policy devised by the Markovian approximation is optimal only for Models 1 and 2. For the remaining models, the service reliability is reduced by approximately 4% when the optimal policy devised by the Markovian approximation is executed.

Let us now discuss the effect of the optimal DTR policies on the usage of the computing resources. If we consider the low network-delay case, optimal policies dictate that between 56 and 60% of the load initially allocated at the first (slowest) server has to be migrated to the second server, while the latter server must keep all its initial load. Note that, on average, server 2 processes its initial load in 50 s, and note also that transferring 28 tasks from server 1 to server 2 takes 28 s. Consequently, the optimal task re-allocation is perceived by the second server as an instantaneous exchange of load. In addition, note that processing 53 tasks at server 2 takes 106 s, on average, while serving the remaining 22 tasks at server 1 takes 110 s, on average. Therefore, the optimal policy keeps both servers busy for approximately the same amount of time, thereby efficiently using the computing resources of the DCS. When network-delays are severe, computing resources cannot be utilized equally. In this case, optimal policies trade off between transfer times and utilization of the servers.

2) Service reliability of a multi-server system: We have also maximized the service reliability of a four-server DCS employing the algorithm presented in Section II-E. We have assumed that the workload comprises $M = 160$ tasks and that failure times follow exponential distributions with mean failure times $\lambda_{f_1}^{-1} = 300$ s, $\lambda_{f_2}^{-1} = 100$ s, $\lambda_{f_3}^{-1} = 150$ s, and $\lambda_{f_4}^{-1} = 200$ s.

TABLE II
SERVICE RELIABILITY FOR DIFFERENT MODELS AND THEIR MARKOVIAN APPROXIMATION.

Initial load                      Pareto
(m1, ..., m4)     Exponential     r = ∞    r = 50   r = 1    r = 0.1
Low network-delays
(160,0,0,0)       0.274           0.236    0.229    0.229    0.235
(0,0,0,160)       0.351           0.302    0.292    0.292    0.292
(40,40,40,40)     0.575           0.537    0.532    0.532    0.530
(10,10,60,60)     0.622           0.590    0.586    0.585    0.586
Severe network-delays
(160,0,0,0)       0.069           0.055    0.051    0.051    0.058
(0,0,0,160)       0.093           0.070    0.065    0.062    0.074
(40,40,40,40)     0.432           0.401    0.401    0.412    0.373
(10,10,60,60)     0.412           0.367    0.361    0.365    0.345
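The Pareto r models in Tables I and II are indexed by the variance ratio r defined in Section III-A.1. The sketch below shows one way such a model could be parameterized from a target mean. It rests on assumptions that the paper does not state: a Pareto type I law with shape $\alpha$ and scale $x_m$, and an "exponential approximation" meaning the exponential law with the same mean $\mu$ (hence variance $\mu^2$). Under those assumptions $r = 1 / (\alpha(\alpha - 2))$, so $\alpha = 1 + \sqrt{1 + 1/r}$, and $r = \infty$ corresponds to the infinite-variance boundary $\alpha = 2$. The function name `pareto_params` is hypothetical.

```python
import math

def pareto_params(mean, r):
    """Shape and scale of a Pareto type I law with a given mean and variance ratio r.

    Assumes r = Var[Pareto] / mean**2, i.e. the variance relative to an
    exponential law with the same mean (an assumption, see the lead-in).
    """
    if mean <= 0 or r <= 0:
        raise ValueError("mean and r must be positive")
    # Solve alpha * (alpha - 2) = 1 / r for alpha > 2; r = inf gives alpha = 2.
    alpha = 2.0 if math.isinf(r) else 1.0 + math.sqrt(1.0 + 1.0 / r)
    # Pareto type I mean is alpha * xm / (alpha - 1); invert it for the scale.
    xm = mean * (alpha - 1.0) / alpha
    return alpha, xm

# Mean service time of 5 s (server 1 in Section III-A.1) for each tabulated r.
for r in (math.inf, 50.0, 1.0, 0.1):
    alpha, xm = pareto_params(5.0, r)
    print(r, round(alpha, 4), round(xm, 4))
```

Smaller values of r give a larger shape parameter and therefore a lighter tail, while all parameterizations keep the same mean, in line with the statement that all distributions modeling the same random time have identical mean.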

[Fig. 3(a): Est. pdf with fitted Log Normal, Pareto, and Weibull pdfs; axes $f_W(t)$ versus t, s. Fig. 3(b): Est. pdf with fitted Shft. Exp, Shft. Gamma, Pareto, and Weibull pdfs; axes $f_Z(t)$ versus t, s.]

B. Maximizing the service reliability of a real DCS


We have experimentally characterized the random times of a small-scale testbed DCS whose servers communicate over the Internet. A detailed description of the testbed is given in [11]. Figures 3(a) and (b) show the normalized histograms as well as the fitted pdfs for the service time of server 1 and the transfer time of tasks from server 2 to 1. The parameters of the fitted pdfs were estimated using maximum likelihood estimators. The pdf for each random time was selected by comparing the total squared error between the normalized histogram and each fitted pdf. From the experimental characterization, we have that: (i) service times at servers 1 and 2 follow lognormal distributions with means 4.858 and 2.357 s, respectively; (ii) task transfer times follow shifted gamma distributions with means $\bar{Z}_{12} = 1.207$ and $\bar{Z}_{21} = 0.803$ s; and (iii) FN packet transfer times follow shifted gamma distributions with means $\bar{X}_{12} = 0.313$ and $\bar{X}_{21} = 0.145$ s. Note that, according to our classification of the network delay, these values correspond to a low network-delay case. The initial workload and the failure times of the servers are free parameters of the system. The initial workload was set to $m_1 = 50$ and $m_2 = 25$ tasks, and failure times were assumed to follow exponential distributions with means 300 and 150 s. The optimal DTR policy for the two-node testbed is $L_{12} = 26$ and $L_{21} = 0$ tasks, and it provides a service reliability of 0.6007. Figure 3(c) shows theoretical predictions, MC simulations, and experimental results for the service reliability of the two-node testbed. Results are shown for the case in which the optimal re-allocation from server 2 to 1 is used ($L_{21} = 0$), while different numbers of tasks are migrated from the first to the second server. In both simulations and experiments, the service reliability is calculated by averaging failure or success outcomes.
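The selection step described above (maximum-likelihood fits ranked by their total squared error against the normalized histogram) can be sketched as follows. This is only an illustrative sketch under stated assumptions: it uses two candidate families with closed-form maximum-likelihood estimators (lognormal and shifted exponential) rather than the full set of candidates shown in the figure legends, the bin count is arbitrary, and all function names are hypothetical. The demo data are synthetic, not the testbed measurements.

```python
import math
import random


def fit_lognormal(samples):
    """Closed-form MLE for a lognormal pdf: mean and std of log(samples)."""
    logs = [math.log(x) for x in samples]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))

    def pdf(t):
        if t <= 0:
            return 0.0
        z = (math.log(t) - mu) / sigma
        return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2 * math.pi))

    return pdf


def fit_shifted_exponential(samples):
    """Closed-form MLE: shift = min(samples), rate = 1 / (mean - shift)."""
    shift = min(samples)
    rate = 1.0 / (sum(samples) / len(samples) - shift)

    def pdf(t):
        return rate * math.exp(-rate * (t - shift)) if t >= shift else 0.0

    return pdf


def normalized_histogram(samples, bins):
    """Return bin centers and heights scaled so the histogram integrates to 1."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in samples:
        counts[min(int((x - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    heights = [c / (len(samples) * width) for c in counts]
    return centers, heights


def total_squared_error(pdf, centers, heights):
    """Sum of squared differences between a fitted pdf and the histogram."""
    return sum((pdf(c) - h) ** 2 for c, h in zip(centers, heights))


def select_model(samples, bins=30):
    """Fit every candidate by MLE and keep the one with the smallest error."""
    centers, heights = normalized_histogram(samples, bins)
    candidates = {
        "lognormal": fit_lognormal(samples),
        "shifted_exponential": fit_shifted_exponential(samples),
    }
    errors = {name: total_squared_error(pdf, centers, heights)
              for name, pdf in candidates.items()}
    return min(errors, key=errors.get), errors


if __name__ == "__main__":
    rng = random.Random(0)  # synthetic service times, NOT testbed data
    data = [rng.lognormvariate(1.5, 0.3) for _ in range(5000)]
    best, errors = select_model(data)
    print(best, errors)
```

Additional candidate families (for example shifted gamma, Pareto, or Weibull) would be added to the `candidates` dictionary in the same way, although their maximum-likelihood estimates generally require a numerical optimizer.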
A total of 10,000 and 500 independent realizations of each policy were used to compute the simulation and experimental results, respectively. Fig. 3(c)


s. The average service times were set to 6 s, 2 s, 3 s, and 4 s for servers 1, 2, 3, and 4, respectively. The remaining parameters are the same as those in the two-server analysis. Table II lists the maximal service reliability obtained in the cases of low and severe network delays. The service reliability was obtained through simulations, and the values listed in Table II correspond to centers of 95% confidence intervals for which the estimated service reliability does not differ from the true value by more than 0.001. For comparison, the column “Exponential” presents the results obtained when the optimal policies devised using the Markovian model are executed. In these cases, the exponential approximation produces relative errors between 1 and 20%. It must be noted that the accuracy of the Markovian model in the low-delay case is lower than in the two-server case, with a maximum relative error of 16%. We note that policies devised using Algorithm 1 achieve a service reliability within 70% of the optimal service reliability. The optimal values were computed using an MC-based exhaustive search over the number of tasks.
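To give a feel for what a 95% confidence interval of half-width 0.001 demands from a simulation, the following sketch estimates the number of independent realizations required. It assumes that each realization is a Bernoulli success/failure outcome and that the interval is the usual normal-approximation (Wald) interval; the text does not state how the intervals were actually computed, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def ci_half_width(p_hat, n, confidence=0.95):
    """Half-width of the normal-approximation interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


def required_realizations(p_guess, half_width, confidence=0.95):
    """Smallest n whose normal-approximation half-width is <= half_width."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.ceil(z ** 2 * p_guess * (1.0 - p_guess) / half_width ** 2)


if __name__ == "__main__":
    # Worst case (p = 0.5) for a half-width of 0.001.
    print(required_realizations(0.5, 0.001))
    # Half-width obtained with 500 realizations when the reliability is near 0.6.
    print(ci_half_width(0.6, 500))
```

Under these assumptions, a half-width of 0.001 calls for on the order of a million realizations per policy, which is practical for simulations but not for testbed experiments.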

Fig. 3. Normalized histogram and fitted pdf of: (a) service time at server 1; and (b) task transfer time from server 1 to 2. (c) Service reliability R(L;S0) as a function of DTR policies (horizontal axis: L12, tasks).

shows a remarkable agreement between simulations and the non-Markovian theoretical predictions. Experimental results also show fairly good agreement with the theoretical curves: the relative error between predictions and experiments is less than 7%. Note that if no task re-allocation is performed, the service reliability is reduced by approximately 15%. When a Markovian approximation is employed to devise the optimal DTR policy for the two-node testbed, the service reliability is reduced by 1%.

IV. CONCLUSIONS

We have presented a novel and rigorous characterization for the service reliability of a DCS in the presence of non-Markovian communication delays. The non-Markovian

characterization for the reliability is a generalization of our Markovian model reported in [11]. The new representation for the service reliability provides insight into the effect of network delays on the accuracy of a Markovian model for the reliability. Our results indicate that when the network delays are relatively large compared to service times, the error in estimating the reliability, as a result of falsely assuming exponentially distributed random delays, becomes significant, thereby necessitating the use of our non-Markovian model. To this end, our calculations show relative errors of approximately 120%. It must be mentioned that the accuracy provided by the non-Markovian model in predicting the service reliability comes at the expense of more computations as compared to the Markovian model. In order to capture the memory associated with the non-exponential random times, we have introduced in our model a fundamental quantity, termed the continuous-time age matrix. This matrix enables us to characterize the service reliability of a DCS recursively and analytically within a very general framework. Based on a two-server characterization for the reliability, we presented a scalable algorithm for devising DTR policies that maximize the reliability of multi-server DCSs. The mathematical framework developed in this paper for modeling DCSs is general and can be utilized to calculate other performance metrics, such as computing speed-up as well as statistics of the queue length of servers and the sojourn time of workloads.

ACKNOWLEDGMENT

This work was supported by the Defense Threat Reduction Agency (Combating WMD Basic Research Program) and in part by the National Science Foundation (award ANI-0312611).

REFERENCES

[1] M. Trehel, C. Balayer, and A. Alloui, “Modeling load balancing inside groups using queuing theory,” in Proc. 10th Int. Conf. on Parallel and Distributed Computing System, New Orleans, LA, 1997.
[2] S. Dhakal, M. Hayat, J. Pezoa, C. Yang, and D. Bader, “Dynamic load balancing in distributed systems in the presence of delays: A regeneration-theory approach,” IEEE Trans. Parallel and Dist. Systems, vol. 18, pp. 485–497, 2007.
[3] S. Dhakal, M. Hayat, J. Pezoa, C. Abdallah, J. Birdwell, and J. Chiasson, “Load balancing in the presence of random node failure and recovery,” in Proc. IEEE IPDPS ’06, Rhodes, Greece, 2006.
[4] Y.-S. Dai and G. Levitin, “Optimal resource allocation for maximizing performance and reliability in tree-structured grid services,” IEEE Trans. Reliability, vol. 56, pp. 444–453, 2007.
[5] Y.-S. Dai, G. Levitin, and K. Trivedi, “Performance and reliability of tree-structured grid services considering data dependence and failure correlation,” IEEE Trans. Computers, vol. 56, pp. 925–936, 2007.
[6] G. Attiya and Y. Hamam, “Reliability oriented task allocation in heterogeneous distributed computing systems,” in Proc. Ninth Int. Symp. Computers and Comms., 2004, pp. 68–73.
[7] C.-I. Chen, “Task allocation and reallocation for fault tolerance in multicomputer systems,” Trans. Aerospace and Electronic Systems, vol. 30, pp. 1094–1104, 1994.
[8] G. Attiya and Y. Hamam, “Task allocation for maximizing reliability of distributed systems: a simulated annealing approach,” Journal Parallel and Dist. Computing, vol. 66, pp. 1259–1266, 2006.
[9] Y. Hamam and K. Hindi, “Assignment of program tasks to processors: a simulated annealing approach,” European J. of Op. Research, vol. 122, pp. 509–513, 2000.
[10] D. Vidyarthi and A. Tripathi, “Maximizing reliability of a distributed computing system with task allocation using simple genetic algorithm,” Journal of Systems Architecture, vol. 47, pp. 549–554, 2001.
[11] J. Pezoa, S. Dhakal, and M. Hayat, “Decentralized load balancing for improving reliability in heterogeneous distributed systems,” submitted to IEEE Int. Conference on Parallel Processing 2009, 2009. Available: http://www.ece.unm.edu/lb.

APPENDIX

Sketch of proof of Theorem 1: First, the pdf of $\tau_a$ can be computed using

$$f_{\tau_a}(s) = \sum_{j \in I} f_{W_j}(s; a_j) \prod_{k \in I,\, k \neq j} \left[ 1 - F_{W_k}(s; a_k) \right], \tag{4}$$

where $I$ indicates a particular indexing of the pdfs listed in Assumption A1 and $F_{W_j}(s; a_j)$ [correspondingly, $f_{W_j}(s; a_j)$] is the cumulative distribution function [correspondingly, the pdf] of the $j$th random time, parameterized by the age $a_j$. Consider the initial system configuration $S = (M, F, C, a_M, a_F, a_C)$. We can compute the service reliability by conditioning on the regeneration time as follows:

$$R(S) = \int_0^{\infty} P\{T(S) < \infty \mid \tau_a = s\}\, f_{\tau_a}(s)\, ds. \tag{5}$$

To calculate the conditional probability in the integrand, we exploit the definition of the regeneration time to decompose the conditional probability according to all possible, disjoint regeneration events. That is,

$$\begin{aligned}
P\{T(S) < \infty \mid \tau_a = s\} ={}& \sum_{k=1}^{2} P\{\tau_a = W_{k1} \mid \tau_a = s\}\, P\{T(S) < \infty \mid \tau_a = s, \tau_a = W_{k1}\} \\
&+ \sum_{i=1}^{2} \sum_{j \neq i} P\{\tau_a = X_{ij} \mid \tau_a = s\}\, P\{T(S) < \infty \mid \tau_a = s, \tau_a = X_{ij}\} \\
&+ \sum_{k=1}^{2} P\{\tau_a = Y_k \mid \tau_a = s\}\, P\{T(S) < \infty \mid \tau_a = s, \tau_a = Y_k\} \\
&+ \sum_{i=1}^{2} \sum_{j=1}^{g_i} P\{\tau_a = Z_{ji} \mid \tau_a = s\}\, P\{T(S) < \infty \mid \tau_a = s, \tau_a = Z_{ji}\}.
\end{aligned}$$

It can be shown that upon the occurrence of a regeneration event, the random times emerging at the regeneration time satisfy Assumptions A1 and A2; therefore, the following observations hold: (i) a fresh copy of the underlying stochastic process emerges at $\tau_a$, but with a new initial age-dependent system-state matrix; (ii) the emergent stochastic process is independent of the original process and satisfies Assumptions A1 and A2; and (iii) the independence of the new process allows us to shift the time origin to $t = s$.

For example, if the regeneration event is the service of a task at the first server, after applying the previous observations we can show that

$$P\{T(S) < \infty \mid \tau_a = s, \tau_a = W_{11}\} = P\{\tau_a + T(S') < \infty \mid \tau_a = s, \tau_a = W_{11}\},$$

where $T(S')$ is the random time taken by the DCS emerging at the regeneration time to serve all its tasks when the initial configuration is $S' = (M', F', C', a'_M, a'_F, a'_C)$. This new initial configuration is precisely

$$M' = \begin{pmatrix} m_1 - 1 \\ m_2 \end{pmatrix}, \quad F' = F, \quad C' = C, \quad a'_M = \begin{pmatrix} 0 \\ a_{M_2} + s \end{pmatrix}, \quad a'_F = a_F + s, \quad a'_C = a_C + s.$$

By exploiting the independence between the emergent and the original process, and recalling that $\tau_a = s$, we obtain

$$P\{\tau_a + T(S') < \infty \mid \tau_a = s, \tau_a = W_{11}\} = P\{T(S') < \infty\}.$$

We obtain (1) after applying observations (i)–(iii) to all the remaining regeneration events. □
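The pdf of the regeneration time, taken as the minimum of several independent random times, can be evaluated numerically with a few lines of code. The sketch below is only an illustration under simplifying assumptions: the times are independent, the candidate distributions are exponential (so the ages $a_j$ play no role because of memorylessness), and the rates and function names are hypothetical. In that special case the sum-of-products expression must reduce to an exponential pdf whose rate is the sum of the individual rates, which gives a simple sanity check.

```python
import math


def regeneration_pdf(s, pdfs, cdfs):
    """Evaluate sum_j f_j(s) * prod_{k != j} (1 - F_k(s)) for independent times."""
    total = 0.0
    for j, f_j in enumerate(pdfs):
        others_survive = 1.0
        for k, cdf_k in enumerate(cdfs):
            if k != j:
                others_survive *= 1.0 - cdf_k(s)
        total += f_j(s) * others_survive
    return total


def exponential(rate):
    """Return the (pdf, cdf) pair of an exponential random time."""
    def pdf(t):
        return rate * math.exp(-rate * t) if t >= 0 else 0.0

    def cdf(t):
        return 1.0 - math.exp(-rate * t) if t >= 0 else 0.0

    return pdf, cdf


if __name__ == "__main__":
    # Hypothetical rates: two service times and one failure time.
    rates = [0.5, 0.2, 1.0 / 300.0]
    pairs = [exponential(r) for r in rates]
    pdfs = [p for p, _ in pairs]
    cdfs = [c for _, c in pairs]
    s = 0.7
    closed_form = sum(rates) * math.exp(-sum(rates) * s)
    print(regeneration_pdf(s, pdfs, cdfs), closed_form)
```

For non-exponential times, each pdf and cdf would additionally depend on the age of the corresponding random time, which is the memory that the age variables are introduced to capture.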
