IEEE COMMUNICATIONS LETTERS, VOL. 18, NO. 9, SEPTEMBER 2014


Fault-Tolerant Two-Stage Open Queuing Systems With Server Failures at Both Stages

Enver Ever, Member, IEEE

Abstract—Two-stage open queuing systems are used to model and evaluate the interaction between two systems in computer and communication networks. Application areas include two-stage internetworking mechanisms, memory servers, and the interaction of wireless communication systems. Realistic features, such as finite capacities, feedback from one stage to another, and failures of servers, usually complicate analytic solutions. This study presents a new analytical model and solution approach for two-stage open queuing systems with feedback, blocking, and multiple servers, as well as failures at both stages. Unlike in existing studies, the systems considered in both stages can be fault tolerant. Numerical results, presented comparatively with simulation, show that the new approach performs well in terms of accuracy and computation time.

Index Terms—Fault tolerant systems, performability, open queuing networks.

Manuscript received February 10, 2014; revised June 8, 2014 and June 27, 2014; accepted June 29, 2014. Date of publication July 8, 2014; date of current version September 8, 2014. The associate editor coordinating the review of this paper and approving it for publication was B. Bellalta. The author is with the Computer Engineering Program, Middle East Technical University, Northern Cyprus Campus, Kalkanlı, Güzelyurt, Mersin 10, Turkey (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/LCOMM.2014.2336841

I. INTRODUCTION

TWO STAGE open queuing systems are used to model computer and communication configurations in which the jobs receive service in series. The analysis of these systems has been of interest to communication and network engineers for many years as a way to model the interaction between two systems, such as processor-I/O devices, mass storage devices, and concentrator-processor combinations. However, the main areas of interest are computer networks and communications. One example of the use of two stage queuing systems is the optimization of scheduling algorithms for load balanced switches [1]. With the recent developments in the field of wireless systems, many researchers are interested in performance modeling of wireless communication systems. One example of such studies is [2], where two dimensional continuous time Markov chains are employed for multi-rate schemes. The importance of analytical models is also emphasized in [3] for wireless fading channels: two dimensional Markov chain analysis is performed for Markov channels, and a generalized Pollaczek-Khinchin formula is derived instead of employing the matrix-geometric method for the steady state solution. In [4], a two stage open queuing system is employed to model cellular network-wireless local area network interaction for new vertical hand-off schemes. In these studies, the interaction between two systems is not considered in the presence of failures at both stages, since the existing solution approaches present significant limitations.

However, networks in general, and communication systems in particular, experience a number of challenges to their operation, including failures of switching nodes and/or communication links, or of any sort of critical function unit. Therefore, it is imperative to have a framework and methodology to study performance in the presence of failures. One typical scenario where the interaction of two systems should be considered with failures is hand-off between heterogeneous wireless cellular systems in the presence of channel failures [3], as well as hardware/software related problems [5]. Performability modeling is a critical issue to be considered for fault tolerant systems and is defined as the correct methodology to analyze the performance degradations caused by failures. In other words, the changes in a system's ability to serve incoming requests should be taken into account. The importance of performability evaluation is emphasized in [5] and [6]. A set-theoretic method is proposed in [6], and two dimensional Markov chains are proposed to model stand-alone wireless communication systems in [5].

According to Burke's theorem, in an M/M/m or M/M/∞ system with Poisson arrival rate λ, the departure instants of the customers constitute a Poisson process with intensity λ, and it is possible to have a product form expression for the joint state probability of the system formed by two queues in case they are both infinite [7]. However, for tandem systems with finite queuing capacities and feedback, the irregularities introduced make it difficult to have product form solutions. Therefore, two state variables are introduced in order to change the transition rates according to the status of both queues in studies such as [8], [9]. When two dimensional processes are used to model a system with two state variables, spectral expansion and matrix-geometric methods provide efficient and accurate computations [8], [9]. Let the state variables on the horizontal and vertical dimensions of a lattice be I(t) and J(t), respectively.
I(t) specifies the size of the matrix R for the matrix-geometric solution method and the number of eigenvalues and eigenvectors involved in the spectral expansion method, whereas J(t) specifies the number of sub-matrices in the matrix-geometric approach and the number of balance equations involved in spectral expansion. The limitations introduced by I(t) are more significant, since a large I(t) may cause ill-conditioned matrices [9]. These limitations are more evident when both state variables are used to represent the number of jobs in two different systems. The dominant eigenvalues method is employed in [9] in order to avoid ill-conditioned matrices. Although this approach works well for loaded systems, its accuracy is not at a desirable level for lower arrival rates. Solution approaches that avoid ill-conditioned matrices are presented in [10] and [11]; however, in these studies only the second stage is fault tolerant and the first stage is assumed to be reliable.

In this study, the operative states of both stages are considered together with the number of jobs at each stage. It is possible to represent the number of jobs together with the operative states using I(t), and to solve the system using matrix-geometric or spectral expansion methods; however, this would severely limit the capacity and the number of servers that can be considered, since the size of I(t) would become the number of operative states times the queuing capacity.
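The size argument can be illustrated with a short Python sketch. It compares the matrix order that would result from folding the operative states into I(t) with the orders of the per-operative-state lattices used in Section III, whose transition matrices are of size (K2 + s2 + 1) × (K2 + s2 + 1). The folded count assumes that I(t) would enumerate every (s1, s2) pair together with every stage-two occupancy from 0 to K2 + S2. This reading, the function names, and the sample values S1 = S2 = 4 are assumptions made for illustration only, while K2 = 10 follows Section IV. The cubic cost model reflects the O(N^3) eigenvalue/eigenvector complexity quoted from [8].

```python
def folded_order(S1, S2, K2):
    """Matrix order if the operative states were folded into I(t).

    Assumption (not stated explicitly in the paper): I(t) then ranges over
    every (s1, s2) pair combined with every stage-two occupancy 0..K2+S2.
    """
    operative_states = (S1 + 1) * (S2 + 1)
    return operative_states * (K2 + S2 + 1)


def decomposed_orders(S1, S2, K2):
    """Matrix orders when one lattice is solved per operative state (s2, s1)."""
    return [K2 + s2 + 1 for s2 in range(S2 + 1) for _ in range(S1 + 1)]


def cubic_cost(order):
    """Rough eigenvalue/eigenvector work, growing as order**3."""
    return order ** 3


if __name__ == "__main__":
    S1, S2, K2 = 4, 4, 10  # S1, S2 are illustrative values
    folded = folded_order(S1, S2, K2)
    parts = decomposed_orders(S1, S2, K2)
    print("folded matrix order:", folded)
    print("largest per-state order:", max(parts))
    print("number of per-state lattices:", len(parts))
    print("cost ratio (folded / decomposed):",
          cubic_cost(folded) / sum(cubic_cost(n) for n in parts))
```

Under these assumptions a single folded problem is far larger than any of the small per-state problems, which is the motivation for treating the operative states separately.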

1089-7798 © 2014 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


Fig. 1. Two stage fault tolerant open queuing system with multiple servers.

Two stage open queuing systems where both stages are fault tolerant are not considered in existing studies. In this study, a new approach is presented where the operative states of the servers at stages one and two are considered together with a two dimensional Markov process. Two state variables are defined for each operative state to represent the number of operative servers in stages one and two. The performance measure of interest is then calculated for each of the operative states. The spectral expansion method is employed for the solution of the two dimensional Markov processes involved. Numerical results and CPU times are presented comparatively with the results obtained from simulation to validate the new approach.

II. TWO STAGE OPEN QUEUING SYSTEM

The queuing capacities of the two stage open queuing system considered are denoted as K1 and K2 for stages one and two respectively. L1 and L2 represent the maximum capacities, which include the number of servers in each stage and the queuing capacities. S1 and S2 are the numbers of servers in each stage; therefore L1 = S1 + K1 and L2 = S2 + K2. σ1 and σ2 are the rates of the Poisson arrivals to stages one and two. Servers at both stages are homogeneous, with exponentially distributed service times with rates μ1 and μ2. Operative periods are exponentially distributed with failure rates ξ1 and ξ2 for the servers in stages one and two respectively. Following a failure, the failed server is repaired with mean repair rates η1 and η2. The fraction of jobs leaving the system is quite important, especially for the analysis of the interaction between the two stages. θ1 is used to represent the fraction of jobs leaving the system after receiving service from stage one, and θ2 is used to represent the feedback from stage two to stage one. Fig. 1 shows the queuing system under study.

III. MODELING AND SOLUTION APPROACH

Let us consider a two dimensional Markov process for the operative states of the overall system on a finite lattice strip.
Let's define this Markov process for the availability model as X_A = {I(t), J(t); t ≥ 0}. The state space of this process is {0, 1, ..., S2} × {0, 1, ..., S1}. The steady state probabilities of this Markov process can be defined as P_{s2,s1}, where s2 and s1 represent the numbers of available servers for stages two and one respectively. The possible transitions in X_A are defined in terms of ξ1, ξ2, η1, η2 and are shown in Fig. 2. A single repair facility is assumed for each stage; however, it is possible to modify X_A for various repair policies. Using the spectral expansion method, the sizes of the transition matrices employed for the steady state solution of X_A are (S2 + 1) × (S2 + 1). The process X_A evolves with the following instantaneous transitions: A_j, purely lateral transition rates, caused by a change in the operative states of stage two; B_j, one-step upward transition rates, caused by a repair in stage one; and C_j, one-step downward transition rates, caused by the failure of one of the servers at stage one. The transition matrices A, B, and C for X_A are

$$A_j = A = \begin{pmatrix} 0 & \eta_2 & 0 & \cdots & 0 \\ \xi_2 & 0 & \eta_2 & \cdots & 0 \\ 0 & 2\xi_2 & 0 & \ddots & \vdots \\ \vdots & & \ddots & \ddots & \eta_2 \\ 0 & \cdots & 0 & S_2\xi_2 & 0 \end{pmatrix}, \tag{1}$$

$$B_j = B = \begin{pmatrix} \eta_1 & 0 & \cdots & 0 \\ 0 & \eta_1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \eta_1 \end{pmatrix}, \tag{2}$$

$$C_j = C = \begin{pmatrix} j\xi_1 & 0 & \cdots & 0 \\ 0 & j\xi_1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & j\xi_1 \end{pmatrix}. \tag{3}$$

Fig. 2. Operative states of the system.

The performance model used for composite performability measures is similar to X_A and to the two dimensional processes presented in [8]–[11]. However, in this study both of the stages considered are fault tolerant, and the performance model provides performance measures for each state in X_A (Fig. 2). Therefore, the sizes of the two dimensional processes, in which the numbers of jobs at each stage are represented by two state variables, change depending on the numbers of available servers. Let's define the two dimensional Markov processes for the performance model as X_{s2,s1}, where s2 = 0, 1, 2, ..., S2 and s1 = 0, 1, 2, ..., S1. The state spaces of these processes are {0, 1, ..., (K2 + s2)} × {0, 1, ..., (K1 + s1)}. The steady state probabilities of each X_{s2,s1} can be defined as P_{i,j}, where i and j represent the number of jobs at stages two and one respectively when there are s1 and s2 servers at the first and second stages. The possible transitions in the two dimensional processes X_{s2,s1} are defined in terms of σ1, σ2, μ1, μ2, θ1, θ2, s1, and s2 and are shown in Fig. 3. Each two dimensional Markov process X_{s2,s1} evolves with instantaneous transition matrices of size (K2 + s2 + 1) × (K2 + s2 + 1). A_j is the matrix of purely lateral transition rates, caused by job arrivals or departures at stage two. B_j is for one-step upward transition rates, caused by a job arrival at the first stage, and C_j is for one-step downward transition rates, caused by the departure of a serviced job from the first stage. The transition matrices A_j, B_j, and C_j for each X_{s2,s1} are

$$A_j = \begin{pmatrix} 0 & \sigma_2 & 0 & \cdots & 0 \\ \min(i,s_2)\mu_2(1-\theta_2) & 0 & \sigma_2 & \cdots & 0 \\ 0 & \min(i,s_2)\mu_2(1-\theta_2) & 0 & \ddots & \vdots \\ \vdots & & \ddots & \ddots & \sigma_2 \\ 0 & \cdots & 0 & \min(i,s_2)\mu_2(1-\theta_2) & 0 \end{pmatrix}, \tag{4}$$

$$B_j = \begin{pmatrix} \sigma_1 & 0 & \cdots & 0 & 0 \\ \min(i,s_2)\mu_2\theta_2 & \sigma_1 & \cdots & 0 & 0 \\ 0 & \min(i,s_2)\mu_2\theta_2 & \ddots & & \vdots \\ \vdots & & \ddots & \sigma_1 & 0 \\ 0 & \cdots & 0 & \min(i,s_2)\mu_2\theta_2 & \sigma_1 \end{pmatrix}, \tag{5}$$

$$C_j = \begin{pmatrix} \min(j,s_1)\mu_1\theta_1 & \min(j,s_1)\mu_1(1-\theta_1) & 0 & \cdots & 0 \\ 0 & \min(j,s_1)\mu_1\theta_1 & \min(j,s_1)\mu_1(1-\theta_1) & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & \min(j,s_1)\mu_1\theta_1 & \min(j,s_1)\mu_1(1-\theta_1) \\ 0 & \cdots & 0 & 0 & \min(j,s_1)\mu_1\theta_1 \end{pmatrix}. \tag{6}$$

Fig. 3. Performance model of the two stage system.

Please note that min(i, s2)μ2(1 − θ2) represents the departure rate of served jobs from the second stage, where min(i, s2) is the smaller of the number of servers available in stage two and the number of jobs in the second stage. The transition rate matrices do not depend on j for j ≥ M, where M is an integer threshold value [8]. For X_A, M = S1, since the downward transitions are always dependent on the number of servers at stage one. For X_{s2,s1}, M = s1, since the downward transitions do not change if the number of jobs in the system is greater than the number of operative servers. Therefore, the transition matrix C is the same as C_j where j = s1.

The solution approach considered in this study uses the spectral expansion method to find the state probabilities of each operative state presented in X_A. Once the state probabilities are known for each operative state of the servers, a performance measure of interest is computed by using the spectral expansion method on X_{s2,s1}. Let us define the state numbers of X_A and each X_{s2,s1} from zero to H2 and to H1 in the horizontal and vertical dimensions (I(t), J(t)) respectively. In the spectral expansion method, the state probabilities in a row can be defined as

$$\mathbf{v}_j = (P_{0,j}, P_{1,j}, \ldots, P_{H_2,j}), \quad 0 \le j \le H_1. \tag{7}$$

Certain diagonal matrices are defined for A, B, and C, (D_j^A, D_j^B, D_j^C) [8], and the balance equations are given as

$$\mathbf{v}_0\left[D_0^A + D_0^B\right] = \mathbf{v}_0 A_0 + \mathbf{v}_1 C_1, \tag{8}$$

$$\mathbf{v}_j\left[D_j^A + D_j^B + D_j^C\right] = \mathbf{v}_{j-1} B_{j-1} + \mathbf{v}_j A_j + \mathbf{v}_{j+1} C_{j+1}, \quad 1 \le j \le M-1, \tag{9}$$

$$\mathbf{v}_j\left[D^A + D^B + D^C\right] = \mathbf{v}_{j-1} B + \mathbf{v}_j A + \mathbf{v}_{j+1} C, \quad M \le j < H_1, \tag{10}$$

$$\mathbf{v}_{H_1}\left[D^A + D^C\right] = \mathbf{v}_{H_1-1} B + \mathbf{v}_{H_1} A. \tag{11}$$

Using the normalization equation and characteristic matrix polynomials, the steady state probabilities are expressed as

$$P_{i,j} = \sum_{l=0}^{H_2}\left(a_l \psi_l(i) \lambda_l^{\,j-M+1} + b_l \phi_l(i) \beta_l^{\,H_1-j}\right), \quad M-1 \le j \le H_1. \tag{12}$$

For the Markov processes considered, λ_l and β_l are H2 + 1 eigenvalues each, all strictly inside the unit circle, and a_l, b_l are arbitrary constants, which can be scalar or complex-conjugate, where 0 ≤ l ≤ H2. The details of the spectral expansion method for finite capacity queues can be found in [8]. The state probabilities obtained from X_{s2,s1} can be used to calculate various performance measures such as the mean queue length (MQL). The computed performance measures are then used as the reward rates [5], together with the state probabilities obtained from the availability model (X_A). Once we have the state probabilities from X_A and a reward rate for each state of X_A from the corresponding X_{s2,s1}, it is possible to compute the composite performability measures for two stage open queuing networks with multiple servers subject to failures at both stages. Let's define the reward rate E(j) computed for lattice X_{s2,s1} as E(j)_{s2,s1}. Using the computed reward rates together with the probabilities obtained from X_A, the E(j) of the overall system can be calculated as

$$E(j) = \sum_{k=0}^{S_2} \sum_{h=0}^{S_1} \left(E(j)_{k,h} \times P_{k,h}\right). \tag{13}$$
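The composition step in (13) is a probability-weighted sum of per-state reward rates, which the following Python sketch illustrates. Since the transitions in (1)–(3) let the two stages fail and be repaired independently, the sketch obtains P_{k,h} as the product of two birth-death distributions instead of applying spectral expansion to X_A; this is a shortcut substituted here for illustration, not the solution method of the paper. The function names, the choice S1 = S2 = 2, and the reward values are hypothetical placeholders, not results.

```python
def stage_availability(S, xi, eta):
    """Steady-state distribution of the number of operative servers (0..S).

    Birth-death chain: failures occur at rate k*xi in state k, and a single
    repair facility restores one server at rate eta whenever k < S.
    """
    weights = [1.0]
    for k in range(S):
        # detailed balance: pi[k] * eta = pi[k + 1] * (k + 1) * xi
        weights.append(weights[-1] * eta / ((k + 1) * xi))
    total = sum(weights)
    return [w / total for w in weights]


def operative_state_probs(S1, S2, xi1, xi2, eta1, eta2):
    """P[k][h]: k operative servers at stage two, h at stage one."""
    p1 = stage_availability(S1, xi1, eta1)
    p2 = stage_availability(S2, xi2, eta2)
    return [[p2[k] * p1[h] for h in range(S1 + 1)] for k in range(S2 + 1)]


def composite_measure(reward, probs):
    """Eq. (13): sum over k and h of reward[k][h] * probs[k][h]."""
    return sum(
        reward[k][h] * probs[k][h]
        for k in range(len(probs))
        for h in range(len(probs[k]))
    )


if __name__ == "__main__":
    # Failure and repair rates as in Section IV; S1 = S2 = 2 is illustrative.
    P = operative_state_probs(2, 2, 0.001, 0.001, 0.5, 0.5)
    # Placeholder reward rates (hypothetical, NOT computed from X_{s2,s1}).
    E = [[10.0, 6.0, 4.0], [9.0, 5.0, 3.0], [8.5, 4.5, 2.5]]
    print("composite measure:", composite_measure(E, P))
```

In the approach described above, each reward[k][h] would instead be the measure (for example MQL) obtained by solving the corresponding lattice X_{k,h}.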

The methods proposed in [9]–[11] also compute composite performability measures with some approximations. However, these approaches are not able to consider fault tolerant systems in both stages.

IV. NUMERICAL RESULTS

The results obtained from the analytical model are presented together with the results from discrete event simulation software written in C++ and validated to simulate the actual system. The results obtained from the simulation runs are within a confidence interval of 5% at a confidence level of 95% [7]. The simulation runs are stopped at the end of an event, as soon as the desired level of precision is reached for all the computations. The desired level of precision is less than the confidence interval, which is the maximum allowable deviation from the quantity to be estimated at a given confidence level.

In Figs. 4 and 5, MQL values and blocking probabilities (PB) are presented for stages one and two as functions of σ1 for various numbers of servers at the first and second stages. The parameters used are K1 = 100, K2 = 10, μ1 = 5, μ2 = 2, ξ1 = ξ2 = 0.001, η1 = η2 = 0.5, θ1 = 0.5, θ2 = 0.5, and σ2 = 0.5. In Figs. 4 and 5, the maximum discrepancies between the simulation and the new approach are less than 4.3%, 4.7%, 4.8%, and 3% for the MQL1, MQL2, PB1, and PB2 values respectively.

Fig. 4. MQL1 and MQL2 values for systems with various numbers of servers prone to failures.

Fig. 5. Blocking probabilities for systems with various numbers of servers prone to failures.

Numerical results are presented in Table I for the same parameters unless stated otherwise. The results in Table I also confirm the accuracy of the new approach, since the discrepancies are less than 5%, which is a well accepted confidence interval for the simulation [10], [11]. The results in Table I also show the computation times of the simulation and the new approach, which are obtained using a computer with a 1.73 GHz Intel(R) Core(TM) i7 CPU, 6 GB RAM, and the NAG library. The results presented for computation times show the efficiency of the new approach. Especially for systems with high numbers of servers, the new analytical approach requires additional computations, since both the size of the state space to be handled and the number of times the spectral expansion method is called increase; nevertheless, it is evidently superior to the simulation.

TABLE I. ACCURACY OF THE NEW APPROACH AND THE CPU TIMES

The computational complexity of computing the eigenvalues/eigenvectors employed in the spectral expansion solution is O(N^3) [8], where N is the size of I(t). The new approach proposed in this paper starts by solving each X_{s2,s1} for each of the operative states. The complexity of this step is O(S1 S2 L2^3). Then a performance measure is computed for all X_{s2,s1} with complexity O(S1 S2 L1 L2), and finally X_A is also solved for the operative state probabilities with complexity O(S2^3). Considering the definitions of L1 and L2, the complexity of the new approach can be expressed as O(S1 S2 L2 max(L1, L2^2)).

V. CONCLUSION

The main contribution of the new approach is its ability to consider fault tolerant systems in both stages. The method is particularly useful for modeling the interaction between fault tolerant systems, such as the integration of wireless communication systems, memory servers, and various types of internetworking units. Unlike the previous solution approaches, the method presented in this paper does not have to assume that at least one of the interacting systems never experiences failures. The discrepancies between the results obtained by the new solution approach and simulation are less than 5%, and the solution approach is efficient in terms of computation time. To the best of our knowledge, this is the first study where two dimensional Markov chains are employed to represent the operative states of two different stages together. One main advantage of using such an approach is the ability to incorporate different repair strategies.

REFERENCES

[1] B. Hu and K. L. Yeung, “Feedback-based scheduling for load-balanced two-stage switches,” IEEE/ACM Trans. Netw., vol. 18, no. 4, pp. 1077–1090, Aug. 2010.
[2] N. Gunaseelan, L. Lingjia, J. Chamberland, and G. H. Huff, “Performance analysis of wireless hybrid-ARQ systems with delay-sensitive traffic,” IEEE Trans. Commun., vol. 58, no. 4, pp. 1262–1272, Apr. 2010.
[3] L. Huang and T. T. Zhu, “Generalized Pollaczek-Khinchin formula for Markov channels,” IEEE Trans. Commun., vol. 61, no. 8, pp. 3530–3540, Aug. 2013.
[4] X. Weiwei and S. Lianfeng, “Modeling and analysis of hybrid cellular/WLAN systems with integrated service-based vertical handoff schemes,” IEICE Trans. Commun., vol. E92-B, no. 6, pp. 2032–2043, Jun. 2009.
[5] K. S. Trivedi, S. Dharmaraja, and X. Ma, “Analytic modeling of handoffs in wireless cellular networks,” Inf. Sci., vol. 148, no. 1–4, pp. 155–166, Dec. 2002.
[6] S. V. Dhople, Y. C. Chen, and A. D. Dominguez-Garcia, “A set-theoretic method for parametric uncertainty analysis in Markov reliability and reward models,” IEEE Trans. Rel., vol. 62, no. 3, pp. 658–669, Sep. 2013.
[7] E. Gelenbe and G. Pujolle, Introduction to Queueing Networks, 2nd ed. Chichester, U.K.: Wiley, 1998.
[8] R. Chakka, “Spectral expansion solution for some finite capacity queues,” Ann. Oper. Res., vol. 79, no. 1–4, pp. 27–44, May 1998.
[9] I. Mitrani, “Approximate solutions for heavily loaded Markov-modulated queues,” Perform. Eval., vol. 62, no. 1–4, pp. 117–131, Oct. 2005.
[10] O. Gemikonakli, E. Ever, and A. Kocyigit, “Approximate solution for two stage open networks with Markov-modulated queues minimizing the state space explosion problem,” J. Comput. Appl. Math., vol. 223, no. 1, pp. 519–533, Jan. 2009.
[11] E. Ever, O. Gemikonakli, A. Kocyigit, and E. Gemikonakli, “A hybrid approach to minimize state space explosion problem for the solution of two stage tandem queues,” J. Netw. Comput. Appl., vol. 36, no. 2, pp. 908–926, Mar. 2013.