
Empirical Reliability Modeling of Transaction Oriented Autonomic Grid Service

Dharmendra Prasad Mahato and Ravi Shankar Singh
Department of Computer Science and Engineering, Indian Institute of Technology (BHU), Varanasi, India 221 005.
E-mail: (dpmahato.rs.cse13, ravi.cse)@iitbhu.ac.in

This paper presents reliability modeling of transaction oriented autonomic grid service. Transaction oriented autonomic grid technology is aimed at providing reliable services for users by hiding the complexity of the service and protecting the system from various failures. The reliability in these systems is greatly affected by the occurrence of failures. A coloured Petri net model, CPN-TOGS (CPN-Transaction Oriented Grid Service), is presented in this paper for analyzing the empirical reliability in transaction oriented autonomic grid service. The model maintains the recovery of the failed processes by using both local level and replicated level recovery.

Keywords: Reliability; Transaction Oriented Grid Service.

1. Introduction

Grid computing provides a service oriented infrastructure with standardized protocols and services for enabling access to, and sharing of, geographically distributed hardware, software and information resources [1], [2]. Grid technology is no longer limited to scientific computing; it is expanding on a large scale into business applications by providing an autonomous environment and reducing user intervention as much as possible [3]. To create reliable, real-time applications that offer location, replica, concurrency and failure transparency, transaction processing and its management are needed in grid technology as an effective means of sharing a large number of resources among different organizations [4], [5], [6], [7], [8], [9]. To keep the system consistent and free from failures, today's grid technology provides an autonomic environment with recovery mechanisms that work in a self-configuring, self-optimizing, self-healing, and self-protecting fashion, so that the complexity associated with the grid is reduced and the system becomes easy to use [10]. Transaction oriented autonomic grid service, which is a set of operations executing on heterogeneous grid sites, is responsible for ensuring the reliable and real-time execution of business applications [11]. Handling real-time transactions in an autonomic grid is challenging, however, because it must both protect the system from various failures and complete the task by its deadline [10].

Transaction oriented autonomic grid service reliability is the probability that all of the subtasks (short-lived or long-lived) involved in the considered service are executed successfully [1], [12], [13]. This paper presents the CPN-TOGS model for the empirical evaluation of the reliability of transaction oriented autonomic grid service. The model also describes two types of recovery of failed processes: local level recovery and replicated level recovery.

The remainder of this paper is organized as follows. Section 2 reviews related research and discusses existing approaches to grid service and transaction oriented grid service reliability modeling and analysis. Section 3 presents some background information. Section 4 presents the proposed CPN-TOGS model in detail. Section 5 presents the simulation setup and the results in different scenarios. Section 6 concludes the paper.

2. Related Work

Although the modeling and analysis of grid service reliability has attracted a lot of research attention, reliability modeling and analysis in transaction oriented autonomic grid service has not been sufficiently studied. Azgomi et al. presented a model to analyze grid service reliability [14] in which task scheduling by the resource management system (RMS) and task execution within grid resources were modeled using coloured Petri nets (CPNs). To improve grid service reliability, Guo et al. presented a fault recovery technique for grid systems and conducted an empirical modeling and analysis of grid service reliability with fault recovery [2]. Later, Guo et al. presented a multi-objective task scheduling model and proposed an ant colony optimization (ACO) algorithm that uses fault recovery to enhance grid service reliability [1]. For effective and accurate assessment of grid service reliability, Doguc et al.
presented a new method based on the data-mining algorithm K2 to discover the grid system data using Bayesian networks (BN) [15]. Dai et al. presented a virtual approach for modeling grid services and obtained grid service reliability using Markov models, queuing theory, and graph theory [16]. Levitin and Dai introduced grid service reliability and performance indices and presented a fast numerical algorithm for their evaluation [12]. In [17, 18], Dai et al. presented an algorithm to achieve the greatest possible expected service performance or reliability, based on graph theory, a Bayesian approach, and an evolutionary optimization approach, and presented a virtual tree-structured model. In [19, 20], Dai et al. presented models for reliability and performance analysis of tree-structured grid services considering data dependence and failure correlation.

Recovery of failed processes also plays an important role in analyzing the reliability of grid systems. In most related research, fault recovery is achieved by a migration mechanism: when a failure occurs on a grid node, the state information is migrated to another node, on which the subtask execution is resumed from the interrupted point [9]. Heddaya and Helal analyzed the impact of local fault recovery on the service reliability of a distributed computing system in which the service is restarted locally, so that global communication is minimized. However, research on the impact of local and migration (replicated) fault recovery on transaction oriented autonomic grid service reliability is very scarce [21].

3. Background

A. Grid Service Reliability

Grid service reliability $R_{gs}$ is defined as the probability that a set of programs contained in a grid service can be completed successfully on multiple resources and links [2]. To account for node failure, communication link failure, and software failure in the reliability analysis, the following key assumptions are made:

1) User jobs are generated according to an exponential distribution.
2) When a service request reaches the RMS, the RMS replies immediately.
3) Each node can execute more than one task at a time, as a node consists of a cluster of homogeneous or heterogeneous processors.
4) The failures in different elements are independent.
5) The failure processes of nodes are modeled by the Poisson distribution.

When a service S is received by the RMS, it is divided into m subtasks, and these subtasks are assigned to w nodes. The processing time of subtask i (i = 1, 2, ..., m) on node k is

$$\tau_{ik} = \frac{c_i}{s_k}, \tag{1}$$

where $c_i$ is the computational complexity of subtask i and $s_k$ is the processing capability of node k (k = 1, 2, ..., w) [2]. If $\lambda_k$ is the failure intensity of node k, the probability that node k functions without any failure during the processing time of subtask i is

$$p_{ik} = e^{-\lambda_k \tau_{ik}}, \tag{2}$$

where $\tau_{ik}$ is given by equation (1). While subtask i is being executed, the communication time between the RMS and node k is

$$l_{ik} = \frac{a_{ik}}{y_k}, \tag{3}$$

where $a_{ik}$ is the amount of data exchanged between the RMS and node k while subtask i is being executed, and $y_k$ is the mean speed of the communication link between the RMS and node k [2].
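As a minimal sketch of equations (1) to (3), the helper functions below compute the processing time, the node survival probability, and the communication time. The function names and all numeric values are hypothetical and chosen only for illustration; the communication time is assumed to be the ratio of data volume to mean link speed, consistent with the definitions of $a_{ik}$ and $y_k$.

```python
import math


def processing_time(c_i: float, s_k: float) -> float:
    """Eq. (1): tau_ik = c_i / s_k (complexity over processing capability)."""
    return c_i / s_k


def node_survival(lambda_k: float, tau_ik: float) -> float:
    """Eq. (2): probability that node k does not fail during tau_ik."""
    return math.exp(-lambda_k * tau_ik)


def communication_time(a_ik: float, y_k: float) -> float:
    """Eq. (3), assumed form: data volume divided by mean link speed."""
    return a_ik / y_k


# Hypothetical inputs, not taken from the paper.
tau = processing_time(c_i=200.0, s_k=50.0)
p = node_survival(lambda_k=0.01, tau_ik=tau)
l = communication_time(a_ik=30.0, y_k=10.0)
print(f"tau_ik={tau}, p_ik={p:.4f}, l_ik={l}")
```

The units only need to be mutually consistent: the failure intensity must be expressed per the same time unit as the processing time.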


If $\epsilon_k$ is the failure intensity of the communication link between the RMS and node k, the probability that the communication link functions without any failure is

$$q_{ik} = e^{-\epsilon_k l_{ik}}, \tag{4}$$

where $l_{ik}$ is given by equation (3) [2]. If $\lambda_{prog}$ is the failure intensity of the program running on node k, where subtask i is being executed, the probability that the program on node k functions without any failure is

$$r_{ik} = e^{-\lambda_{prog} \tau_{ik}}. \tag{5}$$

The probability that subtask i is successfully completed by node k, i.e. the reliability of subtask i on node k, is then

$$R_{ik} = p_{ik} \, q_{ik} \, r_{ik}. \tag{6}$$

To improve grid service reliability, a subtask is assigned to several nodes for parallel execution. When a node completes a subtask successfully, it returns the output to the RMS. If $N_i$ is the set of nodes to which subtask i is assigned, then the reliability of subtask i, $R_{Sub_i}$, is given by

$$R_{Sub_i} = 1 - \prod_{k \in N_i} (1 - R_{ik}), \tag{7}$$

where $R_{ik}$ is from equation (6). When the outputs of all the subtasks have been received by the RMS, the grid service is said to be completed successfully. Hence, the grid service reliability $R_{gs}$ is given by

$$R_{gs} = \prod_{i=1}^{m} \Big[ 1 - \prod_{k \in N_i} (1 - R_{ik}) \Big]. \tag{8}$$

Transaction processing in a grid is not easy to achieve, however, because grid services are autonomous and loosely coupled and do not allow themselves to be locked by outside applications [5].

B. Transaction Oriented Autonomic Grid Service Availability

Transaction oriented autonomic grid service availability $A_i$ can be defined as the probability that, at any time, a required minimum fraction of transactions finish within a given deadline [22]. It can be derived in terms of the mean time to failure ($MTTF_i$) and the mean time to repair ($MTTR_i$) of the ith component used in transaction processing in a grid environment. Therefore,

$$A_i = \frac{MTTF_i}{MTTF_i + MTTR_i}. \tag{9}$$
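Equations (4) to (9) can be sketched in a few lines. This is an illustrative outline, not the authors' implementation: the function names and the sample reliabilities are hypothetical, and failures are taken to be independent, as stated in the assumptions.

```python
import math
from typing import Iterable, Sequence


def execution_reliability(lambda_k: float, lambda_prog: float, eps_k: float,
                          tau_ik: float, l_ik: float) -> float:
    """Eq. (6): R_ik = p_ik * q_ik * r_ik."""
    p_ik = math.exp(-lambda_k * tau_ik)      # Eq. (2): node survives
    q_ik = math.exp(-eps_k * l_ik)           # Eq. (4): link survives
    r_ik = math.exp(-lambda_prog * tau_ik)   # Eq. (5): program survives
    return p_ik * q_ik * r_ik


def subtask_reliability(r_iks: Iterable[float]) -> float:
    """Eq. (7): the subtask succeeds if at least one assigned node succeeds."""
    return 1.0 - math.prod(1.0 - r for r in r_iks)


def grid_service_reliability(assignments: Sequence[Sequence[float]]) -> float:
    """Eq. (8): every subtask must succeed; one inner list of R_ik per subtask."""
    return math.prod(subtask_reliability(r_iks) for r_iks in assignments)


def availability(mttf: float, mttr: float) -> float:
    """Eq. (9): A_i = MTTF_i / (MTTF_i + MTTR_i)."""
    return mttf / (mttf + mttr)


# Hypothetical example: subtask 1 runs on two nodes, subtask 2 on one node.
r_gs = grid_service_reliability([[0.9, 0.8], [0.95]])
print(f"R_gs = {r_gs:.3f}, A = {availability(990.0, 10.0):.2f}")
```

Passing precomputed values of $R_{ik}$ to `grid_service_reliability` keeps the redundancy structure (which nodes serve which subtask) separate from the per-node failure model.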

C. Mean Transaction Oriented Autonomic Grid Service Delay

Mean transaction oriented autonomic grid service delay D can be defined as the sum of all non-overlapped delays from service initiation time to service result time. If $S_{it}$ is the service initiation time, $S_{tt}$ is the service transmission time, $S_d$ is the service propagation delay, $S_{pt}$ is the service processing time, and $S_{rdt}$ is the service result display time, then

$$D = S_{it} + S_{tt} + S_d + S_{pt} + S_{rdt}. \tag{10}$$

4. CPN-TOGS Model

4.1. Reliability Analysis of Transaction Oriented Grid Service

The probability that there are exactly f failures in component i in a time interval t, with $\lambda_{tg}$ the mean number of failures per unit time, is given by

$$R_{togs_i}(f, t) = \frac{(\lambda_{tg} t)^f \, e^{-\lambda_{tg} t}}{f!} \cdot A_i, \tag{11}$$

where $A_i$ is the availability of component i from equation (9). When t is replaced by the mean transaction oriented grid service delay $D_i$ of component i, equation (11) becomes

$$R_{togs_i}(f, D_i) = \frac{(\lambda_{tg_i} D_i)^f \, e^{-\lambda_{tg_i} D_i}}{f!} \cdot A_i, \tag{12}$$

where $\lambda_{tg_i}$ is the expected number of failures per unit time and is equal to $1 / MTTF_i$. In equation (12), if f = 0, then

$$R_{togs_i}(0, D_i) = e^{-\lambda_{tg_i} D_i} \cdot A_i. \tag{13}$$

To improve transaction oriented grid service reliability, a subtask is assigned to several nodes for parallel execution. When a node completes a subtask successfully, it returns the output to the RMS. If $N_i$ is the set of nodes to which subtask i is assigned, then the reliability of subtask i, $R_{togs_i}^{Sub_i}$, is given by

$$R_{togs_i}^{Sub_i} = 1 - \prod_{k \in N_i} \big(1 - R_{togs_i}(f, D_i)\big). \tag{14}$$

When the outputs of all the subtasks have been received by the RMS, the grid service is said to be completed successfully. Hence, the transaction oriented grid service reliability $R_{togs}$ is given by

$$R_{togs} = \prod_{i=1}^{m} \Big[ 1 - \prod_{k \in N_i} \big(1 - R_{togs_i}(f, D_i)\big) \Big]. \tag{15}$$
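The chain from the mean delay of equation (10) to the service reliability of equation (15) can be illustrated with a short sketch. The function names and every numeric value below are hypothetical, and the time unit of the delay is assumed to match that of MTTF and MTTR; this is a reading of the formulas, not the CPN-TOGS simulation itself.

```python
import math
from typing import Iterable, Sequence


def mean_delay(s_it: float, s_tt: float, s_d: float,
               s_pt: float, s_rdt: float) -> float:
    """Eq. (10): sum of the non-overlapped delays."""
    return s_it + s_tt + s_d + s_pt + s_rdt


def component_reliability(f: int, delay: float,
                          mttf: float, mttr: float) -> float:
    """Eq. (12): Poisson term for f failures within the delay, times A_i."""
    lam = 1.0 / mttf                  # lambda_tg_i = 1 / MTTF_i
    a_i = mttf / (mttf + mttr)        # Eq. (9)
    x = lam * delay
    return (x ** f) * math.exp(-x) / math.factorial(f) * a_i


def subtask_reliability_togs(r_nodes: Iterable[float]) -> float:
    """Eq. (14): one reliability value per node in N_i."""
    return 1.0 - math.prod(1.0 - r for r in r_nodes)


def service_reliability_togs(assignments: Sequence[Sequence[float]]) -> float:
    """Eq. (15): product over all subtasks."""
    return math.prod(subtask_reliability_togs(r) for r in assignments)


# Hypothetical component: delay of 100, MTTF of 1000, MTTR of 10 (same units).
d_i = mean_delay(10.0, 20.0, 5.0, 60.0, 5.0)
r0 = component_reliability(0, d_i, mttf=1000.0, mttr=10.0)   # Eq. (13), f = 0
# Two subtasks: the first replicated on two nodes, the second on one node.
r_togs = service_reliability_togs([[r0, r0], [r0]])
print(f"D_i={d_i}, R_togs_i(0, D_i)={r0:.4f}, R_togs={r_togs:.4f}")
```

With f = 0 the Poisson term reduces to the exponential of equation (13), so replication raises the subtask reliability while the product over subtasks lowers the overall value.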


5. Simulation and Results

5.1 Simulation set up

In the simulation setup, the CPN-TOGS model is divided into four categories using the CPN tool.

• Category 1: The model has no transaction management, and failed processes are recovered on the local node.
• Category 2: The model has no transaction management, and failed processes are recovered on replicated nodes.
• Category 3: The model has transaction management, and failed processes are recovered on the local node.
• Category 4: The model has transaction management, and failed processes are recovered on replicated nodes.

5.2 Results

[Figure 1 plot, "Reliability in different categories": reliability (y-axis) versus time in ms (x-axis, 0 to 600) for Categories 1 to 4.]

Fig. 1: Reliability in different categories

1) Reliability in different categories: Fig. 1 shows the values of reliability with respect to time (in milliseconds). When the deadline ranges from 150 to 500 ms, the grid service with transaction management and local recovery (Category 3) had the highest reliability, and the grid service without transaction management and with local recovery (Category 1) had the lowest. When the deadline ranges from 150 to 500 ms, the grid service with transaction management and replicated recovery (Category 4) had the highest reliability and Category 1 had the lowest. Table 1 gives the reliability of the different categories for deadlines ranging from 150 to 500 ms.

2) Reliability Analysis with local recovery and replicated recovery: Fig. 2 shows the analysis with local level recovery, while Fig. 3 shows the analysis with replicated level recovery. With local recovery, we conclude that Category 3 is a better option than Category 1. With replicated recovery, we conclude that Category 4 is better than Category 2, provided that the deadline is within 300 ms; if the deadline is above 300 ms, both categories are good options to use.

3) Reliability Analysis with and without Transaction Management: When transaction management is used, Category 4 is better than Category 3 (as seen in Fig. 4). According to Fig. 5, at all possible values of the deadline, the reliability of the grid service without transaction management that uses replicated recovery (Category 2) is higher than that of Category 1.

[Figure 2 plot, "Category 1 Versus Category 3": reliability (y-axis) versus time in ms (x-axis, 0 to 600) for Categories 1 and 3.]
[Figure 3 plot, "Category 2 Versus Category 4": reliability (y-axis) versus time in ms (x-axis, 0 to 600) for Categories 2 and 4.]

Fig. 2: Reliability with local recovery
Fig. 3: Reliability with replicated recovery

[Figure 4 plot, "Category 3 Versus Category 4": reliability (y-axis) versus time in ms (x-axis, 0 to 600) for Categories 3 and 4.]
[Figure 5 plot, "Category 1 Versus Category 2": reliability (y-axis) versus time in ms (x-axis, 0 to 600) for Categories 1 and 2.]

Fig. 4: Reliability with TM
Fig. 5: Reliability without TM

Deadline (ms)   Category 1   Category 2   Category 3   Category 4
150             0.901        0.901        0.875        0.888
200             0.955        0.960        0.958        0.954
300             0.958        0.968        0.972        0.976
400             0.973        0.972        0.972        0.978
500             0.980        0.983        0.983        0.985

Table 1: Reliability analysis in different categories

6. Conclusions and Future Work

The proposed CPN-TOGS model is based on the reliability definition, and it models the service provided by a transaction oriented grid environment. The experimental results show that short-lived and long-lived transactions have different reliability values. The grid service with local recovery had the lowest reliability, and the transaction oriented grid service with replicated recovery had the highest. Future work will address the empirical analysis of dependability in transaction oriented grid service, covering not only reliability but also availability, safety, maintainability, and integrity.

References

1. S. Guo, H.-Z. Huang, Z. Wang, and M. Xie, Grid service reliability modeling and optimal task scheduling considering fault recovery, Reliability, IEEE Transactions on, 60, 263 (2011).
2. S. Guo, H.-Z. Huang, and Y. Liu, Modeling and analysis of grid service reliability considering fault recovery, New Generation Computing, 29, 345–364 (2011). [Online]. Available: http://dx.doi.org/10.1007/s00354-0090114-8
3. I. Foster, The grid: A new infrastructure for 21st century science, Grid Computing: Making the Global Infrastructure a Reality, 51 (2003).
4. P. Bernstein and E. Newcomer, Principles of Transaction Processing: For the Systems Professional. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1997.
5. Y. Cardinale, J. E. Haddad, M. Manouvrier, and M. Rukoz, Cpn-tws: a coloured petri-net approach for transactional-qos driven web service composition, Int. J. Web Grid Serv., vol. 7, no. 1, pp. 91–115, Feb. 2011. [Online]. Available: http://dx.doi.org/10.1504/IJWGS.2011.038389
6. J. Gray, The transaction concept: Virtues and limitations (invited paper), in Proceedings of the Seventh International Conference on Very Large Data Bases - 7, ser. VLDB ’81. VLDB Endowment, 144 (1981). [Online]. Available: http://dl.acm.org/citation.cfm?id=1286831.1286846
7. J. Gray and A. Reuter, Transaction Processing: Concepts and Techniques, 1st ed. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1992.
8. M. Husemann, M. von Riegen, and N. Ritter, Transactional coordination of dynamic processes in service-oriented environments, in Web Services, 2007. ICWS 2007. IEEE International Conference on, July 2007, pp. 1024–1031.
9. D. P. Mahato, L. S. Umrao, and R. S. Singh, Adaptability in transaction oriented grid service, in Parallel, Distributed and Grid Computing (PDGC), 2014 International Conference on, Dec 2014, pp. 239–244.
10. F. Tang, M. Li, and J. Z. Huang, Real-time transaction processing for autonomic grid applications, Engineering Applications of Artificial Intelligence, 17, 799 (2004), Autonomic Computing Systems. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0952197604001228
11. F. Tang, M. Guo, M. Li, and L. Li, Transaction management for reliable grid applications, in Advanced Information Networking and Applications, 2009. AINA ’09. International Conference on, 427 (2009).
12. G. Levitin, Y.-S. Dai, and H. Ben-Haim, Reliability and performance of star topology grid service with precedence constraints on subtask execution, Reliability, IEEE Transactions on, 55, 507 (2006).
13. G. Levitin and Y.-S. Dai, Service reliability and performance in grid system with star topology, Reliability Engineering & System Safety, 92, 40 (2007).
14. M. A. Azgomi and R. Entezari-Maleki, Task scheduling modeling and reliability evaluation of grid services using coloured petri nets, Future Generation Computer Systems, 26, 1141 (2010). [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0167739X10001093
15. O. Doguc and J. E. Ramirez-Marquez, An automated method for estimating reliability of grid systems using bayesian networks, Reliability Engineering & System Safety, 104, 96 (2012).
16. Y.-S. Dai, Y. Pan, and X. Zou, A hierarchical modeling and analysis for grid service reliability, Computers, IEEE Transactions on, 56, 681 (2007).
17. Y.-S. Dai, G. Levitin, and X. Wang, Optimal task partition and distribution in grid service system with common cause failures, Future Generation Computer Systems, 23, 209 (2007). [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0167739X06001038
18. Y.-S. Dai and G. Levitin, Optimal resource allocation for maximizing performance and reliability in tree-structured grid services, Reliability, IEEE Transactions on, 56, 444 (2007).
19. Y.-S. Dai, G. Levitin, and K. S. Trivedi, Performance and reliability of tree-structured grid services considering data dependence and failure correlation, Computers, IEEE Transactions on, 56, 925 (2007).
20. Y.-S. Dai and G. Levitin, Reliability and performance of tree-structured grid services, Reliability, IEEE Transactions on, 55, 337 (2006).
21. A. Heddaya and A. Helal, Reliability, availability, dependability and performability: A user-centered view, (1997).
22. V. Mainkar, Availability analysis of transaction processing systems based on user-perceived performance, in Reliable Distributed Systems, 1997. Proceedings., The Sixteenth Symposium on, 10 (1997).
