Proceedings of iiWAS2006

A NEW MODELLING APPROACH TO ENHANCE RELIABILITY OF TRANSACTIONAL ORIENTED WEB SERVICES

Adil M. Hammadi 1), Saqib Ali 1), Fei Liu 1)

Abstract
Reliability and uptime are the key indicators of business systems supported by web services today. Business processes are linked together in web services by a chain of components, and the success or failure of a business is determined by the reliability of this chain. In this paper, we propose an enhanced reliability model for web services computing, focusing on transaction-oriented web services applications. We present an online reservation system case study to explain our model. We also conduct an evaluation to check the reliability of the proposed system.

1. Introduction
Reliability and availability are key criteria for providing trustworthy web services. They are related to what is called Quality of Service (QoS) for web services [2]. Microsoft's term for reliability, security, availability and privacy is trustworthy computing, as discussed by Zhang [6]. Borrowing this term for web services, a framework is needed for fault-tolerant web services. A service that is frequently unavailable can stain the provider’s reputation and business. In one event in 2002, when eBay suffered a 22-hour outage, the lost revenue was estimated at five million US dollars [5]. Failure detection and fast recovery are therefore needed for web services. Another useful metric is a measure of reliability, so that the reliability of different service providers can be compared.

Web services are platform and language independent and are implemented in a distributed environment. Providers that want to publish their web services do so through a public registry. It is therefore difficult to trust the reliability of different web services. For example, we can develop a server for an online travel service that makes use of different airline reservation, car rental and hotel reservation systems. In this scenario the individual airline reservation, hotel reservation and car rental services may each implement fault tolerance techniques, but the fault tolerance of the online travel service as a whole cannot be guaranteed. In this paper we propose a model that helps to enhance the reliability of web services.

The structure of this paper is as follows. Section 2 reviews work on fault tolerance in web services computing. Section 3 discusses our proposed model. Section 4 discusses the correctness and evaluation of our model, and Section 5 gives the conclusion and future work.

1) Department of Computer Science and Computer Engineering, La Trobe University, Bundoora, 3086, Australia. {adil.hammadi, s.ali, f.liu}@latrobe.edu.au


2. Background
Fault tolerance of web services computing has been addressed by several researchers in recent years. Some work treats reliability as a vertical layer of the web services stack model [5], [10]. Attention has also been paid to making web services message delivery reliable [7], [3], [4], [1]. For asynchronous transactions, middleware has been used by Maheshwari [8] and Erradi [1].

Based on these developments, we argue that web service messages do not necessarily represent web services transactions. The distinction is important because the states through which a web service passes during processing do not necessarily represent the states of its transactions. We therefore focus our work on fault tolerance of transaction-oriented web services applications. We propose a middleware-based fault tolerance model that incorporates global and local fault tolerance techniques for transaction-oriented web services applications, together with a reliability evaluation. We place middleware between the web service requester and the web service provider to increase reliability, as done by Maheshwari [8] and Erradi [1]. We present not only the failure and recovery models but also a complete discussion of how faults can be detected, isolated and recovered. We address these issues in our proposed model in Section 3.

3. Our Proposed Model
As discussed in Section 2, our model is middleware-based. To better explain it, we use the following case study.

Case Study: To illustrate the importance of the reliability of web databases in web services, we choose an online reservation system as a case study. We chose this scenario because airline reservation systems are among the systems that need high availability and reliability. The online reservation system is intended for travel agents and provides airline reservation and hotel reservation services. Travel agents pay a nominal fee to use these services. The online reservation server acts as consumer/coordinator for the web services it requests. Each request is dispatched to the respective servers, that is, the airline server and the hotel server, which act as participants. Figure 1 depicts the whole process of our case study.

Figure 1: Online Reservation System


The following process is adopted to make a reservation in our case study:

- The consumer/customer calls the travel agent for a booking.
- The travel agent takes the details from the customer and fills in the reservation form.
- The travel agent sends the reservation form to the online reservation server to start the reservation process.
- The online reservation server, after looking in the service registry, dispatches the request to the airline and hotel reservation servers.

Web Services combine the services from the airline and hotel reservation servers. The reliability of the airline and hotel reservation servers may be given, but the reliability of the online reservation server is not known. Our proposed model is capable of determining the reliability of the online reservation system. In the following section we discuss our reliability model using the example from our case study.

3.1. Proposed Reliability Model
We define a transaction-oriented Web Services application as an application that makes significant use of web databases. The reliability of a transaction-oriented web services application is defined as the probability that a transaction request in a Web Services environment will complete in a given period of time. This excludes the case where transaction processing is not possible due to transaction failure. We use a component-based approach to develop our reliability model. We first discuss the model in general and then discuss each component in detail in the following sections.

A transaction T enters the system when a consumer asks the service requester for a service. The service requester (client) passes the request to the transaction manager in the middleware. The transaction manager defines the transaction context and passes the service request to the queuing manager. The queuing manager checkpoints the request and queues it for the service providers. The service request is then passed from the queues to the servers. After processing the request, the servers send their replies back to the transaction manager. The transaction manager consults the checkpoint log to check whether all legs (branches) of transaction T are complete, then composes the reply and sends it back to the client. The client passes the reply to the consumer. The whole lifecycle is shown in Figure 2.
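One possible reading of the reliability definition in Section 3.1 is an empirical ratio: among the requests that did not suffer a transaction failure, the fraction that completed within the given time. The sketch below illustrates that reading only. The record layout `Outcome`, the function `estimate_reliability` and the sample observations are assumptions made for illustration and are not part of the proposed model.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """One observed transaction request (hypothetical record layout)."""
    elapsed_s: float                   # time from request to reply, in seconds
    completed: bool                    # True if the transaction finished
    transaction_failure: bool = False  # e.g. no seat available


def estimate_reliability(outcomes, deadline_s):
    """Fraction of eligible requests that completed within deadline_s.

    Requests that ended in a transaction failure are excluded, following
    the definition in Section 3.1. Returns None if nothing is eligible.
    """
    eligible = [o for o in outcomes if not o.transaction_failure]
    if not eligible:
        return None
    on_time = sum(1 for o in eligible if o.completed and o.elapsed_s <= deadline_s)
    return on_time / len(eligible)


# Made-up observations, for illustration only.
observations = [
    Outcome(1.2, True),
    Outcome(0.8, True),
    Outcome(9.5, True),                             # completed, but too late
    Outcome(0.0, False),                            # site failure, no reply
    Outcome(0.3, False, transaction_failure=True),  # excluded from the estimate
]
print(estimate_reliability(observations, deadline_s=5.0))
```

With a deadline of five seconds, two of the four eligible requests count as successful, so the sketch reports 0.5.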

Figure 2: Proposed reliability model


Figure 2 depicts the proposed reliability model. We assume that the sub-transactions of a global transaction can be processed in parallel and independently of each other, so the results of their execution arrive asynchronously. To compose the results of these sub-transactions, the transaction manager consults the composition document. When both sub-transactions of a transaction in our scenario return OK, the composition (or workflow) document is consulted and the result is sent back to the client. Figure 3 shows the transaction levels used in our reliability model.

Figure 3: Transaction levels in reliability model
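The assumption stated for Figure 2, that sub-transactions run in parallel and reply asynchronously before the transaction manager composes a result, can be sketched with a thread pool. This is a minimal sketch rather than the middleware itself: `reserve_airline`, `reserve_hotel` and `run_global_transaction` are hypothetical stand-ins, and the composition rule (OK only if every leg returns OK) is a simplification of consulting a workflow document.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def reserve_airline(request):
    """Stand-in for the airline web service call (hypothetical)."""
    return "OK"


def reserve_hotel(request):
    """Stand-in for the hotel web service call (hypothetical)."""
    return "OK"


def run_global_transaction(request, legs):
    """Run every leg in parallel and compose a reply once all have answered.

    legs maps a sub-transaction name to the callable that processes it.
    """
    replies = {}
    with ThreadPoolExecutor(max_workers=len(legs)) as pool:
        futures = {pool.submit(call, request): name for name, call in legs.items()}
        for future in as_completed(futures):  # replies arrive in any order
            replies[futures[future]] = future.result()
    composed = "OK" if all(r == "OK" for r in replies.values()) else "FAIL"
    return composed, replies


composed, replies = run_global_transaction(
    {"customer": "example"},
    {"airline": reserve_airline, "hotel": reserve_hotel},
)
print(composed, replies)
```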

To explain our model better, we briefly discuss the transaction levels in a Web Services environment. For the sake of fault tolerance, we conceptually divide a transaction into two levels: the global transaction level and the sub-transaction level. In our case study, a single reservation request from a customer is a global transaction, and the airline and hotel reservations are its sub-transactions. Figure 3 depicts this concept. The division is important for fault isolation in web services and for avoiding the abort of the whole transaction when a failure occurs. Since our model is component-based, we discuss its components in the following sections.

3.1.1. Transaction Manager
In a distributed transaction model such as X/Open, a transaction API (Application Programming Interface) is used to invoke operations on the transaction manager [9]. We define the transaction interface of a transaction manager as follows: “A transaction manager interface is defined as a 4-tuple vector <W, C, T, P>, where

- W: is the workflow document. The workflow document contains the information on how the related transactions (for a specific business task) are processed and composed.
- C: is the Coordination protocol. This protocol is used to coordinate transactions and their replies.
- T: is the Transaction protocol. Depending on the type of application, a transaction protocol can be selected that is as strict as 2-PC or more flexible.
- P: is the user Priority for transactions. That is, users can assign a priority to their transactions according to some criterion, for example whoever pays more gets higher priority.”
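The four parameters W, C, T and P can be pictured as a small record whose fields are all optional, with the transaction manager supplying defaults for anything the application leaves out. The sketch below is one way to express that. The class name, the field names, the default protocol names and the default priority of 0 are all assumptions, since the paper says only that default protocols exist.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical defaults; the paper does not name the default protocols.
DEFAULT_COORDINATION = "default-coordination"
DEFAULT_TRANSACTION = "default-transaction"
DEFAULT_PRIORITY = 0


@dataclass(frozen=True)
class TransactionInterface:
    """Workflow, coordination protocol, transaction protocol and priority."""
    workflow: Optional[str] = None      # W: how replies are processed and composed
    coordination: Optional[str] = None  # C: coordination protocol
    transaction: Optional[str] = None   # T: e.g. strict "2PC" or a flexible protocol
    priority: Optional[int] = None      # P: user-assigned priority

    def resolved(self):
        """Return a copy in which unset fields are replaced by defaults."""
        return TransactionInterface(
            workflow=self.workflow,
            coordination=self.coordination or DEFAULT_COORDINATION,
            transaction=self.transaction or DEFAULT_TRANSACTION,
            priority=DEFAULT_PRIORITY if self.priority is None else self.priority,
        )


# An application that only cares about the transaction protocol.
iface = TransactionInterface(transaction="2PC").resolved()
print(iface)
```

Keeping the record immutable means a resolved interface cannot be altered while a transaction is in flight; this is a design choice of the sketch, not a requirement stated in the paper.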

All these parameters are optional; if the application does not pass them, the transaction manager uses default protocols. The client connects to the transaction manager through the transaction interface, which defines the functionality of the transaction manager. Defining this functionality statically would go against the dynamic nature of web services applications. In our model we implement a transaction interface that works with most transaction-oriented web services applications; in future we will add reconfiguration of the transaction interface so that it can work with any type of transaction-oriented web services application.

An important role of the transaction manager is to provide coordination using the coordination protocol. For transaction processing, it also implements the transaction protocol. The transaction context differs from one application to another. Since the choice of these protocols varies from application to application, the client passes the transaction request together with its WSDL (Web Services Description Language) documents to the transaction manager, which uses the WSDL documents to determine the transaction context, the coordination protocol and the transaction protocol for that application. The transaction manager is also responsible for converting transaction requests into web services calls. On receiving the responses, it composes the replies and sends them to the client. To compose the reply, the transaction manager needs information about the workflow of the transaction. This information is also passed by the client to the transaction manager when the web services are invoked.

3.1.2. Queue Manager
Since we use queues in our model, we use a queuing manager to create, manage and destroy the queues.
We define the queue manager as a software module that takes input in the form of messages from the transaction manager, stores them in queues and then dispatches them to the respective servers. A known problem with queues is the priority of the messages in them. A priority algorithm must therefore be defined in a way that does not affect the performance of the system. Before placing a message in a queue, the queue manager records the details of the transaction in a checkpoint log. This log later helps in the recovery of a failed transaction.

3.1.3. Checkpoint Log
The third important component of the middleware is the checkpoint log, which records the details of transaction messages before they are enqueued. The log is created on stable storage so that, if the middleware fails, the states of the messages can be retrieved and processed. For the purpose of reliability, the checkpoint log stores information about each global transaction and its sub-transactions, the service providers to which they were sent, and whether they have completed. If a leg of a global transaction fails because of a site or communication failure, the status of that global transaction is marked as partial commit in the checkpoint log. The recovery procedure then consults this log to update the status of the partial commit transaction once it is recovered. We discuss the recovery procedure in detail in Section 4.1. When the acknowledgement arrives that all sub-transactions have completed, the status of the global transaction is marked as complete in the checkpoint log and its record is deleted from the log.

As mentioned earlier, the checkpoint log contains all the information needed to recover a failed transaction, including the status of sub-transactions and of global transactions. A sub-transaction can have one of the following statuses in our model:

- ready: it has been submitted to a queue and is ready to be dispatched.
- in-process: it has been submitted to a server and is being processed in the system.
- completed: it completed within the specified time.
- transaction fail: a transaction failure occurred.
- fail: it failed to process within the specified time due to site failure or communication failure.

Based on the statuses of its sub-transactions, the status of a global transaction can be complete, fail or partial commit:

- complete: all sub-transactions were processed successfully.
- partial commit: one or more sub-transactions have the status fail due to communication or site failure.
- fail: a sub-transaction has the status transaction fail.

3.1.4. Failure
From the point of view of fault tolerance of web services applications, we consider two kinds of failure in this paper: site failure and transaction failure. Site failure is the failure of a service provider’s server; in this case no request can be sent to, and no response received from, the failed server. In the case of transaction failure, processing of the transaction request is not possible. For example, if no seat is available for reservation in the airline reservation system, the transaction cannot be completed.
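The status rules of Section 3.1.3 amount to a small decision function that derives the status of a global transaction from the statuses of its legs. The sketch below is an illustration under two assumptions that the paper does not state: a transaction failure takes precedence over a site or communication failure when both occur, and a global transaction whose legs are still ready or in-process is reported with an extra `PENDING` value. The names `SubStatus`, `GlobalStatus` and `global_status` are hypothetical.

```python
from enum import Enum


class SubStatus(Enum):
    READY = "ready"
    IN_PROCESS = "in-process"
    COMPLETED = "completed"
    TRANSACTION_FAIL = "transaction fail"
    FAIL = "fail"  # site or communication failure within the time limit


class GlobalStatus(Enum):
    COMPLETE = "complete"
    FAIL = "fail"
    PARTIAL_COMMIT = "partial commit"
    PENDING = "pending"  # assumption: not a status defined in the paper


def global_status(sub_statuses):
    """Derive the global status from the statuses of all sub-transactions."""
    statuses = list(sub_statuses)
    # Assumption: a transaction failure outranks a site failure.
    if any(s is SubStatus.TRANSACTION_FAIL for s in statuses):
        return GlobalStatus.FAIL
    if any(s is SubStatus.FAIL for s in statuses):
        return GlobalStatus.PARTIAL_COMMIT
    if statuses and all(s is SubStatus.COMPLETED for s in statuses):
        return GlobalStatus.COMPLETE
    return GlobalStatus.PENDING


# Hotel leg done, airline site down: the reservation is a partial commit.
print(global_status([SubStatus.FAIL, SubStatus.COMPLETED]))
```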

4. Correctness and Evaluations
In this section we give an informal analysis of the correctness of our proposed model, using the scenario explained in our case study.

4.1. Correctness
Correctness is the validation of how much reliability our model provides in practice [12]. We validate our model in the following three cases.

A. No failure
In our online reservation scenario, the global transaction is called reservation. It has two possible outcomes: OK if the reservation is processed or FAIL if it is not. The global transaction consists of two sub-transactions, airline reservation and hotel reservation. Since both sub-transactions are processed independently on separate servers, their results arrive asynchronously. Suppose the airline reservation request is fulfilled first; the transaction manager receives its response (OK), consults the checkpoint log and updates the status of that sub-transaction to complete. The checkpoint log also shows that the answer of the hotel reservation transaction has not yet been received. On receiving the response (OK) from the hotel reservation transaction, the transaction manager updates its status to complete in the log, and the status of reservation is also updated to complete. The transaction’s response is then composed according to the workflow document and the result is sent back to the client.


B. Transaction failure
Now suppose that one of the sub-transactions fails due to transaction failure. In this case the whole transaction is aborted. If a related sub-transaction has already completed, the transaction manager has to run a compensation transaction. For example, in our scenario, if the airline reservation fails because no seat is available, the whole reservation has to abort. If the hotel reservation has completed in the meantime, the transaction manager has to run a compensation transaction for the hotel reservation. The statuses of the sub-transactions and of the global transaction are updated accordingly, and the entry for the global transaction is deleted from the checkpoint log.

C. Site failure
In this case, the site (web database) on which the transaction request is being carried out is down. The transaction manager is notified of the failure by the SOAP (Simple Object Access Protocol) error mechanism when a response is not received from that site within the specified time. There are two possible cases during site failure, which we discuss separately.

Case 1: Alternate server is available. In the case of a site failure, the transaction manager first searches the registry for an alternate server that can provide the same set of services as the failed server. If such a backup server is found, the queue for the failed server is redirected to the backup server and processing of the requests is not interrupted.

Case 2: Recovery of failed site. If no alternate server can be found during the failure of a site, the transaction manager is responsible for recovering the failed transactions on the failed server. The transaction manager immediately updates to fail the status of the sub-transaction that was being processed on this site at the time of failure. The status of the global transaction is updated to partial commit. During the site failure no further request can be sent to that site for processing and no response can be received from it. All requests in the queue for that site are blocked. The transaction manager probes the failed site periodically to detect when it becomes available, and the recovery procedure starts at that point. During the failure of a site, sub-transactions on other sites may have been processed and their statuses updated in the checkpoint log. The transaction manager does not compose the replies of a transaction until its status has been upgraded from partial commit to complete. To recover a sub-transaction, its status is changed in the log from fail back to ready, and it is dispatched to the site for processing.

4.2. Evaluation
Using our case study, we compare our model with the traditional web services model without any middleware. We look at the fault tolerance mechanisms of the SOAP and WSDL layers and compare them with our fault tolerance technique. The comparison shows that our technique works well for transaction-oriented web services applications and that it applies a global fault tolerance mechanism to recover from faults.
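The two site-failure cases of Section 4.1, part C, can be sketched as a pair of functions over in-memory structures. This is a simplified illustration, not the authors' implementation. The function names and the layouts of `registry` (site to set of services), `queues` (site to list of waiting requests), `log` (checkpoint entries) and `blocked` (set of unavailable sites) are assumptions. The paper does not say what happens to the in-flight request when a backup exists, so Case 1 only redirects the waiting queue.

```python
def find_backup(failed_site, registry):
    """Return a site offering every service of the failed one, or None."""
    needed = registry[failed_site]
    for site, offered in registry.items():
        if site != failed_site and needed <= offered:
            return site
    return None


def handle_site_failure(failed_site, in_flight, registry, queues, log, blocked):
    """Apply Case 1 if a backup exists, otherwise Case 2.

    in_flight is the (global_id, leg) pair that was being processed on the
    failed site; it is only used in Case 2.
    """
    backup = find_backup(failed_site, registry)
    if backup is not None:
        # Case 1: redirect the waiting requests to the backup server.
        queues.setdefault(backup, []).extend(queues.pop(failed_site, []))
        return backup
    # Case 2: block the site's queue and record the failure in the log.
    blocked.add(failed_site)
    global_id, leg = in_flight
    log[global_id]["legs"][leg]["status"] = "fail"
    log[global_id]["status"] = "partial commit"
    return None


def recover_site(site, queues, log, blocked):
    """Called once a periodic probe finds the site available again."""
    blocked.discard(site)
    for global_id, entry in log.items():
        for leg, info in entry["legs"].items():
            if info["site"] == site and info["status"] == "fail":
                info["status"] = "ready"  # fail -> ready, then re-dispatch
                queues.setdefault(site, []).append((global_id, leg))
```

The periodic probing itself is left out; a caller would invoke `recover_site` when its probe succeeds, after which the normal dispatch path picks the re-queued leg up again.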


Global Transaction and SOAP layer
In our model a fault is detected through the SOAP fault code, just as in the traditional web services model. The only difference lies in recovery. In the traditional web services model there is no recovery and a request may have to be resubmitted. In our model recovery is done as explained in Section 4.1, part C. The following example shows the fault code used in a SOAP message.

Example (see footnote 2): Sample SOAP fault where the detail element information item is to be interpreted in the context of the "env:Sender" and "m:MessageTimeout" fault codes. In this example the fault occurs at the sender and its cause is a message timeout. This can also be seen from the Reason element, which states that the sender timed out.

Global Transaction and WSDL layer
Since the transaction ‘reservation’ in our case study is made up of two sub-transactions, namely airline reservation and hotel reservation, it is not possible to document a fault tolerance policy for ‘reservation’ itself. Processing of the global transaction depends on the processing of its sub-transactions. In the traditional web services model, although an operational fault can be detected for a sub-transaction, recovery of the global transaction is not possible. In our model we can still use the WSDL fault model for a database operation, but we also provide recovery of sub-transactions as well as of the global transaction. The following example shows how the WSDL fault mechanism works for an operation.

Example (see footnote 3): Consider a transaction for hotel room booking. The WSDL documentation for the hotel room operation “CheckAvailability” is given in the example “GreatHotel Interface Definition”. That example first defines an interface for the operation “CheckAvailability” and then defines a fault name, “InvalidDataFault”, within that interface. Judging by the schema, this fault occurs when the data provided for the operation is not of the correct type. The operation cannot be completed in this case.
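To make the SOAP fault detection discussed above concrete, the sketch below parses a SOAP 1.2 fault message and extracts the fault code, subcode and reason that a transaction manager could use to recognise a timeout. The embedded message is modelled on the sample fault in the W3C SOAP 1.2 specification that the paper cites and should be treated as an illustrative reconstruction, not as the listing from the original paper. The function name `parse_fault` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the SOAP 1.2 envelope.
NS = {"env": "http://www.w3.org/2003/05/soap-envelope"}

# Illustrative fault message; the "m" namespace URI is an example value.
SAMPLE_FAULT = """<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope"
              xmlns:m="http://www.example.org/timeouts">
 <env:Body>
  <env:Fault>
   <env:Code>
    <env:Value>env:Sender</env:Value>
    <env:Subcode>
     <env:Value>m:MessageTimeout</env:Value>
    </env:Subcode>
   </env:Code>
   <env:Reason>
    <env:Text xml:lang="en">Sender Timeout</env:Text>
   </env:Reason>
   <env:Detail>
    <m:MaxTime>P5M</m:MaxTime>
   </env:Detail>
  </env:Fault>
 </env:Body>
</env:Envelope>"""


def parse_fault(xml_text):
    """Return code, subcode and reason of a SOAP 1.2 fault, or None."""
    root = ET.fromstring(xml_text)
    fault = root.find("env:Body/env:Fault", NS)
    if fault is None:
        return None  # an ordinary response, not a fault
    return {
        "code": fault.findtext("env:Code/env:Value", "", NS).strip(),
        "subcode": fault.findtext("env:Code/env:Subcode/env:Value", "", NS).strip(),
        "reason": fault.findtext("env:Reason/env:Text", "", NS).strip(),
    }


fault = parse_fault(SAMPLE_FAULT)
print(fault)
```

A middleware component could treat a subcode ending in `MessageTimeout` as the trigger for the site-failure handling of Section 4.1, part C; that mapping is an assumption of this sketch.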

5. Conclusion and Future Work
Fault tolerance of web services, and especially of transaction-oriented web services applications, is an important issue, so we have provided a framework to improve the reliability of these applications. By using middleware and by conceptually dividing a transaction into two levels, we can detect, isolate and recover from faults. A comparison of our approach with the SOAP and WSDL fault codes shows that our model can handle global faults, which improves the reliability of transaction-oriented web services. However, the middleware may add performance overhead compared with the traditional web services model. Also, since we use queues, transaction order and transaction priority cannot both be honoured at the same time. Sometimes the order of transactions is more important, since without it the whole transaction processing is useless. As part of our evaluation, we compared our model with the traditional web services model (without middleware). An actual implementation of our model remains future work.

2) An example is given at: http://www.w3.org/TR/soap12-part1/#faultcodes
3) This example is taken from http://www.w3.org/TR/wsdl20-primer/.


References

[1]. Abdulkarim Erradi, P.M. A Broker-based Approach for Improving Web Services Reliability. in IEEE International Conference on Web Services (ICWS'05). Orlando, FL, USA, 2005.
[2]. Birman, K. Can Web Services Scale Up? Computer, 38(10): p. 107-110, 2005.
[3]. Contributors: Fujitsu Limited, H.L., NEC Corporation, Oracle Corporation, Sonic Software Corporation, and Sun Microsystems, Inc. Web Services Reliability (WS-Reliability) Ver 1.0. 2003.
[4]. Contributors: IBM, B.S., Microsoft, TIBCO Software. Web Services Reliable Messaging Specification. 2005.
[5]. He, W. Recovery in Web Service Applications. in IEEE International Conference on e-Technology, e-Commerce and e-Service (EEE'04). 2004.
[6]. Jia Zhang, L.-J.Z., Jen-Yao Chung. WS-Trustworthy: A Framework for Web Services Centered Trustworthy Computing. in IEEE International Conference on Services Computing (SCC 2004). Shanghai, China, 2004.
[7]. Lomet, D.B. Robust Web Services via Interaction Contracts. in TES'04 Workshop. 2004.
[8]. Maheshwari, P.; Tang, H.; Liang, R. Enhancing Web services with message-oriented middleware. in Proceedings of the IEEE International Conference on Web Services (ICWS'04). 2004.
[9]. Özsu, M.T. and Valduriez, P. Principles of Distributed Database Systems. Second edition. Prentice Hall, 1999.
[10]. Vijay Dialani, S.M., Luc Moreau, David De Roure and Michael Luck. Transparent Fault Tolerance for Web Services Based Architectures. in Parallel Processing: 8th International Euro-par Conference. 2002.
[11]. Witold Abramowicz, M.K., Dominik Zyskowski. Duality in Web Services Reliability. in Advanced International Conference on Telecommunications and International Conference on Internet and Web Applications and Services (AICT/ICIW 2006). 2006.
[12]. Zhang, J. and Zhang, L.-J. Criteria Analysis and Validation of the Reliability of Web Services-oriented Systems. in IEEE International Conference on Web Services (ICWS'05). 2005.
