Enhancing Web Services Availability
Subil Abraham, Mathews Thomas
IBM Global Solutions Center, Roanoke, TX, USA
Johnson Thomas
Department of Computer Science, Oklahoma State University, USA
Abstract
Little work has been reported on highly available web services, which are essential for mission critical applications. In this paper we propose an architecture for highly available web services for mission critical applications. The central idea is to enhance web services by introducing a central hub that increases their availability.
1. Introduction
As web services gain widespread acceptance, more and more organizations will begin to deploy mission critical applications using web services. Companies are not willing to make their business critical applications dependent on external web services unless important non-functional requirements related to security, reliability and availability are addressed. A good deal of work is already being done on the reliability and security of web services, but not much has been written about how to ensure that a web service is available for mission critical applications [1], whose failure can severely affect a company's day-to-day operation. The ideas presented here also apply to non-mission critical services, though the actual implementation may be too costly to be practical. This paper looks at some of the existing features within web services that can be used to improve their availability. We delineate known weak points of the existing solutions and propose extensions to enhance the availability of web services.
2. Highly Available Web Services
There are several mechanisms to ensure that a web service is highly available, and these fall into two broad categories. The first focuses on making the systems that provide the service as resilient as possible. The second focuses on features available within the web service architecture. A well designed web service will utilize both of the above to ensure that it meets the desired high availability criteria.
Proceedings of the 2005 IEEE International Conference on e-Business Engineering (ICEBE’05) 0-7695-2430-3/05 $20.00 © 2005 IEEE
2.1 Highly Available Systems
There are a variety of techniques currently in place to ensure that the backend systems that provide the web service are highly available. They primarily fall into the following categories [2] [3]:
• Infrastructure availability: This focuses on ensuring that the underlying infrastructure required to support the web service is made as available as possible. Fail over can occur between servers, networks and disk arrays.
• Middleware availability: This focuses on ensuring that the middleware stack is highly available. Multiple instances of the database, web application server and messaging bus are distributed across systems, and fail over occurs to another system if one of the underlying components fails.
• Application availability: This focuses on ensuring that the application which provides the web service, and which is built on top of the infrastructure and middleware, is as available as possible. The application is distributed so that if any node fails, another node can take over.
Monitoring tools detect failures at the different layers and decide when and where the fail over should occur. The assumption is that making the underlying systems highly available will ensure that the web service is highly available. There are, however, several shortcomings with this approach:
- Even if mission critical systems meet the five 9’s criterion (i.e. they are up 99.999% of the time), that does not mean that the systems are performing well during the entire period.
- Even if a system meets the five 9’s criterion, there is no guarantee that the system will be up at the critical juncture when it is needed.
- Even if the service itself is up all the time, it may not be accessible because the underlying infrastructure may not be up. This is becoming more critical as most web service functions in the future will occur over the internet.
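To make the five 9’s criterion concrete, the short calculation below converts an availability figure into the downtime it still permits. It is only a worked illustration: the function name and the choice of a 365-day year are assumptions made here, not part of the paper.

```python
# Worked arithmetic for the "five 9's" criterion.
# Assumption: a 365-day year; the function name is illustrative only.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the downtime (in minutes) permitted by an availability target."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * period_minutes


if __name__ == "__main__":
    for target in (0.999, 0.9999, 0.99999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target:.5f} -> {minutes:.2f} minutes of downtime per year")
```

Five 9’s therefore still allows a little over five minutes of downtime per year, and nothing in the figure says when those minutes fall, which is the second shortcoming listed above.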
2.2 Features that enable reliability
A high degree of reliability is necessary when exchanging messages in a business-to-business scenario where important transactions are conducted over the web. WS-Reliability and WS-ReliableMessaging are specifications that address reliable messaging requirements within the context of current web services standards [5]. WS-ReliableMessaging allows messages to be delivered reliably between source and destination nodes. WS-Transaction [6] is another specification; it provides the means to implement "Atomic" transactions and "Business Activity" transactions. Traditional transaction models are not appropriate for web services as they are suitable only for short-lived transactions. Transactions occurring in the web service domain require a more loosely coupled model where communication patterns are asynchronous, and WS-Transaction addresses this problem. However, WS-ReliableMessaging and WS-Transaction do not provide a way to guarantee high availability for the web service. For example, WS-Reliability guarantees that a message will be delivered to a receiver node irrespective of the fact that the web service may be down. Similarly, WS-Transaction guarantees that the client will not be left in an invalid state when one of the participants in the transaction is not operational.
3. A more available web services paradigm
To provide consumers of web services with a more stable environment, in which they can feel confident that their mission critical applications will be able to invoke the services they require, we propose certain enhancements to the existing web services architecture as well as a new paradigm that improves on the existing availability of web services.
3.1 Enhancements to web services model
When a web service is created, it is usually tied to a particular transport protocol such as http, ftp etc. This information is specified in an XML document called WSDL, which is a format for describing a web service. If the server that is hosting the web service is unable to service the client using this transport protocol, then the web service fails, making it unavailable. A failover mechanism can be put in place at the server such that if there is a failure at the transport level, the server can still service the client by switching to a different transport protocol such as SMTP, TCP/IP etc. This provides a higher degree of availability to the web service, allowing the client to have its request serviced without any significant time delay. We propose an extension to WSDL so that it describes all the transport protocols over which the service may operate. This will allow the web service client to automatically reconfigure its TCP endpoint in the event that a transport layer fails. We show below a snippet from a WSDL document. The <binding> tag in WSDL describes the message format and the protocol details for the web service. The <soap:binding> tag enclosed within the binding tag has a transport attribute which indicates the transport protocol that is being used. In this case, the protocol being used is "http".
…
By extending this tag we can introduce a few more transport layers as shown below:
…
In the above case we have extended the soap tag to include two endpoint addresses where the client can access the web service. The first one is over “http” and the second one is over “jms”. The tool that generates the web service client parses this WSDL and generates code such that if "transport1" fails, the client can dynamically fail over to "transport2", thus making the web service highly available.
3.2 A new paradigm
We propose two models to enhance web service functionality. One model operates at the enterprise level while the other operates across enterprises.
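The client-side transport failover proposed in Section 3.1 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code a WSDL tool would actually generate: the `TransportError` type, the `invoke_with_failover` function and the two stand-in senders are hypothetical, and only the labels "transport1" and "transport2" come from the text.

```python
from typing import Callable, List, Sequence, Tuple


class TransportError(Exception):
    """Hypothetical error raised when a transport cannot deliver a request."""


# A transport is modelled as a function that sends a request and returns a reply.
Transport = Callable[[str], str]


def invoke_with_failover(endpoints: Sequence[Tuple[str, Transport]],
                         request: str) -> Tuple[str, str]:
    """Try each transport in the order listed; return (transport name, reply)."""
    failures: List[str] = []
    for name, send in endpoints:
        try:
            return name, send(request)
        except TransportError as exc:
            # Remember why this transport failed, then fall through to the next.
            failures.append(f"{name}: {exc}")
    raise TransportError("all transports failed: " + "; ".join(failures))


# Stand-in senders (assumptions for illustration; no real network I/O happens).
def send_over_http(request: str) -> str:
    raise TransportError("connection refused")


def send_over_jms(request: str) -> str:
    return f"reply to {request}"


if __name__ == "__main__":
    endpoints = [("transport1", send_over_http), ("transport2", send_over_jms)]
    print(invoke_with_failover(endpoints, "getQuote"))
```

In this sketch the order of the endpoint list plays the role of the order in which the extended WSDL lists its transports; a real client would also need timeouts so that a slow transport is treated as a failed one.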
3.2.1 Enterprise level gateway:
Any web service request should initially go through a gateway within the enterprise where the service is being hosted. This gateway (figure 1) will do the following:
- Determine if the provider of the service is available and able to provide the service based on the existing state of the system. A system may be available but not necessarily able to provide the required service to the user due to security or performance implications. The gateway constantly monitors the different services provided by the enterprise. Monitoring a service has several implications, including possible degradation of performance and the question of how exactly to determine whether the service is running optimally. There are two main mechanisms that could be used to do this.
  o The service provider monitors itself and issues an occasional heartbeat to the gateway indicating its state. The gateway decides what to do when a request arrives. If the gateway does not receive a heartbeat within a given time frame, it assumes the service is unavailable and does not direct any requests to it.
  o Alternatively, the gateway occasionally invokes a “slimmed” down version of the service itself, or makes a call requesting the service to indicate its current state. The gateway uses the response from this call to determine whether to route future requests to the service.
- If the gateway determines that the service is in a well functioning state, the request is passed on to the service and the response is returned to the requestor.
- If the gateway determines that the service is not functioning well, the gateway can either inform the requestor that the service is not available or have the request automatically redirected to a service that exists in another enterprise, depending on how the requestor would like the request handled.
[Figure 1: Enterprise level gateway. The requestor requests access to a service through the gateway, which determines whether a single web service can fulfill the request. If the requested service cannot fulfill the request and the requestor agrees, an alternative source is located; otherwise a "not available" response is returned.]
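The heartbeat mechanism described for the enterprise gateway can be sketched as below. This is a minimal sketch under assumptions: the `Gateway` class, its method names and the explicit `now` timestamps (used instead of a real clock so that the behaviour is easy to follow) are all hypothetical and are not taken from the paper's implementation.

```python
from typing import Callable, Dict, Optional


class Gateway:
    """Toy gateway that routes requests only to services with a fresh heartbeat."""

    def __init__(self, heartbeat_timeout: float) -> None:
        # Maximum age (in seconds) of the last heartbeat before a service
        # is assumed to be unavailable.
        self.heartbeat_timeout = heartbeat_timeout
        self._services: Dict[str, Callable[[str], str]] = {}
        self._last_heartbeat: Dict[str, float] = {}

    def register(self, name: str, handler: Callable[[str], str]) -> None:
        self._services[name] = handler

    def heartbeat(self, name: str, now: float) -> None:
        """Called by the service provider to report that it is healthy."""
        self._last_heartbeat[name] = now

    def is_available(self, name: str, now: float) -> bool:
        last = self._last_heartbeat.get(name)
        return last is not None and (now - last) <= self.heartbeat_timeout

    def route(self, name: str, request: str, now: float) -> Optional[str]:
        """Pass the request on if the service is healthy, else return None."""
        if name in self._services and self.is_available(name, now):
            return self._services[name](request)
        return None  # the requestor is told the service is not available


if __name__ == "__main__":
    gateway = Gateway(heartbeat_timeout=5.0)
    gateway.register("quote", lambda request: f"quote for {request}")
    gateway.heartbeat("quote", now=100.0)
    print(gateway.route("quote", "IBM", now=103.0))  # heartbeat is fresh
    print(gateway.route("quote", "IBM", now=106.0))  # heartbeat is stale
```

Returning `None` stands in for the "service not available" response; redirecting to a service in another enterprise would replace that branch with a lookup of an alternative provider.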
The gateway has two primary functions: the first is to monitor the health of the different services, and the second is to pass a request to the appropriate service or redirect it to another service if the desired service is not healthy or its state cannot be determined. There are several mechanisms to monitor the health of a service. One way is to determine the health of the system that the service is running on. Existing monitoring tools need to be enhanced so that the critical applications feed important state data to these tools. The gateway also needs to occasionally invoke the service and ensure that a correct result is retrieved. If a service is deemed to be healthy, the call can simply be passed on to the appropriate service. If not, the gateway may look for an alternative service, which could be predefined in a database lookup or found through some other discovery service. The above model will not work if the point of failure is in the network between the gateway and the requestor. Another disadvantage is that the request may go through several hops before it is fulfilled, and this may not be acceptable to the requestor. We propose a web services hub to overcome this difficulty.
3.2.2 Web Service Hub:
The web services hub receives a request for a given service and fulfills it using whichever provider is most capable of fulfilling the request (figure 2). A secure VPN or dedicated line will be established between the requestor and the hub. The provider of the service will also be set up in a similar manner when interacting with the hub. The enterprise gateway may also have a secure VPN or dedicated line to it, but the advantage of the hub is that the requestor only has to set up a single connection to the hub instead of multiple connections to different gateways. The web service hub will do the following:
- When it receives a request, it determines which service provider can best fulfill the request. The hub constantly monitors the different services to determine their health. If a clear winner is found, the hub sends the request to that service and the response is returned to the requestor.
- If there is no clear winner, the hub can publish the request to all the services that are likely candidates and return the response to the requestor as soon as the first service returns a response. The requestor may also want the request to be sent to multiple services and ask for all the responses to be returned, or only the first one. There are performance implications here, as unnecessary calls may be made, so such requests should be issued judiciously.
[Figure 2: Web Service Hub. The requestor requests access to a service through the hub. If the hub determines that the requested service can fulfill the request, that web service fulfills it. If the hub cannot determine a single service, it sends the request to multiple services (some may not be an exact match), analyses the responses and sends the most appropriate response back to the requestor based on criteria specified by the requestor (response time, 80% of requests fulfilled, etc.).]
The hub model clearly has several advantages over the gateway model. Its primary disadvantage is the complexity involved in setting up the hub, not only from a technical perspective but also in getting the different partners to agree upon the mechanisms for communication. Compared to the gateway, the hub has to manage invocations from several requestors and also determine the best response. Determining what exactly a best response is can be fairly complex, as it could involve more complicated criteria than performance or security. A mechanism has to be in place for the user to select and dynamically modify the criteria even after the user has issued the call. The hub also has to implement various administrative functions such as registration of services and requestors and a pricing mechanism for service calls.
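The hub's dispatch behaviour from Section 3.2.2 can be sketched as follows. This is only an illustration under assumptions: the paper does not define how health is measured or what makes a "clear winner", so the numeric `health` score, the `margin` threshold and the names `Provider` and `dispatch` are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Provider:
    name: str
    health: float  # assumed score: 0.0 (down) to 1.0 (fully healthy)
    handler: Callable[[str], str]


def dispatch(providers: List[Provider], request: str,
             margin: float = 0.2) -> Tuple[str, str]:
    """Send to a clear winner if there is one; otherwise fan out to all
    candidates and return the first successful response."""
    if not providers:
        raise RuntimeError("no providers registered")
    ranked = sorted(providers, key=lambda p: p.health, reverse=True)

    # Assumed rule: a "clear winner" leads the runner-up by at least `margin`.
    if len(ranked) == 1 or ranked[0].health - ranked[1].health >= margin:
        best = ranked[0]
        return best.name, best.handler(request)

    # No clear winner: publish the request to every candidate in parallel.
    # Note that leaving the `with` block waits for the slower calls to finish.
    with ThreadPoolExecutor(max_workers=len(ranked)) as pool:
        futures = {pool.submit(p.handler, request): p.name for p in ranked}
        for future in as_completed(futures):
            try:
                return futures[future], future.result()
            except Exception:
                continue  # this provider failed; wait for the next response
    raise RuntimeError("no provider could fulfill the request")


if __name__ == "__main__":
    fast = Provider("A", 0.9, lambda request: f"A handled {request}")
    weak = Provider("B", 0.4, lambda request: f"B handled {request}")
    print(dispatch([fast, weak], "convert"))
```

The fan-out branch shows the performance cost mentioned above: every candidate is invoked even though only one response is used, which is why such requests should be issued judiciously.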
3.3 Implementation
An environment was set up to test the validity of the above and its potential performance impact. Three scenarios were tested:
a) A requestor made a simple web service request to convert a currency from one denomination to another. The call was as follows: convert (input_amount, input_denominator, converted_amount, output_denominator). The requestor sent 40,US$,??,Can$ and the fulfiller returned 40,US$,50,Can$.
b) A similar setup to (a) was created and an intermediate gateway was introduced between the requestor and the provider, together with a database holding the provider's state. The requestor makes the request to the gateway, which then examines the database to determine if the provider is in an active state. If so, the request is sent to the provider and the results are returned. The gateway is constantly running and monitors the provider. The provider updates the database at the requestor’s site with its state. The web service was set to time out at certain intervals, and in such cases the request was sent to another service provider.
c) A similar setup to (b) was created, in this case with multiple requestors and multiple providers. It was initially tested with a single requestor whose request was sent to multiple providers. Multiple requestors and multiple providers were then tested.
The following summarizes the request flows.
Scenario (a) vs (b)
Assume web service up:
Simple: send -> response (success). Total time = t
Gateway: send -> response (success). Total time = t’
Assume web service not up:
Simple: send -> response (fail or slow) -> try another service -> response (success)
Gateway: send -> response (fail over to another service) -> response (success)
Scenario (a) vs (c)
Hub: send -> multicast -> respond
Simple: send -> respond. Need to put timer in?
Hub: multiple senders -> distribute load -> respond
Simple: overload system?
4. Conclusions
Highly available web services are essential for mission critical applications, and little work has been reported in this area. In this paper we have proposed an architecture for highly available web services for mission critical applications. The central idea is to enhance web services by introducing a central hub that increases their availability. Future work will investigate the performance of the proposed architecture and will specify the business process flow in the proposed architecture using BPEL [4] or some other such specification language. Defining the ‘health’ of a web service remains a largely unexplored problem.
5. References
[1] K. Birman, R. Van Renesse, W. Vogels, “Adding High Availability and Autonomic Behavior to Web Services”, Proc. 26th IEEE Int. Conf. on Software Engineering, 2004.
[2] Evan Marcus and Hal Stern, Blueprints for High Availability, Wiley Publishing Inc, 2003.
[3] Floyd Piedad, Michael Hawkins, High Availability: Design, Techniques and Processes, Prentice Hall, 2000.
[4] T. Andrews, F. Curbera, et al., “Business Process Execution Language for Web Services”, Version 1.1, 2003. http://www-128.ibm.com/developerworks/library/ws-bpel/
[5] WS-ReliableMessaging. http://www-128.ibm.com/developerworks/webservices/library/specification/ws-rm/
[6] WS-Transaction. //msdn.microsoft.com/library/default.asp?url=/library/enus/dnglobspec/html/ws-transaction.asp