DWSDM: A Web Services Discovery Mechanism Based on a Distributed Hash Table

Quanhao Lin, Ruonan Rao, Minglu Li
Department of Computer Science and Engineering, Shanghai Jiaotong University, China
[email protected], [email protected], [email protected]
Abstract
Web services enable seamless application integration over the network regardless of programming language or operating system. A critical factor in the overall utility of web services is an efficient discovery mechanism, but existing web services discovery relies on centralized approaches such as UDDI, which suffer from a single point of failure and a performance bottleneck. In this paper, we propose a decentralized web services discovery mechanism (DWSDM) based on a distributed hash table (DHT) to solve these problems. DWSDM leverages the DHT as a rendezvous mechanism between multiple registry centers. In addition, the proposed layered architecture, which decouples an application tier for the registry center from an overlay tier for the network topology, facilitates rapid system implementation. Experimental results on a prototype implementation based on an improved Chord algorithm show that DWSDM achieves significant improvements in both robustness and scalability.
1. Introduction
Web services are emerging as a powerful technology for organizations to integrate their applications within and across organizational boundaries. An important aspect is that the interactions should be carried out automatically by computers. The critical factor is therefore a scalable, flexible and robust discovery mechanism that provides the desired functionality. Current web services discovery employs a centralized repository such as UDDI [1], which leads to a single point of failure and a performance bottleneck. Even though UDDI has been the de facto industry
standard for web services discovery, the requirement of tight replication among registries and the lack of autonomous control have severely hindered its widespread deployment and usage [2]. In fact, the only popular use of UDDI is as private UDDI registries within an enterprise's boundaries. In short, centralized approaches combined with replication have many drawbacks, e.g. poor scalability and weak consistency across large registries. Peer-to-Peer systems (P2P for simplicity), as a fully distributed computing model, can provide a good basis for building a decentralized repository for web services discovery [3]. In this paper, we explore how Distributed Hash Table (DHT for simplicity) technology can be leveraged to develop a scalable, distributed web services discovery architecture. A DHT is a P2P distributed system that forms a structured overlay allowing more efficient routing than the underlying network. We design a decentralized web services discovery mechanism based on a DHT, called DWSDM. With our architecture, developers can implement a decentralized service registry rapidly. First, the proposed two-layer architecture decouples an application tier for the registry center from an overlay tier for the network topology, and DHT Based Distribution provides a public DHT service for applications. Second, we extend the DHT routing algorithm based on Chord [8] with a Successor Replication Strategy for robustness and scalability in DWSDM. The rest of the paper is organized as follows: related work is introduced in Section 2; Section 3 describes our proposed architecture, followed by DHT Based Distribution with the improved Chord algorithm in Section 4; Section 5 presents experimental results and evaluation; finally, we conclude the paper in Section 6.
2. Related work
The problems pertaining to web services discovery have long attracted attention from both academia and industry. Current approaches can be broadly classified as centralized or decentralized. The centralized approaches include UDDI, where central registries are used to store web service descriptions; our task, in contrast, is to decentralize registries for scalability and robustness. The decentralized approaches are based on P2P systems, which are more scalable, more fault tolerant and more efficient. The Neuron [4] system constructs a virtual shared space that allows web services to be described in arbitrary forms and can be executed on top of Tornado. In [5], a P2P approach to web services discovery is proposed; it focuses on complex queries containing partial keywords and wildcards by means of a dimension-reducing indexing scheme that maps the multidimensional information space to physical peers. UP2P4UDDI [6] constructs a network for UDDI by connecting all registries with unstructured P2P technology, but searching in a UP2P4UDDI network is typically based on flooding or a variation of it, which offers no guarantees and does not use the network bandwidth effectively. Another approach, which combines semantic descriptions of services with a P2P network topology, is described in [7]; it uses DAML-S for service description and focuses on service semantics and ontologies. Our work differs from these previous studies: we focus on the architecture of a decentralized service registry by means of DHT Based Distribution, and we extend the Chord algorithm to implement the DHT protocol for system robustness and scalability.
3. System overview
3.1. The architecture of DWSDM
The DHT based Decentralized Web Services Discovery Mechanism (DWSDM for simplicity) is a peer-to-peer web services discovery mechanism built on a distributed hash table. Figure 1 shows the high-level architecture of DWSDM. In this architecture, two layers, an application tier and an overlay tier, are designed for the registry center and the network topology respectively. A Registry Center Peer (RCP for simplicity) not only acts as a service registry repository that stores service information but also acts as a peer in the P2P network, routing messages via DHT Based Distribution (DBD for simplicity). DBD provides the infrastructure service in the overlay tier: it supports a traditional hash table's simple put/get interface, but also offers increased capacity and availability by partitioning the key space across a set of cooperating peers and replicating stored data. The architecture of an RCP is shown in Figure 2.

Figure 1. The architecture of DWSDM

Service providers publish their web services via the Web GUI of an RCP, and service requesters locate the services they need in the same way. The Service Description Extractor obtains the necessary service metadata from WSDL documents and from the user's input in the web browser. The Service Key Mapper is responsible for associating a numeric value in the DHT key space with each piece of service metadata produced by the Service Description Extractor. The Service Deployer deploys the published service information to the local database, called the Service Repository. The DHT Router mediates between the local RCP and the DHT service provided by DBD. DBD connects all the RCPs together and makes it possible to search for services across RCPs; the details of DBD are introduced in Section 4.
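To make the put/get abstraction concrete, the following is a minimal Java sketch of the interface the DHT Router could use to talk to DBD, together with a single-node stand-in for local testing; the names DhtService and LocalDbdStub are illustrative assumptions, not identifiers from the paper's implementation.

import java.math.BigInteger;
import java.util.*;

/** Put/get abstraction that DBD offers to the application tier (illustrative names). */
interface DhtService {
    void put(BigInteger key, String value);   // several values may be stored under one key
    List<String> get(BigInteger key);         // all values currently stored under the key
}

/** Single-node stand-in with the same semantics; the real DBD partitions keys across RCPs. */
class LocalDbdStub implements DhtService {
    private final Map<BigInteger, List<String>> store = new HashMap<>();
    public void put(BigInteger key, String value) {
        store.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }
    public List<String> get(BigInteger key) {
        return store.getOrDefault(key, Collections.emptyList());
    }
}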
Figure 2. The architecture of RCP

The key to web services discovery lies in service publishing and service lookup. When a new service is added to an RCP, the potential search items extracted from the service description and service metadata by the Service Description Extractor are hashed by the Service Key Mapper and used as DHT keys to publish the service in the DBD. The value stored for the service uniquely identifies it, including the URL of the RCP that stores the service information and the search items of the service in that RCP. Similarly, when queries arrive, they are parsed and search items are identified. The search items are hashed, the values stored under these hash keys are retrieved from the DBD, and the service requester can then get the service information directly from the RCPs that store the matching services. In our system, the key-value pair inserted into the DBD can be changed, because the Service Description Extractor and the Service Key Mapper are pluggable components and can be replaced if necessary. The key currently used is the hash value of the service name; the value contains a search item and the URL of the RCP that stores the published service.
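As an illustration of this key-value design, the following is a hedged Java sketch of a Service Key Mapper that hashes a search item (here the service name) with SHA-1 into the DHT key space; the class name and the "searchItem|rcpUrl" value encoding are our own illustrative choices, not details given in the paper.

import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Maps a search item (e.g. the service name) to a numeric key in the DHT key space. */
class ServiceKeyMapper {
    BigInteger toKey(String searchItem) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1")
                    .digest(searchItem.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest);   // non-negative 160-bit integer
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 unavailable", e);
        }
    }
}

// The value stored under the key is assumed here to be encoded as "searchItem|rcpUrl",
// i.e. the search item plus the URL of the RCP that holds the full description.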
Next, we describe the sequence of operations for two crucial scenarios: publishing a new service and querying a service.

3.2. Publishing a new service
Figure 3 shows how a client publishes a new service to the DWSDM.

Figure 3. Publishing a new service in DWSDM

A client contacts an RCP via the Web GUI on that RCP. It then publishes a service to the local Service Repository via the Service Deployer. The Service Description Extractor extracts the necessary search items from the WSDL documents and from the service description metadata that the user entered before publishing. The Service Key Mapper applies a hash function such as SHA-1 to generate a hash key, which is used to locate the appropriate node in the DBD by means of the DHT Router. Finally, a new key-value pair is inserted into the DBD, where each key is obtained from the Service Key Mapper and the value consists of the search items and the URL of the local RCP.
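The publish path can be summarized in a few lines; the sketch below reuses the DhtService and ServiceKeyMapper types assumed earlier and is illustrative only, so ServicePublisher and the "searchItem|rcpUrl" value encoding are hypothetical names, not the paper's code.

import java.math.BigInteger;
import java.util.List;

/** Illustrative publish path of an RCP, built on the DhtService and ServiceKeyMapper sketches above. */
class ServicePublisher {
    private final DhtService dbd;            // the DHT Router's handle on the DBD
    private final ServiceKeyMapper mapper;   // SHA-1 based key mapper
    private final String localRcpUrl;        // URL of this registry center peer

    ServicePublisher(DhtService dbd, ServiceKeyMapper mapper, String localRcpUrl) {
        this.dbd = dbd;
        this.mapper = mapper;
        this.localRcpUrl = localRcpUrl;
    }

    /** searchItems come from the Service Description Extractor (WSDL plus user metadata);
        the full description is assumed to be already saved in the local Service Repository. */
    void publish(List<String> searchItems) {
        for (String item : searchItems) {
            BigInteger key = mapper.toKey(item);
            dbd.put(key, item + "|" + localRcpUrl);   // value identifies the service and its RCP
        }
    }
}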
3.3. Querying a service
Figure 4 shows how a client queries an RCP for a service.

Figure 4. Querying a service in DWSDM

Once again, the client contacts one RCP of the DWSDM. A hash key obtained from the Service Key Mapper is delivered to the DHT Router. Because more than one value may be stored under a hash key in the DBD, the DHT Router returns a list of RCPs. The Web GUI then contacts all the RCPs in the list to collect all possible service information and returns it to the client, so that the user can decide which service is preferred.
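A corresponding lookup sketch, again built on the hypothetical DhtService and ServiceKeyMapper types and value encoding assumed above, could look as follows.

import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Illustrative lookup path of an RCP, returning the URLs of RCPs that store matching services. */
class ServiceFinder {
    private final DhtService dbd;
    private final ServiceKeyMapper mapper;

    ServiceFinder(DhtService dbd, ServiceKeyMapper mapper) {
        this.dbd = dbd;
        this.mapper = mapper;
    }

    List<String> findRcps(String searchItem) {
        BigInteger key = mapper.toKey(searchItem);
        List<String> rcpUrls = new ArrayList<>();
        for (String value : dbd.get(key)) {
            String[] parts = value.split("\\|", 2);   // values were stored as "searchItem|rcpUrl"
            if (parts.length == 2) {
                rcpUrls.add(parts[1]);
            }
        }
        // The Web GUI would then contact each returned RCP for the full service
        // description and let the user choose the preferred match.
        return rcpUrls;
    }
}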
4. DHT based distribution
As mentioned above, we propose to build DWSDM on top of a DHT infrastructure called DBD. This choice is primarily motivated by the simplicity of the put/get abstraction the DHT provides, which is powerful enough for the task. DHTs have been proposed as the basis of a new generation of large-scale Internet services: their goal is to provide efficient location of data items in a very large and dynamic distributed system without relying on any centralized infrastructure. A DHT maintains a collection of key-value pairs on the nodes. In our deployment, the key is the hash produced by the Service Key Mapper, and the value stored is the search items from the Service Description Extractor together with the URL of the RCP that stores the published service information. In the DHT Based Distribution, Chord serves as the experimental substrate of our work; nevertheless, our design does not depend on a specific DHT implementation and can work with any DHT protocol.
David Liben-Nowell et al. point out that Chord's maintenance bandwidth for handling concurrent node arrivals and departures is near optimal [9], and Chord offers simplicity, provable correctness and provable performance compared with other lookup protocols [10] such as CAN, Pastry and Tapestry, so we use the Chord algorithm to organize the DBD's routing tables. For a detailed description of Chord, please refer to [8].

4.1. Improved Chord algorithm
We use the Successor Replication Strategy [12] to improve the robustness of the system. In the Chord algorithm, each key-value pair is stored on the root node of its key. A simple replication approach is to replicate the key-value pairs stored on the root node to its k successors. This scheme, called Successor Replication, adapts when nodes join or leave the DBD: when a node joins, it takes over some of the key-value pairs and replicas from its successor node, and when a node leaves, no explicit procedure is required, because the key-value pairs stored on the leaving node are already replicated on its successors and can still be found. In our system we set k to one, that is, we only replicate the key-value pairs from the root node to the root node's direct successor. We consider two quite different replication strategies and evaluate their robustness when nodes leave:
1. Lazy Successor Replication (LSR for simplicity): replication does not take place until a node in the DBD is about to leave the DBD network voluntarily.
2. Eager Successor Replication (ESR for simplicity): replication takes place when a new key-value pair is put into the DBD, and no further replication of that pair happens afterwards.
The difference between the two strategies is the moment at which replication happens.
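To make the two strategies concrete, the following minimal Java sketch shows Eager Successor Replication with k = 1 on a simplified node abstraction; ReplicatingNode and its fields are illustrative assumptions, not the paper's implementation. Under LSR, the copy to the successor would instead be made in a leave handler just before the node departs.

import java.math.BigInteger;
import java.util.*;

/** Simplified ring node illustrating Eager Successor Replication with k = 1. */
class ReplicatingNode {
    final BigInteger id;                                   // position on the Chord ring
    ReplicatingNode successor;                             // direct successor on the ring
    final Map<BigInteger, List<String>> store = new HashMap<>();

    ReplicatingNode(BigInteger id) { this.id = id; }

    /** Called on the root node of the key, i.e. the node Chord routes the put to. */
    void putAsRoot(BigInteger key, String value) {
        storeLocally(key, value);
        if (successor != null) {
            successor.storeLocally(key, value);            // eager replica on the direct successor
        }
    }

    void storeLocally(BigInteger key, String value) {
        store.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }
}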
5. Experimental results and evaluation
In this section, we evaluate DWSDM by simulation, compare the Successor Replication Strategies of the improved Chord algorithm with respect to robustness, and evaluate the scalability of the system. Following the blueprint described above, we use MySQL for the local Service Repository and implement the DHT protocol with the improved Chord algorithm in Java; RCPs and the DBD are implemented on the application tier and the overlay tier respectively, according to the Common API described in [11]. In the simulation, m in the Chord algorithm is set to 32, that is, the key space ranges from 0 to 2^32 - 1; the initial number of nodes in the DBD is 20, and 1000 web services are published to the DWSDM.
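For reference, the simulation parameters above could be captured as constants like the following (an illustrative snippet, not the actual simulator code).

import java.math.BigInteger;

/** Illustrative constants mirroring the simulation setup described above. */
final class SimulationConfig {
    static final int M = 32;                                               // Chord identifier bits
    static final BigInteger KEY_SPACE_SIZE = BigInteger.valueOf(2).pow(M); // keys in [0, 2^32)
    static final int INITIAL_NODES = 20;                                   // nodes in the DBD at start
    static final int PUBLISHED_SERVICES = 1000;                            // services published to DWSDM
    private SimulationConfig() { }
}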
5.1. Robustness
We count the number of services that can still be located (service hits) when some of the nodes in the DBD fail; this indicates the robustness of the system. From the results in Figure 5, we can see that with the LSR strategy all service information can still be found even if half of the nodes (10 of 20) leave, because the key-value pairs on a node are replicated to its successor just before the node leaves; this strategy handles node departures very well. With the ESR strategy, service information is stored on two nodes as soon as it is inserted into the DBD, so if only one node fails all service information can still be found, which removes the single point of failure; however, if a sequence of adjacent nodes fails, some service information is lost. With no replication, even a single node failure loses some service information, and the hit count drops rapidly as more nodes fail. LSR therefore gives the best results, but it requires nodes to replicate the necessary information on their own just before leaving, which often does not happen in practice, since node failures are mostly caused by network problems (e.g. going offline or losing connectivity). To make the system more robust, the LSR and ESR strategies can be combined. The experimental results show that the Successor Replication Strategy removes the single point of failure and makes the system more robust.
Figure 5. Robustness (number of service hits versus number of failed nodes, for LSR, ESR and no replication)
5.2. Scalability
We measure the scalability of DWSDM in terms of routing cost in the DBD, i.e. the average number of nodes visited to answer a service query. For each query, the initial RCP is selected uniformly at random from the DWSDM. Figure 6 shows the results: the routing cost is low and grows gracefully with the number of nodes in the DBD. For example, a query takes 4.274 hops on average in a 100-node system and 5.877 hops when the number of nodes increases to 1100. We therefore conclude that, compared to a centralized service discovery system, DWSDM is both scalable and efficient in terms of routing.
Figure 6. Scalability (average routing cost in hops versus number of nodes in the DBD)
6. Conclusions
Emerging P2P technologies are well suited to the increasingly decentralized nature of modern organizations. This paper proposes a novel decentralized web services discovery mechanism based on a DHT. In DWSDM, web service descriptions are stored in a distributed way, which avoids a single point of failure and a performance bottleneck. The proposed layered architecture, which decouples an application tier for the registry center from an overlay tier for the network topology, facilitates system implementation. In addition, the improved Chord algorithm enhances the robustness and scalability of the system even when some nodes in the DBD fail.
7. Acknowledgement
This work is supported by the Shanghai Science and Technology Development Foundation under Grant No. 05dz15005.
8. References
[1]. UDDI Version 3.0, Published Specification, http://www.uddi.org/.
[2]. Fred Hartman, Harris Reynolds, "Was the Universal Service Registry a Dream?", Web Services Journal, December 2004.
[3]. Mike P. Papazoglou, Bernd J. Kramer, Jian Yang, "Leveraging Web-Services and Peer-to-Peer Networks", 15th International Conference on Advanced Information Systems Engineering (CAiSE 2003), Austria, June 2003.
[4]. Hung-Chang Hsiao, Chung-Ta King, "Neuron: A Wide-Area Service Discovery Infrastructure", International Conference on Parallel Processing, August 2002.
[5]. Cristina Schmidt, Manish Parashar, "A Peer-to-Peer Approach to Web Service Discovery", World Wide Web, 2004.
[6]. De-Ke Guo, Hong-Hui Chen, Xian-Gang Luo et al., "Enhance UDDI and Design Peer-to-Peer Network for UDDI to Realize Decentralized Web Service Discovery".
[7]. Massimo Paolucci, Katia Sycara, Takuya Nishimura, Naveen Srinivasan, "Using DAML-S for P2P Discovery", International Conference on Web Services, 2003.
[8]. Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan, "Chord: A Scalable Peer-to-Peer Lookup Protocol for Internet Applications", ACM SIGCOMM'01, San Diego, California, USA, September 2001.
[9]. David Liben-Nowell, Hari Balakrishnan, David Karger, "Observations on the Dynamic Evolution of Peer-to-Peer Networks", First International Workshop on Peer-to-Peer Systems (IPTPS'02), Cambridge, MA, March 2002.
[10]. Dejan S. Milojicic, Vana Kalogeraki, Rajan Lukose, Kiran Nagaraja, Jim Pruyne, Bruno Richard, Sami Rollins, Zhichen Xu, "Peer-to-Peer Computing", HP Laboratories Palo Alto, March 2002.
[11]. Frank Dabek, Ben Zhao, Peter Druschel, John Kubiatowicz, Ion Stoica, "Towards a Common API for Structured Peer-to-Peer Overlays", Second International Workshop on Peer-to-Peer Systems (IPTPS'03), Berkeley, CA, USA, February 2003.
[12]. Min Cai, Ann Chervenak, Martin Frank, "A Peer-to-Peer Replica Location Service Based on a Distributed Hash Table", ACM/IEEE Conference on Supercomputing (SC2004), 2004.