Towards Evolving Web Sites into Grid Services Environment - CiteSeerX
Recommend Documents
WS-notification family of specifications has been ... WS-Resource Properties [7], WS Resource Lifetime ... analysis of five different implementation of WSRF,.
Because of the ongoing development in this field .... service, freelance experts's and out-of-work crew ..... platform [10, 13] that guides the developer through the.
Abstract. Objectives: Grid Enabled Medical Simulation Services (GEMSS) aims to provide a ... an environment in which computationally demanding tools relevant to the health sector can .... successful execution of the fluid dynamics analysis.
Jun 27, 2003 ... Developing Web Services in a Computational Grid Environment. Xiaobo Yang,
Mark Hayes, Andrew Usher. Cambridge eScience Centre.
... Services in a Computational Grid Environment. Xiaobo Yang, Mark Hayes, Andrew Usher ..... D. Snelling. Open Grid Services Infrastructure (OGSI). Version ...
there is no easy way to keep data about instructor-student interaction (e.g. .... By using the concrete example of a WS that enables students to build teams online,.
there is no easy way to keep data about instructor-student interaction (e.g. ... Keep it simple - Implement a minimal interface to get the desired functionality.
E-mail: [email protected]. Keywords: ... grating conversations into the composition of Web services. ..... the sender service namely monitoring and assessment.
that is compliant with the recently proposed Web Services Resource Frame- .... that the CustomerService can be hosted on another machine as the AccountSer-.
might form the application development framework of the future for any large scale ... and then concrete web services will be bound to every abstract service at ...
estimated response time) service provider for its required service, according to ... anticipated that web servers that host the services will be subject to increasing ...
provider clicks Delete and login, warning page appears. After confirming, the .... HOST-The name of a host providing a grid service. ⢠TYPE-Service type.
Page 1 .... mobility support by Dynamic Host Configuration Protocol (DHCP) [6], ... hostname) and IP addresses to this centre registration service. Then when a ...
host types approach is already widely adopted through Web service toolkits such as ..... are best from an interoperability standpoint, the application owner may still .... i.e. an instruction that assigns the value 10 to a primitive int is considered
have their own, more traditional life-cycle. ... Instead, we focus upon the content of the Web â the pages, applets, servlets and other program â that the user.
Whereas network capacities have reached enormous scales in the industrial countries, the exchange of ..... Supercomputing'92, Minneapolis. MILOJICIC, D., BREUGST ... Services for the Internet and Wireless Networks. John Wiley and Sons ...
relies on the specification of the components services in terms ... Proceedings of the Eighth European Conference on Software Maintenance and Reengineering ...
HTML responses by the server; these pair-wise. Proceedings of the Eighth European Conference on Software Maintenance and Reengineering (CSMR'04).
from the corporate, health, research and small business sectors and exercise the ... Data must be separated from the application software so that only data is ... used to open accounts and fix payment details. ..... Monitor billing information.
of the MonALISA monitoring system, and adapting the framework to a secure message-based transport protocol. INTRODUCTION. The Clarens Web Services ...
infrastructure on all parties involved. The emerging field of Web services provides an approach that leverages the heterogeneity of IT systems by using open ...
Such functionality is inevitable in the grid environment hosting a vast number of services ..... In the domain of Semantic Web Services, the Web Service Model-.
ISDN network by using an Asterisk server. Several additional supplementary services like Call Forwarding, Call Completion, a Voice Mailbox and a Call Center ...
Towards Evolving Web Sites into Grid Services Environment - CiteSeerX
[email protected] . Alternatively, for further information about ... . . .
Appeared in Proceedings of IEEE International Symposium on Web Site Evolution (WSE’05), IEEE Computer Society, 2005, pp. 95-102 (Pre-Print Version) (Preprinted Version)
Towards Evolving Web Sites into Grid Services Environment Jianzhi Li and Hongji Yang Software Technology Research Laboratory, De Montfort University, Leicester, LE1 9BH, England {jianzhli, [email protected]} Abstract Grid services are emerged by integrating Grid computing and Web services to perform a seamless information processing system across distributed, heterogeneous, dynamic virtual organisations that the user can access from any location. The Web Service Resource Framework (WSRF) was announced as a new way for manipulating "stateful resources" to perform Grid services. It defines the concept of “stateful resources” and how they can be discovered, queried and manipulated via Web services. In this paper, we proposed a Grid services-oriented reengineering approach to create stateful resources from conventional HTML Web sites, which applies hierarchical cluster and wrapper techniques to extract and translate Web sites resources. It supports services identification and packaging and archives Web site evolution into Grid services environment by exploiting WRSF.
1. Introduction A Grid can be defined as a layer of networked services that allow users single sign-on access to a distributed collection of compute, data and application resources. The Grid services allow the entire collection to be seen as a seamless information processing system that the user can access from any location. It is basically Web Services with improved characteristics and services. Web Services are the technology of choice for Internet-based applications with loosely coupled clients and servers. But, the implementations of Web services are typically stateless. Compared this, Grid services are evolved to make possible of dynamically sharing and coordinate dispersing of heterogeneous services resources. As Grid services are dynamic and stateful, a new mechanism is needed to address this issue. The Open Grid Services Infrastructure (OGSI) specification version 1.0 [23], was released in July 2003, which defines a set of conventions and extensions for the use of Web service definition language (WSDL) and
XML schema to enable stateful Web services. But criticisms of OGSI from the Web services community are: there is too much stuff in one specification, it does not work well with existing Web services and XML tooling, it is too object oriented and it does not support the forthcoming WSDL 2.0 standards. As a substitution, Web Services Resources Framework (WSRF) [6], as well as the related WS-notification family of specifications has been proposed. WSRF and WS-notification capture all of the functionality provided by OGSI, but do so in a way that integrates better with evolving Web services standards. Specifically, the WSRF definition relies upon the WS-addressing specification. In addition, the WSRF definition expresses the capabilities of the OGSI definition in a way that is more consistent and will be more familiar to Web services developers, in general [5]. The WSRF has been used to define conventions for modeling and managing state in distributed systems based on Web services context, and WS-Notification. It consists of a set of specifications: WS-Resource Properties [7], WS Resource Lifetime [8], WS-Base Faults [9] and WS Service Group [10], that define how WS-Resources are named, discovered, queried, indexed, altered, and how their lifetimes are managed. Nowadays, Web sites often evolve from small and simple collections of purely HTML pages to big and complex applications, offering advanced transaction and data access facilities. For Web sites Evolution, they can be seen as software systems comprising thousands of line of "code" (HTML) split into many "modules" [4]. Static HTML pages are sort of programs with encapsulated data, which infringe software engineering good practice. This paper describes a service-oriented approach to evolving Web sites into Grid services environment based on WSRF. It is organised as follows: related work are introduced in Section 2. The proposed approach is demonstrated in Section 3, and an example of evolving a university Web sites is presented as well. Section 4 concludes the paper with a summary of our experience to date.
2. Related Works
3. Proposed Approach
As Grid is an emerging technique, there is not much work try to integrate Grid technique with Web sites. Many approaches have been proposed on extracting Web sites information and reengineering Web site into Web services. [21] presents various issues and challenges in Web site development and maintenance, especially in the public domain. [2] suggests to applying conventional reverse engineering techniques such as code analysis and clone detection in order to reduce duplicated content and achieve maintainable Web sites. The developed system includes an HTML parser and analysers that separate content from layout by integrating in HTML pages scripts for retrieving the dynamic data from a database. In [22], they proposed a cluster method based on the automatic extraction of the keywords of Web Pages. Lixto [1] is a wrapper generation tool which is well suited for building HTML/XML wrappers. Otherwise, [4] presents a tool-supported method to reengineer Web sites, that is, to extract the page contents as XML documents structured by expressive DTDs or XML Schemas. In [20] and [15], they presented an approach to WWW information integration, based on the development of a canonical domain model in XML and the wrapping of existing WWW applications with wrappers capable of communicating about entities in this common model with the applications and with an intermediary mediator. [16] aims at developing a technique for wrapping existing Web applications with WSDL descriptions so that organisations with a presence on the “browser-accessible Web” can easily enable programmatic access of their functionalities to other applications. From the perspective of Grid technique, as an improvement of OGSA, the WSRF was announced in 2004 as a new way for manipulating "stateful resources" to perform Grid services. [5] compared OGSI and WSRF, and described the relationship between the concepts, mechanisms, and syntax defined by the OGSI and the proposed WSRF as well as the related WS-notification family of specifications. [14] describes a design to achieve compliance with the WSRF specifications using Microsoft .NET technologies. WSRF.NET is the toolkit for implementing WSRF-compliant Web services that is built on top of the .NET platform and [24] describes an application of exploiting WSRF and WSRF.NET for Remote Job Execution in Grid Environments. In [13], they presented a detailed analysis of five different implementation of WSRF, which include: GT4-JAVA, pyGRIDWare, GT4-C, WSRF::Lite and WSRF.NET.
3.1 Extracting Information from Web Site Web sites often evolve from small and simple collections of Hypertext Markup Language (HTML) which defines the style, structure, and content of the Web pages to complex applications, offering advanced transactions and data access. As HTML is a short-constrained language, it should be useful to check the validity of these sources before working on it. The analysis of Web pages usually starts at a specified initial URL. Firstly, it determines the type of the URL. If it is a HTML pages, it extracts all links to other Web page, and subsequently analyses there in the same manner as the initial URL. Given the description of each Web site page in terms of feature vectors, it is possible to exploit similarity or distance measures to agglomerate entities into cluster. Pages can easily be moved without affecting the links if the file structures are preserved. Mostly, Web sites are built with pure HTML pages, with some embedded content such as images and increasingly the inclusion of scripts both as part of the HTML document or as a means of generating the HTML page. And the designers usually highlight interesting output information with HTML tags, such as or
or
, or with a distinct highlight font attribute, such as color. Also, the file path is a good clue to detect the page types through a Web site. To enable data and structure extraction, first, we built an abstract parse tree to identify with the corresponding paths and possibly some properties of the elements themselves, then, we associated strings which are represented by the text of content leaves to every node of the tree as the attributed elements. In order to extract instances from HTML document, rules must be formulated for how to traverse the tree-structure of the document in order to locate the instances of the constituents. These rules are formulated using XPATH in order to localise the HTML tree that spans all the concept constituents. Our previous work has described a Grid oriented approach on mining and reusing legacy code [18]. Now, employing our matrix on using cluster and slicing techniques, the HTML documents could be cleaned and the contents are separated from layout by integrating in HTML pages scripts for retrieving the dynamic data from a database. As a result, the HTML documents are cleaned and the functions can be reused and extracted from the existing program together with the data they use.
After parsing, the system stores a rationalised copy of the extracted HTML data in central data storage with each unique element stored in a separate part of the repository. [2] defines a relational database which can be used for the central data storage. Rather than storing the details of how to reconstruct the pages, the original HTML page is modified to use scripts that retrieves the data from storage and regenerate pages as required. These modified files contain only the structure of the original HTML files. The files are stored in an organised directory structure on the server. It is these pages that will be accessed.
3.2 Translating to XML While both HTML and XML are languages for representing semi structured data, the first is mainly presentation oriented and is not really suited for database applications. To import existing Internet information sources to Web Services domain, the next step after mining the information is to extract the relevant information from HTML documents and translate them into XML, which can be easily queried or further processed. The benefits of the approach is that without any efforts to duplicate the Web sites code, they inherit all features from the sites while can be enriched with Web Service features. Furthermore, they can be easily integrated each-other to perform dynamic services in Grid environment.
Figure 1. A page of De Montfort University
For the purpose of connecting the extracted Web site to services, the interfaces are essentially defined in a complex XML type. It is attributes are the name, type and name space. The XML grammar can be used to identify all publicly-available methods, their signatures, and their return types. Faculty of Computing Sciences & Engineering CSEFaculty Admissions Faculty of Computing Sciences and Engineering De Montfort University The Gateway Leicester LE1 9BH UK (0116) 257 7456 (0116) 257 7693 [email protected] Alternatively, for further information about ... […]
Figure 2. XML translation of CSE Web page Interface creation [19] is proceed by a data flow analysis which identifies all of the variables referenced either directly by the function. This includes overlaying data structures and indexed arrays. If an elementary item in a structure is referenced, then the whole structure is included in the interface. The proposed approach for translating is relied on the use of XML schema for creating patterns in hierarchical order. When facing a new HTML document, the only pattern is with a unique instance. In the extraction step, the HTML document has been parsed into a tree representation rooted at the tag, and document subtrees that may contain information of interest can be efficiently located. Each node in the tree represents a pair of matching HTML tags enclosing a part of the document. The label on the node corresponds to the entity of the target concept contained in this part of the document. Then, the HTML elements of the document parse tree can be easily marked by XML patterns. The output by the extractor is well-suited for translation into XML. By exploiting the hierarchical structure of the pattern instance base and using pattern names as default XML element names, the HTML attributes can be translated and appeared in the XML output. Take a university Web site as example, we first define a pattern and then define a sub pattern . The sub pattern relationship in this case expresses that each extracted instance of must occur within one instance of . Pattern names act as default XML element names.
Each pattern characterises one kind of information. The set of extracted instances of a pattern, which are either HTML elements, list of elements, or strings, depends on the current page. Also, the data and its structure can be defined by XML document. As an example, we apply the approach to reengineering the Web site of De Montfort University. For the needs of this case study, we focus on examining one very simple page: the Faculty of Computing Sciences & Engineering (CSE). Figure 1 gives a glance at the Web page of the CSE Faculty. Concern to the data structure, this page describes an administrative staff and his contact information, as well as some other concerned information. It also contains a picture and much other related links. By analysis the CSE faculty Web page, we create the patterns of “Faculty”, “FacultyName”, “Stuff”, “Telephone”, “Fax”, “Email”, “Description”, “ Information”, and so on. Figure 2 display the result of XML translation of CSE faculty Web page.
3.3 Stateful Resource Creation Grid Services are basically Web Services with improved characteristics and services. The most important improvement is that Grid services are stateful and transient services. It can remember what have been done from one invocation to another with the support of stateful resources. The definition of stateful resources is given as resources have a specific set of state data expressible as an XML document, have a well-defined lifecycle; and be known to, and acted upon, by one or more Web services [12]. Its state may be implemented as an actual XML document that is stored in memory, in the file system, in a database, or in some XML Repository. WS-Addressing [3] provides transport-neutral mechanisms to address Web services and messages. This specification defines XML elements to identify Web service endpoints and to secure end-to-end endpoint identification in messages. A WS-Addressing endpoint reference is an XML serialisation of a network-wide pointer to a Web service. This pointer may be returned as a result of a Web service message request to a factory to create a new stateful resource. The patterns have been used by WS-Addressing to indicate the relationship between Web services and stateful resources. The composition of the Web services and the stateful resource can be referred as a WS-Resource. Resource property elements are almost identical to service data elements. The only difference is that resource property element declarations are simply XML global element declarations. A resource properties document collects resource property
elements, and the resource properties document is associated with an interface by using an XML attribute on the WSDL 1.1 portType. The WS-Resource Properties [7] specification defines the type and values of those components of a WS-Resource’s state that can be viewed and modified by service requestors through a Web services interface. In WS-Resource Properties, a set of more specific operations for getting and setting resource properties: single-element get, multi-element get/set, and XPath query. Stateful resource is defined by a single XML Global Element Declaration (GED) in a given namespace, comprising a set of references to XML GEDs of the individual resource properties. A specific resource’s state may be implemented as an actual XML document it participated as services resource. …