International Conference on Computer Systems and Technologies - CompSysTech’08

Development of Applications with Service-Oriented Architecture for Grid

Vladimir Dimitrov

Abstract: Service-Oriented Architecture (SOA) is an approach to the development of highly distributed, integrated enterprise applications. Development of SOA applications requires process orchestration software, mediation software for message exchange, and repositories for the registration and discovery of services. Grid is service-oriented and has repositories for service definition and discovery, but there are still no widely accepted mechanisms for orchestration and mediation.

Key words: service, service-oriented architecture, grid, distributed computing.

INTRODUCTION

The intent of this paper is to investigate the IBM approach to SOA and its applicability to Grid. First we explain what SOA by IBM is. Then we investigate what Grid by IBM is.

SOA BY IBM

In [1] IBM defines SOA as: “… an architectural style for creating an Enterprise IT Architecture that exploits the principles of service orientation to achieve a tighter relationship between the business and the information systems that support the business.”

In other words, SOA is an architectural style that follows the principles of service orientation. Its main achievement is a tighter relationship between the business and the information systems. SOA is applied to the development of an Enterprise IT Architecture that integrates business information systems.

An information system (IS), according to [2], is “A system, whether automated or manual, that comprises people, machines, and/or methods organized to collect, process, transmit, and disseminate data that represent user information.”

Information systems are of two kinds: short-running On-Line Transaction Processing (OLTP) systems, and long-running ones such as data mining processes. An OLTP IS is composed of independent short transactions, and the state of the business process is kept persistently by a database system. These IS are database-centric. Long-running information systems are more complicated and their state has to be kept in main memory, so special support is needed in case of interrupts or failures. Both kinds of information systems are addressed here.

To clarify IBM SOA, we give the definitions from [1] of the terms used above:

• “…service-orientation is a way of integrating your business as a set of linked services.” In other words, service orientation looks at the business (its business processes) as structured into services linked with each other.

• “…a service is a repeatable task within a business process.” This definition of a service as a repeatable task is not precise: it does not make clear what the granularity of the repeatable task is. A correct understanding of service granularity decides the success or failure of an SOA-based project. If the services are very small, much execution time is spent on communication. If the services are very large, they are complex and hard to manage. Experience shows that well-designed business services are well-granulated services: what the business names a service is a service. Some system services may appear later in the development, but they only support software design and implementation. They do not extract any functionality from the business services; they implement some of the functionality that the business services use. This approach is in some contradiction with IBM’s Service-Oriented Modeling and Architecture (SOMA) [12] approach.

• “A composite application is a set of related and integrated services that support a business process built on SOA.” This means that a composite application is an SOA implementation of a business process.

The SOA standards widely accepted by the main players on the market are:

• The language is Business Process Execution Language (BPEL) [7]. This is an XML-based language developed by the OASIS consortium. In BPEL the business process is described as communicating Web services.

• Web services are specified with Web Services Description Language (WSDL) [8], an XML-based description language for Web services. WSDL specifies the operations of a Web service, the messages that the Web service exchanges through its operations, and where the Web service resides.

• Message exchange among Web services follows Simple Object Access Protocol (SOAP) [10], a protocol for exchanging XML messages over a transport-level protocol, usually HTTP.

• Web services are registered (published) in repositories supporting the Universal Description, Discovery and Integration (UDDI) [9] protocol.

How does IBM support these standards? From the methodological point of view, business processes should be described with WebSphere Business Modeler (WSBM). This tool permits a business process to be simulated and optimized. One of the benefits of the IBM SOA approach is the optimization of business processes. The problem here is the simulation data: to achieve good results, a huge amount of real data has to be prepared. This is very expensive, and as a result simulation is frequently left out of the development process.

Business processes in WSBM are specified in user-friendly diagrams. All BPEL constructs have graphic representations. WSBM is mainly a modeling tool, and as such it permits the business process to be described iteratively, going down into more and more detail. Web services are described as tasks exchanging business items. A task can be specified with its operations. All diagram elements can be documented with a rich set of properties.

WSBM is integrated with another IBM tool, Rational Software Architect. This integration supports the Rational Unified Process (RUP) [11]. The Business Use Case Model and the Use Case Model are very well integrated with WSBM diagrams and other RUP artifacts.

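As an illustration of the SOAP item in the standards list above, the following minimal Python sketch builds a SOAP 1.1 envelope and reads it back using only the standard library. The application namespace `urn:example:orders` and the `SubmitOrder`, `OrderId`, and `Amount` element names are hypothetical and are not taken from the paper or from any IBM product; a real service would take them from its WSDL description.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:orders"  # hypothetical application namespace (assumption)


def build_envelope(order_id: str, amount: str) -> bytes:
    """Serialize a request into a SOAP envelope (internal form -> XML)."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{APP_NS}}}SubmitOrder")
    ET.SubElement(request, f"{{{APP_NS}}}OrderId").text = order_id
    ET.SubElement(request, f"{{{APP_NS}}}Amount").text = amount
    return ET.tostring(envelope, encoding="utf-8")


def parse_envelope(data: bytes) -> dict:
    """Parse a SOAP envelope back into a plain dictionary (XML -> internal form)."""
    root = ET.fromstring(data)
    request = root.find(f"{{{SOAP_NS}}}Body/{{{APP_NS}}}SubmitOrder")
    # Strip the "{namespace}" prefix from each child tag to recover the field name.
    return {child.tag.split("}")[1]: child.text for child in request}


message = build_envelope("42", "19.90")
fields = parse_envelope(message)
```

The sender has to serialize its data to XML and the receiver has to parse it again, so every exchanged message pays this conversion cost twice.
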
Detailed business processes specified in WSBM diagrams are exported to WebSphere Integration Developer (WSID). Briefly, the tasks are exported as WSDL specifications, business items as XSD schemas, and business processes as BPEL specifications. WSID supports integration with external services and allows services to be developed from legacy software via adapters.

WSBM diagrams are exported to WSID diagrams. The latter are BPEL-based diagrams, but they are a straightforward graphical representation of BPEL. It is very important to know what is exportable from WSBM diagrams and what is not, and to use this knowledge to design WSBM diagrams that are suitable for export.

WSID is not a modeling tool; it is a developer’s tool. In reality, if the first modeling phase is unsuccessful for any reason, the developers start the implementation directly with this tool. This is not a good approach, but good experts on both WSBM and WSID are rare. With WSID new services are developed and integrated, legacy software is integrated via adapters, and external services are integrated too.

The implemented business process is deployed to WebSphere Process Server (WSPS) for execution. The business process specified in BPEL is a template, which is used to instantiate an executing process. Templates are deployed to WSPS. Process instances are created when an initiating message is received. There is no special notation for initiating messages; they are ordinary messages, but process templates carry the information about which messages can create a new executing process instance. A single WSPS can run thousands of processes in parallel.

There is an alternative for process specification in WSID: the business state machine, which is an extended finite state machine. This approach is more traditional for IBM, but from the business point of view service orchestration with a finite state machine is very different from a business process diagram.

Finally, these thousands of concurrently running processes are one of the challenges of SOA implementation.
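The relationship between deployed templates, initiating messages, and running instances described above can be sketched as follows. This is a simplified Python model under stated assumptions, not the WSPS API: the class names `ProcessTemplate`, `ProcessInstance`, and `ProcessServer` are hypothetical, and messages are routed to an existing instance by an explicit instance identifier, whereas BPEL itself correlates messages through correlation sets.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass(frozen=True)
class ProcessTemplate:
    """A deployed process definition (hypothetical stand-in for a BPEL template)."""
    name: str
    initiating_messages: frozenset  # message types allowed to start a new instance


@dataclass
class ProcessInstance:
    """One executing process created from a template."""
    instance_id: int
    template: ProcessTemplate
    state: dict = field(default_factory=dict)  # the process state lives with the instance


class ProcessServer:
    """Toy orchestration node: holds templates and the instances created from them."""

    def __init__(self):
        self._templates = []
        self._instances = {}
        self._ids = count(1)

    def deploy(self, template: ProcessTemplate) -> None:
        self._templates.append(template)

    def receive(self, message_type: str, payload, instance_id=None) -> ProcessInstance:
        # A message addressed to a running instance just updates that instance.
        if instance_id is not None:
            instance = self._instances[instance_id]
            instance.state[message_type] = payload
            return instance
        # Otherwise the message is ordinary; only the template knows that it initiates.
        for template in self._templates:
            if message_type in template.initiating_messages:
                instance = ProcessInstance(next(self._ids), template, {message_type: payload})
                self._instances[instance.instance_id] = instance
                return instance
        raise LookupError(f"no template is initiated by message type {message_type!r}")


server = ProcessServer()
server.deploy(ProcessTemplate("order-handling", frozenset({"NewOrder"})))
first = server.receive("NewOrder", {"order": 1})
server.receive("Payment", {"paid": True}, instance_id=first.instance_id)
```

Each initiating message yields a separate instance with its own state, which is why a single server may end up holding thousands of instances at once.
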

BPEL was originally designed to support stateful processes on stateless services. This idea permits stateless services to be executed on any suitable working node, while the process is orchestrated from a single working node that holds the process state. But one of IBM’s intents is to use SOA for the integration of legacy software, and a service based on legacy software is usually stateful. In IBM SOA a stateful service is not the exception but the rule. This rule means that a system is not engineered from scratch but reengineered. Usually, the reengineering of monolithic software is badly accepted by the clients: they pay for a new system, but soon find out that only part of their system will be new.

Another problem of IBM SOA is human-computer interaction. The design of a business process is the design of the process flow, not of the human interaction with the system. IBM offers WebSphere Portal (WSP) as a mediator between WSPS and the users. The problem is that when a business process needs human interaction, the service implementing this interaction suspends the process until an answer from the human is received. This service is called a human task. The development of human tasks is supported by WSP, which is well integrated with WSPS. When the system is big and has many users, there are very many suspended processes blocking resources. One approach to resolving this problem is to design human-computer interactions as a specialized business workflow process. Processes can communicate with this interaction process without blocking resources.

Message exchange is supported through the concept of the IBM Enterprise Service Bus (ESB). It is implemented directly in WebSphere and is supported in several other products, among them WebSphere Message Broker (WSMB). These implementations are not very compatible with each other.

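Returning to the human-task problem described above, the idea of a separate interaction process that business processes talk to without blocking can be sketched as follows. This is only an illustration in Python under stated assumptions: `InteractionProcess` and `OrderProcess` are hypothetical names, nothing here reflects the WSPS or WSP programming model, and a real system would persist the pending requests instead of keeping them in memory.

```python
class InteractionProcess:
    """Hypothetical dedicated process that owns all pending human interactions."""

    def __init__(self):
        self._pending = {}  # ticket -> continuation to call when the human answers
        self._next_ticket = 1

    def ask(self, question: str, on_answer) -> int:
        """Register a question and return immediately; the caller is not suspended."""
        ticket = self._next_ticket
        self._next_ticket += 1
        self._pending[ticket] = on_answer
        return ticket

    def answer(self, ticket: int, reply: str) -> None:
        """Called when the human responds; resumes the waiting business process."""
        on_answer = self._pending.pop(ticket)
        on_answer(reply)

    def pending_count(self) -> int:
        return len(self._pending)


class OrderProcess:
    """Toy business process that needs a human approval step."""

    def __init__(self, interactions: InteractionProcess):
        self._interactions = interactions
        self.status = "new"

    def start(self) -> int:
        # Hand the question over and release the worker instead of blocking on it.
        self.status = "waiting"
        return self._interactions.ask("Approve order?", self._resume)

    def _resume(self, reply: str) -> None:
        self.status = "approved" if reply == "yes" else "rejected"


interactions = InteractionProcess()
process = OrderProcess(interactions)
ticket = process.start()          # returns at once; only a small record stays pending
interactions.answer(ticket, "yes")
```

The design choice is that a waiting process costs one entry in the interaction process rather than a suspended execution context on the process server.
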
IBM, like the other SOA market players, claims to support the above-mentioned SOA standards, but in reality it supports only functional and interchange compatibility. All SOA standards are XML-based, which means overhead in performance and traffic. For example, when one service sends a message to another, the first service has to convert the message from its internal representation to an XML message, this message has to be delivered to the other service, and then the second service has to parse the message and convert it to its own internal format. If both services are on the same computer, this procedure can be skipped.

The purpose of WSMB is to convert messages between different formats. Every message received by WSMB is first converted to its internal format and only then to the target one. XSLT transformations are supported, but WSMB works faster with ESQL transformations; ESQL is its internal language for describing message conversions.

There is no guarantee that message interchange between services developed on different hardware/software platforms can be implemented without problems. Experience shows that at least a week is needed to implement such an interchange. As a result, it is clear that the best environment for business process implementation is a homogeneous one.

GRID BY IBM

IBM started with several research projects on Grid computing to formulate its vision on the topic. Initially IBM used Globus Toolkit, an implementation of the Open Grid Services Architecture (OGSA) [3]. From the very beginning IBM has looked at the Grid as an environment for SOA.

One successful scientific solution using Globus Toolkit and SOA is the Telescience project [14]. It integrates remote instrumentation, Grid-based distributed computing, and data management. Another early project is the Grid Medical Archive Solution [4], which is used in the University Health Care System [18], available in Georgia and South Carolina. This system is used to store, retrieve, and deliver archived cardiology studies.
The IBM component is the hardware: IBM eServer xSeries and IBM TotalStorage DS4100 storage servers.

For Zürcher Kantonalbank, IBM developed a cluster solution that accelerates risk assessment [15]. This solution uses Sun Grid Engine [16] and IBM Cluster 1350 servers. A solution of the same kind is the advertised IBM deployment of IBM hardware (IBM eServer xSeries, IBM SAN Switch device, IBM TotalStorage 4400 storage server, and IBM TotalStorage 3584 Ultrium Ultra Scalable Tape Library) for the LHC tier-1 data center at FZK [17]. In IBM Grid and Grow for actuarial analyses [6], the Grid element is an IBM eServer BladeCenter with some Grid middleware.

In all the solutions mentioned, the IBM Grid component is the IBM BladeCenter server, and the Grid middleware is IBM Tivoli Workload Scheduler LoadLeveler. This configuration is applicable to data-intensive Grid computing [5].

IBM defines the Grid infrastructure objectives in [19] as follows: it creates a virtual application operating, storage, and collaboration environment; virtualizes application services execution; dynamically fulfils requests over a virtual pool of system resources; and offers an adaptive, self-managed operating environment that provides high availability.

In [19] the following more advanced Grid solutions are presented:

• Network Grid Infrastructure for File Downloading. A network of distributed file servers enables an optimized download upon a client request for a particular file. The system is built as a Grid of dispersed download servers. This solution primarily addresses the enterprise optimization business driver. The Grid vision presented here is very much like the one in [20]. This Grid consists of a Grid coordinator and Grid nodes. The Grid coordinator (downloadGrid management center) accepts jobs and distributes them among the work nodes. The job is to download some file from the repository. The Grid coordinator creates an optimized downloading plan using the Feedback and Statistical Module and the Optimized Plan Module. The system information database can be replicated by the Replication Module. This system implements GSI as its security mechanism.

• Public Health Data Grid. A network of servers stores digital mammographies that are associated with explanatory notes and comments about each image. The system is built as a Grid-enabled federated database. This solution primarily addresses the productivity and collaboration business driver. The Grid element is Globus Toolkit 3.0.

• Computational Grid Infrastructure for the Upstream Oil and Gas Industry.
A network of servers provides a high-performance virtual cluster to process oil field exploration applications, such as upstream petroleum processing. This solution helps to accelerate the business process and to optimize the enterprise. Globus Toolkit is used in this solution, but job submission is organized through a Grid Portal that automates user interactions with the Grid environment.

• Industrial Sector Data Grid. A network of data servers enables users to access heterogeneous files on different systems, regardless of where they are. This solution primarily addresses the productivity and collaboration business driver.

• Computational Grid Infrastructure for Trading Analysis. A network of desktops and servers helps to gain the computing power needed to run the long and complex algorithms that are required for trading analysis. This solution helps to accelerate the business process and to optimize the enterprise. In this example, IBM WebSphere MQ is used to connect client software to DataSynapse GridServer, which is cluster management software. IBM WebSphere MQ is used as a mediator in a standard client-server architecture between the client, the Grid software, and the server.

• Computational Grid for the Consulting Industry. The solution aims to relieve the computational load on an IBM mainframe by submitting heavy algorithm jobs to Grid nodes. This solution addresses IT optimization as its primary business driver.

In [21] it is stated: “Grid computing, most simply stated, is distributed computing taken to the next evolutionary level. The goal is to create the illusion of a simple yet large and powerful self managing virtual computer out of a large collection of connected heterogeneous systems sharing various combinations of resources.”

This idea of resource virtualization is the milestone for IBM. It is not so important for IBM how Grid computing is implemented and which standards are satisfied; what matters most is that the fundamentals of Grid computing are followed. These fundamentals are not very different from the ones presented in [22]. One interesting concept introduced in [21] is Intragrid vs Intergrid, in just the same manner as Intranet vs Internet.

By default, IBM and the other commercial Grid solution vendors view cluster computing as Grid computing.
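The coordinator and work-node structure of the file-downloading Grid described earlier in this section can be sketched in Python as follows. The paper does not say how the Optimized Plan Module or the Feedback and Statistical Module actually work, so the selection rule (highest expected throughput share) and the feedback rule (a moving average of measured throughput) are assumptions chosen only for illustration, and all class and attribute names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GridNode:
    """A download server known to the coordinator."""
    name: str
    throughput_mbps: float  # assumed statistic gathered from earlier downloads
    active_jobs: int = 0

    def expected_share(self) -> float:
        # Throughput a new job could expect if the node splits bandwidth evenly.
        return self.throughput_mbps / (self.active_jobs + 1)


class DownloadCoordinator:
    """Accepts download jobs and distributes them among the work nodes."""

    def __init__(self, nodes):
        self.nodes = {node.name: node for node in nodes}

    def submit(self, file_name: str) -> str:
        """Plan step (assumed rule): pick the node with the best expected share."""
        node = max(self.nodes.values(), key=GridNode.expected_share)
        node.active_jobs += 1
        return node.name

    def report(self, node_name: str, measured_mbps: float) -> None:
        """Feedback step (assumed rule): blend the measurement into the statistic."""
        node = self.nodes[node_name]
        node.active_jobs -= 1
        node.throughput_mbps = 0.5 * node.throughput_mbps + 0.5 * measured_mbps


coordinator = DownloadCoordinator([GridNode("a", 100.0), GridNode("b", 40.0)])
assignments = [coordinator.submit(f"file-{i}.iso") for i in range(3)]
```

With these two nodes the faster one receives jobs until its expected share drops below that of the slower one, after which the slower node starts receiving work as well.
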

SOA AND GRID

In [23] Grid technology and SOA are presented as merging approaches from the architectural point of view. The Grid application architecture is presented in Fig. 1. Merging Grid technologies with SOA means that Grid application and system software is to be developed (reengineered) in SOA. SOA standards are accepted and modified for Grid computing by GGF. Starting from Globus Toolkit 3, SOA standards are implemented in the core services.

The opposite process is going on at IBM. The software tools supporting SOA standards, the IBM WebSphere family, are incorporating Grid technologies [24]. IBM WebSphere Extended Deployment (WSED) implements Grid as resource virtualization. WebSphere Extended Deployment creates an On Demand Business infrastructure that spans both transactional and long-running workloads. This integrated environment dynamically determines how best to allocate application-infrastructure resources based on customer-defined business goals.

Figure 1 Grid SOA application architecture

For IBM the cornerstone is SOA, and Grid is an infrastructure that helps to realize the full benefits of SOA. WSED is in reality cluster management software that can be extended to use external services.

There are two main questions in using SOA for Grid: “How could SOA be supported by Grid?” and “Can Grid be reengineered in SOA?”

The SOA standards are BPEL, WSDL, UDDI, SOAP, and XML. These standards have to be implemented in Grid middleware. Processes running on the Grid have to be specified in BPEL and orchestrated by specialized nodes. Candidate orchestration software for these nodes can be WSPS, JBoss, Oracle BPEL Process Manager, or BEA WebLogic. The most advanced orchestration middleware is WSPS. It can be integrated into a Grid environment, but it works in combination with other WebSphere products, so it is hard to imagine WSPS being used for an orchestration node. Oracle BPEL Process Manager and BEA WebLogic have just the same problem: they are not open enough to be well integrated with software deployed by other vendors. The only realistic candidate is JBoss. Its problem is that it is still not mature enough for serious usage.

Repositories support WSDL and UDDI, so services can be registered and searched for in the Grid.

The problem from the mediation point of view is the efficiency of message exchange between services. An ESB works on only one cluster, where communications are highly reliable and faster than inter-cluster exchange. The first SOA applications will be cluster-based.

Grid is evolving into an SOA support environment, but much more work has to be done before this becomes true.

SOA is an ultimate architecture for the development of distributed systems. Grid software is older and does not have this architecture. It is possible to develop adapters for Grid software and start orchestrating it, but this is not sensible, because such software would not run efficiently. The only way is to reengineer Grid software in SOA, but that is very expensive.

CONCLUSION

What is the realistic scenario for the development of an SOA-based Grid application? First, processes have to be executed on one cluster. A worker node has to be specialized for orchestration. Application software has to be reengineered for SOA. A tool like WebSphere can be used for this reengineering. Some kind of ESB has to be developed for message exchange between services in the cluster. This ESB can be MPI-based.

This work was supported by the National Ministry of Science and Education of Bulgaria under contract VU-MI-109/2005: "Creation and development of Grid infrastructure for research and education at University of Sofia".

REFERENCES

[1] High, R., Jr., S. Kinder, S. Graham. IBM’s SOA Foundation. An Architectural Introduction and Overview. Version 1.0, 2005.
[2] ATIS Committee T1A1. ATIS Telecom Glossary 2000, 2008.
[3] What is grid computing? IBM, 2008.
[4] Grid Medical Archive, IBM, 2008.
[5] Grid Solution for Data Intensive Computing, IBM, 2005.
[6] IBM Grid and Grow for Actuarial Analysis, IBM, 2007.
[7] OASIS Web Services Business Process Execution Language (WSBPEL).
[8] W3C Web Services Description Language (WSDL) 1.1.
[9] OASIS Universal Description, Discovery and Integration (UDDI).
[10] W3C SOAP.
[11] Rational Unified Process, IBM, 2008.
[12] Portier, B. SOA terminology overview, Part 3: Analysis and design, IBM, 2007.
[13] Haynos, M. Perspectives on grid: An overview of WebSphere Extended Deployment, IBM, 2007.
[14] Lin, A. W., Building a unified grid, Part 1: Grid architecture in the Telescience Project, IBM, 2006.
[15] Zürcher Kantonalbank accelerates risk assessment with an IBM grid, IBM, 2006.
[16] Sun Grid Engine, Sun, 2007.
[17] FZK gains global access to petabytes of data to establish a major scientific information hub, IBM, 2007.
[18] University Health Care System improves patient care with enterprise grid storage system, IBM, 2005.
[19] Grid Computing: Solution Briefs, IBM, 2007.
[20] Brown, M. C., Build grid applications based on SOA, IBM, 2007.
[21] Berstis, V., Fundamentals of Grid Computing, IBM, 2005.
[22] Foster, I., C. Kesselman, The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann, 2004.
[23] Anderson, T., Grid and SOA. Definition and architecture, IBM, 2007.
[24] IBM WebSphere Extended Deployment: Providing enhanced infrastructure capabilities for SOA environments, IBM, 2006.

ABOUT THE AUTHOR

Assoc. Prof. Vladimir Dimitrov, PhD, Faculty of Mathematics and Informatics, University of Sofia, Bulgaria, Phone: +359 2 8161 549, E-mail: [email protected].