Evolution of Web-based System: An Architecture Centred Approach

Bing Qiao and Hongji Yang
Software Technology Research Laboratory, De Montfort University, England
{bqiao, hyang}@dmu.ac.uk

Abstract

The Web is driving a flourishing of new standards, applications and platforms. This prosperity, however, poses new challenges for software evolution. The styles and methods used to design and build Web-based systems are so diverse that different techniques must be combined, and the various Web systems classified by architecture, in order to evolve a whole system systematically and effectively. Any particular system must be responsive to change resulting from the introduction of either new service requirements or new platform technologies. This paper presents a method for migrating a Web-based system to an evolutional architecture based on a set of important, widely supported technologies and standards.

Keywords: Software evolution, Model-Driven Architecture, Middleware, Component-based distributed systems

1 Introduction

System evolution is now widely recognised as a crucial aspect of building large-scale enterprise applications, which apply to a number of business activities such as implementing customer support, managing supply chains and maintaining data exchange. Software maintenance focuses on short-term, extemporaneous changes and is frequently carried out in a rush to remove a system bug. In the long term, the accumulated effect of these intermittent changes makes the whole system unstable, ineffective and unmaintainable, until it becomes difficult, if not impossible, to evolve the system any further. In contrast, an "evolutional system" is capable of accommodating change by nature, i.e. it is built on an architecture that allows not only modification but also the introduction of new applications, new representations and new habitats. Instead of degrading the whole system, as observed in traditional develop-and-maintain cases, such introductions operate at a higher level and make the system more capable.

1.1 Evolution of Web-based Systems

Evolutional systems are only part of the solution to the legacy problem. A method for transforming today's Web-based legacy systems into evolutional systems is also required. In [Weiderman 97], reengineering approaches are classified as follows.

White-box program understanding. Research on software reengineering over the past decade has mostly centred on deep program understanding, known as white-box transformation: modelling the domain, extracting information from the legacy system using appropriate extraction mechanisms, and creating abstractions that help in understanding the resulting structures [Tilley 95]. Many methods have been developed to assist the program understanding process and have contributed greatly to current system evolution research. A considerable number of legacy systems have been reengineered successfully using white-box transformation, although the method has limitations, e.g. high development cost, restriction to specific coding techniques, and the lack of a standard framework and process.

Black-box component wrapping. With the growth of the Internet and its applications, the deficiency of pure program understanding has been thoroughly exposed by time-to-market and widespread distribution pressures. A Web-based system is a huge heterogeneous complex of communication protocols, client- and server-side technologies, and a myriad of document formats. Using program understanding alone to evolve such an information monster is to some extent unrealistic. As a black-box transformation, wrapping (or encapsulation) emphasises external interfaces rather than deep understanding of the internal structure of the legacy system. Wrapping makes gigantic legacy systems, which can be neither replaced nor white-box transformed economically and in time, accessible to clients and to up-to-date enterprise applications via the Web.

1.2 Middleware for Distributed Systems

A common process for wrapping a legacy system is: identify sub-systems; evaluate middleware solutions; wrap sub-systems into components; implement the Web representation component; and integrate the components into a distributed system. Regardless of the underlying heterogeneous computing and networking environment, middleware presents a single view to distributed applications by means of a layer of software that is logically placed between clients and servers. The key idea of middleware is to provide the scalability, openness and transparency of distributed computing. Applications that rely solely on a specific middleware environment can disregard internal implementation details, such as programming language, operating system, network protocol and invocation method, and remain valid across any heterogeneous environment to which the middleware has been ported [Tanenbaum 02]. However, a legacy system is only eligible for wrapping if it can meet the required service, and the faulty parts of a system cannot be identified without deep program understanding. A combination of evolution strategies may therefore be the most appropriate way to evolve a Web-based legacy system. For example, a legacy system may be in good order except that its data schemas have become overly complex; in this case an evolution strategy might include transformation of the data structures and wrapping of the application logic.

The remainder of the paper is organised as follows. Section 2 gives an overview of the proposed approach to migration and evolution of legacy systems. The framework and process are expounded in Sections 3 and 4. Section 5 concludes the paper and outlines future work.
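The black-box wrapping idea above can be sketched in plain Java. This is a hypothetical illustration, not the paper's implementation: the names `OrderService`, `LegacyOrderSystem` and `LegacyOrderWrapper` are invented for the example. The point is that clients depend only on the wrapper's external interface, never on the legacy entry point.

```java
// A minimal sketch of black-box wrapping (assumed names throughout).
interface OrderService {           // the wrapper's external interface
    String lookupOrder(int id);
}

// Legacy code we cannot afford to rewrite; only its entry point is known.
class LegacyOrderSystem {
    String cryptic_lookup(int id) { return "ORDER#" + id; }
}

// The wrapper translates between the modern interface and the legacy call.
class LegacyOrderWrapper implements OrderService {
    private final LegacyOrderSystem legacy = new LegacyOrderSystem();
    public String lookupOrder(int id) {
        return legacy.cryptic_lookup(id);  // delegate, hiding the legacy API
    }
}

public class WrapDemo {
    public static void main(String[] args) {
        OrderService svc = new LegacyOrderWrapper();
        System.out.println(svc.lookupOrder(42));  // prints ORDER#42
    }
}
```

In a real middleware setting the interface would be declared in the middleware's IDL or component model rather than as a plain Java interface, but the delegation structure is the same.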

1.3 Other Related Work

Architecture recovery has been recognised as a crucial aspect of system evolution since the late 1990s, and a number of research efforts have been dedicated to viable solutions in particular circumstances. Recent efforts include: [Hassan 02], which provides a taxonomy of Web applications and uses a white-box reverse engineering approach; [Ding 01], which takes a use-case-based approach to architecture extraction that, while emphasising a light-weight method, still needs improvement for large-scale systems; [Zdun 02], which reengineers legacy systems into Web-based systems against a reference architecture, although to achieve an evolutional system this reference architecture would have to be more flexible in accommodating different platform technologies; and [Ivkovic 02], which applies a combined approach named Dynamo-1 to recover dynamically linked applications, but is not dedicated to Web-based systems. [Anquetil 00] provides an algorithm for clustering files based on their names, which is very useful in architecture recovery, especially when handling a Web-based system with hundreds of thousands of files. Most other existing approaches are limited to the reverse engineering process only or do not consider the features of Web-based systems, so a systematic and consistent method is necessary to meet the requirements of large-scale, Web-based system evolution.

2 Proposed Approach

We propose an architecture-based approach in this paper. The approach consists of four major steps, shown in Figure 1: Requirements, Recovery, Migration and Deployment. The diagram extends [Cheesman 01] with reengineering-related workflows. This paper expounds the Recovery and Migration steps.

Figure 1. The four-step process. Requirements takes new and legacy business requirements and produces use case models and business concept models; Recovery and Migration turn the legacy system into use case models, specs and architectures; Deployment delivers the applications.

3 Architecture Recovery

Architectural analysis is the decisive part of the evolution process for a Web-based system, because only analysis at the architectural level permits real evolution of a system [Boucetta 99]. In this paper, architecture recovery consists of three steps. First, a taxonomy is built up by classifying popular Web-based systems; every category in it represents a typical enterprise application and is defined by a set of functions and attributes. Next, both business and system models are established by collecting as much information as possible from domain experts, developers, maintainers and users. Finally, according to the initial component specs and architecture, the new service specs are created and the legacy system is decomposed into components. The identified components and the component specs and architecture modify each other: the specs and architecture give clues for the search for candidate components, while the produced components are used to update the initial specs and architecture through the dependencies between components defined in the system taxonomy.

Figure 2. The architecture recovery workflow: from the legacy concept model, use case model and new concept model, business interfaces are identified; with the existing interfaces, architecture patterns and the legacy system, the component specs and architecture are created; system interfaces and operations are identified using the system taxonomy; finally, components are identified in the legacy system, yielding the new service specs and the decomposed legacy system specs.

3.1 System Taxonomy

The decomposition is established through a taxonomy of Web-based systems, upon which the architecture of a Web-based legacy system is recovered and migrated to an evolutional system built on distributed components. A complete and clear taxonomy of Web-based systems is vital to successful evolution. In the first place, such a taxonomy should reflect exactly the reality of the targeted systems, since every component derived from it must fit the real systems as closely as possible. Furthermore, it should take account of the different evolution of the corresponding identified components, in order to apply appropriate migration methods to particular parts of the legacy system. One relevant taxonomy comes from [Tilley 01], where Tilley and Huang classify Web sites into three categories: Class 1, sites that are static; Class 2, sites that provide client-side active components; and Class 3, sites that provide server-side dynamic contents. For the first two types, if the site is not evolving then its contents and structure remain the same. In contrast, a Class 3 site has dynamically changing contents and possibly even a dynamically changing structure. This classification should be extended so that each category is related to the corresponding formalised implementation details, in order to identify the legacy components.
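The three-class taxonomy above can be made operational as a simple decision rule. The sketch below is an assumed formalisation, not one given in [Tilley 01]: the boolean features `hasClientScripts` and `hasServerScripts` stand in for whatever inspection of the site actually detects.

```java
// Hypothetical encoding of the Tilley/Huang three-class site taxonomy.
enum SiteClass { STATIC, CLIENT_SIDE_ACTIVE, SERVER_SIDE_DYNAMIC }

public class SiteTaxonomy {
    // Inputs are assumed results of inspecting the site's pages.
    static SiteClass classify(boolean hasClientScripts, boolean hasServerScripts) {
        if (hasServerScripts) return SiteClass.SERVER_SIDE_DYNAMIC; // Class 3
        if (hasClientScripts) return SiteClass.CLIENT_SIDE_ACTIVE;  // Class 2
        return SiteClass.STATIC;                                    // Class 1
    }

    public static void main(String[] args) {
        System.out.println(classify(false, false)); // STATIC
        System.out.println(classify(true, false));  // CLIENT_SIDE_ACTIVE
        System.out.println(classify(true, true));   // SERVER_SIDE_DYNAMIC
    }
}
```

A fuller version would attach to each class the formalised implementation details the text calls for, so that each category directly selects a migration method.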

3.2 Initial Model of the Legacy System

This step assembles an initial model of the legacy system, involving domain experts, system developers, maintainers and users. There are three sub-steps, from [Cheesman 01]:

• Identify Business Interfaces. Business interfaces are abstractions of the information that must be managed by the system.
• Identify System Interfaces & Ops. System interfaces and their initial operations emerge from a consideration of the use case model. The system interfaces are focused on, and derived from, the system's interactions.
• Create Initial Component Specs & Architecture. The initial set of component specifications and their interactions are mapped to the legacy system.

All this information is crucial to analysing the legacy system for decomposition. The resulting model should be described in UML, which is capable of modelling legacy data, code and state-of-the-art components.

3.3 Identification and Calibration

Identification of components includes identifying encapsulated data and databases, identifying encapsulated functions, and clustering the identified modules.

Statistical analysis and clustering are commonly used in the analysis of documents [DelphiGroup 02]. Here, they can be extended to analyse and measure the co-occurrence of targets, since the relative placement of targets in a system is important. Statistical analysis and clustering also consider the frequency of targets, their placement and grouping, and the distance between targets. Placement and distance are measured against system attributes such as the naming convention or directory structure. Clustering reduces the complexity of large systems and makes the generated architecture clearer and simpler. Usually, the architecture made of the identified components does not fit closely to the initial component specs and architecture, so some means is required to arbitrate between the outcomes of the top-down and bottom-up methods.

Calibration of the domain model uses Bayesian probability [Laskey 99]. A Bayesian network structurally represents the dependencies among uncertain variables of knowledge and supports inference about these variables. It is a directed graph composed of nodes and arcs: the nodes represent variables whose values are uncertain, and the arcs represent dependency relationships between the variables. In the proposed approach, the uncertain variables represent components and the arcs represent weak implication, so that the existence of components can affect each other.
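The name-based clustering used here (in the spirit of [Anquetil 00]) can be illustrated with a deliberately crude rule: group files that share a name prefix into one candidate component. This sketch is an assumption for illustration only; real clustering would also weigh co-occurrence, directory distance and frequency, as described above.

```java
import java.util.*;

// Hypothetical sketch: cluster files into candidate components by the
// text before the first '_' in the file name. A stand-in for the fuller
// co-occurrence and distance measures discussed in the text.
public class NameCluster {
    static Map<String, List<String>> cluster(List<String> files) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String f : files) {
            String key = f.contains("_") ? f.substring(0, f.indexOf('_')) : f;
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(f);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
            "order_entry.c", "order_report.c", "stock_query.c");
        System.out.println(cluster(files));
        // {order=[order_entry.c, order_report.c], stock=[stock_query.c]}
    }
}
```

In the proposed approach the resulting groups would then be calibrated against the initial component specs via the Bayesian network, with each group's membership treated as an uncertain variable.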

4 Architecture Evolution

Figure 3. The architecture migration workflow: the use case model, the decomposed legacy system and the new service specs are modelled using MDA; together with the legacy system this drives the architecture migration; the system is implemented using J2EE and published using Cocoon, yielding the evolutional system.

Firstly, both the new service specs and the decomposed legacy system are modelled in MDA. The former is developed as a platform-independent UML model, while the latter must be wrapped with MDA-compliant interfaces to work with the new and evolving MDA systems. The produced platform-independent model (PIM) can then be mapped onto one or more appropriate infrastructure and implementation environments (the proposed approach uses EJB). Finally, the Web application is delivered using the Cocoon publishing framework, part of the Apache XML project.

4.1 Model Driven Architecture

The Object Management Group (OMG) is promoting the Model-Driven Architecture (MDA) as a unified solution to system integration. MDA is based on widely supported OMG standards, the Unified Modeling Language (UML), the Meta Object Facility (MOF), the XML Metadata Interchange (XMI) and the Common Warehouse Metamodel (CWM), which empower developers to build adaptable systems resilient to the rapid evolution of hardware and software platforms. MDA is dedicated to architectures capable of absorbing changes from both service requirements and platform implementation technologies. In [Siegel 01], a process is given for developing in OMG's Model-Driven Architecture. The first step is to create the platform-independent model (PIM), which is described in UML at different levels: the base PIM expresses only business functionality and behaviour, while PIMs at the next level include some aspects of technology even though platform-specific details are absent. Secondly, the produced PIM is translated into a platform-specific model (PSM). Since UML alone is independent of middleware technologies, it is not sufficient to describe a PSM; this problem is solved by a UML Profile, a standardised set of extensions (consisting of stereotypes and tagged values) that forms the basis for integrating UML-based modelling with Java and component-based development environments. The last step is to generate the application (e.g. J2EE, CORBA or .NET) from the PSM using a platform-specific code generator. This process can be implemented using formal or semi-formal transformation rules.
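A PIM-to-PSM transformation rule of the kind mentioned above can be sketched as a lookup table from platform-independent element kinds to platform-specific ones. The element names (`BusinessEntity`, `SessionBean`, etc.) and the rule set are assumed for illustration; a real MDA mapping would be driven by UML Profile stereotypes and a proper model repository, not strings.

```java
import java.util.*;

// Hypothetical illustration of a semi-formal PIM-to-PSM mapping rule:
// each platform-independent element kind is rewritten into an assumed
// EJB-flavoured platform-specific counterpart.
public class PimToPsm {
    static final Map<String, String> RULES = Map.of(
        "BusinessEntity", "EntityBean",       // persistent concept -> entity bean
        "BusinessProcess", "SessionBean",     // behaviour -> session bean
        "BusinessEvent", "MessageDrivenBean"  // async event -> message-driven bean
    );

    static String map(String pimElement) {
        // identity mapping when no rule applies, so the model stays complete
        return RULES.getOrDefault(pimElement, pimElement);
    }

    public static void main(String[] args) {
        System.out.println(map("BusinessProcess")); // SessionBean
        System.out.println(map("Glossary"));        // Glossary (no rule)
    }
}
```

The table form makes the rules easy to swap: retargeting from J2EE to CORBA or .NET, as the process allows, means replacing the right-hand sides only.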

4.2 Wrapping and Representation

SunONE and Microsoft .NET are two solutions for building Web Services based on XML and SOAP. The choice of platform depends on the existing legacy systems and on business concerns. Both [Maier 02] and [Vawter 01] give detailed comparisons between SunONE (J2EE) and Microsoft .NET; such comparisons are analogous to the battle among CORBA, RMI and DCOM (COM+). Although each has its pros and cons, such as language independence versus platform independence, SunONE seems to win more support from developers and vendors. After all, Java is the de facto standard for Internet applications, and SunONE provides a better legacy integration method that considers both existing hardware and software systems, which is central to the business strategy for system evolution. In the proposed approach, the legacy system is wrapped using EJB. As part of the Java 2 Platform, Enterprise Edition (J2EE), EJB plays an important role in Sun's DART (Data, Applications, Reports, Transactions) architecture. The Enterprise JavaBeans (EJB) standard is a component architecture for deployable server-side components in Java: an agreement between components and application servers that enables any component to run in any application server. Physically, EJB consists of a specification of rules and a set of Java interfaces. EJB components implement business logic and run as server-side components [Ambler 02]. Apache Cocoon is an XML-based publishing framework that provides server applications such as XML transformations, model evolution, and SoC-based (Separation of Concerns) design. By analysing the processing instructions embedded in XML documents, Cocoon processes and formats the data and returns it to the client. The supported output formats include PDF, XML, VRML, etc., and can be controlled by changing transformation rules [Apache 02]. The EJB components, servlets and Cocoon are combined into a Model-View-Controller paradigm, shown in Figure 4.
This paradigm is derived from [Ambler 02], with the JSP replaced by Cocoon.

Figure 4. Request/response flow on the J2EE server: servlets receive the request and call the EJB components in the business layer; Cocoon renders the response through its producers (e.g. File Producer), processors (XSLT, XSP, LDAP) and formatters (HTML, Text, PDF).
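The Model-View-Controller split in Figure 4 can be sketched without any container. The sketch below is a plain-Java stand-in with assumed names: `handleRequest` plays the servlet controller, `businessLookup` plays an EJB session bean, and `formatAsText` plays a Cocoon formatter. No servlet or EJB APIs are used, so the structure, not the technology, is what is shown.

```java
// Hypothetical stand-alone sketch of the MVC paradigm in Figure 4.
public class MvcSketch {
    // Business layer: what an EJB session bean would compute.
    static String businessLookup(String orderId) {
        return "status of " + orderId + ": shipped";
    }

    // View: what a Cocoon formatter would render (here, plain text).
    static String formatAsText(String model) {
        return "TEXT[" + model + "]";
    }

    // Controller: what the servlet would do per request.
    static String handleRequest(String orderId) {
        String model = businessLookup(orderId);  // call the business layer
        return formatAsText(model);              // apply a chosen formatter
    }

    public static void main(String[] args) {
        System.out.println(handleRequest("A17"));
        // TEXT[status of A17: shipped]
    }
}
```

Swapping `formatAsText` for a PDF or VRML formatter changes only the view, which is exactly the separation of concerns the Cocoon pipeline provides.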

5 Conclusions

The explosion in network bandwidth, the huge progress of computer hardware and the prevalence of the PC all have profound implications for the yardstick by which software is developed, or, from the point of view of business, for what really drives the choice of method to implement a service. The answer is "endless and quick changes". This paper has addressed this issue by presenting an approach to achieving system evolution. Two of its steps, recovery of the component specs and architecture, and migration to an evolutional system, have been explained. Future research will focus on a detailed taxonomy of Web-based systems and a concrete implementation of this process.

Bibliography

[Weiderman 97] N. Weiderman, L. Northrop, D. Smith, S. Tilley, "Implications of Distributed Object Technology for Reengineering", 1997.
[Tanenbaum 02] A. Tanenbaum, M. Steen, Distributed Systems: Principles and Paradigms, Prentice Hall, 2002.
[Hassan 02] A. Hassan, "Architecture Recovery of Web Applications", 2002.
[Ding 01] L. Ding, N. Medvidovic, "Focus: A Light-Weight, Incremental Approach to Software Architecture Recovery and Evolution", 2001.
[Zdun 02] U. Zdun, "Reengineering to the Web: A Reference Architecture", 2002.
[Ivkovic 02] I. Ivkovic, M. Godfrey, "Architecture Recovery of Dynamically Linked Applications: A Case Study", 2002.
[Anquetil 00] N. Anquetil, T. Lethbridge, "Recovering Software Architecture from the Names of Source Files", 2000.
[Boucetta 99] S. Boucetta, H. Ghezala, F. Kamoun, "Architectural Recovery and Evolution of Large Systems", 1999.
[Tilley 95] S. Tilley, D. Smith, Perspectives on Legacy Systems Reengineering, 1995.
[Tilley 01] S. Tilley, S. Huang, "Evaluating the Reverse Engineering Capabilities of Web Tools for Understanding Site Content and Structure: A Case Study", 2001.
[Abowd 97] G. Abowd, A. Goel, D. F. Jerding, M. M. McCracken, M. Moore, J. W. Murdock, C. Potts, S. Rugaber, L. Wills, "MORALE: Mission Oriented Architectural Legacy Evolution", 1997.
[Cheesman 01] J. Cheesman, J. Daniels, UML Components: A Simple Process for Specifying Component-Based Software, 2001.
[DelphiGroup 02] Delphi Group, "Taxonomy & Content Classification, Market Milestone Report", 2002.
[Laskey 99] K. Laskey, P. Barry, P. Brouse, "Development of Bayesian Networks from Unified Modeling Language Artifacts", 1999.
[Ambler 02] S. Ambler, T. Jewell, Mastering Enterprise JavaBeans, Wiley, 2002.
[Apache 02] http://xml.apache.org/cocoon, 2002.
[Siegel 01] J. Siegel and the OMG Staff Strategy Group, "Developing in OMG's Model-Driven Architecture", 2001.
[Maier 02] H. Maier, "Comparison of the Architectures Sun ONE and Microsoft .NET", 2002.
[Vawter 01] C. Vawter, E. Roman, "J2EE vs. Microsoft .NET", 2001.
