Web-enabling Legacy Systems via Presentation Access

From Webulation to Automation

Mohammad El-Ramly
Department of Computer Science, University of Leicester, Leicester, LE1 7RH, UK
[email protected]

Eleni Stroulia
Computing Science Department, University of Alberta, Edmonton, T6G 2H1, Canada
[email protected]

Abstract

After years of development and billions of invested dollars, legacy mainframe systems have become the lifeblood of many corporations. Today, given the dramatic changes that the Internet revolution has brought to business processes, it has become necessary to make these systems accessible to partners, employees and customers. In principle, there are three different approaches to that end: one can enable access to the legacy system via its data, its logic or its presentation layers. In this paper, we review the state of the art in the last approach, namely Web-enabling via presentation layer access. We discuss both academic research methods and industrial practices, we describe their evolution and their potential, and we comparatively evaluate their pros and cons. Finally, we present CelLEST, a lightweight method we developed for semi-automated reengineering and Web-enabling of legacy user interfaces. We present how this method uses artificial-intelligence algorithms to leverage and advance current manual technology.

1. Introduction

Over years of development and investment, business software systems, such as bank finance systems, customer-relationship management (CRM) systems and airline-reservation systems, grew in size and value. They constitute one of the most important assets for many companies. [19] Corporations have invested substantially in developing these mainframe-based legacy systems and making them Y2K and Euro compliant. [25] In return, mainframe-based legacy systems have proven reliable and scalable in meeting business-critical processing needs, especially for applications involving huge numbers of transactions and simultaneous users, like banking and airline-reservation applications. Moreover, many of the business processes and policies of companies are encapsulated in the logic of legacy systems. For many corporations, legacy systems will remain their Information Technology (IT) backbone for years to come. [3]

When the Internet emerged as a medium for conducting business, it created new opportunities for growth and cost reduction. This gave birth to a new research area, “Web-enabling legacy systems”, which develops methods for opening existing legacy software for access through the Internet, an extranet and/or an intranet to the public and/or to a selected user base of employees, customers, partners, etc. Separately, industry developed its own solutions as well.

A legacy software application, in general, consists of three layers: the presentation layer, the program logic layer and the data and data access layer. A Web application can access a legacy system via one or more of these layers, depending on the available legacy and Web technologies. This requires reproducing the legacy business processes and data in new formats and presentations for the old and new users [18], and possibly re-architecting the legacy application.

The challenge is that many legacy systems were developed using the technology of the 1970s to mid-1980s. They have been modified many times by different programmers and frequently are poorly structured and documented. Separation between data, logic, and presentation layers rarely exists. They were not designed with integration in mind. As a result, they have become complex and difficult to understand, maintain, renovate, reengineer, and/or integrate with other systems. In short, they have aged. [20,27]

A main weakness of mainframe legacy applications is their user interfaces (UIs). They fall short in three areas: user access, usability and navigation. [4]

User Access. Most legacy systems are proprietary monolithic systems that were not designed for integration with the Web or other technologies. Usually they do not have a clear separation between their presentation, logic and data layers. This makes opening a legacy system for access via a new platform, or for front-end integration with other systems, a hard task.

Usability.
Legacy text-based UIs have limited display capabilities, and are non-intuitive and hard to learn. They were adequate for their time but dissatisfy today’s users, who are used to graphical user interfaces (GUIs) and Web interfaces. Training new users is slow and costly.

Navigation. The limited presentation capabilities of legacy text-based UIs impose tedious navigation patterns to accomplish user tasks. E.g., flipping through a multi-page report may require using function keys or issuing commands to move between the many screens of the report, whereas in a GUI environment a scroll bar gives instant access to any report page with a mouse click.

In many cases, it is desirable to migrate the UI of a legacy system to a GUI platform or to the WWW, or to integrate it with other legacy front-ends. Often, however, the performance of the legacy system and its hosting platform is satisfactory, and the owner organization is unwilling to alter its code or structure in any way due to the cost and risk involved. Because invasive solutions are generally undesirable and often unnecessary, there is a need for lightweight non-invasive UI reengineering methods to support the Web-enabling of legacy systems. Web-enabling via presentation access falls in this category. It has long been practiced by industry but is almost ignored by academia.

In this paper, we aim to “bridge the gap” between industry and academia by giving a detailed discussion of this class of Web-enabling solutions. We compare this approach with the alternatives; we describe its evolution, its state of the art and its future directions. We describe our work in the CelLEST UI reengineering project [22,24] in the Software Engineering Research Lab at the University of Alberta to leverage the current industry practices, minimize the labor needed to implement them, and develop the next generation of this technology. We employed a combination of document analysis, feature extraction, clustering, user action modeling, visualization, data mining, task model inference, XML wrapping and automated GUI layout to help develop an intelligent semi-automated lightweight method for legacy system Web-enabling and UI reengineering. Furthermore, we have implemented this method in a prototype environment and evaluated it through a set of real-world case studies.

This paper is organized as follows: Sections 2 and 3 briefly review the other existing Web-enabling technologies, i.e., Web-enabling via data and logic access. Section 4 discusses Web-enabling via presentation access, its evolution, state of the art, pros and cons. Section 5 is an overview of the CelLEST semi-automated method for UI reengineering. Section 6 briefly reports on the evaluation and limitations of the CelLEST method, along with some conclusions.

2. Web-enabling via Data Access

In this class of solutions, a Web application directly accesses the legacy database, processes the legacy data and presents it to the Web browser clients. The Web application includes all the data-processing logic required. This logic may duplicate the original legacy-application logic, if the intended Web users are the same as the users of the legacy system, or it may be completely different, targeting a new user group. The legacy data must be wrapped in order to be accessed using a different interface or protocol from the one for which the data was initially designed. This requires using data access middleware, i.e., database gateways and bridges.

A database gateway translates between two data-access protocols. A number of de facto industry standard database gateways exist. For example, the Open Database Connectivity (ODBC) from Microsoft can be invoked inside Active Server Pages (ASP) or programs in C/C++, Perl, VB, etc. The Java Database Connectivity (JDBC) by Sun is designed for database-independent connectivity between Java applications and a wide range of SQL databases. The JDBC API can be invoked inside a Java program, applet or servlet, or a Java Server Page (JSP). Finally, the ODMG is the standard of the Object Data Management Group for persistent object storage. A bridge translates one standard protocol to another, such as the JDBC-ODBC bridge. [9]

Web-enabling via data access is a straightforward solution, appropriate for integrating multi-source legacy data into new applications. At the same time, it has considerable drawbacks. First, any important legacy logic, e.g., data validation and business rules, is bypassed and must be duplicated in the Web application. This incurs high cost in development and maintenance. Second, it increases the data coupling between the legacy and Web applications. It is most suitable when the Web application logic is simple, e.g., Web-enabling the services of a library system. [2,21]
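To make the first drawback concrete, the following is a minimal sketch of data-level access, not taken from any tool discussed in this paper. It assumes an in-memory SQLite database as a stand-in for the legacy data store reached through a gateway, and the table layout, the `MAX_LOAN_DAYS` rule and all function names are hypothetical. The point it illustrates is that the Web application talks to the data directly, so a rule the legacy program already enforces has to be written a second time.

```python
import sqlite3

# Hypothetical business rule that the legacy program is assumed to enforce already.
MAX_LOAN_DAYS = 21


def open_legacy_db() -> sqlite3.Connection:
    """Stand-in for a gateway connection to the legacy database (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE loans (book_id TEXT, member_id TEXT, days INTEGER)")
    return conn


def record_loan(conn: sqlite3.Connection, book_id: str, member_id: str, days: int) -> None:
    """Insert a loan directly into the legacy table.

    The legacy validation code is bypassed by data-level access,
    so the rule is duplicated here in the Web application.
    """
    if not 1 <= days <= MAX_LOAN_DAYS:
        raise ValueError(f"loan period must be between 1 and {MAX_LOAN_DAYS} days")
    conn.execute("INSERT INTO loans VALUES (?, ?, ?)", (book_id, member_id, days))
    conn.commit()


def loans_for_member(conn: sqlite3.Connection, member_id: str) -> list:
    """Read legacy data for presentation in a Web page."""
    cursor = conn.execute(
        "SELECT book_id, days FROM loans WHERE member_id = ? ORDER BY book_id",
        (member_id,),
    )
    return cursor.fetchall()
```

If the legacy rule ever changes, both the legacy program and `record_loan` must be updated, which is the maintenance cost described above.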

3. Web-enabling via Logic Access

Web-enabling via application logic access relies on the availability of a mechanism to access the business logic independently of the user-interface-related code. This can be accomplished in different ways:

Code Access. If the legacy business logic is implemented separately from the presentation logic, which is rarely the case, then it is theoretically possible to insert a thin control layer that accepts the data extracted from the client browser’s HTTP request and invokes the appropriate legacy module. The business logic subroutine processes the request and gives back the results, which are then forwarded to the client. [25]

API Access. Packages like SAP, PeopleSoft, etc. offer APIs that can be accessed via Java Native Interface (JNI) or Common Gateway Interface (CGI) code. But in-house developed applications rarely have defined APIs. [2]

Distributed Object Technology (DOT) extends object technology to the net-centric information systems of modern enterprises by using object middleware, e.g., OMG's CORBA and Microsoft COM+. The idea is to objectify (or objectize) the legacy system by creating an OO interface to individual applications, common services and business data, which makes the legacy software components “look” like objects. They can then be accessed by other applications across a network through the OO interface. [9,21,29] This is quite an invasive reengineering solution. The main challenge is objectifying the legacy system, i.e., analyzing, decomposing, and then translating the monolithic plain semantics of the (in most cases) procedural legacy system to the richly hierarchic structured semantics of an object-oriented (OO) system. [9] Several methods have been developed for this task. [6] The effort needed depends on the language, style and architecture used to develop the legacy system.

Component wrapping is a natural extension to DOT. In contrast with objects, components must conform to a component model. This constraint enables the component framework to provide the components with quality services. An example of a server-side component architecture for writing reusable business logic and portable enterprise applications is Enterprise JavaBeans (EJB) from Sun Microsystems. EJB is the basis of Sun's Java 2 Platform, Enterprise Edition (J2EE). EJB components are written entirely in Java and run on any EJB-compliant server. Each bean encapsulates a piece of business logic. EJB servers provide system-level services such as transactions, security, threading, state management, resource pooling, distributed naming, and remote invocation. [9,14] EJB can wrap existing legacy system functions and offer them as operating-system- and platform-independent components. As with DOT, the legacy system needs to be componentified by separating its interface into modules consisting of logical units or functions. [9]

DOT and component wrapping approaches can be integrated by generating CORBA wrappers for the legacy components and then using EJB to develop an application server to integrate them and offer their services in the context of Web applications. [29]

4. Web-enabling via Presentation Access

In contrast to the data and logic access approaches, Web-enabling via presentation access is non-invasive and almost risk-free. This approach, in its simple form of Web emulation (webulation), emerged quite early. Gradually, webulation evolved into the more advanced screen-scraping technology that takes advantage of the GUI capabilities of Web browsers. The next generation was screen mapping, which can reengineer the legacy UI using complex manipulations of the legacy data streams used for communication between the legacy host and the legacy terminals. This allows remodeling the legacy UI into the form-based style natural to the UIs of Web applications, and possibly integrating it with other legacy UIs or Web applications. The method that we have developed in the CelLEST project advances this screen-mapping approach further: it essentially automates it using intelligent algorithms to learn the legacy-system behavior and then build the models needed for screen mapping. These models are used to automatically build a working Web-based GUI or abstract GUI specifications.

More often than not, the legacy application is accessed via its presentation layer, represented by the legacy outbound and inbound data streams. In fewer cases, the legacy UI is accessed via its presentation description if one exists, e.g., CICS maps for IBM mainframe systems or AS/400 Data Description Specification (DDS) screen definitions. In either case, access is limited to the data and operations offered via the application presentation.

4.1 Web Emulation (Webulation)

Web Emulation, or Webulation [4], emerged as the natural extension of the long-practiced legacy host emulation to the WWW. The basic innovation is that the emulator runs in a Web browser or a Web server. Browser displays have the native look and feel of the host legacy screens. Transactions work exactly the same as on a legacy host “green screen” terminal, e.g., IBM 3270, by returning one screen display for one input request. Full support of legacy functions and user customization of colors and fonts are normally available. Additionally, icons for function keys, copy/paste, macro recording, file transfer and other basic operations are provided. [1,5]

Webulation is relatively quick and inexpensive and does not need any Web application development. It offers instant access to the legacy application to intranet and extranet users who are already familiar with the legacy system. But it does not make the legacy system any easier to use for new users. [5] Additionally, it does not allow tailored Web access that targets different user groups with different sets of UI functionality. It solves the accessibility issue, but it does not improve the usability or navigability of the legacy system.

A typical implementation of webulation uses Java applets downloaded into the client Web browser. The applet runs in the Web browser's Java runtime environment and establishes a connection with a Telnet server that manages access to the host application. [5,26]

4.2 Screen Scraping (Refacing)

Screen scraping takes webulation a step further by offering enhanced one-for-one browser presentation of the legacy UI. [3] It reads the data stream intended for, e.g., the mainframe terminal, either via a client-based terminal emulator (Java applet) or a server-based program, and turns it into a Web-based GUI presentation. A legacy screen can be translated to a Web-based GUI either on the fly or using a user-defined customization. See Figure 1.

In the first case, a middleware layer is placed between the Web front-end and the legacy software to act as a presentation translator, intercepting outbound legacy displays and converting them on the fly into a Web front-end using whatever information is available. For IBM 3270, this is done by converting the unprotected (input) fields into edit control objects and turning the other fields into labels. Some tools convert predefined strings like F1, …, F24 into buttons. The resulting user interface is a slight improvement over webulation.
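The on-the-fly conversion rule just described can be sketched as follows. This is only an illustration under stated assumptions: the `Field` record is a hypothetical, already-parsed view of a 3270-style screen (no real data-stream parsing is shown), and the HTML produced is deliberately minimal.

```python
import re
from dataclasses import dataclass
from html import escape
from typing import List


@dataclass
class Field:
    """Hypothetical parsed screen field; not a real 3270 library type."""
    row: int
    col: int
    text: str
    protected: bool  # True for output-only fields, False for input fields


# Matches the predefined function-key strings F1 .. F24.
FKEY = re.compile(r"\bF(2[0-4]|1[0-9]|[1-9])\b")


def reface(fields: List[Field]) -> str:
    """Translate one legacy screen into one HTML form (one-for-one refacing)."""
    parts = []
    for f in sorted(fields, key=lambda f: (f.row, f.col)):
        text = escape(f.text.strip())
        if not f.protected:
            # Unprotected fields become edit controls.
            parts.append(f'<input name="r{f.row}c{f.col}" value="{text}">')
            continue
        key = FKEY.search(f.text)
        if key:
            # Function-key hints become buttons that send the key.
            parts.append(f'<button name="key" value="{key.group(0)}">{text}</button>')
        else:
            # Every other protected field becomes a label.
            parts.append(f"<label>{text}</label>")
    return "<form>" + "".join(parts) + "</form>"
```

Because the rule uses only field attributes and fixed strings, it needs no per-screen configuration, which is also why the result stays close to the original screen layout.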

In the second case, more substantial improvement can be achieved. In this case, an individual customization is created for every screen to radically change its appearance and take advantage of the presentation potential of the target platform. So, legacy screens appear “dressed up” in Web-based GUIs, with widgets, lists, buttons, images, Web links, check and choice boxes, colors, fonts, etc. Also, one can reorder data fields, change the tab sequence and hide unnecessary data. [1,4,5] Some screen-scraping tools can create context-sensitive field help and lists of valid values for data fields. [5] To reface a number of legacy screens fast, some tools use customization templates, which contain elements common to all the Web-based screens replacing the legacy screens, e.g., a logo, a Web link, a customized tool bar, etc. [15] The operation of the application is still one-for-one, i.e., one browser request equals one legacy screen display. [1]

To know which customization to apply to a screen instance (snapshot), the screen-scraping application should recognize the identity of the instance using a predicate or signature for every screen. Typically, this signature is based on some unique keyword(s) that appear on the screen at some location(s). Some tools offer rich pattern definition languages for the application builder to define a signature. [5,7] Such languages allow specifying that a specific text must exist or not exist at a certain location or within an area on the screen, and/or that it must be of a certain case or color, or must match a hard-coded value. They may allow combining multiple pattern expressions with logical operators. In many cases, this technique is sufficient to recognize the screens’ identities. But in other cases, such as unstructured and multi-mode screens, it is quite challenging to discover an appropriate signature. E.g., some host applications have more than one mode for the same screen, such as Create, Review, or Update modes, with the same structure and appearance but with slight differences in the status of some fields. Each mode needs a separate signature and customization. In state-of-the-art practice, a screen signature is manually defined and hard-coded for every screen by an expert analyst, who is very familiar with the legacy system, the pattern language available and its supporting tools.

Screen scraping takes advantage of the Web presentation capabilities; however, it does not benefit from its enhanced navigability. So it enhances accessibility and usability but not navigability of the legacy system.

Another screen-scraping approach is to access the legacy presentation at a level lower than the UI, or the data streams used to construct it. This is the level of screen maps or descriptions, if they exist, e.g., CICS maps for S/390 systems or Data Description Specifications (DDS) source files for AS/400 systems. Some products extract the data necessary to build an HTML or XML form from CICS maps instead of IBM 3270 data streams.
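A screen signature of the kind described above can be sketched as a small set of composable predicates. The sketch below is not the pattern language of any particular product; the predicate names, the screen names and the sample texts are hypothetical, and a snapshot is assumed to be available as a list of text rows.

```python
from typing import Callable, Dict, List, Optional

Screen = List[str]                      # one snapshot, as rows of text (assumption)
Predicate = Callable[[Screen], bool]


def text_at(row: int, col: int, text: str, case_sensitive: bool = True) -> Predicate:
    """True if `text` appears exactly at (row, col)."""
    def check(screen: Screen) -> bool:
        if row >= len(screen):
            return False
        found = screen[row][col:col + len(text)]
        if case_sensitive:
            return found == text
        return found.lower() == text.lower()
    return check


def text_in_area(top: int, bottom: int, text: str) -> Predicate:
    """True if `text` appears anywhere in rows top..bottom (inclusive)."""
    return lambda screen: any(text in line for line in screen[top:bottom + 1])


def all_of(*preds: Predicate) -> Predicate:
    return lambda screen: all(p(screen) for p in preds)


def not_(pred: Predicate) -> Predicate:
    return lambda screen: not pred(screen)


# Two modes of the same screen need two signatures (hypothetical example).
TITLE = text_at(0, 10, "COUNTER SERVICE")
PROMPT = text_in_area(1, 2, "Confirm New Member")
SIGNATURES: Dict[str, Predicate] = {
    "client_create": all_of(TITLE, PROMPT),
    "client_review": all_of(TITLE, not_(PROMPT)),
}


def identify(screen: Screen, signatures: Dict[str, Predicate]) -> Optional[str]:
    """Return the single matching screen name, or None if zero or several match."""
    matches = [name for name, pred in signatures.items() if pred(screen)]
    return matches[0] if len(matches) == 1 else None
```

Returning `None` when several signatures match makes ambiguous signatures visible during testing instead of silently applying the wrong customization.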

4.3 Screen Mapping (Remodeling)

Screen mapping (remodeling) is a natural extension to screen scraping. It enables substantial reengineering of the legacy UI. It enables fairly extensive modifications to the sequence of information presented to the user by combining several screens into a single GUI presentation. It offers many-for-one browser presentation of the legacy UI. [3] So, multiple host screens related to a certain user task are combined in one Web form that represents the user task more naturally in the Web world. This greatly enhances the usability and navigability of the system, while maintaining the back-end host navigational flow. It is possible to partially apply screen mapping to a legacy system by reengineering the frequent or cumbersome user tasks of choice, while using webulation and/or screen scraping for the rest of the system. The steps of a screen mapping solution are:

1. Build a behavioral model for the portion of the legacy system UI to be reengineered,
2. Build a plan for each user task to be reengineered,
3. Build/buy the middleware needed to mediate between the legacy back-end and the Web front-end, and
4. Build a Web-based GUI, which executes the user task plans via the host access middleware.

The behavioral model is a state-transition graph whose nodes (states) represent the screens of the legacy system. Each screen is distinguished by a predicate or a signature that uniquely identifies its instances (snapshots). Each arc on the graph is a permissible user action for this screen that causes a transition to another screen.

A task plan is a description of how to open sessions with the legacy host, gather data, complete transactions, and close sessions to accomplish a user task. Typically, this includes what user actions are needed to navigate the legacy UI in service of the user task, what inputs need to be passed to the legacy application on which screens and at which locations, and what outputs will be retrieved from which locations on which legacy screens.

A host access middleware executes the task plans by driving the necessary navigation via the legacy host, passing the user inputs received from the Web-based GUI to the legacy application, and collecting the required outputs to feed the Web-based front-end. This middleware uses terminal access protocols such as VT100, 3270 or 5250 to communicate with the legacy system via a “virtual terminal”. Data are moved in and out of the legacy host via its UI as if data entry personnel were flawlessly entering them. [10] Such middleware can be built with EJB beans, Java servlets, or similar approaches.

The Web-based GUI, e.g., HTML or XHTML, presents the reengineered UI to the user, collects the user's input, triggers the middleware to execute the right task plan, and reproduces the task output in HTML or XHTML. CGI scripts, Java Server Pages, or similar technologies are used to collect the user input via the client Web browser.
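The relationship between the behavioral model, a task plan and the host access middleware can be sketched as below. Everything here is an assumption made for illustration: the screen names, keystroke notation and field coordinates are hypothetical, and `FakeHost` merely simulates a virtual-terminal session by following the model, whereas a real middleware would speak 3270, 5250 or VT100 to the host.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Position = Tuple[int, int]  # (row, column) on a legacy screen

# Behavioral model: screen -> {permissible user action -> resulting screen}.
MODEL: Dict[str, Dict[str, str]] = {
    "menu": {"1<ENTER>": "search"},
    "search": {"<ENTER>": "result", "F3": "menu"},
    "result": {"F3": "menu"},
}


@dataclass
class Step:
    """One step of a task plan: fill inputs, read outputs, then send an action."""
    screen: str
    action: str
    inputs: Dict[str, Position] = field(default_factory=dict)
    outputs: Dict[str, Position] = field(default_factory=dict)


class FakeHost:
    """Simulated virtual terminal (assumption); navigates by following MODEL."""

    def __init__(self, model: Dict[str, Dict[str, str]], data: Dict[tuple, str]):
        self.model = model
        self.data = data          # (screen, row, col) -> text shown there
        self.screen = "menu"
        self.typed: Dict[Position, str] = {}

    def type_at(self, row: int, col: int, text: str) -> None:
        self.typed[(row, col)] = text

    def read_at(self, row: int, col: int) -> str:
        return self.data.get((self.screen, row, col), "")

    def send(self, action: str) -> None:
        # Raises KeyError if the action is not permitted on the current screen.
        self.screen = self.model[self.screen][action]


def run_plan(host: FakeHost, plan: List[Step], user_inputs: Dict[str, str]) -> Dict[str, str]:
    """Drive the host through the plan and collect the task outputs."""
    results: Dict[str, str] = {}
    for step in plan:
        if host.screen != step.screen:
            raise RuntimeError(f"expected {step.screen}, host shows {host.screen}")
        for name, (row, col) in step.inputs.items():
            host.type_at(row, col, user_inputs[name])
        for name, (row, col) in step.outputs.items():
            results[name] = host.read_at(row, col)
        host.send(step.action)
    return results


# A hypothetical "look up balance" task spanning three legacy screens.
BALANCE_PLAN = [
    Step("menu", "1<ENTER>"),
    Step("search", "<ENTER>", inputs={"member_id": (5, 20)}),
    Step("result", "F3", outputs={"balance": (7, 20)}),
]
```

A Web form that asks only for `member_id` and displays `balance` would call `run_plan` once, which is the many-for-one presentation described above: three host screens are hidden behind a single request.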

[Figure content: a legacy "COUNTER SERVICE" client screen with numbered text fields (Club, Last N., First N., Address1, Address2, City, Prov/Sta, H. Phone, B. Phone, Contact, Notes, Expiry, Join Yr, Mem Since, Drvs Lic, Pricing) and a "Confirm New Member (Y/N)" prompt; the same screen refaced on the fly; and a customized Web "New Client Form" with required fields marked by *, rate-type choices "Regular ($60)" and "Plus ($75)", and "Next >>" and "Clear Form" buttons.]

Figure 1: A Legacy Screen (left), Refaced On-the-fly (middle) and Refaced Using Screen Customization (right)

Screen mapping has been exclusively an industrial practice, and all its steps are carried out manually. An analyst goes through each screen of the legacy system to find a unique signature for it and to model its behavior in order to build the behavioral (state-transition) model. Then s/he interviews the users about every task to be migrated to the Web and tries to understand all its possible paths and exceptions. Then, s/he manually builds a plan for this task, describing all the user actions needed to perform it, all the inputs to be entered and where they occur, and all the outputs to be collected and from where they are obtained. Next, a developer builds the new Web front-end that executes the user tasks. For each task, s/he designs the necessary GUI and its layout and implements it. If it is required to migrate to a different platform too, e.g., WAP devices [10], the behavioral model and task plans can be reused, but the new UI implementation must be carried out from scratch for the new platform.

Typically, screen mapping allows some flexibility in legacy UI and task modeling to accommodate uncertainty and unexpected events, e.g., it may allow defining [5,16]:

• Global Screens, i.e., screens that are intermittently or randomly displayed and always require the same action, e.g., error screens, system messages, etc.
• Data Looping, which allows repeating an action, e.g., displaying the next page of a query result, until a condition is met.
• Landmarks, i.e., specific labels on a legacy screen that are used for relative identification of information elements (inputs or outputs) on the screen, instead of relying solely on the coordinates of the elements.
• Alternative Paths, i.e., auxiliary navigational paths that may exist for a user task.

A screen mapping solution can consolidate multiple existing applications, without modification, into one single integrated view. [5] Some tools can combine multiple back-end resources, e.g., 3270, 5250, VT, JDBC, etc., under one Web front-end. Thus, new user tasks can be created/automated to eliminate the need to manually transfer data between various back-ends to accomplish a task. This is useful for integrating similar legacy systems at the front-end level, for example after a business merger, or complementary systems in the same organization.

Additionally, screen mapping can be used to slightly extend the legacy system functionality via repurposing or via addition of new business logic to the Web application. Repurposing is using the legacy system for tasks that it was not intended for. E.g., an insurance company offered access to its legacy claims system to a new user group: lawyers, who were not familiar with the system and whose questions the system was not built to answer. But the bits and pieces of information they need can be retrieved through the legacy UI. Using screen mapping, a Web front-end was designed to allow querying the system with lawyers’ queries. Then, an application server middleware executes the relevant task plan, collects the required data and presents it to the lawyer in a suitable way.
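Three of the flexibility mechanisms discussed in this section (global screens, data looping and landmarks) can be sketched with a few helper functions. The sketch assumes that a screen is a list of text rows and that the last row is a status line; the marker strings, the `next_page` callback and the function names are hypothetical and do not come from any specific tool.

```python
from typing import Callable, Dict, List, Optional

Screen = List[str]  # one snapshot, as rows of text (assumption)

# Global screens: marker text -> the action that always dismisses the screen.
GLOBAL_SCREENS: Dict[str, str] = {"SYSTEM MESSAGE": "<ENTER>"}


def global_action(screen: Screen) -> Optional[str]:
    """Return the fixed action for a global screen, or None for a normal screen."""
    for marker, action in GLOBAL_SCREENS.items():
        if any(marker in line for line in screen):
            return action
    return None


def read_after_landmark(screen: Screen, landmark: str, width: int) -> Optional[str]:
    """Landmark: read `width` characters to the right of a label, wherever it is."""
    for line in screen:
        pos = line.find(landmark)
        if pos >= 0:
            start = pos + len(landmark)
            return line[start:start + width].strip()
    return None


def loop_pages(
    screen: Screen,
    next_page: Callable[[], Screen],
    more_marker: str = "MORE",
    max_pages: int = 100,
) -> List[str]:
    """Data looping: collect rows page by page until the status line has no marker."""
    rows: List[str] = []
    for _ in range(max_pages):
        rows.extend(screen[:-1])          # every row except the status line
        if more_marker not in screen[-1]:
            return rows
        screen = next_page()              # e.g. send the paging key to the host
    raise RuntimeError("paging did not terminate")
```

Reading a value relative to its label keeps a task plan working when a field moves to another row or column, which fixed coordinates would not tolerate.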

4.4 Using Presentation Access to Create an API for a Legacy System

Besides Web-enabling, it might be required to package some of the legacy services in a callable API so that new Web-based applications can be built on top of the legacy system, as explained in Section 3. Screen mapping has been used for this purpose: the task plans essentially constitute the distinct callable services, and this new API is accessed by other applications directly, in an EJB architecture or through a distributed object environment. A Web application layer can then be developed to access this new API of the legacy application. [8,9] To do so, one needs to implement only the first three steps of screen mapping, i.e., develop a behavioral model of the legacy UI, a plan for each task to be wrapped, and a host access middleware. The task plans are then executed from inside the new API.

4.5 Advantages, Limitations and Risks

A market survey [3] showed that 60% of the IT staff administrating, maintaining or accessing information from legacy systems use screen scraping or screen mapping technology to integrate legacy systems with other systems. 45% of them batch data to a server for access through a client application, and a similar number modify the host application to suit client access. Out of those who use presentation access technology, 60% use it to avoid changing the host application, 30% cannot change the host application and 44% use it because of its lower cost.

These results summarize the advantages of Web-enabling via presentation access. It is a risk-free, non-invasive, less expensive solution. It needs no change to the legacy application. It is almost the only choice when changing the legacy application is not an option, e.g., due to lack of ownership or unavailability of the source code. It can be applied gradually using a mixture of methods. For example, webulation can give instant Web access to the legacy application for the users familiar with it, while the UI of the complex user tasks is reengineered via screen mapping to provide an easy-to-use HTML front-end to new external users, e.g., students registering for classes or customers placing orders. Finally, presentation access can be used for lightweight integration with other legacy systems and Web applications, or for repurposing and limited functionality extension of the legacy system.

On the other hand, screen scraping and screen mapping are labor-intensive, non-automated processes. The available supporting tools are mostly limited to aiding the manual practices. They do not automate the subtasks involved, which may require a lot of effort and intuition. Additionally, these technologies are appropriate for mature, stable applications. For dynamic applications that need frequent changes as business processes change, keeping the Web application layer up to date with the latest changes incurs high maintenance overhead.

Presentation access of legacy systems is criticized for being slow, as it adds an extra remote layer on top of the legacy system. This depends on the implementation model used. Modern server-side and host-side implementations overcome this deficiency. In a host-side implementation, the host-access and the screen mapping middleware reside on the legacy host, e.g., S/390. A task plan is executed fully on the host, and HTML pages are generated as needed and sent to the user with the task results or to collect inputs.

Another disadvantage of presentation access solutions is their vulnerability to unexpected events related to the host connection behavior, like keyboard lockups, session disconnections, broadcast messages from the host and error messages from the legacy application. [28] Careful analysis and modeling of the legacy text-based UI and the tasks of interest reduces this risk by anticipating as many such events as possible and including a recovery mechanism in the Web front-end application, but this requires more investment and effort.

Finally, presentation access Web-enabling has its limitations. It cannot extend the legacy functionality beyond what is already achievable, directly or indirectly, through the legacy presentation. It only gives access to the data exposed through the legacy presentation.

In practice, a variety of Web-enabling technologies may be used within the same organization due to the wide variety of legacy and Web technologies available and the unique requirements of every Web-enabling project.

5. Automating Screen Mapping in CelLEST

In the CelLEST project [22,24], we used artificial intelligence, software engineering, and other methods to automate the process of “learning” and reengineering legacy UIs as much as possible. Our overall objective has been to develop an intelligent semi-automated lightweight method for legacy system UI reengineering, Web-enabling, and front-end integration. We implemented and evaluated our method in a prototype environment.

The premise of our automated approach is that monitoring the legacy system users while working with the legacy application and recording traces of their interaction (dialog) with the legacy UI can provide the basis for learning how the legacy UI behaves. The recorded traces are used to build, semi-automatically, the models and artifacts required for screen mapping. This includes a behavioral model of the legacy UI (state-transition model), models of the frequent user tasks of interest and an automatically generated Web-based GUI for these tasks. User feedback is used to ensure the correctness and completeness of the models generated.

The most important innovative feature of the CelLEST method is the automation of much of the UI mapping and migration process, including the "learning", modeling and reengineering of the legacy UI and the user tasks of interest. This substantial degree of automation reduces the time and effort required for such migration tasks and, consequently, alleviates the problem of dealing with frequently changing legacy systems, which is quite costly with the traditional manual practices. The second novel feature of the CelLEST method is that it reengineers user tasks into abstract GUI specifications, represented in an XML-based syntax. These specifications are then translated to XHTML for Web access or to WML (Wireless Markup Language) for WAP (Wireless Application Protocol) access, using the appropriate CelLEST interpreter. Hence, the CelLEST approach can accomplish simultaneous reengineering of the same legacy UI to different platforms, using the platform-independent abstract GUI specifications. In the following, we give an overview of the CelLEST process for legacy system Web-enabling and UI reengineering, and show how it surpasses current practices. Figure 2 shows the overall CelLEST process. Tasks T1, T2 and T3 constitute the reverse engineering phase of the process; T4 and T5 constitute the forward engineering phase.

5.1 T1: Legacy Interface Behavior Modeling

The purpose of this task is to build a behavioral model of the legacy text-based UI, in the form of a state-transition model. [11,22,23] Each node (state) of the model corresponds to a screen of the legacy system UI, identified by a unique predicate. The predicates are inferred semi-automatically, using features extracted automatically from every screen snapshot recorded in the traces and a clustering algorithm that groups similar snapshots together and then infers such predicates. The arcs of the state-transition model are inferred automatically, using a pattern-inference algorithm. Each arc models a permissible user action. A prototype tool, LeNDI (the LEgacy Navigation Domain Identifier), was developed to test the methods and algorithms used in this task. LeNDI deals with data transfer protocols that are native block-mode protocols or can be emulated in block mode, e.g., IBM 3270 and IBM 5250, although the concepts implemented in it are applicable to scroll-mode data transfer protocols as well, e.g., VT100. In the sequel, we briefly describe the subtasks performed by LeNDI.

Trace Collection. While the legacy system users are performing their regular tasks, their interaction (dialog) with the legacy UI is recorded in the form of traces or sequences, using a specially instrumented emulator. For block-mode data transfer protocols like IBM 3270, a trace is a series of screen snapshots forwarded by the legacy application to the user’s terminal, interleaved with user actions in the form of sequences of keystrokes performed in response to receiving screen snapshots. Additional information is captured as well. For IBM 3270 data streams, this includes the total number of fields, the number of unprotected fields and the initial cursor position. We call these traces “interaction traces”.

Feature Extraction. [23] To automatically extract legacy screen signatures or predicates, one needs a rich set of features from which to build them. LeNDI employs a variety of document analysis techniques to extract visual and other features from every snapshot. The output of this subtask is a feature vector for every snapshot. These features include:
1. The existence of special system keywords, sentences or information at the top or bottom of the snapshot, e.g., title, code, date, time or page number.
2. The information received with the outbound legacy data streams, e.g., the location and type (protected or unprotected) of 3270 data fields and the cursor position.
3. Snapshot layout features, such as the classification of a snapshot as “general”, “table” or “list”, with some attributes for the last two categories, e.g., the number of columns and rows. Another layout feature is the vertical and horizontal histograms built for the entire snapshot content or for partial content of interest, e.g., numbers.

Snapshot Clustering and Screen Recognition. Snapshot clustering is the process of grouping similar snapshots together for the purpose of inferring their common identity, represented by a signature or predicate that uniquely distinguishes them from other snapshots. Then, given the snapshot clusters, a classifier can be induced that correctly classifies the individual snapshots into their corresponding clusters. This classifier can then be used at runtime to recognize new, previously unseen snapshots as instances of the UI states or screens, and hence to infer which actions are possible on each screen and to which screens they lead. Also, knowing the snapshot identity allows the new reengineered GUI to apply whatever input or output steps of a task plan are relevant to the snapshot, via the host-access middleware. LeNDI employs two clustering techniques. The first is an incremental clustering algorithm. [23] The second is a two-phase top-down clustering algorithm. [11] The user can choose which one to use depending on the system under analysis. The accuracy of the classification depends mostly on the quality of the input traces, i.e., how well they cover the user interface behaviors. It is important to note here that our clustering process is interactive and can be guided by a CelLEST user familiar with the legacy UI. User feedback on misclustered snapshots is used to correct the generated classifier.

[Figure 2: The CelLEST UI Reengineering Process. Labels: Legacy application, Emulator, Traces, T1 Interface Modeling, interface state-transition model, T2 Task Discovery, task patterns and examples, T3 Task Modeling, Task Model, T4 GUI Specification, Web-based GUI.]

User Action Modeling. Action modeling [11] is the process of inferring a model of the user action needed to move the legacy system UI from one screen to another, i.e., from one state of the state-transition model to another. Different styles of user interaction with legacy systems exist, e.g., function-key, menu-driven, command-driven, and form-filling. An action can have several formats; e.g., a command keyword may have multiple synonyms or it may have an equivalent function key. Currently, LeNDI can model the command-driven and function-key styles. LeNDI infers each action model by comparing the instances of this action recorded in the interaction traces and applying a set of rules for command language design. For each action, LeNDI infers its syntax in terms of the function or control key(s) used and the command keyword(s) and their arguments. For the arguments, it infers their number, their syntax if any, and whether they are optional or mandatory. The more instances of an action are available, the more accurate its inferred model is. LeNDI’s user can override, rewrite or fix an inferred action model.
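To make the shape of the state-transition model concrete, the following is a minimal sketch, not LeNDI's pattern-inference algorithm. It assumes that every snapshot in a trace has already been classified into a screen ID and that each user action has been reduced to a single string; the function names and the sample traces are hypothetical.

```python
from collections import defaultdict

def build_transition_model(traces):
    """Build {screen: {action: set(next_screens)}} from classified traces.

    Each trace alternates screen IDs and user actions:
    [screen, action, screen, action, ..., screen].
    """
    model = defaultdict(lambda: defaultdict(set))
    for trace in traces:
        # Walk the trace in (screen, action, next_screen) triples.
        for i in range(0, len(trace) - 2, 2):
            screen, action, next_screen = trace[i], trace[i + 1], trace[i + 2]
            model[screen][action].add(next_screen)
    return model

def possible_actions(model, screen):
    """Return the actions observed on a screen (empty if the screen is unknown)."""
    return sorted(model[screen]) if screen in model else []

# Hypothetical traces: screens are numbered, actions are keys or commands.
traces = [
    [1, "ENTER", 2, "PF3", 1],
    [1, "ENTER", 2, "find smith", 3],
]
model = build_transition_model(traces)
```

Here each arc is simply a tally of what was observed; generalizing "find smith" into a command keyword with an argument is the job of action modeling and is not attempted.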

5.2 T2 and T3: Frequent User Task Discovery and Modeling

The purpose of these two tasks is to automate, as much as possible, the process of modeling the frequent user tasks of interest. Hence, T2 and T3 save the labor-intensive work needed to define all possible navigation paths for every task that needs to be reengineered and every piece of data exchange that takes place during the task. This is done by automatically learning from the interaction traces, for every frequent user task, what navigational path is traversed and what type of input is entered on (and output collected from) which location on which screen.

T2: Task Discovery. [12,13] To automate the discovery of the user's frequent interaction patterns with the legacy system, which correspond to the frequent user tasks, two algorithms for sequential pattern mining were developed, called Interaction Pattern Miner (IPM) [12] and Interaction Pattern Miner 2 (IPM2). [13] Both algorithms can discover similar segments of interaction with the legacy system in the recorded traces, even with some noise in the form of spurious irrelevant screens. IPM is a depth-first algorithm, while IPM2 is a breadth-first algorithm. They require the user to define a criterion for interesting patterns, which is used to decide whether a pattern is worth reporting. The criterion includes the pattern's minimum length, its minimum number of occurrences (support), the maximum number of insertion errors allowed in any instance of the pattern, and its minimum score. Given a pattern p, the scoring function is:

score(p) = log2 |p| * log2 support(p) * density(p)

where |p| is the length of p, support(p) is the support of p and density(p) is the ratio of |p| to the average length of the instances of p. Since these instances may include noise, they can be longer than the pattern. For example, the segments {2,4,3,4}, {2,4,3,2,4} and {2,3,4} are instances of the pattern {2,3,4} with at most 2 insertions, where 2, 3 and 4 are the IDs given to the legacy screens by LeNDI. The average length of these instances is (4 + 5 + 3) / 3 = 4, so the density of this pattern is 3/4 = 0.75. IPM and IPM2 discover all maximal patterns in the recorded interaction traces that meet the user’s interestingness criterion. A pattern is maximal if it is not a sub-pattern of another pattern with the same support. After reviewing the discovered patterns, the user decides which ones correspond fully or partially to real frequent user tasks and are not spurious repeated navigations. The instances of each user task are used to build its task model.

T3: Task Modeling. Mathaino [16,17] is another prototype tool of the CelLEST environment. It accomplishes a reverse engineering task, T3, and two forward engineering tasks, T4 and T5. Mathaino uses intermediate platform-independent GUI representations to support simultaneous legacy UI migration to multiple platforms. In T3, the instances of each user task discovered during T2 are analyzed comparatively to construct an abstract model of:
1. the navigational sequence through the legacy system UI to perform the user task,
2. the types of the data entered by the user through the navigation (and of the data displayed to him/her), and the locations where they occur on the legacy screens,
3. the domain of values of the inputs, and
4. the interdependencies among these values.

To analyze and model the information exchange during the user task, examples of the user inputs and outputs are needed. User inputs are already recorded in the interaction traces. But the trace does not record any evidence of the output the user retrieves from each snapshot in the task. So, an expert user needs to manually highlight, on the snapshots of the task instances, the areas that contain the outputs extracted to successfully complete the task. Given the annotated task instances, Mathaino analyzes the flow of information to and from the legacy system to identify the user inputs required to accomplish the task, by studying all the recorded instances of this task. It compares the values used for each input field across all the task instances, and the values of all input and output fields in the same task instance. Then, each data input field is labeled as:
1. Constant, whose value is the same in all task instances;
2. Derived, whose value is obtained earlier in the task from an output field and is used as input later;
3. Redundant, whose value is entered multiple times in the same task;
4. Range, whose value is always one of a limited set of values; and
5. Unpredictable, whose value is independently supplied by the user.

Categorizing input fields leads to a significant reduction in the user input required by the reengineered UI of the task; e.g., the user will not need to enter a derived input, as it will be supplied automatically. Also, it helps in choosing the proper abstract widget type for each input field in task T4. The CelLEST user may inspect the identified pieces of information and give them meaningful names. Additionally, based on a comparative analysis of all instances of the same data field, Mathaino infers the coordinates of this field on the legacy screen it belongs to, in case they are fixed. In dynamic screens, such as free-form screens, Mathaino attempts to discover starting and/or terminating landmarks to use for locating the data field. Finally, if alternative paths exist for a user task or subtask, the branching screens are identified and each alternative path is analyzed as described above. At runtime, the signature of the snapshot received as a result of performing an action on an instance of the branching screen decides which path to follow.

The task model produced in T3 specifies the path on the interface state-transition model through which the user navigates, the flow of information between the legacy application and the user, and the syntax of the interactions through which the information is exchanged. Effectively, it constitutes a declarative and executable specification of the modeled task of the legacy application.
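The five input-field labels used in T3 can be illustrated with a small sketch. It is not Mathaino's analysis: the event format, the order in which the labels are tested, and the `max_range_size` threshold for calling a field "range" are all assumptions made for illustration, as are the field names in the sample data.

```python
def classify_inputs(instances, max_range_size=3):
    """Label each input field of a task from its recorded instances.

    Each instance is an ordered list of (kind, field, value) events, where
    kind is "in" for a user input and "out" for an annotated output.
    """
    values = {}     # field -> every value entered, across all instances
    derived = {}    # field -> True while each value was an earlier output
    redundant = {}  # field -> True while each value was an earlier input
    for events in instances:
        seen_out, seen_in = set(), set()
        for kind, field, value in events:
            if kind == "out":
                seen_out.add(value)
                continue
            values.setdefault(field, []).append(value)
            derived[field] = derived.get(field, True) and value in seen_out
            redundant[field] = redundant.get(field, True) and value in seen_in
            seen_in.add(value)

    labels = {}
    for field, vals in values.items():
        distinct = set(vals)
        if len(distinct) == 1 and len(vals) > 1:
            labels[field] = "constant"
        elif derived[field]:
            labels[field] = "derived"
        elif redundant[field]:
            labels[field] = "redundant"
        # Assumed heuristic: few distinct values, and some value repeats.
        elif len(distinct) <= max_range_size and len(distinct) < len(vals):
            labels[field] = "range"
        else:
            labels[field] = "unpredictable"
    return labels

def make_instance(customer, account, report):
    """Hypothetical account-lookup task instance."""
    return [
        ("in", "menu", "3"),
        ("in", "customer", customer),
        ("out", "account_shown", account),
        ("in", "account", account),
        ("in", "customer_again", customer),
        ("in", "report_type", report),
    ]

instances = [
    make_instance("smith", "A17", "full"),
    make_instance("jones", "B02", "brief"),
    make_instance("lee", "C55", "full"),
    make_instance("kim", "D90", "brief"),
]
labels = classify_inputs(instances)
```

With these labels, only `customer` and `report_type` would have to be requested from the user of the reengineered UI; the remaining inputs can be filled in from the model or from earlier screens.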
Given values for all the “unpredictable” pieces of information identified, the model can be used to drive the legacy application and execute the modeled task.
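The T2 scoring function and the density example given earlier in this section can be checked with a few lines of code. This is a sketch of the formula only, not of IPM or IPM2; in particular, treating an instance as a segment that contains the pattern as a subsequence, with a bounded number of extra screens, is an assumption about how insertion errors are counted.

```python
import math

def is_instance(pattern, segment, max_insertions):
    """Assumed test: pattern is a subsequence of segment with few extras."""
    if len(segment) - len(pattern) > max_insertions:
        return False
    remaining = iter(segment)
    # "x in remaining" advances the iterator, so order is respected.
    return all(x in remaining for x in pattern)

def density(pattern, instances):
    """Ratio of the pattern length to the average instance length."""
    average_length = sum(len(inst) for inst in instances) / len(instances)
    return len(pattern) / average_length

def score(pattern, instances):
    """score(p) = log2 |p| * log2 support(p) * density(p)."""
    support = len(instances)
    return math.log2(len(pattern)) * math.log2(support) * density(pattern, instances)

# The example from the text: screen IDs 2, 3 and 4, at most 2 insertions.
pattern = [2, 3, 4]
segments = [[2, 4, 3, 4], [2, 4, 3, 2, 4], [2, 3, 4]]
instances = [s for s in segments if is_instance(pattern, s, max_insertions=2)]

print(density(pattern, instances))  # 0.75, as stated in the text
print(score(pattern, instances))
```

Because of the two logarithmic factors, the score grows slowly with pattern length and support, while the density factor penalizes patterns whose instances are mostly noise.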

5.3 T4: Generation of Abstract GUI Specification

Mathaino uses model-based UI design heuristics to automatically produce an abstract specification of the new reengineered UI of each task from its task model. Thus, it eliminates the current manual practice of piece-by-piece mapping of the task model to a GUI design. The specification is described in terms of a set of abstract forms, each of which corresponds to a set of screens of the legacy system. Mathaino ensures that all the output fields identified in the task model are displayed on one of the forms. For each abstract form, a related plan for navigating through the legacy screens at runtime is produced. Using the same design heuristics, Mathaino proposes an abstract widget for each input or output field. For example, a field with an enumerated range of values is represented by a combo box or a set of radio buttons, depending on the number of values, and an "unpredictable" variable is represented by a text box. Then, the widgets are laid out on the form in a tabular manner. The user can override the default choices of widgets and layout settings. For example, s/he can change the widget type used for a field or the number of layout columns. After applying user feedback, an XML representation of the abstract specification is produced.
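The widget-selection and tabular-layout step can be sketched as follows. The XML element and attribute names (`form`, `row`, `widget`, `type`, `field`), the `radio_limit` threshold, and the decision to emit no widget for inputs that are supplied automatically are assumptions for illustration; they are not Mathaino's actual schema or heuristics.

```python
import xml.etree.ElementTree as ET

def choose_widget(kind, values=(), radio_limit=4):
    """Pick an abstract widget for an input field, or None if none is needed."""
    if kind == "range":
        # Few alternatives fit on the form as radio buttons; many need a combo box.
        return "radio_buttons" if len(values) <= radio_limit else "combo_box"
    if kind == "unpredictable":
        return "text_box"
    return None  # assumed: constant, derived and redundant inputs are filled in automatically

def build_abstract_form(name, fields, columns=2):
    """Lay the chosen widgets out row by row and return the form as XML."""
    form = ET.Element("form", name=name)
    chosen = [(f, choose_widget(kind, values)) for f, kind, values in fields]
    chosen = [(f, w) for f, w in chosen if w is not None]
    for start in range(0, len(chosen), columns):
        row = ET.SubElement(form, "row")
        for field, widget in chosen[start:start + columns]:
            ET.SubElement(row, "widget", type=widget, field=field)
    return form

# Hypothetical task fields: (name, label from T3, enumerated values if any).
fields = [
    ("customer", "unpredictable", ()),
    ("report_type", "range", ("full", "brief")),
    ("menu", "constant", ()),
    ("branch", "range", ("N", "S", "E", "W", "NE", "SW")),
]
form = build_abstract_form("account_lookup", fields, columns=2)
print(ET.tostring(form, encoding="unicode"))
```

Changing `columns` or overriding the result of `choose_widget` for a field corresponds to the user feedback applied before the XML representation is produced.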

5.4 T5: Runtime GUI Generation

The CelLEST runtime environment consists of two components. The front-end component is the runtime interpreter. It is responsible for interpreting the XML abstract GUI forms on a specific platform. It supplies the widgets of the target platform that most closely match the abstract input widgets. Currently, an XHTML interpreter (for Web-enabling) and a WML interpreter (for WAP-enabling) are available. [17] The back-end component is the host navigator, built over a host-access middleware. The host navigator executes the navigation plans of the abstract forms, passes the inputs to the legacy system and collects the outputs. But first, the XHTML or WML interpreter passes the plan details to the host navigator in a platform-independent format.

The XHTML Interpreter. For Web-enabling, the XHTML interpreter dynamically parses the XML abstract GUI forms at runtime and translates them to XHTML CGI forms. It maps the abstract GUI widgets to the appropriate CGI widgets. It uses XHTML tables to lay out the produced Web page in the format closest to that chosen by the user for the abstract GUI. Also, it parses the CGI response produced by the client Web browser into the platform-independent form needed by the back-end host navigator. It is a server-side component that runs as a Java servlet on the Web server.

The WML Interpreter. WML is a markup language for rendering Web pages on WAP-enabled mobile Internet devices like cellular phones and personal digital assistants. A Web page in WML (also called a WML deck) is limited to a maximum of 1200 bytes. To overcome the device display limitation, a deck can be divided into a number of cards. The device can display only one card at a time. The only input widgets supported by WML are simple text boxes. WML does not support CGI but provides some features that can simulate CGI. The WML interpreter is adjusted to deal with these constraints. For example, it implements an abstract GUI form using several WML decks if 1200 bytes are not enough. It uses a numerical menu to represent "range" input fields. It internally caches the user responses to the multiple decks corresponding to a single abstract GUI form, before submitting them to the host navigator.
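The splitting of one abstract form across several decks can be pictured as a simple packing problem. The sketch below is not the CelLEST WML interpreter: it assumes that each card has already been rendered to a string, that cards are packed greedily in order, and that a fixed `overhead_bytes` accounts for the deck wrapper; all names are hypothetical.

```python
def pack_into_decks(cards, max_deck_bytes=1200, overhead_bytes=0):
    """Greedily group rendered cards into decks that respect the size limit."""
    decks, current, size = [], [], overhead_bytes
    for card in cards:
        card_size = len(card.encode("utf-8"))
        if card_size + overhead_bytes > max_deck_bytes:
            raise ValueError("a single card exceeds the deck size limit")
        # Start a new deck when the next card would not fit in this one.
        if current and size + card_size > max_deck_bytes:
            decks.append(current)
            current, size = [], overhead_bytes
        current.append(card)
        size += card_size
    if current:
        decks.append(current)
    return decks

def merge_responses(per_deck_responses):
    """Combine the cached answers of all decks into one form submission."""
    merged = {}
    for responses in per_deck_responses:
        merged.update(responses)
    return merged

# Placeholder card contents of 500, 500, 500 and 100 bytes.
cards = ["a" * 500, "b" * 500, "c" * 500, "d" * 100]
decks = pack_into_decks(cards)
submission = merge_responses([{"customer": "smith"}, {"report_type": "1"}])
```

Only after the answers to every deck of a form have been merged would the combined inputs be handed to the host navigator, which matches the caching behavior described above.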

6. Reflections and Conclusions

The CelLEST method automates the screen-mapping Web-enabling approach, which has been widely used in the software industry. Like all methods for Web-enabling via presentation access, its applicability is limited by the fact that it cannot substantially extend the functionality delivered by the legacy application. However, an examination of the actual Web-enabling problems reported in [3] shows that this is not a limitation in practice, since presentation-based integration is the preferred approach in the majority of these cases. Therefore, the automation that CelLEST brings to this widespread practice is a substantial innovation with potential impact. On the other hand, the effectiveness of the method depends on the “quality” of the recorded traces and can suffer when the traces are not sufficiently representative of the whole legacy UI behavior or of the user tasks of interest. This potential shortcoming can be addressed with a methodological argument: given that the recording middleware is unintrusive to the legacy users, it can remain installed and recording for long periods of time and on many legacy terminals. Thus, the danger of “missing” services of interest can become minimal. We have evaluated the CelLEST environment with several case studies. Our experience shows that the method is indeed lightweight in terms of the skills it requires: CelLEST users were able to effectively model a legacy system after a short tutorial. In addition, for small- to medium-size legacy applications, with approximately 30 distinct legacy screens, the automatically constructed model is 92 to 98% accurate, requiring a few corrections by the user. [11,16,22,24] In conclusion, we are confident that the CelLEST method constitutes an advancement of the state of the art in Web-enabling legacy systems and that more artificial-intelligence results will be applied to other reengineering and integration challenges to provide novel automated solutions to them.

Acknowledgments

We acknowledge the effort of the CelLEST project team: Bruce Matichuk, Roland Penner, Paul Iglinski and Rohit Kapoor. This work was supported by a collaborative research and development grant, NSERC 215451-98, and a kind contribution from Celcorp.

References

1. Akers, L. Web-enabling legacy applications – an overview for VSE users. VSE/ESA Software Newsletter, IBM, Third/Fourth Quarter, 2000.
2. Ambler, S. Legacy Integration Techniques for Java Applications: How to Reuse Your Legacy Investments within Java applications. IBM developerWorks, Nov. 2000.
3. Attachmate. Repurposing Legacy Applications for the Web: Screen-Based Access in Perspective. A White Paper, Attachmate Corporation, Oct. 2000.
4. Berman, D. and Bregar, K. Don't Replace -- Extend: Why Leveraging Your Legacy Systems Is the Way to Go. Enterprise Systems, June 2001.
5. Braswell, B., Forshay, G. and Martinez, J. IBM Web-to-Host Integration Solutions. Redbooks Series, IBM, Jan. 2002.
6. Canfora, G., Cimitile, A., De Lucia, A. and Di Lucca, G. Decomposing Legacy Systems into Objects: An Eclectic Approach. Information & Software Technology, 43(6), 401-412, 2001.
7. Celcorp. CelEngineer User’s Guide – Evaluation Version 2.0. Celcorp, 1999.
8. Chadha, R. Integration of Web with Legacy Systems Through Java Applets and Distributed Objects. Workshop on Compositional Software Architectures, 1998.
9. Comella-Dorda, S., Wallnau, K., Seacord, R. and Robert, J. A Survey of Legacy System Modernization Approaches. Technical Note CMU/SEI-2000-TN-003, SEI, 2000.
10. Crigler, R. Use Screen Mapping For Wireless Access to Legacy Enterprise Data. Enterprise Application Integration (EAI) Journal, Aug. 2001.
11. El-Ramly, M., Iglinski, P., Stroulia, E., Sorenson, P. and Matichuk, B. Modeling the System-User Dialog Using Interaction Traces. In Proc. 8th Working Conf. on Reverse Engineering (WCRE’8), 208-217, 2001.
12. El-Ramly, M., Stroulia, E. and Sorenson, P. Recovering Software Requirements from System-user Interaction Traces. In Proc. 14th Int. Conf. on Software Engineering and Knowledge Engineering (SEKE’02), 2002.
13. El-Ramly, M., Stroulia, E. and Sorenson, P. Interaction-Pattern Mining: Extracting Usage Scenarios from Run-time Behavior Traces. In Proc. 8th Int. Conf. on Knowledge Discovery and Data Mining (KDD 2002), 2002.
14. Howe, D. (Editor). The Free On-line Dictionary of Computing. www.foldoc.org
15. IBM. Screen Customizer Version 2.0.60: Getting Started. IBM, 1999.
16. Kapoor, R. Device-Retargetable User Interface Reengineering Using XML. Dept. of Computing Science, University of Alberta, Tech. Report TR01-11, Aug. 2001.
17. Kapoor, R. and Stroulia, E. Simultaneous Legacy Interface Migration to Multiple Platforms. In Proc. 9th Int. Conf. on Human-Computer Interaction, Lawrence Erlbaum Associates, vol. 1, 51-55, Aug. 2001.
18. Langan, G. From Legacy to the Web. Enterprise Application Integration (EAI) Journal, Jan. 2000.
19. Liu, Z.-Y., Ballantyne, M. and Seward, L. An Assistant for Re-Engineering Legacy Systems. In Proc. 6th Innovative Applications of Artificial Intelligence Conf., 95-102, 1994.
20. Parnas, D. Software Aging. In Proc. 16th Int. Conf. on Software Engineering (ICSE'94), 279-287, 1994.
21. Ruh, W. A., Maginnis, F. X. and Brown, W. J. Types of Integration. In “Enterprise Application Integration: A Wiley Tech Brief”, John Wiley & Sons, Oct. 2000.
22. Stroulia, E., El-Ramly, M., Iglinski, P. and Sorenson, P. User Interface Reverse Engineering in Support of Interface Migration to the Web. Automated Software Engineering, 10(3), 271-301, 2003.
23. Stroulia, E., El-Ramly, M., Kong, L., Sorenson, P. and Matichuk, B. Reverse Engineering Legacy Interfaces: An Interaction-Driven Approach. In Proc. 6th Working Conf. on Reverse Engineering, 292-302, 1999.
24. Stroulia, E., El-Ramly, M. and Sorenson, P. From Legacy to Web through Interaction Modeling. In Proc. Int. Conf. on Software Maintenance (ICSM), 2002.
25. Sneed, H. M. Accessing Legacy Mainframe Applications via the Internet. In Proc. 2nd Int. Workshop on Web Site Evolution (WSE’2000), 2000.
26. Tan, Y., Lindquist, D., Rowe, T. and Hind, J. IBM eNetwork Host On-Demand: The Beginning of a New Era for Accessing Host Information in a Web Environment. IBM Systems Journal, 37(1), 133-151, 1998.
27. Visaggio, G. Ageing of a Data Intensive Legacy System: Symptoms and Remedies. Journal of Software Maintenance and Evolution, 15(3), 281-308, 2001.
28. Yample, T. Web-based Technologies for User Interface Rejuvenation. In “Web-to-Host Connectivity”, Editors: Guruge, A. and Lindgren, L., CRC Press, 185-197, 2000.
29. Zou, Y. and Kontogiannis, K. Enabling Technologies for Web-Based Legacy System Integration. In Proc. 1st Int. Workshop on Web Site Evolution (WSE’99), 1999.