From Legacy to Web through Interaction Modeling Eleni Stroulia, Mohammad El-Ramly and Paul Sorenson Computing Science Department University of Alberta Edmonton AB, T6G 2H1, Canada {stroulia, mramly, sorenson} @cs.ualberta.ca
Abstract In the context of the CelLEST project, we have been investigating the problem of reengineering and reusing the services provided by legacy applications, running on mainframe hosts. This work has resulted in a suite of methods, based on understanding and modeling the users’ interaction with the legacy-application interface. These methods aim at (a) modeling the behavior of the legacy user interface as a state-transition diagram, (b) recovering specifications for the application’s functions by discovering the users’ tasks as frequently occurring interaction patterns, and (c) constructing new user-interface front-ends to make the recovered legacy functions accessible through the Web. In this paper, we describe the overall process for legacy migration to the World Wide Web, using the CelLEST methods, and we illustrate it with an example case study.
Keywords Legacy application reengineering, legacy interface migration, interaction reengineering.
1. Motivation and Background The nature of software development has always been changing, in response to new programming languages and design paradigms. Today, “web services” is emerging as the new “standard” architectural style. New software applications are being developed, by reusing functionalities of existing applications through the web. This new architectural style and the software lifecycle it implies are extremely attractive because they can effectively address the demands for short development cycles, distributed development and global user base, at the same time. The new applications developed in this style effectively reuse existing software assets to provide new complex value-added services, fast. Of course, in order for this vision of effective reuse-based development to become a reality, a wide base of available web services is required. Today, there are relatively few “native” web services, i.e., applications that were originally developed
Proceedings of the International Conference on Software Maintenance (ICSM’02) 0-7695-1819-2/02 $17.00 © 2002 IEEE
to be accessible from the web through HTTP. Most existing applications have to be “retro-fitted” into web-service-providing applications; among them, legacy applications developed for mainframe systems are a very challenging case.
Legacy systems are the backbone of the data processing and business-process flow of many organizations. They are the operational specifications of business policies for these organizations, and they provide valuable services that could be very useful to the organization’s partners and customers. However, their proprietary interfaces present a big obstacle in any effort to integrate them with other systems or to make them web-accessible. To make the reengineering problem even more challenging, the owner organization’s potential partnerships may be multiple, thus urgently necessitating the web enablement of various services at once, without allowing sufficient time for such projects. What is required, then, is an efficient reengineering process that can export the legacy services of interest to the web, fast and without risking the integrity of the overall legacy application behavior.
In the context of the CelLEST project¹, we have developed a method for reverse engineering an executable specification of a service currently provided by the legacy application; this specification can then be accessed and executed either through a new web-based user interface or by another application. The fundamental novelty of the CelLEST approach is that, unlike traditional reengineering approaches [2, 12, 13], it is not based on legacy-code understanding. CelLEST adopts an interaction-based approach to reverse engineering; instead of understanding the structure of the application from its code, the CelLEST method models the tasks accomplished by the legacy-application users, based on traces of the users’ interactions with the application.
This approach is inspired by longstanding industry practices, referred to as screen scraping, that have been developed independently of almost all software reengineering research. The CelLEST method and “screen scraping” are similar in that they both expose some aspects of the legacy interface to the new interface front-end and they also use the legacy interface to drive the underlying application. However, our method brings a substantial innovation to the traditional screen-scraping process. With screen scraping, a developer has to analyze the behavior of the interface and write code, highly specific to the original application, to extract the information of interest from the legacy user interface and to expose it to the new interface front end. Instead, with the CelLEST method, executable models of the legacy-interface behavior and of the specific tasks of interest are semi-automatically constructed.

¹ The CelLEST project is conducted at the Software Engineering Research Lab of the Computing Science Department, at the University of Alberta.
An additional advantage of the CelLEST process, as compared to code-based reengineering methods, is that it is fairly low risk, since it adopts a “wrapping”, as opposed to a “grafting” [6, 14, 17], approach to forward engineering. “Grafting” approaches localize the desired application modules and directly “call” them from external applications. Instead, the CelLEST process constructs a new interface front end. This interface stands as a proxy between new web users (or applications) and the original legacy application, “translating” between HTTP (the protocol through which the proxy is accessible to its users) and the proprietary protocol through which the original interface and the legacy application communicate. By not modifying the code of the original application, this latter approach minimizes the risk of violating its internal coherence.
The CelLEST process is applicable when the single objective of the migration process is to make services, currently provided by transaction-based legacy systems, available to the web, without any modification. If some change to the functionalities of the existing legacy application is necessary, the CelLEST method is not appropriate. In such cases, code understanding and modification are necessary.
The rest of this paper is organized as follows. Section 2 describes, at a high level of abstraction, the steps of the CelLEST legacy-interface migration process, the information required and manipulated by these steps and the tools we have developed to support them. Section 3 discusses the role of the user in the context of the CelLEST method. Section 4 illustrates the process with an example and discusses briefly the results of our experiments with the CelLEST toolkit to date. Finally, Section 5 concludes with a summary of the contributions of this work and our plans for future research.
2. The Interaction-based Migration Process
The overall CelLEST process, shown diagrammatically in Figure 1, consists of the following steps:
1. collecting interaction traces between the users and the legacy application (by the emulator shown in the bottom-right of Figure 1);
2. modeling the behavior of the legacy user-interface, as a state-transition model (Task T1 in Figure 1);
3. discovering task-execution examples as instances of sequential patterns, frequently occurring in the traces (Task T2 in Figure 1);
4. modeling the services of the legacy application, by analyzing the information-exchange behavior of the collected examples (Task T3 in Figure 1); and
5. specifying new web-based user-interface front-ends for the identified services (Task T4 in Figure 1).
Figure 1: The CelLEST process. (Diagram labels: Legacy application; Emulator; traces (Definition 1); T1 interface modeling; interface state-transition model (Definition 2); T2 task discovery; task patterns and examples (Definition 3); T3 task modeling; task model; T4 GUI specification; GUI.)
2.1 Trace collection
The primary input of the overall CelLEST process is a set of system-user interaction traces, collected by a specially instrumented emulator. The legacy-application users perform their tasks with the application as usual; however, instead of using their regular software terminals, they use the special emulator. This tool serves as a regular software terminal, but in addition, it records the screen snapshots forwarded by the legacy application to the user’s terminal and the user’s actions, such as key-presses and cursor movements.
Definition 1 $Trace_n = s_1\,(key_i\; s_i)^{*}$, $i = 1 \ldots n$, where
1. $n$ is the length of the trace,
2. $s_i$ is the $i$th screen snapshot received at the user’s terminal, and
3. $key_i$ is the sequence of key-presses issued by the user at snapshot $s_{i-1}$ that caused the application to send the next snapshot $s_i$ to the user’s terminal.
According to Definition 1, a recorded trace consists of a sequence of snapshots of the screens forwarded by the legacy application to the user’s terminal and the user reactions, such as key-presses, between every two snapshots.
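As an illustration of Definition 1, a recorded trace could be held in a structure like the one below. This is only a minimal sketch: the class name `Trace`, its methods, and the use of plain strings for snapshots and key sequences are assumptions made for the example, not the format used by the CelLEST emulator.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Trace:
    """A trace s_1 (key_i s_i)*: a first snapshot followed by (keys, snapshot) pairs."""
    first_snapshot: str
    steps: List[Tuple[str, str]] = field(default_factory=list)

    def record(self, keys: str, snapshot: str) -> None:
        """Append the key-presses issued by the user and the snapshot they produced."""
        self.steps.append((keys, snapshot))

    def snapshots(self) -> List[str]:
        """Return all snapshots in the order they were received."""
        return [self.first_snapshot] + [snap for _, snap in self.steps]

    def triples(self) -> Iterator[Tuple[str, str, str]]:
        """Yield (s_{i-1}, key_i, s_i) for every recorded step."""
        previous = self.first_snapshot
        for keys, snap in self.steps:
            yield previous, keys, snap
            previous = snap


# Hypothetical usage with made-up screen contents and key names.
trace = Trace("LOGIN")
trace.record("3<ENTER>", "MENU")
trace.record("<PF3>", "LOGIN")
```

The `triples` view is convenient because the later modeling steps reason about a snapshot, the keys typed on it, and the snapshot that followed.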
2.2 Legacy-interface behavior modeling
The objective of the second step in the CelLEST process is to extract a model of the system-user interaction, based on the collected traces. State-transition models have been traditionally used to specify the dialog between the user and the application through the user interface, for the purposes of model-based interface development and evaluation [16]. In our work, we have adopted state-transition models as the target abstract representation of the legacy-interface reverse-engineering process. The intuition behind this step is that the collected traces represent “walks” through the underlying interface-behavior state-transition model of the legacy application. Using these walks as examples, then, the underlying model can be inferred. The legacy-interface model produced in this phase is a directed graph:
Definition 2 $UI_{model} = (States_{UI}, Transitions_{UI})$
1. $States_{UI} = \{St_i,\ i = 1 \ldots \#States\}$, $s \in Trace_n \Rightarrow \exists\, St \in States_{UI}: instance\_of(s, St)$,
2. $Transitions_{UI} = \{(St_{source}, St_{destination})\}$, $(St_{source}, St_{destination}) \in Transitions_{UI} \Rightarrow \exists\, (s_{i-1}, key_i, s_i) \in Trace_n \wedge instance\_of(s_{i-1}, St_{source}) \wedge instance\_of(s_i, St_{destination})$
As shown in Definition 2, each snapshot in the recorded trace is an instance of some distinct state in the abstract interface model. Furthermore, for each transition in the model there must exist at least one keystroke sequence in the recorded trace that leads from a snapshot, which is an instance of the transition’s source state, to another snapshot, which is an instance of the transition’s destination state.
In the overall context of the CelLEST method, the user-interface model is used as a driver of the legacy-interface behavior. Based on this model, the new front-end user interface will, given a new screen snapshot, recognize the state in which the legacy interface is and execute the necessary actions to go to the next state. The issue becomes how to identify the distinct interface states in which the legacy interface can be, and how to construct a snapshot classifier that is capable of recognizing new snapshots as instances of these distinct states. To address this problem, CelLEST adopts a snapshot clustering approach. The underlying assumption is that two visually similar snapshots should behave in the same way and therefore should be instances of the same distinct interface state. Thus, clustering visually similar snapshots together should result in clusters of same-state instances. Consequently, based on these clusters, a snapshot classifier can be induced for recognizing new snapshots as behavioral-state instances.
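The relationship between a trace and the model of Definition 2 can be sketched as follows. The function `build_ui_model` and the toy `classify` function (which simply treats the first line of a snapshot as its state) are hypothetical; in CelLEST the classifier is induced from snapshot clusters rather than hand-written.

```python
from typing import Callable, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (previous snapshot, key-presses, next snapshot)


def build_ui_model(
    triples: Iterable[Triple], classify: Callable[[str], str]
) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Derive (states, transitions) from trace steps, given a snapshot classifier."""
    states: Set[str] = set()
    transitions: Set[Tuple[str, str]] = set()
    for prev_snap, _keys, next_snap in triples:
        source, destination = classify(prev_snap), classify(next_snap)
        states.update((source, destination))
        # A transition exists only because some recorded step supports it.
        transitions.add((source, destination))
    return states, transitions


def classify(snapshot: str) -> str:
    """Toy classifier (an assumption): the first screen line names the state."""
    return snapshot.splitlines()[0].strip()


steps = [
    ("LOGIN\nuser:", "jdoe<ENTER>", "MENU\n1 Timetable"),
    ("MENU\n1 Timetable", "1<ENTER>", "TIMETABLE\nWinter"),
    ("TIMETABLE\nWinter", "<PF3>", "MENU\n2 Exams"),
]
states, transitions = build_ui_model(steps, classify)
```

Note that the two `MENU` snapshots differ in content but map to the same state, which is exactly the role the `instance_of` relation plays in the definition.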
Clustering is a generic problem with instances in a variety of application domains. In general, clustering algorithms are either batch, assuming that the complete set of input instances is available at the same time, or incremental, allowing for additional instances to be provided after initial clustering. Incremental algorithms, given a new instance, decide the cluster to which it belongs by evaluating how similar the new instance is to the existing clusters. Batch clustering algorithms are either top-down or bottom-up. Top-down algorithms start with a single cluster and continuously decompose it until a stopping criterion is met. Bottom-up algorithms start with each instance belonging to a cluster by itself and join clusters until a stopping criterion is met. Irrespective of their control flow, all clustering algorithms require a distance (or similarity) metric, on the basis of which to decide whether to split a cluster (in top-down algorithms), or whether to join two clusters (in bottom-up algorithms), or whether a new instance is similar enough to any of the existing clusters (in incremental algorithms). Any such metric depends on a set of features that describe the input instances. The result of clustering is a partition of the entire snapshot set, i.e., a set of non-overlapping clusters such that each recorded snapshot belongs to a cluster. We have explored two clustering algorithms: an incremental algorithm and a top-down algorithm that stops when the number of expected clusters is reached [19]. These two algorithms have different knowledge requirements and each one is preferable under different usage scenarios. Through analysis of several legacy interfaces, we have identified a set of distinguishing screen features [18]. We use these features to transform the recorded trace snapshots into vectors of feature values, which are then input as instances to the clustering algorithms. 
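To make the incremental variant concrete, the sketch below assigns each incoming feature vector to the nearest existing cluster if it is close enough, and otherwise opens a new cluster. The Euclidean distance, the centroid comparison, and the `threshold` parameter are assumptions chosen for illustration; the actual screen features and similarity metric used by CelLEST are described in [18, 19] and are not reproduced here.

```python
from typing import List, Optional, Sequence

Vector = Sequence[float]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def centroid(cluster: List[Vector]) -> List[float]:
    """Component-wise mean of the vectors in a cluster."""
    return [sum(column) / len(cluster) for column in zip(*cluster)]


def incremental_cluster(vectors: Sequence[Vector], threshold: float) -> List[List[Vector]]:
    """Place each vector in the closest cluster within `threshold`, else start a new one."""
    clusters: List[List[Vector]] = []
    for vector in vectors:
        best: Optional[List[Vector]] = None
        best_distance = float("inf")
        for cluster in clusters:
            d = distance(vector, centroid(cluster))
            if d < best_distance:
                best, best_distance = cluster, d
        if best is not None and best_distance <= threshold:
            best.append(vector)
        else:
            clusters.append([vector])
    return clusters


# Made-up two-dimensional feature vectors forming two visually distinct groups.
clusters = incremental_cluster([(0, 0), (0.1, 0), (5, 5), (5.1, 5), (0, 0.2)], threshold=1.0)
```

Because instances are processed one at a time, the same routine can keep absorbing snapshots that arrive after the initial clustering, which is the property that distinguishes incremental from batch algorithms.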
Once a correct partition has been produced, a classifier can be induced [15] from the snapshot clusters that can correctly classify the individual snapshots into their corresponding clusters. This classifier can then be used at run time to recognize new, previously unseen snapshots as instances of the user-interface states. The accuracy of the classifier, and consequently the ability of the new interface front-end to monitor and control the state of the underlying legacy application, depends on two factors. First, it is important that the input traces “cover” the legacy user-interface behaviors, i.e., that enough examples of all screens of the legacy interface have been recorded. Since the emulator is not intrusive, with emulators installed on the terminals of a variety of legacy users for a sufficiently long time, it is fairly simple to record long, and thus sufficiently representative, traces. The second factor affecting the classifier accuracy is the quality of the clustering step. This is why the clustering process in CelLEST is interactive and can be guided by an expert legacy user who is familiar with the legacy interface. An expert legacy user can review the clusters and identify incorrectly classified snapshots, in which case the process continues until all errors are eliminated.
2.3 Task-execution example mining
The end goal of the CelLEST migration process is to make available the services currently provided by the legacy application to an extended set of users and other applications through the web. Therefore, after the overall behavior of the legacy user-interface is modeled, the objective is then to identify various user tasks accomplished with it. Intuitively, every time users perform a task, they go through the same sequence of steps; therefore, similar state-transition sequences in the collected trace should reveal “paths” in the user-interface model, whose traversal corresponds to accomplishing the distinct services offered by the legacy application. Therefore, the CelLEST method adopts a sequential-pattern mining approach to solve the problem of identifying the legacy-application services. Sequential pattern mining is a general problem with instances in a variety of areas, ranging from e-commerce (e.g., transaction behavior analysis) to bio-informatics (e.g., DNA and protein sequence analysis). A variety of algorithms have been formulated for its various instances [1, 11, 3]. In the context of CelLEST’s task-execution pattern mining, a pattern is a sequence of states:
Definition 3 $P = \{St_k, \ldots, St_l, \ldots, St_m\}$, such that there exists a sufficient set of supporting episodes $ep_i$ in the trace, where
1. $instance\_of(ep_i[1], St_k) \wedge$
2. $instance\_of(ep_i[length(ep_i)], St_m) \wedge$
3. $St_j \in P \wedge St_k \in P \wedge St_j \text{ after } St_k \text{ in } P \wedge \exists\, instance\_of(ep_i[j], St_j) \wedge \exists\, instance\_of(ep_i[k], St_k) \Rightarrow ep_i[j] \text{ after } ep_i[k] \text{ in } ep_i$.
As shown in Definition 3, the CelLEST process aims at identifying “interesting” patterns that are ordered sequences of states that have a substantial number of supporting episodes in the trace. An episode in the input trace is considered to be an occurrence of a particular pattern if the first and last snapshots of the episode are instances of the first and last states of the pattern, each state in the pattern corresponds to a snapshot in the episode that is an instance of the state, and the instance snapshots appear in the episode in the same order as their corresponding states in the pattern. Note that there can be snapshots in the episode that do not correspond to any state in the pattern. This is to allow for some degree of noise in the input examples. For example, while executing a task, a user may reach an error state and then recover to continue the task to completion. Such problems result in several spurious states that may appear in some of the examples. Our algorithm [10] can correctly recognize “normal” and “slightly exceptional” executions of a task as related to the same task.
The CelLEST pattern-mining variant also requires as input the specification of an “interestingness” criterion defining the necessary requirements for a pattern to be recognized as a potentially interesting legacy service. This criterion is a function of how long a pattern has to be, how many times it has to occur in the trace, and how many insertion errors these occurrences may include. Given this criterion, the algorithm extracts “maximally” interesting patterns, that is, interesting patterns that are not prefixes of any other pattern with the same frequency.
The discovered patterns are, once again, reviewed by an expert, who can change the criterion to narrow or widen the result set if too many or too few patterns were retrieved. The expert can choose a group of patterns that meet the desired interestingness criterion and compact it by removing any pattern that is a sub-pattern of another pattern, even if it is maximal. This makes the results easier to comprehend. Finally, the user reviews instances of each pattern to see if they correspond fully or partially to a real user task, executed with the legacy interface.
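The notions of a supporting episode and of the interestingness criterion can be illustrated with a small checker. This is not the mining algorithm of [10]; it is only a sketch of the matching predicate, assuming a trace is already a list of state identifiers, that patterns are non-empty, and that the criterion is given by three hypothetical parameters (`min_length`, `min_support`, `max_insertions`).

```python
from typing import Optional, Sequence


def match_episode(
    pattern: Sequence[int], states: Sequence[int], start: int, max_insertions: int
) -> Optional[int]:
    """Match `pattern` in order from states[start], skipping at most `max_insertions`
    spurious states. Return the index just past the episode, or None if no match."""
    if states[start] != pattern[0]:
        return None
    pos, insertions = start + 1, 0
    for wanted in pattern[1:]:
        # Skip states that do not belong to the pattern (noise), within the budget.
        while pos < len(states) and states[pos] != wanted:
            pos += 1
            insertions += 1
            if insertions > max_insertions:
                return None
        if pos == len(states):
            return None
        pos += 1
    return pos


def count_support(pattern: Sequence[int], states: Sequence[int], max_insertions: int = 0) -> int:
    """Count the positions in the trace at which a supporting episode starts."""
    return sum(
        1
        for start in range(len(states))
        if match_episode(pattern, states, start, max_insertions) is not None
    )


def is_interesting(pattern, states, min_length: int, min_support: int, max_insertions: int) -> bool:
    """Apply a length / frequency / noise criterion to one candidate pattern."""
    return len(pattern) >= min_length and count_support(pattern, states, max_insertions) >= min_support


# One clean execution followed by one execution with a spurious state (99).
trace_states = [3, 4, 5, 6, 5, 4, 3, 3, 4, 5, 99, 6, 5, 4, 3]
task = [3, 4, 5, 6, 5, 4, 3]
```

With no noise allowed only the clean execution counts; allowing one insertion also recognizes the second, “slightly exceptional” execution as an instance of the same task.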
2.4 Task analysis
Given a task-execution pattern, all its occurrences in the traces are collected. These pattern instances constitute essentially “demonstrations” of a user task and a corresponding legacy service that needs to be migrated to a web-enabled platform. The objective of this phase is to analyze these examples, in order to construct a semantic model of the underlying task. The user-interface model constructed by the interface-behavior-modeling phase is syntactic [16]; it describes the visual appearance of the interface screens and the transitions between them. In comparison, the objective of this task-analysis phase is to model the pieces of information exchanged between the user and the legacy interface, their inter-dependencies, and the elementary interactions that enable this information exchange on the current interface.
To that end, the pieces of information input by the user to the interface and the pieces of information displayed by the interface to the user must be identified first. Examples of the former are already available; all key-presses are already recorded in the trace by the emulator. To identify the latter, an expert user highlights on the snapshots of the task examples the areas that contain information that the user needs to receive in order to successfully complete the task in question. Given the set of annotated task-specific traces, the objective of the task-analysis step becomes to construct a detailed model of each individual interaction, in terms of
1. whether it is a user input or data display,
2. what is its syntax, i.e., where on the screen it occurs, how long the displayed data is or what is the syntax of the command input to the system,
3. how it depends on other interactions that occur in the context of the task, and
4. what are the domain-specific semantics of the information exchanged through this interaction.
The CelLEST process adopts a pattern-based learning approach to precisely model the syntax of the interactions. Localizing where on the screen the interaction takes place is fairly simple when the screen is static. For example, in screens such as navigation menus or input forms, the interaction happens at the same absolute coordinates of the screen. These coordinates are discovered as constants across all instances of the interaction in the input task-execution examples. In more dynamic screens, such as free-form data-display screens, the task-analysis method attempts to discover static “landmarks” on the screen, such as labels, relative to which the interaction remains static.
To address the command-language learning subproblem, CelLEST also adopts a pattern-based learning approach [19]. Based on research in human-computer interaction, the design of command languages follows a fairly standard pattern [16]: the syntax of each command usually consists of the command name, a set of option specifications, and a set of parameters. CelLEST assumes this high-level pattern, and uses the actual commands, as they appear in the collected traces, as examples to refine it into individual patterns, each one specifying the syntax of an individual legacy-interface command.
Having learned where on the interface data is input and displayed, the specific values of the data exchanged in the individual example traces are examined to generate hypotheses about what are the distinct types of information exchanged during the plan execution. Depending on the values input to and acquired from the application during the task execution, the following types of information dependencies can be discovered:
1. Constants: an information entity, whose value is the same in all the corresponding actions of all the example instances of the task execution;
2. Range: an information entity, whose value varies within a well-defined range for all example traces, i.e., in all these traces few and equally frequent values have been encountered;
3. Derived: an information entity, whose value is obtained through an information-acquisition action early in the plan and is subsequently provided to the system through an information-input interaction;
4. Redundant: an information entity, whose value is provided as input to the system through multiple information-input interactions; and
5. Unpredictable: an independent information entity that has to be input by the user; unpredictable entities are essentially the “true” user inputs required for the performance of the task in question.
Finally, the CelLEST user may inspect the identified pieces of information and name them with meaningful names, compose them in objects of domain-specific types and define their relations. The task model produced by this phase specifies (a) the path on the interface state-transition model through which the user navigates, i.e., the pattern underlying all the examples on the basis of which this task has been modeled, (b) the flow of information between the legacy application and the user, and (c) the syntax of the interactions through which the information is exchanged. Effectively, it constitutes a declarative and executable specification of the modeled task of the legacy application. Given values for all the “unpredictable” pieces of information identified, the model can be used to drive the legacy application and execute the modeled task.
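The five dependency types listed in this section lend themselves to a simple rule-based illustration. The sketch below is an assumption-laden simplification, not Mathaino’s analysis: it takes the values observed for one input interaction across the example instances (one value per example), and the names `classify_entity`, `displayed_earlier`, `other_inputs`, and the cut-off `max_range_size` are all hypothetical.

```python
from typing import Optional, Sequence


def classify_entity(
    values: Sequence[str],
    displayed_earlier: Optional[Sequence[str]] = None,
    other_inputs: Sequence[Sequence[str]] = (),
    max_range_size: int = 5,
) -> str:
    """Guess the dependency type of one input interaction from its per-example values."""
    values = list(values)
    distinct = set(values)
    if len(distinct) == 1:
        return "constant"          # same value in every example
    if displayed_earlier is not None and list(displayed_earlier) == values:
        return "derived"           # value was shown by the system earlier in each example
    if any(list(other) == values for other in other_inputs):
        return "redundant"         # same value is typed in another input interaction
    if len(distinct) <= max_range_size and len(distinct) < len(values):
        return "range"             # few values that recur across examples
    return "unpredictable"         # a true user input


# Made-up example values for a timetable-lookup task.
kinds = [
    classify_entity(["T", "T", "T"]),
    classify_entity(["W02", "S02", "W02", "S02"]),
    classify_entity(["a1", "b2", "c3"], displayed_earlier=["a1", "b2", "c3"]),
    classify_entity(["x", "y", "z"], other_inputs=[["x", "y", "z"]]),
    classify_entity(["COMP101", "MATH222", "PHYS131"]),
]
```

Only the entities that end up as “unpredictable” would need to appear as input fields in a generated front-end; the others can be supplied by the task model itself. The “equally frequent” condition of the Range type is not checked in this sketch.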
2.5 Web-based user-interface specification
The model constructed in the task-analysis phase constitutes the specification of an interesting legacy service. It also provides the basis for the abstract specification of a GUI that wraps the system’s behavior during the execution of the task at hand. CelLEST automatically generates an abstract specification of a form-based user interface that will act as its front-end. The resulting abstract user interface is specified in XML and is executable by a set of custom-built run-time interpreters. The interpreters are responsible for implementing the interface specification with the interaction toolkit available on their platform and translating the information provided by the front-end user to a sequence of calls similar to the ones that the user’s interaction with the terminal-emulator interface used to generate. At this point, we have developed interpreters for translating the abstract specification into XHTML, thus enabling its access through XHTML-enabled browsers, and into WML, to make it available to WAP devices such as cell-phones and PDAs [8, 4]. These two platforms have widely varying requirements and we have chosen them in order to demonstrate the flexibility of the CelLEST migration process. Furthermore, the task model, in addition to being executed by a user through a new user-interface front-end, can also be executed in response to a request from an external application. In this manner, services provided by legacy applications can effectively be integrated to provide more complex value-added services [5, 9].
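The idea of interpreting an abstract XML interface specification on a particular platform can be sketched as below. The XML vocabulary used here (`form`, `input`, `output`, and their attributes) is entirely hypothetical, since the paper does not give CelLEST’s actual schema, and the renderer only produces a bare XHTML form; it does not drive a legacy application.

```python
import xml.etree.ElementTree as ET

# Hypothetical abstract specification: one user input and one displayed result.
SPEC = """
<form task="timetable-lookup">
  <input name="course_code" label="Course code"/>
  <output name="schedule" label="Schedule"/>
</form>
"""


def render_xhtml(spec_xml: str) -> str:
    """Translate the (assumed) abstract form specification into an XHTML form."""
    spec = ET.fromstring(spec_xml.strip())
    form = ET.Element("form", {"method": "post", "action": spec.get("task", "")})
    for item in spec:
        paragraph = ET.SubElement(form, "p")
        label = ET.SubElement(paragraph, "label")
        label.text = item.get("label", item.get("name"))
        if item.tag == "input":
            # "Unpredictable" entities become text fields the web user fills in.
            ET.SubElement(paragraph, "input", {"type": "text", "name": item.get("name")})
        else:
            # Displayed entities become placeholders for values read off legacy screens.
            ET.SubElement(paragraph, "span", {"id": item.get("name")})
    ET.SubElement(form, "input", {"type": "submit", "value": "Submit"})
    return ET.tostring(form, encoding="unicode")


html = render_xhtml(SPEC)
```

A second interpreter for another platform, such as WML, would walk the same specification and emit different markup, which is the separation the abstract specification is meant to provide.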
3. The Role of the User in the CelLEST Process
The CelLEST process is semi-automatic: we have developed a toolkit to support the various steps we discussed in this paper. Tasks T1 and T2 in Figure 1 are supported by LeNDI [18, 19, 10]; Tasks T3 and T4 in Figure 1 are supported by Mathaino [7, 8, 4]. All the phases of the CelLEST process are inductive in nature, i.e., they construct models on the basis of examples, and so their results are products of “unsafe” inferences. In the context of the CelLEST method, we address this problem in two ways. The first is methodological; we assume that recording emulators will be provided to all users whose jobs involve the functionalities that need to be migrated. In this way, we expect the collected traces to “cover” all interesting aspects of the legacy user interface, albeit not necessarily the whole user interface. In addition, the CelLEST environment includes the QandA (Questions AND Answers) tool, supporting the reviewing, verification and possibly revision of these results by an expert user. The role of the QandA system is to visualize the intermediate products of the process (screen clusters and task models) so that an expert user, familiar with the legacy interface that is being migrated, can validate or revise them.
Note that the knowledge requirements assumed by the CelLEST process can be met by an expert user of the legacy application. This user should be sufficiently familiar with the legacy interface to be able to recognize whether two snapshots are instances of the same screen (required by the snapshot clustering step) or whether a screen sequence is a meaningful user task (required by the task-mining step) or whether an extracted command-syntax pattern indeed represents the corresponding user action on the interface (required by the task-analysis step). In that sense, CelLEST is a lightweight method that does not require expert software-development skills, such as parser construction for example, that are usually required by more traditional reengineering methods. This simplicity comes, of course, at a cost: this method does not actually change the legacy application, thus it cannot be used for maintenance purposes. It simply enables the access to already developed services from new platforms.
4. An Illustrative Example
In this example we describe an application of the CelLEST process to migrate one of the services of a legacy university information system to the web. The legacy application used is infoMcGill, the IBM 3270 information system of McGill University, publicly available via infomcgill.mcgill.ca. This case study was conducted by a single person, who was familiar with the legacy application in question but would not be considered an expert user. All references to “the user” in the rest of this section describing the case study refer to this person.
4.1 Trace Collection
The user connected to the system and used it to inquire about the timetable information and exam schedules of various courses. The user repeated several variants of the inquiry task with different parameters. Two traces of interaction with infoMcGill were recorded: the first trace was 351 snapshots long and the other 78.
4.2 Legacy-Interface Behavior Modeling
LeNDI was used to extract the feature vector of every recorded snapshot. Through incremental clustering, analysis of the features revealed that a combination of two features was sufficient to uniquely identify the snapshots of the infoMcGill system. These two features are the title in the middle of the second line and the number of the unprotected fields in each snapshot received. If both features match for two snapshots, then these snapshots belong to the same state. If either of them does not match, then the snapshots belong to different states.
The long trace was used to build the state-transition model. Single path incremental clustering [19] was used to cluster similar snapshots together. The model had 20 nodes, i.e., 20 distinct behavioral states. A list of these states is shown in Table 1. The third column is the number of snapshots of every state that was recorded in the traces. One out of 351 snapshots was not placed in its cluster, but in a separate cluster, which results in a redundant node in the state-transition model. The user feedback fixed this error. Then the shorter trace was used to test the model. LeNDI could correctly classify 77 out of 78 snapshots in this second trace. It failed to classify one snapshot and placed it in a separate new cluster, but it did not mis-cluster it by classifying it in a wrong cluster.

Screen ID  Screen Description                          Freq.
1          First Screen                                2
2          List of Systems Menu                        2
3          InfoMcGill System Menu                      38
4          McGill University Timetable Menu            45
5          Winter 2002 Timetable Menu                  47
6          Science 2002 Timetable                      32
7          Examination Schedules Menu                  33
8          Arts & Science Examination Schedules Menu   31
9          Arts & Science Examination Schedules        102
10         Summer Session 2002 Timetable               9
11         Help for InfoMcGill – Commands              3
12         Summer Studies Examination Schedules Menu   3
13         Exams (August) Note                         1
14         Management 2002 Timetable                   27
15         Arts 2002 Timetable                         39
16         Help for InfoMcGill                         2
17         Education 2002 Timetable                    5
18         Classified Ads Menu                         2
19         Classified Ads: Jobs                        4
20         Goodbye                                     2

Table 1: The behavioral states of the infoMcGill application.
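The two-feature rule found for infoMcGill in Section 4.2 (screen title on the second line plus the number of unprotected fields) amounts to a very small classifier. The sketch below assumes a hypothetical `Snapshot` representation in which the screen text is a list of rows and the count of unprotected fields has already been extracted by the emulator; it is not LeNDI’s implementation.

```python
from typing import List, NamedTuple, Tuple


class Snapshot(NamedTuple):
    lines: List[str]         # screen text, one string per row
    unprotected_fields: int  # number of input (unprotected) fields on the screen


def state_key(snap: Snapshot) -> Tuple[str, int]:
    """Return the pair of features that identifies the behavioral state."""
    title = snap.lines[1].strip() if len(snap.lines) > 1 else ""
    return (title, snap.unprotected_fields)


def same_state(a: Snapshot, b: Snapshot) -> bool:
    """Two snapshots belong to the same state only if both features match."""
    return state_key(a) == state_key(b)


# Made-up snapshots: a and b differ in content but share title and field count.
a = Snapshot(["", "   McGill University Timetable Menu   ", "1 Winter"], 1)
b = Snapshot(["02/01/02", "McGill University Timetable Menu", "2 Summer"], 1)
c = Snapshot(["", "McGill University Timetable Menu", ""], 2)
```

Using the feature pair as a dictionary key would give constant-time recognition of the current legacy screen at run time.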
4.3 Task-execution example mining
The input traces were preprocessed by replacing consecutive occurrences of the same state with a single occurrence, followed by the count of the original number of occurrences. LeNDI’s sequential pattern mining algorithm [10] was applied to the input traces several times with various interestingness criteria, until a suitable criterion was found. This criterion specifies that a pattern must be at least 4 states long and must occur at least 4 times. Further, it accepts only patterns with no noise, i.e., all instances of the pattern should be exactly identical.

 #   Pattern                                      Support  Selected
 1   5-4-3-7-8-9-8-7-3-4-5-6-5-4-3-7-8-9-8-7-3    4        √
 2   7-3-4-5-6-5-4-3-7-8-9-8-7-3-4-5              4        √
 3   3-4-5-6-5-4-3-7-8-9-8-7-3-4-5                5
 4   7-3-4-5-6-5-4-3-7-8-9-8-7-3                  5
 5   3-4-5-6-5-4-3-7-8-9-8-7-3-4                  6
 6   3-4-5-15-5-4-3-7-8-9-8-7-3                   4        √
 7   3-4-5-6-5-4-3-7-8-9-8-7-3                    7
 8   5-4-3-7-8-9-8-7-3-4-5                        10
 9   5-4-3-7-8-9-8-7-3-4                          11
 10  5-4-3-7-8-9-8-7-3                            13
 11  3-4-5-15                                     6
 12  7-8-9-8                                      14
 13  3-4-5-14                                     4        √
 14  3-7-8-9                                      14
 15  5-6-5-4                                      8
 16  7-3-4-5                                      11

Table 2: The task-execution patterns discovered in the infoMcGill trace.

Using this criterion, the algorithm discovered the patterns shown in Table 2. The support column shows the number of occurrences of each pattern in the trace. The user reduced the result set by removing patterns that are sub-patterns of other patterns in the result set. This left only the four patterns shown with check marks in Table 2, which are easier to comprehend than the entire result set. Finally, the user retrieved and analyzed instances of each pattern to examine whether it corresponds fully or partially to a real user task. He concluded that four tasks are represented by the patterns marked with √ in Table 2. In the task patterns listed here, a + after a state denotes that consecutive instances of the same state may exist in the original recorded traces.
1. 3-4-5-6+-5-4-3
2. 3-4-5-14+-5-4-3
3. 3-4-5-15+-5-4-3
4. 3-7-8-9+-8-7-3

The first pattern represents the task of finding timetable information of a science course. 7 instances of this pattern were recorded in the traces. Note that since the patterns in Table 2 overlap and pattern 1 above is a sub-pattern of some of them, one cannot get the exact number of instances of pattern 1 from Table 2. The corresponding task starts by accessing state 3, “InfoMcGill Systems Menu”, then state 4, “McGill University Timetable Menu”, and then state 5, “Winter 2002 Timetable Menu”. Patterns 2 and 3 are similar to pattern 1, but the task accomplished is finding timetable information of a management course in pattern 2 and an art course in pattern 3. Pattern 4 represents the task of finding the exam schedule of any course, i.e., science, management or art.

Since entries 1, 2 and 6 show that instances of patterns 1 and 2 are always followed by instances of pattern 4, one can collapse the patterns above into 3 patterns. These more complex patterns correspond to the tasks of finding timetable information and exam schedule for a science, a management or an art course respectively. These patterns are:
1. 3-4-5-6+-5-4-3-7-8-9+-8-7-3
2. 3-4-5-14+-5-4-3-7-8-9+-8-7-3
3. 3-4-5-15+-5-4-3-7-8-9+-8-7-3

The decision regarding which are the most meaningful patterns is subjective. The pattern extraction algorithm simply identifies frequently occurring traversals in the legacy interface. Some of them may be spurious, although the likelihood of this happening can be reduced through repetitive fine-tuning of the interestingness criterion. In the end, however, the selection of the patterns of interest will have to be based on the types of functionalities that have to be migrated.
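The preprocessing, the interestingness criterion and the sub-pattern removal described in this section can be sketched as follows. This is a brute-force illustration under stated assumptions, not LeNDI's sequential pattern mining algorithm [10]: it enumerates every contiguous, exactly matching subsequence (no noise), counts overlapping occurrences, and applies a minimum length and a minimum support of 4. The function names and the sample trace are hypothetical.

```python
from collections import Counter
from itertools import groupby


def collapse_runs(trace):
    """Replace each run of identical consecutive states with (state, run_length)."""
    return [(state, len(list(run))) for state, run in groupby(trace)]


def mine_patterns(states, min_len=4, min_support=4):
    """Count every contiguous subsequence of at least min_len states (exact
    matches only) and keep those that occur at least min_support times."""
    counts = Counter()
    for start in range(len(states)):
        for end in range(start + min_len, len(states) + 1):
            counts[tuple(states[start:end])] += 1
    return {p: c for p, c in counts.items() if c >= min_support}


def is_subpattern(small, big):
    """True if `small` occurs as a contiguous run inside `big`."""
    k = len(small)
    return any(tuple(big[i:i + k]) == tuple(small)
               for i in range(len(big) - k + 1))


def maximal_patterns(patterns):
    """Drop every pattern that is a sub-pattern of another pattern in the set."""
    pats = [tuple(p) for p in patterns]
    return [p for p in pats
            if not any(p != q and is_subpattern(p, q) for q in pats)]


# Made-up trace: state 6 is visited four times (two snapshots per visit),
# each time reached through states 3-4-5 and left through 5-4-3.
raw_trace = [3, 4, 5, 6, 6, 5, 4] * 4 + [3]
states = [state for state, _ in collapse_runs(raw_trace)]
frequent = mine_patterns(states)
tasks = maximal_patterns(frequent)
```

On this made-up trace the only pattern that survives the sub-pattern filter is 3-4-5-6-5-4-3, with a support of 4. Applying `maximal_patterns` to the 16 patterns of Table 2 leaves entries 1, 2, 6 and 13, the four rows carrying a check mark.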
4.4 Task analysis
As described in the previous section, one of the discovered patterns corresponds to the task of “finding timetable information of a science course”. This pattern was identified as representing a user task that should migrate, since students may want to access this information from their home machines. There were 7 instances of this pattern in the trace and these instances were used as the input to the task-analysis phase. In this phase, Mathaino discovered that there is only one independent input necessary, namely the name of the course in question. As can be seen from the analysis table in the bottom right of Figure 2(a), only the input field 0 of the 7th state is “unpredictable”. At this point in the task execution, the user has to input the name of the course s/he is interested in. Note that the same value is also provided as input on the 14th state (this is why the input at that later stage is characterized as “redundant”). All other inputs are “constants”, i.e., they had the same values in all 7 examples analyzed, and their values are shown in the rightmost column of the table.

Understanding the system-user interaction at the level of information exchange enables the optimization of the interaction in a way that is not possible when only the widgets involved in the interaction are modeled.
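The three input categories used in this analysis can be illustrated with a small sketch. It is an assumption about how such a classification could be computed, not a description of Mathaino's implementation: each task instance is taken to be a list of the values entered, in execution order; an input is "constant" if its value is identical in every instance, "redundant" if it always repeats the value of an earlier unpredictable input, and "unpredictable" otherwise. The function name and the sample values are hypothetical.

```python
def classify_inputs(examples):
    """Label each input slot of a task as 'constant', 'redundant' or
    'unpredictable', given several recorded instances of the same task.

    `examples` is a list of instances; each instance is the list of values
    the user entered, in order, and all instances have the same length.
    """
    labels = []
    for slot in range(len(examples[0])):
        values = [instance[slot] for instance in examples]
        if len(set(values)) == 1:
            # Same value in every recorded instance.
            labels.append("constant")
            continue
        # Look for an earlier user-supplied input that this slot always repeats.
        repeats_earlier = any(
            labels[prev] == "unpredictable"
            and all(instance[prev] == instance[slot] for instance in examples)
            for prev in range(slot)
        )
        labels.append("redundant" if repeats_earlier else "unpredictable")
    return labels


# Made-up instances: two menu selections, a course name, and the same course
# name typed again later in the task.
instances = [
    ["2", "1", "course-A", "course-A"],
    ["2", "1", "course-B", "course-B"],
    ["2", "1", "course-C", "course-C"],
]
input_labels = classify_inputs(instances)
```

Under these assumptions, only the inputs labeled "unpredictable" need to be requested from the user in the new front-end; constants can be replayed and redundant inputs can be filled in from the value already supplied.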
4.5 Web-based user-interface specification
On the basis of the information-exchange analysis, Mathaino proceeds to design two forms, as shown in Figure 2(b). The first form (shown to the left) consists of a text-entry box where the user is expected to enter the name of the course s/he is interested in. The second form (shown to the right) contains the two pieces of output information provided by the application in return: the schedule of the course meetings and the schedule of its final exam. Because the timetable information appears in dynamic locations of the screen and is not marked by any static labels, Mathaino retrieves the two result pages in their entirety. The forms are organized in a tabular format. The relative layout of the cells (i.e., labels to the left of the contents, as opposed to on top of them) and the actual labels (i.e., “course”, “class meets at”, and “exam is at”) are chosen by the user.
5. Reflections and Conclusions
In this paper, we described the CelLEST method for reverse engineering the user’s interaction with a legacy application and for wrapping task-specific segments of this interaction with new web-accessible front-ends. This method consists of the following steps. First, system-user interaction traces are collected by specially instrumented, unintrusive middleware. Next, the dynamic behavior of the system interface is reverse engineered, in terms of the screens it presents to the user and the navigation it allows through them. Next, task-specific navigation paths are analyzed in order to extract a model of the user’s task of interest, in terms of the interface navigation and the information exchange it implies. Finally, an appropriate web-based interface is constructed for wrapping this navigation and enabling its execution from a standard web browser.

The CelLEST approach exhibits three important advantages over traditional, code-based reengineering approaches:
1. It is code-independent and it can be applied to reverse engineering the interfaces of legacy systems using a block-transfer mode protocol between the host and its terminals.
2. It constructs a high-level, semantically rich intermediate abstraction of the legacy system behavior to support the interface migration.
3. It is lightweight in terms of the skills it assumes: it requires an expert-user’s understanding of the application to be migrated as opposed to a developer’s understanding. With legacy systems, developed over a long period of time by different people, the former type of knowledge is often available where the latter is not.

We believe that the level of automation enabled by the CelLEST environment promises to bring substantial cost reduction and quality improvement to current state-of-the-art industry practices.
For purposes of web enabling and lightweight system integration, “code understanding” is an expensive and possibly brittle approach to interface reverse engineering and migration. Our experiments with the CelLEST environment indicate that “interaction understanding” is an effective alternative. This method is generally applicable and powerful. Its applicability depends primarily on the generality of the recording component and, secondarily, on the feature set we have developed for recognizing screens. We have successfully tested our recording and feature extraction components with tn3270 and vt100 legacy systems.

More importantly, our interaction reverse engineering method is powerful, in that it constructs a high-level model of the interaction behavior between the legacy system and its users. Traditional approaches to user-interface migration [13] analyze the widgets implementing the legacy interface in order to replace them with functionally similar widgets on new platforms. Instead, our method constructs a model of the interface behavior in terms of behavioral states and possible commands. Thus, instead of replicating the same interaction with different widgets on new platforms, we can encapsulate interesting behavioral segments with new user-interface front-ends on different platforms.

The work reported in this paper is still in progress, but the results of our initial experimentation with the interface reverse engineering, task modeling and interface migration processes are quite promising, and we continue to develop and evaluate this process.
Acknowledgements This work was supported by NSERC, the Natural Sciences and Engineering Research Council of Canada, and ASERC, the Alberta Software Engineering Research Consortium.
References
1. Agrawal, R. and Srikant, R.: Mining Sequential Patterns. 11th International Conference on Data Engineering, pp. 3-14, IEEE Computer Society Press, 1995.
2. Antoniol, G., Fiutem, R., Merlo, E. and Tonella, P.: Application and User Interface Migration From Basic to Visual C++. International Conference on Software Maintenance, pp. 76-85, IEEE Computer Society Press, 1995.
3. Brejova, B., DiMarco, C., Vinar, T., Hidalgo, S. R., Holguin, G. and Patten, C.: Finding Patterns in Biological Sequences. Unpublished project report for CS798G, University of Waterloo, Fall 2000.
4. Stroulia, E., Kapoor, R. V.: Reverse Engineering Interaction Plans for Legacy Interface Migration. 4th International Conference on Computer-Aided Design of User Interfaces, May 15-17, 2002, Valenciennes, France, pp. 295-310, Kluwer Academic Publishers.
5. H. Zhang, E. Stroulia: Babel: An XML-based Application Integration Framework. 14th International Conference on Advanced Information Systems Engineering, 27-31 May, 2002, Toronto, Canada, pp. 280-295, Springer Verlag.
6. Gannod, G., Mudiam, S., and Lindquist, T.: An Architectural-based Approach for Synthesizing and Integrating Adapters for Legacy Software. 7th Working Conference in Reverse Engineering, Brisbane, Australia, November 2000, pp. 128-37, IEEE Computer Society Press.
7. Kong, L., Stroulia E., and Matichuk B.: Legacy Interface Migration: A Task-Centered Approach. In Proceedings of the 8th International Conference on Human-Computer Interaction, pp. 1167-1171, Lawrence Erlbaum Associates, August 1999, Munich, Germany.
8. Kapoor, R. V., Stroulia, E.: Mathaino: Simultaneous Legacy Interface Migration to Multiple Platforms. 9th International Conference on Human-Computer Interaction, 5-10 August 2001, New Orleans, LA, USA, pp. (vol. 1) 51-55, LEA.
9. H. Zhang, E. Stroulia: Babel: Representing Business Rules in XML for Application Integration. (Research Demonstration) 23rd International Conference on Software Engineering, 12-19 May 2001, Toronto, Canada, pp. 831-832, IEEE Computer Society Press.
10. M. El-Ramly, E. Stroulia, and P. Sorenson: Mining System-User Interaction Traces for Use Case Models. 10th International Workshop on Program Comprehension, June 26-29, 2002, Paris, France (to appear), IEEE Computer Society Press.
11. Mannila, H., Toivonen, H. and A. I. Verkamo: Discovery of Frequent Episodes in Event Sequences. Data Mining and Knowledge Discovery, vol. 1, no. 3, pp. 259-289, November 1997.
12. Merlo, E., Gagné, P.Y., Girard, J.F., Kontogiannis, K., Hendren, L.J., Panangaden, P., and De Mori, R.: Reverse engineering and reengineering of user interfaces. IEEE Software, vol. 12, no. 1, pp. 64-73, 1995.
13. Moore, M., Rugaber S. and Seaver, P.: Knowledge-based User Interface Migration. In Proceedings of the 1994 International Conference on Software Maintenance, pp. 72-79.
14. Phanouriou, C. and Abrams, M.: Transforming Command-Line Driven Systems to Web Applications. Computer Networks and ISDN Systems, vol. 29, no. 8-13, pp. 1497-1505, September 1997.
15. Quinlan, J.: C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann, 1993.
16. Shneiderman, B.: Designing the User Interface. Addison-Wesley, 1999.
17. Tucker, K. and Stirewalt, K.: Model based user-interface reengineering. 6th Working Conference on Reverse Engineering, October 1999, Atlanta, Georgia, USA, pp. 56-63, IEEE Computer Society Press.
18. Stroulia, E., El-Ramly, M., Kong, L., Sorenson P., Matichuk B.: Reverse Engineering Legacy Interfaces: An Interaction-Driven Approach. 6th Working Conference on Reverse Engineering, October 1999, Atlanta, Georgia, USA, pp. 292-302, IEEE Computer Society Press.
19. M. El-Ramly, P. Iglinski, E. Stroulia, P. Sorenson, B. Matichuk: Modeling the System-User Dialog Using Interaction Traces. 8th Working Conference on Reverse Engineering, 2-5 October 2001, Stuttgart, Germany, pp. 208-217, IEEE Computer Society Press.
Figure 2: The “Course Timetable” task (a, top): information analysis, (b, bottom): the UI forms developed for the front-end.