Business Rule Extraction from Legacy Code - Computer Software and ...

Business Rule Extraction from Legacy Code

H. Huang*, W. T. Tsai*, S. Bhattacharya+, X. P. Chen*, Y. Wang*, J. Sun*

*Department of Computer Science, University of Minnesota, Minneapolis, MN 55455
+Department of Computer Science and Engineering, Arizona State University, Tempe, AZ 85287

Abstract

Business rules are operational rules that business organizations follow to perform various activities. Over time, business rules evolve, and the software that implements them is also changed. As the encompassing software becomes large and aged, the business rules embedded in it become difficult to extract and understand. Furthermore, the software is often changed without the corresponding documents being updated, and thus the business organization often trusts the code more than any other document. It is possible to use a generic tool to extract business rules; however, this can be an expensive exercise. This paper proposes a tailored solution approach to the business rule extraction problem, which combines variable classification, program slicing, and hierarchical abstraction, among other maintenance techniques. The proposed approach has been implemented as a system and successfully applied to a number of industrial programs. The prototype has been demonstrated at several industrial software maintenance sites since June 1995.

162 0730-3157/96 $5.00 © 1996 IEEE

1. Introduction

Legacy software systems usually contain business rules (e.g., billing regulations, business decisions) that have been coded into the systems over the years. Telephone billing rates are an example of business rules: every time a customer makes a call, the billing program of the phone company is responsible for keeping track of all the relevant information and making the appropriate charges according to the billing rate. Business rules are subject to change as markets and technology change. When an update occurs in the business practices or rules, the corresponding segments of the software must be changed. Over time and numerous update phases, both the software and the text documents become larger and increasingly difficult to understand and maintain. The software programmers gradually tend to focus only on the software and lose confidence in the text documents. This creates a maintenance problem. Our interaction with several large software maintenance workshops, where millions of lines of COBOL programs have been maintained over many years, confirms this situation, which is shown in Figure 1.

Figure 1. The motivation behind business rule extraction. (After a company initiates a business, functional updates and maintenance produce software versions 1, 2 and 3 as time progresses. Initially the software almost matches a well-understood text document; later there is a growing disparity between the text document and the software code, and the text document is little understood.)

In such systems, when a business policy update occurs, the maintainers need to understand the relevant business rules that could (or should) correlate with that particular update. As discussed above, since the programmers have little trust in and understanding of the text documents, this is a difficult task. The maintainers require an assisted or automated approach that can extract business rules from the code. Thus, one of the most frequent software maintenance questions is: is it possible to extract the business rules embedded in the current legacy code so that they can be checked against the rules in the written documents? This is a critical issue, because once the business rules are known, it will be easier for the business organization to develop new rules by modifying the existing ones. Furthermore, business rules often contain useful proprietary information that might not be available in any other form for a variety of reasons, such as the departure of key personnel.

This paper identifies business rule extraction (BRE) as an important problem. It is critical for many business operations, and it is currently an expensive task for the business units that chose to involve large amounts of software in their operations. It is possible to use a generic software maintenance tool to extract business rules; however, this can be an expensive exercise, as briefly discussed in section 2. In section 3, we propose a tailored solution approach to the BRE problem, which combines variable classification, program slicing, heuristics for identifying slicing criteria, multiple representations of business rules, and hierarchical abstraction, among other maintenance techniques. The proposed approach has been implemented as a system and successfully applied to a number of industrial programs. The implementation is discussed in section 4. Section 5 describes an example using the tool.

2. Business rule extraction criteria

We have interacted with several software maintenance organizations about their BRE needs, and we now summarize their requirements.

1) Faithful representation: Any business rules extracted from the code must reflect the true state of the software. This is the most important criterion; if it conflicts with other criteria, it should take precedence. The reason for the critical nature of this criterion is that programmers trust the code more than they trust the associated documents.

2) Multiple representation and hierarchical abstraction: Different people require different representations of business rules. For example, a programmer would like to have business rules represented as code segments, while managers may prefer decision tables, decision trees, and structure charts. Business rules should also be represented hierarchically. Business rules are often rather complex because they must meet various constraints, such as legal, marketing and technology constraints, and it is extremely difficult to trace them without some form of abstraction or decomposition.

3) Domain-specific business policies: Most software maintainers prefer business rules to be expressed in a domain vocabulary that represents the domain concepts of the application. The tool should provide a means to distinguish important concepts, including data and algorithms related to business rules, from other supporting program entities.

4) Human-assisted automation: As legacy systems are huge, it is extremely difficult, if not impossible, to devise a fully automatic tool for BRE. Software maintainers prefer an interactive tool that allows them to extract business rules, simplify their representations, and link them to the code, rather than a black-box tool that generates business rules automatically.

5) Maintenance tool: The extracted business rules should be useful in other software maintenance activities, and they should be maintained together with the software using the same tool. The tool should provide the mapping from any business rule to the code segments that implement it, and vice versa. This capability allows software personnel to focus their attention on only those segments (and functions) of the software that are relevant to a particular update operation.

3. Proposed approach

In general, a business rule can be defined as a function, constraint or transformation of an application's inputs to outputs. This is because a business rule must eventually produce some outputs for its intended users, and it must take some inputs from its users to start the processing. Formally, a business rule R can be expressed as a program segment F that transforms a set of input variables I into a set of output variables O, i.e., O = F(I). For example, Profit = Earnings - Expenses is a simple business rule where Profit is the output, Earnings and Expenses are the inputs, and subtraction is the function.
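As a minimal illustration of the form O = F(I), the Python sketch below stores a business rule as its declared input set I, output set O and function F; each application passes F only the values in I and checks that it produces exactly O. The `BusinessRule` class and its `apply` method are hypothetical names of ours, not part of the system described in this paper; the Profit rule from the text serves as the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet

Values = Dict[str, float]

@dataclass(frozen=True)
class BusinessRule:
    """A business rule R in the form O = F(I) (hypothetical representation)."""
    name: str
    inputs: FrozenSet[str]                # I: input variable names
    outputs: FrozenSet[str]               # O: output variable names
    function: Callable[[Values], Values]  # F: maps input values to output values

    def apply(self, values: Values) -> Values:
        missing = self.inputs - set(values)
        if missing:
            raise KeyError(f"missing inputs: {sorted(missing)}")
        # F only ever sees the variables declared in I.
        result = self.function({name: values[name] for name in self.inputs})
        if set(result) != self.outputs:  # F must produce exactly O
            raise ValueError(f"unexpected outputs: {sorted(result)}")
        return result

# Profit = Earnings - Expenses: I = {Earnings, Expenses}, O = {Profit}.
PROFIT_RULE = BusinessRule(
    name="profit",
    inputs=frozenset({"Earnings", "Expenses"}),
    outputs=frozenset({"Profit"}),
    function=lambda i: {"Profit": i["Earnings"] - i["Expenses"]},
)

print(PROFIT_RULE.apply({"Earnings": 1000.0, "Expenses": 400.0}))  # {'Profit': 600.0}
```

Keeping I and O explicit means a rule can be indexed by any of its input or output variables, which is the property the data-centered approach relies on.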

3.1. Overall approach to business rule extraction

A business rule is usually centered around certain data, either input or output data. For example, in the text documents the simple business rule above may be attached to the output variable Profit, to the two input variables, or to both. Thus, we take a data-centered approach [5] to BRE. Recall that a business rule has the form O = F(I); we can first identify I, O, or both to start BRE. For example, if we wish to extract the business rule that deals with 1-800 calls in a phone billing program, we may first identify the variables that represent the 1-800 phone charges, and then extract the code segments that directly or indirectly manipulate these variables to obtain the business rules related to 1-800 charges. This is the data-centered approach to BRE. Thus, the first step is to identify important variables in the code that can be used to express business rules. In a typical large business application, we can find thousands or even hundreds of thousands of code variables, but only a subset of them are suitable for expressing business rules, namely those that represent domain concepts in the application. We call these variables domain variables, and we need a mechanism to distinguish them from other code variables. Section 3.2 describes our approach to the automatic identification of domain variables. Once we have identified domain variables, we extract the relevant code segments by generalized program slicing (GPS) [4]. To carry out GPS, however, we need to specify the slicing criteria, which include both the variable names and the starting point in the program. Section 3.3 discusses the issues in code extraction using GPS and approaches to handle them. Once we have obtained the relevant code segments, we need to consider how to present them to different users. Section 3.4 presents several presentation schemes that address the rule representation issues discussed in section 2. Figure 2 illustrates the BRE process.

Figure 2. Illustration of the BRE process.

3.2. Domain variable identification

Domain variables can be identified by variable classification techniques [1]. An automatic tool for variable classification was reported in [1, 5]. In this section, we discuss only the heuristic rules for choosing domain variables of interest.

Heuristic rule 1: Select the overall system's input and output variables as domain variables. Inputs and outputs are major characteristics of a software system: the system can be viewed as a black box that maps its inputs to outputs, so they are good starting places to explore it. By tracing the code from input to output, and vice versa, a maintainer can understand the transformations.

Heuristic rule 2: Select the inputs and outputs of each procedure as domain variables. Procedures are the functional components of a system. Most of the business applications we encountered are naturally hierarchical in organization, even though not necessarily well structured, and we should take advantage of the program hierarchy in BRE. Often, the computation between the program inputs and outputs is rather involved, and tracing and representing it can be tedious and challenging. Intermediate variables, such as procedure inputs and outputs, can simplify it and make it more readable.

Example 1: The following business rule is used to compute a student's tuition bill. It is expressed in terms of program inputs and outputs only.

    IND-BILL = CREDITS * 127.50 + (0 or 25) + (25 or 50) - SCHOLARSHIP

This business rule simply says that the individual student bill is computed by summing the tuition, union, and activity fees and subtracting any available scholarship. Using heuristic rule 2, it can be expressed more readably as follows.

    IND-BILL = TUITION + UNION-FEE + ACTIVITY-FEE - SCHOLARSHIP;
    TUITION = $127.50 * CREDITS;
    UNION-FEE = $25 if member, $0 otherwise;
    ACTIVITY-FEE = $25 if credits < 7, $50 otherwise.

3.3. Code extraction using program slicing

After identifying I and O in O = F(I), we can either trace the code manually or use program slicing [8] to retrieve the relevant code segment. The idea is to retrieve the code segment that has a direct or indirect impact on the variables of concern, and nothing else. Program slicing and its extensions [2, 5, 4] are techniques for automatically decomposing programs by analyzing their data flow and control flow. Starting at a given point of the program, program slicing automatically retrieves all the relevant code using data and/or control dependence. Note that program slicing alone cannot obtain business rules, because slicing is like a search engine that will not run until it is given an input, the slicing criterion. Thus, code extraction is divided into two steps: 1) identify the slicing criterion, and 2) perform program slicing.

3.3.1. Identification of slicing criterion

Traditionally, a slicing criterion of a program P is a pair <i, V>, where i is a program statement in P and V is a set of variables referred to at statement i [2]. This is not sufficient for business rule extraction, because it places no constraints on the search space, and hence program slicing often produces slices too large to be manipulated [4]. Furthermore, a business rule may span only a particular portion of a program, for example, from some procedure inputs to an output in the same procedure. Without constraints, program slicing will search the whole program and produce slices that contain irrelevant or unnecessary code [4]. Hence, a slicing criterion must be generalized to accommodate constraints that restrict the search space of slicing. Identification of a slicing criterion thus consists of three parts: 1) the variable set V; 2) the program statement i; and 3) the constraint C. Note that V is determined by the rules in section 3.2. The other two parts are discussed next.

Statement identification: Program statement i may be chosen using the following heuristics.

Heuristic rule 3: The input and output statements of the program are good candidates for starting BRE. The input and output statements of a program usually mark the beginning and the end of a certain computation. By tracing forward from the input statements (or backward from the output statements), we are able to retrieve all the relevant code segments.

Example 2: The following is a COBOL WRITE statement:

    WRITE MOW-PRINTLINE FROM MAIN-LINE1 AFTER ADVANCING 1 LINES.
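The steps of sections 3.2 and 3.3.1 can be tried out on a toy scale. The Python sketch below encodes a hypothetical straight-line version of the tuition computation of Example 1 as statements with defined and used variables, picks domain variables with heuristic rule 1, takes the output statement as the starting point (heuristic rule 3), and computes a backward slice for the traditional criterion <i, V>. It is a simplified illustration, not the GPS tool: it assumes straight-line code, follows data dependence only (no control dependence), and omits the constraint part C. The statement labels S1 to S7 and the bookkeeping statement on RECORD-COUNT are invented for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set, Tuple

@dataclass(frozen=True)
class Stmt:
    label: str            # hypothetical statement label
    kind: str             # "input", "assign" or "output"
    defs: FrozenSet[str]  # variables the statement writes
    uses: FrozenSet[str]  # variables the statement reads

def stmt(label: str, kind: str, defs: Iterable[str] = (),
         uses: Iterable[str] = ()) -> Stmt:
    return Stmt(label, kind, frozenset(defs), frozenset(uses))

# Hypothetical straight-line encoding of the tuition rule of Example 1.
PROGRAM: List[Stmt] = [
    stmt("S1", "input", defs={"CREDITS", "MEMBER", "SCHOLARSHIP"}),
    stmt("S2", "assign", defs={"TUITION"}, uses={"CREDITS"}),
    stmt("S3", "assign", defs={"UNION-FEE"}, uses={"MEMBER"}),
    stmt("S4", "assign", defs={"ACTIVITY-FEE"}, uses={"CREDITS"}),
    stmt("S5", "assign", defs={"RECORD-COUNT"}, uses={"RECORD-COUNT"}),
    stmt("S6", "assign", defs={"IND-BILL"},
         uses={"TUITION", "UNION-FEE", "ACTIVITY-FEE", "SCHOLARSHIP"}),
    stmt("S7", "output", uses={"IND-BILL"}),
]

def io_variables(program: List[Stmt]) -> Tuple[Set[str], Set[str]]:
    """Heuristic rule 1: variables read by input statements and
    written out by output statements are domain variables."""
    inputs = set().union(*(s.defs for s in program if s.kind == "input"))
    outputs = set().union(*(s.uses for s in program if s.kind == "output"))
    return inputs, outputs

def backward_slice(program: List[Stmt], criterion: Tuple[int, Set[str]]) -> List[str]:
    """Backward slice for the criterion <i, V> on straight-line code,
    following data dependence only."""
    i, variables = criterion
    relevant = set(variables)
    labels = [program[i].label]  # the criterion statement itself
    for s in reversed(program[:i]):
        if s.defs & relevant:  # s defines a variable that is still needed
            labels.append(s.label)
            relevant = (relevant - s.defs) | s.uses
    return list(reversed(labels))

_, outputs = io_variables(PROGRAM)
# Heuristic rule 3: start at the output statement.
start = next(i for i, s in enumerate(PROGRAM) if s.kind == "output")
print(backward_slice(PROGRAM, (start, outputs)))  # ['S1', 'S2', 'S3', 'S4', 'S6', 'S7']
```

The bookkeeping statement S5 drops out because nothing it defines reaches IND-BILL, which is exactly the "nothing else" property that section 3.3 asks of an extracted rule.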

We can start backward slicing at the WRITE statement of Example 2. The resulting slice is a business rule that governs the computation of MAIN-LINE1.

Heuristic rule 4: A location that is a dispatch center of the program is a candidate starting point for forward slicing. A dispatch center delegates input data of different types to the corresponding processing units. For a given input, the computation originating from the dispatch center carries the business rules that regulate the processing of inputs of that type. Figure 3 illustrates this situation.

Figure 3. Dispatch center.

Example 3: Consider the following COBOL procedure, excerpted from a sample industrial program. It dispatches input records to E01-PROCEDURE, E11-PROCEDURE, E02-PROCEDURE or E04-PROCEDURE according to the value of the CARD-NO field in the record. Clearly, if we start forward slicing from line 055000, 055200, 055400, or 055800, we are able to separate the program into four parts, each of which handles one kind of input record.

    054600 PROCESS-A-RECORD SECTION.
    054700 PROCESS-A-RECORD-010.
    054800     EVALUATE CARD-NO
    054900         WHEN 'E01'
    055000             PERFORM E01-PROCEDURE
    055100         WHEN 'E11'
    055200             PERFORM E11-PROCEDURE
    055300         WHEN 'E02'
    055400             PERFORM E02-PROCEDURE
    055500         WHEN 'E03'
    055600             CONTINUE
    055700         WHEN 'E04'
    055800             PERFORM E04-PROCEDURE
    055900         WHEN 'E05'
    056000             CONTINUE
    056100     END-EVALUATE.
    056200     PERFORM READ-INPUT.
    056300 PROCESS-A-RECORD-EXIT.
    056400     EXIT.
    056500     SKIP3

Heuristic rule 5: An end point of a procedure is a candidate starting point for backward slicing. Some procedures are dedicated to producing outputs, and there may be multiple occurrences of output events for the same output variable. The sequence of these output occurrences is a business rule that shows how the outputs are organized to form a file, a table, or a screen display. Starting from the end of the procedure, a backward trace along all possible paths can identify all the occurrences of output events with respect to a given output variable and thereby form a slice. The slice is a business rule. Next, we discuss slicing constraints.

Constraints on program slicing: Two types of constraints on slicing are useful for business rule extraction.

1) Depth limit: A depth limit restricts the search space by dependence distance. It can effectively reduce the search space on a dependence graph, and hence reduce the size of a slice.

Example 4: We conducted an experiment on a sample industrial COBOL application from a telecommunication company using our generalized program slicing tool [7]. This application consists of 11 program files and 15K lines of code. We selected some random points in some program files, and then collected the sizes of slices without a depth limit and with a depth limit of 1. The average size of complete forward slices is 191.8 statements, while the average size with depth limit 1 is 3.6 statements. The average size of complete backward slices is 103.3 statements, while the average size with depth limit 1 is 4.7 statements.

2) Boundary: A slicing boundary is a set of points in a program that specify a program region to be sliced. These program points are implemented as a set of marks on the nodes of a control flow graph (CFG), such as a call tree. Program understanding is incremental: a programmer usually focuses on a particular portion of code at a time. The programmer may already understand some functions within this portion; these functions can be treated as intrinsic components that need not be analyzed any further, and can thus be put outside the boundary of the slice.

Example 5: Figure 4 shows a call tree of a program. The slicing boundary specifies three nodes on the call tree. Node A is a start node specifying the current slicing focus; nodes B and C are end nodes indicating the known parts within the focus. Given these boundaries, anything outside the subtree A is not included in the slice, nor is anything under the subtrees B and C.

Figure 4. A slicing boundary.
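Both constraints can be layered on top of an ordinary graph traversal. The Python sketch below, built on hypothetical data, first computes the region allowed by a call-tree boundary as in Example 5, and then slices over a dependence graph breadth-first, so that the depth at which a statement is reached equals its dependence distance from the starting point. Edges point from a statement to the statements that depend on it, so the traversal is a forward slice; a backward slice runs the same traversal on the reversed graph. The procedure and statement names are invented, and keeping the end nodes B and C themselves inside the region (cutting off only what lies below them) is our assumption about how end nodes are treated; this is not the slicing tool of [7].

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set

Graph = Dict[str, List[str]]

def region(call_tree: Graph, start: str, ends: Iterable[str]) -> Set[str]:
    """Procedures inside a slicing boundary: the subtree rooted at `start`,
    keeping each end node but not descending below it (assumption)."""
    end_set = set(ends)
    inside: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in inside:
            continue
        inside.add(node)
        if node not in end_set:
            stack.extend(call_tree.get(node, []))
    return inside

def reverse(deps: Graph) -> Graph:
    """Flip every dependence edge, turning forward slicing into backward slicing."""
    rev: Graph = {}
    for src, dsts in deps.items():
        for dst in dsts:
            rev.setdefault(dst, []).append(src)
    return rev

def constrained_slice(deps: Graph, start: str,
                      depth_limit: Optional[int] = None,
                      allowed: Optional[Set[str]] = None) -> Set[str]:
    """Breadth-first slice from `start`; BFS reaches each node first at its
    shortest dependence distance, so the depth limit is applied correctly."""
    result = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth_limit is not None and depth >= depth_limit:
            continue  # do not expand beyond the depth limit
        for nxt in deps.get(node, []):
            if nxt in result:
                continue
            if allowed is not None and nxt not in allowed:
                continue  # outside the slicing boundary
            result.add(nxt)
            queue.append((nxt, depth + 1))
    return result

# Hypothetical call tree: the focus is A; B and C are already understood.
CALL_TREE: Graph = {"MAIN": ["A", "X"], "A": ["B", "C", "D"],
                    "B": ["B1"], "C": ["C1"]}
# Hypothetical statements, the procedure owning each one, and dependence
# edges (an edge s -> t means that t depends on s).
OWNER = {"A.1": "A", "A.2": "A", "B.1": "B", "B1.1": "B1",
         "C.1": "C", "D.1": "D", "D.2": "D", "X.1": "X"}
DEPS: Graph = {"A.1": ["A.2", "B.1"], "A.2": ["D.1", "X.1"],
               "B.1": ["B1.1"], "D.1": ["D.2"], "D.2": ["C.1"]}

procs = region(CALL_TREE, "A", ["B", "C"])  # procedures A, B, C and D
allowed = {s for s, p in OWNER.items() if p in procs}

print(len(constrained_slice(DEPS, "A.1")))                     # 8
print(sorted(constrained_slice(DEPS, "A.1", depth_limit=1)))   # ['A.1', 'A.2', 'B.1']
print(sorted(constrained_slice(DEPS, "A.1", allowed=allowed)))
# ['A.1', 'A.2', 'B.1', 'C.1', 'D.1', 'D.2']
print(sorted(constrained_slice(reverse(DEPS), "C.1", depth_limit=2)))
# ['C.1', 'D.1', 'D.2']
```

Even on this tiny graph the depth limit shrinks the forward slice from 8 statements to 3, and the boundary removes the statements in X and B1, mirroring on a small scale the reductions reported in Example 4.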

3.3.2. Choosing slicing algorithms

The choice of slicing algorithm depends on the chosen slicing criterion and the structure of the source code.

Heuristic rule 6: Choose forward slicing if the variable in the slicing criterion is an input variable.

Heuristic rule 7: Choose backward slicing if the variable in the slicing criterion is an output variable.

An input is a starting point of dependence, so it does not make sense to choose backward slicing for an input variable. Similarly, an output is an end point of dependence, so forward slicing from an output variable is useless.

Heuristic rule 8: Choose forward slicing if the starting program location is a dispatch center. Since a dispatch center delegates inputs to different processing units based on their type, forward slicing can efficiently separate out the computation involved in processing a particular type of input.

Multiple-pass analysis and recursive slicing: However, backward and forward slicing algorithms alone are not sufficient, because business rules are rather complex. An interactive and iterative slicing process, rather than single-pass slicing, is needed to decouple the rather convoluted computations. Recursive slicing [4] allows previously obtained slices to be further analyzed and decomposed. Traditional program slicing can slice programs only, but slices are usually not programs. Recursive slicing facilitates interactive and iterative program understanding. For example, recursive slicing can also be used to decouple the complex relations between multiple inputs and multiple outputs. Usually, one input contributes to more than one output, and an output depends on more than one input. Recursive slicing can be used in such a way that one first performs forward slicing starting from an input to find all the code affected by the input, and then chooses an output to perform
