IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 18, NO. 12, DECEMBER 1992
A Logic-Based Approach to Reverse Engineering Tools Production

Gerardo Canfora, Aniello Cimitile, Member, IEEE, and Ugo de Carlini, Member, IEEE
Abstract-This paper analyzes difficulties arising in the use of documents produced by Reverse Engineering tools. With reference to inter-modular data flow analysis for Pascal software systems, an interactive and evolutionary tool is proposed. The tool is based on: i) the production of inter-modular data flow information by static analysis of code; ii) its representation in a Prolog program dictionary; iii) a Prolog abstractor that allows the specific queries to be answered.

Index Terms-CASE tools, data flow analysis, logic programming, maintenance, reverse engineering.
I. INTRODUCTION

Over the last few years a large set of tools for software maintenance and re-engineering has been proposed [1], [2]. Wachtel [3] points out that their utility lies in their capability to create and update three basic types of documents:
-documents of the overall system architecture;
-documents on the system data, specifically data flow between modules;
-detailed analysis/design documents.

Unfortunately several difficulties prevent these tools from being widely used. Let us outline three fundamental difficulties.

The first is related to the level of detail. The documents are either too general or too detailed. In the first case the documents are useful only in the initial approach to understanding the system but they are ignored during implementation. In the second case the support to the maintainer is poor because of the large number of details. To overcome these drawbacks, software tools with traceability between summary and detailed documents have recently been proposed. These tools are characterized by graphical representations and a graphical query system. Unfortunately, it is well known that with the present state of the art many problems arise in the automatic layout and navigation through complex graphs [4]. This is especially true for the graphs representing detailed information and relations that the maintainer needs in the implementation phase. This class of graphs is poly-partite and poly-line (see, for example, the dependence graphs [5], [6], the Web graphs [7], the inter-procedural, intra-procedural and inter-modular data-flow graphs, etc.).
The existing layout algorithms are not efficient even for the graphical representation of small programs. Brooks [8] points out that “as soon as we attempt to diagram software structure, we find it constitutes not one but several graphs superimposed one upon another . . . in spite of progress in restricting and simplifying the structure of software they remain essentially unvisualizable.” Recently other authors have made the same observation [9]. The above considerations show that it is both useful and important to explore other approaches that allow efficient responses to specific maintainer requests.

The second difficulty arises from the fact that the reverse engineering (RE) documents lack the data the maintainer needs. The high-level documents are defined by well-known analysis and design methodologies in the software development process. The low-level documents exploit certain techniques developed for implementing compilers and debuggers. They may be precise and useful for software development, but they are less precise and less useful for maintenance. This is also the opinion recently expressed by Forte [10] who says about the existing RE tools: “these tools are closely related to the latest workbenches and debugging tools, and in some cases they are virtually indistinguishable.” The above is certainly true for RE documents which represent a program data-flow. It is also true because the analysis and identification of data relations which are useful in maintenance (see for instance the relations related to the inter-procedural and inter-modular data flow) [11] are still areas of research.

Manuscript received May 3, 1992; revised July 30, 1992. This work was supported by Progetto Finalizzato Sistemi Informatici e Calcolo Parallelo of the CNR (Italian National Research Council) under Grants 89.00052.69, 90.00705.69, 91.00930.69. Recommended by V. Rajlich. The authors are with the Dipartimento di Informatica e Sistemistica of the University of Naples Federico II, Naples, Italy. IEEE Log Number 9204092.
Working in a Software Engineering Laboratory for software maintenance [12], the authors directly experienced the lack of relevant data and relations in RE documents produced by commercial tools and research prototypes. Although RE tools (such as CIA, SCOPE, LINT, LOGISCOPE, VIA/INSIGHT, etc.) produce good reports and information starting from code analysis, they do not produce information important for the maintainer. This problem is well known in the literature and a very interesting discussion about the limits of the above tools is reported in [13].
The third difficulty is the inflexibility of RE tools. These tools produce a predefined set of reports and in some cases furnish answers to a predefined and fixed set of queries. For example, this is true for the previous tools and also for the tools discussed in [10]. These tools are not evolutionary tools, i.e., they cannot be easily tailored and enriched in the operating environment. In fact, the capacity of a tool to evolve in a real-life maintenance environment is essential, as underlined by Maymoto et al. [14] who have even been able to register problems with tools that are based on an AI approach, for example PUDSY [15] and PAT [16]. The requirement of flexibility is also outlined by
Forte [10] who says that the production of a new generation of tools must be pursued by constructing “the tools and methods so that users can modify them, but once altered the tools enforce the methods rigorously.” Basili and Musa [17] also point in this direction and have affirmed that “the new technology must evolve and adapt as we gain experience with its use and continually evaluate its successes and failures.”

The three difficulties discussed above show that the maintainer needs to interact, especially in the implementation phase, with an environment that presents a view of software systems characterized by:
a) a large set of interrelated facts that must be stored in some sort of data base;
b) a query language that allows the extraction of a subset and correlation of different facts;
c) an evolutionary set of general and summary rules that define answers to questions, starting from the facts actually stored.

The logic programming paradigm offered by languages such as Prolog perfectly supports the development of this type of environment on the conceptual level. Unfortunately, while there has been wide use of logic programming in other areas of software engineering [18], one does not find the same amount of research in the maintenance sector. As examples from the literature, note logic-based environments for software development [21], [22], configuration management [19], and testing [20].

As researchers in the “Sistemi Informatici e Calcolo Parallelo” project, the biggest ongoing project in Italy supported by the Italian National Council of Research (CNR) on Information Technology, the authors are involved in research to apply logic programming to software engineering. In this context they have built up experience in the design, implementation and use of an RE tool for maintenance. The goal of the tool is to reconstruct the inter-procedural data flow of a Pascal program, and to reply interactively to queries by the maintainer.
It was decided to work on inter-procedural data flow both because of the lack of traditional tools, and because of the potentially high utility of new tools in this field. The decision to use Pascal is due to the availability of a large amount of existing software written in Pascal, and also to the syntactic-semantic characteristics of Pascal. In particular, the visibility rules between data and program units offer wider experimentation opportunities than other traditional languages.

The next section of this paper defines our philosophy. Sections III-V present the fundamental characteristics of DATA-tool, a prototype based on the use of Prolog. These sections also present some examples of solutions to critical problems. Finally, the conclusion addresses and discusses possible criticisms about performance, usability, and front-end user problems deriving from the use of Prolog in RE tools.

II. REVERSE ENGINEERING, DATA FLOW AND PROLOG

As said above, many of the RE tools used in maintenance produce documents that were designed for forward engineering, are not evolutionary, and are only partially able to produce data and relations that the maintainer needs. The
tools producing structure charts are an example of this. In the past the authors have designed, implemented and experimented with tools for producing structure charts from code [23] and many existing tools on the market produce structure charts by RE (Teamwork/Fortran Rev, Teamwork/C Rev, Battle Map, AISLE, Super CASE, RE/Source, Tek CASE Designer, etc.). It is well known that one of the fundamental merits of structure charts as defined by Yourdon/Constantine, Weinberg et al. [24], [25] is the specification of inter-modular data flow in high level design. In the development process, the designer does not deal with typical data flow problems such as data visibility between modules, aliases caused by binding between actual and formal parameters, declaration and re-declaration of the same variable in different modules, etc. Therefore, data that is relevant for maintenance activities is not represented in a structure chart produced according to Structured Analysis, while it should be shown in a structure chart produced by an RE tool for maintenance.

With reference to the Pascal-like program¹ in Fig. 1(a), the related structure chart is shown in Fig. 1(b) with the inter-modular data flow according to the definitions and notations of the Yourdon/Constantine methodology. Clearly, for a maintainer the structure chart is not sufficiently precise and its contents are ambiguous. The variable x defined in module E is not the same variable x declared in module A and used in module D. When implementing a change that updates x in D or in E, the maintainer must know which module is declaring x. Another question concerns the relationships between module B and x. It is evident that B is a transferring module for x and that the maintainer must know that the updating of x in D does not produce side effects in B. Fig. 1(c) shows a modified Yourdon/Constantine structure chart which is helpful in solving the previous problems.
It is clear in the above example that specific information about the inter-modular data flow is required to support the maintainer, and therefore this information must be represented in structure charts used for maintenance purposes. For instance, referring to the same example in Fig. 1(a), more detailed and precise information useful to the maintainer is represented by the presence or absence of a pathological connection between modules C and D (does module D use the variable x defined by module C and declared in A?). Code analysis reveals that there is no such connection, while the structure chart in Fig. 1(c) cannot answer the question. The previous considerations show the maintainer’s need for tools that not only support the production of general or detailed maintenance-oriented documents of the whole software system, but also provide answers to specific queries during the development of the maintenance process.

¹ In all the examples the extent of the procedures will be denoted by a pair of horizontal lines joined by a vertical line; the words DEC, FORM, USE, and DEF introduce the lists of variables declared, formal, used and defined, respectively. This notation provides less distraction when attempting to produce the inter-modular data-flow. It is useful to emphasize that by Pascal module we mean a function, a procedure, a sequence and/or a nesting of functions and/or procedures. Thus there are no conceptual differences between inter-modular and inter-procedural data-flow, and unless otherwise stated we will use either of the two terms for the same concept.
CANFORA et al.: A LOGIC-BASED APPROACH
Fig. 1. A sample Pascal-like program and related structure charts.

A tool capable of answering specific queries makes it possible to overcome the trade-off between completeness and readability of the documents produced by RE, thus allowing the maintainer to select the set of information useful for the specific operation being carried out. This kind of tool should, therefore, not only present the user with the information recovered from code but, above all, link it all together and process it according to the specific requests of the user. With reference to the inter-modular data flow, it is useful to have an interactive tool which both makes structure charts available at different levels of abstraction to select the desired level of detail, and also supports real time answers to user’s queries about
1) the names of modules declaring a global variable referenced in another module;
2) the list of variables and related declaring modules bound to a formal parameter referenced in a module;
3) the existence of pathological connections to a variable x referenced in a module;
4) the list of modules that reference the same variable for every variable referenced in a module, etc.

In order to answer the types of queries listed above, intra-procedural, inter-procedural, and inter-modular data-flow is needed. This cannot be completely represented on a structure chart. Indeed, any other documents that can collect and represent all such information would be so complex and difficult to navigate that a maintainer would not get sufficiently fast replies to his enquiries. For example, diagrammatic representations use complex graphs with nodes which can represent several entities (procedures, modules, intra-procedural blocks, global data, local data, etc.) and edges that represent different relations (activation between procedures or modules, control links between blocks, definition, declaration and use relations between program units and data, data redeclaration, binding between formal and actual parameters, etc.). Given the complexity of the layout algorithms required to visualize these graphs, they cannot be proposed as working tools for a maintainer in the implementation phase, or to answer his specific queries in real time.

The definition and setting up of RE tools capable of interacting with the maintainer to answer queries and solve problems during the maintenance process first entails representing and collecting the information extracted from code in a language independent fashion, and then defining mechanisms and rules for both expressing the queries and getting answers from the collected information. We propose a logic-based approach to the definition and development of such a tool. The proposal is based on Prolog and entails the use of i) a Prolog program dictionary to collect the information extracted from code; ii) Prolog queries to express the maintainer’s needs; and iii) Prolog production rules for getting answers to user queries starting from the facts collected in the program dictionary.

According to this approach, therefore, the classification of RE tools as extractors and abstractors that has been proposed several times in the literature [28], [35] sees the former as being essentially devoted to filling the Prolog program dictionary by means of code analysis, while the latter consist of sets of Prolog rules that attempt to synthesize higher level information in order to answer the user’s queries.

In order to show the feasibility and usefulness of our approach, in this paper we discuss the design and implementation of a prototype environment for the interactive comprehension of the modular structure of a Pascal software system. This prototype, named DATA-tool, is the result of experience acquired in the implementation of tools supporting the production of high and low level design documents from Pascal code [23], [26]-[28] and solves some of the problems concerning their use in maintenance. The environment consists of two subsystems. The first is an “EXTRACTOR,” which separately analyzes every program module and detects direct relations existing between modules and/or data. These direct relations are translated and collected in a Prolog program dictionary. The second subsystem is an “ABSTRACTOR,” written in Prolog, that produces summary relations existing between modules and/or data, i.e., relations due to module activations, in order to satisfy the user’s queries on program modular architecture and data flow.
The overall environment architecture is described in the next section, while the extractor is described in Section IV; Section V describes the abstractor. As it is not our aim to present a new RE tool but rather to illustrate the fundamental aspects of the logic-based approach to producing RE tools, attention will be focused not on the technological features but on the methodological aspects. As far as the extractor is concerned, we will present and discuss the Prolog facts representing the direct relations, i.e., the structure of the Prolog program dictionary to be extracted from code. As regards the abstractor, on the other hand, we will show how some typical questions can be expressed using summary relations and then translated into Prolog production rules.
III. DATA-TOOL: AN OVERVIEW

DATA-tool is a prototype environment for the interactive comprehension of the modular structure of Pascal systems, i.e., module interconnections and related inter-modular data flow. It has been designed by the Department of Informatica e Sistemistica (DIS) of Naples University for the Software Engineering Laboratory (LIS) of CRIAI (Consorzio Campano per la Ricerca in Informatica ed Automazione Industriale), a research consortium for computer science and industrial automation. LIS is a laboratory for software maintenance and RE.

From a theoretical point of view, DATA-tool is based on an evolution of the inter-procedural and intra-procedural data flow analysis developed in the field of compilers and their optimization [29], [30]. This theoretical approach to modular program architecture and data flow analysis in the RE field is reported in [11]. This paper only gives an outline of the aim of inter-modular data flow analysis in order to discover and summarize the side effects due to module activations. In this way it is possible to understand the information flow in the system and, therefore, the semantic effects of module activations.

Inter-modular data flow analysis aims to derive summary relations [31], [32] existing between data belonging to different program modules, starting from a set of basic relations existing between data belonging to the same module and called direct relations. Summary relations are capable of emphasizing the side effects due to module activations, while direct relations may be easily obtained from the code by separately analyzing every module in a program. Such an approach to inter-modular data flow analysis has allowed us to design and implement DATA-tool as a set of two subsystems; the first is a static analyzer capable of producing direct relations and the second abstracts summary relations starting from direct relations.

The overall architecture of the prototype is shown in Fig. 2, which illustrates the two major subsystems of DATA-tool, i.e., the extractor subsystem and the abstractor subsystem:
a) the extractor subsystem produces the direct relations needed to rebuild the program modular structure and inter-modular data flow and then translates and collects them in a Prolog program dictionary;
b) the abstractor subsystem receives user queries, performs the abstraction process on the collected direct relations and presents the results of the program modular structure and inter-modular data flow analysis.
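Fig. 2. The DATA-Tool structure. [Diagram: Pascal code enters the EXTRACTOR (PARSER, PROCESSOR, TRANSLATOR), which produces the direct relations; the ABSTRACTOR works on the direct relations and returns the results.]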
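The distinction between direct and summary relations can be illustrated with a minimal sketch. It is written in Python rather than in the Prolog used by DATA-tool, and it is only an illustration: the `CALL` pairs are transcribed from the sample program dictionary that the paper gives in Table I, while the function name `activates` and the choice of "direct or indirect activation" as the summary relation are assumptions made for this example.

```python
# Direct relation `call` of the paper's sample program: (caller, callee) pairs.
CALL = {("main", "a"), ("main", "d"), ("a", "b"),
        ("a", "c"), ("c", "b"), ("d", "a")}


def activates(call):
    """Hypothetical summary relation: m1 activates m2 directly or indirectly.

    Computed as the transitive closure of the direct `call` relation.
    """
    closure = set(call)
    changed = True
    while changed:
        changed = False
        for (m1, m2) in list(closure):
            for (m3, m4) in call:
                if m2 == m3 and (m1, m4) not in closure:
                    closure.add((m1, m4))
                    changed = True
    return closure


SUMMARY = activates(CALL)
print(sorted(SUMMARY))
```

Each direct pair can be read off a single module, whereas a pair such as `("d", "c")` only appears once the call chain through `a` is followed, which is the kind of information the abstractor is meant to derive on request.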
In the following pages we will analyze both the extractor and the abstractor in greater detail. In particular, we will describe the program dictionary collecting the direct relations produced by code and we will also deal with the abstraction process using production rules.

IV. THE EXTRACTOR SUBSYSTEM
This subsystem consists of three main parts: a PARSER, a PROCESSOR, and a TRANSLATOR. The PARSER accepts a Pascal source file as input and produces a parse tree; the PROCESSOR processes the parse tree and produces a data structure representing the direct relations on which the abstraction process is founded. Finally, the TRANSLATOR translates the data structure produced by the PROCESSOR into a set of Prolog rules.

Fig. 3 shows a simplified version of the data structure produced by the PROCESSOR. The MODULE TABLE contains the names of the modules in a Pascal software system, and for each module contains the following lists:
-FORMAL PARAMETER LIST of its formal parameters, identified by name and position;²
-DECLARED MODULE LIST of the modules it declares;
-DECLARED VARIABLE LIST of the variables it declares;
-USED VARIABLE LIST of the variables it uses;
-DEFINED VARIABLE LIST of the variables whose values it defines;
-CALLED MODULE LIST of the modules it directly calls.
For each called module, the CALLED MODULE LIST contains:
-ACTUAL PARAMETER LIST of the actual parameters, identified by name and position.

The MODULE TABLE summarizes several direct relations. With $PP$ and $VV$ denoting the sets of modules and variables in a system, and $N$ denoting the set of integers, the MODULE TABLE contains the following relations:

r1) $\mathit{par\_dec} : PP \times VV \times N$, defined as: $(m, x, i) \in \mathit{par\_dec}$ iff module $m$ declares $x$ as the formal parameter in the $i$th position.

² To simplify matters, in this paper we do not consider the type of the data or the kind of parameter exchange. This information is naturally produced and used by DATA-tool.
TABLE I
AN EXAMPLE OF PROGRAM DICTIONARY

mod(main). var_dec(main,x). var_dec(main,y). mod_dec(main,a). mod_dec(main,d). call(main,a). bind(main,a,(x,1)). bind(main,a,(y,2)). call(main,d). use(main,x). use(main,y).

mod(a). var_dec(a,l). var_dec(a,m). par_dec(a,(t,1)). par_dec(a,(z,2)). mod_dec(a,b). mod_dec(a,c). call(a,b). bind(a,b,(t,1)). bind(a,b,(m,2)). call(a,c). bind(a,c,(m,1)). bind(a,c,(l,1)).

mod(b). var_dec(b,u). par_dec(b,(x,1)). par_dec(b,(z,2)). use(b,x). use(b,z). use(b,l). use(b,u).

mod(c). var_dec(c,p). par_dec(c,(q,1)). call(c,b). bind(c,b,(q,1)). bind(c,b,(p,1)). bind(c,b,(x,2)). bind(c,b,(t,2)). use(c,q).

mod(d). var_dec(d,r). var_dec(d,s). call(d,a). bind(d,a,(r,1)). bind(d,a,(s,1)). bind(d,a,(y,2)). use(d,x).

Fig. 3. The Module Table structure. [Diagram: each MODULE TABLE entry points to the lists kept for that module, such as the DECLARED MODULE LIST, the DECLARED VARIABLE LIST and the USED VARIABLE LIST.]
r2) $\mathit{mod\_dec} : PP \times PP$, defined as: $(m_1, m_2) \in \mathit{mod\_dec}$ iff module $m_1$ declares module $m_2$.

r3) $\mathit{var\_dec} : PP \times VV$, defined as: $(m, x) \in \mathit{var\_dec}$ iff module $m$ declares variable $x$.

r4) $\mathit{use} : PP \times VV$, defined as: $(m, x) \in \mathit{use}$ iff module $m$ uses variable $x$.

r5) $\mathit{define} : PP \times VV$, defined as: $(m, x) \in \mathit{define}$ iff module $m$ defines the value of variable $x$.

r6) $\mathit{call} : PP \times PP$, defined as: $(m_1, m_2) \in \mathit{call}$ iff module $m_1$ directly activates module $m_2$.

r7) $\mathit{bind} : PP \times PP \times VV \times N$, defined as: $(m_1, m_2, x, i) \in \mathit{bind}$ iff $(m_1, m_2) \in \mathit{call}$ and $m_1$ defines $x$ as an actual parameter in the $i$th position when calling $m_2$.

The TRANSLATOR extrapolates these relations from the MODULE TABLE and represents every instance of each of them with a Prolog rule. For every module $m \in PP$, the TRANSLATOR specifically produces the following:
i) one rule mod(m). which defines m as a module;
ii) as many rules par_dec(m,(parameter_name,parameter_position)). as there are formal parameters which m declares;
iii) as many rules var_dec(m,variable_name). use(m,variable_name). define(m,variable_name). as there are variables which m respectively declares, uses and defines;
iv) as many rules mod_dec(m,module_name). call(m,module_name). as there are modules which m respectively declares and activates.

Moreover, for every $m \in PP$ such that $(m, m_1) \in \mathit{call}$ the TRANSLATOR produces a set of rules bind(m,m1,(variable_name,position)). each of which affirms that module m uses the variable variable_name to define the actual parameter associated with the formal parameter of module m1 in the given position. Table I shows the set of rules related to the sample program in Fig. 4.

Fig. 4. A sample Pascal-like program. [Listing of the modules MAIN, A, B, C and D in the DEC/FORM/USE notation; its declarations, uses, calls and parameter bindings are those listed as Prolog facts in Table I.]
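The translation step described above can be sketched as follows. This is a minimal illustration in Python, not DATA-tool's TRANSLATOR: the `Module` record and the `translate` function are hypothetical names, the record is a simplified stand-in for one MODULE TABLE entry, and the order in which facts are emitted is an arbitrary choice of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Hypothetical, simplified stand-in for one MODULE TABLE entry."""
    name: str
    formals: list = field(default_factory=list)           # formal parameters, by position
    declared_modules: list = field(default_factory=list)
    declared_vars: list = field(default_factory=list)
    used_vars: list = field(default_factory=list)
    defined_vars: list = field(default_factory=list)
    # Each call is (callee, actuals); actuals[i] lists the variables used
    # to define the actual parameter in position i + 1.
    calls: list = field(default_factory=list)


def translate(m):
    """Emit the Prolog facts for module m as text, following points i) to iv)."""
    facts = [f"mod({m.name})."]
    for pos, par in enumerate(m.formals, start=1):
        facts.append(f"par_dec({m.name},({par},{pos})).")
    facts += [f"mod_dec({m.name},{d})." for d in m.declared_modules]
    facts += [f"var_dec({m.name},{v})." for v in m.declared_vars]
    facts += [f"use({m.name},{v})." for v in m.used_vars]
    facts += [f"define({m.name},{v})." for v in m.defined_vars]
    for callee, actuals in m.calls:
        facts.append(f"call({m.name},{callee}).")
        for pos, names in enumerate(actuals, start=1):
            facts += [f"bind({m.name},{callee},({v},{pos}))." for v in names]
    return facts


# Module C of the sample program: FORM q, DEC p, USE q, call B(q+p, x+t).
module_c = Module(name="c", formals=["q"], declared_vars=["p"], used_vars=["q"],
                  calls=[("b", [["q", "p"], ["x", "t"]])])
FACTS_C = translate(module_c)
print("\n".join(FACTS_C))
```

For module C the sketch yields the same nine facts that Table I lists for it, although not necessarily in the same order.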
__
1058
IEEE TRANSACTIONS
V. THE ABSTRACTOR
ON SOFI-WARE
ENGINEERING,
VOL.
18, NO. 12, DECEMBER
1992
to the possibility for more than one variable to have the same actual name. In all these cases a data item may be fully identiThis subsystem is written in Prolog. It consists of a set of production rules that attempt to prove relations between fied by its actual name and the name of the module declaring it. Now that the characteristics of the inter-modular data flow objects (modules and/or data) on the basis of user queries. to be reconstructed have been discussed, we can deal with the Consequently, the abstractor is not a finished tool but a kernel on which the user may build a personal set of relations by up-in and up-out production rules. Hereafter, we shall only refer to up-in production, because adding new production rules to the system or relating the the construction of up-out is analogous and indeed is made existing ones in a different way. At present, it is defined and implemented to answer all the questions in points l), 2), simpler by the lack of effects due to parameter exchange by value (their definition in a module does not produce aliasing 3), and 4) of Section II and various other topical problems. In this section we only show and discuss the design and effects in the activating modules or in their ancestors). One of the first problems is how to establish, on the basis of implementation of those production rules which allow us to answer more complicated questions on inter-modular data the declarative nesting of procedures and functions, whether or flow. As reported in [ll], the problem of inter-modular data not a data item x declared in a module ml, both as a variable and a formal parameter, is visible in a module m2. 
flow production for a pair of calling/called modules is solved by the production of i) the set up-in of actual variables used in the called module or in its subordinates and declared by the calling module or its ancestors; ii) the set up-out of actual variables defined in the called module or in its subordinates and declared by the calling module or its ancestors.

The considerations made in Section II show how the fundamental problem of inter-modular data flow analysis for maintenance purposes regards the quality of information produced by RE, i.e., the degree of accuracy and completeness of both the direct relations produced by code and the summary relations abstracted.

The correct design of a change and the detection of its side effects require the reconstructed inter-modular data flow to be:

i) TOTAL, i.e., all the module links due to data definitions and uses must be recognized and shown;
ii) ACTUALIZED, i.e., every data item in the inter-modular data flow must not be represented by a formal name but by the names of the actual parameters bound to it;
iii) QUALIFIED, i.e., every data item in the flow must be fully identified, for example by qualifying the name of the variable with the name of the module declaring it.

Requirement i) means that links produced by global variables or parameter exchanges must be detected. Moreover, "indirect" connections due to pairs of modules between which there are no call relationships must also be recognized and shown. As regards requirement ii), it must be pointed out that parameter actualization is a well-known and intensively studied topic. The literature in the field of compilers and code optimization has tackled parameter actualization problems several times and a large number of solutions have been produced. Nevertheless, the solutions proposed in the compiler field are less precise and less useful in the RE sector. The problem of variable actualization in code optimization concerns the need to establish, in the presence of a procedure or function call, whether or not the value of a variable changes as a result of this call. The consequence of this is that only the forward actualization of a variable is performed (see, for example, the well-known Banning equation [33] and its application to Pascal compiler optimization [34]), whereas we also need to backward actualize the variables referenced in a module in order to fully support the comprehension of the impact of a change. Finally, requirement iii) solves the ambiguity related

Let us introduce the following two relations:

r8) mod_dec_scope: PP × PP, defined as the transitive closure of mod_dec: $(m_1, m_2) \in$ mod_dec_scope iff module $m_1$ declares directly module $m_2$ OR $m_2$ is declared in a module $m_i$ such that $(m_1, m_i) \in$ mod_dec_scope ($m_1$ declares indirectly $m_2$).

r9) var_or_par_dec: PP × VV, defined as: $(m, x) \in$ var_or_par_dec iff $(m, x) \in$ var_dec OR an $i$ exists such that $(m, x, i) \in$ par_dec.

Relations r8 and r9 allow the solution of the data visibility problem. In fact, according to Pascal visibility rules, we can say that: i) a data item $x$ is visible in the module $m$ which declares it, whether as a variable or as a formal parameter; ii) a data item $x$ declared in a module $m_2$, either as a variable or as a formal parameter, is visible in module $m_1$ if $m_1$ does not redeclare $x$ and, moreover, there are no modules declared directly or indirectly in $m_2$ which directly or indirectly declare $m_1$ and which redeclare $x$.

Points i) and ii) may be formalized as follows.

r10) the data item $x$ declared in $m_2$ is visible in $m_1$ iff $(m_2 = m_1)$ OR ($(m_1, x) \notin$ var_or_par_dec AND $(m_i, x) \notin$ var_or_par_dec for all $m_i$ such that $(m_2, m_i) \in$ mod_dec_scope AND $(m_i, m_1) \in$ mod_dec_scope).

Table II shows the Prolog program derived from r8, r9, and r10, and some examples of queries related to the sample program in Fig. 4. Let us outline the queries

Q1: visible(M,(x,main)).
Q2: visible(c,V).

Q1 returns the modules in which the variable x declared in MAIN is visible (in this set there is no module B since it redeclares x as a formal parameter). Q2 returns the data visible in module C, each item of which is identified by its name and by the name of the declaring module.

Having identified the data visible in a module $m$, it must now be actualized. Since we must reconstruct the actual data flow of $m$ [27], having a variable $x$ visible in $m$ and declared in a module $m_1$ as a formal parameter, we must replace $x$ with the actual parameters bound to it. In this actualization we must take into account the path of the module calls from MAIN to the instance of the $m$ we are considering. In fact, the same variable $x$, visible in $m$, may be actualized in different ways according to the call paths from MAIN to $m$.
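As an illustration of relations r8, r9, and r10, the following Python sketch transcribes them over a small set of facts. It is not the authors' Prolog program: the facts in `MODULES`, `MOD_DEC`, `VAR_DEC`, and `PAR_DEC` are hypothetical, since Fig. 4 is not reproduced here, and the helper names `q1` and `q2` are introduced only for this example. The test in `visible` also requires that $m_1$ be declared, directly or indirectly, inside $m_2$, as the Prolog rule in Table II does.

```python
# Hypothetical direct relations: Fig. 4 is not reproduced here, so these facts
# are assumptions made only for illustration.
MODULES = {"main", "a", "b", "c", "d"}
MOD_DEC = {("main", "a"), ("main", "d"), ("a", "b"), ("a", "c")}
VAR_DEC = {("main", "x"), ("main", "y"), ("a", "l"), ("a", "m"), ("a", "z"),
           ("b", "u"), ("b", "z"), ("c", "p"), ("d", "r"), ("d", "s")}
PAR_DEC = {("a", "t", 1), ("b", "x", 1), ("c", "q", 1)}


def mod_dec_scope():
    """r8: transitive closure of MOD_DEC, computed as a fixpoint."""
    closure = set(MOD_DEC)
    while True:
        step = {(m1, m2) for (m1, mi) in MOD_DEC
                for (mj, m2) in closure if mi == mj}
        if step <= closure:
            return closure
        closure |= step


def var_or_par_dec():
    """r9: pairs (m, x) where m declares x as a variable or formal parameter."""
    return VAR_DEC | {(m, x) for (m, x, _pos) in PAR_DEC}


def visible(m1, x, m2):
    """r10: is the data item x, declared in m2, visible in m1?"""
    scope, dec = mod_dec_scope(), var_or_par_dec()
    if (m2, x) not in dec:
        return False
    if m1 == m2:
        return True
    if (m2, m1) not in scope or (m1, x) in dec:
        return False
    # No module nested between m2 and m1 may redeclare x.
    return not any((m2, mi) in scope and (mi, m1) in scope and (mi, x) in dec
                   for mi in MODULES)


def q1():
    """Analogue of Q1: modules in which x declared in main is visible."""
    return sorted(m for m in MODULES if visible(m, "x", "main"))


def q2():
    """Analogue of Q2: qualified data items visible in module c."""
    return sorted((x, m2) for (m2, x) in var_or_par_dec() if visible("c", x, m2))


if __name__ == "__main__":
    print(q1())
    print(q2())
```

With these assumed facts, `q1()` excludes module `b` because `b` redeclares `x` as a formal parameter, which mirrors the behaviour described for Q1.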
TABLE II
SUMMARY RELATIONS AND EXAMPLES OF QUERIES RELATED TO THE VISIBILITY PROBLEM

SUMMARY:
    mod_dec_scope(M1,M2) :- mod_dec(M1,M2).
    mod_dec_scope(M1,M2) :- mod_dec(M1,Mi), mod_dec_scope(Mi,M2).
    var_or_par_dec(M,VoP) :- var_dec(M,VoP).
    var_or_par_dec(M,VoP) :- par_dec(M,VoP,Pos).
    visible(M,(VoP,M)) :- mod(M), var_or_par_dec(M,VoP).
    visible(M1,(VoP,M2)) :- var_or_par_dec(M2,VoP), mod_dec_scope(M2,M1),
        not (var_or_par_dec(M1,VoP) ;
             mod_dec_scope(M2,Mi), mod_dec_scope(Mi,M1), var_or_par_dec(Mi,VoP)).

EXAMPLES:
    ?- var_or_par_dec(b,V).
    V = u ->;
    V = x ->;
    V = z ->;
    no
    ?- mod_dec_scope(a,M).
    M = b ->;
    M = c ->;
    no
    ?- visible(d,(l,a)).
    no
    ?- visible(M,(x,main)).
    M = main ->;
    M = a ->;
    M = d ->;
    M = c ->;
    no
    ?- visible(c,V).
    V = p,c ->;
    V = q,c ->;
    V = x,main ->;
    V = y,main ->;
    V = l,a ->;
    V = m,a ->;
    V = t,a ->;
    V = z,a ->;

TABLE III
SUMMARY RELATIONS AND EXAMPLES OF QUERIES RELATED TO THE ACTUALIZATION

EXAMPLES:
    ?- active(main,c,[main,a,c]).
    yes
    ?- active(main,c,[main,…,c]).
    no
    ?- active(main,b,P).
    P = [main,a,b] ->;
    P = [main,a,c,b] ->;
    P = [main,d,a,b] ->;
    P = [main,d,a,c,b] ->;
    no

We should consider module B of the sample program in Fig. 4 as an example. There are four call paths from MAIN to B:

PH1: MAIN,A,B;
PH2: MAIN,A,C,B;
PH3: MAIN,D,A,B;
PH4: MAIN,D,A,C,B.

The data item x is visible in B because it is declared in it as a formal parameter, and is actualized by the following sets of variables, each identified by its name and by the name of its declaring module:

{(x,MAIN)}, over the path PH1;
{(l,A),(m,A),(p,C)}, over the paths PH2 and PH4;
{(r,D),(s,D)}, over the path PH3.

Likewise, the data t, visible in B because it is declared as a formal parameter in the module A declaring B, is actualized by the sets

{(x,MAIN)}, over the paths PH1 and PH2;
{(r,D),(s,D)}, over the paths PH3 and PH4.

According to Pascal visibility rules, we can now affirm that:

i) a data item $x$ declared as a variable in a module $m_2$ and visible in a module $m_1$ is itself the actualization of the $x$ referenced in $m_1$ for all the instances of this module;

ii) let a data item $x_1$ be declared as a formal parameter in $m_3$ and visible in $m_1$, and let an instance of module $m_1$ be identified by the related call path $P$; a variable $x$ declared in a module $m_2$ belongs to the set of variables actualizing the $x_1$ referenced in $m_1$ if there is a module $m_4$ and a data item $x_2$ so that:
1) $m_4$ directly calls $m_3$ AND;
2) given $p_1$ and $p_2$ the call paths from MAIN to $m_4$ and from $m_3$ to $m_1$, respectively, the path $P$ is the concatenation of $p_1$ and $p_2$ AND;
3) $m_4$ uses $x_2$ to define the actual parameter associated with $x_1$ AND;
4) $x$ belongs to the set of variables actualizing $x_2$ over the path $p_1$.

Table III shows the Prolog program derived from the considerations in points i) and ii) and some examples of queries related to the sample program in Fig. 4. The reader can again find in these examples the actualization of data items x and t, both visible in B, over the previously defined call paths PH1, PH2, PH3, and PH4. We shall discuss the following queries as significant examples:

Q3: actualize(b, x, V, [main, a, b]).
Q4: actualize(c, q, V, P).
Q5: actualize(b, x, (V, a), [main, a, c, b]).

Q3 returns the variables that actualize the formal parameter x of B when considering the instance of B identified by the call path (MAIN,A,B). Q4 returns the variables which actualize the formal parameter q of C for all the instances of C. Q5 returns the variables declared in A which belong to the set of variables that actualize the formal parameter x of B, if we consider the instance of B identified by the call path (MAIN,A,C,B). Finally, note that both queries

Q6: actualize(d, r, (r, d), [main, d]).
Q7: actualize(c, x, (x, main), [main, a, c]).

return the result YES; the first one does this because D declares r as a variable, the latter because x (declared as a variable in MAIN) is visible in C.
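Points i) and ii) on actualization, together with the enumeration of call paths, can be sketched in Python as follows. This is a minimal illustration, not the Prolog program of Table III. All facts are hypothetical (Fig. 4 is not reproduced here), and the relation `ACT_PAR`, which records the names a caller uses to build an actual parameter, is an assumed encoding. Name resolution walks a static-nesting map `PARENT` as a shortcut for the visibility relation, and when the declaring module occurs more than once on a path, every split of the path is tried, which is only one possible reading of the rule.

```python
# Hypothetical facts (assumptions for illustration only).
CALLS = {("main", "a"), ("main", "d"), ("d", "a"),
         ("a", "b"), ("a", "c"), ("c", "b")}
PARENT = {"a": "main", "d": "main", "b": "a", "c": "a"}  # static nesting
VAR_DEC = {("main", "x"), ("main", "y"), ("a", "l"), ("a", "m"), ("a", "z"),
           ("b", "u"), ("b", "z"), ("c", "p"), ("d", "r"), ("d", "s")}
FORMALS = {("a", "t"), ("b", "x"), ("c", "q")}
# ACT_PAR[(caller, callee, formal)] = names the caller uses in the actual parameter
ACT_PAR = {("main", "a", "t"): {"x"}, ("d", "a", "t"): {"r", "s"},
           ("a", "b", "x"): {"t"}, ("c", "b", "x"): {"l", "m", "p"},
           ("a", "c", "q"): {"z"}}


def call_paths(src, dst, prefix=None):
    """Yield every acyclic call path from src to dst as a list of modules."""
    prefix = (prefix or []) + [src]
    if src == dst:
        yield prefix
        return
    for caller, callee in sorted(CALLS):
        if caller == src and callee not in prefix:
            yield from call_paths(callee, dst, prefix)


def declaring_module(m, name):
    """Innermost module, starting from m and moving outwards, that declares name."""
    while m is not None:
        if (m, name) in VAR_DEC or (m, name) in FORMALS:
            return m
        m = PARENT.get(m)
    return None


def actualize(m1, name, path):
    """Variables (name, declaring module) that actualize `name`, referenced in
    m1, for the instance of m1 identified by the call path `path`."""
    if path[-1] != m1:
        raise ValueError("path must end at the module instance considered")
    m3 = declaring_module(m1, name)
    if m3 is None:
        return set()
    if (m3, name) in VAR_DEC:          # point i): a variable actualizes itself
        return {(name, m3)}
    result = set()                     # point ii): name is a formal parameter of m3
    for k in range(1, len(path)):
        if path[k] != m3:
            continue
        m4, p1 = path[k - 1], path[:k]  # m4 directly calls m3; p1 leads to m4
        for x2 in ACT_PAR.get((m4, m3, name), ()):
            result |= actualize(m4, x2, p1)
    return result


if __name__ == "__main__":
    for p in call_paths("main", "b"):
        print(p, sorted(actualize("b", "x", p)))
```

Under these assumed facts, the four paths printed correspond to PH1 through PH4, and the same formal parameter receives a different actualization set depending on the path.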
TABLE IV
SUMMARY RELATIONS AND EXAMPLES OF QUERIES RELATED TO THE UP-IN PRODUCTION PROBLEM

Fig. 5. Up-in sets for the sample program.

We can now solve the problem of up-in production by observing the following. Let an instance of module $m_1$ be identified by the related call path $P$, and let a variable $x$ be declared in a module $m_2$ which is an ancestor of $m_1$. The variable $x$ belongs to the up-in set of $m_1$ if and only if:

5) $m_1$ uses a data item $x_1$ and $x$ belongs to the set of variables which actualize $x_1$ over $P$ OR
6) $m_1$ calls a module $m_3$ and $x$ belongs to the up-in set of $m_3$.

Table IV shows the Prolog program for up-in production and some example queries for the program in Fig. 4. We outline the queries

Q8: up_in((b,[main,a,c,b]),V).
Q9: up_in((c,[main,d,a,c]),(V,d)).

For the instance of B identified by the call path (MAIN,A,C,B), Q8 returns the variables belonging to the set up-in, each identified by its name and by the name of its declaring module. For the instance of C identified by the call path (MAIN,D,A,C), Q9 returns the variables belonging to the set up-in and declared in D. Finally, the query

Q10: in_flow(Mod,Up_set).

returns the sets up-in for all the modules in the system (see Fig. 5).

It is worthwhile to stress that the data-flow analysis and the summary relations presented in this section are mainly focused on the relationships existing among the modules in a software system. Such analysis does not deal with the relationships existing among the components of a given module, since every module is viewed as an atomic item of the system. This means, for example, that if in a module M there are two dynamic paths, P1 and P2, and if a variable x1 is only defined on P1 while a variable x2 is only defined on P2, both variables will be considered defined by M. Similarly, the effects caused by chains of assignment statements belonging to the same dynamic path are not detected. Neither are the killed definitions [23], as the detection of such phenomena requires a thorough analysis of the intra-modular relationships. The consequence is that the inter-modular data flow produced may be incomplete or even inaccurate in some cases.

Fig. 6. An example of inaccuracy.

An example of inaccuracy is shown by the sample program in Fig. 6. The query

Q11: actualize(a, z, (x, main), [main, b, a]).

returns the result YES because the variable x declared in main actualizes the formal parameter z of a. Similarly, the query

Q12: actualize(a, z, V, [main, b, a]).

returns the result V=(x, main). However, if the code in procedure B has no branches and if the statement defining q is q:=constant, then the value of x bound to q does not reach the call to A. In this case the answers returned by the system for the queries Q11 and Q12 are inaccurate. This is because the actualization process we defined is based on inter-modular relationships (especially the calls and the data visibility and parameter exchange rules) and does not take into account the intra-modular relationships. Detecting the above inaccuracy requires an analysis of the intra-modular relationships and in particular the data definition reachability, i.e., the range in which the definition of a data item is live.
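Rules 5) and 6) for up-in production can be sketched in Python as below. This is an illustration under stated assumptions, not the Prolog program of Table IV. The facts are hypothetical, `USES` and `ACT_PAR` are assumed encodings of the direct relations, and "ancestor" is read here as any module that precedes the instance on its call path. As in the text, each module is treated as an atomic item, so no intra-modular analysis is attempted.

```python
# Hypothetical facts (assumptions for illustration only).
CALLS = {("main", "a"), ("main", "d"), ("d", "a"),
         ("a", "b"), ("a", "c"), ("c", "b")}
PARENT = {"a": "main", "d": "main", "b": "a", "c": "a"}  # static nesting
VAR_DEC = {("main", "x"), ("main", "y"), ("a", "l"), ("a", "m"), ("a", "z"),
           ("b", "u"), ("c", "p"), ("d", "r"), ("d", "s")}
FORMALS = {("a", "t"), ("b", "x"), ("c", "q")}
ACT_PAR = {("main", "a", "t"): {"x"}, ("d", "a", "t"): {"r", "s"},
           ("a", "b", "x"): {"t"}, ("c", "b", "x"): {"l", "m", "p"},
           ("a", "c", "q"): {"z"}}
USES = {("b", "x"), ("b", "u"), ("c", "q"), ("c", "y"), ("c", "t")}


def declaring_module(m, name):
    """Innermost module, starting from m and moving outwards, that declares name."""
    while m is not None:
        if (m, name) in VAR_DEC or (m, name) in FORMALS:
            return m
        m = PARENT.get(m)
    return None


def actualize(m1, name, path):
    """Variables (name, module) actualizing `name` for the instance `path` of m1."""
    m3 = declaring_module(m1, name)
    if m3 is None:
        return set()
    if (m3, name) in VAR_DEC:
        return {(name, m3)}
    result = set()
    for k in range(1, len(path)):
        if path[k] == m3:
            for x2 in ACT_PAR.get((path[k - 1], m3, name), ()):
                result |= actualize(path[k - 1], x2, path[:k])
    return result


def up_in(path):
    """Up-in set of the module instance identified by the call path `path`."""
    m1, ancestors = path[-1], set(path[:-1])
    found = set()
    for module, name in USES:                 # rule 5): items used by m1 itself
        if module == m1:
            found |= actualize(m1, name, path)
    for caller, callee in CALLS:              # rule 6): up-in sets of called modules
        if caller == m1 and callee not in path:
            found |= up_in(path + [callee])
    # Keep only variables declared by an ancestor of this instance.
    return {(v, m2) for (v, m2) in found if m2 in ancestors}


if __name__ == "__main__":
    print(sorted(up_in(["main", "a", "c", "b"])))
    print(sorted(v for v in up_in(["main", "d", "a", "c"]) if v[1] == "d"))
```

The two calls at the end are analogues of Q8 and Q9: the first lists the whole up-in set of one instance of `b`, the second restricts the up-in set of one instance of `c` to the variables declared in `d`.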
Choosing to consider a module as an atomic item of the system may also entail a loss of precision in the actualization process. This, in turn, is a reason why the data flow produced is coarse in some cases. As an example, Fig. 7 shows two sample programs which are semantically equivalent but coded with two different styles. Although the programs are equivalent, the query

Q13: actualize(a, x, V, [main, b, a]).

returns V=(u, main) for the program in Fig. 7(a) and V=(w, b) for the one in Fig. 7(b). This is because the direct relations we have defined capture and record the definitions and uses of a data item but do not capture the links existing among the definitions and uses of data. In particular the direct relations do not capture the link existing between u and x in the program in Fig. 7(b) because the statement defining w in the body of module B is not analyzed to see what variables are used.

Fig. 7. An example of precision loss.

In order to overcome such inaccuracies it is necessary to change the grain of the direct relations. A set of more fine-grained direct relations able to capture and represent the intra-modular relationships must be defined. For example, direct relations must be defined in order to represent the different types of primes [36] in the control flow of a module, the sequence and/or nesting relationships existing among these primes, the set of data items defined and/or used by a prime, the set of variables used to define a data item, etc.

Such a set of direct relations has been defined by the authors, and the extractor subsystem of DATA-tool is at present being enhanced to produce them. The definition of these direct relations is based on the language-independent algebraic representation of the control flow of a module proposed in [37]. An enhanced version of the algebraic representation has been defined. This version is able to capture and represent the intra-modular data flow information as well as the control flow. The rules for translating the intra-modular relationships captured by the algebraic representation have then been defined and a Prolog program dictionary to collect them has been designed.

VI. RELATED WORKS AND CONCLUDING REMARKS

Maintenance of a software system requires the availability of information whose type and level of abstraction are closely related to the specific operation to be carried out. Consequently, the documents produced by RE from static code analysis are typically too general or too detailed for the maintainer's needs and rarely provide an immediate and precise answer to his queries. This problem may, in the authors' opinion, be overcome through the use of interactive and evolutionary tools. This paper has presented a logic-based approach for the design and implementation of such tools. The approach has been employed in developing DATA-tool, which is a prototype tool for data flow analysis in the Pascal environment. DATA-tool is currently being used at LIS, a software engineering laboratory for program maintenance, and is used in the part of the laboratory involved in the development of RE processes.

Even though DATA-tool works on Pascal code and answers queries on the system data flow, its architecture is general and could be used to build other tools to answer queries in other fields. The choice to produce a tool capable of handling the data flow is due to the fact that on the one hand its reconstruction poses interesting theoretical problems and, on the other, the knowledge of the data flow is essential in all maintenance operations which are not simply corrective. The decision to implement DATA-tool for Pascal is justified by the fact that it is widely used in the scientific community. However, a tool similar to DATA-tool has been produced in our Department for the FORTRAN-77 language. This tool is currently being tested.

DATA-tool may, on the basis of the problems it deals with, be compared with similar tools already presented in the literature. The first of these is OMEGA [38], a system for defining, retrieving, and updating certain views (configurations, versions, call graphs, and slices) of Pascal-like programs. OMEGA uses 58 relations to store program models in a relational data base (INGRES). While OMEGA deals with a large variety of problems, it does not solve the interprocedural ones we solve using DATA-tool. The fundamental lesson OMEGA has taught us is the need for a separation between the Extractor tool and the Abstractor tool to improve system performance; this separation (naturally) also implies the separation between data base and source, a solution already adopted by CIA [39].
In order to improve performance, one of the fundamental lessons taught by CIA was the need for adopting a Conceptual Model, and identifying the minimum set of relations which can represent a program at a selected level of abstraction. The level adopted in CIA (only global objects) does not allow the solution of the data-flow problems that DATA-tool solves. In order to solve this problem, the conceptual model cannot just be complete (Total) but must also be Actualized and Qualified. On the abstraction level, DATA-tool is located in a position between the Conceptual Model of CIA and the detailed level of OMEGA. This does not mean that the performance is analogous to OMEGA. The Prolog program dictionary collecting the direct relations has also been implemented in a relational data base using only 10 relations (in comparison to the 58 relations of OMEGA, which is the major cause of its low performance).

What mainly differentiates DATA-tool from OMEGA and CIA is not the performance, but the level of its flexibility and completeness. A relational data base can be queried using relational algebra. Unfortunately the limited power of relational operators restricts the range of analysis and abstractions that may be performed [41]. This is the reason why we decided to use Prolog for querying the parse tree.

Apart from OMEGA and CIA, which constitute milestones in the field of RE tools, several other tools have been proposed to help maintainers analyze the interdependencies among the program components. Other authors have expressed the same criticism as we have, i.e., it is very hard for maintainers to navigate their way through the data and it is very easy for them to lose track of what they are doing with these tools.

There is a tool analogous to ours designed by Dietrich and Callis [40]. They propose the use of a deductive data base founded on the integration of relational data bases and logic programming via Prolog. Their work is oriented toward analyzing the important problem of inter-modular relations, where modules are "package-like" software units available in languages such as ADA and MODULA2, and in object oriented languages. Their work does not deal with the problems of inter-procedural data-flow. These problems are prevalent in the most widespread languages that one meets in the maintenance field. Software systems written using these languages treat a module as being, in general, a set of related program units and consequently, the problems of inter-modular data-flow are analogous to those of inter-procedural data-flow.

Another piece of work that takes an approach similar to ours is that of Consens et al. [41]. They propose representing the structural design information in a Prolog data base. These authors limit themselves to representing dependency relationships among modules, and stress the importance of GRAPHLOG, a graphical query language for querying the Prolog data base. In particular, the authors show how GRAPHLOG queries can be used to identify and remove cyclic control dependencies. This work is highly relevant for us in that it concerns the utility and efficiency of graphical front-ends based on a Prolog data base. We believe that in this way it will be possible to solve the problems outlined previously about the complexity of graphical front-ends for representing and navigating through inter-procedural and intra-procedural data-flow.
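To make the idea of identifying cyclic dependencies concrete, the sketch below finds the modules that lie on a dependency cycle by computing the transitive closure of a dependency relation. It is plain Python, not GRAPHLOG and not the approach of [41]; the module names in `DEPS` and the function names are hypothetical and serve only as an example.

```python
def reachable(deps):
    """Transitive closure of a dependency relation given as (src, dst) pairs."""
    closure = set(deps)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in deps:
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def cyclic_modules(deps):
    """Modules that depend, directly or indirectly, on themselves."""
    return sorted({a for (a, b) in reachable(deps) if a == b})


# Hypothetical dependency facts: core and store depend on each other.
DEPS = {("ui", "core"), ("core", "store"), ("store", "core"), ("ui", "log")}

if __name__ == "__main__":
    print(cyclic_modules(DEPS))
```

A module is reported exactly when the closure contains a pair from the module to itself, so `ui` and `log` are not reported even though `ui` reaches the cycle.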
DATA-tool is not a commercial tool, but a research prototype currently being tested and evolved in an applied research environment. The main aim of this experiment is to introduce the use of Prolog for prototyping solutions for some fundamental RE problems. In particular we are interested in experimenting with the use of a rule-based approach to evolutionary tools. Therefore, the usability of DATA-tool must be regarded as giving a software engineer the possibility to quickly and efficiently extend the range of analysis and abstractions that DATA-tool performs. The results obtained are very encouraging: DATA-tool is easily and incrementally extensible by adding new summary relations.

The experience gained using DATA-tool in a RE activity to support a Reuse-Reengineering process developed on about 200 000 lines of Pascal code is particularly interesting. The fundamental task in this experiment was the selection of procedures, functions, types, and data to re-engineer and cluster into Turbo Pascal Unit modules [42]. The need for queries to support the search for software components which implement functional and data abstraction produced a large set of new Prolog summary relations that were easily added to the existing kernel of DATA-tool and then reused. We tested to see whether Prolog is a labor-saving tool when prototyping the typical nonnumerical algorithms which arise in RE abstractions. Another result that should not be overlooked is that the experimentation of DATA-tool has encouraged all the people involved to describe the maintenance problems in a logical manner in order to capture, understand, select, and then formalize them more easily.

We do not propose the Prolog interface for practicing maintainers. We would like to provide them with a preprogrammed set of relations. The new summary relations that stem from the needs of practicing maintainers must be evaluated and, if they are useful, must be implemented by a Maintenance Engineer in such a way that the practicing maintainer may select and instantiate them. This approach does not exclude the possibility of enhancing DATA-tool with a more effective front-end which could also offer the maintainer the possibility to directly write certain classes of queries in a simpler notation.

The experimentation that was carried out has only dealt with the problem of creating a support for the maintainer for data-flow knowledge and has only been developed for small and medium software systems. The authors are aware of the performance problems arising in rule-based systems when the rule set becomes large. Introducing the use of rule-based tools on a larger scale requires these problems to be overcome. In order to achieve this the authors are following with great interest the scientific and technological development for the integration between relational data base and Prolog query systems [21], [43], [44].
ACKNOWLEDGMENT
The authors would like to thank H. Sneed for his invaluable help in this paper, as well as B. Fadini for his guidance throughout their research. They also thank the referees for their suggestions in improving this paper.
REFERENCES
[1] H. B. Holbrook and S. M. Thebaut, “A survey of software maintenance tools that enhance program understanding,” SERC-TR-9-F, Software Engineering Research Center, Univ. Florida/Purdue Univ., 1987.
[2] “Special issue on case tools for reverse engineering,” Case Outlook, vol. 2, pp. 1-15, 1988.
[3] B. Wachtel, “Case considered tricky,” Soft. Maint. News, vol. 7, 1989.
[4] G. Canfora and F. Vargiu, “Reverse-engineering and visual environments: The VAPS project,” in Proc. Workshop on Reverse Eng., CUEN, pp. 141-175, 1991.
[5] J. Ferrante, K. Ottenstein, and J. Warren, “The program dependence graph and its use in optimization,” ACM Trans. Progr. Lang. and Syst., vol. 9, pp. 319-349, 1987.
[6] S. Horwitz and T. Reps, “The use of program dependence graphs in software engineering,” in Proc. 14th Int. Conf. Software Eng., pp. 392-411, 1992.
[7] A. Maggiolo-Schettini, M. Napoli, and G. Tortora, “Web structures: A tool for representing and manipulating programs,” IEEE Trans. Software Eng., vol. SE-14, pp. 1597-1609, 1988.
[8] F. P. Brooks, “No silver bullet: Essence and accidents of software engineering,” IEEE Computer, vol. 20, pp. 10-19, 1987.
[9] P. Devanbu, R. J. Brachman, P. G. Selfridge, and B. W. Ballard, “LASSIE: A knowledge-based software information system,” Commun. ACM, vol. 34, pp. 35-49, 1991.
[10] G. Forte, “Tools fair: Out of the lab, onto the shelf,” IEEE Software, vol. 9, pp. 70-79, 1992.
[11] G. Canfora and A. Cimitile, “Reverse engineering and intermodular data flow: A theoretical approach,” J. Soft. Maintenance: Res. and Pract., vol. 4, pp. 37-59, 1992.
[12] G. Canfora and A. Cimitile, “LIS: A software engineering laboratory for software maintenance,” in Proc. Software Eng. Symp. 1991, 1991.
[13] P. T. Devanbu, “GENOA-A customizable, language- and front-end independent code analyzer,” in Proc. 14th Int. Conf. on Software Eng., pp. 307-319, 1992.
[14] H. Huang, K. Sugihara, and I. Miyamoto, “A rule based tool for reverse engineering from source code to graphical model,” in Proc. 4th Int. Conf. on Software Eng. and Knowledge Eng., pp. 178-185, 1992.
[15] F. J. Lukey, “Understanding and debugging programs,” Int. J. Man-Machine Studies, vol. 12, pp. 189-202, 1980.
[16] M. T. Harandi and J. Q. Ning, “Knowledge based program analysis,” IEEE Software, vol. 7, pp. 74-80, 1990.
[17] V. R. Basili and J. D. Musa, “The future of software: A management perspective,” IEEE Computer, vol. 24, pp. 74-80, 1991.
[18] P. Ciancarini and G. Levi, “What is logic programming good for software engineering?” Keynote Speech, 4th Int. Conf. on Software Eng. and Knowledge Eng., 1992.
[19] J. R. Callahan and J. M. Purtilo, “A packaging system for heterogeneous execution environments,” IEEE Trans. Software Eng., vol. 17, pp. 626-635, 1991.
[20] V. DeLeo, M. Napoli, G. Nota, and G. Tortora, “Testing programs by queries,” in Proc. 3rd Int. Conf. on Software Eng. and Knowledge Eng., pp. 241-248, 1991.
[21] B. Peuschel and W. Schafer, “Concept and implementation of a rule based process engine,” in Proc. 14th Int. Conf. on Software Eng., pp. 262-277, 1992.
[22] V. Ambriola, P. Ciancarini, and C. Montangero, “Software process enactment in OIKOS,” in ACM SIGSOFT Symp. on Practical Software Development Environment, vol. 15, pp. 183-192, 1990.
[23] P. Benedusi, A. Cimitile, and U. De Carlini, “Reverse engineering process, design recovery and structure charts,” J. Syst. Software, vol. 16, 1992.
[24] E. Yourdon and L. L. Constantine, Structured Design. Englewood Cliffs, NJ: Prentice-Hall, 1979.
[25] V. Weinberg, Structured Analysis. Englewood Cliffs, NJ: Prentice-Hall, 1980.
[26] P. Benedusi, A. Cimitile, and U. De Carlini, “A reverse engineering methodology to reconstruct hierarchical data flow diagrams for software maintenance,” in Proc. Conf. on Software Maintenance 1989, pp. 180-189, 1989.
[27] A. Cimitile, G. Di Lucca, and P. Maresca, “Maintenance and intermodular dependencies in Pascal environment,” in Proc. Conf. on Software Maintenance, pp. 72-83, 1990.
[28] P. Antonini, P. Benedusi, G. Cantone, and A. Cimitile, “Maintenance and reverse engineering: Low level design documents production and improvement,” in Proc. Conf. on Software Maintenance, pp. 91-100, 1987.
[29] M. S. Hecht, Flow Analysis of Computer Programs. New York: Elsevier North Holland, 1977.
[30] A. V. Aho and J. D. Ullman, The Theory of Parsing, Translation and Compiling, Vol. 2. Englewood Cliffs, NJ: Prentice-Hall, 1973.
[31] J. M. Barth, “A practical interprocedural data-flow analysis algorithm,” Commun. ACM, vol. 21, pp. 724-736, 1978.
[32] K. D. Cooper and K. Kennedy, “Efficient computation of flow insensitive interprocedural summary information,” ACM SIGPLAN Symp. on Compiler Construction, vol. 19, pp. 247-258, 1984.
[33] J. P. Banning, “An efficient way to find the side effects of procedure calls and the aliases of variables,” in Proc. 6th POPL Conf., ACM, pp. 724-736, 1979.
[34] J. Richardson and M. Ganapathi, “Interprocedural optimization: Experimental results,” Software Pract. Exper., vol. 9, pp. 149-169, 1989.
[35] E. J. Chikofsky and J. H. Cross, II, “Reverse engineering and design recovery: A taxonomy,” IEEE Software, vol. 7, pp. 13-17, 1990.
[36] P. A. Hausler, M. G. Pleszkoch, R. C. Linger, and A. R. Hevner, “Using function abstraction to understand program behavior,” IEEE Software, vol. 7, pp. 55-63, 1990.
[37] A. Cimitile and U. De Carlini, “Reverse engineering: Algorithms for program graph production,” Software-Pract. and Exper., vol. 21, pp. 519-537, 1991.
[38] M. A. Linton, “Implementing relational views of programs,” in Proc. of ACM SIGSOFT/SIGPLAN Software Eng. Symp. on Practical Software Development Environment, pp. 132-140, 1984.
[39] Y. F. Chen, M. Y. Nishimoto, and C. V. Ramamoorthy, “The C information abstractor system,” IEEE Trans. Software Eng., vol. 16, pp. 325-334, 1990.
[40] S. W. Dietrich and F. W. Callis, “A conceptual design for a code analysis knowledge base,” J. Soft. Maintenance: Res. and Pract., vol. 4, pp. 19-36, 1992.
[41] M. Consens, A. Mendelzon, and A. G. Ryman, “Visualizing and querying software structures,” in Proc. 14th Int. Conf. on Software Eng., pp. 138-157, 1992.
[42] A. Cimitile, “Toward reuse reengineering of old software,” in Proc. 4th Int. Conf. on Software Eng. and Knowledge Eng., pp. 140-149, 1992.
[43] S. Ceri, G. Gottlob, and G. Wiederhold, “Efficient database access from Prolog,” IEEE Trans. Software Eng., vol. SE-15, pp. 153-164, 1989.
[44] F. Cacace, S. Ceri, L. Tanca, and S. Crespi-Reghizzi, “Designing and prototyping data-intensive applications in the Logres and Algres programming environment,” IEEE Trans. Software Eng., vol. 18, pp. 534-546, 1992.
Gerardo Canfora received the Laurea degree in electronic engineering from the University of Naples, Italy, in 1989. He is an associate researcher in computer science at the Department of Informatica e Sistemistica at the University of Naples. He is currently a Visiting Researcher at the Centre of Software Maintenance of the University of Durham, England. His research interests include software maintenance, reverse engineering, and reuse re-engineering.
Ugo de Carlini received the Laurea degree in electronic engineering from the University of Naples, Italy, in 1971. Since 1971 he has been with the Department of Informatica e Sistemistica of the University of Naples. Since 1973 he has been a Professor of Computer Science at the University of Naples, where he currently holds the position of Full Professor. He is currently involved in research projects both in the EEC-sponsored program ESPRIT and in the Italian CNR-sponsored program on Sistemi Informatici e Calcolo Parallelo. He is the author of more than 50 papers, most of them published in international journals, books and conference proceedings. His main research interests include distributed software engineering, reverse engineering, software maintenance, and testing.
Aniello Cimitile received the Laurea degree in electronic engineering from the University of Naples, Italy, in 1973. He is an Associate Professor of Computer Science at the University of Naples. Since 1973 he has been a researcher in the field of software engineering, and his list of publications contains more than 50 papers published in journals and conference proceedings. His research interests include software maintenance and testing, software quality, reverse engineering, and reuse reengineering. At present he is involved in three research projects on reverse engineering, supported respectively by some southern Italy industries, the Italian National Council of Research, and the European Communities.