Structural Rde ocumentation: A Case Study KENNY WONG, SCOTT I?. TILLEY, HALN A. MULLER, and MARGARET-ANNE D . STOREY,University ofVictoria
Most so&are documentation typically describes the program at the algorithm data-structure
and
level. For large
legaq systems, understanding
the
system’s architecture is more important.
The authors propose a
method of reverse engineering through redoamentation
that
promises to extend the use&,? life of large q/sterns.
46
L
Pyogyamnaers have becomepart historiavz, part detective,and part clairvoyant. -
eeacv software svsterns require a different approach to software documentation than has traditionally been used. In understanding large, evolving software systems, structural redocumentation through reverse engineering plays a key role. Reconstructing and effectively redocumenting the design of existing software systems is even more difficult than initial design. Recognizing abstractions in real-world systems is as crucial as designing adequate abstractions for new ones. This is especially true for legacy software written 10 to 2S years ago, which is often in poor condition because of prolonged, sometimes dramatic -
07407459,94,$04
Y
i
00 D 1994 IEEE
i
Thomas A. Corbi
even traumatic - maintenance. More than SO percent of softwareevolution work is devoted to program understanding, a task in which documentation traditionally serves an important role. Yet documentation needs differ significantly for software systems of vastly different scales, say 1,000 lines versus one million lines. Most software documentation is in-the-small, because it typically describes the program at the algorithm and data-structure level. For large legacy systems, understanding the structural aspects of the system’s architecture is more important than understanding any single algorithmic component.
JANUARY
1995
STRUCTURALANALYSIS Software engineers and technical managers responsible for maintaining such systems find program understanding especially problematic. The documentation that exists for these systems usually describes isolated parts but not the overall architecture. Moreover, the documentation is often scattered throughout the system and on different media. Thus maintenance personnel must explore the low-level source code and piece together disparate information to form high-level structural models. Manually creating just one such architectural document is always arduous. Creating the necessary documents to describe the architecture from multiple viewpoints is often impossible. Yet this is exactly the sort of in-thelarge documentation needed to expose the structure of large software systems. Software structure is the collection of artifacts used by software engineers when forming mental models of software systems. These artifacts include software components such as procedures, modules, interfaces and subsystems; dependenciesamong components such as client-supplier, inheritance, and control-flow; and attributes such as component type, interface size, and interconnection strength. A system’s structure is the organization and interaction of these artifacts.’ Reverse engineering reconstructs structural models by identifying the system’s current components, discovering their dependencies, and generating abstractions to manage complexity - providing insights that can then improve subsequent development, ease maintenance and reengineering, and aid project management. Structural redocumentation is reverse-engineering the architectural aspects of software. As a result, the overall gestalt of the subject system can be derived, and some of its architectural design information can be recaptured. In addition, structural redocumentation does not involve physically restructuring the code, although this might be a desir-
IEEE
SOFTWARE
able outcome of a subsequent reengineering phase. At the University of Victoria, we have developed the Rigi environment, which focuses on the architectural aspects of program understanding. It sunnorts a method for identifying, building, and documenting layered subsystem hierarchies. Critical to Rigi’s usability is its ability to store and retrieve views - snapshots of reverse engineering states. These views are used to transfer information about the abstractions to software engineers. The box on page 48 contains a detailed description of Rigi’s features. Early experience has shown we can produce views that are compatible with the mental models used by the subject software’s maintainers. These maintainers benefited from the documentation produced by the Rigi system because the views + made concrete the logical software structure previously held only in their minds; + highlighted critical areas of the software structure that needed more attention, such as central components that have many incident dependencies; + provided an objective basis for discussion and software maintenance because the views are based on the actual source code, unlike system documentation that often becomes out-ofdate as the source code evolves; and + verified that the system’s software structure was at least understandable to an experienced analyst from the outside. The reverse-engineering approach developed under the Rigi project has been successfully applied to several real-world software systems. These include a physician’s patient-record information system written in Cobol, a particle-accelerator control program written in C, and numerous Unix utilities. However, before the case study II
reported here, the largest program analyzed using the Rigi methodology was about 120,000 lines of code. While such a program is reasonably large in an academic setting, it is not exceptional for commercial legacy software. Not until we undertook the challenge of redocumenting SQL/D S did we begin to validate our approach.
IN-THE-LARGE DOCUMENTATION DESCRIBES LEGACY;;;~f~ENTAT’oN ARCHITECTURES consists of FROMMULTIPLE Rigi + Rigireverse, a parsing that supports VIEWPOINTS. system common imperative pro. gramming languages such as C and Cobol and a parser for LaTex, to analyze documentation; + Rigisewer, a repository to store the information extracted from the source code; + and Rigiedit, an interactive, window-oriented graph editor to manipulate program representations. In Rigi, the first phase of structural redocumentation is automatic and involves parsing the source code of the legacy system and storing the extracted artifacts in the repository. This produces a flat resource-flow graph of the software. Software maintainers can use this graph to represent the structural dependencies of interest, such as function calls and data accesses. To manage the complexity, the second phase involves human patternrecognition skills and features language-independent subsystem-composition techniques to generate multiple, layered hierarchies for higher level abstractions.2 For example, the analyst can cluster functions into subsystems according to business rules, personnel assignments, or by accepted principles of software modularity, providing the multiple, alternative perspectives needed for maintaining the software. Subsystem composition is a recursive process of grouping building blocks such as data types, procedures, and other
47
/I
components into composite subsystems, to help manage the complexity of understanding the software structure. The composition criterion depends on the application. For program understanding, the process is guided by partitioning the resource-flow graph according to established modularity principles such as low cvupling and styong cohesion. Exact interfaces and modularity quality are used to evaluate the generated software hierarchies. Evolved programs often contain lengthy import lists that
contain many more declarations than really necessary. A subsystem’s exact interface lists only the resources - such as functions and data types - actually provided, required, or used internally by the modules that comprise that subsystem.
LEGACYSYSTEM The Structured Query Language/ Data System is a large, relational data-
base-management system based on a research prototype developed by the IBM Almaden Research Center in the mid-1970s. It has undergone numerous revisions since the first release in 1982. Originally written in PL/I to run on IBM’s VM, SQL/DS is now over two million lines of PL/AS code and runs on several IBM operating systems, including VM and VSE. The hardware base for all SQL/DS versions is the IBM System/370 architecture, but the current release supports
THE RIGI REVERSE=ENGINEERING TOOL We designed Rigi to reverse-engineer large legacy software. Our research has focused on Dynamic views. Rigi presents structural documentation using a collection of vims. A view is a bmd1e Of visual and textual frames ihat contains among Other things, resource-flow graphs, subsystem-hierarthy diagrams, projections, exact interfaces, and annotations. A view is similar to a database view and is a dynamic snapshot that reflects the current reverse-engineering state. Because views are ultimately based on the underlying source code, they remain up-to-date. Flexibility. Because program understanding involves many facets and applications, it is wise to make the approach as flexible as possible for use in many domains. Most reverse-engineering tools provide a fixed set of extraction, selection, filtering, organization, docu-
mentation, and representation techniques. We provide a scripting language that lets analysts customize, combine, and automate these activities in novel ways. For example, analysts have used this language to express or access additional software metrics, clustering strategies, and graph-layout algorithms. Human input. There is a trade-off in programunderstanding environments between what can be automated and what should (or must) be left to humans. The best solution lies in a combination of the two. The Rigi approach relies heavily on the experience of the analyst using it, who makes all the important decisions. For example, as the software engineer forms subsystems on the basis of various high-level criteria, the Rigi system can offer selection and search algorithms on the basis of aspects such as graph connectivity and component and dependen-
cy type and provide statistics, such as exact interfaces, between subsystems and graph-quality metrics. Nevertheless, the process is synergistic because the analyst also learns and discovers interesting relationships by exploring software systems using the environment. We advocate a hands-on approach to reverse engineering to help transfer the constructed abstractions into the minds of the software engineers. Multiple views. Because the user is in charge, the subsystem-composition process can be based on diverse criteria, such as business rules, tax laws, requirements, and other semantic information. These alternative, independent decompositions may exist simultaneously under the structural representation supported by Rigi. Views can accurately capture coexisting architectural decompositions, providing many perspectives for later inspection. In effect, multiple, virtual
48
___~
representations of the software’s architecture can be created, manipulated, and saved. Scalability. To deal effectively with legacy software, we must train programunderstanding tools and methods on large, multimillion-line source codes. Techniques that work on toy projects often do not scale up. Our current scalability objective is to analyze systems of up to five million lines of code. Customizable User Interface. Providing a single interface to a general toolkit poses the persistent problem, “What is a good user interface for this application?” Our solution was to let the tool user, rather than the tool builder, provide the answer. Analysts can customize Rigi’s user interface to suit their own personal taste and needs, while still working within the common look-and-feel imposed by the window manager.
JANUARY
II-
-..
1995
several IBM-only operating-system platforms, including VMKA SP and VSE/SP. PL/AS is a proprietary IBM systemsprogramming language that is PL/Ilike and allows embedded System/370 assembler. Simultaneous support of SQL/DS for multiple releases on multiple operating systems requires multipath code maintenance, increasing the difficulty for its maintainers. SQL/DS consists of about 1,300 compilation units, roughly split into three large systems and several smaller ones. Because of its size and complex evolution, no individual alone can comprehend the entire program. Developers are forced to specialize in a particular component, even though the various components interact. Existing program documentation is also a problem; there is too much to maintain and keep current with the source code and too much to read and digest. SQL/DS is a typical legacy software system: successful, mature, and supporting a large customer base as it is adapted to new environments and grows in functionality. Our main goals in applying Rigi to this system were + to ease software maintenance by providing up-to-date, high-level perspectives of the operational system structure, l to transfer this information effectively into the minds of the maintainers, + to test the applicability of Rigi to industrial legacy systems, and + to promote the use of programunderstanding technology within IBM. Extending Rigi. Because SQL/DS is written in a proprietary language, most commercial off-the-shelf analysis tools are unsuitable. This presented us with an enticing and rare opportunity to exercise our approach on a classic industrial legacy system and provided an excellent test of whether the Rigi tool and method would scale up. Before we could perform the analysis, however, we had to enhance some aspects of our tool to respond to the
IEEE
SOFTWARE
Deciding what to include and what unique challenges posed by SQL/DS: to ignore is an art. When we decided its proprietary implementation lanto ignore intraprocedural details, we guage, its size and complexity, and its reduced the repository size for a multiapplication domain. million-line SQL/DS program signifiOur initial analysis of the SQL/DS cantly, which made a big difference code exposed some shortcomings in when loading the data into the Rigiedit Rigi. The sheer amount of code comgraph editor and manipulating the pelled us to change all three compograph interactively. For nents of the environexamule, the generated ment, especially the database’for SQL/DS is Rigiedit graph editor, HUMAN COGNlTlVE under two megabytes. which was augmented with a scripting lanProprietary language. guage. We also added After deciding what ina PL/AS language subformation to extract system to Rigireverse from the source code, and increased the memwe added a new parsing ory limit of Rigiserver. subsystem to Rigireverse The Rigireverse parsto handle PL/AS. Parsing system, once it was ing PL/AS is problemaugmented with a PL/ AS interface, successfully processed the entive. However, it was tire source code in an incremental manner. not necessary for us to parse the entire lanThe Rigiserver reposiguage completely. Because our intertory was able to handle the large graph est was in the high-level architecture, structures produced by the parser. we focused on extracting only external procedure definitions and calls, even Managing scale. The Rigireverse parthough PL/AS supports nested procesing system comprises several subsysdures. For our initial experiments, we tems, one for each supported lanextracted neither intramodule proceguage. Users can specify which artifacts to extract and in what level of dure calls nor procedure calls to library routines or built-in functions. detail. For example, an option selects if the parser should extract calls to The easiest way to add support for PL/AS within Rigi was to extract the system routines. The ability to pinpoint specific software subsets helps relevant information from the SQLDS us limit the flood of information and source code with a collection of Unix’s irrelevant details that would have emergcsh, awk, and sed scripts, translate this information to its skeletal representation ed had we tried to extract everything completely. in C, and feed the result into the existing C parser. In this way, Rigireverse was Storing entire abstract syntax trees for such a large system would require isolated from most of the changes. several hundred megabytes of storage. In addition, by extracting just a subWhile this level of detail may be necesset of the available information in the sary for tasks such as control-flow PL/AS source, we immediately reduced analyses or code optimization, it is not one of the problems of scale. For exnecessary for understanding and ample, a 400,000-line subsystem of SQL/DS reduces to only 2,000 lines of redocumenting the software architecture. For program understanding, it is C. Figure 1 shows a sample PL/AS important to build abstractions that :ode fragment and its C equivalent, for emphasize important themes and supRigi’s purposes. press irrelevant details. The automatic parsing of all 1,303
ABILITIES ARESTILL MORE POWERFUL THANHARDWIRED ALGORITHMS, SO WEDIDNOTUSEA FULLY AUTOMATED REVERSE-ENGINEERING $;g,be:~;e;:s,‘nd:,: ENVIRONMENT.
I’
49
EXAMPLE:
PROC(FOO,BAR) OPTIONS ('EXAMPLE @EJECT ASM; OMACINCL SYSLIB(MEMBER1);
PLAS'),REENTRANT; -
DCL VERSION
FIXED(15)
CONSTANT(7);
plos2c /* /* /*
File examp1e.c created by plas2c from file example.plas on Man Jan 24 13:08:14 PST 1994.
CALL ARIXEDB(FETCH,RSIBASEP,RSIRSSRC); IF RDATRACI >= '1' THEN DO; CALL ARIXETR(SCORDSCD); GEN CODE( DC H'6715'); END:
void EXAMPLE0 ( ARIXEDBO; ARIXETRO;
ALLDONE: RETURN: END EXAMPLE;
1
*/ */ */
return;
codefragment and its C representation. Using Unix scripts to create skeletal representations in C of the subsystemsgreatly simplified Rigi’s analysis of the sojhare.
Fig-we 1. A PL/AS original SQL/DS
modules of SQL/DS into Rigi took about 2.75 hours and required roughly 10 Mbytes of virtual memory on an IBM RISC System/6000 M375. The resultant database uses 1.6 Mbytes of disk space. Editor interface. The graph editor, Rigiedit, is the heart of our reverse-engineering environment. As such, it is extremely important that Rigiedit be easy to use and interactive. One challenge in editing graphical representations for programs such as SQL/DS is managing visual complexity. We changed the editor so that the screen is not redrawn every time a single graph operation is carried out. Graphs with more than 1,000 nodes and arcs required efficient refreshing to avoid degrading interactive response time. Thus we tuned the user interface to avoid unnecessary redrawing by + redesigning it to let the user batch a sequence of operations without redrawing and + allowing the user to specify when to update the window. Scripting layer. A more dramatic change was needed in one of the philosophies underlying our approach. We have always felt that a semiautomatic reverse-engineering environment is better than a fully automatic one, because human cognitive abilities
are still much more powerful and flexible than hardwired algorithms. In essence, the user should always be in control. However, many of the operations performed during the initial decomposition of the SQL/DS code were repetitive and could be automated. The user would still be in charge of accepting, rejecting, or modifying automatically generated subsystem decompositions, but decomposing the graph interactively with the editor could be made easier. These observations led us to introduce a scripting layer into the editor. Previously, the graph editor consisted of two tightly-coupled subsystems: the user interface and the editor core itself. All editing and selection operations were intermingled with operations for manipulating the user interface, such as window size and menu selection. We separated the user interface from the graph editor and added a transparent intermediate layer to make the environment programmable. Instead of writing yet another command language, we used the Tool command language.3 Tel provides an extendable core language and was specifically written to be embedded into interactive windowing applications. It is application-independent and provides two complementary interfaces: a textual interface to users who issue Tel commands and a proce-
dural interface to the host application. In the new Rigiedit, the Tel interpreter sits between the graphical user interface and the editor core. This integration process is more fully described elsewhere.4J Domain knowledge. Program understanding takes place within the context of a specific application domain. Aspects of the domain that affect reverse engineering include artifact representation, application semantics, and environmental concerns. We initially analyzed SQL/DS without using any domain-dependent knowledge. However, it quickly became clear that to use the extracted information effectively we had to take advantage of existing informal application-specific domain knowledge. Our reverse-engineering approach adapts to new application domains through scripting. This lets users write customized routines for common activities such as artifact extraction, graph presentation, and object search and selection. This makes the system domain-retargetable. For the SQL/DS source code, we created a library of specific scripts to aid in our analysis. One such script is shown in Figure 2. It is used to construct an initial decomposition of the subsystems of SQL/DS on the basis of existing documentation and the current physical modularization.
50
JANUARY
'I
1995
DEVELOPERFEEDBACK The target audiences for our case study were the SQL/DS product-development and management teams. The study was guided, in part, by the need to produce results directly applicable to them. An increased emphasis on product quality mandated a different approach to software maintenance than had previously been used. For the individual subsystems, we proceeded to analyze their intermodule dependencies, summarizing them in a set of views depicting different architectural perspectives. We then presented these views to the development teams in a series of carefully designed one-hour demonstrations. These demonstrations let us exhibit the structural views of the subsystems pertinent to a particular development group, let the audience interact with the software structures using the views as starting points, and let individual developers create new views on the fly to reflect and record specific domain knowledge. The case study involved three major cycles of experiments and corresponding meetings with the developers for feedback Call graph. For our first experiment, we generated a view of the entire call graph without considering any SQL/ DS-specific domain knowledge. The result was not as encouraging as we would have liked. The developers did not recognize the abstractions we generated, making it difficult for them to give us constructive feedback. This reaffirmed our belief that successful reverse engineering must do more than manipulate system representations independent of their domain; the results must add value for its customers. Informal information and applicationspecific knowledge provided by existing documentation and expert developers are rich veins of data that should be mined whenever possible. Naming conventions. Our second experiment used product-naming conventions and existing physical modularizations to construct another set of
IEEE
SOFTWARE
views. The prefix Ari is the unique code for the SQL/DS product, hence all module names begin with these three letters. The fourth letter in each module’s name represents the physical subsystem to which the artifact belongs, the fifth letter represents fur-
global recursionlimit set recursionlimit
ther subsystem refinement, and so on. Rigi captured this information using a series of scripts and views, resulting in decompositions the developers readily recognized. The developers valued these views because they established a common
2
# Create a subsystem decomposition of SQL/DS # based on product code naming convention proc decompose-sql I {prefix "ARI") I ( build-subsystems $prefix 1 I I/ Mutually recursive # the decomposition proc
to
build-subsystems I substring global recursionlimit if I $counter > Srecursionlimit sms rcl-select-none return
set char set Zchar scan "A" scan "2" while
proc
procedures
“I( “"%c" "%c"
counter
1 1
I i
char Zchar
i $char 1 ) ( set parentwindownum [rcl-win-get-id] rcl-collapse rcl-node-rename $name set windownum [rcl-open-select] build-subsystems $name [expr Scounter
+ 11
sms rcl-select-none rcl-win-close rcl-win-select
Swindownum $parentwindownum
‘&w-e 2: A domain-dependent suipt for SQL/DS that constructs an initial subsystem decomposition on the basis of existing dommentation and the subsystem’s cwrent physical modularization.
51
ground for further discussions and analysis. Moreover, the developers could use them to verify their system documentation as well as to suggest where our decomposition did not fully conform to their mental models. Figure 3 shows one view created using Rigi and displayed in Rigiedit. The view contains three windows, each with icons representing different components of the software system. Arcs connecting icons represent resourceflow relationships between components, such as a procedure call. The arcs are typed and give rise to a multilayered semantic network representing the many different dependencies within the system. The top window shows a portion of all SQL/DS low-level modules. For clarity, the arcs have been filtered from the top window. Each module is represented by a mod icon. The lower left window shows a
high-level view of the major subsystems, each depicted by a sys icon. The arcs in this window represent composite dependencies between the major subsystems. The hierarchical subsystem structure within the highlighted Arix component in this window is expanded, filtered, and presented in the bottom-right overview window. Subsystem in detail. Our third experiment decomposed the Arix relational data subsystem. This subsystem is roughly one million lines and consists of nine physical subsubsystems. With the help of an SQL/DS developer, we decomposed At-ix alternatively into four logical subsubsystems according to the distinct development teams responsible for maintaining them. In this viewpoint, our criterion for subsystem composition was personnel assignment to the software, as follows:
l a runtime-access generator; + optimizer pass one and two; + path-selection optimizer; and + executive, interpreter, and authorization. We then further decomposed the Arixo path-selection-optimizer subsubsystem. The developer in charge of this subsubsystem had created her own diagrams of its structure, based on product-development logbooks and the mental model she had formed from her maintenance experience. A view was easily created using Rigi to portray this mental model. More important, an alternative view was created, based on the actual structure as reflected by the current source code. Figure 4 shows these two views of Arixo. The window on the left contains the maintainer’s view and the window on the right the newly constructed view. The maintainer’s view reflects a left-to-right ordering of the major product components according to functional design layers. The new view reflects a top-down control flow based on actual information extracted from the source code. This second view presented a somewhat different perspective that conflicted with existing architectural documentation in some respects. However, because it was constructed using automatically extracted information, it was a more accurate representation of the subsubsystem’s “operational” architecture. The maintainer was able to form a more accurate mental model using a combination of information gained from maintenance experience and information extracted from the code. The two perspectives can be unified in a single Rigi view.
LESSONSLEARNED
Figure 3: A view of SQL/DS and the Arix subsystem. The top window shows a portion of all SQL/DS low-level modules, the lower left window shows a high-leve view of the major subsystems, and the highlighted Arix component appears expanded andfiltered
in the lower right window.
52
While our prepared views did not uncover each developer’s exact mental model of the system, the audience readily recognized the structures presented. There were two main reasons
JANUARY
II
1995
ps
yn
*
Fmw2tlm* ,Is%.3R) AIWJ oplam -+-ulon, ~S”CS
ge
gn
+ws
O#O”,
FTa,Mb”DI “)cJz%l- ALIXO ~muon* Zutflasl
AWXXA bRRD(OTS /\AD(OTS AFmOGC&=mODF for this. First, these developers knew uIP(oc* AllMOSS their subsystems intimately. Second and more important, the views repreUIMOG~ MMOMS MMosG ~~ols bmaGC /\A*ocsAdOSFAAWOZJ AAMOCY PmxOML &=mOSR I\Rcun.4w sented the correct level of abstraction. . *AMOCT mo(osI~~~~~ Most satisfying for us was when the ARD(OFRwmoou developers used their individual AFmoTB t.RIxoMA bFiD(OMS ARD(ODM M!xOSs ARMowAdOOF mD(oss knowledge to design additional views IUIP(ORH Amas ~~0~ to reflect their personal mental models uI!xOLC ARlXOSG MD(OFR MD(ODM AAE(OCT more closely. Each developer could h,waML Mm-WA AAMOK AJlMOD1 emphasize components of particular AAMOSR interest and filter out irrelevant inforI\Rc(0TB mation. A=mosI !4!xclios3 bRRD(OIS .w3RD(ORM AnD(ocY Could we have achieved the same ARIXODZ result without tool support? Possibly, AACKQJ AR!&4 but only with much greater effort and AA!xOMB bIxumw much less functionality. While it is *Mow nRMOvn ARDmSF true that system-description diagrams M&I ~CeJD~ AFIRMOD? have been used for years, recreating and updating them for legacy systems of this magnitude would be ponderous, if not impossible. Moreover, our semiFigure 4: Two views of the path-selection optimizer subsystem. The window on the automatic approach lets several such left contains a view constructed from the maintainer’s mental model. The window documents be created and maintained on the right contains an alternative view based on the actual smcture as reflected simultaneously. These system-level by the current source code. documents are always up-to-date because they are based on the underlying source code. be fruitful to investigate this aspect of Script-based decompositions capware developers who are familiar with ture domain-specific knowledge and the system can design additional views the system. can be generated quickly. For example, in Rigi according to multiple, alternaWe are currently designing and deit took two days to semiautomatically veloping a more ambitious reversetive perspectives. This produces charts create a decomposition using Rigi, but engineering environment based on our and graphs that can ease software only minutes to automatically produce maintenance, thereby saving the comeight years’ experience with Rigi. This one via a prepared script. Either new environment involves three unipany time and money, as well as extendmethod would be much faster and use versities and IBM as an industrial parting the useful life of its legacy system. the analyst’s time and effort more Further analysis of SQL/DS is ner, with collaboration the main theme. effectively than would a manual underway. One of our research associ- McGill University is extending the process of reading program listings and structural pattern-matching capabilities ates is using Reasoning Systems’ Softconsulting volumes of possibly out-ofware Refinery to parse PL/AS and of Rigi to support syntactic, semantic, date system documentation. export more fine-grained relationships functional, and behavioral search patamong procedures and variables. terns. The University of Toronto is Another is processing the source-code building a more flexible repository for ur analysis of the SQLDS source listings produced by the compiler to storing software artifacts, patterncode has proven valuable to its extract additional relationships. Most of matching rules, and software-engineerdevelopers and to our own research. It ing knowledge. The University of Vicour work until now has focused on has shown that our methodology scales exploring control dependencies. toria is making the Rigi system more up to the million-lines-of-code range However, the architecture of SQL/DS extensible by enhancing the scripting and that the views produced during is based on global data structures maniplanguage, improving the user interface, structural redocumentation aid in ulated by many different software modand providing a method for modeling understanding such legacy software. ules. While the developer looks at code the domain of discourse. Preliminary Furthermore, we have shown that we that is more than 90 percent control results indicate that an extensible but can provide businesses with valuable logic, the compiler sees more than 90 integrated toolkit is required to support information about the architecture of percent as data-structure declarations, the multifaceted analysis necessary to their legacy systems. Experienced softplaced in shared %include files. It will understand legacy software systems. +
0
IEEE
SOFTWARE
53
1
ACKNOWLEDGMENTS We thank Michael Whimey and Brian Corrie for their work on the Rigi system. Without their efforts, much of our analysis would not have been possible. We thank the anonymous referees for their constructive comments. We also thank the SQL/DS group members at IBM for their participation in the case study and the IBM Centre for Advanced Studies for inviting us to take part in this research. In particular, we thank Jacob Slonim, director of CAS, for his guidance and support. This work was-supported in @rt by the British Columbia Advanced Svstems Institute. the IBM Software Solutions Toronto Laborator; Centre for Advanced Studies, the IRIS Federal Centres of Excellence, the Natural Sciences and Engineering Research Council of Canada. the Science Council of British Columbia, and the University of Victoria.
3. J. K. Ousterhout, An Introduction to Tcland Tk, Addison-Wesley, Reading, Mass., 1994. 4. S. R. Tilley et al., “Domain-Retargetable Reverse Engineering,” Proc. 1993 Int’l Conf Software Maintenance, IEEE CS Press, Los Alamitos, Calif., Sept. 1993, pp. 142-151. 5. S. R. T&y, et al., “Programmable Reverse Engineering: Int’lJ. .%&are Eng. and Knowledge Eng., Dec. 1994.
Kenny Wang is a PhD candidate in computer science at the University of Victoria. He has worked at the IBM Centre for Advanced Studies with the program understanding group. His research interests include program-understanding, runtime analysis, user interfaces, object-oriented programming, and software design. Wong received a BSc and MSc in computer science from the University of Victoria. He is a member of ACM, Usenix, and the Planetary Society.
REFERENCES 1. H. L. Ossher, “A Mechanism for Specifying the Structure of Large, Layered Systems,” in Research Directions in Object-Oriented Programming, B.D. Shriver and P. Wegner, eds., MIT Press, Cambridge, Mass., 1987, pp. 219.252.
DISTANCE EDUCATION I EanlNc --,-.I...-
A MASTER
.-
-rn
OF SCIENCE
IN
SOFTWARE ENGINEERING OFFERED BY THE SCHOOL OF ENGINEERINGAND APPLIED SCIENCE AT SOUTHERN METHODIST UNIVERSITY
I I
l
The program is offered via VIDEOTAPE as well as on the SMU campus.
l
The program is centered about the problems facing the working professional in the field. Emphasis is on both the fundamental principles of software and system design, and the practical problems of commercialization. Much of the subject material of the core is based on the curriculum recommended by the Software Engineering Institute. The elective courses enable the student to focus on particular interests.
l
ADMISSION REQUIREMENTS: BWBA in one of the sciences or engineering disciplines with a 3.014.0 GPA and one year experience in software development or maintenance. For complete
S
nationwide
M
U
information
2. H. A. Miiller et al., “A Reverse Engineering Approach to Subsystem Structure Identificadon,“~. Research and Practice, Dec. 1993, pp.lBl-204.
I
Scott R. Tilley IS currently
on leave from the IBM Software Solutions Toronto Laboratory and is a PhD candidate in computer science at the University of Victoria. He has authored a book on home computing. His research interests include end-user programming, hypertext, program understanding, reverse engineering, and user interfaces. Tilley received a B.Comp.Sci in digital systems from Concordia University, Montreal, and an MSc in computer science from the University of Victoria. He is a member of the IEEE and ACM
Hausi A. Miiller is an associate professor of computer science at the University of Victoria. He has worked as a software engineer for Brown, Boveri & Cie in Switzerland and with the program-understanding group in the Toronto Laboratory at the IBM Centre for Advanced Studies. His research interests include software engineering, software analysis, reveme engineering, reengineering, programming-in-the-large, software metrics, and computational geometry. M&r received a PhD in computer science from Rice University. He is a member of the editorial board of IEEE Transactions on Sojhare Enginewing and was program cochair of the International Conference on Software Maintenance in 1994.
Margaret-Anne D. Storey is a PhD student in computer science at Simon Fraser University and the University ofvictoria. Her research interests include software engineering and user interfaces, focusing on software visualization. Storey received a BSc in computer science from the University of Victoria. She is a member of the IEEE.
contact
Southern Methodist University FA;;;;;;J;;;;;;;; e-mail:
[email protected]
Address questions about this article to Hausi Miiller at the Department of Computer Science, University ofvictoria, Box 3055, Victoria, BC, Canada VSW 3P6; hausiQcsr.uvic.ca.
Reader Service Number 6 JANUARY
1995