Available online at www.sciencedirect.com
The Journal of Systems and Software 81 (2008) 764–771 www.elsevier.com/locate/jss
Extracting entity-relationship diagram from a table-based legacy database
Dowming Yeh a,*, Yuwen Li b, William Chu c,*
a National Kaohsiung Normal University, Department of Software Engineering, 116 Hou-Ping 1st Road, Kaohsiung, Taiwan
b National Sun Yat-Sen University, Department of Information Management, 70 Lien-hai Road, Kaohsiung, Taiwan
c Computer Science and Information Engineering, Tunghai University, Taiwan
Received 30 January 2007; received in revised form 24 June 2007; accepted 1 July 2007
Available online 26 July 2007
Abstract
Current database reverse engineering (DBRE) research presumes that the information regarding the semantics of attributes, primary keys, and foreign keys in database tables is complete. However, this may not be the case. In a recent DBRE effort to derive a data model from a table-based database system, we found that the data content of many attributes is not related to their names at all. In this paper, we present a process that extracts an extended entity-relationship diagram from a table-based database with little description for the fields in its tables and no description for keys. The primary inputs of our approach are system display forms, table schemas, and data instances. First, we utilize screen displays to construct form instances. Second, code analysis and data analysis, involving comparisons of fields and decomposition of fields, are applied to extract attribute semantics from forms and table schemas, followed by the determination of primary keys, foreign keys, and constraints of the database system. In the final step of conceptualization, with the processes of table merging and relationship identification, an extended ER diagram is successfully extracted in a case study.
© 2007 Published by Elsevier Inc.
Keywords: Database; Reverse engineering; Entity-relationship diagram; Data model; Case study
* Corresponding authors. E-mail addresses: [email protected] (D.M. Yeh), [email protected] (W. Chu).
0164-1212/$ - see front matter © 2007 Published by Elsevier Inc. doi:10.1016/j.jss.2007.07.005
1. Introduction
Software reengineering is the modification of the functionalities or structures of a software system in order to improve the quality of the software (Snelting, 2000). Software reengineering includes two parts: reverse engineering and forward engineering (i.e., traditional software engineering). Reverse engineering analyzes the implementation of a legacy system and then abstracts this information into high-level design representations in order to obtain the design specification of the original system. With the specification, the meaning of the source code can be comprehended, and future maintenance or replacement work can be performed much more easily. Most reverse engineering work focuses on uncovering the programming logic and functions of application software. Data reverse engineering, on the other hand, concentrates on data model recovery of legacy systems (Aiken, 1996, 1998). In many cases, the major goal of data reverse engineering efforts is to reconstruct the conceptual data model of a database system in the form of an entity-relationship diagram (ERD) or extended ERD (Davis, 2001). Such efforts are therefore also called database reverse engineering (DBRE) (Blaha, 1999).
The thoroughness of semantics acquisition is one of the criteria that must be considered when designing a DBRE process. The semantics of a recovered conceptual model must incorporate domain semantics beyond the structural meaning alone, namely how data entities relate to each other. Therefore, sources of DBRE should not be limited to the database schema (Chiang et al., 1997). Another way to obtain semantics is through analysis of data instances in a database and the query and view manipula-