A Web Database Application Model for Software Maintenance

CHIA-LIN HSU, HSIEN-CHOU LIAO, JIUN-LIANG CHEN, AND FENG-JIAN WANG
Department of Computer Science and Information Engineering
National Chiao-Tung University
Hsinchu, Taiwan 30050, R. O. C.
E-mail: {glshu, hcliao, jlchen, fjwang}@csie.nctu.edu.tw

Abstract

Web database application programs may consist of hundreds of files. The interoperations among these files can be complex, and most of them lack design documents. When a programmer maintains (changes) some programs of an application, the program files affected by the changes need to be updated at the same time to keep the application functionally consistent. One major difficulty of maintenance is how to identify these affected parts, especially in Web database applications. In this paper, we present a model, called the HED model, for the maintenance of Web database applications. The HED model decomposes a Web database application into three kinds of diagrams, hyperlink diagrams, entity-relationship diagrams, and data-flow diagrams, which represent different aspects of a Web database application. Based on the HED model, the program files affected by a program change can be identified precisely via structure and database analyses. In addition, a maintenance tool is implemented to demonstrate the capability of the HED model.

Keywords: WWW, software maintenance, hyperlink, database.

1. Introduction

The WWW (World Wide Web) makes cross-platform communication between people easier and easier. It mainly relies on Web pages and Web browsers. HTML (HyperText Markup Language) [1] is commonly used for describing Web pages, and a set of related pages forming a hyperlink structure can be browsed easily. In business applications, a database system is frequently used [2]. Current HTML does not provide constructs for manipulating databases, so various companies have proposed different approaches for database functionality. For example, Microsoft adopts the IDC (Internet Database Connector) and HTX (HTML extension) [3]. These extensions increase the capabilities of Web pages, but they also complicate the design and structure of Web pages.

The primary difference between Web database applications and conventional applications is that the program modules of the former are constructed based on files (Web pages). A Web database application program may consist of hundreds of files, and its application logic is distributed in or among these files. When a programmer maintains (changes) some part of an application, the program files affected by the change should be modified at the same time to keep the application functionally consistent. One major difficulty of maintenance is how to identify these affected parts [4], especially in Web database applications. Besides their program files, the two kinds of applications also differ in program execution. A Web browser interprets Web application programs directly, while conventional applications are usually compiled. Because a Web database application is interpreted, some semantic errors arise whose real source cannot be identified directly and precisely.

Some approaches have been proposed to overcome the above difficulties. Each approach is mainly built on a model that represents the application logic, and maintenance tasks are performed based on the model. The models can be classified according to the level at which they abstract the application logic. For example, Microsoft's FrontPage [5] is based on a lower level model that records the hyperlink structure of application programs. Programmers can manipulate programs by following the roadmap specified in the model. WebComposition [6] is based on a higher level model, an object-oriented component model. WebComposition provides a component server for the development, operation, and maintenance of Web applications. Maintenance is made effective by regenerating programs from the component model of a Web application, so programmers do not need to manipulate programs directly. Generally, a higher level model is more complicated and provides more functions than a lower level one. Current models [7][8][9] are mainly based on the hyperlink structure. They are not effective at addressing the database aspect.

In this paper, an HED model is proposed for maintaining Web database applications that consist of a set of pre-existing program files. The HED model decomposes a Web database application into three diagrams: a hyperlink diagram (HLD), which is generally used for representing the hyperlink structure of application programs, and an entity-relationship diagram (ERD) and a data-flow diagram (DFD), which are included in the HED model to capture the database aspect of Web database applications. ERDs record the database schema, and DFDs record the database manipulation behavior. The HLD notation is defined based on the general hyperlink concept of nodes and links. The DFD notation is adapted from [9]. The HED model modifies both the HLD and DFD notations for Web database applications. Based on the application logic extracted into the HED model, the program files that are affected by changes can be identified precisely, and some semantic checks can also be performed. In addition, the HED model helps programmers understand the application logic, which improves the maintenance of Web database applications. A maintenance tool is presented in the paper to demonstrate the capability of the HED model.

In Section 2, the changes made while maintaining Web database applications are categorized. In Section 3, the HED model is defined. In Section 4, the analyses for identifying the affected program files are presented. Section 5 presents a prototype that demonstrates the capability of the HED model. Section 6 draws a conclusion and suggests future work.

Authorized licensed use limited to: UNIVERSITY TEKNOLOGI MALAYSIA. Downloaded on May 05,2010 at 12:35:13 UTC from IEEE Xplore. Restrictions apply.
2. Changes in Web Database Applications

Modifying a record in a database usually takes three steps: enter the index of the target record, retrieve the record from the database and modify its data, and put the modified record back. Providing this service generally requires three dialogues, for input, modification, and acknowledgement, and two database accesses, for retrieval and update. For example, when Microsoft's IDC and HTX techniques [3] are used for implementation, about five files are needed for one modification. The example indicates that the files in a Web database application differ from those in traditional software. In particular, the files of a Web database application are distributed and more numerous than expected. For a Web database application, the file distribution and the program size (including unused parts) make it more difficult to identify the parts affected by a change than in traditional software.

The modifications in a Web database application can be classified into two categories:

1. Structure changes: Structure changes happen when a Web page or hyperlink is added or deleted. These operations may affect the consistency of the original hyperlink structure. The inconsistency is of two types: unreachable Web pages and dangling hyperlinks. A Web page is unreachable if no hyperlink connects to it. A hyperlink is dangling if its destination Web page does not exist.

2. Database changes: Among the maintenance changes, a database change usually has the greatest impact on an application. For example, when a relational database is modified, there are at least three types of changes:

• Delete a table: When a table is deleted, the programs affected directly are those accessing this table. The programs that access the tables connected with the deleted table are also affected. For example, a table T1 contains a field F1, which is referred to as a foreign key by field F2 in table T2. When T1 is deleted, the programs that access F2 are affected indirectly.

• Delete a field or modify the data type of a field: When a field is deleted from a table or its data type is modified, the programs affected directly are those accessing this field. The programs that access the tables pointing to the deleted field are also affected. For example, a table T1 contains a field F1, which is referred to as a foreign key by field F2 in table T2. When F1 is deleted, the programs that access F2 are affected indirectly.

• Add or delete a relationship between tables: Such changes concern the join operation between tables. The modification of a relationship between tables may introduce inconsistency. The programs concerned here are those accessing both fields of the relationship.
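The direct and indirect effects of a database change on fields, as described for the T1/T2 example under database changes, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout, the extra field F0, and the function name `affected_fields` are hypothetical and are not part of the paper or its tool. Only one level of foreign-key indirection is followed, as in the paper's example.

```python
from typing import Dict, Set, Tuple

Field = Tuple[str, str]  # (table name, field name)

# Hypothetical schema: each table with its fields (F0 is invented for illustration).
SCHEMA: Dict[str, Set[str]] = {"T1": {"F0", "F1"}, "T2": {"F2"}}

# Foreign keys: referencing field -> referenced field (F2 in T2 refers to F1 in T1).
FOREIGN_KEYS: Dict[Field, Field] = {("T2", "F2"): ("T1", "F1")}


def affected_fields(deleted: Set[Field],
                    foreign_keys: Dict[Field, Field]) -> Tuple[Set[Field], Set[Field]]:
    """Return (directly affected fields, indirectly affected fields)."""
    direct = set(deleted)
    # A field is affected indirectly when it references a deleted field.
    indirect = {fk for fk, target in foreign_keys.items()
                if target in direct and fk not in direct}
    return direct, indirect


def fields_of(table: str, schema: Dict[str, Set[str]]) -> Set[Field]:
    """Deleting a table amounts to deleting every one of its fields."""
    return {(table, name) for name in schema[table]}


# Delete field F1: programs accessing T2.F2 are affected indirectly.
direct_f1, indirect_f1 = affected_fields({("T1", "F1")}, FOREIGN_KEYS)

# Delete table T1: same indirect effect through the foreign key.
direct_t1, indirect_t1 = affected_fields(fields_of("T1", SCHEMA), FOREIGN_KEYS)
```

The sets returned here name fields only; mapping fields to the programs that access them is the job of the database analyses in Section 4.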
3. The HED Model

3.1. HED Model

A Web database application is composed of a set of files and one or more back-end databases. The whole structure of such an application is shown in Figure 1, where a rectangle node represents a program file and a link between two nodes represents a hyperlink. The text accompanying a hyperlink denotes a parameter or a data item passed between two program files. The HED model decomposes a Web database application into three diagrams: Data-Flow Diagram (DFD), Entity-Relationship Diagram (ERD), and HyperLink Diagram (HLD). These three diagrams are used to represent different aspects of Web database applications. The notations used in the HED model are defined as follows.
Figure 1: The structure of a Web database application. (Figure labels: Client, HTTP, Web Database Application, Program Files, Database, D1, D2, D3, D4.)

3.1.1. Data-Flow Diagram

A DFD is used to represent the data manipulation behavior of an application [10]. It comprises four elements: process, data store, external entity, and the flow between these elements. In the HED model, a process in the DFD represents a node in the HLD. A data flow consists of one or more data items. Each data item represents either an attribute in the ERD or a parameter passed between files. A data store is an entity in the ERD. An external entity, i.e., a producer or consumer of data flows, usually represents a user on the Web. Figure 2 is a sample DFD. It represents a registration action. First, the process register.htm receives four data items, userid, username, password, and email, from the external entity user, and passes these data items to the process register.idc. The process register.idc checks whether the data item userid exists in the data store users. If userid does not exist, the process register.idc adds a record with these four data items to the data store users.

Figure 2: A sample data-flow diagram. (Figure labels: User, register.htm, register.idc, users; the data flows carry userid, username, password, email.)

3.1.2. Entity-Relationship Diagram

An ERD is used to represent the relational database schema [11]. It comprises two elements: entity and relationship. Figure 3 shows a sample ERD, where a rectangle node represents an entity and an arc represents a relationship between two entities. There are three entities, users, projects, and userproj, and three relationships, R1, R2, and R3. The attributes of an entity are listed under the entity name. An attribute with an asterisk ("*") represents a key. For example, the entity users contains a key userid and three other attributes, username, password, and email.

Figure 3: A sample entity-relationship diagram. (Entities: users with *userid, username, password, email; projects with *projid, projname, dptname, userid; userproj with *projid, *userid. Relationships: R1, R2, R3.)

3.1.3. HyperLink Diagram

An HLD is used to represent the hyperlink structure of the programs in an application. In an HLD, a node represents a program file and a link between nodes represents a hyperlink. A sequence of hyperlinks represents a browse path. Generally, the hyperlink structure of a Web database application is dynamic, since the contents of nodes and hyperlinks may change at runtime. To make the dynamic parts depicted in an HLD easier to understand, the nodes are classified into two types, monotonic and polymorphic, according to their outgoing hyperlinks while browsing. When a user browses a monotonic node, the outgoing hyperlinks are always the same. When a user browses a polymorphic node, however, the outgoing hyperlinks may change dynamically.

Figure 4 shows an HLD. Each rectangle represents a node, and a polymorphic node is drawn with an extra line on both sides. In the example, the polymorphic node auth.htx has four outgoing hyperlinks representing four different functions, save, retrieve, search, and browse. Users are classified into two levels, members and guests. Members can use all the functions, while guests can use only the functions search and browse. When a user enters a username and password via login.htm, the input data are sent to node auth.idc. Node auth.idc checks the username and password to see whether the user is a member, and passes the user's level to node auth.htx. Node auth.htx then displays the outgoing hyperlinks that correspond to the user's level.

In addition to the hyperlinks generally defined in HTML or JavaScript, some constructs in other languages or techniques may introduce additional relationships among nodes. These hyperlinks can also be included in an HLD. For example, a .idc node may contain SQL statements for retrieving data from a database. It contains a statement with the Template construct that assigns a .htx node for merging the retrieved data into a Web page. Therefore, there exists a corresponding hyperlink from the .idc node to the .htx node. In Figure 4, the Template statement in auth.idc assigns auth.htx as the template node. Thus, there is a hyperlink from auth.idc to auth.htx.
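The distinction between monotonic and polymorphic nodes can be made concrete with a small sketch based on the login.htm, auth.idc, and auth.htx example. Representing a node as a mapping from user level to outgoing hyperlinks is an assumption made here for illustration; the paper does not prescribe a data structure, and the names `Node`, `links_by_level`, and `same_for_all` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

LEVELS = ("member", "guest")  # the two user levels named in the example


@dataclass
class Node:
    """An HLD node: one program file and its outgoing hyperlinks per user level."""
    name: str
    links_by_level: Dict[str, List[str]]

    def outgoing(self, level: str) -> List[str]:
        return self.links_by_level[level]

    def is_polymorphic(self) -> bool:
        # Polymorphic if at least two levels see different sets of hyperlinks.
        distinct = {frozenset(links) for links in self.links_by_level.values()}
        return len(distinct) > 1


def same_for_all(links: List[str]) -> Dict[str, List[str]]:
    """Helper for monotonic nodes: every level sees the same hyperlinks."""
    return {level: list(links) for level in LEVELS}


HLD = {
    "login.htm": Node("login.htm", same_for_all(["auth.idc"])),
    # The Template statement in auth.idc yields a hyperlink to auth.htx.
    "auth.idc": Node("auth.idc", same_for_all(["auth.htx"])),
    "auth.htx": Node("auth.htx", {
        "member": ["save.idc", "retrieve.idc", "search.idc", "browse.idc"],
        "guest": ["search.idc", "browse.idc"],
    }),
}

for node in HLD.values():
    kind = "polymorphic" if node.is_polymorphic() else "monotonic"
    print(f"{node.name}: {kind}")
```

With this representation a node is classified from its data rather than from a stored flag, which keeps the classification consistent when hyperlinks are added or removed.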
Figure 4: A sample hyperlink diagram. (Nodes: login.htm, auth.idc, auth.htx, search.idc, browse.idc, retrieve.idc, save.idc; the legend distinguishes monotonic files from polymorphic files.)

3.1.4. Interrelationships among DFD, ERD, and HLD

According to the above definitions of DFD, ERD, and HLD, the abstract concept of the HED model is shown in Figure 5. For the Web database application shown at the top of Figure 5, the application logic is split into the three diagrams of the HED model below it. The arrow lines represent the interrelationships among the application and the three diagrams. That is, a program file of an application is a node in the HLD and a process in the DFD. The database schema is extracted and stored in the ERD. An entity in the ERD is a data store in the DFD. An attribute in the ERD is a data item of a data flow in the DFD. From another viewpoint, these three diagrams are interrelated with each other, and it might be better to construct an integrated structure of Web database applications by combining them. This graph is called the HED graph here. An HED graph can represent various aspects of the application logic, including the original contents of the DFD, ERD, and HLD, and the interrelationships among these diagrams.

Figure 5: The abstract concept of the HED model. (Figure labels: Web Database Application with Program Files and Database; HLD with Node; DFD with Process, Data Item, Data Store; ERD with Entity, Attribute.)

4. Incremental Analysis for Changes

In Section 2, the changes to Web database applications are classified into two categories, structure changes and database changes. In order to identify the affected programs, the application logic specified in the HED model is analyzed. Two categories of analysis, structure analysis and database analysis, are presented here for these two kinds of changes, respectively.

4.1. Structure Analysis

Structure changes happen when a hyperlink or Web page is added or deleted. They affect the hyperlink structure of an application. Thus, a structure analysis is performed based on the hyperlink structure specified in the HLD of the HED model. Structure analyses are divided into three types:

1. Dangling hyperlink analysis: When a hyperlink is added or a Web page is deleted, a hyperlink may end up connecting to a nonexistent Web page. Such a hyperlink is called a dangling hyperlink. In an HLD, a dangling hyperlink can be identified by checking whether the destination node of a hyperlink exists. If the destination node does not exist, the hyperlink is identified as a dangling one.

2. Unreachable node analysis: When a hyperlink is deleted or a Web page is added, a node may be left with no hyperlink connecting to it. Such a node is called an unreachable node. In an HLD, an unreachable node can be identified by checking whether the added Web page or the destination of the deleted hyperlink is connected by other hyperlinks.

3. Restricted browse path analysis: Since a polymorphic node restricts the availability of its outgoing hyperlinks, a restricted browse path is defined as a path containing at least one polymorphic node. Conversely, if the nodes of a path are all monotonic, the path is called an unrestricted browse path. When a Web page or hyperlink is added, the added hyperlinks provide new browse paths, and thus may violate the restrictions on the original browse paths. Such a situation causes a kind of information leakage [13]. A violation of a restricted browse path can be identified as follows. First, two monotonic nodes are chosen from the HLD; one is regarded as the starting node and the other as the ending node. Then, all possible paths from the starting node to the ending node are explored. If the explored paths include at least one restricted browse path and one unrestricted browse path, the restricted browse path is violated, and the pair of starting and ending nodes is reported. The above procedure is repeated until all pairs of starting and ending nodes have been checked. Finally, the pairs of nodes that violate a restricted browse path are returned.

Figure 6 illustrates examples of the addition and deletion of a Web page. In Figure 6(a), the original HLD is shown on the left-hand side. It contains two nodes, N1 and N2. When the Web page (node) N3, which contains two hyperlinks, L1 and L2, is added, the HLD becomes the one on the right-hand side of Figure 6(a). The above analyses identify N3 as an unreachable node and L2 as a dangling hyperlink. In Figure 6(b), the original HLD is shown on the left-hand side. After the Web page N4 is deleted, the HLD becomes the one on the right-hand side. Three dangling hyperlinks, L1, L2, and L3, and two unreachable nodes, N5 and N6, are identified. In Figure 6(c), the original HLD is shown on the left-hand side. It contains a polymorphic node N2. After the node N5 is added, the HLD becomes the one on the right-hand side. There are two paths from N1 to N4. One is an unrestricted browse path whose nodes, N1, N5, and N4, are all monotonic. The other is a restricted browse path that contains the polymorphic node N2. In the original HLD, the node N2 restricts the browsing of N4. However, the addition of N5 allows the node N4 to be accessed without any restriction, so the restricted browse path is violated.
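The three structure analyses can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: the HLD is assumed to be a dictionary mapping each node name to a flag (True for polymorphic) plus a list of (source, destination) hyperlinks, and the `entry_nodes` parameter, which exempts a starting page from the unreachable check, is an added assumption. The sample topologies are hypothetical and only loosely modeled on Figure 6(a) and 6(c), whose exact links are not given in the text.

```python
from itertools import permutations
from typing import Dict, FrozenSet, Iterator, List, Tuple

Link = Tuple[str, str]  # (source node, destination node)


def dangling_links(nodes: Dict[str, bool], links: List[Link]) -> List[Link]:
    """Hyperlinks whose destination node does not exist."""
    return [(src, dst) for src, dst in links if dst not in nodes]


def unreachable_nodes(nodes: Dict[str, bool], links: List[Link],
                      entry_nodes: FrozenSet[str] = frozenset()) -> List[str]:
    """Nodes that no hyperlink connects to (entry pages are exempt)."""
    targets = {dst for _, dst in links}
    return sorted(n for n in nodes if n not in targets and n not in entry_nodes)


def simple_paths(links: List[Link], start: str, end: str) -> Iterator[List[str]]:
    """Enumerate all cycle-free paths from start to end by depth-first search."""
    successors: Dict[str, List[str]] = {}
    for src, dst in links:
        successors.setdefault(src, []).append(dst)

    def walk(path: List[str]) -> Iterator[List[str]]:
        if path[-1] == end:
            yield list(path)
            return
        for nxt in successors.get(path[-1], []):
            if nxt not in path:
                path.append(nxt)
                yield from walk(path)
                path.pop()

    yield from walk([start])


def violated_pairs(nodes: Dict[str, bool], links: List[Link]) -> List[Link]:
    """Pairs of monotonic nodes joined by both a restricted and an unrestricted path."""
    valid = [(s, d) for s, d in links if s in nodes and d in nodes]
    monotonic = sorted(n for n, polymorphic in nodes.items() if not polymorphic)
    result = []
    for start, end in permutations(monotonic, 2):
        # True marks a restricted path (it contains a polymorphic node).
        kinds = {any(nodes[n] for n in path)
                 for path in simple_paths(valid, start, end)}
        if kinds == {True, False}:
            result.append((start, end))
    return result


# Hypothetical HLD in the spirit of Figure 6(a): N3 is added with two hyperlinks.
nodes_a = {"N1": False, "N2": False, "N3": False}
links_a = [("N1", "N2"), ("N3", "N2"), ("N3", "N9")]  # N9 does not exist

# Hypothetical HLD in the spirit of Figure 6(c): N2 is polymorphic, N5 is added.
nodes_c = {"N1": False, "N2": True, "N3": False, "N4": False, "N5": False}
links_before = [("N1", "N2"), ("N2", "N3"), ("N2", "N4")]
links_after = links_before + [("N1", "N5"), ("N5", "N4")]
```

Enumerating all simple paths is exponential in the worst case, so for large HLDs a real tool would likely need a cheaper reachability check; the exhaustive search is used here only because it mirrors the procedure described above.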
Figure 6: The file addition and deletion examples. ((a) add N3; (b) delete N4; (c) add N5.)

4.2. Database Analysis

Database changes happen when tables, fields, or relationships are added or deleted. In order to identify the affected programs, the ERD and DFD are analyzed; these analyses are called database analyses. Database analyses can be further divided into the following three types:

1. Attribute usage analysis: When a field is deleted or its data type is modified, the programs accessing the field should be identified. The attribute usage analysis is done as follows. For each process in the DFD, if its input or output data flows contain the attribute to be analyzed, this process is identified as an affected program. The above procedure is repeated until all the processes in the DFD have been examined.

2. Entity usage analysis: When a table is deleted, the programs accessing the table should be identified. The entity usage analysis is based on the attribute usage analysis. The attribute usage analysis is done on all the attributes of the table. The programs identified by the entity usage analysis are then the union of the programs identified by these attribute usage analyses.

3. Relationship usage analysis: When a relationship is deleted, the programs accessing the relationship should be identified. The relationship usage analysis is based on the attribute usage analysis. The attribute usage analysis is done on both attributes of the relationship. The programs identified by the relationship usage analysis are then the intersection of the programs identified by the two attribute usage analyses.

5. An Implementation Based on the HED Model

In this section, we present an implementation of a maintenance tool based on the HED model, called SMTW (Software Maintenance Tool for Web database applications).

5.1. System Architecture

The system architecture of SMTW is shown in Figure 7. In the figure, the rounded rectangle represents a Web database application. Each rectangle in the rounded rectangle represents a program file and the column represents the database. SMTW is shown as the black rectangle at the bottom of the figure. It contains four parts:

1. The HED model: It is used to record the application logic.

2. Reverse Engineer: It is responsible for analyzing program files to extract the application logic and for storing the results in the HED model. The analyses are based on the construction rules discussed in Section 4.

3. HED Visualizer: It is responsible for visualizing the DFD, ERD, and HLD stored in the HED model.

4. Maintenance Assistant: It is responsible for assisting programmers by performing structure analyses and database analyses.

SMTW offers programmers two advantages. The first advantage is that SMTW helps programmers understand the application logic easily. The HED Visualizer displays the HED model on the screen, and it is easier for programmers to read the content of an application from the model on the screen than from a list of program files directly. The second advantage is that the tool can increase the effectiveness of maintenance. The Maintenance Assistant can identify a list of program files via the structure and database analyses. Programmers can locate a program file for maintenance more rapidly from the identified list than from the full set of program files of an application.
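The database analyses that the Maintenance Assistant performs can be sketched as set operations over the DFD. The sketch assumes that the DFD has been reduced to a mapping from each process (program file) to the set of ERD attributes appearing in its input or output data flows; this reduction, the process projlist.idc, and the choice of relationship are hypothetical illustrations, while the entity and attribute names follow Figure 3.

```python
from typing import Dict, List, Set, Tuple

Attribute = Tuple[str, str]  # (entity name, attribute name)

# ERD entities and attributes as listed in Figure 3.
ERD: Dict[str, List[str]] = {
    "users": ["userid", "username", "password", "email"],
    "projects": ["projid", "projname", "dptname", "userid"],
    "userproj": ["projid", "userid"],
}

USERS_ALL: Set[Attribute] = {("users", a) for a in ERD["users"]}

# Assumed DFD summary: process -> attributes found in its data flows.
DFD: Dict[str, Set[Attribute]] = {
    "register.htm": set(USERS_ALL),
    "register.idc": set(USERS_ALL),
    "auth.idc": {("users", "username"), ("users", "password")},
    # projlist.idc is a hypothetical program that joins users and projects.
    "projlist.idc": {("users", "userid"), ("projects", "userid"),
                     ("projects", "projname")},
}


def attribute_usage(dfd: Dict[str, Set[Attribute]], attribute: Attribute) -> Set[str]:
    """Processes whose input or output data flows contain the attribute."""
    return {process for process, items in dfd.items() if attribute in items}


def entity_usage(dfd: Dict[str, Set[Attribute]], erd: Dict[str, List[str]],
                 entity: str) -> Set[str]:
    """Union of the attribute usage results over all attributes of the entity."""
    affected: Set[str] = set()
    for name in erd[entity]:
        affected |= attribute_usage(dfd, (entity, name))
    return affected


def relationship_usage(dfd: Dict[str, Set[Attribute]],
                       left: Attribute, right: Attribute) -> Set[str]:
    """Intersection of the attribute usage results for both ends of the relationship."""
    return attribute_usage(dfd, left) & attribute_usage(dfd, right)
```

For instance, `attribute_usage(DFD, ("users", "email"))` selects the two registration programs, while `relationship_usage(DFD, ("users", "userid"), ("projects", "userid"))` selects only the hypothetical projlist.idc, because it is the only process that touches both ends of the relationship.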
Figure 7: The SMTW system architecture. (Figure labels: Web Database Application with Program Files and Database; SMTW with Reverse Engineer, HED Visualizer, Maintenance Assistant, and the HED Model containing HLD, ERD, and DFD; Programmers.)

5.2. First Prototype of Services

The first prototype of SMTW is implemented on Windows using Visual Basic 5.0 and MS-Access 7.0. Figure 8 shows the interface of SMTW. There are three parts in the window. The left part is a project tree that contains a set of projects. Each project represents a Web database application. For example, there are three projects, "Component Management System", "Training Course On-line Registration", and "Guest Book System", in Figure 8. There are three sub-trees for each project. The first sub-tree is "Files", which contains the set of program files of a Web database application. The second sub-tree is "Database Schema", which contains the database schema extracted from the program files. Within "Database Schema" there are two sub-trees, "Tables" and "Relationships", which record the entities and relationships in the ERD. The third sub-tree is "Virtual Directories", which records the mappings between virtual directories and physical directories. For example, the virtual directory "/dcms" maps to "h:\inetpub\wwwroot\dcms". Virtual directories are used while extracting the hyperlink structure among program files. In a program file, the destination node of a hyperlink is specified as a URL. A URL can be transformed into a physical program file name via the mappings in the virtual directories. Thus, the hyperlink between two physical program files can be established.

A list of names of program files is shown in the upper right part of Figure 8. Each time, a distinct name list is shown for its own purpose. For example, the sample list in Figure 8 contains all the program files of a project. Another list contains the set of program files identified by the structure and database analyses. In addition, the DFD, ERD, and HLD are shown in the lower right part, and the user can choose one of them for viewing. In Figure 8, a sample DFD is displayed.

Figure 8: A sample user interface of SMTW

In the first prototype of SMTW, the ActiveX component technology based on the DCOM interface is used for implementation. Besides the set of pre-built ActiveX components in the Visual Basic environment, such as Tree View and List View, six additional components are designed and implemented. Three of them construct the DFD, ERD, and HLD, and the other three visualize the DFD, ERD, and HLD, respectively.
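The virtual-directory mapping used in Section 5.2 to turn a hyperlink URL into a physical program file name can be sketched as below. The function name `url_to_physical`, the longest-prefix matching rule, and the host name in the usage note are assumptions for illustration; the paper only states that such a mapping exists, with "/dcms" mapping to "h:\inetpub\wwwroot\dcms" as its example.

```python
from pathlib import PureWindowsPath
from typing import Dict, Optional
from urllib.parse import urlsplit

# The mapping example given in the paper.
VIRTUAL_DIRS: Dict[str, str] = {"/dcms": r"h:\inetpub\wwwroot\dcms"}


def url_to_physical(url: str, virtual_dirs: Dict[str, str]) -> Optional[PureWindowsPath]:
    """Map the path part of a URL to a physical file name, or None if unmapped."""
    path = urlsplit(url).path  # drops scheme, host, query string, and fragment
    # Try longer virtual directories first so nested mappings take precedence.
    for vdir in sorted(virtual_dirs, key=len, reverse=True):
        prefix = vdir.rstrip("/")
        if path == prefix or path.startswith(prefix + "/"):
            relative = path[len(prefix):].lstrip("/")
            base = PureWindowsPath(virtual_dirs[vdir])
            return base.joinpath(*relative.split("/")) if relative else base
    return None


physical = url_to_physical("http://example.com/dcms/auth.idc?level=guest", VIRTUAL_DIRS)
print(physical)
```

Once both ends of a hyperlink are resolved to physical file names in this way, the link can be recorded in the HLD as an edge between two program files; a URL that resolves to no known file is a candidate dangling hyperlink.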
6. Conclusion and Future Work

In this paper, the HED model and a prototype based on the model have been presented for supporting changes to Web database applications. The HED model decomposes a Web database application into three diagrams, HLD, ERD, and DFD, which represent different aspects of the application logic. The HED model is used not only to support maintenance, but also to help programmers understand the application logic. The first prototype of SMTW based on the HED model has been completed, and a sample application that contains more than three hundred files is used to evaluate SMTW's capability. SMTW can identify the programs that are affected by a change rather precisely via database analyses. In addition, some semantic errors, such as dangling hyperlinks or unreachable nodes, can be detected via the structure analyses.

Future work on SMTW follows two directions. One is to provide mechanisms for supporting cooperative maintenance. The other is to integrate SMTW with other systems or tools. For cooperative maintenance, there exist at least two needs: supporting multiple programmers in modifying different programs simultaneously, and supporting multiple programmers in accomplishing the maintenance cooperatively. Some mechanisms used in Computer-Supported Cooperative Work (CSCW) software can be adapted; examples include the mechanisms for information sharing or cooperative writing. One system architecture of the future maintenance tool is shown in Figure 9. The future maintenance tool is designed to be a client, a server, or both. The four parts, the HED Model, Reverse Engineer, HED Visualizer, and Maintenance Assistant, are the same as those of the current architecture in Figure 7. In addition, an Application Manager and a Cooperation Coordinator are added for cooperative maintenance.

Figure 9: Possible system architecture for cooperative maintenance. (Figure labels: Web DB App., Programmer, Application Manager, Reverse Engineer, HED Visualizer, Maintenance Assistant, HED Model, Cooperation Coordinator, Internet.)

References

1. I.S. Graham, The HTML sourcebook: A complete guide to HTML 3.0, John Wiley & Sons, 1996.
2. G.W. Hansen and J.V. Hansen, Database management and design, Prentice-Hall Inc., 1996.
3. "Microsoft FrontPage 97 and SQL Server," Microsoft® Corporation white paper, 1997.
4. S.S. Yau and S.S. Liu, "Some Approaches to Logical Ripple-effect Analysis," Software Engineering Research Center, SERC-TR-24F, University of Florida, Oct. 1988.
5. Microsoft's FrontPage Home Page, URL: http://www.microsoft.com/frontpage/
6. H. W. Gellersen, R. Wicke, and M. Gaedke, "WebComposition: An Object-Oriented Support System for the Web Engineering Lifecycle," Sixth International World Wide Web Conference, Santa Clara, California, USA, 1997.
7. A. Crespo and E. A. Bier, "WebWriter: A Browser-based Editor for Constructing Web Applications," Fifth International World Wide Web Conference, Paris, France, 1996. URL: http://www5conf.inria.fr/fich_html/papers/P35/Overview.html
8. The WebObjects Home Page, NeXT Software Inc., URL: http://www.next.com/WebObjects/
9. S. Shlaer and S. J. Mellor, Object-Oriented Systems Analysis: Modeling the World in Data, Prentice Hall Inc., 1988.
10. J. Martin and C. McClure, Diagramming Techniques for Analysts and Programmers, Prentice-Hall Inc., 1985.
11. P. Chen, The Entity-relationship Approach To Logical Database Design, Wellesley, Mass., QED Information Sciences, 1991.
12. B. Musteata and R. Lesser, Standard SQL relational database language guide and reference menu, Computer Technology Research Corp., 1988.
13. D.G. Virgil and S. Gupta, "Towards a Theory of Penetration-Resistant Systems and its Applications," Journal of Computer Security 1, pp. 133-158, 1992.
14. D. Kiely, "Are Components the Future of Software?," IEEE Computer, Feb. 1998.