REGoLive: Web Site Comprehension with Viewpoints

Grace Gui, Holger M. Kienle, and Hausi A. Müller
Computer Science Department, University of Victoria, Canada
{gracegui,kienle,hausi}@cs.uvic.ca

Abstract

This paper describes a demonstration of the REGoLive reverse engineering tool. REGoLive supports comprehension of Web sites with three distinct viewpoints: developer-view, server-view, and client-view. Each viewpoint provides unique information about a Web site, which is not contained in other viewpoints. REGoLive is built on top of the GoLive Web authoring tool, which allows us to expose the developer-view of sites that have been built with GoLive. REGoLive shows a graph visualization of each viewpoint and allows the reverse engineer to navigate mappings between them. We believe that all three viewpoints are necessary to understand a Web site effectively.

Proceedings of the 13th International Workshop on Program Comprehension (IWPC’05) 1092-8138/05 $ 20.00 IEEE

1. Introduction

The reverse engineering and program comprehension community has developed many tools to understand complex software systems better. Nowadays, many Web sites are in fact highly complex software systems [9]. This has caused the emergence of Web site reverse engineering, which proposes to apply reverse engineering approaches to Web sites. There are a number of research tools that help Web site comprehension [14] [11] [4]. However, all of these tools focus on a single viewpoint: either client-view or server-view. In the next section, we explain the viewpoints that we identified for Web site comprehension and explain why a Web site comprehension tool should support all of them.

2. Viewpoints

Software engineers typically work with several different viewpoints when comprehending a software system. For traditional software systems, written in a high-level programming language, among the more important viewpoints are the development (or source code) view, the build-time view, and the run-time view [13].¹ During comprehension, software engineers frequently switch between viewpoints, mentally constructing relationships between them. For example, when exploring the run-time view and observing a certain behavior, the software engineer typically wants to know which parts of the source code are responsible for this behavior. There are tools to construct such mappings for traditional systems, ranging from symbolic debuggers and trace information to concept analysis [2]. Shimba is an example of a reverse engineering tool that uses static and dynamic analyses to construct mappings between the development-view and the run-time view [12].

¹ Researchers have identified many other views. For example, Kruchten’s “4+1” view model defines logical, process, physical, and development views.

Similar to the viewpoints of traditional software systems introduced above, one can identify different viewpoints for Web sites. We believe that the following are most important for Web site comprehension [7]:

client view: The view of the Web site that a client (typically using a Web browser) sees.

server view: The view of the Web site that a Web server (accessing the local file system) sees.

developer view: The view of the Web site that a developer (using a Web development tool such as GoLive) sees.

All three views introduced above are of potential interest to the Web site maintainer. For example, the developer view shows the high-level Web design such as information about templates; the server view is the one the Web server uses and is thus important for server maintenance; finally, the client view is the one that the user sees and is thus important for assessing the navigability and structure of the site as well as for detecting dead links.

Depending on the tools and technologies that have been used to construct a Web site, mappings between views can be more or less complicated. Generally, the more interactive a Web site and the more sophisticated the Web authoring tool, the less trivial the mappings are. Static Web pages typically have straightforward one-to-one mappings of their pages. Every HTML file on the server corresponds to a Web page on the client side. In contrast, a dynamic JSP page in the server view can map to many differently rendered Web pages in the client view.

Many of the differences in the views are caused by generative techniques that connect the views [6]. Mechanisms such as templates at the developer view drive the generation of target code (often off-line) for the server view. Dynamic, server-side technology (such as JSP and servlets) in turn generates on-the-fly target code for the client view. This generative aspect between views is depicted in Figure 1.

[Figure 1 diagram: Developer View (UML, scripting (JS), wizards) -> generator (off-line) -> Server View (JSP + servlets, XSLT) -> generator (on-the-fly) -> Client View (HTML + JavaScript, XML + XSchema)]

Figure 1. Generative aspect of Web site views

Reverse engineering tools can operate on any of the above views. The reverse engineering process starts with fact extraction from one (or several) views. Extracted facts are typically stored in a repository, which is then queried by analyses. Analysis results are then visualized to assist the engineer in comprehending the Web site.

ReWeb is an example of a reverse engineering tool that targets the client view. It consists of an extractor, analyzer, and viewer for static Web sites [11]. The extractor, written in Java, downloads all pages of a certain Web site starting from a given URL. Links in pages that point outside the Web site are ignored. The extracted Web site is represented as a typed, directed graph. The ReWeb tool has several analyses that operate upon the graph structure. Most of these analyses are inspired by traditional compiler (flow) analyses and mapped to the Web site domain (e.g., dominators of a page, shortest path from the start page to a page, and strongly connected components). Results of analyses are visualized with dotty, a customizable graph editor.

Most reverse engineering research has targeted the client view, but more server-view extractors are emerging. Hassan and Holt have developed coarse-grained server-view extractors for HTML, JavaScript, VBScript, SQL database access, and Windows binaries [4]. During the extraction process, the directory tree that contains the Web site is traversed and, depending on the type of each file, the corresponding extractor is invoked. Extracted facts are represented in the Tuple Attribute (TA) format. All extractor output is consolidated into a single file and visualized with PBS [3] for the purpose of Web site architecture recovery.

To our knowledge, there is no reverse engineering research tool that supports the developer view. This is not surprising, because reverse engineering research has focused on stand-alone tools. However, in order to support the developer view, the most effective approach is to graft comprehension functionality on top of the Web authoring tool itself. REGoLive follows this approach by extending GoLive [1].

3. REGoLive

The architecture of the REGoLive tool is depicted in Figure 2. In order to support the three viewpoints, different fact extractors for each view had to be written.
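As a rough illustration of what a fact extractor for one of these views does, the following sketch walks a site's directory tree (the server view) and records one typed fact per recognized file. It is written in Python for brevity and is not REGoLive's actual extractor code; the function name `extract_server_view`, the `ARTIFACT_TYPES` table, and the (type, path) fact shape are assumptions made for this example.

```python
import os

# Hypothetical mapping from file extension to artifact type. The categories
# are illustrative; they are not taken from REGoLive's schema.
ARTIFACT_TYPES = {
    ".html": "WebPage",
    ".htm": "WebPage",
    ".css": "CSSFile",
    ".jsp": "JSP",
}


def extract_server_view(site_root):
    """Walk the directory tree and return one (type, path) fact per known file."""
    facts = []
    for dirpath, dirnames, filenames in os.walk(site_root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            ext = os.path.splitext(name)[1].lower()
            artifact_type = ARTIFACT_TYPES.get(ext)
            if artifact_type is None:
                continue  # unknown file types are skipped in this sketch
            rel_path = os.path.relpath(os.path.join(dirpath, name), site_root)
            # Normalize separators so facts look the same on every platform.
            facts.append((artifact_type, rel_path.replace(os.sep, "/")))
    return facts
```

A developer-view or client-view extractor would follow the same pattern but read from a different source (the authoring tool's site model or the pages delivered to a browser) rather than from the file system.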

Figure 2. Architecture of REGoLive

The developer-view extractor retrieves information that GoLive provides about the currently loaded Web site (cf. left tree-view in Figure 5). The artifacts in the developer view include files (such as Web pages, CSS files, and JSPs) as well as tool-specific objects (such as templates and smart objects). The server-view and client-view extractors work on their respective viewpoints, extracting similar artifacts. All extractors are implemented as extensions to GoLive and written in JavaScript. The extractors write the extracted facts into a repository, which is currently implemented as a flat file in GXL format [5].

Data analysis then constructs the mappings between the different views. These mappings show how artifacts in one view are related to artifacts in other views. Some artifacts remain the same in different views (e.g., a static HTML page), whereas others are transformed from one view to another (e.g., a JSP in the server view may map to several generated HTML pages in the client view). For example, Figure 3 shows two different client views of a Web site. The difference is caused by a JSP file, which generates different HTML pages based on user input and information stored in the server-side database.

Figure 3. Two different client views, both generated by the same server-view JSP file
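The mapping step described above can be sketched as a grouping of client-view pages by the server-view artifact that produced them. This is a minimal Python illustration, not REGoLive's data analysis; the (server artifact, client page) fact shape, the function names, and the sample file names and URLs are all hypothetical.

```python
from collections import defaultdict


def build_view_mapping(generation_facts):
    """Group client-view pages by the server-view artifact that produced them.

    generation_facts is an iterable of (server_artifact, client_page) pairs,
    a fact shape assumed for this sketch.
    """
    mapping = defaultdict(list)
    for server_artifact, client_page in generation_facts:
        if client_page not in mapping[server_artifact]:
            mapping[server_artifact].append(client_page)
    return dict(mapping)


def one_to_many(mapping):
    """Return the server artifacts that map to more than one client page."""
    return sorted(a for a, pages in mapping.items() if len(pages) > 1)


# Hypothetical sample facts: a static page and a JSP rendered two ways.
facts = [
    ("about.html", "/about.html"),              # static page: one-to-one
    ("catalog.jsp", "/catalog.jsp?cat=books"),  # dynamic page: one-to-many
    ("catalog.jsp", "/catalog.jsp?cat=music"),
]
mapping = build_view_mapping(facts)
```

In this sketch, `one_to_many(mapping)` singles out the artifacts whose mapping is not a trivial one-to-one correspondence, which are the cases where an explicit mapping between views is most useful to the reverse engineer.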

The visualization engine presents the result of the data analysis to the reverse engineer. Figure 4 shows a screenshot of the developer view of a Web site. Artifacts are visualized as nodes in a graph editor (e.g., blue nodes represent HTML pages while yellow nodes represent GoLive templates). Relationships between artifacts are shown as arcs between nodes (e.g., a yellow arc from a blue node to a yellow node indicates that an HTML page makes use of a template). The graph is rendered with Scalable Vector Graphics (SVG) [10] in a Web browser. We currently use Adobe’s SVG viewer to render the SVG in Internet Explorer. The SVG graph editor allows interactive exploration of the graph, including moving of nodes, filtering of arcs and nodes, searching, and applying graph layouts. The graph editor has been implemented in JavaScript and is a separate component that can be customized for different domains [8].

Figure 4. REGoLive’s developer view

A reverse engineer can call up the different views from an extra pull-down menu (“RE Tool”) in GoLive. Figure 5 shows a screenshot of the added menu. It is also possible to navigate from nodes shown in the SVG graph editor to the corresponding artifacts in GoLive (and vice versa).

Figure 5. REGoLive adds a drop-down menu to GoLive to access the viewpoints of the currently active Web site

To summarize, stand-alone Web site comprehension tools can only address the client and server views, whereas REGoLive also supports GoLive’s developer view. To our knowledge, no other tool or analysis explicitly identifies the three viewpoints or offers mappings between them. We believe that making these mappings explicit has the potential to benefit Web site comprehension greatly.

Acknowledgments

This work has been supported by the IBM Toronto Center for Advanced Studies (CAS), the Natural Sciences and Engineering Research Council of Canada (NSERC), and the Consortium for Software Engineering (CSER).

References

[1] J. Carlson and G. Fleishman. Real World Adobe GoLive 6. Peachpit Press, 2003.
[2] T. Eisenbarth, R. Koschke, and D. Simon. Feature-driven program understanding using concept analysis of execution traces. 9th International Workshop on Program Comprehension (IWPC 2001), pages 300–309, May 2001.
[3] P. J. Finnigan, R. C. Holt, I. Kalas, S. Kerr, K. Kontogiannis, H. A. Müller, J. Mylopoulos, S. G. Perelgut, M. Stanley, and K. Wong. The software bookshelf. IBM Systems Journal, 36(4):564–593, 1997.
[4] A. E. Hassan and R. C. Holt. Towards a better understanding of web applications. 3rd International Workshop on Web Site Evolution (WSE 2001), pages 112–116, Nov. 2001.
[5] R. C. Holt, A. Winter, and A. Schürr. GXL: Towards a standard exchange format. Seventh Working Conference on Reverse Engineering (WCRE ’00), pages 162–171, Nov. 2000.
[6] H. M. Kienle, H. A. Müller, and A. Weber. In the web of generated “clones” (position paper). 2nd International Workshop on Detection of Software Clones (IWDSC’03), Nov. 2003.
[7] H. M. Kienle, A. Weber, J. Martin, and H. A. Müller. Development and maintenance of a web site for a bachelor program. 5th International Workshop on Web Site Evolution (WSE 2003), Sept. 2003.
[8] H. M. Kienle, A. Weber, and H. A. Müller. Leveraging SVG in the Rigi reverse engineering tool. SVG Open / Carto.net Developers Conference, July 2002.
[9] J. Offutt. Quality attributes of web software applications. IEEE Software, 19(2):25–32, Mar./Apr. 2002.
[10] A. Quint. Scalable vector graphics. IEEE MultiMedia, 10(3):99–101, July–Sept. 2003.
[11] F. Ricca and P. Tonella. Web site analysis: Structure and evolution. International Conference on Software Maintenance (ICSM ’00), pages 76–86, Oct. 2000.
[12] T. Systä, K. Koskimies, and H. Müller. Shimba: an environment for reverse engineering Java software systems. Software: Practice and Experience, 31(4):371–394, 2001.
[13] Q. Tu and M. W. Godfrey. The build-time software architecture view. International Conference on Software Maintenance (ICSM 2001), pages 398–407, Nov. 2001.
[14] P. Warren, C. Boldyreff, and M. Munro. The evolution of websites. 7th International Workshop on Program Comprehension (IWPC ’99), pages 178–185, 1999.
