TR_303.doc
Static Support for Understanding SOA Descriptions: Exploring the Requirements

Laura White, Thomas Reichherzer, Norman Wilde, John Coffey
{lwhite|treichherzer|nwilde|jcoffey}@uwf.edu
Douglas Leal, Joshua Dault, Juan Gil Restrepo, David Kaczynski
{ddl12|jbd16|jg38|dak19}@students.uwf.edu

Executive Summary

Service Oriented Architecture (SOA) has emerged as a way of providing flexibility to large scale software systems. However there may be problems in understanding and maintaining software constructed using this new paradigm. This report summarizes some of the issues that have been discussed and analyzes the requirements for static analysis tools to aid SOA maintainers. It also describes ongoing work on SOAMiner, a text search tool tuned to the analysis of SOA description files such as WSDL's, XSD's, and BPEL's. SOAMiner is currently under development following a spiral model to clarify requirements through repeated evaluations of prototypes.

This report may be cited as S2ERC-TR-303, Security and Software Engineering Research Center (S2ERC), http://www.serc.net, July 1, 2010.
Table of Contents

1 Introduction and Motivation
2 Program Comprehension Tools and SOA
3 Static SOA Program Comprehension in Context
4 The SOAMiner tool
5 Initial Studies with SOAMiner
  5.1 Case Study Sources
    5.1.1 The Travel Reservations Service
    5.1.2 A WSDL from MicroPAVER™
    5.1.3 SOA Descriptions Harvested from the Web
  5.2 Scalability Study
  5.3 Basic Maintenance Scenario Study
  5.4 Locating Data Type Usages
6 Conclusions
7 Acknowledgements
8 References
Appendix A - Results of the Basic Maintenance Scenario Study
1 INTRODUCTION AND MOTIVATION
In recent years many organizations have turned to Service Oriented Architectures (SOA) as a way to structure large software systems. While there are many different views of SOA, most of them describe applications structured as a collection of services, running on different nodes, and loosely coupled by exchange of messages via a layer of SOA infrastructure, sometimes called an Enterprise Service Bus (Figure 1).
Figure 1 - Structure of a Service Oriented Architecture Application

Since the emergence of the SOA architectural style in the early 2000's, some concern has been expressed about how this new generation of computer applications will be maintained. Maintenance has always been the most expensive phase of the software life cycle, primarily due to the need to sustain understanding of complex code as it grows, often losing structure in the process, and is handed off from one group of software engineers to another. Software changes, be they bug fixes or enhancements, become much more risky and time-consuming as knowledge is lost. There seems to be little reason why these same issues will not emerge over time with SOA. In fact, some have described SOA systems as being in continuous evolution (or permanent beta) as soon as they are deployed [KONT:2008].

We will explore possible requirements for static analysis tools to aid SOA software engineers and specifically will describe ongoing work to create SOAMiner, a software engineer's search tool that users might think of as a Google* for SOA.

* Google is a trade mark of Google, Inc.

An initial motivation for the development of SOAMiner came from working with students on an introductory SOA tutorial distributed as part of the Netbeans development environment. The Travel Reservations Service [KOVAL:2008] is intended to be a simple example of the use of BPEL to orchestrate services, and consists of a BPEL module and three partner services. The partners are simply stubs, designed to simulate reserving airline seats, hotel rooms and rental cars. Yet the whole example once deployed consists of 129 files distributed across 49 directories, not counting files actually deployed to the server (Table 1). While the tutorial went smoothly as a demonstration of Netbeans' BPEL capabilities, as novices we found ourselves completely bewildered by the multiplicity of components it used.

Table 1 - Size of the Travel Reservation Service Example

              Initially   After Deploy Partner Services   After Deploy Composite Application
Directories   26          32                              42
Files         76          105                             129
We hypothesized that any future maintainer would encounter equal bewilderment if faced with the need to modify such a system. Obviously some aid to navigation through the mass of material could be useful, and a SOAMiner search engine tuned to SOA requirements seemed to be a relevant and understandable analogy.

Further study of the Travel Reservation Service and other examples indicated that much of the complexity is in the files that serve to tie the application together and to deploy it to an application server. For lack of a better term we call these SOA description files; they include:
- Web Services Description Language (WSDL) files, which specify the interfaces of web services and the addresses on which they are deployed
- XML Schema Definition (XSD) files, which may define data types used in messages
- Business Process Execution Language (BPEL) files, used to specify the orchestration of services
- A variety of XML files of different types, apparently containing mappings used to provide information to the programming environment or to the application server where the services will be deployed.

These SOA description files provide a complex web of information that may provide essential background for a software maintenance task. For example a data type may be described in an XSD file, and then referred to in a message description within a WSDL file, which is then mapped to a service operation within that same WSDL, which specifies the URL where the service may be accessed, which is in turn mapped to specific EJB's by an XML deployment descriptor. A software maintainer may need to comprehend this web of relationships to fully understand the consequences of any change to the data type.
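The chain of references described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the WSDL fragment, the names in it (VehicleService, reserveVehicle, the urn:example namespaces) and the function trace_type are hypothetical and are not taken from the Travel Reservations Service or from SOAMiner. The sketch follows a data type name to the messages, operations and service addresses that depend on it; the final hop to a deployment descriptor is omitted.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"
SOAP_NS = "http://schemas.xmlsoap.org/wsdl/soap/"

# Hypothetical, heavily simplified WSDL used only for illustration.
SAMPLE_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns:tns="urn:example:vehicle"
    xmlns:ota="urn:example:ota"
    targetNamespace="urn:example:vehicle">
  <message name="ReserveVehicleIn">
    <part name="itinerary" type="ota:TravelItinerary"/>
  </message>
  <portType name="VehiclePortType">
    <operation name="reserveVehicle">
      <input message="tns:ReserveVehicleIn"/>
    </operation>
  </portType>
  <binding name="VehicleBinding" type="tns:VehiclePortType"/>
  <service name="VehicleService">
    <port name="VehiclePort" binding="tns:VehicleBinding">
      <soap:address location="http://localhost:8080/vehicle"/>
    </port>
  </service>
</definitions>"""


def local(qname):
    """Drop a namespace prefix such as 'tns:' from a qualified name."""
    return qname.split(":")[-1]


def trace_type(wsdl_text, type_name):
    """Follow a data type to the messages, operations and URLs that use it."""
    root = ET.fromstring(wsdl_text)
    w = "{%s}" % WSDL_NS
    # Step 1: messages with a part that refers to the data type.
    messages = {
        m.get("name")
        for m in root.iter(w + "message")
        for p in m.iter(w + "part")
        if local(p.get("type") or p.get("element") or "") == type_name
    }
    # Step 2: operations (and their port types) that exchange those messages.
    operations, port_types = set(), set()
    for pt in root.iter(w + "portType"):
        for op in pt.iter(w + "operation"):
            for io in op:  # input, output and fault children
                if local(io.get("message", "")) in messages:
                    operations.add(op.get("name"))
                    port_types.add(pt.get("name"))
    # Step 3: bindings of those port types, then the ports and their addresses.
    bindings = {
        b.get("name")
        for b in root.iter(w + "binding")
        if local(b.get("type", "")) in port_types
    }
    urls = set()
    for port in root.iter(w + "port"):
        if local(port.get("binding", "")) in bindings:
            for addr in port.iter("{%s}address" % SOAP_NS):
                urls.add(addr.get("location"))
    return {"messages": messages, "operations": operations, "urls": urls}


result = trace_type(SAMPLE_WSDL, "TravelItinerary")
```

Even in this toy case a maintainer has to cross four kinds of tags to get from the type to the URL, which is the kind of navigation a search tool would need to make easy.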
2 PROGRAM COMPREHENSION TOOLS AND SOA
There is a fairly extensive body of literature on the comprehension of pre-SOA styles of software including reports of research that has been backed up by experiments or careful case studies. It is clear from this literature that experienced software engineers use a pragmatic, as-needed strategy in studying unfamiliar code. They rarely attempt to understand a large program in its entirety, but rather seek out those parts that are essential for the specific task they have at hand [KOEN:1991]. This finding provides the motivation for the development of tools to help software engineers locate and browse code using different criteria. The actual mental processes used during comprehension are complex. For example von Mayrhauser and Vans observed experienced software engineers as they worked and noted that they switch back and forth between different perspectives: a program model (overall control flow of the code), a situation model (functional and data flow abstraction), and a top-down model
(knowledge of the application domain) [VONM:1994]. The conclusions of this line of research emphasize the rapid mental switching involved as engineers recognize "beacons" such as variable names or code patterns, and extract information from multiple sources. This view puts a premium on agile tools that can give answers quickly and play well within the engineer's development environment.

For the specific case of SOA applications, there is little published work on program comprehension that is based on experimental research but there have been a number of discussions in the literature of the potential problems to be expected. In a panel discussion at the 2004 International Conference on Software Maintenance several of the panelists focused on organizational and software process changes that may become necessary [KAJK:2004]. Kajko-Mattsson and Tepczynski later elaborated further on these suggested organizational changes and on the concept of "Service Centers" to specialize in the maintenance of web services [KAJK:2005]. Gold et al. describe comprehension issues in scenarios in which applications are composed dynamically, possibly differently on every invocation, using broker services that may not disclose their inner workings [GOLD:2004b]. Wilde et al. discuss proposals for understanding specific features in SOA applications, based on their experiences with earlier kinds of distributed software [WILDE:2008].

Gold and Bennett provide some interesting experience based on the development of a prototype health information service [GOLD:2004a]. This system involved integrating information from a wide range of health service providers, and not surprisingly they found that the integration of multiple changing data models and ontologies presents significant challenges. Either interfaces must be tightly coordinated among participating organizations or code must be constructed to cope with minor interface changes. Tracing of execution patterns via "audit services" could help both program comprehension and debugging, especially if services are composed on-the-fly.

More recently, two papers at the 2008 Frontiers of Software Maintenance workshop addressed SOA. Lewis and Smith [LEWIS:2008] discuss some of the issues related to the evolution of SOA systems, notably the problems of dealing with distributed systems with multiple owners and the comprehension difficulties of having expertise to deal with multiple languages and operating environments. Kontogiannis also discussed the multi-language issues and the need for processes to support the continuous incremental evolution characteristic of deployed SOA applications [KONT:2008].
3 STATIC SOA PROGRAM COMPREHENSION IN CONTEXT
The history of static support for program comprehension shows an interesting evolution from simpler to more complex tools (see Table 2).

Table 2 - The Evolution of Static Program Comprehension

Level   Tool Category                            Characteristics
1       cross-referencing, indexing              single source file, no user interface (hardcopy), byproduct of the compilation process
2       text search, regular expression          multiple file search, initially command line interface, later GUIs
3       graph model of impacts or dependencies   database of whole software system, various query interfaces, tracing of chains of relationships
4       design recovery                          specialized tools for recovery of specific abstractions assumed to be useful to maintainers
From the 1960's, compilers have often provided a cross reference listing of source code as a byproduct of the compilation process. This simple list of identifiers giving the line numbers where each was used was a significant aid for software maintainers, especially at a time when the use of global data was much more prevalent than it has now become. For example the cross reference helped a maintainer to understand data flows since he could see all the places where a particular variable was set and accessed. This kind of tracing became, of course, more and more tedious as programs grew and separate compilation units became common. In the 1970's programmers gained continuous access to source code through time sharing terminals. Source code was now more often split across an increasing number of files so multifile search tools were needed. A classic example was the grep tool for regular expression search which has continued to be available in most Unix environments [OPEN:2004]. Regular expression matching provided more freedom in expressing queries, but at the cost of possible false matches to irrelevant code and comments. Tracing effects through code continued to be difficult, requiring the maintainer to locate each 'hit' in the code, evaluate its relevance, and then possibly generate new queries based on the evaluation. Regular expression search tools are now commonly built into programming environments such as Eclipse and Netbeans and continue to be a maintenance programmer's favorite for many tasks. A third level of development emerged in the 1980's and 1990's with tools founded on more sophisticated models of the source code. These models were typically based on graphs of some sort, where the nodes represented different entities in the source code (e.g. variables, data types, classes, functions or methods) and the arcs represented different kinds of relationships between them. 
Tools for relationship extraction [CHEN:1990], program slicing [GALL:1991], dependency analysis [LINO:1994] and impact analysis [QUEI:1994] fall into this category. Many of the tools at this level aimed to reduce the tedium of tracing effects through the code by allowing queries about chains of relationships, for example, "show all parts of the code that affect the value of variable X". The fourth level of sophistication is seen in tools that provide or support design recovery. At this level the toolmaker attempts to abstract up from the code to a higher level representation and provide a concise response to questions maintainers are presumed to ask about the code. There are many such tools in the literature with specialized goals and approaches, e.g. [MURPH:1997], [DILUC:2000]. Gueheneuc, Mens and Wuyts provide a classification scheme [GUEH:2006]. Perhaps because of this specialization, relatively few of these tools have passed into widespread use. In thinking about tool requirements, we see three interesting progressions as we move up this scale of tools. At the lower levels the tool does much less work and more is left to the software maintainer. That has benefits as well as costs since maintainers generally know quite a lot about the code they are faced with. They are likely to be familiar with the problem domain, with the run-time environments, and possibly with coding practices used in development. Even maintainers lacking these advantages may have access to colleagues who can help them over the rough spots. Thus in formulating queries and viewing results they have a substantial advantage over a software tool, no matter how sophisticated it may be. A second progression derives from the first. Since at the lower levels the maintainer does most of the work, the requirements for the tool are relatively straightforward. If the maintainer needs to understand a variable it is up to him to seek it in the cross reference or formulate a regular expression query. 
At this level the tool designer is not directly concerned with the thought processes or the work flow of the maintainer.
However as the tool tries to do more, it becomes important that the 'more' should actually map to real maintainer tasks and thought processes. The tool designer needs to understand typical maintainer problems and craft a user interface that will present solutions to these problems in an easy-to-use and intuitive way. The user of grep does not need to understand how it processes regular expressions. The user of more sophisticated tools should not need to understand either the structure of the graph it uses or the subtleties of the output it displays. Which brings us to our third progression: ease of use, or perhaps more important, ease of first use. For over 20 years our program comprehension research group has been working with industrial software engineers in the Security and Software Engineering Research Center - S2ERC (http://www.serc.net) [WILDE:2007]. Our experience is that maintainers of real industrial software are very busy, and have little leisure to explore new tools. Tools that take more than a few hours to install and run, or that require extensive practice to use effectively are unlikely to become part of their toolkit. The cross referencing and text search tools are favored because they require little or no setup and can be used quickly across a range of maintenance tasks with little training. More sophisticated tools tend to need more time-consuming setup, as well as more expertise to use.
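To make the third level of this scale concrete, the query "show all parts of the code that affect the value of variable X" can be read as backward reachability over a dependency graph. The following Python sketch is a minimal illustration only; the edge list and the entity names in it are hypothetical, and real tools of this kind extract far richer relationships from the source code.

```python
from collections import deque

# Hypothetical dependency edges: (source, target) means "source affects target".
EDGES = [
    ("read_input", "rate"),
    ("rate", "compute_total"),
    ("compute_total", "X"),
    ("tax_table", "compute_total"),
    ("log_message", "console"),
]


def affects(edges, target):
    """Return every node from which `target` can be reached (backward closure)."""
    reverse = {}
    for src, dst in edges:
        reverse.setdefault(dst, set()).add(src)
    seen, queue = set(), deque([target])
    while queue:
        node = queue.popleft()
        for pred in reverse.get(node, ()):
            if pred not in seen:
                seen.add(pred)
                queue.append(pred)
    return seen


impact_set = affects(EDGES, "X")
```

The tool does the tracing in one query, where a grep user would have to issue a new search for each link in the chain. The price is that the graph must be built and kept up to date for the whole system, which is part of the setup cost discussed above.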
4 THE SOAMINER TOOL
Our current SOAMiner prototype is firmly located at Level 2 of Table 2. We feel that it is too early to develop more elaborate tools since there is still little experience with the practical maintenance of SOA systems. We simply do not know what tasks and questions maintainers will encounter. We still need to explore the diversity of different SOA designs and interact with industry software engineers before we can define the requirements for higher level tools.

So our initial goal is to support maintainers of SOA applications with an easy-to-use text search tool for SOA description files while keeping within the well known mental model of a web search engine. The tool should provide initial benefits quickly; our goal is less than one hour from tool download to first results.

Our initial SOAMiner prototype is based on Apache Solr [APAC:2010]. Solr is a widely-used open source text search system that runs within a servlet container and may be accessed from a web browser. To use Solr, the administrator creates a schema describing the different fields in each document he wishes to make searchable along with indexing and querying options. He then parses the input documents to create the specified fields and posts the result into Solr's index. Users may then make queries using a web browser interface.

The design of the Solr schema is important since it determines the kinds of queries that can be made and the results that will be returned [SMIL:2009, chapter 2]. An important decision concerns the granularity of the index. The grep tool searches text that is divided into lines and echoes the lines that match the user's query. Most web search engines index files and return a link to each complete file that matches the query. We noted that most of the SOA description files (WSDL, XSD, etc.) have an XML format and that most of the information is given in attributes within tags (see Figure 2). For such files line endings are arbitrary, and complete files may be quite complicated so neither granularity is really appropriate. We thus hypothesized that the best unit to index would be the XML tag and we decided to focus on queries that match or partially match the values of tag attributes.
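The idea of indexing at the granularity of the XML tag can be illustrated with a short Python sketch. It is not the SOAMiner parser: the record layout, the field names (id, parent_id, file_name, tag_name, attributes) and the functions tag_records and search are assumptions made for this example, and the matching is done in memory rather than by Solr.

```python
import xml.etree.ElementTree as ET

# Hypothetical WSDL fragment used only for illustration.
SAMPLE = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/">
  <message name="ReserveVehicleIn">
    <part name="itinerary" type="ota:TravelItinerary"/>
  </message>
</definitions>"""


def tag_records(xml_text, file_name):
    """Yield one flat, index-ready record per XML element, in document order."""
    root = ET.fromstring(xml_text)
    counter = 0

    def walk(elem, parent_id):
        nonlocal counter
        counter += 1
        tag_id = "%s#%d" % (file_name, counter)  # arbitrary unique id per tag
        yield {
            "id": tag_id,
            "parent_id": parent_id,
            "file_name": file_name,
            "tag_name": elem.tag.split("}")[-1],  # strip the namespace URI
            "attributes": dict(elem.attrib),
        }
        for child in elem:
            yield from walk(child, tag_id)

    yield from walk(root, None)


def search(records, text):
    """Return records with an attribute value containing `text`, ignoring case."""
    needle = text.lower()
    return [r for r in records
            if any(needle in value.lower() for value in r["attributes"].values())]


records = list(tag_records(SAMPLE, "vehicle.wsdl"))
hits = search(records, "vehicle")
```

Each hit is a single tag rather than a line or a whole file, and the parent_id field keeps enough of the hierarchy to move from a matching tag to the tags around it.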
Figure 2 - Part of a WSDL File Showing XML Structure (three tags with matching closing tags)
SOASearch, our current user interface for searching SOAMiner, is a JavaScript application based on AJAX-Solr, an open source library [AJAX:2010]. SOASearch runs in a web browser and provides a window divided into two panes (see Figure 3). Users make queries in the left pane and the usual strategy is to make a general query and then narrow it down as needed. This pane contains "tag clouds" showing the most common file types, file names, XML tag names, etc. in the most recent query response. The user can click on these tags or enter search text to restrict the query. The right pane displays the current results, paged 10 at a time. At the moment it simply displays the data stored in Solr's index for each matching XML tag.
Figure 3 - SOAMiner Search Interface
5 INITIAL STUDIES WITH SOAMINER

5.1 Case Study Sources
SOAMiner is still in early stages of a spiral development process so the studies done with it so far are not evaluations of a finished tool, but rather intended to provide feedback on our design decisions and to surface unanticipated requirements.
We have used three data sets in these initial evaluations: the Travel Reservation Service, a WSDL from the MicroPAVER™ civil engineering application, and a collection of SOA description files harvested from the Web.

5.1.1 The Travel Reservations Service

The Travel Reservations Service is the Netbeans example described earlier as a motivation of this project [KOVAL:2008]. Disregarding duplicates and many miscellaneous XML files, we were left with three distinct WSDL's (one each for airline, hotel and rental car reservations), one large XSD with travel industry standard data types, and a BPEL file for the program to orchestrate the services.

5.1.2 A WSDL from MicroPAVER™

MicroPAVER is a large software application widely used by civil engineers in managing the maintenance of pavement installations such as roads and airport runways [APWA:2009]. It is implemented as a large collection of services programmed using Windows Communication Foundation (WCF). Most of these services are tightly coupled and WSDL's are not normally used internally but they can be generated by the WCF software to allow external access. We used one very large generated WSDL of over 1 MB that may be representative of SOA descriptions generated automatically from large legacy components*.

5.1.3 SOA Descriptions Harvested from the Web

A third data set of WSDL's and XSD's was collected from the Web to provide both test data for SOAMiner and a rough snapshot of the current state of practice in service design. A crawler was written that generated automatic Web queries to select and download WSDL and linked XSD files. The Web queries included various keywords and type specifications to select URLs that match WSDL or XSD files. Keywords were selected from the vocabulary of WSDL and XSD files as well as glossaries from different subjects, to select files from a wide range of applications. Among matching results, up to 100 Web documents were downloaded using the URLs returned by the search engines. The documents were subsequently filtered to match WSDL or XSD files. In addition, WSDL files were analyzed to collect any XSD files describing data structures within the WSDL files. The result was a data set with 1513 WSDL and XSD files.
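The filtering step can be illustrated with a small Python sketch. It is an assumption about how such a filter might work, not a description of the crawler that was actually written: documents are classified by their root element, using the standard WSDL 1.1 and XML Schema namespaces, and linked schemas are found through schemaLocation attributes. The function names classify and schema_locations are hypothetical.

```python
import xml.etree.ElementTree as ET

WSDL_ROOT = "{http://schemas.xmlsoap.org/wsdl/}definitions"  # WSDL 1.1 only
XSD_NS = "{http://www.w3.org/2001/XMLSchema}"
XSD_ROOT = XSD_NS + "schema"


def classify(document_text):
    """Return 'wsdl', 'xsd' or None based on the document's root element."""
    try:
        root = ET.fromstring(document_text)
    except ET.ParseError:
        return None  # not well-formed XML, for example an HTML error page
    if root.tag == WSDL_ROOT:
        return "wsdl"
    if root.tag == XSD_ROOT:
        return "xsd"
    return None


def schema_locations(wsdl_text):
    """List the schemaLocation values of XSD imports and includes in a WSDL."""
    root = ET.fromstring(wsdl_text)
    return [elem.get("schemaLocation")
            for tag in ("import", "include")
            for elem in root.iter(XSD_NS + tag)
            if elem.get("schemaLocation")]
```

A downloaded document that classify rejects would be dropped, and each location returned by schema_locations would be a candidate XSD to fetch and add to the data set.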
5.2 Scalability Study
As has been mentioned, we believe that ease of first use is an important design goal for SOAMiner. Since it may be used on large systems with many files we ran an initial scalability study to make sure that the choice of Solr and our design of the Solr schema were not compromising SOAMiner's ability to rapidly index and query data sets of various sizes and complexity. We made two timed tests, one with the single large WSDL from MicroPAVER to stress memory use in our parser, and a second with the entire 1513 file data set harvested from the Web to stress Solr's index. Both tests were run on a MacBook™ Pro with an Intel Core™ 2 Duo 2.8GHz processor, 6MB of L2 cache and 4GB of RAM. We measured the clock time required to parse the input files and the time to post the resulting data into the Solr index. The results are shown in Table 3, along with the data set size, measured as the total number of input XML tags that were indexed.

* We would very much like to thank Dr. Arthur Baskin of Intelligent Information Technologies, a S2ERC affiliate company, for providing us with this data as well as with many insights into the way services are used within MicroPAVER and associated programs.
Table 3 - Times to Parse and Load into SOAMiner

Data Set                         Total Files   Total Size (tags)   Parse Time (sec)   Post Time (sec)
MicroPaver WSDL                  1             12,818              166.63             20.13
Web Harvested WSDL's and XSD's   1513          529,127             1278.89            955.93
We made several test queries against each data set and found that the response time was only a few seconds in all cases. These results suggest that SOAMiner will scale well with both the size of data sets and the size of individual files. Parse, load and query times are within acceptable limits.
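One way to compare the two runs is to convert the times in Table 3 into tags processed per second. The short Python calculation below uses only the figures reported in the table; the function name throughput and the output layout are choices made for this example.

```python
# Measurements copied from Table 3: (tags indexed, parse seconds, post seconds).
RUNS = {
    "MicroPAVER WSDL": (12818, 166.63, 20.13),
    "Web harvested WSDL's and XSD's": (529127, 1278.89, 955.93),
}


def throughput(tags, parse_sec, post_sec):
    """Return tags per second for the parse step, the post step and both together."""
    return tags / parse_sec, tags / post_sec, tags / (parse_sec + post_sec)


for name, run in RUNS.items():
    parse_rate, post_rate, total_rate = throughput(*run)
    print("%-32s parse %7.1f  post %7.1f  overall %7.1f tags/sec"
          % (name, parse_rate, post_rate, total_rate))
```

By this calculation the parser handled the single large WSDL at roughly 77 tags per second, against roughly 414 tags per second for the harvested collection, while the post rates are much closer together (roughly 637 and 554 tags per second). That pattern would be consistent with the large file stressing the parser rather than Solr's index, although two data points are too few to draw a firm conclusion.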
5.3 Basic Maintenance Scenario Study
A usability study was conducted with the initial prototype of SOAMiner to evaluate current capabilities and to identify additional requirements. A think aloud protocol was used with two participants engaging in two predefined software maintenance scenarios using the Travel Reservations Service described earlier. Participants were encouraged to verbalize their thoughts as they performed activities related to software maintenance, while observers recorded times, comments, and participant behavior. To help isolate usability factors related to text search tools in general as opposed to usability issues with SOAMiner in particular, each participant performed one of the scenarios using grep and the other using SOAMiner. Both participants were students with some reading experience with WSDL's and XSD's, but without practical experience working with such documents. Accordingly the study was preceded with approximately three hours of basic orientation related to XML Schema, WSDLs, BPEL, SOAMiner, and grep. The first maintenance scenario was fairly simple and involved locating where the URL for a particular service was defined. The second scenario involved a hypothetical bug involving failed cancellations of vehicle reservations; this was more difficult in that it required tracing through several of the SOA description files to understand how message return data was defined. Both participants, one using grep and one using SOAMiner, were able to get the correct answers for the first scenario. The participant using grep was able to answer the questions within 15 minutes while the participant using SOAMiner took 25 minutes, the difference being entirely attributable to the time required to go through SOAMiner's indexing procedure. The second scenario was more challenging and neither participant was able to correctly answer all of the questions. 
The main difficulty seemed to be that both participants only had a novice level of familiarity with BPEL, WSDL, and XSD and neither tool was sufficient to substitute for this lack of background knowledge. One specific problem was that the WSDL contained strings such as "CancelVehicleOut" and so the participants searched on variants of "CancelVehicle". However in the XSD the data type they were looking for was called "CancellationStatus" and so was not found. A SOA expert would be able to trace the point in the WSDL where the terminology changed but novices could not make the connection. The most important result of this study is the list of suggested improvements to SOAMiner as given in Appendix A.
5.4 Locating Data Type Usages
One final study explored a task that we think may be typical for users of SOAMiner: understanding usages of data types. The WSDL 1.1 specification provides wide latitude to service developers as to how data types are declared. There are three possible strategies, which may be combined within any WSDL. The data contained within a particular message may be declared:
(1) by reference in an optional <types> section to an external XML Schema document,
(2) by using the XML Schema namespace and coding XML Schema-formatted data types in the <types> section of the WSDL itself, or
(3) if only the 44 simple types in the XML Schema recommendation are used, by coding them directly in <part> tags.
Unfortunately this flexibility means that maintainers may often be faced with WSDL's written in an unfamiliar style.

For the Travel Reservation Service we imagined a scenario in which a maintainer needed to understand the data used in the input message to reserve a vehicle. A WSDL expert would know that data types are often declared in <part> tags within a <message> tag. In SOAMiner it was easy to restrict to WSDL files and to <message> tags and then search for "vehicle". This immediately finds the four matching messages (Figure 3). However it was more difficult to navigate to the <part> tag contained within the ReserveVehicleIn message. The only solution was to search for the "tag child Id", an arbitrary unique string generated during parsing. While that method works, it would be desirable to have a more intuitive way of navigating up and down the hierarchy of tags.

Once the <part> tag for ReserveVehicleIn was found, SOAMiner showed immediately that its input data type is "ota:TravelItinerary". The obvious next step was to do an unrestricted search for "TravelItinerary", but that produced thousands of hits because all tags in the file named OTA_TravelItinerary.XSD match the query! Thus the file containing the type definition was quickly identified, but the current Solr search interface does not provide any easy way to search for that specific string within that file. The Solr schema should probably be adjusted to avoid matches to file names or to give low weight to such matches.
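One possible shape for such a restricted query is sketched below in Python. The q, fq, rows and wt parameters and the select handler are standard Solr features, but everything else is an assumption made for illustration: the field names attr_values, file_name and tag_name are hypothetical and do not describe the actual SOAMiner schema, and the base URL is simply a typical local installation.

```python
from urllib.parse import urlencode


def build_query_url(base_url, text, file_name=None, tag_name=None, rows=10):
    """Build a Solr select URL that matches attribute values only.

    Field names are hypothetical. Quotes inside `text` are not escaped,
    so this sketch is only suitable for simple identifiers.
    """
    params = [
        ("q", 'attr_values:"%s"' % text),  # search attribute values, not file names
        ("rows", str(rows)),               # page size
        ("wt", "json"),                    # response format
    ]
    if file_name:
        params.append(("fq", 'file_name:"%s"' % file_name))  # filter to one file
    if tag_name:
        params.append(("fq", 'tag_name:"%s"' % tag_name))    # filter to one tag kind
    return base_url.rstrip("/") + "/select?" + urlencode(params)


url = build_query_url("http://localhost:8983/solr", "TravelItinerary",
                      file_name="OTA_TravelItinerary.XSD")
```

Because the main query names a specific field, a file whose name happens to contain the search string no longer matches on every tag, and the filter query narrows the results to the file already identified as holding the type definition.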
6 CONCLUSIONS
This report has discussed some of the problems that software maintainers may face when trying to understand the large SOA applications which are now coming into service. It also described the ongoing development of SOAMiner, a proposed search tool that users might think of as a Google for SOA. Since there is little documented experience in the maintenance of SOA applications, we do not know clearly what maintainers will need, so SOAMiner is being developed following a spiral process with repeated evaluation of prototypes.

The evaluations described in this report will be used to guide the development of the next version of SOAMiner, which we hope may then be ready for trials at S2ERC industrial affiliates. The top priorities identified for the next cycle are:
1) Provide a better user interface and a more agile setup and load procedure for indexing SOA description files into SOAMiner.
2) Redesign the panel showing SOASearch output (the right panel in Figure 3) so that it conveys more information about the tags that matched the query and the context in which those tags exist. One possibility would be to integrate with a text display that would show the query results highlighted on the original XML file.
3) Make miscellaneous cleanups to our parser and to the Solr schema to avoid searching on file names and paths and to provide better information for use in the redesigned output panel.
7 ACKNOWLEDGEMENTS
Work described in this paper was partially supported by the University of West Florida Foundation under the Nystul Eminent Scholar Endowment. We would also like to thank Dr. Arthur Baskin of IIT, a S2ERC affiliate, for his guidance.
8 REFERENCES
[AJAX:2010]
Evolvingweb's AJAX-Solr, http://github.com/evolvingweb/ajax-solr, link accessed June 2010.
[ALMOD:2010]
Almonaies, Asil A., Cordy, James A. and Dean, Thomas R., Legacy System Evolution towards Service-Oriented Architecture, International Workshop on SOA Migration and Evolution SOAME 2010, Madrid, March 2010, pp. 53-62, ISBN 978-3-00-030627-3.
[APAC:2010]
Apache Software Foundation, Apache Solr. http://lucene.apache.org/solr/ link accessed June 2010.
[APWA:2009]
APWA - American Public Works Association, MicroPAVER 6.1.5 Pavement Maintenance Management System, http://www.apwa.net/About/SIG/Micropaver/ link accessed June 2010.
[CHEN:1990]
Chen, Yih-Farn; Nishimoto, Michael; Ramamoorthy, C. V., "The C Information Abstraction System", IEEE Transactions on Software Engineering, Vol. 16, No 1, pp. 325 - 334.
[DILUC:2000]
Di Lucca, Giuseppe Antonio; Fasolino, Anna Rita; De Carlini, Ugo, "Recovering Class Diagrams from Data-Intensive Legacy Systems", Proceedings International Conference on Software Maintenance, ICSM2000, San Jose, October 2000, pp. 52 - 63.
[GALL:1991]
Gallagher, Keith B. and Lyle, James R., "Using Program Slicing in Software Maintenance", IEEE Transactions on Software Engineering, Vol. 17, No. 8, August 1991, pp. 751 - 761.
[GOLD:2004a]
Nicolas Gold, Keith Bennett, "Program Comprehension for Web Services," pp.151, 12th IEEE International Workshop on Program Comprehension (IWPC'04), 2004
[GOLD:2004b]
Nicolas Gold, Claire Knight, Andrew Mohan, Malcolm Munro, "Understanding Service-Oriented Software," IEEE Software, vol. 21, no. 2, pp. 71-77, Mar./Apr. 2004, doi:10.1109/MS.2004.1270766.
[GUEH:2006]
Gueheneuc, Y.-G.; Mens, K.; Wuyts, R., "A comparative framework for design recovery tools," Proceedings of the 10th European Conference on Software Maintenance and Reengineering, 2006. CSMR 2006, pp.123 -134, March 2006, doi: 10.1109/CSMR.2006.1.
[KAJK:2004]
Mira Kajko-Mattsson, "Evolution and Maintenance of Web Service Applications," pp.492-493, 20th IEEE International Conference on Software Maintenance (ICSM'04), 2004.
[KAJK:2005]
Mira Kajko-Mattsson, Michal Tepczynski, "A Framework for the Evolution and Maintenance of Web Services," pp.665-668, 21st IEEE International Conference on Software Maintenance (ICSM'05), 2005.
[KOEN:1991]
Koenemann, J. and Robertson, S. P. 1991. Expert problem solving strategies for program comprehension. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems: Reaching Through Technology (New Orleans, Louisiana, United States, April 27 - May 02, 1991). S. P. Robertson, G. M. Olson, and J. S. Olson, Eds. CHI '91. ACM, New York, NY, 125-130. DOI= http://doi.acm.org/10.1145/108844.108863
[KONT:2008]
Kostas Kontogiannis, Challenges and Opportunities Related to the Design, Deployment and Operation of Web Services, Frontiers of Software Maintenance (FoSM) 2008, Beijing, Sept. - Oct. 2008, pp. 11-20.
[KOVAL:2008]
Anastasia Koval, Understanding the Travel Reservation Service, http://netbeans.org/kb/61/soa/understand-trs.html, link accessed June, 2010.
[LEWIS:2008]
Lewis, G. A. and Smith, D. B., Service-Oriented Architecture and its implications for software maintenance and evolution, Frontiers of Software Maintenance (FoSM) 2008, Beijing, Sept. - Oct. 2008, pp 1-10.
[LINO:1994]
Linos, Panagiotis; Aubet, Philippe; Dumas, Laurent; Helleboid, Yann; Lejeune, Patricia; Tulula, Philippe, "Visualizing Program Dependencies: An Experimental Study" Software - Practice and Experience, Vol. 24, No. 4, April 1994, pp. 387 - 403.
[MURPH:1997]
Murphy, Gail and Notkin, David, "Reengineering with Reflexion Models: A Case Study", IEEE Computer, Vol. 30, No. 8, August 1997, pp. 29 - 36.
[OPEN:2004]
The Open Group, grep - The Open Group Base Specifications Issue 6, http://www.opengroup.org/onlinepubs/009695399/utilities/grep.html, link accessed June, 2010.
[PANCH:2007]
Oleksandr Panchenko: Concept Location and Program Comprehension in Service-Oriented Software. 23rd IEEE International Conference on Software Maintenance (ICSM 2007), October 2-5, 2007, Paris, France, ICSM 2007: 513-514
[QUEI:1994]
Queille, J.-P.; Voidrot, J.-F.; Wilde, N.; Munro, M., "The Impact Analysis Task in Software Maintenance: a Model and a Case Study", Proc. IEEE International Conference on Software Maintenance - 1994, Victoria, Canada, September 1994, pp. 234 - 242.
[SMIL:2009]
David Smiley and Eric Pugh, Solr 1.4 Enterprise Search Server, Packt Publishing Ltd., Birmingham UK, 2009, ISBN 978-1-847195-88-3.
[VONM:1994]
von Mayrhauser, A. and Vans, A. M., Dynamic Code Cognition Behaviors for Large Scale Code, Proceedings Third Workshop on Program Comprehension, November 14-15, 1994, Washington, DC, IEEE Computer Society, pp. 74-81.
[WILDE:2007]
Norman Wilde, Dennis Edwards, Sharon Simmons, "Software Reconnaissance: Experiences with a Simple Requirements Traceability Technique", International Symposium on Grand Challenges in Traceability, TEFSE/GCT’07, March 22-23, 2007, Lexington, KY, USA, pp. 103 - 107.
[WILDE:2008]
Wilde, N., Simmons, S., Pressel, M., and Vandeville, J. 2008. Understanding features in SOA: some experiences from distributed systems. In Proceedings of the 2nd international Workshop on Systems Development in SOA Environments (Leipzig, Germany, May 11 - 11, 2008). SDSOA '08. ACM, New York, NY, 59-62. DOI= http://doi.acm.org/10.1145/1370916.1370931
APPENDIX A - RESULTS OF THE BASIC MAINTENANCE SCENARIO STUDY
Observations from Scenario 1
Both participants, one using grep and one using SOAMiner, were able to correctly answer questions related to the maintenance of the hotel reservation system. The participant using grep answered the questions within 15 minutes. The participant using SOAMiner answered the questions within 25 minutes. The observer attributed the difference in time to the setup time for SOAMiner. Both users retraced steps in their attempt to derive the sought-after information.
The participant using SOAMiner remarked that copying all of the SOA description files into a special directory, clearing the index, and typing the SOAParser command n times, once for each of n files, was tedious. The participant using SOAMiner initially forgot to load the index, but fairly quickly realized the error when the system did not behave as expected. This participant performed partial loads of the files needed to answer the first question and then performed a second load later when additional files were needed for the second question. At one point the participant using SOAMiner did not realize that he had the answer in the right panel displayed in SOAMiner.
The lack of line numbers in the display provided by SOAMiner necessitated the supplemental use of a text editor to answer some of the maintenance scenario questions. SOAMiner does not integrate access to an editor that displays line numbers.
Comments from the participant using SOAMiner were that an easier way to load each file into the index is needed, that line numbers are needed, and that he liked the left panel browser with cloud tags. The participant using SOAMiner got to the appropriate location within the file structure fairly quickly once the files were loaded.
The participant using grep entered many irrelevant queries before arriving at answers.
Comments from the participant using grep were that grep was fast and provided good support once he knew the commands; however, he did not like having to use the command line interface and had difficulty keeping track of his location within the file structure.
Observations from Scenario 2
The second scenario was more challenging for each of the participants than the first scenario. Neither participant was able to correctly answer all of the questions regarding the maintenance scenario. Both participants had only a novice level of familiarity with BPELs, WSDLs, and XSDs, which caused difficulty for them in finding answers; the tools did not substitute for lack of background knowledge.
The participant using grep also made extensive use of the vi text editor; however, he did not understand the WSDL, BPEL, and XSD relationships well enough to navigate within and between files to find answers. He eventually decided the answer had to be in the XSD and examined that file in the vi editor; however, name changes within those files stumped him.
The participant using grep entered many mistyped and irrelevant commands and never used a text editor. The SOAMiner participant spent 11 of 36 minutes creating the index. He also performed a partial load of needed files.
Comments from the participant using SOAMiner were that the interface is easy to use and that options in the left panel browser (e.g., filter capability) were very useful. However, deriving the desired meaning from the results was the most difficult part. The tool was easy to use, like other web searches, but did not provide enough information in the results.
Conclusions - Suggested Improvements:
1) The amount of time that both participants spent setting up and loading files into SOAMiner made it evident that improving the way users set up and load files into SOAMiner is a priority. The new setup and load procedures should accommodate multiple load use cases.
2) The SOAMiner user interface is not as intuitive as desired. The capability to undo/redo activities would improve support for the natural thought process of users, as demonstrated by both participants' attempts to retrace their steps as they progressed through the scenarios. The integration of a text editor that can be launched from the SOAMiner interface, possibly when clicking on links associated with files, and the display of files with line numbers would better support users.
3) Enhancing SOAMiner with the means to represent domain knowledge about WSDL, BPEL, XSD and the relationships among these would be beneficial for users, especially novice users.
4) Incorporating a built-in HELP system for SOAMiner, possibly one where a user can hover over tags and get information about the meaning of the tag, would be one way to address the participant remarks about deriving meaning from the SOAMiner output.
5) A problem with text-based tools is that different labels may be used for the same concept. In this case the WSDL contained strings such as CancelVehicleOut whereas in the XSD the term was CancellationStatus.
6) Further testing with users who are more familiar with the XML vocabularies involved is needed to evaluate the usability of SOAMiner for more experienced maintainers.
Future Work
Two issues arose from the evaluation of these results that warrant further work. First, we need to determine the effect of loading the same files into SOAMiner multiple times. Second, we need to look at configuration issues such as case sensitivity, partial string match, and explicit namespace qualifiers to see how these are best handled in SOA maintenance scenarios.
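The configuration issues named above (case sensitivity, partial string match, and namespace qualifiers) and the label mismatch noted in item 5 can be illustrated with a small Python sketch. It does not describe how SOAMiner or Solr handle matching; the function names are hypothetical and the `tns:` prefix is an invented example. Each identifier is stripped of any namespace prefix, split at camel-case boundaries, and lowercased, so that a prefix query can match related labels.

```python
import re


def local_name(qname):
    """Drop an XML namespace prefix, e.g. 'tns:Foo' -> 'Foo'."""
    return qname.rsplit(":", 1)[-1]


def tokens(identifier):
    """Split a camel-case identifier into lowercase word tokens."""
    words = re.findall(
        r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", local_name(identifier)
    )
    return [w.lower() for w in words]


def prefix_match(query, identifier):
    """True if any token of the identifier starts with the query.

    The comparison ignores case and any namespace qualifier.
    """
    q = query.lower()
    return any(t.startswith(q) for t in tokens(identifier))


if __name__ == "__main__":
    # Labels taken from the scenario; the 'tns:' qualifier is assumed.
    for label in ("tns:CancelVehicleOut", "CancellationStatus"):
        print(label, tokens(label), prefix_match("cancel", label))
```

Under these assumptions a query for `cancel` matches both `CancelVehicleOut` and `CancellationStatus`, whereas an exact, case-sensitive search for either label misses the other. Whether such loose matching helps or produces too many irrelevant hits is the kind of question the proposed configuration study would need to answer.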