Proceedings
Seventh Workshop on Empirical Studies of Software Maintenance 9 November 2001 Florence, Italy Sponsored by:
IEEE Computer Society's Technical Council on Software Engineering (TCSE)
In collaboration with:
Università degli Studi del Sannio – Benevento - Italy
Table of contents
Seventh Workshop on Empirical Studies of Software Maintenance, 9 November 2001, Florence, Italy

Table of contents ... III
Conference Committee ... V

Maintainability, Quality ... 1
An Experimental Infrastructure for Evaluating Failure Analysis Techniques for Released Software, S. Elbaum ... 3
Ambiguity identification and resolution in software development: a linguistic approach to improving the quality of systems, L. Mich ... 7
Automatable Integrations for the Reuse of Concurrent Specifications, A. Santone, G. Vaglini ... 12
Auditing Software Maintenance: Studying the Effort of Euro Integration in a Banking Environment, C. Simón, A. B. Ferro, M. B. Alvarez ... 17
Software Code Mining: Change Impact Discovery in Distributed Multi-language Source Code, N. Melab, M. Bouneffa, L. Deruelle, H. Basson ... 23

Distributed Software ... 27
Web Site Maintainability, N. F. Schneidewind ... 29
Seven Challenges for Research on the Maintenance and Evolution of Widely Distributed Software, J. M. Bieman ... 31
Geographically Distributed Software Engineering and Maintenance, a Challenge for Code Analysis and Empirical Studies, P. Tonella ... 35
Evolutionary Web-Site Development for a Sport Retail Shop in Munich, S. H. Sneed ... 39

Software Process ... 45
Toward Distributed GQM, A. Bianchi, D. Caivano, F. Lanubile, F. Rago, G. Visaggio ... 47
Empirical Perspectives on Maintaining De-localized Software Systems using Web based Tools, V. Balaji, S. Balaji ... 52
Management of a Distributed Testing Process using Workflow technologies: a Case Study, B. Copstein, F. M. de Oliveira ... 62
Distributed and Colocated Projects: a Comparison, A. Bianchi, D. Caivano, F. Lanubile, F. Rago, G. Visaggio ... 65
EPiCS: Evolution Phenomenology in Component-intensive Software, M. M. Lehman, J. F. Ramil ... 70

Code Analysis, Metrics ... 75
Measuring and Predicting the Linux Kernel Evolution, F. Caprio, G. Casazza, M. Di Penta, U. Villano ... 77
Analyzing Programs via Decomposition Slicing: Initial Data and Observations, K. Gallagher, L. O'Brien ... 84
Improving Corrective Maintenance Effort Prediction: An Empirical Study, A. De Lucia, A. Persico, E. Pompella, S. Stefanucci ... 97
Empirical Validation of Project-Independency of Quantitative Relationship between Program Fault Density and Complexity Metrics, R. Takahashi, Y. Nakamura, Y. Muraoka, S. Ikehara ... 101
Clone Analysis in the Web Era: an Approach to Identify Cloned Web Pages, G. Di Lucca, M. Di Penta, A. R. Fasolino, P. Granato ... 107
Maintenance Project Assessment Using Fuzzy Function Point Analysis, O. de Souza Lima Jr., P. P. Muniz Farias, A. Dias Belchior ... 114

Author index ... 123
Conference Committee
Chair: Giuliano Antoniol (University of Sannio - Italy) -
[email protected]
Organizing Committee:
Lionel Briand (Carleton University - Canada) - [email protected]
Paolo Nesi (University of Florence - Italy) - [email protected]
George Stark (IBM Global Services - USA) - [email protected]
Norman F. Schneidewind (Naval Postgraduate School - USA) - [email protected]
Martin Shepperd (Bournemouth University - UK) - [email protected]
Program Committee:
Anneliese Amschler Andrews (Colorado State University, USA)
Ira Baxter (Semantic Designs, USA)
Jim Bieman (Colorado State University, USA)
Shawn Bohner (Metagroup, USA)
Gerardo Casazza (University of Naples, Italy)
Ned Chapin (InfoSci Inc., USA)
Andrea De Lucia (University of Sannio, Italy)
Sebastian Elbaum (University of Nebraska, USA)
Khaled El Emam (NRC, Ottawa, Canada)
Alessandro Fantechi (University of Florence, Department of Systems and Informatics, Italy)
Filippo Lanubile (University of Bari, Italy)
Manny M. Lehman (Imperial College, UK)
Mark Harman (Brunel University, UK)
Warren Harrison (Portland State University, USA)
Taghi M. Khoshgoftaar (Florida Atlantic University, USA)
Kostas Kontogiannis (University of Waterloo, Canada)
Ross Jeffery (UNSW, AU)
Ettore M. Merlo (École Polytechnique de Montréal, Canada)
Sandro Morasca (Politecnico di Milano, Italy)
Colin Potts (Georgia Institute of Technology, USA)
Francesco Rago (EDS Italia Software, Italy)
Juan F. Ramil (Imperial College, UK)
Marc Roper (University of Southampton, UK)
Harry Sneed (DE)
Paolo Tonella (ITC-Irst, Italy)
Colin Tully (European Software Process Improvement Foundation, UK)
Giuseppe Visaggio (University of Bari, Italy)
Claes Wohlin (Dept. of Software Engineering and Computer Science, Blekinge Institute of Technology, SE)
Proceedings editor: Massimiliano Di Penta (University of Sannio, Italy) –
[email protected]
Maintainability, Quality
An Experimental Infrastructure for Evaluating Failure Analysis Techniques for Released Software Sebastian Elbaum Department of Computer Science and Engineering University of Nebraska - Lincoln Lincoln, Nebraska
[email protected] ABSTRACT
Learning from software failures is an essential step in the development of more reliable software systems. In that regard, failure analysis techniques attempt to assist in the investigation of software failures by providing information about the circumstances leading to the failure, or about the state of the target system after the failure occurred. In spite of the numerous failure analysis techniques available, most of them have been designed to work and assessed within the confines of the organization. This paper reports on our effort to design and develop an experimental infrastructure to evaluate and compare failure analysis techniques on released software.

1. INTRODUCTION

Learning from software failures is an essential step towards the development of more reliable software systems and processes. However, as more intricate software systems are developed, determining the nature and causes of a software failure becomes a greater challenge. Today, software systems have so many different layers and components that it is even hard to determine what parts are failing. As an example, consider the situation presented by web based systems, where failures can occur due to a fault in a user space application, in the web server, in the database, in the connection, in the particular browser, or in the version of the particular browser.

Failure analysis techniques (and tools) attempt to assist in the identification of the faulty sections of code and the characterization of the system's state that led to the failure. Several techniques and tools have been developed to assist and accelerate the analysis of failures, and the identification of the faulty sections of code. Perhaps the simplest techniques are memory transfers, which provide a snapshot of the memory and registers used by the application (e.g., core dumps in Unix systems). The idea of having sequential snapshots led to the creation of traces [1], which provide a continuous information flow of a certain type of event. Slicing techniques [10] constitute a more specialized group, that attempt to identify the sections of code associated with a specific variable or event. Replay architectures [8] constitute another group of techniques that provide an environment to exactly reproduce the activities that led to the failure. There are also specialized techniques, such as the ones for distributed systems monitoring and debugging that incorporate synchronization mechanisms [9].

Although the previous enumeration of failure analysis techniques is not complete, it provides an idea of the large number and the variety of techniques available. However, most of those techniques have been designed to work (and evaluated) within the relatively controlled environment provided by the organization. To our knowledge, no formal assessment has been performed on those failure analysis techniques after the software has been released. Furthermore, no specialized technique has been developed to assist in the analysis of software failures after release 1.

1 There are instances when existing tools for failure analysis are adapted to work with a released product. Netscape Communicator's "software quality agent" is an example [5].

We conjecture that gaining any knowledge about failures generated outside the controlled perimeter of the software organization is extremely valuable. Just for illustration purposes, consider the reported Jet Propulsion Laboratory cost of $10K per incident when a fault was caught in testing [3]. Now, assuming Boehm's studies [2] hold for the JPL case, failure costs can increase from 40 up to 1000 times if the fault is not found until after the software has been released. That means that the cost per field failure in this example could range from $400K to $10,000K. If the circumstances that lead to the failure are not understood, the proper corrections and fixes are unlikely to be made, and the failure is likely to occur again in other instances of the system (e.g., the next version of the Mars Lander). The fact that costly software failures often repeat themselves [6] should be an indicator of the possible benefits of learning from software failures.

These possible benefits become more obvious when working on systems with potentially costly software failures (from avionics and medical systems, to credit card transaction managers and web servers running shopping carts), where the cost of repeating the same failure is not acceptable. But it is also relevant for organizations that rely heavily on beta-testing and alpha-testing, where it is necessary to understand and take advantage of the large number of failures occurring outside the organization's controlled environment.

Our claim is that failure analysis techniques on released software are needed to decrease the probability of repeatedly suffering the same type of failure in the future, and to increase the efficiency of the testing and debugging process. This claim has several implications. First, we may need to re-evaluate the existing techniques taking into consideration a broader assessment spectrum, since the tradeoffs involved in a pre-release versus a post-release setting may be different. For example, techniques used in pre-release can be very intrusive in order to provide additional information, while techniques working on post-release may need to employ lighter monitoring mechanisms in order to be coupled with a product that needs high performance. Second, the more careful evaluation may lead to the creation of specialized techniques that take advantage of additional
sources of information that are not available in the company environment. For example, a technique described in [4] only triggered a trace when the user usage profile departed from common execution patterns. Other techniques may capture information from additional systems that interact with the application. For example, [5] presents a mechanism that captures the names of all dynamic libraries and device drivers executing in the system at the time of the failure. Third, with the existence of so many techniques, we may need to be able to recommend the use of a certain type of failure analysis technique on a target scenario based, for example, on the failures that have been historically observed, the application domain, and the average cost of a failure.

All these investigation opportunities on failure analysis techniques require some form of empirical work that needs to be defined, designed, and implemented. From that perspective, this paper reports on the design and construction of an experimental infrastructure to objectively study and assess failure analysis techniques along several dimensions. We envision an infrastructure that will not only allow the evaluation of existing techniques, but will also constitute a platform to facilitate the investigation of new approaches to software failure analysis.

2. EXPERIMENTAL INFRASTRUCTURE

In the next three sections, we discuss the artifacts, metrics and techniques that constitute the foundations of the experimental infrastructure.

2.1 Artifacts

In order to study failure analysis techniques, we began by gathering the artifacts required by current techniques. We have determined that a set of minimal requirements includes the target application and a failure scenario.

The target application can be represented by a group of binary, object, or source code files. Obviously, the type of file affects the type of instrumentation performed and the class of data that is collected. To this point, we have focused mainly on open-source applications, which freely provide several versions of the source code, build scripts, some documentation, and bug lists from which failure scenarios can be derived. Currently, we have completed the preparation of 4 open-source applications (bash, gzip, flex, grep), and we are in the process of finishing 5 more 2. The preparation procedure includes the setup of the application files in a standard format [7], instrumentation at the statement, branch, and function level, modification of configuration and makefiles to obtain core dumps and traces, and a description of the environment (processor, operating system, additional applications and threads running). We are also maximizing the investment in each application by employing multiple versions when they are available.

Each failure scenario includes two elements. First, a combination of system state and input sequence that leads to an observable system failure. This allows the experimenter to reproduce the failure at will when the input sequence is provided to the system in the specified state. Second, evidence that points to the root cause of the failure. With one of our artifacts, a complete failure report constitutes the evidence. However, we have been faced with more difficult cases where we have had to determine what sections of code were changed in response to the failure and develop our own tests for it. Getting a handle on failure scenarios has not been (and will not become) an easy task. Nevertheless, we have been able to take advantage of existing information such as the one provided in the bug lists of open-source applications, and of known techniques of fault seeding to provide the complete artifacts [7].

After the existing techniques are investigated, we plan to extend the information requirements for each application to account for the specific needs of failure analysis techniques that focus on released software. For example, we would be interested in knowing the constraints imposed on the application after it is released, such as expected performance or security concerns that limit the type of technique that could be employed. We would also be interested in recording data associated with the failure analysis effort performed by the organization personnel to check the consistency of our experiments, and the degree of success of their approaches in discovering the causes of those failures.

2 This work is being performed in collaboration with Oregon State University.

2.2 Metrics

The set of metrics we have developed captures four aspects of failure analysis techniques:

Effectiveness:
(1) Failure Reproduction: % of failures that were reproduced by the technique.
(2) Root cause identification: % of failures for which the root cause is identified. Note that although the root cause of a failure might be identified at different levels (e.g., faulty implementation of a block of statements, specific group of ambiguous requirements), initially we reduced our expectations to a section of code.

Efficiency:
(1) Failure Reproduction: time required to reproduce the failure. This measure has two components. The first component is the time it takes for a person to find the root cause. The second component is the time it requires for the tool to operate, interact with the user, and analyze the data (tool's overhead). These two metrics will let us quantify the overall efficiency of an approach, but also compare some of the tradeoffs between the level of assistance versus the delays imposed by the additional services provided by the technique.
(2) Root cause identification: time required to find the root cause of the failure.

Accuracy: This metric represents the scope provided by the technique on an ordinal scale initially composed of {block, function, class, file}. We expect that techniques providing higher accuracy will have higher computational complexity, while techniques providing lower accuracy will be less expensive and probably less helpful in locating the root cause.

Computational complexity: Three metrics will capture the technique's execution requirements to determine the transparency with which the technique operates. Assuming an application P, and its instrumented version Pi generated using a certain technique, we compute the ratios of Pi over P for: size of the executable, maximum bytes of memory required during execution, and bytes of hard storage required during execution.
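For illustration, the following is a minimal sketch (our own, with assumed record fields rather than anything prescribed by the infrastructure) of how the effectiveness percentages and the Pi over P computational-complexity ratios could be computed from recorded trial data:

from dataclasses import dataclass

@dataclass
class TrialRecord:
    reproduced: bool          # failure reproduced by the technique?
    root_cause_found: bool    # root cause identified?
    human_minutes: float      # analyst time spent on the scenario
    tool_minutes: float       # tool overhead (instrumentation, analysis)

def effectiveness(trials):
    # Percentages over all recorded failure scenarios.
    n = len(trials)
    return {
        "failure_reproduction_pct": 100.0 * sum(t.reproduced for t in trials) / n,
        "root_cause_pct": 100.0 * sum(t.root_cause_found for t in trials) / n,
    }

def complexity_ratios(p_size, pi_size, p_mem, pi_mem, p_disk, pi_disk):
    # Ratios of the instrumented program Pi over the original P, as in Section 2.2.
    return {
        "executable_size": pi_size / p_size,
        "max_memory": pi_mem / p_mem,
        "hard_storage": pi_disk / p_disk,
    }

if __name__ == "__main__":
    trials = [TrialRecord(True, True, 42.0, 3.5), TrialRecord(True, False, 90.0, 3.5)]
    print(effectiveness(trials))
    print(complexity_ratios(1.0e6, 1.4e6, 32e6, 48e6, 5e6, 60e6))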
Figure 1: Experimental Procedure.
2.3 Techniques
Each one of the failure analysis techniques constitutes a treatment that will be applied to each viable failure scenario. Since the number of techniques is quite large, we have started to build a taxonomy for the techniques we want to evaluate (existing ones like [peng93] are well suited for selecting a validation technique, but they are too broad for our goals of failure analysis technique assessment). Our taxonomy classifies techniques based on the model employed to represent the system states and events, the type of programming language they work on, and the type of input they require. We plan to analyze techniques within groups to assess each failure analysis technique against its own type (e.g., trace of method invocations versus trace of context switches), but we will also analyze techniques across certain groups that present interesting tradeoffs (e.g., snapshots from binaries versus snapshots from source code) and tools presenting a combination of different approaches. Given the variety of techniques and their associated instrumentation strategies, we do not provide application instrumentation facilities within the infrastructure. However, we are developing a logging protocol to standardize the data generated by the instrumented application when a failure is induced. The protocol will ensure that all the techniques follow the same reporting format, independent of the type of data provided. This will facilitate the use of the data in the later part of the experiments.
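As an illustration of the idea behind such a logging protocol, here is a minimal Python sketch (the envelope and field names are assumptions of ours, not the actual protocol) of a standardized failure report that wraps technique-specific data in a common format:

import json
import platform
import time

def failure_report(app, version, technique, scenario_id, payload):
    """Wrap technique-specific data (trace, core-dump path, slice, ...)
    in a common envelope; 'payload' is whatever the technique produces."""
    return {
        "application": app,
        "version": version,
        "technique": technique,
        "scenario": scenario_id,
        "timestamp": time.time(),
        "environment": {
            "platform": platform.platform(),
            "python": platform.python_version(),
        },
        "data": payload,
    }

if __name__ == "__main__":
    report = failure_report("grep", "2.4", "statement-trace", "bug-017",
                            {"trace_file": "grep-017.trc", "exit_signal": 11})
    print(json.dumps(report, indent=2))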
3. EXPERIMENTAL MODEL

To illustrate the application and usage of the infrastructure, we present a generic experiment that compares a group of failure analysis techniques. Our null hypothesis states that there is no significant statistical difference among a group of M failure analysis techniques according to the metrics presented in Section 2.2.

In order to evaluate that hypothesis, we follow the steps described in Figure 1. First, N viable applications and some of their associated failure scenarios are selected (note that different applications can – and very likely will – have different numbers of failure scenarios). A viable application is one that provides all the components for the technique to be implementable (e.g., a technique that works by instrumenting source code is not viable if source code is not available). Second, the selected applications receive a combination of the M treatments (failure analysis techniques). Receiving a treatment means that each application is prepared according to the specifications of each technique. Third, each failure scenario is induced on each treated application. The data generated by each technique for each one of the failure scenarios is recorded. Fourth, the data recorded is randomly distributed among subjects following a certain experimental design. Fifth, the subjects employ the technique's data and its post-failure mechanisms (when available) to reproduce the failure and find its root cause. Sixth, the metrics collected during the execution of the second and fifth steps are analyzed and interpreted.

Each experiment will have at least two groups of confounding factors that need to be controlled. The first factor is made of the application domain and failure scenarios. By having a large number and diversity of applications and failure scenarios we diminish the chance that our results are biased toward a certain type of application or failure scenario. Also, if we classify the applications and failure scenarios, the experiments could focus on certain groups, making the experiment less general but more powerful.

The subjects employing the techniques constitute the second confounding factor. The experiments will need to clearly define the subjects' expertise, familiarity with the application, and level of practice with the technique. Failure to control this second factor might compromise the experiment's results and interpretation.

We expect to find several additional roadblocks when performing
the experiments just described. As mentioned, finding applications and failure scenarios that are representative of what is observed in an industrial environment constitutes a major challenge. But even when we find representative scenarios, they sometimes turn out to be too complex or cumbersome to handle in a controlled experiment. For example, we have an industrial application where finding the root cause took weeks for its own developers. Performing a similar task in a controlled experiment by subjects that are not familiar with the application is just too expensive. Careful examination of acceptable compromises is necessary.
4. FINAL REMARKS
Learning from software failures constitutes an essential step towards the development of more reliable software. Although many software failures are observed after release, the lessons that can be learned from those failures are commonly lost due in part to the lack of specific post-release failure analysis techniques. This research effort is aimed at constructing the necessary infrastructure to assess current failure analysis techniques on released software, and to facilitate the development of new techniques. We are interested in obtaining feedback from the workshop participants regarding the infrastructure requirements, design and possible challenges.
Acknowledgements This work was supported in part by NSF Award CCR-0080898. I would like to thank S. Goddard and G. Rothermel for their helpful comments.
5. REFERENCES
[1] T. Ball and J. Larus. Optimally Profiling and Tracing Programs. In ACM SigPlan-SigSoft, pages 59–70, Aug. 1992.
[2] B. Boehm. Software Engineering Economics. Prentice Hall, 1981.
[3] M. Bush. Improving Software Quality: The Use of Formal Inspections at the Jet Propulsion Laboratory. In Proc. 12th International Conf. Soft. Eng., pages 196–199, June 1990.
[4] S. G. Elbaum. Conceptual framework for a software black box. Ph.D. dissertation, University of Idaho, July 1999.
[5] Netscape. Netscape quality feedback system. home.netscape.com/communicator/navigator/v4.5/qfs1.html, Aug. 2000.
[6] P. G. Neumann. Computer Related Risks. Addison-Wesley Publishing Company, New York, NY, 1995.
[7] G. Rothermel and S. Elbaum. Handbook of subject preparation. http://mapstext.unl.edu-Infrastructure, Jan. 2001.
[8] M. Shapiro. A System for Incremental Replay Debugging. Technical Report CS9712, Brown University, July 1997.
[9] J. P. Tsai and S. H. Yang. Monitoring and Debugging of Distributed Real-Time Systems. IEEE Computer Society Press, Los Alamitos, CA, 1995.
[10] M. Weiser. Programmers use slices when debugging. Communications of the ACM, 25(7):446–452, July 1982.
Ambiguity identification and resolution in software development: a linguistic approach to improving the quality of systems Luisa Mich Department of Computer and Management Sciences University of Trento, Via Inama 5, 38100 Trento – Italy
[email protected]
Index terms Quality of software artifacts, Ambiguity Measures, Linguistic tools, Natural Language Processing Systems
I. INTRODUCTION

Ambiguity, understood as the possibility to interpret words or phrases in different ways, constitutes a serious problem in software development. When considering ambiguity, a first point to stress is that there are various types of ambiguity [5], [15]. Words usually have different meanings. And sometimes, the same word can be a noun, or a verb, or an adjective. As a result, sentences can have different interpretations. To take into account the various levels of ambiguity, we have introduced a family of ambiguity measures. Then, to evaluate them we have proposed the use of techniques and tools developed in the realm of natural language processing (NLP) [8]. A preliminary investigation of their applicability and effectiveness using linguistic instruments focused on requirements analysis [7].

There are already numerous research projects, proposals and ideas regarding the use of linguistic tools in software engineering, and their number continues to grow alongside progress in the area of natural language processing. They range from methods that facilitate the use of natural language in writing specifications (see for example [13]) to projects for the development of conceptual models.1 As regards requirements analysis, non-ambiguity is one of the quality characteristics explicitly called for in all the quality models for requirements specification (see for example, [2], [3], [6] or [12]). We argue here that ambiguity affects a number of artifacts related to the development of software systems. Moreover, it influences other relevant characteristics related to the quality of software systems. Referring to the six categories of the ISO 9126 quality model [4], it mainly concerns usability (for the subcategories understandability and learnability) and maintainability (for the subcategory analyzability).

Our present objective is to extend the application of the ambiguity measures to different kinds of artifacts produced in software development, namely:
- requirement documents: ambiguity in descriptions of requirements may cause serious problems; we can assume that the quality of the requirements documents in natural language is related to the quality of the conceptual models;
- models: in entity-relationship diagrams or in the models foreseen by the Unified Modeling Language (UML)2 - like class models, use case models etc. - the different items (e.g., entities, relationships, classes, attributes, actors, etc.) must be named according to the problem domain;
- code: independently of the adopted programming language, within the code we have names for variables, parameters, methods, etc., and comments; both the names and the sentences should work to make comprehension easier and therefore support software maintenance;
- interfaces, web sites: their usability depends also on the names of commands and links, whose meaning should be non-ambiguous; a common test is to verify if the terms used have meaning(s) outside the present context [11];
- language-customized versions of software: the development of applications for different countries or for distributed systems often requires translation into different languages. This leads to the problem of ensuring the same level of quality among the different versions;
- documentation, user manuals: they should be as clear as possible; regarding the influence of ambiguity on product maintenance and servicing, we can cite the experience of an auto manufacturer who found out that the estimated time for repair was often not respected, because the documents given to the local dealers contained a lot of ambiguities, on different levels;3
- legal documents: a thorough study of ambiguity will also include a review of the legal documents belonging to a software project: contracts, for example, which should be free from ambiguity.4

Finally, to complete the picture of the influence of ambiguity in software development, the quality of all the above cited items can influence the communication both with the users/customers and the team members. Regarding this subject, we have found evidence of comprehension problems in the use of a web site design and evaluation model [9]: the ambiguity inherent in some of the terms and attributes used in the model led users with different views or backgrounds to have different interpretations.

1 For a critical introduction see [14]. For an introductory text on the topic see [1]. A bibliography of projects related to the definition and modelling of requirements is given at www.nl-oops.cs.unitn.it.
2 Adopted as standard by the Object Management Group (OMG) in 1997, http://www.omg.org.
3 Oral communication.
4 See for example the projects of the SRI Centre in Cambridge (http://www.cam.sri.com/) or, in Italy, the work of the ILC, Istituto di Linguistica Computazionale of the CNR (http://www.ilc.pi.cnr.it/).

II. AMBIGUITY MEASURES

Words and sentences in natural language may correspond to a vast number of meanings. As regards individual words, there are those that can represent different senses or which can be used as both a verb and a noun (part of speech ambiguity). For example, in English a 'bank' is a financial institution or the edge of a river, and there is also the verb 'to bank'. At the phrase or sentence5 level there may be semantic ambiguity due to the presence of ambiguous words, and syntactic or structural ambiguity due to the possibility of connecting the components of the phrase in different ways ('I saw the man in the park' is ambiguous if we have to decide who was in the park). Pragmatic ambiguities represent the third level of ambiguity, and are more difficult to detect and resolve because they concern relations more than content. In our approach, we focus on the first two levels of ambiguity:6
- semantic ambiguity: concerning the meaning of a word or phrase;
- syntactic ambiguity: concerning the various roles performed by words in sentences and possible grammatical constructions.

Another aspect to consider is the role played by the context, which may influence the understanding of a phrase positively or negatively.7 For the purposes of our studies, we assume that, regardless of the context, by reducing the ambiguity of words and phrases we can improve the quality of the software artifacts. The measures of ambiguity introduced in [8] were given a general definition without reference to a particular linguistic instrument or natural language system.

5 In linguistics, a phrase is a complete sub-sentence unit.
6 For a more in-depth study of the definition and classification of ambiguity see [15].
7 For example, the phrase "Chocolate, I think." is incomprehensible apart from the contextual question: "What flavour of ice-cream does she like?".

At the level of individual words we can talk of lexical ambiguity. Given a word wi, it can correspond to ni different concepts or meanings, which we denote with mij, and rij denotes the possible syntactic roles (noun, adjective, verb, etc.). Finally, the meanings of a word are used with different frequencies νij. So with each word we can associate a set of triples containing meaning, role and frequency:

wi ≡ {<mij, rij, νij> | j = 1,…, ni}

We introduced the following measures of lexical ambiguity:
- Semantic ambiguity: function of the number of possible meanings. α(wi) = f(ni)
- Weighted semantic ambiguity: function of the number of possible meanings weighted according to their frequency. α*(wi) = f(ni, νi)
- Syntactic ambiguity: function of the number of possible syntactic roles. β(wi) = f(n(ri))
- Weighted syntactic ambiguity: function of the number of possible roles weighted according to their frequency. β*(wi) = f(ri, νi)

In a similar way, for a phrase or sentence sk, its meanings depend on the ambiguity of the words contained in it, given by the function γ(αkl); thus, for a sentence sk we can get more than one parsing tree tkl; because parsing trees usually require a different (cognitive) effort, we can assign a penalty pkl to each of them. In this way we obtain triples of this kind:

sk ≡ {<tkl, γ(αkl), pkl> | l = 1,…, nk}

For the ambiguity of a phrase or sentence we defined the following measures:
- Semantic ambiguity: takes into account the ambiguity of the words in the phrase. γ(sk) = f(g(αkl))
- Weighted semantic ambiguity: takes into account the weighted semantic ambiguity of the words. γ*(sk) = f(g(α*kl))
- Syntactic ambiguity: function of the number of possible parsing trees. δ(sk) = f(nk)
- Weighted syntactic ambiguity: function of the number of possible parsing trees and the penalty associated with them. δ*(sk) = f(nk, pk)
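As an illustration, the lexical measures can be approximated with a general-purpose lexical database such as WordNet, which is also used in the applications reported below. The following Python sketch is our own approximation, not the measures' official implementation: it assumes NLTK and its WordNet corpus are installed, counts senses for α(wi) and parts of speech for β(wi), and takes a plain sum for the unspecified functions f and g.

from nltk.corpus import wordnet as wn   # requires: pip install nltk; nltk.download('wordnet')

def alpha(word):
    """Semantic ambiguity: number of WordNet senses (synsets) of the word."""
    return len(wn.synsets(word))

def beta(word):
    """Syntactic ambiguity: number of distinct parts of speech among the senses."""
    return len({s.pos() for s in wn.synsets(word)})

def sentence_semantic_ambiguity(words):
    # The paper leaves f and g unspecified; a plain sum over the words is assumed here.
    return sum(alpha(w) for w in words)

if __name__ == "__main__":
    for w in ["bank", "video", "rental"]:
        print(w, "alpha =", alpha(w), "beta =", beta(w))
    print("gamma-like score:", sentence_semantic_ambiguity(["customer", "select", "video"]))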
III. APPLICATIONS

If we want to use linguistic tools to identify and to evaluate ambiguity in software artifacts we have to choose suitable instruments; the main parameters we have to take into consideration are:
- the type of language that the linguistic tool can analyse (English, Italian, Spanish, etc.);
- the vocabulary, or knowledge necessary for the application domain (for example, financial domain, tourism domain, etc.);
- the complexity of the language, ranging from full natural language to controlled language obtained imposing a restricted grammar;
- the type of analysis carried out and the performance (there are linguistic tools of different complexity, and based on different approaches).

The applicability and efficacy of the ambiguity measures in requirements analysis was in fact investigated with instruments of increasing complexity, arriving at the use of a system that enables a deep analysis of texts [10] and that is able to offer sophisticated support regarding the adjustments made to software artifacts to improve their quality.

Concerning the use of the lexical ambiguity measures, they have been applied to assess the menu commands of Netscape [7].8 The semantic and syntactic ambiguity of the names of the commands were evaluated using two different systems, WordNet9 and LOLITA10. The analysis pointed out that the commands have very different levels of ambiguity. On the basis of this experiment, some indications have emerged which can be useful when setting up or redesigning an interface or help menu, not to mention the eventual benefits when training new end-users. In particular, the lexical ambiguity measures can be used to identify:
- command names with high values of semantic ambiguity, which should be further analyzed, possibly using the linguistic tool to substitute the names with less ambiguous alternatives;
- command names with high values of syntactic ambiguity, which suggest that the menu commands could also be analyzed to standardize their syntactic role (e.g., to have all nouns or all verbs).

On a theoretical level, to evaluate lexical ambiguity, the (slight) difference in the values obtained with different tools, WordNet and LOLITA, reflects the need to take into account the size of the dictionary or of the knowledge base used (the larger the dictionary, the higher the average number of meanings included).

8 Copyright © Netscape Communications Corporation.
9 WordNet - a Lexical Database for English, Princeton University: www.cogsci.princeton.edu/~wn/w3wn.html.
10 A description of LOLITA and of the use of commands to assess the ambiguity measures, with examples of the use of annotation, is found in [8].

As regards sentence ambiguity, its assessment requires the use of parsing instruments that are
able to produce parsing trees corresponding to the different interpretations and are able to determine the semantic ambiguity of terms within the context of the sentence. And using LOLITA, these requirements are satisfied. In fact, analysing a sentence, this NLP system produces a list of penalty-ordered trees, in which also the ambiguity of terms is given. To obtain this information, we can use the commands pasbr and tp. The effect of the first command on an input sentence is to produce all the syntactic trees with two representations, the first of which also shows the semantic ambiguity of the terms of the sentence, while the second – based on the use of bracketing enables understanding of the interpretation associated with the tree. As for the penalties, they can be interpreted as measures of the effort made by the NLP system, and therefore of the effort required to interpret sentences. In LOLITA there are four groups of penalties and their values can also be used to support evaluation of the quality of a sentence. In fact, if for the parsing of a statement, we obtain trees only in the last group of penalties, we should adjust the text. In all other cases, if the trees are in the same group there is an essential ambiguity, so we have to gather more information to resolve it.
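Since LOLITA is not generally available, the following toy Python sketch (with a hand-written, deliberately ambiguous grammar of our own, not the LOLITA grammar) illustrates the syntactic sentence measure δ(sk) by counting the parse trees an NLTK chart parser produces for the attachment-ambiguous sentence used earlier:

import nltk   # requires: pip install nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N | Det N PP | 'I'
VP -> V NP | VP PP
PP -> P NP
Det -> 'the'
N -> 'man' | 'park'
V -> 'saw'
P -> 'in'
""")

def delta(sentence):
    """delta(s): number of parse trees the toy grammar admits for the sentence."""
    parser = nltk.ChartParser(grammar)
    return len(list(parser.parse(sentence.split())))

if __name__ == "__main__":
    s = "I saw the man in the park"
    print(s, "-> parse trees:", delta(s))   # 2: the PP attaches to the VP or to the NP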
The main results of a case study that illustrates the application of the ambiguity measures to a requirements text are given in [7]. It refers to the ABC Video case11. In Figure 1 we show one of the trees produced by the parser module of LOLITA for the first sentence (out of ten trees).

pasbr information: Customers select at least 1 video for rental. (…)
sen missing_det comnoun CUSTOMER [Plur,Sexed,Per3] transvp verb SELECT [Pres,NoPer3S] missing_det snouncl intense_adject adverb AT_LEAST adj_quantity 1 relprepcl comnoun VIDEO [Sing,Neutral,Per3]*2 prepp prepNormRel FOR missing_det comnoun RENTAL [Sing,Neutral,Per3]*2

(Customers (select ((at least 1) (video (for rental)))))

Figure 1 - Extract from the output of the pasbr command

11 The version used is part of the experimental material of the ESEG research group, http://www.cs.umd.edu/projects/SoftEng/ESEG.

On the basis of our preliminary experiments regarding ambiguity measures at the sentence level, we can say that:
- as regards semantic ambiguity, an NLP system with a hierarchically organised knowledge base can provide the analyst with useful information on how to reduce semantic ambiguity with more precise terms by descending the hierarchy of concepts and also preparing a glossary for the application;
- as regards syntactic ambiguity, it is necessary to use a more complex NLP system, like LOLITA, in order to be able to obtain useful information about its source and nature.

From a practical standpoint, the applications realized so far have highlighted the need to look into the possibility of integrating several linguistic instruments within one environment so as to give the analyst a choice of instruments during each work session. Further investigation is also required to design an interface that presents this information in a manner that efficaciously supports ambiguity identification and reporting.

Future developments of our project are related to experimenting with the application of ambiguity measures to different software artifacts. In particular, we would like to design an experiment to study the influence of ambiguity on the quality of models. To this end, a preliminary experiment was carried out with students from the third-year Database course (Faculty of Economics), all of whom were familiar with UML modeling techniques. The first objective was to verify the usefulness of the ambiguity indicators present in the requirement descriptions. In one test, students were required to design the class model for given triples of classes; the ambiguous names and the number of possible meanings were included in parentheses.

The second test included a text in which cases of lexical ambiguity were noted in two different ways: next to the ambiguous terms or in a list maintained separately from the text itself. The results of these tests provided useful direction towards further research into the influence of ambiguity in requirements documents on the
quality of conceptual models. But there are a lot of (demanding) open problems. We can cite here two of the most important: the identification of acceptable ambiguity levels for documents in natural language, and the identification of measures of quality for the obtained conceptual models.

References
[1] Burg JFM. Linguistic Instruments in Requirements Engineering. IOS, Amsterdam, 1997.
[2] Fabbrini F, Fusani M, Gervasi V, Gnesi S, Ruggieri S. Achieving Quality in Natural Language Requirements. In: Proc Int SW Quality Week, S. Francisco CA, May 1998.
[3] IEEE Std 830-1993. Recommended Practice for SW Requirements Specifications. Dec 2, 1993.
[4] ISO 9126-1991. Software product evaluation. Quality characteristics and guidelines for their use; ISO/IEC 9126-1:2001, Software Engineering - product quality - Part 1: Quality model.
[5] Levinson S. Pragmatics. Cambridge University Press, 1983.
[6] Meyer B. On formalism in specification. IEEE Software 2(1): 6-26, January 1985.
[7] Mich L. On the use of Ambiguity Measures in Requirements Analysis. NLDB'01, Madrid, Jun 28-29, in Moreno AM, van de Riet RP (eds), Application of Natural Language to Information Systems, Lecture Notes in Informatics, Bonn, pp. 143-152.
[8] Mich L, Garigliano R. Ambiguity measures in Requirements Engineering. Int Conf on SW Theory & Practice - ICS2000, 16th IFIP WCC, Beijing, China, 21-25 Aug. In: Feng Y, Notkin D, Gaudel M (eds), House of Electronics Industry, 2000: 39-48.
[9] Mich L, Franch M. 2QCV2Q: A Model for Web Sites Analysis and Evaluation. Int Conf Information Resource Management Association (IRMA), Anchorage, Alaska, May 21-24. In: Khosrowpour M (ed), Challenges of Information Technology Management in the 21st Century, IDEA, Hershey, PA, 2000, pp. 586-589.
[10] Morgan R, Garigliano R, Callaghan P, Poria S, Smith M, Urbanowicz A, Collingham R, Costantino M, Cooper C and the LOLITA Group. Description of the LOLITA System as used in MUC-6. Proc 6th ARPA MUC, Morgan Kaufmann, 1996.
[11] Nielsen J. Designing Web Usability. New Riders Publishing, Indianapolis, 2000.
[12] Robertson S, Robertson J. Mastering the Requirements Process. Addison Wesley, 1999.
[13] Rolland C, Ben Achour C. Guiding the construction of textual use case specifications. Data and Knowledge Engineering 1998; 25: 125-160.
[14] Ryan K. The Role of Natural Language in Requirements Engineering. IEEE 1992, 240-243.
[15] Walton D. Fallacies Arising from Ambiguity. Kluwer, 1996.

Automatable Integrations for the Reuse of Concurrent Specifications
A. Santone, G. Vaglini

Software Code Mining: Change Impact Discovery in Distributed Multi-language Source Code
N. Melab, M. Bouneffa, L. Deruelle, H. Basson
The recursive application of the morphisms denotes the impact propagation process. For instance, consider the morphism associated with the deletion of a class node C: the node is removed together with the edges returned by in(C) and out(C), where in(C) and out(C) are two functions returning the edges coming from C or entering C. Such a morphism causes a violation of the inheritance graph constraints. Assume that the class is a Java class: the inheritance graph will then become a disconnected one. In the following section we show how we propagate the impact to fire the recursive execution of the other morphisms.

4 Change Propagation Process

Change-and-Fix allows to perform a change on a selected graph node c belonging to the set of components of a program P, and computes the modified program, referred to as P'. It marks the nodes and dependency relationships that are inconsistent in the program. The process iterates until the set of marked nodes is empty. In the algorithm given below, the expressions N(c) and N'(c) represent respectively the old and new (obtained after the change) neighborhoods of c. This includes the set of incoming and outgoing dependencies (or edges) of c.
Figure 1. UML-based representation of CORBA components in DSCSM. The diagram shows classes such as IDLFile, Module, IDLType, IDLSequence, IDLInterface, IDLException, IDLOperation, IDLMember, Parameter, TypeCode and AnyType, with a legend distinguishing aggregation and inheritance relationships, relationships between components, and components belonging to another granular level.
Given a consistent program P
  Select c in P; Change(c); mark the neighbours of c whose dependencies differ between N(c) and N'(c);
  do
    Select a marked component c; Change(c); mark its newly inconsistent neighbours; unmark c;
  while (the set of marked components is not empty).

Three problems arise with the previous algorithm: (1) the nodes of the graph are visited several times; (2) an infinite process may occur when a change is propagated in a cycle (loop); and (3) when a node is affected by the change, either all its neighbours are marked or none of them is marked. This is a consequence of considering all the relationship types as a unique one, namely the dependency relationship. However, if we take into account the semantics of the different relationships, we can refine the impact propagation to the direct neighbours by marking those really affected by the change. In the following section, we propose a change propagation algorithm based on an expert system that solves the quoted problems.
Components to be changed and their marked neighbours are inserted in the knowledge base (KB) (line (4)). Previously visited components may not be inserted again in the KB; therefore, they can not be explored again. FireRules designates the set of propagation rules that can be fired.

(1) Given a consistent program represented by its graph
(2) Select a component c;
(3) Change(c);
(4) insert fact (ModificationType, c);
(5) collect the set of marked components;
(6) do
(7)   do
(8)     Select a rule in FireRules;
(9)     Trigger the rule;
(10)    update the set of marked components;
(11)  while (FireRules is not empty);
(12)  Select a marked component c; mark(c);
(13)  Change(c);
(14)  insert fact (ModificationType, c);
(15)  remove c from the set of marked components;
(16) while (the set of marked components is not empty).
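The following minimal sketch (with hypothetical relationship types and rules, not the IFSEM rule base) illustrates the intent of the expert-system approach: propagation rules are keyed by modification and relationship type, and a visited set prevents components from being explored twice or looping in a cycle:

from collections import deque

# graph: component -> list of (relation_type, neighbour)
GRAPH = {
    "C": [("inherits", "B"), ("calls", "D")],
    "B": [("calls", "E")],
    "D": [],
    "E": [],
}

# Hypothetical propagation rules keyed by (modification_type, relation_type).
RULES = {
    ("delete", "inherits"): True,   # deleting a superclass affects subclasses
    ("delete", "calls"): True,
    ("rename", "inherits"): True,
    ("rename", "calls"): False,     # assumption: a rename is resolved locally
}

def propagate(graph, start, modification):
    affected, visited = set(), {start}
    worklist = deque([start])
    while worklist:                       # iterate until no marked component is left
        c = worklist.popleft()
        for relation, neighbour in graph.get(c, []):
            if RULES.get((modification, relation), False) and neighbour not in visited:
                visited.add(neighbour)    # never re-inserted: no revisits, no cycles
                affected.add(neighbour)
                worklist.append(neighbour)
    return affected

if __name__ == "__main__":
    print(propagate(GRAPH, "C", "delete"))   # {'B', 'D', 'E'}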
5 Changeability Assessment
4.1 Solution based on an Expert System The new solution is presented below. The Expert System is such that the facts are components and their marking for change, and the rules are the change propagation rules. The approach consists of inserting components to be changed
The change impact analysis requires two steps: (1) the determination of the structure of the code, i.e. its components and their relationships; (2) the change propagation process. Therefore, changeability assessment consists of evaluating the cost required by each step. In this paper,
we consider two case studies: a part of the tool itself and a real-world information system cartography application.
First, our tool is developed in Java/ObjectStore and Jess, a Java version of the CLIPS expert system generator. The part considered for assessment contains 414 files and ©+ª+«+¬7¬ code lines. Its analysis using our tool (first step) allowed us to extract the graph representing the code. It contains +« ¬7®6© nodes (components) and ¯7°+°7± edges (relationships). The total CPU time hours and Qanalysis ³µ´ with is«+4° Q ¶ RAM 32 minutes on a PC Celeron _ ² 7 ± ± Q¶ Swap. To evaluate the second step, i.e. the and ± change propagation process, we considered 3± change operations on different components. The average number of nodes affected by the change is © ±7±+±, and the average CPU time required to determine them is 15 seconds. Such operations would require many days if performed by hand by the user.

Second, the information system cartography is a real-world application. It is the project of a computer science enterprise (AXIALOG, Lille, France). The objective is to reconstitute the cartography of an information system from a set of ¬7®7± Cobol programs. Two steps are thus required: (1) analysing the programs; and (2) finding the cartography (mainly partitioning the programs into functional domains) from the result of the analysis. The enterprise uses our tool to analyse the programs. In the future, we will extend the tool to deal with the cartography problem. According to the software expert of the enterprise, without our tool the programs' analysis would mobilize one person during 6 months. With our tool, the analysis of all the programs has been done in only one day.

6 Conclusion and Future Work

In this paper, we have presented a change impact discovery process. It is based on a distributed source code structural model together with a hybrid change propagation process that integrates an Expert System with the commonly used Change-and-Fix algorithm. Our new version allows to solve the problems of visiting several components more than one time, and of propagating the change in a circle. We have developed an integrated prototype named IFSEM, that allows to instantiate the structural and propagation models. In this paper, we have started the work on changeability assessment using our tool. Two case studies have been considered: a part of the tool itself and a real-world information system cartography. The results show that our tool allows an important saving of time. In the future, we plan to enhance the changeability assessment process by considering more real-world case studies.

References

[1] R.S. Arnold and S.A. Bohner. Impact Analysis - Toward a Framework for Comparison. Proc. of IEEE-ICSM'93, pages 292–301, 1993.
[2] S.A. Bohner and R.S. Arnold. Software Change Impact Analysis. IEEE Computer Society Press, ISBN 0-8186-7384-2, 1996.
[3] S.S. Chandrasekaran. Change-and-Fix Software Evolution Using Ripples 2. M.S. thesis, Dept. of Computer Science, Wayne State University, Detroit, 1997.
[4] M.A. Chaumun, H. Kabaili, R.K. Keller, and F. Lustman. A Change Impact Model for Changeability Assessment in Object Oriented Software Systems. Proc. of the Third IEEE Euromicro Working Conference on Software Maintenance and Reengineering, Amsterdam, The Netherlands, pages 130–138, Mar. 1999.
[5] Y.F. Chen, M.Y. Nishimoto, and C.V. Ramamoorthy. The C Information Abstraction System. IEEE Transactions on Software Engineering, Vol. 16(3):325–334, Mar. 1990.
[6] L. Deruelle, M. Bouneffa, N. Melab, H. Basson, J.C. Nicolas, and G. Goncalves. A Change Impact Analysis Approach for Corba-based Federated Databases. Springer Verlag LNCS, DEXA'00, London, UK, pages 949–958, Sep. 4–8 2000.
[7] Object Design. Bookshelf for ObjectStore PSE Pro Release 3.0 for Java. http://www.objectdesign.com, 1998.
[8] D. Kung, J. Gao, P. Hsia, and F. Wen. Change Impact Identification in Object Oriented Software Maintenance. Proc. of the International Conference on Software Maintenance (ICSM'94), Victoria, B.C., Canada, pages 202–214, Sept. 1994.
[9] C. Lécluse, P. Richard, and F. Velez. O2, an Object-Oriented Data Model. Proc. of the 13th Annual ACM Conf. on the Management of Data, pages 424–433, June 1988.
[10] N. Melab, H. Basson, M. Bouneffa, and L. Deruelle. Performance of Object-oriented Code: Profiling and Instrumentation. Proc. of the IEEE Intl. Conf. on Software Maintenance (ICSM'99), Oxford, UK, Aug. 30 - Sep. 3 1999.
[11] V. Rajlich. A Model for Change Propagation Based on Graph Rewriting. Proc. of IEEE-ICSM'97, Bari, Italy, pages 84–91, Oct. 1–3 1997.
Distributed Software
Title: Web Site Maintainability Dr. Norman F. Schneidewind Professor of Information Sciences, Director of the Software Metrics Research Center and Lab Naval Postgraduate School 2822 Racoon Trail Pebble Beach, CA 93953 Voice: (831) 656-2719 Fax: (831) 372-0445 Email:
[email protected]
Web site construction is one of the most important activities in today’s Internet economy. While a great deal has been written about implementing Web sites, very little has been said about the factors of reliability, availability, maintainability, usability, accessibility, performance, and security, and the tradeoffs that must be made among these factors. For example, accessibility and security are in direct conflict, as are performance and security; we could build a Web site with maximum security and zero accessibility! In discussing these factors and tradeoffs, we must consider both the server side and the client side and the interaction between the two: any feature or function implemented at the Web server will affect the usability and functionality as perceived by the client. If, for example, we choose to provide many graphics at our Web site, a price will be paid in reliability and performance, as seen by the user, and in maintainability, as seen by the developer. Another interesting aspect is how we manage to provide continuous access to the Web site when changes must be made to the site. Additionally, there has been overemphasis about the performance of Web servers, at the expense of what the user, as a client, experiences. Web site applications have unique characteristics that set them apart from traditional applications. Thus, new thinking and maintenance models must be developed for Web sites compared to standalone or even local network models. A major factor that stimulates this need is that a public Web site is exposed to the world community for access, unlike most applications where the user community and its requirements can be defined. Therefore, the needs and accessibility requirements of a much larger user set must be considered in developing the site. Related to this dynamic is that the resources of the user in terms of browser version, monitor size, screen resolution, etc. may be vastly different from and unknown to the developer who designs the site. Thus, although a Web designer may have a certain clientele in mind when implementing a site, the potential clientele and their expectations could be vastly different. One of the most interesting examples of this phenomenon is distance learning, wherein a Web site may be designed to provide a learning experience for a defined subject, objectives, and student body, but the actual audience could vary considerably from the intended one because various users surf the Internet looking for various objects including
“free” courses. The challenge of providing effective distance learning increases if live video broadcasts of the instructor’s lectures, using a scan of the Web site material on the PC screen, are combined with student access to the Web site from their PCs outside of lecture times. Screen resolution and font size that may be satisfactory for the latter type of access may be unsatisfactory for the former. I propose to flesh out Web site maintainability issues and tradeoffs and to stimulate discussion with WESS attendees, using my practical experience in distance learning Web site development as a framework.
Seven Challenges for Research on the Maintenance and Evolution of Widely Distributed Software Position Paper James M. Bieman
Computer Science Department Colorado State University Fort Collins, Colorado 80523
[email protected]
August 8, 2001

Software is increasingly both widely distributed and tightly interconnected. For example, you can now read and send email via your office and home computers, a PDA, or a cellular phone. Users demand the same kind of functionality from all devices, and they want to access the same mailbox from everywhere. Of course, the software that makes this happen must be easy to install and use, have seamless interconnections, be continuously available, and be secure. The standards for ease of use have increased to the point that users expect that they can use software without consulting any documentation or training, and that they need take no action to install it. Many users do not care who supplied the software, as long as it works according to expectations.

As users now expect distributed and interconnected software, the means for developing, distributing, and running software are continuously evolving. Software is commonly developed by geographically distributed developers; it is now commonly distributed and installed over the Internet, and portions of its functionality are often executing on several distributed platforms simultaneously. Changes to distributed software components are inevitable. Such changes may include new and improved security components, the addition of intelligent agents, support for new kinds of attachments (to email), and so on.

An application service provider (ASP), servlets, and CGI are alternate means for providing widely distributed software. With such mechanisms, the software may execute on the server or the ASP's machines, rather than on client hardware. Here, use of the software is widely distributed, rather than the software itself. Surely the maintenance and evolution problems are reduced with such software, since the software provider has tighter control of the software/machine interface and version control. However, we can only guess at the ultimate market penetration of the ASP or similar models for software delivery.

Along with the research problems relevant to non-distributed software, the maintenance and evolution of widely distributed software provides several, surely at least seven, unique challenges for research.
1 Widely distributed software is difficult to adapt.

Widely distributed software is just plain complex. It generally has distributed concurrent threads running on different platforms. Concurrent software tends to be error prone, and these errors can be troublesome: they are difficult to expose during testing, they can be very difficult to correct, and the corrections often cause other problems. Enhancements are likely to introduce errors for the same reason.
2 You cannot precisely determine the status of a widely distributed system.

Consider a widely distributed system with many thousands of machines, which all may be running different versions of a particular software component. It is not possible to determine which machines are running the software at any point in time, and which machines are running which versions. Why is it so difficult to have precise knowledge of system status? The system is dynamic; new machines may be added or dropped from the system, and portions of the network can go down at the same time that you are trying to determine the state of the system.
3 Widely distributed software is incompletely and unevenly updated.

You may distribute updates to the software, yet many machines may not have new versions installed. The updates must propagate through the network, and some machines may never receive or install updates. Managing the updating process is a major task.
4 The size and complexity of widely distributed software is a challenge for analytical studies.

Large dynamic distributed systems are difficult or impossible to adequately model for analytical studies, because simplifications eliminate the key problems. For example, the large scale of the systems will require an analysis of networks with thousands of nodes. Scaling this down to a tractable number removes the key problem: dealing with a large-scale system. Many analytical tools, for example model checking tools, cannot complete an analysis of large systems, especially if they have thousands of networked nodes.
5 Controlled experiments will be difficult to design due to the size and complexity of widely distributed software.

Like analytical studies, controlled experiments tend to be small scale. It is not likely that anyone will have the resources to conduct a maintenance or evolution study of a distributed system with thousands of nodes. As with analytical studies, experiments must be designed to study distributed systems with a relatively small number of nodes. Finding reasonable experimental controls to compare with experimental treatments, say new maintenance techniques, on widely distributed software will be a problem. Due to the size and
complexity of the systems there may be many non-treatment variables that we are unable to keep constant. The research challenge will be to identify interesting hypotheses that can be effectively evaluated with a small enough sample size and adequate controls so that experiments can actually be conducted. Unfortunately, some of the most interesting research questions involve the large scale of current widely distributed systems. We will need new and very creative experimental designs to deal with this large scale.
6 We need to observe the maintenance and evolution of widely distributed software first.

The maintenance and evolution of widely distributed systems is really a new and uncharted area. Before we can really develop meaningful hypotheses for controlled studies or develop relevant analytical models, we need to observe the maintenance and evolution activities on existing widely distributed systems. Our observations can lead to the identification of the most critical problems on real systems. Once we identify these critical problems, then we can design experiments to evaluate possible solutions to the problem.
7 Results may become obsolete before they are published.

Perhaps the most difficult challenge results from the dynamic nature of the software maintenance and evolution arena. Ten years ago we could barely imagine the current size of widely distributed systems, and especially the current distribution system for software and software updates. Empirical studies take time, sometimes years. We surely run the risk of solving problems that are no longer relevant. The challenge is identifying important persistent problems, problems that are likely to be relevant in future, unknown software maintenance and evolution arenas.
Concluding Remarks.

It is our obligation to study the maintenance and evolution of widely distributed systems. Studies can reveal that new, popular techniques do not deliver promised results, or are just plain misused [1]. To study the problems especially relevant to widely distributed systems, we need to first identify what makes these systems unique. We have to explore how practitioners are now managing such systems, so that we can clearly identify the relevant problems to solve. Analytical studies must be based on models that can deal with the large-scale size of the maintenance problems. Controlled experiments must be quite large scale, so that the results have some chance of generalizing. We need to pick problems that will not become obsolete as technology changes. This workshop is a great opportunity to learn how the research community is dealing with the great challenge of maintaining and adapting the large-scale, widely distributed software systems that are becoming more and more common. The most fruitful workshop discussions will be those that lead to the identification of the key research problems: problems that are relevant to most widely distributed systems and likely to continue to be relevant.
Acknowledgements This work is partially supported by U.S. National Science Foundation grant CCR-0098202, and by a grant from the Colorado Advanced Software Institute (CASI). CASI is sponsored in part by the Colorado Commission on Higher Education (CCHE), an agency of the State of Colorado.
Author Information

My work is focused on the evaluation and improvement of software design quality. I study the structure of software to find ways to quantify important quality attributes, for example cohesion, coupling, and reuse [2, 4]. I develop approaches for re-structuring or reengineering and evolving software to improve the maintainability and reusability of software systems [1, 3, 5, 6]. I recently started a project supported by the US National Science Foundation entitled "Evaluating Object-Oriented Designs". The research is developing techniques to quantify design attributes of object-oriented software in terms of architectural structures and patterns, and demonstrating relationships between these design attributes and external quality attributes such as maintainability, reusability, testability, and reliability. The focus is on identifying design structures and patterns that will make software easier to adapt and test. Design measurement is based on the structure of interconnected objects, including the links between objects and the properties of these links, and properties of individual object classes. Design patterns are identified through program and design analysis and the use of existing design pattern recognition technology. External design quality evaluations are based on process data from commercial organizations, and analytical evaluation of change difficulty. Relationships between design attributes and external quality are identified by examining commercial software engineering data, and through analyses of the connection between internal and external attributes. Results from this work will demonstrate costs and benefits of alternative object-oriented software designs. The work should lead to improved design methods to produce software that is easier to adapt, extend, and test as it evolves.
References
[1] J. Bieman, D. Jain, and H. Yang. Design patterns, design structure, and program changes: an industrial case study. Proc. Int. Conf. on Software Maintenance (ICSM 2001). To appear, 2001.
[2] J. Bieman and B-K Kang. Measuring design-level cohesion. IEEE Trans. Software Engineering, 24(2):111–124, February 1998.
[3] R. France and J. Bieman. Multi-view software evolution: a UML-based framework for evolving object-oriented software. Proc. Int. Conf. on Software Maintenance (ICSM 2001). To appear, 2001.
[4] W. McNatt and J. Bieman. Coupling of design patterns: common practices and their benefits. Proc. COMPSAC 2001. To appear, 2001.
[5] B-K Kang and J. Bieman. A Quantitative Framework for Software Restructuring. Journal of Software Maintenance, 11, 245–284, 1999.
[6] H. S. Kim and J. Bieman. Migrating legacy systems to CORBA based distributed environments through an automatic wrapper generation technique. Proc. Joint Meeting of the 4th World Multiconference on Systemics, Cybernetics and Informatics (SCI'2000) and the 6th International Conference on Information Systems Analysis and Synthesis (ISAS'2000).
Geographically Distributed Software Engineering and Maintenance, a Challenge for Code Analysis and Empirical Studies Paolo Tonella ITC-irst, Centro per la Ricerca Scientifica e Tecnologica 38050 Povo (Trento), Italy
[email protected]
Abstract

The Web will impact software development and maintenance in several respects. Not only will applications have a Web-based gateway, but they will also be developed in a Web-centric environment. Such a geographically distributed production environment poses new problems to the fields of configuration management and code analysis. In fact, computation of code merges and ripple effects will be crucial, and automatic support is expected to be extremely beneficial. In such a context, several questions still remain unanswered. Some of them are related to the way code construction and modification will be carried out. Others refer to potential improvement areas and the role of support tools. The usability of these tools is another relevant issue. Empirical studies in software engineering and maintenance have the potential to provide some of the answers.
1 Introduction

The Web is becoming the natural way for commercializing services and products in several areas. Web applications are developed as a complement to the traditional ways of advertising and selling goods, as well as of providing services, which are typically software-intensive ones. While an increasing demand is emerging for Web applications that are developed quickly and evolve easily, the economic relevance of the services that are delivered on the Web requires that problems related to reliability and quality be attacked. More generally, the development of software systems has itself been impacted by the diffusion of the Web, and it is likely that the next generation of most software systems will be Web-based, not only in their interface, but even in the development process that is followed for their realization. The development of applications by a team that is geographically distributed and can exploit the Web as a common infrastructure for information exchange involves
several practical and scientific issues, which have been investigated only to a limited extent. Research in the field of collaborative and geographically distributed software engineering has produced several contributions in the areas of configuration management, Web-based tools for cooperative work, and security [1, 2, 3, 5, 10]. Configuration management tools have incorporated facilities for access through the Web and for the resolution of conflicts on remotely modified sources. In this paper the state of the art of available configuration management tools will first be reviewed, to highlight the issues that are still unresolved. The position of the author is that code analysis tools can address some of these issues and that empirical studies will be fundamental to assess the cost/benefit trade-off of the solutions.
2 Configuration management

Several configuration management tools have been recently developed which improve over the traditionally available ones in that they support cooperative, Web-centric software development [1, 2, 3]. The main functionalities that have been added are:
- HTML interface: This is an obvious way of exploiting the Web: graphical and command-line oriented interfaces have been replaced or complemented with an interface provided through the HTML language (and its extensions) by a Web server that can interoperate with the configuration management tool.
- Security: Authentication protocols have been designed specifically for the Internet, so that users are allowed to access resources under configuration management from a Web browser only if they have the proper permissions.
- Remote access: This is a direct consequence of the presence of a gateway toward the Internet. Users can access the
resources they need from any remote location worldwide.
- Decentralized development: Users are not required to be constantly connected to the configuration management tool. Advanced functionalities may include the possibility to produce a full local replication of the repository. In such cases, specific procedures for the re-synchronization of the artifacts have been defined.
- Consistency checks: Different policies for consistency checking and conflict resolution have been designed to accommodate the different needs of local vs. remote users.

The weak points of the proposed solutions are related to the extremely limited support which is provided for the two fundamental operations of "merging" and "propagation of ripple effects". Both are left entirely to the programmers, and no support is provided to lead the whole system to a fully consistent state. It is the author's opinion that this problem will be exacerbated by the possibilities offered by a geographically distributed evolution of the software based on the Web. It is no longer possible to assume that programmers can interact with each other and carefully examine all implications of the modifications planned for the next release. Moreover, the optimistic assumption that every programmer will provide a reliable version of the module she/he controls, without any unpredictable side effects, cannot be made. In the next section the potential role of code analysis in solving some of these problems will be described.
3 Code analysis

The problem of detecting interferences and merging different evolutions of the same program is one of the topics in code analysis that has been deeply investigated in the last few years [4, 8]. Proposed solutions are based on an explicit representation of the dependences between program parts, for example by means of the Program/System Dependence Graph [6, 9], and on the computation of dependent and independent program parts, obtained by means of slicing [4, 8, 13] (a minimal sketch of this idea is given at the end of this section). Tools based on such algorithms may be extremely beneficial to a geographically distributed production environment in which unexpected interferences can always occur when different evolutions of the program are merged. The increasing decentralization will lead to a lower ability to control parallel interventions on the software artifacts. Nevertheless, all contributions will have to be periodically integrated into a consistent, working system. In such a context it is not possible to rely only on the capabilities of the programmers to solve all conflicts that may arise. Automatic
support is expected to become one of the enabling technologies for this change in software production. Although the main problems related to program merging and impact analysis have been attacked for decades, available technologies are not fully satisfactory from an industrial perspective. A list of improvement areas follows. Some improvements are just technical and implementative, while others require additional research in the field.
- Interprocedural analysis/pointer analysis: While the theoretical problems related to interprocedurality and pointers have been investigated and several algorithms have been proposed [9, 12], moving to an implementation for real programming languages remains a hard task, and only a few research prototypes are available [11].
- Support for conflict resolution: While available techniques support the operation of merging in the absence of interferences, programmers are charged with resolving conflicts when interferences do exist, without any help from code analysis apart from the detection of the interferences.
- Variable precision algorithms: The complexity of available algorithms may represent a major barrier to their implementation in an industrial context. Less accurate but still useful techniques could be studied, so that the user can select an appropriate trade-off between the degree of support and reliability of the information obtained vs. the cost of implementing the analyses.
- User interface: Usability of the information retrieved by code analysis may be a relevant impediment to its exploitation. Tools supporting conflict detection and resolution during the merge of parallel versions of the software should integrate smoothly with configuration management. Moreover, they should use advanced presentation facilities (e.g., hyperlinks and colors) to simplify understanding and usage of the analysis outcomes.

Selection of tools and techniques among those available may be a difficult task, and wrong choices may lead to the failure of the project. On the other hand, it will no longer be possible to face the production of software in a geographically distributed environment without any help from tools. Assessment of the costs/benefits and selection among the alternatives are activities that can be conducted rigorously by means of the methodologies employed in empirical studies in software engineering and maintenance. Therefore this research area is considered to play a major role, as described in the next section.
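To make the role of dependence-based analysis concrete, the following is a minimal, illustrative sketch (not taken from any of the cited tools) of interference detection between two parallel edits: the statements changed in each variant are forward-sliced over a shared dependence graph, and a non-empty overlap of the slices flags a potential conflict that cannot be merged automatically. The graph encoding and the notion of "changed statements" are simplifying assumptions made for the example.

    from collections import deque

    def forward_slice(dependences, changed):
        """Statements reachable from the changed ones along dependence edges."""
        reached, frontier = set(changed), deque(changed)
        while frontier:
            node = frontier.popleft()
            for succ in dependences.get(node, ()):
                if succ not in reached:
                    reached.add(succ)
                    frontier.append(succ)
        return reached

    def interference(dependences, changed_a, changed_b):
        """Statements affected by both variants; an empty result suggests the two
        edits are independent and can be merged without manual conflict resolution."""
        return forward_slice(dependences, changed_a) & forward_slice(dependences, changed_b)

    # Toy dependence graph: statement id -> statements that depend on it.
    pdg = {1: [3], 2: [3], 3: [4, 5], 4: [6], 5: [6]}
    print(interference(pdg, {1}, {2}))  # {3, 4, 5, 6}: the edits interfere downstream
    print(interference(pdg, {4}, {5}))  # {6}: both edits reach statement 6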
4 Empirical studies

The challenges to empirical studies in software evolution that come from a geographically distributed production environment are mainly related to the enabling technologies that are expected to be employed. While technologies in the area of configuration management have reached a quite high maturity, so that their usage in the new context will be almost straightforward, the same is definitely not true for the code analysis technologies, which are still relegated to academia. In the presence of a strong need for new technologies and in the absence of consolidated practices and de-facto standards, empirical studies may help determine the pros and cons of the potential alternatives. Experimental hypotheses for such studies could be based on a set of questions that have not been answered until now and that at the same time are crucial for the adoption of automatic support in a Web-centric software production environment [7]. Some of them follow, divided according to the category of the associated empirical study.
4.1 Descriptive studies

What are the main difficulties encountered during the merge operations of parallel versions of the software?
What are the logical paths followed to solve conflicts between interfering versions?
How is independence assessed by programmers when parallel versions do not interfere?
How are ripple effects determined when parallel versions do interfere?
What are the main errors made by humans when merging and solving conflicts for parallel versions of the software?

4.2 Improvement studies

Is it possible to observe any improvement when a code analysis tool is adopted to support program merging and impact analysis?
What are the effects on the quality of the code produced with the adoption of support tools?
What costs are expected to occur when moving from a manual to a tool-supported code merging and conflict resolution activity?

4.3 Usability studies

Does the information offered to the user by analysis tools correspond to the user's expectations?
Does navigation inside the provided information match the mental paths followed by the user?
Are there advanced interactive facilities that could simplify access to the information produced by the tools?
What questions remain unsolved when the user searches for an answer through the tools? Is the information available but not easily accessible? If not available, is it computable?

4.4 Comparative studies

Among all available techniques and tools, which are more appropriate to support geographically distributed software production?
Within a given category of tools, which one satisfies the user needs better?
What is a reasonable feature list against which tools should be evaluated? Are there mandatory features?
5 Conclusion

In this position paper a possible future scenario for software development and evolution was described. It assumes a central role for the Web, as a common infrastructure for sharing code artifacts, and consequently it considers the use of appropriate configuration management tools to be fundamental. Moreover, advanced features that are desirable in a geographically distributed environment, but are not provided by currently available configuration management tools, such as support for impact analysis and code merging, have been identified. Research in code analysis could provide some of the needed solutions to the problems described, but their incorporation into industrial environments is not trivial. In such a context the role of empirical studies is a central one. Several questions arise related, for example, to the real problems that will be faced, to the support that can be provided by code analysis, to the features that support tools should have, and to the interactive functionalities that allow a better exploitation of the information computed by the tools. The answers to these questions will push software production to make choices and to evolve so as to include new technologies in a geographically distributed environment.
References [1] L. Allen, G. Fernandez, K. Kane, D. Leblang, D. Minard, and J. Posner. ClearCase MultiSite: Supporting geographically-distributed software development. In J. Estublier, editor, Software Configuration Management: Selected Papers of the ICSE SCM-4 and SCM-5 Workshops, number 1005 in Lecture Notes in Computer Science, pages 194–214. Springer-Verlag, Oct. 1995. [2] G. Callahan and M. Hopkins. Web configuration management. Software Development, 5(6):s1–s4, June 1997. [3] R. DelRossi. Configuration management hits the worldwide web. Software Development, 4(12):64–70, Dec. 1996. [4] K. Gallagher. Conditions to assure semantically consistent software merges in linear time. In P. H. Feiler, editor, Proceedings of the 3rd International Workshop on Software Configuration Management, pages 80–83, Trondheim, Norway, June 1991. [5] C. Godart, G. Canals, F. Charoy, and P. Molli. About some relationships between configuration management, software process and cooperative work: The COO environment. In J. Estublier, editor, Software Configuration Management: Selected Papers of the ICSE SCM-4 and SCM-5 Workshops, number 1005 in Lecture Notes in Computer Science, pages 173–178. Springer-Verlag, Oct. 1995. [6] M. J. Harrold and B. Malloy. A unified interprocedural program representation for a maintenance environment. IEEE Transactions on Software Engineering, 19(6):584–593, June 1993.
[7] J. D. Herbsleb, A. Mockus, T. A. Finholt, and R. E. Grinter. An empirical study of global software development: Distance and speed. In Proc. of ICSE 2001, International Conference on Software Engineering, Toronto, Ontario, Canada, May 12-19, pages 81–90, 2001. [8] S. Horwitz, J. Prins, and T. Reps. Integrating non-interfering versions of programs. Technical Report 690, Computer Sciences Department, University of Wisconsin-Madison, Mar. 1987. [9] S. Horwitz, T. Reps, and D. Binkley. Interprocedural slicing using dependence graphs. Proc. of the ACM SIGPLAN’88 Conf. on Programming Language Design and Implementation, pages 35–46, June 1988. [10] M. Lacroix, D. Roelants, and J. E. Waroquier. Flexible support for cooperation in software development. In P. H. Feiler, editor, Proceedings of the 3rd International Workshop on Software Configuration Management, pages 102– 108, Trondheim, Norway, June 1991. [11] J. Lyle, D. Wallace, J. Graham, K. Gallagher, J. Poole, and D. Binkley. Unravel: A case tool to assist evaluation of high integrity software. User Manual NISTIR 5691, U.S. DEPARTMENT OF COMMERCE, Aug 1995. [12] B. Steensgaard. Points-to analysis in almost linear time. Proc. of the 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 32–41, January 1996. [13] M. Weiser. Program slicing. IEEE Transactions on Software Engineering, 10(4):352–357, July 1984.
Evolutionary Web-Site Development for a Sport Retail Shop in Munich Stephan H. Sneed Icon Medialab eMail:
[email protected]
Abstract: The following case study describes the experience with a web-site development project and what lessons were learned. It emphasizes the fact that the prototype development must be followed by a phase of evolutionary functional growth and a further phase of consolidation and rework. Keywords: Web-site Evolution, Extreme Programming, Reengineering
Background

The development of Web-Sites is now a major goal of many small and medium-size enterprises that up until now have had little exposure to information technology. They lack not only experience with the new technologies, something they share with other enterprises in the old economy, but also experience with IT projects in general, which means that it is difficult for them to precisely define their requirements and to specify acceptance criteria. On the other hand, there is little literature on the subject of web-based development to which they could refer. One of the few practical guidelines is the book by Arno Scharl on Evolutionary Web Development [Scharl2000]. This book contains some useful advice on how to go about developing a web-site, but it by no means proposes a generic concept which can be applied to all projects. Each organization must find its own best way to achieve its organizational objectives within the constraints imposed by the business situation. This implies trial and error. All of the literature on the subject agrees that web-site development is a learning process which has to be carried out in an iterative, stepwise manner. The costs and benefits of such projects are difficult to estimate since so little is known about how to measure them. Some suggestions have been made, such as Reifer's proposal to introduce Web-Points [Reifer2000], however little empirical evidence is available to validate these proposals. The Rational Unified Process (RUP) only partially applies to Web-based projects [Booch+1999], so that leaves the users of the web pretty much in the dark as to how best to proceed. One study conducted at the University of Hamburg compares web-based development with expeditions into an unknown world. No one can say at the beginning where and when the journey will end. [Mack2000]

In light of the exploratory nature of web-site development, there can be no sharp division between development and maintenance. [Känel2000] An immediate prototype solution can be provided within a matter of weeks, but it may take years for the web-site to ripen and become a solid product. So this gives way to a never-ending, open-ended project which is in effect an iteration of smaller, time-box limited projects, each of which starts with the web-site in one state and ends with the web-site in a different, hopefully better state. From this point of view, web-based development projects are a series of state transformations performed on the web-site as an object. [Sharma2001] It is important that these partial projects be short and compact with a limited budget and a limited scope. What is more, after every such project a review of the web-site should be conducted to determine the current state and to compare it with the previous state as well as with the state the user is striving to achieve at the current moment. The requirements for the next iteration are taken from the difference between the actual and the desired state, whereby it is clear that the desired state is continually shifting, as pointed out by Belady and Lehman in their classical work on evolutionary systems. [Lehman/Belady1985]
The Project Conditions

ICON Medialab is representative of many of the start-up companies in the so-called new economy. In Munich it employed a team of young, motivated but relatively inexperienced programmers, mostly from the front end, most of whom had never been involved in a classical software development project with applications and databases. The project management was equally inexperienced, with persons who had never managed a classical software project and who had no technical know-how. There was one experienced technical manager and one advanced Java programmer, whose reservations were mainly ignored by the others. The customer was also new to the internet world, without any clue about software
and its complexity. As a result, the web-site concept was never finished and was changed weekly during the development. Due to the lack of experience and knowledge on the part of both the customer and the supplier, it was difficult at the beginning of the project to predict the time and cost required to deliver a working web-site. Consequently, it was impossible to plan the project. This was clearly a case of learning by doing.
The Original Development of the Web-Site

The web-site project described in this paper can be considered typical of many evolutionary web-site developments in the field of business-to-customer commerce. The mission was to create a modern e-commerce platform for an emerging sporting goods retail business in Munich, which is planning to sell its sport articles first in Germany and later on in Austria and Switzerland as well, i.e. wherever the German language is spoken. From the beginning it was decided by the parent corporation to develop the web-site on the basis of a given technical platform, this being the Enfinity platform from INTERSHOP in Jena, since this was the platform also used by the parent company. Icon Medialab was given the task of developing the sport shop's new web-site and integrating it with the already existing customer administration system, as well as with the legacy stock inventory system, both of which were hosted on the mainframe of the parent corporation in Hamburg. The original web-site development project started in June of 2000. It was to have gone online in November of that year, since the user company had already begun to advertise the internet service on television. As is often the case, the project planning, including cost and time estimation, was handled by the sales managers without any consultation with the responsible technicians.
The web-site was indeed ready by November of 2000, but this first version could only be considered as a prototype. There was neither a requirement specification nor any design documentation. The code was hacked, mainly at the customer site where it was tested. The customer was defining requirements while the code was being written. The HTML formats were composed together with the user in a trial-and-error mode. The fact that there was no modular design led to many dependencies, which in turn led to multiple errors. Nevertheless this first version encompassed almost the full functionality. It was possible not only to display the goods but to order them as well from the central stock distribution system. However, many minor features were still missing, the performance was low and the reliability of the system was unacceptable. All of the testing was done by the developers themselves, which meant that many errors went unnoticed.

At this point, it was decided to repeat the development until the web-site reached a state which could be considered acceptable. Since then, the web-site has been evolving through a series of limited improvement projects. Both the customer and the supplier of the web-site have been going through a learning process, covering the technical platform, the business requirements and the development process itself. In the meantime the sporting goods shop has been using the latest web-site releases to sell some 51,000 different articles to more than 11,000 registered customers. Despite the difficulties of working with an unfinished system, the shop has become one of the few profitable online stores in Germany, providing adequate if not optimal satisfaction to its customers. All along, new versions are being provided at the rate of one per month, each of which contains some improvements over the last version. Thus, the system is steadily evolving toward the state that the customer would have liked it to be in at the beginning.
The Web-Site Architecture

The sport retail shop web site, sportretailer.com, was developed with the middleware product Enfinity from Intershop, based on Java, JSP, EJB, SQL, XML and HTML technologies. The production system runs in a UNIX environment. The evolutionary development is being made on PC workstations under Windows NT, whereas the final testing takes place on a Sun UNIX server to guarantee that the test environment is as close as possible to the live production conditions. Enfinity uses an Oracle database, which stores all the product and customer data. The system itself is divided into two servers, the transaction server and the catalog server, in order to provide differentiated scalability. Both servers have an account with the host database to store their persistent objects, using EJBs. Enfinity provides a convenient API, providing buyer, product, session, basket and other typical Java objects required by an online store. It also includes a number of ready-to-use pipelets. These pipelets are small Java classes, reduced to one or two single functions, designed to reuse other components, ensure reliability and avoid redundant code. These classes are wrapped behind XML files and can be linked together in a so-called pipeline, which, besides pipelets, also contains decisions, joins, jumps, calls, start nodes, end nodes and transaction rollbacks to commit positions. These pipelines are editable with the visual pipeline manager and represent the business logic. Almost every pipe empties into a template which shows the results evaluated, calculated or determined by the pipe. Upon the first access to a pipeline, all Java class files, the XML dependencies and the ISML (Intershop Mark-up Language) templates are compiled once to JSP code for further use, rather than upon each access of the pipeline. This feature was added later to enhance the performance. The templates themselves are coded in the ISML format, which is actually HTML with Enfinity-specific JSP tags. These tags are used to call pipelines, if necessary with parameters, or to print out values from the pipeline dictionary.

The product import, which has been run almost weekly since the first installation in November to guarantee a consistent state, is done automatically by Enfinity, provided the import files are in the appropriate XML format specified by Intershop. Altogether, three XML files are required: one for the unique products, one for the master products and variations, and one for the categories. The product information for the web-site comes in the form of one huge CSV file. An import box, implemented in Java, translates that file into a corresponding XML file. The import box also contains a number of filters, mostly Python scripts.
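As a rough illustration of the kind of translation such an import box performs, the sketch below converts product rows from a CSV export into a simple XML document, with a small filter step mixed in. The column names, the delimiter and the XML layout are invented for the example; they are not the actual Intershop import format or the project's real filter scripts.

    import csv
    import xml.etree.ElementTree as ET

    def products_csv_to_xml(csv_path, xml_path):
        """Translate a product CSV export into a simple XML import file.
        The columns 'sku', 'name', 'price' and 'category' are hypothetical."""
        root = ET.Element("products")
        with open(csv_path, newline="", encoding="utf-8") as src:
            for row in csv.DictReader(src, delimiter=";"):
                if not row.get("sku"):                      # filter: drop incomplete rows
                    continue
                product = ET.SubElement(root, "product", sku=row["sku"].strip())
                ET.SubElement(product, "name").text = row["name"].strip()
                ET.SubElement(product, "price").text = row["price"]
                ET.SubElement(product, "category").text = row["category"]
        ET.ElementTree(root).write(xml_path, encoding="utf-8", xml_declaration=True)

    # Example: products_csv_to_xml("products.csv", "products-import.xml")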
The integration of existing web-site customers was realized with a Java HTTP interface, connected to the mainframe host via message queuing, which was able to transfer the old customers registered in the legacy customer relation system of the parent corporation into the Enfinity database at the web-site and to merge them with the new customers being stored there. The dispatching system of the parent corporation running in Hamburg receives messages from the Munich web-site via the same Enfinity interface, returning the delivery status of the products ordered so that it can be displayed to the customers upon demand. In addition to informing the user about the status of his delivery, the web-site offers other information to the customers, such as the latest news on sporting events and the most recent sport statistics. It is now even possible to purchase tickets for sporting events via the web-site.
Subsequent Projects to Evolve the Web-Site

Following the introduction of the first version of the web-site in November of 2000, work began immediately on improving both the product and the process. To improve the process, the sales-oriented managers were replaced by technical managers with knowledge of the limits of internet technology. These managers were obliged to consult with the responsible technicians before making any commitments to the customer. They had realized that promising anything to the customer just to keep him satisfied has negative consequences for the project, both for the developers and for the customer himself. At this time, the project management came under great pressure from the customer, since it was difficult for the customer to understand why the project should be repeated in order to reengineer the architecture of the product and to improve the quality. From his viewpoint, the web-site should have been done right the first time; after all, he had paid for this. Therefore, it was very difficult to get him to support the further evolution of the web-site. The customer continued to be involved in the process, but was made aware of the technical limits. A formal change request process was introduced, together with problem reporting and tracking. It was also decided that all code would be reviewed by the team as a whole before being released to the customer.
After testing the results of the first iteration project, the customer began to understand the importance of product quality: the time needed to make necessary changes after the first release was unacceptable, but after the second release it began to decrease. The number of second-level errors was also decreasing, so the customer began to appreciate the necessity of rework. Having more influence on the project, the development team began to acquire more routine in turning out new releases.

To improve the product, a post-documentation project was started to create an object model of the system and to represent it using UML diagrams. Parallel to this reverse engineering project, the Java code has been reengineered, so that more than 50% of the code has now been replaced. The Java source files have been filled with comments for automatic source code documentation with JAVADOC. Also, an accurate bug tracing routine was installed, which allowed an error to be traced to its source and removed, rather than making a workaround as had been done before. As a consequence, in March of 2001 the error curve began to decrease for the first time. In April, the decision was made to renew the entire web-site, taking components from the old system which had been accurately tested against the quality rules and the coding and naming conventions. Of course, the old web-site system had to be maintained too, so that the project team was now working on two systems in parallel. Every update to the one system had to be carried over to the other. This required strict change management and configuration management. To achieve this, all of the project hardware infrastructure had to be set up anew, with separate development, testing, staging and deployment environments. As a result of the reengineering process, it was possible to reduce the number of pipelets and templates within the new system to less than half of the number in the original version. Also, the load distribution was optimized, thereby increasing the performance
by a factor of four. In addition, a controlled and documented test process was introduced, using Rational tools and the bug database of the version control system. The test cases were now being constantly improved to achieve a maximum functional coverage. In addition, a library of test cases has been built up, as proposed by Jacobson [Jac1999]. These test cases cover almost all of the use cases for the web-site. Work is now proceeding on a configuration management system to control the versions and to ensure the deployment of the correct product version.
In May, the UML documentation was adapted to the new system, while the other documents were modified to reflect the reconstruction of the shop. Meanwhile, an interface from the version control system to the proposed automated build and deploy process was implemented in the VCS Medializer, in cooperation with NXN, the developers of Medializer. At the end of May, the new system passed all of the tests and went live, with only two bugs occurring when logging in with damaged customer profiles.
In June, the test scenarios were enhanced to cover almost 95% of all possible use cases. Each of the members of the development team now has a fixed responsibility for certain components. The product is under configuration control and the project has a defined, improved workflow. Change requests and new features can now be specified, designed, implemented, tested and deployed according to this new process model in less than half of the time it used to take and with significantly fewer errors. The last step taken in optimizing the process has been to automate the build process. The goal is to eliminate the errors occurring somewhere between the development machines and the live system because of different configurations and manual file uploads. Creating and combining scripts, and also having an interface to the version control system, has already resulted in a more stable build process, which cleans up the system, uploads the files and compiles them. [Sotir2001]
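A build script of the kind described might look like the following sketch. The directory layout, host name and command lines are placeholders invented for illustration, not the project's actual configuration, and the real process may order the clean, upload and compile steps differently.

    import subprocess
    from pathlib import Path

    WORKSPACE = Path("/opt/shop/staging")               # placeholder paths and host
    DEPLOY_TARGET = "deploy@shop-live:/opt/shop/app/"

    def clean(workspace: Path) -> None:
        """Remove old build artifacts so every build starts from a known state."""
        for leftover in workspace.glob("classes/**/*.class"):
            leftover.unlink()

    def compile_sources(workspace: Path) -> None:
        """Compile all Java sources; any compiler error aborts the build."""
        sources = [str(p) for p in workspace.glob("src/**/*.java")]
        subprocess.run(["javac", "-d", str(workspace / "classes"), *sources], check=True)

    def upload(workspace: Path) -> None:
        """Copy the compiled artifacts to the live system in one controlled step."""
        subprocess.run(["rsync", "-a", str(workspace / "classes") + "/", DEPLOY_TARGET],
                       check=True)

    if __name__ == "__main__":
        clean(WORKSPACE)
        compile_sources(WORKSPACE)
        upload(WORKSPACE)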
The current Web-Site System

The current web-site system at the time of this writing consists of 86 Java classes with some 6,600 lines of Java code and 388 templates with some 75,800 lines of HTML code. There are, in addition to the online shop, some 10 batch procedures for background processing and 317 XML files – either pipelines or wrappers – for interfacing both to the batch processes as well as to foreign systems. To date 11 calendar months have elapsed and 102 person months have gone into the development. The error rate has gone from a high of 33 in November of 2000 to a current low of 5 per month. There are now some 14,000 customers registered for the 51,000 products offered.
Conclusions

It would appear that web-sites built from scratch cannot be constructed according to classical software engineering principles. [Bask+2001] This is due to the tight time constraints and the volatility of the requirements. Two aspects of handling such a project have proven very useful. The first one is to divide the project into different subprojects. This division is made along lines of functionality (i.e. customer registration, shopping cart checkout, raffle, product import, top story) and not, as might be expected, along architecture layers such as front end, application and database.

These subprojects also have to be handled separately by the project management and the customer. They are reflected in the version control system, so that they can be labeled each time a change is made. If a new version goes online, the build process allows the generation of a new version consisting of all the subprojects with specific labels, chosen by the customer and the management. This gives the project the flexibility needed to allow the customer to have different parts changed independently, for example introducing a new way of product import because the format of the import data has changed on the customer side. So the product import can be changed quickly, without affecting the registration subproject, which may be in the middle of a long period of reengineering. If the new product import has to go online, the registration label from before the reengineering started is taken, while work on the new registration version continues, so that only the import subproject is uploaded into the live system. Things like that will definitely happen, since the customer changes his business as quickly as the market situation demands and expects his online shop to keep up.

The other aspect is to keep in mind that the first version is just a prototype. In general, customers with little IT experience do not give exact and detailed specifications at the beginning, but follow up with many change requests after the first running version. Since it is not possible to build a couple of fully functional, high-quality prototypes because of the high costs, quality assurance and documentation start when product evolution ends and the product stabilizes. [BennRajl2000] Of course, it depends on the project budget to what degree quality assurance can be done from the beginning, but usually the budget is too low to allow either quality assurance or documentation.
For sure, everything that could be done at the beginning should be done, like coding conventions, directory structures and using smart environments. However, initially hacking or extreme programming seem to be the only alternatives for getting a web-site up and running. Then, at some point, software engineering must set in to gradually replace the existing prototype product with a really efficient and reliable product which can be readily maintained and extended as needed. If this does not happen, the web-site is doomed to failure. Therefore web-site evolution can be considered unavoidable, especially under the circumstances described here.
[Figure 1: project phases. A functional component passes through a concept and development phase, an evolution phase, a quality assurance and optimizing phase, and a maintenance phase; the chart plots change demands in LOC against project life time.]
References [Mack2000] Mack,J.: Software-Xpeditionen – ein gelungene Verbindung aus Expeditionssicht und Extreme Programming, in Proc. of GI Conf. on Software Management, Oesterreichische Computer Gesellschaft, Univ. of Marburg, Nov. 2000, p. 45-60
[Bask+2001] Baskerville, R./ Levine,L.: “How Internet Software Companies Negotiate Quality“ IEEE Computer, May 2001, p. 51-57 [BennRajl2000] Bennett,K.H./Rajlich,V.T.: “Software Maintenance and Evolution – A Roadmap” in The Future of Software Engineering, Finkelstein (ed.), IEEE Press, Limerick, 2000, p.73-87
[Sotir2001] Sotirovski, D. : “Heuristics for Iterative Software Development“, IEEE Software, May 2001, p. 66-73
[Jac1999] Jacobson, I., Booch, G., Rumbaugh, J.: The Unified Software Development Process, Addison-Wesley, Reading, 1999, p. 152-160
[Reifer2000] Reifer, D.: "Web Development – Estimating Quick to Market Software", IEEE Software, Nov. 2000, p. 57-64
[Känel2000] von Känel, J. : „Media Enabling e-Business Applications“ in Proc. of RETIS Conf., Oesterreichische Computer Gesellschaft, Zürich, March 2000, p. 19
[Scharl2000] Scharl, A.: Evolutionary Web Development, Springer Verlag, Berlin, 2000, p. 1-300
[Sharma2001] Sharma, M.: "E-Business – Building it Right from the Ground Up" in Cutter IT Journal, Vol. 14, No. 1, Jan. 2001, p. 30-35
[Kru2000] Kruchten, P. : The Rational Unified Process – An Introduction, Addison-Wesley, Reading, 2000, p. 21-88 [Lehman/Belady1985] Lehman,M./Belady,L.: Software Evolution, Academic Press, London, 1985, p. 33-200
Software Process
Towards Distributed GQM Alessandro Bianchi*, Danilo Caivano*, Filippo Lanubile*, Francesco Rago°, Giuseppe Visaggio* *Dipartimento di Informatica, Università di Bari, Italy {bianchi, caivano, lanubile, visaggio}@di.uniba.it, °Italy Solution Center, EDS Italia, Caserta, Italy,
[email protected]
Abstract

Global software development has a big impact on current software engineering practices. Current quality frameworks and improvement strategies need to be adapted for distributed software engineering environments. GQM is an approach to measurement that can be used within the context of a more general strategy for software quality improvement. We present a variant of GQM, called D-GQM (Distributed GQM), whose goal is to address new requirements from distributed and cooperative organizations.
Introduction

Business globalization, and in particular high-technology business, is growing due to a continuously expanding market full of new opportunities and resources. Global Software Development (GSD) [IE01], i.e., developing software from remotely located sites, is increasingly being adopted by software suppliers for the following reasons:
- new business markets;
- new business opportunities;
- to capitalize on the global resource pool, wherever located;
- to take advantage of proximity to the market;
- to take advantage of experts' knowledge, wherever located;
- to take advantage of time zone differences for software development.

As reported by Herbsleb and Moitra in [HM01], "This change is having a profound impact not only on marketing and distribution, but also on the way products are conceived, designed, constructed, tested and delivered to the customer". However, a series of disadvantages, like the need for ad hoc methodologies to manage larger and geographically distributed work groups [Coc00], tools for exchanging and sharing knowledge [NFK97, SY99], and extra overhead to face and overcome communication problems with staff [ED01], must be taken into account. Herbsleb and Moitra characterize the dimensions of the GSD problem [HM01]:
- strategic issues: the decisions made to divide and distribute work among different sites should let the sites operate as independently as possible while providing for adequate communication across them;
- cultural issues: due to the fact that the staff involved in co-operative work have different cultural backgrounds;
- inadequate communication: the distribution of work across different sites increases the cost of formal communication between team members and limits the informal exchange of information that, in a usual working environment, contributes to sharing experience and co-operating toward the achievement of a common goal;
- knowledge management: knowledge management is more difficult in a distributed context: information can no longer be exchanged informally, without a standard procedure, and with adequate rapidity; also, the consequences of poor management can negatively influence and limit reuse;
- project and process management issues: in distributed environments the project management activity is a critical success factor. The distribution of sites negatively impacts the synchronization and scheduling of activities. Furthermore, cultural differences and attitudes must be considered too;
- technical issues: the distribution of sites in dispersed locations implies the use of different hardware and software platforms, with derived problems related to incompatible data formats or the use of different tool versions.

All these dimensions also have a strong impact on product and process quality. Thus, quality frameworks and improvement strategies [WCR97] need to be adapted to distributed contexts. The Goal-Question-Metric (GQM) paradigm [BR88, BW88] represents a structured and systematic approach to data collection, based upon the specific needs of the project and the organization. Furthermore, as reported in [BCR94], "It can be used in isolation or, better, within the context of a more general approach to software process improvement." This work suggests a new measurement approach, called Distributed-GQM (D-GQM), developed to overcome the limits of the classic GQM in distributed contexts.
The GQM

The main idea behind GQM is that measurement should be goal-oriented and based on context characterization. According to [BCR94], the measurement model has three levels:
- Conceptual level (GOAL): a goal is defined for a specific purpose, based on the needs of the organization, for a variety of reasons, with respect to various quality models, from various points of view, relative to a particular environment.
- Operational level (QUESTION): a set of questions is used to characterize the way the achievement of a specific goal is going to be performed.
- Quantitative level (METRIC): a set of collectable data is associated with every question in order to quantitatively answer it.
Figure 1: GQM used in a colocated organization (the organization's goals Goal1, Goal2, ..., GoalM are each refined into questions, and each question into metrics).
In the interpretation phase measurements are used to answer the questions and to conclude whether or not the goal is attained. Thus, GQM uses a top-down approach to define metrics and a bottom-up approach for analysis and interpretation of measurement data.
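As a minimal illustration of these three levels, a GQM plan can be represented as a small tree of goals, questions and metrics. The sketch below is only a schematic data model (the example goal and metrics are borrowed from the tables later in this paper), not a prescribed implementation.

    from dataclasses import dataclass, field

    @dataclass
    class Metric:
        name: str                                   # quantitative level

    @dataclass
    class Question:
        text: str                                   # operational level
        metrics: list = field(default_factory=list)

    @dataclass
    class Goal:
        object_of_study: str                        # conceptual level: the goal is
        purpose: str                                # characterized by its object,
        quality_focus: str                          # purpose, quality focus and
        viewpoint: str                              # viewpoint
        questions: list = field(default_factory=list)

    # Top-down definition of one goal of the measurement program ...
    g4 = Goal("coding activity", "evaluation", "effectiveness", "software engineer",
              questions=[Question("What is the complexity of the code produced?",
                                  [Metric("m15: cyclomatic complexity v(G)")]),
                         Question("What is the productivity of each programmer?",
                                  [Metric("m16: LOC/hour")])])

    # ... and bottom-up interpretation: collected values are attached to the metrics
    # and aggregated upward to answer the questions and assess goal attainment.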
GQM defines a dynamic quality model [LV94] on which to base an effective measurement program. In a colocated project, quality goals reflect the business strategy, and GQM is used to identify and refine goals based on the characteristics of the software processes, the products and the quality perspectives of interest. This is done considering the entire organization, as shown in Figure 1. Furthermore, a colocated organization can be considered internally homogeneous with respect to competencies, maturity and the processes in use. If we refer to distributed environments, what has been mentioned to this point is not always true. The sites that make up a distributed organization can be heterogeneous, and then a quality model built using GQM needs to be refined in different ways (depending on site characteristics), or a different quality model should be adopted for each site. This poses new problems at different levels:
- Conceptual level: one of the most important aspects of GQM is the domain characterization through the inclusion of the context in the goal specification. In the case of distributed environments the sites may differ in ability, maturity and responsibilities. Therefore the characterization of one site is not necessarily extendible to (and thus valid for) other sites. Each site might focus only on a small portion of the quality goals accepted by the entire organization. In fact, if we consider that software processes are spread amongst the sites, each site might have different goals based on the business process it supports.
- Operational level: the questions used to refine a goal might change because of site heterogeneity. Therefore a site may not be interested in a certain aspect of a goal expressed in a question. Based on the maturity of a site, the same goal can be refined with different questions that focus on certain aspects rather than others.
- Quantitative level: the choice of the metrics used to quantitatively evaluate a question is a critical aspect. Collected measures should be familiar, easy to use and tailored to the sites in which they are used. A quality question can be refined using different metrics for each different site, so that the metrics are well known in that context and the reuse of existing metrics is promoted. In fact, the introduction of new and unfamiliar metrics causes some disadvantages, such as purchasing new measurement tools, training people, and resistance to adoption.
The D-GQM
A distributed organization is formed of N geographically distributed sites. Therefore we can assume that each site can consider all or only part of the goals, depending on the supported software processes. For example, a software factory can be composed of the following sites: s1, analysis site; s2, project site; s3 and s4, coding sites; s5 and s6, testing sites. The different characteristics of each site suggest that they have different goals. Thus, each site will contribute to satisfy a specific subset of goals. As shown in Table 1, site s2 supports the quality goals g2 and g3 and site s3 supports the quality goals g3 and g5. The available resources, the levels of expertise in the various technologies, the tools used, process maturity, capability and so on characterize each site.
Even if two sites share the same goals and support the same processes, they might have the need to refine the quality model in two different ways. A typical example is the case of a large organization that takes over a smaller one. The integration takes time because the smaller organization will need to adapt to new standards, procedures, and technologies. Suppose that both goals g4 and g5 (including questions and metrics) are defined as shown in tables 2 and 3. Sites s3 and s4 (table 2) pursue the same goal and execute the same processes (coding). Goal g4 is refined by both sites s3 and s4, using the questions q1 and q5. Every site is different in terms of competencies, resources and abilities. Therefore code metrics such as Halstead’s complexity metric [Hal77] may be too expensive for site s3, because it does not have a measurement tool and manual measurement requires too much effort. On the other hand site s4 does not have the same problem because it can automatically measure this metric. Site s3 has a tool to measure McCabe’s cyclomatic complexity [MB89]. In such cases, although the same questions are used, D-GQM allows specializing the metrics in order to tailor them to the characteristics of each site. In this way, s3 can use McCabe’s cyclomatic complexity to measure code complexity rather than Halstead’s metric, and still be consistent with the question. This is one of many possible ways of measuring complexity as mentioned in [Zus91].
(Table 1 is a goals-sites cross-reference matrix: rows g1-g5, columns s1-s6, with a ■ marking each goal supported by a site; for example, s2 supports g2 and g3.)
Table 1: cross-reference goals-sites

g4: Analyze the coding activity for the purpose of evaluation with respect to effectiveness from the viewpoint of the software engineer.
q1: what is the complexity of the code produced?  -  m11: Halstead complexity (s4);  m15: v(G) for each module (s3)
q5: what is the productivity of each programmer?  -  m16: LOC/hour (s3 and s4)
Table 2: Decomposition of goal 4

g5: Analyze the system test process for the purpose of evaluation with respect to effectiveness from the viewpoint of the software engineer.
q7: What is the defect density?  -  m17: number of defects / LOC (s5 and s6)
q9: What is the effort for test execution?  -  m8: effort per activity;  m10: effort per person (s6)
Table 3: Decomposition of goal 5
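The quantitative-level tailoring in Table 2 can be pictured as a per-site binding of metrics to a shared question. The following Python sketch is our own illustration (the data structure and helper names are invented); only the s3/s4 binding itself is taken from the text.

    # Illustrative only: the same question (q1, code complexity) bound to
    # different metrics at different sites, as in Table 2. Site s3 measures
    # McCabe's v(G) with the tool it already owns; site s4 measures Halstead
    # complexity automatically.
    question_q1 = "What is the complexity of the code produced?"

    site_metric_bindings = {
        "s3": {question_q1: "m15: v(G) per module (McCabe)"},
        "s4": {question_q1: "m11: Halstead complexity"},
    }

    def metric_for(site, question):
        # Resolve the site-specific metric used to answer a shared question.
        return site_metric_bindings[site][question]

    print(metric_for("s3", question_q1))   # -> m15: v(G) per module (McCabe)
    print(metric_for("s4", question_q1))   # -> m11: Halstead complexity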
In table 3, it can be seen how site s5 uses only one question to refine the goal g5 while site s6 uses both q7 and q9. Furthermore, s5 uses one metric, m17, to refine question q7 of goal g5. For example, suppose that site s5 performs testing only on safety-critical systems. For these systems, software quality is the highest priority, whatever the effort spent to achieve defect-free software. Under these conditions the measurement of both m8 and m10 might be useless in terms of achieving g5, because it is not necessary to make a tradeoff between quality and cost (determined by testing effort). Thus, only question q7 will be considered. On the other hand, site s6 performs testing on business information systems. In order to achieve g5, we can think of measuring the density of defects found (q7) and the effort spent on testing (q9). This choice makes it possible to find the desired tradeoff between cost and quality, in order to maximize the economic benefits for the organization. Each goal can be refined with different questions: a quality factor expressed by a goal is characterized by several questions, and each question is analyzed in terms of the measurements (metrics) needed to answer it. The D-GQM allows using different quality models depending on the characteristics of the site.
The interpretation process of a quality model can be adequately represented with decision trees. In fact, if we consider the structure of GQM, we can see how it is possible to go through the GQM tree starting from the goals down to the lower levels corresponding to the metrics. At this point strengths and weaknesses are located by comparing metric values against fixed baselines (a simplified sketch of this walk is given after the list below). These values are a reference for the improvement process following the interpretation phase. The D-GQM tailors the classic GQM with respect to the conceptual, operational and quantitative levels, overcoming the limits mentioned above.
− At the conceptual level it is possible to distribute the goals amongst the sites that compose the organization, as shown in figure 2. This allows assigning goals to those sites that can actually satisfy them rather than to others. Each goal can then be characterized for the specific context. Furthermore, the strengths (+) and weaknesses (−) p1, …, pn deriving from the interpretation phase are not necessarily distinct for each site, as shown in figures 3 and 4.
− At the operational level the same goal can be refined with different questions as shown in table 3. In figure 3 the goal g5 is refined in two different ways. Therefore the decision tree associated with the goal is different. The tree in figure 3.a leads to different strengths and weaknesses than those in figure 3.b: in this latter case it is not possible to identify the two weaknesses p3 and p4. Nevertheless both trees refer to the same goal g5. This is due to the fact that the decision tree in figure 3.b does not use one of the questions used in the tree represented in figure 3.a (the excluded question is colored in gray).
− At the quantitative level the same question can be quantified through different metrics. Figure 4 shows an example. This kind of approach allows using all the resources of a site, as shown in the example in table 2. In this case the decision tree associated with the goal (as in figures 4.a and 4.b) leads to the same strengths and weaknesses although different metrics are used.
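The interpretation walk referred to above can be sketched as follows; this is our own simplification (the baseline values and the "lower is better" comparison are invented for illustration), not the authors' algorithm.

    # Walk a (question -> [(metric, value, baseline)]) structure and flag a
    # strength (+) when the collected value meets the baseline, a weakness (-)
    # otherwise. The numbers below are made up; lower values are assumed better.
    goal_g5 = {
        "q7: defect density": [("m17: defects/LOC", 0.002, 0.005)],
        "q9: test execution effort": [("m8: effort per activity", 12, 10),
                                      ("m10: effort per person", 6, 8)],
    }

    def interpret(goal):
        findings = []
        for question, measurements in goal.items():
            for metric, value, baseline in measurements:
                mark = "+" if value <= baseline else "-"
                findings.append((mark, question, metric))
        return findings

    for mark, question, metric in interpret(goal_g5):
        print(mark, question, "/", metric)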
The interpretation phase plays a fundamental role for the successful utilization of the D-GQM. It makes it possible to combine results from different questions and metrics into a coherent view according to the quality model adopted.
Conclusions
In this work, the limits of the GQM approach, applied within geographically distributed organizations, have been discussed and a new approach, called D-GQM, has been proposed. D-GQM is an extension of GQM to be used in distributed contexts. This approach allows interpreting the collected data based on the characteristics of each site. The D-GQM may provide the following feedback to the project manager: (1) which goals have been achieved, (2) where the goals have been achieved (at which sites of the organization), (3) how the goals have been achieved. Although dispersed sites might use different metrics, the project manager can evaluate the conformance of the organization to the adopted quality model.
Encouraged by the potential benefits of the D-GQM in distributed environments, we will further investigate the method in order to highlight its strengths and weaknesses. This investigation will be conducted through a rigorous formalization of the method and experimentation in an industrial environment.
Acknowledgments. This work is the result of collaboration between SER_LAB (Software Engineering Research LABoratory) from the University of Bari and EDS Italia, within the Software System Quality project. Our thanks to Teresa Baldassarre for her insightful remarks on a first draft of this proposal.
References
[BCR94] V.R. Basili, G. Caldiera, H.D. Rombach, "Goal Question Metric Paradigm", Encyclopedia of Software Engineering, John Wiley & Sons, Volume 1, 1994, pp. 528-532.
[BR88] V.R. Basili, H.D. Rombach, "The TAME Project: Towards Improvement Oriented Software Environments", IEEE Transactions on Software Engineering, Vol. 14, no. 6, 1988, pp. 758-773.
[BW88] V.R. Basili, D.M. Weiss, "A Methodology for Collecting Valid Software Engineering Data", IEEE Transactions on Software Engineering, Vol. 10, no. 6, 1984, pp. 728-738.
[Coc00] A. Cockburn, "Selecting a Project's Methodology", IEEE Software, July-August 2000, pp. 64-71.
[ED01] C. Ebert, P. De Neve, "Surviving Global Software Development", IEEE Software, Mar-Apr 2001, pp. 62-69.
[Hal77] M.H. Halstead, "Elements of Software Science", Operating and Programming Systems Series, Elsevier, New York, 1977.
[HM01] J.D. Herbsleb, D. Moitra, "Global Software Development", IEEE Software, Mar-Apr 2001, pp. 16-20.
[IE01] IEEE Software, The Global View, Mar-Apr 2001.
[LV94] F. Lanubile, G. Visaggio, "Quality evaluation in software reengineering based on fuzzy classification", in Frontier Decision Support Concepts, John Wiley & Sons, 1994, pp. 119-134.
[MB89] T.J. McCabe, C.W. Butler, "Design Complexity Measurement and Testing", Communications of the ACM, Vol. 32, no. 12, December 1989, pp. 1415-1425.
[NFK97] K. Nakamura, et al., "Distributed and Concurrent Development Environment via Sharing Design Information", Proc. of the 21st Intl. Computer Software and Applications Conference, 1997.
[SY99] J. Suzuki, Y. Yamamoto, "Leveraging Distributed Software Development", Computer, Sep 1999, pp. 59-65.
[WCR97] Y. Wang, I. Court, M. Ross, G. Staples, G. King, A. Dorling, "Quantitative Evaluation of the SPICE, CMM, ISO 9000 and BOOTSTRAP", Proc. of the 3rd International Software Engineering Standards Symposium, Walnut Creek, CA, 1997.
[Zus91] H. Zuse, Software Complexity: Measures and Methods, Walter de Gruyter, 1991.
Empirical Perspectives on Maintaining De-localized Software Systems using Web based Tools
Balaji. V, Project Manager
Infosys – Nortel Offshore Development Center, Infosys Technologies Limited, Bangalore - 561 229, India.
Sangeetha Balaji, AK Aerotek Software Center Pvt. Limited, Bangalore, India
Introduction
The maintenance of software is a vitally important area of software engineering that warrants significant research attention. In the past, software maintenance has been assisted by depending on the original programmer to modify his/her own code [1]. However, with industry turnover rates as high as 70%, this approach can no longer be relied upon. Advances in Internet technologies have also liberated the Information Technology industry from conventional spatial constraints: co-workers no longer need to be physically together, leading to multi-site development environments. These factors warrant significant attention to software maintenance. For the application software industry engaged in providing customized solutions, the de-localized environment has led to the decentralization of various departments, spanning different continents and staffed by employees of different organizations [2]. Information flow in such a de-localized environment becomes a key factor in the success of software development and maintenance efforts [3]. It impacts productivity (as developers and maintainers spend time looking for information), quality (as developers and maintainers need accurate information in order to carry out their tasks effectively) and communication between end users and maintainers.
Situation
The influence and the power of the Internet have bridged the gap in information flow with regard to global communication [4]. Web-based tools have helped achieve better traceability, increased productivity and better quality of the software maintained in a de-localized environment. Software maintainers use web-based tools to draw on various sources of information about the system being modified, adapted or changed. These
sources can range from artifacts of the development process (requirements documents, design documents, documents within code, test plans and reports, etc.) to user documentation (user manuals and configuration information) to the system itself (both the running system and the source code). In the industrial environment, the identification of web-based tools for Requirements Management (RM), debugging, and analysis of the results of execution from a shared simulator becomes important and helps improve software maintenance. The web tools considered in this paper are DOORS (Dynamic Object Oriented RequirementS) for Requirements Management and the Application Message Decoder (AMD) for decoding and analyzing the results of execution. Let us examine each in turn. Requirements Management is about people, about communication and about attempting to understand before attempting to be understood [5]. Requirements Management involves [6]:
• Communicating with customers across multiple locations
• Writing requirements in natural language, and
• Storing requirements in a database where they can be annotated, traced, sorted and filtered
Dynamic Object Oriented RequirementS (DOORS), provided by a third-party vendor, is used for capturing all the artifacts of the development process (requirements, design documents, documents within code, test plans, test cases and reports, etc.) into a single database. For a project, each type of artifact is created as a unique module, which in turn contains objects that are instances of that artifact (e.g., a requirements module contains objects that are instances of the user requirements to be maintained for the project). DOORS provides links between modules, which ensures better tracking and maintenance of the existing software. DOORS provides an automatic version control mechanism: a version history is maintained for each of the artifacts and its associated results. Snapshots of status and reports can be communicated to the customer on the Web for global access by means of a high-speed data link. DOORS provides multiple views based on different users' needs. DOORS encapsulates the Requirements Management Process as shown in Figure 1.
Figure 1: Requirements Management Process
(The flow chart shows: a request for development is checked for an available requirement; if none is available the request stops; undocumented requirements are documented; requirements are analyzed and their impact on the business studied; if customer needs are not validated, an alternate design is sought; otherwise a design solution is selected; if requirements change the cycle repeats, and once stable the design is frozen and the product is validated against the requirements.)
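The module/object/link organization described above can be mimicked with a very small data model. The sketch below is purely illustrative and written by us (it is not the DOORS data model or its API), with invented module and object names; it only shows how bidirectional links support the tracing and "unlinked requirement" checks discussed later.

    # Toy traceability model: modules hold objects (requirements, test cases,
    # ...), and links between objects in different modules support tracing.
    from collections import defaultdict

    links = defaultdict(set)        # object id -> set of linked object ids

    def link(src, dst):
        links[src].add(dst)
        links[dst].add(src)

    # Invented example: one requirement traced to a design item and a test case.
    link("REQ/UR-12", "DES/D-4")
    link("REQ/UR-12", "TST/TC-33")

    def trace(obj):
        # Everything directly linked to the given object.
        return sorted(links[obj])

    print(trace("REQ/UR-12"))       # -> ['DES/D-4', 'TST/TC-33']
    missing = [o for o in ("REQ/UR-12", "REQ/UR-13") if not links[o]]
    print("unlinked requirements:", missing)   # the 'red cell' case in the matrix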
The Application Message Decoder (AMD) tool, developed in-house, helps in decoding messages received from the simulator. The messages received from the simulator are in hex bytes. These are converted using a script and the output is displayed in natural language (text) for faster debugging. The messages received are primarily simulator outputs, received on account of testing an existing or a new requirement. The simulator simulates the functionality of the System Under Maintenance / Test (SUM / SUT). The
tool can be customized for decoding messages arising out of different standards (American, European, etc.). It is also useful in building prototype message formats early in the software cycle. Figure 2 describes an overview of a typical network setup in a de-localized environment. On account of time differences between geographically distributed sites, the simulator is shared and used effectively round the clock to achieve 24x7 operational effectiveness.
Figure 2: Overview of a Typical Network Setup in a De-localized Environment
(The figure shows workstations WS1, WS2, WS3 and WS4 at distributed sites connected through high-speed data links across the Internet to a shared Simulator and to the SUT / SUM.)
Motivation
The motivation of the empirical study is to investigate the merits of using the web-based tools discussed above for better maintenance of software in a de-localized environment. Interesting questions to be discussed are:
• What characteristics of the tools need empirical attention?
• What empirical evidence is needed to assess the merits of the tools?
Characteristics
In this section of the paper we examine the characteristics of the tools mentioned above that require empirical attention.
• How can accessibility of information across multiple locations be achieved?
• What types of user interface will be needed for accessing data stored in the database?
• What types of access mechanisms are to be enforced for multiple-user access? What process mechanisms are to be put in place to ensure efficient optimization of tool usage and better maintenance?
• How is information stored in the database to be analyzed? Are customizable views available for data analysis for different user needs?
• How is traceability ensured?
• What type of output will be presented for different types of results? How will they be useful for analysis and future maintenance?
Experience
Information access becomes a key component in a de-localized environment. Before the implementation of the DOORS tool, requirements were documented as separate documents based on functionality. Multiple versioning (Draft, Review, Final) and duplication of the requirements maintained resulted in increased document size and increased storage space. Increased size also resulted in delays in exchanging the documents. Information exchange took place through mail between multiple locations, and response times were impacted by the differences in time zones and the availability of resources. Requirements traceability was performed manually and mapped onto corresponding documents. Manual propagation was needed to link the requirements to the various artifacts of the development process. These activities were labor intensive, requiring manual analyses and searching through numerous documents. Requirements traceability was poor due to human error and manual propagation, resulting in improper compliance and leading to increased risk, inspection time and resource overhead in maintaining the software. After the implementation of the DOORS tool, the following were observed. The requirement documents along with other development artifacts are stored in a central database, accessible through an easy, user-friendly graphical interface. Duplication
and multiple versioning of the requirements documents are avoided using the version control mechanism. Access to the database is provided through a web interface and is available simultaneously to the customer and the user. A locking mechanism ensures that each module is accessible to a single user for write operations and to multiple users for read operations. Information stored in the database is analyzed by making use of different customized views based on different user needs (Design primes, Project Managers, Product Verification and Testing teams, Customer contacts, Product Managers and Release management teams). Snapshots of the various stages of the artifacts are made available on the web using a scripting language. A color-coding mechanism helps in mapping requirements to the various artifact sections. It also provides a better picture of the progress of the project. A 3-dimensional matrix using objects and colors indicates whether a link exists between the requirements and their corresponding sections. The presence of a link is indicated by green and a cross-matrix mapping, while the absence of a link is indicated by red and no mapping in the cross matrix [7]. The outputs of the simulator are decoded using scripts and converted to text form. These are made available through a web interface for better readability and easier identification of message outputs. Pull-down menus help in decoding messages for different standards. Figure 3 shows the output from the simulator in hex format.
Figure 3: Simulator Output in Hex Format
09 81 03 0E 12 0B 52 22 00 12 04 16 14 18 10 00 00 04 43 48 00 0C A9 62 81 A6 48 04 00 00 B0 12 6B 1E 28 1C 06 07 00 11 86 05 01 01 01 A0 11 60 0F 80 02 07 80 A1 09 06 07 04 00 00 01 00 32 01 6C 7E A1 7C 02 01 00 02 01 00 30 74 80 01 00 83 08 84 13 16 14 21 61 30 00 85 01 0A 88 01 00 8A 05 84 97 16 14 00 BB 07 80 05 80 10 00 90 A2 9C 01 02 9F 32 08 05 05 42 01 21 61 30 F0 BF 34 17 02 01 00 81 07 91 16 14 07 09 00 00 A3 09 80 07 05 F5 20 02 59 00 03 BF 35 03 83 01 11 9F 36 01 C2 9F 37 07 91 16 14 08 09 00 00 9F 38 07 A1 16 14 30 51 00 F6 9F 39 08 02 10 60 70 10 91 32 00
Figure 4 shows a sample decoded message in natural language (text) form using the AMD tool. As can be seen below, provision exists to select the variant type based on different standards.
Figure 4: Decoded Message using AMD Tool
IN Codec
Select IN Variant
CAP_AC_1
09 81 03 0E 12 0B 52 22 00 12 04 16 14 18 10 00 00 04 43 48 00 0C A9 62 81 A6 48 04 00 00 01 00 32 01 6C 7E A1 7C 02 01 00 02 01 00 30 74 80 01 00 83 08 84 13 16 14 21 61 30 00 85
Decode
Clear
Message: SS7_IE_NO_TAG( 103) |--SCCP_IE_VARIANT(3020) = 01 |--SCCP_IE_MSG_TYPE(3001) = SCCP_UDT(09) |--SCCP_IE_PROTCL_CL(3007) = 81 |--SCCP_IE_CLD_ADDR (3005) = 5222001204161418100000 |--SCCP_IE_CLG_ADDR(3006) = 4348000c +-TCAP_BEGIN(8001) |--TCAP_IE_OTID(8005) = 0000b012 |--TCAP_IE_DIALOGUE_PORTION(8007) | +-TCAP_IE_EXTERNAL_TAG(8028) | |--TCAP_IE_OBJECT_IDENTIFIER(8042) = 00118605010101 | +-TCAP_IE_ASN1_TYPE(8029) | +-TCAP_IE_DIALOGUE_REQUEST(8031) | |--TCAP_IE_PROTOCOL_VERSION(8034) = 0780 | +-TCAP_IE_APPLICATION_CONTEXT_NAME(8035) | *--TCAP_IE_OBJECT_IDENTIFIER(8042) = 04000001003201 +-TCAP_IE_COMPONENT_PORTION(8008) +-TCAP_IE_INVOKE_COMPONENT(8010) |--TCAP_IE_INVOKE_ID(8041) = 00 |--TCAP_IE_LOCAL_OPERATION_CODE(8018) = 00 +-CAP_IE_INITIAL_DP_ARG(14005) |--CAP_IE_SERVICE_KEY(14061) = 00 |--CAP_IE_CALLING_PARTY_NUMBER(14021) = 8413161421613000 |--CAP_IE_CALLING_PARTYS_CATEGORY(14022) = 0a
|--CAP_IE_IP_SSP_CAPABILITIES(14097) = 00 |--CAP_IE_LOCATION_NUMBER(14042) = 8497161400 |--CAP_IE_BEARER_CAPABILITY(14017) | *--CAP_IE_BEARER_CAP(14016) = 80100090a2 |--CAP_IE_EVENT_TYPE_BCSM(14028) = 02 |--CAP_IE_IMSI (14038) = 05054201216130f0 |--CAP_IE_LOCATION_INFORMATION(14041) | |--CAP_IE_AGE_OF_LOCATION_INFORMATION(14011) = 00 | |--CAP_IE_VLR_NUMBER(14065) = 91161407090000 | +-CAP_IE_CELL_ID_OR_LAI(14025) | *--CAP_IE_CELL_ID_FIXED_LENGTH(14024) = 05f52002590003 |--CAP_IE_EXT_BASIC_SERVICE_CODE(14030) | *--CAP_IE_EXT_TELESERVICE(14032) = 11 |--CAP_IE_CALL_REFERENCE_NUMBER(14018) = c2 |--CAP_IE_MSC_ADDRESS(14046) = 91161408090000 |--CAP_IE_CALLED_PARTY_BCD_NUMBER(14020) = a11614305100f6 *--CAP_IE_TIME_AND_TIMEZONE(14098) = 0210607010913200 TTF_OK
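Conceptually, the decoding step is a table-driven translation from tag bytes to readable names. The sketch below is a much-simplified illustration written by us, not the actual AMD implementation: the tag table is abridged and invented in part, and the real tool handles the full SS7/TCAP/CAP structure per selected variant.

    # Simplified tag-length-value (TLV) walk over a hex dump, mapping known tag
    # bytes to names. TAG_NAMES is abridged and illustrative only.
    TAG_NAMES = {
        0x48: "TCAP_IE_OTID",
        0x6B: "TCAP_IE_DIALOGUE_PORTION",
        0x6C: "TCAP_IE_COMPONENT_PORTION",
    }

    def decode(hex_dump):
        data = bytes.fromhex(hex_dump.replace(" ", ""))
        lines, i = [], 0
        while i + 1 < len(data):
            tag, length = data[i], data[i + 1]
            value = data[i + 2:i + 2 + length]
            name = TAG_NAMES.get(tag, "UNKNOWN_TAG(0x%02X)" % tag)
            lines.append("%s = %s" % (name, value.hex()))
            i += 2 + length
        return lines

    # Fragment of the dump in Figure 3; yields "TCAP_IE_OTID = 0000b012",
    # matching the corresponding line of the decoded output in Figure 4.
    print("\n".join(decode("48 04 00 00 B0 12")))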
Evidences
A study was conducted on a project on the usage of the above-mentioned web tools. The study phase lasted over 4 release cycles of product maintenance. Data collected included quality, as the percentage of bug fixes rejected; productivity, as bug fixes solved per person-month; and the percentage of schedule slippages. Data were also collected on the time impact on communication and project management. The outcomes of the study are indicated in Figure 5.
The data collected in Sep-99 indicate the quality (8% of Service Requests or Bug-Fixes rejected), productivity (1.6 SRs per person-month) and schedule slippage with respect to forecast delivery dates (24% of SRs slipped) before using the web-based tools. The tools were introduced at the end of 1999 and data were again collected during Apr-00, during the next delivery cycle. A marginal improvement in quality (7% of Service Requests or Bug-Fixes rejected), productivity (1.7 SRs per person-month) and schedule slippage (20% of SRs slipped) was observed. Data collection was again done during the 3rd cycle in Sep-00. Finally, data collected in Apr-01, at the end of the 4th cycle, revealed drastic benefits in quality (2% of Service Requests or Bug-Fixes rejected), productivity (2.91 SRs per person-month) and schedule slippage with respect
to forecast delivery dates (1% of SRs slipped). The benefits seen over the 4 release cycles are a 75% improvement in the quality of the software maintained, an 81.8% improvement in productivity and a 95.8% reduction in schedule slippages.
Figure 5: Comparison on Quality, Productivity and Schedule slippage before and after the implementation of Web based Tools
Data plotted in Figure 5:
Cycle     Quality (% SRs rejected)    Productivity (SRs/PM)    Schedule (% of SRs slipped)
Sep-99    8                           1.6                      24
Apr-00    7                           1.7                      20
Sep-00    5                           2.5                      8
Apr-01    2                           2.91                     1
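The headline improvement figures quoted above follow from the first and last cycles of this data; a quick check of the arithmetic (our own calculation):

    # Relative changes between the Sep-99 baseline and the Apr-01 cycle,
    # reproducing the 75% / 81.8% / 95.8% figures quoted in the text
    # (the productivity figure differs only by rounding).
    baseline = {"rejected_pct": 8, "srs_per_pm": 1.6, "slipped_pct": 24}
    final    = {"rejected_pct": 2, "srs_per_pm": 2.91, "slipped_pct": 1}

    quality_improvement = (baseline["rejected_pct"] - final["rejected_pct"]) / baseline["rejected_pct"]
    productivity_gain   = (final["srs_per_pm"] - baseline["srs_per_pm"]) / baseline["srs_per_pm"]
    slippage_reduction  = (baseline["slipped_pct"] - final["slipped_pct"]) / baseline["slipped_pct"]

    print(f"quality improvement: {quality_improvement:.1%}")   # 75.0%
    print(f"productivity gain:   {productivity_gain:.1%}")     # 81.9%
    print(f"slippage reduction:  {slippage_reduction:.1%}")    # 95.8%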
Data collected on the impact on project management showed that the project management effort was reduced by 33%. Time spent in communication between different change agents was drastically reduced. Dramatic reductions in paperwork resulted in cost savings to the organization.
Conclusion
Based on our experiences in the usage of Web-based tools, significant improvements in software quality, productivity and schedule adherence are observed. This has also resulted in better stability and maintainability of the software, along with better information flow. Hence, potential exists for an organization to maintain its software in a de-localized environment using web-based tools. A formidable challenge in the next empirical study would involve the maintenance of web-based tools, which will be taken up later.
ACKNOWLEDGEMENTS
We thank the ALMIGHTY for having given us the strength to finish the paper. We would like to thank the reviewers for their valuable comments and suggestions. We would also like to thank our organizations for encouraging us to take up this study and present our case.
REFERENCES
[1] Jane Huffman Hayes, A. Jefferson Offutt, "Product and Process: Key Areas Worthy of Software Maintainability Empirical Study".
[2] S. Gopalakrishnan, V.P. Kochikar, S. Yegneshwar, "The Offshore Model for Software Development: The Infosys Experience", in ACM SIGCPR 1996, Denver, Colorado, 1996.
[3] Carolyn B. Seaman, "Unexpected Benefits of an Experience Repository for Maintenance Researchers", WESS 2000.
[4] Michelle Cartwright, "Empirical Perspectives on Maintaining Web Systems: A Short
[5] Stephen Covey, "The Seven Habits of Highly Effective People", Simon & Schuster, 1989.
[6] Alan M. Davis, "Predictions and Farewells", IEEE Software, July/Aug 1998.
[7] Balaji. V, "Overcoming Multi Site Development Challenges in Effective Technology Management", in InDOORS Europe 2000, Reading, U.K.
Management of a Distributed Testing Process using Workflow technologies: a Case Study1 Bernardo Copstein2 Flávio M. de Oliveira2 Abstract This paper presents an on-going experiment on the application of workflow technologies to the control of a geographically distributed testing process. We propose a system architecture focused on two design goals: (1) support consistency among the diverse types of documents used in the testing process, with small impact on the routines of the human resources involved; (2) allow for distributed assignment and control of the test activities, automating some of these activities with basis on information extracted directly from the documents. The system was implemented in an installation where the testing and development teams are located in two different organizations, and the first results are being gathered to measure improvements in quality and performance. Keywords software testing, workflow technologies, distributed testing.
I. INTRODUCTION The testing phase of software development is composed of two main sub-processes: test plan design and test plan execution [4, 5, 6]. Many of the difficulties detected in the testing phase (delays, mistakes, etc.) have two main reasons: • •
Problems in the inputs and in the process of test plan design; Organizational difficulties at the document flow level, during test plan execution.
An alternative to minimize uncertainty in test plan design/execution is the use of some formal language to describe the test plan. A formal notation has the advantage of having well-defined syntax and semantics, which makes automatic analysis and information gathering much easier. If the specification of the software under test is formalized, parts of the test plan can be generated automatically [7]. Unfortunately, formal methods are not usual in today software industry. Pushing the test engineers and testers to the use of some (apparently cryptic and useless) mathematical formalism may seem a comfortable solution for the designer of testing tools – but a painful task to the project manager. The impact of such a cultural change is not to be underestimated. Problems in document flow are more frequent when the teams involved in the process (development, test planning, 1 2
test execution, etc.) are geographically distributed. Errors and delays in communication, discrepancies between e-mails and phone conversations, and difficulties in reproducing exact environment conditions are some factors that increase the potential for problems, thus increasing costs. Nevertheless, geographic distribution of testing tasks is becoming more and more a common practice in software companies. There is a high demand for robust management of distributed testing processes. Workflow technologies allow online, distributed control and tracking of business processes in organizations [3, 8]. The main functionalities of a workflow management system include [3]: (1) modeling of workflow processes and activities, (2) management of workflow processes in an operational environment and ordering of the component activities of each process, and (3) control of the interactions among users and/or applications during execution of activities. This paper presents an on-going experiment on the application of workflow technologies to the control of a geographically distributed testing process. We propose a system architecture focused on two design goals: (1) support consistency among the diverse types of documents used in the testing process, with small impact on the routines of the human resources involved; (2) allow for distributed assignment and control of the test activities, automating some of these activities with basis on information extracted directly from the documents. The system was implemented in an installation where the testing and development teams are located in two different organizations, and the first results are being gathered to measure improvements in quality and performance. II. THE IMPLEMENTED ARCHITECTURE The TProcess architecture is a framework for workflowbased test plan design and execution (fig.1). Its purpose is to support the activity of test engineers and testers, providing computational support for both processes. The philosophy of the project is to provide effective computational support to the testing task, imposing minimal constraints to already existing procedures and routines in real installations. In this way, the framework may be useful to software development
This work is supported by the HP Brazil – PUCRS Joint Research Agreement 001/99. FACIN-PUCRS, Brazil. e-mail: copstein,
[email protected]
62
teams in different levels of maturity, and can also help gradual evolution.
Test Plan Editor
User
bug reports, created by different testers when tests in a test plan are executed. After the execution of tests, the WMS automatically collects the bug reports and creates a test report, which is reviewed and finalized by the test engineer using the test report editor. The Object Server and the editors were implemented in the Java platform. The WMS was implemented using the HP Process Manager workflow environment, and the DBMS is in Oracle. The architecture was installed in May 2001 and is currently under normal operation; we are now collecting data for the first measures of improvements in performance. Two main parameters are in focus: bugs classified later as “not a bug”, and global rate of delays in test execution tasks.
Test Report Editor Bug Report Editor Object Server
III. CONCLUSIONS AND FUTURE WORK Workflow Management System (WMS)
Although there is recent work on techniques for distributed software development, there are relatively few papers describing experiences on the impact of distribution in the testing process. Ebert et al. [2] present some results focusing validation activities in general. Their findings support the claim that collocated teams are usually more cost-effective in detecting defects, specifically when using code reviews. Our experience has indicated that one important factor which reduces effectiveness in remote testing is the higher noise (mostly human) in communication. On the other hand, when project lifecycles are very dynamic, efficient control of the resources allocated is more difficult. Our hypothesis is that workflow-based architectures can reduce this noise, while providing effective tools for dynamic resource management.
Database Management System (DBMS)
Fig. 1 – Testing process support architecture The workflow management system (WMS) keeps online control of all processes taking place. The database management system (DBMS) stores all documents and the models (definitions) of the processes. These models include information about the activities that compose the process, data objects (including documents) manipulated by these activities, the agents responsible for performing activities, and ordering constraints. The WMS allows monitoring, automated routing, and control of the processes. Instead of using a formal language to describe the test plans and reports (describing test execution results), users create these documents with a set of form-oriented editors, the Test Plan Editor, the Test Report Editor and the Bug Report Editor. These editors reproduce the document formats normally used by the test engineers and testers. Internally, the document is represented as a complex object, composed of other objects representing the document contents: environments information, test steps and expected results, bugs, etc. In this way, the editors are very ease to use, while enforcing consistency among the components. The editors are implemented as client applications, which communicate with an Object Server using TCP/IP. The object server stores the document objects in the DBMS. The WMS has access to the components of the document objects, using the information to control the process and routing the documents. For example, a test report is composed of many
The architecture developed is part of a global framework for test plan design and execution. The project goal is to have a set of tools to support the main phases of the testing process, integrated through a workflow management system. The Object Server includes an object-oriented knowledge model, describing all entity types used in the testing processes: functionality requirements, hardware/software components, environment conditions, input/output objects, tests, bugs, etc. This model, together with the server, are structured in order to support other tools that we are currently developing – a library of test case generation algorithms and an information extraction system to find test-related information in requirements specifications written in natural language [1]. REFERENCES [1] Cowie, J.; Lehnert, W. Information Extraction. Comm. of the ACM, vol. 39(1), jan 1996. p. 80-91. [2] Ebert, C.; Parro, C.H.; Suttels, R.; Kolarczyk, H.; Improving Validation Activities in a Global Software Development. Proc. of International Conference on Software Engineering 2001, IEEE Press, 2001. p. 545-554. [3] Hollingsworth, D. The Workflow Reference Model. Hampshire – UK: WfMC, Jan. 1995.
[4] Kaner, C.; Falk, J.; Nguyen, H.Q. Testing Computer Software. New York, J. Wiley & Sons, 1999. [5] Pfleeger, S. L. Software Engineering: Theory and Practice. Englewood Cliffs, New Jersey: Prentice-Hall. 1998. 576 p. [6] Ramachandran, M. Requirements-Driven Software Test: A Process Oriented Approach. Software Engineering Notes. Vol 21, nro. 4, 1996. [7] Stocks, P.A. Applying Formal Methods to Software Testing. Ph.D. Thesis, Univ. of Queensland, Australia, 1993. [8] Workflow Management Coalition. Interface 1: Process Definition Interchange - Process Model. Hampshire - UK: WfMC, Nov. 1998. (Official Release - 7.04)
Distributed and Colocated Projects: a Comparison Alessandro Bianchi*, Danilo Caivano*, Filippo Lanubile*, Francesco Rago°, Giuseppe Visaggio* *Dipartimento di Informatica, Università di Bari – Via Orabona, 4 – 70126 – Bari – Italy °Italy Solution Center, EDS Italia, Viale Edison, Lo Uttaro, 81100 - Caserta - Italy {bianchi, caivano, lanubile, visaggio}@di.uniba.it,
[email protected]
Abstract The aim of this work is to point out through experimentation some of the problems that arise with distributed software development, such as the need for new techniques and methods for managing projects and processes, so as to achieve better assignment of activities among the various working groups, together with efficient communication among the members of each team. The paper analyzes the post mortem data on two projects, one conducted at a single site and the other at several different sites concurrently. The findings show that effort estimates are less accurate if the project is a small one, while increasing the number of staff members increases the risk of defects and hence rework; this generates a greater gap between expected and real staff requirements. The study also confirms that in distributed processes there is a greater need for communication among the working members than in colocated processes.
1. Introduction The new forms of competition and cooperation that have arisen in software engineering as a result of the globalization process have an impact on the whole software process. Software development and maintenance have thus become processes distributed over various geographical sites and involve increasing numbers of staff with different cultural backgrounds. It has been pointed out in [CA01] that at present, 50 different nations are collaborating in different ways in software development. However, global software development has a number of disadvantages, such as the need to use ad hoc methods for managing larger, and geographically distant working groups [Coc00], as well as knowledge sharing tools [NFK97, SY99], while there are new overheads involved in the problems of staff communication interchanges [ED01]. Herbsleb and Moitra have identified in [HM01] a set of issues connected with global software development. These consist of: • strategic issues, concerning the decisions for subdividing the tasks among the different sites, so as to be able to work as independently as possible while maintaining efficient communication among sites; • cultural issues, that arise when the staff come from different cultural backgrounds; • inadequate communication, caused by the fact that geographical distribution of the staff over several sites increases the costs of official communications among team members and limits the possibility of carrying on the informal interchanges that traditionally helped to share experiences and foster cooperation to attain the targets; • knowledge management, that is more difficult in a distributed environment as information sharing may be slow and occur in a non uniform manner, thus limiting the opportunities for reuse; • project and process management issues, having to do with all the problems of synchronization of the work at the various different sites; • technical issues, that have an impact on the communication network linking the various sites. This work presents an analysis of data acquired in industrial projects, aiming to assess the impact of project and process management and the need for communication on the results of distributed processes. A post-mortem analysis is made of the data acquired in two projects carried out in EDS Italia: one involving several different geographical sites and the other a single site belonging to the same company. Data are analyzed to show the impact of distributed projects over scheduling of the activities, their subdivision and the degree of synchronization achieved, as described in [PV98]. Furthermore, communication among team members is analyzed analogously to [PSV94], as also we look for evidence of higher cost overheads in distributed processes due to practical limitations of this communication. The paper is organized as follows: section 2 presents the projects and the metrics used in the analysis; section 3 illustrates the main lessons learnt from the investigation; section 4 draws some conclusions.
2. Case Study Setting 2.1 Characterization of the Projects The first project, that will hereafter be indicated as the distributed project, was conducted over 3 different geographical sites of EDS-Italia, and involved resources that operated as a single team. It was a large project requiring a high number of human resources to carry out massive, non routine maintenance of a large software system to solve the Y2K problem. The software system considered was subdivided into functional areas (FA), each consisting of a work-packet (WP). There were 100 WP, and the maintenance effort had to deal with 65 of them. The size of each WP is expressed by the number of items, i.e. programs, library elements or JCL procedures, included. Each WP included a variable number of items ranging from a minimum of 6 to a maximum of 7506, making up a total of 25044 items, i.e. an average of 385.29 items per WP involved in the maintenance effort. The second project, indicated as the colocated project, was conducted at a single site and consisted of corrective maintenance of the software system of a large services company. The goal was to remove all parts of the software
system, which caused an unexpected behavior. The software to be maintained consisted of 4 subsystems, each including a variable number of subprojects1. Each of the latter involved 111 or 112 items, for a total of 6672 items. The maintenance operations involved 58 subprojects.
2.2 Data Collection
To carry out the post-mortem analysis of the data acquired in these projects, the work packets and subprojects covering all the phases of the working cycle were taken into account. The following measures were collected:
• estimated duration and actual duration of the projects required to complete the WPs or subprojects, expressed as working days;
• size of the WPs or subprojects, expressed as number of items;
• reliability metrics of the WPs, i.e. number of faults and failures, number of requests for change made during the development projects or changes made on the system after it became operative (hereafter simply indicated as number of changes) and total number of problems of any nature that arose during the observation period (simply indicated as number of issues);
• effort involved to complete the WPs or subprojects, expressed as working days/person;
• staff size, i.e. number of people who took part in executing the WPs or subprojects;
• number of reports produced to describe the work progress;
• number of messages, i.e. number of information exchanges among the various working groups;
• number of meetings officially held among the members working on the WPs or subprojects.
These observed metrics gave rise to the following calculated metrics:
• mean of estimated duration of the WPs belonging to a FA (hereafter abbreviated as MED), expressed as working days, calculated as MED = (Σ_{i=1..n} ED_i) / n, where ED_i is the estimated duration of the i-th WP in the FA and n is the number of WPs in that FA;
• FA size normalized over the number of included WPs (FANS), expressed as items, calculated as FANS = (Σ_{i=1..n} S_i) / n, where S_i is the actual size of the i-th WP in the FA and n is the number of WPs in that FA;
• discrepancy between estimated and actual duration of the i-th WP (DIS_D_i), expressed as a percentage, calculated as DIS_D_i = (AD_i − ED_i) / ED_i, where ED_i and AD_i are the estimated and the actual durations of the i-th WP, respectively;
• discrepancy between estimated and actual staff (DIS_S_i) for the i-th WP or subproject, expressed as a number of people, calculated as DIS_S_i = ES_i − AS_i, where ES_i and AS_i are the estimated and the actual staff of the i-th WP or subproject, respectively.
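A minimal sketch of these calculated metrics follows; the helper names are our own, and taking the estimated duration as the reference for DIS_D is an assumption on our part.

    # Calculated metrics from Section 2.2 (illustrative helper functions).
    def med(estimated_durations):          # mean estimated duration of the WPs in a FA
        return sum(estimated_durations) / len(estimated_durations)

    def fans(wp_sizes):                    # FA size normalized over the number of WPs
        return sum(wp_sizes) / len(wp_sizes)

    def dis_d(actual, estimated):          # duration discrepancy, as a fraction
        return (actual - estimated) / estimated

    def dis_s(estimated_staff, actual_staff):   # staff discrepancy, in people
        return estimated_staff - actual_staff

    # Example: WP G.030 is reported with an estimated duration of 175 days and
    # DIS_D = -68%; under the assumption above, the implied actual duration
    # would be about 56 days (our own back-calculation).
    print(dis_d(actual=56, estimated=175))   # -> -0.68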
3. Data Analysis 3.1 Estimated vs Actual Duration The first analysis made on the distributed project data assessed the ability of management to estimate the duration. The differences between the estimated and the actual durations were compared. Figure 3.1 shows the gap in percentage between the two values for each WP. It can be seen that for the first WPs there was a tendency to overestimate the required time; this was followed by a chunk of WPs whose time estimates largely correspond to the actual duration. Finally, in the last part of the project, there was a tendency to underestimate the time required to conclude all the activities pertaining to a WP, with only a few exceptions, such as WP G.081. This discordance between estimated and actual duration of the WPs is due to two main factors: a) a low rate of distribution of the WPs among the different sites, b) poor attention paid to the estimates of small WPs in comparison with the others belonging to the same FA, so that these were estimated to take much less than the average duration of the WPs belonging to that particular FA. To investigate the effect of the distribution of the WPs among the sites, we consider them as components of each FA, rather than of the entire system. Table 3.1 demonstrates the correspondence between WPs and FAs and shows the percentage of WPs belonging to each FA performed at a single site. This subdivision shows that the WPs whose times tended to be overestimated belonged to FA P01, and the underestimated WPs to FA P03, and, to a lesser extent, to FA P04. Finally, the chunk of WPs with a good time estimate belonged to FA P02. As to the percentage of WPs performed at a single site (third column in table 3.1), there is a clear 1
Note that the subsystems and the subprojects in the colocated project correspond to the functional areas and work packets in the distributed project.
correlation between the true correspondence of the estimate and the distribution of the WPs over several sites. Data in the table show a greater gap between estimated and actual duration for FAs whose WPs were performed at a single site.
Figure 3.1. Comparison of the percentage discrepancy between estimated and actual duration of the work packets in the distributed project
To investigate the effect of the size of the WPs on the time estimates, table 3.2 shows the data on MED, FANS, estimated duration of the WPs, DIS_D, and WP size for all those WPs with a greater than 20% gap between estimated and actual duration.
FA     Work Packet                                                                                                                     % of WPs performed at a single site
P01    G.011, G.012, G.013, G.014, G.030, G.031, G.034, G.038, G.049, G.060, G.067, G.068                                              66.1%
P02    G.015, G.017, G.020, G.021, G.022, G.043, G.044, G.046, G.050, G.051, G.052, G.053, G.054, G.055, G.056, G.057, G.058, G.059    60.7%
P03    G.018, G.028, G.033, G.040, G.045, G.062, G.063, G.087                                                                          73.8%
P04    G.010, G.016, G.035, G.036, G.061, G.071, G.080, G.081, G.082, G.083, G.084, G.085, G.086, G.506                                76.4%
Table 3.1. FAs to which each WP belongs, and percentage of WPs performed at a single site.
Analysis of these data shows a greater gap for WPs with a lower estimate than the mean for that FA, except in the case of WP G.028. Thus, we can conclude that less accurate estimates were made for smaller WPs, whose duration was estimated to be much lower than the mean for WPs belonging to the same FA.

FA      WP      MED           FANS          Estimated WP duration   DIS_D   WP size
P01     G.014   232.7 days    586.7 items   148 days                -31%    81 items
        G.030                               175 days                -68%    579 items
        G.038                               202 days                -63%    60 items
        G.049                               79 days                 -25%    18 items
        G.068                               90 days                 -56%    12 items
P03     G.028   106.4 days    365.5 items   125 days                 32%    233 items
        G.045                               94 days                 -32%    56 items
        G.062                               31 days                  46%    158 items
        G.087                               47 days                  34%    165 items
P04     G.071   170.3 days    666.3 items   128 days                 24%    31 items
        G.081                               125 days                 83%    39 items
Table 3.2. Data on WPs with a greater than 20% gap between estimated and actual duration.
3.2 Estimated vs Actual Staff The second analysis assessed the ability to estimate the amount of staff needed to perform the WPs in the distributed project and the subprojects of the colocated project. Figure 3.2 shows the differences between estimated and actual staff in the distributed project (straight line) and the colocated project (dashed line). Firstly, the figure shows that there was a greater difficulty in estimating the staff for the distributed than for the colocated project. There is also a more regular distribution of the gaps between estimated and actual values in the colocated project, whereas in the distributed project, there are evident peaks. The highest gaps in the distributed project can be attributed to the variable duration of the WPs in this project, already examined in the section on Duration. Secondly, the WPs in the distributed project required a higher staff size than the subprojects of the colocated project: the average staff size for the distributed project was just under 20 people per WP (compared with an estimate of just over 20), whereas the colocated project required an average staff size of 15.5 people (compared with an estimated 17.6). Finally, it can be observed in figure 3.2 that only overestimates were made for the colocated project whereas there are also some underestimates for the distributed project.
Figure 3.2. Comparison of the overall discrepancy between estimated and actual staff required in the two projects
To identify the causes of these differences, table 3.3 shows the size and reliability measures for the WPs in the distributed project with a difference of more than 6 people between estimated and actual requirements.

WP         G.031        G.036       G.040        G.045      G.062       G.071      Mean of all WPs in project
DIS_S      12 people    13 people   -21 people   6 people   -8 people   16 people  3 people
WP size    4096 items   764 items   407 items    56 items   158 items   31 items   918.67 items
Faults     12           5           0            0          9           0          1.3
Failures   5            4           0            0          0           0          0.4
Changes    5            6           2            1          1           1          1.4
Issues     7            15          6            1          2           2          2.3
Table 3.3. Data on WPs with highest gaps between estimated and actual staff required.
There are large gaps for large WPs, like G.031, but also for small ones, like G.045 and G.071, so the size of the WP has no impact on the discordance. This is caused, instead, by the faults that occur during execution of the WPs. In fact, in the distributed project, each of the WPs shown has a significantly higher value for at least one of the fault metrics than the mean (last row in table). We can conclude that the higher the number of people estimated for a WP, the higher the number of faults. This requires more reworking and therefore causes a greater gap between estimated and actual values.
3.3 Communication
The third aspect investigated was related to the communication among team members. Table 3.4 shows the measures of communication for each project, expressed both as an absolute value and (in brackets) as a normalized value with respect to the effort required to carry out the relative FA or subsystem. The number of reports produced for each subsystem of the colocated project is constant, whereas it varies for the distributed project. When the value is normalized over the effort, the distributed project is seen to require more reports. The same can be said of meetings. Instead, the number of messages shows the opposite trend, as more messages were exchanged for the colocated than for the distributed project. If we consider only the number of reports and the number of meetings, data confirm that adequate communication between staff members working on the distributed project requires greater effort and hence entails greater cost overheads than for the colocated project.

                      N° Reports    N° Msg        N° Meetings   Effort
Distributed Project
  AF1                 229 (3.88)    185 (3.14)    73 (1.24)     59 person/days
  AF2                 206 (3.12)    106 (1.61)    137 (2.08)    66 person/days
  AF3                 105 (2.19)    65 (1.36)     39 (0.82)     47.8 person/days
  AF4                 122 (3.99)    118 (3.86)    99 (3.24)     30.6 person/days
Colocated Project
  SS1                 30 (0.10)     700 (2.40)    15 (0.05)     291.4 person/days
  SS2                 30 (0.08)     924 (2.41)    14 (0.04)     383.5 person/days
  SS3                 30 (0.15)     533 (2.63)    15 (0.07)     203 person/days
  SS4                 30 (0.03)     2909 (2.44)   15 (0.01)     1191.1 person/days
Table 3.4. Metrics for communication, expressed both as an absolute value and normalized over the effort
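The bracketed values in Table 3.4 are simply the raw counts divided by the effort of the corresponding FA or subsystem; for example (our own check of two cells):

    # Reproducing two of the normalized values in Table 3.4.
    reports_af1, effort_af1 = 229, 59        # AF1, distributed project
    msgs_ss4,    effort_ss4 = 2909, 1191.1   # SS4, colocated project

    print(round(reports_af1 / effort_af1, 2))   # -> 3.88
    print(round(msgs_ss4 / effort_ss4, 2))      # -> 2.44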
To investigate the opposite tendency for messages, figures 3.3.a and 3.3.b show the distribution of messages exchanged by each team, normalized over the effort, for the WPs of the distributed project and for the subprojects of the colocated project.
Figure 3.3. Distribution of the number of messages exchanged by the team, normalized to the effort, for the WPs in the distributed project (a) and for the subprojects of the colocated project (b)
These graphs show that in the distributed project messages are not always exchanged for each WP; in fact, in 30 of 52 cases (over 50%), the metric value is zero. Moreover, the values for the single WPs and subprojects show that there is a more intense exchange of messages in the distributed project. This means that in the distributed project the exchange of messages is affected by the characteristics of the WP and, despite the generally greater intensity, does not contribute to increasing the organizational costs, because it is offset by the zero costs in more than 50% of cases.
4. Conclusions This post mortem analysis of the data on two different types of projects confirms some of the global software development problems that have still to be solved. It demonstrates the poor management of distributed projects, especially as regards estimation of the duration of work packets and of the number of staff required to execute them. The gap between estimated and actual values is largely due to the fact that those making the estimates have not gained sufficient experience in distributed environments and therefore define the baselines according to their experience with colocated projects. In short, the fact of being able to perform activities in larger work groups causes the management to underestimate the problems inherent to distributed processes. Moreover, the estimates are more accurate for larger WPs. This can be explained by the fact that for larger WPs with longer estimated times, the problems of inefficiency due to the distributed work are smoothed over time. It can also be observed that increased staff results in an increased number of faults and hence reworks, generating a larger gap between estimated and actual staff needed. Analysis of the duration provided a further indirect confirmation of the conclusions of various authors (e.g. [HMFG01], [ED01], [NFK97]) on the delay introduced by distributed work with respect to work carried out in a single site. Analysis of data on communication within the team show that distributed work requires more reports and meetings than work carried out in a single site, because the informal discussions that enable information sharing among the various participants are lacking. This causes an increase in cost overheads for distributed projects. In conclusion, new techniques and methods for software engineering need to be studied to meet the new requirements created by global software development. Simply transposing traditional technologies does not enable the best exploitation of the potential of distributed work projects and indeed, causes these to appear less efficient than centralized work projects.
5. References
[CA01] E. Carmel, R. Agarwal, "Tactical Approaches for Alleviating Distance in Global Software Development", IEEE Software, Mar-Apr 2001, pp. 22-29.
[Coc00] A. Cockburn, "Selecting a Project's Methodology", IEEE Software, July-August 2000, pp. 64-71.
[ED01] C. Ebert, P. De Neve, "Surviving Global Software Development", IEEE Software, Mar-Apr 2001, pp. 62-69.
[HMFG01] J.D. Herbsleb, A. Mockus, T.A. Finholt, R.E. Grinter, "An Empirical Study of Global Software Development: Distance and Speed", Proc. Intl. Conf. on Software Engineering, 2001, pp. 81-90.
[HM01] J.D. Herbsleb, D. Moitra, "Global Software Development", IEEE Software, Mar-Apr 2001, pp. 16-20.
[NFK97] K. Nakamura, et al., "Distributed and Concurrent Development Environment via Sharing Design Information", Proc. of the 21st Intl. Computer Software and Applications Conference, 1997.
[PSV94] D.E. Perry, N.A. Staudenmayer, L.G. Votta, "People, Organization and Process Improvement", IEEE Software, Jul-Aug 1994, pp. 36-45.
[PV98] D.E. Perry, L.G. Votta, "Parallel Changes in Large Scale Software Development: An Observational Case Study", Proc. Intl. Conf. on Software Engineering, 1998, pp. 251-260.
[SY99] J. Suzuki, Y. Yamamoto, "Leveraging Distributed Software Development", Computer, Sep 1999, pp. 59-65.
EPiCS: Evolution Phenomenology in Component-intensive Software
M M Lehman
J F Ramil
Department of Computing, Imperial College, 180 Queen's Gate, London SW7 2BZ
tel. +44-20-7594 8214, fax +44-20-7594 8215
{mml,ramil}@doc.ic.ac.uk
ABSTRACT This paper briefly justifies and outlines initial plans for an empirical investigation into the evolution of component-intensive software systems. Questions to be addressed include whether the findings and observations of traditional software evolution studies garnered over the last 30 years or so, and most recently as part of the FEAST projects, apply to and are relevant in the context of component-intensive software. The goal of the suggested investigation is to contribute towards disciplined planning and management of long-term evolution of component-intensive software. In submitting this paper to the WESS 2001 workshop, the authors are presenting initial plans to obtain early exposure of the ideas and constructive criticisms from the empirical studies community. Keywords Component-Based Software Engineering, Component-Intensive Software, COTS, Empirical Studies, Evolution Planning and Management, Maintenance, Metrics of Software Evolution, Process Improvement, Reuse, System Dynamics.
1. INTRODUCTION
Software evolution processes, including those of software change, encompass all activities required to maintain stakeholder satisfaction over the operational life of a software system. The objectives of such activities include fixing, adaptation, and enhancement of the software. This all-inclusive view of software evolution encompasses what has otherwise been termed maintenance. The need for continuous evolution is, to a lesser or greater extent, relevant to all software that addresses real-world applications. In a world increasingly dependent on computers and, therefore, on its software, the nature, impact and management of evolution have great social and economic importance [leh85,01a]. The term evolution tends to be interpreted and studied in two separate and distinct ways. The more common approach sees the most important issues as those concerning the methods and means whereby evolution may be achieved. The focus of this approach is on the how of software evolution and includes both methods and tools to achieve the desired results in a systematic and controlled manner. Both ab initio development from conception to operational realisation and adaptation and extension of a system to be more satisfactory in a changing operational environment are considered, with far more attention being applied in research to generation rather than to change. This is despite the fact that in industry more investment is made in change than in ab initio development. Work exemplifying this view has been presented in a series of international meetings on Principles of Software Evolution [ispse] and in a recent session on Formal Foundations of Software Evolution [ffse]. A complementary view has, however, also been taken. This, less frequently encountered but nevertheless important, view is concerned with the what and the why of evolution. It examines the nature of the evolution phenomenon, its causes, drivers and impact. The approach is restricted to a much smaller community [e.g. leh74,leh85,94,kem99,raj00] and is based on the view that more insight into, and better understanding of, evolution as a phenomenon will itself lead to improved methods for its planning, management and implementation. It will, for example, help identify the areas in which research effort is most likely to yield significant benefit. The FEAST/1 (1996-1998) and /2 (1999-March 2001) projects [leh96,98a] exemplify an investigation into the what and the why of evolution. In general, findings have strengthened the relevance of the view encapsulated in the FEAST hypothesis [leh94]. This states that E-type software evolution processes are multi-agent, multi-level, multi-loop feedback systems and must, therefore, be treated as such during planning and management of the process if sustained improvement of that process is to be achieved [leh94,feast]. The FEAST projects had as their broad objectives [leh96,98a] the empirical study of evolutionary attributes of products and processes based, inter alia, on the construction of black-box [e.g. leh98b,ram01] and system dynamics [for61] white-box [e.g. kah00] models. These models were to reflect attributes of long-term software system evolution, including growth trends and evolution rates. Results strengthened the support for and understanding of the evolution phenomenology consolidated over the years [e.g. leh74,85,cho80]. That understanding led to the SPE program classification [leh80b,85,feast], the laws of software evolution [leh74,78,80a,b,85,97,feast] and a principle of software uncertainty [leh89,90]. The practical outcome of the project is exemplified by a set of rules and guidelines for software evolution planning and management [leh00a]. Together with the work of others in the area [e.g. kem99,raj01], the body of knowledge and the understanding achieved appears to offer the basis for the development of a formal theory of software evolution [leh00b]. A proposal for such a development is currently being prepared [leh00c] and, it is hoped, will run in parallel with the one [leh01b] for which preliminary plans are outlined in the present paper. The latter will aim at the empirical study of evolution in component-intensive software and related domains, domains arousing widespread and increasing interest.
2. THE MOVE TOWARDS COMPONENTINTENSIVE SOFTWARE Software development by assembly of a set of mass produced components was already discussed in the sixties [nau69]. In advancing the concept the parallel with other engineering fields, where the concept of components is almost universally accepted and applied was frequently invoked. Nuts, bolts and a wide variety of electrical connectors and components exemplify internationally agreed standards. Such standard components can be obtained from numerous suppliers and used as building blocks 1
in different and unique products. It was, however, not until recently that the practical application has been widely explored in practice [bro98,dso99]. A recently coined term, component-based software engineering (CBSE) [bro98,iee00], reflects the current trend towards the use of components, developed either in-house or as commercial off-the-shelf software (COTS) [e.g. mck99,car00,mor00,myr99] units. CBSE is widely seen as a way of reducing some of the problems so often encountered, especially in ab initio software development. In particular, the use of COTS is seen, as a means to achieve increased productivity and reliability, and to decrease the delivery time for large-system implementation. To what extent these expectations are being fulfilled, in particular, in the context of long-term evolution, is an open question that, together with related issues, deserves empirical investigation. The majority of empirical studies, including FEAST, have focused on software constructed and evolved using traditional paradigms, which did not make significant use of externally supplied components. But the evolution phenomenon is likely to re-appear as the newly emerging paradigms are applied. They must, for example, be expected to emerge, even dominate, in the areas of component-intensive software and their processes [leh00d]. Such systems are acquiring ever greater social and economic importance. Thus the question whether the findings of studies of traditional software domains are also relevant in the context of component-intensive software is of considerable, perhaps vital interest, as is the question of their extent. This paper presents preliminary plans for an investigation of this and related issues in a project to be called EPiCS (Evolution Phenomenology in Component-intensive Software).
3. HYPOTHESES FOR EMPIRICAL INVESTIGATION
Eight laws of software evolution, as exemplified below, have been discussed in the literature [leh74,78,80a,b,85,97,feast]. Six of these have been generally supported (with minor modifications) by the study and interpretation of data gathered in FEAST. They provide phenomenological descriptors of evolutionary behaviour observed over the years in a number of software systems. The term laws was deliberately selected to indicate that they address and reflect forces rooted more in cognitive, organisational and societal mechanisms than in the specific software technology (e.g. languages, other tools, process models) being applied. Thus, from the point of view of individual stakeholders the phenomena reflected in the laws are outside their control. The phenomena appear to them as reflecting inescapable behaviour, hence the use of the term laws. The laws provide qualitative descriptors that relate directly to the evolutionary behaviour of E-type software, that is, software used to solve a problem or address an application in a real world domain. Real world software, that is, software of type E, is ultimately judged by stakeholder satisfaction with the results of its execution [leh80b,85]. Several papers [leh80b,85,feast] have discussed the fact that any E-type computer application must undergo continuing evolution if stakeholder satisfaction is to be maintained. The fact that the system implementing the application includes a significant number of mass-produced components does not affect this fundamental truth. Thus, one may expect the laws to be relevant in the context of component-based software engineering, though their statement, the phenomenology they reflect, may have to be refined in the light of experience and as supported and typified by empirical evidence. A recent paper [leh00d] examines the most immediate implications of the laws in the context of component-intensive systems, discusses their potential managerial impact and provides some preliminary recommendations. By so doing, the paper provides a reasoned set of questions and hypotheses to be further developed and empirically investigated. The proposed investigation is aimed at empirical investigation of the latter. As an example, one of these aspects is briefly discussed below.
3.1 Hypothesis: Complexity of the Software and its Impact
The second law of software evolution, "Increasing Complexity" [leh74], states that "As an E-type system evolves its complexity increases unless work is done to maintain or reduce it". It appears that, in principle, the introduction of components will reduce the constraints implied by the second law. Why? The task of building a complex system is now shared, in a classical "divide-and-conquer" approach, between component builders, who provide and evolve components, on the one hand, and integrators, on the other, with the market (and even specialised brokers, a possibility suggested in a recent presentation at EWSPT 8 by N. Madhavji) as mediator. The constraints implied by the 2nd law are expected, however, to re-emerge both at the level of the individual component and at that of the host system. The brief discussion below closely follows that in [leh00d]. The reader is referred to that paper for further details.
3.1.1 Individual Component Level
Individual components will be seen to follow the law since they must be adapted to satisfy a mix of changed requirements from a variety of sources with independent, possibly orthogonal, needs and assumptions. Unless work is done to control component complexity, the resultant increase will be reflected in a growing maintenance burden leading to a decline in the maintenance productivity and response time of the component supplier's organisation. Inevitably, knock-on effects will impact the integrator's organisation. Complexity growth due to the volume and likely orthogonal nature of successive changes means that the component supplier may find it increasingly difficult and expensive to respond to component user needs in a timely fashion with a high quality product. In particular, once a component is marketed, the supplier may not have sufficient incentives to invest significant effort in complexity reduction and other clean-up or re-engineering, even if the cost is shared by its customers. In the face of market and competitive pressures and with a focus on short-term profits, what will become the generally accepted practice remains to be seen. For example, in a market dominated by component suppliers, the latter may prefer to invest in refinement and extension of their products to attract new customers, rather than in cleaning up existing products. Indeed, circumstances may force suppliers to withdraw support from individual components and offer a replacement which, despite a claim of upward compatibility, will inevitably involve new assumptions that may conflict with the host system. A whole series of new problems will emerge as both the component based, that is, the host system, and the components on which it relies are required to evolve.
3.1.2 Host System Level
It may be that the introduction of components, rather than the use of traditional paradigms, will lead to faster and more effective preparation of the initial version of a host system. Thus, the introduction of components will appear to reduce the constraints implied by the second law. However, increased capability and growing expectations tend to go hand in hand. This, and other factors, will lead to an increase in the functionality and size of software systems. As an increasing number of components are used, each is effectively a primitive in a language defined by the
full set of component units, with the interfaces providing the syntax of their use. Thus construction of the systems is based on a form of very high level programming with component characteristics as its primitives. Hence, one must expect many of the behavioural characteristics of the classical programming domains to reappear in the new domain. The behaviour and constraints implied by the second law will eventually reappear, though at a higher level of abstraction, the constraints implied by the law are difficult to avoid. The preceding is a preliminary and incomplete analysis, presented here to suggest that there are issues related to the long term evolution of component-intensive systems that are worthy of investigation and to the resolution of which empirical work can make a significant contribution. Other hypotheses that have been advanced, for example, in [hyb97,leh00d] will be taken into account in conducting an investigation based on a refined version of the present preliminary plans. As in previous studies of software evolution the main goal of the proposed investigation will be to advance understanding and mastery of the phenomenon.
4. THE INVESTIGATION
The following activities are foreseen as part of the EPiCS investigation:
(i) develop instruments and tools for the collection of qualitative and quantitative data in a number of industrial organisations, to address the hypotheses put forward;
(ii) obtain empirical, qualitative and quantitative data sets representative of the evolution of component-intensive industrially developed systems;
(iii) develop case studies of a number of evolving component-based applications, representing widely different domains, tracing major events, such as system restructuring, process changes and team reorganisations, from inception, and highlighting significant events and factors at play in the long-term evolution;
(iv) in parallel, and as soon as appropriate data is available, apply conceptual and modelling methods analogous to those developed and successfully pursued in FEAST to the new data sets;
(v) calibrate the models built in the previous step and derive phenomenological predictions;
(vi) test the phenomenological hypotheses put forward in [hyb97,leh00d];
(vii) document the results, including observed characteristics, trends and patterns;
(viii) based on the above results, derive a set of rules, tool proposals and guidelines for the evolution of component-intensive software [leh00a].
5. GENERAL APPROACH
The project will start with data collection from industrially evolved component-based applications in collaborators' organisations. Both quantitative and qualitative data should be available [mor00] (the gathering of qualitative data, along with quantitative data, in empirical studies of software evolution was one of the suggestions that emerged during a recent workshop [fesbp]). The first will be empirical data reflecting a set of metrics of interest. Their selection will be inspired by the metric sets used in the study of evolution systems but adapted as necessary to what is available and to the need to address the hypotheses of interest [leh00d]. Once collected, such data will provide a basis for both black box [e.g. cle93,han96] and white box modelling. As in FEAST, system dynamics process modelling [for61,sen90] will be applied to the latter wherever possible. The second, qualitative data [sea99], will include results of interviewing key personnel, natural language descriptions of the evolution processes from their inception, etc. Such data will provide a set of qualitative descriptions of the phenomena and facilitate the identification of evolutionary mechanisms and of the major drivers underlying the evolutionary characteristics of the individual systems studied. Ideas inspired by the search for grounded theory [bar67] may inform such identification. The goal is to achieve a set of empirically based qualitative models [wol85] of the process. One aspect worthy of further attention is the identification of stages, described in [raj00], in the evolution of software.
As the empirical data gathering progresses, quantitative (black box and white box) and qualitative process views will be developed, validated, checked for cross consistency and completeness and, when possible, merged. As in FEAST, models will be refined when possible following a top-down process inspired by [zur67]. This will necessarily proceed in an iterative fashion, prompting revision of the data sets and probably suggesting additional attributes. It is expected to complete more than one cycle during the investigation. Results will be compared to the phenomenological invariants suggested by the SPE classification, the laws of software evolution and, in particular, with the hypotheses put forward in [leh00d]. The planned outcome will, hopefully, be a phenomenology of long term component-intensive software evolution and an empirically based set of rules and guidelines for the long-term evolution of such applications. Tool concepts may also emerge.
The concepts and methodological tools to be utilised in the suggested investigation are expected to be to a great extent analogous to those used and refined during the FEAST studies, e.g. [leh96,98a,ram01], though differences between traditional and component-intensive domains must be taken into account. Modelling and characterisation of attributes beyond the immediate software process, such as those of co-evolving processes and domains [soce], e.g. business and organisational, will be given particular attention in the EPiCS study. Other expected differences are in the area of metric definitions, in particular with regard to the search for metrics that will usefully reflect evolutionary attributes of the process and the product. Much of the work in software metrics assumes that there is access to the software artefacts (e.g. the code) in full [e.g. fen97]. Since in many respects components are black boxes from the point of view of the component integrators, metrics that relate to them may be unobtainable.
Some challenges may already be identified. For example, it is unlikely, in the next several years, that metric data sets covering extensive periods of time will be available. So the small size of the data sets, already a challenge in FEAST, is likely to be an even greater challenge in EPiCS. This may not be so serious a limitation since one may expect faster and more frequent release cycles in the component-intensive process. This would compensate for the relatively recent adoption of the component-intensive paradigm. Another challenge emerges from the size of the integration teams. Since the teams involved in component-intensive processes tend to be smaller than those in traditional development, the role of the individuals may be more significant in determining evolutionary behaviour. Thus, one may expect that quantitative regularities in evolutionary attributes such
as growth trends and evolutionary rate (e.g. in counts of modules handled [leh85]) may exhibit a higher variance with, in general, less well defined trends than those where large teams are involved. This is one reason why collection, modelling and interpretation of qualitative data is crucial to the study.
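As an example of the kind of black-box growth modelling mentioned above, the sketch below fits an inverse-square growth trend, in the spirit of the reference model for smooth growth [tur96] used in the FEAST studies, to a hypothetical sequence of release sizes. It is only an illustration of the modelling style, not part of the proposed EPiCS instrumentation, and the size figures are invented.

# Fit S_{i+1} = S_i + E / S_i^2 to observed release sizes by choosing E to
# minimise the squared one-step errors (closed-form least-squares solution).
import numpy as np

sizes = np.array([830, 961, 1305, 1583, 1851, 2270, 2846, 3110], float)  # hypothetical module counts

increments = sizes[1:] - sizes[:-1]
weights = 1.0 / sizes[:-1] ** 2
E = float(np.sum(increments * weights) / np.sum(weights ** 2))  # least-squares estimate of E

predicted = sizes[:-1] + E / sizes[:-1] ** 2
print("estimated E:", E)
print("mean absolute relative error:",
      float(np.mean(np.abs(predicted - sizes[1:]) / sizes[1:])))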
6. RELATED WORK Recent discussion of component-intensive software engineering [e.g. bro98,dso99,kon96,wu00] has focused on principles and techniques for the construction of re-usable components and component-based systems [e.g. hal97,jac97], on the processes followed in CBSE [mor00] and on the impact of its adoption on other organisational processes [mck99], all aspects of the how. The what and the why of the evolution of component-intensive software has received much less attention. Included in this omission are the evolutionary attributes of component-intensive products and processes. As suggested in the eighties and supported by the FEAST findings [feast], the evolution of software is inevitable, whatever the implementation paradigm. It is related to and driven by the evolution of the applications, that is, of the use to which the system is put [leh01a]. Application evolution may be achieved by changing the software to support the new functionality or performance requirements. Alternatively, these may be provided by throwing away the old system and providing a new one that has the desired characteristics [voa99]. In either event, the achievement of an empirically-based understanding the way in which component-intensive applications evolve is essential if effective, disciplined and timely evolution is to be achieved, and in particular, to facilitate planning, direction, management and control of component technology and application.
7. REUSE
A trend complementary to the use of components relates to software reuse, that is the use in some system of a software component originally developed for some other system or systems [jac97]. Its application and, in particular, its increasing adoption, is considered by some [hal97] to be more effective in the long term, at least in some respects, than development of new code. However, extensive reliance on reuse introduces new problems. Some of these are similar to the problems that arise in componentintensive software and are implicitly covered in EPiCS. Those which are exclusive of reuse are beyond the scope of the envisaged investigation.
8. FINAL REMARKS
The present paper discusses plans for an empirical investigation into the what and why of the evolution of component–intensive software. The main goal of the suggested investigation is to contribute towards disciplined planning and management of longterm evolution of component-intensive software. A funding proposal to carry out the investigation is currently being drafted. By submitting this paper, the authors are presenting their initial plans to the WESS 2001 workshop, seeking the early exposure of these ideas to the empirical studies community and possible partners. They welcome comments, suggestions and pointers to related work, completed or in progress.
9. REFERENCES
(An "*" indicates that the paper has been reprinted in [leh85].)
[bar67] GLASER, B.G. and STRAUSS, A.L.: 'The Discovery of Grounded Theory: Strategies for Qualitative Research', Aldine de Gruyter, 1967
[bro98] BROWN, A. and WALLNAU, K.: 'The Current State of Component-Based Software Engineering (CBSE)', IEEE Software, September 1998, pp. 37-47
[car00] CARNEY, D., HISSAM, S.A. and PLAKOSH, D.: 'Complex COTS-based Software Systems: Practical Steps for Their Maintenance', J. of Softw. Maintenance: Res. and Pract., vol. 12, 2000, pp. 357-376
[cho80] CHONG HOK YUEN, C.K.S.: 'Phenomenology of Program Maintenance and Evolution', PhD thesis, Dept. of Comp., Imperial College, 1981
[cle93] CLEVELAND, W.S.: 'Visualizing Data', Hobart Press, Summit, NJ, 1993, pp. 360
[dso99] D'SOUZA, D. and WILLS, A.C.: 'Objects, Components, and Frameworks with UML: The Catalysis Approach', Addison-Wesley, Boston, MA, 1999
[feast] FEAST: 'Feedback, Evolution And Software Technology', web site including selected pubs., http://www-dse.doc.ic.ac.uk/~mml/feast
[fen97] FENTON, N.E. and PFLEEGER, S.L.: 'Software Metrics: A Rigorous and Practical Approach', Int. Thomson Comp. Press, London, 1997, pp. 638
[fesbp] FEAST 2000: 'Intl. Workshop on Feedback and Evolution in Business and Software Processes', 10-12 July 2000, Imp. Col., London. Pre-prints available from http://www.doc.ic.ac.uk/~mml/f2000
[ffse] FFSE 2001: 'Intl. Special Session on Formal Foundations of Software Evolution', 13 March 2001, Lisbon, http://prog.vub.ac.be/poolresearch/FFSE/FFSEWorkshop.html
[for61] FORRESTER, J.W.: 'Industrial Dynamics', MIT Press, Cambridge, MA, 1961
[hal97] HALLSTEINSEN, S. and PACI, M.: 'Software Evolution and Reuse', Springer Verlag, Berlin, 1997, 293 pp.
[han96] HAND, D.J. and CROWDER, M.: 'Practical Longitudinal Data Analysis', Chapman & Hall, London, 1996, pp. 232
[hyb97] HYBERTSON, D.W., ANH, D.T. and THOMAS, W.M.: 'Maintenance of COTS-intensive Software Systems', Softw. Maint.: Res. and Pract., 1997, 9, pp. 203-216
[iee00] IEE Proceedings - Software: 'Special Issue on Component Based Software Engineering', v. 147, n. 6, Dec. 2000
[ispse] ISPSE 2000: 'Proc. of the Intl. Symp. on Principles of Softw. Evolution', Nov. 1-2, 2000, Kanazawa, Japan
[jac97] JACOBSON, I., GRISS, M. and JONSSON, P.: 'Software Reuse - Architecture, Process and Organisation for Business Success', Addison-Wesley, 1997, 560 pp.
[kah00] KAHEN, G., LEHMAN, M.M., RAMIL, J.F. and WERNICK, P.D.: 'An Approach to System Dynamics Modelling in the Investigation of Policies for E-type Software Evolution', ProSim 2000, 12-14 July 2000, Imperial College, London, UK. A revised version to appear in J. of Syst. and Softw., vol. 15, 2001
[kem99] KEMERER, C.F. and SLAUGHTER, S.: 'An Empirical Approach to Studying Software Evolution', IEEE Trans. on Softw. Eng., vol. 25, n. 4, July/August 1999, pp. 493-509
[kon96] KONTIO, J.A.: 'Case Study in Applying a Systematic Method for COTS Selection', Proc. ICSE 18, 25-29 March 1996, Berlin, pp. 201-209
[leh74] *LEHMAN, M.M.: 'Programs, Cities, Students, Limits to Growth?' Inaugural Lecture, May 1974. Publ. in Imp. Col. of Sc. Tech. Inaug. Lect. Ser., 9, 1970 - 1974, pp. 211-229. Also in Programming Methodology, D Gries (ed.), Springer Verlag, 1978, 42-62 [leh78] *LEHMAN, M.M.: 'Laws of Program Evolution - Rules and Tools for Program Management'. Proc. Infotech State of the Art Conf., Why Software Projects Fail, April 1978, 11/1-11/25 [leh80a] *LEHMAN, M.M.: 'On Understanding Laws, Evolution and Conservation in the Large Program Life Cycle,' J. of Sys. and Software, 1980, 1, (3), pp. 213-221 [leh80b] *LEHMAN, M.M.: 'Programs, Life Cycles and Laws of Software Evolution'. Proc. IEEE Special Issue on Softw. Eng., 68, (9), Sept. 1980, pp. 1060-1076 [leh85] LEHMAN, M.M. and BELADY, L.A.: 'Program Evolution, - Processes of Software Change', (Acad. Press, London, 1985) [leh89] LEHMAN, M.M.: 'Uncertainty in Computer Application and its Control Through the Engineering of Software,' J. of Softw. Maintenance: Res. and Practice, 1, (1), September 1989, pp. 3-27 [leh90] LEHMAN, M.M.: 'Uncertainty in Computer Application'. Tech. Let., CACM, 33, (5), May 1990, pp. 584-586 [leh94] LEHMAN, M.M.: 'Feedback in the Software Process', Keynote Address, CSR Eleventh Annual Wrksh. on Softw. Ev. - Models and Metrics. Dublin, 7-9th Sep. 1994. Also in Info. and Softw. Tech., spec. iss. on Softw. Maint., v. 38, n. 11, 1996, Elsevier, 1996, pp. 681 - 686 [leh96] LEHMAN, M.M. and STENNING, V.: 'FEAST/1: Case for Support.' Department of Computing. Imperial College, March 1996, available from FEAST web page, see ref. [feast] [leh97] LEHMAN, M.M.: 'Laws of Software Evolution Revisited'. Proc. EWSPT'96, Nancy, 9-11 October 1996, LNCS 1149, Springer Verlag, 1997, pp. 108-124 [leh98a] LEHMAN, M.M.: 'FEAST/2: Case for Support.' Department of Computing, Imperial College, July 1998, from FEAST web page ref. [feast] [leh98b] LEHMAN, M.M., PERRY, D.E. and RAMIL, J.F.: 'On Evidence Supporting the FEAST Hypothesis and the Laws of Software Evolution'. Proc. Metrics'98, Bethesda, Maryland, 20-21 November 1998, pp. 84 - 88 [leh00a] LEHMAN, M.M.: 'Rules and Tools for Software Evolution Planning and Management', in [fesbp]; rev. version with J.F. Ramil in Annals of Softw. Eng., spec. iss. on Softw. Managmt., v. 11, 2001 [leh00b] LEHMAN, M.M. and RAMIL, J.F.: 'Towards a Theory of Software Evolution and its Practical Impact'. Invited Talk, Proceedings Intl. Symposium on Principles of Softw. Evolution, ISPSE 2000, 1-2 Nov, Kanazawa, Japan, pp. 2 - 11 [leh00c] LEHMAN, M.M.: 'TheSE- An Approach to a Theory of Software Evolution', project proposal, Dept. of Computing, Imperial College, Dec. 2000, pp. 9 [leh00d] LEHMAN, M.M. and RAMIL, J.F.: 'Software Evolution in an Era of Component Based Software Engineering', IEE Proceedings - Software, v. 147, n. 6, Dec. (2000), pp. 249 - 255, earlier version as Technical Report 98/8, Imperial College, London, Jun. 1998
[leh01a] LEHMAN, M.M. and RAMIL, J.F.: ‘Software Evolution’, inv. keynote lect., IWPSE 2001, Vienna, Sept. 10-11, a revised and extended version of an article to appear in Marciniak J. (ed.), Encyclopedia of Software Engineering, 2nd. Ed., Wiley, 2002 [leh01b] LEHMAN, M.M. and RAMIL, J.F.: ‘EPiCS: Evolution Phenomenology in Component-intensive Software’, proposal draft, Dept. of Comp., Imp. Col., Aug. 2001 [mck99] McKINNEY, D.: 'Impact of Commercial Off-The-Shelf (COTS) Software on the Interface Between Systems and Software Engineering.' Invited Talk, Panel on COTS Integration, Proc. ICSE'99, 16-22 May 1999, Los Angeles, CA, pp. 627-8 [mor00] MORISIO, M., et al: 'Investigating and Improving a COTS-Based Software Development Process', Proc. ICSE 22, 4-11 June 2000, Limerick, Ireland, pp. 32 - 41 [myr99] MYRTVEIT, I. and STENSRUD, C.: 'Benchmarking COTS Projects Using Data Envelope Analysis', Metrics 99, Boca Raton, FL., pp. 269 - 278 [nau69] NAUR, P. and RANDELL, B. (eds.): 'Software Engineering'. report on a conference sponsored by the NATO Science Committe, Garmisch, Germany, 7-11 October 1968, January 1969, 231 pps. [nus97] NUSEIBEH, B.: 'ARIANNE 5 Who Dunnit?', IEEE Software, May/June 1997, pp. 15-16 [prosim] PROSIM 2000: 'Int. Workshop on Software Process Simulation Modelling', 12 - 14 July 2000, Imp. Col., London [ram01] RAMIL, J.F., LEHMAN, M.M., and SANDLER, U.: 'An Approach to Modelling Long-Term Growth Trends in Large Software Systems', submitted to ICSM 2001, 6-10 November, Florence, Italy [raj00] RAJLICH, V.T. and BENNETT, K.H.: 'A Staged Model for the Software Life Cycle', Computer, July 2000, pp. 66 - 71 [sea99] SEAMAN, C: 'Qualitative Methods in Empirical Studies of Software Engineering', IEEE Trans. on Softw. Eng, vol. 25, n. 4, July/Aug. 1999, pp. 557 - 572 [sen90] SENGE, P: 'The Fifth Discipline - The Art & Practice of The Learning Organisation', Currency/Doubleday, NY, 1990, pp. 423 [soce] SOCE 2000:'Workshop on Software and Organisation Co-evolution'. 12-13 July 2000. Imp. Col., London [tur87] TURSKI, W.M. and MAIBAUM, T.: 'The Specification of Computer Programs', Addison Wesley, London, 1987, p. 278) [tur96] TURSKI, W.M.: 'Reference Model for Smooth Growth of Software Systems,' IEEE Trans. on Soft. Eng. 22, (8), August 1996, pp. 599-600 [voa99] VOAS, J.M.: 'Disposable Information Systems: The Future of Software Maintenance?,' Journal of Softw. Maint: Res. Pract., 1999, 11, pp. 143-150 [wol85] WOLSTENHOLME, E.F.: 'A Methodology for Qualitative System Dynamics', in The 1985 Int. Conf. of the System Dynamics Society, 1985 [wu00] WU, Y., PAN, D. and CHEN, M.H.: 'Techniques of Maintaining Evolving Component-based Software', ICSM 2000, 11-14 Oct., San Jose, CA, pp. 236 - 246 [zur68] ZURCHER, F.W. and RANDELL, B: 'Iterative MultiLevel Modelling - A Methodology for Computer System Design', Info. Proc. 67, Proc. IFIP Congr. 1968, Edinburgh, Aug. 1968, pp. D138 - 142
Code Analysis, Metrics
Measuring and Predicting the Linux Kernel Evolution
Francesco Caprio, Gerardo Casazza, Massimiliano Di Penta, Umberto Villano
University of Sannio, Faculty of Engineering - Piazza Roma, I-82100 Benevento, Italy
University of Naples "Federico II", DIS - Via Claudio 21, I-80125 Naples, Italy
Abstract
Software systems are continuously subject to evolution to add new functionalities, to improve quality or performance, to support different hardware platforms and, in general, to meet market requests and/or customer requirements. As a part of a larger study on software evolution, this paper proposes a method to estimate the size and the complexity of a software system that can be used to improve the software development process. The method is based upon the analysis of historical data by means of time series. The proposed method has been applied to the estimation of the evolution of 68 subsequent stable versions of the Linux kernel in terms of KLOCs, number of functions and average cyclomatic complexity.
Keywords: Source code metrics, time series, prediction, Linux kernel

1. Introduction
It is widely recognized that software systems must evolve to meet users' ever-changing needs [9, 10]. Several driving factors for evolution may be identified: these include new functionalities added, lack of software quality, lack of overall system performance, software portability (on new software and hardware configurations, i.e., new platforms) and market opportunities. As a part of a larger study, we are investigating the influence of software evolution on its size. In this paper a method to estimate the size and complexity of the next release of a software system is proposed. The method is based upon the analysis via time series of historical data.
The principal reason for estimating both size and complexity of a software product is to help the development process. The quality of a software development plan strongly depends on the quality of the estimated size and complexity. Humphrey recognized a poor size estimate as one of the principal reasons of project failure [8]. Both size and complexity are accurate predictors of the resources required for the development of a software product. For example, the relation between size, complexity and required resources has led to the development of a number of cost models such as COCOMO and Slim [1, 14]. These models require as input an estimate of size and complexity, and the resulting accuracy of the model is related to the accuracy of the estimated input. Several studies show size estimate errors as high as 100%, which imply poor resource estimation and unrealistic project scheduling [7].
A time series is a collection of observations made sequentially in time. One of the possible objectives in analyzing time series is prediction: given an observed time series, it is possible to predict its future values. The prediction of future values requires the identification of a model describing the time series dynamics. Given the number of releases of a software system, the related sequence of sizes can be thought of as a time series. Once the time series has been modeled, it is possible to predict the size of the future releases. To investigate such a conjecture, the source code of 68 subsequent stable releases of the Linux kernel, ranging from version 1.0 up to version 2.4.0, was downloaded from http://www.kernel.org. Its size and complexity in terms of LOC (lines of code), number of functions and average cyclomatic complexity were evaluated, in order to obtain the time series describing the size evolution. Then, the dynamics behind the Linux kernel evolution was modeled using time series. Finally, a cross-validation procedure was performed to assess the accuracy of the estimates thus produced.
2. Background Notions
A time series is a collection of observations made sequentially in time; examples occur in a variety of fields, ranging from economics to engineering. Time series can be modeled using stochastic processes [12]. A stochastic process can be described as a statistical phenomenon that evolves in time according to probabilistic laws. Mathematically, it may be defined as a collection of random variables ordered in time and defined at a set of time points which may be continuous or discrete. One of the possible objectives in analyzing time series is prediction: given an observed time series, one may want to predict its future values. The prediction of future values requires the identification of a model describing the time series dynamics. There are many classes of time series models to choose from; the most general is the ARIMA class, which includes as special cases the AR, MA and ARMA classes.
A discrete-time process is a purely random process if it consists of a sequence of random variables $\{Z_t\}$ which are mutually independent and identically distributed. By definition, it follows that purely random processes have constant mean and variance. Under the assumption that $\{Z_t\}$ is a discrete purely random process with mean zero and variance $\sigma_Z^2$, a process $\{X_t\}$ is said to be a moving average process of order q (MA(q)) if

$X_t = Z_t + \theta_1 Z_{t-1} + \dots + \theta_q Z_{t-q}$   (1)

where $\{\theta_i\}$ are constants. Once the backward shift operator B has been defined as

$B^j X_t = X_{t-j}$   (2)

a moving average process can be written as

$X_t = \theta(B) Z_t$   (3)

where

$\theta(B) = 1 + \theta_1 B + \dots + \theta_q B^q$   (4)

If $\{Z_t\}$ is a discrete purely random process with mean zero and variance $\sigma_Z^2$, then a process $\{X_t\}$ is said to be an autoregressive process of order p (AR(p)) if

$X_t = \phi_1 X_{t-1} + \dots + \phi_p X_{t-p} + Z_t$   (5)

where $\{\phi_i\}$ are constants. This is similar to a multiple regression model, where $\{X_t\}$ is not regressed on independent variables but on past values of $\{X_t\}$. Broadly speaking, an MA(q) process explains the present as the mixture of q random impulses, whereas an AR(p) process builds the present in terms of the past p events.
A useful class of models for time series is obtained by combining MA and AR processes. A mixed autoregressive moving-average process containing p AR terms and q MA terms is said to be an ARMA process of order (p, q). It is given by

$X_t = \phi_1 X_{t-1} + \dots + \phi_p X_{t-p} + Z_t + \theta_1 Z_{t-1} + \dots + \theta_q Z_{t-q}$   (6)

where $X(t)$ is the original series and $Z(t)$ is a series of unknown random errors which are assumed to follow the normal probability distribution. Using the backward shift operator B, the previous equation may be written in the form

$\phi(B) X_t = \theta(B) Z_t$   (7)

where

$\phi(B) = 1 - \phi_1 B - \dots - \phi_p B^p$,   $\theta(B) = 1 + \theta_1 B + \dots + \theta_q B^q$   (8)

A time series is said to be strictly stationary if the joint distribution of $X(t_1), \dots, X(t_n)$ is the same as the joint distribution of $X(t_1 + \tau), \dots, X(t_n + \tau)$ for all $t_1, \dots, t_n, \tau$. In other words, shifting the time origin by an amount $\tau$ has no effect on the joint distributions, which must therefore depend on the intervals between $t_1, t_2, \dots, t_n$. The importance of ARMA processes lies in the fact that a stationary time series may often be described by an ARMA model involving fewer parameters than a pure MA or AR process [4]. However, even if stationary time series can be efficiently fitted by an ARMA process [13], most time series are non-stationary. Box and Jenkins introduced a generalization of ARMA processes to deal with the modeling of non-stationary time series [2]. In particular, if in equation (7) $X_t$ is replaced with $\nabla^d X_t$, it is possible to describe certain types of non-stationary time series. Such a model is called ARIMA (Auto Regressive Integrated Moving Average) because the stationary model, fitted to the differenced data, has to be summed or integrated to provide a model for the non-stationary data. Writing

$W_t = \nabla^d X_t = (1 - B)^d X_t$   (9)

the general process ARIMA(p, d, q) is of the form

$W_t = \phi_1 W_{t-1} + \dots + \phi_p W_{t-p} + Z_t + \theta_1 Z_{t-1} + \dots + \theta_q Z_{t-q}$   (10)

More details on time series can be found in [2].
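For readers who want to experiment with the class of models just described, the sketch below shows one possible way to fit an ARIMA(p, d, q) process to a size series and produce a one-step-ahead forecast using the statsmodels library. It is only an illustration under assumed data; the KLOC values and the chosen order are arbitrary, and this is not the tooling used in the study.

# Minimal sketch: fit an ARIMA model to a series of release sizes (hypothetical
# KLOC values) and forecast the size of the next release.
from statsmodels.tsa.arima.model import ARIMA

kloc = [175.0, 178.2, 181.5, 186.0, 193.4, 201.1, 215.8, 230.2]  # hypothetical

model = ARIMA(kloc, order=(1, 1, 1))   # ARIMA(p=1, d=1, q=1); order is illustrative
fitted = model.fit()
print("AIC of the fitted model:", fitted.aic)
print("one-step-ahead forecast:", fitted.forecast(steps=1))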
3. The Model
A wide variety of prediction procedures are proposed in [3, 6, 18]; the method proposed here relies on the Box and Jenkins one [2]: an ARIMA(p, d, q) process includes as a special case an ARMA process (i.e., ARIMA(p, 0, q) = ARMA(p, q)). When modeling a time series, attention should be paid to assessing whether the time series is stationary or not. If a time series is stationary, it can be modeled through an ARMA(p, q) process, otherwise an ARIMA(p, d, q) is required. A non-stationary time series can be described as a time series whose characteristic parameters change over time. Different measures of stationarity can be employed to decide whether a process (i.e., a time series) is stationary or not. In practice, assessing that a given time series is stationary is a very difficult task, unless a closed-form expression of the underlying time series is known. Non-stationarity detection can be reduced to the identification of two distinct data segments that have significantly different statistical distributions. Several tests can be used to decide whether two distributions are statistically different: Student's t-test, F-test, chi-square test and Kolmogorov-Smirnov test [12].
The Box and Jenkins procedure [2] requires three main steps, briefly sketched below:
1. Model identification. The observed time series has to be analyzed to see which ARIMA(p, d, q) process appears to be most appropriate: this requires the identification of the p, d, q parameters.
2. Estimation. The actual time series has to be modeled using the previously defined ARIMA(p, d, q) process. This requires the estimation of the $\{\phi_i\}$ and $\{\theta_j\}$ coefficients defined by (8).
3. Diagnostic checking. The residuals (i.e., the differences between the predicted and the actual values) have to be analyzed to see if the identified model is adequate.
In the Model Identification step, the d value has to be set taking into account whether the time series is stationary (i.e., d = 0) or not (i.e., d > 0). On the other hand, the identification of the (p, q) parameters can be obtained following the Akaike Information Criterion (AIC): the model with the smallest AIC has to be chosen [17]. In the Estimation step, after the estimation of the $\{\phi_i\}$ and $\{\theta_j\}$ coefficients has been carried out, it is possible to predict the time series' future values. Finally, in the Diagnostic Checking step, the model adequacy can be tested by plotting the residuals: the residuals of a good model have to be small and randomly distributed [4].
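As a concrete illustration of the non-stationarity check described above, the following sketch (ours, not the authors' implementation) splits the series into two halves, compares their distributions with the Kolmogorov-Smirnov test, and applies repeated differencing until the two segments are no longer significantly different, yielding a candidate value for d. The input series is hypothetical.

# Illustrative sketch, assuming the release sizes are available as a plain list.
import numpy as np
from scipy.stats import ks_2samp

def candidate_d(series, alpha=0.05, max_d=2):
    """Pick a differencing order d by comparing the distributions of the
    first and second half of the (possibly differenced) series."""
    x = np.asarray(series, dtype=float)
    for d in range(max_d + 1):
        first, second = np.array_split(x, 2)
        _, p_value = ks_2samp(first, second)
        if p_value > alpha:      # halves not significantly different: treat as stationary
            return d
        x = np.diff(x)           # difference once more and test again
    return max_d

kloc_per_release = [175.0, 181.0, 190.5, 205.2, 223.0, 248.7]  # hypothetical values
print("suggested d:", candidate_d(kloc_per_release))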
4. Case Study
Linux is a Unix-like operating system that was initially written as a hobby by a Finnish student, Linus Torvalds [16]. The first Linux version, 0.01, was released in 1991. Since then, the system has been developed by the cooperative effort of many people, collaborating over the Internet under the control of Torvalds. In 1994, version 1.0 of the Linux kernel was released. The current version is 2.4, released in January 2001; the latest stable release is 2.4.3 (March 31, 2001). Unlike other Unixes (e.g., FreeBSD), Linux is not directly related to the Unix family tree, in that its kernel was written from scratch, not by porting existing Unix source code. The very first version of Linux was targeted at the Intel 386 architecture. At the time the Linux project was started, the common belief of the research community was that high operating system portability could be achieved only by adopting a microkernel approach. The fact that now Linux, which relies on a traditional monolithic kernel, runs on a wide range of hardware platforms, from PalmPilots to Sparc, MIPS and Alpha workstations, clearly points out that portability can also be obtained by the use of a clever code structure.
Linux is based on the Open Source concept: it is developed under the GNU General Public License and its source code is freely available to everyone. "Open Source" refers to the users' freedom to run, copy, distribute, study, change and improve the software. An open source software system includes the source code, and it explicitly promotes the distribution/redistribution of the source code as well as of compiled forms. Very often open source code is distributed under artistic license and/or the Free Software Foundation GPL (GNU General Public License). An open source distribution license shall not restrict any party from selling or giving away the software as a component of an aggregate software distribution containing programs from several different sources. Moreover, the license does not prevent modifications and derived works, and must allow them to be distributed under the same terms as the license of the original software. However, Linux's most peculiar characteristic is that it is not an organizational project, in that it has been developed through the years thanks to the efforts of volunteers from all over the world, who contributed code, documentation and technical support. Linux has been produced through a software development effort consisting of more than 3000 developers distributed over 90 countries on five continents [11]. As such, it is a bright example of a successful distributed engineering and development project.
A key point in the Linux structure is modularity. Without modularity, it would be impossible to use the open-source development model, and to let lots of developers work in parallel. High modularity means that people can work cooperatively on the code without clashes. Possible code changes have an impact confined to the module in which they are contained, without affecting other modules. After the first successful ports of the initial 80386 implementation, the Linux kernel architecture was redesigned in order to have one common code base that could simultaneously support a separate specific tree for any number of different machine architectures. The use of loadable kernel modules, introduced with the 2.0 kernel version [5], further enhanced modularity, by providing an explicit structure for writing modules containing hardware-specific code (e.g., device drivers). Besides making the core kernel highly portable, the introduction of modules allowed a large group of people to work simultaneously on the kernel without central control. The kernel modules are a good way to let programmers work independently on parts of the system that should be independent.
An important management decision was establishing, in 1994, a parallel release structure for the Linux kernel. Odd-numbered releases were the development versions on which people could experiment with new features. Once an odd-numbered release series incorporated sufficient new features and became sufficiently stable through bug fixes and patches, it would be renamed and released as the next higher even-numbered release series and the process would begin again. Linux kernel version 1.0, released in March 1994, had about 175,000 lines of code. Linux version 2.0, released in June 1996, had about 780,000 lines of code. Version 2.4, released in January 2001, has more than 2.5 million lines of code. Table 1, which is an updated version of the one published in [11], shows the most important events in the Linux kernel development time table, along with the number of releases produced for each development series.

Table 1. Linux Kernels Most Important Events
Release Series | Initial Release | Numb. of Releases | Time to Start of Next Release Series | Duration of Series
0.01 | 9/17/91 | 2 | 2 months | 2 months
0.1 | 12/3/91 | 85 | 27 months | 27 months
1.0 | 3/13/94 | 9 | 1 month | 12 months
1.1 | 4/6/94 | 96 | 11 months | 11 months
1.2 | 3/7/95 | 13 | 6 months | 14 months
1.3 | 6/12/95 | 115 | 12 months | 12 months
2.0 | 6/9/96 | 34 | 24 months | 32 months
2.1 | 9/30/96 | 141 | 29 months | 29 months
2.2 | 1/26/99 | 19 | 9 months | still current
2.3 | 5/11/99 | 60 | 12 months | 12 months
2.4 | 1/4/01 | 4 | – | still current

Figure 1. KLOC Evolution (KLOC versus version number, from release 1.0 to 2.4; releases 1.2, 2.0, 2.2 and 2.4 are marked)
Figure 2. Number of Functions Evolution (number of functions versus version number, from release 1.0 to 2.4; releases 1.2, 2.0, 2.2 and 2.4 are marked)
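The paper does not detail the measurement tooling used to obtain the three metrics. The sketch below is a simplified stand-in of ours that roughly approximates LOC, number of functions and average cyclomatic complexity for one unpacked kernel snapshot; a real study would use a proper C parser, and the path shown is hypothetical.

# Crude, illustrative approximation of the three metrics for a tree of C sources.
import os, re

BRANCH_KEYWORDS = re.compile(r"\b(if|for|while|case)\b")
FUNCTION_HEADER = re.compile(r"^[A-Za-z_][\w\s\*]*\([^;]*\)\s*\{", re.MULTILINE)

def measure_tree(root):
    loc = functions = decisions = 0
    for dirpath, _, files in os.walk(root):
        for name in files:
            if not name.endswith((".c", ".h")):
                continue
            text = open(os.path.join(dirpath, name), errors="ignore").read()
            loc += sum(1 for line in text.splitlines() if line.strip())
            functions += len(FUNCTION_HEADER.findall(text))
            decisions += len(BRANCH_KEYWORDS.findall(text))
    avg_cc = 1 + decisions / functions if functions else 0  # rough McCabe estimate
    return loc / 1000.0, functions, avg_cc

print(measure_tree("linux-2.4.0"))  # hypothetical path to an unpacked kernel tree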
5. Predicting Linux Kernel Evolution
In order to measure the performance of the presented method, the size (in terms of KLOCs, number of functions and average cyclomatic complexity) of 68 subsequent stable releases of the Linux kernel was evaluated. Figs. 1, 2 and 3 show the evolution of the Linux kernels ranging from version 1.0 up to 2.4.0. The objective of such experiments was to test the effectiveness and accuracy of the method on real code.
The experimental activity followed a cross-validation procedure [15]. In each experiment a training time series was extracted from the observed time series. Then, the training time series was analyzed and modeled to predict its future values. Finally, the predicted values were matched against the actual values and the method performance was measured in terms of absolute prediction error and mean absolute prediction error. Given a time series $X_t$, if $\hat{x}_T$ and $x_T$ are its predicted and actual values at time T respectively, the absolute prediction error is defined as follows:

absolute prediction error $= \frac{|x_T - \hat{x}_T|}{x_T}$   (11)

If $ape_k$ is the absolute prediction error related to the k-th experiment and n is the number of performed experiments, the mean absolute prediction error can be defined as follows:

mean absolute prediction error $= \frac{1}{n} \sum_{i=1}^{n} ape_i$   (12)

Given the observed time series, the k-th experiment $E_k$ ($0 \le k \le 47$) was run on the k-th training time series ($tts_k$), defined as follows (for k = 0, $tts_k$ contains 20 points):

$tts_k = \{x_1, \dots, x_{20+k}\}$   (13)

where $x_1$ is the number of KLOC, the number of functions or the average cyclomatic complexity of the Linux kernel 1.0. During the k-th experiment (i.e., $E_k$) the one-step-ahead value (i.e., $\hat{x}_{20+k+1}$) was predicted; then, the predicted value was compared against the actual one (i.e., $x_{20+k+1}$) in order to evaluate both the one-step-ahead absolute prediction error and the related mean absolute prediction error. Figs. 4, 5 and 6 show the trend of the one-step-ahead absolute prediction errors for predicting the KLOCs, the number of functions and the average cyclomatic complexity, respectively. The one-step-ahead mean absolute prediction errors obtained in the experiments carried out are shown in Table 2.

Table 2. Mean absolute prediction errors (one step ahead)
KLOCs | # of Functions | Av. Cycl. Complexity
2.59% | 2.54% | 0.47%

Figure 3. Average Cyclomatic Complexity Evolution (average cyclomatic complexity versus version number; releases 1.2, 2.0, 2.2 and 2.4 are marked)
Figure 4. KLOC: APE (one-step-ahead absolute prediction error versus version number; releases 2.2 and 2.4 are marked)
Figure 5. Function Evolution: APE (one-step-ahead absolute prediction error versus version number; releases 2.2 and 2.4 are marked)
Figure 6. Average Cyclomatic Complexity: APE (one-step-ahead absolute prediction error versus version number; releases 2.2 and 2.4 are marked)
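The growing-window validation of equations (11)-(13) is easy to reproduce. The sketch below is ours, uses a hypothetical series and an arbitrary ARIMA order, and simply grows the training window one release at a time, predicts the next value, and accumulates the absolute prediction errors.

# Sketch of the growing-window cross-validation described by equations (11)-(13).
from statsmodels.tsa.arima.model import ARIMA

def one_step_errors(series, start=20, order=(1, 1, 1)):
    """Absolute prediction errors of one-step-ahead forecasts obtained from
    training windows x_1..x_{start+k}, for k = 0, 1, ..."""
    errors = []
    for k in range(len(series) - start):
        training = series[: start + k]                     # tts_k = {x_1, ..., x_{20+k}}
        forecast = ARIMA(training, order=order).fit().forecast(steps=1)[0]
        actual = series[start + k]
        errors.append(abs(actual - forecast) / actual)     # equation (11)
    return errors

kloc_series = [175.0 + 5.0 * i for i in range(30)]         # hypothetical data
apes = one_step_errors(kloc_series)
print("mean absolute prediction error:", sum(apes) / len(apes))  # equation (12)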
6. Discussion
As shown in Figs. 1, 2 and 3, major changes in the Linux kernel occurred with releases 1.2, 2.0, 2.2 and 2.4. However, it is worth noting that such changes were primarily developed and tested along the development (non-stable) Linux kernel releases, and then included in the stable ones. As a result, the predicted values concerned with releases 2.2.0 and 2.4.0 were affected by a high error. No values could be predicted for releases 1.2 and 2.0, because they belonged to the initial training time series (i.e., $tts_k$ for k = 0). In fact, as shown in Figs. 1, 2 and 3, the values concerned with releases 1.2 and 2.0 are within the first 20 points of each time series.
The APEs related to the release 2.2.0 KLOCs and number of functions were respectively 43% and 47% (see Figs. 4 and 5). However, in correspondence with these errors, there was an 80% increase in terms of KLOCs and a 93% increase in terms of number of functions. On the other hand, the error affecting the average cyclomatic complexity was considerably smaller (about 10%, as shown in Fig. 6). The reasons behind such non-negligible errors can be explained taking into account that the highest errors were obtained when the kernel underwent relevant changes. The most important changes introduced into release 2.2.0 of the Linux kernel were:
- several improvements for networking (firewalling, routing, traffic bandwidth management) and TCP stack extensions added;
- new filesystems added, NFS daemon improved;
- sound configuration improved, and support for new soundcards added.
The APEs related to the release 2.4.0 KLOCs and number of functions were both of about 20% and, even if bigger than the mean absolute prediction errors (see Table 2), they can be considered acceptable. The most important changes in release 2.4.0 can be summarized as follows:
- memory management improvement (support for addressing over 1 Gbyte on 32 bit processors, new strategy for paging, improvement of virtual memory management);
- raw I/O support, RAID system improvement, Logical Volume Manager introduced;
- multiprocessor support improvement;
- network layer totally rewritten;
- support for USB configuration improved.

7. Conclusions
The proposed method has been applied to predict the evolution of the Linux kernel. The average prediction errors were generally low and, though with some relevant peaks, they can be considered acceptable; the peaks are due to the substantial, non-evolutionary changes that occurred at those points. Future work will be addressed to analyzing a sequence of both stable and experimental releases of the Linux kernel, in order to take into account the progressive changes made. Moreover, kernel patches will also be considered, in order to measure the amount of change with higher accuracy than the simple comparison of size differences between two subsequent releases. Finally, multi-step prediction will be performed, and the method will also be applied to other categories of software systems.
References [1] B. W. Boehm. Software Engineering Economics. PrenticeHall, Englewood Cliffs, NJ, 1981. [2] G. Box and M. Jenkins. Time Series Forecasting Analysis and Control. Holden Day, San Francisco (USA), 1970. [3] R. G. Brown. Smoothing, Forecasting and Prediction. Prentice Hall, 1963. [4] C. Chatfield. The Analysis of the Time Series. Chapman & HallRc, 1996. [5] J. de Goyeneche and E. de Sousa. Loadable kernel modules. IEEE Software, 16(1):65–71, January 1999. [6] A. Harvey. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press, 1989. [7] J. Hihn and H. Habib-agahi. Cost estimation of software intensive projects: a survey of current practice. In Proceedings of the International Conference on Software Engineering, pages 13–16. IEEE Computer Society Press, 1991. [8] W. Humprey. Managing the Softwarre Process. Addison Wisley, Reading MA, 1989. [9] M. M. Lehman and L. A. Belady. Software Evolution - Processes of Software Change. Academic Press, London, 1985. [10] M. M. Lehman, D. E. Perry, and J. F. Ramil. On evidence supporting the feast hypothesis and the laws of software evolution. In Proc. of the Fifth International Symposium on Software Metrics, Bethesda, Maryland, November 1998. [11] J. Moon and L. Sproull. Essence of distributed work: The case of the linux kernel. Technical report, First Monday, vol. 5, n. 11 (November 2000), http://firstmonday.org/. [12] A. Papoulis. Probability, Random Variables, and Stochastic Processes. McGraw-Hill, 1984. [13] D. Piccolo and C. Vitale. Metodi Statistici per l’Analisi Economica. Il Mulino, 1989. [14] L. Putnam and W. Myers. Measures for Excellence. Yourdon Press, 1992. [15] M. Stone. Cross-validatory choice and assesment of statistical predictions (with discussion). Journal of the Royal Statistical Society B, 36:111–147, 1974. [16] L. Torvalds. The linux edge. Communications of the ACM, 42(4):38–39, Dec 1999. [17] W. Venables and B. Ripley. Modern Applied Statistic with S-PLUS. Springer, 1999. [18] M. West and P. Harrison. Bayesian Forecasting and Dynamic Models. Springer-Verlag, 1989.
last selected metrics    p.r.c.    T-test value
log(M.E)                 -0.565    -13.21
T.S1-1(in)               -0.006    -5.71
log(M.MOD'.n1)            0.547     5.46
T.scope-in(1)            -0.002    -5.34
T.fan-in(1)               0.012     5.32
T.scope-in(2)             0.004     4.34
T.fan-out(2)              0.008     3.54
log(M.SUM)                0.342     3.53
log(M.MOD'.n2)           -0.292    -3.35
log(T.COM)               -0.103    -2.63
T.fan-out(1)             -0.005    -1.47
T.fan-in(2)              -0.004    -1.98
Clone Analysis in the Web Era: an Approach to Identify Cloned Web Pages Giuseppe Antonio Di Lucca°, Massimiliano Di Penta*, Anna Rita Fasolino°, Pasquale Granato°
[email protected], [email protected], [email protected], [email protected]
(°) Dipartimento di Informatica e Sistemistica, Università di Napoli Federico II, Via Claudio 21, 80125 Napoli, Italy
(*) Università del Sannio, Facoltà di Ingegneria, Piazza Roma, I-82100 Benevento, Italy
Abstract

The diffusion of the Internet and of the World Wide Web is producing a substantial increase in the demand for web sites and web applications. The very short time-to-market of a web application, and the lack of a method for developing it, promote an incremental development fashion where new pages are usually obtained by reusing (i.e., "cloning") pieces of existing pages, without adequate documentation of these code duplications and redundancies. The presence of clones increases system complexity and the effort needed to test, maintain and evolve web systems; thus the identification of clones may reduce the effort devoted to these activities, as well as facilitate the migration to different architectures. This paper proposes an approach for detecting clones in web sites and web applications, obtained by tailoring the existing methods for detecting clones in traditional software systems. The approach has been assessed by analyzing several web sites and web applications.
1. Introduction

The rapid diffusion of the Internet and of the World Wide Web infrastructure has recently produced a considerable increase in the demand for new web sites and web applications (WA). The lack of a method for developing these applications, together with the very short time-to-market due to pressing demand, very often results in disordered and chaotic architectures, and in inadequate, incorrect, and incomplete development documentation. Indeed, the development of a WA is generally performed in an incremental fashion, where additional pages are usually obtained by reusing the code of existing pages or page components, but without explicitly documenting these code duplications and
redundancies. This in turn may increase code complexity and augment the effort required to test, maintain and evolve these applications. Moreover, if WAs are maintained and evolved with the same approach, further duplications and redundancies are likely to be added, and increased disorder may affect the code structure and worsen its maintainability. This situation is similar to the one that occurred in the past in the development and maintenance of large size systems where, especially as a consequence of poor design and of the maintenance interventions performed, large portions of duplicated code were produced. These portions of duplicated code are generally called clones, and clone analysis is the research field that investigates methods and techniques for automatically detecting duplicated portions of code in software artifacts. The approaches to clone analysis proposed in the literature are suitable for analyzing traditional software systems with a procedural or object-oriented implementation. In particular, methods based on the matching of Abstract Syntax Trees (ASTs), as well as on the comparison of arrays of specific software metrics, or on the matching of the character strings composing the code, have been presented and experimented with. In the Internet era, web applications are good candidates for clone proliferation, because of the lack of suitable reuse and delegation mechanisms in the languages generally used for implementing them. Moreover, this trend is reinforced by the hurried and unstructured approaches typically used for developing and maintaining web software. In this paper, we propose an approach for detecting clones in web sites or WAs. The approach has been obtained by tailoring the existing clone analysis methods in order to take into account the specific features of a WA. The approach addresses the detection of clones of static pages implemented in the HTML language: two HTML pages will be considered clones if they have the same predefined
structural components (or properties), such as the components defining the final rendering of the page in a browser, or the components defining the processing of the application (like scripts, applets, modules, etc.). Moreover, two pages can be considered clones also if they are characterized by the same values of predefined metrics. In order to efficiently address the detection of cloned pages, the technique we propose takes into account only a limited set of components implementing relevant structural features of a page; this limitation, however, does not affect the effectiveness of the approach. These elements are involved in the computation of a distance measure between web pages that can be used to determine the similarity degree of the pages. The validity of the proposed technique has been assessed by means of experiments involving several web sites and WAs. The experimental results showed that the approach adequately detects cloned pages. In order to carry out the experiments, a prototype tool has been developed that automatically obtains the distance between pages. The remainder of the paper is structured as follows: Section 2 provides a short background on clone analysis, while Section 3 presents our approach to clone analysis. The experiment carried out to assess the approach is described in Section 4, and conclusive remarks are given in Section 5.
2. Background

Clone analysis is the research area that investigates methods and techniques for automatically detecting duplicated portions of code, or portions of similar code, in software artifacts. These portions of code are usually called clones. The research interest in this area was born at the end of the '80s [Ber84] [Hor90] [Jan88] [Gri81] and focused on the definition of methods and techniques for identifying replicated code portions in procedural software systems. Clone detection can be performed to support different activities, such as recovering the reusable functional abstractions implemented by the clones in order to reengineer the system with more generic components, or correcting software bugs in each cloned fragment. A clone, usually produced by copying and possibly modifying a piece of code implementing a well defined concept, a data structure, or a processing item, can be generated for several reasons, such as:
• lack of a good modular design not allowing an effective reuse of a piece of code implementing a common service;
• use of programming languages not providing suitable reuse mechanisms;
• pressing performance requirements not allowing the use of delegation and function call mechanisms;
• undisciplined maintenance interventions producing replications of already existing code.

The methods and techniques for clone analysis described in the literature focus either on the identification of clones that consist of exactly matching code portions (exact match) [Bak95, Bak93, Bak95b], or on the identification of clones that consist of code portions that coincide provided that the names of the involved variables and constants are systematically substituted (p-match or parameterized match). The approach to clone detection proposed in [Bal00] and [Bal99] exploits the Dynamic Pattern Matching algorithm [Kon96][Kon95], which computes the Levenstein distance between fragments of code: each fragment is represented by a sequence of tokens, and two fragments are considered clones if their Levenstein distance is under a given threshold. The approach described in [Bax98] exploits the concept of near miss clone, that is, a fragment of code that partially coincides with another one. Ducasse and Rieger propose an approach to clone detection that is independent of the coding language used for implementing the subject systems [Duc99]. Further approaches, such as the ones proposed in [Kon97][Lag97][May96][Pat99], exploit software metrics concerning the code control-flow or data-flow. In the Internet era, web sites and web applications are good candidates for clone proliferation, because of the lack of suitable reuse and delegation mechanisms in the languages generally used for implementing them¹. At the moment, a considerable growth of the size of web sites and WAs can be observed, and the necessity of effectively maintaining these applications is spreading fast [Ric00] [War99]. Therefore, the effectiveness of traditional clone analysis techniques in the context of WAs should be assessed, and suitable approaches for tailoring these techniques to the new context should be investigated. Clones can be looked for in web software with different aims, such as gathering information suitable to support its maintenance or its migration towards a dynamic architecture, and also to cluster similar/identical structures, facilitating the process of separating the content from the user interface (which may be a PC browser, a PDA, a WAP phone, etc.). One of the difficulties in analyzing clones in web software derives from the wide set of technologies available for implementing web sites and WAs, which makes it harder to choose the replicated software components to be looked for. Web sites and WAs include both static pages (e.g., HTML pages saved in a file and always offering the same information and layout to a client system) and dynamic pages (e.g., pages whose content and layout are not permanently saved in a file, but are dynamically generated). Therefore, the concept of clone may involve either the static pages or the dynamic ones. Web pages include a control component (e.g., the set of items determining the page layout, business rule processing, and event management) and a data component (e.g., the information to be read/displayed from/to a user). Therefore, the clones to be detected may involve either the control or the data component of a page. Since the control and the data component of a dynamic page depend on the sequence of events that occurred at run time, searching for clones in these pages should involve dynamic analysis techniques. Vice versa, the structure of a static page is predefined in the file that implements it, and clone detection can be carried out by statically analyzing the file. In this paper, we focus on techniques for detecting clones among static web pages. In particular, a clone will be thought of as an HTML page that includes the same set of tags as another page, since a tag is the means used for determining the control component in a static page. In the paper, among the various approaches proposed in the literature for clone analysis, the technique based on the Levenstein distance will be analyzed. Moreover, a frequency based approach will be proposed, and the validity and effectiveness of both approaches will be discussed.

¹ In general, a web site may be thought of as a static site that may sometimes display dynamic information. In contrast, a WA provides the Web user with a means to modify the site status (e.g., by adding/updating information to the site).
3. An approach to clone analysis for web systems

3.1 The Levenstein distance

The comparison of strings is used with similar aims in several fields, such as molecular biology, speech recognition, and code theory. One of the most important models for string comparison is the edit distance model, based on the notion of edit operation proposed in 1972 [Ula72]. An edit operation consists of a set of rules that transform a character of a source string into a character of a target string. The alignment of two strings is a sequence of edit operations that transforms the former string into the latter one. A cost function can be used to associate each edit operation with a cost, and the cost of an alignment is the sum of the costs of the edit operations it includes. The concepts of optimum alignment and longest common subsequence are also related to the definition of the Levenstein distance. The edit distance can be defined as the minimum cost required to align two strings; an alignment is
optimum if its cost coincides with the minimum cost, that is, the edit distance. If we consider a unitary cost function (i.e., a cost function that associates each edit operation with a unitary cost), the edit distance can be defined as the unit edit distance. The unit edit distance is also called Levenstein distance: the Levenstein distance D(x, y) of two strings x and y is the minimum number of insert, replacement or delete operations required to transform x into y. Moreover, a subsequence of a string consists of any string obtainable by deleting zero or more characters from the string. A common subsequence of two strings is a subsequence that is contained in both strings, while the longest common subsequence of two strings is the longest subsequence common to both. As an example, given the strings informatics and systematics, the longest common subsequence is the string matics, while the Levenstein distance of the strings is 10:

i n f o r m a t i c s
s y s t e m a t i c s
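As a concrete illustration (a sketch, not taken from the paper), the following Python fragment computes the insert/delete variant of the edit distance with the standard dynamic-programming recurrence; this is the variant that reproduces the value 10 quoted above for informatics and systematics.

def indel_distance(x: str, y: str) -> int:
    """Minimum number of insert and delete operations turning x into y."""
    n, m = len(x), len(y)
    # d[i][j] = distance between the first i characters of x and the first j of y
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i                      # delete all remaining characters of x
    for j in range(m + 1):
        d[0][j] = j                      # insert all remaining characters of y
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                d[i][j] = d[i - 1][j - 1]            # characters align at no cost
            else:
                d[i][j] = 1 + min(d[i - 1][j],       # delete x[i-1]
                                  d[i][j - 1])       # insert y[j-1]
    return d[n][m]

print(indel_distance("informatics", "systematics"))  # 10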
3.2 Detecting cloned pages by the Levenstein distance

The computation of the Levenstein distance requires that an alphabet of distinct symbols is preliminarily defined. In order to define this alphabet, the items implementing the relevant features of a static web page must be identified. Since our approach focuses on the degree of similarity of the control components of two static pages, disregarding the data components, a candidate alphabet will include the set of HTML tags implementing the control component of a page. In this way, a string composed of all the HTML tags in the page will be extracted from each web page, and the Levenstein distance between couples of these strings will be used to compare couples of pages. Since the Levenstein distance represents the minimum number of insert or delete operations required to transform a first string into a second string, its value expresses the degree of similarity of two static pages. In particular, if the distance value is 0, the pages will be cloned pages, while if the distance is greater than 0, but less than a sufficiently small threshold, the pages are candidates to be near-miss clones. In order to improve the effectiveness of the approach, the risk of detecting misleading similarities between pages, and the risk of not detecting meaningful similarities, have to be minimized. The first type of risk, for instance, may depend on the approach used to manage the set of attributes that characterize each tag. In fact, in HTML the same sequences of attributes can refer to different tags, and their detection may produce false positives if they were not linked to the
correct tag. The second type of risk is connected both with the problem of 'composite tags', that is, sequences of tags providing a result equivalent to that of another single tag, and with the categories of tags that influence only the format of the data, like tags for text formatting, font selection and for inserting hyper-textual links. These problems can be solved by refining the preliminary alphabet including all the HTML tags, and substituting each composite tag in the alphabet with its equivalent tag: the resulting alphabet will be called Α2. The set of tags that establish the data formatting will be eliminated and a new refined alphabet Α* will be obtained. Α* will include the set of tag attributes too, provided that they are correctly associated with the tag they belong to. The detection of cloned static pages will therefore be carried out according to the process described in Figure 1. In the first phase, the HTML files are parsed, their tags are extracted and the composite tags are substituted with their equivalent ones. The resulting strings will be composed of symbols from the Α2 alphabet. These strings will be processed in order to eliminate the symbols that do not belong to the Α* alphabet. These final strings will be submitted to the computation of the Levenstein distance: the Distance matrix will finally include the distance between each couple of analyzed strings.

Figure 1: The process of cloned page detection (HTML files and Α2 alphabet → tag extraction and composite tag substitution → strings of Α2 symbols → elimination of symbols not belonging to Α* → strings of Α* symbols → Levenstein distance computation → distance matrix)
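A rough sketch of this pipeline is given below (Python, using the standard library html.parser); the composite-tag table and the set of formatting tags are illustrative placeholders for the Α2 and Α* refinements, not the alphabets actually used by the authors.

from html.parser import HTMLParser

# Placeholder refinements: the real A2 and A* alphabets would be larger.
COMPOSITE_TAGS = {"center": "div"}                     # composite tag -> equivalent tag
FORMATTING_TAGS = {"b", "i", "u", "font", "a", "br"}   # dropped when reducing to A*

class TagExtractor(HTMLParser):
    """Collects the sequence of opening tags of an HTML page."""
    def __init__(self):
        super().__init__()
        self.tags = []
    def handle_starttag(self, tag, attrs):
        self.tags.append(COMPOSITE_TAGS.get(tag, tag))

def tag_string(html: str) -> list[str]:
    parser = TagExtractor()
    parser.feed(html)
    # Keep only symbols belonging to the reduced alphabet A*.
    return [t for t in parser.tags if t not in FORMATTING_TAGS]

page1 = "<html><body><table><tr><td><b>News</b></td></tr></table></body></html>"
page2 = "<html><body><table><tr><td>Forum</td></tr></table></body></html>"
print(tag_string(page1) == tag_string(page2))   # True: same control component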
3.3 Detecting cloned pages with a frequency based method

The method based on the Levenstein distance is in general very expensive from a computational point of view: in fact, in order to determine an edit distance, all the possible alignments between strings should be evaluated, until the optimal alignment is determined. The computational complexity of the algorithm for computing the Levenstein distance is in fact O(n²), where n is the
length of the longer string. A frequency based method to detect clones in web systems has been investigated too. The method requires that each HTML page is associated with an array whose components represent the frequencies (i.e., the occurrences) of each HTML tag in the page. The dimension of the array coincides with the number of considered HTML tags, and the i-th component of the array will provide the occurrences of the i-th tag in the associated page. Given the arrays associated with each page, a distance function in a vector space can be defined, such as the linear distance or the Euclidean distance. Exact cloned pages will be represented by vectors having a zero distance, since they are characterized by the same frequency of each tag, while similar pages will be represented by vectors with a small distance. Of course this method may produce false positives, since even completely different pages may exhibit the same frequencies but not the same sequence of tags, especially when the pages have a small size or use a limited number of tags. However, the lower precision of this method is counterbalanced by its computational cost, which is lower than that of the Levenstein distance.
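A possible reading of the frequency based method is sketched below (Python); the tag lists are illustrative, and the Euclidean distance is one of the vector-space distances mentioned above.

import math
from collections import Counter

def tag_frequencies(tags: list[str]) -> Counter:
    """Occurrences of each HTML tag in a page (the page's frequency array)."""
    return Counter(tags)

def euclidean_distance(f1: Counter, f2: Counter) -> float:
    all_tags = set(f1) | set(f2)
    return math.sqrt(sum((f1[t] - f2[t]) ** 2 for t in all_tags))

p1 = tag_frequencies(["html", "body", "table", "tr", "td", "td"])
p2 = tag_frequencies(["html", "body", "table", "tr", "td", "td"])
p3 = tag_frequencies(["html", "body", "p", "p", "p"])
print(euclidean_distance(p1, p2))   # 0.0 -> candidate clones
print(euclidean_distance(p1, p3))   # > 0 -> different structure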
4. A case study

A number of web systems have been submitted to clone analysis using the proposed approaches, with the aim of assessing their feasibility and effectiveness. A prototype tool that parses the files, extracts the tags, produces the strings and automatically computes the distances between the pages has been developed to support the experiments. This section provides the results of a case study involving a WA implementing a 'juridical laboratory' with the aim of supporting the job of professional lawyers. The WA includes 201 files distributed in 19 directories, and its overall size is 4.26 Mbytes.
Table 1: The HTML files analyzed in the case study

File ID  File Name                                                   KB
1        \index.htm                                                  1.92
2        \Specialisti\MainFrame.htm                                  0.492
3        \Specialisti\Specialisti.htm                                1.75
4        \Specialisti\Text.htm                                       2.30
5        \Specialisti\Title.htm                                      0.363
6        \Novita\Brugaletta.htm                                      6.57
7        \Novita\CalendarioTarNA.htm                                 10.6
8        \Novita\CalendarioTarSA.htm                                 11.2
9        \Novita\MainFrame.htm                                       0.509
10       \Novita\Novita.htm                                          1.82
11       \Novita\RivisteConsOrdAvvSa.htm                             31.9
12       \Novita\Text.htm                                            3.30
13       \Novita\Title.htm                                           0.409
14       \Forum\Forum.htm                                            1.79
15       \Forum\MainFrame.htm                                        0.506
16       \Forum\Text.htm                                             0.237
17       \Forum\Title.htm                                            0.4
18       \Common\FrameLeftPulsanti.htm                               4.78
19       \Common\bottomFrame.htm                                     3.21
20       \ChiSiamo\ChiSiamo.htm                                      1.75
21       \ChiSiamo\MainFrame.htm                                     0.494
22       \ChiSiamo\Text.htm                                          3.24
23       \ChiSiamo\Title.htm                                         0.407
24       \Cerca\Cerca.htm                                            1.87
25       \Cerca\MainFrame.htm                                        0.501
26       \Cerca\Text.htm                                             27.3
27       \Cerca\Title.htm                                            0.4
28       \Caso\Caso.htm                                              8.07
29       \Caso\MainFrame.htm                                         0.411
30       \Caso\Text.htm                                              7.29
31       \Caso\Title.htm                                             0.401
32       \Caso\Testi\Autovelox.htm                                   13.6
33       \Caso\Testi\Corruzione_Identificazione_atto.htm             26.4
34       \Caso\Testi\Danno_biologico.htm                             25.3
35       \Caso\Testi\Mobbing.htm                                     40.9
36       \Caso\Testi\Mobbing_nel_pubblico_impiego.htm                3.75
37       \Caso\Testi\Occupazione.htm                                 32.7
38       \Caso\Testi\Oltraggio.htm                                   14.8
39       \Caso\Testi\Parentelemafiose.html                           23.2
40       \Caso\Testi\Problematica_beni_confiscati.htm                33.3
41       \Caso\Testi\Professioni_intellettuali.htm                   29
42       \Caso\Testi\Relazione_attivita_commissario.htm              0.305
43       \Caso\Testi\Responsabilita_amministrativa.htm               13.1
44       \Caso\Testi\Responsabilita_medica.htm                       45.8
45       \Caso\Testi\Responsabilita_medico.htm                       46.9
46       \Caso\Testi\Riflessioni-Omicidio_di_Peppino_Impastato.htm   37.6
47       \Caso\Testi\Societa_miste.htm                               35.2
48       \Caso\Testi\Truffa_in_attivita_lavorativa.htm               44.2
49       \Caso\Testi\Uso_beni_condominiali.htm                       30.7
50       \Caso\Testi\Misure_patrimoniali_nel_sistema.htm             20.6
51       \Archivio\Archivio.htm                                      1.87
52       \Archivio\MainFrame.htm                                     0.43
53       \Archivio\Text.htm                                          12.9
54       \Archivio\Title.htm                                         0.406
Table 2: Couples of clones with null Levenstein distance

(3,10)  (3,14)  (3,20)  (3,24)  (3,28)  (3,51)  (9,15)  (9,21)  (9,25)  (9,29)
(10,14) (10,20) (10,24) (10,28) (10,51) (13,17) (13,23) (13,27) (13,31) (13,54)
(14,20) (14,24) (14,28) (14,51) (15,21) (15,25) (15,29) (17,23) (17,27) (17,31)
(17,54) (20,24) (20,28) (20,51) (21,25) (21,29) (23,27) (23,31) (23,54) (24,28)
(24,51) (25,29) (27,31) (27,54) (28,51) (31,54)
Table 3: Clusters of clones

Cluster A: 3 - 10 - 14 - 20 - 24 - 28 - 51
Cluster B: 9 - 15 - 21 - 25 - 29
Cluster C: 13 - 17 - 23 - 27 - 31 - 54
Figure 2: A couple of cloned pages

Its HTML static pages are implemented by 54 files with the htm extension, distributed in 10 directories, while 19 files with the asp extension, contained in 4 directories, implement 19 server pages. The remaining files include data or other objects, like images, logos, etc., to be displayed in the pages. The 54 HTML files have been submitted to clone analysis according to the proposed approach. The name and the size of each analyzed file are listed in Table 1. The Levenstein distances between each couple of pages have been computed using the Α2 alphabet, and the Distance matrix has been obtained. The matrix included 46 couples of perfect cloned pages involving 18 distinct files. The couples of cloned pages are listed in Table 2, where each page is identified by the file ID shown in Table 1. Moreover, the Distance matrix included 25 couples of pages with a very low distance that made them potential near-miss clones. The 46 perfect couples of cloned pages have been visualized with a browser in order to validate the results of the analysis, and each couple actually implemented perfect clones. As an example, Figure 2 shows the rendered HTML pages corresponding to the couple of clones (10, 28). In a similar way, the 25 couples of pages representing near-miss clones have been visualized with the browser, and their relatively small differences confirmed that they could not be considered perfect clones. The 18 files implementing the 46 couples of perfect clones were further analyzed, and they could be grouped into three different clusters of identical or very similar pages. Table 3 reports the three clusters of pages. The pages from the same cluster were actually very similar, and their differences were essentially due to the parametric components providing the information displayed in the pages. Their similarity was essentially due to the frame-based structure of the application.
In particular, the pages from cluster A represented the roots of sub-trees of the web site, all reachable from the home page of the application; all the pages of cluster B were implemented by files with the same name 'MainFrame.htm', while the pages of cluster C were all implemented by files with the same name 'Title.htm'. Using the frequency based method, the same set of clones was obtained and no additional clone was detected. However, the second method produced more near-miss clones than the Levenstein method. It is worthwhile noting that also in all the other experiments we carried out, involving other web systems, the frequency based method always produced the same set of clones detected by applying the Levenstein distance, and no additional clones (i.e., false positives) were detected. Even if both approaches detected the same set of clones, their computational costs were significantly different. In particular, the computation of the Levenstein distance for all couples of pages required 2 hours and 50 minutes, while just 15 seconds were necessary for computing the frequency based distances (on a PC with a Pentium III 850 MHz processor). In order to reduce the computational complexity of the Levenstein method and the potential inaccuracy of the frequency based one, an opportunistic approach may be proposed. This approach would use the frequency based method for preliminarily identifying potential couples of clones, and then apply the Levenstein method to these couples for detecting the actual clones and rejecting the false ones.
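A minimal sketch of such an opportunistic detector is given below (Python); the page representation as tag strings, the compact helper implementations, and the zero thresholds are assumptions for illustration, not the authors' tool.

from collections import Counter
from math import sqrt

def freq_distance(t1, t2):
    """Cheap Euclidean distance between tag-frequency vectors."""
    c1, c2 = Counter(t1), Counter(t2)
    return sqrt(sum((c1[t] - c2[t]) ** 2 for t in set(c1) | set(c2)))

def indel_distance(t1, t2):
    """Insert/delete edit distance on tag strings (rolling one-row DP)."""
    n, m = len(t1), len(t2)
    d = list(range(m + 1))
    for i in range(1, n + 1):
        prev, d[0] = d[0], i
        for j in range(1, m + 1):
            prev, d[j] = d[j], prev if t1[i - 1] == t2[j - 1] else 1 + min(d[j], d[j - 1])
    return d[m]

def find_clones(pages, prefilter_threshold=0.0, clone_threshold=0):
    """Opportunistic detection: frequency prefilter first, Levenstein confirmation after."""
    names = list(pages)
    clones = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            a, b = pages[names[i]], pages[names[j]]
            if freq_distance(a, b) <= prefilter_threshold:      # cheap test
                if indel_distance(a, b) <= clone_threshold:     # expensive confirmation
                    clones.append((names[i], names[j]))
    return clones

pages = {"p1": ["html", "body", "table", "tr", "td"],
         "p2": ["html", "body", "table", "tr", "td"],
         "p3": ["html", "body", "p", "p"]}
print(find_clones(pages))   # [('p1', 'p2')]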
5. Conclusions

In this paper an approach to clone analysis in the context of web systems has been proposed. Clone detection allows highlighting the reuse of patterns of HTML tags (i.e., recurrent structures among pages, implemented by specific sequences of HTML tags), provides an approach to facilitate web software maintenance, and supports the migration to a model where the content is separated from the presentation. Moreover, identifying clones facilitates the testing process of a WA, since it is possible to partition the pages into equivalence classes and specify a suitable number of test cases accordingly. Two methods for clone analysis have been defined and experimented with. We considered as clones the pages having the same control components, even if they differed in the data components. During the experiments, the proposed methods detected clones among static web pages, and a manual verification confirmed the methods' effectiveness. The two proposed methods produced comparable results but with different computational costs. Since the frequency based method produced, in all the experiments, the same set of clones obtained by applying the Levenstein distance method, but with a much lower computational cost, it could be an effective method for detecting clones among static web pages. Future work will be devoted to further experimentation to better validate the proposed methods. Moreover, approaches based on the use of other suitable web software metrics to identify clones, as well as approaches to identify clones among server pages, will be investigated.
References

[Bak93] Baker B. S., A theory of parameterized pattern matching: algorithms and applications, in Proceedings of the 25th Annual ACM Symposium on Theory of Computing, 71-80, May 1993.
[Bak95] Baker B. S., On finding duplication and near duplication in large software systems, in Proc. of the 2nd Working Conference on Reverse Engineering, IEEE Computer Society Press, 1995.
[Bak95b] Baker B. S., Parameterized pattern matching via Boyer-Moore algorithms, in Proceedings of the Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, 541-550, Jan 1995.
[Bal00] Balazinska M., Merlo E., Dagenais M., Lagüe B., Kontogiannis K., Advanced clone-analysis to support object-oriented system refactoring, in Seventh Working Conference on Reverse Engineering, 98-107, Nov 2000.
[Bal99] Balazinska M., Merlo E., Dagenais M., Lagüe B., Kontogiannis K., Measuring clone based reengineering opportunities, in International Symposium on Software Metrics, METRICS'99, IEEE Computer Society Press, Nov 1999.
[Bax98] Baxter I. D., Yahin A., Moura L., Sant'Anna M., Bier L., Clone Detection Using Abstract Syntax Trees, in Proceedings of the International Conference on Software Maintenance, 368-377, IEEE Computer Society Press, 1998.
[Ber84] Berghel H. L., Sallach D. L., Measurements of program similarity in identical task environments, SIGPLAN Notices, 9(8):65-76, Aug 1984.
[Duc99] Ducasse S., Rieger M., Demeyer S., A Language Independent Approach for Detecting Duplicated Code, in Proceedings of the International Conference on Software Maintenance, 109-118, IEEE Computer Society Press, 1999.
[Gri81] Grier S., A tool that detects plagiarism in PASCAL programs, in SIGCSE Bulletin, 13(1), 1981.
[Hor90] Horwitz S., Identifying the semantic and textual differences between two versions of a program, in Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 234-245, June 1990.
[Jan88] Jankowitz H. T., Detecting plagiarism in student PASCAL programs, in Computer Journal, 31(1):1-8, 1988.
[Kon96] Kontogiannis K., DeMori R., Merlo E., Galler M., Bernstein M., Pattern Matching for clone and concept detection, in Journal of Automated Software Engineering, 3:77-108, Mar 1996.
[Kon95] Kontogiannis K., DeMori R., Bernstein M., Merlo E., Pattern Matching for Design Concept Localization, in Proc. of the 2nd Working Conference on Reverse Engineering, IEEE Computer Society Press, 1995.
[Kon97] Kontogiannis K., Evaluation Experiments on the Detection of Programming Patterns Using Software Metrics, in Proc. of the 4th Working Conference on Reverse Engineering, 44-54, 1997.
[Lag97] Lagüe B., Proulx D., Merlo E., Mayrand J., Hudepohl J., Assessing the benefits of incorporating function clone detection in a development process, in Proceedings of the International Conference on Software Maintenance 1997, 314-321, IEEE Computer Society Press, 1997.
[May96] Mayrand J., Leblanc C., Merlo E., Experiment on the Automatic Detection of Function Clones in a Software System Using Metrics, in Proceedings of the International Conference on Software Maintenance, 244-253, IEEE Computer Society Press, 1996.
[Pat99] Patenaude J.-F., Merlo E., Dagenais M., Lagüe B., Extending software quality assessment techniques to Java systems, in Proceedings of the 7th International Workshop on Program Comprehension IWPC'99, IEEE Computer Society Press, 1999.
[Ric00] Ricca F., Tonella P., Web Analysis: Structure and Evolution, in Proceedings of the International Workshop on Web Site Evolution, 76-86, 2000.
[Ula72] Ulam S. M., Some Combinatorial Problems Studied Experimentally on Computing Machines, in Zaremba S. K., Applications of Number Theory to Numerical Analysis, 1-3, Academic Press, 1972.
[War99] Warren P., Boldyreff C., Munro M., The evolution of websites, in Proceedings of the International Workshop on Program Comprehension, 178-185, 1999.
Maintenance Project Assessments Using Fuzzy Function Point Analysis

Osias de Souza Lima Júnior ([email protected]), Pedro Porfírio Muniz Farias ([email protected]), Arnaldo Dias Belchior ([email protected])

Department of Computer Science, University of Fortaleza (UNIFOR), Fortaleza, CE, Brazil
ABSTRACT

Function Point Analysis (FPA) is among the most commonly used techniques to estimate the size of development projects, enhancement projects or applications. During the point counting process that represents the dimension of a project or an application, each function is classified according to its relative functional complexity. This work proposes the use of concepts and properties from fuzzy set theory to extend FPA into FFPA (Fuzzy Function Point Analysis). Fuzzy theory seeks to build a formal quantitative structure capable of emulating the imprecision of human knowledge. With the function points generated by FFPA, derived values such as costs and terms of an enhancement project can be more precisely determined.

Keywords: fuzzy sets, FPA, FFPA, metrics, evaluation model, enhancement project
1. Introduction

Regardless of size, it is impossible to develop systems without the necessity for change. Throughout the life-cycle of a system, its original requirements will inevitably be modified to reflect user changes and customer needs. The process of changing a system after it has been released and put into use is called maintenance. Changes may involve correcting errors in the code, the design, or the system specifications, or incorporating new requirements. Therefore, maintenance is the process of change implementation that promotes the longevity of the software. Beginning in the late 1980s, emerging statistics revealed that many organizations allocate at least 50% of their financial resources to software maintenance [2, 4, 14]. This demonstrates the importance of maintenance studies that create new methods or adapt existing methodologies to implement the necessary activities within this process, such as estimating maintenance project size. Among other techniques, FPA (Function Point Analysis) allows for estimating the size of this type of project, and is an important input to cost estimation [10]. Various extensions to this technique have been proposed in order to refine it. Maya [8], for instance, suggests altering the structure of FPA for a purer evaluation of small functional increments [1]. This work proposes the use of concepts and properties of fuzzy set theory to extend FPA to FFPA (Fuzzy Function Point Analysis), as relating to maintenance. Fuzzy theory seeks to construct a quantitative formal structure capable of emulating the imprecision of human knowledge. With the function points produced by FFPA, derived values such as development cycle and cost can be more precisely obtained.
2. Fuzzy Set Theory Approaches

Fuzzy theory is inspired by the way the human brain acquires and processes information at low cost and high efficiency [11], that is, the manner in which the human mind deals with subjective concepts such as high, low, old, and new (linguistic terms), and its natural inclination
toward organizing, classifying and grouping into sets objects that share common characteristics or properties [9]. A fuzzy set is characterized by a membership function, which maps the elements of a domain, space or universe of discourse X to a real number in [0,1]. Formally, µÃ: X → [0,1]. Thus, a fuzzy set is presented as a set of ordered pairs in which the first element is x ∈ X, and the second, µÃ(x), is the degree of membership of x in Ã, which maps x into the interval [0,1]; that is, Ã = {(x, µÃ(x)) | x ∈ X} [12]. The membership of an element in a certain set thus becomes a question of degree, replacing the dichotomic treatment imposed by classical set theory [9] when that treatment is not suitable. In the extreme cases, the degree of membership is 0, in which case the element is not a member of the set, or the degree of membership is 1, if the element is a 100% member of the set [13]. Therefore, a fuzzy set emerges from the "enlargement" of a crisp set that begins to incorporate aspects of uncertainty. This process is called fuzzification. Defuzzification is the inverse process, that is, the conversion of a fuzzy set into a crisp value (or a vector of values) [3, 13]. In the next sections, the main FPA concepts will be presented, together with the way in which these concepts were extended by making use of fuzzy set theory.
3. Function Point Analysis (FPA) – Maintenance Project

FPA can be applied to calculate the size of applications, development projects, or software maintenance. In the latter case, the following items comprise the functionality of a project of this nature: (i) the functionality of the application itself; (ii) the functionality of the conversion functions; and (iii) the adjustment factor. The functionality of the application includes all functions referenced in a maintenance project, whether added, altered, or excluded. New functions developed only to install a new version of the application are counted as conversion functions. To adjust the function points, two factors are considered: (i) the current adjustment factor; and (ii) the adjustment factor at the end of a maintenance project. The formula to determine the function points for a maintenance project is as follows [5]:

EFP = [(ADD + CHGA + CFP) * VAFA] + (DEL * VAFB)   [Eq. 1]

where EFP = function points of an enhancement project; ADD = unadjusted function points of the functions added by the project; CHGA = new, unadjusted function points for functions modified by the project; CFP = function points of the conversion functions; VAFA = new value adjustment factor; DEL = function points for the functions deleted from the application; and VAFB = current value adjustment factor. After performing maintenance, the function points for the application should reflect the results of the maintenance project. To that end, Eq. 2 is used:

AFP = [(UFPB + ADD + CHGA) - (CHGB + DEL)] * VAFA   [Eq. 2]

where AFP = new, adjusted function points for an application; UFPB = unadjusted function points of the application prior to maintenance; and CHGB = unadjusted function points of the functions modified by the project prior to maintenance. Regardless of counting type, the classification of data and transactional functions follows the matrix of relative functional complexity defined for each function type. An external input (EI), for instance, may have its complexity classified as low, average or high, in accordance with Table 1:
Table 1: Complexity Matrix of an EI

FTR        DET: 1 to 4   5 to 15    16 or more
0 or 1     LOW           LOW        AVERAGE
2          LOW           AVERAGE    HIGH
3 or more  AVERAGE       HIGH       HIGH
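To make the counting formulas concrete, the sketch below (Python) classifies an EI according to Table 1 and applies Eq. 1 and Eq. 2; the numeric inputs in the example are illustrative assumptions, not data from the paper.

def classify_ei(ftr: int, det: int) -> str:
    """Classify an External Input per Table 1 (FTR rows, DET columns)."""
    det_band = 0 if det <= 4 else 1 if det <= 15 else 2
    ftr_band = 0 if ftr <= 1 else 1 if ftr == 2 else 2
    matrix = [["LOW", "LOW", "AVERAGE"],
              ["LOW", "AVERAGE", "HIGH"],
              ["AVERAGE", "HIGH", "HIGH"]]
    return matrix[ftr_band][det_band]

def enhancement_fp(add, chga, cfp, dele, vafa, vafb):
    """Eq. 1: function points of an enhancement (maintenance) project."""
    return (add + chga + cfp) * vafa + dele * vafb

def application_fp_after(ufpb, add, chga, chgb, dele, vafa):
    """Eq. 2: adjusted function points of the application after maintenance."""
    return ((ufpb + add + chga) - (chgb + dele)) * vafa

if __name__ == "__main__":
    # Illustrative values only (not taken from the paper).
    print(classify_ei(ftr=2, det=10))                        # AVERAGE
    print(enhancement_fp(add=20, chga=15, cfp=5, dele=4,
                         vafa=1.05, vafb=1.00))              # 46.0
    print(application_fp_after(ufpb=300, add=20, chga=15,
                               chgb=12, dele=4, vafa=1.05))  # 334.95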
There are at least two clear situations in FPA that do not accurately translate the function point measurement process, as can be observed in the data of Table 1:

• Situation 1 (S1): an EI with 1 FTR and 2 DETs (function f1) is classified as low complexity (3 function points). By the same criteria, an EI with 1 FTR and 15 DETs (f2) is also classified as low complexity (3 function points). However, with an increment of only one more DET to the latter case, thereby increasing it to 16 DETs, the EI (f3) would be considered of average complexity (4 function points). Thus, FPA considers f1 and f2 as identical functionalities and f2 and f3 as substantially different functionalities. When they are part of the same project, the resulting final measurement will not correspond to a sufficiently accurate function point value.

• Situation 2 (S2): an EI with 3 FTRs and 5 DETs has the same number of function points as an EI with 3 FTRs and 50 DETs; that is, they have the same functionality. In such a case, the number of DETs referenced, which determines the lower limit of the high complexity range, can lead to the same measurement precision difficulties observed in the above situation, especially in systems that reference a large number of DETs.

Additionally, the abrupt and disjoint manner of classifying functions prevents the application from reflecting, in function points, what was added to it after maintenance. For instance, if a maintenance project consists only of a modification to an already existing function, without altering its level of complexity after maintenance, the CHGB would be the same as the CHGA. Following Eq. 2, the AFP value remains unaffected with respect to the one already installed. The FFPA proposal seeks to give a more accurate treatment to the process of counting function points by extending FPA to FFPA, while still guaranteeing the validity of the final calculation of traditional function points.
4. Fuzzy Model to Function Point Analysis (FFPA)

The central idea of extending FPA to FFPA through fuzzy set theory is to expand the semantics of traditional FPA by making use of the concepts and mathematical formalism of an already well established theory. The FFPA model has already been utilized for calculating function points in estimates for software development projects, as elaborated in [7]. The types of data functions (ILF and EIF) and transactional functions (EI, EO and EQ), within their respective functional complexity matrices, can be mapped to the universe of discourse X, which corresponds to the referenced DETs. These matrices all use the same linguistic terms low, average, and high to express their complexity. For each line of these matrices, trapezoid-shaped fuzzy numbers were generated for each of the linguistic terms, because they best preserve the FPA complexity matrix values in addition to circumventing the difficulties presented in S1 (Section 3). A trapezoid-shaped fuzzy number can be represented by Ñ(a, m, n, b), whose membership function is presented in Equation 3 below. The values a and b identify the lower and upper limits, respectively, of the larger base of the trapezoid, where µÑ(x) = 0. The values m and n are the lower and upper limits, respectively, of the smaller base of the trapezoid, where µÑ(x) = 1, as shown in Figure 1.
µÑ(x) = 0                  if x < a
µÑ(x) = (x − a)/(m − a)    if x ∈ [a, m]
µÑ(x) = 1                  if x ∈ [m, n]
µÑ(x) = (b − x)/(b − n)    if x ∈ [n, b]
µÑ(x) = 0                  if x > b

Equation 3: Membership function of a trapezoid-shaped fuzzy number

Figure 1: Trapezoid-shaped fuzzy number

FFPA consists of the following four stages [7]:

• First Stage
Through a fuzzification process, trapezoid-shaped fuzzy numbers are generated for each linguistic term belonging to the complexity matrix of the data and transactional functions. The value mi assumes the lower limit of the linguistic term i of the complexity matrix being considered. The value ni is calculated from the mathematical average of the values for mi and mi+1, whose result must be a rounded whole number. The values for ni-1 and mi+1 are attributed to ai and bi, respectively.
• Second Stage
A new interval of high complexity was added for the data and transactional functions that called for at most an interval of average complexity, and a new interval of very high complexity was added for the remaining functions, applying the modifier very to the linguistic term high. The last line of the complexity matrix of each function was the starting point for the creation of the new fuzzy number. In order to maintain the use of the values adopted in [5], it was decided that the number that indicates the lower limit of the third column of the matrix represents the value n of the fuzzy set of high complexity functions. In an EI, for example, the value of ni-1 would be 16 DETs, according to Table 1. Since the value of ni-1 of any given fuzzy number corresponds to the value of ai, it follows that the value of ai for the fuzzy number of a function of very high complexity would also be 16. Since the value of ai is calculated from the mathematical average of mi and mi-1, then for an EI the value of mi = 27, as follows: (m + 5) / 2 = 16 → m = 27. From this point forward, to simplify the remainder of this work, the value of mi for a fuzzy number of very high complexity will be referred to as k. The value corresponding to k must be calculated for each of the five function types belonging to FPA. Taking the value k = 27 calculated above, the membership functions of the trapezoid-shaped fuzzy numbers for the first line of the complexity matrix of an EI are presented in Figure 2, in accordance with the model described.
Figure 2: Fuzzy numbers for EIs with 0 or 1 FTR (trapezoids for the linguistic terms low, average and high over the DET axis, with breakpoints at 1, 9, 16, 22 and 27)
• Third Stage
In FPA, pi function points are attributed to each linguistic term ti of the n terms used (low, average, and high), in accordance with the complexity matrix under consideration. In FFPA, these points are directly associated with the fuzzy number of the linguistic term, where µÑ(x) = 1. From these data, the value of the function points pm of the new linguistic term (very high) is obtained as follows: (i) xi = pi+1 − pi; (ii) r = xi+1 − xi; (iii) xn = x1 + (n − 1) · r; (iv) pm = xn + pn. Applying the above definitions, function point values of 22, 14, 9, 10, and 9 were obtained for the fuzzy numbers of very high complexity functions of the ILF, EIF, EI, EO, and EQ function types, respectively.
• Fourth Stage
In FFPA, obtaining the number of function points pd from trapezoid-shaped fuzzy numbers, where µÑ(x) < 1, requires the following defuzzification process:
pd = µÑi(x) · pi + µÑi+1(x) · pi+1

5. Case Studies

The fuzzy model proposed for function point analysis is being validated through a base of real data constituted by government systems that includes both maintenance and development projects. This database is mostly made up of legacy systems developed mainly in the Natural 2 language. Table 2 presents maintenance project estimates (M1, M2, ..., M6) in FPA and FFPA for some of these systems. The term estimates (in days) to program these systems were calculated in accordance with data supplied by Jones [6], considering both the level of the language used and the experience of the team utilizing it. The calculated values for k, according to the model, were: ILF/EIF = 82, EI = 27 and EO/EQ = 34. The margin of error corresponds to the difference between the estimated (FPA and FFPA) and the actual programming term.
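To make the FFPA stages concrete, the sketch below (Python) implements the membership function of Equation 3, the third-stage derivation of the 'very high' points, and the fourth-stage defuzzification. The standard FPA weights used in the check (7/10/15 for ILF, 3/4/6 for EI, 4/5/7 for EO) are the usual low/average/high values; the trapezoid breakpoints in the demo are illustrative assumptions; and the defuzzification assumes that the memberships of the two adjacent fuzzy numbers sum to one in their overlap, as they do for trapezoids built as in the first stage.

from dataclasses import dataclass

@dataclass
class Trapezoid:
    """Trapezoid-shaped fuzzy number N(a, m, n, b) as in Equation 3."""
    a: float   # lower limit of the larger base (membership 0)
    m: float   # lower limit of the smaller base (membership 1)
    n: float   # upper limit of the smaller base (membership 1)
    b: float   # upper limit of the larger base (membership 0)

    def membership(self, x: float) -> float:
        if x < self.a or x > self.b:
            return 0.0
        if self.m <= x <= self.n:
            return 1.0
        if x < self.m:                               # rising edge on [a, m]
            return (x - self.a) / (self.m - self.a)
        return (self.b - x) / (self.b - self.n)      # falling edge on [n, b]

def very_high_points(p):
    """Third stage: points of the new 'very high' term from p = [p_low, p_avg, p_high]."""
    x = [p[i + 1] - p[i] for i in range(len(p) - 1)]   # (i)   x_i = p_{i+1} - p_i
    r = x[1] - x[0]                                    # (ii)  common difference
    xn = x[0] + (len(p) - 1) * r                       # (iii) next increment
    return xn + p[-1]                                  # (iv)  p_m = x_n + p_n

def defuzzify(mu_i, p_i, p_next):
    """Fourth stage: p_d = mu_i(x)*p_i + mu_{i+1}(x)*p_{i+1}, with mu_{i+1} = 1 - mu_i."""
    return mu_i * p_i + (1.0 - mu_i) * p_next

# The third stage reproduces the 'very high' values quoted in the paper:
print(very_high_points([7, 10, 15]))   # ILF -> 22
print(very_high_points([3, 4, 6]))     # EI  -> 9
print(very_high_points([4, 5, 7]))     # EO  -> 10

# Fourth stage: an EI whose DET count falls on the border between 'average'
# and 'high' (illustrative breakpoints) receives a graded point count.
average = Trapezoid(a=3, m=5, n=10, b=16)
mu = average.membership(12)            # about 0.67
print(defuzzify(mu, 4, 6))             # between 4 (average) and 6 (high)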
Table 2: FPA and FFPA estimates

System | Standard FPA points | Term estimate (standard FPA) | Actual programming term | Error (%) | FFPA points | Term estimate (FFPA) | Error (%)
M1     | 10.20               | 6                            | 9                       | 50.00     | 10.20       | 6                    | 50.00
M2     | 169.65              | 24                           | 30                      | 25.00     | 200.06      | 28                   | 7.14
M3     | 96.05               | 60                           | 65                      | 8.33      | 97.75       | 61                   | 6.55
M4     | 96.30               | 42                           | 50                      | 19.04     | 115.94      | 51                   | -1.96
M5     | 12.20               | 13                           | 17                      | 30.77     | 12.76       | 14                   | 21.43
M6     | 13.16               | 7                            | 10                      | 42.86     | 13.29       | 7                    | 42.86
With the results obtained above, it can be noted that there was a reduction between the predicted and real time taken to enhance a system when the function point counting was done through FFPA. This corroborates the hypothesis that the fuzzy numbers generated better represent the functionality of an application when it possesses a large number of data or transactional functions with a large number of DETs. Using a prototype built in Java, the values attributed to k for each function type were successively refined with the idea of reducing the margin of error in the estimates; that is, to find a time estimate in FFPA as close as possible to the actual programming time of the maintenance. The goal was to discover the combination of values for k whose Mean Absolute Error (MAE) for the margin of error was as small as possible. In this case, the Mean Absolute Error corresponds to the average of the absolute values of the error percentages. Table 3 presents results obtained from successive modifications of the values of k. The function points for each project were calculated according to FFPA and the actual programming term was expressed in days. The values for k were recalculated for ILF/EIF, EI and EO/EQ. Upon analyzing the results obtained, we note that there were no differences in the FPA and FFPA estimates for Project M1. This is because all the maintained functions in this project have a degree of membership equal to 1 (one). This is a further proof that the model preserves the values of standard FPA when the complexity of the functions of the project or system under maintenance lies in the non-extended region.

Table 3: Estimates according to values for k

k (ILF/EIF, EI, EO/EQ) | M1 Points/Term | M2 Points/Term | M3 Points/Term | M4 Points/Term | M5 Points/Term | M6 Points/Term | MAE (%)
82, 27, 34             | 10.2 / 6       | 169.6 / 24     | 96.0 / 60      | 96.3 / 42      | 12.2 / 13      | 13.1 / 7       | 21.6
74, 25, 31             | 10.2 / 6       | 203.5 / 29     | 97.7 / 61      | 119.0 / 52     | 12.7 / 14      | 13.3 / 7       | 21.3
67, 25, 31             | 10.2 / 6       | 209 / 29       | 97.7 / 61      | 124.1 / 55     | 12.7 / 14      | 13.3 / 7       | 22.2
67, 23, 31             | 10.2 / 6       | 209 / 29       | 97.7 / 61      | 124.1 / 55     | 12.7 / 14      | 13.3 / 7       | 21.9
67, 23, 28             | 10.2 / 6       | 209 / 29       | 97.9 / 62      | 126.6 / 56     | 12.7 / 14      | 13.3 / 7       | 21.9
61, 23, 28             | 10.2 / 6       | 210.6 / 30     | 97.9 / 62      | 126.6 / 56     | 12.7 / 14      | 13.3 / 7       | 21.6
61, 21, 28             | 10.2 / 6       | 210.6 / 30     | 97.9 / 62      | 126.6 / 56     | 12.7 / 14      | 13.3 / 7       | 21.6
61, 21, 25             | 10.2 / 6       | 210.6 / 30     | 98.3 / 62      | 127.4 / 56     | 12.7 / 14      | 13.3 / 7       | 21.6
55, 21, 25             | 10.2 / 6       | 218.3 / 31     | 98.3 / 62      | 127.4 / 56     | 12.7 / 14      | 13.3 / 7       | 22.1
55, 19, 25             | 10.2 / 6       | 221.8 / 31     | 98.3 / 62      | 127.4 / 56     | 12.7 / 14      | 13.3 / 7       | 22.1
55, 19, 23             | 10.2 / 6       | 221.8 / 31     | 98.7 / 62      | 127.4 / 56     | 12.7 / 14      | 13.3 / 7       | 22.1
53, 19, 23             | 10.2 / 6       | 223.3 / 32     | 98.7 / 62      | 127.4 / 56     | 12.7 / 14      | 13.3 / 7       | 22.7
53, 18, 23             | 10.2 / 6       | 223.3 / 32     | 98.7 / 62      | 127.4 / 56     | 12.7 / 14      | 15.2 / 8       | 19.7
53, 18, 22             | 10.2 / 6       | 223.3 / 32     | 98.7 / 62      | 127.4 / 56     | 12.7 / 14      | 15.2 / 8       | 19.7
According to the data of Table 3, the combinations of values for k equal to (53, 18 and 23) and (53, 18 and 22) presented the smallest Mean Absolute Error (MAE). Therefore, these results indicate such values of k for use in maintenance project size estimates for this organization. However, it is worth pointing out that Project M6 strongly influenced these results, since its small size gives it greater weight in percentage terms. The inclusion of new projects in the historical database may modify this scenario, thereby identifying which would be the best
combination for use in estimates for the organization. Figure 3 graphically presents the values of MAE obtained from the variation in the values for k.

Figure 3: Mean Absolute Error of estimates for different values for k
6. Conclusion

Using concepts and properties from fuzzy set theory, FPA was extended into FFPA (Fuzzy Function Point Analysis). Some important results obtained through the use of FFPA were:

• The creation of the linguistic term very high complexity, pertaining to an interval parameterized through the value of k, which can be adjusted according to the characteristics of the organization to better deal with larger systems;

• Through the use of trapezoid-shaped fuzzy numbers for the linguistic terms low, average, and high, functions falling along the border areas of the intervals used receive values with a continuous graduation, without an abrupt change of those values;

• The model offers a more precise programming term estimate than standard FPA, especially when evaluating systems that cross the threshold of high complexity by referencing a large number of DETs and FTRs within the same elementary process;

• The model is more sensitive to modifications of existing functionality, enabling the function points of an application to reflect the results of maintenance. This allows for better administration of the evolution of a system.
References

[1] Abran, A., Reliability of Function Points Productivity Model for Enhancement Projects (A Field Study), 1993.
[2] April, A., Abran, A., Industrial Research in Software Maintenance: Development of Productivity Models, Guide Summer '95 Conference and Solutions Fair, Boston, 1995.
[3] Belchior, A. D., A Fuzzy Model to Software Quality Evaluation, Doctoral Thesis, UFRJ/COPPE, May 1997 (in Portuguese).
[4] Bourque, P., Maya, M., Abran, A., A Sizing Measure for Adaptive Maintenance Work Products, IFPUG Spring Conference, Atlanta, April 1996.
[5] Function Point Counting Practices Manual, Version 4.1, January 1999.
[6] Jones, C., Programming Languages Table, Release 8.2, March 1996.
[7] Lima, O. S. Jr., Farias, P. P. M., Belchior, A. D., Fuzzy Function Points Analysis, Fesma-Dasma, Germany, May 2001.
[8] Maya, M., Abran, A., Bourque, P., Measuring the Size of Small Functional Enhancements to Software, 6th International Workshop on Software Metrics, University of Regensburg, Germany, September 1996.
[9] Pedrycz, W. and Gomide, F., An Introduction to Fuzzy Sets – Analysis and Design, The MIT Press, 1998.
[10] Ramil, J. F., Lehman, M. M., Cost Estimation and Availability Monitoring for Software Evolution Process, Workshop on Empirical Studies of Software Maintenance, San Jose, CA, USA, October 2000.
[11] Wang, P. and Tan, S., Soft Computing and Fuzzy Logic, Soft Computing, vol. 1 (35-41), 1997.
[12] Zadeh, L. A., Fuzzy Sets, Information and Control, vol. 8 (338-353), 1965.
[13] Zimmermann, H. J., Fuzzy Set Theory and Its Applications, Kluwer, Boston, 2nd revised edition, 1991.
[14] Zitouni, M., Abran, A., A Model to Evaluate and Improve the Quality of Software Maintenance Process, 1996.
Author index

Alvarez, M. B. 17
Balaji, S. 52
Balaji, V. 52
Basson, H. 23
Bianchi, A. 47, 65
Bieman, J. M. 31
Bouneffa, M. 23
Caivano, D. 47, 65
Caprio, F. 77
Casazza, G. 77
Copstein, B. 62
De Lucia, A. 97
de Oliveira, F. M. 62
de Souza Lima, O. Jr. 114
Deruelle, L. 23
Dias Belchior, A. 114
Di Lucca, G. 107
Di Penta, M. 77, 107
Elbaum, S. 3
Fasolino, A. R. 107
Ferro, A. B. 17
Gallagher, K. 84
Granato, P. 107
Ikehara, S. 101
Lanubile, F. 47, 65
Lehman, M. M. 70
Melab, N. 23
Mich, L. 7
Muniz Farias, P. P. 114
Muraoka, Y. 101
Nakamura, Y. 101
O'Brien, L. 84
Persico, A. 97
Pompella, E. 97
Rago, F. 47, 65
Ramil, J. F. 70
Santone, A. 12
Schneidewind, N. F. 29
Simón, C. 17
Sneed, S. H. 39
Stefanucci, S. 97
Takahashi, R. 101
Tonella, P. 35
Vaglini, G. 12
Villano, U. 77
Visaggio, G. 47, 65