Non-invasive Investigation of the Software Process in Open Source Projects

Martina Ceschi, Barbara Russo, Alberto Sillitti, Giancarlo Succi

Abstract

Open Source and Agile (eXtreme Programming, in particular) projects have several commonalities, such as focus on the value for the user, continuous feedback, and a high level of communication. Moreover, both approaches present difficulties in keeping track of the status of the development, verifying the quality of the production process, and identifying best practices. Such difficulties are related to the lack of a formal activity for the collection of data regarding the development process. However, the introduction of such an activity is not compatible with a basic principle of both approaches: the focus on source code. For these reasons, the automated collection of data from source code repositories can provide a way to monitor the development process.

1 Introduction

Nowadays, all non-trivial software projects involve several developers. Therefore, the use of a version control system is required to coordinate the development. Such systems have been designed to store source code and support developers in the management of different versions; however, besides this data, version control systems collect a large set of data related to the development process [3]. The information collected includes code length, effort required, number of modifications, name of the developer who made the modification, etc.

Many Open Source [7] and Agile [1, 2] projects are unable to assess their development process quantitatively due to the lack of a formal collection of data [11]. However, it is possible to extract useful process information from version control systems. This approach is non-invasive since it requires only the interaction with a data repository and does not require any additional effort from the developers. Developers perform only their usual activities, such as writing code, testing it, and committing it to the version control system. The analysis is performed by retrieving and analyzing data from the repository. This approach is compliant with the way many Agile and Open Source teams develop software: they focus on source code and do not have to spend time and effort on other activities such as collecting measures.

The paper is organized as follows: section 2 analyzes agile measurement approaches for Open Source projects; section 3 presents the early steps for the analysis of some Open Source projects; finally, section 4 draws the conclusions and presents future work.

2 Agile Measurement and Open Source

In every engineering area, measures are important to build artifacts. DeMarco's statement “You cannot control what you cannot measure” [5] is still valid in the software area. This is the main reason for implementing a measurement plan in every software project [8]. Approaches such as the Personal Software Process (PSP) [9] provide developers with a framework to collect data, perform analysis, and retrieve useful information. However, such approaches present a number of practical difficulties in their actual implementation [6, 10]. Moreover, according to the principles of lean management [14, 17], from which the Agile Methods derive [16], manual data collection does not provide any value to the final user and therefore has to be removed. Metrics are useful, but developers should not have to spend time collecting them; collecting data without any human interaction is a way to address this problem. However, data collected automatically have different qualities compared to data collected manually. The main advantages are that the data are reliable and detailed. The main disadvantages are the difficulty of interpretation and the large amount of data to manage.

It is interesting to explore the possibility of using the information stored in source code repositories to assess the actual approach to software development adopted by different OS communities in both similar and very different projects. Moreover, it is possible to analyze very old repositories and compare the retrieved data with recent ones to find out how the approach has changed over time. In addition, it should be possible to compare the acquired data with bug reports to verify whether there are any relationships useful for improving the whole development process [13]. Many OS projects started several years ago (e.g., Linux, XFree86, Apache HTTP Server) and are still in progress; during these years, version control repositories have collected a huge amount of information. Such historical information should be useful to identify changes in the development process that produce enhancements in software (e.g., higher quality, less effort required) without having to plan the monitoring of the process in advance and without waiting for ad hoc data to be collected and evaluated.

3 Measuring Open Source projects

The analysis of source code repositories relies on the availability of many large projects to perform the data collection and tune the analysis. Open Source projects represent a valuable set of code to test and tune such a system. Version control systems have been designed to store information about changes in the code, but they do not collect any information regarding the type and the purpose of the modifications introduced. Identifying the purpose of such changes automatically is extremely difficult or impossible in most cases. However, it is possible to classify them automatically through the semantics of the language used. A very simple classification identifies three main types of modifications that developers can introduce in source code:

1. comments: include all changes that affect source code comments and do not modify any executable instructions;
2. non-structural modifications: include modifications of the source code instructions that do not affect any execution paths of the program (all the function calls and the instructions except the flow control ones: if-then-else, for, do-while, switch, etc.);
3. structural modifications: include modifications of the source code that change execution paths of the program.

An automated system able to understand the semantics of the language can easily check for all these kinds of modifications [12]. The architecture of the data collection system is shown in Figure 1. The data extractor accesses the version control system and extracts all the versions of all the files for a specific project, compares the different versions, and classifies the modifications according to the previous classification. Such information is stored in a data warehouse that provides an interface to support queries made by the users.

Figure 1: Data collection architecture

A file retrieved from a version control system contains the following information:


1. file name;
2. author name;
3. version number;
4. number of lines added or removed;
5. comment of the version;
6. module name;
7. date of the modification.

Such information, collected for every file in a project throughout its lifecycle, can be used to perform several kinds of data analysis, including a behavioral analysis of the development team and of individual developers [4]. In particular, it is useful to apply the concepts of gamma analysis [15] used in the social sciences. The goal of such analysis is the identification of common and/or recurrent behaviors inside the same development team and across different ones. The aim is to find out whether development is carried out using different approaches or whether most of the developers behave in the same way. Moreover, it will be interesting to analyze how the guidelines and the project-specific rules affect the development and how individual developers produce software.
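The three-way classification and the per-file record described in this section can be illustrated with a minimal sketch. It is not the system presented in this paper: the paper relies on a tool that understands the semantics of the language [12], whereas the sketch swaps in a much cruder line-based heuristic for C-like code. The names `Revision`, `classify_change`, and `tally_by_author` are hypothetical, the record splits "lines added or removed" into two fields as an assumption, and comments spanning several lines or comment markers inside string literals are not handled.

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass

# Flow-control keywords common to C, C++ and Java (assumed list, not exhaustive).
FLOW_KEYWORDS = re.compile(r"\b(if|else|for|while|do|switch|case)\b")
# Line comments and block comments that open and close on the same line.
COMMENT = re.compile(r"//.*|/\*.*?\*/")


@dataclass(frozen=True)
class Revision:
    """Hypothetical record holding the information listed for each file version."""
    file_name: str
    author: str
    version: str
    lines_added: int
    lines_removed: int
    comment: str
    module: str
    date: str  # kept as plain text; date parsing is outside this sketch


def strip_comments(line):
    """Return the line without single-line comments and outer whitespace."""
    return COMMENT.sub("", line).strip()


def classify_change(old_lines, new_lines):
    """Heuristically label a change as comment, non-structural or structural."""
    old_code = [c for c in map(strip_comments, old_lines) if c]
    new_code = [c for c in map(strip_comments, new_lines) if c]
    if old_code == new_code:
        return "comment"  # only comments or blank lines differ
    # Code lines present on one side only (multiset difference in both directions).
    old_count, new_count = Counter(old_code), Counter(new_code)
    changed = (old_count - new_count) + (new_count - old_count)
    if any(FLOW_KEYWORDS.search(line) for line in changed):
        return "structural"  # a flow-control statement was added, removed or edited
    return "non-structural"


def tally_by_author(classified):
    """Count modification types per author from (Revision, change_type) pairs."""
    totals = defaultdict(Counter)
    for revision, change_type in classified:
        totals[revision.author][change_type] += 1
    return totals


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    rev = Revision("sum.c", "alice", "1.2", 1, 1, "guard input", "core", "2003-05-01")
    kind = classify_change(["total += x;"], ["if (x > 0) total += x;"])
    print(kind, dict(tally_by_author([(rev, kind)])))
```

Per-author tallies of this kind are one possible input for the behavioral analysis mentioned above; a parser-based classifier would replace `classify_change` without changing the rest of the sketch.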

4 Conclusions and future work

This paper presents a first attempt to analyze the behaviour of OS developers through the use of non-invasive technologies developed to support Agile processes. The goal is to retrieve useful information from the source code repositories and provide continuous feedback to the developers to improve the abilities of individual developers. The proposed system is able to perform basic analysis of CVS-based repositories containing source code written in C, C++, and Java. The CVS version control system and the selected languages are the most used in the OS community; therefore, it is possible to analyze several OS projects. However, the system is not able to support encrypted connections or the new version control system (Subversion) that is going to replace many CVS repositories.

Acknowledgment

This work was partially supported by the MAPS (Agile Methodologies for Software Production) research project, contract/grant sponsor: FIRB research fund of MIUR, contract/grant number: RBNE01JRK8.


References

[1] Abrahamsson, P., O. Salo, J. Ronkainen [2002] Agile software development methods. VTT Publications. Available online at: http://www.inf.vtt.fi/pdf/publications/2002/P478.pdf.
[2] Beck, K., M. Beedle, A. Bennekum, A. Cockburn, W. Cunningham, M. Fowler, J. Grenning, J. Highsmith, A. Hunt, R. Jeffries, J. Kern, B. Marick, R. Martin, S. Mellor, K. Schwaber, J. Sutherland, D. Thomas [2001] Manifesto for Agile Software Development. Available online at: http://www.agilemanifesto.org/
[3] Cederqvist, P. [1999] Version Management with CVS. Available online at: http://www.cvshome.org/manual
[4] Cooley, R., B. Mobasher, J. Srivastava [1997] Web Mining: Information and Pattern Discovery on the World Wide Web. Proceedings of the 9th IEEE International Conference on Tools with Artificial Intelligence (ICTAI’97), November.
[5] DeMarco, T. [1982] Controlling Software Projects - Management, Measurement and Estimation. Yourdon Press.
[6] Disney, A.M., P.M. Johnson [1998] Investigating Data Quality Problems in the PSP. Proceedings of the 6th International Symposium on the Foundations of Software Engineering (SIGSOFT’98), Orlando, FL, USA, November.
[7] Feller, J., B. Fitzgerald [2002] Understanding Open Source Software Development. Addison-Wesley.
[8] Fenton, N.E., S.H. Pfleeger [1994] Software Metrics: a Rigorous and Practical Approach. Thomson Computer Press.
[9] Humphrey, W. [1995] A Discipline for Software Engineering. Addison-Wesley.
[10] Johnson, P.M., A.M. Disney [1999] A critical analysis of PSP data quality: Results from a case study. Journal of Empirical Software Engineering, December.
[11] Johnson, P.M. [2001] You can’t even ask them to push a button: Toward ubiquitous, developer-centric, empirical software engineering. Proceedings of the NSF Workshop for New Visions for Software Design and Productivity: Research and Applications, Nashville, TN, USA, December.
[12] Metsker, S.J. [2001] Building Parsers with Java. Addison-Wesley.
[13] Mockus, A., R.T. Fielding, J. Herbsleb [2000] A Case Study of Open Source Development: The Apache Server. Proceedings of the International Conference on Software Engineering, Limerick, Ireland, May.
[14] Ohno, T. [1988] Toyota Production System: Beyond Large-Scale Production. Productivity Press.
[15] Pelz, D.C. [1985] Innovation Complexity and Sequence of Innovating Strategies. Knowledge: Creation Diffusion, Utilization, Vol. 6, pp. 261-291.
[16] Poppendieck, M., T. Poppendieck [2003] Lean Software Development: An Agile Toolkit for Software Development Managers. Addison-Wesley.
[17] Womack, J.P., D.T. Jones [2003] Lean Thinking: Banish Waste and Create Wealth in Your Corporation. Free Press.
