Using Version Control System to Construct Ownership Architecture Documentations
Po-Han Huang, Dowming Yeh, and Wen-Tin Lee

Abstract. Ownership architecture has usually been constructed by investigating the comments at the top of source files; that is, developer names are associated with source files by examining those comments manually. If such documentation could be produced automatically, it would reflect the current status of a project more promptly. This research focuses on the logs of the version control system. The data in version control logs has a regular form, so information can be retrieved from it quickly. The importance of each developer can also be estimated from the number of files the developer owns and how frequently the developer makes changes. To understand the system architecture, the directory structure of the source code can be used to identify the functional components of the system: the source files in a directory implement the same functional component, and the owners of these source files can be considered a team. Using the resulting documents, researchers can learn the ownership architecture and obtain further information about the status of the project.

1 Introduction
Instead of buying commercial software, more and more people have turned to open source software to meet their needs in recent years. Open source software is characterized by openly available source code and free distribution. Many volunteers participate in open source projects and contribute their efforts. A successful open source project with a well-managed development team and community can produce high-quality software.

Po-Han Huang
Graduate Institute of Information Education, National Kaohsiung Normal University, Kaohsiung, Taiwan
e-mail: [email protected]

Dowming Yeh · Wen-Tin Lee
Department of Software Engineering, National Kaohsiung Normal University, Kaohsiung, Taiwan
e-mail: {dmyeh,wtlee}@nknu.edu.tw

F.L. Gaol (Ed.): Recent Progress in DEIT, Vol. 1, LNEE 156, pp. 41–46.
© Springer-Verlag Berlin Heidelberg 2013
springerlink.com

Software projects should have documentation that describes the architecture of the system, since such documentation helps both software development and maintenance. The ownership architecture was defined in previous research [1]. By showing the relationship between developers and source files, the ownership architecture is useful for understanding a software system. Ownership architecture documentation also has other uses:
1. Identification of Experts: If a developer has a question about some part of the system, the most effective approach is to consult an experienced developer face to face. With ownership architecture documentation, developers can quickly find the appropriate expert.
2. Finding Non-Functional Dependencies: To understand a system, function calls and data accesses are often used to find dependencies in the source code. In some cases, however, this approach cannot find relationships that are not visible in the source code. If a developer has worked on several parts of a system, it is reasonable to hypothesize that these parts have non-functional dependencies. The ownership architecture makes the owners of every source file and subsystem easy to find.
3. Quality Estimates: An experienced developer usually writes good-quality code. By reading the list of developers in the ownership architecture documentation, we can estimate the quality of the source code.
4. Adjusting the Number of Team Members: An inappropriate number of developers may reduce development efficiency. With too many developers the work may not be partitioned adequately, and with too few the schedule may slip. Ownership architecture can help in adjusting the number of team members.
However, many software projects do not have such documentation, or its content is out of date. Therefore, the purpose of this research is to reconstruct system documentation from existing project development data, with open source projects as the main research objects.
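The reasoning behind item 2 can be sketched in Java, the language the paper uses for its tool. This is an illustration under stated assumptions, not the authors' method: it assumes a map from each source file to its owners is already available, treats a file's parent directory as its subsystem, and reports every pair of subsystems that share a developer as a candidate non-functional dependency. The class name, method names, and sample paths are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class SharedDeveloperDependencies {

    // Hypothetical rule: the parent directory of a file path is its subsystem.
    public static String subsystemOf(String filePath) {
        int slash = filePath.lastIndexOf('/');
        return slash < 0 ? "." : filePath.substring(0, slash);
    }

    /**
     * Returns candidate non-functional dependencies: each pair of subsystems
     * ("A|B", A sorted before B) mapped to the developers who worked on both.
     */
    public static SortedMap<String, SortedSet<String>> candidateDependencies(
            Map<String, Set<String>> ownersByFile) {
        // developer -> subsystems in which the developer owns at least one file
        Map<String, SortedSet<String>> subsystemsByDeveloper = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : ownersByFile.entrySet()) {
            String subsystem = subsystemOf(e.getKey());
            for (String dev : e.getValue()) {
                subsystemsByDeveloper.computeIfAbsent(dev, k -> new TreeSet<>()).add(subsystem);
            }
        }
        // every pair of subsystems touched by the same developer becomes a candidate
        SortedMap<String, SortedSet<String>> candidates = new TreeMap<>();
        for (Map.Entry<String, SortedSet<String>> e : subsystemsByDeveloper.entrySet()) {
            List<String> subs = new ArrayList<>(e.getValue());
            for (int i = 0; i < subs.size(); i++) {
                for (int j = i + 1; j < subs.size(); j++) {
                    String pair = subs.get(i) + "|" + subs.get(j);
                    candidates.computeIfAbsent(pair, k -> new TreeSet<>()).add(e.getKey());
                }
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> owners = Map.of(
                "src/ui/MainWindow.cpp", Set.of("alice"),
                "src/core/Crypto.cpp", Set.of("alice", "bob"),
                "src/os/File.cpp", Set.of("bob"));
        // prints {src/core|src/os=[bob], src/core|src/ui=[alice]}
        System.out.println(candidateDependencies(owners));
    }
}
```

A shared developer is only a hint: the reported pairs are hypotheses that still need to be checked by someone who knows the code.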

2 Ownership Architecture
The ownership architecture was defined in previous research [1]. Its elements are source files, subsystems, developers, and teams (see Fig. 1). It groups source files into subsystems and developers into teams, and shows the ownership relations between developers and source files. The main purpose of the study is to construct ownership architectures that help in understanding software systems. Several studies have adopted the concept of ownership to help understand systems. For example, the ownership of source files has been used to understand how developers drove the evolution of a system, including the number of developers and their behavior [3]. Other applications include domain expert identification and understanding an organization's development or team structure [2]. For most software projects, ownership architecture documentation does not exist, so the ownership architecture has to be reconstructed from existing data.
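The four elements can be modeled with two maps. The following is a minimal Java sketch, assuming (as the abstract suggests) that each directory is a subsystem and that the owners of the files in a directory form its team. The class and its `ownersByFile` input are hypothetical; how ownership is determined is left to the caller.

```java
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class OwnershipArchitecture {

    // Subsystem (directory) -> source files it contains.
    public final SortedMap<String, SortedSet<String>> subsystems = new TreeMap<>();
    // Subsystem (directory) -> team: developers who own at least one file in it.
    public final SortedMap<String, SortedSet<String>> teams = new TreeMap<>();

    /** ownersByFile: source file path -> developers who own that file. */
    public OwnershipArchitecture(Map<String, Set<String>> ownersByFile) {
        for (Map.Entry<String, Set<String>> e : ownersByFile.entrySet()) {
            String path = e.getKey();
            int slash = path.lastIndexOf('/');
            // Assumption: a file's parent directory is its subsystem ("." for top-level files).
            String subsystem = slash < 0 ? "." : path.substring(0, slash);
            subsystems.computeIfAbsent(subsystem, k -> new TreeSet<>()).add(path);
            teams.computeIfAbsent(subsystem, k -> new TreeSet<>()).addAll(e.getValue());
        }
    }

    public static void main(String[] args) {
        OwnershipArchitecture oa = new OwnershipArchitecture(Map.of(
                "src/core/Crypto.cpp", Set.of("alice", "bob"),
                "src/core/Hash.cpp", Set.of("carol"),
                "src/ui/MainWindow.cpp", Set.of("alice")));
        System.out.println("subsystems: " + oa.subsystems);
        System.out.println("teams:      " + oa.teams);
    }
}
```

A developer can belong to several teams in this model (alice above), which mirrors how real contributors cross component boundaries.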

Much useful information that reveals the status of a project is preserved during software development, including comments in source code, logs in version control systems, and project documentation. For open source projects, this data is public and easy to obtain. The characteristics and constraints of each data source are as follows:
1. Source code: Most source files have a comment at the top of the file. This comment may contain a copyright notice, the developer's name, and change information for the source file. The owner of each source file can be found by retrieving the developer's name, but this method has some constraints. First, not every source file has such a comment, and some comments omit the developer's name; in these cases, finding the owner is almost impossible. Second, the format of the comments is not fixed. In the previous research [1], the ownership architecture was constructed by investigating the comments at the top of source files, and developer names were associated with source files by examining the comments manually. Retrieving this information from comments automatically is difficult and requires advanced techniques.
2. Project documentation: Lists of developers can usually be found in project documentation or on project websites. Some detailed lists also describe the achievements of each developer. From this documentation, we can learn how many developers take part in a project and how the work is allocated. However, it does not show the relationship between developers and source files, and the number of developers may grow as the software develops; if the documentation is not updated frequently, some developers may be missing from the list. Although the list of developers cannot be used to construct the ownership architecture, it is helpful for identifying developers.
3. Version control systems: Version control systems are commonly used to manage source code. Whenever a developer changes a source file, the version control system records the modification. Each change log contains the developer's name, the date of the change, and comments. By analyzing the change logs, we can find at least one developer for each source file. Because change logs have a fixed format, writing a tool that extracts this information automatically is practical. Several studies have utilized the information in version control systems. Version control logs have been investigated to explain the rationale of dependencies and software architecture [4]. Other applications include measures of expertise [6] and social network analysis [5].
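As a rough illustration of item 3, the Java sketch below aggregates change log entries that have already been parsed. It assumes, hypothetically, that every developer who committed a change to a file counts as an owner of that file; the abstract's two importance indicators are approximated by the number of files a developer owns and the number of changes the developer made. The `ChangeLog` record and the method names are illustrative only.

```java
import java.util.List;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class ChangeLogStats {

    // One parsed change log entry; date and comment are omitted for brevity.
    public record ChangeLog(String file, int revision, String developer) {}

    // File -> developers who changed it. Assumption: every committer counts as an owner.
    public static SortedMap<String, SortedSet<String>> ownersByFile(List<ChangeLog> logs) {
        SortedMap<String, SortedSet<String>> owners = new TreeMap<>();
        for (ChangeLog log : logs) {
            owners.computeIfAbsent(log.file(), k -> new TreeSet<>()).add(log.developer());
        }
        return owners;
    }

    // Developer -> number of changes committed (frequency of making a change).
    public static SortedMap<String, Integer> changesByDeveloper(List<ChangeLog> logs) {
        SortedMap<String, Integer> counts = new TreeMap<>();
        for (ChangeLog log : logs) {
            counts.merge(log.developer(), 1, Integer::sum);
        }
        return counts;
    }

    // Developer -> number of distinct files owned.
    public static SortedMap<String, Integer> filesOwnedByDeveloper(List<ChangeLog> logs) {
        SortedMap<String, Integer> counts = new TreeMap<>();
        for (SortedSet<String> developers : ownersByFile(logs).values()) {
            for (String dev : developers) {
                counts.merge(dev, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<ChangeLog> logs = List.of(
                new ChangeLog("src/core/Crypto.cpp", 10, "alice"),
                new ChangeLog("src/core/Crypto.cpp", 12, "bob"),
                new ChangeLog("src/core/Crypto.cpp", 15, "alice"),
                new ChangeLog("src/ui/MainWindow.cpp", 11, "alice"));
        System.out.println("owners:  " + ownersByFile(logs));
        System.out.println("changes: " + changesByDeveloper(logs));
        System.out.println("files:   " + filesOwnedByDeveloper(logs));
    }
}
```

A stricter ownership rule (for example, only the most frequent committer of each file) would change only `ownersByFile`; the two counting methods stay the same.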

 Fig. 1 Ownership Architecture

3 Design and Implementation
3.1 Data Collection
In this research, the Password Safe project [7] on SourceForge [8] was chosen as the research object. SourceForge is an open source software development website that provides free hosting for open source projects, with a centralized resource for managing projects, issues, communications, and code. The Password Safe project has many public areas, such as a bug tracking system, forums, and software repositories. The developers of Password Safe use SVN to manage the source code. ViewVC [9] is a browser interface for CVS and SVN repositories; it generates HTML that presents navigable directory, revision, and change log listings. Because the project uses ViewVC, its SVN repository can be browsed with a web browser, and detailed information such as source code and change logs can be reached by following hyperlinks. The research goal is to determine the ownership between developers and source files, so the change logs in the software repository are required. To collect the data efficiently, we used an offline browser to download the HTML files containing the change logs for later analysis.
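The offline-browser step leaves a directory tree of HTML pages on disk. A minimal Java sketch of loading them for analysis follows; the directory layout (and the default name `mirror`) is a hypothetical assumption, and the pages are assumed to be saved as UTF-8.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class OfflineLogLoader {

    /** Reads every .html file below dir; key = path relative to dir, value = page text. */
    public static SortedMap<String, String> loadHtml(Path dir) throws IOException {
        List<Path> htmlFiles;
        try (Stream<Path> paths = Files.walk(dir)) {
            htmlFiles = paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".html"))
                    .collect(Collectors.toList());
        }
        SortedMap<String, String> pages = new TreeMap<>();
        for (Path p : htmlFiles) {
            // Assumption: the offline browser saved the pages as UTF-8 (readString's default).
            pages.put(dir.relativize(p).toString(), Files.readString(p));
        }
        return pages;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default location of the mirrored ViewVC pages.
        Path dir = Path.of(args.length > 0 ? args[0] : "mirror");
        loadHtml(dir).forEach((name, html) ->
                System.out.println(name + ": " + html.length() + " characters"));
    }
}
```

Working from a local copy keeps the analysis repeatable and avoids requesting the same ViewVC pages repeatedly while the parsing rules are being tuned.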

3.2 Data Analysis
Each change log entry contains the revision number, date, developer, file length, and comment. Because every change log has the same format, a tool can be designed to retrieve this information automatically. To find the pattern, we opened a change log in a text editor and examined its format as plain text. The resulting pattern is shown in Table 1.

Table 1 Pattern of a change log

Information      | Html code
Revision number  | number
Date             | Modified \n \n date
Developer        | by developer
File length      | File length: length(s)
Diff to          | previous number
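The rows of Table 1 can be combined into one regular expression. The Java sketch below is hypothetical: rather than depending on exact markup, it strips all HTML tags first and assumes the visible text of an entry follows ViewVC's usual layout ("Revision n ... Modified date (age) by developer File length: n byte(s) Diff to previous n"). The sample entry, record, and method names are invented for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChangeLogParser {

    /** One change log entry; previous is null when there is no earlier revision. */
    public record Entry(int revision, String date, String developer, long length, Integer previous) {}

    // Assumed visible text of one ViewVC entry, following the rows of Table 1.
    private static final Pattern ENTRY = Pattern.compile(
            "Revision\\s+(\\d+).*?"                         // 1: revision number
            + "Modified\\s+(.+?)\\s+(?:\\([^)]*\\)\\s+)?"   // 2: date, optional "(... ago)"
            + "by\\s+(\\S+).*?"                             // 3: developer
            + "File length:\\s+(\\d+)"                      // 4: file length
            + "(?:.*?Diff to previous\\s+(\\d+))?");        // 5: previous revision, if any

    /** Parses a single entry; tags are stripped so only the visible text is matched. */
    public static Optional<Entry> parse(String entryHtml) {
        String text = entryHtml.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
        Matcher m = ENTRY.matcher(text);
        if (!m.find()) {
            return Optional.empty();
        }
        Integer previous = m.group(5) == null ? null : Integer.valueOf(m.group(5));
        return Optional.of(new Entry(
                Integer.parseInt(m.group(1)), m.group(2), m.group(3),
                Long.parseLong(m.group(4)), previous));
    }

    public static void main(String[] args) {
        // Invented sample entry in the assumed layout.
        String html = "<b>Revision 1234</b> - (view) (download) - "
                + "Modified <em>Mon Jan 21 10:15:00 2013 UTC</em> (10 years ago) "
                + "by <em>alice</em><br>File length: 5678 byte(s)<br>"
                + "Diff to <b>previous 1230</b>";
        System.out.println(parse(html));
    }
}
```

The sketch parses one entry at a time; a full log page would first be split into entries, for example at each "Revision" heading.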

Table 2 shows how to use Java regular expressions to retrieve the name of the developer from a change log. First, a regular expression, specified as a string, is compiled into an instance of the Pattern class. The resulting pattern is then used to create a Matcher object that can match arbitrary character sequences against the regular expression. In this case, the text enclosed by the HTML tags that follow "by" is extracted. Finally, the Matcher object scans the input text and returns the matched strings.
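The three steps described for Table 2 (compile a Pattern, create a Matcher, collect the matches) can be sketched as follows. The regular expression here is an assumption, not the one given in Table 2: it supposes the name appears as `by <em>name</em>` in the ViewVC HTML and would need adjusting to the markup actually observed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DeveloperExtractor {

    // Step 1: compile the expression once. Hypothetical markup "by <em>name</em>";
    // this is an assumed pattern, not the one given in Table 2.
    private static final Pattern DEVELOPER = Pattern.compile("by\\s*<em>([^<]+)</em>");

    /** Returns every developer name found in the page, in order of appearance. */
    public static List<String> developers(CharSequence html) {
        Matcher m = DEVELOPER.matcher(html);   // Step 2: bind the pattern to the input
        List<String> names = new ArrayList<>();
        while (m.find()) {                      // Step 3: scan for each match
            names.add(m.group(1).trim());       // keep only the captured name
        }
        return names;
    }

    public static void main(String[] args) {
        String html = "Modified <em>Mon Jan 21 2013</em> by <em>alice</em><br>"
                + "Modified <em>Sun Jan 20 2013</em> by <em>bob</em>";
        System.out.println(developers(html)); // prints [alice, bob]
    }
}
```

Using a capturing group keeps the surrounding "by" and tags out of the result, so the returned strings can be used directly as developer names.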

Table 2 Pattern matching in Java

Function  | Pattern definition | Pattern matching | Matched strings
Java code | Pattern p = Pattern.compile(" (?
