2014 Brazilian Symposium on Software Engineering

Extracting new metrics from Version Control System for the comparison of software developers

Marcello H. D. de Moura, Hugo A. D. do Nascimento, Thierson C. Rosa

Centro de Recursos Computacionais / Instituto de Informática
Universidade Federal de Goiás (UFG)
Caixa Postal 131 – 74.001-970 – Goiânia – GO – Brazil
marcello@ufg.br, {hadn,thierson}@inf.ufg.br

Abstract—Previous studies have evaluated the work done by software developers using data extracted from version control systems (VCS). However, they have focused mostly on counting the number of lines of code written and the number of commits, which is general information that can be obtained from these software repositories. In the present article, we innovate by considering fine-grain operations at line and file levels stored in the VCS, like additions, deletions and modifications, which allow us to derive much more detailed and richer information about the work done by developers. We also define a new set of metrics to measure such fine-grain information and present two simple approaches for comparing developers based on the proposed metrics. This helps to improve our understanding of how important and how alike the developers were. A case study using data from a real software development project is described. The study showed that the metrics and the comparative approaches resulted in information that is consistent with the perception of the project manager. Furthermore, our investigation points to a great potential for future work by expanding the set of metrics and exploring new comparative approaches.

Keywords—metrics, comparison of software developers, version control systems.

I. INTRODUCTION

Version Control Systems (VCS), like Subversion [1] and Git [2], store revisions of the files of a software development project, registering its historical evolution. In general, these revisions are motivated by different needs that appear during the software development process, such as: adding new features to the software, optimizing its code or fixing bugs.

VCSs have also been widely used to help understand the software development process. For example, Lopez-Fernandez et al. [3] and Huang and Liu [4] mined VCSs and used, respectively, complex network analysis and graph drawing approaches to study the collaboration between developers. Girba et al. [5] created more advanced information visualizations from the VCS data to verify behavioral patterns of the developers. Among the patterns they observed were: Takeover – when a developer writes plenty of code in a short period of time; Bugfix – indicating minor corrections in the code; and Teamwork – representing an alternating collaboration between the team members. Later on, Voinea et al. [6, 7] analyzed the work done by software developers through sophisticated visualizations that employ the Overview & Detail technique for producing macroscopic and microscopic views of changes in the source code.

Although all these pieces of research used data from VCS, their main focus was to understand the evolution of the source code, with few insights into the relationships among the developers.

In turn, other studies using VCS data have devoted their attention more specifically to the analysis of the software developers. Gilbert and Karahalios [8] combined data from VCS with emails exchanged by the developers, aiming to identify evidence of a relationship between communication by email and modifications of the source code. Jermakovics et al. [9] tried to discover hidden structures in the collaboration network formed by groups of developers. Mockus and Herbsleb [10], Minto and Murphy [11], and Schuler and Zimmermann [12] identified developers that are experts in parts of the code based on the frequency of changes made by them. In addition, Zhang et al. [13, 14] extracted VCS data and computed metrics for composing profiles of the developers and clustering them into groups. Recently, Di Bella et al. [15] analyzed the VCS logs of ten software projects, aiming at identifying key developers and at classifying them. Using statistical techniques such as PCA, factor analysis and cluster analysis, they were able to classify the developers of all projects into four groups: core developers, active developers, occasional developers, and rare developers.

Our research is similar to the latter studies, in the sense that we also focus on the analysis of the work done by software developers, instead of concentrating on the evolution of the source code. We investigate how developers contribute to a software project by measuring their activities, so that key developers can be identified and characterized. To this end, the present paper explores finer-grain operations at line and file levels that can be extracted from a VCS, like additions, deletions and modifications. This has not been done in previous studies and allows us to derive much more detailed and richer information about the work performed by the developers, which is then captured by a new set of formally defined metrics. It is important to note that we avoid comparing the contributions of the developers against a target measure (an absolute value). Instead, we suggest simpler comparison approaches that are based on the natural human ability for evaluating people: comparing every developer against the others. Two comparison approaches are described: performance-based hierarchy and similarity comparison.

Before we start describing our work, we must emphasize that the data in a VCS cannot be taken as a full and precise description of the software development process. It is incomplete [16] and may lead to distinct interpretations, as we will discuss later in the paper. This implies that the information extracted from a VCS has to be revalidated by the project managers and complemented with their own knowledge.

The remainder of this paper is organized as follows: Section II describes how a VCS can be processed for extracting fine-grain operations; Section III introduces the new set of metrics; Section IV presents two approaches for comparing the developers; in Section V, we report a case study using our metrics and the comparison approaches with a real software project; finally, Section VI points out our conclusions and suggests future work.

II. EXTRACTING FINE-GRAIN OPERATIONS FROM VCS

As described before, we are interested in extracting fine-grain data of a software development project from a VCS. We mine the VCS for three types of operations: additions, deletions and modifications of files and lines of code. We also want to know the order in which the operations are performed.

For this aim, we implemented an extractor that connects to a VCS and recovers the data of a software project¹. The extractor works as follows:

1) A data structure for describing the evolution of the software project is initialized.
2) The first revision of the project (representing its first commit) is recovered from the VCS.
3) If the revision adds new files, then the data structure is incremented to represent this fact.
4) For every file changed by the revision, the difference between its old and new versions is computed and expressed using the unified diff format [17, Chapter 2]. Next, the result of the diff command is processed in order to obtain fine-grain operations at line level. The data structure is then updated to reflect the changes inside the files.
5) If there is another revision, in sequence, to be analyzed, it is recovered from the VCS and the above process is repeated starting at Step 3.

¹ Most VCSs provide a rich API for extracting the differences between any two revisions of a given piece of software stored in them and for collecting many other data. We use, however, only two simple resources of a VCS: returning the number of revisions and recovering a particular revision. This makes our work independent of the VCS's technology and suitable for implementing our own way of analyzing the source code.

The unified diff of two versions of a same file specifies all line additions and line deletions that occurred between them. We apply a set of rules to interpret the diff result and to differentiate modifications from simple additions and deletions. The modifications are given by additions that immediately follow and match a sequence of one or more deletions. It is necessary to match every deletion with an addition in sequence in order to identify a modification. An example is illustrated in Figure 1. The figure shows two versions of a file, called fileX.txt and fileY.txt, and their diff in the unified format. From the diff, we have that "line0" was added, "line2" was modified (since "+line2.1" matches the first deletion in sequence, that is, "-line2") and "line3" was deleted (no addition matches it).

fileX.txt
=========
line1
line2
line3

fileY.txt
=========
line0
line1
line2.1

diff -u fileX.txt fileY.txt (output)
====================================
--- fileX.txt 2014-06-02 16:44:22.411596560 -0300
+++ fileY.txt 2014-06-02 16:44:21.195596523 -0300
@@ -1,3 +1,3 @@
+line0
 line1
-line2
-line3
+line2.1

Figure 1: Two versions of a file and their diff in the unified format.
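To make the matching rule concrete, the following Python sketch (our illustration; the function name and input format are assumptions, not the authors' actual extractor) classifies the lines of one unified-diff hunk into additions, deletions and modifications:

def classify_hunk(hunk_lines):
    """Classify unified-diff hunk lines into ADD, DEL and MOD operations.

    hunk_lines: list of (tag, text) pairs, where tag is '+', '-' or ' '
    (context). A run of '-' lines followed by '+' lines is interpreted as
    modifications, pairing deletions and additions in order; unmatched '-'
    lines are deletions and unmatched '+' lines are additions.
    """
    ops = []
    pending_dels = []          # deletions waiting for a matching addition
    for tag, text in hunk_lines:
        if tag == '-':
            pending_dels.append(text)
        elif tag == '+':
            if pending_dels:   # matches the first deletion still in sequence
                old = pending_dels.pop(0)
                ops.append(('MOD', old, text))
            else:
                ops.append(('ADD', None, text))
        else:                  # a context line closes the current run
            ops.extend(('DEL', old, None) for old in pending_dels)
            pending_dels = []
    ops.extend(('DEL', old, None) for old in pending_dels)
    return ops

# The hunk of Figure 1:
hunk = [('+', 'line0'), (' ', 'line1'), ('-', 'line2'), ('-', 'line3'), ('+', 'line2.1')]
print(classify_hunk(hunk))
# [('ADD', None, 'line0'), ('MOD', 'line2', 'line2.1'), ('DEL', 'line3', None)]

The printed result reproduces the interpretation given above for Figure 1: "line0" added, "line2" modified into "line2.1", and "line3" deleted.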

The output of the extractor is a data structure with the historical evolution of the software project. It contains the necessary information for computing our metrics. We now define this data structure formally.

A. Revision History

Let P be a software project in a VCS and D be the set of developers that have worked in it. Also, let A be the set of all files created during the development of P, and A_r ⊆ A be the set of files that were removed (not reaching the final version) of P. For each file a ∈ A which exists or existed during the development of P, we define a history file H_a containing operations of creation, modification or removal made on the lines of a. The project history H_P of P is the set of history files corresponding to the |A| files in P, i.e., H_P = {H_{a_1}, H_{a_2}, ..., H_{a_|A|}}. A history file H_a is a sequence of history lines ⟨h^a_{l_1}, h^a_{l_2}, ..., h^a_{l_t}⟩ for the t consecutive lines that appear or have appeared in at least one of the versions of file a. Each history line h^a_{l_i}, in turn, is a sequence of operations h^a_{l_i} = ⟨o^{a,i}_1, o^{a,i}_2, ..., o^{a,i}_{end}⟩ that occurred on line l_i of file a. Each operation o of a history line has four attributes, o.rev, o.devel, o.type and o.text, which are defined as follows:

– o.rev – the number of the revision of project P when the operation o was performed.
– o.devel – the reference to the developer that performed the operation o, with o.devel ∈ D.
– o.type – the type of the operation, which can be ADD (addition), MOD (modification) or DEL (deletion/removal).
– o.text – the content of the line after applying the operation.

The first operation of every history line always has attribute type equal to ADD. Also, ADD appears just once in a history line. It is followed by zero or more operations with attribute type equal to MOD. There may be only one operation of type DEL per line and, when it happens, it becomes the last operation in the corresponding history line.

Whenever a new line j is created in an intermediate position in the file a, say between l^a_i and l^a_{i+1}, the history line h^a_j for j is created and inserted in its corresponding position in the sequence H_a, i.e., between h^a_{l_i} and h^a_{l_{i+1}}. On the other hand, operations of type MOD or DEL on a line j only cause the insertion of a new operation o in the corresponding history line h^a_{l_j}.

If a file a is removed in some revision of the project development, we assume that all of its lines are deleted. This is indicated by a final operation of the type DEL in all history lines in H_a. Thus, the size of a history line h^a_{l_i}, denoted by |h^a_{l_i}|, is less than or equal to the number of revisions of file a that appear in the VCS for project P.
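A minimal sketch of how this revision history could be represented in code, assuming Python dataclasses (the names are ours; the paper does not prescribe an implementation):

from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Operation:
    """One operation on a line: the four attributes defined above."""
    rev: int              # o.rev   - revision number
    devel: str            # o.devel - developer identifier
    type: str             # o.type  - 'ADD', 'MOD' or 'DEL'
    text: Optional[str]   # o.text  - line content after the operation (None for DEL)

# A history line h^a_{l_i} is the ordered list of operations applied to one line.
HistoryLine = List[Operation]

@dataclass
class HistoryFile:
    """H_a: the ordered history lines of file a."""
    path: str
    lines: List[HistoryLine] = field(default_factory=list)
    removed: bool = False  # True if a ended up in A_r

# The project history H_P maps every file ever created to its history file.
ProjectHistory = Dict[str, HistoryFile]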

The order in which the different types of operations occur in a history line is also useful for our purpose. This order may help unveil interesting relationships between the developers. For example, we may discover that the lines created by a developer x were frequently modified or removed by a developer y. As far as we know, such relationships have not been considered in detail by previous works.

In the next section, we introduce a new set of metrics based on the types of operations extracted from the VCS.

III. METRICS FOR THE DEVELOPERS

Given a developer d, we introduce metrics that evaluate the work of d in a project P according to two aspects that we call "effort" and "code-survival". Effort represents the total amount of operations performed by d. This does not take into consideration whether the results of such operations were further modified or removed by d or by another developer. Code-survival, on the other hand, indicates the amount of operations that were performed by d and were never changed by anyone. Moreover, successive MOD operations done by d on a line are considered here as a single operation for the goal of evaluating its code-survival. This is not the case for the evaluation of effort.

In the following paragraphs, we present metrics to assess the effort and the code-survival of developers individually (Section III-A). We then introduce metrics to assess the relationship between developers based on the types and the ordering of operations in the history lines (Section III-B). In Section III-C, we extend the metrics to the file level. At last, we discuss coarse-grain metrics regarding commits (Section III-D).

A. Metrics for evaluating developers individually

Measurements of effort may be computed at different levels of granularity (line level and file level) for a developer d. For instance, Eq. (1), Eq. (2) and Eq. (3) represent metrics of effort at line level. Metric Effo_Add: D → N, for a given developer d ∈ D, corresponds to the creation effort of d in project P and represents the total number of lines added by d in the project, as stated in Eq. (1).

Effo_Add(d) = Σ_{a ∈ A} Σ_{i=1}^{|H_a|} { 1 if o^{a,i}_1.devel = d; 0 otherwise }   (1)

Eq. (2) defines Effo_Mod: D → N for a developer d ∈ D, which represents the modification effort of d in a project P. This metric returns the total amount of MOD operations performed by d on the lines of P.

Effo_Mod(d) = Σ_{a ∈ A} Σ_{i=1}^{|H_a|} Σ_{j=1}^{|h^a_{l_i}|} { 1 if o^{a,i}_j.devel = d and o^{a,i}_j.type = MOD; 0 otherwise }   (2)

The removal effort (Effo_Del: D → N) for a developer d ∈ D is defined in Eq. (3) and gives the total number of lines deleted by d.

Effo_Del(d) = Σ_{a ∈ A} Σ_{i=1}^{|H_a|} { 1 if o^{a,i}_end.devel = d and o^{a,i}_end.type = DEL; 0 otherwise }   (3)

Similarly to effort, measures of code-survival may be computed at different levels of granularity for a developer d. However, it only makes sense to have such measures for operations of types ADD and MOD. Operations of type DEL, whenever they occur, are always the last ones in the history lines and thus are not affected by any other operation. Eq. (4) shows how to compute the survival of additions (Surv_Add: D → N) for a given developer d at line level.

Surv_Add(d) = Σ_{a ∈ (A − A_r)} Σ_{i=1}^{|H_a|} { 1 if o^{a,i}_1.devel = d and ∀ o^{a,i}_s with s > 1: (o^{a,i}_s.type = MOD and o^{a,i}_s.devel = d); 0 otherwise }   (4)

The metric Surv_Add corresponds to the total number of lines that were created by a developer d and not changed by another developer or removed later by d himself. The term o^{a,i}_1.devel = d indicates that the first operation of line i of a was made by developer d². The other terms indicate that if there exists any other operation on the same line, it should be of type MOD and performed only by d. The survival of modifications at line level (Surv_Mod: D → N) for a developer d ∈ D is defined in Eq. (5).

² Remember that the first operation of a history line is always an ADD.

Surv_Mod(d) = Σ_{a ∈ (A − A_r)} Σ_{i=1}^{|H_a|} { 1 if o^{a,i}_end.type = MOD and o^{a,i}_end.devel = d and ∃ w, 1 ≤ w < |h^a_{l_i}|, such that o^{a,i}_w.devel ≠ d; 0 otherwise }   (5)

This metric evaluates the amount of lines in P that satisfy the following conditions: the last operation (o_end) is of type MOD and was performed by developer d; and there is at least one operation in this line before o_end that was done by another developer.
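Under the data structures sketched in Section II-A, the line-level metrics of Eq. (1)–(5) translate almost literally into code. The following is our own sketch, not the authors' implementation:

def effo_add(history, d):
    """Effo_Add(d): lines whose first operation (an ADD) was made by d."""
    return sum(1 for hf in history.values()
                 for line in hf.lines
                 if line[0].devel == d)

def effo_mod(history, d):
    """Effo_Mod(d): total MOD operations performed by d."""
    return sum(1 for hf in history.values()
                 for line in hf.lines
                 for op in line
                 if op.devel == d and op.type == 'MOD')

def effo_del(history, d):
    """Effo_Del(d): lines whose last operation is a DEL made by d."""
    return sum(1 for hf in history.values()
                 for line in hf.lines
                 if line[-1].devel == d and line[-1].type == 'DEL')

def surv_add(history, d):
    """Surv_Add(d): lines created by d in surviving files and never touched by anyone else."""
    return sum(1 for hf in history.values() if not hf.removed   # a in A - A_r
                 for line in hf.lines
                 if line[0].devel == d
                 and all(op.type == 'MOD' and op.devel == d for op in line[1:]))

def surv_mod(history, d):
    """Surv_Mod(d): lines ending with a MOD by d that contain an earlier operation by someone else."""
    return sum(1 for hf in history.values() if not hf.removed
                 for line in hf.lines
                 if line[-1].type == 'MOD' and line[-1].devel == d
                 and any(op.devel != d for op in line[:-1]))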

The fine-grained information extracted from the VCS allows for deriving more refined metrics to assess the work of developers. For instance, the metric Effo_Mod, as defined in Eq. (2), may not be considered effective to describe the modification effort in all situations, since some developers could commit their work less regularly than others. Thus, the metric for such developers would be strongly affected by their commit habits. We can avoid this problem by compressing every sequence of modifications made by a same developer into a single MOD operation. For this aim, we propose the concept of compressed history line as follows: given a history line h_l, its corresponding compressed history line, represented by h̄_l, contains the same sequence of operations except that every sequence of operations of type MOD made by a same developer d appears in h̄_l as a single MOD operation performed by d.

The compressed history lines allow for computing the distinct (non-repetitive) modification effort of a developer (Effo_Dist_Mod: D → N), which is defined in Eq. (6). The operations of the compressed history line are represented in Eq. (6) by the symbol ō instead of o.

Effo_Dist_Mod(d) = Σ_{a ∈ A} Σ_{i=1}^{|H_a|} Σ_{j=1}^{|h̄^a_{l_i}|} { 1 if ō^{a,i}_j.devel = d and ō^{a,i}_j.type = MOD; 0 otherwise }   (6)

The metrics mentioned before can be combined in many ways to produce new interesting ones. Next, we present new combined metrics that may be of interest to project managers. The first one is the metric Surv_Add_Div_Effo_Add: D → R, defined in Eq. (7).

Surv_Add_Div_Effo_Add(d) = Surv_Add(d) / Effo_Add(d)   (7)

This metric provides an indication of the balance between the effort and the code-survival of the work performed by a developer. When the value of the metric is equal to one, all lines created by the developer remained unchanged until the end of the project. This information helps to differentiate developers who have similar values of survival creation but whose creation efforts are distinct.

A similar metric can be proposed regarding MOD operations. We define the metric Surv_Mod_Div_Effo_Dist_Mod: D → R in Eq. (8).

Surv_Mod_Div_Effo_Dist_Mod(d) = Surv_Mod(d) / Effo_Dist_Mod(d)   (8)

This metric may be used to compare developers with similar values of modification code-survival but different modification effort.

Finally, it is useful to evaluate the work of a developer in relation to the work of all other developers involved in the project. To this end, we define normalized versions of all metrics proposed above. They are indicated by the general relative metric Metric_Rel: D → R in Eq. (9), which corresponds to the ratio of any of the above metrics for a developer d to the sum of the same metric computed for each developer in D.

Metric_Rel(d) = Metric(d) / Σ_{x ∈ D} Metric(x)   (9)
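The compression step behind Eq. (6) can be obtained with a single pass over the operations of a line. The sketch below (again assuming the structures of Section II-A; the helper names are ours) collapses consecutive MOD operations by the same developer and then counts Effo_Dist_Mod:

def compress_history_line(line):
    """Collapse consecutive MOD operations by the same developer into one."""
    compressed = []
    for op in line:
        if (compressed
                and op.type == 'MOD'
                and compressed[-1].type == 'MOD'
                and compressed[-1].devel == op.devel):
            compressed[-1] = op      # keep only the latest MOD of the run
        else:
            compressed.append(op)
    return compressed

def effo_dist_mod(history, d):
    """Effo_Dist_Mod(d): MOD operations by d counted on compressed history lines."""
    return sum(1 for hf in history.values()
                 for line in hf.lines
                 for op in compress_history_line(line)
                 if op.devel == d and op.type == 'MOD')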

B. Uncovering and measuring relationships between developers

The fine-grain information we extract from the VCS also allows us to uncover relationships among developers based on the sequence of operations they executed on the lines of the project. This kind of information is important from the managerial point of view. For instance, suppose that a project manager detects that some developer d has low creation survival, by analyzing the value of d for the metric Surv_Add, according to Eq. (4). The manager may then verify which developers modified the lines created by d. Such an add-mod relationship can provide the manager with a notion of how much work had to be refactored and by whom, which indicates wasted time. Although the information from the VCS is not sufficient to describe the cause of the low creation survival of d and its implications, the relationship shows that there are some aspects of the development process of d that may need attention.

In what follows we present the line metric Line_Add_Mod: D × D → N as one of the possible ways of measuring the relationship commented above. The metric is defined in Eq. (10). It computes the total number of pairs of operations where the first one (of type ADD) is executed by developer x and the second operation is a MOD and is performed by developer y.

Line_Add_Mod(x, y) = Σ_{a ∈ A} Σ_{i=1}^{|H_a|} { 1 if |h^a_{l_i}| > 1 and o^{a,i}_1.devel = x and o^{a,i}_1.type = ADD and o^{a,i}_2.devel = y and o^{a,i}_2.type = MOD; 0 otherwise }   (10)

Similarly, other metrics can be created to assess operation-based relationships between developers. The metrics Line_Add_Del: D × D → N, Line_Mod_Mod: D × D → N and Line_Mod_Del: D × D → N, defined respectively by Eq. (11), Eq. (12) and Eq. (13), follow the same structure just presented. They only vary in the pair of operations taken in sequence (⟨ADD, DEL⟩, ⟨MOD, MOD⟩ and ⟨MOD, DEL⟩). Note that the metric for the pair ⟨MOD, MOD⟩ is defined using the compressed history line, so that it does not consider repetitive MOD changes by a same developer.

Line_Add_Del(x, y) = Σ_{a ∈ A} Σ_{i=1}^{|H_a|} { 1 if |h^a_{l_i}| > 1 and o^{a,i}_1.devel = x and o^{a,i}_1.type = ADD and o^{a,i}_2.devel = y and o^{a,i}_2.type = DEL; 0 otherwise }   (11)

Line_Mod_Mod(x, y) = Σ_{a ∈ A} Σ_{i=1}^{|H_a|} Σ_{j=1}^{|h̄^a_{l_i}|−1} { 1 if ō^{a,i}_j.devel = x and ō^{a,i}_j.type = MOD and ō^{a,i}_{j+1}.devel = y and ō^{a,i}_{j+1}.type = MOD; 0 otherwise }   (12)

Line_Mod_Del(x, y) = Σ_{a ∈ A} Σ_{i=1}^{|H_a|} Σ_{j=1}^{|h^a_{l_i}|−1} { 1 if o^{a,i}_j.devel = x and o^{a,i}_j.type = MOD and o^{a,i}_{j+1}.devel = y and o^{a,i}_{j+1}.type = DEL; 0 otherwise }   (13)

Each of the four previous metrics induces a matrix of dimension |D| × |D|, whose cells contain the quantity of occurrences of a certain pair of operations. The sums of the rows and columns of these matrices are also metrics for the developers. Thus, we can create variations of the above metrics based on the sum of a row or a column, excluding the cells on the main diagonal, as exemplified below.

Line_Add_ΣMod(d) = Σ_{y ∈ D−{d}} Line_Add_Mod(d, y)   (14)

Line_ΣAdd_Mod(d) = Σ_{x ∈ D−{d}} Line_Add_Mod(x, d)   (15)

Eq. (14) gives the total amount of changes made by other developers immediately on the lines created by developer d (Line_Add_ΣMod: D → N). Eq. (15) represents the total amount of changes made by developer d immediately on the lines created by other developers (Line_ΣAdd_Mod: D → N). The same concept can be used to produce individual metrics based on Eq. (11), Eq. (12) and Eq. (13). In all, eight new metrics on individual aspects are obtained.
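For illustration, one possible way to fill the |D| × |D| matrix behind Line_Add_Mod (Eq. 10) and to derive the row sum of Eq. (14). The function names and the use of the earlier structures are our own assumptions:

from collections import defaultdict

def line_add_mod_matrix(history):
    """Count <ADD, MOD> pairs: lines added by x whose second operation is a MOD by y."""
    matrix = defaultdict(int)            # keys are (x, y) developer pairs
    for hf in history.values():
        for line in hf.lines:
            if len(line) > 1 and line[0].type == 'ADD' and line[1].type == 'MOD':
                matrix[(line[0].devel, line[1].devel)] += 1
    return matrix

def line_add_sigma_mod(matrix, d, developers):
    """Eq. (14): changes made by other developers right after d's additions."""
    return sum(matrix[(d, y)] for y in developers if y != d)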

C. Extending the metrics for the file level

The concepts presented in the previous sections can be applied to measure changes on files, at a higher level of granularity. For this aim, we define new concepts, in addition to those introduced in Section II-A, which are very close to the way a VCS stores the history of files in a project.

A project revision is a triple (r, d, L), where:
– r is the label of the revision,
– d is an identifier of the developer who made the revision, with d ∈ D, and
– L is a list of pairs (a, t) where a is a file and t ∈ {A, M, D} describes the operation³ performed on file a in revision r.

³ We use the first letters of ADD, MOD and DEL, respectively, to differentiate file operations from line operations.

A project revision sequence is a sequence S = ⟨(r_1, d_1, L_1), (r_2, d_2, L_2), ..., (r_m, d_m, L_m)⟩ of project revisions that represents the history of changes made on the files of P without going into detail about the changes made on their individual lines. Every file a ∈ A is created in a project revision and may be retained unchanged, modified or removed in the following revisions. If the file is not changed in a revision, say i, then it does not appear in the list L_i. If a file is removed at revision i, then L_i holds this information but the file no longer appears in future revisions⁴.

⁴ The re-inclusion in a VCS of a previously removed file is possible, but this is represented in our case by the creation of another file with the same name.

The metric File_Add_Mod: D × D → N, defined in Eq. (16), indicates the number of changes made immediately by developer y in files created by developer x. This corresponds to the same relationship measured by Eq. (10), but at file-level granularity.

File_Add_Mod(x, y) = Σ_{a ∈ A} { 1 if there are triples (r_i, d_i, L_i) and (r_j, d_j, L_j) in S, with i < j, such that d_i = x, d_j = y, (a, A) ∈ L_i and (a, M) ∈ L_j, and for which there is no triple (r_k, d_k, L_k) with i < k < j such that (a, t) ∈ L_k for any operation type t; 0 otherwise }   (16)

The metrics in Eq. (11) and Eq. (13) have a similar representation at file level and will not be presented here. We, nevertheless, present the metric File_Mod_Mod, as it has a different notation.

File_Mod_Mod(x, y) = Σ_{i=1}^{|S|−1} Σ_{j=i+1}^{|S|} Σ_{a ∈ A} { 1 if, in triples (r_i, d_i, L_i) and (r_j, d_j, L_j), d_i = x, d_j = y and (a, M) ∈ L_i ∩ L_j, and there is no triple (r_k, d_k, L_k), i < k < j, such that (a, M) ∈ L_k; 0 otherwise }   (17)

The metric File_Mod_Mod considers pairs of consecutive modifications (M, M) on all files, even if these files are subsequently deleted from the software repository. Note that the calculation of this metric can be implemented in a more efficient way than that implicitly suggested by Eq. (17). For this, it is necessary to maintain an updated list of the last actions performed on each file a ∈ A as we traverse from L_1 to L_m in S.

A metric that measures the total number of pairs of consecutive operations on files performed by the pair of developers (x, y) is given in Eq. (18).

File_Oper_Oper(x, y) = File_Add_Mod(x, y) + File_Add_Del(x, y) + File_Mod_Mod(x, y) + File_Mod_Del(x, y)   (18)

Note again that the relationship-based metrics at file level described here induce relationship matrices as in Section III-B. Consequently, the sums of rows or columns of these matrices also result in individual metrics for a developer, as defined below in Eq. (19) and Eq. (20).

File_Add_ΣMod(d) = Σ_{y ∈ D−{d}} File_Add_Mod(d, y)   (19)

File_ΣAdd_Mod(d) = Σ_{x ∈ D−{d}} File_Add_Mod(x, d)   (20)

Each of the above metrics at file level can be normalized by dividing its value for a given developer d by the sum of its values for all developers.
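The single-pass implementation of File_Mod_Mod hinted at after Eq. (17) could look as follows. This is a sketch under the assumption that the revision sequence is available as (rev, developer, operations) triples in chronological order; the format is ours, not the paper's:

from collections import defaultdict

def file_mod_mod_matrix(revisions):
    """Single-pass File_Mod_Mod (Eq. 17): count consecutive (M, M) pairs per file.

    revisions: chronological list of (rev, devel, ops) triples, where ops is a
    dict {file_name: 'A' | 'M' | 'D'}. Only the last 'M' seen for each file is
    remembered, which is the efficient traversal suggested in the text.
    """
    matrix = defaultdict(int)    # (x, y) -> number of consecutive modification pairs
    last_mod_by = {}             # file -> developer who made its most recent 'M'
    for _rev, devel, ops in revisions:
        for file_name, op in ops.items():
            if op == 'M':
                if file_name in last_mod_by:
                    matrix[(last_mod_by[file_name], devel)] += 1
                last_mod_by[file_name] = devel
            elif op == 'A':
                last_mod_by.pop(file_name, None)   # a re-created file counts as a new file
    return matrix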

D. Metrics regarding commits

Obviously, we can also derive from the VCS measures that consider more coarse-grained operations in the project, as is the case of commit operations. Since a commit can be represented as a project revision, we can easily define a relationship metric Commits: D × D → N, in Eq. (21), that measures the amount of commits made by developer x followed immediately by a commit made by developer y.

Commits(x, y) = Σ_{i=1}^{|S|−1} { 1 if triples (r_i, d_i, L_i) and (r_{i+1}, d_{i+1}, L_{i+1}) are such that d_i = x and d_{i+1} = y; 0 otherwise }   (21)

For each developer d, we can also compute the total number of commits performed by d (ΣCommits: D → N) on a project P. This metric is defined in Eq. (22).

ΣCommits(d) = Σ_{i=1}^{|S|} { 1 if triple (r_i, d_i, L_i) is such that d_i = d; 0 otherwise }   (22)
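Both commit metrics follow directly from the revision sequence; a short sketch under the same assumed (rev, devel, ops) format:

from collections import Counter, defaultdict

def commit_metrics(revisions):
    """Commits(x, y) and ΣCommits(d) from the chronological revision sequence."""
    authors = [devel for _rev, devel, _ops in revisions]
    total = Counter(authors)                         # ΣCommits(d), Eq. (22)
    pairs = defaultdict(int)                         # Commits(x, y), Eq. (21)
    for x, y in zip(authors, authors[1:]):
        pairs[(x, y)] += 1
    return pairs, total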

IV. COMPARISON OF THE DEVELOPERS

Given the new metrics and a particular VCS repository with historical data, we can now use them to compare the developers that worked on a software project. For this aim, we propose two approaches, described next.

A. Performance-based hierarchy

The first approach is based on the idea of the Pareto Frontier, or Pareto optimality, for solving multi-objective optimization problems [18, 19]. Three concepts are particularly useful for this: dominance, equivalent solutions and efficient solution. They are defined as follows. Let f_i, i = 1, 2, ..., k, be objective functions to be analyzed and X the set of all feasible solutions to a maximization problem. Consider two solutions x_a and x_b ∈ X. If f_i(x_a) ≥ f_i(x_b) for all i and the inequality is strict for some i, then we say that x_a dominates x_b (x_a is better than x_b). If x_a does not dominate x_b nor the opposite, then x_a and x_b are called equivalent solutions. A solution is said to be efficient if it is not dominated by any other solution in X. The set of all efficient solutions in X is the Pareto Frontier⁵.

⁵ The problem of finding the Pareto Frontier is related to the Maximum Vector Problem [20, 21].

In our context, we make an association of those concepts with the software development process by treating metrics as functions and developers as solutions. The goal is to group the developers into hierarchically dominated classes (C_1, C_2, ..., C_n) such that each class is a set of equivalent solutions (that is, a set of developers that are not mutually dominated). We call those sets equivalence classes. In addition, each developer in a class C_k, with k > 1, is dominated by at least one developer in the previous class, C_{k−1}. The hierarchy of equivalence classes can be built by an iterative algorithm: first, we find all non-dominated developers, remove them from the set of developers and insert them into a new class; we then repeat this process for the remaining developers, creating a new class every time. The final result is the list of hierarchical classes. The initial class will contain the developers that have the highest scores on the metrics under analysis, and the last class will have the developers with the lowest scores.

Evidently, this approach demands that all metrics have the same meaningful orientation (the higher the value, the better the result). If a metric f_i does not have such a meaning, then we should leave it out of the analysis or use an inverted version of it (for example, 1/f_i or U − f_i, for an upper bound U).
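A compact sketch of the iterative construction of the equivalence classes described above, with developers represented by vectors of equally oriented metric values. The example values at the end are made up for illustration only:

def dominates(u, v):
    """True if metric vector u dominates v: >= everywhere and > somewhere."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def equivalence_classes(scores):
    """Hierarchy of equivalence classes from {developer: metric vector}.

    Repeatedly extracts the currently non-dominated developers (one equivalence
    class) until no developer is left.
    """
    remaining = dict(scores)
    classes = []
    while remaining:
        front = [d for d, u in remaining.items()
                 if not any(dominates(v, u) for e, v in remaining.items() if e != d)]
        classes.append(front)
        for d in front:
            del remaining[d]
    return classes

# Tiny illustrative example (made-up values):
print(equivalence_classes({'d1': (0.9, 0.5), 'd2': (0.4, 0.6), 'd3': (0.3, 0.2)}))
# [['d1', 'd2'], ['d3']]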

B. Similarity Comparison

The second approach is a visual comparison of the similarity between the developers. It consists of a 2-D scatter plot that shows a point for every developer. The distance between any two points in the chart is proportional, as much as possible, to the similarity of their related developers (according to the set of metrics). A statistical tool known as Multidimensional Scaling (MDS) [22, 23] can be employed to produce this visualization.

The approach is sensitive to the scale of the metrics and to the degree of correlation between them. Therefore, we use the normalized versions of the metrics as presented in Eq. (9). We also select and use only a subset of the metrics with low correlation.

In order to automatically build a subset of non-correlated (or low-correlated) metrics, a heuristic method was devised: first, a correlation matrix of all normalized metrics is computed and two sets are created: an empty set R and a set M with the normalized metrics. We then try to grow R as much as possible by moving metrics from M to it. We start by choosing from M the metric with the lowest sum of correlations to the ones already in R. If R = ∅ or if there is more than one metric that satisfies this condition, we choose the metric with the smallest sum of correlations to all other metrics (in M ∪ R). Next, we verify whether the correlation of the chosen metric with every other metric already in R is lower than or equal to a threshold. If this is true, then the chosen metric is moved from M to R; otherwise, it is just removed from M and discarded. This process is repeated until there is no metric left in M. The final set R will then contain suitable metrics for the similarity comparison.

The above method can be modified to work on a previously initialized set R with a few metrics manually chosen. This is useful when we want to guarantee that some particular metrics will be considered in the comparison of the developers, but we also wish to have as many attributes of analysis (metrics) as possible. In fact, we have chosen some of the metrics described in Section III-A for composing an initial set R during our experiments. This will be explained further in the following section.
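A sketch of the selection heuristic as we read it, using pandas for the correlation matrix (our choice; the paper does not name a tool). The threshold, the optional initial set R, and the use of absolute correlation values are assumptions:

import pandas as pd

def select_low_correlation_metrics(values, threshold=0.5, initial=()):
    """Greedy selection of a low-correlation metric subset (Section IV-B heuristic).

    values: DataFrame with one row per developer and one normalized metric per column.
    threshold and the optional initial set R are parameters; 0.5 is only an example.
    """
    corr = values.corr().abs()              # absolute correlations (our assumption)
    R = list(initial)
    M = [m for m in values.columns if m not in R]
    while M:
        # Sum of correlations to the metrics already selected (0 if R is empty).
        to_R = {m: (corr.loc[m, R].sum() if R else 0.0) for m in M}
        best = min(to_R.values())
        tied = [m for m in M if to_R[m] == best]
        # Tie-break: smallest sum of correlations to all remaining metrics.
        chosen = min(tied, key=lambda m: corr.loc[m, M + R].sum())
        M.remove(chosen)
        if all(corr.loc[chosen, r] <= threshold for r in R):
            R.append(chosen)                # keep it; otherwise it is discarded
    return R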

V. THE CASE STUDY

In order to evaluate the metrics and the comparison approaches, we implemented them and performed a case study with a qualitative assessment on a real software-development project.

The case study analyzed data regarding the development of the software Weby, a content management system built by the Federal University of Goiás (UFG) and currently used for hosting more than 400 internal web sites⁶. The Weby project was started by the IT sector of the university in July 2010 and is now on its version 0.6.

⁶ The name Weby comes from the two initial letters of the word "Web" and the final two letters of the word "Ruby", the programming language used for its development. The system was released in March 2014 as free software and is now available for download at https://github.com/cercomp/weby.

For the present evaluation, we considered only the period of time from the inception of the software to the end of February 2012 (1 year and 7 months), when its first fully-functional version was completed. During this period, eleven developers contributed to the evolution of the source code and 1,294 code revisions were made and hosted in the VCS (a Subversion system) of UFG. One developer, who was also the project manager, was a permanent staff member; two developers were external collaborators and eight were trainees.

Table I presents general information about the operations performed by the developers in the project⁷. The values in the table correspond only to source code files. The columns of the table are: the amount of commits and the total numbers of added, modified and deleted files and lines.

⁷ The identity of the developers was suppressed in this paper to ensure their privacy.

Table I: General information about the work of the developers in the Weby project.

D.     Commits   Files Add.   Files Mod.   Files Del.   Lines Add.   Lines Mod.   Lines Del.
d1     474       482          1,807        64           110,204      7,026        54,710
d2     159       47           453          4            4,340        1,531        1,587
d3     2         0            6            0            26           31           165
d4     170       314          585          12           44,013       1,577        1,224
d5     30        43           78           1            1,736        142          205
d6     99        333          367          17           51,673       1,548        3,220
d7     61        12           379          15           1,116        923          1,214
d8     183       848          783          29           85,686       4,688        5,289
d9     20        1            34           0            102          398          15
d10    24        8            74           5            542          196          476
d11    72        7            199          4            1,190        489          308
Total  1,294     2,095        4,765        151          300,628      18,549       68,413

The evaluation was conducted through two assessments involving four steps each:

1) Calculation of the values of a set of metrics for all developers using the VCS repository.
2) Computation of the hierarchy of classes and the MDS visualization, as described in Section IV.
3) Interview with the project manager, aiming to verify whether the classes and the visualization produced by the comparison approaches match his perception about the developers.
4) Analysis and interpretation of the results obtained from the interview.

For Step 1, we used a set consisting of only four manually chosen metrics: Surv_Add, Surv_Mod, Surv_Add_Div_Effo_Add and Surv_Mod_Div_Effo_Dist_Mod. These metrics were selected because they are simple but can indicate important information about the work of the developers. They also showed to be low-correlated in our analysis.

The interviews (run in Step 3) started with an explanation about the metrics and the comparisons that were produced. Then, a set of previously prepared questions was asked. We considered that the interview had to be done with the project manager, since he knew all the actors involved in the project and how they worked.

After the interview and based on the final questions answered by the project manager, a new set of metrics was selected. Then Steps 1 to 4 were repeated in a second assessment.

A. Metrics and comparisons computed in the first assessment

Table II presents the four metrics computed for all developers in the first assessment of our evaluation, considering solely changes at the line level in the VCS.

Table II: Metrics computed for the Weby project in the first assessment.

D.     Surv_Add   Surv_Mod   Surv_Add_Div_Effo_Add   Surv_Mod_Div_Effo_Dist_Mod
d1     102,817    539        0.932                   0.253
d2     3,188      294        *0.734                  *0.609
d3     0          0          0.000                   0.000
d4     41,929     410        0.952                   0.455
d5     1,185      21         *0.682                  *0.437
d6     50,630     479        0.979                   *0.807
d7     483        163        *0.432                  *0.612
d8     83,409     1,302      0.973                   0.632
d9     55         211        *0.539                  *0.875
d10    225        43         *0.415                  *0.605
d11    1,053      315        *0.884                  *0.734

A preprocessing step was needed before using Table II in the performance-based hierarchy comparison (only): when the normalized value of Effo_Add (or Effo_Dist_Mod) was less than 0.1 (10%), we disregarded (assumed as zero) the value of the combined metric Surv_Add_Div_Effo_Add (or Surv_Mod_Div_Effo_Dist_Mod) for the given developer, since it could be a high number (close to 1) but not representative. For instance, if a developer created only one line and it survived, this would still result in the maximum score of 1 in the metric Surv_Add_Div_Effo_Add. The 0.1 limit was not reached in the metric Effo_Add by developers d2, d5, d7, d9, d10 and d11, and in the metric Effo_Dist_Mod by developers d2, d5, d6, d7, d9, d10 and d11, in the Weby project. The positions in Table II marked with an asterisk (*) indicate the values that were considered zero by this preprocessing for the performance-based hierarchy comparison.

The result of the performance-based hierarchy comparison for the Weby developers, as a hierarchical list of classes, is shown in Table III. The similarity comparison, using MDS, is shown by the two-dimensional visualization in Figure 2.

Table III: The hierarchy of equivalence classes for the Weby project.

Equivalence Class   Developers
1                   d1, d6 and d8
2                   d4
3                   d2 and d11
4                   d5, d7 and d9
5                   d10
6                   d3

Figure 2: Results of the MDS for the Weby project. The points, representing developers, were colored according to their proximity.

B. The interview

Next, we present the questions asked to the manager and his written answers during the interview (translated to English). The names of the developers were not omitted to the manager.

a. Questions asked after presenting the hierarchical classes of equivalence:

1) "Does this separation of the developers in classes make sense to you?" Ans.: "Yes."
2) "If you were to choose one or more developers for a future project, would this classification help? Why? Which developers would you choose?" Ans.: "It would help, because it allows to realize the performance of the people. Developers d1, d8 and d6."
3) "Would you classify the developers in the same way proposed by us? Why? If not, how would your rank be?" Ans.: "I would make some small modifications in order to raise developer d4 to Class 1 and developer d10 to Class 4."
4) "Is there any developer that you think was mistakenly classified?" Ans.: "Despite the changes suggested in the previous question, the classification is not bad. It is just more defined, that is, having more classes than I would propose."

b. Questions regarding the MDS visualization:

1) "Are the developers that appear close to one another in the visualization indeed similar in their technical production?" Ans.: "Yes, it (the MDS) showed the similarity between them."
2) "Do the developers produce similar results?" Ans.: "Those who are close (in the visualization) did."
3) "How would you label (give names based on some characteristics of similarity to) the "group" of developers visibly close?" Ans.: "The group d2, d5, d7, d9, d10 and d11 are changers and d4, d6 are creators of interface; the remainders, d1 and d8, have distinct individual characteristics."
4) "Is there any discrepancy or similarity between the results of the classes, presented previously, and the current MDS visualization?" Ans.: "Yes, there is a visible similarity in the MDS and in the classes of dominance."

c. Questions about the metrics that were used:

1) "Do you agree that the higher the value obtained in each of these metrics the better the performance of the developer? Why?" Ans.: "Yes, because as you mentioned they are oriented (in a same direction)."
2) "Which other metrics do you find interesting/useful for an evaluation of the developers? Why?" Ans.: "None. The current metrics seem to characterize well the developers."

The manager did not show explicit interest in the use of any other metric. However, he orally commented to the interviewer that, if the metrics Line_ΣMod_Del, Line_ΣAdd_Del, File_ΣMod_Del and File_ΣAdd_Del were selected, then they should count negatively in the evaluation, since they consider the work done by the developers as a target for removal.

C. Analysis and interpretation of the results in the first assessment

Developer d3 took part only in the beginning of the project. Therefore, his production was not significant. Some of the developers that were trainees also worked on the project for a short period of time and had little contribution. This aspect was shown by the comparison approaches, particularly in the last equivalence classes and in the bottom-left corner of the MDS visualization.

According to the interviewee's opinion, the results were satisfactory and consistent with his impressions as a software development specialist and a project manager. Furthermore, his comments regarding the orientation of the metrics led us to consider the creation of a mechanism for reversing their values when requested, but this has not been implemented yet.

D. Using other metrics – the second assessment

After the interview, and because the project manager had commented about metrics measuring deletion, we added the metric Effo_Del to the previous set of metrics. We then used the method described in Section IV-B to compute an extended set of low-correlated metrics. The result was a set R with the four initial metrics, the metric Effo_Del and two automatically selected metrics: File_Add_ΣDel and File_ΣMod_Del. With this new set, we generated the MDS visualization shown in Figure 3. The new MDS separated developers d2 and d7 from developers d5, d9, d10 and d11, pushed developer d4 even further away from d6, and brought together developers d6 and d8.

Figure 3: The MDS analysis in the second assessment, using an extended set of metrics.

Although the method for selecting low-correlated metrics was created to facilitate the comparison by similarity, we decided to check how the equivalence classes would change with the new set of metrics. For this goal, it was considered that all metrics had a positive orientation (the higher the better). As a consequence, the dominance relationship suffered only a small amendment: developer d4 moved from Class 2 to Class 1, while the other developers remained as described in Table III. Interestingly, this new result is much closer to the suggestion made by the project manager during the interview. The results were then presented to the project manager, who reported that they describe the similarities and differences of the developers better than the initial ones. This reinforces our intuition that adding new low-correlated metrics to the analysis helps improve the comparison.

VI. CONCLUSION

In this work, we presented new formal definitions and metrics that allow the extraction of basic but important information from projects hosted in VCSs. Specifically, our proposal obtains data describing operations performed by developers at different levels of granularity and the sequence in which they occur in a software development project. We also innovate by defining different measures of effort while considering the code-survival of the operations performed by the developers. Two approaches were suggested for comparing the developers and a case study with a real software project was carried out. The results showed the usefulness of the metrics and of the comparison approaches. The new metrics may help to unveil interesting facts about the developers and their relationships. For example, a very high value in a metric for a developer may indicate an important aspect to be analyzed. Developers that get high values in many metrics were key actors of the software development process. We, however, cannot say much about the developers with low scores in the metrics, since there may be many reasons for a poor performance. This leads us to the limitations of any approach that uses VCS data to draw inferences about the work of developers: the logs in VCSs are in general incomplete and can lead to ambiguous interpretations. For instance, the work of a developer may be modified by many other developers not because it was badly done, but due to unpredictable changes in the software requirements. Such changes negatively affect the measurement of code-survival. The removal of unused pieces of code (for making the software smaller and more legible) also reduces the values of code-survival and may be erroneously interpreted as a negative result. In addition, if not much work is assigned to a particular developer, in comparison to the tasks given to others, then the effort of that developer at line level will be proportionally low. These limitations, and others that we can think of, imply that we have to consider the use of VCS data carefully. In general, we need to combine this data with more pieces of information, coming from the semantic analysis of software code and even from other sources. In our work, we tried to compensate for this weakness by involving the project manager in the process of analyzing the values of the metrics and the results of the comparison approaches.

Finally, our research can be enhanced and extended in many ways. Some suggestions for future investigations are:

– defining new metrics to capture other aspects of the work done by developers, such as measuring the amount of software documentation;
– using clustering techniques to group the developers instead of the methods based on the Pareto Frontier and on the MDS approach; and
– exploring more sources of data about the software development process, like bug tracking systems and tools for semantic code inspection.

ACKNOWLEDGEMENT

The authors would like to thank Leonardo R. Souza for participating in the interview in our case study, and Cercomp - UFG for all the support given to this work.

REFERENCES

[1] C. M. Pilato, B. Collins-Sussman, and B. W. Fitzpatrick, Version Control with Subversion, 2nd ed. O'Reilly Media, September 2008.
[2] S. Chacon, Pro Git, 1st ed. Berkeley, CA, USA: Apress, 2009.
[3] L. Lopez-Fernandez, G. Robles, and J. M. Gonzalez-Barahona, "Applying Social Network Analysis to the Information in CVS Repositories," in First International Workshop on Mining Software Repositories, 2004, pp. 101–105.
[4] S.-K. Huang and K.-m. Liu, "Mining version histories to verify the learning process of legitimate peripheral participants," SIGSOFT Software Engineering Notes, vol. 30, no. 4, pp. 1–5, May 2005.
[5] T. Girba, A. Kuhn, M. Seeberger, and S. Ducasse, "How Developers Drive Software Evolution," in Proceedings of the Eighth International Workshop on Principles of Software Evolution, ser. IWPSE'05. Washington, DC, USA: IEEE Computer Society, 2005, pp. 113–122.
[6] L. Voinea and A. Telea, "An Open Framework for CVS Repository Querying, Analysis and Visualization," in Proceedings of the 2006 International Workshop on Mining Software Repositories, ser. MSR'06. New York, NY, USA: ACM Press, 2006, pp. 33–39.
[7] L. Voinea, J. Lukkien, and A. Telea, "Visual Assessment of Software Evolution," Science of Computer Programming, vol. 65, no. 3, pp. 222–248, April 2007.
[8] E. Gilbert and K. Karahalios, "CodeSaw: A social visualization of distributed software development," in Proceedings of the 11th IFIP TC 13 International Conference on Human-Computer Interaction – Volume Part II, ser. INTERACT'07. Berlin, Heidelberg: Springer-Verlag, 2007, pp. 303–316.
[9] A. Jermakovics, A. Sillitti, and G. Succi, "Mining and visualizing developer networks from version control systems," in Proceedings of the 4th International Workshop on Cooperative and Human Aspects of Software Engineering, ser. CHASE '11. New York, NY, USA: ACM, 2011, pp. 24–31.
[10] A. Mockus and J. D. Herbsleb, "Expertise browser: A quantitative approach to identifying expertise," in Proceedings of the 24th International Conference on Software Engineering, ser. ICSE '02. New York, NY, USA: ACM, 2002, pp. 503–512.
[11] S. Minto and G. C. Murphy, "Recommending emergent teams," in Proceedings of the Fourth International Workshop on Mining Software Repositories, ser. MSR '07. Washington, DC, USA: IEEE Computer Society, 2007, p. 5.
[12] D. Schuler and T. Zimmermann, "Mining usage expertise from version archives," in Proceedings of the 2008 International Working Conference on Mining Software Repositories, ser. MSR '08. New York, NY, USA: ACM, 2008, pp. 121–124.
[13] S. Zhang, Y. Wang, Y. Yang, and J. Xiao, "Capability assessment of individual software development processes using software repositories and DEA," in Proceedings of the Software Process, 2008 International Conference on Making Globally Distributed Software Development a Success Story, ser. ICSP'08. Berlin, Heidelberg: Springer-Verlag, 2008, pp. 147–159.
[14] S. Zhang, Y. Wang, and J. Xiao, "Mining Individual Performance Indicators in Collaborative Development Using Software Repositories," in Software Engineering Conference, 2008. APSEC '08. 15th Asia-Pacific, December 2008, pp. 247–254.
[15] E. Di Bella, A. Sillitti, and G. Succi, "A multivariate classification of open source developers," Information Sciences, vol. 221, pp. 72–83, Feb. 2013.
[16] S. Negara, M. Vakilian, N. Chen, R. Johnson, and D. Dig, "Is It Dangerous to Use Version Control Histories to Study Source Code Evolution?" in ECOOP 2012 – Object-Oriented Programming, ser. Lecture Notes in Computer Science, J. Noble, Ed. Springer Berlin Heidelberg, 2012, vol. 7313, pp. 79–103.
[17] D. MacKenzie, P. Eggert, and R. Stallman, Comparing and Merging Files with GNU Diff and Patch. Network Theory, 1997.
[18] R. F. Rodrigues and P. S. de Souza, "Asynchronous teams: a multi-algorithm approach for solving combinatorial multi-objective optimization problems," in Proceedings of the 5th Workshop of the DGOR-Working Group Multicriteria Optimization and Decision Theory, Germany, May 1995.
[19] J. Stolfi, H. A. D. do Nascimento, and C. F. X. de Mendonça, "Heuristics and pedigrees for drawing directed graphs," Journal of the Brazilian Computer Society, vol. 6, pp. 38–49, July 1999.
[20] W. Stadler, "A Survey of Multicriteria Optimization or the Vector Maximum Problem, Part I: 1776–1960," Journal of Optimization Theory and Applications, vol. 29, pp. 1–52, 1979.
[21] P. Godfrey, R. Shipley, and J. Gryz, "Algorithms and Analyses for Maximal Vector Computation," The VLDB Journal, vol. 16, pp. 5–28, 2007.
[22] I. Borg and P. Groenen, Modern Multidimensional Scaling: Theory and Applications. Springer, 2005.
[23] A. Buja, D. F. Swayne, M. L. Littman, N. Dean, H. Hofmann, and L. Chen, "Data Visualization with Multidimensional Scaling," Journal of Computational and Graphical Statistics, vol. 17, no. 2, pp. 444–472, 2008.
