Extracting new metrics from Version Control System for the comparison of software developers
Marcello Moura (1), Hugo Nascimento (2) and Thierson Rosa (2)
(1) Centro de Recursos Computacionais, (2) Instituto de Informática, Universidade Federal de Goiás (UFG), Caixa Postal 131 – 74.001-970 – Goiânia – GO – Brazil
[email protected], {hadn,thierson}@inf.ufg.br
Goiânia, 21 September 2014
Moura, Nascimento e Rosa
Extracting new metrics from VCS ...
1 / 48
Summary
1. Introduction
2. Extracting fine-grain operations from VCS
3. Metrics for the developers
4. Comparison of the developers
5. The case study
6. Conclusion
Introduction
Version Control Systems (VCSs), like Subversion and Git, store revisions of the files of a software development project, registering its historical evolution.
VCSs have been used for:
- helping to understand the software development process – Lopez-Fernandez et al. [2004], Huang and Liu [2005], Girba et al. [2005], Voinea and Telea [2006] and Voinea et al. [2007];
- learning more about the developers – Gilbert and Karahalios [2007], Jermakovics et al. [2011], Mockus and Herbsleb [2002], Minto and Murphy [2007], Schuler and Zimmermann [2008], Zhang et al. [2008a,b] and Di Bella et al. [2013].
Our work focuses on understanding the developers through the analysis of their work:
1. We identify and count finer-grain operations at the line and file levels that can be extracted from a VCS, such as additions, deletions and modifications. This yields much more detailed and richer information about the work performed by the developers.
2. We compute a new set of formally defined metrics.
3. Developers are characterized by comparing each of them against the others; two comparison approaches are described for this purpose.
Note: VCS data cannot be taken as a full and precise description of the software development process. It is incomplete and may lead to distinct interpretations (e.g., Negara et al. [2012]). Information extracted from a VCS has to be validated by the project managers and complemented with their own knowledge.
Extracting fine-grain operations from VCS
Basic notation:
- P – a software project in a VCS;
- D – the set of developers that worked on P;
- A – the set of all files created during the development of P;
- A^r ⊆ A – the set of files that were removed from P (i.e., that did not reach its final version).
We mine the VCS for three types of operations on files and lines of code: additions, deletions and modifications.
[Figure: Project History]
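The slides do not say how line-level operations are recovered from the diffs. As a minimal sketch, assuming two revisions of a file are available as lists of lines, Python's standard `difflib.SequenceMatcher` can label the changed blocks. The function name `classify_line_ops` and the pairing rule inside `replace` blocks (leading lines count as MOD, any surplus as ADD or DEL) are hypothetical choices, not the authors' method.

```python
import difflib
from typing import Dict, List


def classify_line_ops(old_lines: List[str], new_lines: List[str]) -> Dict[str, int]:
    """Count line-level ADD, DEL and MOD operations between two revisions of a file.

    Hypothetical heuristic: 'insert' blocks are ADDs, 'delete' blocks are DELs,
    and in a 'replace' block the first min(n_old, n_new) lines are paired as
    MODs while any surplus lines count as ADDs (new side) or DELs (old side).
    """
    counts = {"ADD": 0, "DEL": 0, "MOD": 0}
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        n_old, n_new = i2 - i1, j2 - j1
        if tag == "insert":
            counts["ADD"] += n_new
        elif tag == "delete":
            counts["DEL"] += n_old
        elif tag == "replace":
            paired = min(n_old, n_new)
            counts["MOD"] += paired
            counts["ADD"] += n_new - paired
            counts["DEL"] += n_old - paired
        # 'equal' blocks are unchanged lines and are not counted
    return counts


print(classify_line_ops(["a", "b", "c"], ["a", "B", "c", "d"]))
# {'ADD': 1, 'DEL': 0, 'MOD': 1}
```

A full extractor would also have to track line identity across revisions to build the per-line histories used by the metrics; this sketch only counts operations within a single diff.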
Metrics for the developers
Aspects considered:
1. Effort – the total number of operations of a given type performed by a developer.
2. Code-survival – the number of operations of a given type performed by a developer whose results were not changed later by other developers.
A. Metrics for evaluating developers individually
$$\mathit{Effo\_Add}(d) = \sum_{a \in A} \sum_{i=1}^{|H_a|} \begin{cases} 1 & \text{if } o_1^{a,i}.\mathit{devel} = d \\ 0 & \text{otherwise} \end{cases}$$
$$\mathit{Effo\_Mod}(d) = \sum_{a \in A} \sum_{i=1}^{|H_a|} \sum_{j=1}^{|hl_i^a|} \begin{cases} 1 & \text{if } o_j^{a,i}.\mathit{devel} = d \text{ and } o_j^{a,i}.\mathit{type} = \mathrm{MOD} \\ 0 & \text{otherwise} \end{cases}$$
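A minimal sketch of the two effort metrics. It assumes (hypothetically, since the slides define these symbols only through the formulas) that `H[a]` holds the line histories of file `a`, and that each line history hl_i^a is the chronological list of its operations o_j^{a,i}, stored as `(devel, type)` pairs whose first entry is the creation of the line. The function names are illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: an operation o_j^{a,i} is a (devel, type) pair,
# a line history hl_i^a is the chronological list of its operations, and
# H[a] is the list of line histories of file a.
Op = Tuple[str, str]
History = Dict[str, List[List[Op]]]


def effo_add(H: History, d: str) -> int:
    """Effo_Add(d): lines whose first operation o_1 was performed by d."""
    return sum(1 for lines in H.values() for hl in lines if hl[0][0] == d)


def effo_mod(H: History, d: str) -> int:
    """Effo_Mod(d): every MOD operation performed by d, over all lines of all files."""
    return sum(
        1
        for lines in H.values()
        for hl in lines
        for devel, kind in hl
        if devel == d and kind == "MOD"
    )


H = {
    "app.rb": [
        [("ana", "ADD"), ("bob", "MOD"), ("ana", "MOD")],
        [("ana", "ADD")],
        [("bob", "ADD"), ("bob", "DEL")],
    ],
}
print(effo_add(H, "ana"), effo_mod(H, "ana"))  # 2 1
print(effo_add(H, "bob"), effo_mod(H, "bob"))  # 1 1
```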
$$\mathit{Surv\_Add}(d) = \sum_{a \in (A - A^r)} \sum_{i=1}^{|H_a|} \begin{cases} 1 & \text{if } o_1^{a,i}.\mathit{devel} = d \text{ and } \forall\, o_s^{a,i} \text{ with } s > 1:\ (o_s^{a,i}.\mathit{type} = \mathrm{MOD} \text{ and } o_s^{a,i}.\mathit{devel} = d) \\ 0 & \text{otherwise} \end{cases}$$
$$\mathit{Surv\_Mod}(d) = \sum_{a \in (A - A^r)} \sum_{i=1}^{|H_a|} \begin{cases} 1 & \text{if } o_{\mathit{end}}^{a,i}.\mathit{type} = \mathrm{MOD} \text{ and } o_{\mathit{end}}^{a,i}.\mathit{devel} = d \text{ and } \exists\, w,\ 1 \le w < |hl_i^a|, \text{ such that } o_w^{a,i}.\mathit{devel} \neq d \\ 0 & \text{otherwise} \end{cases}$$
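The survival metrics restrict the sum to the files in A − A^r. The sketch below keeps the same hypothetical `(devel, type)` representation of line histories and passes the removed files A^r as a separate set; the names are illustrative only. Note that a later DEL, even by d, makes a line fail the Surv_Add condition, because every later operation must be a MOD by d.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: each operation is a (devel, type) pair and
# H[a] lists the chronological line histories of file a.
Op = Tuple[str, str]
History = Dict[str, List[List[Op]]]


def surv_add(H: History, removed: Set[str], d: str) -> int:
    """Lines created by d, in files still present, whose later operations are all MODs by d."""
    total = 0
    for a, lines in H.items():
        if a in removed:  # only files in A - A^r are considered
            continue
        for hl in lines:
            later_ok = all(dev == d and kind == "MOD" for dev, kind in hl[1:])
            if hl[0][0] == d and later_ok:
                total += 1
    return total


def surv_mod(H: History, removed: Set[str], d: str) -> int:
    """Lines whose last operation is a MOD by d and that an earlier operation by someone else touched."""
    total = 0
    for a, lines in H.items():
        if a in removed:
            continue
        for hl in lines:
            last_dev, last_kind = hl[-1]
            others_before = any(dev != d for dev, _ in hl[:-1])
            if last_kind == "MOD" and last_dev == d and others_before:
                total += 1
    return total


H = {
    "app.rb": [
        [("ana", "ADD"), ("bob", "MOD"), ("ana", "MOD")],
        [("ana", "ADD")],
        [("bob", "ADD"), ("bob", "DEL")],
    ],
    "old.rb": [[("bob", "ADD")]],  # file removed before the final version
}
removed = {"old.rb"}
print(surv_add(H, removed, "ana"), surv_mod(H, removed, "ana"))  # 1 1
print(surv_add(H, removed, "bob"), surv_mod(H, removed, "bob"))  # 0 0
```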
$$\mathit{Surv\_Add\_Div\_Effo\_Add}(d) = \frac{\mathit{Surv\_Add}(d)}{\mathit{Effo\_Add}(d)}$$
B. Uncovering and measuring relationships between developers
Besides an ADD followed by a MOD (defined below), the operation pairs ADD→DEL, MOD→MOD and MOD→DEL are also considered.
$$\mathit{Line\_Add\_Mod}(x, y) = \sum_{a \in A} \sum_{i=1}^{|H_a|} \begin{cases} 1 & \text{if } |hl_i^a| > 1 \text{ and } o_1^{a,i}.\mathit{devel} = x \text{ and } o_1^{a,i}.\mathit{type} = \mathrm{ADD} \text{ and } o_2^{a,i}.\mathit{devel} = y \text{ and } o_2^{a,i}.\mathit{type} = \mathrm{MOD} \\ 0 & \text{otherwise} \end{cases}$$
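A minimal sketch of Line_Add_Mod computed for all ordered developer pairs at once, again assuming the hypothetical `(devel, type)` line-history representation. As in the formula, only the first two operations of a line matter, and the pair (x, x) is not excluded; the aggregated metrics are the ones that drop it.

```python
from collections import Counter
from typing import Dict, List, Tuple

Op = Tuple[str, str]  # (devel, type); hypothetical representation


def line_add_mod(H: Dict[str, List[List[Op]]]) -> Counter:
    """Map each ordered pair (x, y) to the number of lines added by x whose second operation is a MOD by y."""
    pairs: Counter = Counter()
    for lines in H.values():
        for hl in lines:
            # |hl| > 1, o_1 is an ADD and o_2 is a MOD
            if len(hl) > 1 and hl[0][1] == "ADD" and hl[1][1] == "MOD":
                pairs[(hl[0][0], hl[1][0])] += 1
    return pairs


H = {
    "f.rb": [
        [("ana", "ADD"), ("bob", "MOD")],
        [("ana", "ADD"), ("ana", "MOD")],
        [("bob", "ADD"), ("ana", "MOD"), ("bob", "MOD")],
        [("ana", "ADD")],
    ],
}
print(line_add_mod(H)[("ana", "bob")])  # 1
```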
$$\mathit{Line\_Add\_\Sigma Mod}(d) = \sum_{y \in D - \{d\}} \mathit{Line\_Add\_Mod}(d, y)$$
$$\mathit{Line\_\Sigma Add\_Mod}(d) = \sum_{x \in D - \{d\}} \mathit{Line\_Add\_Mod}(x, d)$$
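Both aggregates are a row sum and a column sum of the pairwise relation with the diagonal left out. A minimal sketch, assuming the relation is stored as a mapping from `(x, y)` pairs to counts (pairs that are absent count as zero); the function names are hypothetical, and the same two functions would serve for the file-level File_Add_ΣMod and File_ΣAdd_Mod.

```python
from typing import Dict, Tuple

Relation = Dict[Tuple[str, str], int]  # (x, y) -> count; hypothetical layout


def add_sum_mod(rel: Relation, d: str) -> int:
    """Line_Add_ΣMod(d): sum of rel[(d, y)] over all y != d."""
    return sum(n for (x, y), n in rel.items() if x == d and y != d)


def sum_add_mod(rel: Relation, d: str) -> int:
    """Line_ΣAdd_Mod(d): sum of rel[(x, d)] over all x != d."""
    return sum(n for (x, y), n in rel.items() if y == d and x != d)


rel = {("ana", "bob"): 4, ("ana", "ana"): 7, ("bob", "ana"): 2, ("cid", "ana"): 1}
print(add_sum_mod(rel, "ana"), sum_add_mod(rel, "ana"))  # 4 3
```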
C. Extending the metrics to the file level
A project revision is a triple (r, d, L), where r is the label of the revision, d ∈ D is the identifier of the developer who made the revision, and L is a list of pairs (a, t) in which a is a file and t ∈ {A, M, D} describes the operation. A project revision sequence is a sequence S = ⟨(r_1, d_1, L_1), (r_2, d_2, L_2), ..., (r_m, d_m, L_m)⟩ of project revisions that represents the history of changes made to the files of P, without going into detail about the changes made to their individual lines.
$$\mathit{File\_Add\_Mod}(x, y) = \sum_{a \in A} \begin{cases} 1 & \text{if there are triples } (r_i, d_i, L_i) \text{ and } (r_j, d_j, L_j) \text{ in } S, \text{ with } i < j, \text{ such that } d_i = x,\ d_j = y,\ (a, \mathrm{A}) \in L_i \text{ and } (a, \mathrm{M}) \in L_j, \\ & \text{and for which there is no triple } (r_k, d_k, L_k) \text{ with } i < k < j \text{ such that } (a, t) \in L_k \text{ for any operation type } t \\ 0 & \text{otherwise} \end{cases}$$
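The "no triple in between that touches a" condition means the revision that adds a file and the next revision that touches it must be consecutive in that file's own history. A minimal sketch, assuming S is a Python list of `(r, d, L)` tuples with operation labels `"A"`, `"M"`, `"D"`; the function name and data layout are hypothetical.

```python
from typing import Dict, List, Tuple

# A project revision (r, d, L): label, developer, list of (file, t), t in {"A", "M", "D"}.
Revision = Tuple[str, str, List[Tuple[str, str]]]


def file_add_mod(S: List[Revision], x: str, y: str) -> int:
    """Number of files added by x whose next revision touching them is a modification by y."""
    touches: Dict[str, List[Tuple[str, str]]] = {}  # file -> chronological (developer, t)
    for _, dev, changes in S:
        for a, t in changes:
            touches.setdefault(a, []).append((dev, t))
    count = 0
    for history in touches.values():
        # each file contributes at most 1, as in the sum over a in A
        if any(prev == (x, "A") and cur == (y, "M") for prev, cur in zip(history, history[1:])):
            count += 1
    return count


S = [
    ("r1", "ana", [("a.rb", "A"), ("b.rb", "A")]),
    ("r2", "bob", [("a.rb", "M")]),
    ("r3", "cid", [("b.rb", "M")]),
    ("r4", "bob", [("b.rb", "M")]),
]
print(file_add_mod(S, "ana", "bob"), file_add_mod(S, "ana", "cid"))  # 1 1
```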
$$\mathit{File\_Add\_\Sigma Mod}(d) = \sum_{y \in D - \{d\}} \mathit{File\_Add\_Mod}(d, y)$$
$$\mathit{File\_\Sigma Add\_Mod}(d) = \sum_{x \in D - \{d\}} \mathit{File\_Add\_Mod}(x, d)$$
D. Metrics regarding commits
$$\mathit{Commits}(x, y) = \sum_{i=1}^{|S|-1} \begin{cases} 1 & \text{if triples } (r_i, d_i, L_i) \text{ and } (r_{i+1}, d_{i+1}, L_{i+1}) \text{ are such that } d_i = x \text{ and } d_{i+1} = y \\ 0 & \text{otherwise} \end{cases}$$
$$\Sigma\mathit{Commits}(d) = \sum_{i=1}^{|S|} \begin{cases} 1 & \text{if triple } (r_i, d_i, L_i) \text{ is such that } d_i = d \\ 0 & \text{otherwise} \end{cases}$$
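Both commit metrics only read the developer field of each revision. A minimal sketch, assuming the hypothetical layout where S is a list of `(r, d, L)` tuples in chronological order:

```python
from typing import List, Tuple

Revision = Tuple[str, str, list]  # (r, d, L); only d is used here


def commits(S: List[Revision], x: str, y: str) -> int:
    """Commits(x, y): positions i where revision i is by x and revision i+1 is by y."""
    return sum(1 for (_, di, _), (_, dj, _) in zip(S, S[1:]) if di == x and dj == y)


def sigma_commits(S: List[Revision], d: str) -> int:
    """ΣCommits(d): total number of revisions made by d."""
    return sum(1 for _, di, _ in S if di == d)


S = [("r1", "ana", []), ("r2", "bob", []), ("r3", "ana", []), ("r4", "bob", [])]
print(commits(S, "ana", "bob"), commits(S, "bob", "ana"), sigma_commits(S, "ana"))  # 2 1 2
```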
Each metric also has a relative version:
$$\mathit{Metric\_Rel}(d) = \frac{\mathit{Metric}(d)}{\sum_{x \in D} \mathit{Metric}(x)}$$
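As a worked example, the sketch below applies the relative form to ΣCommits, using the per-developer commit counts reported in the case-study table for Weby. The helper name `relative` and the error raised when every value is zero are assumptions, since the slides do not discuss that degenerate case.

```python
from typing import Dict


def relative(metric: Dict[str, float]) -> Dict[str, float]:
    """Metric_Rel(d) = Metric(d) / sum of Metric(x) over all developers x."""
    total = sum(metric.values())
    if total == 0:  # assumption: treat an all-zero metric as undefined
        raise ValueError("metric is zero for every developer")
    return {d: v / total for d, v in metric.items()}


# ΣCommits per developer, taken from the case-study table (1,294 commits in total).
sigma_commits = {"d1": 474, "d2": 159, "d3": 2, "d4": 170, "d5": 30, "d6": 99,
                 "d7": 61, "d8": 183, "d9": 20, "d10": 24, "d11": 72}
rel = relative(sigma_commits)
print(round(rel["d1"], 3))  # 0.366
```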
Comparison of the developers
A. Performance-based hierarchy
All metrics used in the comparison should have the same orientation (for example, higher values always indicate better performance).
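The slides do not spell out how the hierarchy is built; the interview form refers to "dominance classes". One common way to obtain such classes is Pareto layering (non-dominated sorting): class 1 holds the developers that no one else dominates, class 2 those that are non-dominated once class 1 is removed, and so on. The sketch below assumes that reading and that every metric is oriented so that higher is better. It is not necessarily the authors' exact procedure, and the toy scores are invented for illustration.

```python
from typing import Dict, List, Sequence


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """u dominates v: at least as good on every metric and strictly better on one (higher is better)."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def dominance_classes(scores: Dict[str, Sequence[float]]) -> List[List[str]]:
    """Peel off successive Pareto fronts; class 1 is the set of non-dominated developers."""
    remaining = dict(scores)
    classes = []
    while remaining:
        # a vector never dominates itself, so no self-check is needed
        front = [d for d, u in remaining.items()
                 if not any(dominates(v, u) for v in remaining.values())]
        classes.append(front)
        for d in front:
            del remaining[d]
    return classes


toy = {"ana": (10, 0.90), "bob": (5, 0.95), "cid": (4, 0.50), "dan": (1, 0.10)}
print(dominance_classes(toy))  # [['ana', 'bob'], ['cid'], ['dan']]
```

Developers within a class are mutually incomparable: each is better than the other on at least one metric.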
B. Similarity comparison
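The case study presents this comparison as an MDS visualization in which the distance between two developers reflects how similar their metric values are. The slides do not state the distance or the MDS variant. The sketch below assumes z-scored metric columns, Euclidean distances and classical (Torgerson) MDS computed with NumPy; the function names are hypothetical.

```python
import numpy as np


def standardize(X: np.ndarray) -> np.ndarray:
    """Z-score each metric column so that no single metric dominates the distances."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns are only centred
    return (X - X.mean(axis=0)) / std


def classical_mds(X: np.ndarray, dims: int = 2) -> np.ndarray:
    """Embed the rows of X (one per developer) in `dims` dimensions by classical MDS."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # squared Euclidean distances
    n = X.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    B = -0.5 * J @ sq @ J  # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(B)  # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:dims]  # keep the largest ones
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))


# Usage: rows are developers, columns are metrics such as Surv_Add or Surv_Mod.
metrics = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
coords = classical_mds(standardize(metrics))
print(coords.shape)  # (4, 2)
```

When the points already lie in two dimensions, classical MDS reproduces their pairwise distances exactly (up to rotation and reflection); with more metrics, the 2-D layout is an approximation.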
The case study
We evaluated the metrics and the comparison approaches through a qualitative assessment on a real software development project.
The software: Weby
- A content management system built by UFG, hosting more than 400 internal websites (1).
- Period considered: 1 year and 7 months.
- Eleven (11) developers contributed to the evolution of the source code; one of them was also the project manager.
- 1,294 code revisions committed to the UFG VCS (Subversion).
(1) Available at https://github.com/cercomp/weby.
Dev. | Commits | Files Add. | Files Mod. | Files Del. | Lines Add. | Lines Mod. | Lines Del.
d1 | 474 | 482 | 1,807 | 64 | 110,204 | 7,026 | 54,710
d2 | 159 | 47 | 453 | 4 | 4,340 | 1,531 | 1,587
d3 | 2 | 0 | 6 | 0 | 26 | 31 | 165
d4 | 170 | 314 | 585 | 12 | 44,013 | 1,577 | 1,224
d5 | 30 | 43 | 78 | 1 | 1,736 | 142 | 205
d6 | 99 | 333 | 367 | 17 | 51,673 | 1,548 | 3,220
d7 | 61 | 12 | 379 | 15 | 1,116 | 923 | 1,214
d8 | 183 | 848 | 783 | 29 | 85,686 | 4,688 | 5,289
d9 | 20 | 1 | 34 | 0 | 102 | 398 | 15
d10 | 24 | 8 | 74 | 5 | 542 | 196 | 476
d11 | 72 | 7 | 199 | 4 | 1,190 | 489 | 308
Total | 1,294 | 2,095 | 4,765 | 151 | 300,628 | 18,549 | 68,413
The evaluation was conducted through two assessments, each involving four steps:
1. Calculation of the values of a set of metrics for all developers.
2. Computation of the hierarchy of classes and of the MDS visualization.
3. An interview with the project manager, to verify whether the classes and the visualization produced by the comparison approaches match his or her perception of the developers.
4. Analysis and interpretation of the results obtained from the interview.
Interview form
Interviewee name:
Project name:
Position:
Educational background:
Place and date:
1. Explain the existing data and the metrics. (Explain what the developed system does.)
2. Present the classification by dominance class. (Explain the meaning of each class.)
3. Questions about the dominance classes.
a) “Does this separation make sense to you?”
b) “If you were to choose one or more developers for a future project, would this classification help? Why? Which developers would you choose?”
c) “Would you classify the developers in the same way? Why? If not, what would your classification be?”
d) “Is there any developer you think was misclassified?”
4. Present the MDS visualization. (Explain what the distance between two developers means.)
5. Questions about the MDS visualization.
e) “Are the developers who are close to each other in fact similar in their technical output? Do they produce similar results?”
f) “How would you label (name, based on some shared characteristic) the visibly close “groups” of people?”
g) “Is there any discrepancy or similarity between the results of the dominance classes, presented earlier, and the current MDS visualization?”
6. Questions about the complete set of metrics.
h) “Do you agree that the higher the value obtained in each of these 4 metrics, the better the developer's performance? Why?”
i) “Which other metrics (from the complete spreadsheet) do you find interesting or useful for evaluating the developers? Why?”
A. Metrics and comparisons computed in the first assessment
Dev. | Surv_Add | Surv_Mod | Surv_Add_Div_Effo_Add | Surv_Mod_Div_Effo_Dist_Mod
d1 | 102,817 | 539 | 0.932 | 0.253
d2 | 3,188 | 294 | *0.734 | *0.609
d3 | 0 | 0 | 0.000 | 0.000
d4 | 41,929 | 410 | 0.952 | 0.455
d5 | 1,185 | 21 | *0.682 | *0.437
d6 | 50,630 | 479 | 0.979 | *0.807
d7 | 483 | 163 | *0.432 | *0.612
d8 | 83,409 | 1,302 | 0.973 | 0.632
d9 | 55 | 211 | *0.539 | *0.875
d10 | 225 | 43 | *0.415 | *0.605
d11 | 1,053 | 315 | *0.884 | *0.734
Equivalence class | Developers
1 | d1, d6, d8
2 | d4
3 | d2, d11
4 | d5, d7, d9
5 | d10
6 | d3
Equivalence class | Developers (first assessment) | Developers (second assessment)
1 | d1, d6, d8 | d1, d6, d4, d8
2 | d4 | d2, d11
3 | d2, d11 | d5, d7, d9
4 | d5, d7, d9 | d10
5 | d10 | d3
6 | d3 | –
Conclusion
- We presented new formal definitions and metrics that allow the extraction of basic but important information from projects hosted in VCSs, covering measures of effort and code survival.
- Two approaches were proposed for comparing the developers.
- A case study with a real software project was carried out; its results showed the usefulness of the metrics and of the comparison approaches.
- The new metrics may help to unveil interesting facts.
- However, there are limitations in the use of VCS data: the logs are generally incomplete and can lead to ambiguous interpretations.
- We tried to compensate for this weakness by involving the project manager in the evaluation.
Future Work
Future investigations include:
- formulating new metrics;
- using other techniques to compare the developers;
- improving the diff analysis to detect other types of operations;
- exploring more sources of data.
Questions?
References
Enrico Di Bella, Alberto Sillitti, and Giancarlo Succi. A multivariate classification of open source developers. Information Sciences, 221:72–83, February 2013. ISSN 0020-0255. doi: 10.1016/j.ins.2012.09.031.
Eric Gilbert and Karrie Karahalios. CodeSaw: A social visualization of distributed software development. In Proceedings of the 11th IFIP TC 13 International Conference on Human-Computer Interaction - Volume Part II, INTERACT'07, pages 303–316, Berlin, Heidelberg, 2007. Springer-Verlag. ISBN 3-540-74799-0, 978-3-540-74799-4.
Tudor Girba, Adrian Kuhn, Mauricio Seeberger, and Stéphane Ducasse. How Developers Drive Software Evolution. In Proceedings of the Eighth International Workshop on Principles of Software Evolution, IWPSE'05, pages 113–122, Washington, DC, USA, 2005. IEEE Computer Society. ISBN 0-7695-2349-8. doi: 10.1109/IWPSE.2005.21.
Shih-Kun Huang and Kang-min Liu. Mining version histories to verify the learning process of legitimate peripheral participants. SIGSOFT Software Engineering Notes, 30(4):1–5, May 2005. ISSN 0163-5948. doi: 10.1145/1082983.1083158.
Andrejs Jermakovics, Alberto Sillitti, and Giancarlo Succi. Mining and visualizing developer networks from version control systems. In Proceedings of the 4th International Workshop on Cooperative and Human Aspects of Software Engineering, CHASE '11, pages 24–31, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0576-1. doi: 10.1145/1984642.1984647.
Luis Lopez-Fernandez, Gregorio Robles, and Jesus M. Gonzalez-Barahona. Applying Social Network Analysis to the Information in CVS Repositories. In First International Workshop on Mining Software Repositories, pages 101–105, 2004.
Shawn Minto and Gail C. Murphy. Recommending emergent teams. In Proceedings of the Fourth International Workshop on Mining Software Repositories, MSR '07, page 5, Washington, DC, USA, 2007. IEEE Computer Society. ISBN 0-7695-2950-X. doi: 10.1109/MSR.2007.27.
Audris Mockus and James D. Herbsleb. Expertise browser: A quantitative approach to identifying expertise. In Proceedings of the 24th International Conference on Software Engineering, ICSE '02, pages 503–512, New York, NY, USA, 2002. ACM. ISBN 1-58113-472-X. doi: 10.1145/581339.581401.
Stas Negara, Mohsen Vakilian, Nicholas Chen, Ralph E. Johnson, and Danny Dig. Is It Dangerous to Use Version Control Histories to Study Source Code Evolution? In James Noble, editor, ECOOP 2012 - Object-Oriented Programming, volume 7313 of Lecture Notes in Computer Science, pages 79–103. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-31056-0. doi: 10.1007/978-3-642-31057-7_5.
David Schuler and Thomas Zimmermann. Mining usage expertise from version archives. In Proceedings of the 2008 International Working Conference on Mining Software Repositories, MSR '08, pages 121–124, New York, NY, USA, 2008. ACM. ISBN 978-1-60558-024-1. doi: 10.1145/1370750.1370779.
L. Voinea, J. Lukkien, and A. Telea. Visual Assessment of Software Evolution. Science of Computer Programming, 65(3):222–248, April 2007. ISSN 0167-6423.
Lucian Voinea and Alexandru Telea. An Open Framework for CVS Repository Querying, Analysis and Visualization. In Proceedings of the 2006 International Workshop on Mining Software Repositories, MSR'06, pages 33–39, New York, NY, USA, May 20–28, 2006. ACM Press. ISBN 1595933972. doi: 10.1145/1137983.1137993.
Shen Zhang, Yongji Wang, and Junchao Xiao. Mining Individual Performance Indicators in Collaborative Development Using Software Repositories. In Software Engineering Conference, 2008. APSEC '08. 15th Asia-Pacific, pages 247–254, December 2008a. doi: 10.1109/APSEC.2008.12.
Shen Zhang, Yongji Wang, Ye Yang, and Junchao Xiao. Capability assessment of individual software development processes using software repositories and DEA. In Proceedings of the Software Process, 2008 International Conference on Making Globally Distributed Software Development a Success Story, ICSP'08, pages 147–159, Berlin, Heidelberg, 2008b. Springer-Verlag. ISBN 3-540-79587-1, 978-3-540-79587-2.