Extracting new metrics from Version Control System

Extracting new metrics from Version Control System for the comparison of software developers

Marcello Moura¹, Hugo Nascimento² and Thierson Rosa²
¹Centro de Recursos Computacionais, ²Instituto de Informática
Universidade Federal de Goiás (UFG)
Caixa Postal 131 – 74.001-970 – Goiânia – GO – Brazil
[email protected], {hadn,thierson}@inf.ufg.br

Goiânia, 21 September 2014

Moura, Nascimento e Rosa

Extracting new metrics from VCS ...

1 / 48

Summary

1. Introduction
2. Extracting fine-grain operations from VCS
3. Metrics for the developers
4. Comparison of the developers
5. The case study
6. Conclusion


Introduction

Version Control Systems (VCSs), like Subversion and Git, store revisions of the files of a software development project, registering its historical evolution.


VCSs have been used for:

- Helping to understand the software development process: Lopez-Fernandez et al. [2004], Huang and Liu [2005], Girba et al. [2005], Voinea and Telea [2006] and Voinea et al. [2007].
- Helping to know more about the developers: Gilbert and Karahalios [2007], Jermakovics et al. [2011], Mockus and Herbsleb [2002], Minto and Murphy [2007], Schuler and Zimmermann [2008], Zhang et al. [2008a,b] and Di Bella et al. [2013].


Our work focuses on understanding the developers through the analysis of their work:

1. We identify and count finer-grain operations at line and file levels that can be extracted from a VCS, such as additions, deletions and modifications. This yields much more detailed and richer information about the work performed by the developers.
2. We calculate a new set of formally defined metrics.
3. Developers are characterized by comparing each one of them against the others. Two comparison approaches for this purpose are described.


Note: VCS data cannot be taken as a full and precise description of the software development process. It is incomplete and may lead to distinct interpretations (e.g., Negara et al. [2012]). Information extracted from a VCS has to be revalidated by the project managers and complemented with their own knowledge.


Extracting fine-grain operations from VCS

Basic notation:

- P: a software project in a VCS.
- D: the set of developers that worked on P.
- A: the set of all files created during the development of P.
- A^r ⊆ A: the set of files that were removed (that is, did not reach the final version of P).


We mine the VCS for three types of operations: additions, deletions and modifications of files and lines of code.

[Figure: Project History]


Metrics for the developers

Aspects defined for consideration:

1. Effort: represents the total number of operations of a given type performed by a developer.
2. Code-survival: indicates the number of operations of a given type performed by a developer and not changed later by anyone.


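A. Metrics for evaluating developers individually

\[
\mathrm{Effo\_Add}(d) = \sum_{a \in A} \sum_{i=1}^{|H_a|}
\begin{cases}
1 & \text{if } o_1^{a,i}.\mathit{devel} = d \\
0 & \text{otherwise}
\end{cases}
\]

\[
\mathrm{Effo\_Mod}(d) = \sum_{a \in A} \sum_{i=1}^{|H_a|} \sum_{j=1}^{|hl_i^a|}
\begin{cases}
1 & \text{if } o_j^{a,i}.\mathit{devel} = d \text{ and } o_j^{a,i}.\mathit{type} = \mathrm{MOD} \\
0 & \text{otherwise}
\end{cases}
\]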
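As a rough illustration of the two effort counts, the sketch below assumes that the history of each file is available as a list of line histories, each one being the ordered list of operations recorded on a single line. The `Op` tuple, the `histories` dictionary and the sample data are hypothetical; only the counting rules are taken from the formulas, and the first operation of a line history is assumed to be the one that created the line.

```python
from collections import namedtuple

# Hypothetical record for one operation on a line: who did it and what it was.
Op = namedtuple("Op", ["devel", "type"])  # type is "ADD", "MOD" or "DEL"

# histories[a] plays the role of H_a: the list of line histories of file a.
# Each line history (hl_i^a) lists its operations oldest first.
histories = {
    "app.rb": [
        [Op("d1", "ADD")],
        [Op("d1", "ADD"), Op("d2", "MOD"), Op("d1", "MOD")],
        [Op("d2", "ADD"), Op("d1", "DEL")],
    ],
    "README": [
        [Op("d2", "ADD"), Op("d2", "MOD")],
    ],
}


def effo_add(histories, d):
    """Effo_Add(d): line histories whose first operation was made by d."""
    return sum(
        1
        for line_histories in histories.values()
        for hl in line_histories
        if hl[0].devel == d
    )


def effo_mod(histories, d):
    """Effo_Mod(d): every MOD operation made by d, over all lines and files."""
    return sum(
        1
        for line_histories in histories.values()
        for hl in line_histories
        for op in hl
        if op.devel == d and op.type == "MOD"
    )


for dev in ("d1", "d2"):
    print(dev, effo_add(histories, dev), effo_mod(histories, dev))
```

In this toy data both developers created two lines each, while `d2` made two modifications and `d1` only one.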


\[
\mathrm{Surv\_Add}(d) = \sum_{a \in (A - A^r)} \sum_{i=1}^{|H_a|}
\begin{cases}
1 & \text{if } o_1^{a,i}.\mathit{devel} = d \text{ and } \forall\, o_s^{a,i} \text{ with } s > 1, \\
  & (o_s^{a,i}.\mathit{type} = \mathrm{MOD} \text{ and } o_s^{a,i}.\mathit{devel} = d) \\
0 & \text{otherwise}
\end{cases}
\]

\[
\mathrm{Surv\_Mod}(d) = \sum_{a \in (A - A^r)} \sum_{i=1}^{|H_a|}
\begin{cases}
1 & \text{if } o_{end}^{a,i}.\mathit{type} = \mathrm{MOD} \text{ and } o_{end}^{a,i}.\mathit{devel} = d \\
  & \text{and } \exists\, w,\ 1 \le w < |hl_i^a|, \text{ such that } o_w^{a,i}.\mathit{devel} \ne d \\
0 & \text{otherwise}
\end{cases}
\]


\[
\mathrm{Surv\_Add\_Div\_Effo\_Add}(d) = \frac{\mathrm{Surv\_Add}(d)}{\mathrm{Effo\_Add}(d)}
\]
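The two survival counts that feed this kind of ratio can be sketched in the same spirit. The representation below (an `Op` tuple, a `histories` dictionary of line histories per file, and a `removed_files` set standing in for A^r) is hypothetical, as is the sample data; the conditions themselves follow the Surv_Add and Surv_Mod definitions.

```python
from collections import namedtuple

# Hypothetical record for one operation on a line.
Op = namedtuple("Op", ["devel", "type"])  # type is "ADD", "MOD" or "DEL"

histories = {
    "app.rb": [
        [Op("d1", "ADD")],                                    # never touched again
        [Op("d1", "ADD"), Op("d1", "MOD")],                   # d1 revised own line
        [Op("d1", "ADD"), Op("d2", "MOD")],                   # d2 changed d1's line
        [Op("d2", "ADD"), Op("d1", "MOD"), Op("d2", "MOD")],  # d2 had the last word
    ],
    "old.rb": [
        [Op("d1", "ADD")],                                    # file was removed later
    ],
}
removed_files = {"old.rb"}  # stands in for A^r


def surv_add(histories, removed, d):
    """Lines added by d in surviving files and afterwards only modified by d."""
    count = 0
    for path, line_histories in histories.items():
        if path in removed:
            continue
        for hl in line_histories:
            later_ok = all(op.type == "MOD" and op.devel == d for op in hl[1:])
            if hl[0].devel == d and later_ok:
                count += 1
    return count


def surv_mod(histories, removed, d):
    """Lines whose last operation is a MOD by d, after someone else touched them."""
    count = 0
    for path, line_histories in histories.items():
        if path in removed:
            continue
        for hl in line_histories:
            last = hl[-1]
            touched_by_other = any(op.devel != d for op in hl[:-1])
            if last.type == "MOD" and last.devel == d and touched_by_other:
                count += 1
    return count


for dev in ("d1", "d2"):
    print(dev, surv_add(histories, removed_files, dev),
          surv_mod(histories, removed_files, dev))
```

Here `d1` keeps two surviving additions (the line in the removed file does not count), and `d2` gets two surviving modifications because both were the final change to a line previously touched by `d1`.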


B. Uncovering and measuring relationships between developers

Relationships are named after a pair of consecutive operations on the same line, such as ADD_MOD. The pairs ADD_DEL, MOD_MOD and MOD_DEL are considered as well.

\[
\mathrm{Line\_Add\_Mod}(x, y) = \sum_{a \in A} \sum_{i=1}^{|H_a|}
\begin{cases}
1 & \text{if } |hl_i^a| > 1 \\
  & \text{and } o_1^{a,i}.\mathit{devel} = x \text{ and } o_1^{a,i}.\mathit{type} = \mathrm{ADD} \\
  & \text{and } o_2^{a,i}.\mathit{devel} = y \text{ and } o_2^{a,i}.\mathit{type} = \mathrm{MOD} \\
0 & \text{otherwise}
\end{cases}
\]


\[
\mathrm{Line\_Add\_\Sigma Mod}(d) = \sum_{y \in D - \{d\}} \mathrm{Line\_Add\_Mod}(d, y)
\]

\[
\mathrm{Line\_\Sigma Add\_Mod}(d) = \sum_{x \in D - \{d\}} \mathrm{Line\_Add\_Mod}(x, d)
\]
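A small sketch may help to see how the pairwise count and its two aggregations relate. It assumes the same hypothetical representation of line histories as ordered lists of operations (`Op` and `histories` are illustrative names, and the data is made up); the pair table is kept in a `Counter` keyed by `(x, y)`.

```python
from collections import Counter, namedtuple

# Hypothetical record for one operation on a line.
Op = namedtuple("Op", ["devel", "type"])  # type is "ADD", "MOD" or "DEL"

histories = {
    "app.rb": [
        [Op("d1", "ADD"), Op("d2", "MOD")],
        [Op("d1", "ADD"), Op("d2", "MOD"), Op("d3", "MOD")],
        [Op("d1", "ADD"), Op("d3", "MOD")],
        [Op("d2", "ADD"), Op("d1", "MOD")],
        [Op("d3", "ADD"), Op("d3", "MOD")],
        [Op("d1", "ADD")],
    ],
}


def line_add_mod(histories):
    """Counter mapping (x, y) to Line_Add_Mod(x, y)."""
    pairs = Counter()
    for line_histories in histories.values():
        for hl in line_histories:
            # Only the first two operations of a line history matter here.
            if len(hl) > 1 and hl[0].type == "ADD" and hl[1].type == "MOD":
                pairs[(hl[0].devel, hl[1].devel)] += 1
    return pairs


def line_add_sum_mod(pairs, d):
    """Line_Add_ΣMod(d): lines added by d and first modified by someone else."""
    return sum(n for (x, y), n in pairs.items() if x == d and y != d)


def line_sum_add_mod(pairs, d):
    """Line_ΣAdd_Mod(d): lines added by someone else and first modified by d."""
    return sum(n for (x, y), n in pairs.items() if y == d and x != d)


pairs = line_add_mod(histories)
print(dict(pairs))
print(line_add_sum_mod(pairs, "d1"), line_sum_add_mod(pairs, "d1"))
```

The pair `(d3, d3)` is counted by `line_add_mod` but left out of both sums, since they range over D − {d}.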


C. Extending the metrics for the file level

A project revision is a triple (r, d, L), where:

- r is the label of the revision;
- d is an identifier of the developer who made the revision, with d ∈ D;
- L is a list of pairs (a, t), where a is a file and t ∈ {A, M, D} describes the operation (addition, modification or deletion).

A project revision sequence is a sequence S = ⟨(r_1, d_1, L_1), (r_2, d_2, L_2), ..., (r_m, d_m, L_m)⟩ of project revisions that represents the history of changes made to the files of P, without going into detail about the changes made to their individual lines.


\[
\mathrm{File\_Add\_Mod}(x, y) = \sum_{a \in A}
\begin{cases}
1 & \text{if there are triples } (r_i, d_i, L_i) \text{ and } (r_j, d_j, L_j) \text{ in } S, \text{ with } i < j, \\
  & \text{such that } d_i = x,\ d_j = y,\ (a, A) \in L_i \text{ and } (a, M) \in L_j, \\
  & \text{and for which there is no triple } (r_k, d_k, L_k) \text{ with } i < k < j \\
  & \text{such that } (a, t) \in L_k \text{ for any operation of type } t \\
0 & \text{otherwise}
\end{cases}
\]


\[
\mathrm{File\_Add\_\Sigma Mod}(d) = \sum_{y \in D - \{d\}} \mathrm{File\_Add\_Mod}(d, y)
\]

\[
\mathrm{File\_\Sigma Add\_Mod}(d) = \sum_{x \in D - \{d\}} \mathrm{File\_Add\_Mod}(x, d)
\]
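The file-level counts can be sketched directly over a project revision sequence. In the sketch below, `S` is a made-up sequence of (r, d, L) triples and the function names are illustrative. The single pass keeps, for each file, the most recent revision that touched it, which is one way (not necessarily the authors') of enforcing the "no revision in between" condition.

```python
from collections import Counter

# Hypothetical revision sequence: (label, developer, [(file, op), ...]),
# where op is "A" (added), "M" (modified) or "D" (deleted).
S = [
    ("r1", "d1", [("a.rb", "A"), ("b.rb", "A")]),
    ("r2", "d2", [("a.rb", "M")]),
    ("r3", "d1", [("b.rb", "M"), ("c.rb", "A")]),
    ("r4", "d3", [("a.rb", "M"), ("c.rb", "M")]),
    ("r5", "d2", [("b.rb", "D")]),
]


def file_add_mod(S):
    """Counter mapping (x, y) to File_Add_Mod(x, y)."""
    last_touch = {}  # file -> (developer, op) of the latest revision touching it
    seen = set()     # (file, x, y), so a file counts at most once per pair
    for _label, dev, changes in S:
        for path, op in changes:
            prev = last_touch.get(path)
            # An M directly following an A on the same file forms an ADD_MOD pair.
            if op == "M" and prev is not None and prev[1] == "A":
                seen.add((path, prev[0], dev))
            last_touch[path] = (dev, op)
    return Counter((x, y) for _path, x, y in seen)


def file_add_sum_mod(pairs, d):
    """File_Add_ΣMod(d): files added by d and next modified by someone else."""
    return sum(n for (x, y), n in pairs.items() if x == d and y != d)


def file_sum_add_mod(pairs, d):
    """File_ΣAdd_Mod(d): files added by someone else and next modified by d."""
    return sum(n for (x, y), n in pairs.items() if y == d and x != d)


pairs = file_add_mod(S)
print(dict(pairs))
print(file_add_sum_mod(pairs, "d1"), file_sum_add_mod(pairs, "d3"))
```

In this example the second modification of `a.rb` (by `d3`) does not form a pair, because the operation immediately before it on that file was a modification, not the addition.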


D. Metrics regarding commits

\[
\mathrm{Commits}(x, y) = \sum_{i=1}^{|S|-1}
\begin{cases}
1 & \text{if triples } (r_i, d_i, L_i) \text{ and } (r_{i+1}, d_{i+1}, L_{i+1}) \\
  & \text{are such that } d_i = x \text{ and } d_{i+1} = y \\
0 & \text{otherwise}
\end{cases}
\]

\[
\mathrm{\Sigma Commits}(d) = \sum_{i=1}^{|S|}
\begin{cases}
1 & \text{if triple } (r_i, d_i, L_i) \text{ is such that } d_i = d \\
0 & \text{otherwise}
\end{cases}
\]
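Both commit metrics only look at the order of the developers in the revision sequence, so a sketch needs very little machinery. The sequence `S` below is invented and the function names are illustrative; file lists are left empty because these two metrics ignore them.

```python
from collections import Counter

# Hypothetical revision sequence of (label, developer, file operations).
S = [
    ("r1", "d1", []),
    ("r2", "d1", []),
    ("r3", "d2", []),
    ("r4", "d1", []),
    ("r5", "d3", []),
    ("r6", "d3", []),
]


def commits_pairs(S):
    """Counter mapping (x, y) to Commits(x, y): y committed right after x."""
    developers = [dev for _label, dev, _changes in S]
    return Counter(zip(developers, developers[1:]))


def sum_commits(S, d):
    """ΣCommits(d): number of revisions made by d."""
    return sum(1 for _label, dev, _changes in S if dev == d)


pairs = commits_pairs(S)
print(dict(pairs))
print({d: sum_commits(S, d) for d in ("d1", "d2", "d3")})
```

A sequence of |S| revisions always yields |S| − 1 consecutive pairs, which gives a quick sanity check on the pair table.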


\[
\mathrm{Metric\_Rel}(d) = \frac{\mathrm{Metric}(d)}{\sum_{x \in D} \mathrm{Metric}(x)}
\]
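To make the relative form concrete, the sketch below applies it to the per-developer commit counts reported in the case study (474, 159, 2, ... for d1 to d11, totalling 1,294). The function name `relative` is illustrative, and the sketch assumes the metric does not sum to zero.

```python
# Commit counts per developer, as reported in the case-study table.
commits = {
    "d1": 474, "d2": 159, "d3": 2, "d4": 170, "d5": 30, "d6": 99,
    "d7": 61, "d8": 183, "d9": 20, "d10": 24, "d11": 72,
}


def relative(metric):
    """Metric_Rel: each developer's share of the total over all developers."""
    total = sum(metric.values())  # assumed to be non-zero
    return {d: value / total for d, value in metric.items()}


rel = relative(commits)
for d, share in rel.items():
    print(f"{d}: {share:.3f}")
```

By construction the relative values of a metric add up to 1, so they can be read as shares of the team's total.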


Comparison of the developers

A. Performance-based hierarchy

All metrics should have the same orientation (for instance, a higher value always indicating better performance).


B. Similarity Comparison
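The content of this slide was a figure that did not survive extraction. The case study refers to an MDS visualization in which the distance between two developers reflects how similar they are, so the sketch below only illustrates the kind of pairwise dissimilarity such a visualization takes as input. The choice of min-max scaling and Euclidean distance is an assumption, not something stated in the slides; the values are the Surv_Add and Surv_Mod figures of four developers from the case study.

```python
import math

# (Surv_Add, Surv_Mod) for a few developers, taken from the case-study table.
metrics = {
    "d1": (102817, 539),
    "d6": (50630, 479),
    "d8": (83409, 1302),
    "d3": (0, 0),
}


def min_max_normalize(metrics):
    """Rescale every metric to [0, 1] so that no single metric dominates."""
    columns = list(zip(*metrics.values()))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return {
        name: tuple(
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(vector, lows, highs)
        )
        for name, vector in metrics.items()
    }


def distance_matrix(metrics):
    """Euclidean distance between the scaled metric vectors of each pair."""
    scaled = min_max_normalize(metrics)
    return {
        (p, q): math.dist(scaled[p], scaled[q])
        for p in scaled
        for q in scaled
    }


distances = distance_matrix(metrics)
for (p, q), value in distances.items():
    if p < q:
        print(f"{p}-{q}: {value:.3f}")
```

An MDS layout would then place developers in the plane so that these distances are preserved as well as possible; that step is left out here.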


The case study

Evaluating the metrics and the comparison approaches with qualitative assessment on a real software-development project.

The software: Weby

- A content management system built by UFG.
- Hosts more than 400 internal web sites¹.
- Period considered: 1 year and 7 months.
- Eleven (11) developers contributed to the evolution of the source code.
- One developer was also the project manager.
- 1,294 code revisions in the VCS (Subversion) of UFG.

¹ Available at https://github.com/cercomp/weby.


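| D.    | Commits | Files Add. | Files Mod. | Files Del. | Lines Add. | Lines Mod. | Lines Del. |
|-------|---------|------------|------------|------------|------------|------------|------------|
| d1    | 474     | 482        | 1,807      | 64         | 110,204    | 7,026      | 54,710     |
| d2    | 159     | 47         | 453        | 4          | 4,340      | 1,531      | 1,587      |
| d3    | 2       | 0          | 6          | 0          | 26         | 31         | 165        |
| d4    | 170     | 314        | 585        | 12         | 44,013     | 1,577      | 1,224      |
| d5    | 30      | 43         | 78         | 1          | 1,736      | 142        | 205        |
| d6    | 99      | 333        | 367        | 17         | 51,673     | 1,548      | 3,220      |
| d7    | 61      | 12         | 379        | 15         | 1,116      | 923        | 1,214      |
| d8    | 183     | 848        | 783        | 29         | 85,686     | 4,688      | 5,289      |
| d9    | 20      | 1          | 34         | 0          | 102        | 398        | 15         |
| d10   | 24      | 8          | 74         | 5          | 542        | 196        | 476        |
| d11   | 72      | 7          | 199        | 4          | 1,190      | 489        | 308        |
| Total | 1,294   | 2,095      | 4,765      | 151        | 300,628    | 18,549     | 68,413     |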


The evaluation was conducted through two assessments involving four steps each:

1. Calculation of the values of a set of metrics for all developers.
2. Computation of the hierarchy of classes and the MDS visualization.
3. Interview with the project manager, aiming to verify whether the classes and the visualization produced by the comparison approaches match his/her perception of the developers.
4. Analysis and interpretation of the results obtained from the interview.


Interview Form

Interviewee name:
Project name:
Position:
Education:
Place and date:

1. Explain the existing data and the metrics. (Explain what the developed system does.)
2. Present the classification by dominance class. (Explain the meaning of each class.)
3. Questions about the dominance classes.
   a) "Does this separation make sense to you?"
   b) "If you were to choose one or more developers for a future project, would this classification help? Why? Which developers would you choose?"
   c) "Would you classify the developers in this same way? Why? If not, what would your classification be?"
   d) "Is there any developer who you think was classified incorrectly?"
4. Present the MDS visualization. (Explain what the distance between two developers means.)
5. Questions about the MDS visualization.
   e) "Are the developers who are close to each other in fact similar in their technical output? Do they produce similar results?"
   f) "How would you label (name, based on some similarity characteristic) the "groups" of visibly close people?"
   g) "Is there any discrepancy or similarity between the results of the dominance classes, presented earlier, and the current MDS visualization?"
6. Questions about the complete set of metrics.
   h) "Do you agree that the higher the value obtained in each of these 4 metrics, the better the developer's performance? Why?"
   i) "Which other metrics (from the complete spreadsheet) do you find interesting/useful for evaluating the developers? Why?"


A. Metrics and comparisons computed in the first assessment

| D.  | Surv_Add | Surv_Mod | Surv_Add_Div_Effo_Add | Surv_Mod_Div_Effo_Dist_Mod |
|-----|----------|----------|-----------------------|----------------------------|
| d1  | 102,817  | 539      | 0.932                 | 0.253                      |
| d2  | 3,188    | 294      | *0.734                | *0.609                     |
| d3  | 0        | 0        | 0.000                 | 0.000                      |
| d4  | 41,929   | 410      | 0.952                 | 0.455                      |
| d5  | 1,185    | 21       | *0.682                | *0.437                     |
| d6  | 50,630   | 479      | 0.979                 | *0.807                     |
| d7  | 483      | 163      | *0.432                | *0.612                     |
| d8  | 83,409   | 1,302    | 0.973                 | 0.632                      |
| d9  | 55       | 211      | *0.539                | *0.875                     |
| d10 | 225      | 43       | *0.415                | *0.605                     |
| d11 | 1,053    | 315      | *0.884                | *0.734                     |
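As a consistency check on these figures, the sketch below recomputes Surv_Add_Div_Effo_Add from the Surv_Add column and the "Lines Add." column of the earlier totals table. Treating "Lines Add." as Effo_Add, and truncating (rather than rounding) to three decimals, are assumptions; under them the reported ratios are reproduced. Values are handled in thousandths with integer arithmetic to avoid floating-point surprises.

```python
# Surv_Add per developer (first-assessment table).
surv_add = {
    "d1": 102817, "d2": 3188, "d3": 0, "d4": 41929, "d5": 1185, "d6": 50630,
    "d7": 483, "d8": 83409, "d9": 55, "d10": 225, "d11": 1053,
}

# "Lines Add." per developer (totals table), assumed here to be Effo_Add.
effo_add = {
    "d1": 110204, "d2": 4340, "d3": 26, "d4": 44013, "d5": 1736, "d6": 51673,
    "d7": 1116, "d8": 85686, "d9": 102, "d10": 542, "d11": 1190,
}

# Reported Surv_Add_Div_Effo_Add, in thousandths (0.932 -> 932), stars dropped.
reported = {
    "d1": 932, "d2": 734, "d3": 0, "d4": 952, "d5": 682, "d6": 979,
    "d7": 432, "d8": 973, "d9": 539, "d10": 415, "d11": 884,
}


def ratio_thousandths(surv, effo):
    """Surv / Effo truncated to three decimals, expressed in thousandths."""
    return (surv * 1000) // effo


computed = {d: ratio_thousandths(surv_add[d], effo_add[d]) for d in surv_add}
mismatches = {d for d in computed if computed[d] != reported[d]}
print("mismatches:", sorted(mismatches))
```

With ordinary rounding instead of truncation, d1 would come out as 0.933 rather than the reported 0.932.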


| Equivalence Classes | Developers |
|---------------------|------------|
| 1                   | d1, d6, d8 |
| 2                   | d4         |
| 3                   | d2, d11    |
| 4                   | d5, d7, d9 |
| 5                   | d10        |
| 6                   | d3         |
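The slides do not spell out how these classes are derived, and the meaning of the asterisks in the first-assessment metrics is not stated. One reading that reproduces the six classes above is to peel off successive non-dominated (Pareto) fronts over the four metrics, with every starred ratio entered as 0. Both the procedure and the treatment of starred values are assumptions made for this sketch, and the function names are illustrative.

```python
# (Surv_Add, Surv_Mod, Surv_Add_Div_Effo_Add, Surv_Mod_Div_Effo_Dist_Mod).
# Starred ratios from the table are entered as 0.0 (an assumption).
metrics = {
    "d1": (102817, 539, 0.932, 0.253),
    "d2": (3188, 294, 0.0, 0.0),
    "d3": (0, 0, 0.0, 0.0),
    "d4": (41929, 410, 0.952, 0.455),
    "d5": (1185, 21, 0.0, 0.0),
    "d6": (50630, 479, 0.979, 0.0),
    "d7": (483, 163, 0.0, 0.0),
    "d8": (83409, 1302, 0.973, 0.632),
    "d9": (55, 211, 0.0, 0.0),
    "d10": (225, 43, 0.0, 0.0),
    "d11": (1053, 315, 0.0, 0.0),
}


def dominates(p, q):
    """True if p is at least as good as q everywhere and better somewhere."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def dominance_classes(metrics):
    """Repeatedly extract the developers that nobody remaining dominates."""
    remaining = dict(metrics)
    classes = []
    while remaining:
        front = [
            name
            for name, vector in remaining.items()
            if not any(dominates(other, vector) for other in remaining.values())
        ]
        classes.append(front)
        for name in front:
            del remaining[name]
    return classes


for rank, members in enumerate(dominance_classes(metrics), start=1):
    print(rank, ", ".join(members))
```

This only works if all metrics share the same orientation (higher is better), since `dominates` compares every coordinate in the same direction.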


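| Equivalence Classes | Developers [first] | Developers [second] |
|---------------------|--------------------|---------------------|
| 1                   | d1, d6, d8         | d1, d6, d4, d8      |
| 2                   | d4                 | d2, d11             |
| 3                   | d2, d11            | d5, d7, d9          |
| 4                   | d5, d7, d9         | d10                 |
| 5                   | d10                | d3                  |
| 6                   | d3                 |                     |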


Conclusion

- We presented new formal definitions and metrics that allow the extraction of basic but important information from projects hosted in VCSs.
- We considered measures of effort and code-survival.
- Two approaches were suggested for comparing the developers.
- A case study with a real software project was carried out.
- The results showed the usefulness of the metrics and of the comparison approaches.
- The new metrics may help to unveil interesting facts.
- There are, however, limitations in the use of VCS data. The logs are in general incomplete and can lead to ambiguous interpretation.
- We tried to compensate for this weakness by involving the project manager.


Future Work

Future investigations include:

- formulating new metrics;
- using other techniques to compare the developers;
- improving the diff analysis for detecting other types of operation;
- exploring more sources of data.


Questions?


References

Enrico Di Bella, Alberto Sillitti, and Giancarlo Succi. A multivariate classification of open source developers. Information Sciences, 221:72–83, February 2013. ISSN 0020-0255. doi: 10.1016/j.ins.2012.09.031.

Eric Gilbert and Karrie Karahalios. CodeSaw: A social visualization of distributed software development. In Proceedings of the 11th IFIP TC 13 International Conference on Human-Computer Interaction, Volume Part II, INTERACT '07, pages 303–316, Berlin, Heidelberg, 2007. Springer-Verlag. ISBN 3-540-74799-0, 978-3-540-74799-4.

Tudor Girba, Adrian Kuhn, Mauricio Seeberger, and Stéphane Ducasse. How Developers Drive Software Evolution. In Proceedings of the Eighth International Workshop on Principles of Software Evolution, IWPSE '05, pages 113–122, Washington, DC, USA, 2005. IEEE Computer Society. ISBN 0-7695-2349-8. doi: 10.1109/IWPSE.2005.21.

Shih-Kun Huang and Kang-min Liu. Mining version histories to verify the learning process of legitimate peripheral participants. SIGSOFT Software Engineering Notes, 30(4):1–5, May 2005. ISSN 0163-5948. doi: 10.1145/1082983.1083158.

Andrejs Jermakovics, Alberto Sillitti, and Giancarlo Succi. Mining and visualizing developer networks from version control systems. In Proceedings of the 4th International Workshop on Cooperative and Human Aspects of Software Engineering, CHASE '11, pages 24–31, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0576-1. doi: 10.1145/1984642.1984647.

Luis Lopez-Fernandez, Gregorio Robles, and Jesus M. Gonzalez-Barahona. Applying Social Network Analysis to the Information in CVS Repositories. In First International Workshop on Mining Software Repositories, pages 101–105, 2004.

Shawn Minto and Gail C. Murphy. Recommending emergent teams. In Proceedings of the Fourth International Workshop on Mining Software Repositories, MSR '07, page 5, Washington, DC, USA, 2007. IEEE Computer Society. ISBN 0-7695-2950-X. doi: 10.1109/MSR.2007.27.

Audris Mockus and James D. Herbsleb. Expertise browser: A quantitative approach to identifying expertise. In Proceedings of the 24th International Conference on Software Engineering, ICSE '02, pages 503–512, New York, NY, USA, 2002. ACM. ISBN 1-58113-472-X. doi: 10.1145/581339.581401.

Stas Negara, Mohsen Vakilian, Nicholas Chen, Ralph E. Johnson, and Danny Dig. Is It Dangerous to Use Version Control Histories to Study Source Code Evolution? In James Noble, editor, ECOOP 2012 - Object-Oriented Programming, volume 7313 of Lecture Notes in Computer Science, pages 79–103. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-31056-0. doi: 10.1007/978-3-642-31057-7_5.

David Schuler and Thomas Zimmermann. Mining usage expertise from version archives. In Proceedings of the 2008 International Working Conference on Mining Software Repositories, MSR '08, pages 121–124, New York, NY, USA, 2008. ACM. ISBN 978-1-60558-024-1. doi: 10.1145/1370750.1370779.

L. Voinea, J. Lukkien, and A. Telea. Visual Assessment of Software Evolution. Science of Computer Programming, 65(3):222–248, April 2007. ISSN 0167-6423.

Lucian Voinea and Alexandru Telea. An Open Framework for CVS Repository Querying, Analysis and Visualization. In Proceedings of the 2006 International Workshop on Mining Software Repositories, MSR '06, pages 33–39, New York, NY, USA, May 2006. ACM Press. ISBN 1595933972. doi: 10.1145/1137983.1137993.

Shen Zhang, Yongji Wang, and Junchao Xiao. Mining Individual Performance Indicators in Collaborative Development Using Software Repositories. In 15th Asia-Pacific Software Engineering Conference, APSEC '08, pages 247–254, December 2008a. doi: 10.1109/APSEC.2008.12.

Shen Zhang, Yongji Wang, Ye Yang, and Junchao Xiao. Capability assessment of individual software development processes using software repositories and DEA. In Proceedings of the Software Process, 2008 International Conference on Making Globally Distributed Software Development a Success Story, ICSP '08, pages 147–159, Berlin, Heidelberg, 2008b. Springer-Verlag. ISBN 3-540-79587-1, 978-3-540-79587-2.
