Empirical Validation of Pair Programming

Corrado Aaron Visaggio, Research Centre on Software Technology (RCOST), University of Sannio, Benevento, Italy. PhD Symposium, ICSE 2005


Outline: Introduction, 2P Economics, 2P and Knowledge, Distributing 2P, Conclusions

Motivation

• Plan-driven approaches to software development can fail in contexts where:
  • the availability of resources varies unpredictably;
  • the time pressure is much stronger than expected;
  • the requirements of the system under development are emerging or unstable.
• Some alternatives have been explored to face such situations while preserving the quality of process and product: Boehm's spiral process, Rapid Application Development, Rational Unified Process (...).
• There was an urgent need for greater flexibility than these processes offered.
• In recent decades, Agile Methods for software development burst onto the scene, proposing a radically different way to manage the software process.

The Problem (1/2)

In 2001 the Agile Manifesto was published, defining the novel “agile way” of software production with four principles:
• Individuals and interactions over processes and tools
• Working software over comprehensive documentation
• Customer collaboration over contract negotiation
• Responding to change over following a plan

The doubt: may agile methods undermine the engineering rigor and discipline achieved with the plan-driven approach?
• Individuals and interactions over processes and tools: does the process remain repeatable?
• Working software over comprehensive documentation: does the process remain repeatable? Is the process measurable?
• Customer collaboration over contract negotiation: what happens to the product's quality when the architecture emerges from the process?
• Responding to change over following a plan: is it possible to produce dependable estimates for the process?

The problem (2/2) and the research goals

There is no broad consensus on one relevant issue: is it worth adopting agile methods when developing software, or is it too risky, given that they conflict with some good practices of software engineering?

It was not feasible to deal with the entire set of agile practices within the space of a thesis: Pair Programming (2P) was selected as the focus of my investigation.

Pair Programming was analysed along three dimensions:
• Suitable contexts: is 2P suitable for distributed processes?
• Costs/benefits: is 2P advantageous in terms of return on investment?
• Specific benefits: is 2P helpful for knowledge leveraging?

The Research Plan and Method

Purpose of the research: to validate pair programming, by empirical investigation, along three dimensions: suitable contexts, cost/benefit ratio, and specific benefits.

Steps shown in the plan diagram:
• Establish research questions: success factors.
• Controlled experiments with students: defect removal.
• Decision point: Bugs? (yes / no)
• Controlled experiments with professionals: confidence of industry.
• Field experiments: dependable results.

Phase labels in the diagram: thesis, technological transfer, post-doc.

The First Dimension

[Diagram: the three dimensions of Pair Programming: suitable contexts of the practice, cost/benefit ratio, specific benefits of the practice.]

Warning: this research activity is still ongoing!

• Does Pair Programming cost more than Solo Programming?
• Is Pair Programming more beneficial than Solo Programming in terms of quality achieved?

Productivity and quality

Research Objects:
• Productivity: pair programming is supposed to speed up production cycles.
• Quality: pair programming is supposed to increase the quality of the code's modules and of the overall architecture of the system.

Research Question:
• Can pair programming improve the performance of project teams in terms of productivity and quality?

Experiments:
• An experiment at the University of Sannio, Benevento, Italy.

Hypotheses:
• H0a: pair programming does not affect the speed of programming.
• H0b: pair programming does not affect the quality of code and architecture.

The Experiment Outlook

An initial experiment on the productivity of pair programming suggests that pair programming can speed up production cycles. 60 subjects (graduate students of Computer Engineering) are grouped in teams of two kinds: paired-programmer teams and solo-programmer teams. Each team is responsible for developing a system for software requirements traceability. The teams follow an incremental process: at each iteration they receive a new group of features to implement, and each iteration corresponds to a point of observation. This experimentation is still ongoing.

[Timeline figure: kick-off followed by four iterations (1st to 4th), each with its own group of features (1st to 4th); labels: points of data collection, demos of the teams' products.]

The Second Dimension

[Diagram: the three dimensions of Pair Programming: suitable contexts of the practice, cost/benefit ratio, specific benefits of the practice.]

• Is Pair Programming an effective means for diffusing and enforcing design knowledge in a project's team?

Knowledge Transfer

One of the expected benefits of pair programming is fostering knowledge transfer. Software design requires efficient management of knowledge at team level, and documentation is not enough because:
• problem-solving strategies are scarcely captured;
• it is necessary to deal with different levels of abstraction: implementation, database, business logic, presentation, deployment, interaction with other systems, and communication protocols;
• documentation has a very low bandwidth: face-to-face communication can be more effective and time-saving.

Could pair designing be an appropriate alternative for diffusing and enforcing knowledge of the software system among the project team's members?

The Experimentation

Research Objects:
• Diffusing knowledge: disseminating knowledge within the project team (initial phases of the project).
• Enforcing knowledge: improving the individual knowledge of the project's participants (advanced phases of the project).

Research Question:
• Is pair designing effective for diffusing and improving knowledge within project teams?

Experiments:
• An explorative experiment (demonstrating that pair design can foster knowledge leveraging).
• One experiment at the University of Sannio, Benevento, Italy.
• A replica at the University of Castilla-La Mancha, Ciudad Real, Spain.

Hypotheses:
• H0a: pair designing does not affect the diffusion of design knowledge when performing evolution tasks.
• H0b: pair designing does not affect the improvement of design knowledge when performing evolution tasks.

The Experiment

Experimental Design, Experiment #1 (Italy)

  Treatment  Subjects
  Paired     5 MUTEGS + 5 MUTEGS; 5 MUTS + 5 MUTS
  Solo       8 MUTS; 8 MUTEGS

  Input:  Requirement specification; use case diagram; class diagram; entry questionnaire QA (or QB); exit questionnaire QB (or QA).
  Output: Modifications to the use case diagram and class diagram; answered entry questionnaire QA (or QB); answered exit questionnaire QB (or QA).

Experimental Design, Experiment #2 (Spain)

  Treatment  Subjects
  Paired     64 students (3BScMngmt-3BScMngmt, 3BScSys-3BScSys, 5MSc-5MSc)
  Solo       32 students (3BScMngmt, 3BScSys, 5MSc)

  Input:  Requirement specification; use case diagram; class diagram; entry questionnaire QA (or QB); exit questionnaire QB (or QA).
  Output: Modifications to the use case diagram and class diagram; answered entry questionnaire QA (or QB); answered exit questionnaire QB (or QA).

The Experiment's Process

1. Each subject studied the documentation for 30 minutes, individually.
2. Each subject answered an entry questionnaire, individually, for about 15 minutes.
3. The pairs and the solo designers performed the maintenance tasks for 2 hours.
4. Each subject answered an exit questionnaire, individually.

The Randomisation Tests

Tests between the entry questionnaires of the two samples (α and β):

  Experiment  Test between                                       Rank Sum α  Rank Sum β  p-level
  Italian     Subjects of MUTS Pairs (α) vs MUTS Solos (β)       171,000     39,000      0,214768
  Italian     Subjects of MUTEGS Pairs (α) vs MUTEGS Solos (β)   112,000     59,000      0,130919
  Spanish     Solos 3BScSys (α) vs Pairs 3BScSys (β)             425,000     395,000     0,214741
  Spanish     Solos 5MSc (α) vs Pairs 5MSc (β)                   31,500      46,500      0,229767
  Spanish     Solos 3BScMngmnt (α) vs Pairs 3BScMngmnt (β)       425,000     395,000     0,321966

In both experiments, the samples of pairs and those of solos were formed by equivalent subjects.
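The rank sums reported on this slide are the basic quantities of a rank-sum comparison between two samples. As a minimal sketch, assuming the questionnaire scores are plain numbers and that tied scores share their average rank, the two rank sums could be computed as follows. The function name `rank_sums` and the scores are hypothetical and are not data from the experiments.

```python
def rank_sums(alpha, beta):
    """Return (rank sum of alpha, rank sum of beta), averaging ranks over ties."""
    pooled = sorted(list(alpha) + list(beta))
    # Collect the 1-based positions occupied by each distinct value.
    positions = {}
    for index, value in enumerate(pooled, start=1):
        positions.setdefault(value, []).append(index)
    # A tied value receives the mean of the positions it occupies.
    average_rank = {v: sum(p) / len(p) for v, p in positions.items()}
    sum_alpha = sum(average_rank[v] for v in alpha)
    sum_beta = sum(average_rank[v] for v in beta)
    return sum_alpha, sum_beta


# Hypothetical entry-questionnaire scores for two samples.
pairs_scores = [5, 6, 6, 7]
solos_scores = [4, 5, 6]
sum_a, sum_b = rank_sums(pairs_scores, solos_scores)
n = len(pairs_scores) + len(solos_scores)
# The two rank sums always add up to n(n+1)/2.
assert sum_a + sum_b == n * (n + 1) / 2
print(sum_a, sum_b)
```

Similar rank sums for the two samples, relative to their sizes, are what one expects when the samples are drawn from equivalent subjects.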


The Statistical Tests: Knowledge Diffusion

  Experiment  Test between                                   Rank Sum α  Rank Sum β  p-level
  Italian     MUTS Pairs (α) vs MUTS Solos (β)               116,500     54,50       0,049
  Italian     MUTEGS Pairs (α) vs MUTEGS Solos (β)           78,50       57,50       0,270
  Italian     MUTS Pairs (α) vs MUTEGS Pairs (β)             135,00      75,00       0,023
  Spanish     Pairs 5MSc (α) vs Solos 5MSc (β)               51,500      26,500      0,030912
  Spanish     Pairs 3BScSys (α) vs Solos 3BScSys (β)         253,000     567,000     0,00017
  Spanish     Pairs 3BScMngmnt (α) vs Solos 3BScMngmnt (β)   447,000     778,00      0,00000

Results and Interpretation:
• Empirical evidence: pairs outperformed solos; pair design is a candidate means for diffusing knowledge.
• Side effect: the success of pair design in diffusing knowledge may depend on individual skills.
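A quick arithmetic check can be applied to any pair of reported rank sums: because the ranks 1..N are shared between the two samples, the two rank sums must add up to N(N+1)/2. The sketch below recovers N from a pair of rank sums. The helper name `total_sample_size` is hypothetical. Applied to the first pair of rank sums reported on this slide (116,5 and 54,5), it gives N = 18, but it says nothing about how the subjects were split between the two samples.

```python
import math


def total_sample_size(rank_sum_alpha, rank_sum_beta):
    """Solve N(N+1)/2 = alpha + beta for N; return None if no integer N fits."""
    total = rank_sum_alpha + rank_sum_beta
    # Positive root of N^2 + N - 2*total = 0.
    n = (-1 + math.sqrt(1 + 8 * total)) / 2
    rounded = round(n)
    return rounded if rounded * (rounded + 1) / 2 == total else None


# 116.5 + 54.5 = 171 = 18 * 19 / 2
print(total_sample_size(116.5, 54.5))
```

A result of `None` would signal that a pair of rank sums cannot both be correct as transcribed.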


The Statistical Tests: Knowledge Improving

  Experiment  Test between                                                   Rank Sum α  Rank Sum β  p-level
  Italian     MUTS Pairs (α) vs MUTS Solos (β)                               123,500     47,500      0,0102
  Italian     MUTEGS Pairs (α) vs MUTEGS Solos (β)                           53,500      66,500      0,2164
  Italian     MUTS Pairs (α) vs MUTEGS Pairs (β)                             110,500     42,500      0,0428
  Spanish     Spanish Pairs 3BScSys (α) vs Spanish Solos 3BScSys (β)         49,500      28,500      0,086984
  Spanish     Spanish Pairs 5MSc (α) vs Spanish Solos 5MSc (β)               551,000     269,000     0,000942
  Spanish     Spanish Pairs 3BScMngmnt (α) vs Spanish Solos 3BScMngmnt (β)   51,500      26,500      0,042337

Results and Interpretation:
• Empirical evidence: confirmation of the knowledge diffusion results:
  • pair design is a candidate means for improving knowledge;
  • the success of pair design in improving knowledge may depend on individual skills.


Qualitative Analysis

  Experiment  Sample            Std Dev.  Average  Max   Min
  Italian     MUTS Pairs        1,75      5,8      9     4
  Italian     MUTEGS Pairs      1,60      3,9      7     1
  Italian     MUTS Solos        1,03      4,25     6,00  3,00
  Italian     MUTEGS Solos      1,55      5,13     7,00  3,00
  Spanish     Pairs 3BScSys     1,02      6,00     7,00  3,00
  Spanish     Solos 3BScSys     1,26      4,44     6,00  3,00
  Spanish     Pairs 5MSc        0,98      6,17     7,00  5,00
  Spanish     Solos 5MSc        0,82      5,33     7,00  5,00
  Spanish     Pairs 3BScMngmnt  0,73      6,30     8,00  5,00
  Spanish     Solos 3BScMngmnt  0,94      4,21     5,00  1,00

Statistical parameters per sample:

  Sample            average  max    min     std dev
  MUTS Pairs        2,000    5,000  -1,000  1,915
  MUTS Solos        -1,400   2,000  -3,000  2,074
  MUTEGS Pairs      -0,800   1,000  -3,000  1,643
  MUTEGS Solos      -0,750   1,000  -2,000  1,500
  5MSc Pairs        1,167    3,000  -1,000  1,722
  5MSc Solos        -0,500   3,000  -2,000  1,871
  3BScSys Pairs     1,714    4,000  -1,000  1,736
  3BScSys Solos     -0,579   3,000  -4,000  1,865
  3BScMngmnt Pairs  1,111    3,000  -1,000  1,278
  3BScMngmnt Solos  -1,036   2,000  -5,000  1,856
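The four statistical parameters reported on this slide (average, max, min, std dev) can be computed with Python's standard `statistics` module. This is a minimal sketch on hypothetical scores. The slides do not state whether the sample or the population standard deviation was used, so the sample version (`statistics.stdev`) is an assumption, and `summarise` is an illustrative name.

```python
import statistics


def summarise(scores):
    """Return average, max, min and sample standard deviation of the scores."""
    return {
        "average": statistics.mean(scores),
        "max": max(scores),
        "min": min(scores),
        # Sample standard deviation (n - 1 denominator); an assumption.
        "std dev": statistics.stdev(scores),
    }


# Hypothetical per-subject scores, not data from the experiments.
print(summarise([2, 4, 4, 6]))
```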

The Questionnaires

Two questionnaires were used to evaluate the knowledge built.

  Test between                                                   Rank Sum α  Rank Sum β  p-level
  Questionnaire A (α) vs Questionnaire B (β) in the experiment   540,00      406,00      0,161
  Questionnaire A (α) vs Questionnaire B (β) in the replica      598,00      677,00      0,2068

The experiment results were independent of the specific questionnaire used.

Conclusions

Pair designing is helpful for:
• diffusing knowledge when the team is not familiar with the project, in the initial phases;
• improving knowledge when the team needs a better and deeper understanding of the project, in the advanced phases.

The results and performance of pair designing may depend on the individual skills of the pair's members.

The Third Dimension

[Diagram: the three dimensions of Pair Programming: suitable contexts of the practice, cost/benefit ratio, specific benefits of the practice.]

• Are distributed processes suitable for pair programming?

How Distribution Affects Pair Programming Benefits

More-than-emerging trends:
• Global Software Development: 24h production cycles, reduced costs of resources, and enhanced mobility.
• Global software development process; virtual teaming.

Pair programming:
• Pair Programming increases software quality without significantly increasing development time.

Distribution hinders fluidity of communication and comfort of collaboration...
... what is the impact on working practices that rely on communication and collaboration (C&C)?

Experimentation

Research Objects:
• Quality: Pair Programming helps to achieve high-quality code, thanks to concurrent reviews of code and design.
• Performance: the pair's work speeds up production, thanks to intense collaboration.

Research Questions:
• RQ1: Are there significant differences in effort when the pair's components are distributed, compared with co-located pair's components?
• RQ2: Are there significant differences in the quality produced when the pair's components are distributed, compared with co-located pair's components?

Experiment:
• Subjects were volunteer students from the Universities of Sannio and Naples.

Hypotheses

Null hypotheses
• H0RQ1: there is no significant difference in the effort required for implementing modifications between distributed pair programming and co-located pair programming: $\mu_{\text{distr\_time}} = \mu_{\text{co-loc\_time}}$
• H0RQ2: there is no significant difference between the quality of the maintenance performed: $\mu_{\text{distr\_quality}} = \mu_{\text{co-loc\_quality}}$

Alternative hypotheses
• H1RQ1: a significant difference in the effort required for implementing modifications between distributed pair programming and co-located pair programming does exist: $\mu_{\text{distr\_time}} \neq \mu_{\text{co-loc\_time}}$
• H1RQ2: a significant difference between the quality of the maintenance realised does exist: $\mu_{\text{distr\_quality}} \neq \mu_{\text{co-loc\_quality}}$

Experiment’s Characterisation (1/2)

• Effort spent: measured as the difference between the start time and the end time required to accomplish the maintenance tasks. Ratio scale.
• Quality of the maintenance realised: a scoring function counting the successful test cases. Ordinal scale.

Subjects were trained with: an introductory seminar (4 hrs), lab exercises (2 hrs), a trial run (2 hrs), an assessment seminar (2 hrs).

Documentation given to students:
• listings of the programs
• textual description of the maintenance tasks
• time sheet to fill in
• description of the correct execution of pair programming roles
• questionnaire to be compiled at the end of the experiment
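The two measures defined on this slide are simple to express in code. The following is a minimal sketch, assuming the time sheet records start and end times as `HH:MM` strings on the same day and that each test case yields a pass/fail boolean. The function names and the sample values are hypothetical.

```python
from datetime import datetime


def effort_minutes(start, end, fmt="%H:%M"):
    """Effort as the difference between the end time and the start time, in minutes."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60


def quality_score(test_results):
    """Quality as the number of successful test cases."""
    return sum(1 for passed in test_results if passed)


# Hypothetical time-sheet entry and test outcomes for one pair.
print(effort_minutes("09:15", "11:45"))          # 150.0
print(quality_score([True, True, False, True]))  # 3
```

The count of passed test cases only orders the results (more passed tests is better), which is consistent with treating quality on an ordinal scale and comparing groups with rank-based tests.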

Experiment's Characterisation (2/2)

Technological platform

  Tool        Function                                              Purpose        Motivation
  VNC         Share the desktop: it allows remote control of a PC.  Collaboration  The experimenters had experience in using it in previous projects; open source.
  NetMeeting  Text chat.                                            Communication  Its usage was well known to all the experimental subjects.
  JBuilder    IDE for Java programs.                                Programming    Subjects had experience in using it in previous projects.

Experimental design

  Group    Round I (P1)  Round II
  Group A  Co-located    Distributed
  Group B  Distributed   Co-located

Tests and Results

  Group    Round I      Round II (P2)
  Group A  Co-located   Distributed
  Group B  Distributed  Co-located

Mann-Whitney tests:

  Test              p-level  Description
  Effort round I    0,564    Mann-Whitney test on effort data between Group A (co-located) and Group B (distributed) in round I.
  Effort round II   1,000    Mann-Whitney test on effort data between Group A (distributed) and Group B (co-located) in round II.
  Quality round I   0,465    Mann-Whitney test on quality data between Group A (co-located) and Group B (distributed) in round I.
  Quality round II  0,011    Mann-Whitney test on quality data between Group A (distributed) and Group B (co-located) in round II.

Only the round II quality results are statistically significant.
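For small samples, a two-sided Mann-Whitney test can be computed exactly by enumerating every way of splitting the pooled observations into two groups. The sketch below is one such implementation under stated assumptions. It is not necessarily the procedure used to obtain the p-levels reported on this slide (a statistics package may use a normal approximation instead), and the function names and data are hypothetical.

```python
from itertools import combinations


def u_statistic(alpha, beta):
    """Mann-Whitney U of alpha: wins of alpha over beta, ties counted as half."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in alpha for b in beta)


def exact_mann_whitney(alpha, beta):
    """Return (U, exact two-sided p-value) by enumerating all relabellings."""
    pooled = list(alpha) + list(beta)
    n1, n2 = len(alpha), len(beta)
    centre = n1 * n2 / 2  # expected value of U under the null hypothesis
    observed = abs(u_statistic(alpha, beta) - centre)
    extreme = total = 0
    for chosen in combinations(range(len(pooled)), n1):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        total += 1
        # Count relabellings at least as far from the centre as the observed U.
        if abs(u_statistic(group_a, group_b) - centre) >= observed:
            extreme += 1
    return u_statistic(alpha, beta), extreme / total


# Hypothetical quality scores for co-located and distributed pairs.
u, p = exact_mann_whitney([7, 8, 9, 9], [4, 5, 6, 8])
print(u, p)
```

Enumeration grows combinatorially with the sample size, so this approach is only practical for small groups.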


Dismissal Hypothesis

[Figure: box plots of effort in Run I and Run II for co-located and distributed pairs, showing median, 25%-75% range, and non-outlier range.]

After an initial period of collaboration, the distributed pairs tend to work as solo programmers.

[Figure: box plots of quality in Run I and Run II for co-located and distributed pairs, showing median, 25%-75% range, and non-outlier range.]

Quality results confirm the dismissal hypothesis.

Replica's characterisation

• The replica aimed at confirming the dismissal hypothesis.
• What changed:
  • subjects were students of the University of Naples;
  • C++ rather than Java;
  • more intensive and focused training;
  • reduced time for performing the tasks: from 180 min to 90 min.

Replica's results

  Test     p-level  Description
  Effort   0,083    Mann-Whitney tests on effort data between co-located and distributed pairs.
  Quality  0,043    Mann-Whitney tests on quality data between co-located and distributed pairs.

There is empirical evidence that distribution affects quality.

[Figure: box plots of effort and quality for co-located and distributed pairs, showing median, 25%-75% range, and non-outlier range.]

Experimental Validity (1/2)

  Group    Round I      Round II
  Group A  Co-located   Distributed
  Group B  Distributed  Co-located

Wilcoxon tests:

  Test             p-level  Description
  Effort Group A   0,465    Wilcoxon test on effort data of Group A between round I and II.
  Effort Group B   0,715    Wilcoxon test on effort data of Group B between round I and II.
  Quality Group A  0,345    Wilcoxon test on quality data of Group A between round I and II.
  Quality Group B  0,969    Wilcoxon test on quality data of Group B between round I and II.

There is no empirical evidence of maturation.
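The Wilcoxon test compares the two rounds of the same group, so it works on paired differences. A minimal sketch of an exact two-sided signed-rank test follows, assuming that zero differences are dropped and that tied absolute differences share their average rank. The function name and the data are hypothetical, and a statistics package may compute the p-level differently.

```python
from itertools import product


def wilcoxon_signed_rank(round_one, round_two):
    """Return (W+, W-, exact two-sided p-value) for paired observations."""
    diffs = [b - a for a, b in zip(round_one, round_two) if b != a]
    if not diffs:
        return 0.0, 0.0, 1.0
    # Rank the absolute differences, averaging ranks over ties.
    ordered = sorted(abs(d) for d in diffs)
    positions = {}
    for index, value in enumerate(ordered, start=1):
        positions.setdefault(value, []).append(index)
    ranks = [sum(positions[abs(d)]) / len(positions[abs(d)]) for d in diffs]
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    observed = min(w_plus, w_minus)
    total_rank = sum(ranks)
    # Under the null hypothesis every sign pattern is equally likely.
    extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        plus = sum(r for r, s in zip(ranks, signs) if s)
        if min(plus, total_rank - plus) <= observed:
            extreme += 1
    return w_plus, w_minus, extreme / 2 ** len(ranks)


# Hypothetical effort (minutes) of the same pairs in round I and round II.
print(wilcoxon_signed_rank([120, 150, 140, 160], [130, 145, 155, 158]))
```

A large p-value from such a test is what supports the statement that performance did not change systematically from one round to the next.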


Experimental Validity (2/2)

  Group    Round I      Round II
  Group A  Co-located   Distributed
  Group B  Distributed  Co-located

Wilcoxon tests:

  Test                      p-level  Description
  Effort first experiment   0,508    Wilcoxon test on effort data between round I and round II in the first experiment.
  Quality first experiment  0,445    Wilcoxon test on quality data between round I and round II in the first experiment.
  Effort replica            0,715    Wilcoxon test on effort data between round I and round II in the replica.
  Quality replica           0,109    Wilcoxon test on quality data between round I and round II in the replica.

There is no empirical evidence that mono-operation bias affects experiment validity.

Qualitative analysis

• Post-experiment assessment:
  • questionnaire;
  • open discussion.
• Communication: voice support is preferable; no need for video.
• Acquaintance: pairs have to be used to working together.
• Anarchic behaviour: distribution emphasises the lack of a proper protocol for working in pairs.

Conclusions

• Distribution seems to affect pair programming quality.
• No empirical evidence that effort increases when distributing pair programming.
• Pair dismissal because of poor technology.