Empirical Validation of Pair Programming
Corrado Aaron Visaggio
[email protected]
Research Centre on Software Technology (RCOST), University of Sannio, Benevento, Italy
PhD Symposium, ICSE 2005
Introduction 2P Economics 2P and Knowledge Distributing 2P Conclusions
Motivation
Plan-driven approaches to software development can fail in contexts where:
• the availability of resources varies unpredictably;
• time pressure is much stronger than expected;
• the requirements of the system under development are emerging or unstable.
Some alternatives have been explored to cope with such situations while preserving the quality of process and product: Boehm’s spiral process, Rapid Application Development, the Rational Unified Process, and others. There was an urgent need for more flexibility than these processes offered. In recent decades, Agile Methods for software development burst onto the scene, proposing a radically different way to manage the software process.
The Problem (1/2)
In 2001 the Agile Manifesto was published, defining the novel “agile way” of software production with four value statements:
• Individuals and interactions over processes and tools
• Working software over comprehensive documentation
• Customer collaboration over contract negotiation
• Responding to change over following a plan
The doubt: may agile methods erode the engineering rigor and discipline achieved with the plan-driven approach?
• Individuals and interactions over processes and tools: does the process remain repeatable?
• Working software over comprehensive documentation: does the process remain repeatable? Is the process measurable?
• Customer collaboration over contract negotiation: what happens to the product’s quality when the architecture emerges from the process?
• Responding to change over following a plan: is it possible to produce dependable estimates of the process?
The problem (2/2) and the research goals
There is no broad consensus on one relevant issue: is it worth adopting agile methods when developing software, or is it too risky, given that they conflict with some good practices of software engineering?
It was not feasible to deal with the entire set of agile practices within the scope of a thesis, so Pair Programming (2P) was selected as the focus of my investigation.
Pair Programming was analysed along three dimensions:
• Suitable contexts: is 2P suitable for distributed processes?
• Costs/Benefits: is 2P advantageous in terms of Return on Investment?
• Specific benefits: is 2P helpful for knowledge leveraging?
The research Plan and Method
Purpose of the research: validate pair programming by empirical investigation along three dimensions: suitable contexts, cost/benefit ratio, and specific benefits.
Steps of the plan:
• Establish research questions: success factors.
• Controlled experiments with students: defect removal (decision point in the diagram: “Bugs?”, yes/no). Part of the thesis.
• Controlled experiments with professionals: confidence of industry.
• Technological transfer.
• Field experiments: dependable results. Part of the post-doc.
The First Dimension
[Diagram: Pair Programming and its three dimensions: Suitable Contexts of the Practice, Costs/Benefits ratio, Specific Benefits of the Practice.]
Warning: this research activity is still ongoing!
Does Pair Programming cost more than Solo Programming?
Is Pair Programming more beneficial than Solo Programming in terms of the quality achieved?
Productivity and quality
Research Objects:
• Productivity: pair programming is expected to speed up production cycles.
• Quality: pair programming is expected to increase the quality of code modules and of the overall system architecture.
Research Question: Can pair programming improve the performance of project teams in terms of productivity and quality?
Experiments: an experiment at the University of Sannio, Benevento, Italy.
Hypotheses:
• H0a: pair programming does not affect the speed of programming.
• H0b: pair programming does not affect the quality of code and architecture.
The Experiment Outlook
An initial experiment on the productivity of pair programming suggests that pair programming can speed up production cycles. 60 subjects (graduate students of Computer Engineering) were grouped into teams of two kinds: teams of paired programmers and teams of solo programmers. Each team was responsible for developing a system for software requirements traceability. The teams follow an incremental process: at each iteration they receive a new group of features to implement, and each iteration corresponds to a point of observation. This experiment is still ongoing.
Points of data collection (demos of the teams’ products): kick-off, followed by four iterations, each delivering the corresponding group of features (1st to 4th).
The Second Dimension
Is Pair Programming an effective means for diffusing and enforcing design knowledge in a project team?
Knowledge Transfer
One of the expected benefits of pair programming is fostering knowledge transfer. Software design requires efficient management of knowledge at team level, and documentation alone is not enough because:
• problem-solving strategies are scarcely captured;
• it is necessary to deal with different levels of abstraction: implementation, database, business logic, presentation, deployment, interaction with other systems, and communication protocols;
• documentation has a very low bandwidth: face-to-face communication can be more effective and time-saving.
Could pair designing be an appropriate alternative for diffusing and enforcing knowledge of the software system among the members of a project team?
The Experimentation
Research Objects:
• Diffusing knowledge: disseminating knowledge within the project team (initial phases of the project).
• Enforcing knowledge: improving the individual knowledge of the project’s participants (advanced phases of the project).
Research Question: Is pair designing effective for diffusing and improving knowledge within project teams?
Experiments:
• An explorative experiment (demonstrating that pair design can foster knowledge leveraging).
• One experiment at the University of Sannio, Benevento, Italy.
• A replica at the University of Castilla-La Mancha, Ciudad Real, Spain.
Hypotheses:
• H0a: pair designing does not affect the diffusion of design knowledge when performing evolution tasks.
• H0b: pair designing does not affect the improvement of design knowledge when performing evolution tasks.
The Experiment
Experimental Design, Experiment #1 (Italy)
Subjects and treatment:
• Paired: 5 pairs MUTEGS-MUTEGS; 5 pairs MUTS-MUTS
• Solo: 8 MUTS; 8 MUTEGS
Input: Requirement Specification; Use Case Diagram; Class Diagram; entry questionnaire QA (or QB); exit questionnaire QB (or QA).
Output: modifications to the Use Case Diagram and Class Diagram; answered entry questionnaire QA (or QB); answered exit questionnaire QB (or QA).
Experimental Design, Experiment #2 (Spain)
Subjects and treatment:
• Paired: 64 students (3BScMngmt, 3BScSys, 5MSc), paired as 3BScMngmt-3BScMngmt, 3BScSys-3BScSys, 5MSc-5MSc
• Solo: 32 students (3BScMngmt, 3BScSys, 5MSc)
Input: Requirement Specification; Use Case Diagram; Class Diagram; entry questionnaire QA (or QB); exit questionnaire QB (or QA).
Output: modifications to the Use Case Diagram and Class Diagram; answered entry questionnaire QA (or QB); answered exit questionnaire QB (or QA).
The Experiment’s Process
1. Each subject studied the documentation for 30 minutes, individually.
2. Each subject answered an entry questionnaire, individually, for about 15 minutes.
3. The pairs and the solo designers performed the maintenance tasks for 2 hours.
4. Each subject answered an exit questionnaire, individually.
The Randomisation Tests
Tests between the entry questionnaires of the samples:
Experiment  Sample α            Sample β            Rank Sum α  Rank Sum β  p-level
Italian     MUTS Pairs          MUTS Solos          171.000     39.000      0.214768
Italian     MUTEGS Pairs        MUTEGS Solos        112.000     59.000      0.130919
Spanish     3BScSys Solos       3BScSys Pairs       425.000     395.000     0.214741
Spanish     5MSc Solos          5MSc Pairs          31.500      46.500      0.229767
Spanish     3BScMngmnt Solos    3BScMngmnt Pairs    425.000     395.000     0.321966
In both experiments, the samples of pairs and those of solos were formed by equivalent subjects.
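The slide reports rank sums and p-levels but not how they were computed. As an illustration only, the following Python sketch shows one common way to obtain such figures: a two-sample rank-sum test with a normal approximation. The function names and the sample scores are hypothetical, and the approximation omits tie and continuity corrections, so it will not reproduce the slide's values exactly.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of positions i..j
        i = j + 1
    return ranks

def rank_sum_test(alpha, beta):
    """Rank sums of both samples and a two-sided p-value (normal approximation)."""
    n1, n2 = len(alpha), len(beta)
    ranks = average_ranks(list(alpha) + list(beta))
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])
    u1 = r1 - n1 * (n1 + 1) / 2                      # Mann-Whitney U of sample alpha
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # simplification: no tie correction
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))             # two-sided tail probability
    return r1, r2, p

# Hypothetical entry-questionnaire scores, not data from the experiments.
pairs_scores = [6, 7, 5, 8, 6, 7]
solos_scores = [5, 6, 4, 7, 5, 6]
rank_sum_pairs, rank_sum_solos, p_level = rank_sum_test(pairs_scores, solos_scores)
```

A quick sanity check on any such table: the two rank sums of a row always add up to N(N+1)/2, where N is the total number of observations in the two samples.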
The Statistical Tests: Knowledge Diffusion
Experiment  Sample α            Sample β            Rank Sum α  Rank Sum β  p-level
Italian     MUTS Pairs          MUTS Solos          116.500     54.50       0.049
Italian     MUTEGS Pairs        MUTEGS Solos        78.50       57.50       0.270
Italian     MUTS Pairs          MUTEGS Pairs        135.00      75.00       0.023
Spanish     5MSc Pairs          5MSc Solos          51.500      26.500      0.030912
Spanish     3BScSys Pairs       3BScSys Solos       253.000     567.000     0.00017
Spanish     3BScMngmnt Pairs    3BScMngmnt Solos    447.000     778.00      0.00000
Results and Interpretation:
• Empirical evidence: pairs outperformed solos; pair design is a candidate means for diffusing knowledge.
• Side effect: the success of pair design in diffusing knowledge may depend on individual skills.
The Statistical Tests: Knowledge Improving
Experiment  Sample α            Sample β            Rank Sum α  Rank Sum β  p-level
Italian     MUTS Pairs          MUTS Solos          123.500     47.500      0.0102
Italian     MUTEGS Pairs        MUTEGS Solos        53.500      66.500      0.2164
Italian     MUTS Pairs          MUTEGS Pairs        110.500     42.500      0.0428
Spanish     3BScSys Pairs       3BScSys Solos       49.500      28.500      0.086984
Spanish     5MSc Pairs          5MSc Solos          551.000     269.000     0.000942
Spanish     3BScMngmnt Pairs    3BScMngmnt Solos    51.500      26.500      0.042337
Results and Interpretation:
• Empirical evidence: the knowledge diffusion results are confirmed; pair design is a candidate means for improving knowledge.
• The success of pair design in improving knowledge may depend on individual skills.
Qualitative Analysis
Descriptive statistics per group (first table):
Experiment  Group               Average  Max    Min    Std Dev
Italian     MUTS Pairs          5.8      9      4      1.75
Italian     MUTEGS Pairs        3.9      7      1      1.60
Italian     MUTS Solos          4.25     6.00   3.00   1.03
Italian     MUTEGS Solos        5.13     7.00   3.00   1.55
Spanish     3BScSys Pairs       6.00     7.00   3.00   1.02
Spanish     3BScSys Solos       4.44     6.00   3.00   1.26
Spanish     5MSc Pairs          6.17     7.00   5.00   0.98
Spanish     5MSc Solos          5.33     7.00   5.00   0.82
Spanish     3BScMngmnt Pairs    6.30     8.00   5.00   0.73
Spanish     3BScMngmnt Solos    4.21     5.00   1.00   0.94
Descriptive statistics per group (second table):
Group               Average  Max    Min     Std Dev
MUTS Pairs          2.000    5.000  -1.000  1.915
MUTS Solos          -1.400   2.000  -3.000  2.074
MUTEGS Pairs        -0.800   1.000  -3.000  1.643
MUTEGS Solos        -0.750   1.000  -2.000  1.500
5MSc Pairs          1.167    3.000  -1.000  1.722
5MSc Solos          -0.500   3.000  -2.000  1.871
3BScSys Pairs       1.714    4.000  -1.000  1.736
3BScSys Solos       -0.579   3.000  -4.000  1.865
3BScMngmnt Pairs    1.111    3.000  -1.000  1.278
3BScMngmnt Solos    -1.036   2.000  -5.000  1.85617
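The slide summarises each group by average, max, min and standard deviation. A minimal Python sketch of how such a summary could be computed is given below. Two points are assumptions rather than facts from the slides: that a per-subject value can be taken as exit score minus entry score (which would explain negative values), and that the sample standard deviation (n - 1 denominator) is the one intended. The scores are hypothetical.

```python
import statistics

def describe(scores):
    """Summarise a list of scores with the four parameters shown on the slide."""
    return {
        "average": statistics.mean(scores),
        "max": max(scores),
        "min": min(scores),
        "std dev": statistics.stdev(scores),  # assumption: sample std dev (n - 1)
    }

# Hypothetical questionnaire scores for five subjects, not experiment data.
entry_scores = [4, 5, 3, 6, 5]
exit_scores = [6, 6, 4, 7, 8]

# Assumption: knowledge gain is the exit score minus the entry score.
gains = [after - before for before, after in zip(entry_scores, exit_scores)]
summary = describe(gains)
```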
The Questionnaires
Two questionnaires were used to evaluate the knowledge built.
Test between                                                  Rank Sum α  Rank Sum β  p-level
Questionnaire A (α), Questionnaire B (β) in the experiment    540.00      406.00      0.161
Questionnaire A (α), Questionnaire B (β) in the replica       598.00      677.00      0.2068
The experiment results were independent of the specific questionnaire used.
Conclusions
Pair designing is helpful for:
• diffusing knowledge, when the team is not familiar with the project, in the initial phases;
• improving knowledge, when the team needs a better and deeper understanding of the project, in the advanced phases.
The results and performance of pair designing may depend on the individual skills of the pair’s members.
The Third Dimension
Are distributed processes suitable for pair programming?
How Distribution Affects Pair Programming Benefits
A more than emerging trend: Global Software Development
• 24-hour production cycles, reduced costs of resources, and enhanced mobility.
• Global software development process; virtual teaming.
Pair programming
• Pair programming increases software quality without significantly increasing development time.
Distribution hinders fluidity of communication and comfort of collaboration. What is the impact on working practices that rely on communication and collaboration (C&C)?
Experimentation
Research Objects:
• Quality: pair programming helps to achieve high code quality, thanks to concurrent reviews of code and design.
• Performance: working in pairs speeds up production, thanks to intense collaboration.
Research Questions:
• RQ1: Are there significant differences in effort when the members of a pair are distributed, compared with co-located pairs?
• RQ2: Are there significant differences in the quality produced when the members of a pair are distributed, compared with co-located pairs?
Experiment: subjects were volunteer students from the Universities of Sannio and Naples.
Hypotheses
Null hypotheses
• H0RQ1: there is no significant difference in the effort required to implement modifications between distributed pair programming and co-located pair programming: μ_distr_time = μ_co-loc_time
• H0RQ2: there is no significant difference in the quality of the maintenance performed: μ_distr_quality = μ_co-loc_quality
Alternative hypotheses
• H1RQ1: there is a significant difference in the effort required to implement modifications between distributed pair programming and co-located pair programming: μ_distr_time ≠ μ_co-loc_time
• H1RQ2: there is a significant difference in the quality of the maintenance performed: μ_distr_quality ≠ μ_co-loc_quality
Experiment’s Characterisation (1/2)
• Effort spent: measured as the difference between the start time and the end time of the maintenance tasks (ratio scale).
• Quality of the maintenance performed: a scoring function counting the successful test cases (ordinal scale).
Subjects were trained with: an introductory seminar (4 hrs), lab exercises (2 hrs), a proof run (2 hrs), and an assessment seminar (2 hrs).
Documentation given to students:
• listings of the programs;
• textual description of the maintenance tasks;
• time sheet to fill in;
• description of the correct execution of pair programming roles;
• questionnaire to be filled in at the end of the experiment.
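The two measures described on this slide can be sketched in a few lines of Python. This is an illustration under assumptions, not the instrument used in the experiment: the time-sheet format ("HH:MM" on the same day), the function names, and the representation of test outcomes as booleans are all hypothetical.

```python
from datetime import datetime

def effort_minutes(start, end, fmt="%H:%M"):
    """Effort as end time minus start time, in minutes (assumes same-day times)."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60

def quality_score(test_results):
    """Quality as the number of successful test cases."""
    return sum(1 for passed in test_results if passed)

# Hypothetical time-sheet entry and test outcomes for one pair.
effort = effort_minutes("09:15", "11:45")
quality = quality_score([True, True, False, True])
```

Effort computed this way lies on a ratio scale (differences and ratios of minutes are meaningful), whereas the slide treats the test-case count as ordinal, which is consistent with the use of rank-based tests on the results.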
Experiment’s Characterisation (2/2)
Technological platform:
• VNC. Function: shares the desktop, allowing remote control of a PC. Purpose: collaboration. Motivation: the experimenters had experience in using it in previous projects; open source.
• NetMeeting. Function: text chat. Purpose: communication. Motivation: its usage was well known to all the experimental subjects.
• JBuilder. Function: IDE for Java programs. Purpose: programming. Motivation: subjects had experience in using it in previous projects.
Experimental design:
         Round I (P1)  Round II
Group A  Co-located    Distributed
Group B  Distributed   Co-located
Tests and Results
Round I: Group A co-located, Group B distributed. Round II (P2): Group A distributed, Group B co-located.
Mann-Whitney tests:
Test              p-level  Description
Effort round I    0.564    Mann-Whitney test on effort data between Group A (co-located) and Group B (distributed) in round I.
Effort round II   1.000    Mann-Whitney test on effort data between Group A (distributed) and Group B (co-located) in round II.
Quality round I   0.465    Mann-Whitney test on quality data between Group A (co-located) and Group B (distributed) in round I.
Quality round II  0.011    Mann-Whitney test on quality data between Group A (distributed) and Group B (co-located) in round II.
Only the round II quality results are statistically significant.
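The conclusion on this slide follows from comparing each p-level with a significance threshold. The slides do not state the threshold; the sketch below assumes the conventional value of 0.05 and uses the four p-levels reported above. The names `ALPHA` and `significant_tests` are illustrative.

```python
ALPHA = 0.05  # assumption: conventional significance level, not stated on the slide

# p-levels of the four Mann-Whitney tests as reported on the slide.
p_levels = {
    "Effort round I": 0.564,
    "Effort round II": 1.000,
    "Quality round I": 0.465,
    "Quality round II": 0.011,
}

def significant_tests(p_levels, alpha=ALPHA):
    """Return the names of the tests whose p-level is below alpha."""
    return [name for name, p in p_levels.items() if p < alpha]

rejected = significant_tests(p_levels)
```

Under this assumption, only the round II quality test rejects the null hypothesis, which matches the statement on the slide.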
Dismissal Hypothesis
[Figure: box plots of effort in Run I and Run II, co-located vs distributed pairs (median, 25%-75%, non-outlier range).]
After an initial period of collaboration, the distributed pairs tend to work as solo programmers.
Quality
[Figure: box plots of quality in Run I and Run II, co-located vs distributed pairs (median, 25%-75%, non-outlier range).]
The quality results confirm the dismissal hypothesis.
Replica’s characterisation
The replica aimed at confirming the dismissal hypothesis.
What changed:
• the subjects were students of the University of Naples;
• C++ rather than Java;
• more intensive and focused training;
• reduced time for performing the tasks: from 180 min to 90 min.
Replica’s Results
Test     p-level  Description
Effort   0.083    Mann-Whitney tests on effort data between co-located and distributed pairs.
Quality  0.043    Mann-Whitney tests on quality data between co-located and distributed pairs.
There is empirical evidence that distribution affects quality.
[Figure: box plots of effort and quality in the replica, co-located vs distributed pairs (median, 25%-75%, non-outlier range).]
Experimental Validity (1/2)
Round I: Group A co-located, Group B distributed. Round II: Group A distributed, Group B co-located.
Wilcoxon tests:
Test             p-level  Description
Effort Group A   0.465    Wilcoxon test on effort data of Group A between round I and II.
Effort Group B   0.715    Wilcoxon test on effort data of Group B between round I and II.
Quality Group A  0.345    Wilcoxon test on quality data of Group A between round I and II.
Quality Group B  0.969    Wilcoxon test on quality data of Group B between round I and II.
There is no empirical evidence of maturation.
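The maturation check compares the same group across the two rounds, which is a paired comparison. The slides do not say which variant of the Wilcoxon test was run, so the following Python sketch is only one possible reading: an exact two-sided signed-rank test that enumerates all sign assignments, which is practical for small samples (roughly up to 20 pairs). The function name and the data are hypothetical.

```python
from itertools import product

def signed_rank_test(first, second):
    """Exact two-sided Wilcoxon signed-rank test for small paired samples.

    Returns (W+, p), where W+ is the sum of ranks of positive differences.
    """
    diffs = [b - a for a, b in zip(first, second) if b != a]  # drop zero differences
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    abs_d = [abs(d) for d in diffs]
    order = sorted(range(n), key=lambda i: abs_d[i])
    ranks = [0.0] * n
    i = 0
    while i < n:  # average ranks for tied absolute differences
        j = i
        while j + 1 < n and abs_d[order[j + 1]] == abs_d[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = sum(ranks) / 2
    observed_dev = abs(w_plus - expected)
    # Under the null hypothesis every sign pattern is equally likely.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - expected) >= observed_dev - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n

# Hypothetical effort (minutes) of four pairs in round I and round II.
round_1 = [100, 120, 90, 110]
round_2 = [105, 118, 97, 121]
w_plus, p_level = signed_rank_test(round_1, round_2)
```

A large p-level from such a test, as in the table above, gives no reason to believe that results drifted between rounds merely because subjects gained experience.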
Experimental Validity (2/2)
Round I: Group A co-located, Group B distributed. Round II: Group A distributed, Group B co-located.
Wilcoxon tests:
Test                      p-level  Description
Effort first experiment   0.508    Wilcoxon test on effort data between round I and round II in the first experiment.
Quality first experiment  0.445    Wilcoxon test on quality data between round I and round II in the first experiment.
Effort replica            0.715    Wilcoxon test on effort data between round I and round II in the replica.
Quality replica           0.109    Wilcoxon test on quality data between round I and round II in the replica.
There is no empirical evidence that mono-operation bias affects the validity of the experiment.
Qualitative analysis
Post-experiment assessment: questionnaire and open discussion.
• Communication: voice support is preferable; there is no need for video.
• Acquaintance: the members of a pair need to be used to working together.
• Anarchic behaviour: distribution emphasises the lack of a proper protocol for working in pairs.
Conclusions
• Distribution seems to affect the quality of pair programming.
• There is no empirical evidence that effort increases when pair programming is distributed.
• Pair dismissal occurs because of poor technology.