
An Initial Framework for Research on Pair Programming

Hans Gallis¹, Erik Arisholm¹, and Tore Dybå¹, ²
¹Simula Research Laboratory, Norway, +47 67 82 82 00, {hansga, erika}@simula.no
²SINTEF Telecom and Informatics, Norway, +47 73 59 29 47, [email protected]

Abstract

In recent years, several claims have been put forward in favour of pair programming, as opposed to individual programming. However, results from existing studies on pair programming contain apparent contradictions. The differences in the context in which the studies were conducted may be one explanation for such results. This paper presents an initial framework for research on pair programming. The aim is to support empirical studies and meta-analysis for developing theories about pair programming. The framework is based on (1) existing studies on pair programming, (2) ongoing studies by the authors, and (3) theories from group dynamics.

Keywords: Empirical software engineering, research methods, pair programming, partner programming, team collocation, collaboration, group dynamics.

1. Introduction

Pair programming (PP) seems to have gained popularity within the industry and academia in recent years. Much of the increased interest in PP is probably due to the introduction of extreme programming (XP), in which PP is one of 12 key practices [3, 4]. Several benefits have been claimed for PP over individual programming regarding, for example, time-to-market, cost, quality, information and knowledge transfer, and trust and morale [43]. However, the results of the existing empirical studies contain apparent contradictions [8, 17, 27, 29-31, 44, 45].

Research in software engineering is difficult because it involves a complex interaction of human, organizational and technological elements [23]. Consequently, it is important to plan and design research in such a way that it is possible to develop initial theories and validate them in different contexts [1, 2, 23, 25, 26, 32, 33, 35, 39, 48]. In an attempt to support research on PP, we propose an initial research framework that categorises what we consider to be important independent, dependent and context variables of empirical studies. To support theory building, the framework is motivated by existing work from group dynamics [16]. The goals of this paper are threefold:

- To present the most important empirical studies on PP to show that there is a need for a research framework which can serve as a basis for the development of theories.
- To reach an “… initial understanding, including identifying likely variables, capturing the magnitude of problems and variables, documenting behaviours, and generating theories to explain perceived behaviour…” [32] in PP.
- To act as a foundation for meta-analysis for research on PP.

The remainder of this paper is organised as follows. Section 2 outlines the concepts and existing claims for PP, as well as an historical account of related theories. Section 3 summarizes existing empirical studies. Section 4 presents the proposed initial framework for research on PP. Section 5 concludes.

2. Pair Programming

In 1996, XP was used for the first time, on the C3 project at Daimler Chrysler [3]. PP was then officially born as one of the key development practices in XP and as the stand-alone development activity we know today. PP is said to be the simple concept of two programmers working on the same task using one computer and keyboard [3, 4, 43, 46]. In XP, PP involves not just coding, but also other phases of the software development process such as design and testing. A pair consists of a driver, who types at the computer or writes down the design, and a navigator (also called a partner), who actively observes the work of the driver: looking for tactical and strategic defects, thinking of alternatives, writing down “things-to-do”, and looking up references [43].

The main claims regarding the benefits of PP versus individual programming on coding tasks are as follows [43]:

- Pairs produce code with fewer defects,
- Pairs produce higher-quality code in about half the time (elapsed time) as individuals,
- Pair programmers are happier programmers,
- Pair programming builds trust and improves teamwork,
- Pair programmers, especially if they rotate partners, know more about the overall system,
- Pairs learn continually by discussing solutions and watching each other’s techniques.

Proceedings of the 2003 International Symposium on Empirical Software Engineering (ISESE’03) 0-7695-2002-2/03 $ 17.00 © 2003 IEEE

The concepts underlying PP are not new. In the early 1970s, Weinberg focused on the principle of letting others review a programmer’s code. He called this egoless programming. According to Weinberg, this led to fewer defects and, as a result, the quality of the code was higher [42]. Continuous code review and inspection “on-the-fly” is a central aspect of PP.

In his classic book “The Mythical Man-Month”, Brooks presented the surgical team as an approach to reducing the cost of communication and the ill-effects of miscommunication within a development team [7]. Brooks referred to a surgeon, or chief programmer in software engineering terms, as the main craftsman who has overall responsibility and makes all the decisions. His hypothesis was that this arrangement would reduce the communication overhead and the amount of miscommunication.

Flor and Hutchins studied two programmers collaborating on a software maintenance task. They observed the exchange of ideas, feedback and debate between the two collaborators. The results indicated that this collaboration significantly reduced the probability of ending up with a poor design [15].

Constantine used the expression dynamic duos to explain the way people worked at a company called Whitesmith, Ltd [9]. He observed the effects of two developers working together at one computer, each making the code more visible to the other. The observations suggested that learning a new language was particularly efficient through dynamic duos. The strength of ‘visibility’ was explained as follows: “I caught on to the fact that running a problem past a fellow programmer was often the most efficient way to find some elusive bug or to work out some tricky algorithm. In fact, most of us learn that talking out your ideas, using a colleague as a sounding board or getting feedback on something still half-jelled, is not only effective and enriches the end product, but it’s also fun and builds good working relationships.” [9]

Coplien introduced the programming in pairs organizational pattern [10]. This pattern is based on the assumptions “two heads are better than one” and “people are scared to solve problems alone”. The result of using the pattern was stated to be a more efficient implementation process.

3. Existing Empirical Studies on Pair Programming

To show that there is a need for a research framework from which to build theories, this section summarizes the most important existing empirical studies on PP according to time, cost, quality, information and knowledge transfer, and trust and morale. Table 1 gives an overview of these studies.

In an experiment with professional subjects, five individual programmers were compared with five pairs [31]. The subjects had 45 minutes to perform a programming task in an unknown application domain. The dependent variables of the experiment were the quality of the solutions (indicated by “readability” and “functionality”) and a qualitative assessment of the programmers’ morale. The results suggested that the pairs produced higher quality solutions, enjoyed the problem-solving process more and had greater confidence in their solution than the individual programmers.

An experiment conducted at the University of Utah compared 13 individuals using the Personal Software Process (PSP) with 14 pairs using the Collaborative Software Process, which includes PP [44, 45]. The students had to complete four assignments in 6 weeks. The experiment compared the time to complete the assignments (in number of hours from start to finish), the cost (indicated by the number of programmer hours) and the quality of the solutions (indicated by the number of passed test cases) between the two groups. After an initial “adjustment period” (the first assignment) the pairs spent, on average, only 15 percent more programmer hours than the individuals. Consequently, the pairs spent less time than the individuals, but at increased cost. All pairs also passed more of the test cases than the individuals. Thus, it is difficult to draw conclusions about the cost benefits, because the individuals might have produced similar quality if they had spent 15 percent more time.
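The relation between cost (programmer hours) and elapsed time in the Utah experiment can be illustrated with a little arithmetic. The following is a minimal sketch, not part of the study: the baseline of 10 hours for an individual is a hypothetical figure, the function name `pair_effort` is invented for illustration, and the sketch assumes that both members of a pair are present for the whole session. Only the 15 percent overhead comes from the reported results.

```python
from typing import Tuple


def pair_effort(individual_hours: float, overhead: float = 0.15) -> Tuple[float, float]:
    """Return (programmer_hours, elapsed_hours) for a pair.

    `overhead` is the extra share of programmer hours a pair spends
    relative to one individual (0.15 is the reported average after the
    adjustment period). Assumption: both programmers work side by side
    for the whole session, so elapsed time is half the programmer hours.
    """
    programmer_hours = individual_hours * (1 + overhead)
    elapsed_hours = programmer_hours / 2
    return programmer_hours, elapsed_hours


# Hypothetical baseline: an individual needs 10 hours for an assignment.
cost, elapsed = pair_effort(10.0)
print(f"pair cost: {cost:.2f} programmer hours, elapsed: {elapsed:.2f} hours")
```

Under these assumptions the pair finishes in a little more than half the individual's elapsed time while consuming more programmer hours in total, which is the sense in which the pairs were faster but more expensive.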
However, in terms of trust and morale, more than 90 percent of the students using PP reported that they enjoyed it more than programming alone. Almost 95 percent of the students stated that they had more confidence in their solutions when programming in pairs.

Nawrocki and Wojciechowski conducted an experiment with 21 students, comparing three different groups: (1) six students using PSP, (2) five students using a modified version of XP with individual programming, and (3) 10 students (in five pairs) using a modified version of XP including PP [30]. They found almost no difference in the development time (indicated by the number of hours from start to finish) between the three groups. Furthermore, there was almost no difference in quality (indicated by the number of lines of code and number of re-submissions due to defects in code). Consequently, the results suggest doubled costs for the pairs compared with the individuals, without increased quality.

The effects of PP on student performance were studied in an introductory programming class [27]. The study involved 313 students and compared PP with individual programming. The results indicated that the pairs produced “better” programs (indicated by the grade obtained on their delivered assignment). The authors suggested that the students who used PP benefited from the information and knowledge transfer process of PP, because fewer students who used PP dropped the course than students who worked individually.

Müller and Tichy conducted a case study on CS graduate students using XP. The course had 12 participants performing software tasks in pairs over a period of 11 weeks [29]. About half of the participants stated that the learning effect from PP declined with the duration of the course. All but one student enjoyed PP. However, some students thought it was a waste of time to watch their partner during trivial programming tasks.

The preliminary results of an ongoing industrial case study indicate that PP is inefficient for trivial programming tasks [17]. The study compared “partner programming” (cf. Section 4) with pair programming. The group who practiced PP preferred individual programming on programming tasks that included routine work. Furthermore, the navigator was often passive because the driver was more familiar with the programming language and had more experience with implementing complex algorithms. Thus, the driver took most of the decisions and controlled the interaction, leading to a passive navigator. Consequently, the results indicate that pair programming was less efficient than partner programming; the potential benefits of pair programming seem to depend on the type of task and on the pair configuration.

Table 1: An overview of existing empirical studies on pair programming.

| Author(s) | Type of study | Subjects | N | Task | Duration | Independent variable(s) | Main dependent variables (metrics) |
|---|---|---|---|---|---|---|---|
| Nosek (1998) | Experiment | Prof. | 15 | Unknown application domain (database script) | 45 minutes | Individuals (5) versus pairs (5) | Quality (readability and functionality), Programmers morale (qualitative assessment) |
| Williams et al. (2000), Williams (2000) | Experiment | Stud. | 41 | Four programming assignments | Six weeks | PSP (13) versus CSP (14 pairs) | Time to complete the assignments (number of hours from start to finish), Cost (number of programmer hours), Quality (number of passed test cases) |
| Nawrocki and Wojciechowski (2001) | Experiment | Stud. | 21 | Four programs proposed by W. Humphrey | N/A | PSP (6), XP with PP (5 pairs) and XP with individual progr. (5) | Time (number of hours from start to finish, elapsed time), Quality (number of lines of code and number of resubmissions due to defects in code) |
| McDowell et al. (2002) | Experiment | Stud. | 313 | Course assignments | Two academic semesters | Individual (141) versus pair programming (86 pairs) | Quality: score on programming assignment (functionality and readability), Learning effect: score on final exam |
| Müller and Tichy (2001) | Case study | Stud. | 12 | Software tasks | 11 weeks | Evaluation of XP (including PP) to gather experience with the process | Information and knowledge transfer (qualitative assessment), Morale (qualitative assessment) |
| Gallis et al. (2002) | Case study | Prof. | 4 | Project coding tasks | Project estimate: 5 months | Partner programming (1 pair) versus pair programming (1 pair) | Information and knowledge transfer, Programmers morale (qualitative assessments) |
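As a small consistency check on the four experiments summarised in Table 1, the reported number of subjects N can be recomputed from the group sizes, counting two subjects per pair. The sketch below is illustrative only: the dictionary layout and the helper name `total_subjects` are assumptions, while the numbers are those reported in the text and in Table 1.

```python
# (subjects working individually, number of pairs, reported N)
studies = {
    "Nosek (1998)": (5, 5, 15),
    "Williams et al. (2000)": (13, 14, 41),
    # PSP (6) plus XP with individual programming (5), and five pairs
    "Nawrocki and Wojciechowski (2001)": (6 + 5, 5, 21),
    "McDowell et al. (2002)": (141, 86, 313),
}


def total_subjects(individuals: int, pairs: int) -> int:
    """Each pair contributes two subjects."""
    return individuals + 2 * pairs


for name, (individuals, pairs, reported_n) in studies.items():
    computed = total_subjects(individuals, pairs)
    status = "consistent" if computed == reported_n else "MISMATCH"
    print(f"{name}: {individuals} + 2*{pairs} = {computed} "
          f"(reported N = {reported_n}) -> {status}")
```

A check of this kind is one simple way to catch transcription errors when study characteristics are collected for meta-analysis.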


[Figure 1 content]
Independent variable (programmer collaboration): individual programming; partner programming; pair programming; team collocation.
Dependent variables (outcomes): time; cost; quality; information and knowledge transfer; trust and morale; risk.
Context variables:
- Subject variables: education and experience; personality; roles; communication; switching partners.
- Task variables: type of development activity; type of task.
- Environmental variables: software development process; software development tools; work space facilities.

Figure 1: An initial framework for research on pair programming.

4. An Initial Framework for Research on Pair Programming

Results from the existing studies are apparently contradictory regarding the evaluation of the claims of PP. For example, the two independent empirical studies presented in the previous section were both conducted on students comparing PP with individual programmers using PSP [30, 45]. The results conflicted, with respect to both the quality of the solutions and the time spent to perform the tasks. Why are the results contradictory? The “quality” indicators used in the two studies were different, i.e., lines of code and number of re-submissions due to defects in code versus the number of passed test cases, respectively. Thus, the results are difficult to compare. Furthermore, the contradictory results may also be explained by the differences in the context in which the studies were conducted. Consequently, the independent variables, the dependent variables and the context variables of empirical studies need to be defined, to enable meta-analysis and support the development of (cause-effect) theories regarding PP. Research needs to identify when and how PP is beneficial compared with other levels of programmer collaboration.

The goal of our initial research framework is to identify the most important variables to be considered in empirical studies of PP (Figure 1). The choice of variables in this framework has been motivated by existing empirical studies and theories from group dynamics. The independent variable of the framework identifies various levels of programmer collaboration. The dependent variables reflect the main claims regarding PP. The context variables reflect the conditions under which the hypotheses regarding a relationship between the independent and dependent variables have been studied. This implies that research on PP should be explicit about the context for which the study is representative. For example, the context variables of case studies should be described in detail. Furthermore, controlled experiments on PP should be designed to control for or limit the number of context variables.
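One way to see how the framework could support meta-analysis is to record each study as a small structured record and then ask which dependent variables two studies measured in the same way. The sketch below is only an illustration under stated assumptions: the class `StudyRecord`, the function `shared_metrics`, and the free-text metric labels are hypothetical and not part of the framework itself; the two example records paraphrase the studies [44, 45] and [30] as summarised in Section 3.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The four values of the independent variable named in the framework.
COLLABORATION_LEVELS = (
    "individual programming",
    "partner programming",
    "pair programming",
    "team collocation",
)


@dataclass
class StudyRecord:
    """One empirical study, described by the framework's variable groups."""
    name: str
    levels_compared: List[str]   # values of the independent variable
    metrics: Dict[str, str]      # dependent variable -> metric used
    context: Dict[str, str] = field(default_factory=dict)  # subject/task/environment

    def __post_init__(self) -> None:
        unknown = [lv for lv in self.levels_compared
                   if lv not in COLLABORATION_LEVELS]
        if unknown:
            raise ValueError(f"unknown collaboration level(s): {unknown}")


def shared_metrics(a: StudyRecord, b: StudyRecord) -> List[str]:
    """Dependent variables that both studies measured with the same metric."""
    return sorted(var for var, metric in a.metrics.items()
                  if b.metrics.get(var) == metric)


williams = StudyRecord(
    name="Williams et al. (2000)",
    levels_compared=["individual programming", "pair programming"],
    metrics={
        "time": "hours from start to finish",
        "cost": "programmer hours",
        "quality": "passed test cases",
    },
    context={"subjects": "students", "duration": "six weeks"},
)
nawrocki = StudyRecord(
    name="Nawrocki and Wojciechowski (2001)",
    levels_compared=["individual programming", "pair programming"],
    metrics={
        "time": "hours from start to finish",
        "quality": "lines of code and re-submissions",
    },
    context={"subjects": "students"},
)

# Quality is missing from the result because the two metrics differ.
print(shared_metrics(williams, nawrocki))
```

Recording the metric alongside each dependent variable makes the comparability problem explicit: two studies can both report on "quality" and still not be directly comparable.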

4.1. Independent Variable

Existing empirical studies have compared PP with individual programming. PP is, however, just one way in which programmers can collaborate. Team collocation [38] and partner programming [8] are examples of other types of programmer collaboration (Figure 1). The independent variable in the initial framework can thus take one of four different values:

- Individual programming is the traditional approach to performing software development tasks. A single developer works alone on a development task and does not share the office or workspace with other developers. Thus, communication with other developers cannot be carried out “on-the-fly”.
- Partner programming means that two programmers work on different tasks on different computers, but they share the same physical office or workspace [8]. Thus, the two programmers are able to share ideas, thoughts, problems and so forth. The whole team is not collocated.
- Pair programming is the concept of two programmers working on the same task using one computer and keyboard [3, 4, 43, 46]. See Section 2 for a thorough description of the concept.
- Team collocation is a level of collaboration where the whole team is located in a single physical room [38]. The office space is outfitted with shared key resources, which are updated by the team.

[Figure 2 content: levels of programmer collaboration, in order: None (individual); Partner programming; Pair programming; Team collocation (war room approach)]

Figure 2: Different levels of programmer collaboration structured according to physical proximity.
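When study data are coded for analysis, the four levels of collaboration can be represented as an ordered type. The following is a minimal sketch; the class name `Collaboration` and the integer codes are assumptions made for illustration, and the ordering simply follows the order in which Figure 2 lists the levels.

```python
from enum import IntEnum


class Collaboration(IntEnum):
    """Levels of programmer collaboration, in the order listed in Figure 2."""
    INDIVIDUAL = 0        # works alone, no shared office or workspace
    PARTNER = 1           # different tasks and computers, shared office
    PAIR = 2              # same task, one computer and keyboard
    TEAM_COLLOCATION = 3  # whole team located in a single room


# Levels can be compared and sorted according to that order.
observed = [Collaboration.PAIR, Collaboration.INDIVIDUAL, Collaboration.PARTNER]
print([level.name for level in sorted(observed)])
```

Using an ordered type rather than free text keeps the coding of the independent variable consistent across studies; whether the order should be treated as a true ordinal scale is left open here.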

4.2. Dependent Variables

To give a satisfactory account of “when and how PP is beneficial compared with other levels of programmer collaboration”, it is necessary to link the term “beneficial” to certain measurement criteria, i.e., a dependent variable. A dependent variable can be viewed as the outcome we want to achieve by using PP. The main dependent variables in the framework are:

- Time. Time to market is critical for software organizations. Customers may also be more concerned with the reliability of lead-times than with just obtaining the shortest possible lead-time [13]. Williams et al. reported that the pairs in their experiment always handed in their assignments on time, while some individuals did not hand in an assignment or handed it in late [44]. This suggests that the time to market when using PP may be more predictable than when using individual programming. Time should be represented by elapsed time.
- Cost. Development effort (usually measured by programmer hours) is frequently considered to be the predominant cost driver for most software projects. Other indicators of software costs include the ratio of actual versus planned development effort, rework, and reuse [13]. In addition, Williams and Kessler mention the cost of training new personnel as an important factor of PP [43]. The amount of time it takes to train a new person might be reduced by using PP because both the tutor and the “apprentice” can work on project specific tasks during the training (ibid.). Brooks sees the training cost as a part of the communication costs [7]. It is also important to distinguish explicitly between costs for initial development and maintenance. For example, PP may increase the development cost but reduce the maintenance cost [8, 44].
- Quality. There are three important views concerning software quality: the user view, the manufacturing view, and the product view [24]. Customers’ expectations consist of two kinds of requirement: the functional and non-functional requirements, and the implicit, or tacit, requirements. This is reflected in the ISO 8402 standard, which defines quality as “the totality of features and characteristics of a product or service that bear on its ability to satisfy stated or implied needs” [21]. Quality can be measured according to many different attributes. One of the more recent quality models, ISO 9126, decomposes software quality into six quality characteristics: functionality, reliability, usability, efficiency, maintainability, and portability [22].
- Information and knowledge transfer. Information and knowledge can be both explicit and tacit. PP may be a good way to share tacit knowledge through the partners’ visual observation of each other’s work. It is, unfortunately, difficult to quantify the “amount” of information and knowledge transfer directly. Potential indirect measures might be the amount of code reuse and the degree to which coding standards within a project are adhered to.
- Trust and morale. People’s satisfaction usually has a great influence on productivity and teamwork in general. Thus, in the long term, the trust and morale within a team might be a very important predictor of the success of a software project. As for information and knowledge transfer, it is difficult to assess the trust and morale within a team in a repeatable, objective way. Common ways of measuring trust and morale in existing studies on PP have been to ask the subjects to quantify it using ordinal-scale measures or to use open-ended interview-type questions followed by a qualitative analysis.
- Risk. There are numerous types of risks in a software development project (e.g., schedule slips, changed requirements, staff turnover). Two types of risks seem important regarding PP. First, the risk associated with losing key developers is an important aspect in almost any software development project. This might be reduced by using PP because multiple people will be familiar with each part of the system [43]. Second, as presented in Section 2, Flor and Hutchins stated that two programmers collaborating on the same maintenance task significantly reduced the probability of ending up with a poor design [15]. Adherence to coding standards is an important aspect of PP and might, for example, prevent poor readability of the code. Risk is, as with knowledge and information transfer, difficult to measure directly. Adherence to coding standards, process, and design techniques are possible indirect measures of such risks.

It may be important to study several of these dependent variables together [24]. Voas pointed to the difficulties of having faster (time), better (quality), and cheaper (cost) as desired goals for a software development project at the same time [41]. There is a trade-off between these goals that needs to be considered when attempting to compare the benefits of PP with other levels of programmer collaboration.

4.3. Context Variables

When conducting empirical studies in software engineering it is important to be explicit about the target population regarding subjects, tasks and environments, i.e., the context of the study [36]. Evidential credibility depends on both the producer and the receiver of the results (ibid.). The scepticism in the industry towards PP may therefore stem from evidence that is seen as inapplicable to their organisation’s specific context of use. In the following subsections, we present the context variables of our framework according to subject, task and environment.

4.3.1. Subject Variables

The following subsections present what we believe are the most important human factors regarding performance in the completion of software tasks.

Education and experience

In our ongoing industrial case study [17] the pair programmers felt that similar syntactical and technical knowledge of the programming language used was the most important requirement for being efficient. Their discussions usually included domain and configuration aspects as well as technical coding aspects. This indicates that each subject’s knowledge of domain, design, architecture, and programming language are important context variables. Furthermore, Flor and Hutchins observed that the programmers’ skills in using CASE tools were an important factor when studying the efficiency of collaborative programming [15].

Professionals and students differ in the prior experience (skills and techniques) that they bring to a problem-solving activity [36]. In their study of thinking strategies between experts and novices when playing computer games, Hong and Liu found that novices mostly used trial and error and sometimes heuristic, but never analogical thinking strategies [20]. Experts, on the other hand, used analogical thinking strategies frequently. This indicates that, if the goal of PP is information and knowledge transfer, the navigator or the driver should be an expert.

People who are used to working collaboratively will probably be more efficient than people who have little experience of working in groups. Existing studies suggest that PP becomes more efficient over time [44]. Thus, when conducting research on PP it may be important to describe what kind of training in PP the participants have.

Personality

A pair consists of two persons with individual personalities. Hohmann defined personality as “a complex set of relatively stable behavioural and emotional characteristics that can be used to uniquely identify a person” [19]. Personality as a whole can be characterised by appeal to several different criteria, but it is the introversion-extraversion scale that is usually used to categorise personalities in a social setting. Williams and Kessler discussed programmer pairing according to these two personality types [43]. For example, by pairing an extrovert expert driver and an introvert novice navigator we will probably not achieve “shorter time-to-market”. The navigator will probably be made passive and thus be prevented from contributing. This might further result in doubled development costs compared with one individual expert. However, if the desired goal is “information and knowledge transfer” this configuration might be a well-functioning pair; the extrovert expert driver can talk and explain while coding.

Hohmann discussed personality in relation to his integrated Structure-Process-Outcome (SPO) framework and stated that personality has a strong impact on how a person operates within a process [19]. The integrated SPO framework makes up a complete model for problem solving and includes goals and values in addition to structure, process, outcome and personality. According to Hohmann, personality includes:

- Mental set: individuals’ implicit expectations or beliefs that they bring to a problem-solving task.
- Self-efficacy: a person’s beliefs about their capacity to continually engage in problem solving.
- Assertive/passive: an assertive personality is more likely to cope with a chaotic process, whereas a passive personality is more likely to prefer a structured process.
- Tolerance of anxiety: a person’s apprehension about the future state of an outcome. PP is claimed to reduce an individual person’s anxiety, but some people might not need this support at all.
- High/low tolerance for ambiguity: indicates whether or not a person reacts when ambiguity appears.
Theories in group dynamics also include socially motivated factors such as the need for affiliation, intimacy and power [16]. These personality types may influence the relative importance of the dependent variable “trust and morale”, compared with the other dependent variables, such as cost. One important challenge for a project manager is to decide what type of personalities the team should, or does, consist of. Some experience reports [12, 34, 46] have attempted to collect this kind of information.

Roles

It has been suggested that how work done in pairs should be structured is a problem for PP [29]. This is in line with the results of our industrial case study, in which the pair did not know how to make the collaboration efficient [17]. They did not have guidelines for when they should switch roles. Furthermore, since the programmers seldom switched roles, the navigator often became passive because he did not know the programming language as well as the driver. There is also anecdotal evidence that passivity is a possible outcome of the strict role definitions in PP [12, 18]. People tend to free ride when their contributions are combined in a single product and no one is monitoring the size of each person’s contribution [16]. They do less than their share of the work, but they still take an equal share of the group’s rewards. Frequent role switching might prevent free riding.

Communication

Up to 70 percent of the total time used in software development projects is claimed to be spent on design meetings, e.g., resolving problems within the team, resolving misunderstandings about the specification and other aspects of communication [38]. DeMarco and Lister reported that software developers generally spend 30% of their time working alone, 50% of their time working with one other person, and 20% of their time working with two or more people [11]. Communication is thus one of the most important factors when trying to improve a software engineering practice. Müller and Tichy point to the need for a better understanding of how best to structure pair interaction [29]. The communication within a pair may determine the main benefits of PP. For example, if the pairs do not communicate much, verbally or non-verbally, then the only remaining difference between PP and individual programming is code review on-the-fly. By observing the communication in efficient pairs one can probably identify the kind of knowledge that seems to be communicated. The communication of knowledge can, for example, be divided into domain, design, syntactical and architectural knowledge.

Switching partners

There is an issue as to when the pair should switch partners, work individually or break up.
XP prescribes frequent switching of partners in a pair. The argument is that information and knowledge will be spread throughout the whole team [4, 43]. Switching partners may also prevent groupthink, which is “a distorted style of thinking that renders group members incapable of making a rational decision” [16]. Groupthink occurs when people in a group try so hard to come to an agreement that they fail to look for better alternatives. However, Williams et al. suggest that pairs become more efficient over time [44]. Thus, there is a trade-off between achieving efficiency and preventing groupthink. Similarly, according to Tuckman’s group development stages, people in a group tend to be more effective over time [16, 40]. The model states that a group typically goes through the following five stages: (1) orientation (forming), (2) conflict (storming), (3) structure (norming), (4) work (performing), and (5) dissolution (adjourning). The stages indicate that the pair should work together for a while before they eventually switch partners or break up. Conversely, Müller and Tichy observed that students in their case study learned from each other, but that the benefit levelled off over time [29]. This indicates that, if the goal is knowledge sharing, the pairs should switch partners quite often.

Switching partners is the same as joining and leaving a group. According to the minimax principle, people will join groups that provide them with the maximum number of valued rewards while incurring the fewest number of possible costs [16]. Thus, an alternative strategy for switching partners might be to leave it up to each pair, rather than prescribing a fixed interval for partner switching. In either case, the strategy employed regarding partner switching is an important context variable, which may have significant impact on the performance of the pairs.

4.3.2. Task Variables

The tasks to be solved and related development activities are probably the most important context variables when trying to assess when and how different levels of collaboration are beneficial.

Type of development activity

PP emphasizes not only coding, but also design and testing. Moløkken and Jørgensen found that estimates made by a group as a whole tend to be less optimistic than those made by its members individually [28]. The reason for this was stated to be that people acting together as a group became aware of factors and tasks that they did not think of as individuals acting alone (ibid.). So far, existing studies on PP have only focused on the coding activity, but there is a further issue as to whether or not PP has the same benefits for other activities, such as design.
Type of task A task can be described according to its size, complexity and duration [36]. It has been claimed that complex tasks are better solved in pairs than individually because two heads are better than one [10, 44]. This conflicts with results from group dynamics. Bond and Titus reported, based on 241 studies of approximately 24,000 persons, that performing trivial tasks in groups resulted in higher efficiency, but not better quality, than performance of the same tasks by individuals [5]. They called this social facilitation. By contrast, solving complex tasks in groups resulted in both lower performance and poorer quality. Similar results are reported in [47]. Zajonc defined a trivial task as one that is based upon dominant responses. Dominant responses are well-trained and instinctive responses. Complex tasks are based on non-dominant responses, which are unknown or untrained responses. Zajonc stated further that complex tasks are best performed individually, but that training in presentation and repetition is well suited to groups.

4.3.3. Environmental Variables In a realistic industrial environment the tasks are usually performed according to formal or informal processes with different tools and methods. In addition to the layout of the workspace, these environmental factors constitute the third category of context variables.

Software development process PP will most likely be part of a software development process, e.g., XP or Rational Unified Process (RUP). The software development process will place limitations on and/or support different activities that might affect the PP activity. Typically, RUP is a more standardised development process than is, for example, XP. Another difference could be how much emphasis the development process places on the design activity. RUP has a stronger focus on formal design than XP. One of the experiments on PP compared the Personal Software Process (PSP) with the Collaborative Software Process (CSP) [45]. The study conducted by Müller and Tichy used PP within a slightly reduced XP development process [29]. The work process within which PP is used may influence the outcome of the dependent variables. Steiner’s law of group productivity suggests that the work process in which the work is conducted is of great importance [37].

Software development tools When studying the efficiency of collaborative programming, Flor and Hutchins noted that the CASE tools used were important [15]. Software development tools also differ in usability and the functionality they offer, and thus constitute an important context variable when assessing the relative benefits of PP and other levels of programmer collaboration.
Using pen and paper versus real software development tools can lead to quite different results [36].

Workspace facilities PP prescribes the use of just one computer and one screen. Müller and Tichy found that 75 percent of the students questioned whether a single display was sufficient [29]. They proposed at least one additional screen on which they could have the programming interface and documentation available without switching from application to application. Furthermore, in our industrial case study the programmers thought that a flip chart or a whiteboard with Post-it notes could further ease the collaboration setting [17]. A longitudinal field study was conducted within a large organization to investigate the effects of relocating employees from traditional offices to open offices [6]. Such a workspace facility would be typical of XP and of team collocation (Figure 2). The results were consistent with other related case studies and showed reduced satisfaction with the physical environment, increased physical stress, deterioration of team member relations, and lower job performance among the employees. A similar study of three hours' exposure to simulated low-intensity open-office noise concluded that the noise did not result in elevated stress, but lowered the task performance, which is indicative of reduced motivation [14].

4.4. Discussion of Key Issues The proposed research framework presents variables which, based on existing studies on PP, literature from group dynamics and the authors’ own ongoing studies, seem important regarding PP. It is, however, difficult to assess a priori which variables are the most important or influential when conducting studies on PP; more empirical studies have to be conducted to demonstrate the usefulness of the different metrics. The metrics used in the existing studies are different and inconsistent and thus difficult to compare. To enable meta-analysis of different studies (and types of studies) it is important to explicitly describe what is actually measured. The research framework is based upon the idea that none of the collaboration levels is “better” than the others in general. That is, some software tasks might be solved more efficiently individually and others collaboratively. The focus of the framework is to reveal factors which might influence when and how PP is beneficial compared with the other levels of programmer collaboration. The main challenge in answering the “when” question will be to assess the effect of the different levels of programmer collaboration restricted by the context variables. Regarding the “how” question, an important issue will be how to construct well-working pairs. This will depend on several of the context variables presented in the research framework (e.g., all the subject variables).
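As an illustration of what explicitly describing the measured variables could look like in practice, the following minimal sketch records each study's independent, dependent and context variables in a uniform structure, and then groups studies that agree on selected context variables before comparing them. The record layout, the field names and the study labels S1 to S3 are hypothetical placeholders chosen for this example; they are not part of the framework and do not represent results from any of the cited studies.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class StudyRecord:
    """One empirical study, described with illustrative (assumed) field names."""
    study_id: str
    collaboration_level: str            # independent variable, e.g. "individual" or "pair"
    measured: Tuple[str, ...]           # dependent variables the study reports
    context: Dict[str, str] = field(default_factory=dict)  # context variables


def comparable_groups(studies: List[StudyRecord],
                      context_keys: List[str]) -> Dict[tuple, List[StudyRecord]]:
    """Group studies whose values agree on the chosen context variables."""
    groups = defaultdict(list)
    for study in studies:
        key = tuple(study.context.get(k) for k in context_keys)
        groups[key].append(study)
    return dict(groups)


def shared_measures(studies: List[StudyRecord]) -> Set[str]:
    """Dependent variables reported by every study in the list."""
    sets = [set(s.measured) for s in studies]
    return set.intersection(*sets) if sets else set()


# Placeholder records only; the values below are invented labels, not findings.
studies = [
    StudyRecord("S1", "pair", ("duration", "defects"),
                {"activity": "coding", "task_complexity": "complex"}),
    StudyRecord("S2", "individual", ("duration",),
                {"activity": "coding", "task_complexity": "complex"}),
    StudyRecord("S3", "pair", ("duration", "satisfaction"),
                {"activity": "design", "task_complexity": "trivial"}),
]

groups = comparable_groups(studies, ["activity", "task_complexity"])
for key, members in groups.items():
    print(key, [s.study_id for s in members], sorted(shared_measures(members)))
```

Under these assumptions, only studies that fall in the same group and share at least one dependent variable would be candidates for direct comparison; which context variables to group on remains a judgement for the researcher.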

5. Conclusions and Future Directions In this paper we have presented existing claims about PP. Results of existing empirical work on PP regarding the claims are apparently contradictory. A possible explanation of the apparently conflicting results is the lack of a theoretical framework to support the research. As a first step towards solving this problem we have developed an initial framework for research on PP based on existing studies on PP, preliminary results from ongoing studies conducted by the authors, and theories from group dynamics. The framework attempts to identify and categorise important independent, dependent and context variables for empirical studies on PP. It also provides an initial understanding of the relationships between the variables. To obtain a better understanding of the relationships between the different variables in the framework, and to assess the key issues, it will be necessary to combine the results of many empirical studies. Ideally, the studies should include case studies in industry, surveys investigating how the techniques are used in practice, action research studies to directly improve the practice, and controlled experiments to study cause-effect relationships. By conducting different types of studies it will also be possible to investigate which research methods best answer our questions.

Acknowledgement This research is part of PROFIT (PROcess improvement For IT industry) funded by the Research Council of Norway. We thank Objectnet for participating. We also thank the anonymous reviewers of this paper for valuable comments.

References
[1] V. R. Basili, R. W. Selby, and D. H. Hutchens, "Experimentation in Software Engineering," IEEE Transactions on Software Engineering, vol. SE-12, no. 7, pp. 733–743, 1986.
[2] V. R. Basili, F. Shull, and F. Lanubile, "Building Knowledge through Families of Experiments," IEEE Transactions on Software Engineering, vol. 25, no. 4, pp. 456–473, 1999.
[3] K. Beck, "Embrace Change with Extreme Programming," IEEE Computer, vol. 32, no. 10, pp. 70–77, 1999.
[4] K. Beck, Extreme Programming Explained: Addison-Wesley, 2000.
[5] C. F. Bond and L. J. Titus, "Social facilitation: A meta-analysis of 241 studies," Psychological Bulletin, no. 94, pp. 265–292, 1983.
[6] A. Brennan, J. S. Chugh, and T. Kline, "Traditional Versus Open Office Design. A Longitudinal Field Study," Environment and Behaviour, vol. 34, no. 3, pp. 279–299, 2002.
[7] F. P. Brooks, The Mythical Man-Month. Essays on Software Engineering: Addison-Wesley Publishing Company, 1975.
[8] A. Cockburn and L. Williams, "The Costs and Benefits of Pair Programming," in Extreme Programming Examined, G. Succi and M. Marchesi, Eds.: Addison Wesley, 2001.
[9] L. L. Constantine, Constantine on Peopleware. New Jersey: Prentice-Hall, 1995.
[10] J. O. Coplien, "A Generative Development-Process Pattern Language," in Pattern Languages of Program Design, J. O. Coplien and D. C. Schmidt, Eds. Massachusetts, USA: Addison-Wesley, 1995, pp. 183–237.
[11] T. DeMarco and T. Lister, Peopleware: Productive Projects and Teams, 2 ed. New York: Dorset House Publishing, 1999.
[12] A. J. Dick and B. Zarnett, "Paired Programming & Personality Traits," proc. 3rd International Conference on eXtreme Programming and Agile Processes in Software Engineering (XP 2002), 2002.
[13] T. Dybå, "Enabling Software Process Improvement: An Investigation of the Importance of Organizational Issues," Norwegian University of Science and Technology, Trondheim, Ph.D., 2001.
[14] G. W. Evans and D. Johnson, "Stress and Open-Offices Noise," Journal of Applied Psychology, vol. 85, no. 5, pp. 779–783, 2000.
[15] N. V. Flor and E. L. Hutchins, "Analyzing Distributed Cognition in Software Teams: A Case Study of Team Programming During Perfective Software Maintenance," proc. Fourth Workshop on Empirical Studies of Programmers, pp. 36–64, 1991.
[16] D. R. Forsyth, Group Dynamics, 3 ed: Wadsworth Publishing Company, 1999.
[17] H. Gallis, E. Arisholm, and T. Dybå, "A Transition from Partner Programming to Pair Programming - an Industrial Case Study," Pair Programming Work Shop in 17th Annual ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), 2002.
[18] M. M. Hohman and A. S. Slocum, "Mob Programming and the Transition to XP," proc. First XP Universe Conference, 2001.
[19] L. Hohmann, Journey of the Software Professional - A Sociology of Software Development. New Jersey, USA: Prentice-Hall, 1997.
[20] J.-C. Hong and M.-C. Liu, "A study on thinking strategy between experts and novices of computer games," Computers in Human Behaviour, vol. 19, no. 2, pp. 245–258, 2003.
[21] ISO 8402, "Quality Management and Quality Assurance Quality vocabulary," International Organization for Standardization, 1994.
[22] ISO 9126, "Information Technology: software product evaluation: quality characteristics and guidelines for their use," International Organization for Standardization, 1992.
[23] D. R. Jeffery and L. G. Votta, "Guest Editor's Special Section Introduction," IEEE Transactions on Software Engineering, vol. 25, no. 4, pp. 435–437, 1999.
[24] B. Kitchenham, Software Metrics. Measurement for Software Process Improvement: Blackwell Publishers Inc., 1996.
[25] B. A. Kitchenham, S. L. Pfleeger, L. M. Pickard, P. W. Jones, D. C. Hoaglin, K. E. Emam, and J. Rosenberg, "Preliminary Guidelines for Empirical Research in Software Engineering," IEEE Transactions on Software Engineering, vol. 28, no. 8, pp. 721–734, 2002.
[26] R. M. Lindsey and A. S. C. Ehrenberg, "The Design of Replicated Studies," The American Statistician, vol. 47, no. 3, pp. 217–228, 1993.
[27] C. McDowell, L. Werner, H. Bullock, and J. Fernald, "The Effects of Pair-Programming on Performance in an Introductory Programming Course," proc. Proceedings of the 33rd SIGCSE technical symposium on Computer science education, pp. 38–42, 2002.
[28] K. Moløkken and M. Jørgensen, "Software Effort Estimation: Unstructured Group Discussion as a Method to Reduce Individual Bias," proc. 15th Annual Workshop of the Psychology of Programming Interest Group, 2003.
[29] M. M. Müller and W. F. Tichy, "Case Study: Extreme Programming in a University Environment," proc. International Conference on Software Engineering (ICSE), pp. 537–544, 2001.
[30] J. Nawrocki and A. Wojciechowski, "Experimental Evaluation of Pair Programming," proc. European Software Control and Metrics (Escom), 2001.
[31] J. T. Nosek, "The Case for Collaborative Programming," Communications of the ACM, vol. 41, no. 3, pp. 105–108, 1998.
[32] S. L. Pfleeger, "Albert Einstein and Empirical Software Engineering," IEEE Computer, vol. 32, no. 10, pp. 32–38, 1999.
[33] H. D. Rombach, V. R. Basili, and R. W. Selby, "Experimental Software Engineering Issues: Critical Assessment and Future Directions," proc. Dagstuhl Workshop, 1992.
[34] D. Sanders, "Student Perceptions of the Suitability of Extreme and Pair Programming," proc. First XP Universe Conference, 2001.
[35] J. Singer and N. G. Vinson, "Ethical Issues in Empirical Studies of Software Engineering," IEEE Transactions on Software Engineering, vol. 28, no. 12, pp. 1171–1180, 2002.
[36] D. I. K. Sjøberg, B. Anda, E. Arisholm, T. Dybå, M. Jørgensen, A. Karahasanovic, E. F. Koren, and M. Vokác, "Conducting Realistic Experiments in Software Engineering," proc. 2002 International Symposium on Empirical Software Engineering (ISESE'02), pp. 17–26, 2002.
[37] I. D. Steiner, Group process and productivity. New York: Academic Press, 1972.
[38] S. D. Teasley, L. A. Covi, M. S. Krishnan, and J. S. Olson, "Rapid Software Development through Team Collocation," IEEE Transactions on Software Engineering, vol. 28, no. 7, pp. 671–683, 2002.
[39] W. F. Tichy, "Should Computer Scientists Experiment More?," IEEE Computer, vol. 31, no. 5, pp. 32–40, 1998.
[40] B. W. Tuckman, "Development sequences in small groups," Psychological Bulletin, no. 63, pp. 384–399, 1965.
[41] J. Voas, "Faster, Better, and Cheaper," IEEE Software, vol. 18, no. 3, pp. 96–97, 2001.
[42] G. M. Weinberg, The Psychology of Computer Programming. New York: Van Nostrand Reinhold Company, 1971.
[43] L. Williams and R. Kessler, Pair Programming Illuminated: Addison-Wesley, 2002.
[44] L. Williams, R. R. Kessler, W. Cunningham, and R. Jeffries, "Strengthening the Case for Pair Programming," IEEE Software, vol. 17, no. 4, pp. 19–25, 2000.
[45] L. A. Williams, "The Collaborative Software Process," University of Utah, Utah, USA, PhD, 2000.
[46] L. A. Williams and R. R. Kessler, "All I Really Need to Know About Pair Programming I learned in Kindergarten," Communications of the ACM, vol. 43, no. 5, pp. 108–114, 2000.
[47] R. B. Zajonc, "Social facilitation," Science, no. 149, pp. 269–274, 1965.
[48] M. V. Zelkowitz and D. R. Wallace, "Experimental Models for Validating Technology," IEEE Computer, vol. 31, no. 5, pp. 23–31, 1998.

