A Pair Programming Experiment in a Large Computing Course

Computer Science Department of The University of Auckland CITR at Tamaki Campus (http://www.citr.auckland.ac.nz/)

CITR-TR-122

Jan 2003

A Pair Programming Experiment in a Large Computing Course Radu Nicolescu 1 and Robert Plummer 2

Abstract Pair programming is a methodology where two programmers sit at the same machine to code - one typing and the other “navigating”. In an advanced course on Distributed Computing, we used conventional “solo” (i.e., individual) programming for the first three home assignments, and allowed the students to pair freely on the last two home assignments, over a relatively short period. A midterm exam was given before pair programming was introduced, and a final exam was given at the end of the course. This approach allowed us to contrast practical results and exam performance before and after pair programming. We report on an after-the-fact survey, which shows strong positive attitudes toward the paired approach. We also analyze the marks of the assignments, the midterm exam, and the final exam. Students who paired performed significantly better on assignments but didn’t improve their scores on written examinations. Overall, the results seem to support the idea that pair programming has a positive role in the classroom, but more experience seems necessary, to answer some still open questions.

1 Department of Computer Science, The University of Auckland, Auckland, New Zealand

2 Department of Computer Science, Stanford University, Stanford, CA, USA

You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format, BUT this permission is only for a period of 45 (forty-five) days from the most recent time that you verified that this technical report is still available from the CITR Tamaki web site under terms that include this permission. All other rights are reserved by the author(s).

A Pair Programming Experiment in a Large Computing Course
Radu Nicolescu, Department of Computer Science, The University of Auckland, Auckland, New Zealand
Robert Plummer, Department of Computer Science, Stanford University, Stanford, CA, USA
January 2, 2003


1 Introduction

Extreme Programming (XP) [1] is an emerging methodology for software development that encompasses the entire development process: planning, coding, testing, etc. The proponents of XP claim impressive gains in productivity as compared to traditional approaches. For course assignments, however, the full XP methodology is often not relevant, since much of the planning and design is specified in the assignment handout. One of the revolutionary aspects of XP is that all coding is done by programmers working together in pairs, and this aspect of the approach is beginning to attract considerable attention as a methodology for programming assignments in Computer Science courses [2, 3, 4, 5, 8, 10, 11, 12].

Pair programming works as follows:

• All of the work is done by both team members working together at a single PC. No work is done individually.
• At any given moment, one person is at the keyboard and is known as the driver. The other person is observing and is known as the navigator.
• The driver and navigator collaborate on all aspects of the software development: design, coding, debugging, etc.
• The driver does the actual typing of the code.
• The navigator is thinking about broader issues: How does the current code fit into the larger picture? How should the code be tested? What should be done next? Are any discoveries being made that require changes to code already written?
• The navigator also observes the code being written and checks it for defects, but the primary role of the navigator is not to check for syntax errors. This role is called the navigator because the job is to steer the software development in the right direction.
• The driver and navigator are in constant communication, asking and answering questions of each other and brainstorming on the best way to solve the problem.
• Periodically, the driver and navigator change roles. This is a crucial part of the methodology: the partners spend equal amounts of time driving and navigating.

One interesting aspect of pair programming for coursework is that it goes against much of what students are typically told about collaboration. Most courses insist on individual work, and collaboration is labeled as cheating. With pair programming, there is always a partner to talk to, and collaboration is required rather than discouraged. One hope is that one partner is enough, and that incidents of cheating will be greatly reduced.

2 Prior research

A number of studies have been conducted using pair programming for course assignments. Some of these have been carried out in the introductory programming course [2, 4, 5, 11, 12], while others have taken place in more advanced courses [3, 8, 10]. In all of these studies, the approach has been to divide the students into a group that pairs and a group that doesn’t, and to maintain these groups throughout the course. The results of these studies are remarkably consistent. Comparing students who worked in pairs to those who used traditional “solo” programming, researchers find the following:

Pair programming improves software quality. Measured by percentage of test cases passed [3, 10] or scores on assignments [4, 11], pair programming has been found to improve the quality of the software produced. Table 1 summarizes results obtained in a senior software engineering course at the University of Utah [10], where the numbers indicate the percentage of test cases passed. The differences between the two groups were statistically significant, with an α probability of less than 1% that they could have occurred by chance.

Table 1: Percentage of test cases passed [10]

             Individuals   Pairs
Program 1    73.4          86.4
Program 2    78.1          88.6
Program 3    70.4          87.1
Program 4    78.1          94.4

Table 2 summarizes results obtained in an introductory programming course at the University of California Santa Cruz [4], where the numbers indicate overall assignment scores. Since one might wonder whether some pairs involved one strong and one weak student, and if so, whether the work of the stronger student produced the result, the mean for the top half of the non-paired students was also computed. The α level of statistical significance for all these results was 0.1%.

Table 2: Overall assignment scores [4]

                         Mean   Median   Std. Deviation
Pairing (all)            86.3   88.0     13.9
Non-pairing (all)        67.0   68.0     21.4
Non-pairing (top half)   77.1   80.0     19.6

Pair programming produces software in less elapsed time. The total number of programmer hours spent by a pair has been found to be about 15% greater than that of a solo programmer, at least after the first warm-up project [10]. Since the pair partners are working at the same time, this means that the elapsed time is only slightly more than half (about 57.5%) of that of a solo programmer doing the same task. In industrial settings, even larger gains have been reported [1].

Pair programming does not significantly affect written examination scores. Although the results are somewhat mixed, the general finding is that students who work in pairs get about the same scores on final exams as students who work on their own [4, 5]. Differing drop rates and differing abilities (as measured by SAT scores) complicate the analysis, but it appears that working in pairs leads to better software but not better written examination scores.
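The 57.5% figure quoted above follows from simple arithmetic: a pair spends about 15% more programmer hours in total, but the two partners work simultaneously. A minimal sketch of that calculation is given below; the function name `pair_elapsed_fraction` is ours, introduced only for illustration.

```python
def pair_elapsed_fraction(extra_effort: float, team_size: int = 2) -> float:
    """Elapsed time of a team relative to a solo programmer.

    extra_effort is the additional total programmer time the team spends,
    as a fraction of the solo effort (0.15 means 15% more programmer hours).
    """
    total_effort = 1.0 + extra_effort   # programmer hours, solo effort = 1.0
    return total_effort / team_size     # partners work at the same time


fraction = pair_elapsed_fraction(0.15)
print(f"Elapsed time relative to solo: {fraction:.1%}")
```

With the 15% overhead reported in [10], the result is 1.15 / 2 = 0.575 of the solo elapsed time.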

3 Experiment

Encouraged by the reported results, we decided to use pair programming in CS335ST, Distributed Programming, at the University of Auckland. This was a large third-stage class, with 240 enrolled students, of which 185 passed and the rest failed for a number of reasons (didn’t sit the final exam, didn’t pass the combined midterm plus final examinations, or just didn’t obtain enough marks). The course marks were divided between five home assignments during the semester, a midterm written exam, and a final written exam, as described in Table 3 (rows in chronological order).

Table 3: Course assessment

                      Course Marks   Submissions
A1 (assignment #1)    5%             233
A2 (assignment #2)    5%             223
A3 (assignment #3)    1%             211
Midterm               20%            230
A4 (assignment #4)    5%             225
A5 (assignment #5)    5%             217
Final                 60%            223

A3 was an optional “mini-assignment” bonus that did not contribute significantly to the course grade; it carried only 1% of the total marks and is not included in our analysis. Midterm is an in-class written examination that was administered before the pair programming experiment began. Final is the final written examination at the end of the course, and was split into two parts: Final1 (final part 1, including topics related to A1 and A2), and Final2 (final part 2, including topics related to A4 and A5). All students worked solo on the first three assignments, and they were allowed to work in pairs on A4 and A5 if they wished. The experiment covered only these last two assignments, over a 24-day period at the end of the course.

Below we have divided the students into groups, based on whether they decided to pair on A4 and/or A5. Our experiment differed from others in four important ways. First, all students worked on their own as solo programmers for the first three assignments, and took the midterm exam after the third assignment was complete. This gives us an interesting look at the relative abilities of the two groups. In other studies, students who paired did so for the entire term.

A second difference from prior work was the way in which pairs were formed. For the last two assignments, students were allowed to work in pairs if they wished, and if so, they chose their own partners. Other researchers have assigned partners, either randomly or based on preliminary student preferences [4, 11]. We took the self-selection approach for the following reasons:

• We anticipated that some students would prefer to work on their own, and since the assignments were part of their course grade, we felt that we should provide that option.
• Successful pair programming depends on the partners having compatible personalities. We felt the best way to achieve that was to let the students select their own partners.
• Students who work as a pair must have compatible schedules. Self-selection of partners again seemed like the best approach.

A third and a fourth difference were that our pair programming experiment covered only a relatively short period of time (exactly 24 days), and only independent home assignments, over which we don’t have the same control as in a supervised lab experiment. It should be noted that in this course, as in many other CS courses, the assignments are regarded more as incentives for students to do practical exercises and less as a primary means of assessment. Assignments count for 20% of the total course grade (and a few small bonuses are typically offered). Thus our approach itself was a bit “extreme” on these factors. While it would certainly be interesting to find out how XP works in more “extreme” cases, our results should be taken with a grain of salt, and we refrain from overly generalizing from our experience.

136 students (56.67%) chose to work in pairs on A4, and again 136 students (56.67%) chose to pair on A5. Most students who chose to work in a pair or solo on A4 continued to work in the same way on A5, but quite a few students changed camps or even pair partners. Their reasons for doing so are summarized below in Section 4. The breakdown of these numbers is detailed in Table 4. The next two sections summarize the results.
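The pairing percentages and the margins of Table 4 can be cross-checked with a few lines of Python. This is only a sketch: the counts are transcribed from Table 4, while the dictionary layout and variable names are our own.

```python
# Counts transcribed from Table 4: rows are the A4 choice, columns the A5 choice.
# N = did not submit, S = solo, P = pair.
table4 = {
    "N4": {"N5": 14, "S5": 1, "P5": 0},
    "S4": {"N5": 3, "S5": 62, "P5": 24},
    "P4": {"N5": 6, "S5": 18, "P5": 112},
}

row_totals = {row: sum(cells.values()) for row, cells in table4.items()}
col_totals = {
    col: sum(cells[col] for cells in table4.values())
    for col in ("N5", "S5", "P5")
}
enrolled = sum(row_totals.values())

share_p4 = row_totals["P4"] / enrolled   # fraction of the class pairing on A4
share_p5 = col_totals["P5"] / enrolled   # fraction of the class pairing on A5

# Students who kept the same working mode on both assignments, and those who switched.
stayed = table4["S4"]["S5"] + table4["P4"]["P5"]
switched = table4["S4"]["P5"] + table4["P4"]["S5"]

print(f"enrolled={enrolled}, paired on A4={share_p4:.2%}, paired on A5={share_p5:.2%}")
print(f"same mode on A4 and A5: {stayed}, switched mode: {switched}")
```

The row and column totals reproduce the "Total" margins of Table 4, and both pairing shares come out as 136 of 240 students.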

Table 4: A4/A5 breakdown: pairs, solos, did not submit

                          N5 (did not submit A5)   S5 (solo on A5)   P5 (pair on A5)   Total
N4 (did not submit A4)    14                       1                 0                 15
S4 (solo on A4)           3                        62                24                89
P4 (pair on A4)           6                        18                112 (a)           136
Total                     23                       81                136               240

(a) 100 of these 112 students kept the same partners.

4 Survey: Attitudes Toward Pair Programming

During the course of the last assignment (A5), a survey was distributed to the students. 104 responses were obtained: 67 from students who paired on A4, and 37 from students who worked solo on A4.

4.1 Responses from students who worked solo

One of the questions asked the students who worked on their own to indicate the reasons for that choice, with results detailed in Table 5 (multiple reasons could be selected).

Table 5: Top reasons for solos

I prefer working on my own                          30%
I was worried about scheduling                      22%
I couldn’t find a partner                           12%
I was afraid that a partner would slow me down      12%
I was afraid I would not learn some key concepts    12%
Miscellaneous other reasons                         12%

The most common reason cited was simply that the students preferred working on their own. Since we did allow that option, we are unable to report how that attitude might have changed had those students been required to try pair programming. We also asked two questions that were answered by placing a mark on a five-point scale. The questions, the labels on the scale, and the average response are shown in Table 6. A response of 1.0 would be at the lower, or negative end of the scale, a response of 3.0 would be neutral, and a response of 5.0 would be at the higher, or positive end.

Table 6: Questions for solos

Question                                               Low end label      High end label     Average response
Do you think you learned more or less than you         Much less          Much more          3.25
would have if you had worked with a partner?
How do you think the time that you personally          Solo much slower   Solo much faster   2.94
spent on this project compares to the time it
would have taken you to do it with a partner?

Consistent with the fact that these students chose to work on their own, they felt that they learned slightly more than they would have with a partner. They also felt that working solo took about the same amount of time that working in pairs would have taken.

4.2 Responses from students who worked in pairs

We asked several questions of the students who paired. We were hoping to capture their feelings about pair programming and how well it worked in the class assignment setting. The questions used a five-point scale as above. The results are shown in Table 7. These results show a very positive attitude toward pair programming among those who tried it. Those students felt that working in pairs was effective for class assignments and that they were very close to achieving the goal of sharing the workload equally. The paired students also felt that they learned more than they would have on their own, and this was a stronger response than we obtained from the students who worked solo (3.75 vs. 3.25). However, our question wasn’t clear enough to differentiate between practical skills and theoretical understanding. Finally, these students felt that pair programming was definitely faster than working solo.

Table 7: Questions for pairs

Question                                               Low end label       High end label      Average response
How effective do you think pair programming was        Not effective       Very effective      4.11
for this project?
Did you and your partner contribute equally to         Very unequal        Equal               4.28
the project?
What is your rating of your own performance?           Did very little     Did most of work    3.20
What is your rating of your partner’s performance?     Did very little     Did most of work    3.22
Do you think you learned more or less than you         Much less           Much more           3.75
would have if you had worked on your own?
How do you think the time that you personally          Pairs much slower   Pairs much faster   4.23
spent on this project compares to the time it
would have taken you to do it on your own?

5 Student scores on assignments and exams

We begin our analysis of student scores with three tables that show preliminary statistics: means (Table 8), standard deviations (Table 9), and sample sizes (Table 10). Data are provided for A1, A2, A4, A5, Midterm, and Final, as seen in the first six columns of the tables. In the rows, we have shown the results for several samples. The first five rows represent all students together, those who worked solo on A4, those who paired on A4, those who worked solo on A5, and those who paired on A5, respectively. The final four rows show data from intersections of these groups. For example, S4P5 represents the students who worked solo on A4 but who paired on A5 (we keep the same abbreviations used in Table 4). Students who didn’t submit are not included in these statistics.

The “Count(s)” column shows the range of submissions for each group. For example, in Table 10, row P5 (students who paired on A5), the count is shown as 132–136. This means that for all the data on that row, the smallest number of submissions was 132, and the largest was 136. We provide this information to show that even though the sample sizes vary, the range is small in any given row.

Table 8: Breakdown of course marks (means)

        A1       A2      A4      A5      Midterm   Final   Count(s)
All     93.24    90.15   90.22   86.44   46.02     46.35   217–233
S4      96.28    91.61   88.52   84.16   48.29     50.00   85–89
P4      93.16    90.76   91.34   87.89   44.89     44.30   130–136
S5      93.88    89.77   90.37   80.70   49.13     49.73   78–81
P5      96.32    92.96   90.93   89.86   45.33     44.79   132–136
S4S5    94.50    90.92   89.05   80.76   49.60     50.24   59–62
S4P5    102.46   93.93   88.75   92.92   46.14     48.98   23–24
P4S5    93.65    87.07   94.91   79.71   46.89     47.82   18
P4P5    95.00    92.74   91.39   89.21   45.16     43.93   108–112

Table 9: Breakdown of course marks (standard deviations)

        A1      A2      A4      A5      Midterm   Final   Count(s)
All     17.02   21.85   12.30   20.25   14.57     16.01   217–233
S4      13.54   19.06   15.88   22.36   13.90     15.39   85–89
P4      17.36   21.58   9.14    18.74   14.73     15.82   130–136
S5      15.42   21.15   13.04   23.34   13.61     16.04   78–81
P5      13.50   19.04   10.94   17.37   14.81     15.56   132–136
S4S5    14.39   19.64   14.12   23.23   13.22     15.64   59–62
S4P5    7.65    18.49   17.59   17.46   15.60     15.23   23–24
P4S5    17.52   26.02   6.81    24.79   15.19     18.05   18
P4P5    14.13   19.23   8.96    17.36   14.71     15.55   108–112

Table 10: Breakdown of course marks (counts)

        A1    A2    A4    A5    Midterm   Final   Count(s)
All     233   223   225   217   230       223     217–233
S4      87    85    89    86    89        85      85–89
P4      136   132   136   130   134       134     130–136
S5      80    78    80    81    81        80      78–81
P5      136   132   136   136   134       135     132–136
S4S5    61    59    62    62    62        61      59–62
S4P5    24    24    24    24    24        23      23–24
P4S5    18    18    18    18    18        18      18
P4P5    112   108   112   112   110       112     108–112

The data show some interesting results. For example, students who paired on both assignments (P4P5) had means on those assignments of 91.39 and 89.21 respectively, while students who worked solo on both (S4S5) had inferior results, with means of 89.05 and 80.76. In order to determine the statistical significance of such comparisons, the preliminary statistics were further processed to obtain t-scores and their associated “degrees of freedom”, and then the α significance levels for the inequalities in the means. The formulae we used are given in the appendix.

Table 11 shows a few of the computed significance levels, omitting any entry containing α error probabilities greater than 0.1 (10%). We have added +/- signs to show which of two tested means was greater. For example, the cell at the intersection of the “S5 vs. P5” row with the “A5” column contains -0.0027. This means that for A5, the difference in the means was statistically significant at the level of α = 0.0027, and that the mean for the S5 group was less than that of the P5 group. As we see from Table 8, the difference is also of practical relevance, a difference of 9.16 marks out of 100 (i.e., nearly a 10% quality improvement).

As another example, consider the comparison of “S4S5” with “P4P5”, i.e., the students who worked solo on the last two assignments compared with students who paired on both. Reading across the appropriate row in Table 11, we see that the differences in their means were not statistically significant for A1, A2, or A4. On A5, the pairs got significantly better scores, and, as shown in Table 8, the difference was of practical relevance: an improvement of almost 8.5 points out of 100. Continuing across in Table 11, we see the opposite trend on the Midterm and Final: students who worked solo did better in the examinations, and the difference was statistically significant. The comparisons of S5 with P5 show similar results.
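To illustrate how a t-score and its degrees of freedom can be obtained from the summary statistics in Tables 8 to 10, here is a minimal sketch. It assumes an unequal-variance (Welch) two-sample t-test with Welch-Satterthwaite degrees of freedom; this is our assumption for illustration and may differ from the formulae in the appendix. The function name `welch_t` is hypothetical.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for a two-sample t-test computed from summary statistics.

    Assumes unequal variances (Welch's test); df uses the
    Welch-Satterthwaite approximation.
    """
    v1 = sd1 ** 2 / n1                      # squared standard error, group 1
    v2 = sd2 ** 2 / n2                      # squared standard error, group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# S5 vs. P5 on A5: means from Table 8, standard deviations from Table 9,
# counts from Table 10.
t, df = welch_t(80.70, 23.34, 81, 89.86, 17.37, 136)
print(f"t = {t:.2f}, df = {df:.1f}")   # t is about -3.06, df about 132.9
```

The negative sign of t corresponds to the "-" sign in Table 11 (the S5 mean is lower than the P5 mean). The significance level α is then read from a Student's t distribution with the computed degrees of freedom.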
Not all large differences are statistically significant, because the standard deviations and the sample sizes also play a substantial role in estimating the statistical significance. For example, the A5 means for “P4S5” and “P4P5” differ by almost ten marks (79.71 vs. 89.21), but this difference is not statistically significant (we cannot exclude that it was generated by chance).

Table 11: Selected significance levels (signs show which mean is greater)

                  A1        A2   A4        A5        Midterm   Final
S4 vs. P4         +         +    -         -         +0.0817   +0.0090
S5 vs. P5         -         -    -         -0.0027   +0.0567   +0.0287
S5 vs. S4P5       -0.0004   -    +         -0.0080   +         +
S4S5 vs. S4P5     -0.0015   -    +         -0.0112   +         +
S4S5 vs. P4S5     +         +    -0.0180   +         +         +
S4S5 vs. P4P5     -         -    -         -0.0140   +0.0439   +0.0124
S4P5 vs. P4S5     +0.0605   +    -         +0.0642   -         +
S4P5 vs. P4P5     +0.0006   +    -         +         +         +
P4S5 vs. P4P5     -         -    +0.0640   -         +         +

The reader is invited to make other comparisons in Table 8, using Table 11 as a guide to statistical significance. We present below some of our own conclusions from this data. For brevity, if group X has a higher mean than group Y for some assignment or examination θ, we will write X ≫θ Y if the difference is statistically significant, and X >θ Y if it is not. For example, the S5 vs. P5 observation cited above could be written as P5 ≫A5 S5.

6 Conclusions

We have drawn several conclusions from our experiment:

Pair programming produces better practical results, for the assignments worked in pairs. For A4, pairs seem to obtain slightly better results:

P4 >A4 S4, P4S5 ≫A4 S4S5, P4P5 >A4 S4P5, P4P5 >A4 S4S5.

The results are consistent with our hypothesis, but not very significant statistically, with one exception. This exception concerns the top performer on A4, the group P4S5, and suggests that there could be a benefit even on the very first paired assignment. For A5, pairs appear to obtain much better results:

P5 ≫A5 S5, S4P5 ≫A5 S5, S4P5 ≫A5 S4S5, P4P5 ≫A5 S4S5, P4P5 >A5 P4S5.

These results are consistent with our hypothesis, are more statistically significant, and are practically relevant (for some population segments the difference is close to 10% in raw marks). Interestingly, the top performers for A5 are S4P5, a group that didn’t pair on A4 and paired for the first time on A5.

It might be interesting to compare the relative rankings (as determined by means on assignments) of the students who worked solo or paired on both A4 and A5 with those who paired only on one assignment. Not all differences in means are statistically significant, but these rankings seem to agree well with the preceding arguments. The results are shown in Figure 1.

Figure 1: The evolution of the relative rankings on assignments
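The relative rankings follow directly from the group means in Table 8. The sketch below recomputes them for the four intersection groups; the means are transcribed from Table 8, while the helper `rank_groups` and the data layout are ours.

```python
# Group means transcribed from Table 8 (rows S4S5, S4P5, P4S5, P4P5).
means = {
    "A1": {"S4S5": 94.50, "S4P5": 102.46, "P4S5": 93.65, "P4P5": 95.00},
    "A2": {"S4S5": 90.92, "S4P5": 93.93, "P4S5": 87.07, "P4P5": 92.74},
    "A4": {"S4S5": 89.05, "S4P5": 88.75, "P4S5": 94.91, "P4P5": 91.39},
    "A5": {"S4S5": 80.76, "S4P5": 92.92, "P4S5": 79.71, "P4P5": 89.21},
}


def rank_groups(scores):
    """Map each group to its rank, where 1 is the highest mean."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {group: position for position, group in enumerate(ordered, start=1)}


rankings = {assignment: rank_groups(scores) for assignment, scores in means.items()}

for assignment, ranks in rankings.items():
    line = ", ".join(f"{group}: {rank}" for group, rank in sorted(ranks.items()))
    print(f"{assignment}: {line}")
```

In this ranking, S4P5 is first on A1, A2 and A5 but last on A4, while P4S5 is first on A4 and last on the other three assignments.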

There are two particularly interesting groups. S4P5 were the best on A1 and A2, but chose not to pair on A4 and fell to the bottom of the chart. Then they “saw the light”, paired on A5, and again became the top-performing group. We see exactly the reverse in the group P4S5. These students did poorly on A1 and A2, but when they paired on A4 they became the best group. When they chose not to pair on A5, they returned to their previous level. Together, these results strongly suggest that there is a benefit to pair programming. These results are also consistent with the results reported in the literature [3, 8, 10], that pair programming produces significantly better assignment results.

Given a choice, weaker students are more likely to pair than stronger students. Our midterm exam was administered before any pairing took place. Looking at those results for paired vs. solo groups, we see the following:

S4 ≫Midterm P4, S5 ≫Midterm P5, S4S5 ≫Midterm P4P5.

The students who worked solo appear to be the stronger students. This is a plausible result, since one of the attractions of pair programming to students who have never tried it is that there will be someone to help answer questions. The stronger students would be expected to feel the need for help less. It is interesting to compare this result with the results just cited for assignment scores. In spite of being “weaker” (as measured by the midterm), the students who paired produced better assignments. Another way to measure the students’ ability is to look at their A1 and A2 scores. There the comparisons are mixed. In some cases the solo students were better, and in others the paired students were better, though no systematic pattern seems to emerge (and the differences in the means on A1 and A2 are not statistically significant when we compare S4 vs. P4 or S5 vs. P5).

Pair work doesn’t seem to have much effect on the performance on the written examination, but this should be subject to further study. Given the strong assignment results of the students who paired, we were hoping to see a corresponding effect on the final exam. In fact, we got the opposite result:

S4 ≫Final P4, S5 ≫Final P5, S4S5 ≫Final P4P5.

These results are exactly in keeping with those just cited for the midterm. The strong exam-takers continued to be strong, and participating in pair programming did nothing to narrow the gap (in fact, it widened it slightly, with one exception concerning S4P5). This seems consistent with the traditional strong correlation between midterm and final exam marks that is characteristic of many CS courses. By comparison, the correlation of the combined exam scores with the practical work (the assignments) is markedly weaker: good exam scores usually imply good assignment results, but good assignment scores do not imply a particular exam result.
The strong midterm/final correlation for all students is illustrated in Figure 2, and the weaker assignments/exam correlation for all students is illustrated in Figure 3. The scattergrams in these figures include all students this time, even those who did not sit the midterm or the final exam (we naturally assumed a mark of 0 in such cases). The strong midterm/final correlation and the weak assignment/exam correlation are both statistically significant, at the level α < 0.001.

Figure 2: Midterm/final scatterplot for all students

Figure 3: Assignments/final scatterplot for all students

We feel that further study of the effect of pair programming on exam performance is needed. Various additional hypotheses could be advanced to explain the better performance of the students who worked solo:

• Pairs may have learned a bit less (despite their claims to the contrary).
• The practical skills developed in the assignments are different from the skills required for the written examinations, or require more time to be properly assimilated (in our experiment the two pair assignments covered less than 4 weeks together).
• Our results might have been contaminated by some form of cooperation between some students other than pairing on A4 and A5. We cannot discard this hypothesis altogether.
• Our students may have been trained to work individually to the point that the good students are not prepared to try another methodology.

Figure 4 shows one additional bit of evidence: the breakdown of solos/pairs for A5 by hypothetical grades, solely based on midterm scores. This diagram corroborates the preceding argument that some good students were reluctant to pair and preferred to work solo, while some relatively weaker students were keener to try the new approach. For comparison, Figure 5 shows the actual breakdown of solos/pairs for A5 by final grades.

Figure 4: S5 (left) vs. P5 (right) chart by midterm “grades”

Figure 5: S5 (left) vs. P5 (right) chart by final grades

Curious results: open questions. We have noticed an interesting correlation in the A1 marks of P5 pair members (see Figure 6). That is, if we compare the A1 scores of the partners, we find a correlation that is significant at α < 0.001 (and that remains significant at α < 0.01 even after discarding outliers such as students with less than 60 marks). A similar conclusion holds for A2. One possible explanation is that students later chose partners of similar ability. An alternate explanation is that they were already collaborating in one form or another.

Figure 6: A1 scatterplot for all P5 pairs
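As an illustration of how such a partner correlation can be measured, the sketch below computes a Pearson correlation coefficient and the usual t statistic for testing whether it differs from zero. The choice of Pearson's r is our assumption, and the marks are hypothetical values invented for the example; they are not the data of this study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_t(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r != 0; requires |r| < 1."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# HYPOTHETICAL A1 marks for six pairs (one entry per partner), for illustration only.
partner_a = [95, 88, 100, 72, 90, 65]
partner_b = [92, 85, 98, 80, 94, 70]

r = pearson_r(partner_a, partner_b)
t = correlation_t(r, len(partner_a))
print(f"r = {r:.3f}, t = {t:.2f} with {len(partner_a) - 2} degrees of freedom")
```

A large t for the given degrees of freedom corresponds to a small α, i.e., a correlation unlikely to have arisen by chance.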

The fact that we have solo programming and exam data for all students lets us answer some questions that might arise from our results, but it also raises some interesting questions. For example, both groups that paired for only one assignment have an interesting evolution. The most remarkable is S4P5: this is a group of 24 students that starts and evolves differently from any other group (see Tables 8 and 11 and Figure 1). They already outperform all the others on A1 and maintain their superiority on all the other assignments, except for a drop on A4 (where they were still reluctant to work in pairs). Empirically, they seem to have improved their examination marks from the midterm to the final more than any other group in the study. This suggests that pair programming works out much better for students with already good programming abilities, students who like to face challenges and solve problems on their own, and who probably take a more responsible approach to pair programming itself. This shows interesting and intriguing population dynamics, but we don’t want to speculate further on this issue.

7 Summary

Although this was an informal experiment, we feel that it supports the notion that pair programming is an effective methodology for programming projects. The students who decided to pair had positive attitudes toward the approach, feeling that they learned more and got their assignments done in less time. The assignment scores provide an objective measure of the project success. Even students who performed less well on written exams did better on assignments when they worked in pairs, at times with up to 10% improvements, although they had less time for the last assignments. This is in keeping with other researchers’ findings that pairs produce better software [3, 4, 10]. We had initially also hoped that the final exam scores of the paired students would improve relative to the solo group. However, this was not the case — statistically for most groups the relative final exam scores were similar to their midterm scores, i.e., about at the level (or slightly below the level) existing before the start of the pair programming experiment. A benefit to instructors is, of course, that there are fewer papers to grade. If this can be achieved while the time to complete assignments goes down (as reported by other researchers [10]), the quality goes up, and the overall learning doesn’t deteriorate, then pair programming has a definite place in the classroom.

Acknowledgement

We would like to thank Monica Dumitrescu for her very generous advice and encouragement.

References

[1] Beck, K., Extreme Programming Explained: Embrace Change, Addison–Wesley Publishing Co., Reading, MA, USA, 2000.
[2] Bevan, J., Werner, L., and McDowell, C., Guidelines for the Use of Pair Programming in a Freshman Programming Class, Proceedings of the 15th CSEE&T (2002), IEEE Comp. Soc. Press, Northern Kentucky, USA, pp. 100–107.
[3] Cockburn, A. and Williams, L., The Costs and Benefits of Pair Programming, in Extreme Programming Examined, G. Succi and M. Marchesi (Eds.), Addison–Wesley Publishing Co., Reading, MA, USA, 2000, pp. 223–248.
[4] McDowell, C., Werner, L., Bullock, H., and Fernald, J., The Effect of Pair Programming on Performance in an Introductory Programming Course, Proceedings of the 33rd SIGCSE Technical Symposium (2002), ACM Press, Northern Kentucky, USA, pp. 38–42.
[5] Nagappan, N., Williams, L., Ferzli, M., Wiebe, E., Yang, K., Miller, C., and Balik, S., Improving the CS1 Experience with Pair Programming, to be presented at the 34th SIGCSE Technical Symposium (2003), Reno, Nevada, USA.
[6] Ott, L., An Introduction to Statistical Methods and Data Analysis, PWS–Kent Publishing Co., Boston, MA, USA, 3rd edition, 1988.
[7] Weinberg, G.H., Schumacher, J.A., and Oltman, D., Statistics: An Intuitive Approach, Brooks/Cole Publishing Co., Monterey, CA, USA, 4th edition, 1981.
[8] Williams, L.A., The Collaborative Software Process, PhD Dissertation, Department of Computer Science, University of Utah, Salt Lake City, UT, USA, 2000.
[9] Williams, L. and Kessler, R.K., Experimenting with Industry’s “Pair Programming” Model in the Computer Science Classroom, J. Computer Science Education 11, 1 (2001), 7–20.
[10] Williams, L., Kessler, R., Cunningham, W., and Jeffries, R., Strengthening the Case for Pair–Programming, IEEE Software 17, 4 (2000), 19–25.
[11] Williams, L., Wiebe, E., Yang, K., Ferzli, M., and Miller, C., In Support of Pair Programming in the Introductory Computer Science Course, J. Computer Science Education 12, 3 (2002), 197–212.
[12] Williams, L., Yang, K., Wiebe, E., Ferzli, M., and Miller, C., Pair Programming in an Introductory Computer Science Course: Initial Results and Recommendations, presented at OOPSLA Educators’ Symposium (2002), Seattle, WA, USA.

A Appendix

In this appendix we briefly present the statistical formulae used to analyse the data, cf. [6, 7].

Mean and standard deviation. For a given variable X with n observations, its mean and standard deviation are estimated by the well–known formulae:

\[ \bar{X} = \frac{\sum X}{n} = \text{estimated mean of } X \tag{1} \]

\[ s = \sqrt{\frac{\sum X^2}{n-1} - \frac{(\sum X)^2}{n(n-1)}} = \text{estimated standard deviation of } X \tag{2} \]
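As an illustration, formulae (1) and (2) can be sketched in a few lines of Python. The function name `mean_and_sd` and the sample marks are hypothetical and are not data from the study; the sketch simply evaluates the two sums that the formulae use.

```python
import math


def mean_and_sd(xs):
    """Return (mean, standard deviation) of xs following formulae (1) and (2)."""
    n = len(xs)
    if n < 2:
        raise ValueError("formula (2) needs at least two observations")
    sum_x = sum(xs)                   # sum of X
    sum_x2 = sum(x * x for x in xs)   # sum of X squared
    mean = sum_x / n                  # formula (1)
    variance = sum_x2 / (n - 1) - sum_x ** 2 / (n * (n - 1))
    # Guard against a tiny negative value caused by floating-point rounding.
    return mean, math.sqrt(max(variance, 0.0))   # formula (2)


# Hypothetical marks, for illustration only.
marks = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, sd = mean_and_sd(marks)
print(f"mean = {mean:.3f}, sd = {sd:.3f}")
```

The sum-of-squares form mirrors formula (2) directly; for data with a very large mean relative to its spread, a two-pass computation would be numerically safer.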

T–scores. The differences of the means have been analysed with Student’s two–tailed t–tests for independent samples with unequal variances. Theoretically, the t–test requires normal populations, but it is tolerant of small deviations from normality and, by virtue of the central limit theorem, it is reasonably accurate for large samples (larger than 30, as a rule of thumb). Given two variables X1 and X2, with sample sizes n1 and n2 and standard deviations s1 and s2, the t–score and the degrees of freedom are approximately estimated by the following formulae:

\[ d = \bar{X}_1 - \bar{X}_2 = \text{estimated difference of the means} \tag{3} \]

\[ c_1 = \frac{s_1^2}{n_1} = \text{estimated variance of the mean of } X_1 \tag{4} \]

\[ c_2 = \frac{s_2^2}{n_2} = \text{estimated variance of the mean of } X_2 \tag{5} \]

\[ s_d = \sqrt{c_1 + c_2} = \text{estimated standard error of the difference} \tag{6} \]

\[ t = \frac{d}{s_d} = \text{t–score} \tag{7} \]

\[ df = \frac{(c_1 + c_2)^2}{\dfrac{c_1^2}{n_1 - 1} + \dfrac{c_2^2}{n_2 - 1}} - 2 = \text{degrees of freedom} \tag{8} \]
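Formulae (3) to (8) can be sketched as a small Python function that works from summary statistics. The name `unequal_variance_t` and the input values are hypothetical, chosen only for illustration. The degrees of freedom follow formula (8) as read from this appendix, including its trailing −2 term; the commonly cited Welch–Satterthwaite approximation has the same quotient without that term, so results may differ slightly from those of other statistical packages.

```python
import math


def unequal_variance_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for two independent samples, following formulae (3)-(8)."""
    d = mean1 - mean2                  # (3) difference of the means
    c1 = sd1 ** 2 / n1                 # (4) variance of the mean of X1
    c2 = sd2 ** 2 / n2                 # (5) variance of the mean of X2
    s_d = math.sqrt(c1 + c2)           # (6) standard error of the difference
    t = d / s_d                        # (7) t-score
    # (8) as read from the appendix; note the trailing "- 2" (assumption:
    # this reading of the printed formula is correct).
    df = (c1 + c2) ** 2 / (c1 ** 2 / (n1 - 1) + c2 ** 2 / (n2 - 1)) - 2
    return t, df


# Hypothetical summary statistics, for illustration only.
t, df = unequal_variance_t(75.0, 10.0, 26, 70.0, 10.0, 26)
print(f"t = {t:.4f}, df = {df:.2f}")
```

Working from means, standard deviations and sample sizes keeps the sketch independent of how the raw marks are stored.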

Correlations. Given two variables X and Y, the t–score and the degrees of freedom of their correlation are estimated by the following formulae (based on Pearson’s r correlation coefficient):

\[ r = \frac{n \sum XY - (\sum X)(\sum Y)}{\sqrt{n \sum X^2 - (\sum X)^2}\,\sqrt{n \sum Y^2 - (\sum Y)^2}} = \text{correlation coefficient} \tag{9} \]

\[ t = r\,\frac{\sqrt{n - 2}}{\sqrt{1 - r^2}} = \text{t–score} \tag{10} \]

\[ df = n - 2 = \text{degrees of freedom} \tag{11} \]
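A minimal Python sketch of formulae (9) to (11) follows. The function name `pearson_t` and the paired values are hypothetical; they only show how the sums enter the computation and are not data from the study.

```python
import math


def pearson_t(xs, ys):
    """Return (r, t, df) for paired samples, following formulae (9)-(11)."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two samples of equal length, with n >= 3")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    # (9) Pearson's r; this raises ZeroDivisionError if either sample is constant.
    r = (n * sum_xy - sum_x * sum_y) / (
        math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    )
    df = n - 2                                        # (11)
    if abs(r) >= 1.0:
        # Perfect correlation: formula (10) diverges.
        return r, math.copysign(math.inf, r), df
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)   # (10)
    return r, t, df


# Hypothetical paired marks, for illustration only.
r, t, df = pearson_t([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(f"r = {r:.3f}, t = {t:.3f}, df = {df}")
```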

Probabilities. Finally, we have computed the statistical significance of the results by an “exact” conversion routine that returns an “α” error probability as a function of the t–score and the number of degrees of freedom (similar to the function ProbT() defined in the SAS statistical system).
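The conversion routine itself is not listed in this report. As a rough stand-in, the sketch below estimates the two–tailed error probability by integrating the Student t density numerically with Simpson's rule, using only the Python standard library. It is an assumption-laden illustration, not the routine used for the study and not the SAS implementation; the names `t_pdf` and `two_tailed_p` and the step count are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """Approximate P(|T| >= |t|) by Simpson integration of the density on [0, |t|]."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    if steps % 2:
        steps += 1                    # Simpson's rule needs an even step count
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3           # area between 0 and |t|
    return max(0.0, 1.0 - 2.0 * central)


# Hypothetical t-score and degrees of freedom, for illustration only.
print(f"alpha = {two_tailed_p(2.0, 48.0):.4f}")
```

Because the density is integrated directly, fractional degrees of freedom such as those produced by formula (8) are handled without special cases. Accuracy degrades for very large t–scores, where a dedicated library routine would be preferable.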
